pyspark: how to use the filter function to compare an RDD with a list
So I have a list
list = [11, 5, 7, 2, 18]
and an RDD created from a list
RDD = sc.parallelize([5, 4, 3, 2, 6])
and I want to use the filter function on the RDD to return every element that, when summed with the corresponding element in the list, gives a sum less than or equal to 10. So in this example the element-wise sums are 16, 9, 10, 4, 24, and I want it to return an RDD with the elements 4, 3, 2. How do I do this?
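In plain Python (no Spark) the result I'm after looks like this; I've renamed the list to values here only to avoid shadowing the built-in list:

values = [11, 5, 7, 2, 18]
data = [5, 4, 3, 2, 6]
# keep each data element whose pairwise sum with values is at most 10
print([d for v, d in zip(values, data) if v + d <= 10])  # [4, 3, 2]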
Edit: So I've tried turning the RDD into key-value pairs where the key is the index, and then I do this:
def compare(x, list_):
    i = x[0]
    if x[1] + list_[i] <= 10:
        return x

rdd_new = rdd.filter(compare)
but it doesn't seem to work: when I call rdd_new.collect() I get a bunch of errors.
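From reading the RDD.filter docs, it seems filter calls the function with one element at a time and keeps the element when the function returns a truthy value, so compare presumably can't take the list as a second parameter. A sketch of what I think should work, assuming RDD is the RDD defined above, using zipWithIndex to attach each element's position (values is again my rename of list):

values = [11, 5, 7, 2, 18]

def compare(pair):
    element, i = pair                  # zipWithIndex yields (element, index)
    return element + values[i] <= 10   # return a boolean, not the element

# attach indices, filter on the pairwise sum, then drop the indices
rdd_new = RDD.zipWithIndex().filter(compare).map(lambda pair: pair[0])
print(rdd_new.collect())  # expected: [4, 3, 2]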