pyspark: how to use the filter function to compare an RDD with a list

So I have a list

list = [11, 5, 7, 2, 18] 

and an RDD of a list

RDD = sc.parallelize([5, 4, 3, 2, 6]) 

and I want to use the filter function on the RDD to return every element that, when summed with the corresponding element in the list, gives a sum less than or equal to 10. So in this example, I want it to return an RDD with the elements 4, 3, 2. How do I do this?
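To make the condition concrete, here are the element-wise sums in plain Python (no Spark involved yet):

sums = [a + b for a, b in zip([11, 5, 7, 2, 18], [5, 4, 3, 2, 6])]
# sums == [16, 9, 10, 4, 24]; only 9, 10 and 4 are <= 10,
# which correspond to the RDD elements 4, 3 and 2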

Edit: So I’ve tried turning the RDD into key-value pairs where the key is the index, and then I do this:

def compare(x, list_):
    i = x[0]
    if x[1] + list_[i] <= 10:
        return x

rdd_new = rdd.filter(compare)

but it doesn’t seem to work: when I do rdd_new.collect() I get a bunch of errors.
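For reference, here’s a sketch of what I think the fixed version would look like (the names lst and keyed are just mine for illustration). As far as I can tell, filter calls its function with a single argument, so the list has to be bound with a lambda, and zipWithIndex can build the index pairs. Is this the right approach?

lst = [11, 5, 7, 2, 18]
rdd = sc.parallelize([5, 4, 3, 2, 6])

def compare(x, list_):
    i = x[0]                      # x is an (index, value) pair
    return x[1] + list_[i] <= 10  # keep elements whose sum is <= 10

keyed = rdd.zipWithIndex().map(lambda p: (p[1], p[0]))  # (index, value)
rdd_new = keyed.filter(lambda x: compare(x, lst)).map(lambda x: x[1])
# rdd_new.collect() should give [4, 3, 2]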
