pyspark: how to use the filter function to compare an RDD with a list
So I have a list
list = [11, 5, 7, 2, 18]
and an RDD created from a list
RDD = sc.parallelize([5, 4, 3, 2, 6])
and I want to use the filter function on the RDD to return every element that, when summed with the corresponding element in the list, gives a sum less than or equal to 10. So in this example the element-wise sums are 16, 9, 10, 4, 24, and I want it to return an RDD with the elements 4, 3, 2. How do I do this?
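In plain Python (no Spark) the result I'm after looks like this; I've renamed the list to values here only to avoid shadowing the built-in list:

values = [11, 5, 7, 2, 18]
data = [5, 4, 3, 2, 6]
# keep each data element whose pairwise sum with values is at most 10
print([d for v, d in zip(values, data) if v + d <= 10])  # [4, 3, 2]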
Edit: So I've tried turning the RDD into key-value pairs where the key is the index, and then I do this:
def compare(x, list_):
    i = x[0]
    if x[1] + list_[i] <= 10:
        return x

rdd_new = rdd.filter(compare)
but it doesn't seem to work: when I call rdd_new.collect() I get a bunch of errors.
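From reading the RDD.filter docs, it seems filter calls the function with one element at a time and keeps the element when the function returns a truthy value, so compare presumably can't take the list as a second parameter. A sketch of what I think should work, assuming RDD is the RDD defined above, using zipWithIndex to attach each element's position (values is again my rename of list):

values = [11, 5, 7, 2, 18]

def compare(pair):
    element, i = pair                  # zipWithIndex yields (element, index)
    return element + values[i] <= 10   # return a boolean, not the element

# attach indices, filter on the pairwise sum, then drop the indices
rdd_new = RDD.zipWithIndex().filter(compare).map(lambda pair: pair[0])
print(rdd_new.collect())  # expected: [4, 3, 2]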