How to evaluate a queryset in batches?
I have a model with 100,000+ rows. I want to run an operation on all of them, but I can't do it in one go because of the size. So I thought of using Paginator like this:
```python
from django.core.paginator import Paginator

def fun():
    paginator = Paginator(Model.objects.filter(**some_filter), 10000)
    for page_no in paginator.page_range:
        page = paginator.get_page(page_no)
        queryset = page.object_list
        # Do some operation on queryset

    # Check if new records are added in the Model
    # (if yes, then do the operation on new records only)
```
The final comment in the code means that if new records are added while the code above is running (this is a live application), the same operation has to be run on those records too.
So my question is how do I get the remaining (new) records only to run the same code?
It's easy. If your model has a datetime field, order the queryset by that field, and when you reach the last item in the `for` loop, store its datetime value in a variable. After the loop, query for objects whose datetime field is greater than that stored value and run the operation on just those. This prevents running the operation twice on the same object.
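Below is a minimal sketch of that idea. So it runs without a Django project, it uses Python's built-in `sqlite3` with a hypothetical `item` table whose `created` column stands in for the datetime field; `process_in_batches` and `operation` are likewise illustrative names. Instead of one extra check after the loop, it repeats the "newer than the last item seen" query until nothing new comes back, and it uses the `id` as a tie-breaker because several rows can share the same timestamp. In Django, each batch would roughly correspond to `Model.objects.filter(Q(created__gt=last_created) | Q(created=last_created, pk__gt=last_pk)).order_by("created", "pk")[:batch_size]`, assuming a field named `created`.

```python
import sqlite3
from typing import Callable, Tuple

Row = Tuple[int, str, str]  # (id, created, payload) in this hypothetical table


def process_in_batches(conn: sqlite3.Connection, batch_size: int,
                       operation: Callable[[Row], None]) -> int:
    """Process rows in (created, id) order until no newer rows remain.

    The (created, id) pair of the last processed row is the watermark, so
    each row is handled once, as long as rows inserted during the run get
    a later (created, id) than the rows already processed.
    """
    last_created, last_id = "", 0  # sorts before any real ISO timestamp / id
    processed = 0
    while True:
        rows = conn.execute(
            "SELECT id, created, payload FROM item "
            "WHERE created > ? OR (created = ? AND id > ?) "
            "ORDER BY created, id LIMIT ?",
            (last_created, last_created, last_id, batch_size),
        ).fetchall()
        if not rows:
            break  # nothing newer than the watermark: done
        for row in rows:
            operation(row)
        # The last row of this batch becomes the new watermark.
        last_id, last_created = rows[-1][0], rows[-1][1]
        processed += len(rows)
    return processed


# Usage: 25 existing rows, plus 3 rows that "arrive" while the first batch runs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, created TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO item (created, payload) VALUES (?, ?)",
    [(f"2024-01-01T00:00:{i:02d}", f"old-{i}") for i in range(25)],
)

seen = []


def operation(row: Row) -> None:
    seen.append(row[0])
    if len(seen) == 1:
        # Simulate other clients inserting rows (sharing one timestamp) mid-run.
        conn.executemany(
            "INSERT INTO item (created, payload) VALUES (?, ?)",
            [("2024-01-01T00:01:00", f"new-{j}") for j in range(3)],
        )


total = process_in_batches(conn, batch_size=10, operation=operation)
print(total, len(set(seen)))  # 28 28
```

Filtering on the last seen value, rather than on page numbers, means rows added during the run are simply picked up by a later batch instead of requiring a separate pass.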
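NOTE: if your model doesn't have a datetime field, add one, for example `models.DateTimeField(auto_now_add=True)`, which Django fills in when each object is created.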