Scikit-learn's DBSCAN is giving me memory errors
When I cluster 200,000 rows x 4 columns everything works fine and I get good results, but as soon as I scale up to 500,000 x 4 it fails with a MemoryError. The entire table is 9,800,000 x 4, so I'm not even close yet. I've looked online for solutions but haven't been able to find any.
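For a sense of scale, here is my rough worst-case arithmetic, based on the "worst case O(n^2) memory complexity" comment that appears in the traceback further down (this is only an upper bound, not what actually gets allocated):

n = 500_000
bytes_per_index = 8                    # one int64 neighbor index
worst_case = n * n * bytes_per_index   # every point a neighbor of every point
print(worst_case / 1e12, "TB")         # 2.0 TB, far more RAM than I have

So the jump from 200,000 to 500,000 rows multiplies that worst case by more than six.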
I'm not a coder by any means, so the script below is nothing complicated (nor did I write it myself), but I'm not sure where else to go for an answer to my question.
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Load the data and replace missing values with 0
# (fill with the number 0, not the string '0', so every column stays numeric)
data = pd.read_csv(r'C:\Users\David\Documents\Kutscriptie\hardcoreg2.csv')
data.fillna(0, inplace=True)

# Take the first 500,000 rows and standardize the features
data = data.head(500000)
data_cluster = StandardScaler().fit_transform(data)

# This is the line that raises the MemoryError
db = DBSCAN(eps=0.5, min_samples=6).fit(data_cluster)
MemoryError                               Traceback (most recent call last)
<ipython-input-6-b09830897f6a> in <module>
----> 1 db = DBSCAN(eps=0.5, min_samples=6).fit(data_cluster)

~\anaconda3\lib\site-packages\sklearn\cluster\_dbscan.py in fit(self, X, y, sample_weight)
    333         # This has worst case O(n^2) memory complexity
    334         neighborhoods = neighbors_model.radius_neighbors(X,
--> 335                                                          return_distance=False)
    336
    337         if sample_weight is None:

~\anaconda3\lib\site-packages\sklearn\neighbors\_base.py in radius_neighbors(self, X, radius, return_distance, sort_results)
    973                 sort_results=sort_results)
    974
--> 975                 for s in gen_even_slices(X.shape[0], n_jobs)
    976             )
    977             if return_distance:

~\anaconda3\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
   1002             # remaining jobs.
   1003             self._iterating = False
-> 1004             if self.dispatch_one_batch(iterator):
   1005                 self._iterating = self._original_iterator is not None
   1006

~\anaconda3\lib\site-packages\joblib\parallel.py in dispatch_one_batch(self, iterator)
    833                 return False
    834             else:
--> 835                 self._dispatch(tasks)
    836                 return True
    837

~\anaconda3\lib\site-packages\joblib\parallel.py in _dispatch(self, batch)
    752         with self._lock:
    753             job_idx = len(self._jobs)
--> 754             job = self._backend.apply_async(batch, callback=cb)
    755             # A job can complete so quickly than its callback is
    756             # called before we get here, causing self._jobs to

~\anaconda3\lib\site-packages\joblib\_parallel_backends.py in apply_async(self, func, callback)
    207     def apply_async(self, func, callback=None):
    208         """Schedule a func to be run"""
--> 209         result = ImmediateResult(func)
    210         if callback:
    211             callback(result)

~\anaconda3\lib\site-packages\joblib\_parallel_backends.py in __init__(self, batch)
    588         # Don't delay the application, to avoid keeping the input
    589         # arguments in memory
--> 590         self.results = batch()
    591
    592     def get(self):

~\anaconda3\lib\site-packages\joblib\parallel.py in __call__(self)
    254         with parallel_backend(self._backend, n_jobs=self._n_jobs):
    255             return [func(*args, **kwargs)
--> 256                     for func, args, kwargs in self.items]
    257
    258     def __len__(self):

~\anaconda3\lib\site-packages\joblib\parallel.py in <listcomp>(.0)
    254         with parallel_backend(self._backend, n_jobs=self._n_jobs):
    255             return [func(*args, **kwargs)
--> 256                     for func, args, kwargs in self.items]
    257
    258     def __len__(self):

~\anaconda3\lib\site-packages\sklearn\neighbors\_base.py in _tree_query_radius_parallel_helper(tree, *args, **kwargs)
    786     cloudpickle under PyPy.
    787     """
--> 788     return tree.query_radius(*args, **kwargs)
    789
    790

sklearn\neighbors\_binary_tree.pxi in sklearn.neighbors._kd_tree.BinaryTree.query_radius()

sklearn\neighbors\_binary_tree.pxi in sklearn.neighbors._kd_tree.BinaryTree.query_radius()

MemoryError:
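Update: while searching I found that the scikit-learn user guide (the "Memory consumption for large sample sizes" note under DBSCAN) says this implementation can need O(n^2) memory and suggests either precomputing a sparse radius-neighbors graph and running DBSCAN on it with metric='precomputed', or switching to OPTICS. Below is my untested sketch of the precomputed-graph idea, reusing my file and the same eps/min_samples as above; I don't know whether building the graph just hits the same memory wall, which is part of what I'm asking.

import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

data = pd.read_csv(r'C:\Users\David\Documents\Kutscriptie\hardcoreg2.csv')
data.fillna(0, inplace=True)
data_cluster = StandardScaler().fit_transform(data.head(500000))

# Build a sparse graph that only stores neighbors within the radius,
# instead of letting DBSCAN materialize dense per-point neighborhoods
neighbors = NearestNeighbors(radius=0.5)   # radius must equal eps below
neighbors.fit(data_cluster)
graph = neighbors.radius_neighbors_graph(data_cluster, mode='distance')

# DBSCAN can consume the precomputed sparse distance graph directly;
# pairs missing from the graph are treated as farther apart than eps
db = DBSCAN(eps=0.5, min_samples=6, metric='precomputed').fit(graph)

The same docs mention OPTICS as a lower-memory alternative, but I have no idea whether that is practical at 9,800,000 rows.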