Histograms with a lot of dimensions in Python

I’m doing a simulation of a stochastic many-body system, and currently I need to obtain a multidimensional probability distribution from generated data. For this purpose, I was trying to use np.histogramdd as in:

    bins = np.linspace(start=-x_max, stop=x_max, num=n_bins)
    hists = np.histogramdd(Data, bins=[bins] * dimensions, density=True)

However, this code produces a MemoryError (or raises an exception about an array being too large) already for n_bins = 20, dimensions = 5 and np.shape(Data) = (1000, 5), which is far below my target values. The number of buckets grows exponentially with the number of dimensions, so it is easy to see why such problems arise. So, the question is: how can a high-dimensional histogram be generated, stored and worked with in Python? Are there existing frameworks for this? Is it better to switch to something else?
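For perspective, a quick back-of-the-envelope sketch (assuming the dense float64 array that np.histogramdd allocates; the exact figures are an illustration, not from the original post) shows why dimensions = 7 already fails:

```python
import numpy as np

n_bins = 20

# A dense float64 histogram needs n_bins**dimensions buckets, 8 bytes each.
for dimensions in (5, 7, 10):
    n_buckets = n_bins ** dimensions
    gigabytes = n_buckets * np.dtype(np.float64).itemsize / 1e9
    print(f"{dimensions} dims: {n_buckets:.2e} buckets, {gigabytes:.4g} GB")
```

At 5 dimensions this is a harmless ~26 MB, but at 7 dimensions it is already ~10 GB, and at 10 dimensions it is far beyond any machine's RAM.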

Edit: MCVE and the error messages.

    import numpy as np

    x_max = 10
    n_bins = 20
    # dimensions is varied below: 7, 10, 15
    Data = np.random.uniform(-x_max, x_max, size=(1000, dimensions))
    bins = np.linspace(start=-x_max, stop=x_max, num=n_bins)
    hists = np.histogramdd(Data, bins=[bins] * dimensions, density=True)

With dimensions = 7, I get:

    lib\site-packages\numpy\lib\histograms.py in histogramdd(sample, bins, range, normed, weights, density)
       1066     # Compute the number of repetitions in xy and assign it to the
       1067     # flattened histmat.
    -> 1068     hist = np.bincount(xy, weights, minlength=nbin.prod())

    MemoryError:

dimensions = 15:

       1062     # Compute the sample indices in the flattened histogram matrix.
       1063     # This raises an error if the array is too large.
    -> 1064     xy = np.ravel_multi_index(Ncount, nbin)
       1065
       1066     # Compute the number of repetitions in xy and assign it to the

    ValueError: invalid dims: array size defined by dims is larger than the maximum possible size.

dimensions = 10:

       1066     # Compute the number of repetitions in xy and assign it to the
       1067     # flattened histmat.
    -> 1068     hist = np.bincount(xy, weights, minlength=nbin.prod())
       1069
       1070     # Shape into a proper matrix

    ValueError: 'minlength' must not be negative
1 Answer

If the histogram has a fixed bin width on each axis, you can do your own book-keeping and use a low-memory datatype for the counts (for example 1 byte per bin). In the following example, the bins are the same for each axis, but you can adapt it for bin ranges that differ along the axes, as long as the bin edges are equidistant.

This code will not do range checking; you need to ensure that the histogram bins are wide enough to fit the data or you’ll get an error.

    import numpy as np

    x_max = 10
    n_dim = 7
    n_data = 100000
    data = np.random.uniform(-x_max, x_max - 0.01, size=(n_data, n_dim))

    # Assume bins are the same for all dimensions. Bin edges at x0 + i*xstep.
    n_bins = 5
    x0 = -x_max
    xstep = 2 * x_max / n_bins

    # High-dimensional histogram with a low-memory count datatype.
    hist = np.zeros((n_bins,) * n_dim, dtype=np.int8)

    # Build the histogram indices corresponding to the data samples.
    ii = ((data - x0) * (1 / xstep)).astype(np.int16)  # shape (n_data, n_dim)

    # Increment the histogram bins. np.add.at correctly handles
    # bins that occur multiple times in the input.
    np.add.at(hist, tuple(ii.T), 1)
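If you do want range checking, one hedged way to add it (a sketch, not part of the answer above) is to mask out samples with any out-of-range coordinate before computing the indices; the names x0, xstep and ii mirror the snippet above, with smaller toy sizes:

```python
import numpy as np

x_max = 10
n_dim = 3
n_bins = 5
x0 = -x_max
xstep = 2 * x_max / n_bins

# Deliberately generate some samples outside the histogram range.
rng = np.random.default_rng(0)
data = rng.uniform(-1.5 * x_max, 1.5 * x_max, size=(1000, n_dim))

# Keep only samples where every coordinate lies inside [x0, x0 + n_bins*xstep).
in_range = np.all((data >= x0) & (data < x0 + n_bins * xstep), axis=1)
ii = ((data[in_range] - x0) / xstep).astype(np.int16)

hist = np.zeros((n_bins,) * n_dim, dtype=np.int32)
np.add.at(hist, tuple(ii.T), 1)
```

Alternatively, np.clip the indices into [0, n_bins - 1] if you prefer to pile outliers into the edge bins rather than drop them.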

But with n_dim=8 or 9 you’ll run out of memory on most systems anyway.

The question is what you’re going to do with a histogram that has 10**10 bins; do you have 10**11 or more samples?

It’s more practical to keep the ii array and generate lower-dimensional histograms when you need them. For example, if you want to reduce the 7D histogram to a 4D histogram on axes 0, 1, 5, 6:

    hist_4d = np.zeros((n_bins,) * 4, dtype=np.int16)
    np.add.at(hist_4d, tuple(ii[:, [0, 1, 5, 6]].T), 1)

Note: I recommend using signed integers for the bin counts. Integer overflow is silent, but negative numbers in the bins will at least indicate that an overflow occurred.
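As a rough illustration of that overflow behaviour (a toy sketch with made-up sizes): an int8 counter wraps past 127, so 200 hits in one bin end up negative, which a simple scan can flag. Note this heuristic only catches wrapped counts that happen to land in the negative range.

```python
import numpy as np

hist = np.zeros((4, 4), dtype=np.int8)

# 200 samples all falling into bin (0, 0): exceeds the int8 maximum of 127.
ii = np.zeros((200, 2), dtype=np.int16)
np.add.at(hist, tuple(ii.T), 1)  # wraps silently: 200 -> 200 - 256 = -56

if (hist < 0).any():
    print("overflow detected: some bin counts wrapped negative")
```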

