Creating interaction terms quickly without SKLearn

I am using the following code to create interaction terms in my data:

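```python
import numpy as np

def Interaction(x):
    n = x.shape[1]  # number of original columns, fixed before x starts growing
    for k in range(n):
        for j in range(k + 1, n):
            new = x[:, k] * x[:, j]            # product of columns k and j
            x = np.hstack((x, new[:, None]))   # append it as a new column
    return x
```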
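A likely cost in the loop above is that every `np.hstack` call copies the whole, ever-growing array. As a minimal sketch (an assumption about where the time goes, not a measured result), the new columns can be collected in a list and stacked once at the end. `interaction_loop` is a hypothetical name, and `itertools.combinations` is used here simply to enumerate the column pairs k < j.

```python
import itertools
import numpy as np

def interaction_loop(x):
    """Hypothetical helper: append all pairwise products x[:, k] * x[:, j]
    (k < j) as new columns, stacking once instead of once per pair."""
    n = x.shape[1]
    # One 1-D product array per column pair, in (0,1), (0,2), ..., (1,2), ... order
    products = [x[:, k] * x[:, j] for k, j in itertools.combinations(range(n), 2)]
    if not products:  # fewer than two columns: nothing to add
        return x.copy()
    # column_stack keeps 2-D x as-is and turns each 1-D product into a column
    return np.column_stack([x] + products)
```

This still loops in Python over the n(n-1)/2 pairs, but each iteration is a single vectorized column product and there is only one final copy.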

My problem is that it is extremely slow compared to SKLearn’s PolynomialFeatures. How can I speed it up? I can’t use SKLearn because there are a few customizations that I would like to make. For example, I would like to make an interaction variable of X1 * X2 but also X1 * (1-X2), etc.
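One way such customizations could be vectorized is sketched below, assuming that every column pair k < j should get both X_k * X_j and X_k * (1 - X_j). The function name `custom_interactions` and this particular choice of terms are hypothetical. `np.triu_indices` lists all pairs at once, so each kind of interaction is a single array expression rather than a Python loop.

```python
import numpy as np

def custom_interactions(x):
    """Hypothetical example: for every column pair k < j, append
    x[:, k] * x[:, j] and then x[:, k] * (1 - x[:, j])."""
    n = x.shape[1]
    k, j = np.triu_indices(n, 1)          # index arrays of all pairs with k < j
    prod = x[:, k] * x[:, j]              # X_k * X_j, shape (m, n*(n-1)//2)
    prod_comp = x[:, k] * (1 - x[:, j])   # X_k * (1 - X_j), same shape
    return np.hstack([x, prod, prod_comp])
```

Further variants (for example (1 - X_k) * X_j) would each be one more expression of the same form added to the `hstack` list.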

Asked on July 16, 2020 in NumPy, Python.

Answer

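We can multiply the elements of each row pairwise with np.einsum('ij,ik->ijk', x, x), which computes the outer product of every row with itself. This is redundant: of the n² products per row, only the n(n-1)/2 above the diagonal are kept, so roughly half of the work is thrown away. Even so, it is about twice as fast as PolynomialFeatures in the benchmark below (on a 6-by-5 array).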
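To make the einsum step concrete, here is a small check (the array `x` is made up for illustration) that `'ij,ik->ijk'` is the per-row outer product, equivalent to broadcasting `x[:, :, None] * x[:, None, :]`, and that the interaction terms are its strict upper triangle.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)          # 2 rows (samples), 3 features
outer = np.einsum('ij,ik->ijk', x, x)   # outer[i, j, k] == x[i, j] * x[i, k]

# Same result with broadcasting: (m, n, 1) * (m, 1, n) -> (m, n, n)
assert np.array_equal(outer, x[:, :, None] * x[:, None, :])

# Only the strict upper triangle (j < k) is kept: n*(n-1)/2 of the n*n products
j, k = np.triu_indices(x.shape[1], 1)
print(outer[:, j, k])
```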

```python
import numpy as np


def interaction(x):
    """
    >>> a = np.arange(9).reshape(3, 3)
    >>> b = np.arange(6).reshape(3, 2)
    >>> a
    array([[0, 1, 2],
           [3, 4, 5],
           [6, 7, 8]])
    >>> interaction(a)
    array([[ 0,  1,  2,  0,  0,  2],
           [ 3,  4,  5, 12, 15, 20],
           [ 6,  7,  8, 42, 48, 56]])
    >>> b
    array([[0, 1],
           [2, 3],
           [4, 5]])
    >>> interaction(b)
    array([[ 0,  1,  0],
           [ 2,  3,  6],
           [ 4,  5, 20]])
    """
    # b[i, j, k] = x[i, j] * x[i, k]: outer product of each row with itself
    b = np.einsum('ij,ik->ijk', x, x)
    m, n = x.shape
    # Column pairs (j, k) with j < k, i.e. the strict upper triangle
    axis1, axis2 = np.triu_indices(n, 1)
    # Repeat the pair indices for every row, and each row index for every pair
    axis1 = np.tile(axis1, m)
    axis2 = np.tile(axis2, m)
    axis0 = np.arange(m).repeat(n * (n - 1) // 2)
    # Gather the selected products (one row per sample) and append them to x
    return np.c_[x, b[axis0, axis1, axis2].reshape(m, -1)]
```
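The tiled and repeated index arrays can probably be replaced by plain broadcasting: indexing `b[:, axis1, axis2]` already yields one row per sample with n(n-1)/2 columns, in the same pair order. A sketch under that assumption (`interaction_simple` is a hypothetical name), which should match `interaction` on the docstring examples:

```python
import numpy as np

def interaction_simple(x):
    """Hypothetical variant of interaction(): same output, but indexes the
    (m, n, n) einsum result with a slice instead of tiled/repeated indices."""
    m, n = x.shape
    b = np.einsum('ij,ik->ijk', x, x)       # per-row outer products
    axis1, axis2 = np.triu_indices(n, 1)    # pairs (j, k) with j < k
    # b[:, axis1, axis2] already has shape (m, n*(n-1)//2); no reshape needed
    return np.c_[x, b[:, axis1, axis2]]
```

Because the row axis is taken with `:`, NumPy pairs `axis1` and `axis2` element-wise for every row, which is exactly what the `tile`/`repeat` construction spells out by hand.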

Performance comparison:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

c = np.arange(30).reshape(6, 5)
poly = PolynomialFeatures(2, interaction_only=True)
skl = poly.fit_transform
# Drop the bias (constant 1) column that PolynomialFeatures adds first
print(np.allclose(interaction(c), skl(c)[:, 1:]))  # True
```

```
In [1]: %timeit interaction(c)
118 µs ± 172 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [2]: %timeit skl(c)
243 µs ± 4.69 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
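To rerun a comparison like this outside IPython and without scikit-learn, the standard-library `timeit` module can be used. The sketch below (function names are hypothetical) times the question's hstack-in-a-loop approach, with the column count captured before the loop, against a fully vectorized `np.triu_indices` version. Absolute timings depend on the machine and the array size, so no numbers are claimed here.

```python
import timeit
import numpy as np

def interaction_hstack(x):
    """Question's approach: grow the array with np.hstack once per pair."""
    n = x.shape[1]  # original column count, fixed before x grows
    for k in range(n):
        for j in range(k + 1, n):
            x = np.hstack((x, (x[:, k] * x[:, j])[:, None]))
    return x

def interaction_vectorized(x):
    """All pairwise products at once via triu_indices (no 3-D einsum tensor)."""
    k, j = np.triu_indices(x.shape[1], 1)
    return np.hstack([x, x[:, k] * x[:, j]])

c = np.arange(30).reshape(6, 5)
assert np.array_equal(interaction_hstack(c), interaction_vectorized(c))

number = 1_000
for f in (interaction_hstack, interaction_vectorized):
    t = timeit.timeit(lambda: f(c), number=number)
    print(f"{f.__name__}: {t / number * 1e6:.1f} µs per call")
```

Timing on larger arrays (more rows and columns) is more representative of real data than the 6-by-5 example.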
