Adjusting the number of features for TF-IDF/logistic regression sentiment analysis
I’m doing a sentiment analysis project on a Twitter dataset. I used TF-IDF feature extraction and a logistic regression model for classification. So far I’ve trained the model with the following:
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def get_tfidf_features(train_fit, ngrams=(1, 1)):
    vector = TfidfVectorizer(ngram_range=ngrams, sublinear_tf=True)
    vector.fit(train_fit)
    return vector

tf_vector = get_tfidf_features(traintest['text'])
X = tf_vector.transform(traintest['text'])
y = traintest['sentiment']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.01, random_state=42)

LR_model = LogisticRegression(solver='lbfgs')
LR_model.fit(X_train, y_train)
y_predict_lr = LR_model.predict(X_test)
```
This logistic regression model was trained on a dataset of about 1.5 million tweets. I have a set of about 1.7 million tweets, `df_april`, that I'm trying to run this sentiment analysis model on. On my first attempt, I extracted the features as follows:
```python
tfidf = TfidfVectorizer(ngram_range=unigrams, max_features=None, sublinear_tf=True)
X_april = tfidf.fit_transform(df_april['text'].values.astype('U'))
```
My first thought was to just call `predict` on `X_april`, but this gives me an error:
```python
y_predict_april = LR_model.predict(X_april)
```

```
ValueError: X has 208976 features per sample; expecting 271794
```
This made sense to me, since the shapes of the two feature matrices are different:
```python
X.shape        # (1578614, 271794)
X_april.shape  # (1705758, 208976)
```
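For context, here is a minimal, self-contained reproduction of the mismatch (the mini-corpora are made up for illustration): two `TfidfVectorizer` instances fitted on different corpora learn different vocabularies, so their output matrices have different numbers of columns.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini-corpora standing in for the training tweets and df_april.
train_texts = ["good movie", "bad movie", "great film"]
april_texts = ["terrible plot", "good acting"]

# Each fit_transform learns its own vocabulary from the corpus it sees,
# so the two matrices end up with different column counts.
X_a = TfidfVectorizer().fit_transform(train_texts)
X_b = TfidfVectorizer().fit_transform(april_texts)
print(X_a.shape[1], X_b.shape[1])  # 5 4 — the feature spaces don't line up
```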
So I know I need to somehow adjust the number of features so that `X_april` matches `X` before I can call `predict` on `X_april`. My attempt to do this was:
```python
x = pd.DataFrame.sparse.from_spmatrix(X)
x_april = pd.DataFrame.sparse.from_spmatrix(X_april)
not_existing_cols = [c for c in x.columns.tolist() if c not in x_april]
x_april = x_april.reindex(x_april.columns.tolist() + not_existing_cols, axis=1)
x_april = x_april[x.columns.tolist()]
```
I’m working in a Jupyter notebook, and this code results in a dead kernel every time I’ve tried it. How can I adjust the features so that I can call the logistic regression model?
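Is reusing the original fitted vectorizer's `transform` (rather than calling `fit_transform` again) the right approach? A self-contained toy sketch of what I mean, with made-up data standing in for my real tweets:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Made-up stand-ins for the real tweets and labels.
train_texts = ["i love this", "i hate this", "great day", "awful day"]
train_labels = [1, 0, 1, 0]
new_texts = ["what a great day", "i hate mondays"]

# Fit the vectorizer once, on the training corpus only.
tf_vector = TfidfVectorizer(sublinear_tf=True)
X = tf_vector.fit_transform(train_texts)

LR_model = LogisticRegression(solver='lbfgs')
LR_model.fit(X, train_labels)

# transform() (not fit_transform) maps the new texts into the SAME
# feature space; out-of-vocabulary words like "mondays" are dropped.
X_new = tf_vector.transform(new_texts)
print(X.shape[1] == X_new.shape[1])  # True
y_predict_new = LR_model.predict(X_new)
```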