Adjusting the number of features for TF-IDF/logistic regression sentiment analysis

I’m doing a sentiment analysis project on a Twitter dataset. I used TF-IDF feature extraction and a logistic regression model for classification. So far I’ve trained the model with the following:

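    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    def get_tfidf_features(train_fit, ngrams=(1, 1)):
        # Fit a TF-IDF vectorizer on the training text and return it
        vector = TfidfVectorizer(ngram_range=ngrams, sublinear_tf=True)
        vector.fit(train_fit)
        return vector

    # tf_vector was not defined in the original snippet; presumably it was fitted like this
    tf_vector = get_tfidf_features(traintest['text'])
    X = tf_vector.transform(traintest['text'])
    y = traintest['sentiment']

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.01, random_state=42)

    LR_model = LogisticRegression(solver='lbfgs')
    LR_model.fit(X_train, y_train)
    y_predict_lr = LR_model.predict(X_test)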

The model was trained on about 1.5 million tweets. I now want to apply it to a second set of about 1.7 million tweets, df_april. On my first attempt, I extracted the features as follows:

    tfidf = TfidfVectorizer(ngram_range=unigrams, max_features=None, sublinear_tf=True)
    X_april = tfidf.fit_transform(df_april['text'].values.astype('U'))

My first thought was to just call predict on X_april, but this gives me an error:

    y_predict_april = LR_model.predict(X_april)

    ValueError: X has 208976 features per sample; expecting 271794

This made sense to me, since the two feature matrices have different numbers of columns:

    X.shape
    (1578614, 271794)

    X_april.shape
    (1705758, 208976)

So I know X_april needs to have the same number of features as X before I can call predict on it. My attempt to do this was:

    x = pd.DataFrame.sparse.from_spmatrix(X)
    x_april = pd.DataFrame.sparse.from_spmatrix(X_april)

    not_existing_cols = [c for c in x.columns.tolist() if c not in x_april]
    x_april = x_april.reindex(x_april.columns.tolist() + not_existing_cols, axis=1)
    x_april = x_april[x.columns.tolist()]

I’m working in a Jupyter notebook, and this code results in a dead kernel every time I’ve tried it. How can I adjust the features so that I can call the logistic regression model?
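
For reference, the usual way to get matching feature counts is to avoid fitting a second vectorizer at all. The vectorizer fitted on the training text is reused, and only transform is called on the new text, so the columns line up with what the model saw during training. Here is a minimal sketch of that idea. The toy texts and labels are hypothetical stand-ins for traintest and df_april, and it assumes the original fitted vectorizer is still available:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data standing in for traintest and df_april
train_texts = ["good day", "bad day", "really good movie", "really bad movie"]
train_labels = [1, 0, 1, 0]
april_texts = ["good weather in april", "bad traffic"]

# Fit the vectorizer once, on the training text only
tf_vector = TfidfVectorizer(ngram_range=(1, 1), sublinear_tf=True)
X = tf_vector.fit_transform(train_texts)

LR_model = LogisticRegression(solver="lbfgs")
LR_model.fit(X, train_labels)

# Reuse the fitted vectorizer: transform() only, no second fit.
# Words that were not in the training vocabulary are ignored,
# so X_april has exactly the same columns as X.
X_april = tf_vector.transform(april_texts)

y_predict_april = LR_model.predict(X_april)
print(X.shape[1], X_april.shape[1])
```

Applied to the question's setup, this would presumably amount to calling tf_vector.transform(df_april['text'].values.astype('U')) in place of the new fit_transform. If the fitted vectorizer is no longer in memory, it would have to be refitted on the training text or saved alongside the model (for example with joblib.dump).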
