How do I perform feature extraction to build a sentiment analysis model?
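I am trying to perform feature extraction and build a model for a Twitter sentiment analysis project. However, I am running into the following error and was wondering whether anyone could help.
Error:
ValueError: np.nan is an invalid document, expected byte or unicode string.
My code: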
import re
import pickle
import numpy as np
import pandas as pd
# nltk
from nltk.stem import WordNetLemmatizer
# sklearn
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
df = pd.read_csv("updated_tweet_info.csv")
train,test = train_test_split(df,test_size = 0.2,random_state = 42)
train_clean_tweet = []
for tweet in train['tweet']:
    train_clean_tweet.append(tweet)
test_clean_tweet = []
for tweet in test['tweet']:
    test_clean_tweet.append(tweet)
v = CountVectorizer(analyzer = "word")
train_features= v.fit_transform(train_clean_tweet)
test_features=v.transform(test_clean_tweet)
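# NOTE: the post never names the target column; 'sentiment' is a
# hypothetical placeholder for a numeric label column in the CSV.
lr = RandomForestRegressor(n_estimators=200)
# Fit on the vectorized features and the labels, not on the raw DataFrame.
lr.fit(train_features, train['sentiment'])
pred = lr.predict(test_features)
# r2_score compares true labels with predictions.
accuracy = r2_score(test['sentiment'], pred)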
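Solution
CountVectorizer raises this error when the tweet column contains missing values (NaN), because it expects every document to be a string. You can try replacing the NaN values with a space, which should eliminate the error. Do this right after reading the CSV and before calling train_test_split, and assign the result back to df so the rest of the code uses the cleaned data:
df = df.fillna(' ')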
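As a rough illustration of the fix, the sketch below builds a small made-up DataFrame in place of updated_tweet_info.csv (the rows are invented for the example) and shows two ways to handle missing tweets before vectorizing: filling them with an empty string, or dropping those rows altogether. Which option suits the project depends on whether rows without text are still useful.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-in for updated_tweet_info.csv; these rows are made up.
df = pd.DataFrame({"tweet": ["good day", np.nan, "bad day", "good good"]})

# Count how many tweets are missing before cleaning.
n_missing = int(df["tweet"].isna().sum())

# Option 1: replace missing tweets with an empty string (keeps every row).
filled = df["tweet"].fillna("")

# Option 2: drop rows whose tweet is missing.
dropped = df.dropna(subset=["tweet"])["tweet"]

# Vectorizing the cleaned column no longer hits the np.nan ValueError.
vectorizer = CountVectorizer(analyzer="word")
features = vectorizer.fit_transform(filled)
vocab = sorted(vectorizer.vocabulary_)
```

With the fill approach, the formerly missing row becomes an all-zero feature vector; with the drop approach, remember to drop the same rows from the labels so features and targets stay aligned.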