How to fix a dimension error when fitting a ColumnTransformer that adds a feature to the vectorizer output
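I'm having trouble adding a feature to the vectorizer output. I have the text content and the page count for each document, and I'm using sklearn's ColumnTransformer to add the page counts alongside the vectorizer input: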
training_content = pd.DataFrame({'text': training_text, 'pages': training_pages})
The text content and the page counts have the same length:
19872 19872
The resulting DataFrame has this shape:
(19872,2)
Then I use ColumnTransformer to build the feature preprocessing pipeline:
pipe = ColumnTransformer(
    [('text', TfidfVectorizer(tokenizer=remove_strings_smaller_three_chars_tokenizer, ngram_range=(1, ngram)), ['text'])],
    remainder=MinMaxScaler(),
)
pipe = pipe.fit(training_content)
But I get this error:
Traceback (most recent call last):
  File "test_clfs.py", line 336, in <module>
    pipe = pipe.fit(training_content)
  File "/root/semantic_env/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py", line 494, in fit
    self.fit_transform(X, y=y)
  File "/root/semantic_env/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py", line 553, in fit_transform
    return self._hstack(list(Xs))
  File "/root/semantic_env/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py", line 639, in _hstack
    return np.hstack(Xs)
  File "<__array_function__ internals>", line 6, in hstack
  File "/root/semantic_env/lib/python3.7/site-packages/numpy/core/shape_base.py", line 346, in hstack
    return _nx.concatenate(arrs, 1)
  File "<__array_function__ internals>", in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 1 and the array at index 1 has size 19872
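One likely cause, offered as a sketch rather than a confirmed diagnosis: the column selector is written as the list ['text'], so ColumnTransformer hands TfidfVectorizer a one-column DataFrame instead of a 1-D sequence of strings. Iterating over a DataFrame yields its column names, so the vectorizer would see a single "document", which would match the "size 1" in the error. Selecting the column with the plain string 'text' passes a 1-D Series instead. The toy data below is hypothetical, and the custom tokenizer from the question is left out because its definition is not shown.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data standing in for training_text / training_pages.
training_content = pd.DataFrame({
    "text": ["first sample document", "second sample document", "third short text"],
    "pages": [3, 10, 25],
})

# The column is given as the string "text", not the list ["text"], so the
# vectorizer receives a 1-D Series with one entry per document.
# The remaining column ("pages") is still passed to MinMaxScaler as 2-D data.
pipe = ColumnTransformer(
    [("text", TfidfVectorizer(ngram_range=(1, 2)), "text")],
    remainder=MinMaxScaler(),
)

features = pipe.fit_transform(training_content)
n_rows, n_cols = features.shape

# One row per document; the TF-IDF columns are followed by the scaled page count.
print(n_rows, n_cols)
```

If this is indeed the cause, the same one-character-level change (['text'] to 'text') in the original pipeline should give an output with 19872 rows.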