How do I convert a BinaryRelevance.predict result into label names?
I put together a small example that attempts multi-label text classification with skmultilearn:
import skmultilearn
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd
from scipy.sparse import csr_matrix
from pandas.core.common import flatten
from sklearn.naive_bayes import MultinomialNB
from skmultilearn.problem_transform import BinaryRelevance
TRAIN_DATA = [
    ['How to connect to MySQL using PHP ?', ['development', 'database']],
    ['What are the best VPN clients these days?', ['networks']],
    ['What is the equivalent of the boolean type in Oracle?', ['database']],
    ['How to remove unwanted entity from Hibernate session?', ['development']],
    ['How to implement TCP connection pooling in java?', ['networks']],  # label list partly truncated in the source
    ['How can I connect to PostgreSQL database remotely from another network?', ['database']],  # label list truncated in the source
    ['What is the python function to remove accents in a string?', []],  # labels missing in the source
    ['How to remove indexes in SQL Server?', []],  # labels missing in the source
    ['How to configure firewall with DMZ?', ['networks']],
]
data_frame = pd.DataFrame(TRAIN_DATA,columns=['text','labels'])
corpus = data_frame['text']
unique_labels = set(flatten(data_frame['labels']))
# Add one 0/1 indicator column per label.
for u in unique_labels:
    data_frame[u] = 0
    data_frame[u] = pd.to_numeric(data_frame[u])
for i, row in data_frame.iterrows():
    for u in unique_labels:
        if u in row.labels:
            data_frame.at[i, u] = 1
tfidf = TfidfVectorizer()
Xfeatures = tfidf.fit_transform(corpus).toarray()
y = data_frame[list(unique_labels)]  # pandas 2.x rejects a set as a column indexer
binary_rel_clf = BinaryRelevance(MultinomialNB())
binary_rel_clf.fit(Xfeatures,y)
predict_text = ['SQL Server and PHP?']
X_predict = tfidf.transform(predict_text)
br_prediction = binary_rel_clf.predict(X_predict)
print(br_prediction)
However, the result looks like this:
(0,1) 1.
Is there a way to convert this result into label names, for example ['development','database']?
Solution
The BinaryRelevance estimator returns a scipy csc_matrix. You can do the following:
First, convert the csc_matrix to a dense numpy array of type bool:
br_prediction = br_prediction.toarray().astype(bool)
Then use the converted predictions as a mask over the possible label names, which are the columns of y:
predictions = [y.columns.values[prediction].tolist() for prediction in br_prediction]
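The two steps above could also be wrapped in a small helper. The following is only a sketch: labels_from_prediction is a hypothetical name, not part of skmultilearn, and the sparse matrix is built by hand as a stand-in for the real predict() output so that the snippet runs without a trained model.

```python
import numpy as np
from scipy.sparse import csc_matrix


def labels_from_prediction(prediction, label_names):
    """Map each row of a 0/1 prediction matrix to a list of label names.

    Hypothetical helper for illustration; not a skmultilearn API.
    """
    names = np.asarray(label_names)
    # Densify sparse input; plain arrays pass through unchanged.
    if hasattr(prediction, "toarray"):
        dense = prediction.toarray()
    else:
        dense = np.asarray(prediction)
    mask = dense.astype(bool)
    # Boolean indexing keeps only the names whose column is set in that row.
    return [names[row].tolist() for row in mask]


# Hand-built stand-in for the predict() output: two samples, three labels.
label_names = ['development', 'database', 'networks']
fake_prediction = csc_matrix(np.array([[0, 1, 0],
                                       [1, 0, 1]]))

result = labels_from_prediction(fake_prediction, label_names)
print(result)
# [['database'], ['development', 'networks']]
```

The order of label_names has to match the column order of the y used for fitting, so in the original example it would be taken from y.columns.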
The list comprehension maps each prediction to its corresponding labels. For example:
print(y.columns.values)
# output: ['development' 'database' 'networks']
print(br_prediction)
# output: (0,1) 1
br_prediction = br_prediction.toarray().astype(bool)
print(br_prediction)
# output: [[False True False]]
predictions = [y.columns.values[prediction].tolist() for prediction in br_prediction]
print(predictions)
# output: [['database']]
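As an alternative to the manual mask, scikit-learn's MultiLabelBinarizer can both build the 0/1 indicator matrix from the label lists and decode a prediction back into label names through inverse_transform. This is a different technique from the one in the answer, shown here only as a sketch: the label lists are a shortened sample, and the sparse matrix is built by hand as a stand-in for the predict() output.

```python
from scipy.sparse import csc_matrix
from sklearn.preprocessing import MultiLabelBinarizer

# A few label lists in the same shape as the 'labels' column of the question.
label_lists = [
    ['development', 'database'],
    ['networks'],
    ['database'],
]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(label_lists)   # dense 0/1 indicator array, one column per label
label_names = list(mlb.classes_)     # classes_ holds the label names in sorted order

# Hand-built stand-in for the sparse matrix returned by predict():
# one sample with the first two label columns set.
fake_prediction = csc_matrix([[1, 1, 0]])

# inverse_transform accepts a dense or sparse 0/1 matrix and returns tuples of names.
decoded = mlb.inverse_transform(fake_prediction)
print(decoded)
```

With this approach the column order is fixed by mlb.classes_, so it does not depend on the iteration order of a Python set.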