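Code:
from sklearn.feature_extraction.text import TfidfVectorizer

# Two-document toy corpus
corpus = ['I had had a dream',
          'My dream will come true']
vectorizer = TfidfVectorizer()
# Learn the vocabulary and IDF weights, then return the sparse TF-IDF matrix
matrix = vectorizer.fit_transform(corpus)
print("Feature IDF values:\n", vectorizer.idf_)
print("TF-IDF matrix:\n", matrix.toarray())
print("Feature coordinates and TF-IDF values:\n", matrix)
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print("Feature names:\n", vectorizer.get_feature_names_out().tolist())
print("Feature names and indices:\n", vectorizer.vocabulary_)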
Output:
Feature IDF values:
[1.40546511 1. 1.40546511 1.40546511 1.40546511 1.40546511]
TF-IDF matrix:
[[0. 0.33517574 0.94215562 0. 0. 0. ]
[0.47107781 0.33517574 0. 0.47107781 0.47107781 0.47107781]]
Feature coordinates and TF-IDF values:
(0, 1) 0.33517574332792605
(0, 2) 0.9421556246632359
(1, 4) 0.47107781233161794
(1, 0) 0.47107781233161794
(1, 5) 0.47107781233161794
(1, 3) 0.47107781233161794
(1, 1) 0.33517574332792605
Feature names:
['come', 'dream', 'had', 'my', 'true', 'will']
Feature names and indices:
{'had': 2, 'dream': 1, 'my': 3, 'will': 5, 'come': 0, 'true': 4}
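
To see where the numbers above come from, here is a minimal sketch that recomputes them with the standard library only. It is not the library's implementation. It assumes the default TfidfVectorizer settings: text is lowercased, tokens need at least two word characters (so 'I' and 'a' are dropped), IDF is smoothed as ln((1 + n) / (1 + df)) + 1, and each row is scaled to unit L2 norm. The names docs, vocab, idf and rows are illustrative only.

```python
import math
import re
from collections import Counter

corpus = ['I had had a dream',
          'My dream will come true']

# Assumed default tokenization: lowercase, keep tokens of 2+ word characters
token_pattern = re.compile(r"(?u)\b\w\w+\b")
docs = [token_pattern.findall(text.lower()) for text in corpus]

# Vocabulary in alphabetical order, matching the feature indices above
vocab = sorted({tok for doc in docs for tok in doc})
n_docs = len(docs)

# Smoothed IDF: ln((1 + n) / (1 + df)) + 1, where df = documents containing the term
idf = {}
for term in vocab:
    df = sum(term in doc for doc in docs)
    idf[term] = math.log((1 + n_docs) / (1 + df)) + 1

# TF-IDF rows: raw count * idf, then divide each row by its L2 norm
rows = []
for doc in docs:
    counts = Counter(doc)
    raw = [counts[term] * idf[term] for term in vocab]
    norm = math.sqrt(sum(v * v for v in raw))
    rows.append([v / norm for v in raw])

print(vocab)
print([round(idf[t], 8) for t in vocab])
for row in rows:
    print([round(v, 8) for v in row])
```

'dream' occurs in both documents, so its IDF is ln(3/3) + 1 = 1, while every other term occurs in one document and gets ln(3/2) + 1 ≈ 1.40546511. In the first document 'had' occurs twice, which is why its weight (≈ 0.942) is much larger than that of 'dream' (≈ 0.335) after normalization.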