sklearn: TfidfVectorizer notes

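Code:

from sklearn.feature_extraction.text import TfidfVectorizer

# Two-document toy corpus. With the default settings the text is lowercased
# and single-character tokens such as "I" and "a" are dropped.
corpus = ['I had had a dream',
          'My dream will come true']

vectorizer = TfidfVectorizer()
# Learn the vocabulary and IDF weights, then return the sparse TF-IDF matrix.
matrix = vectorizer.fit_transform(corpus)
print("IDF values of the feature terms:\n", vectorizer.idf_)
print("TF-IDF matrix of the feature terms:\n", matrix.toarray())
print("Feature term coordinates and TF-IDF values:\n", matrix)
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out().
print("Feature terms:\n", vectorizer.get_feature_names_out().tolist())
print("Feature terms and indices:\n", vectorizer.vocabulary_)

Output:

IDF values of the feature terms:
 [1.40546511 1.         1.40546511 1.40546511 1.40546511 1.40546511]
TF-IDF matrix of the feature terms:
 [[0.         0.33517574 0.94215562 0.         0.         0.        ]
 [0.47107781 0.33517574 0.         0.47107781 0.47107781 0.47107781]]
Feature term coordinates and TF-IDF values:
   (0, 1)	0.33517574332792605
  (0, 2)	0.9421556246632359
  (1, 4)	0.47107781233161794
  (1, 0)	0.47107781233161794
  (1, 5)	0.47107781233161794
  (1, 3)	0.47107781233161794
  (1, 1)	0.33517574332792605
Feature terms:
 ['come', 'dream', 'had', 'my', 'true', 'will']
Feature terms and indices:
 {'had': 2, 'dream': 1, 'my': 3, 'will': 5, 'come': 0, 'true': 4}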
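
To see where these numbers come from, the values can be recomputed by hand. The following is a minimal sketch, not the library's implementation. It assumes the default settings (lowercasing, tokens of two or more word characters, smooth_idf=True, norm='l2', sublinear_tf=False). Under those assumptions idf(t) = ln((1 + n) / (1 + df(t))) + 1, each entry is the term count times idf, and each row is divided by its Euclidean norm. The helper names tokenize and tfidf_by_hand are hypothetical and used only for illustration.

```python
import math
import re
from collections import Counter

corpus = ["I had had a dream", "My dream will come true"]

# Assumed to mirror the default tokenization: lowercase the text and
# keep only tokens made of two or more word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def tfidf_by_hand(docs):
    """Recompute TF-IDF with smoothed IDF and L2 row normalization."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))  # alphabetical column order
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term.
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for c in counts:
        raw = [c[t] * idf[t] for t in vocab]               # tf * idf
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0   # avoid dividing by 0
        rows.append([v / norm for v in raw])               # L2-normalize the row
    return vocab, idf, rows


vocab, idf, rows = tfidf_by_hand(corpus)
print(vocab)
print([round(idf[t], 8) for t in vocab])
for row in rows:
    print([round(v, 8) for v in row])
```

For example, 'dream' occurs in both documents, so its idf is ln(3/3) + 1 = 1, while every other term occurs in one document and gets ln(3/2) + 1, about 1.405. In the first document 'had' appears twice, which is why its weight (about 0.942) dominates that row after normalization.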

Reposted from blog.csdn.net/qq_38890412/article/details/107593877