Feature extraction: text and dictionary feature extraction

Dictionary feature extraction:

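from sklearn.feature_extraction import DictVectorizer

alist = [
    {'city': "BJ", 'temp': 33},
    {'city': "GZ", 'temp': 42},
    {'city': "SH", 'temp': 40},
]
# sparse=False returns a dense NumPy array instead of a SciPy sparse matrix
d = DictVectorizer(sparse=False)
feature = d.fit_transform(alist)
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print(d.get_feature_names_out())
print(feature)  # the feature matrix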
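
To make the result easier to read, here is a minimal pure-Python sketch of the encoding that DictVectorizer applies to this data: string values become one indicator column per "key=value" pair, and numeric values keep a single column named after the key. The helper name dict_one_hot is hypothetical and is not part of scikit-learn; it only approximates the dense output for this simple case.

```python
def dict_one_hot(records):
    """Approximate DictVectorizer(sparse=False) for flat dicts (illustrative only)."""
    names = set()
    for rec in records:
        for key, value in rec.items():
            # String values get a "key=value" indicator column;
            # numeric values keep one column named after the key.
            names.add(f"{key}={value}" if isinstance(value, str) else key)
    names = sorted(names)
    index = {name: i for i, name in enumerate(names)}

    matrix = []
    for rec in records:
        row = [0.0] * len(names)
        for key, value in rec.items():
            if isinstance(value, str):
                row[index[f"{key}={value}"]] = 1.0
            else:
                row[index[key]] = float(value)
        matrix.append(row)
    return names, matrix


records = [
    {"city": "BJ", "temp": 33},
    {"city": "GZ", "temp": 42},
    {"city": "SH", "temp": 40},
]
names, matrix = dict_one_hot(records)
print(names)   # ['city=BJ', 'city=GZ', 'city=SH', 'temp']
for row in matrix:
    print(row)
# [1.0, 0.0, 0.0, 33.0]
# [0.0, 1.0, 0.0, 42.0]
# [0.0, 0.0, 1.0, 40.0]
```

Each city gets its own 0/1 column, while temp passes through unchanged. That is the layout to expect from the DictVectorizer call above.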

Text feature extraction:

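import jieba
from sklearn.feature_extraction.text import CountVectorizer

# Segment each sentence with jieba, then join the tokens with spaces
# so that CountVectorizer can treat them as separate words.
jb1 = jieba.cut("人生苦短,我用python")
jb2 = jieba.cut("人生漫长,不用python")
ct1 = ' '.join(jb1)
ct2 = ' '.join(jb2)

vector = CountVectorizer()
res = vector.fit_transform([ct1, ct2])
# Single-character tokens are not counted: the default token_pattern
# only matches tokens of two or more word characters.
print(res)  # sparse matrix of counts
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print(vector.get_feature_names_out())
print(res.toarray())  # dense count matrix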
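
The reason single characters are skipped is CountVectorizer's default token_pattern, r"(?u)\b\w\w+\b", which requires at least two word characters. The sketch below reproduces that behaviour in plain Python. The helper name count_vectorize is hypothetical, and the two input strings are an assumed segmentation written by hand, so the actual jieba output may differ.

```python
import re
from collections import Counter

# Same regular expression as CountVectorizer's default token_pattern.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def count_vectorize(docs):
    """Rough bag-of-words counter mimicking CountVectorizer defaults (illustrative only)."""
    # Lowercase, then keep only tokens of two or more word characters.
    tokenized = [TOKEN_PATTERN.findall(doc.lower()) for doc in docs]
    # The vocabulary is the sorted set of all tokens seen.
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[tok] for tok in vocab])
    return vocab, rows


# Assumed segmentation, written by hand; not actual jieba output.
docs = [
    "人生 苦短 , 我 用 python",
    "人生 漫长 , 不用 python",
]
vocab, rows = count_vectorize(docs)
print(vocab)  # ['python', '不用', '人生', '漫长', '苦短']
print(rows)   # [[1, 0, 1, 0, 1], [1, 1, 1, 1, 0]]
```

With this segmentation, the single-character tokens 我 and 用 and the punctuation never reach the vocabulary. To count single characters in scikit-learn, a different token_pattern can be passed to CountVectorizer.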

Reposted from blog.csdn.net/weixin_45666249/article/details/115057125