Using WordNet in Python with Chinese and English

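I. English

1. Install NLTK and import WordNet

Python version: 3.5. If the WordNet corpus has not been downloaded yet, run nltk.download('wordnet') once before using it.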

from nltk.corpus import wordnet as wn

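2. What a synset is

car.n.01 is one noun sense of "car". It is called a synset (synonym set): a set of words (or lemmas) that share the same meaning. Likewise, 'dog.n.01' is the first noun sense of dog, and 'chase.v.01' is the first verb sense of chase. The definition() method returns the gloss that explains a synset:

print(wn.synset('apple.n.01').definition())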
fruit with red or yellow or green skin and sweet to tart crisp whitish flesh
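
Synset names such as car.n.01 follow a word.pos.number pattern. As a small illustration, the sketch below splits such a name into its three parts using plain string handling. The helper `parse_synset_name` is a hypothetical name introduced for this post and is not part of NLTK.

```python
from collections import namedtuple

# word: head word, pos: part-of-speech tag ("n", "v", ...), sense: sense number
SynsetName = namedtuple("SynsetName", ["word", "pos", "sense"])


def parse_synset_name(name):
    """Split a name such as 'car.n.01' into word, POS tag and sense number."""
    # Split from the right so that a head word containing dots stays intact.
    word, pos, sense = name.rsplit(".", 2)
    return SynsetName(word, pos, int(sense))


print(parse_synset_name("dog.n.01"))
print(parse_synset_name("chase.v.01"))
```

Splitting from the right is a deliberate choice here: the last two fields are always the POS tag and the sense number, while the head word itself may contain dots.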

3. List all words in a synset

print(wn.synset('car.n.01').lemma_names())
['car', 'auto', 'automobile', 'machine', 'motorcar']

4. Get example sentences

print(wn.synset('dog.n.01').examples())
['the dog barked all night']

5. Look up synonyms

for synset in wn.synsets('car'):
    print(synset.lemma_names())
['car', 'auto', 'automobile', 'machine', 'motorcar']
['car', 'railcar', 'railway_car', 'railroad_car']
['car', 'gondola']
['car', 'elevator_car']
['cable_car', 'car']

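Hyponyms

A hyponym is a term whose meaning is narrower than that of a given term. For example, the hyponyms of "鲜花速递" (flower delivery) include "上海鲜花速递" (Shanghai flower delivery), "深圳鲜花速递" (Shenzhen flower delivery) and "网上鲜花速递" (online flower delivery). Similarly, "笨小孩" is a hyponym of "歌" (song), and, in a subject-heading sense, also of "刘德华". Hyponymy is always relative to a particular term, and a hyponym in turn has its own equivalent terms, hypernyms, hyponyms and sibling terms.

motorcar = wn.synset('car.n.01')
types_of_motorcar = motorcar.hyponyms()
print(sorted([lemma.name() for synset in types_of_motorcar for lemma in synset.lemmas()]))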
['Model_T', 'S.U.V.', 'SUV', 'Stanley_Steamer', 'ambulance', 'beach_waggon', 'beach_wagon', 'bus', 'cab', 'compact', 'compact_car', 'convertible', 'coupe', 'cruiser', 'electric', 'electric_automobile', 'electric_car', 'estate_car', 'gas_guzzler', 'hack', 'hardtop', 'hatchback', 'heap', 'horseless_carriage', 'hot-rod', 'hot_rod', 'jalopy', 'jeep', 'landrover', 'limo', 'limousine', 'loaner', 'minicar', 'minivan', 'pace_car', 'patrol_car', 'phaeton', 'police_car', 'police_cruiser', 'prowl_car', 'race_car', 'racer', 'racing_car', 'roadster', 'runabout', 'saloon', 'secondhand_car', 'sedan', 'sport_car', 'sport_utility', 'sport_utility_vehicle', 'sports_car', 'squad_car', 'station_waggon', 'station_wagon', 'stock_car', 'subcompact', 'subcompact_car', 'taxi', 'taxicab', 'tourer', 'touring_car', 'two-seater', 'used-car', 'waggon', 'wagon']
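
hyponyms() returns only the direct hyponyms of a synset. Since a hyponym has hyponyms of its own, it can help to see how the full set of narrower terms is gathered. The following is a minimal sketch on a hand-written toy taxonomy; the `HYPONYMS` dictionary and the `all_hyponyms` helper are assumptions made for illustration and do not come from WordNet or NLTK.

```python
# Toy taxonomy for illustration only; these entries are not read from WordNet.
HYPONYMS = {
    "vehicle": ["car", "bicycle"],
    "car": ["taxi", "ambulance", "sports_car"],
    "sports_car": ["roadster"],
}


def all_hyponyms(term, taxonomy):
    """Return every direct and indirect hyponym of `term`, sorted."""
    found = set()
    stack = list(taxonomy.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            # Hyponyms of a hyponym are also narrower than `term`.
            stack.extend(taxonomy.get(current, []))
    return sorted(found)


print(all_hyponyms("car", HYPONYMS))
# ['ambulance', 'roadster', 'sports_car', 'taxi']
```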

6. Look up antonyms through lemmas

good = wn.synset('good.a.01')
print(good.lemmas()[0].antonyms())
[Lemma('bad.a.01.bad')]

7. Collect synonyms and antonyms

# Both lists must exist before the loop appends to them.
synonyms = []
antonyms = []

for syn in wn.synsets("good"):
    for lemma in syn.lemmas():
        synonyms.append(lemma.name())
        if lemma.antonyms():
            antonyms.append(lemma.antonyms()[0].name())

print(set(synonyms))
print(set(antonyms))

{'proficient', 'trade_good', 'expert', 'skilful', 'salutary', 'dear', 'commodity', 'goodness', 'respectable', 'right', 'undecomposed', 'just', 'serious', 'skillful', 'ripe', 'honorable', 'effective', 'secure', 'well', 'in_effect', 'soundly', 'dependable', 'in_force', 'estimable', 'unspoilt', 'adept', 'thoroughly', 'honest', 'full', 'beneficial', 'upright', 'practiced', 'safe', 'good', 'unspoiled', 'sound', 'near'}
{'badness', 'evilness', 'evil', 'bad', 'ill'}
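
Because set() has no defined order, the printed order of these names can differ from run to run. If a stable order matters, one option is to drop duplicates while keeping the order of first appearance. This is a small sketch of that idea; `unique_in_order` and the sample list are hypothetical and only stand in for the collected lemma names.

```python
def unique_in_order(items):
    """Drop repeated items while keeping the order of first appearance."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Hypothetical sample list standing in for the collected lemma names.
sample = ["good", "goodness", "good", "well", "goodness", "sound"]
print(unique_in_order(sample))
# ['good', 'goodness', 'well', 'sound']
```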

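II. Chinese

1. Look up hyponyms and synonyms

Chinese lookups pass lang='cmn' (Mandarin Chinese) and rely on the Open Multilingual Wordnet data, which is a separate NLTK download (named 'omw' in older NLTK releases and 'omw-1.4' in more recent ones).

for synset in wn.synsets(u'计算机', lang='cmn'):
    types_of_computer = synset.hyponyms()
    print(sorted([lemma.name() for hyponym in types_of_computer for lemma in hyponym.lemmas('cmn')]))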
['便携式计算器', '加数器', '加法器', '加法器', '加法机', '加法计算器', '手摇计算器', '算术计算机', '算盘', '计数器', '计算机']
['家用电脑', '家用计算机', '数字计算机', '模拟计算机', '网站', '网络站点']

The Chinese lemma names of each matching synset give the synonyms:

for synset in wn.synsets(u'计算机', lang='cmn'):
    for lemma in synset.lemma_names('cmn'):
        print(lemma)
加数器
加法器
加法机
加法计算器
算术计算机
计算机
计算器
计算机
电子计算机
电脑
计算机

2. Find the English synsets for a Chinese word

print(wn.lemmas(u'选择', lang='cmn'))
[Lemma('choose.v.01.选择'), Lemma('elect.v.02.选择'), Lemma('pick.v.02.选择'), Lemma('option.n.02.选择'), Lemma('selection.n.02.选择')]

3. Compute the similarity between two Chinese words

select = wn.synsets(u'选择', lang='cmn')[0]    # 选择: "choose"
selectn3 = wn.synsets(u'找出', lang='cmn')[0]  # 找出: "find out"
print(select.path_similarity(selectn3))
0.25
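
path_similarity() scores two senses by the shortest path that connects them in the is-a (hypernym/hyponym) taxonomy, computed as 1 / (path length + 1), so a score of 0.25 corresponds to a path of three edges. The sketch below reproduces that formula on a toy graph; the `EDGES` data and both helper functions are assumptions for illustration and are not NLTK code.

```python
from collections import deque

# Toy undirected is-a graph; node names are made up for this illustration.
EDGES = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
}


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    queue = deque([(start, 0)])
    visited = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def toy_path_similarity(graph, first, second):
    """Return 1 / (shortest path length + 1), or None if no path exists."""
    distance = shortest_path_length(graph, first, second)
    if distance is None:
        return None
    return 1.0 / (distance + 1)


print(toy_path_similarity(EDGES, "a", "d"))  # three edges apart: 0.25
print(toy_path_similarity(EDGES, "a", "a"))  # identical nodes: 1.0
```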


Reposted from blog.csdn.net/pursue_myheart/article/details/80631278