Changes to NLTK in Python 3

1.Here are some changes you may need to make:
grammar: ContextFreeGrammar → CFG, WeightedGrammar → PCFG, StatisticalDependencyGrammar → ProbabilisticDependencyGrammar, WeightedProduction → ProbabilisticProduction
draw.tree: TreeSegmentWidget.node() → TreeSegmentWidget.label(), TreeSegmentWidget.set_node() → TreeSegmentWidget.set_label()
parsers: nbest_parse() → parse()
ccg.parse.chart: EdgeI.next() → EdgeI.nextsym()
Chunk parser: top_node → root_label; chunk_node → chunk_label
WordNet properties are now access methods, e.g. Synset.definition → Synset.definition()
sem.relextract: mk_pairs() → _tree2semi_rel(), mk_reldicts() → semi_rel2reldict(), show_clause() → clause(), show_raw_rtuple() → rtuple()
corpusname.tagged_words(simplify_tags=True) → corpusname.tagged_words(tagset='universal')
util.clean_html() → BeautifulSoup.get_text(). clean_html() has been dropped; install and use BeautifulSoup or another HTML parser instead.
util.ibigrams() → util.bigrams()
util.ingrams() → util.ngrams()
util.itrigrams() → util.trigrams()
metrics.windowdiff → metrics.segmentation.windowdiff(), metrics.windowdiff.demo() was removed.
parse.generate2 was re-written and merged into parse.generate
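
Most of the entries above are mechanical renames, so a first pass over old code can be scripted. The following is only a minimal sketch using the standard library: the RENAMES table holds a small subset of the names listed above, and port_source() is a hypothetical helper, not part of NLTK. A plain text substitution like this cannot tell an NLTK name from an unrelated one, so the result still needs to be reviewed by hand.

```python
import re

# Old name -> new name, copied from the list above (a small subset only).
RENAMES = {
    "ContextFreeGrammar": "CFG",
    "WeightedGrammar": "PCFG",
    "nbest_parse": "parse",
    "ibigrams": "bigrams",
    "ingrams": "ngrams",
    "itrigrams": "trigrams",
}

# One alternation with word boundaries, so longer identifiers that merely
# contain an old name (e.g. my_nbest_parse_helper) are left untouched.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def port_source(text):
    """Return text with each old identifier replaced by its NLTK 3 name."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], text)


old_code = "trees = parser.nbest_parse(tokens)\npairs = list(ibigrams(tokens))"
print(port_source(old_code))
```

Renames that also change how a name is used, such as Synset.definition becoming Synset.definition(), are deliberately left out of the table because a name-for-name substitution cannot handle them.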

2.Creating objects from strings:
Many objects now support a fromstring() method
tree.Tree.parse() → tree.Tree.fromstring()
tree.Tree() → tree.Tree.fromstring()
chunk.RegexpChunkRule.parse() → chunk.RegexpChunkRule.fromstring()
grammar.parse_cfg() → CFG.fromstring() (same for other types of grammar)
sem.LogicParser.parse() → sem.Expression.fromstring()
sem.DrtParser.parse() → sem.DrtExpression.fromstring()
sem.parse_valuation() → sem.Valuation.fromstring()
sem.parse_type() → sem.Type.fromstring()
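
For code that has to run against both the old and the new names for a while, the fromstring() pattern above can be wrapped in a small compatibility helper. This is a sketch under stated assumptions: build_from_string() is a hypothetical helper, and LegacyTree and ModernTree are stand-in classes written for this example so that it runs without NLTK installed.

```python
class LegacyTree:
    """Hypothetical stand-in for an old-style class that only offers parse()."""

    def __init__(self, text):
        self.text = text

    @classmethod
    def parse(cls, text):
        return cls(text)


class ModernTree:
    """Hypothetical stand-in for a new-style class that offers fromstring()."""

    def __init__(self, text):
        self.text = text

    @classmethod
    def fromstring(cls, text):
        return cls(text)


def build_from_string(cls, text):
    """Prefer cls.fromstring(); fall back to the older cls.parse()."""
    builder = getattr(cls, "fromstring", None)
    if builder is None:
        builder = getattr(cls, "parse")
    return builder(text)


print(build_from_string(LegacyTree, "(S (NP I) (VP ran))").text)
print(build_from_string(ModernTree, "(S (NP I) (VP ran))").text)
```

With NLTK 3 installed, passing tree.Tree as the class would go through Tree.fromstring(), which is the call the list above recommends.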

Operations on lists of sentences or other items:
tokenize.batch_tokenize() → tokenize.tokenize_sents()
tag.batch_tag() → tag.tag_sents()
parse.batch_parse() → parse.parse_sents()
classify.batch_classify() → classify.classify_many()
sem.batch_interpret() → sem.interpret_sents()
sem.batch_evaluate() → sem.evaluate_sents()
chunk.batch_ne_chunk() → chunk.ne_chunk_sents()
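
The renamed methods keep the same shape: one method handles a single item and its companion handles a list of items. The sketch below illustrates that shape with a toy class; WhitespaceSketch is hypothetical and not an NLTK class, and the batch_tokenize alias is just one possible way to keep callers that have not been ported yet working for a transition period.

```python
class WhitespaceSketch:
    """Hypothetical toy tokenizer that mirrors the old/new method names."""

    def tokenize(self, sentence):
        # Single-item operation.
        return sentence.split()

    def tokenize_sents(self, sentences):
        # New-style name: apply the single-item operation to each sentence.
        return [self.tokenize(sentence) for sentence in sentences]

    # Temporary alias so callers still using the old name keep working.
    batch_tokenize = tokenize_sents


tokenizer = WhitespaceSketch()
print(tokenizer.tokenize_sents(["NLTK 3 is here", "update your code"]))
```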

Changes in probability.FreqDist:
fdist.keys() → sorted(fdist)
fdist.inc(x) → fdist[x] += 1
fdist.samples() → sorted(fdist.keys())
fdist.Nr(r) → fdist.Nr()[r]
fdist.Nr_nonzero() → fdist.Nr().items()
cfdist.conditions() → sorted(cfdist.conditions())
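
In NLTK 3, FreqDist is a subclass of collections.Counter, so the replacement idioms above can be tried out with a plain Counter. The sketch below uses Counter as a stand-in so that it runs without NLTK; the nr variable is only a standard-library approximation of what fdist.Nr() reports, not the NLTK method itself.

```python
from collections import Counter

# Stand-in for nltk.FreqDist, which subclasses Counter in NLTK 3.
fdist = Counter()
for word in "the cat sat on the mat the end".split():
    fdist[word] += 1          # replaces fdist.inc(word)

# Replaces fdist.keys() / fdist.samples(). Note that this is alphabetical
# order; use most_common() when frequency order is what you need.
samples = sorted(fdist)

# Approximation of fdist.Nr(): maps a count r to the number of samples
# that occur exactly r times.
nr = Counter(fdist.values())

print(samples)
print(fdist.most_common(1))
print(nr[1], nr[3])
```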

Porter stemmer changes:
adjust_case(), cons(), cvc(), doublec(), m(), step1ab(), step1c(), step2(), step3(), step4(), step5(), vowelinstem() made private
ends(), r(), setto() removed

3.Removed modules, classes and functions:
classify.svm was removed. For classification based on support vector machines (SVMs) use classify.scikitlearn or scikit-learn directly. See https://github.com/nltk/nltk/issues/450.
probability.GoodTuringProbDist class was removed. See https://github.com/nltk/nltk/issues/381.
HiddenMarkovModelTaggerTransformI and its subclasses are removed. See https://github.com/nltk/nltk/issues/374.
classify.maxent no longer supports algorithms backed by scipy.maxentropy. See https://github.com/nltk/nltk/issues/321.
misc.babelfish was removed. See https://github.com/nltk/nltk/issues/265.
sourcedstring was removed. See https://github.com/nltk/nltk/issues/322.
yamltags was removed. JSON is now preferred. See https://github.com/nltk/nltk/issues/540
mallet was removed, including the tag.crf module. See https://github.com/nltk/nltk/issues/104
tag.simplify was removed. See https://github.com/nltk/nltk/issues/483
model was removed. See https://github.com/nltk/nltk/issues?labels=model
corpus.reader.wordnet._lcs_by_depth was removed. See https://github.com/nltk/nltk/issues/422.

4.Miscellaneous changes:
probability.ConditionalProbDist.default_factory now inherits from dict instead of defaultdict
probability.ConditionalProbDistI.default_factory now inherits from dict instead of defaultdict
probability.DictionaryConditionalProbDist.default_factory now inherits from dict instead of defaultdict
tag.senna.SennaTagger → classify.Senna
tag.senna.POSTagger → tag.SennaTagger
tag.senna.CHKTagger → tag.SennaChunkTagger

5.Printing changes (from 3.0.2, see https://github.com/nltk/nltk/issues/804):
classify.decisiontree.DecisionTreeClassifier.pp → pretty_format
metrics.confusionmatrix.ConfusionMatrix.pp → pretty_format
sem.lfg.FStructure.pprint → pretty_format
sem.drt.DrtExpression.pretty → pretty_format
parse.chart.Chart.pp → pretty_format
Tree.pprint() → pformat
FreqDist.pprint → pformat
Tree.pretty_print → pprint
Tree.pprint_latex_qtree → pformat_latex_qtree

Environment variables for third-party software:
These have been normalised; please see Installing Third Party Software

More background on Python 3 and NLTK 3:
http://docs.python.org/2/library/2to3.html
http://docs.python.org/dev/whatsnew/3.0.html
