1, Word2Vec parameter explanation
Word2Vec is a model class provided by gensim; the name gensim is short for "generate similar".
This article assumes you already know the basics of word vectors. Parameters:
from gensim.models import Word2Vec

# The values below are the defaults
Word2Vec(
    sentences=None,       # an iterable of tokenized sentences (lists of words), or a streamed large corpus
    size=100,             # dimension of the feature vectors (renamed vector_size in gensim 4.0)
    alpha=0.025,          # initial learning rate
    window=5,             # maximum distance between the current word and the predicted word within a sentence
    min_count=5,          # minimum word frequency; rarer words are ignored
    max_vocab_size=None,  # RAM limit while building the vocabulary; None means no limit
    sample=0.001,         # threshold for random downsampling of high-frequency words
    seed=1,               # random number seed
    workers=3,            # number of worker threads
    min_alpha=0.0001,     # the learning rate decays down to this minimum value
    sg=0,                 # training algorithm: sg=1 uses skip-gram, sg=0 uses CBOW
    hs=0,                 # hs=1 uses hierarchical softmax; hs=0 (with negative > 0) uses negative sampling
    negative=5,           # if > 0, use negative sampling with this many "noise words" (usually 5-20); if 0, no negative sampling
    cbow_mean=1,          # 0 uses the sum of the context word vectors, 1 uses the mean; only applies to CBOW
    iter=5,               # number of iterations over the corpus (renamed epochs in gensim 4.0)
    null_word=0,
    trim_rule=None,       # vocabulary trimming rule; None means min_count is used
    sorted_vocab=1,       # 1 sorts the vocabulary by descending frequency
    batch_words=10000,    # number of words per batch passed to the worker threads during training
    compute_loss=False,
    callbacks=(),
)
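To make the window parameter more concrete, here is a minimal pure-Python sketch of which (current word, context word) pairs fall inside a window. The function name context_pairs and the sample sentence are made up for illustration, and this is not gensim's code: it uses a fixed window and does not reproduce details of the real training loop such as downsampling.

```python
def context_pairs(tokens, window=5):
    """Return (current word, context word) pairs within a fixed window."""
    pairs = []
    for i, center in enumerate(tokens):
        # Clamp the window to the sentence boundaries
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical tokenized sentence
sentence = ["the", "movie", "was", "great"]
pairs = context_pairs(sentence, window=1)
print(pairs)
```

With window=1 each word is paired only with its direct neighbours; a larger window pulls in more distant words from the same sentence.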
2, Kaggle movie review in practice
- Import required modules
import re

import nltk.data
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
from gensim.models import word2vec
from nltk.corpus import stopwords
- Training data details
train = pd.read_csv(
    '../Bag of Words Meets Bags of Popcorn/labeledTrainData.tsv/labeledTrainData.tsv',
    header=0,
    delimiter='\t',
    quoting=3,
)
print(train.head())  # the first 5 rows
print(train.tail())  # the last 5 rows
Result:
         id  sentiment  review
0  "5814_8"          1  "With all this stuff going down at the moment ...
1  "2381_9"          1  "\"The Classic War of the Worlds\" by Timothy ...
2  "7759_3"          0  "The film starts with a manager (Nicholas Bell...
3  "3630_4"          0  "It must be assumed that those who praised thi...
4  "9495_8"          1  "Superbly trashy and wondrously unpretentious ...

              id  sentiment  review
24995   "3453_3"          0  "It seems like more consideration has gone int...
24996   "5064_1"          0  "I don't believe they made this film. Complete...
24997  "10905_3"          0  "Guy is a loser. Can't get girls, needs to bui...
24998  "10194_3"          0  "This 30 minute documentary Buñuel made in the...
24999   "8478_8"          1  "I saw this movie as a child and it broke my h...
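The double quotes that remain around the id and review values come from quoting=3, which is the numeric value of csv.QUOTE_NONE: quote characters are read as ordinary text instead of field delimiters. A small sketch with the standard csv module shows the same behaviour; the two-line sample below is invented for illustration and is not taken from labeledTrainData.tsv.

```python
import csv
import io

# Hypothetical sample in the same tab-separated layout as the training file
sample = 'id\tsentiment\treview\n"1_1"\t1\t"A "great" film"\n'

# csv.QUOTE_NONE has the value 3, the same number passed to read_csv above
reader = csv.reader(io.StringIO(sample), delimiter="\t", quoting=csv.QUOTE_NONE)
rows = list(reader)

print(csv.QUOTE_NONE)
print(rows[1])
```

Because the quotes are kept as literal characters, reviews that contain unescaped quotation marks are not split or merged incorrectly, at the cost of leaving the surrounding quotes in the data for a later cleaning step.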