Text classification on the IMDB dataset with a 1D convolution
Dataset download: imdb.npz (a working copy for when the official source cannot be reached because of network problems)
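If you download imdb.npz by hand, imdb.load_data still has to find it. The sketch below copies the file into the directory where Keras 2.x caches downloaded datasets by default, ~/.keras/datasets. The helper names keras_dataset_cache_path and install_local_copy are hypothetical and are not Keras APIs. The default cache location is an assumption about your Keras version. Newer Keras releases may also check the file's hash and re-download it if your copy does not match.

```python
import os
import shutil


def keras_dataset_cache_path(filename="imdb.npz", keras_home=None):
    """Return the path where Keras is assumed to cache downloaded datasets.

    Assumption: the Keras 2.x default cache directory ~/.keras/datasets.
    Pass keras_home to use a different base directory (e.g. for testing).
    """
    if keras_home is None:
        keras_home = os.path.join(os.path.expanduser("~"), ".keras")
    return os.path.join(keras_home, "datasets", filename)


def install_local_copy(src, keras_home=None):
    """Copy a manually downloaded imdb.npz into the (assumed) Keras cache directory."""
    dst = keras_dataset_cache_path(os.path.basename(src), keras_home)
    os.makedirs(os.path.dirname(dst), exist_ok=True)  # create .keras/datasets if missing
    shutil.copyfile(src, dst)
    return dst


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python install_imdb.py /path/to/imdb.npz
    if len(sys.argv) == 2:
        print("copied to", install_local_copy(sys.argv[1]))
    else:
        print("expected cache path:", keras_dataset_cache_path())
```

After the copy, imdb.load_data(num_words=max_features) should load the local file instead of trying the network, provided the cache location assumption holds for your installation.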
Annotated code
imdb_cnn.py (from the official Keras examples)
'''This example demonstrates the use of Convolution1D for text classification.

Gets to 0.89 test accuracy after 2 epochs.
90s/epoch on Intel i5 2.4Ghz CPU.
10s/epoch on Tesla K40 GPU.
'''
from __future__ import print_function

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import Conv1D, GlobalMaxPooling1D
from keras.datasets import imdb

# set parameters:
max_features = 5000   # vocabulary size: keep only the 5000 most frequent words
maxlen = 400          # every review is padded/truncated to 400 word indices
batch_size = 32
embedding_dims = 50   # length of each word vector
filters = 250         # number of convolution filters
kernel_size = 3       # each filter spans 3 consecutive words
hidden_dims = 250     # units in the hidden Dense layer
epochs = 2

print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

print('Build model...')
model = Sequential()

# we start off with an efficient embedding layer which maps
# our vocab indices into embedding_dims dimensions
model.add(Embedding(max_features,
                    embedding_dims,
                    input_length=maxlen))
model.add(Dropout(0.2))

# we add a Convolution1D, which will learn
# word group (n-gram) filters of size kernel_size:
model.add(Conv1D(filters,
                 kernel_size,
                 padding='valid',
                 activation='relu',
                 strides=1))
# we use max pooling (the maximum over all positions, one value per filter):
model.add(GlobalMaxPooling1D())

# We add a vanilla hidden layer, i.e. a plain fully connected (Dense) layer.
# Multilayer perceptrons are sometimes colloquially called "vanilla" neural networks,
# see https://en.wikipedia.org/wiki/Multilayer_perceptron
model.add(Dense(hidden_dims))
model.add(Dropout(0.2))
model.add(Activation('relu'))

# We project onto a single unit output layer, and squash it with a sigmoid:
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
# note: the test set doubles as validation data, so val_acc is the test accuracy
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(x_test, y_test))
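To make the model's mechanics concrete, here is a framework-free sketch in plain Python that covers three things. It mimics the defaults of pad_sequences (padding='pre', truncating='pre', value 0). It computes a valid-padding Conv1D with ReLU followed by GlobalMaxPooling1D on a toy input. It also tabulates the per-layer output shapes and parameter counts we would expect model.summary() to report, with the batch dimension omitted. The hyperparameters are copied from the script. The helper names pad_pre, conv1d_relu_global_max and param_counts are hypothetical and illustrative, not Keras functions.

```python
# Hyperparameters copied from imdb_cnn.py above.
max_features = 5000
maxlen = 400
embedding_dims = 50
filters = 250
kernel_size = 3
hidden_dims = 250


def pad_pre(seq, length, value=0):
    """Mimic pad_sequences' defaults: padding='pre', truncating='pre'."""
    seq = list(seq)
    if len(seq) > length:
        seq = seq[len(seq) - length:]  # too long: keep the LAST `length` tokens
    return [value] * (length - len(seq)) + seq  # too short: left-pad with `value`


def conv1d_relu_global_max(x, kernels, biases):
    """Conv1D(padding='valid', strides=1, activation='relu') + GlobalMaxPooling1D.

    x: list of T vectors of size D; kernels[f]: list of k vectors of size D.
    Returns one value per filter: its largest ReLU response over all positions.
    """
    out = []
    for w, b in zip(kernels, biases):
        k = len(w)
        responses = []
        for t in range(len(x) - k + 1):  # 'valid': only windows that fit entirely
            s = b + sum(x[t + i][d] * w[i][d] for i in range(k) for d in range(len(w[i])))
            responses.append(max(0.0, s))  # ReLU
        out.append(max(responses))  # global max pooling over time
    return out


def param_counts():
    """(layer, output shape without batch dim, trainable parameters) per layer."""
    conv_len = maxlen - kernel_size + 1  # 'valid' padding, stride 1
    return [
        ("Embedding", (maxlen, embedding_dims), max_features * embedding_dims),
        ("Dropout", (maxlen, embedding_dims), 0),
        ("Conv1D", (conv_len, filters), (kernel_size * embedding_dims + 1) * filters),
        ("GlobalMaxPooling1D", (filters,), 0),
        ("Dense", (hidden_dims,), (filters + 1) * hidden_dims),
        ("Dropout", (hidden_dims,), 0),
        ("Activation", (hidden_dims,), 0),
        ("Dense", (1,), hidden_dims + 1),
        ("Activation", (1,), 0),
    ]


if __name__ == "__main__":
    print(pad_pre([7, 8, 9], 5))           # [0, 0, 7, 8, 9]
    print(pad_pre([1, 2, 3, 4, 5, 6], 4))  # [3, 4, 5, 6]
    # 4 time steps, 1-dim "embeddings", one bigram filter that sums neighbours:
    print(conv1d_relu_global_max([[1], [2], [3], [0]], [[[1], [1]]], [0]))  # [5]
    rows = param_counts()
    for name, shape, params in rows:
        print(f"{name:<20}{str(shape):<12}{params}")
    print("total params:", sum(p for _, _, p in rows))
```

The arithmetic shows where the weights live. The 5000 x 50 embedding table accounts for 250,000 of the roughly 350,000 parameters. Global max pooling also makes the classifier ignore where in the review a strong 3-word pattern occurs.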
Running the code
C:\ProgramData\Anaconda3\python.exe E:/keras-master/examples/imdb_cnn.py
Using TensorFlow backend.
Loading data...
25000 train sequences
25000 test sequences
Pad sequences (samples x time)
x_train shape: (25000, 400)
x_test shape: (25000, 400)
Build model...
Train on 25000 samples, validate on 25000 samples
Epoch 1/2
32/25000 [..............................] - ETA: 39:26 - loss: 0.6986 - acc: 0.4688
160/25000 [..............................] - ETA: 8:00 - loss: 0.6962 - acc: 0.4250
24480/25000 [============================>.] - ETA: 0s - loss: 0.4068 - acc: 0.7977
24608/25000 [============================>.] - ETA: 0s - loss: 0.4057 - acc: 0.7984
24704/25000 [============================>.] - ETA: 0s - loss: 0.4053 - acc: 0.7987
24832/25000 [============================>.] - ETA: 0s - loss: 0.4048 - acc: 0.7991
24960/25000 [============================>.] - ETA: 0s - loss: 0.4044 - acc: 0.7993
25000/25000 [==============================] - 19s 754us/step - loss: 0.4044 - acc: 0.7994 - val_loss: 0.3281 - val_acc: 0.8565
Epoch 2/2
32/25000 [..............................] - ETA: 11s - loss: 0.2567 - acc: 0.9062
160/25000 [..............................] - ETA: 11s - loss: 0.2439 - acc: 0.9125
288/25000 [..............................] - ETA: 11s - loss: 0.2417 - acc: 0.9097
416/25000 [..............................] - ETA: 11s - loss: 0.2196 - acc: 0.9135
23616/25000 [===========================>..] - ETA: 0s - loss: 0.2307 - acc: 0.9086
23712/25000 [===========================>..] - ETA: 0s - loss: 0.2308 - acc: 0.9085
23840/25000 [===========================>..] - ETA: 0s - loss: 0.2304 - acc: 0.9086
23968/25000 [===========================>..] - ETA: 0s - loss: 0.2302 - acc: 0.9086
24096/25000 [===========================>..] - ETA: 0s - loss: 0.2299 - acc: 0.9087
24224/25000 [============================>.] - ETA: 0s - loss: 0.2300 - acc: 0.9087
24320/25000 [============================>.] - ETA: 0s - loss: 0.2299 - acc: 0.9087
24448/25000 [============================>.] - ETA: 0s - loss: 0.2298 - acc: 0.9086
24576/25000 [============================>.] - ETA: 0s - loss: 0.2299 - acc: 0.9087
24704/25000 [============================>.] - ETA: 0s - loss: 0.2299 - acc: 0.9088
24832/25000 [============================>.] - ETA: 0s - loss: 0.2300 - acc: 0.9086
24960/25000 [============================>.] - ETA: 0s - loss: 0.2306 - acc: 0.9083
25000/25000 [==============================] - 15s 606us/step - loss: 0.2306 - acc: 0.9083 - val_loss: 0.2987 - val_acc: 0.8752

Process finished with exit code 0
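The end-of-epoch lines in the log can be sanity-checked with a little arithmetic. With batch_size=32 and 25,000 samples, each epoch runs 782 batches, and the last batch holds only 8 samples. The reported "754us/step" only matches the reported 19 s if it is read as time per sample (754 µs x 25,000 is about 18.9 s). Read per batch, it would total well under a second. That fits the idea that this Keras version's progress bar counts samples when fitting on NumPy arrays. Treat this as a reading of the log rather than documented behaviour. The helper names below are hypothetical.

```python
import math

# Figures copied from the training log above.
NUM_SAMPLES = 25000
BATCH_SIZE = 32
# (reported epoch time in seconds, reported per-"step" time in microseconds)
EPOCH_TIMINGS = [(19, 754), (15, 606)]


def batches_per_epoch(num_samples, batch_size):
    """Number of gradient updates per epoch; the final batch may be partial."""
    return math.ceil(num_samples / batch_size)


def last_batch_size(num_samples, batch_size):
    """Size of the final (possibly partial) batch."""
    remainder = num_samples % batch_size
    return remainder if remainder else batch_size


def implied_seconds(us_per_unit, units):
    """Total time implied by a per-unit time in microseconds and a unit count."""
    return us_per_unit * units / 1e6


if __name__ == "__main__":
    n_batches = batches_per_epoch(NUM_SAMPLES, BATCH_SIZE)
    print("batches per epoch:", n_batches)                                # 782
    print("last batch size:", last_batch_size(NUM_SAMPLES, BATCH_SIZE))   # 8
    for reported, us in EPOCH_TIMINGS:
        per_sample = implied_seconds(us, NUM_SAMPLES)  # interpret "step" as one sample
        per_batch = implied_seconds(us, n_batches)     # interpret "step" as one batch
        print(f"reported {reported}s | per-sample reading {per_sample:.2f}s "
              f"| per-batch reading {per_batch:.2f}s")
```

Both epochs agree with the per-sample reading to within a fraction of a second, while the per-batch reading is off by more than an order of magnitude.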
Keras resources
Chinese documentation: http://keras-cn.readthedocs.io/en/latest/
Example downloads
https://github.com/keras-team/keras
https://github.com/keras-team/keras/tree/master/examples
Full project download
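For readers who don't have download points, add QQ 452205574 to access the shared folder.
It contains the code, the datasets (images), the trained model, library installation files, and more.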