TensorFlow 2.0: Processing Structured Data - Titanic Survival Prediction

1. Prepare the data

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
import tensorflow as tf 
from tensorflow.keras import models,layers
 
dftrain_raw = pd.read_csv('./data/titanic/train.csv')
dftest_raw = pd.read_csv('./data/titanic/test.csv')
dftrain_raw.head(10)

A sample of the data:

Field descriptions:

  • Survived: 0 means died, 1 means survived [label y]
  • Pclass: passenger ticket class, with three values (1, 2, 3) [converted to one-hot encoding]
  • Name: passenger name [dropped]
  • Sex: passenger gender [converted to a bool feature]
  • Age: passenger age (has missing values) [numeric feature; "age is missing" added as an auxiliary feature]
  • SibSp: number of siblings/spouses aboard (integer) [numeric feature]
  • Parch: number of parents/children aboard (integer) [numeric feature]
  • Ticket: ticket number (string) [dropped]
  • Fare: ticket fare (float, ranging from 0 to 500) [numeric feature]
  • Cabin: passenger cabin (has missing values) ["cabin is missing" added as an auxiliary feature]
  • Embarked: port of embarkation, S, C, or Q (has missing values) [converted to one-hot encoding with four dimensions: S, C, Q, nan]

2. Explore the data

(1) Label distribution

%matplotlib inline
%config InlineBackend.figure_format = 'png'
ax = dftrain_raw['Survived'].value_counts().plot(kind = 'bar',
     figsize = (12,8),fontsize=15,rot = 0)
ax.set_ylabel('Counts',fontsize = 15)
ax.set_xlabel('Survived',fontsize = 15)
plt.show()

(2) Age distribution

%matplotlib inline
%config InlineBackend.figure_format = 'png'
ax = dftrain_raw['Age'].plot(kind = 'hist',bins = 20,color= 'purple',
                    figsize = (12,8),fontsize=15)
 
ax.set_ylabel('Frequency',fontsize = 15)
ax.set_xlabel('Age',fontsize = 15)
plt.show()

(3) Correlation between age and the label

%matplotlib inline
%config InlineBackend.figure_format = 'png'
ax = dftrain_raw.query('Survived == 0')['Age'].plot(kind = 'density',
                      figsize = (12,8),fontsize=15)
dftrain_raw.query('Survived == 1')['Age'].plot(kind = 'density',
                      figsize = (12,8),fontsize=15)
ax.legend(['Survived==0','Survived==1'],fontsize = 12)
ax.set_ylabel('Density',fontsize = 15)
ax.set_xlabel('Age',fontsize = 15)
plt.show()

3. Data preprocessing

(1) Convert Pclass to one-hot encoding

dfresult = pd.DataFrame()
# Convert the ticket class into a one-hot encoding
dfPclass = pd.get_dummies(dftrain_raw['Pclass'])
# Set the column names
dfPclass.columns = ['Pclass_' + str(x) for x in dfPclass.columns]
dfresult = pd.concat([dfresult,dfPclass],axis = 1)
dfresult

(2) Convert Sex to one-hot encoding

#Sex
dfSex = pd.get_dummies(dftrain_raw['Sex'])
dfresult = pd.concat([dfresult,dfSex],axis = 1)
dfresult

(3) Fill missing Age values with 0 and add an Age_null column marking where values were missing

# Fill missing values with zeros
dfresult['Age'] = dftrain_raw['Age'].fillna(0)
# Add an Age_null column: 0 where Age is present, 1 where it is missing,
# i.e. marking the positions of the missing values
dfresult['Age_null'] = pd.isna(dftrain_raw['Age']).astype('int32')
dfresult

(4) Use SibSp, Parch, and Fare directly

dfresult['SibSp'] = dftrain_raw['SibSp']
dfresult['Parch'] = dftrain_raw['Parch']
dfresult['Fare'] = dftrain_raw['Fare']
dfresult

(5) Mark the positions of missing Cabin values

#Cabin
dfresult['Cabin_null'] =  pd.isna(dftrain_raw['Cabin']).astype('int32')
dfresult

(6) Convert Embarked to one-hot encoding

#Embarked
# Note the parameter dummy_na=True, which adds an extra column marking missing values
dfEmbarked = pd.get_dummies(dftrain_raw['Embarked'], dummy_na=True)
dfEmbarked.columns = ['Embarked_' + str(x) for x in dfEmbarked.columns]
dfresult = pd.concat([dfresult,dfEmbarked],axis = 1)
dfresult

Finally, wrap these operations into a single function:

def preprocessing(dfdata):
 
    dfresult= pd.DataFrame()
 
    #Pclass
    dfPclass = pd.get_dummies(dfdata['Pclass'])
    dfPclass.columns = ['Pclass_' +str(x) for x in dfPclass.columns ]
    dfresult = pd.concat([dfresult,dfPclass],axis = 1)
 
    #Sex
    dfSex = pd.get_dummies(dfdata['Sex'])
    dfresult = pd.concat([dfresult,dfSex],axis = 1)
 
    #Age
    dfresult['Age'] = dfdata['Age'].fillna(0)
    dfresult['Age_null'] = pd.isna(dfdata['Age']).astype('int32')
 
    #SibSp, Parch, Fare
    dfresult['SibSp'] = dfdata['SibSp']
    dfresult['Parch'] = dfdata['Parch']
    dfresult['Fare'] = dfdata['Fare']
 
    #Cabin
    dfresult['Cabin_null'] =  pd.isna(dfdata['Cabin']).astype('int32')
 
    #Embarked
    dfEmbarked = pd.get_dummies(dfdata['Embarked'],dummy_na=True)
    dfEmbarked.columns = ['Embarked_' + str(x) for x in dfEmbarked.columns]
    dfresult = pd.concat([dfresult,dfEmbarked],axis = 1)
 
    return dfresult

Then run the data preprocessing:

x_train = preprocessing(dftrain_raw)
y_train = dftrain_raw['Survived'].values
 
x_test = preprocessing(dftest_raw)
y_test = dftest_raw['Survived'].values
 
print("x_train.shape =", x_train.shape )
print("x_test.shape =", x_test.shape )

x_train.shape = (712, 15)

x_test.shape = (179, 15)

4. Define the model with TensorFlow

Keras provides three ways to build models: the Sequential API, which stacks layers in order; the functional API, which can express models of arbitrary structure; and subclassing the Model base class to build a fully custom model. Here we choose the simplest, Sequential, adding layers in order; for comparison, a functional-API sketch follows the model summary below.

tf.keras.backend.clear_session()
 
model = models.Sequential()
model.add(layers.Dense(20,activation = 'relu',input_shape=(15,)))
model.add(layers.Dense(10,activation = 'relu' ))
model.add(layers.Dense(1,activation = 'sigmoid' ))
 
model.summary()
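
For comparison, here is a minimal sketch of the same architecture built with the functional API mentioned above (an illustration, not part of the original walkthrough; model_fn is a hypothetical name):

# The same network expressed with the functional API (a sketch)
inputs = tf.keras.Input(shape=(15,))
x = layers.Dense(20, activation='relu')(inputs)
x = layers.Dense(10, activation='relu')(x)
outputs = layers.Dense(1, activation='sigmoid')(x)
model_fn = models.Model(inputs=inputs, outputs=outputs)
model_fn.summary()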

5. Train the model

There are usually three ways to train a model: the built-in fit method, the built-in train_on_batch method, and a custom training loop. Here we choose the most common and simplest one, the built-in fit method; a sketch of the train_on_batch alternative appears after the training output below.

# Binary classification uses the binary cross-entropy loss function
model.compile(optimizer='adam',
            loss='binary_crossentropy',
            metrics=['AUC'])
 
history = model.fit(x_train,y_train,
                    batch_size= 64,
                    epochs= 30,
                    validation_split = 0.2 # split off part of the training data for validation
                   )

result:

Epoch 1/30
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py:1817: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
9/9 [==============================] - 0s 30ms/step - loss: 4.3524 - auc: 0.4888 - val_loss: 3.0274 - val_auc: 0.5492
Epoch 2/30
9/9 [==============================] - 0s 6ms/step - loss: 2.7962 - auc: 0.4710 - val_loss: 1.8653 - val_auc: 0.4599
Epoch 3/30
9/9 [==============================] - 0s 6ms/step - loss: 1.6765 - auc: 0.4040 - val_loss: 1.2673 - val_auc: 0.4067
Epoch 4/30
9/9 [==============================] - 0s 7ms/step - loss: 1.1195 - auc: 0.3799 - val_loss: 0.9501 - val_auc: 0.4006
Epoch 5/30
9/9 [==============================] - 0s 6ms/step - loss: 0.8156 - auc: 0.4874 - val_loss: 0.7090 - val_auc: 0.5514
Epoch 6/30
9/9 [==============================] - 0s 5ms/step - loss: 0.6355 - auc: 0.6611 - val_loss: 0.6550 - val_auc: 0.6502
Epoch 7/30
9/9 [==============================] - 0s 6ms/step - loss: 0.6308 - auc: 0.7169 - val_loss: 0.6502 - val_auc: 0.6546
Epoch 8/30
9/9 [==============================] - 0s 6ms/step - loss: 0.6088 - auc: 0.7156 - val_loss: 0.6463 - val_auc: 0.6610
Epoch 9/30
9/9 [==============================] - 0s 6ms/step - loss: 0.6066 - auc: 0.7163 - val_loss: 0.6372 - val_auc: 0.6644
Epoch 10/30
9/9 [==============================] - 0s 6ms/step - loss: 0.5964 - auc: 0.7253 - val_loss: 0.6283 - val_auc: 0.6646
Epoch 11/30
9/9 [==============================] - 0s 7ms/step - loss: 0.5876 - auc: 0.7326 - val_loss: 0.6253 - val_auc: 0.6717
Epoch 12/30
9/9 [==============================] - 0s 6ms/step - loss: 0.5827 - auc: 0.7409 - val_loss: 0.6195 - val_auc: 0.6708
Epoch 13/30
9/9 [==============================] - 0s 6ms/step - loss: 0.5769 - auc: 0.7489 - val_loss: 0.6170 - val_auc: 0.6762
Epoch 14/30
9/9 [==============================] - 0s 6ms/step - loss: 0.5719 - auc: 0.7555 - val_loss: 0.6156 - val_auc: 0.6803
Epoch 15/30
9/9 [==============================] - 0s 6ms/step - loss: 0.5662 - auc: 0.7629 - val_loss: 0.6119 - val_auc: 0.6826
Epoch 16/30
9/9 [==============================] - 0s 6ms/step - loss: 0.5627 - auc: 0.7694 - val_loss: 0.6107 - val_auc: 0.6892
Epoch 17/30
9/9 [==============================] - 0s 6ms/step - loss: 0.5586 - auc: 0.7753 - val_loss: 0.6084 - val_auc: 0.6927
Epoch 18/30
9/9 [==============================] - 0s 6ms/step - loss: 0.5539 - auc: 0.7837 - val_loss: 0.6051 - val_auc: 0.6983
Epoch 19/30
9/9 [==============================] - 0s 7ms/step - loss: 0.5479 - auc: 0.7930 - val_loss: 0.6011 - val_auc: 0.7056
Epoch 20/30
9/9 [==============================] - 0s 9ms/step - loss: 0.5451 - auc: 0.7986 - val_loss: 0.5996 - val_auc: 0.7128
Epoch 21/30
9/9 [==============================] - 0s 7ms/step - loss: 0.5406 - auc: 0.8047 - val_loss: 0.5962 - val_auc: 0.7192
Epoch 22/30
9/9 [==============================] - 0s 6ms/step - loss: 0.5357 - auc: 0.8123 - val_loss: 0.5948 - val_auc: 0.7212
Epoch 23/30
9/9 [==============================] - 0s 6ms/step - loss: 0.5295 - auc: 0.8181 - val_loss: 0.5928 - val_auc: 0.7267
Epoch 24/30
9/9 [==============================] - 0s 6ms/step - loss: 0.5275 - auc: 0.8223 - val_loss: 0.5910 - val_auc: 0.7296
Epoch 25/30
9/9 [==============================] - 0s 6ms/step - loss: 0.5263 - auc: 0.8227 - val_loss: 0.5884 - val_auc: 0.7325
Epoch 26/30
9/9 [==============================] - 0s 7ms/step - loss: 0.5199 - auc: 0.8313 - val_loss: 0.5860 - val_auc: 0.7356
Epoch 27/30
9/9 [==============================] - 0s 6ms/step - loss: 0.5145 - auc: 0.8356 - val_loss: 0.5835 - val_auc: 0.7386
Epoch 28/30
9/9 [==============================] - 0s 6ms/step - loss: 0.5138 - auc: 0.8383 - val_loss: 0.5829 - val_auc: 0.7402
Epoch 29/30
9/9 [==============================] - 0s 7ms/step - loss: 0.5092 - auc: 0.8405 - val_loss: 0.5806 - val_auc: 0.7416
Epoch 30/30
9/9 [==============================] - 0s 6ms/step - loss: 0.5082 - auc: 0.8394 - val_loss: 0.5792 - val_auc: 0.7424
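
For reference, here is a minimal sketch of the train_on_batch alternative mentioned above (illustrative, assuming the same compiled model and the preprocessed x_train / y_train):

# Manual epoch/batch loop with the built-in train_on_batch method (a sketch)
n = len(x_train)
for epoch in range(3):
    indices = np.random.permutation(n)  # reshuffle each epoch
    for start in range(0, n, 64):
        batch = indices[start:start + 64]
        # returns [loss, auc] since the model was compiled with metrics=['AUC']
        loss, auc = model.train_on_batch(x_train.values[batch], y_train[batch])
    print('epoch', epoch + 1, 'loss', loss, 'auc', auc)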

6. Evaluate the model

First, we evaluate the model's performance on the training and validation sets.

%matplotlib inline
%config InlineBackend.figure_format = 'svg'
 
import matplotlib.pyplot as plt
 
def plot_metric(history, metric):
    train_metrics = history.history[metric]
    val_metrics = history.history['val_'+metric]
    epochs = range(1, len(train_metrics) + 1)
    plt.plot(epochs, train_metrics, 'bo--')
    plt.plot(epochs, val_metrics, 'ro-')
    plt.title('Training and validation '+ metric)
    plt.xlabel("Epochs")
    plt.ylabel(metric)
    plt.legend(["train_"+metric, 'val_'+metric])
    plt.show()
plot_metric(history,"loss")
plot_metric(history,"auc")

Then let's look at its performance on the test set:

model.evaluate(x = x_test,y = y_test)

result:

6/6 [==============================] - 0s 2ms/step - loss: 0.5286 - auc: 0.7869
[0.5286471247673035, 0.786877453327179]

7. Use the model

(1) Predict probabilities

model.predict(x_test[0:10])

result:

array([[0.34822357],
       [0.4793241 ],
       [0.43986577],
       [0.7916608 ],
       [0.50268507],
       [0.536609  ],
       [0.29079646],
       [0.6085641 ],
       [0.34384924],
       [0.17756936]], dtype=float32)

(2) Predict classes

model.predict_classes(x_test[0:10])

result:

WARNING:tensorflow:From <ipython-input-36-a161a0a6b51e>:1: Sequential.predict_classes (from tensorflow.python.keras.engine.sequential) is deprecated and will be removed after 2021-01-01.
Instructions for updating:
Please use instead:* `np.argmax(model.predict(x), axis=-1)`,   if your model does multi-class classification   (e.g. if it uses a `softmax` last-layer activation).* `(model.predict(x) > 0.5).astype("int32")`,   if your model does binary classification   (e.g. if it uses a `sigmoid` last-layer activation).
array([[0],
       [0],
       [0],
       [1],
       [1],
       [1],
       [0],
       [1],
       [0],
       [0]], dtype=int32)
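
As the deprecation warning above suggests, for a binary classifier with a sigmoid output the same class predictions can be obtained by thresholding the predicted probabilities at 0.5:

# Recommended replacement for the deprecated predict_classes
(model.predict(x_test[0:10]) > 0.5).astype("int32")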

8. Save the model

There are two ways to save the model: the Keras way and TensorFlow's native way. The former can only restore the model in a Python environment, while the latter supports cross-platform deployment, so the latter is the recommended way to save.

1) Saving the Keras way

# Save the model structure and weights
model.save('./data/keras_model.h5')
del model  # delete the existing model

(1) Load the model

# identical to the previous one
model = models.load_model('./data/keras_model.h5')
model.evaluate(x_test,y_test)
WARNING:tensorflow:Error in loading the saved optimizer state. As a result, your model is starting with a freshly initialized optimizer.
6/6 [==============================] - 0s 2ms/step - loss: 0.5286 - auc_1: 0.7869
[0.5286471247673035, 0.786877453327179]

(2) Save and restore the model structure

# Save the model structure
json_str = model.to_json()
# Restore the model structure
model_json = models.model_from_json(json_str)

(3) Save the model weights

# Save the model weights
model.save_weights('./data/keras_model_weight.h5')

(4) Restore the model structure and load the weights

# Restore the model structure
model_json = models.model_from_json(json_str)
model_json.compile(
        optimizer='adam',
        loss='binary_crossentropy',
        metrics=['AUC']
    )
 
# Load the weights
model_json.load_weights('./data/keras_model_weight.h5')
model_json.evaluate(x_test,y_test)
6/6 [==============================] - 0s 3ms/step - loss: 0.5217 - auc: 0.8123
[0.521678626537323, 0.8122605681419373]

2) Saving the native TensorFlow way

# Save the weights; this way saves only the weight tensors
model.save_weights('./data/tf_model_weights.ckpt', save_format="tf")
# Save the model structure and parameters to a file; a model saved this way
# is cross-platform and convenient for deployment
model.save('./data/tf_model_savedmodel', save_format="tf")
print('export saved model.')
 
model_loaded = tf.keras.models.load_model('./data/tf_model_savedmodel')
model_loaded.evaluate(x_test,y_test)
INFO:tensorflow:Assets written to: ./data/tf_model_savedmodel/assets
export saved model.
6/6 [==============================] - 0s 2ms/step - loss: 0.5286 - auc_1: 0.7869
[0.5286471247673035, 0.786877453327179]
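
The weight-only checkpoint saved above can also be restored, as long as the target model has the same architecture; a minimal sketch (illustrative, assuming the model built earlier in this post):

# Restore the weight-only checkpoint onto a model with the same architecture (a sketch)
model.load_weights('./data/tf_model_weights.ckpt')
model.evaluate(x_test, y_test)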

 

References:

Open-source e-book: https://lyhue1991.github.io/eat_tensorflow2_in_30_days/

GitHub project: https://github.com/lyhue1991/eat_tensorflow2_in_30_days

Origin: www.cnblogs.com/xiximayou/p/12638196.html