1. Experimental Purpose
hiring.csv contains a company's recruitment records: each candidate's work experience, written test score and personal interview score. The human resources department sets salaries based on these three factors. Using this data, build a machine learning model that helps HR decide the salary of future candidates, then use the model to predict the salaries of the following candidates:
(1) 2 years of work experience, test score 9, interview score 6
(2) 12 years of work experience, test score 10, interview score 10
2. Import the necessary modules and read the data
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from word2number import w2n
df = pd.read_csv('hiring.csv')
df
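Before cleaning, it helps to confirm which columns actually contain missing values. A minimal sketch using `DataFrame.isna().sum()` on a small made-up frame (the values are hypothetical, not the real hiring.csv contents; only the column names are borrowed):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data that reuses the hiring.csv column names
toy = pd.DataFrame({
    "experience": [np.nan, "five", "two"],
    "test_score(out of 10)": [8.0, np.nan, 10.0],
    "interview_score(out of 10)": [9, 7, 10],
    "salary($)": [50000, 60000, 65000],
})

# Count missing values per column
missing = toy.isna().sum()
print(missing)
```

Running the same `df.isna().sum()` on the real data shows which of the cleaning steps below are needed.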
3. Process the data
3.1. Convert the experience field to numbers
df.experience = df.experience.fillna('zero')  # replace every NaN with the word 'zero'
df
df.experience = df.experience.apply(w2n.word_to_num)  # use w2n.word_to_num to turn number words into integers
df
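If installing `word2number` is not an option, the same two steps can be sketched with a plain lookup table. This assumes the column only contains the English words zero through twelve; the `WORD_TO_INT` dictionary is hypothetical and far less general than `w2n.word_to_num`, which also parses compounds such as "twenty one":

```python
import numpy as np
import pandas as pd

# Hypothetical lookup table covering only the words assumed to appear in the column
WORD_TO_INT = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12,
}

experience = pd.Series([np.nan, "five", "two", "eleven"])

# Same two steps as above: NaN -> "zero", then words -> integers
# (any word missing from WORD_TO_INT would become NaN after .map)
experience = experience.fillna("zero").map(WORD_TO_INT)
print(experience.tolist())  # [0, 5, 2, 11]
```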
3.2. Replace NaN in the test_score(out of 10) field with the mean
import math
mean_test_score = math.floor(df['test_score(out of 10)'].mean())  # take the mean and round it down
mean_test_score
# Output
7
df['test_score(out of 10)'] = df['test_score(out of 10)'].fillna(mean_test_score)  # fill NaN with the rounded-down mean
df
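The fill value is the mean rounded down, not the median. A small sketch on hypothetical scores shows how `Series.mean()` skips NaN by default (`skipna=True`) and how the median would differ:

```python
import math

import numpy as np
import pandas as pd

# Hypothetical scores with one missing entry
scores = pd.Series([8.0, 8.0, 6.0, np.nan, 9.0])

# mean() ignores NaN: (8 + 8 + 6 + 9) / 4 = 7.75, floored to 7
fill_value = math.floor(scores.mean())

# For comparison, the median of 6, 8, 8, 9 is (8 + 8) / 2 = 8.0
median_value = scores.median()

filled = scores.fillna(fill_value)
print(filled.tolist())  # [8.0, 8.0, 6.0, 7.0, 9.0]
```

Flooring is a choice made in this tutorial; filling with the unrounded mean or with the median are equally valid alternatives.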
4. Training + prediction
reg = LinearRegression()  # instantiate the model
reg.fit(df[['experience', 'test_score(out of 10)', 'interview_score(out of 10)']], df['salary($)'])  # train
reg.coef_  # coefficients, one per feature
reg.intercept_  # intercept
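`LinearRegression` fits an ordinary least squares model, salary ≈ intercept + b1·experience + b2·test_score + b3·interview_score, and `coef_` holds b1..b3. A sketch of the same fit with `numpy.linalg.lstsq`, using hypothetical feature rows and made-up "true" parameters to generate noise-free targets, so the recovered values can be checked:

```python
import numpy as np

# Hypothetical rows: [experience, test_score, interview_score]
X = np.array([
    [0, 8, 9],
    [5, 6, 7],
    [2, 10, 10],
    [7, 9, 6],
    [3, 7, 10],
    [10, 7, 7],
], dtype=float)

# Made-up parameters, used only to generate exact targets
true_coef = np.array([2000.0, 1500.0, 1000.0])
true_intercept = 20000.0
y = X @ true_coef + true_intercept

# Prepend a column of ones so the intercept is estimated as an extra weight
X_design = np.column_stack([np.ones(len(X)), X])
params, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, coef = params[0], params[1:]
print(intercept, coef)
```

With noise-free targets the least squares solution recovers the generating parameters; on real data such as hiring.csv the fit is only approximate.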
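# Pass a DataFrame with the training column names to avoid sklearn's feature-name warning
reg.predict(pd.DataFrame([[2, 9, 6]], columns=reg.feature_names_in_))  # prediction 1
reg.predict(pd.DataFrame([[12, 10, 10]], columns=reg.feature_names_in_))  # prediction 2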
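Each prediction is just intercept + dot(features, coef). The sketch below reproduces that arithmetic for both candidates; the coefficient and intercept values are hypothetical placeholders, to be replaced with the `reg.coef_` and `reg.intercept_` actually printed above:

```python
import numpy as np

# Hypothetical fitted parameters; substitute reg.coef_ and reg.intercept_
coef = np.array([2000.0, 1500.0, 1000.0])  # experience, test, interview
intercept = 20000.0

candidates = np.array([
    [2, 9, 6],     # candidate (1)
    [12, 10, 10],  # candidate (2)
], dtype=float)

# One salary per row: intercept + sum_j coef_j * x_j
predicted = candidates @ coef + intercept
print(predicted)  # [43500. 69000.]
```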