27 Sep 2018 Logistic regression in R: study notes

Logistic regression is a highly interpretable model.
Below are the steps I put together for interpreting a model fitted on the adult dataset.

##Step 1: load data
experiment_data <- read.table('C:/Users/data/adult.txt', sep = ',', header = TRUE)
colnames(experiment_data) <- c("age30","age60","private","self_emp","gov","edu12","edu9","prof","white","male","hours50","hours30","us","outcome")
M<-ncol(experiment_data)
Y<-experiment_data[,14]
obsX<-experiment_data[,-14]
############################################### Algorithm One ###################################################

# The aim is to interpret the model

# Logistic regression model

##Step 2: fit the model
glm.fit <- glm(Y ~ age30 + edu12 + edu9 + prof + white + male + hours50 + hours30,
               data = experiment_data, family = binomial(link = 'logit'))

# Analyze the results of the returned model

summary(glm.fit)
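glm.probs <- predict(glm.fit, type = "response") # fitted probabilities P(Y = 1)
glm.pred <- ifelse(glm.probs > 0.4, "1", "0") # classify with a 0.4 threshold
p <- plogis(predict(glm.fit, type = "link")) # inverse logit of the linear predictor; equals glm.probs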

# Calculate the in-sample misclassification rate.
mean(ifelse(fitted(glm.fit)<0.4,0,1)!=Y)
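
The relationship between the link scale and the response scale can be checked on simulated data. This is a minimal sketch, not part of the adult analysis: `x`, `y`, `fit`, and the simulation coefficients are hypothetical. It shows that `predict(..., type = "response")` already returns probabilities, so the inverse logit is applied to the linear predictor only once.

```r
# Simulated data; x, y and fit are hypothetical names used only for illustration
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(-0.5 + 1.2 * x))
fit <- glm(y ~ x, family = binomial(link = "logit"))

eta  <- predict(fit, type = "link")      # linear predictor (log-odds)
prob <- predict(fit, type = "response")  # fitted probabilities
same <- isTRUE(all.equal(plogis(eta), prob))  # does the inverse logit of eta reproduce prob?

threshold <- 0.4
pred <- ifelse(prob > threshold, 1, 0)   # hard 0/1 classification at the chosen threshold
misclass_rate <- mean(pred != y)         # share of in-sample observations classified incorrectly
```

Lowering the threshold below 0.5 trades more false positives for fewer false negatives, so the misclassification rate should be read together with the classification table in step 6.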
#Step 3: stepwise regression
glm.fit1<-step(glm.fit)
summary(glm.fit1) # summary of the model selected by stepwise regression

#Step 4: interpreting the model
exp(glm.fit1$coefficients) # odds ratios: how the odds change with x
# …coefficients[] # solves for the x at which pi = 0.5
ratio05 <- glm.fit1$coefficients[]*0.25 # rate of change of pi with respect to x where pi = 0.5
#Step 5: model fit and diagnostics
# Compute the Cox-Snell goodness of fit
R2cox <- 1-exp((glm.fit1$deviance-glm.fit1$null.deviance)/length(Y))
cat("Cox-Snell R2=",R2cox,"\n")
#5.3 Compute the Nagelkerke goodness of fit
R2nag <- R2cox/(1-exp((-glm.fit1$null.deviance)/length(Y)))
cat("Nagelkerke R2=",R2nag,"\n")
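
For reference, the quantities computed in steps 4 and 5 can be written out as follows. The notation is an assumption introduced here: $\beta_j$ is a fitted coefficient, $D$ is the residual deviance (`glm.fit1$deviance`), $D_0$ is the null deviance (`glm.fit1$null.deviance`), and $n$ is `length(Y)`.

```latex
% Odds and the slope of pi with respect to one predictor x_j
\frac{\pi}{1-\pi} = \exp\Big(\beta_0 + \sum_j \beta_j x_j\Big),
\qquad
\frac{\partial \pi}{\partial x_j} = \beta_j\,\pi(1-\pi) = \frac{\beta_j}{4}
\quad \text{at } \pi = 0.5

% Cox-Snell and Nagelkerke pseudo R^2 expressed through deviances
R^2_{\mathrm{CS}} = 1 - \exp\!\Big(\frac{D - D_0}{n}\Big),
\qquad
R^2_{\mathrm{N}} = \frac{R^2_{\mathrm{CS}}}{1 - \exp\!\big(-D_0/n\big)}
```

The factor 0.25 in `ratio05` is the value of $\pi(1-\pi)$ at $\pi = 0.5$, and exponentiating a coefficient gives the multiplicative change in the odds for a one-unit increase in that predictor. The Nagelkerke version rescales the Cox-Snell value by its maximum attainable value, so it can reach 1.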
#5.4 Residual analysis
plot(residuals(glm.fit1))
#5.5 Outlier diagnostics
library(car)
influencePlot(glm.fit1)
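#Step 6: classification table

# True Positive (TP): number of positive cases predicted as positive

# True Negative (TN): number of negative cases predicted as negative

# False Positive (FP): number of negative cases predicted as positive → false alarm (Type I error)

# False Negative (FN): number of positive cases predicted as negative → miss (Type II error)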

fitt.pi<-fitted(glm.fit1) # same as predict(glm.fit1, type = "response")
ypred<-1*(fitt.pi>0.5) # multiplying a logical by 1 turns it into a 0/1 variable
length(ypred)
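n<-table(experiment_data$outcome,ypred) # rows: actual class, columns: predicted class (order 0, 1)
TP<-n[2,2]; TN<-n[1,1]; FP<-n[1,2]; FN<-n[2,1] # positive class is outcome = 1
Precision=TP/(TP+FP) # share of predicted positives that are truly positive
recall=TP/(TP+FN) # share of actual positives that are found
ACC=(TP+TN)/sum(n)
F1=2*TP/(2*TP+FP+FN)
specificity=TN/(TN+FP) # share of actual negatives that are correctly rejected
Percantage<-c(n[1,1]/sum(n[1,]),n[2,2]/sum(n[2,])) # per-class share classified correctly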
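
The metrics above can be checked by hand on a small example. This is a minimal sketch with made-up labels (`actual` and `predicted` are hypothetical), assuming the positive class is coded as 1. Fixing the factor levels is a design choice that keeps the table 2x2 even when the model never predicts one of the classes, which would otherwise break positional indexing.

```r
# Toy labels (hypothetical), positive class = 1
actual    <- c(1, 1, 1, 0, 0, 0, 0, 1)
predicted <- c(1, 0, 1, 0, 1, 1, 0, 1)

# Rows are the actual class, columns the predicted class; levels are fixed to 0, 1
tab <- table(factor(actual, levels = c(0, 1)),
             factor(predicted, levels = c(0, 1)))

# Index by level name rather than by position to avoid mixing up the cells
TP <- as.numeric(tab["1", "1"]); TN <- as.numeric(tab["0", "0"])
FP <- as.numeric(tab["0", "1"]); FN <- as.numeric(tab["1", "0"])

precision   <- TP / (TP + FP)            # 3 / 5
recall      <- TP / (TP + FN)            # 3 / 4
specificity <- TN / (TN + FP)            # 2 / 4
accuracy    <- (TP + TN) / sum(tab)      # 5 / 8
f1          <- 2 * TP / (2 * TP + FP + FN)  # 6 / 9
```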

Reposted from blog.csdn.net/u012373972/article/details/82868365