[MATLAB Issue 77] MATLAB implementation of dimensionality reduction, feature ranking, and data processing for regression and classification problems, based on a proxy model algorithm
This article introduces a collection of feature ranking methods built on a libsvm proxy model, including:
1. Ranking based on the prediction accuracy of each individual feature (libsvm proxy model)
2. Ranking based on the correlation coefficient corr (libsvm proxy model)
3. svmrfe_ker (binary classification) [to be added in a later update]
4. svmrfe_ori: feature ranking based on SVM-RFE recursive feature elimination (binary classification) [to be added in a later update]
1. Multi-input, single-output multi-class classification problem
Data settings:
Classification data: 12 inputs, 1 output, 4 classes, 357 samples
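classdata = xlsread('数据集C.xlsx');    % load the classification data set
X = classdata(:,1:end-1)';              % input variables, transposed to features x samples for mapminmax
Y = classdata(:,end);                   % output labels
[X, ps_input] = mapminmax(X, 0, 1);     % scale each feature to [0, 1]
X = X';                                 % back to samples x features
ptrain_per = 0.7;                       % training fraction
trainIdx = randperm(size(X,1), ceil(size(X,1)*ptrain_per));  % training sample indices
testIdx = setdiff(1:size(X,1), trainIdx);                    % test sample indices
K = 10;                                 % 10-fold cross-validation
cvObj = cvpartition(Y(testIdx), 'KFold', K);
userdata.cvObj = cvObj;
userdata.ft = X(testIdx,:);             % test-set inputs
userdata.target = Y(testIdx);           % test-set outputs
nSel = size(X,2);                       % number of features to select; may be less than or equal to the number of features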
1. Ranking based on the prediction accuracy of each individual feature (libsvm proxy model)
That is, each variable is used on its own as the only input feature, and the features are ranked by their ten-fold average error rate.
The cumulative contribution is 0.9
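The per-feature ranking described above could be sketched roughly as follows. This is a minimal illustration rather than the article's own code: it assumes the libsvm MATLAB interface (`svmtrain`, `svmpredict`) is on the path, and the synthetic data, the RBF kernel options, and the names `errRate` and `rankIdx` are all assumptions made for the example.

```matlab
% Sketch: rank features by the 10-fold error rate of a one-feature libsvm model.
% ASSUMPTION: the libsvm MATLAB interface is on the path; the data are synthetic.
rng(0);                                   % reproducible example
n = 200;
Y = repmat((1:4)', n/4, 1);               % 4 classes, hypothetical labels
X = rand(n, 5);                           % 5 hypothetical features in [0, 1]
X(:,3) = Y/4 + 0.05*randn(n, 1);          % make feature 3 informative

K = 10;
cvObj = cvpartition(Y, 'KFold', K);       % stratified 10-fold partition
nFeat = size(X, 2);
errRate = zeros(1, nFeat);

for j = 1:nFeat
    foldErr = zeros(K, 1);
    for k = 1:K
        trIdx = training(cvObj, k);       % logical index of the training rows
        teIdx = test(cvObj, k);           % logical index of the held-out rows
        % C-SVC (-s 0) with RBF kernel (-t 2), quiet mode; options are assumed
        model = svmtrain(Y(trIdx), X(trIdx, j), '-s 0 -t 2 -q');
        [pred, ~, ~] = svmpredict(Y(teIdx), X(teIdx, j), model, '-q');
        foldErr(k) = mean(pred ~= Y(teIdx));
    end
    errRate(j) = mean(foldErr);           % ten-fold average error rate
end

[~, rankIdx] = sort(errRate, 'ascend');   % lowest error rate first
disp(rankIdx);
```

On MATLAB releases older than R2018a, the Statistics toolbox ships its own `svmtrain`, so the libsvm folder would need to come first on the path for this sketch to call the intended function.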
2. Feature ranking based on the correlation coefficient corr (libsvm proxy model)
Fitness function: the average R² on the test set is 0.88588
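The correlation-based ranking step might look like the sketch below. It uses only base MATLAB (`corrcoef`), and the synthetic data and variable names are assumptions; the article does not show how its own corr ranking is computed, so taking the absolute Pearson correlation is one plausible reading.

```matlab
% Sketch: rank features by |Pearson correlation| with the target.
% ASSUMPTION: synthetic data; all variable names are illustrative.
rng(1);
n = 100;
X = rand(n, 4);                                   % 4 hypothetical features
Y = 3*X(:,2) - 0.5*X(:,4) + 0.1*randn(n, 1);      % target driven mostly by feature 2

nFeat = size(X, 2);
r = zeros(1, nFeat);
for j = 1:nFeat
    c = corrcoef(X(:,j), Y);                      % 2x2 correlation matrix
    r(j) = abs(c(1,2));                           % strength of the linear association
end

[rSorted, rankIdx] = sort(r, 'descend');          % strongest correlation first
disp(rankIdx);
```

The ranked indices can then be passed to the proxy model, for example by training on the first `nSel` entries of `rankIdx` and scoring the result on held-out data.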
2. Multi-input, single-output regression problem
Data settings:
Regression data: 7 inputs, 1 output, 107 samples
%% Clear the environment
warning off % suppress warning messages
close all % close open figure windows
clear % clear variables
clc % clear the command window
%% Import data
res = xlsread('数据集.xlsx');
%% Split into training and test sets
ptrain_per = 0.7; % training fraction
trainIdx = randperm(size(res,1), ceil(size(res,1)*ptrain_per)); % training sample indices
testIdx = setdiff(1:size(res,1), trainIdx); % test sample indices
P_train = res(trainIdx, 1: 7)';
T_train = res(trainIdx, 8)';
M = size(P_train, 2);
P_test = res(testIdx, 1: 7)';
T_test = res(testIdx, 8)';
N = size(P_test, 2);
%% Normalize the data
[p_train, ps_input] = mapminmax(P_train, 0, 1);
p_test = mapminmax('apply', P_test, ps_input);
[t_train, ps_output] = mapminmax(T_train, 0, 1);
t_test = mapminmax('apply', T_test, ps_output);
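K = 10; % 10-fold cross-validation
% X and Y are not defined in this script, so the normalized test set is used instead
cvObj = cvpartition(N, 'KFold', K); % partition the N test samples (continuous target, no stratification)
userdata.cvObj = cvObj;
userdata.ft = p_test'; % test-set inputs (samples x features)
userdata.target = t_test'; % test-set outputs
nSel = size(p_test,1); % number of features to select; may be less than or equal to the number of features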
1. Ranking based on the prediction accuracy of each individual feature (libsvm proxy model)
That is, each variable is used on its own as the only input feature, and the features are ranked by their ten-fold average error rate.
The cumulative contribution is 0.9
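The article does not define how the cumulative contribution is computed. One plausible reading, shown here purely as an assumption, is to normalize the per-feature scores so they sum to 1, sort them in descending order, and keep the shortest prefix whose running total reaches 0.9. The scores below are made up for illustration.

```matlab
% Sketch: keep the top-ranked features whose cumulative contribution reaches 0.9.
% ASSUMPTION: "contribution" = per-feature score divided by the sum of all scores.
score = [0.42 0.04 0.25 0.10 0.15 0.02 0.02];   % hypothetical scores for 7 features
thresh = 0.9;                                   % threshold quoted in the text

[sSorted, rankIdx] = sort(score, 'descend');    % best-scoring feature first
contrib = sSorted / sum(sSorted);               % normalized contributions
cumContrib = cumsum(contrib);                   % running total along the ranking
nKeep = find(cumContrib >= thresh, 1, 'first'); % shortest prefix reaching the threshold
selected = rankIdx(1:nKeep);
disp(selected);
```

With these made-up scores the running total passes 0.9 at the fourth ranked feature, so four of the seven inputs would be kept.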
2. Feature ranking based on the correlation coefficient corr (libsvm proxy model)
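For the regression case, a correlation ranking followed by a proxy-model evaluation could be sketched as below. This is an illustrative outline under stated assumptions, not the article's implementation: it assumes the libsvm MATLAB interface is on the path, uses synthetic data of the same shape as the data set (107 samples, 7 inputs), and the SVR option values and the choice of `nSel = 2` are arbitrary.

```matlab
% Sketch: score a correlation-ranked feature subset with a libsvm epsilon-SVR.
% ASSUMPTION: the libsvm MATLAB interface is on the path; the data are synthetic.
rng(2);
n = 107;
P = rand(n, 7);                                   % 7 hypothetical inputs in [0, 1]
T = 0.5*P(:,1) + 0.4*P(:,5) + 0.02*randn(n, 1);   % hypothetical target

nTrain = ceil(0.7*n);                             % 70% of the rows for training
idx = randperm(n);
trainIdx = idx(1:nTrain);
testIdx = idx(nTrain+1:end);

% Rank features by |correlation| using the training rows only
r = zeros(1, 7);
for j = 1:7
    c = corrcoef(P(trainIdx, j), T(trainIdx));
    r(j) = abs(c(1,2));
end
[~, rankIdx] = sort(r, 'descend');

nSel = 2;                                         % number of top features to keep
sel = rankIdx(1:nSel);

% epsilon-SVR (-s 3) with RBF kernel (-t 2); the option values are assumptions
model = svmtrain(T(trainIdx), P(trainIdx, sel), '-s 3 -t 2 -c 10 -p 0.01 -q');
[tSim, ~, ~] = svmpredict(T(testIdx), P(testIdx, sel), model, '-q');

% Coefficient of determination on the held-out rows
ssRes = sum((T(testIdx) - tSim).^2);
ssTot = sum((T(testIdx) - mean(T(testIdx))).^2);
R2 = 1 - ssRes/ssTot;
fprintf('Test R^2 with %d features: %.4f\n', nSel, R2);
```

Repeating the evaluation for `nSel = 1, 2, ...` would show how the test R² changes as lower-ranked features are added.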
3. Getting the code
To get the download link, send a private message on CSDN with the text "Issue 77".