Using the command line to call libsvm for classification under Windows

Goal: use an SVM to classify feature samples that belong to 4 classes.

Step 1: Split the feature samples of the 4 classes (sample size is 120*7) into a training set and a test set. Each sample has one class label. Store the two data sets with their labels in train_sample.mat and test_sample.mat respectively, with the class label in the first column of each mat file.
Step 2: Convert the mat files to the format used by libsvm.
Step 3: Press Win+R and run cmd to open the Windows command prompt.
Step 4: Enter commands to scale, train on, and test the data to obtain the classification accuracy.
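Step 1 does not say how the split was made. A minimal sketch, assuming a stratified random split so that each class keeps the same proportion in both sets. The 80/20 ratio, the reading of 120*7 as 120 rows with one label and six features, and the function name `stratified_split` are all assumptions for illustration, not taken from the original data.

```python
import random
from collections import defaultdict

def stratified_split(rows, train_fraction=0.8, seed=0):
    """Split rows (label in column 0) so each class keeps the same ratio."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[0]].append(row)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test

# Toy data (assumed shape): 4 classes, 30 samples each, label + 6 features.
rows = [[label] + [float(i + j) for j in range(6)]
        for label in (1, 2, 3, 4) for i in range(30)]
train, test = stratified_split(rows)
print(len(train), len(test))
```

The fixed seed makes the split reproducible, which helps when comparing accuracy between runs.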

1. The contents of the libsvm toolbox:
(1) The java folder, which is used on the Java platform;
(2) The python folder, which contains the Python interface to libsvm (the parameter optimization script is in the tools folder, see below);
(3) The tools folder, which contains four Python scripts for data set sampling (subset), parameter selection (grid), an integrated test (easy), and data checking (checkdata);
(4) The windows folder, which contains the four precompiled libsvm executables. These are the programs we use here;
(5) The svm-toy folder, a visual tool that displays the training data and the classification boundary. It contains the source code, and the compiled program is in the windows folder;
(6) The heart_scale file, a sample data file that can be opened with Notepad and used for testing;
(7) The remaining .h and .cpp files are the source code, from which the corresponding .exe files can be compiled. The most important ones are svm.h and svm.cpp, together with svm-predict.c, svm-scale.c and svm-train.c (the svm-toy source is in the svm-toy folder). These programs call the interface functions defined in svm.h and svm.cpp, and after compilation they become the four exe programs in the windows folder. In addition, the README and FAQ files are good help documents.

2. Libsvm data format
[label] [index1]:[value1] [index2]:[value2] ...
[label] [index1]:[value1] [index2]:[value2] ...
label: the target value. For classification it is the class label, usually an integer. For regression it is a continuous value.
index: the feature index, an integer, listed in ascending order.
value: the feature value (attribute data) used for training, usually a real number.
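To make the format concrete, here is a minimal sketch of a parser for one line of such a file. The function name `parse_libsvm_line` is hypothetical, and the sketch assumes well-formed input with `index:value` pairs separated by whitespace. It does not validate that the indices are ascending.

```python
def parse_libsvm_line(line):
    """Parse '<label> <index>:<value> ...' into (label, {index: value})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        index, value = item.split(":")
        features[int(index)] = float(value)
    return label, features

label, features = parse_libsvm_line("2 1:0.5 3:-1.25 6:4")
print(label, features)
```

The example line skips indices 2, 4 and 5. The format is sparse, so features that are zero can simply be left out.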
Copy write4libsvm.m from the reference blog [1]. It is a format conversion program, and you do not need to understand it in detail. Type the function name in the MATLAB command window, then select the data file to be converted. The code of write4libsvm.m is as follows:

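function write4libsvm
% Convert data so that it meets the format requirements of libsvm.
% The input is a .mat file; the output can be saved as .txt or .dat.
% The original data is stored as:
%             [label value1 value2 ...]
% The converted file follows the libsvm format:
%             [label 1:value1 2:value2 3:value3 ...]
% Genial@ustc
% 2004.6.16
[filename, pathname] = uigetfile({'*.mat', 'Data files (*.mat)'; ...
    '*.*', 'All files (*.*)'}, 'Select data file');
try
   S = load([pathname filename]);
   fieldName = fieldnames(S);
   B = S.(fieldName{1});   % use the first variable stored in the mat file
   [m,n] = size(B);
   [filename, pathname] = uiputfile({'*.txt;*.dat', 'Data files (*.txt;*.dat)'; ...
       '*.*', 'All files (*.*)'}, 'Save data file');
   fid = fopen([pathname filename],'w');
   if(fid~=-1)
       for k=1:m
           fprintf(fid,'%3d',B(k,1));        % class label
           for kk = 2:n
               fprintf(fid,'\t%d',(kk-1));   % feature index, starting at 1
               fprintf(fid,':');
               % MATLAB switches to exponential notation when the value
               % is not an integer, so %d also works for real numbers
               fprintf(fid,'%d',B(k,kk));
           end
           disp(k);                          % show progress
           fprintf(fid,'\n');
       end
       fclose(fid);
   else
       msgbox('Unable to save the file!');
   end
catch err
   % e.g. a dialog was cancelled or the mat file could not be loaded
   disp(err.message);
end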
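For readers without MATLAB, the same conversion can be sketched in Python. This is an illustration, not a port of write4libsvm.m: the names `to_libsvm_line` and `write_libsvm` are hypothetical, the rows are assumed to be already in memory as lists with the label first, and `:g` formatting keeps only about six significant digits.

```python
def to_libsvm_line(row):
    """Format [label, v1, v2, ...] as 'label 1:v1 2:v2 ...'."""
    label, values = row[0], row[1:]
    pairs = " ".join(f"{i}:{v:g}" for i, v in enumerate(values, start=1))
    return f"{label:g} {pairs}"

def write_libsvm(rows, path):
    """Write one libsvm-format line per row to the given file."""
    with open(path, "w") as f:
        for row in rows:
            f.write(to_libsvm_line(row) + "\n")

print(to_libsvm_line([3, 0.5, 12, -1.75]))
```

Calling `write_libsvm(rows, "libsvmtrain_sample")` would write a file without an extension, which avoids the file name problem described in section 6.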

I chose to save the converted data in .txt format, but later ran into a problem when entering the commands at the Windows command prompt. The solution is explained in section 6.

3. Data scaling
svm-scale scales the original samples. The range can be chosen freely and is usually [0,1] or [-1,1]. Scaling has two main purposes: (1) to prevent features with very large or very small values from having an unbalanced influence on training; (2) to improve calculation speed, because kernel evaluation uses inner products or exp, and unbalanced data may cause numerical difficulties.
Usage: svm-scale [-l lower] [-u upper] [-y y_lower y_upper] [-s save_filename] [-r restore_filename] filename
All items in [] are optional:
-l lower: the lower limit of the scaled data, default -1;
-u upper: the upper limit of the scaled data, default 1;
-y y_lower y_upper: also scale the target values, with y_lower as the lower limit and y_upper as the upper limit;
-s save_filename: save the scaling rules to the file save_filename;
-r restore_filename: scale according to the rules in the existing file restore_filename;
filename: the data file to be scaled, which must be in libsvm format.
For example:
(1) Scale the libsvmtrain_sample file to [0,1] and save the scaling rules in data.range:
svm-scale -l 0 -u 1 -s data.range libsvmtrain_sample>train_scale.txt
(2) Scale the libsvmtest_sample file according to the existing rule file data.range:
svm-scale -r data.range libsvmtest_sample>test_scale.txt
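The reason for reusing data.range on the test set can be shown with a small sketch of min-max scaling. This illustrates the idea and is not the svm-scale source. The names `fit_ranges` and `scale` are hypothetical, and mapping a constant feature to the lower limit is a simplification chosen here.

```python
def fit_ranges(rows):
    """Per-feature (min, max) computed from the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]

def scale(rows, ranges, lower=0.0, upper=1.0):
    """Map each feature linearly so the training min/max hit lower/upper."""
    scaled = []
    for row in rows:
        out = []
        for value, (lo, hi) in zip(row, ranges):
            if hi == lo:
                out.append(lower)  # constant feature: simplification
            else:
                out.append(lower + (upper - lower) * (value - lo) / (hi - lo))
        scaled.append(out)
    return scaled

train_x = [[1.0, 10.0], [3.0, 30.0], [5.0, 20.0]]
test_x = [[2.0, 40.0]]
ranges = fit_ranges(train_x)       # plays the role of data.range
print(scale(train_x, ranges))
print(scale(test_x, ranges))
```

The test row uses the ranges learned from the training rows, so its second feature lands outside [0,1]. That is expected: both sets must go through the same transformation, even if some test values fall outside the target range.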

4. Training data: svm-train
svm-train trains on the training data set and produces an SVM model.
Usage: svm-train [options] training_set_file [model_file]
where: (1) options are the operating parameters; for the available options and their meanings, see blog [5];
(2) training_set_file is the data to be trained on (already scaled);
(3) model_file is the result file to be saved, called the model file, which is used in prediction.
After training, we can predict on new data. However, the parameters used for training may not be optimal, so further parameter optimization is usually required. We skip this step for now; readers can refer to blog [3], which describes the process in detail.
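As a rough idea of what the skipped parameter optimization involves, the sketch below enumerates candidate (C, gamma) pairs on a log2 grid and builds the svm-train command for cross-validation with the -v option. The grid ranges are what I believe to be the defaults of grid.py in the tools folder, so treat them as an assumption. The function names are hypothetical, and nothing is executed here.

```python
def parameter_grid(log2c=(-5, 15, 2), log2g=(3, -15, -2)):
    """Yield (C, gamma) pairs; each range is (begin, end, step) in log2."""
    def steps(begin, end, step):
        values = []
        v = begin
        while (step > 0 and v <= end) or (step < 0 and v >= end):
            values.append(v)
            v += step
        return values
    for c_exp in steps(*log2c):
        for g_exp in steps(*log2g):
            yield 2.0 ** c_exp, 2.0 ** g_exp

def cross_validation_command(c, gamma, train_file, folds=5):
    """Argument list for svm-train in n-fold cross-validation mode."""
    return ["svm-train", "-c", str(c), "-g", str(gamma),
            "-v", str(folds), train_file]

grid = list(parameter_grid())
print(len(grid))
print(cross_validation_command(*grid[0], "train_scale.txt"))
```

Each command would report a cross-validation accuracy, and the pair with the best accuracy is then used for the final training run. With the linear kernel (-t 0) used in section 6, gamma has no effect and only C needs to be searched.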

5. Test data: svm-predict
Usage: svm-predict [options] test_file model_file output_file
where: (1) options are the operating parameters; for the available options and their meanings, see blog [5];
(2) test_file is the data to be predicted. The file must be in libsvm format. Even if the true labels are unknown, a label column must still be filled in with arbitrary values. svm-predict writes the predicted labels to output_file, and if the true labels are known, it also reports the accuracy;
(3) model_file is the model produced by svm-train;
(4) output_file is the output file of svm-predict, which contains the predicted values.
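The accuracy that svm-predict reports can be reproduced by comparing the labels in test_file with the lines of output_file. A minimal sketch, assuming the default output of one predicted label per line. The helper names and the toy data are made up for illustration.

```python
def read_labels(lines):
    """First whitespace-separated token of each non-empty line, as float."""
    return [float(line.split()[0]) for line in lines if line.strip()]

def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists differ in length")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)

# Toy stand-ins for the contents of test_file and output_file.
test_lines = ["1 1:0.2 2:0.9", "2 1:0.7 2:0.1", "3 1:0.4 2:0.4", "4 1:0.9 2:0.8"]
output_lines = ["1", "2", "3", "3"]
print(accuracy(read_labels(test_lines), read_labels(output_lines)))
```

If the labels in test_file were filled in arbitrarily, this number is meaningless, and only the predicted labels in output_file should be used.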

6. Example
Press Win+R and run cmd to open the Windows command prompt, then enter the commands as follows.
Enter E: (to switch to drive E); the prompt E:\> appears. Then enter cd\ followed by the path (cd\ on its own goes to the root directory of the current drive),
and then enter the following four commands.
Data scaling: svm-scale -l 0 -u 1 -s data.range libsvmtrain_sample>train_scale.txt
svm-scale -r data.range libsvmtest_sample>test_scale.txt
Data training: svm-train -t 0 -c 1.4142 train_scale.txt model
Data test: svm-predict test_scale.txt model predict_result
At first, the problem shown in Figure 1 occurs: the file cannot be opened.
Figure 1 libsvmtrain_sample cannot be opened
The cause is that the data files were saved with a .txt suffix, while the commands refer to them without one. The solution is to rename the training data file and the test data file by deleting the .txt suffix. The file icon becomes blank, but the files can still be opened with Notepad.
Re-enter the commands above and the problem is solved, as shown in Figure 2.
Figure 2 Input commands and test result
The test results show that the classification accuracy is 90%.
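The four commands above can also be scripted. The sketch below builds the argument lists and runs them with Python's subprocess module, assuming the libsvm executables are on the PATH or in the working folder. The function names are hypothetical, and the stdout redirection replaces the `>` used at the command prompt.

```python
import subprocess

def build_steps(train_raw, test_raw, c=1.4142):
    """Return (argument list, stdout file or None) for each of the 4 steps."""
    return [
        (["svm-scale", "-l", "0", "-u", "1", "-s", "data.range", train_raw],
         "train_scale.txt"),
        (["svm-scale", "-r", "data.range", test_raw], "test_scale.txt"),
        (["svm-train", "-t", "0", "-c", str(c), "train_scale.txt", "model"],
         None),
        (["svm-predict", "test_scale.txt", "model", "predict_result"], None),
    ]

def run_pipeline(train_raw, test_raw):
    """Run the steps in order; stops at the first command that fails."""
    for args, stdout_path in build_steps(train_raw, test_raw):
        if stdout_path is None:
            subprocess.run(args, check=True)
        else:
            # Equivalent of the shell redirect "> file".
            with open(stdout_path, "w") as out:
                subprocess.run(args, check=True, stdout=out)

# Only print the commands here; call run_pipeline(...) to execute them.
for args, stdout_path in build_steps("libsvmtrain_sample", "libsvmtest_sample"):
    print(" ".join(args), "->", stdout_path)
```

Because `check=True` raises an error when a command fails, a missing or misnamed input file (such as the .txt problem above) stops the script at the first step instead of producing an empty scaled file.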

Reference blogs
[1] How to convert to the data format supported by libsvm and do regression analysis.
[2] MATLAB environment uses LIBSVM-data format analysis (2).
[3] Introduction to libsvm and description of function call parameters.
[4] How to use LIBSVM and examples under windows.
[5] Usage and parameter meaning of svmtrain and svmpredict.

Origin blog.csdn.net/weixin_45317919/article/details/108433618