Applying Deep Neural Networks to Intrusion Detection Systems (IDS)

Copyright notice: this is an original article by the blog author and may not be reproduced without the author's permission. https://blog.csdn.net/SHERO_M/article/details/80943676

Operating system: Ubuntu 16.04 LTS, 64-bit

GPU: GTX 1060 3GB

Development environment: Python 2.7, MATLAB R2016a

Deep learning framework: TensorFlow 1.1.0

0. Overview

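Intrusion detection is the discovery of intrusive behavior: information is collected from a number of key points in a computer network or system and analyzed for signs of security-policy violations or attacks.

An intrusion detection system (IDS) is a standalone system that performs this function, and it plays an important role in keeping networked systems secure.

A traditional IDS works by analyzing and extracting intrusion patterns and attack characteristics and then building rule and pattern libraries for detection. As a result it falls short in detection accuracy and intelligence and requires a great deal of manual effort, which is what a deep learning approach is intended to improve.

With that in mind, this post first preprocesses the data, then designs a new deep neural network (the NDNN referred to below) and applies it to intrusion detection. Compared with traditional methods, the detection rate improves markedly and the false alarm rate drops.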

1. IDS model

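Data collection and processing module: collects intrusion-related data and preprocesses the features as required by the downstream model, typically through data filtering, encoding, and normalization.

Feature learning module: uses the NDNN model to extract features from a large volume of training network data, iteratively optimizing the parameters of each layer, and saves the trained NDNN model.

Intrusion classification module: uses the saved NDNN model to identify and classify unseen data and puts the classification result into a standard form. If a record is classified as an attack, a response is triggered and an alarm is raised.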
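The three modules can be read as a simple pipeline. The Python 3 sketch below only illustrates how they hand data to one another. The function names are hypothetical, and a nearest-centroid classifier stands in for the NDNN, so it says nothing about the author's actual implementation.

```python
# Sketch only: function names are hypothetical, and a nearest-centroid
# classifier stands in for the NDNN.

def filter_samples(samples, width):
    """Data collection/processing: keep (features, label) pairs of the right width."""
    return [(x, y) for x, y in samples if len(x) == width]

def train(samples):
    """Feature learning (stand-in): compute one mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def classify(model, x):
    """Intrusion classification: pick the nearest class and flag attacks."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))
    label = min(model, key=lambda y: squared_distance(model[y]))
    return label, label != "normal"      # second value: should an alarm be raised?

samples = [([0.0, 0.1], "normal"), ([0.1, 0.0], "normal"),
           ([0.9, 1.0], "dos"), ([1.0, 0.9], "dos"),
           ([0.5], "normal")]            # malformed record, filtered out
model = train(filter_samples(samples, width=2))
print(classify(model, [0.95, 0.9]))      # ('dos', True)
print(classify(model, [0.05, 0.1]))      # ('normal', False)
```

In a real system the trained model would be saved to disk between the second and third stage, as the module description says.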

2. Data preprocessing

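The KDD99 dataset is a widely used intrusion detection benchmark from the KDD Cup 1999 competition, built on data provided by MIT Lincoln Laboratory.

The NSL-KDD dataset is an improved version of KDD99: redundant and duplicate records are removed, and the numbers of training and test records are more reasonable.

The full KDD99 training set contains about 5 million connection records, and the test set about 3 million.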

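Each connection in the dataset is described by 41 features:

        Basic features of TCP connections (9 features, nos. 1-9)

        Content features of TCP connections (13 features, nos. 10-22)

        Time-based network traffic statistics (9 features, nos. 23-31)

        Host-based network traffic statistics (10 features, nos. 32-41)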

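Each record has 42 attributes: 3 symbolic features, 38 numeric features, and 1 class label. Every connection is labeled either normal or attack, and the attacks comprise 39 attack types grouped into 4 categories: Probe (scanning and probing), DoS (denial of service), U2R (unauthorized access to local superuser privileges), and R2L (unauthorized remote access). The test set includes some attack types that never appear in the training set, which tests how well the system generalizes.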

(1) Encoding: the three symbolic features and the class label in the last column are converted to numbers. The label is one-hot encoded.

Protocol type: 1 icmp; 2 tcp; 3 udp; 4 others.

Service: 1 domain_u; 2 ecr_i; 3 eco_i; 4 finger; 5 ftp_data;  6 ftp; 7 http; 8 hostnames; 9 imap; 10 login;

              11 mtp; 12 netstat; 13 other; 14 private; 15 smtp; 16 systat; 17 telnet; 18 time; 19 uucp; 20 others.

Flag: 1 REJ; 2 RSTO; 3 RSTR; 4 S0; 5 S3; 6 SF; 7 SH; 8 others.
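The encoding step can be sketched in Python 3 as a set of lookup tables. The integer codes are copied from the lists above, using the KDD99 spellings domain_u and S0. Everything else is an assumption made for illustration: the helper names, the order of the five classes in the one-hot vector (only DoS in the second position is supported by the sample record later in this post), and the short attack-to-category table, which covers only a few well-known KDD99 attack names.

```python
# Integer codes copied from the lists above; unlisted values map to "others".
PROTOCOL = {"icmp": 1, "tcp": 2, "udp": 3}
SERVICE = {name: code for code, name in enumerate([
    "domain_u", "ecr_i", "eco_i", "finger", "ftp_data", "ftp", "http",
    "hostnames", "imap", "login", "mtp", "netstat", "other", "private",
    "smtp", "systat", "telnet", "time", "uucp"], start=1)}
FLAG = {"REJ": 1, "RSTO": 2, "RSTR": 3, "S0": 4, "S3": 5, "SF": 6, "SH": 7}
OTHERS = {"protocol": 4, "service": 20, "flag": 8}

# Assumption: class order of the one-hot label. Only DoS at index 1 is
# supported by the sample record in this post.
CLASSES = ["normal", "dos", "probe", "r2l", "u2r"]

# A few well-known KDD99 labels and their categories (not the full list).
ATTACK_CATEGORY = {"normal": "normal", "smurf": "dos", "neptune": "dos",
                   "ipsweep": "probe", "portsweep": "probe",
                   "guess_passwd": "r2l", "buffer_overflow": "u2r"}

def encode_symbolic(protocol, service, flag):
    """Map the three symbolic features to their integer codes."""
    return (PROTOCOL.get(protocol, OTHERS["protocol"]),
            SERVICE.get(service, OTHERS["service"]),
            FLAG.get(flag, OTHERS["flag"]))

def one_hot_label(raw_label):
    """Turn a raw label such as 'smurf.' into a 5-element one-hot vector."""
    name = raw_label.rstrip(".")          # raw labels end with a period
    vector = [0] * len(CLASSES)
    vector[CLASSES.index(ATTACK_CATEGORY[name])] = 1
    return vector

print(encode_symbolic("icmp", "ecr_i", "SF"))   # (1, 2, 6)
print(one_hot_label("smurf."))                  # [0, 1, 0, 0, 0]
```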

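(2) Normalization: numeric attributes are scaled to the range [0, 1] with min-max normalization, where xmin and xmax are the minimum and maximum of the attribute:    y = (x - xmin) / (xmax - xmin)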
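A minimal Python 3 sketch of column-wise min-max normalization follows. The function name is hypothetical, and mapping a constant column (xmax equal to xmin) to 0.0 is an assumption made here to avoid division by zero; the post does not say how that case was handled.

```python
def min_max_scale(rows):
    """Scale each column of `rows` to [0, 1]; constant columns become 0.0."""
    lows = [min(column) for column in zip(*rows)]
    highs = [max(column) for column in zip(*rows)]
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, lo, hi in zip(row, lows, highs)]
            for row in rows]

# Three toy records with three numeric attributes; the last one is constant.
rows = [[0, 1032, 0],
        [2, 0, 0],
        [4, 516, 0]]
for scaled in min_max_scale(rows):
    print(scaled)
# [0.0, 1.0, 0.0]
# [0.5, 0.0, 0.0]
# [1.0, 0.5, 0.0]
```

The post does not state whether xmin and xmax were computed on the training set alone. Reusing the training-set ranges when scaling test data is the usual choice, since it keeps test information out of the preprocessing step.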

3. Network model

Neural network model

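(1) ReLU nonlinear activation: it mitigates, to some extent, the vanishing-gradient problem that the sigmoid function tends to cause, and its derivative is simple to compute.

(2) Adaptive Adam optimizer: after bias correction, the learning rate in each iteration stays within a bounded range, which keeps parameter updates stable. Adam computes a separate adaptive learning rate for each parameter, needs little memory, and converges quickly and effectively. It also helps avoid problems such as a vanishing learning rate, slow convergence, and large fluctuations in the loss caused by high-variance parameter updates.

(3) Softmax activation: typically used in networks with multiple output neurons, it acts as a competitive multi-output classifier. Each output lies between 0 and 1, the outputs sum to 1, and each output is the probability of one class.

(4) Fully connected layer: a 5-node fully connected layer placed before the softmax layer reduces the 100-dimensional output of the last hidden layer to 5 dimensions, so that the input and output dimensions of the softmax layer match.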
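The four building blocks above can be sketched in plain Python 3 without any framework. This is not the author's TensorFlow code. The single 100-unit hidden layer, the random weights, and the helper names are assumptions for illustration; only the 41 inputs, the 100-dimensional last hidden layer, the 5-node fully connected layer, and the softmax output come from the text. The `adam_step` function follows the standard Adam update rule with its usual default hyperparameters.

```python
import math
import random

def relu(values):
    return [max(0.0, x) for x in values]

def softmax(values):
    peak = max(values)                        # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in values]
    total = sum(exps)
    return [e / total for e in exps]

def dense(x, weights, bias):
    """Fully connected layer: `weights` holds one row per output node."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

def random_layer(n_out, n_in):
    weights = [[random.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

random.seed(0)
w_hidden, b_hidden = random_layer(100, 41)    # assumed: one hidden layer of 100 units
w_out, b_out = random_layer(5, 100)           # 5-node fully connected layer

x = [random.random() for _ in range(41)]      # stand-in for one preprocessed record
hidden = relu(dense(x, w_hidden, b_hidden))
probs = softmax(dense(hidden, w_out, b_out))
print(len(hidden), len(probs), round(sum(probs), 6))   # 100 5 1.0

# After bias correction the first Adam step has a size of about lr,
# whether the gradient is small or large.
for grad in (-6.0, -6000.0):
    new_param, _, _ = adam_step(0.0, grad, m=0.0, v=0.0, t=1)
    print(round(new_param, 6))                # 0.001 both times
```

The last loop illustrates the bounded step size mentioned in item (2): the bias-corrected ratio of the two moment estimates cancels the gradient's scale.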

Neural network model selection

Types of detected intrusion

| | Predicted: Attack | Predicted: Normal |
|---|---|---|
| Actual: Attack | TP | FN |
| Actual: Normal | FP | TN |

The specific definitions of the five metrics are as follows:

          

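Detection rate (DR), which equals recall R = (number of attack records detected / total number of attack records) × 100% = TP / (TP + FN) × 100%

False detection rate (FDR) = (number of normal records mistaken for attacks / total number of normal records) × 100% = FP / (FP + TN) × 100%

Missed alarm rate (MAR) = 100% - DR = FN / (TP + FN) × 100%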
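These definitions translate directly into code. The Python 3 sketch below computes DR, FDR, and MAR from confusion-matrix counts, and adds precision and F-measure using their standard definitions, which are an addition here because the post's own formulas for them did not survive. The example counts are not given by the author. They are inferred, under a one-vs-rest assumption, from the U2R figures reported for the KDD99 test set later in this post (11 U2R records out of 44027 test records).

```python
def detection_metrics(tp, fn, fp, tn):
    """Return the metrics as fractions in [0, 1] (multiply by 100 for percent)."""
    dr = tp / (tp + fn)                  # detection rate, same as recall
    fdr = fp / (fp + tn)                 # normal records flagged as attacks
    mar = 1.0 - dr                       # missed alarm rate = FN / (TP + FN)
    precision = tp / (tp + fp)           # standard definition
    f_measure = 2 * precision * dr / (precision + dr)
    return {"DR": dr, "FDR": fdr, "MAR": mar,
            "precision": precision, "F-measure": f_measure}

# Inferred counts (assumption): 10 of 11 U2R records detected, 3 false alarms
# among the other 44016 test records.
metrics = detection_metrics(tp=10, fn=1, fp=3, tn=44013)
print({name: round(value, 4) for name, value in metrics.items()})
# {'DR': 0.9091, 'FDR': 0.0001, 'MAR': 0.0909, 'precision': 0.7692, 'F-measure': 0.8333}
```

With these counts the rounded values agree with the U2R row of the KDD99 result tables, which suggests the per-class figures were computed one class against all others.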

| Algorithm | DR | FDR | MAR |
|---|---|---|---|
| Adaboost [28] | 0.8340 | 0.1740 | 0.1660 |
| Auto-encoder Network [29] | 0.9890 | 0.0110 | 0.0110 |
| LSSVM-IDS + FMIFS [33] | 0.9946 | 0.0013 | 0.0054 |
| LSSVM-IDS + MIFS (β=0.3) [33] | 0.9938 | 0.0023 | 0.0062 |
| LSSVM-IDS + FLCFS [33] | 0.9847 | 0.0061 | 0.0153 |
| LSSVM-IDS + All features [33] | 0.9916 | 0.0097 | 0.0084 |
| Unoptimized DBN-PNN [20] | 0.9931 | - | 0.0069 |
| Optimized DBN-PNN [20] | 0.9914 | - | 0.0086 |
| PCA-PNN [20] | 0.9828 | - | 0.0172 |
| PNN [20] | 0.9904 | - | 0.0096 |
| Proposed algorithm | 0.9995 | 0.0003 | 0.0005 |



 

Related intrusion detection algorithms based on deep neural networks

| References | Methods | Performance |
|---|---|---|
| Fiore et al. [17] | Restricted Boltzmann Machine (RBM) | Accuracy: around 94% |
| K. Do et al. [18] | An ensemble of Deep Belief Nets (DBNs) | Detection F-score on mixed data is around 72%. |
| Khaled et al. [19] | RBM together with DBNs | Detection rate is 97.9%. |
| G. Zhao et al. [20] | DBNs with probabilistic neural network (PNN) | Detection accuracy is about 99%. Detection rate is about 90%. |
| Niyaz et al. [16] | Self-taught learning (STL) | Accuracy rate is more than 98%, a little lower than 99%. F-measure can achieve 98.84%. |
| S. Potluri et al. [21] | Accelerated Deep Neural Network (DNN) | The highest detection accuracy is 97.7%. |
| Roy et al. [22] | Deep Neural Network (DNN) | Better than SVM in intrusion detection. |
| Yin et al. [15] | Recurrent neural networks (RNN-IDS) | Superior to traditional machine learning classification methods. |

Taking one connection record as an example, the raw data looks like this:

0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,smurf.

The same record after preprocessing (encoding and normalization):

0.0 0.0 0.0526315789474 0.714285714286 1.48837071923e-06 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0 1 0 0 0
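The sample can be checked against the preprocessing rules from section 2. The Python 3 sketch below is a consistency check, not the author's code. It assumes the integer codes of the three symbolic features were themselves min-max scaled over their code ranges (4 protocol codes, 20 service codes, 8 flag codes), which is what the printed values suggest.

```python
# Codes for this record from the encoding lists: icmp -> 1, ecr_i -> 2, SF -> 6,
# paired with the number of codes for that feature.
CODES = {"protocol_type": (1, 4), "service": (2, 20), "flag": (6, 8)}

# Values 2 to 4 of the preprocessed record, as printed in the post.
PUBLISHED = {"protocol_type": 0.0,
             "service": 0.0526315789474,
             "flag": 0.714285714286}

def scale_code(code, n_codes):
    """Min-max scale an integer code from the range 1..n_codes to [0, 1]."""
    return (code - 1) / (n_codes - 1)

matches = {name: abs(scale_code(code, n) - PUBLISHED[name]) < 1e-9
           for name, (code, n) in CODES.items()}
print(matches)   # {'protocol_type': True, 'service': True, 'flag': True}
```

So 0.0526315789474 is 1/19 and 0.714285714286 is 5/7. The last five values of the preprocessed record, 0 1 0 0 0, are the one-hot label: smurf is a DoS attack, so DoS appears to occupy the second position.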

Details of the KDD 99 dataset

| Intrusion category | Number of training data | Number of testing data |
|---|---|---|
| Probe | 3723 | 384 |
| DoS | 356691 | 34767 |
| U2R | 41 | 11 |
| R2L | 1024 | 102 |
| Normal | 88515 | 8763 |

| Intrusion category | DR | FDR | MAR |
|---|---|---|---|
| Probe | 0.9896 | 0.0001 | 0.0104 |
| DoS | 0.9997 | 0 | 0.0003 |
| U2R | 0.9091 | 0.0001 | 0.0909 |
| R2L | 0.9804 | 0.0001 | 0.0196 |
| overall | 0.9995 | 0.0003 | 0.0005 |

| Intrusion category | recall | accuracy | F-measure | precision |
|---|---|---|---|---|
| Probe | 0.9896 | 0.9818 | 0.9909 | 0.9922 |
| DoS | 0.9997 | 0.9990 | 0.9998 | 0.9999 |
| U2R | 0.9091 | 0.8182 | 0.8333 | 0.7692 |
| R2L | 0.9804 | 0.9706 | 0.9756 | 0.9709 |
| Normal | 0.9995 | 0.9997 | 0.9997 | 0.9999 |

Details of the NSL-KDD dataset

| Intrusion category | Number of training data | Number of testing data |
|---|---|---|
| Probe | 10422 | 1235 |
| DoS | 41407 | 4520 |
| U2R | 41 | 11 |
| R2L | 896 | 98 |
| Normal | 61110 | 6233 |

| Intrusion category | DR | FDR | MAR |
|---|---|---|---|
| Probe | 0.9935 | 0.0009 | 0.0065 |
| DoS | 0.9940 | 0 | 0.0060 |
| U2R | 0.9091 | 0.0002 | 0.0909 |
| R2L | 0.9796 | 0.0005 | 0.0204 |
| overall | 0.9935 | 0.0016 | 0.0065 |

| Intrusion category | recall | accuracy | F-measure | precision |
|---|---|---|---|---|
| Probe | 0.9935 | 0.9773 | 0.9927 | 0.9920 |
| DoS | 0.9940 | 0.9867 | 0.9959 | 0.9978 |
| U2R | 0.9091 | 0.8182 | 0.6452 | 0.5000 |
| R2L | 0.9796 | 0.9694 | 0.9412 | 0.9057 |
| Normal | 0.9935 | 0.9984 | 0.9959 | 0.9983 |

Features of an original intrusion data record

| Description | Feature | Data attributes |
|---|---|---|
| Basic features of individual TCP connections. | duration | continuous |
| | protocol_type | symbolic |
| | service | symbolic |
| | flag | symbolic |
| | src_bytes | continuous |
| | dst_bytes | continuous |
| | land | symbolic |
| | wrong_fragment | continuous |
| | urgent | continuous |
| Content features within a connection suggested by domain knowledge | hot | continuous |
| | num_failed_logins | continuous |
| | logged_in | symbolic |
| | num_compromised | continuous |
| | root_shell | continuous |
| | su_attempted | continuous |
| | num_root | continuous |
| | num_file_creations | continuous |
| | num_shells | continuous |
| | num_access_files | continuous |
| | num_outbound_cmds | continuous |
| | is_host_login | symbolic |
| | is_guest_login | symbolic |
| Traffic features computed using a two-second time window | count | continuous |
| | srv_count | continuous |
| | serror_rate | continuous |
| | srv_serror_rate | continuous |
| | rerror_rate | continuous |
| | srv_rerror_rate | continuous |
| | same_srv_rate | continuous |
| | diff_srv_rate | continuous |
| | srv_diff_host_rate | continuous |
| Traffic features computed in and out a host | dst_host_count | continuous |
| | dst_host_srv_count | continuous |
| | dst_host_same_srv_rate | continuous |
| | dst_host_diff_srv_rate | continuous |
| | dst_host_same_src_port_rate | continuous |
| | dst_host_srv_diff_host_rate | continuous |
| | dst_host_serror_rate | continuous |
| | dst_host_srv_serror_rate | continuous |
| | dst_host_rerror_rate | continuous |
| | dst_host_srv_rerror_rate | continuous |
