Real-time and reliable anomaly-based intrusion detection for high-speed networks

Abstract

Existing machine learning solutions for network-based intrusion detection cannot maintain their reliability over time when facing high-speed networks and evolving attacks. In this paper, we propose BigFlow, an approach capable of processing evolving network traffic while being scalable to large packet rates. BigFlow employs a verification method that checks whether the classifier outcome is valid in order to provide reliability. If a suspicious packet is found, an expert may help BigFlow to incrementally change the classification model. Experiments with BigFlow, over a network traffic dataset spanning a full year, demonstrate that it can maintain high accuracy over time. It requires as little as 4% of storage and between 0.05% and 4% of training time, compared with other approaches. BigFlow is scalable, coping with a 10-Gbps network bandwidth on a 40-core cluster of commodity hardware.

Objective

Existing machine learning solutions for network-based intrusion detection cannot maintain their reliability over time when facing high-speed networks and evolving attacks; the goal of this work is an approach that remains reliable under these conditions.

Method (Model)

decision tree (DT) [42], random forest (RF) [43], gradient boosting (GB) [44], and an ensemble classifier [45] composed of DT, RF, and GB that decides by majority voting over the individual classifiers' decisions.
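As a rough illustration only (not the paper's implementation), the sketch below builds such a majority-voting ensemble with scikit-learn's DT, RF, and GB classifiers; the hyperparameters are placeholders:

```python
# Minimal majority-voting ensemble sketch (assumes scikit-learn);
# the paper's actual DT/RF/GB configurations may differ.
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier,
                              VotingClassifier)

def build_ensemble():
    dt = DecisionTreeClassifier()
    rf = RandomForestClassifier(n_estimators=100)
    gb = GradientBoostingClassifier()
    # 'hard' voting: each classifier casts one vote and the majority wins.
    return VotingClassifier(
        estimators=[("dt", dt), ("rf", rf), ("gb", gb)],
        voting="hard",
    )

# Usage: ensemble = build_ensemble(); ensemble.fit(X_train, y_train)
```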

The operation of BigFlow proceeds in two main stages: feature extraction and reliable stream learning.

BigFlow can extract up to 158 features


As an example, consider two distinct monitored agents: a switch and a router. The switch exports network packet headers, while the router exports expired netflow records. The Message Consumer module reads both types of events from the message queue and distributes them among the available Message Parser modules, keeping the computing load even. Each Message Parser module, in turn, processes the packet headers and netflow records according to the event type, collecting the relevant fields.
The Host Aggregator and Flow Aggregator modules perform the actual network flow statistics computation (feature extraction). To do this in near real time and in a distributed manner, both aggregators receive messages through a keyed stream. The key for the Host Aggregator module is computed by hashing the event's source address (source IP address), while the key for the Flow Aggregator module relies on the XOR of the source and destination addresses (source and destination IP addresses). To divide the load, each module is responsible for a range of hash values. Thus, through XOR'ing, messages between two specific hosts are forwarded to the same Flow Aggregator PE, regardless of the direction taken by a packet.
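The keying scheme can be illustrated as follows; the helper functions and partition count are hypothetical, intended only to show why XOR'ing the source and destination addresses maps both directions of a flow to the same PE:

```python
# Illustrative keying sketch (hypothetical helpers, not BigFlow's code).
# XOR'ing the two addresses yields the same key for A->B and B->A,
# so both directions of a flow reach the same Flow Aggregator PE.
import ipaddress

NUM_PARTITIONS = 16  # assumed number of aggregator PEs

def host_key(src_ip: str) -> int:
    # Host Aggregator: key derived from the source address only.
    return hash(src_ip) % NUM_PARTITIONS

def flow_key(src_ip: str, dst_ip: str) -> int:
    # Flow Aggregator: direction-independent key via XOR.
    a = int(ipaddress.ip_address(src_ip))
    b = int(ipaddress.ip_address(dst_ip))
    return (a ^ b) % NUM_PARTITIONS

assert flow_key("10.0.0.1", "10.0.0.2") == flow_key("10.0.0.2", "10.0.0.1")
```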
To compute feature values from the grouped events, BigFlow discretizes them into time intervals handled by Tumbling Window modules. Each Tumbling Window module stores and updates the feature values for a specific period, according to each received event. When a Tumbling Window expires (i.e., the period is over), the flow feature values are exported in a host or flow statistics format, and the computation of feature values starts over for a new window.
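A minimal sketch of such a tumbling window, assuming simple per-key packet and byte counters as the aggregated features (the real BigFlow modules compute far richer statistics):

```python
# Hypothetical tumbling-window sketch: accumulate per-key counters for a
# fixed interval, export them when the window expires, then start over.
from collections import defaultdict

class TumblingWindow:
    def __init__(self, length_s: float):
        self.length_s = length_s
        self.window_start = None
        self.stats = defaultdict(lambda: {"packets": 0, "bytes": 0})

    def add(self, timestamp: float, key, nbytes: int):
        if self.window_start is None:
            self.window_start = timestamp
        exported = None
        if timestamp - self.window_start >= self.length_s:
            exported = dict(self.stats)   # export host/flow statistics
            self.stats.clear()            # start a new window
            self.window_start = timestamp
        self.stats[key]["packets"] += 1
        self.stats[key]["bytes"] += nbytes
        return exported                   # None while the window is open
```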

The verification module receives from the classifier the instance, the assigned class, and the classifier's confidence in the assigned class. Using the classification thresholds established during the setup stage, the verifier module decides whether the classification outcome should be accepted. For instance, with a confidence threshold of 70% for the Attack class, the verifier module accepts an instance labeled as Attack only if its confidence is above 70%; otherwise, the event is rejected.
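The acceptance rule can be sketched as follows; the threshold values and helper structure are illustrative, not the paper's tuned configuration:

```python
# Sketch of the verification rule with per-class confidence thresholds
# (threshold values are illustrative, not the paper's tuned values).
THRESHOLDS = {"Attack": 0.70, "Normal": 0.80}

def verify(instance, assigned_class: str, confidence: float,
           rejected_store: list):
    if confidence >= THRESHOLDS[assigned_class]:
        return assigned_class          # accept the classifier's outcome
    # Rejected: store the instance until an expert or auxiliary system labels it.
    rejected_store.append((instance, assigned_class, confidence))
    return None
```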
The rejected instances are stored (Figure 4). Periodically, these instances are retrieved and their labels are requested from an administrator. The administrator can be a human who verifies the event label using publicly available label sources, such as CVE, Twitter, or security newsfeeds, or who is able to recognize new legitimate applications or traffic behaviors on the network. It can also be an auxiliary system composed of signature-based NIDSs that are periodically and automatically updated with new indicators of compromise (e.g., Snort [49] and Bro [50]), which hopefully capture novel attack behaviors.
If an event is labeled, the instance and its correct label are used for the incremental model update; otherwise, the event remains stored until its class (Normal or Attack) becomes publicly known or a threshold time is reached. In the latter case, the instance can be either discarded or assumed to be Normal. For example, a rejected event may be stored for a month; if after this time it still has not been associated with an Attack through any of the public labeling sources, it is deemed a Normal event.
BigFlow assumes that when an unknown event (attack or not) is classified, the classification confidence threshold is not reached; thereby, the event is rejected rather than misclassified. The core idea is simple: high-confidence accepted results represent patterns that the classifier model can still identify, while low-confidence results require more attention from the administrator, as they potentially represent new traffic behaviors that must be learned by the system.

Dataset

We provide the first publicly available dataset for benchmarking intrusion detection engines over a long period, called MAWIFlow

This dataset contains real and labeled network traffic records with 158 features each, extracted from 15-min-long daily traces spread over a year of real network traffic. MAWIFlow is composed of over 6 billion network flows with almost 8 TB of data;

it is based on the network flows extracted from the MAWI network packet traces [22] (Samplepoint-F in the MAWI archive), collected daily over 15-min-long intervals from a transit link between Japan and the USA.

network anomalies can include several types of port scans, network scans, denial-of-service and distributed denial-of-service attacks, among other network-level attacks

  • Realism: The network traffic used for building the dataset was obtained from real network traces. Moreover, MAWIFlow was built from over a year of observations of real network traces, enabling not only evaluation of the detection system during a specific period of time, but also evaluation of its behavior over time, when facing new network traffic behavior;
  • Validity: The network traces used for building the MAWIFlow dataset were collected from a real production network. Although MAWI (the network traces used in MAWIFlow) is provided in a sanitized form, i.e., payloads are removed and sensitive data in network packet headers are encrypted, network flow reconstruction is still possible. In this manner, the sanitization process used by MAWI does not affect the feature values;
  • Prior labeling: The event labels were identified by state-of-the-art unsupervised ML techniques (as assessed by MAWILab). In this manner, supervised ML techniques can be evaluated regarding their performance as compared with unsupervised techniques;
  • High Variability: MAWIFlow is highly variable, owing not only to the network traces used but also to its long recording period. The network traces are real, valid, and collected from a production network infrastructure, so the dataset presents the variability expected of production environments. Moreover, owing to its long recording period (the entire year 2016), the detection system can be evaluated against the environment's variability across an entire year.
  • Reproducibility and Public Availability: The network traces used were collected from publicly available sources (MAWI). Moreover, the BigFlow (Section IV) source code is also publicly available.

currently the most widely used dataset is still the DARPA1998 dataset

MAWIFlow tackles the problem of creating representative datasets by using real and valid network traces, while labeling is achieved using state-of-the-art signature-based detection techniques.

Features

158 host-based and flow-based features

by comparison, prior works extract 15 features in [14], 21 features in [15], 60 features in [16], and 62 features in [17]

incremental model

stream learning techniques to analyze traffic in near real time


For each of the evaluated classifiers, two update schemes were tested: no-update and weekly-update. The no-update scheme used a single training step on the MAWIFlow data from the first seven days of January and then applied the built model for the remainder of the year. In the weekly-update scheme, each model lasted only seven days, after which a new model was built using the previous seven days of data for training, thus retraining (rebuilding) the classifier 52 times during the year (once every week).
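The two schemes can be sketched as follows; load_week and evaluate are hypothetical helpers standing in for MAWIFlow data access and metric computation:

```python
# Sketch of the two update schemes evaluated in the paper.
# load_week(w) and evaluate(clf, X, y) are hypothetical helpers.

def no_update(clf, weeks=52):
    # Train once on the first week, then test on every remaining week.
    clf.fit(*load_week(1))
    return [evaluate(clf, *load_week(w)) for w in range(2, weeks + 1)]

def weekly_update(clf_factory, weeks=52):
    # Rebuild the model every week using the previous week's data.
    scores = []
    for w in range(2, weeks + 1):
        clf = clf_factory()
        clf.fit(*load_week(w - 1))
        scores.append(evaluate(clf, *load_week(w)))
    return scores
```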

In summary, this experiment provides evidence that in production high-speed networks, anomaly detection classifiers must be updated periodically; otherwise, their outputs become unreliable over time. However, regularly updating the classifiers is challenging in high-speed networks, because the networks’ activity must be stored for further analysis and should be labeled accordingly.

Using the verification strategy greatly reduces the occurrence of rejected events.

BigFlow employs a verification mechanism that checks whether the classification outcome should be accepted, in order to avoid high-confidence classification mistakes.

Results

In this paper, we assess this accuracy loss experimentally, using a real network traffic dataset spanning a year and four ML classifiers. Our experiments show that the accuracy of classifiers trained at the beginning of the year can decrease by up to 23% during the year.

As a result, BigFlow provides updated stream-learning classification in near real time with selective human assistance, because only instances that passed through the classifiers and were rejected require action from experts. Thereby, this approach requires minimal human intervention and, most importantly, mitigates false-positive and false-negative alarms. As the models are incrementally updated only with instances that were previously rejected, the proposal also minimizes the cost of model updates.

for all evaluated classifiers, the average BigFlow error rate can be further reduced when a certain rejection rate can be tolerated

Training

the random undersampling without replacement method [24] was applied during the training stage to balance the classes (Normal and Attack). The true negative (Normal accuracy) and true positive (Attack accuracy) rates are shown in Figure 1
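A minimal sketch of random undersampling without replacement (the method cited as [24] may differ in detail) for NumPy arrays X and y:

```python
# Balance classes by keeping, for every class, as many samples as the
# minority class has, drawn at random without replacement.
import numpy as np

def undersample(X, y, seed=0):
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    keep = rng.permutation(keep)          # shuffle the retained indices
    return X[keep], y[keep]
```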

in this sense
processing elements (PE)


Reposted from blog.csdn.net/AcSuccess/article/details/102178493