CIDDS - Coburg Intrusion Detection Data Sets

Research Group and Contents

The group is currently researching ways to generate evaluation data sets for network-based intrusion detection systems. Contributions and ideas for improvements are welcome.

The CIDDS repository was created as part of the WISENT project, which is supported by the Bavarian Ministry for Economic Affairs under grant no. IUK 452/002.

The CIDDS Concept

General idea of CIDDS

CIDDS (Coburg Intrusion Detection Data Sets) is a concept to create evaluation data sets for anomaly-based network intrusion detection systems. Since the IT industry is constantly evolving, attackers are forced to adapt and find new ways to penetrate their targets. Hence, the development of intrusion detection systems can be seen as a continuous back-and-forth between the attackers' attempts and the adjustments they trigger on the defenders' side. From this perspective, it is not expedient to test current intrusion detection systems with old data sets. Thus, the main objective of CIDDS is the generation of customizable and up-to-date data sets. To this end, the basic idea behind CIDDS is to create labelled flow-based data sets in a virtual environment using OpenStack.

Exemplary implementation

In the following, we explain the general idea of CIDDS using the CIDDS-001 data set as an example. To create the CIDDS-001 data set, we emulated a small business environment. This environment includes several clients and typical servers such as an e-mail server and web servers (see adjacent figure). Python scripts emulate benign user behaviour like browsing the web, sending and receiving e-mails, and exchanging files. To make user behaviour as realistic as possible, clients perform their activities according to an individual working schedule that considers working hours and lunch breaks. We also take into account that different employees tend to have different tasks. A manager, for example, would attend more meetings and thus generate less network traffic than a researcher who is constantly browsing. Hence, the characteristics of every user are set in a configuration file.
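The schedule-driven behaviour described above can be illustrated with a minimal sketch. It is not taken from the CIDDS scripts: the class `UserProfile`, its fields, the activity names, and the weights are all hypothetical and only show how per-user working hours, lunch breaks, and task profiles could be combined.

```python
import random
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class UserProfile:
    """Hypothetical per-user settings, as they might be read from a config file."""
    name: str
    work_start: time
    work_end: time
    lunch_start: time
    lunch_end: time
    activity_weights: dict  # activity name -> relative weight


def is_active(profile: UserProfile, now: datetime) -> bool:
    """Return True if the user is at work and not on lunch break at `now`."""
    t = now.time()
    if not (profile.work_start <= t < profile.work_end):
        return False
    if profile.lunch_start <= t < profile.lunch_end:
        return False
    return True


def pick_activity(profile: UserProfile, rng: random.Random) -> str:
    """Draw the next activity according to the user's relative weights."""
    activities = list(profile.activity_weights)
    weights = [profile.activity_weights[a] for a in activities]
    return rng.choices(activities, weights=weights, k=1)[0]


# Two illustrative profiles (assumed values): the manager spends most of the
# time in meetings, which produce no traffic, while the researcher browses.
manager = UserProfile(
    name="manager",
    work_start=time(8, 0), work_end=time(17, 0),
    lunch_start=time(12, 0), lunch_end=time(13, 0),
    activity_weights={"meeting": 6, "email": 3, "browse": 1},
)
researcher = UserProfile(
    name="researcher",
    work_start=time(9, 0), work_end=time(18, 0),
    lunch_start=time(12, 30), lunch_end=time(13, 0),
    activity_weights={"meeting": 1, "email": 2, "browse": 7},
)
```

A client loop would then call `is_active(profile, datetime.now())` before each step and, if it returns `True`, run whatever `pick_activity(profile, random.Random())` selects.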

To generate malicious traffic, Denial of Service (DoS) attacks, brute-force attacks, and port scans were executed within the network. Since the origins, targets, and timestamps of the executed attacks are known, the recorded NetFlow data could be labelled easily. To include actual network traffic that originates outside the OpenStack environment, an external server was deployed. It provides a file synchronization service (Seafile) and an HTTP web server for the clients. Since this server had a publicly accessible IP address, it was exposed to real and up-to-date attacks from the internet. For further information about the individual data sets, we recommend reading the corresponding technical report.
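The labelling step can be sketched as follows. This is only an illustration of the idea that known attack origins, targets, and time windows suffice to label flows; the record types `AttackRecord` and `Flow`, their fields, and the label strings are assumptions and do not reproduce the attribute names or label values of the published data sets.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AttackRecord:
    """Hypothetical log entry for one executed attack."""
    attacker_ip: str
    victim_ip: str
    start: datetime
    end: datetime
    attack_type: str


@dataclass(frozen=True)
class Flow:
    """Hypothetical, reduced view of one recorded flow."""
    src_ip: str
    dst_ip: str
    first_seen: datetime


def label_flow(flow: Flow, attacks: list) -> str:
    """Return the attack type if the flow matches a logged attack, else 'normal'.

    A flow matches when it starts inside the attack window and runs between
    attacker and victim in either direction (request or response).
    """
    for attack in attacks:
        in_window = attack.start <= flow.first_seen <= attack.end
        forward = (flow.src_ip == attack.attacker_ip
                   and flow.dst_ip == attack.victim_ip)
        reverse = (flow.src_ip == attack.victim_ip
                   and flow.dst_ip == attack.attacker_ip)
        if in_window and (forward or reverse):
            return attack.attack_type
    return "normal"


# Illustrative attack log with assumed addresses and times.
attack_log = [
    AttackRecord(
        attacker_ip="192.168.220.15",
        victim_ip="192.168.100.5",
        start=datetime(2017, 3, 15, 10, 0, 0),
        end=datetime(2017, 3, 15, 10, 5, 0),
        attack_type="portScan",
    ),
]
```

Matching on both directions is a design choice of this sketch: flow records are typically unidirectional, so the victim's responses would otherwise be labelled as normal traffic.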

Ongoing research

We are currently extending our user behaviour scripts to generate even more realistic network traffic. For example, we plan to integrate additional servers such as repository servers and to add further user activities such as Skype. We also plan to carry out more sophisticated attack scenarios (browser exploits and Trojans) within our OpenStack environment.

Download

CIDDS data sets

Public IP addresses are anonymised (see technical reports). The non-anonymised versions of the data sets cannot be shared publicly. Access to the non-anonymised data sets can only be granted on site at the University of Coburg upon request.

Python scripts

The Python scripts and their documentation are published on GitHub (https://github.com/markusring/CIDDS). Since OpenStack is open-source software, we strongly encourage fellow researchers to set up a test environment of their own using the provided scripts. All researchers are welcome to adjust the scripts to the needs of their intrusion detection system.

Terms of Use

If you publish material based on CIDDS or the generation scripts, please cite our papers:

  • Ring, M., Wunderlich, S., Gruedl, D., Landes, D., Hotho, A.: "Flow-based benchmark data sets for intrusion detection." In: Proceedings of the 16th European Conference on Cyber Warfare and Security (ECCWS), pp. 361-369. ACPI (2017)

    BibTeX citation:
    @incollection{ring2017flow,
      title={Flow-based benchmark data sets for intrusion detection},
      author={Ring, Markus and Wunderlich, Sarah and Grüdl, Dominik and Landes, Dieter and Hotho, Andreas},
      booktitle={Proceedings of the 16th European Conference on Cyber Warfare and Security (ECCWS)},
      year={2017},
      pages={361--369},
      publisher={ACPI}
    }
  • Ring, M., Wunderlich, S., Gruedl, D., Landes, D., Hotho, A.: "Creation of Flow-Based Data Sets for Intrusion Detection." In: Journal of Information Warfare (JIW), Vol. 16, Issue 4, pp. 40-53 (2017)

    BibTeX citation:
    @article{ring2017creation,
      title={Creation of Flow-Based Data Sets for Intrusion Detection},
      author={Ring, Markus and Wunderlich, Sarah and Grüdl, Dominik and Landes, Dieter and Hotho, Andreas},
      journal={Journal of Information Warfare},
      volume={16},
      number={4},
      year={2017},
      pages={40--53},
      publisher={JIW}
    }