Intelligent Intrusion Detection Systems Flashcards
What are the types of IDS?
Host, network, hypervisor, application, protocol and server based.
How does host based IDS work?
Monitor resource utilisation and audit trails of specific servers or devices.
How does network based IDS work?
Monitor and analyse traffic flows on the network
How do application based IDS work?
A separate IDS for each resource, e.g. email and web.
How do protocol based IDS work?
Protocols are grouped together, such as those used in web communications.
How do server based IDS work?
Subnets or server groups share an IDS
What are the methods of IDS?
Misuse based
Anomaly based
Classification Based
Combination Based
How do Misuse based IDS work?
Look at patterns, signature definitions etc.
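A minimal sketch of signature matching, assuming payloads are available as raw bytes. The signature names and byte patterns below are hypothetical examples, not taken from any real rule set.

```python
# Hypothetical signatures: name -> byte pattern to look for in a payload.
SIGNATURES = {
    "sql_injection": b"' OR 1=1",
    "path_traversal": b"../../",
}

def match_signatures(payload):
    """Return the names of all signatures found in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]
```

An empty result means no known signature matched, which is why misuse based detection cannot flag attacks it has no definition for.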
How do Anomaly based IDS work?
Statistical, machine learning or data mining approaches.
Use baselines and check for deviations.
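A minimal statistical sketch of the baseline idea, assuming a single numeric measurement (for example requests per minute). The function names and the z-score cutoff of 3.0 are illustrative assumptions.

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Summarise normal behaviour as (mean, standard deviation)."""
    return mean(samples), stdev(samples)

def is_anomalous(value, baseline, max_z=3.0):
    """Flag a value whose z-score against the baseline exceeds max_z."""
    mu, sigma = baseline
    if sigma == 0:
        # A constant baseline: anything different counts as a deviation.
        return value != mu
    return abs(value - mu) / sigma > max_z
```

Usage: build the baseline from traffic known to be normal, then test each new observation against it.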
How do classification based IDS work?
Binary or multi-classification: Decision trees, Bayes, K nearest Neighbour.
Limited in that they require a prior labelling phase.
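A minimal K nearest neighbour sketch in plain Python, assuming numeric feature tuples and labelled training data. The function name, the Euclidean distance and k=3 are illustrative choices.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k closest training points.

    train is a list of (features, label) pairs; features are numeric tuples.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

The labelled `train` list is exactly the pre-labelling requirement noted above: without it there is nothing to vote with.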
How do combination based IDS work?
Utilise the best of each technique but suffer from high computational costs.
What are data stream methods and what are their benefits?
Used to build models from big datasets
Don’t suffer from the concept or feature drift that affects batch methods.
What is concept drift?
When the data distribution changes over time, which is in the nature of network traffic.
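One simple way to picture concept drift is to compare a recent window of a measurement with an earlier reference window. This is only a sketch under assumptions: the class name, window size and relative tolerance are hypothetical, and real stream learners use more robust tests.

```python
from collections import deque
from statistics import mean

class WindowDriftDetector:
    """Suspect drift when the recent mean moves away from the reference mean."""

    def __init__(self, window=50, tolerance=0.2):
        self.reference = deque(maxlen=window)  # first `window` observations
        self.recent = deque(maxlen=window)     # sliding window of the latest ones
        self.tolerance = tolerance             # allowed relative shift

    def update(self, value):
        """Add one observation; return True if drift is suspected."""
        if len(self.reference) < self.reference.maxlen:
            self.reference.append(value)
            return False
        self.recent.append(value)
        if len(self.recent) < self.recent.maxlen:
            return False
        ref_mean = mean(self.reference)
        shift = abs(mean(self.recent) - ref_mean)
        return shift > self.tolerance * max(abs(ref_mean), 1e-9)
```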
What is feature drift?
When the relevant features change over time, as changing data patterns make different features important.
What are the issues with current IDS datasets?
They are outdated by around 20 years as much of the research is still being done against DARPA and KDD datasets.
What are the common issues of datasets?
Some pre-processing is necessary to clean the data and transform it for training.
What are the issues of class balancing?
Labels are applied per packet, and in some datasets the attack labels make up only a small portion of the dataset.
This leads to some models favouring normal traffic.
What are some of the ways to resolve the issues of class balancing?
Cost function based approaches to assign costs to minority instances
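A minimal sketch of a cost based approach, assuming the cost is a per-class weight inversely proportional to class frequency. This inverse-frequency heuristic is one common choice, not the only one, and the function name is illustrative.

```python
from collections import Counter

def class_weights(labels):
    """Give rarer classes a larger weight: n / (number_of_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}
```

With 90 normal and 10 attack labels, the attack class receives a weight of 5.0 and the normal class roughly 0.56, so misclassifying an attack costs the model more during training.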
How are predictive models evaluated?
Leave one out
Hold out
Prospective sampling
Randomization
How does Leave One Out work?
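Uses K-fold partitioning, a technique that builds a model on K-1 folds of the data and evaluates it against the remaining fold. Leave one out is the case where K equals the number of instances, so each fold holds a single instance.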
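A minimal sketch of K-fold index partitioning in plain Python. The function name is illustrative, and the folds are taken in order without shuffling to keep the example short.

```python
def k_fold_splits(n_instances, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_instances))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_instances // k + (1 if i < n_instances % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```

Calling `k_fold_splits(n, n)` gives leave one out: every test fold contains exactly one instance.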
How does Hold out evaluation work?
Divide the data into two, use one section for training and one for testing.
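A minimal hold out sketch, assuming a random split is acceptable. The function name, the 30% test fraction and the fixed seed are illustrative assumptions.

```python
import random

def hold_out_split(data, test_fraction=0.3, seed=0):
    """Shuffle a copy of the data and split it into (train, test)."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```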
How does prospective sampling work?
Uses a newly sampled dataset separate from the original dataset.
How do Randomisation methods work?
Use instances sampled without replacement.
What metrics are used to assess classification models?
True Positive (TP), True Negative (TN), False Positive (FP), False Negative (FN)
What is sensitivity?
The True Positive Rate.
TP/(TP+FN)
What is specificity?
The True Negative Rate.
TN/(TN+FP)
What is precision?
TP/(TP+FP)
How many of the positive predictions were correct.
What is the F-Measure?
Harmonic mean of precision and sensitivity.
2TP/(2TP + FP + FN)
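The metrics on the preceding cards can be computed together from the four confusion counts. A minimal sketch, assuming every denominator is non-zero; the function name and dictionary keys are illustrative.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute sensitivity, specificity, precision and F-measure.

    Assumes tp + fn, tn + fp and tp + fp are all greater than zero.
    """
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # share of positive predictions that were correct
    # Harmonic mean of precision and sensitivity.
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
    }
```

For TP=40, TN=45, FP=5, FN=10 this gives a sensitivity of 0.8, a specificity of 0.9, a precision of about 0.889 and an F-measure of about 0.842.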
How are thresholds used in ML algorithms?
Models can return predictions as labels or as probability scores.
The discriminating threshold is used to label data based on which side of the cutoff (line) its score lies.
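A minimal sketch of turning probability scores into labels. The label names, the function name and the 0.5 default cutoff are illustrative assumptions.

```python
def label_by_threshold(scores, threshold=0.5):
    """Label each score 'attack' if it is at or above the cutoff, else 'normal'."""
    return ["attack" if score >= threshold else "normal" for score in scores]
```

Raising the threshold produces fewer attack labels (fewer false positives, more false negatives); lowering it does the opposite.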
What is the ROC?
Receiver Operating Characteristic (ROC).
The curve plots the TPR against the FPR for every threshold value used to assign instances to their class.
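A minimal sketch of how the ROC points can be generated, assuming labels are 1 for attack and 0 for normal and that both classes are present. The function name is illustrative; each distinct score is used as a threshold in turn.

```python
def roc_points(scores, labels):
    """Return a list of (FPR, TPR) points, one per distinct threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points
```

Plotting these points with FPR on the x-axis and TPR on the y-axis gives the curve; a model that ranks every attack above every normal instance passes through (0, 1).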
What are the research issues of IDS using ML?
Using fixed thresholds to flag anomalies may not be as accurate as adaptable thresholds.
Scalability issues from data mining: the higher the volume of data, the more difficult it is to process in real time.