MLSEC 10 Flashcards
ML Pipeline
Data Collection and Labeling
System Design and Learning
Performance Evaluation
Deployment and Operation
Pitfalls in Data Collection and Labeling
Sampling Bias
Label Inaccuracy
Pitfalls in System Design and Learning
Biased parameters
Spurious correlations
Data snooping
Pitfalls in Performance Evaluation
Inappropriate baselines
Inappropriate measures
Base-rate fallacy
Pitfalls in Deployment and Operation
Lab-only evaluation
Inappropriate threat model
Sampling Bias
The collected data does not sufficiently represent the true data distribution of the underlying security problem.
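A minimal sketch of this pitfall, using synthetic NumPy data and scikit-learn (all sources, locations, and numbers are invented): malware is collected from only one source, so the measured performance does not carry over to the true distribution.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
benign    = rng.normal(0.0, 1.0, size=(1000, 5))
malware_a = rng.normal(3.0, 1.0, size=(500, 5))   # the only malware source that was collected
malware_b = rng.normal(0.5, 1.0, size=(500, 5))   # present in the wild, missing from the data set

# Biased sample: benign data plus malware from source A only
# (evaluated in-sample just to keep the sketch short)
X_biased = np.vstack([benign, malware_a])
y_biased = np.array([0] * 1000 + [1] * 500)
clf = LogisticRegression(max_iter=1000).fit(X_biased, y_biased)
print("biased sample:    ", accuracy_score(y_biased, clf.predict(X_biased)))

# Data reflecting the true distribution also contains source B
X_true = np.vstack([benign, malware_a, malware_b])
y_true = np.array([0] * 1000 + [1] * 1000)
print("true distribution:", accuracy_score(y_true, clf.predict(X_true)))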
Label Inaccuracy
The ground-truth labels are inaccurate, unstable, or erroneous, affecting the estimated performance.
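A minimal sketch, assuming synthetic data and a hypothetical noisy labeling source (e.g., a weak AV oracle): flipping a fraction of the ground-truth labels makes the estimated performance deviate from the actual one.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y_true = make_classification(n_samples=2000, random_state=0)
rng = np.random.default_rng(0)
flip = rng.random(len(y_true)) < 0.15            # 15% of labels are wrong (invented rate)
y_noisy = np.where(flip, 1 - y_true, y_true)

X_tr, X_te, yn_tr, yn_te, yt_tr, yt_te = train_test_split(X, y_noisy, y_true, random_state=0)
pred = LogisticRegression(max_iter=1000).fit(X_tr, yn_tr).predict(X_te)
print("estimated on noisy labels:", accuracy_score(yn_te, pred))
print("actual on true labels:    ", accuracy_score(yt_te, pred))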
Data Snooping
The learning-based system is trained with data or knowledge typically not available in practice.
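A minimal sketch of temporal data snooping, assuming each sample carries a made-up timestamp: a random split mixes future samples into the training data, while a chronological split does not.

import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
timestamps = np.arange(1000)                     # pretend each sample has a timestamp

# Snooping: a random split ignores time, so the model trains on future samples
X_tr, X_te, t_tr, t_te = train_test_split(X, timestamps, test_size=0.25, random_state=0)
print("training data contains future samples:", t_tr.max() > t_te.min())   # True

# No snooping: split chronologically so training data strictly precedes test data
cut = int(0.75 * len(timestamps))
X_tr, X_te, t_tr, t_te = X[:cut], X[cut:], timestamps[:cut], timestamps[cut:]
print("training data contains future samples:", t_tr.max() > t_te.min())   # False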
Spurious Correlations
Artefacts unrelated to the security problem create shortcut patterns for separating the classes.
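A minimal sketch with synthetic data: a hypothetical artefact feature (e.g., an identifier of the collection sandbox) happens to align with the label and becomes the shortcut the classifier learns instead of actual behaviour.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)
behaviour = rng.normal(size=(1000, 5))                              # features unrelated to the label
artefact = y.reshape(-1, 1) + 0.01 * rng.normal(size=(1000, 1))     # collection artefact aligned with the label

X = np.hstack([behaviour, artefact])
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(clf.feature_importances_)   # nearly all importance falls on the artefact column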
Biased Parameter Selection
Parameters of the learning-based system are not entirely fixed at training time and indirectly depend on the test data.
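A minimal sketch with synthetic data and scikit-learn: calibrating the detection threshold on the test set itself yields an optimistic score, whereas fixing it on a separate validation split does not.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

X, y = make_classification(n_samples=3000, weights=[0.9], random_state=0)
X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores_val = clf.predict_proba(X_val)[:, 1]
scores_te = clf.predict_proba(X_te)[:, 1]
thresholds = np.linspace(0.05, 0.95, 19)

# Don't: pick the threshold that maximises the score on the test set itself
t_biased = max(thresholds, key=lambda t: f1_score(y_te, (scores_te >= t).astype(int)))
# Do: fix the threshold on a validation split, then evaluate once on the test set
t_fixed = max(thresholds, key=lambda t: f1_score(y_val, (scores_val >= t).astype(int)))

print("threshold tuned on test data:", f1_score(y_te, (scores_te >= t_biased).astype(int)))
print("threshold fixed beforehand:  ", f1_score(y_te, (scores_te >= t_fixed).astype(int)))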
Inappropriate Baseline
The evaluation is conducted with only limited baseline methods.
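A minimal sketch with synthetic data: comparing a complex model against a majority-class dummy and a simple linear baseline shows how much of the performance is actually attributable to the complex system.

from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, clf in [("majority-class dummy", DummyClassifier(strategy="most_frequent")),
                  ("linear baseline     ", LogisticRegression(max_iter=1000)),
                  ("complex model       ", RandomForestClassifier(random_state=0))]:
    print(name, clf.fit(X_tr, y_tr).score(X_te, y_te))   # accuracy on the test split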
Inappropriate Performance Measures
The performance measures do not account for the constraints of the security problem, such as class imbalance.
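A minimal sketch with invented numbers: under a 1% malware ratio, accuracy looks excellent for a detector that never raises an alarm, while precision and recall expose the failure.

import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = np.array([1] * 10 + [0] * 990)    # 1% malware (invented ratio)
y_pred = np.zeros_like(y_true)             # a "detector" that never raises an alarm

print("accuracy: ", accuracy_score(y_true, y_pred))                    # 0.99, looks great
print("precision:", precision_score(y_true, y_pred, zero_division=0))  # 0.0
print("recall:   ", recall_score(y_true, y_pred))                      # 0.0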
Base Rate Fallacy
Class imbalance is ignored when interpreting the performance measures, leading to an overestimation of performance.
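A worked example with invented rates: even a detector with a 99% true-positive rate and a 1% false-positive rate produces mostly false alarms when the base rate of malicious events is only 0.1%.

# Hypothetical detector: 99% true-positive rate, 1% false-positive rate
tpr, fpr, base_rate = 0.99, 0.01, 0.001   # only 0.1% of events are malicious

# P(malicious | alarm) via Bayes' theorem
precision = (tpr * base_rate) / (tpr * base_rate + fpr * (1 - base_rate))
print(round(precision, 3))   # ~0.09: roughly 9 out of 10 alarms are false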
Lab-Only Evaluation
The learning-based system is evaluated solely in a laboratory setting; practical constraints are not considered.
Inappropriate Threat Model
The security of machine learning itself is not considered, exposing the learning-based system to attacks.