Analysis Flashcards
In ROC (Receiver Operating Characteristic) analysis, what 2 measurements do we take for each threshold level?
- Specificity and Sensitivity
OR
- False Positive Rate and True Positive Rate
What does the ROC curve plot?
- (1 - Specificity) on the x-axis and Sensitivity on the y-axis
OR
- False Positive Rate (x-axis) and True Positive Rate (y-axis)
What is Sensitivity?
TP / (number of real positives), i.e. TP / (TP + FN)
Measures how good the model is at picking out positive values
What is Specificity?
TN / (number of real negatives), i.e. TN / (TN + FP)
Measures how good the model is at picking out negative values
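As a minimal sketch of the two definitions above, the snippet below counts TP, FN, TN and FP from a list of true labels and a list of predicted labels. The function name `sensitivity_specificity` and the toy labels are assumptions made for illustration only (1 = positive, 0 = negative).

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # real positives found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # real positives missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # real negatives found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # real negatives missed
    sensitivity = tp / (tp + fn)  # TP / (number of real positives)
    specificity = tn / (tn + fp)  # TN / (number of real negatives)
    return sensitivity, specificity


# Hypothetical toy data: 4 real positives, 6 real negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)
```

Here 3 of the 4 real positives and 4 of the 6 real negatives are classified correctly, so sensitivity is 3/4 and specificity is 4/6. The False Positive Rate is 1 - specificity.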
What does a good ROC curve look like?
It hugs the top-left corner of the plot
It should show a sharp rise in the True Positive Rate, without much increase in the False Positive Rate
This means it can classify a lot of positive samples correctly, without misclassifying many negative samples
What metric do we use to show how good an ROC curve is?
The area under the ROC curve (AUC)
The ideal case is an area of 1; an area of 0.5 corresponds to random guessing
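A rough sketch of how the ROC points and the area under them could be computed by hand: each distinct score is used as a threshold, one (False Positive Rate, True Positive Rate) point is recorded per threshold, and the area is summed with the trapezoidal rule. The names `roc_points` and `auc` and the toy scores are assumptions for illustration, not part of any particular library.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per threshold, from strictest to loosest."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        # Predict positive whenever the score reaches the threshold
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical toy data: higher score means "more likely positive"
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
area = auc(points)
print(points)
print(area)
```

For this toy data one negative sample outranks one positive sample, so the area comes out slightly below 1 (8/9).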
Are KNNs good with large-scale data?
No
Neighbour search and distance calculation are computationally expensive when there are many training samples or many dimensions
Why do KNNs have a high memory cost?
They need to store all the training data.
Are KNNs good at dealing with imbalanced data?
No
Are KNNs sensitive to outliers?
Yes
What is the no-free-lunch theorem?
This is more of a philosophy, which states that:
Given no prior information about the learning task or data distribution,
we can never say that any particular algorithm has a guaranteed advantage over any other.
What do we need to decide when using KNN?
The number of neighbours K
The distance measure
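To make the two choices concrete, here is a minimal brute-force KNN classifier sketch in which both K and the distance measure are parameters. The function names (`knn_predict`, `euclidean`, `manhattan`) and the toy points are assumptions for illustration. Note that the classifier keeps the whole training set and computes a distance to every stored point for each query, which is where the memory and computational costs come from.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Majority vote among the k training points closest to the query."""
    # Brute force: one distance computation per stored training point
    scored = [(distance(x, query), label) for x, label in zip(train_X, train_y)]
    scored.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in scored[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, (1.1, 1.0), k=3))
print(knn_predict(train_X, train_y, (4.9, 5.0), k=3, distance=manhattan))
```

Changing `k` or swapping the `distance` function changes which neighbours get a vote, so both are usually tuned on held-out data.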
Can a KNN handle both linear and non-linear data patterns?
Yes
Can we use Regularised Linear Least Squares with a small dataset?
Yes, good results can still be achieved
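As a small illustration, the sketch below fits regularised least squares in the simplest possible setting: one feature, no intercept, and an L2 (ridge) penalty, which is an assumption about the kind of regularisation meant here. In that case the weight minimising sum((y - w*x)^2) + lam * w^2 has the closed form w = sum(x*y) / (sum(x*x) + lam). The name `ridge_1d` and the three data points are hypothetical.

```python
def ridge_1d(xs, ys, lam):
    """Closed-form weight w minimising sum((y - w*x)^2) + lam * w^2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Hypothetical tiny dataset lying exactly on y = 2x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_plain = ridge_1d(xs, ys, lam=0.0)  # no penalty: ordinary least squares
w_ridge = ridge_1d(xs, ys, lam=1.0)  # penalty shrinks the weight towards zero
print(w_plain, w_ridge)
```

The penalty term keeps the denominator away from zero and pulls the weight towards zero, which is what stabilises the fit when only a few samples are available.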
Does Linear Regression have a low computational cost?
Yes