Data Science MODULE 5 Flashcards
In machine learning we describe the learning of the target function from training data as
Inductive learning
What is feature selection?
Methods used to reduce the number of input variables to those believed to be most useful to a model
Unsupervised feature selection?
Ignores the target variable
Supervised feature selection
Uses the target variable in the selection process
Wrapper feature selection
Basically, fit the model on different subsets of the inputs from your data and keep the subset that gives the best fit
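A minimal sketch of the wrapper idea, assuming an exhaustive search over feature subsets and a 1-nearest-neighbour classifier scored by leave-one-out accuracy. The model, the scoring rule, the toy data and the names `loo_accuracy` and `wrapper_select` are all illustrative assumptions, not part of the module notes. Exhaustive search is only practical for a handful of features.

```python
from itertools import combinations


def loo_accuracy(rows, labels, cols):
    """Leave-one-out accuracy of a 1-nearest-neighbour model using only `cols`."""
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue  # never compare a row with itself
            dist = sum((row[c] - other[c]) ** 2 for c in cols)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(rows)


def wrapper_select(rows, labels):
    """Try every non-empty feature subset and keep the best-scoring one."""
    n_features = len(rows[0])
    best_cols, best_score = None, -1.0
    for k in range(1, n_features + 1):
        for cols in combinations(range(n_features), k):
            score = loo_accuracy(rows, labels, cols)
            if score > best_score:  # ties keep the smaller, earlier subset
                best_cols, best_score = cols, score
    return best_cols, best_score


# Toy data (assumed): column 0 separates the classes, columns 1 and 2 do not.
rows = [
    [1.0, 50.0, 3.0],
    [1.2, 10.0, 3.1],
    [0.9, 90.0, 2.9],
    [3.0, 12.0, 3.0],
    [3.1, 88.0, 3.1],
    [2.9, 52.0, 2.9],
]
labels = [0, 0, 0, 1, 1, 1]

best_cols, best_score = wrapper_select(rows, labels)
print(best_cols, best_score)  # (0,) 1.0
```

The point to remember: the model itself is re-fitted and re-scored for every candidate subset, which is what makes wrapper methods expensive.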
Filter feature selection
As I understand it: score each individual feature against the response (for example with a correlation measure), then train the model on the best-scoring features
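A minimal sketch of filter selection, assuming numerical features, a numerical response and the absolute Pearson correlation as the score. The feature names, the data and the helper `filter_select` are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def filter_select(features, target, k):
    """Score every feature on its own and return the names of the k best."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy data (assumed): "size" tracks the target, "age" runs against it,
# "noise" is almost unrelated.
features = {
    "size": [1.0, 2.0, 3.0, 4.0, 5.0],
    "age": [5.0, 3.0, 4.0, 1.0, 2.0],
    "noise": [2.0, 1.0, 2.0, 1.0, 2.0],
}
target = [1.1, 2.0, 2.9, 4.2, 5.0]

selected, scores = filter_select(features, target, k=2)
print(selected)  # ['size', 'age']
```

Unlike a wrapper method, no model is fitted during selection: each feature is judged on its own statistic.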
Third type of feature selection?
Intrinsic (also called embedded): the selection happens as part of model training. A tree-based model is a very good example
Is dimensionality reduction a feature selection method?
Not really, because it creates new features from the original inputs instead of choosing a subset of them
Three types of categorical data
Nominal (r,g,b)
Ordinal (1st, 2nd, 3rd)
Boolean (true and false)
Filter feature selection
Numerical - numerical
Pearson
Spearman
Filter feature selection
Numerical - categorical
ANOVA
Kendall (assumes the categorical variable is ordinal)
Filter feature selection
Categorical - categorical
Chi-squared
Mutual information
Difference between Pearson and Spearman?
Pearson for linear relationships
Spearman for non-linear (monotonic, rank-based) relationships
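A small sketch of that difference, assuming Spearman is computed as the Pearson correlation of the ranks and that the data contain no ties (the simple `ranks` helper below does not average tied ranks). The cubic data are an assumed example of a monotonic but non-linear relationship.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))


xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]  # monotonic, but not a straight line

pearson_r = pearson(xs, ys)     # below 1: the relationship is not linear
spearman_rho = spearman(xs, ys)  # 1: the ordering is perfectly preserved
print(pearson_r, spearman_rho)
```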
Difference between ANOVA and Kendall?
ANOVA: linear
Kendall: non-linear (rank-based)
scikit-learn functions (in sklearn.feature_selection) for:
Pearson
ANOVA
Chi squared
Mutual info
f_regression()
f_classif()
chi2()
mutual_info_classif() and mutual_info_regression()
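A short sketch of how these scoring functions are typically plugged into `SelectKBest`, assuming scikit-learn is installed. The iris dataset and `k=2` are arbitrary choices for illustration, not something the module prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import (
    SelectKBest,
    chi2,
    f_classif,
    mutual_info_classif,
)

# Numerical inputs, categorical target: a classification problem.
X, y = load_iris(return_X_y=True)

# ANOVA F-test as the filter score; keep the 2 best-scoring features.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)
kept = selector.get_support(indices=True)  # column indices that were kept
print(X.shape, "->", X_selected.shape, "kept columns:", kept)

# The other score functions can be called directly or swapped in as score_func.
# chi2 requires non-negative feature values, which holds for iris.
chi2_scores, chi2_pvalues = chi2(X, y)
mi_scores = mutual_info_classif(X, y, random_state=0)
```

For a numerical target, `f_regression` and `mutual_info_regression` would take the place of `f_classif` and `mutual_info_classif`.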