preprocessing Flashcards
What is the main goal of the Savitzky-Golay filter?
To smooth data while preserving features like peaks and edges better than simple moving averages.
How does the Savitzky-Golay filter compute smoothed values?
By fitting a low-degree polynomial to a subset of the data points (window) using least squares and evaluating the polynomial at the central point.
Do the Savitzky-Golay filter coefficients depend on the data?
No, the coefficients depend only on the window size and the degree of the polynomial.
What does mutual information measure?
The amount of information shared between two variables.
How is mutual information related to entropy?
I(X;Y)=H(X)+H(Y)βH(X,Y), where H represents entropy.
What is the range of mutual information values?
Non-negative, I(X;Y)β₯0, and it is
0 if X and Y are independent.
How is mutual information used in feature selection?
By ranking features based on their MI with the target variable to select the most informative ones.
What does the Chi-Square test measure?
The difference between observed and expected frequencies in categorical data.
What is the Chi-Square statistic formula?
π^2= β(ππβπΈπ)^2/πΈπ
where ππ is the observed frequency and πΈπ is the expected frequency.
What is the purpose of the Chi-Square test in feature selection?
To assess the independence of a feature and the target variable.
What are the assumptions of the Chi-Square test?
Data must be categorical, and expected frequencies should generally be greater than 5.
What does the correlation coefficient measure?
The strength and direction of the linear relationship between two variables.
What is the range of the correlation coefficient?
[β1,1], whereβ1 indicates a perfect negative correlation,
1 indicates a perfect positive correlation, and 0 indicates no linear relationship.
What is the formula for the Pearson correlation coefficient?
r = (SUM(xi - x)(yi - y))/(sqrt(SUM(xi - x)^2 * SUM(yi - y)^2)
What is the goal of PCA?
To reduce the dimensionality of data while retaining as much variance as possible.