Stats Flashcards
What are the 2 names of the Tukey test for comparing group means
Tukey’s Range Test
Tukey’s Honestly Significant Difference (HSD)
What are the 5 steps in the Tukey test
1 - Conduct ANOVA
2 - Calculate the Critical Value
3 - Compute the Honestly Significant Difference (HSD)
4 - Compare Mean Differences
5 - Interpretation
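The five steps above can be sketched by hand. This is a minimal illustration, assuming equal group sizes and SciPy 1.7 or later (for `studentized_range`); the group data and variable names are made up for the example.

```python
from itertools import combinations
from statistics import mean

from scipy.stats import f_oneway, studentized_range

# Hypothetical measurements for three groups of equal size.
groups = {
    "A": [10.1, 9.8, 10.3, 10.0, 9.9],
    "B": [10.2, 10.4, 9.9, 10.1, 10.3],
    "C": [15.2, 14.8, 15.1, 15.3, 14.9],
}
alpha = 0.05
k = len(groups)                      # number of groups
n = 5                                # observations per group (equal sizes assumed)
df_within = k * n - k                # within-group degrees of freedom

# Step 1: conduct the one-way ANOVA.
f_stat, p_value = f_oneway(*groups.values())

# Step 2: critical value from the studentized range distribution.
q_crit = studentized_range.ppf(1 - alpha, k, df_within)

# Step 3: HSD = critical value * standard error of a group mean.
ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
ms_within = ss_within / df_within
hsd = q_crit * (ms_within / n) ** 0.5

# Steps 4 and 5: compare each pairwise mean difference with the HSD.
significant = {}
for (a, ga), (b, gb) in combinations(groups.items(), 2):
    diff = abs(mean(ga) - mean(gb))
    significant[(a, b)] = bool(diff > hsd)
```

A pair is flagged as significant when its absolute mean difference exceeds the HSD. With unequal group sizes the standard error differs per pair (the Tukey-Kramer adjustment), which this sketch does not cover.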
What does ANOVA stand for
Analysis of Variance - a statistical method used to compare means across multiple groups or treatments.
What is an F-statistic
A ratio of two variances: the variance between group means and the variance within groups. It quantifies the extent to which the variation among group means is greater than the variation within individual groups.
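The ratio described above can be computed directly. A minimal sketch for a one-way ANOVA, using only the standard library; the function name `f_statistic` and the sample data are illustrative assumptions.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F-statistic for a list of groups (each a list of numbers)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)        # between-group variance
    ms_within = ss_within / (n_total - k)    # within-group variance
    return ms_between / ms_within


groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = f_statistic(groups)
print(f_value)  # 13.0
```

Here the between-group mean square is 13 and the within-group mean square is 1, so the group means vary 13 times more than the data inside each group.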
What is a p-value
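The probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. A small p-value (commonly below 0.05) indicates that the observed differences are statistically significant.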
What are the 7 outputs of the Tukey test
1 - Comparison Matrix
2 - Mean Differences
3 - Tukey’s HSD
4 - Confidence Intervals
5 - Adjusted p-values
6 - Significance Indicators
7 - Summary Statistics
Define Tukey’s HSD
The minimum difference between two group means required for statistical significance, calculated from the critical value and the standard error of the means. Observed differences between group means that exceed this value are statistically significant.
What Python library supports Tukey
statsmodels, through the pairwise_tukeyhsd function in the statsmodels.stats.multicomp module
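A short usage sketch, assuming statsmodels and NumPy are installed. The data and group labels are invented for illustration; `pairwise_tukeyhsd` takes the observations, their group labels, and a significance level.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical observations with a matching group label for each one.
values = np.array([
    10.1, 9.8, 10.3, 10.0, 9.9,    # group A
    10.2, 10.4, 9.9, 10.1, 10.3,   # group B
    15.2, 14.8, 15.1, 15.3, 14.9,  # group C
])
labels = np.array(["A"] * 5 + ["B"] * 5 + ["C"] * 5)

result = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)

# The summary table lists each pair with its mean difference, adjusted
# p-value, confidence interval, and a reject (significant) flag.
print(result.summary())

# The same outputs are available as arrays, one entry per pair.
mean_diffs = result.meandiffs
reject_flags = result.reject
```

The pairs are reported in sorted label order (A-B, A-C, B-C), so the arrays can be read against the summary table row by row.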
Explain Z-scores
A Z-score indicates how many standard deviations a data point is from the mean of the dataset.
What are the 4 steps to calculating a Z-Score
1 - Calculate the Mean (μ): Find the average of the dataset.
2 - Calculate the Standard Deviation (σ)
3 - Subtract the Mean from the Data Point
4 - Divide by the Standard Deviation.
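The four steps above map onto a few lines of code. A minimal sketch using the standard library; it assumes the population standard deviation (`pstdev`), and the function name and data are illustrative.

```python
from statistics import mean, pstdev


def z_scores(data):
    """Return the Z-score of every point in data."""
    mu = mean(data)          # Step 1: mean
    sigma = pstdev(data)     # Step 2: population standard deviation
    # Steps 3 and 4: subtract the mean, then divide by the standard deviation.
    return [(x - mu) / sigma for x in data]


data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, standard deviation 2
scores = z_scores(data)
print(scores)
```

For this dataset the value 9 gets a Z-score of 2 and the value 5 gets 0. Swap `pstdev` for `stdev` when the data is a sample rather than the whole population.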
What does a Z-score of 0 indicate
The data point is exactly at the mean
What does a Z-score of +1 indicate
A Z-score of +1 (or -1) indicates that the data point is 1 standard deviation above (or below) the mean
What is R-squared
R-squared represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s) in a regression model. In simpler terms, it indicates how well the independent variable(s) explain the variability of the dependent variable.
What is the difference between R-squared and correlation
Correlation measures the strength and direction of the linear relationship between two continuous variables.
R-squared represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s).
In summary, correlation tells you how closely related two variables are in a linear sense, while R-squared tells you how much of the variability in the dependent variable is explained by the independent variable(s).
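The two quantities can be compared on a small example. In a simple linear regression with one independent variable, R-squared equals the square of the Pearson correlation, so the sign (direction) is lost. A minimal sketch with invented data and helper names:

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation: strength and direction of the linear relationship."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def r_squared(x, y):
    """R-squared of a least-squares line fitted to y as a function of x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))  # unexplained
    ss_tot = sum((b - my) ** 2 for b in y)                                   # total
    return 1 - ss_res / ss_tot


x = [1, 2, 3, 4, 5]
y = [10, 8, 7, 4, 1]   # y falls as x rises

r = pearson_r(x, y)
r2 = r_squared(x, y)
print(r, r2)
```

Here the correlation is negative (about -0.98) because y decreases as x increases, while R-squared is about 0.97 and only says how much of the variance in y the line explains.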
Name 8 methods that a data analyst may use to test equity signals
1 - Correlation coefficient
2 - Scatter plots
3 - Covariance
4 - Cross-correlation analysis
5 - Granger causality test
6 - Regression analysis
7 - Correlation heatmaps
8 - Principal Component Analysis (PCA)