Stats Flashcards

1
Q

What are the 2 names of the Tukey test for comparing group means?

A

Tukey’s Range Test
Tukey’s Honestly Significant Difference (HSD)

2
Q

What are the 5 steps in the Tukey test?

A

1 - Conduct ANOVA
2 - Calculate the Critical Value
3 - Compute the Honestly Significant Difference (HSD)
4 - Compare Mean Differences
5 - Interpretation

3
Q

What does ANOVA stand for?

A

Analysis of variance: a statistical method used to compare means across multiple groups or treatments.

4
Q

What is an F-statistic?

A

The F-statistic is a ratio of two variances: the variance between group means divided by the variance within groups. It quantifies the extent to which the variation among group means is greater than the variation within individual groups.
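
As a rough illustration, the ratio can be computed by hand for a small made-up dataset. The function name `f_statistic` and the three sample groups are assumptions for this sketch, not part of the card.

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F-statistic for a list of groups (each a list of numbers)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)            # number of groups
    n_total = len(all_values)  # total number of observations

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Made-up sample data: three groups with clearly different means
groups = [
    [10, 11, 9, 10, 10],
    [15, 16, 14, 15, 15],
    [20, 21, 19, 20, 20],
]
f_value = f_statistic(groups)
print(f_value)  # 250.0
```

A large value like this one means the group means differ far more than the scatter inside each group would explain.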

5
Q

What is a p-value?

A

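The probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. A small p-value (commonly below 0.05) indicates that the observed differences are statistically significant.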

6
Q

What are the 7 outputs of the Tukey test?

A

1 - Comparison Matrix
2 - Mean Differences
3 - Tukey’s HSD
4 - Confidence Intervals
5 - Adjusted p-values
6 - Significance Indicators
7 - Summary Statistics

7
Q

Define Tukey’s HSD

A

The Honestly Significant Difference (HSD) is a threshold value calculated from the critical value and the standard error of the means. It is used to determine whether the observed differences between group means are statistically significant: a pair of means differs significantly when the absolute difference exceeds the HSD.
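
A minimal sketch of that calculation, assuming equal group sizes, where HSD = q * sqrt(MS_within / n). The helper name `tukey_hsd_threshold`, the group means, and the input numbers are illustrative assumptions; the critical value q is taken from a studentized range table rather than computed.

```python
import math
from itertools import combinations

def tukey_hsd_threshold(q_critical, ms_within, n_per_group):
    """HSD for equal group sizes: critical value times the standard error of a mean."""
    return q_critical * math.sqrt(ms_within / n_per_group)

# Illustrative inputs (assumptions, not from the card)
q_critical = 3.77   # studentized range table value for alpha = 0.05, 3 groups, 12 df
ms_within = 0.5     # within-group mean square from the ANOVA
n_per_group = 5

hsd = tukey_hsd_threshold(q_critical, ms_within, n_per_group)

# Compare every pair of group means against the threshold
group_means = {"A": 10.0, "B": 15.0, "C": 20.0}
significant = {
    (a, b): abs(group_means[a] - group_means[b]) > hsd
    for a, b in combinations(group_means, 2)
}
print(round(hsd, 3), significant)
```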

8
Q

What Python library supports the Tukey test?

A

statsmodels, in the statsmodels.stats.multicomp module (function pairwise_tukeyhsd)
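
A short usage sketch, assuming `statsmodels` and `numpy` are installed. The sample values and the group labels A, B, C are made up for illustration.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Made-up observations: five per group, stored in one flat array
values = np.array(
    [10, 11, 9, 10, 10, 15, 16, 14, 15, 15, 20, 21, 19, 20, 20],
    dtype=float,
)
# Group label for each observation, in the same order as `values`
labels = np.array(["A"] * 5 + ["B"] * 5 + ["C"] * 5)

result = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)

# The summary table lists each pair with its mean difference,
# adjusted p-value, confidence interval, and a reject flag
print(result.summary())
```

The `reject` attribute of the result holds one boolean per pair, which is convenient when the comparison needs to be used in code rather than read from the table.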

9
Q

Explain Z-scores

A

A Z-score indicates how many standard deviations a data point is from the mean of the dataset.

10
Q

What are the 4 steps to calculating a Z-score?

A

1 - Calculate the Mean (μ): Find the average of the dataset.
2 - Calculate the Standard Deviation (σ)
3 - Subtract the Mean from the Data Point
4 - Divide by the Standard Deviation.
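
The four steps above can be sketched with the standard library. The function name `z_score` and the sample data are assumptions for illustration; the sketch uses the population standard deviation (`pstdev`), and `stdev` would be the choice for a sample.

```python
from statistics import mean, pstdev

def z_score(x, data):
    """Number of standard deviations that x lies from the mean of data."""
    mu = mean(data)        # step 1: mean
    sigma = pstdev(data)   # step 2: population standard deviation
    return (x - mu) / sigma  # steps 3 and 4: subtract the mean, divide by sigma

# Made-up dataset with mean 5 and population standard deviation 2
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(z_score(9, data))  # 2.0, two standard deviations above the mean
print(z_score(5, data))  # 0.0, exactly at the mean
```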

11
Q

What does a Z-score of 0 indicate?

A

The data point is exactly at the mean.

12
Q

What does a Z-score of +1 indicate?

A

A Z-score of +1 (or -1) indicates that the data point is 1 standard deviation above (or below) the mean.

13
Q

What is R-squared?

A

R-squared represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s) in a regression model. In simpler terms, it indicates how well the independent variable(s) explain the variability of the dependent variable.

14
Q

What is the difference between R-squared and correlation?

A

Correlation measures the strength and direction of the linear relationship between two continuous variables.

R-squared represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s).

In summary, correlation tells you how closely related two variables are in a linear sense, while R-squared tells you how much of the variability in the dependent variable is explained by the independent variable(s).
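
A small sketch of the distinction, using made-up data. For a simple linear regression with one independent variable, R-squared equals the square of the Pearson correlation, so the sign (direction) is lost while the explained proportion stays the same. The function names `pearson_r` and `r_squared` are assumptions for this example.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation: strength and direction of the linear relationship."""
    mx, my = mean(x), mean(y)
    co = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return co / (var_x * var_y) ** 0.5

def r_squared(x, y):
    """R-squared of the least-squares line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]    # tends to rise with x
y_down = [5, 4, 5, 4, 2]  # tends to fall with x

for y in (y_up, y_down):
    # r changes sign with direction; R-squared is about 0.6 in both cases
    print(round(pearson_r(x, y), 4), round(r_squared(x, y), 4))
```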

15
Q

Name 8 methods that a data analyst may use to test equity signals

A

1 - Correlation coefficient
2 - Scatter plots
3 - Covariance
4 - Cross-correlation analysis
5 - Granger causality test
6 - Regression analysis
7 - Correlation heatmaps
8 - Principal Component Analysis (PCA)

16
Q

What is the correlation coefficient?

A

The Pearson correlation coefficient measures the linear relationship between two signals on a scale from -1 to +1. Analysts calculate it to understand how closely the movements of two signals are related.

17
Q

What are scatter plots?

A

A scatter plot draws the paired data points from both signals so that analysts can visually inspect the relationship between them. Patterns such as clustering around a line or curve can indicate a relationship.

18
Q

What is covariance?

A

Covariance measures how much two signals vary together. Positive covariance indicates that the signals move in the same direction, while negative covariance indicates they move in opposite directions.
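
A minimal sketch of the sign behaviour, using made-up signals. The function name `sample_covariance` is an assumption, and the sketch divides by n - 1 (sample covariance).

```python
from statistics import mean

def sample_covariance(x, y):
    """Sample covariance of two equal-length signals."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

signal_a = [1, 2, 3, 4, 5]
same_direction = [2, 4, 6, 8, 10]      # rises when signal_a rises
opposite_direction = [10, 8, 6, 4, 2]  # falls when signal_a rises

print(sample_covariance(signal_a, same_direction))      # 5.0
print(sample_covariance(signal_a, opposite_direction))  # -5.0
```

The magnitude depends on the units of the signals, which is why the correlation coefficient is usually preferred when comparing strength.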

19
Q

What is cross-correlation analysis?

A

This involves analyzing the correlation between the signals at different time lags. It helps in understanding whether one signal leads or lags the other and the strength of this relationship at different time intervals.
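
One simple way to sketch this is to shift one signal against the other and compute the Pearson correlation at each lag. The function names and the made-up signals are assumptions; here `follower` is built to repeat `leader` two steps later.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    co = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return co / (var_x * var_y) ** 0.5

def lagged_correlation(leader, follower, lag):
    """Correlation between leader[t] and follower[t + lag], for lag >= 0."""
    if lag == 0:
        return pearson_r(leader, follower)
    return pearson_r(leader[:-lag], follower[lag:])

leader = [1, 3, 2, 5, 4, 6, 5, 8, 7, 9]
follower = [0, 0] + leader[:-2]  # same pattern, delayed by two steps

correlations = {lag: lagged_correlation(leader, follower, lag) for lag in range(4)}
best_lag = max(correlations, key=correlations.get)
print(best_lag)  # 2
```

The lag with the highest correlation suggests by how many steps one signal leads the other.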

20
Q

What is the Granger causality test?

A

This statistical hypothesis test is used to determine whether one time series is useful in forecasting another. It helps analysts understand if one signal is a leading indicator of the other.

21
Q

What is regression analysis?

A

Regression analysis quantifies the relationship between the two signals. Analysts may use simple linear regression or more complex models to estimate how changes in one signal affect the other.

22
Q

What are correlation heatmaps?

A

These are visual representations of correlation coefficients between multiple signals. Heatmaps help in identifying patterns of correlation among multiple signals simultaneously.

23
Q

What is Principal Component Analysis (PCA)?

A

PCA is a dimensionality reduction technique that can be used to identify underlying patterns and relationships between multiple signals. It helps in understanding the dominant sources of variability in the data.
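
A minimal sketch of the idea, assuming `numpy` is available: PCA via an eigendecomposition of the covariance matrix. The function name `pca_explained_variance` and the three made-up signals are assumptions for illustration.

```python
import numpy as np

def pca_explained_variance(data):
    """Return explained-variance ratios and components, largest component first.

    data: 2-D array with one row per observation and one column per signal.
    """
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # ascending order
    order = np.argsort(eigenvalues)[::-1]            # re-sort descending
    eigenvalues = eigenvalues[order]
    return eigenvalues / eigenvalues.sum(), eigenvectors[:, order]

# Made-up signals: the second is twice the first, the third is small noise
data = np.array([
    [1.0, 2.0, 0.1],
    [2.0, 4.0, -0.1],
    [3.0, 6.0, 0.1],
    [4.0, 8.0, -0.1],
    [5.0, 10.0, 0.1],
])

ratios, components = pca_explained_variance(data)
print(np.round(ratios, 4))
```

Because the first two signals move together, the first component carries almost all of the variance, which is the "dominant source of variability" the card describes.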