10 Introduction to Inferential Statistics Flashcards
What is the main focus of the chapter?
The chapter focuses on data analysis, specifically t-tests, chi-square, correlations, and simple linear regression analyses
Includes hands-on practice to reinforce understanding.
What are the three main types of t-tests?
- Independent t-test
- Dependent t-test
- One-sample t-test
What does an independent t-test compare?
Two groups that are independent of each other
Samples include different people or observations.
What is a dependent t-test?
A t-test that compares two groups that are inherently related
Example: collecting data from the same group at two different times.
What does a one-sample t-test compare?
One group against a single value
Example: comparing class test scores against the mean of all test scores.
What does a t-test generate that is used to determine statistical significance?
A t-score
Typically, the p-value is provided as a result by analytics tools.
What are the key assumptions for an independent t-test?
- Independence
- Normality
- Homogeneity of variance
- n ≥ 30
- The independent variable is categorical
- The dependent variable is numerical
What is the assumption of normality in a t-test?
The data is normally distributed.
What does homogeneity of variance mean in the context of a t-test?
The variance of both groups is approximately the same.
What is the significance of having at least 30 observations in each sample for a t-test?
It generally leads to better results.
What is the independent variable in a t-test?
The variable that is controlled or changed, separating the two groups.
What is the dependent variable in a t-test?
A numerical variable being compared between the two groups.
What is a chi-square goodness of fit test used for?
To compare a sample to a population to see if the sample is a good representation.
What does a chi-square test for independence compare?
Two categorical variables to see whether there is a relationship between them.
What is the null hypothesis in a chi-square test for independence?
There is no relation between the two variables.
What are the assumptions for a chi-square test for independence?
- Both variables are categorical
- Independence of observations
What is a contingency table?
A frequency table that looks at more than one variable.
What programming environment is used for running the t-test in this chapter?
Jupyter Notebooks
What Python library is commonly used for statistical tests like t-tests?
SciPy
Fill in the blank: A t-test compares two groups that contain _______.
[quantitative data]
True or False: You need to perform a t-test for the exam.
False
You need to know the definition and application of a t-test.
What is the purpose of running a t-test on Yorkshire Terriers and Singapura weights?
To determine if there is a significant difference in weight between the two breeds.
What does the p-value indicate in a t-test result?
The statistical significance of the results.
What is the mean weight of Singapura cats according to the example?
6.1 lb
What is the mean weight of Yorkshire Terriers according to the example?
5.5 lb
What is the difference in mean weight between Singapura cats and Yorkshire Terriers?
0.6 lb
What is the main takeaway from performing t-tests according to the chapter?
Understanding how to differentiate t-tests from other analyses.
What is the next analysis covered after t-tests in this chapter?
Chi-square analysis.
What does a chi-square test for independence compare?
It compares two categorical variables by analyzing a contingency table.
Define a contingency table.
A contingency table is another name for a frequency table that looks at more than one variable.
List the assumptions of a chi-square test.
- Both variables are categorical
- Independence of observations
- Contingency cell exclusivity
- 80% of cells should have a value of at least 5
- n ≥ 50
What is meant by ‘independence of observations’ in a chi-square test?
Each observation needs to be independent of every other observation.
Explain contingency cell exclusivity.
Each observation is only counted once in the contingency table.
What is the minimum requirement for cell values in a chi-square test?
80% of the cells should have a count of 5 or more.
What is the minimum sample size recommended for a chi-square test?
n ≥ 50.
How do you calculate the number of cells in a contingency table?
Multiply the number of possible outcomes of one variable by the number of possible outcomes of the second variable.
What does the p-value represent in a chi-square test?
It indicates whether there is a statistically significant relationship between the variables.
What conclusion is drawn if the p-value is larger than 0.05 in a chi-square test?
Accept the null hypothesis and reject the alternative hypothesis.
Define correlation in statistical terms.
A correlation is a relationship between two variables that can be positive or negative.
What does a positive correlation indicate?
As one variable increases, so does the other.
What does a negative correlation indicate?
As one variable increases, the other variable decreases.
What does it mean when there is no correlation between two variables?
What happens to one variable has no impact on the other.
True or False: Correlation implies causation.
False.
What is the correlation coefficient?
A number that indicates the strength of the relationship between two variables, ranging from -1 to 1.
What is the significance of the correlation coefficient being close to 1 or -1?
The closer the coefficient is to 1 or -1, the stronger the relationship.
What type of analysis tests the strength of the relationship between two numerical variables?
Correlation analysis.
List the assumptions for Pearson’s correlation analysis.
- Level of measurement
- Linearity
- Normality
- Related pairs
- Lack of outliers
- n ≥ 30
- Two continuous variables
What is simple linear regression primarily used for?
Prediction of one variable based on another.
In simple linear regression, what does the x-axis represent?
The independent variable (predictor variable).
In simple linear regression, what does the y-axis represent?
The dependent variable (criterion variable).
What does the R² value indicate in regression analysis?
It tells how much variance in the dependent variable is explained by the independent variable.
What is the null hypothesis in regression analysis?
The independent variable is NOT a predictor of the dependent variable.
What is the alternative hypothesis in regression analysis?
The independent variable IS a predictor of the dependent variable.
What is the minimum sample size recommended for correlation analysis?
n ≥ 30.
What statistical method is used to determine if the number of nuts fed to squirrels can predict their weight?
Simple linear regression
The p-value from the regression analysis was 0.043, indicating a statistically significant prediction.
What is the p-value threshold commonly used to determine statistical significance in regression analysis?
0.05
A p-value below this threshold suggests that the independent variable can predict the dependent variable.
In simple linear regression, which variable is considered the independent variable when predicting height?
Weight
Height is the dependent variable in this context.
What are the assumptions of simple linear regression? List them.
- Linearity
- Normality
- Independence
- Homoscedasticity
- The dependent variable is numeric
- The independent variable is numeric
- n % 100
Each assumption must be checked to ensure valid results.
What does the term ‘homoscedasticity’ refer to in regression analysis?
The variance in the residuals remaining constant for different levels of the independent variable
It means that the residuals are evenly spread around the regression line.
What is the minimum sample size recommended for simple linear regression?
10 observations
However, best practice suggests using at least 100 observations.
What does the R-squared value represent in regression analysis?
The proportion of variance in the dependent variable explained by the independent variable
An R-squared value of 0.92 means that 92% of the variance in height can be explained by weight.
What function is used in Python to create a simple linear regression model?
sm.OLS()
This function is part of the statsmodels library in Python.
Fill in the blank: If you want to see whether there is a statistically significant difference between two groups using numeric variables, you would use a _______.
T-test
Which analysis would be appropriate to determine if there is a relationship between two categorical variables?
Chi-square test
This test is used for independence between two categorical variables.
True or False: Correlation can be used to see whether two numeric variables are related and how strongly they are related.
True
What is the appropriate analysis to predict one numeric variable using another numeric variable?
Simple linear regression