10 Introduction to Inferential Statistics Flashcards

1
Q

What is the main focus of the chapter?

A

The chapter focuses on data analysis, specifically t-tests, chi-square, correlations, and simple linear regression analyses

Includes hands-on practice to reinforce understanding.

2
Q

What are the three main types of t-tests?

A
  • Independent t-test
  • Dependent t-test
  • One-sample t-test
3
Q

What does an independent t-test compare?

A

Two groups that are independent of each other

Samples include different people or observations.

4
Q

What is a dependent t-test?

A

A t-test that compares two groups that are inherently related

Example: collecting data from the same group at two different times.
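
As a rough sketch of the "same group at two different times" case, SciPy's `ttest_rel` runs a dependent (paired) t-test. The scores below are hypothetical and exist only to make the example runnable.

```python
from scipy import stats

# Hypothetical scores for the same five people at two time points.
before = [72.0, 65.0, 80.0, 70.0, 68.0]
after = [75.0, 70.0, 82.0, 74.0, 69.0]

# ttest_rel pairs observations by position, so both lists must be
# in the same order (person 1 first in both, and so on).
result = stats.ttest_rel(before, after)
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```

A negative t-score here only reflects the order of the arguments (before minus after).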

5
Q

What does a one-sample t-test compare?

A

One group against a single value

Example: comparing class test scores against the mean of all test scores.
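
A minimal sketch of the class-scores example, assuming SciPy's `ttest_1samp` and made-up numbers; `overall_mean` is a hypothetical reference value, not a figure from the chapter.

```python
from scipy import stats

# Hypothetical test scores for one class.
class_scores = [78.0, 82.0, 85.0, 90.0, 75.0, 88.0]

# Hypothetical mean of all test scores (the single value to compare against).
overall_mean = 80.0

result = stats.ttest_1samp(class_scores, popmean=overall_mean)
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```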

6
Q

What does a t-test generate that is used to determine statistical significance?

A

A t-score

Typically, the p-value is provided as a result by analytics tools.

7
Q

What are the key assumptions for an independent t-test?

A
  • Independence
  • Normality
  • Homogeneity of variance
  • n ≥ 30
  • The independent variable is categorical
  • The dependent variable is numerical
8
Q

What is the assumption of normality in a t-test?

A

The dependent variable is approximately normally distributed within each group.
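
One common way to check this assumption is the Shapiro-Wilk test. The sketch below assumes SciPy's `shapiro` and uses simulated data, so the sample and its parameters are illustrative only; the chapter may check normality differently.

```python
import numpy as np
from scipy import stats

# Simulated sample (hypothetical): 40 draws from a normal distribution.
rng = np.random.default_rng(0)
sample = rng.normal(loc=6.0, scale=1.0, size=40)

# Null hypothesis of Shapiro-Wilk: the sample comes from a normal distribution.
# A small p-value (for example below 0.05) is evidence against normality.
stat, p = stats.shapiro(sample)
print(f"W = {stat:.3f}, p = {p:.3f}")
```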

9
Q

What does homogeneity of variance mean in the context of a t-test?

A

The variance of both groups is approximately the same.
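
A possible check for this assumption is Levene's test, sketched here with SciPy's `levene` on hypothetical groups. If the variances look unequal, `ttest_ind` can be run with `equal_var=False` (Welch's t-test) as a fallback.

```python
from scipy import stats

# Hypothetical measurements for two independent groups.
group_a = [5.1, 5.6, 5.3, 5.9, 5.4, 5.7]
group_b = [6.0, 6.3, 5.8, 6.4, 6.1, 6.2]

# Null hypothesis of Levene's test: the groups have equal variances.
stat, p = stats.levene(group_a, group_b)
print(f"Levene W = {stat:.3f}, p = {p:.3f}")

# Welch's t-test does not assume equal variances.
welch = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"Welch t = {welch.statistic:.3f}, p = {welch.pvalue:.4f}")
```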

10
Q

What is the significance of having at least 30 observations in each sample for a t-test?

A

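With about 30 or more observations per sample, the sampling distribution of the mean is close to normal (central limit theorem), so the t-test is less sensitive to departures from normality and its results are more reliable.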

11
Q

What is the independent variable in a t-test?

A

The variable that is controlled or changed, separating the two groups.

12
Q

What is the dependent variable in a t-test?

A

A numerical variable being compared between the two groups.

13
Q

What is a chi-square goodness of fit test used for?

A

To compare the category frequencies observed in a sample with those expected from a population, to see whether the sample is a good representation.
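
A minimal sketch of a goodness of fit test with SciPy's `chisquare`. The observed counts and the population shares are hypothetical; the expected counts are the population shares scaled to the sample size.

```python
from scipy import stats

# Hypothetical counts per category in a sample of 100.
observed = [18, 22, 20, 40]

# Hypothetical share of each category in the population.
population_share = [0.25, 0.25, 0.25, 0.25]

total = sum(observed)
expected = [share * total for share in population_share]

# Null hypothesis: the sample follows the population distribution.
result = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```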

14
Q

What does a chi-square test for independence compare?

A

Two categorical variables to see whether there is a relationship between them.
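
As a sketch, SciPy's `chi2_contingency` runs a test for independence on a table of counts. The table below is hypothetical. Note that for a 2 x 2 table SciPy applies Yates' continuity correction by default.

```python
import numpy as np
from scipy import stats

# Hypothetical counts.
# Rows: pet type (cat, dog). Columns: preferred food (dry, wet).
observed = np.array([[30, 20],
                     [15, 35]])

# Returns the statistic, the p-value, the degrees of freedom,
# and the counts expected if the two variables were unrelated.
chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
print(expected)
```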

15
Q

What is the null hypothesis in a chi-square test for independence?

A

There is no relation between the two variables.

16
Q

What are the assumptions for a chi-square test for independence?

A
  • Both variables are categorical
  • Independence of observations
17
Q

What is a contingency table?

A

A frequency table that looks at more than one variable.
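
To make the idea concrete, here is a small sketch that builds a contingency table from raw observations using only the standard library. The species and coat colour data are hypothetical.

```python
from collections import Counter

# Hypothetical raw observations: one (species, coat colour) pair per animal.
observations = [
    ("cat", "black"), ("cat", "white"), ("dog", "black"),
    ("dog", "black"), ("cat", "black"), ("dog", "white"),
]

# Count how often each combination occurs.
counts = Counter(observations)
rows = sorted({species for species, _ in observations})
cols = sorted({colour for _, colour in observations})

# table[i][j] is the number of observations in row category i
# and column category j; each observation is counted exactly once.
table = [[counts[(r, c)] for c in cols] for r in rows]

print("columns:", cols)
for r, row in zip(rows, table):
    print(r, row)
```

The table has `len(rows) * len(cols)` cells, and the cell counts add up to the number of observations.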

18
Q

What programming environment is used for running the t-test in this chapter?

A

Jupyter Notebooks

19
Q

What Python library is commonly used for statistical tests like t-tests?

A

SciPy

20
Q

Fill in the blank: A t-test compares two groups that contain _______.

A

quantitative data

21
Q

True or False: You need to perform a t-test for the exam.

A

False

You need to know the definition and application of a t-test.

22
Q

What is the purpose of running a t-test on Yorkshire Terriers and Singapura weights?

A

To determine if there is a significant difference in weight between the two breeds.
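
A sketch of how such a comparison might look with SciPy's `ttest_ind`. The individual weights are hypothetical, chosen only so that the group means match the 6.1 lb and 5.5 lb quoted in the example; they are not the chapter's data.

```python
from scipy import stats

# Hypothetical weights in pounds (not the chapter's dataset).
singapura_lb = [6.3, 5.9, 6.2, 6.0, 6.4, 5.8]
yorkie_lb = [5.4, 5.7, 5.3, 5.6, 5.5, 5.5]

mean_diff = sum(singapura_lb) / len(singapura_lb) - sum(yorkie_lb) / len(yorkie_lb)

# Independent t-test: the two samples contain different animals.
result = stats.ttest_ind(singapura_lb, yorkie_lb)
print(f"mean difference = {mean_diff:.1f} lb")
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```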

23
Q

What does the p-value indicate in a t-test result?

A

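The statistical significance of the results: the probability of seeing a difference at least as large as the observed one if the null hypothesis were true. A p-value below 0.05 is conventionally treated as significant.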

24
Q

What is the mean weight of Singapura cats according to the example?

A

6.1 lb

25
Q

What is the mean weight of Yorkshire Terriers according to the example?

A

5.5 lb

26
Q

What is the difference in mean weight between Singapura cats and Yorkshire Terriers?

A

0.6 lb

27
Q

What is the main takeaway from performing t-tests according to the chapter?

A

Understanding how to differentiate t-tests from other analyses.

28
Q

What is the next analysis covered after t-tests in this chapter?

A

Chi-square analysis.

29
Q

What does a chi-square test for independence compare?

A

It compares two categorical variables by analyzing a contingency table.

30
Q

Define a contingency table.

A

A contingency table is another name for a frequency table that looks at more than one variable.

31
Q

List the assumptions of a chi-square test.

A
  • Both variables are categorical
  • Independence of observations
  • Contingency cell exclusivity
  • 80% of cells should have a value of at least 5
  • n ≥ 50
32
Q

What is meant by ‘independence of observations’ in a chi-square test?

A

Each observation needs to be independent of every other observation.

33
Q

Explain contingency cell exclusivity.

A

Each observation is only counted once in the contingency table.

34
Q

What is the minimum requirement for cell values in a chi-square test?

A

80% of the cells should have a count of 5 or more.
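
The card does not say whether observed or expected counts are meant. This rule is usually stated for expected counts, so the sketch below makes that assumption and uses SciPy's `expected_freq` on a hypothetical table.

```python
import numpy as np
from scipy.stats.contingency import expected_freq

# Hypothetical 2 x 3 table of observed counts.
observed = np.array([[12, 8, 3],
                     [10, 9, 8]])

# Counts expected in each cell if the two variables were independent.
expected = expected_freq(observed)

# Fraction of cells whose expected count is at least 5.
share_ok = np.mean(expected >= 5)
print(expected.round(2))
print(f"{share_ok:.0%} of cells have an expected count of 5 or more")
```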

35
Q

What is the minimum sample size recommended for a chi-square test?

A

At least 50 observations (n ≥ 50).

36
Q

How do you calculate the number of cells in a contingency table?

A

Multiply the number of possible outcomes of one variable by the number of possible outcomes of the second variable.

37
Q

What does the p-value represent in a chi-square test?

A

It indicates whether there is a statistically significant relationship between the variables.

38
Q

What conclusion is drawn if the p-value is larger than 0.05 in a chi-square test?

A

Fail to reject the null hypothesis: the data do not provide sufficient evidence of a relationship between the variables.

39
Q

Define correlation in statistical terms.

A

A correlation is a relationship between two variables that can be positive or negative.

40
Q

What does a positive correlation indicate?

A

As one variable increases, so does the other.

41
Q

What does a negative correlation indicate?

A

As one variable increases, the other variable decreases.

42
Q

What does it mean when there is no correlation between two variables?

A

Changes in one variable are not associated with changes in the other.

43
Q

True or False: Correlation implies causation.

A

False

44
Q

What is the correlation coefficient?

A

A number that indicates the strength and direction of the relationship between two variables, ranging from -1 to 1.

45
Q

What is the significance of the correlation coefficient being close to 1 or -1?

A

The closer the coefficient is to 1 or -1, the stronger the relationship.
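
A short sketch of computing the coefficient with SciPy's `pearsonr`. The study-hours and score values are hypothetical.

```python
from scipy import stats

# Hypothetical paired measurements: hours studied and test score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
score = [52.0, 55.0, 61.0, 64.0, 70.0, 71.0]

# r is the correlation coefficient; p tests the null hypothesis r = 0.
r, p = stats.pearsonr(hours, score)
print(f"r = {r:.3f}, p = {p:.4f}")
```

Here r is close to 1, which indicates a strong positive relationship in this made-up data.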

46
Q

What type of analysis tests the strength of the relationship between two numerical variables?

A

Correlation analysis.

47
Q

List the assumptions for Pearson’s correlation analysis.

A
  • Level of measurement
  • Linearity
  • Normality
  • Related pairs
  • Lack of outliers
  • n ≥ 30
  • Two continuous variables
48
Q

What is simple linear regression primarily used for?

A

Prediction of one variable based on another.

49
Q

In simple linear regression, what does the x-axis represent?

A

The independent variable (predictor variable).

50
Q

In simple linear regression, what does the y-axis represent?

A

The dependent variable (criterion variable).

51
Q

What does the R² value indicate in regression analysis?

A

It tells how much variance in the dependent variable is explained by the independent variable.

52
Q

What is the null hypothesis in regression analysis?

A

The independent variable is NOT a predictor of the dependent variable.

53
Q

What is the alternative hypothesis in regression analysis?

A

The independent variable IS a predictor of the dependent variable.

54
Q

What is the minimum sample size recommended for correlation analysis?

A

At least 30 observations (n ≥ 30).

55
Q

What statistical method is used to determine if the number of nuts fed to squirrels can predict their weight?

A

Simple linear regression

The p-value from the regression analysis was 0.043, indicating a statistically significant prediction.

56
Q

What is the p-value threshold commonly used to determine statistical significance in regression analysis?

A

0.05

A p-value below this threshold suggests that the independent variable can predict the dependent variable.
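
The decision rule can be sketched with SciPy's `linregress`, which reports a p-value for the null hypothesis that the slope is zero. The squirrel data below is hypothetical and will not reproduce the chapter's p-value.

```python
from scipy import stats

# Hypothetical data: nuts fed per day and squirrel weight in grams.
nuts_per_day = [2, 4, 5, 7, 8, 10, 12, 13]
weight_g = [410, 430, 425, 455, 450, 480, 490, 505]

fit = stats.linregress(nuts_per_day, weight_g)

alpha = 0.05  # conventional significance threshold
is_significant = fit.pvalue < alpha

print(f"slope = {fit.slope:.2f}, p = {fit.pvalue:.5f}")
print("significant predictor" if is_significant else "not significant")
```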

57
Q

In simple linear regression, which variable is considered the independent variable when predicting height?

A

Weight

Height is the dependent variable in this context.

58
Q

What are the assumptions of simple linear regression? List them.

A
  • Linearity
  • Normality
  • Independence
  • Homoscedasticity
  • The dependent variable is numeric
  • The independent variable is numeric
  • n ≥ 100

Each assumption must be checked to ensure valid results.

59
Q

What does the term ‘homoscedasticity’ refer to in regression analysis?

A

The variance in the residuals remaining constant for different levels of the independent variable

It means that the residuals are evenly spread around the regression line.
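
A rough, informal check, assuming simulated data and NumPy only: fit a line, then compare the residual variance in the lower and upper halves of the independent variable. A ratio near 1 is consistent with homoscedasticity. A residual plot or a formal test would be more thorough.

```python
import numpy as np

# Simulated data (hypothetical) with constant noise around a straight line.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)

# np.polyfit returns coefficients from highest degree down: slope, then intercept.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Compare the spread of residuals for low and high values of x.
low = residuals[x < 5].var(ddof=1)
high = residuals[x >= 5].var(ddof=1)
ratio = max(low, high) / min(low, high)
print(f"variance ratio = {ratio:.2f}")
```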

60
Q

What is the minimum sample size recommended for simple linear regression?

A

10 observations

However, best practice suggests using at least 100 observations.

61
Q

What does the R-squared value represent in regression analysis?

A

The proportion of variance in the dependent variable explained by the independent variable

An R-squared value of 0.92 means that 92% of the variance in height can be explained by weight.
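
To show where the number comes from, this sketch computes R² as 1 minus the ratio of residual to total sum of squares, and compares it with the squared correlation from SciPy's `linregress`. The weight and height values are hypothetical and do not reproduce the 0.92 in the example.

```python
import numpy as np
from scipy import stats

# Hypothetical data: weight in kg (predictor) and height in cm (outcome).
weight = np.array([50.0, 55.0, 60.0, 65.0, 70.0, 75.0])
height = np.array([150.0, 156.0, 159.0, 166.0, 170.0, 173.0])

fit = stats.linregress(weight, height)
predicted = fit.intercept + fit.slope * weight

ss_res = np.sum((height - predicted) ** 2)      # variance left unexplained
ss_tot = np.sum((height - height.mean()) ** 2)  # total variance in height
r_squared = 1 - ss_res / ss_tot

print(f"R^2 = {r_squared:.3f}")
print(f"rvalue squared = {fit.rvalue ** 2:.3f}")
```

For simple linear regression the two printed values agree.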

62
Q

What function is used in Python to create a simple linear regression model?

A

sm.OLS()

This function is part of the statsmodels library in Python.

63
Q

Fill in the blank: If you want to see whether there is a statistically significant difference between two groups using numeric variables, you would use a _______.

A

t-test

64
Q

Which analysis would be appropriate to determine if there is a relationship between two categorical variables?

A

Chi-square test

This test is used for independence between two categorical variables.

65
Q

True or False: Correlation can be used to see whether two numeric variables are related and how strongly they are related.

A

True

66
Q

What is the appropriate analysis to predict one numeric variable using another numeric variable?

A

Simple linear regression