STATISTICS QUALI (PART 2) Flashcards

1
Q

CHAPTER 5
___ occurs when the null hypothesis is incorrectly rejected when it is actually true. (False positive). It means concluding that there is an effect or difference when in fact there isn’t.
• The probability of making this error is denoted by the significance level.
• The area under the curve beyond the critical value represents the probability of a type 1 error

EX : Imagine a pharmaceutical company testing a new drug to determine if it is more effective than a placebo. The result shows a p-value of 0.04 and the significance level is set at 0.05.
H0 : The new drug is no more effective than the placebo
H1: The new drug is more effective than the placebo
Decision: The null hypothesis is rejected
• if the null hypothesis is actually true (the drug is not more effective) this conclusion is a type 1 error

A

Type 1 error
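The idea can be checked with a quick simulation (a rough sketch, not from the card; the group size, seed, and trial count are arbitrary choices): when the null hypothesis is true, the rejection rate should sit near the significance level.

```python
# Simulation sketch: both samples are drawn from the SAME population,
# so the null hypothesis is true and every rejection is a Type 1 error.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_trials = 2000
false_positives = 0

for _ in range(n_trials):
    a = rng.normal(loc=0.0, scale=1.0, size=30)
    b = rng.normal(loc=0.0, scale=1.0, size=30)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:                  # rejecting a true H0 = Type 1 error
        false_positives += 1

rate = false_positives / n_trials  # should be close to alpha (0.05)
print(round(rate, 3))
```

Rerunning with other seeds keeps the rate near 0.05, which is exactly what the significance level promises.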

2
Q

CHAPTER 5

___ Occurs when a researcher fails to reject the null hypothesis when it is actually false. (False negative)
• The probability of making this error is denoted by beta (β)
• The power of a test is the probability of correctly rejecting a false null hypothesis. It is calculated as (1 − β)
• “ Higher power reduces the likelihood of a type 2 error”
• The area under the curve that represents the probability of a type 2 error is typically on the opposite side of the critical region for a type 1 error

A

Type 2 error

3
Q

CHAPTER 5 (MAKING SENSE OF STATISTICAL SIGNIFICANCE)

____ a quantitative measure of the magnitude of the difference between groups or the strength of the relationship between variables. Unlike p-values, which only tell you whether an effect exists, “It tells how large that effect is, providing a sense of its practical significance”
• Indicates the size of the effect not just its existence
• Unlike p-values, effect sizes are not influenced by the sample size
• Cohen’s d expresses it as the difference between two means in standard deviation units
• Pearson’s r expresses it as the strength and direction of the relationship between two variables
• Eta squared expresses it in ANOVA as the proportion of total variance attributed to an effect

A

Effect size
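As a sketch, Cohen's d can be computed directly from two samples using the pooled-standard-deviation form (the score values below are invented for illustration):

```python
# Cohen's d with a pooled standard deviation; data are made up.
import numpy as np

def cohens_d(x, y):
    nx, ny = len(x), len(y)
    # Pooled variance weights each group's sample variance by its df
    pooled_var = ((nx - 1) * np.var(x, ddof=1) +
                  (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

treatment = np.array([85, 90, 88, 92, 87])
control   = np.array([80, 83, 79, 85, 81])
d = cohens_d(treatment, control)
print(round(d, 2))   # well above 0.80, a large effect by Cohen's conventions
```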

4
Q

CHAPTER 5 MAKING SENSE OF STATISTICAL SIGNIFICANCE
___ Help to interpret the magnitude of an effect in a study. Also known as Cohen’s conventions
1. COHEN’S D
• small effect - .20
• Medium effect - .50
• Large effect - .80
2. PEARSON’S R
• Small effect - 0.1
• Medium effect - 0.3
• Large effect - 0.5
3. ETA SQUARE (Measuring the proportion of total variance attributed to an effect in ANOVA)
• Small effect - 0.01
• Medium effect - 0.06
• Large effect - 0.14
4. COHEN’S F (Measuring effect size in ANOVA )
• Small effect - 0.10
• Medium effect - 0.25
• Large effect - 0.40
5. COHEN’S W (measuring effect size in chi-square tests)
• Small effect - 0.10
• medium effect - 0.30
• Large effect - 0.50

A

Effect size conventions

5
Q

CHAPTER 5
___ used to combine the results of multiple studies that address a similar research question. Aims to provide a more precise estimate of the effect size and resolve inconsistencies among individual studies.
• Involves systematically reviewing and statistically combining results from different studies to draw general conclusions about a specific research question
• Uses the effect size to calculate a weighted average effect size, providing a more precise estimate of the overall effect.
• Synthesizes data from multiple studies to calculate a combined effect size
• Increases the statistical power and precision of the estimates by pooling data

A

Meta analysis

6
Q

CHAPTER 5 MAKING SENSE OF STATISTICAL SIGNIFICANCE

___ probability that a test will correctly reject a false null hypothesis. It helps determine the likelihood of avoiding a TYPE 2 ERROR (failing to detect a true effect)
• Larger sample size increases power
• Larger effect size increases power
• Higher significance levels increase power but also increase the risk of Type 1 error
• Low variability Increases power
• Power analysis is used to determine the necessary sample size to achieve a desired power level, typically 80%

A

Power
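Power can also be estimated by simulation (a rough sketch; the effect size of 0.8 SD, the group size of 30, and the seed are assumed values, not from the card): generate data where H0 is false and count how often the test rejects.

```python
# Power = P(reject H0 | H0 is false). Parameters below are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n, true_diff = 0.05, 30, 0.8   # true effect of 0.8 SD, 30 per group

rejections = 0
n_trials = 2000
for _ in range(n_trials):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(true_diff, 1.0, n)  # H0 is false here by construction
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        rejections += 1

power = rejections / n_trials          # an estimate of 1 - beta
print(round(power, 2))
```

Increasing n, the true difference, or alpha in this sketch raises the estimated power, matching the bullets above.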

7
Q

CHAPTER 5 MAKING SENSE OF STATISTICAL SIGNIFICANCE

____ tools used in statistical power analysis to determine the sample size needed to achieve a desired level of power for a given effect size and significance level.
• Helps to determine the number of participants needed to achieve a desired power level for a given effect size and significance level
• Helps balance the risk of Type 1 and type 2 error by adjusting sample size, effect size and significance level.

A

Power table

8
Q

WHAT DETERMINES THE POWER OF A STUDY
1. ____ Increases power because it provides more information about the population, reducing the standard error making it easier to detect a true effect.
2. ___ Easier to detect, increasing the power of the study
3. ___ Increases power but also increases the risk of Type 1 error (false positive - typically set at 0.05)
4. ____ using matched pairs, repeated measures or other designs that control for extraneous variables can increase power
5. ____ more powerful if the direction of the effect is correctly specified because the critical region is concentrated in one tail
6. ____ less powerful because the critical region is split between two tails, but it is more conservative and less prone to type 1 error

A

Larger sample size
Larger effect size
High significance level
Study design
One tailed
Two tailed test

9
Q

CHAPTER 5 MAKING SENSE OF STATISTICAL SIGNIFICANCE

____ refers to the real-world importance or relevance of research findings, beyond just their statistical significance.
• It considers the size of the effect. Even if a result is statistically significant, it might not be practically significant if the effect size is too small to matter in real-world applications
• Studies with larger effect sizes require smaller sample sizes to achieve high power.

A

Practical significance

10
Q

CHAPTER 5 MAKING SENSE OF STATISTICAL SIGNIFICANCE

___ hypothesis test used to compare the means of two groups. It helps determine whether there is a significant difference between the group’s means, which can indicate an effect or relationship.
• If the test statistic is greater than the critical value, reject the null hypothesis; if it is less than or equal to the critical value, fail to reject the null hypothesis
• Use a non-parametric test like the MANN-WHITNEY U TEST if the data does not meet the normality assumption
• Use Welch’s T-test, if the variance is unequal

A

T-test

11
Q

CHAPTER 5 MAKING SENSE OF STATISTICAL SIGNIFICANCE

___ used to determine whether the mean of a single sample is significantly different from a KNOWN or POPULATION MEAN.
• Particularly useful when you want to compare the sample mean to a specific value
—— ASSUMPTIONS —
1. Data normally distributed (especially important for small sample sizes)
2. Observations should be independent
3. Measured on interval or ratio scale

• If non-normal distribution, use Non-parametric test like Wilcoxon signed rank test

EX : Testing if the average height of a sample of students is different from the national average height.

A

One sample t-test
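A minimal scipy sketch of the height example above (the sample values and the national average of 170 cm are invented for illustration):

```python
# One-sample t-test: is the sample mean different from a known value?
from scipy import stats

heights = [172, 168, 175, 171, 169, 174, 173, 170, 176, 172]  # cm, invented
t_stat, p_value = stats.ttest_1samp(heights, popmean=170)

print(round(t_stat, 2))      # t-statistic with df = n - 1 = 9
print(p_value < 0.05)        # reject H0 at the 0.05 level?
```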

12
Q

CHAPTER 5 MAKING SENSE OF STATISTICAL SIGNIFICANCE

___ represent the number of independent pieces of information available to estimate another piece of information.
• Often calculated as the sample size minus the number of parameters estimated.
• They affect the shape of the sampling distribution and the precision of parameter estimates
• T-TEST - Used to compare means between groups. The degrees of freedom affect the critical t-value
• CHI-SQUARE- calculated based on the number of categories
• ANOVA - used to determine the F- distribution
• SIMPLE REGRESSION - (n-2), n is the number of observation
• MULTIPLE REGRESSION- (n-k-1), k is the number of predictors
• GOODNESS OF FIT - (K-1) k, is the number of categories
• TEST OF INDEPENDENCE - (r-1) (c-1), where r is the number of rows and c is the number of columns

EX : suppose you have a sample of 10 students and you want to calculate the mean daily calcium intake.
df= n-1 = 10-1 = 9

A

Degrees of freedom
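The formulas in the bullets above can be written out as plain functions (a sketch; the example calls just echo the card's calcium-intake case and a hypothetical 3×4 table):

```python
# Degrees-of-freedom formulas from the card, as simple functions.
def df_one_sample(n):              return n - 1            # one-sample t-test
def df_simple_regression(n):       return n - 2            # n observations
def df_multiple_regression(n, k):  return n - k - 1        # k predictors
def df_goodness_of_fit(k):         return k - 1            # k categories
def df_independence(r, c):         return (r - 1) * (c - 1)  # r rows, c cols

print(df_one_sample(10))       # the 10-student calcium example: 9
print(df_independence(3, 4))   # a hypothetical 3x4 contingency table: 6
```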

13
Q

CHAPTER 5 MAKING SENSE OF STATISTICAL SIGNIFICANCE

___ a type of probability distribution that is symmetric and bell shaped
• Particularly useful when dealing with small sample sizes or when the population standard deviation is unknown
• The shape of the distribution depends on the degrees of freedom, which are related to sample sizes. As the degrees of freedom increases, the t-distribution approaches the normal distribution

A

T-distribution

14
Q

CHAPTER 5 MAKING SENSE OF STATISTICAL SIGNIFICANCE

___ a standardized test statistic used in hypothesis testing particularly when dealing with small sample sizes or when the population standard deviation is unknown.
• A large ___ indicates a greater difference between the sample mean and the population mean.
• A smaller ___ indicates a smaller difference

A

T-score

15
Q

CHAPTER 5 MAKING SENSE OF STATISTICAL SIGNIFICANCE

___ used to compare the means of two related groups.
• Particularly useful when the same subjects are measured under two different conditions or at two different times.
• Use, when you have paired data, such as measurement taken from the same subjects before and after a treatment
• When data is normally distributed
• Use a non-parametric test like the Wilcoxon signed rank test if the differences do not meet the normality assumption

EX : Suppose you want to test whether a new teaching method improves student performance. You measure the test scores of 10 students before and after using the new method.

A

T-test for dependent samples
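The before/after teaching example above can be sketched with `ttest_rel` (the scores are invented):

```python
# Paired (dependent-samples) t-test on invented before/after scores.
from scipy import stats

before = [70, 65, 80, 75, 60, 72, 68, 77, 74, 66]
after  = [75, 70, 82, 78, 68, 80, 71, 83, 79, 72]

# The test works on the pairwise differences (after - before)
t_stat, p_value = stats.ttest_rel(after, before)
print(round(t_stat, 2), p_value < 0.05)
```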

16
Q

CHAPTER 5 MAKING SENSE OF STATISTICAL SIGNIFICANCE

___ The ability of a statistical method to remain effective even when certain assumptions are violated or when there are small deviations from ideal conditions.
• Robust methods are less affected by outliers or extreme values. (Ex: The median is a robust measure of central tendency because it is not influenced by extreme values, unlike the mean)
• Robust measures perform well even when assumptions (normality, homoscedasticity) are not fully met.
- Non-parametric tests like the Mann-Whitney U test are robust alternatives when the assumption of normality is violated
• Robust regression methods such as least absolute deviations (LAD) are used when data contain outliers

A

Robustness

17
Q

CHAPTER 5 MAKING SENSE OF STATISTICAL SIGNIFICANCE
—– T-TEST FOR INDEPENDENT SAMPLES —
____ a type of experimental design where different participants are assigned to each condition or group.
• Participants are divided into separate groups with each group experiencing a different condition or level of independent variable
EX : In a study testing a new drug, one group receives the drug, while another group receives a placebo.
• Variability between participants can affect the results, though random assignment helps mitigate this.

A

Between-subjects design

18
Q

CHAPTER 5 MAKING SENSE OF STATISTICAL SIGNIFICANCE
____ compares the means of two independent groups to see if they are significantly different from each other.
• Population variance is not known
• When the dependent variable is continuous (test scores, weight)
• When you have two independent, unrelated groups (different participants in each group)
• When the data is approximately normally distributed
• When the variances of the two groups are equal (homogeneity of variance) - can be tested using Levene’s test

A

T-test for independent samples
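A sketch combining the two checks above: run Levene's test first, then choose between the standard and Welch's t-test (the group values are invented):

```python
# Independent-samples t-test with a Levene's test for equal variances.
from scipy import stats

group_a = [23, 25, 28, 22, 26, 24, 27, 25]   # invented data
group_b = [30, 32, 29, 34, 31, 33, 30, 32]

_, p_levene = stats.levene(group_a, group_b)  # homogeneity of variance check
equal_var = p_levene > 0.05                   # False -> Welch's t-test instead

t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=equal_var)
print(p_value < 0.05)
```

Setting `equal_var=False` is exactly scipy's implementation of Welch's t-test, matching the fallback the card recommends.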

19
Q

CHAPTER 5 MAKING SENSE OF STATISTICAL SIGNIFICANCE

___ a non-parametric test used to compare differences between two independent groups when the sample distributions are not normally distributed. It is an alternative to the independent samples t-test and is particularly useful for ordinal data or when the assumptions of the t-test are not met.
• Data are not normally distributed
• When the data are ordinal
• Sample sizes are small (less than 30)
• If the U statistic is less than the critical value, reject the null hypothesis

A

Mann Whitney U test
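A short sketch on ordinal ratings (the two sets of 1-to-5 ratings are invented):

```python
# Mann-Whitney U test on ordinal ratings from two groups (invented data).
from scipy import stats

method_1 = [3, 4, 2, 5, 3, 4, 3]   # 1-5 satisfaction ratings
method_2 = [5, 5, 4, 5, 4, 5, 4]

u_stat, p_value = stats.mannwhitneyu(method_1, method_2,
                                     alternative='two-sided')
print(round(p_value, 3))
```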

20
Q

CHAPTER 5 INTRODUCTION TO ANALYSIS OF VARIANCE

___ a statistical method used to compare the means of three or more groups to determine if there are any statistically significant differences between them. It helps in understanding whether the variation in the data is due to the independent variable or to chance.
• It allows researchers to compare more than two groups simultaneously, which is more efficient than conducting multiple t-tests
• It partitions the total variance into variance between groups and variance within groups, helping to identify the source of variability.
• Helps in understanding the effect of different treatments or conditions on a dependent variable
• Tests the null hypothesis that all group means are equal against the alternative hypothesis that at least one group mean is different.
• Conduct a post hoc test to determine which specific groups differ from each other.

A

ANOVA
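A one-way ANOVA sketch across three invented treatment groups (the scores are made up so that the group means clearly differ):

```python
# One-way ANOVA: do three group means differ?
from scipy import stats

g1 = [85, 86, 88, 75, 78]   # invented scores, three conditions
g2 = [65, 68, 70, 66, 64]
g3 = [90, 95, 93, 92, 94]

f_stat, p_value = stats.f_oneway(g1, g2, g3)
print(round(f_stat, 1), p_value < 0.05)
```

A significant F only says *some* mean differs; a post hoc test (next card) is needed to say which.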

21
Q

CHAPTER 5 INTRODUCTION TO ANOVA

____ used to determine whether there are significant differences between the means of three or more groups.
• Helps in testing the null hypothesis that all group means are equal.
• Used in ANOVA to determine if at least one group mean is different from others
• It partitions the total variance into variance due to the treatment (between groups) and variance due to error (within groups)
• Helps in understanding the sources of variability in the data
• DEGREES OF FREEDOM - It has two sets of degrees of freedom; one for the numerator(between groups) and one for the denominator (within groups)

A

F ratio

22
Q

CHAPTER 5 ANOVA

BASIC LOGIC OF ANOVA
• Use ANOVA when you have more than two groups; using multiple t-tests increases the risk of TYPE 1 error (false positive). ANOVA helps control this error rate by testing all groups simultaneously.
• NULL HYPOTHESIS (H0) - All group means are equal
• ALTERNATIVE HYPOTHESIS (H1) - At least one group mean is different.
• If the between-group variance is significantly larger than the within-group variance, the F-statistic will be large, suggesting that at least one group mean is different

• If the F statistic is greater than the critical value, reject the null hypothesis, indicating that at least one group mean is different
• If the F statistic is less than the critical value, fail to reject the null hypothesis, indicating no significant difference between the group means

A
23
Q

CHAPTER 5 INTRODUCTION TO ANOVA

____ a measure of effect size used in the context of ANOVA to indicate the proportion of the total variance in the dependent variable that is associated with the independent variable.
• Values range from 0 to 1
• Small effect - η² ≈ 0.01
• Medium effect - η² ≈ 0.06
• Large effect - η² ≈ 0.14

A

Eta squared (η²)

24
Q

CHAPTER 5 INTRODUCTION TO ANOVA

____ Conducted after an ANOVA to determine exactly which group means are significantly different from each other.
• When ANOVA results indicate a significant difference among group means, it helps identify which specific groups differ.
• Useful when comparing three or more groups to control for the increased risk of TYPE 1 error

A

Post hoc comparison

25
Q

CHAPTER 5 INTRODUCTION TO ANOVA

____ The impact of one independent variable on a dependent variable, ignoring the effects of other independent variables.
• Measures the influence of one independent variable independently of other variables.
• It can be additive, meaning the total effect is the sum of individual effects if there are no interactions.
• Confounding variables - (solution) include control variables in the study to isolate the main effect of interest.

EX: Consider a study examining the effects of tutoring and extra homework on students’ math scores. (Tutoring and extra homework are the independent variables; math scores are the dependent variable.) The main effects would be:
• The effect of tutoring on math scores
• The effect of extra homework on math scores

A

Main effect

26
Q

CHAPTER 5 INTRODUCTION TO ANOVA
____ occurs when the effect of one independent variable on a dependent variable changes depending on the level of another independent variable. In other words, the impact of one factor depends on the presence or level of another factor.
• The effect of one variable depends on the level of another variable
• This can complicate the interpretation of main effects
• The combined effect of two variables is not simply the sum of their individual effects.

EX: Consider a study examining the effects of study environment (quiet vs noisy) and study method (group study vs individual study) on test performance. Study environment and study method are the independent variables and test performance is the dependent variable. An interaction would occur if the effect of the study method on test performance differs depending on whether the environment is quiet or noisy.
• Two levels of study environment (quiet vs noisy)
• Two levels of study method (group study vs individual study)
– If students perform better in a quiet environment when studying alone but perform worse in a noisy environment when studying in a group, this indicates an interaction between study environment and study method.

A

Interaction

27
Q

CHAPTER 5 CORRELATION AND PREDICTION
____ a non-experimental research method used to examine the relationship between two or more variables without manipulating them. This design helps researchers understand whether and how variables are related.
• Examines the strength and direction of the relationship between variables
• Uses a numerical value (ranging from -1 to +1) to represent the strength and direction of the relationship.
• CAUSALITY - Use experimental designs to test causality if a strong correlation is found.
• THIRD VARIABLES - Include potential confounding variables in the analysis to isolate the relationship of interest

EX : Consider a study investigating the relationship between stress levels and sleep quality among college students. Researchers might measure students’ stress levels using a questionnaire and their sleep quality using a sleep diary. The study would then analyze whether higher stress levels are associated with poorer sleep quality.
• Positive: Both variables increase or decrease together
• Negative: one variable increases while the other decreases
• No : No consistent relationship between variables

A

Correlational design

28
Q

CHAPTER 5 CORRELATION AND PREDICTION

____ A statistical measure that describes the extent to which two variables are related. It indicates whether an increase or decrease in one variable corresponds to an increase or decrease in another variable.
• Scatter plots are often used to visualize the relationship between variables.
• The pattern of points can indicate the type and strength of the correlation
• CORRELATION VS CAUSATION - Correlation does not imply causation. Just because two variables are correlated does not mean one causes the other. (Conduct experiments to test causal relationship if a strong correlation is found)

A

Correlation

29
Q

CHAPTER 5 CORRELATION AND PREDICTION
___ a type of graph used to display the relationship between two quantitative variables. Each point represents a pair of values for the two variables.
• It shows data for two variables
• helps in identifying the type and strength of the relationship between variables.
• Positive correlation: points trend upwards from left to right, indicating that as one variable increases, so does the other.
• Negative correlation: points trend downwards from left to right, indicating that as one variable increases, the other decreases.

A

Scatter diagram

30
Q

CHAPTER 5 CORRELATION AND PREDICTION
___ a numerical measure that quantifies the strength and direction of the relationship between two variables. It is often represented by the letter (r) and ranges from -1 to +1
• The closer to ±1, the stronger the relationship
• OUTLIERS -extreme values can distort the correlation coefficient (use methods that minimize the impact of outliers such as Spearman’s RHO)

–TYPES;
1. PEARSON’S R - measures the linear relationship between two continuous variables
2. SPEARMAN’S RHO - measures the rank order relationship between two variables, useful for ordinal data or non-linear relationships
3. KENDALL’S TAU-B - often used for smaller sample sizes

EX: Explore variables such as stress levels and sleep quality. A negative correlation coefficient (-0.65) would suggest that higher stress levels are associated with poorer sleep quality

A

Correlation coefficient
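The stress/sleep example above can be sketched with both Pearson's r and Spearman's rho (the ten data pairs are invented to show a strong negative relationship):

```python
# Pearson's r and Spearman's rho on invented stress/sleep data.
from scipy import stats

stress = [2, 4, 6, 7, 8, 9, 10, 3, 5, 1]   # invented questionnaire scores
sleep  = [9, 8, 6, 5, 4, 3, 2, 8, 7, 9]    # invented sleep-quality ratings

r, p_r = stats.pearsonr(stress, sleep)
rho, p_rho = stats.spearmanr(stress, sleep)
print(round(r, 2), round(rho, 2))   # both strongly negative
```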

31
Q

CHAPTER 5 CORRELATION AND PREDICTION
___ (Pearson r) a measure of the strength and direction of the linear relationship between two continuous variables
• Measures only linear relationship between variables
• The correlation between X and Y is the same as the correlation between Y and X
• It is also used to identify potential predictors in regression analysis.

• Pearson’s r only measures linear relationships; if the relationship is non-linear, consider using Spearman’s rho

A

PEARSON’S product moment correlation

32
Q

CHAPTER 5 CORRELATION AND PREDICTION
____ Refers to whether the observed correlation between two variables is statistically significant, meaning it is unlikely to have occurred by chance. Typically tested using a hypothesis test.
• To test the significance you calculate the p-value. If the p-value is less than a chosen significance level (0.05), you reject the null hypothesis, indicating that the correlation is statistically significant

A

Significance of r

33
Q

CHAPTER 5 CORRELATION AND PREDICTION

____ measures the proportion of the variance in the dependent variable that is predictable from the independent variable. It is the square of the Pearson correlation coefficient and ranges from 0 to 1
• 0 - the model does not explain any of the variance in the dependent variable
• 1 - the model explains all the variance in the dependent variable.
• The higher (R²) values indicate a better fit of the model to the data.
• A very high R² might indicate overfitting, where the model is too closely fitted to the specific data set. (Use cross-validation techniques to ensure the model generalizes well to new data)

EX : consider a study examining the relationship between hours of study and exam scores. If the (R²) value is 0.72, this means that 72% of the variance in exam scores can be explained by the hours of study.
• A high R² indicates that a large proportion of the variance in the dependent variable is explained by the independent variable

A

Coefficient of determination ( R²)
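The hours/scores example can be reproduced in miniature: square Pearson's r to get the share of variance explained (the eight data pairs below are invented):

```python
# R² as the square of Pearson's r, on invented study-hours data.
from scipy import stats

hours  = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 60, 68, 70, 75, 78]

r, _ = stats.pearsonr(hours, scores)
r_squared = r ** 2          # proportion of score variance explained by hours
print(round(r_squared, 2))
```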

34
Q

CHAPTER 5 CORRELATION AND PREDICTION
—— ISSUES IN INTERPRETING r ——
1. ASSUMING CAUSATION
A significant correlation does not imply that one variable causes a change in another; correlation only indicates that two variables are related. (Ex: Finding a high correlation between ice cream sales and drowning incidents does not mean ice cream causes drowning; both might be related to a third variable such as hot weather) - use experimental designs to test for causation and consider potential confounding variables.
2. OVERLOOKING NON-LINEAR RELATIONSHIP
Pearson’s r only measures linear relationships, so non-linear relationships can be missed. (EX: The relationship between stress and performance might be curvilinear, where moderate stress improves performance but too much or too little stress reduces it) - Use a SCATTER PLOT to visualize the data and consider other correlation measures like Spearman’s rho for non-linear relationships

A
35
Q

CHAPTER 5 CORRELATION AND PREDICTION
—— ISSUES IN INTERPRETING r ——
3. IMPACT OF OUTLIERS
Outliers can significantly distort the correlation coefficient, making it either higher or lower than it should be. (EX: In a study of the relationship between study hours and exam scores, a student who studied excessively but performed poorly due to illness could skew the results.) - Identify and analyze the outliers, and use robust statistical measures to minimize their impact
4. RANGE RESTRICTIONS
Restricting the range of data can lead to underestimating the true correlation. (EX: Studying the relationship between height and weight only among professional basketball players might show a weaker correlation than in the general population)

A
36
Q

CHAPTER 5 PREDICTION

____ used to examine the relationship between one dependent variable and one or more independent variables. It helps in understanding how the dependent variable changes when any one of the independent variables is varied while the other independent variables are held fixed
• Dependent variable (Y) - the outcome or variable you are trying to predict or explain
• Independent variable (X) - The predictors or variable you use to predict the dependent variable
—- PROBLEMS & SOLUTION
1. MULTICOLLINEARITY - when independent variables are highly correlated, it can distort the results (Use VIF to detect multicollinearity and consider removing or combining variables)
2. OVERFITTING - a model that is too complex may fit the training data well but perform poorly on new data (Use cross-validation techniques to ensure the model generalizes well to new data)

A

Regression
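A minimal simple-regression sketch with `scipy.stats.linregress` (the data are invented and exactly linear, so the fit is perfect):

```python
# Simple linear regression: fit y = slope * x + intercept.
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]          # independent variable (invented)
y = [3, 5, 7, 9, 11, 13, 15, 17]      # dependent variable: exactly 2x + 1

result = stats.linregress(x, y)
print(result.slope, result.intercept)     # recovers 2.0 and 1.0
print(result.rvalue ** 2)                 # R² = 1.0 for a perfect line
```

Real data would scatter around the line, giving an R² below 1; the diagnostics on the next card are how you check whether that fit is trustworthy.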

37
Q

CHAPTER 5 PREDICTION
____ tools and techniques used to evaluate the validity and reliability of a regression model. They help identify any violations of the assumptions underlying the model, detect outliers and assess the overall fit of the model.
• RESIDUAL PLOTS - Scatter plots of residuals versus predicted values to check for non-linearity and heteroscedasticity (If the residuals are randomly scattered around zero (0), it suggests that the linearity and homoscedasticity assumptions are met)

• Q-Q plots - plots to check if residuals are normally distributed ( if the residuals follow a straight line, it indicates that they are normally distributed)

• VIF - measures the multicollinearity among independent variables ( A VIF value greater than 10 indicates significant multicollinearity, suggesting that some independent variables are highly correlated)

• LEVERAGE & INFLUENCE - high leverage points can disproportionately affect the regression line. Cook’s distance is often used to identify influential points.

A

Regression diagnostic

38
Q

CHAPTER 5 PREDICTION

PROBLEMS AND SOLUTION TO REGRESSION DIAGNOSTICS
• NON LINEARITY - apply transformation (log, square root) to the variables to achieve linearity
• HETEROSCEDASTICITY - The variance of the errors is not constant (use weighted least squares regression to handle heteroscedasticity)
• MULTICOLLINEARITY - independent variable are highly correlated ( Remove or combine highly correlated variables to reduce multicollinearity)

A
39
Q

REGRESSION DIAGNOSTICS
1. Q-Q PLOT
- Check if residuals follow a normal distribution
2. SHAPIRO-WILK TEST
- A formal statistical test for normality
3. BREUSCH-PAGAN TEST
- Tests for homoscedasticity
4. DURBIN-WATSON TEST
- Ensures residuals are independent
5. COOK’S DISTANCE
- Identify any influential observations that might skew the results
6. VIF
- Check for multicollinearity among predictors
7. LEVERAGE
- Measures how far an observation is from the mean of the predictor variables.

A
40
Q

CHAPTER 5 PREDICTION
____ Statistical methods used when the data doesn’t meet the assumptions required by parametric tests.
• No assumptions about the distribution
• Ordinal or nominal data, but can also be applied to interval or ratio data that do not meet the parametric assumptions
• Robust to outliers and skewed data
—COMMON TEST —-
1. ___ compares the difference between two independent groups
(Ex: comparing the test scores of two different teaching method)
2. ___ compares difference between two related groups
(Ex: Comparing pre-test and post-test scores of the same group of students)
3. ___ compares difference between three or more independent groups
(Ex: comparing customer satisfaction rating across multiple stores)
4. ___ measures the strength and direction of association between two ranked variables
(Ex: assessing the relationship between ranks in two different exams)

A

Non parametric test
Mann Whitney U test
Wilcoxon signed rank test
Kruskal wallis H test
Spearman’s RHO

41
Q

CHAPTER 5 PREDICTION
___ a type of statistical test that makes certain assumptions about the parameters of the population from which the sample is drawn.
• Data should be interval or ratio scale
• Homogeneity of variance - variances within each group being compared should be approximately equal.
COMMON :
1. T-test : Used to compare the means of two groups
• Independent t-test - Compares means between two independent groups
• Paired t-test - Compares means within the same group at different times
2. ANOVA : Used to compare means among three or more groups
3. Pearson’s correlation: Measures the strength and direction of the linear relationship between two continuous variables
4. Regression analysis: Examines the relationship between a dependent variable and one or more independent variables

A

Parametric test

42
Q

CHAPTER 5 CHI-SQUARE STATISTICS

___ Refers to the actual count of occurrences of a specific event or category in data.
• It is the number of times an event is recorded during an experiment or study.
• These are the counts you collect from your data. (Ex: if you are studying the color preferences of 100 people and 30 people prefer red, the (f_o) for red is 30)
• The chi-square test can be unreliable if expected frequencies are too small (combine categories to ensure that expected frequencies are sufficiently large, typically at least 5, to meet the assumptions of the chi-square test)
• EX : (birds) House sparrow : 15, House finch : 12

A

Observed frequency

43
Q

CHAPTER 5 CHI-SQUARE STATISTICS

___ theoretical count of occurrences that we expect to observe in an experiment or study.
• They are calculated based on the null hypothesis

EX : (CHI- GOODNESS OF FIT) If a shop owner expects an equal number of customers each day of the week and records 250 customers in a week, the (f_e) is?
1/7 x 250 = 35.71

EX: (CHI INDEPENDENCE)
In a study examining the association between gender and political party preference, if the total number of male respondents is 230, the total number of Republican respondents is 250, and the total number of respondents is 500, the (f_e) is?
f_e = (row total × column total) / grand total = (230 × 250)/500 = 115

A

Expected frequency

44
Q

CHAPTER 5 CHI-SQUARE STATISTICS

____ Non-parametric statistical test used to determine whether there is a significant association between categorical variables. It compares the observed frequencies in each category to the frequencies expected under a specific hypothesis.
• It measures the discrepancy between the observed and expected frequencies. A larger (chi²) value indicates a greater discrepancy, suggesting that the observed frequencies differ significantly from the expected frequencies.

• CHI GOODNESS OF FIT: Tests whether the observed frequency distribution of a single categorical variable matches an expected distribution
• CHI TEST OF INDEPENDENCE: Tests whether two categorical variables are independent of each other by comparing the observed frequencies in a contingency table to the expected frequencies

A

Chi square statistics
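Both variants can be sketched with scipy (the daily customer counts, echoing the 250-customer example from the expected-frequency card, and the 2x2 table are invented):

```python
# Chi-square goodness of fit and test of independence (invented counts).
from scipy import stats

# Goodness of fit: are 250 weekly customers spread evenly over 7 days?
observed = [50, 30, 28, 32, 35, 40, 35]        # sums to 250
chi2, p = stats.chisquare(observed)            # expected defaults to equal
print(round(chi2, 2))                          # compare against df=6 critical value

# Test of independence on a 2x2 contingency table where the observed
# counts exactly match the expected ones, so chi-square is 0.
table = [[115, 115],
         [135, 135]]
chi2_ind, p_ind, df, expected = stats.chi2_contingency(table)
print(df)                                      # (2-1)*(2-1) = 1
```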

45
Q

CHAPTER 5 CHI-SQUARE STATISTICS
___ continuous probability distribution that arises in statistics when the sum of the squares of independent standard normal random variables is calculated.
• The shape of the chi-square distribution is determined by the degrees of freedom (k), which typically represent the number of independent standard normal variables being summed.
• The chi-square distribution is always non-negative because it is based on squared values.
• The distribution is positively skewed, especially for lower degrees of freedom. As the degrees of freedom increase, the distribution becomes more symmetrical
• The distribution changes with the degrees of freedom. For small (k) the distribution is highly skewed; for larger (k) it approaches a normal distribution.
• DF : Ex (5-1= 4)
• The mean of a chi-square distribution is equal to the degrees of freedom (k)
• The variance is 2k

A

Chi square distribution

46
Q

CHAPTER 5 CHI-SQUARE STATISTICS
___ these values are used to determine whether the observed data significantly deviates from the expected data under a given hypothesis.
HOW TO USE?
1. DF - For goodness of fit, df is the number of categories minus one. For a test of independence, df is calculated as (number of rows (r) − 1) × (number of columns (c) − 1)
2. Choose alpha (0.05)
3. Locate the intersection of the df row and the significance level column in the table to find the critical value

A

Chi square table

47
Q

CHAPTER 5 CHI-SQUARE STATISTICS

___ also known as (cross tabs) A type of table in matrix format that displays the frequency distribution of variables. Used to analyze the relationship between two or more categorical variables.
• ROWS : represent one variable (EX: gender, male or female)
• COLUMNS : represent the other variable (EX: computer preference, MAC OR PC)
• Marginal totals - The totals for each row and column (ex: 50 males and 50 females participated in the study)
• Cell frequencies - The counts within each cell (Ex: 30 males prefer MAC, 20 males prefer PC)

A

Contingency table

48
Q

CHAPTER 5 CHI-SQUARE STATISTICS

___ (mean square contingency coefficient) measure of association between two binary variables
• It quantifies the strength and direction of the relationship between these variables, similar to the Pearson correlation coefficient but specifically for binary data.
1. RANGE - -1 to +1
• + 1 perfect positive association
• -1 perfect negative association
• 0 no association
2. SYMMETRY - The phi coefficient is symmetrical meaning it does not matter which variable is placed in the rows or columns of the contingency table

FORMULA : φ = (AD − BC) / √((A+B)(C+D)(A+C)(B+D))
• A, B, C, D are the cell frequencies in a 2x2 contingency table

A

Phi coefficient
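The formula above is easy to evaluate by hand for one table (the four cell counts below are invented):

```python
# Phi coefficient from a 2x2 contingency table (invented counts).
import math

A, B, C, D = 30, 20, 10, 40   # the four cells of the 2x2 table

phi = (A * D - B * C) / math.sqrt((A + B) * (C + D) * (A + C) * (B + D))
print(round(phi, 2))   # about 0.41: a moderate positive association
```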

49
Q

CHAPTER 5 CHI-SQUARE STATISTICS

___ a measure of association between two nominal variables. Ranging from 0 to 1
• 0 indicates no association
• 1 indicates perfect association
• It is symmetrical, meaning it does not matter which variable is placed in the rows or columns
• Suitable for larger contingency tables (more than 2x2); for 2x2 tables it is equivalent to the phi coefficient.

A

Cramer’s phi