Statistical Analysis Flashcards

1
Q

Independent Samples t-test

Definition and Assumptions

A

Definition:
Determine if there is a significant difference between the means of two independent groups.

Assumptions:
Data from each group are independent.
Data are approximately normally distributed.
The variances of the two groups are approximately equal.
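
A minimal sketch of how the test statistic could be computed by hand, assuming the equal-variance (pooled) form of the test that matches the assumptions above. The function name and the scores are hypothetical, chosen only for illustration.

```python
import math
import statistics


def independent_t(group_a, group_b):
    """Return (t, df) for a pooled-variance independent samples t-test."""
    n1, n2 = len(group_a), len(group_b)
    mean1, mean2 = statistics.mean(group_a), statistics.mean(group_b)
    var1, var2 = statistics.variance(group_a), statistics.variance(group_b)

    df = n1 + n2 - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    # Standard error of the difference between the two means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Hypothetical scores for two independent groups.
group_a = [2, 4, 6, 8, 10]
group_b = [1, 3, 5, 7, 9]
t_stat, df = independent_t(group_a, group_b)
print(t_stat, df)
```

The t value would then be compared against a t distribution with df degrees of freedom to obtain a p-value.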

2
Q

Analysis with t-test

Analysis and Interpretation

A

Compare the p-value with the chosen significance level (usually 0.05).

If the p-value is less than the significance level (e.g., p < 0.05), it suggests that there is a significant difference.

If the p-value is greater than or equal to the significance level (e.g., p ≥ 0.05), there is insufficient evidence to conclude a significant difference.

3
Q

Repeated Samples (Paired) t-test

A

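Definition: Tests for a significant difference between the means of two related sets of scores, where each subject is measured at two time points or under two conditions. (Three or more measurements call for a repeated-measures ANOVA instead.)

Example: math skills measured over time, at the beginning (time 1) and at the end (time 2).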
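
A rough sketch of the calculation, assuming the standard paired form in which the test is run on each subject's difference score. The function name and the scores are hypothetical.

```python
import math
import statistics


def paired_t(time1, time2):
    """Return (t, df) for a paired t-test on two related sets of scores."""
    # One difference score per subject: later measurement minus earlier one.
    diffs = [after - before for before, after in zip(time1, time2)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference.
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / se, n - 1


# Hypothetical math scores for four students at time 1 and time 2.
time1 = [10, 12, 9, 11]
time2 = [12, 15, 10, 15]
t_stat, df = paired_t(time1, time2)
print(t_stat, df)
```

Because each subject serves as their own control, the degrees of freedom are the number of pairs minus 1.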

4
Q

Mean

A

Average of a set of numbers, calculated by summing all the values and dividing by the total count.

Calculation: Mean = sum of all values / total count
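
As a quick illustration of the calculation, with made-up values:

```python
values = [4, 8, 15, 16, 23, 42]  # hypothetical data

# Mean = sum of all values / total count
mean = sum(values) / len(values)
print(mean)  # 18.0
```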

5
Q

Median

A

The middle value of a dataset when arranged in ascending order (with an even number of values, the average of the two middle values).
A better representative of the central value than the mean, especially when the data are skewed or contain outliers.
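
A small sketch of why the median holds up better with outliers, using Python's standard `statistics` module. The income figures are invented for illustration.

```python
import statistics

incomes = [30, 32, 35, 38, 40]   # hypothetical incomes (in thousands)
with_outlier = incomes + [500]   # add one extreme value

median_before = statistics.median(incomes)       # 35
median_after = statistics.median(with_outlier)   # 36.5, average of 35 and 38
mean_before = statistics.mean(incomes)           # 35
mean_after = statistics.mean(with_outlier)       # 112.5

print(median_before, median_after, mean_before, mean_after)
```

The single outlier barely moves the median but more than triples the mean.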

6
Q

Mode

A

The value that appears most frequently in a dataset.
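
A short sketch using the standard `statistics` module; the data are hypothetical. `multimode` is used for the second example because a dataset can have more than one most frequent value.

```python
import statistics

single_mode = statistics.mode([1, 2, 2, 3])           # 2

# Two values tie for most frequent, so both are returned.
shoe_sizes = [7, 8, 8, 9, 9, 10]
all_modes = statistics.multimode(shoe_sizes)          # [8, 9]

print(single_mode, all_modes)
```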

7
Q

Histogram

A

A graph showing how often different values occur in a dataset. It’s like splitting data into groups and counting how many values fall into each group.
Helps us see if data is skewed, has outliers, or follows a specific pattern, like a bell curve for normal distribution.

Helps us understand how data is spread out
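
The "split into groups and count" idea can be sketched without a plotting library. This is only an illustration: the function name, the scores, and the bin width of 10 are assumptions.

```python
from collections import Counter


def histogram_counts(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


scores = [52, 55, 61, 64, 67, 68, 71, 73, 75, 88]  # hypothetical whole-number scores
counts = histogram_counts(scores, bin_width=10)

# Print a text histogram: one '#' per value in the bin.
for start, count in counts.items():
    print(f"{start}-{start + 9}: {'#' * count}")
```

Changing `bin_width` changes the shape of the picture, which is why the choice of bins matters when reading a histogram.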

8
Q

Histograms and Normality

A

A normal distribution looks like a symmetric, bell-shaped curve. It means most data points are in the middle, tapering off towards the ends.
This shape indicates that the data are spread symmetrically around the mean, so the mean, median, and mode coincide and the proportion of values within any distance of the mean is predictable.

9
Q

Normality

A

Data are symmetrically distributed around the mean, with the majority of values clustered near the center and fewer values spread out towards the tails.

Assessed visually using histograms, Q-Q plots, or box plots.

Parametric tests such as t-tests, ANOVA, and regression rely on the assumption of normality.

When the assumption holds, the results of these tests are more representative and valid.

10
Q

Sample Size and Normality

A

As the sample size increases, the variability of sampling distribution decreases. Also, as the sample size increases the shape of the sampling distribution becomes more similar to a normal distribution regardless of the shape of the population.

Rule of thumb: at least 30 participants are needed for the sampling distribution of the mean to be approximately normal.
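
A small simulation sketch of the first claim, that the variability of the sampling distribution shrinks as the sample size grows. The skewed (exponential) population, the seed, and the number of repeated samples are all assumptions made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable


def sample_means(sample_size, n_samples=2000):
    """Means of repeated random samples from a skewed (exponential) population."""
    return [
        statistics.mean(random.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


# Spread of the sampling distribution of the mean for small and larger samples.
spread_small = statistics.stdev(sample_means(5))
spread_large = statistics.stdev(sample_means(30))
print(spread_small, spread_large)
```

Plotting a histogram of each list of means would also show the n = 30 distribution looking much closer to a bell curve than the population it came from.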

11
Q

Q-Q Plot Graph

A

A Q-Q (quantile-quantile) plot compares our data to a perfect “normal” dataset. It plots the quantiles of our data against the quantiles we would expect from an ideal normal distribution.

12
Q

Normality and Q-Q Plot Graph

A

Each dot pairs a quantile of our data with the matching quantile of a normal distribution. If the dots fall along a straight line, our data is approximately “normal”. If they curve or stray from the line, it’s not.
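
A sketch of how the dots on a Q-Q plot could be computed, using `NormalDist` from the standard library. The function name, the data, and the `(i + 0.5) / n` plotting position are assumptions; statistical packages use several slightly different conventions.

```python
from statistics import NormalDist, mean, stdev


def qq_points(data):
    """Pair each sorted observation with the matching normal quantile."""
    n = len(data)
    # Reference normal distribution with the same mean and SD as the data.
    reference = NormalDist(mean(data), stdev(data))
    # (i + 0.5) / n is one common choice of plotting position.
    theoretical = [reference.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(data)))


data = [4.1, 4.8, 5.0, 5.2, 5.3, 5.9, 6.4]  # hypothetical measurements
points = qq_points(data)
for expected, observed in points:
    print(f"expected {expected:.2f}  observed {observed:.2f}")
```

If the observed values track the expected ones closely, the points would fall near a straight diagonal line when plotted.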

13
Q

Box Plot

A

Displays the five-number summary of a set of data: the minimum, first quartile, median, third quartile, and maximum.

Gives insights into the variability and central tendency of the data, as well as the presence of outliers across groups or conditions.
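
A sketch of the five-number summary using the standard `statistics` module. The function name and data are hypothetical, and the "inclusive" quartile method is an assumption; other quartile conventions give slightly different values.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data
summary = five_number_summary(data)
print(summary)  # (1, 3.0, 5.0, 7.0, 9)
```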

14
Q

Scatterplot

A

Shows the relationship between two continuous variables, with each data point representing an observation.

Allows for the identification of patterns, trends, or correlations between variables.

15
Q

Bar Chart

A

A representation of categorical data, where the height or length of each bar represents the frequency or proportion of observations in each category.

Facilitates comparisons between categories and visualizes differences in frequencies or proportions across groups or conditions.

16
Q

Line Graph

A

Individual data points are connected by straight lines, typically used to show trends or changes over time.

Suited to longitudinal data or changes in variables across different time points or conditions.

17
Q

Degrees of Freedom

A

Tell us how many values in the data are free to vary once the statistics we have already estimated (such as the group means) are fixed.

For an independent samples t-test: df = n1 + n2 - 2 (the total sample size minus the number of groups).

Crucial in determining the appropriate critical values for hypothesis testing and estimating the variability of sample statistics.

18
Q

Sampling distribution of mean differences

A

The distribution of the differences in means between two samples that are randomly drawn from the same population.

Helps us understand how much variability we might expect in the differences between sample means.

19
Q

Sampling distribution of mean differences and t-test

A

Used when comparing the means of two independent groups to determine if there is a significant difference between them.

By comparing the observed difference in sample means to the distribution of mean differences from the sampling distribution, we can assess whether the observed difference is statistically significant (if it is larger than the variability expected by chance, it is significant; if it falls within that variability, it is not).
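
The comparison can be sketched with a simulation instead of the t formula: repeatedly draw two samples from one population, record the difference in means, and see how often chance alone produces a difference as large as the observed one. The population, sample sizes, seed, and observed difference of 12 are all hypothetical.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable


def null_mean_differences(population, n1, n2, n_draws=5000):
    """Differences in means of two random samples drawn from one population."""
    diffs = []
    for _ in range(n_draws):
        sample_a = random.sample(population, n1)
        sample_b = random.sample(population, n2)
        diffs.append(statistics.mean(sample_a) - statistics.mean(sample_b))
    return diffs


# Hypothetical population of scores with mean 100 and SD 15.
population = [random.gauss(100, 15) for _ in range(10_000)]
diffs = null_mean_differences(population, n1=20, n2=20)

observed = 12.0  # hypothetical observed difference between two group means
# Share of chance differences at least as large as the observed one.
share_as_extreme = sum(abs(d) >= observed for d in diffs) / len(diffs)
print(share_as_extreme)
```

A small share means the observed difference is larger than chance variability would usually produce, which is the logic the t-test formalizes.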

20
Q

P-value

A

A p-value, or probability value, is a number describing how likely it is that data at least as extreme as yours would have occurred by random chance if the null hypothesis were true.

Quantifies the evidence against a null hypothesis.

21
Q

P-value interpretation

A

The smaller the p-value, the less likely results this extreme would occur by random chance alone, and the stronger the evidence that you should reject the null hypothesis.

alpha: a set probability threshold (often 0.05)
A p-value less than or equal to your significance level (typically ≤ 0.05) is statistically significant.

It’s a piece of evidence, not a definitive proof.

22
Q

Effect size measure

A

Effect size measures quantify how big the difference or relationship is between variables in a study. They show how meaningful the findings are beyond just whether they’re statistically significant.

Examples: Cohen’s d, Pearson’s r, phi coefficient, partial eta-squared.

23
Q

Effect Size

A

Effect size is like a ruler for measuring how strong a relationship is between things or how big a difference is between groups in a study.

It is a way to talk about how important the findings are, so researchers can compare results from different studies easily.

24
Q

Cohen’s d

A

An effect size measure used to quantify the standardized difference between two group means in a study.

d = (mean 1 - mean 2) / pooled SD

Helps compare the magnitude of differences between groups in various studies, regardless of sample size.

Used with two groups (independent t-test).
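
A minimal sketch of the calculation, assuming the pooled standard deviation of the two groups as the denominator. The function name and the scores are hypothetical.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(group_a), len(group_b)
    var1, var2 = statistics.variance(group_a), statistics.variance(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


treatment = [6, 8, 10, 12, 14]  # hypothetical scores
control = [2, 4, 6, 8, 10]
d = cohens_d(treatment, control)
print(round(d, 2))  # 1.26
```

Unlike the t statistic, d does not grow just because the samples get larger, which is what makes it comparable across studies.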

25
Q

Pearson’s r

A

Pearson’s r is used to quantify the strength and direction of the linear relationship between two continuous variables.

Range: -1 (perfect negative correlation) to +1 (perfect positive correlation)
Magnitude: Closer to 1 indicates a stronger correlation.

Helps understand how closely related two variables are and whether changes in one variable predict changes in the other.
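
A sketch of the calculation written out by hand, so each step is visible. The function name and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # How x and y vary together, and how much each varies on its own.
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return co_variation / math.sqrt(ss_x * ss_y)


hours = [1, 2, 3, 4, 5]  # hypothetical study hours
perfect = pearson_r(hours, [2, 4, 6, 8, 10])    # perfect positive relationship
inverse = pearson_r(hours, [10, 8, 6, 4, 2])    # perfect negative relationship
noisy = pearson_r(hours, [2, 1, 4, 3, 5])       # strong but imperfect
print(perfect, inverse, noisy)
```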

26
Q

Displaying Cohen’s d

A

Bar graph: two bars representing the means of each group, with error bars indicating the variability or standard error of the means.
Box plot: two boxes representing the distribution of scores in each group, with whiskers indicating the range of the data and possibly outliers.

Both displays compare two groups side by side, and the difference between the group means is what Cohen’s d standardizes.

27
Q

Violin Plot

A

Combines a box plot with a kernel density plot to show the distribution of data.
Provides a visual representation of data distribution, including information about central tendency, spread, and multimodality.
Suitable for comparing distributions across different groups or visualizing the distribution of a single variable.

28
Q

Advantages of Violin Plot

A

Individual Data Points: Shows the distribution and density of data points, providing insights into clusters or gaps.
Outliers: Easily noticeable due to the combination of box plot and kernel density plot.
Statistics: Displays key summary statistics such as mean, median, and quartiles, enhancing interpretability.

29
Q

Disadvantages of Violin Plot

A

Complexity: Can be visually complex, especially when comparing multiple groups or variables.
Limited for Nominal Data: Less effective for nominal or categorical data compared to continuous data.

30
Q

Advantages of Box Plots

A

Efficiently summarizes the distribution of data, including median, quartiles, and outliers.
Useful for identifying skewness, variability, and outliers in the data.
Facilitates easy comparison of distributions between groups or categories.

31
Q

Disadvantages of Box Plots

A

May not provide detailed information about the shape of the distribution or individual data points.
Less effective for displaying the density or frequency of data points compared to histograms.

32
Q

Advantages of Histogram

A

Provides a visual representation of the frequency distribution of data.
Allows for easy identification of patterns, central tendency, and spread.
Suitable for displaying both continuous and discrete data.

33
Q

Disadvantages of Histogram

A

Choice of bin width can influence the appearance and interpretation of the histogram.
May not accurately represent the underlying distribution if the number of bins is not chosen appropriately.
Not as effective for comparing distributions between groups or categories as box plots.

34
Q

Advantages of Scatter Plot

A

Visualizes the relationship between two continuous variables.
Allows for the identification of patterns, trends, and correlations in the data.
Useful for detecting outliers and assessing the strength and direction of relationships.

35
Q

Disadvantages of Scatter Plot

A

Limited to visualizing relationships between two variables and may not capture more complex patterns.
Requires a large sample size to accurately represent the underlying population distribution.
May be less effective for categorical or ordinal data compared to continuous data.

36
Q

Relative Frequency

A

The proportion or percentage of data values that fall into each category or interval.
Relative frequency = frequency of a category or interval / total number of data points.
Provides insights into the distribution of data by showing the proportion of observations in each category or interval.
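
The formula can be sketched in a few lines; the function name and the blood-type sample are made up for illustration.

```python
from collections import Counter


def relative_frequencies(values):
    """Relative frequency = frequency of a category / total number of data points."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


blood_types = ["O", "A", "A", "O", "B", "O", "AB", "A", "O", "O"]  # hypothetical sample
rel_freq = relative_frequencies(blood_types)
print(rel_freq)  # {'O': 0.5, 'A': 0.3, 'B': 0.1, 'AB': 0.1}
```

The relative frequencies always add up to 1 (or 100%).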

37
Q

Cumulative Frequencies

A

The running total of frequencies as you move through the categories or intervals from the lowest to the highest.
Cumulative frequency of a category or interval = sum of frequencies up to that category or interval.
Helps visualize the accumulation of data values and identify patterns or trends in the distribution.
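
A short sketch of the running total using `itertools.accumulate`; the intervals and frequencies are hypothetical.

```python
from itertools import accumulate

intervals = ["0-9", "10-19", "20-29", "30-39"]  # ordered from lowest to highest
frequencies = [3, 7, 12, 8]                     # hypothetical counts per interval

# Each entry is the sum of all frequencies up to and including that interval.
cumulative = list(accumulate(frequencies))

for interval, total in zip(intervals, cumulative):
    print(interval, total)
```

The last cumulative frequency equals the total number of data points (here 30).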

38
Q

Continuous Data

A

Data that can take any value within a certain range and can be measured with precision.
Examples: height, weight, temperature, time.

Can take an infinite number of values within a range.
Often measured using instruments with fine precision.
Can be subdivided into smaller units (e.g., fractions or decimals).

39
Q

Discrete Data

A

Data that can only take specific values and cannot be subdivided further.
Examples: number of siblings, number of cars in a parking lot, number of goals scored in a football match.

Can only take distinct, separate values.
Often represented by integers.
Cannot be measured with infinite precision.

40
Q

Continuous vs Discrete

A

Continuous data can take any value within a range, while discrete data can only take specific, distinct values.

Continuous data is measured with precision, often using instruments, while discrete data is counted or observed.

Continuous data can be subdivided into smaller units, such as fractions or decimals, while discrete data cannot be further divided.

41
Q

Bar Chart vs. Histogram for Categorical Groups

A

Bar Chart: Use for distinct categories to compare frequencies.
Histogram: Use for continuous data to display distribution.

42
Q

Misleading Bar Chart

A

Adjusting the scale of the y-axis to exaggerate differences between categories.

Makes differences between categories appear larger or smaller than they actually are, leading to inaccurate interpretations or conclusions.

e.g. A bar chart with a truncated y-axis that starts at a value greater than zero, making small differences between categories appear larger than they are.

43
Q

Sample Standard Deviation Denominator (n - 1)

A

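It’s a way to calculate standard deviation that helps us get a better estimate of how spread out the population is.
Deviations are measured from the sample mean, which sits closer to the sample’s own values than the true population mean does, so dividing by n would underestimate the spread. Dividing by n - 1 instead of n corrects for this and makes our estimate more accurate.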
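
The two denominators can be compared directly with the standard `statistics` module, where `pstdev` divides by n and `stdev` divides by n - 1. The sample values are hypothetical.

```python
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

sd_divide_by_n = statistics.pstdev(sample)          # denominator n
sd_divide_by_n_minus_1 = statistics.stdev(sample)   # denominator n - 1

print(sd_divide_by_n)                     # 2.0
print(round(sd_divide_by_n_minus_1, 3))   # 2.138
```

The n - 1 version is always slightly larger, and the gap shrinks as the sample size grows.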

44
Q

Statistical Inference

A

It’s like making educated guesses about a whole group based on what we see in a smaller group.
This helps us test ideas, draw conclusions about larger groups, and figure out if what we’re seeing is really important or just random.

45
Q

Probability on Standard Normal Curve

A

There’s a 50% chance (or 0.5) that something falls below the middle point.
Because the normal curve is perfectly symmetrical, half of the data falls below the middle point, which is why it’s 50%.

46
Q

Percentage between -1 and +1 SD

A

Around 68% of the data falls within one standard deviation of the average.
This is a rule we use that says most of our data (about 68%) is within a certain distance from the average, which helps us understand how spread out our data is.
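
Both this figure and the 50% area below the mean can be checked with `NormalDist` from the standard library; a brief sketch:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, SD 1

below_mean = z.cdf(0)                   # area below the middle point
within_one_sd = z.cdf(1) - z.cdf(-1)    # area between -1 SD and +1 SD

print(below_mean)                # 0.5
print(round(within_one_sd, 4))   # 0.6827
```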

47
Q

Chi-Square Test

A

Statistical test used to determine if there is a significant association between categorical variables.
Assess whether observed frequencies differ significantly from expected frequencies.

Related tests: Pearson chi-square, likelihood ratio chi-square, Fisher’s exact test.

48
Q

Variables Suitable for Chi-Square Goodness-of-Fit Test

A

Used when comparing observed frequencies to expected frequencies within one categorical variable.

Testing whether observed frequencies of blood types in a population match the expected frequencies based on a genetic model.

49
Q

Chi-Square Goodness-of-Fit Test Calculation

A

χ² = Σ((O-E)² / E)

where O is the observed frequency, E is the expected frequency, and Σ represents the summation over all categories of the variable.

Calculate expected frequencies, find the difference between observed and expected frequencies, square the differences, divide by expected frequencies, sum up these values to get the chi-square statistic.

Compare the calculated chi-square value to a critical value from the chi-square distribution to determine significance.
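
The steps above can be sketched as follows. The function name and the observed and expected counts are hypothetical; the critical value 7.815 is the standard table value for 3 degrees of freedom (number of categories minus 1) at a significance level of 0.05.

```python
def chi_square_gof(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [50, 30, 15, 5]    # hypothetical observed counts in four categories
expected = [40, 40, 10, 10]   # hypothetical counts predicted by a model

chi_sq = chi_square_gof(observed, expected)
df = len(observed) - 1        # categories minus 1

critical_value = 7.815        # table value for df = 3, alpha = 0.05
significant = chi_sq > critical_value
print(chi_sq, df, significant)  # 10.0 3 True
```

Because the statistic exceeds the critical value, these made-up counts would differ significantly from the expected frequencies.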

50
Q

Follow-up after Significant Chi-square Test

A

To explore the nature of the significant association found in the chi-square test.
Post-Hoc Analysis: Conduct additional analyses to determine which categories are driving the significant result.

If the chi-square test indicates a significant association between two categorical variables, follow-up analyses such as residual analysis or pairwise comparisons can be performed to identify specific categories contributing to the association.