220 final exam last chap Flashcards

1
Q

What does statistical conclusion validity refer to?
A. Accuracy of measuring instruments
B. Whether we used qualitative data
C. Reasonableness of conclusions about variable relationships
D. Whether our sample matches the population

A

C. Reasonableness of conclusions about variable relationships

➡️ Is there enough statistical evidence to say, “Yes, these variables are related”?

Statistical conclusion validity is about using proper statistical methods to determine whether your data truly supports a relationship between the IV and DV, or if it’s just a result of random chance.

It’s like ensuring your data doesn’t lead you to the wrong conclusion by applying the right statistical tests, having an appropriate sample size, and avoiding errors like:

Type I errors (false positives)

Type II errors (false negatives)

2
Q

What are the two possible conclusions in statistical analysis?
A. One variable caused the other OR there’s no effect
B. Correlation is weak OR strong
C. We used descriptive OR inferential statistics
D. There is a relationship OR there isn’t a relationship

A

D. There is a relationship OR there isn’t a relationship

If you’re studying the relationship between study hours and test scores, you could conclude that there is a relationship (study hours affect test scores) or there isn’t a relationship (study hours do not impact test scores).

Conclusion validity checks if the math (stats) shows a real result or just random chance.

Construct validity checks if you’re measuring what you actually meant to measure.

Internal validity checks if your result was truly caused by your variable, not something else.

External validity checks if your results can apply to other people or situations.

3
Q

What is a Type I error?
A. Failing to detect a real relationship
B. Concluding a relationship exists when it doesn’t
C. Mixing up descriptive and inferential stats
D. Ignoring statistical power

A

B. Concluding that a relationship exists when it doesn’t

A study looks at whether gang programs reduce youth crime.
The results show an effect, so the researchers say the program works.

But in truth, the program makes no difference; the apparent effect was just random chance in the sample.

4
Q

What is a Type II error?
A. Concluding no relationship exists when there actually is one
B. Assuming causation from correlation
C. Using the wrong variables
D. Using descriptive stats for multivariate data

A

A. Concluding no relationship exists when there actually is one

A researcher studies whether police patrols reduce car thefts.
They find no significant effect, so they say patrols don’t help.

But in reality, patrols do reduce thefts — the study just didn’t have enough data.

✅ A real relationship exists
❌ The study failed to detect it

5
Q

Type II Threats (The Haystack Problem)
In the “needle in a haystack” analogy, what does the needle represent?
A. Sample size
B. External validity
C. The true relationship you’re trying to detect
D. Random assignment

A

C. The true relationship you’re trying to detect

The needle: the relationship you are trying to see

The haystack: the “noise” that obscures your vision

Type 1 Threat: Searching for patterns randomly can make you find false relationships.

Example: Finding a fake link between eating chocolate and happiness because you didn’t have a clear plan.

Type 2 Threat: A weak study design can make you miss real relationships.

Example: Missing the effect of a drug because your study had too small of a sample size.

6
Q

Which of the following is NOT a source of noise in statistical conclusion validity?
A. Random heterogeneity of participants
B. Poor implementation fidelity
C. High statistical power
D. Low reliability of measures

A

C. High statistical power

The ones that are sources of noise are listed below:

Low reliability of measures
Poor implementation fidelity
Random irrelevancies in the setting
Random heterogeneity of participants
Insufficient statistical power

Statistical noise = Random variations in data due to factors that aren’t part of the study, like individual differences, measurement errors, or other variables that affect the outcome.

7
Q

What does insufficient statistical power affect?
A. The central tendency of the data
B. Ability to detect the needle in the haystack
C. Calculation of percentiles
D. External validity

A

B. Ability to detect the needle in the haystack

Haystack = noise/sample size/variability

Small needle in a big haystack = hard to find = low statistical power
Big needle = easy to find -> high statistical power

8
Q

If a result is significant at p < .05, what does this mean?
A. There’s less than a 5% chance the result is due to luck
B. It’s more than 50% true
C. It’s always generalizable
D. The test was one-tailed

A

A. There’s less than a 5% chance the result is due to luck

A p-value below .05 indicates a statistically significant result: if there were really no relationship, a result this strong would show up by random chance less than 5% of the time. A p-value above .05 means the result is not significant and could plausibly be random variation.

If the p-value is less than 0.05 (p < 0.05), the result is statistically significant (unlikely to be chance alone).

If the p-value is greater than 0.05 (p > 0.05), the result is not statistically significant (chance cannot be ruled out).

9
Q

Which of the following are conventional levels of significance (the standard cutoffs for deciding if a result is statistically significant)?
A. .75, .25, .10
B. .05, .01, .001
C. .95, .85, .50
D. .03, .07, .15

A

B. .05, .01, .001

A conventional level of significance refers to the commonly accepted thresholds that researchers use to decide if a result is statistically significant.

So if the p-value falls below .05, the result is considered statistically significant.

10
Q

What is a Type I threat often caused by?
A. Small sample sizes
B. Data fishing (running too many tests)
C. Using qualitative variables
D. High statistical power

A

B. Data fishing (running too many tests)

A Type I threat means you say there is a relationship when really there isn’t one (a false positive).

Out of many tests, one might show a result that looks promising (typically one considered “significant”).
The researcher might think this is a real finding because it’s what they were looking for.

Running many tests this way increases the risk of a Type I error.
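
A rough sketch of why data fishing raises the Type I risk. It assumes every test is independent and that no real relationship exists in any of them; the function name is made up for illustration.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests
    when there is truly no relationship in any of them."""
    return 1 - (1 - alpha) ** num_tests

# One test keeps the risk at alpha; many tests push it up quickly.
for k in (1, 5, 20):
    print(k, round(familywise_error_rate(k), 3))
```

With 20 tests at the .05 level, the chance of at least one "significant" result by luck alone is roughly 64% under these assumptions.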

11
Q

What is the main purpose of descriptive statistics?
A. Summarize or describe sample data
B. Prove a hypothesis
C. Generalize to populations
D. Predict future events

A

A. Summarize or describe sample data

12
Q

What is the purpose of inferential statistics?
A. Clean the data
B. Find frequency distributions
C. Make conclusions about populations from sample data
D. Create visual charts

A

C. Make conclusions about populations from sample data

Inferential statistics use a sample of data to make generalizations or predictions about a larger population. They help researchers go beyond the data at hand and draw conclusions about a broader group.

This lets researchers study large populations without needing to ask every single person (sampling).

13
Q

What does “univariate” analysis focus on?
A. Two variables
B. One variable
C. No variables
D. Three or more variables

A

B. One variable

Univariate analysis focuses on just one variable at a time.
You’re not comparing it to any other variable.
Example: You collect data on students’ test scores and look at the average score.
That’s univariate: you’re only analyzing test scores, nothing else.

14
Q

What are the three main features of univariate analysis?
A. Distribution, central tendency, dispersion
B. Sampling, testing, plotting
C. Normality, reliability, validity
D. Mean, correlation, frequency

A

A. Distribution, central tendency, dispersion

Distribution: How the values are distributed, meaning how often each value occurs.

Central Tendency: The “center” of the data (mean, median, mode).

Dispersion: How spread out the data is (range, variance, standard deviation).

Imagine two classes of students with their test scores:

Class A: 90, 91, 92, 93, 94

Class B: 60, 70, 80, 90, 100

Distribution: Both classes have scores, but Class A’s scores are closely packed around the 90s, while Class B’s scores are more spread out across a larger range.

Dispersion: Class B has higher dispersion because their scores vary more widely (from 60 to 100), while Class A has low dispersion because the scores are tightly grouped together.
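
To check the two classes with numbers, here is a small sketch using Python's standard `statistics` module. It uses the population standard deviation (`pstdev`), which is an assumption; your course may divide by n − 1 instead.

```python
import statistics

class_a = [90, 91, 92, 93, 94]
class_b = [60, 70, 80, 90, 100]

def describe(scores):
    """Return central tendency and dispersion measures for one variable."""
    return {
        "mean": statistics.mean(scores),
        "range": max(scores) - min(scores),
        "sd": statistics.pstdev(scores),  # population SD (assumption)
    }

print("Class A:", describe(class_a))
print("Class B:", describe(class_b))
```

Class B's range (40) and standard deviation come out roughly ten times larger than Class A's, which matches the "more spread out" description.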

15
Q

What is the mean?
A. Middle number in a list
B. Most frequent value
C. Average of all values
D. Difference between max and min

A

C. Average of all values

16
Q

What is the median for this list: 0, 2, 3, 3, 4, 4, 6?
A. 2
B. 3
C. 3.5
D. 4

17
Q

What is the mode in this list: 0, 2, 3, 3, 4, 4, 4, 6?
A. 3
B. 2
C. 4
D. 6
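
The median and mode questions in cards 16 and 17 can be checked with Python's standard `statistics` module. This is only a quick check of the two lists given in the questions; the variable names are made up.

```python
import statistics

median_list = [0, 2, 3, 3, 4, 4, 6]    # card 16: 7 values, so the middle is the 4th
mode_list = [0, 2, 3, 3, 4, 4, 4, 6]   # card 17: 4 appears three times

print(statistics.median(median_list))  # 3
print(statistics.mode(mode_list))      # 4
```

So the median is 3 (option B in card 16) and the mode is 4 (option C in card 17).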

18
Q

In a normal distribution, which statements are true?
A. Mean is higher than median
B. Mode is the lowest value
C. Mean = Median = Mode
D. There’s no dispersion

A

C. Mean = Median = Mode

Imagine you measured the heights of 100 people. If their heights form a normal distribution:

The mean height will be the same as the median height (middle value) and the mode (most common height).

19
Q

Why might the mean be misleading?
A. It’s hard to calculate
B. It can be affected by outliers
C. It doesn’t apply to normal data
D. It’s always the same as the median

A

B. It can be affected by outliers (unusually extreme values)

Suppose you are looking at the number of burglaries in 5 neighborhoods:

Neighborhood 1: 5 burglaries

Neighborhood 2: 7 burglaries

Neighborhood 3: 6 burglaries

Neighborhood 4: 4 burglaries

Neighborhood 5: 50 burglaries (this is the outlier)

Calculation of Mean:
Add up the burglaries: 5 + 7 + 6 + 4 + 50 = 72 burglaries

Divide by the number of neighborhoods (5): 72 ÷ 5 = 14.4 burglaries

The mean number of burglaries is 14.4, but most neighborhoods have far fewer burglaries than that. Neighborhood 5 has an unusually high number of burglaries, which pulls the mean up, making it seem like burglaries are higher across all neighborhoods than they actually are.
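
A quick sketch comparing the mean with the median for the burglary counts above. It shows how a single outlier moves the mean but barely touches the median.

```python
import statistics

burglaries = [5, 7, 6, 4, 50]        # neighborhoods 1-5 from the example
without_outlier = burglaries[:-1]    # drop neighborhood 5

print(statistics.mean(burglaries))       # 14.4
print(statistics.median(burglaries))     # 6
print(statistics.mean(without_outlier))  # 5.5
```

The median (6) is much closer to what a typical neighborhood looks like than the mean (14.4).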

20
Q

What is the range?
A. The average of all scores
B. The difference between highest and lowest values
C. The middle value
D. The most common value

A

B. The difference between highest and lowest values

21
Q

What does a percentile rank show?
A. What percent of scores fall at or below a value
B. How tall the histogram is
C. The SD of scores
D. The average error

A

A. What percent of scores fall at or below a value

A percentile rank tells you how many people scored lower than or the same as you.

Imagine you are in a criminology class with 100 students, and you get a score of 80 on a test.

If your percentile rank is 90, it means 90% of the students scored lower than or the same as you.

So, you scored as well as or better than 90 out of 100 students.
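
A small sketch of the "at or below" definition. The class scores are made up for illustration (10 students instead of 100), and `percentile_rank` is a hypothetical helper, not a library function.

```python
def percentile_rank(scores, value):
    """Percent of scores that fall at or below the given value."""
    at_or_below = sum(1 for s in scores if s <= value)
    return 100 * at_or_below / len(scores)

# Hypothetical test scores for a small class.
scores = [55, 60, 65, 70, 72, 75, 78, 79, 80, 95]

print(percentile_rank(scores, 80))  # 90.0
```

Nine of the ten made-up scores are at or below 80, so a score of 80 has a percentile rank of 90.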

22
Q

Which is NOT a valid quartile range?
A. 0–25%
B. 26–50%
C. 51–75%
D. 60–100%

A

D. 60–100%

Quartiles divide data into four equal parts, each representing a range of 25% of the data. These ranges are:

0–25%: The first quartile (Q1), or the lower 25% of the data.

26–50%: The second quartile (Q2), or the next 25% of the data, ending at the median.

51–75%: The third quartile (Q3), or the 25% of the data just above the median.

76–100%: The fourth quartile (Q4), or the top 25% of the data.

So D is not valid: 60–100% covers 40% of the data instead of one 25% band, and it does not line up with the quartile cutoffs.

23
Q

Standard Deviation (Steps)
What is the first step to compute standard deviation?
A. Find each value’s distance from the mean
B. Calculate the median
C. Multiply the mean by the number of values
D. Divide total by 100

A

A. Find each value’s distance from the mean

24
Q

Why do we square the deviations when calculating SD?
A. To prevent negative values from canceling out
B. To make values easier
C. To find the median
D. Because it’s required by SPSS

A

A. To prevent negative values from canceling out

Deviation = Score − Mean

Example: with a score of 8 and a mean of 10, the deviation is 8 − 10 = −2.
Squaring it gives (−2)^2 = 4.
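
The standard deviation steps can be sketched as below. The three scores are made up so that the mean is 10, and the code divides by n (the population formula), which is an assumption; some courses divide by n − 1.

```python
import math

scores = [8, 10, 12]  # hypothetical scores; their mean is 10

mean = sum(scores) / len(scores)
deviations = [s - mean for s in scores]   # step 1: distance from the mean
squared = [d ** 2 for d in deviations]    # step 2: square so negatives don't cancel
variance = sum(squared) / len(scores)     # step 3: average the squared deviations
sd = math.sqrt(variance)                  # step 4: take the square root

print(sum(deviations))  # raw deviations cancel out to 0.0
print(squared)          # [4.0, 0.0, 4.0]
print(round(sd, 3))
```

Notice the raw deviations add up to zero, which is exactly why they get squared first.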

25
Q

Measures like mean and SD are only appropriate for which types of variables?
A. Nominal and ordinal
B. Continuous (interval & ratio)
C. Categorical
D. Discrete only

A

B. Continuous (interval & ratio)
First Example: Temperature in Celsius (Interval)
You can go below zero (like –10°C).

0°C does NOT mean “no temperature” — it’s just a point on the scale.

So: No true zero ✅

Also: You can’t say 20°C is twice as hot as 10°C — that doesn’t make sense in Celsius.

📏 Second Example: Height in cm (Ratio)
You cannot go below zero height (0 cm = literally no height).

So: 0 cm means none at all = true zero ✅

You can say 180 cm is twice as tall as 90 cm — because the scale has a real zero.

Nominal, ordinal, and discrete variables either can’t be measured on a scale or have limitations that make mean and SD inappropriate.

26
Q

Why is it inappropriate to take the mean of a nominal variable like religion?
A. Because religion has outliers
B. Numbers represent categories, not amounts
C. Mean can’t be used on any variable
D. Religion is multivariate

A

B. Numbers represent categories, not amounts

With nominal data (like religion), the numbers assigned to each category are just labels. For example, you might assign the number 1 to “Christian,” 2 to “Muslim,” and 3 to “Hindu,” but those numbers don’t actually mean anything in terms of quantity. They are just arbitrary labels.

27
Q

What’s the formula for computing a murder rate per 100,000 people?
A. (Population ÷ Murders) × 100,000
B. (100,000 ÷ Murders) × Population
C. (Murders ÷ Population) × 100,000
D. (Murders ÷ 100,000) × Population

A

C. (Murders ÷ Population) × 100,000
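
A short sketch of the rate formula. The city numbers are made up for illustration, and `rate_per_100k` is a hypothetical helper name.

```python
def rate_per_100k(murders: int, population: int) -> float:
    """(Murders / Population) x 100,000."""
    return (murders / population) * 100_000

# Hypothetical city: 50 murders in a population of 500,000.
print(round(rate_per_100k(50, 500_000), 1))  # 10.0
```

Using a rate lets you compare cities of different sizes fairly, since raw counts are bigger in bigger cities.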

28
Q

What does a bivariate contingency table show?
A. Relationship between two variables
B. Mean scores of one variable
C. Total variance of all data
D. External validity

A

A. Relationship between two variables

The relationship between gender and voting preference.

How many males and females voted for Party A or Party B.
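
A minimal sketch of building a bivariate contingency table. The survey counts are invented purely for illustration; only the two variables (gender and party) come from the card.

```python
from collections import Counter

# Hypothetical survey records as (gender, party) pairs.
records = (
    [("Male", "A")] * 30 + [("Male", "B")] * 20 +
    [("Female", "A")] * 15 + [("Female", "B")] * 35
)

table = Counter(records)  # counts each gender/party combination

print("        Party A  Party B")
for gender in ("Male", "Female"):
    print(f"{gender:<8}{table[(gender, 'A')]:>7}{table[(gender, 'B')]:>9}")
```

Each cell is just a count of how many people fall into that combination of the two variables.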

29
Q

What does a bivariate contingency table show?
A. Relationship between two variables
B. Mean scores of one variable
C. Total variance of all data
D. External validity

A

A. Relationship between two variables

A study looks at whether gender (male or female) affects the choice of TV show type (crime drama, comedy, or action). A contingency table would show the number of males and females who prefer each type of show.

It is used to find out if one variable is connected to or affected by the other. The table helps you see if patterns or trends exist between the two variables.

30
Q

What is the purpose of multivariate contingency tables?
A. To describe distributions
B. Show connections between 3+ variables
C. Test standard deviation
D. Present quartiles

A

B. Show connections between 3+ variables

A multivariate contingency table is used when you want to look at three or more variables and understand how they are related to each other.

31
Q

What is the PRE model?
A. A way to see how much better we can predict a variable when we know another
B. A test for skewness
C. A type of percentile
D. A form of frequency count

A

A. A way to see how much better we can predict a variable when we know another

The Proportional Reduction in Error (PRE) model helps us figure out how much better we can predict something when we know another piece of information.

Imagine you’re trying to predict how much crime happens in different cities. If you don’t know anything about the city, your prediction will probably be pretty off (a lot of error). But if you know things like education levels in the city, your predictions about crime might get a lot better because education can help explain crime rates.

32
Q

Which measure is used for nominal variables?
A. Lambda
B. Gamma
C. Pearson
D. Standard Deviation

A

A. Lambda
Lambda is used for nominal variables and measures how much knowing one variable (like gender) reduces the error in predicting another variable (like crime type). It’s a measure within the PRE model that helps us understand how much better we can predict one thing when we know another.

33
Q

Why is Lambda useful?
A. It shows standard deviation
B. It tells us how much better we can guess a nominal variable with another variable
C. It measures percentiles
D. It creates quartiles

A

B. It tells us how much better we can guess a nominal variable with another variable

Lambda is a measure used in the Proportional Reduction in Error (PRE) model to show how much knowing one nominal variable (like gender) helps reduce the error in predicting another nominal variable (like crime type).
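
A sketch of how lambda works as a PRE measure, using the Goodman and Kruskal formula: (errors without knowing the other variable − errors when you know it) ÷ errors without. The gender and crime-type counts are invented for illustration.

```python
def goodman_kruskal_lambda(table):
    """table maps each independent-variable category to
    {dependent-variable category: count}."""
    totals = {}
    for row in table.values():
        for category, count in row.items():
            totals[category] = totals.get(category, 0) + count
    n = sum(totals.values())
    # Errors if we always guess the overall most common category.
    errors_without = n - max(totals.values())
    # Errors if we guess the most common category within each group.
    errors_with = sum(sum(row.values()) - max(row.values())
                      for row in table.values())
    return (errors_without - errors_with) / errors_without

# Hypothetical counts of crime type by gender.
crime_by_gender = {
    "Male": {"property": 30, "violent": 50, "drug": 20},
    "Female": {"property": 60, "violent": 20, "drug": 20},
}

print(round(goodman_kruskal_lambda(crime_by_gender), 3))
```

With these made-up counts, knowing gender cuts prediction errors from 110 to 90, so lambda is 20 ÷ 110, or about 0.18.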

34
Q

Which measure do we use for ordinal variables?
A. Lambda
B. Gamma
C. Mode
D. Variance

A

B. Gamma

Gamma measures the relationship between ordinal variables (variables with an ordered ranking), like education level and job satisfaction.

Example:
You’re studying the relationship between education level and job satisfaction.

Education Level: High school, College, Graduate (ordered from lowest to highest)

Job Satisfaction: Low, Medium, High (ordered from low to high)
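
A sketch of Goodman and Kruskal's gamma, computed as (concordant pairs − discordant pairs) ÷ (concordant + discordant). The six respondents are made up, and the numeric codes (1 to 3) are just rank labels for the ordered categories above.

```python
from itertools import combinations

def goodman_kruskal_gamma(pairs):
    """pairs is a list of (rank on variable 1, rank on variable 2)."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(pairs, 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables order the two people the same way
        elif product < 0:
            discordant += 1   # the variables order them in opposite ways
        # ties on either variable are ignored
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical respondents: (education, satisfaction)
# education: 1 = High school, 2 = College, 3 = Graduate
# satisfaction: 1 = Low, 2 = Medium, 3 = High
people = [(1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 1)]

print(round(goodman_kruskal_gamma(people), 3))
```

Here there are 6 concordant and 3 discordant pairs, so gamma is 3 ÷ 9, about 0.33: a modest positive relationship in this invented sample.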

35
Q

What is the difference between external validity and generalizability?
A. Generalizability is about applying results to the population; external validity is about real-world settings
B. They are the same
C. External validity is about sampling error
D. Generalizability refers to internal analysis only

A

A. Generalizability is about applying results to the population; external validity is about real-world settings

External Validity = Can the study’s findings apply to many different situations, settings, times, and populations?

Generalizability = Can the study’s findings apply to a larger population beyond the specific sample studied?

36
Q

Which measure do we use for interval/ratio variables?
A. Mode
B. Gamma
C. Lambda
D. Pearson’s product-moment correlation

A

D. Pearson’s product-moment correlation

Pearson’s Product-Moment Correlation is a statistic used to measure the strength and direction of the relationship between two continuous variables (interval or ratio data).

Strength: How strongly the two variables are related (strong or weak).

Direction: Whether the relationship is positive (both increase together) or negative (one increases while the other decreases).

Easy Example:
Imagine you’re looking at how study hours and test scores are related for a group of students.

The more hours students study, the higher their test scores tend to be. This would give a positive correlation.
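
A sketch of Pearson's r written out by hand so the steps are visible. The study hours and test scores are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation for two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How much the two variables move together.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(ss_x * ss_y)

# Hypothetical students: hours studied and test score.
hours = [1, 2, 3, 4, 5]
scores = [55, 60, 70, 72, 85]

print(round(pearson_r(hours, scores), 2))
```

For this made-up data r comes out close to +1, meaning a strong positive relationship. The sign gives the direction and the size (from 0 to 1) gives the strength.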