Stats Midterm Flashcards

1
Q

Statistics:

A
  • The practice or science of collecting and analyzing numerical data in large quantities to interpret, summarize, and present it in a meaningful way.
  • A numerical fact or datum: a piece of data that provides information on a particular subject, often used in reference to quantitative research or studies.
2
Q

Data

A

-Information, especially facts or numbers, collected to be examined and considered and used to help decision-making
-Information in an electronic form that can be stored and used by a computer

3
Q

Data literacy

A

the combination of skills and mindsets that allows individuals to find insights and meaning within their data to enable effective, data-informed decision-making

4
Q

Data literacy imparts the skills and mindset to find

A

meaning within data

5
Q

Politics

A

-The activities of the government, members of law-making organizations, or people who try to influence the way a country is governed
-The relationships within a group or organization that allow particular people to have power over others

6
Q

Political science

A

uses data to figure out the correct answers to important political questions

7
Q

Two styles of research

A

-Qualitative
-Quantitative

8
Q

Qualitative research

A

based on information that cannot be easily measured, such as people’s feelings, rather than on information that can be shown in numbers

9
Q

Quantitative research

A

related to information that can be shown in numbers and amounts

10
Q

Topic

A

a matter dealt with in a text, discourse, or conversation; a subject

11
Q

Theory

A

a plausible general principle or body of principles offered to explain phenomena

12
Q

A causal theory differs from a theory in that it

A

explicitly states the relationship between two variables

13
Q

Variable

A

a characteristic, number, or quantity that can be measured or counted and can take on different values

14
Q

Invariance

A

The property of remaining unchanged regardless of changes in the conditions of measurement

15
Q

A hypothesis is even more ___ than a causal theory

A

specific

16
Q

A hypothesis ___ the variables

A

operationalizes

17
Q

Operationalization

A

precisely defining the variables and how they are measured

18
Q

Pre-registration

A

makes your hypothesis and plan for hypothesis testing public

19
Q

Once you have ___ your research plan, you can test your hypotheses

A

pre-registered

20
Q

Hypothesis testing

A

the use of statistics on data to test a hypothesis

21
Q

Methodology

A

the use of statistics

22
Q

Empirical analysis

A

the use of statistics on observational data, not experimental data

23
Q

Empirical testing

A

the use of statistics on observational data to test a hypothesis

24
Q

Hypothesis testing uses statistics to test:

A
  1. whether an association exists between the two variables,
  2. the strength of any association between the two variables, and
  3. the probability that the association between the two variables is due to random chance
25
Q

Normative arguments include words like

A

“should” or “ought to.”

26
Q

parsimonious

A
27
Q

Time dimension

A

points in time at which your data changes

28
Q

Time-series data

A

a sequence of data points collected or recorded at successive points in time, typically at equally spaced intervals, that represents how a particular variable or set of variables changes over time

29
Q

Hierarchical dimension

A

the level at which your data changes

30
Q

Multi-level data

A

data that is structured in multiple nested levels, where observations are grouped within higher-level units

31
Q

Spatial dimension

A

geographic locations in which your data changes

32
Q

Cross-sectional data

A

data collected at a single point in time from multiple units, such as states or countries, to analyze variations across those units

33
Q

Moderator (Z)

A

a variable that influences the strength or direction of the relationship between an independent and a dependent variable in a study.

34
Q

Mediator (Z)

A

a variable that explains the process or mechanism through which an independent variable affects a dependent variable, acting as an intermediary in the relationship

35
Q

Formal theory

A

a framework that uses mathematical models and logical structures to rigorously analyze and predict the behavior of complex systems or phenomena

36
Q

Rational choice theory

A

individuals make decisions by systematically evaluating the costs and benefits to maximize their personal utility or advantage

37
Q

Utility

A

the sum of all benefits of an action minus the sum of all costs from that action

38
Q

Utility maximizer

A

an individual who seeks to make choices that yield the highest possible level of benefit based on their preferences and available options

39
Q

Expected utility

A

the overall anticipated satisfaction or benefit (utility) of a particular choice, calculated by weighting the utility of each possible outcome by its probability and summing the results
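Worked example: a minimal Python sketch of the expected-utility calculation. The gamble, its probabilities, and the helper name `expected_utility` are hypothetical, and the sketch assumes utility equals the payoff amount.

```python
def expected_utility(outcomes):
    """Probability-weighted sum of utilities.

    `outcomes` is a list of (probability, utility) pairs; the
    probabilities must sum to 1.
    """
    if abs(sum(p for p, _ in outcomes) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * u for p, u in outcomes)


# Hypothetical choice: a 50/50 gamble between 100 and 0, or a sure 40.
gamble = expected_utility([(0.5, 100), (0.5, 0)])
sure_thing = expected_utility([(1.0, 40)])
best = "gamble" if gamble > sure_thing else "sure thing"
print(gamble, sure_thing, best)
```

A utility maximizer in this setup picks the option with the higher expected utility (the gamble, 50 vs. 40); someone whose utility is not linear in money, such as a risk-averse person, could rank the options differently.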

40
Q

Game theory

A

a branch of formal modeling that focuses on analyzing strategic interactions between rational decision-makers, where the outcome for each participant depends not only on their own choices but also on the choices of others

41
Q

The prisoner’s dilemma

A

a classic game theory scenario where two individuals, who cannot communicate, face a choice between cooperating with each other or betraying one another
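Worked example: a minimal Python sketch of the dilemma's logic. The payoff numbers (years in prison, written as negative values) are hypothetical; what matters is their ordering, which makes defecting the best response to either choice by the other player, even though mutual defection leaves both players worse off than mutual cooperation.

```python
# Hypothetical payoffs: PAYOFFS[(my_action, their_action)] is MY payoff.
PAYOFFS = {
    ("cooperate", "cooperate"): -1,
    ("cooperate", "defect"): -10,
    ("defect", "cooperate"): 0,
    ("defect", "defect"): -5,
}
ACTIONS = ("cooperate", "defect")


def best_response(their_action):
    """Return the action that maximizes my payoff given the other player's action."""
    return max(ACTIONS, key=lambda mine: PAYOFFS[(mine, their_action)])


for theirs in ACTIONS:
    print(f"If the other player chooses {theirs}, my best response is {best_response(theirs)}")
```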

42
Q

Social choice theory

A

a domain within formal modeling that examines how individual preferences can be aggregated to make collective decisions

43
Q

Intransitive Preferences

A

a preference structure that violates the transitivity condition. For example, an individual might prefer option A over option B, option B over option C, but still prefer option C over option A (A > B, B > C, but C > A).
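Worked example: social choice theory's classic illustration is the Condorcet paradox, where every individual voter has transitive preferences but pairwise majority voting produces an intransitive group preference. A minimal Python sketch with three hypothetical voters:

```python
from itertools import combinations

# Three hypothetical voters, each with a transitive ranking (best first).
voters = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]


def majority_prefers(x, y, rankings):
    """True if a strict majority of voters rank x above y."""
    votes_for_x = sum(r.index(x) < r.index(y) for r in rankings)
    return votes_for_x > len(rankings) / 2


for x, y in combinations("ABC", 2):
    winner = x if majority_prefers(x, y, voters) else y
    print(f"{x} vs {y}: majority prefers {winner}")
```

The group ends up with A > B, B > C, and C > A, so no option beats all the others.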

44
Q

Spatial models

A

a specialized form of formal modeling that incorporates spatial or geographic dimensions into the analysis of strategic interactions.

45
Q

Spatial models of voting

A

a formal modeling approach used to analyze how voters’ preferences and spatial positioning influence electoral outcomes

46
Q

Preference mapping

A

voters and candidates are positioned on a spatial map (often a one-dimensional or two-dimensional continuum) based on their ideological or policy preferences

47
Q

Vote maximization

A

candidates choose positions or policies to maximize their votes, typically moving towards the median voter or the center of voter preferences to appeal to the largest segment of the electorate

48
Q

Equilibrium analysis

A

The model identifies equilibrium points, where candidates’ positions stabilize because any deviation would result in fewer votes. The best-known equilibrium result is the median voter theorem, which states that candidates converge to the preferences of the median voter
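Worked example: a minimal Python sketch of the median voter logic, assuming one policy dimension, two candidates, an odd number of voters, and voters who choose whichever candidate is closer to their ideal point (ties split evenly). The ideal points and the helper `vote_share` are hypothetical.

```python
from statistics import median

# Hypothetical voter ideal points on a 0-10 left-right scale.
voters = [1, 2, 4, 5, 6, 8, 9]


def vote_share(pos_a, pos_b, ideal_points):
    """Share of voters closer to candidate A than to candidate B; ties split evenly."""
    score = 0.0
    for v in ideal_points:
        da, db = abs(v - pos_a), abs(v - pos_b)
        if da < db:
            score += 1
        elif da == db:
            score += 0.5
    return score / len(ideal_points)


m = median(voters)  # the median voter's ideal point
for rival in [2, 4, 6, 8]:
    print(f"median position {m} vs rival at {rival}: share {vote_share(m, rival, voters):.2f}")
```

A candidate at the median wins a majority against every other position, so neither candidate can gain votes by moving away from it.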

49
Q

Causal relationship

A

a connection between two variables where one variable directly influences or determines the outcome of the other

50
Q

Confounder

A

a variable that influences both the independent and dependent variables, potentially leading to a misleading or spurious association between them.

51
Q

Spurious relationship

A

a false or misleading association between two variables that is actually caused by a third, confounding variable, rather than a direct causal link between the two

52
Q

Control variable

A

a variable or condition that is held constant or regulated in an experiment or study to isolate the effect of the independent variable on the dependent variable, ensuring that the results are not influenced by extraneous factors

53
Q

Deterministic relationship

A

a connection between two variables where one variable’s value is precisely determined by the value of the other, with no randomness or uncertainty involved

54
Q

Probabilistic relationship

A

a connection between two variables where changes in one variable are associated with changes in the likelihood or probability of different outcomes in the other variable, but the relationship is not perfectly predictable

55
Q

Observational data

A

information collected from real-world observations or measurements without conducting experiments

56
Q

Experimental data

A

information collected from experiments where variables are systematically manipulated to observe their effects on other variables, allowing for causal inferences

57
Q

Randomized controlled trials (RCTs)

A

experimental studies where participants are randomly assigned to either a treatment group or a control group to evaluate the effectiveness of an intervention while minimizing biases

58
Q

Treatment group

A

a group of participants in a study that receives the treatment or intervention being tested, allowing researchers to assess its effects compared to a control group

59
Q

Random assignment

A

the process of randomly allocating participants to control and treatment groups in a study to ensure that each group is comparable and to eliminate selection bias
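Worked example: a minimal Python sketch of random assignment. The participant IDs and the helper `randomly_assign` are hypothetical; shuffling and then splitting the list gives every participant the same chance of landing in either group.

```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into treatment and control groups."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


members = [f"P{i}" for i in range(1, 11)]
groups = randomly_assign(members, seed=1)
print("treatment:", groups["treatment"])
print("control:  ", groups["control"])
```

Passing a fixed `seed` makes the assignment reproducible, which is useful when the plan is pre-registered.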

60
Q

Selection bias

A

when the sample of participants in a study is not representative of the population being studied, leading to distorted or unrepresentative results

61
Q

Randomized controlled trials are considered the gold standard for causal research because they can cross the

A

four causal hurdles.

62
Q

Experiments can exhibit low levels of

A

external validity

63
Q

External validity

A

the degree to which one can be confident that the results of an analysis apply to the broader population

64
Q

Natural experiments

A

experiments that leverage naturally occurring random variations or events to investigate causal effects, without direct manipulation of the independent variable by the researcher

65
Q

Natural experiments exhibit high levels of

A

internal validity

66
Q

Controlled experiments

A

studies that compare the effects of an intervention or treatment between pre-selected groups that are not randomly assigned, aiming to assess causal relationships while controlling for confounding variables

67
Q

Quasi-experiments

A

research designs that aim to evaluate interventions or treatments without full randomization, often using pre-existing groups or natural conditions to infer causal relationships

68
Q

Observational research

A

research designs in which the researcher does not have control over values of the independent variable because the independent variable occurs naturally

69
Q

Survey item

A

a specific question or statement in a survey designed to gather data on a particular aspect of a respondent’s attitudes, opinions, or behaviors

70
Q

Open-ended items

A

items that allow respondents to provide their answers in their own words

71
Q

Ranking item

A

an item that asks respondents to rank a list of choices according to their preferences or importance

72
Q

Likert scale

A

response options that allow respondents to rate their level of agreement or disagreement with a series of statements on an interval scale, typically ranging from “strongly disagree” to “strongly agree”

73
Q

Binary response option

A

a type of response with only two choices

74
Q

Multi-item scales

A

multiple questions or items that measure a single underlying construct

75
Q

Scale validation

A

the process of assessing whether a multi-item scale accurately and reliably captures the construct it is intended to measure, ensuring that it reflects the intended attributes and performs consistently across different contexts and populations

76
Q

Demographic items

A

data collected about respondents’ characteristics, such as age, gender, education level, income, and ethnicity

77
Q

Population

A

the entire group of individuals or units from which a sample is drawn and to whom the survey findings are intended to generalize

78
Q

Sample

A

a subset of individuals or units selected from a larger population for the purpose of conducting a survey or study to draw conclusions about the entire population

79
Q

Sample (verb)

A

to select and examine a subset of a population or data set to draw conclusions or make inferences about the larger population

80
Q

Sample size (N)

A

the number of individual units or observations selected from a population for a study, used to ensure the results are statistically reliable and representative of the larger group

81
Q

Statistical power

A

the probability that a statistical test will correctly reject a false null hypothesis, thereby detecting an effect or relationship if one truly exists
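Worked example: a minimal Python sketch of statistical power for a two-sided one-sample z-test, assuming the population standard deviation is known and the sampling distribution of the mean is normal. The helper `z_test_power` and the effect size are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect, sd, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.

    effect: true difference between the population mean and the null value
    sd: population standard deviation (assumed known)
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    shift = effect / (sd / sqrt(n))            # true effect measured in standard errors
    # Probability the test statistic lands beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


for n in (25, 100, 400):
    print(n, round(z_test_power(effect=0.2, sd=1, n=n), 3))
```

Power rises with sample size: the same small effect is usually missed with 25 observations and almost always detected with 400. With no true effect, the "power" equals the significance level.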

82
Q

Representative sample

A

a subset of a population that accurately reflects the characteristics and diversity of the larger group, allowing the results to be generalized to the entire population

83
Q

Probability sample

A

when each member of the population has a known, non-zero chance of being selected for the sample, allowing for statistical inference and generalization to the population
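Worked example: a simple random sample is one kind of probability sample. A minimal Python sketch, assuming a hypothetical sampling frame of 1,000 people; each member's inclusion probability is known in advance (k / N).

```python
import random

# Hypothetical sampling frame listing every population member.
population = [f"person_{i}" for i in range(1000)]

rng = random.Random(7)
sample = rng.sample(population, k=50)  # simple random sample, without replacement

# Every member has the same known, non-zero chance of selection: k / N.
inclusion_probability = 50 / len(population)
print(len(sample), inclusion_probability)
```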

84
Q

Non-probability sample

A

when members of the sample are not selected at random, making it difficult to determine the likelihood of any member being chosen and limiting the ability to generalize the findings

85
Q

Convenience samples

A

a type of non-probability sample where participants are selected based on their easy availability and proximity to the researcher, rather than through random sampling, which can lead to biases and limited generalizability

86
Q

Quantitative research

A

a method of inquiry that focuses on collecting and analyzing numerical data to identify patterns, test hypotheses, and make generalizations about a population

87
Q

Conceptual clarity

A

forming a precise definition for and clear understanding of the concepts being studied

88
Q

Concept

A

a broad, abstract idea or general notion that provides a foundational understanding

89
Q

Construct

A

a specific, measurable version of a concept used in research to operationalize and test theoretical ideas

90
Q

Face validity

A

the extent to which a measurement tool appears to measure what it is supposed to measure, based on casual inspection

91
Q

Construct validity

A

the extent to which a variable or measurement is related to other measures that theory suggests should be related

92
Q

Content validity

A

the extent to which a variable or measurement accurately represents all of the elements that define the concept it is intended to measure

93
Q

Reliability

A

the consistency and stability of a measurement tool across repeated applications

94
Q

Survivorship bias

A

when only the entities that have “survived” a particular process are considered, leading to a skewed understanding or conclusion.

95
Q

Qualitative research

A

a method of inquiry that focuses on understanding and interpreting the meanings, experiences, and perspectives of individuals or groups through non-numerical data, such as interviews, observations, and texts

96
Q

Categorical variables

A

represent categories or groups and do not have a numeric value

97
Q

Nominal variables

A

categorical variables with no inherent order or ranking among the categories.

98
Q

Ordinal variables

A

categorical variables that have a meaningful order or ranking, but the intervals between the categories are not necessarily equal.

99
Q

Numerical variables

A

represent quantities and can be measured on a numeric scale

100
Q

Continuous variables

A

can take any value within a range and can be subdivided into finer increments with equal unit distances

101
Q

Discrete variables

A

can only take specific, distinct values, often counts or integers

102
Q

Rank statistics

A

a class of statistics used to describe the variation of continuous variables based on their ranking from lowest to highest values

103
Q

Quartile

A

a statistical term that divides a dataset into four equal parts, with each quartile containing 25% of the data

104
Q

Box-whisker plot

A

a graphical representation of data that displays the median, quartiles, and potential outliers, using a box to show the interquartile range and “whiskers” to indicate the range of the data
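Worked example: a minimal Python sketch computing the numbers a box-whisker plot draws, using hypothetical data. It assumes the common 1.5 × IQR convention for flagging outliers (whiskers stop at the most extreme non-outlying values); some plots instead extend the whiskers to the minimum and maximum, and other quartile methods give slightly different cut points.

```python
from statistics import quantiles

data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 30]  # hypothetical values

q1, q2, q3 = quantiles(data, n=4, method="inclusive")  # the three quartile cut points
iqr = q3 - q1                                          # length of the box
lower_fence = q1 - 1.5 * iqr                           # 1.5 x IQR outlier convention
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
whisker_low = min(x for x in data if x >= lower_fence)   # most extreme non-outliers
whisker_high = max(x for x in data if x <= upper_fence)

print("Q1", q1, "median", q2, "Q3", q3, "IQR", iqr)
print("whiskers", whisker_low, whisker_high, "outliers", outliers)
```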

105
Q

Moments

A

numerical measures derived from the data values themselves and their positions relative to the mean or origin

106
Q

The zero-sum property of the mean

A

if you subtract the mean of a dataset from each data point, the sum of these deviations will always be zero
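Worked example: a quick Python check of the zero-sum property on hypothetical integer data. With decimal (floating-point) data the computed sum can come out as a tiny nonzero number like 1e-16 because of rounding, not because the property fails.

```python
from statistics import mean

data = [3, 7, 8, 10, 12]            # hypothetical values
m = mean(data)                      # 40 / 5 = 8
deviations = [x - m for x in data]  # distance of each value from the mean
print(deviations, sum(deviations))
```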

107
Q

The mean of a variable is often called its

A

expected value because it is the value you would most expect the variable to take.

108
Q

Variance (second moment)

A

a measure of the dispersion of a variable around its mean

109
Q

Standard deviation

A

another measure of the dispersion of a variable around its mean, equal to the square root of the variance (so it is in the same units as the variable).
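Worked example: a minimal Python sketch computing the sample variance and standard deviation by hand on hypothetical data, then checking them against the standard library. It uses the sample formula, which divides by n - 1; the population variance divides by n instead.

```python
from math import sqrt
from statistics import mean, stdev, variance

data = [3, 7, 8, 10, 12]  # hypothetical values
m = mean(data)
squared_devs = [(x - m) ** 2 for x in data]  # squared distance from the mean

sample_var = sum(squared_devs) / (len(data) - 1)  # sample variance divides by n - 1
sample_sd = sqrt(sample_var)                      # SD = square root of the variance

# The standard library gives the same sample statistics.
print(sample_var, variance(data))
print(sample_sd, stdev(data))
```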

110
Q

Kernel density plot

A

a visual depiction of the distribution of a single variable based on a smoothed calculation of the density of cases across the range of values

111
Q

Skewness (third moment)

A

a measure that indicates the symmetry of the variable’s distribution around the mean

112
Q

Kurtosis (fourth moment)

A

a measure that indicates the steepness of the distribution of a variable
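Worked example: a minimal Python sketch of skewness and kurtosis as standardized third and fourth moments, using hypothetical data and the simple population-moment formulas. Statistical software often applies small-sample corrections or reports "excess" kurtosis (kurtosis minus 3, so a normal distribution scores 0), so its numbers can differ.

```python
from statistics import fmean


def moment(data, k):
    """k-th central moment: the average of (x - mean) ** k."""
    m = fmean(data)
    return sum((x - m) ** k for x in data) / len(data)


def skewness(data):
    """Third moment scaled by SD cubed; 0 for a symmetric distribution."""
    return moment(data, 3) / moment(data, 2) ** 1.5


def kurtosis(data):
    """Fourth moment scaled by SD to the fourth; 3 for a normal distribution."""
    return moment(data, 4) / moment(data, 2) ** 2


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]  # long tail to the right
print(round(skewness(symmetric), 3), round(skewness(right_skewed), 3))
print(round(kurtosis(symmetric), 3))
```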

113
Q

Even when we go all out to get information about every U.S. citizen in the Census, we still have

A

lots of nonrespondents.

114
Q

Convenience sample

A

a sample such that each member of the underlying population does NOT necessarily have an equal probability of being selected.

115
Q

Statistical inference

A

the process of using what we know about a sample to make probabilistic statements about the broader population.

116
Q

Parameters

A

parameters are numerical values that describe certain characteristics or features of an entire population, such as the mean, variance, or proportion (the corresponding values calculated from a sample are called statistics).

117
Q

Central limit theorem

A

a fundamental result from statistics indicating that if one were to collect an infinite number of random samples and plot the resulting sample means, those sample means would be distributed normally around the true population mean
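Worked example: a minimal Python simulation of the central limit theorem, approximating "an infinite number of samples" with 5,000. The population is hypothetical and deliberately non-normal (exponential with mean 2 and standard deviation 2), yet the sample means cluster around the population mean with spread close to σ / √n.

```python
import random
from statistics import fmean, stdev

random.seed(1)


def draw_sample(n):
    """Draw n values from a hypothetical exponential population with mean 2."""
    return [random.expovariate(1 / 2) for _ in range(n)]


n, num_samples = 50, 5_000
sample_means = [fmean(draw_sample(n)) for _ in range(num_samples)]

print(round(fmean(sample_means), 2))  # close to the population mean, 2
print(round(stdev(sample_means), 2))  # close to 2 / sqrt(50), about 0.28
```

A histogram or kernel density plot of `sample_means` would look roughly bell-shaped even though the population itself is strongly right-skewed.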

118
Q

Distribution

A

a mathematical function that describes the probabilities of different outcomes in a random variable or set of data

119
Q

Data generating process

A

the underlying mechanism or model that describes how data is produced and collected

120
Q

Independent outcomes

A

an outcome whose occurrence is not influenced by the outcome of another event.

121
Q

Normal distribution

A

a bell-shaped statistical distribution that can be entirely characterized by its mean and standard deviation.

122
Q

Share of a normal distribution within 1, 2, and 3 standard deviations of the mean

A
  • One standard deviation in each direction captures 68.3% of the area under the curve.
  • Two standard deviations in each direction capture 95.5% of the area under the curve.
  • Three standard deviations in each direction capture 99.7% of the area under the curve.
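Worked example: the percentages on this card can be checked with Python's built-in `statistics.NormalDist`, which prints them to two decimal places.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)
# Area between -k and +k standard deviations of the mean.
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} SD of the mean: {share:.2%}")
```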
123
Q

Standard error (of the mean)

A

the standard deviation of the sampling distribution of the sample mean.
-It measures the variability or dispersion of sample means around the population mean
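Worked example: in practice the standard error of the mean is estimated from one sample as s / √n (sample standard deviation divided by the square root of the sample size). A minimal Python sketch with hypothetical observations:

```python
from math import sqrt
from statistics import stdev

sample = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical observations
se = stdev(sample) / sqrt(len(sample))  # estimated SE of the mean: s / sqrt(n)
print(round(se, 3))
```

Because n sits under the square root, quadrupling the sample size halves the standard error.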

124
Q

Confidence intervals

A

a probabilistic statement about the likely value of a population characteristic based on the observations in a sample.
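Worked example: a minimal Python sketch of a confidence interval for a population mean, built as sample mean ± critical value × standard error. It uses a normal-approximation critical value (about 1.96 for 95%); for a small sample like this hypothetical one, a t critical value would usually be used instead, giving a somewhat wider interval. The helper name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev


def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for a population mean."""
    m = fmean(sample)
    se = stdev(sample) / sqrt(len(sample))      # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)   # critical value, about 1.96 for 95%
    return m - z * se, m + z * se


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
low, high = mean_confidence_interval(sample)
print(round(low, 2), round(high, 2))
```

Raising the confidence level (say to 99%) widens the interval.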

125
Q

hypothesis

A

a testable statement predicting a relationship or effect between variables, often framed as an expectation of what will happen

126
Q

null hypothesis

A

a specific type of hypothesis that assumes no effect or no difference between variables and serves as a baseline to test against

127
Q

Counterfactual

A

an alternative scenario or condition that contrasts with the proposed effect or relationship in the hypothesis, effectively serving as the null hypothesis which assumes no effect or difference

128
Q

Critical value

A

a predetermined threshold derived from a particular statistical distribution used to conduct a statistical test

129
Q

Significance level

A

the probability of rejecting the null hypothesis when it is actually true, representing the threshold for statistical significance.

130
Q

Test statistic

A

a value calculated by:
* identifying the sample statistic (e.g., the mean),
* determining its standard error (e.g., the standard error of the mean), and
* using a specific formula to assess how far the sample result deviates from the null hypothesis

131
Q

p-value

A

the probability of obtaining a test statistic as extreme as, or more extreme than, the one observed, assuming the null hypothesis is true
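Worked example: a minimal Python sketch of a test statistic and its two-sided p-value for a one-sample test of a mean, using a normal (z) approximation. The sample, the null value of 3, and the helper name are hypothetical; with only 8 observations a t-test would normally be used, so treat this as an illustration of the steps.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev


def one_sample_z_test(sample, null_mean):
    """Return (test statistic, two-sided p-value) using a normal approximation."""
    se = stdev(sample) / sqrt(len(sample))   # standard error of the mean
    z = (fmean(sample) - null_mean) / se     # how many SEs the sample mean is from H0
    p = 2 * (1 - NormalDist().cdf(abs(z)))   # P(|Z| >= |z|) if H0 is true
    return z, p


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
z, p = one_sample_z_test(sample, null_mean=3)
print(round(z, 2), round(p, 4), "reject H0 at 0.05" if p < 0.05 else "fail to reject H0")
```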

132
Q

In the social sciences, the standard p-value threshold is

A

0.05

133
Q

Statistical significance

A

an indication that an observed effect or relationship in the data is unlikely to have occurred by random chance alone. (At the standard 0.05 threshold: assuming the null hypothesis is true and the study is repeated an infinite number of times by drawing random samples from the same population, fewer than 5% of these results would be as extreme as or more extreme than the current result.)

134
Q

When a result is statistically significant, that does not mean that

A

the alternative hypothesis is proven to be true. It just means you can reject the null hypothesis

135
Q

Chi-squared test of tabular association

A

a statistical test that evaluates whether observed categorical data align with the expected frequencies based on a specific hypothesis

136
Q

Contingency table

A

a matrix that displays the frequency distribution of two categorical variables, showing how their values intersect

137
Q

Degrees of freedom

A

the number of independent values or quantities that can vary in a statistical calculation, typically indicating the number of values that are free to vary after certain constraints are applied
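Worked example: a minimal Python sketch of the chi-squared test of tabular association on a hypothetical 2×2 contingency table. Expected counts come from row total × column total ÷ grand total, and the degrees of freedom are (rows − 1) × (columns − 1). The critical value 3.841 is the standard table value for df = 1 at the 0.05 level; the sketch applies no continuity correction.

```python
def chi_squared_test(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables are unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical table: rows = party (A, B), columns = voted (yes, no).
observed = [[30, 20],
            [15, 35]]
chi2, df = chi_squared_test(observed)
CRITICAL_DF1_ALPHA_05 = 3.841  # standard table value for df = 1, alpha = 0.05
print(round(chi2, 2), df, chi2 > CRITICAL_DF1_ALPHA_05)
```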

138
Q

The shape of the Chi-square distribution depends on the

A

degrees of freedom