Term Test Flashcards

1
Q

hierarchical scales

A

simplify the process of developing a statistical study design
1. sampling unit
2. sample
3. observation unit
4. statistical population
5. population of interest

2
Q

sampling unit

A

unit being selected at random (can be same as observation unit)

3
Q

sample

A

collection of sampling units that you randomly selected

4
Q

observation unit

A

scale for data collection
- subject of the study

5
Q

statistical population

A

collection of all sampling units that could’ve been in your sample
- is defined by your study design

6
Q

population of interest

A

collection of sampling units that you hope to draw a conclusion about
- defined by your research question
- ideally the same as the statistical population, but the population of interest is often larger

7
Q

Ex. of hierarchy for design (street address)

A

pop. of interest: all people of voting age in Kingston
statistical population: all addresses in Kingston
sampling unit: street address
sample: 100 random street addresses
observation unit: a person
measurement variable: voting intent
measurement unit: none, because the measurement is categorical

8
Q

measurement variable

A

what we want to measure about the observation unit (height, age)

9
Q

measurement unit

A

scale of measurement variable (cm for height, years for age)

10
Q

descriptive statistics

A

characterize data in your sample (quantitative)
- averages, tables & graphs

11
Q

inferential statistics

A

uses information from the sample to make a probabilistic statement about the statistical population
- confidence intervals

***takes uncertainty into account

12
Q

4 steps to statistical framework

A
  1. sampling
  2. measuring
  3. calculating descriptive statistics
  4. calculating inferential statistics
13
Q

inferential vs descriptive statistics

A

inferential: use info from data to make a statement about the STATISTICAL POPULATION
descriptive: use info from data to make a statement about OUR SAMPLE

14
Q

subgroups

A

divide the population into groups

15
Q

sampling design

A

describe how to sample a statistical population in a fair way

16
Q

4 goals of an ideal sampling design

A
  1. all sampling units are selectable
  2. selection is unbiased
  3. selection is independent
  4. all samples are possible
17
Q
  1. all sampling units are selectable
A

every sampling unit has a non-zero probability of being included

18
Q
  2. selection is unbiased
A

the probability of selecting a sampling unit cannot depend on any attribute of that sampling unit

19
Q
  3. selection is independent
A

selection of sampling unit must not decrease or increase the probability that any other sampling unit is selected

20
Q
  4. all samples are possible
A

all samples that could be created from statistical population are possible

21
Q

bias

A

systematic over- or underestimate of a value, on average across samples, compared to the true value in the statistical population

22
Q

observational studies

A

based on observations of a statistical population where researchers do not have any control over the variables which impact our conclusions
- ex. can’t control confounding variables, so relationships aren’t causal

23
Q

goal of observational studies

A

characterize something about an existing statistical population that allows us to investigate relationships among variables

24
Q

limitations of observational studies

A

cannot make statements about whether a factor causes the response you’re interested in

25
response variable
response you are interested in - ex. lung cancer
26
explanatory variable
factor you investigate - ex. tobacco use
27
confounding variables
unobserved variables that affect a response variable
28
spurious relationship
when the apparent relationship between the explanatory and response variables is actually driven by a confounding variable
29
simple random survey
sampling units are selected at random from the statistical population where each sampling unit has the same probability of being in your sample
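A minimal sketch of a simple random survey. The population of 1,000 numbered street addresses and the fixed seed are illustrative assumptions, not part of the cards:

```python
import random

# Hypothetical statistical population: 1,000 street addresses, labelled 0..999.
statistical_population = list(range(1000))

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# random.sample draws without replacement, and every address has the same
# probability (100/1000) of ending up in the sample.
sample = rng.sample(statistical_population, k=100)
```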
30
stratified survey
researcher creates strata, then takes a sample within each stratum
31
strata
name given to a subgroup within the statistical population in a stratified survey
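A stratified survey could be sketched as follows. The neighbourhood strata, their sizes, and the choice of 5 units per stratum are made-up assumptions for illustration:

```python
import random

# Hypothetical strata: sampling units grouped by neighbourhood.
strata = {
    "north": [f"north-{i}" for i in range(50)],
    "south": [f"south-{i}" for i in range(30)],
    "east": [f"east-{i}" for i in range(20)],
}

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Take a simple random sample of 5 sampling units within each stratum.
stratified_sample = {
    name: rng.sample(units, k=5) for name, units in strata.items()
}
```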
32
cluster survey
used to remove diversity in the statistical population that’s not relevant to the research question - cluster = sampling unit - units nested inside the cluster = observation units
33
one-stage clusters
data are collected from all observation units in a cluster
34
two-stage clusters
a subset of observation units are randomly selected within each cluster
35
case-control survey
used to compare data between two groups: - case - control ***strong risk of spurious relationships
36
case group (first group)
contains sampling units WITH a particular response variable outcome
37
control group (second group)
contains sampling units WITHOUT the response variable outcome of the case group
38
cohort survey
sampling units are selected and followed over time - use a simple random survey, then observe their fate over time
39
retrospective studies
where outcome is already known (increases risk of spurious relationships) ex. case-control studies
40
prospective studies
where the outcome is not yet known (require more effort, but decrease risk of spurious relationships) ex. cohort studies
41
cross-sectional studies
study a response variable at only a single snapshot in time
42
longitudinal studies
study a response variable at multiple points in time
43
experimental studies
based on creating treatments where the researcher controls one or more variable
44
goals of experimental studies
study effect of one or more manipulated variables on one or more random variables - establishes cause and effect
45
factor
a manipulated variable - each factor has two or more levels/groups
46
replicates
number of times treatment is repeated on randomly selected units - number of replicates is the number of sampling units in an experimental study
47
pseudoreplication
an error in the design of an experimental study where the observation units are analyzed as replicates rather than the sampling units
48
levels
different values of the factor
49
control treatments
contains everything except the treatment
50
blocking
used to control for variation among sampling units that is not of interest but could alter the experimental variable ***PREDEFINED
51
blinded
a design where the sampling unit (usually a person) does not know what treatment they are being exposed to
52
single blind design
sampling unit does not know the treatment they are assigned
53
double blind design
neither the researcher nor the sampling unit knows which treatment the sampling unit is assigned to ***removes accidental bias
54
placebo
method used for control treatment that helps accomplish a blinded design - substance or treatment that has no effect on response variable
55
sham treatment
aims to account for the effect of delivering a treatment, which is not of interest to the researcher
56
multiple factors
one factor could be drug type and another is diet type
57
interaction
when two explanatory variables have effects that are different than the simple sum of each variable in isolation
58
variable
any measurable characteristic of an observation unit (varies among sampling units)
59
3 pieces of information a variable contains
1. what the variable represents 2. measurement unit 3. description of the observation units
60
data
value of a variable you measure
61
continuous numerical variable
can take on continuous numbers (fractional numbers) ex. weight =107.23kg
62
discrete numerical variable
can take on only whole numbers (integers)
63
categorical
data is a qualitative description - no measurement units
64
ordinal categorical variable
categorical (qualitative) variables that have ORDERED levels ex. use emojis to describe how you feel
65
nominal categorical variable
can take on qualitative values but where values do not have any particular order
66
central tendency
describes the typical value in your sample (ex. mean)
67
dispersion
describes the spread of the values (ex. variance)
68
counts
number of sampling units in each category
69
proportion
share of the total sampling unit in each category
70
variance
measure of the amount of variation in your sample
71
standard deviation
square root of variance
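As a small worked sketch, variance and standard deviation can be computed with Python's standard `statistics` module. The heights are made-up values, and the sample formulas used here divide by n - 1:

```python
import statistics

# Made-up sample: heights in cm for five observation units.
heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]

mean = statistics.mean(heights_cm)          # central tendency
variance = statistics.variance(heights_cm)  # sample variance (divides by n - 1)
sd = statistics.stdev(heights_cm)           # square root of the sample variance
```

Here the squared deviations from the mean of 170 sum to 250, so the variance is 250 / 4 = 62.5 and the standard deviation is about 7.9 cm.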
72
quartiles
specific values of the variable that divide your data into ranked groups
73
median
central tendency is given by the second quartile
74
dispersion
describes how much variation there is in a sample
75
interquartile range
range between 1st and 3rd quartiles
76
when are quartiles sensitive?
when data set is small
77
pros to quartiles
median and IQR are robust to extreme values
78
cons to quartiles
median and IQR become quite variable for samples with a small number of observations
79
what are means sensitive to?
outliers
80
pros to means
mean and standard deviation are more stable (less variable) when there is a small number of observations
81
cons to means
mean and standard deviation are sensitive to extreme values
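The contrast between quartile-based and mean-based summaries can be illustrated with a short sketch. The two data sets are invented, and `statistics.quantiles` uses its default "exclusive" method here; other quartile conventions can give slightly different cut points:

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7]
with_outlier = [1, 2, 3, 4, 5, 6, 700]  # last value replaced by an extreme value

summary = {}
for name, values in (("data", data), ("with_outlier", with_outlier)):
    # Three cut points that split the ranked data into four groups.
    q1, q2, q3 = statistics.quantiles(values, n=4)
    summary[name] = {
        "mean": statistics.mean(values),
        "median": q2,     # second quartile
        "iqr": q3 - q1,   # interquartile range
    }
```

In this example the median and IQR are identical for both data sets, while the mean jumps from 4 to 103 once the extreme value is included.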
82
effect size
used to evaluate whether changes in response variables is meaningful
83
absolute effect size
simple change in mean value between groups - can be calculated as a difference or ratio
84
difference
differences in mean values among groups - has advantage of retaining original scale
85
ratio
ratio of mean values among groups - has advantage of indicating a relative change, but loses the original scale
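A minimal sketch of the two absolute effect sizes, using invented control and treatment measurements:

```python
import statistics

# Made-up response values for two groups.
control = [10.0, 12.0, 11.0, 13.0]
treatment = [15.0, 17.0, 16.0, 18.0]

mean_control = statistics.mean(control)      # 11.5
mean_treatment = statistics.mean(treatment)  # 16.5

# Difference keeps the original measurement scale.
difference = mean_treatment - mean_control

# Ratio expresses a relative change and has no measurement unit.
ratio = mean_treatment / mean_control
```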
86
contingency table
summarizes data from categorical variables - shows the frequency or proportion of sampling units in each level of a categorical variable
87
frequency
number of sampling units that fall in each level
88
contingency tables as proportions
help with visualizing the relative distribution of sampling units among levels
89
one-way contingency tables
observe 1 categorical variable
90
two-way contingency tables
observe 2 categorical variables
91
marginal distributions
row and column totals - frequencies that show the overall pattern of each variable
92
row of contingency table
sum frequencies across all columns for each row
93
column of contingency table
sum frequencies across all rows for each column
94
distribution
refers to categorical variables rather than the table
95
conditional distributions
relative frequencies of one categorical variable within the other - shows interaction between two variables
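A two-way contingency table, its marginal distributions, and a conditional distribution could be sketched like this. The age groups and voting intents are hypothetical observations:

```python
from collections import Counter

# Hypothetical observations: (age group, voting intent) for each sampling unit.
observations = [
    ("young", "yes"), ("young", "yes"), ("young", "no"),
    ("old", "yes"), ("old", "no"), ("old", "no"), ("old", "no"),
]

# Two-way contingency table of frequencies.
table = Counter(observations)

# Marginal distributions: row totals (age group) and column totals (intent).
row_totals = Counter(age for age, _ in observations)
col_totals = Counter(intent for _, intent in observations)

# Conditional distribution of voting intent within each age group.
conditional = {
    (age, intent): count / row_totals[age]
    for (age, intent), count in table.items()
}
```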
96
bar graphs
used to visualize both single variable and two variable categorical data - NOT USED FOR NUMERICAL DATA - can be vertical or horizontal
97
vertical vs. horizontal
depends on research question - most relevant information should be on the HORIZONTAL axis
98
grouping variable
forms base of the figure - typically use ordinal categorical variables
99
grouped bar chart
levels of variable are shown beside each other - levels of grouping variable are separated by LARGE gap - levels of other variable are separated by SMALL gap
100
stacked bar graph
levels of variable are stacked on top of each other - colour is used to separate levels
101
histograms
split numerical data into bins and display number of sampling units in each bin
102
advantage to histogram
provide great way to visualize the pattern
103
disadvantage to histogram
complicated to display histograms when your dataset also has multiple levels of a categorical variable
104
what happens when there are too many bins in a histogram
pattern is lost because there is little variation in frequency among bins
105
what happens when there are too few bins in a histogram
pattern is lost because of excessive aggregation
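The effect of bin width can be seen in a small sketch. The values and the helper `bin_counts` are hypothetical, written only for this illustration:

```python
from collections import Counter

values = [1.2, 1.9, 2.5, 3.1, 3.3, 3.8, 4.4, 7.6]  # made-up numerical data

def bin_counts(data, bin_width):
    """Count how many values fall in each bin [k * width, (k + 1) * width)."""
    counts = Counter(int(x // bin_width) for x in data)
    return dict(sorted(counts.items()))

narrow = bin_counts(values, 0.5)   # many bins: almost every count is 1
medium = bin_counts(values, 2.0)   # a peak in the second bin is visible
wide = bin_counts(values, 10.0)    # one bin: everything is aggregated
```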
106
box plots
show how the median value differs among groups, and how much variation there is in the data
107
single box plot
based on quartiles and contains... 1. min 2. max 3. median 4. 1st quartile 5. 3rd quartile therefore IQR
108
parts of a single box plot
1. a box 2. solid line 3. whiskers 4. extreme value
109
extreme threshold
pair of imaginary lines drawn above and below box
110
box plots in observational studies
categorical group would be a measured categorical variable
111
box plot in experimental studies
categorical group would be the treatment factors
112
grouped box plot
two categorical groups
113
pros of histograms
- provide richest information about how your data is distributed - illustrates shape of the distribution
114
con of histogram
difficult to look at a numerical variable across categorical groups
115
pro of box plot
it is easy to compare across multiple categorical groups
116
con of box plot
convey much less about shape of distribution
117
scatter plots
used to show pattern between two numerical variables collected from DIFFERENT sampling units *HR against age for group of winner
118
line plots
used when data is collected repeatedly from SAME sampling units - data points are NOT INDEPENDENT of one another *HR during a run
119
x-axis
horizontal - independent variable
120
y-axis
vertical - dependent variable
121
independent variable
experimental treatment that is manipulated
122
dependent variable
measured response under those treatments
123
covariates
when both numerical variables are measured quantities from the sampling unit - evaluating a pattern, so not causal
124
association
correlation between two variables - typically covariates
125
prediction
one variable predicts another - x-axis=predictor variable - y-axis=response variable
126
probability
long-run relative frequency of a particular outcome or event
127
random trial
any process that has multiple outcomes but the result on any particular trial is unknown - can be discrete or continuous
128
sample space
the list or set of all possible outcomes - shown with {}
129
an event
outcome you are interested in - can be single element in sample space - can be any subset of the sample space
130
measurement variable
value of any particular measurement is unknown prior to making the observation
131
law of large numbers
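the proportion of trials with a given outcome approaches its probability as the random trial is repeated many times - so the trial must be repeated many times to estimate a probability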
132
Ex. of probability (rolling a one)
1. random trial: rolling a die 2. sample space: S={1,2,3,4,5,6} 3. event: E={1} 4. probability: 1/6, because every side has an equal chance
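The die example can be checked with a short sketch that compares the theoretical probability to an estimate from many simulated rolls. The seed and the number of rolls are arbitrary assumptions:

```python
import random
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
event = {1}

# With equally likely outcomes, P(E) = |E| / |S|.
theoretical = Fraction(len(event), len(sample_space))

# Repeat the random trial many times (law of large numbers).
rng = random.Random(1)  # fixed seed so the sketch is repeatable
rolls = [rng.randint(1, 6) for _ in range(60_000)]
estimated = sum(roll in event for roll in rolls) / len(rolls)
```

With this many rolls, `estimated` should land close to 1/6 (about 0.167), and the gap tends to shrink as the number of rolls grows.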
133
probability distributions
functions that describe the probability over a range of events
134
properties of probability distributions
1. describe probability for the entire sample space 2. area under a probability distribution always sums to one 3. are used to describe both continuous and discrete random variables
135
discrete distributions
prob distributions for discrete random variable ex. number of times children ask for ice cream on a hot day
136
continuous distribution
prob distributions for continuous random variables ex. mass of an ice cream cone in grams
137
how is a discrete distribution shown
series of vertical bars with no space between them - vertical axis=probability mass
138
how is a continuous distribution shown
single curve as a function of a continuous event - vertical axis=probability density
139
what are distributions used for
estimating a range, or calculate a probability
140
properties of standard normal distribution
1. mean of SND is zero 2. standard deviation of SND is one 3. x-axis is called the z-score
141
z-score
a scale that measures number of standard deviations from the mean
142
range vs. probabilities
probability and range are calculated as opposites
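A sketch of the z-score and of going back and forth between a range and a probability on the standard normal distribution. The height, mean, and standard deviation are made-up, and `z_score` is a hypothetical helper:

```python
from statistics import NormalDist

def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

# Made-up example: a height of 185 cm when the mean is 170 cm and the sd is 10 cm.
z = z_score(185.0, 170.0, 10.0)

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Range -> probability: area under the curve below z.
p_below = standard_normal.cdf(z)

# Probability -> range: the opposite calculation recovers z.
z_back = standard_normal.inv_cdf(p_below)
```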
143
population parameters
describe attributes of the statistical population
144
sampling distributions
distribution of some descriptive statistic that would arise if you repeatedly drew samples from the statistical population
145
bimodal vs. unimodal
bimodal: two peaks unimodal: one peak
146
similarity between sampling distribution and stat. pop.
have the same mean value
147
difference between sampling distribution and stat. pop
sampling distribution is narrower than stat. pop.
148
characteristics of sampling distribution
1. shape of sampling distribution is independent of stat. pop. as long as sample size is large 2. variance decreases as number of sampling units increases
149
what shape is sampling distribution
smooth bell-shaped distribution (symmetrical)
150
central limit theorem
1. sampling distribution tends towards a normal distribution as sample size increases 2. mean of a sampling distribution is the same as mean of stat. pop. 3. standard error can be calculated from sd of stat. pop. and sample size
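These three points can be explored with a simulation sketch. It assumes a hypothetical skewed statistical population (exponential, with mean 1 and standard deviation 1) and uses the usual relationship that the standard error equals the population sd divided by the square root of the sample size:

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable

n = 50            # sampling units per sample
n_samples = 2000  # number of repeated samples

# Draw many samples and keep the mean of each one.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(n_samples)
]

mean_of_means = statistics.fmean(sample_means)  # should be near the population mean, 1
sd_of_means = statistics.stdev(sample_means)    # spread of the sampling distribution
predicted_se = 1.0 / math.sqrt(n)               # population sd / sqrt(sample size)
```

The simulated spread `sd_of_means` should sit close to `predicted_se` (about 0.14), far narrower than the population's standard deviation of 1.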
151
standard error
standard deviation of a sampling distribution
152
chain of inference adding to shape independence
the descriptive statistics of a sample provide an estimate of stat. pop. parameters and therefore sampling distribution
153
Student's t-distribution
similar to normal distribution but has a shape that depends on the sample size
154
what happens to t distribution when sample size is small
it has fatter tails than normal distribution to account for uncertainty - larger size= more certainty= t-distribution looks more like normal distribution
155
what is observed directly?
sample
156
what is not observed directly?
statistical population and sampling distribution (inference, not used in practice)
157
confidence intervals
describe range over x-axis of a sampling distribution that brackets a certain probability of where new samples may be found
158
purpose of confidence intervals
provide gauge for how much uncertainty there is in a descriptive statistic
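A rough sketch of a 95% confidence interval for a mean. The sample is made-up, and the interval uses the normal distribution as an approximation because Python's standard library has no t-distribution quantiles. For a sample this small, a Student's t-based interval would be wider:

```python
import math
import statistics
from statistics import NormalDist

sample = [4.0, 6.0, 8.0, 10.0, 12.0]  # made-up measurements

mean = statistics.mean(sample)
# Standard error estimated from the sample: sd / sqrt(n).
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# z-score that leaves 2.5% in each tail of the standard normal distribution.
z = NormalDist().inv_cdf(0.975)

ci_lower = mean - z * standard_error
ci_upper = mean + z * standard_error
```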
159
what is the difference between experimental and observational studies?
experimental: causal; observational: correlative
160
standard error
is unavoidable - helps make the statistical inference