Term Test Flashcards

1
Q

hierarchical scales

A

simplify process of developing a statistical study design
1. sampling unit
2. sample
3. observation unit
4. statistical population
5. population of interest

2
Q

sampling unit

A

unit being selected at random (can be same as observation unit)

3
Q

sample

A

collection of sampling units that you randomly selected

4
Q

observation unit

A

scale for data collection
- subject of the study

5
Q

statistical population

A

collection of all sampling units that could’ve been in your sample
- is defined by your study design

6
Q

population of interest

A

collection of sampling units that you hope to draw a conclusion about
- defined by your research question
- ideally the same as the statistical population, but the population of interest is often larger

7
Q

Ex. of hierarchy for design (street address)

A

pop. of interest: all people of voting age in Kingston
statistical population: all addresses in Kingston
sampling unit: street address
sample: 100 random street addresses
observation unit: a person
measurement variable: voting intent
measurement unit: none, because the measurement is categorical

8
Q

measurement variable

A

what we want to measure about the observation unit (height, age)

9
Q

measurement unit

A

scale of measurement variable (cm for height, years for age)

10
Q

descriptive statistics

A

characterize data in your sample (quantitative)
- averages, tables & graphs

11
Q

inferential statistics

A

uses information from the sample to make a probabilistic statement about the statistical population
- confidence intervals

***takes uncertainty into account

12
Q

4 steps to statistical framework

A
  1. sampling
  2. measuring
  3. calculating descriptive statistics
  4. calculating inferential statistics
13
Q

inferential vs descriptive statistics

A

inferential: use info from data to make statement about STATISTICAL POPULATION
descriptive: use info from data to make statement about OUR SAMPLE

14
Q

subgroups

A

divide the population into groups

15
Q

sampling design

A

describe how to sample a statistical population in a fair way

16
Q

4 goals of an ideal sampling design

A
  1. all sampling units are selectable
  2. selection is unbiased
  3. selection is independent
  4. all samples are possible
17
Q
  1. all sampling units are selectable
A

every sampling unit has a non-zero probability of being included

18
Q
  2. selection is unbiased
A

the probability of selecting a given sampling unit cannot depend on any attribute of that sampling unit

19
Q
  3. selection is independent
A

selection of sampling unit must not decrease or increase the probability that any other sampling unit is selected

20
Q
  4. all samples are possible
A

all samples that could be created from statistical population are possible

21
Q

bias

A

systematic over- or underestimate of a value, on average across samples, compared to its true value in the statistical population

22
Q

observational studies

A

based on observations of a statistical population where researchers do not have any control over the variables that affect the conclusions
- ex. can't control confounding variables, so relationships can't be shown to be causal

23
Q

goal of observational studies

A

characterize something about an existing statistical population that allows us to investigate relationships among variables

24
Q

limitations of observational studies

A

cannot make statements about whether a factor causes the response you’re interested in

25
Q

response variable

A

response you are interested in
- ex. lung cancer (in a study of the effects of tobacco use)

26
Q

explanatory variable

A

factor you investigate
- ex. tobacco use (in a study of what drives lung cancer)

27
Q

confounding variables

A

unobserved variables that affect a response variable

28
Q

spurious relationship

A

when relationship between explanatory and response variables is thought to be driven by confounding variable

29
Q

simple random survey

A

sampling units are selected at random from the statistical population where each sampling unit has the same probability of being in your sample
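
A minimal sketch of a simple random survey, assuming a made-up statistical population of 500 address IDs (the names `population` and `sample` are illustrative only):

```python
import random

# Hypothetical statistical population: 500 street addresses, labelled by ID.
population = [f"address_{i}" for i in range(500)]

rng = random.Random(42)  # fixed seed so the draw is reproducible

# random.sample draws without replacement; every sampling unit has the
# same probability (100/500) of ending up in the sample.
sample = rng.sample(population, k=100)

print(len(sample), len(set(sample)))
```

Because the draw is without replacement, no address can appear twice in the sample.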

30
Q

stratified survey

A

researcher creates strata, then takes a sample within each stratum

31
Q

strata

A

name given to a subgroup within the statistical population in a stratified survey
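
A stratified survey could be sketched as below. The helper `stratified_sample`, the student list and the choice of 5 units per stratum are assumptions for illustration, not part of the cards:

```python
import random

def stratified_sample(units, stratum_of, n_per_stratum, seed=0):
    """Group units into strata, then draw a simple random sample in each."""
    rng = random.Random(seed)
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: 60 students tagged with a year of study (1 to 3).
students = [(f"student_{i}", 1 + i % 3) for i in range(60)]

picked = stratified_sample(students, stratum_of=lambda s: s[1], n_per_stratum=5)
for year, members in sorted(picked.items()):
    print(year, [name for name, _ in members])
```

Every stratum is guaranteed to be represented, which a simple random survey does not guarantee.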

32
Q

cluster survey

A

used to remove diversity in the statistical population that's not relevant to the research question
- cluster = sampling unit
- units nested inside the cluster = observation units

33
Q

one-stage clusters

A

data are collected from all observation units in a cluster

34
Q

two-stage clusters

A

a subset of observation units are randomly selected within each cluster
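
The difference between one-stage and two-stage clusters can be sketched as follows, assuming a hypothetical population of 20 classrooms (clusters) with 30 students each:

```python
import random

rng = random.Random(1)

# Hypothetical population: 20 classrooms (clusters = sampling units),
# each holding 30 students (observation units).
clusters = {c: [f"class{c}_student{s}" for s in range(30)] for c in range(20)}

# Both designs start by randomly selecting whole clusters.
chosen = rng.sample(sorted(clusters), k=4)

# One-stage: collect data from every observation unit in each chosen cluster.
one_stage = [unit for c in chosen for unit in clusters[c]]

# Two-stage: randomly select a subset of observation units within each cluster.
two_stage = [unit for c in chosen for unit in rng.sample(clusters[c], k=10)]

print(len(one_stage), len(two_stage))
```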

35
Q

case-control survey

A

used to compare data between two groups
2 groups:
- case
- control

***strong risk of spurious relationship

36
Q

case group (first group)

A

contains sampling units WITH a particular outcome of the response variable

37
Q

control group (second group)

A

contains sampling units WITHOUT the outcome that defines the case group

38
Q

cohort survey

A

sampling units are selected and followed over time
- use simple random survey and then observe their fate over time

39
Q

retrospective studies

A

where outcome is already known (increases risk of spurious relationships)

ex. case-control studies

40
Q

prospective studies

A

where the outcome is not yet known (require more effort, but decrease risk of spurious relationships)

ex. cohort studies

41
Q

cross-sectional studies

A

study a response variable at only a single snapshot in time

42
Q

longitudinal studies

A

study a response variable at multiple points in time

43
Q

experimental studies

A

based on creating treatments where the researcher controls one or more variables

44
Q

goals of experimental studies

A

study effect of one or more manipulated variables on one or more random variables
- establishes cause and effect

45
Q

factor

A

a manipulated variable; each factor has two or more levels/groups

46
Q

replicates

A

number of times treatment is repeated on randomly selected units
- number of replicates is the number of sampling units in an experimental study

47
Q

pseudoreplication

A

an error in the design of an experimental study where the observation units are analyzed as if they were the sampling units

48
Q

levels

A

different values of the factor

49
Q

control treatments

A

contains everything except the treatment

50
Q

blocking

A

used to control for variation among sampling units that's not of interest but that could alter the response variable

***PREDEFINED

51
Q

blinded

A

a design where the sampling unit (usually a person) does not know what treatment they are being exposed to

52
Q

single blind design

A

sampling unit does not know the treatment they are assigned

53
Q

double blind design

A

neither the researcher nor the sampling unit knows which treatment the sampling unit is assigned to
***removes accidental bias

54
Q

placebo

A

method used for control treatment that helps accomplish a blinded design
- substance or treatment that has no effect on response variable

55
Q

sham treatment

A

aims to account for the effect of delivering a treatment, an effect that's not of interest to the researcher

56
Q

multiple factors

A

an experiment with more than one factor
- ex. one factor could be drug type and another diet type

57
Q

interaction

A

when two explanatory variables have effects that are different than the simple sum of each variable in isolation
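
As a worked example of an interaction, the sketch below uses invented mean responses for a 2 x 2 design (drug x diet); all numbers and names are hypothetical:

```python
# Hypothetical mean responses for each combination of two factors.
mean_response = {
    ("no drug", "standard diet"): 10.0,
    ("drug", "standard diet"): 14.0,
    ("no drug", "new diet"): 12.0,
    ("drug", "new diet"): 22.0,
}

baseline = mean_response[("no drug", "standard diet")]
drug_effect = mean_response[("drug", "standard diet")] - baseline  # drug alone
diet_effect = mean_response[("no drug", "new diet")] - baseline    # diet alone

# If the effects simply added up, the combined treatment would give this value.
additive_prediction = baseline + drug_effect + diet_effect

# Any departure from the additive prediction is the interaction.
interaction = mean_response[("drug", "new diet")] - additive_prediction

print(drug_effect, diet_effect, additive_prediction, interaction)
```

Here the combined treatment gives 22 rather than the additive 16, so the two factors interact.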

58
Q

variable

A

any measurable characteristic of an observation unit (varies among sampling units)

59
Q

3 pieces of information a variable contains

A
  1. what the variable represents
  2. measurement unit
  3. description of the observation units
60
Q

data

A

value of a variable you measure

61
Q

continuous numerical variable

A

can take on continuous numbers (fractional numbers)
ex. weight = 107.23 kg

62
Q

discrete numerical variable

A

can take on only whole numbers (integers)

63
Q

categorical

A

data is a qualitative description
- no measurement units

64
Q

ordinal categorical variable

A

categorical (qualitative) variables that have ORDERED levels
ex. use emojis to describe how you feel

65
Q

nominal categorical variable

A

can take on qualitative values but where values do not have any particular order

66
Q

central tendency

A

describes the typical value in your sample (ex. mean)

67
Q

dispersion

A

describes the spread of the values (ex. variance)

68
Q

counts

A

number of sampling units in each category

69
Q

proportion

A

share of the total number of sampling units in each category

70
Q

variance

A

measure of the amount of variation in your sample

71
Q

standard deviation

A

square root of variance
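
A small sketch of variance and standard deviation using Python's `statistics` module. The heights are invented, and the sample (n - 1) form of the variance is assumed:

```python
import math
import statistics

heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]  # hypothetical sample

mean = statistics.mean(heights_cm)
variance = statistics.variance(heights_cm)  # sample variance, n - 1 denominator
sd = statistics.stdev(heights_cm)           # square root of the variance

print(mean, variance, sd)
print(math.isclose(sd, math.sqrt(variance)))
```

The variance is in squared units (cm^2), while the standard deviation is back in the original unit (cm).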

72
Q

quartiles

A

specific values of the variable that divide your data into ranked groups

73
Q

median

A

measure of central tendency given by the second quartile

74
Q

dispersion

A

describes how much variation there is in a sample

75
Q

interquartile range

A

range between 1st and 3rd quartiles
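
Quartiles, the median and the interquartile range can be computed as sketched below. The data are invented, and `method="inclusive"` is just one of several quartile conventions, so other tools may report slightly different cut points:

```python
import statistics

data = [2, 4, 6, 8, 10, 12, 14, 16, 18]  # hypothetical measurements

# n=4 asks for the three cut points that split the data into four ranked groups.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

median = statistics.median(data)  # equals the second quartile for these data
iqr = q3 - q1                     # interquartile range

print(q1, q2, q3, median, iqr)
```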

76
Q

when are quartiles sensitive?

A

when data set is small

77
Q

pros to quartiles

A

median and IQR are robust to extreme values

78
Q

cons to quartiles

A

median and IQR become quite variable for samples with a small number of observations

79
Q

what are means sensitive to?

A

outliers

80
Q

pros to means

A

mean and standard deviation are more robust when there's a small number of observations

81
Q

cons to means

A

mean and standard deviation are sensitive to extreme values
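
The sensitivity of the mean to extreme values, compared with the median, can be seen in a tiny made-up example:

```python
import statistics

values = [10, 11, 12, 13, 14]       # hypothetical sample
with_outlier = [10, 11, 12, 13, 140]  # same sample with one extreme value

mean_before = statistics.mean(values)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(values)
median_after = statistics.median(with_outlier)

print(mean_before, mean_after)      # the mean shifts a lot
print(median_before, median_after)  # the median does not move
```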

82
Q

effect size

A

used to evaluate whether a change in the response variable is meaningful

83
Q

absolute effect size

A

simple change in mean value between groups
- can be calculated as a difference or ratio

84
Q

difference

A

differences in mean values among groups
- has advantage of retaining original scale

85
Q

ratio

A

ratio of mean values among groups
- has advantage of indicating a relative change, but loses the original scale
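
The two ways of expressing an effect size can be sketched with invented control and treatment groups:

```python
import statistics

control = [4.0, 5.0, 6.0]    # hypothetical response values, control group
treatment = [5.0, 6.0, 7.0]  # hypothetical response values, treatment group

mean_control = statistics.mean(control)
mean_treatment = statistics.mean(treatment)

difference = mean_treatment - mean_control  # keeps the original units
ratio = mean_treatment / mean_control       # unitless relative change

print(difference, ratio)
```

A difference of 1.0 is in the units of the response, while a ratio of 1.2 reads as a 20% increase with no units attached.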

86
Q

contingency table

A

summarizes data from categorical variables
- shows frequency or proportion of sampling units in each level of a categorical variable

87
Q

frequency

A

number of sampling units that falls in each level

88
Q

contingency tables as proportions

A

help with visualizing the relative distribution of sampling units among levels

89
Q

one-way contingency tables

A

observe 1 categorical variable

90
Q

two-way contingency tables

A

observe 2 categorical variables

91
Q

marginal distributions

A

row and column totals of a contingency table
- frequencies that show the overall pattern of each variable

92
Q

row of contingency table

A

sum frequencies across all columns for each row

93
Q

column of contingency table

A

sum frequencies across all rows for each column

94
Q

distribution

A

refers to categorical variables rather than the table

95
Q

conditional distributions

A

relative frequencies of one categorical variable within the other
- shows interaction between two variables
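
A two-way contingency table with its marginal and conditional distributions might be built like this. The observations are invented, and the conditional distribution shown is conditioned on the row variable:

```python
from collections import Counter

# Hypothetical observations: (smoking status, health status) per sampling unit.
observations = (
    [("smoker", "disease")] * 30 + [("smoker", "healthy")] * 20
    + [("non-smoker", "disease")] * 10 + [("non-smoker", "healthy")] * 40
)

table = Counter(observations)  # two-way table of frequencies

# Marginal distributions: totals for each row and each column.
row_totals = Counter(row for row, _ in observations)
col_totals = Counter(col for _, col in observations)

# Conditional distribution: relative frequency of each column level within a row.
conditional = {(row, col): n / row_totals[row] for (row, col), n in table.items()}

print(dict(row_totals), dict(col_totals))
print(conditional)
```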

96
Q

bar graphs

A

used to visualize both single variable and two variable categorical data
- NOT USED FOR NUMERICAL DATA
- can be vertical or horizontal

97
Q

vertical vs. horizontal

A

depends on research question
- most relevant information should be on the HORIZONTAL axis

98
Q

grouping variable

A

forms base of the figure
- typically use ordinal categorical variables

99
Q

grouped bar chart

A

levels of variable are shown beside each other
- levels of grouping variable are separated by LARGE gap
- levels of other variable are separated by SMALL gap

100
Q

stacked bar graph

A

levels of variable are stacked on top of each other
- colour is used to separate levels

101
Q

histograms

A

split numerical data into bins and display number of sampling units in each bin

102
Q

advantage to histogram

A

provide great way to visualize the pattern

103
Q

disadvantage to histogram

A

complicated to display histograms when your dataset also has multiple levels of a categorical variable

104
Q

what happens when there are too many bins in a histogram

A

pattern is lost because there's little variation in frequency among bins

105
Q

what happens when there are too few bins in a histogram

A

pattern is lost because of excessive aggregation
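
The effect of the number of bins can be explored with a small sketch. The helper `histogram_counts` and the simulated data are assumptions for illustration:

```python
import random

def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land one past the last bin, so clamp it.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts

rng = random.Random(0)
data = [rng.gauss(50, 10) for _ in range(200)]  # hypothetical measurements

for n_bins in (2, 10, 100):
    print(n_bins, histogram_counts(data, n_bins))
```

With 2 bins nearly everything is lumped together, and with 100 bins most bins hold only a handful of values, so the overall shape is hard to see in either case.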

106
Q

box plots

A

show how the median value differs among groups and how much variation there is in the data

107
Q

single box plot

A

based on quartiles and contains…
1. min
2. max
3. median
4. 1st quartile
5. 3rd quartile
therefore IQR

108
Q

parts of a single box plot

A
  1. a box
  2. solid line
  3. whiskers
  4. extreme value
109
Q

extreme threshold

A

pair of imaginary lines drawn above and below box
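
The pieces of a single box plot can be computed as sketched below. The cards do not say where the extreme thresholds sit; the common 1.5 x IQR rule is assumed here, and `box_plot_summary` is a hypothetical helper:

```python
import statistics

def box_plot_summary(values, k=1.5):
    """Return the quantities a box plot draws, using a k * IQR threshold."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed rule: thresholds sit k * IQR below and above the box.
    lower, upper = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if lower <= v <= upper]
    extremes = [v for v in values if v < lower or v > upper]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whiskers": (min(inside), max(inside)),  # most extreme non-outlying values
        "extremes": extremes,                    # plotted as individual points
    }

summary = box_plot_summary([2, 4, 6, 8, 10, 12, 14, 16, 60])
print(summary)
```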

110
Q

box plots in observational studies

A

categorical group would be a measured categorical variable

111
Q

box plot in experimental studies

A

categorical group would be the treatment factors

112
Q

grouped box plot

A

two categorical groups

113
Q

pros of histograms

A
  • provide richest information about how your data is distributed
  • illustrates shape of the distribution
114
Q

con of histogram

A

difficult to look at a numerical variable across categorical groups

115
Q

pro of box plot

A

it is easy to compare across multiple categorical groups

116
Q

con of box plot

A

convey much less about shape of distribution

117
Q

scatter plots

A

used to show pattern between two numerical variables collected from DIFFERENT sampling units

*HR against age for group of winner

118
Q

line plots

A

used when data is collected repeatedly from SAME sampling units
- data points are NOT INDEPENDENT of one another

*HR during a run

119
Q

x-axis

A

horizontal
- independent variable

120
Q

y-axis

A

vertical
- dependent variable

121
Q

independent variable

A

experimental treatment that is manipulated

122
Q

dependent variable

A

measured response under those treatments

123
Q

covariates

A

when both numerical variables are measured quantities from the sampling unit
- evaluating pattern, so not causal

124
Q

association

A

correlation between two variables
- typically covariates

125
Q

prediction

A

one variable predicts another
- x-axis=predictor variable
- y-axis=response variable

126
Q

probability

A

long-run relative frequency of a particular outcome or event

127
Q

random trial

A

any process that has multiple outcomes but the result on any particular trial is unknown
- can be discrete or continuous

128
Q

sample space

A

the list or set of all possible outcomes
- shown with {}

129
Q

an event

A

outcome you are interested in
- can be single element in sample space
- can be any subset of the sample space

130
Q

measurement variable

A

value of any particular measurement is unknown prior to making the observation

131
Q

law of large numbers

A

random trial must be repeated many times to estimate probability
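
The law of large numbers can be illustrated by simulating a fair six-sided die and estimating the probability of rolling a one. The function name and roll counts are arbitrary choices for this sketch:

```python
import random

rng = random.Random(7)

def estimate_probability_of_one(n_rolls):
    """Relative frequency of rolling a one in n_rolls simulated trials."""
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    return rolls.count(1) / n_rolls

for n in (10, 1_000, 100_000):
    print(n, estimate_probability_of_one(n))
```

With few rolls the estimate can be far from 1/6; with many rolls it settles close to it.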

132
Q

Ex. of probability (rolling a one)

A
  1. random trial: rolling a die
  2. sample space: S = {1, 2, 3, 4, 5, 6}
  3. event: E = {1}
  4. probability: P(E) = 1/6, because every side has an equal chance
133
Q

probability distributions

A

functions that describe the probability over a range of events

134
Q

properties of probability distributions

A
  1. describe probability for entire sample space
  2. area under a probability distribution always sums to one
  3. are used to describe both continuous and discrete random variables
135
Q

discrete distributions

A

prob distributions for discrete random variable

ex. number of times children ask for ice cream on a hot day

136
Q

continuous distribution

A

prob distributions for continuous random variables

ex. mass of an ice cream cone in grams

137
Q

how is a discrete distribution shown

A

series of vertical bars with no space between them
- vertical axis=probability mass

138
Q

how is a continuous distribution shown

A

single curve as a function of continuous event
- vertical axis = probability density

139
Q

what are distributions used for

A

estimating a range or calculating a probability

140
Q

properties of standard normal distribution

A
  1. mean of SND is zero
  2. standard deviation of SND is one
  3. x-axis is called the z-score
141
Q

z-score

A

a scale that measures number of standard deviations from the mean
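
A short sketch of z-scores on the standard normal distribution, using `statistics.NormalDist` (Python 3.8+). The variable with mean 170 and standard deviation 10 is invented:

```python
from statistics import NormalDist

snd = NormalDist(mu=0, sigma=1)  # standard normal distribution

# z-score of a raw value: number of standard deviations from the mean.
# Hypothetical variable with mean 170 and standard deviation 10.
z_of_185 = (185 - 170) / 10

# From a range to a probability: area between z = -1.96 and z = 1.96.
prob = snd.cdf(1.96) - snd.cdf(-1.96)

# From a probability to a range: z-score below which 97.5% of the area lies.
z = snd.inv_cdf(0.975)

print(z_of_185, round(prob, 3), round(z, 2))
```

`cdf` and `inv_cdf` are inverse operations, which is one way to move between a range of z-scores and the probability it covers.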

142
Q

range vs. probabilities

A

inverse calculations: a range gives a probability, and a probability gives a range

143
Q

population parameters

A

describe attributes of the statistical population

144
Q

sampling distributions

A

distribution of some descriptive statistic that only occurs if you repeatedly draw samples from statistical population

145
Q

bimodal vs. unimodal

A

bimodal: two peaks
unimodal: one peak

146
Q

similarity between sampling distribution and stat. pop.

A

have the same mean value

147
Q

difference between sampling distribution and stat. pop

A

sampling distribution is narrower than stat. pop.

148
Q

characteristics of sampling distribution

A
  1. shape of sampling distribution is independent of stat. pop. as long as sample size is large
  2. variance decreases as number of sampling units increases
149
Q

what shape is sampling distribution

A

smooth bell-shaped distribution (symmetrical)

150
Q

central limit theorem

A
  1. sampling distribution tends towards a normal distribution as sample size increases
  2. mean of a sampling distribution is the same as mean of stat. pop.
  3. standard error can be calculated from sd of stat. pop. and sample size
151
Q

standard error

A

standard deviation of a sampling distribution
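
The sampling distribution of the mean can be approximated by simulation. The sketch below assumes a made-up, strongly skewed population and compares the observed standard error with sd / sqrt(n):

```python
import math
import random
import statistics

rng = random.Random(3)

# Hypothetical, strongly skewed statistical population (exponential, mean near 1).
population = [rng.expovariate(1.0) for _ in range(20_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

n = 50  # sampling units per sample

# Repeatedly draw samples and record the mean of each one.
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(2_000)]

observed_se = statistics.stdev(sample_means)  # sd of the sampling distribution
predicted_se = pop_sd / math.sqrt(n)          # sd of stat. pop. / sqrt(sample size)

print(round(pop_mean, 3), round(statistics.fmean(sample_means), 3))
print(round(observed_se, 3), round(predicted_se, 3))
```

The sample means centre on the population mean and are much less spread out than the population itself.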

152
Q

chain of inference adding to shape independence

A

the descriptive statistics of a sample provide an estimate of stat. pop. parameters and therefore sampling distribution

153
Q

Student's t-distribution

A

similar to normal distribution but has a shape that depends on the sample size

154
Q

what happens to t distribution when sample size is small

A

it has fatter tails than normal distribution to account for uncertainty

  • larger size= more certainty= t-distribution looks more like normal distribution
155
Q

what is observed directly?

A

sample

156
Q

what is not observed directly?

A

statistical population and sampling distribution (inference, not used in practice)

157
Q

confidence intervals

A

describe range over x-axis of a sampling distribution that brackets a certain probability of where new samples may be found

158
Q

purpose of confidence intervals

A

provide gauge for how much uncertainty there is in a descriptive statistic
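
A rough sketch of a confidence interval for a mean. It assumes a normal approximation (mean plus or minus z times the standard error); for small samples the t-distribution would give a somewhat wider interval. The helper name and the heights are hypothetical:

```python
import math
import statistics
from statistics import NormalDist

def approx_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the mean of a sample."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))  # estimated standard error
    z = NormalDist().inv_cdf(0.5 + level / 2)               # about 1.96 for 95%
    return mean - z * se, mean + z * se

heights_cm = [162.0, 168.0, 171.0, 174.0, 165.0, 170.0, 177.0, 169.0, 173.0, 166.0]
low, high = approx_confidence_interval(heights_cm)
print(round(low, 1), round(high, 1))
```

A higher confidence level or a smaller sample gives a wider interval, reflecting more uncertainty in the descriptive statistic.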

159
Q

what is the difference between experimental and observational studies?

A

experimental: causal
observational: correlative

160
Q

standard error

A

is unavoidable - helps make the statistical inference