KNPE 251 1/2 Flashcards

1
Q

5 hierarchical scales

A

-Sampling Unit
-Sample
-Observation Unit
-Statistical Population
-Population of Interest

2
Q

Sampling Unit

A

unit being selected at random

3
Q

Sample

A

collection of sampling units randomly selected

4
Q

Observation Unit

A

scale for data collection

5
Q

Statistical Population

A

collection of all sampling units that could have been in sample

6
Q

Population of interest

A

collection of sampling units that you hope to draw a conclusion about (Scope of research question)

7
Q

Measurement variable

A

what we want to measure about the observation unit

8
Q

Measurement unit

A

scale of measurement variable (cm, years etc)

9
Q

What is the measurement unit if the data is categorical

A

no measurement unit

10
Q

what is descriptive statistics used for?

A

describe the data in the sample

11
Q

What is inferential statistics used for?

A

describe the statistical population based on the sample

12
Q

steps to carry out study

A
  1. Sampling
  2. Measure
  3. Calculate Descriptive Statistics
  4. Calculate Inferential Statistics
13
Q

Goals of ideal sampling designs

A

-all sampling units must have some probability of being included in sample (p>0)

-Selection of sampling units is unbiased

-Selection of sampling units is independent

-Each possible sample has equal chance of being selected

14
Q

what is an observational study

A

based on observations of a statistical population

*researchers do not have any control over the variables

15
Q

Primary goal of observational study

A

characterize something about an existing population

16
Q

Limitations of observational study

A

cannot make statements about whether a factor CAUSES the response you are interested in

17
Q

Response Variable

A

response you are interested in

18
Q

Explanatory variable

A

factor you are investigating

19
Q

Confounding Variable

A

unobserved variables that affect the response variable

20
Q

Spurious

A

when the relationship between an explanatory variable and a response variable is thought to be driven by a confounding variable

21
Q

Simple Random Survey

A

starts by identifying every sampling unit in the statistical population, then selecting a random subset of those units as the sample
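
A minimal sketch of the idea, assuming a made-up statistical population of 100 labelled sampling units (the labels, seed and sample size are illustrative only):

```python
import random

# Hypothetical statistical population: every sampling unit is listed first.
population = list(range(100))

# Fixed seed only so the sketch is repeatable; any seed would do.
rng = random.Random(42)

# Draw 10 units at random, without replacement.
sample = rng.sample(population, k=10)
```

Because every unit is listed before drawing, each one has a non-zero chance of ending up in the sample.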

22
Q

Stratified Survey

A

-used when there are subgroups within the statistical population that can influence the results
-break statistical population into strata then sample within each stratum

**strata must be defined ahead of time by researcher
**each stratum has equal weighting in sample
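
One way to picture this, as a sketch only: the strata names, unit labels and the choice of 5 units per stratum below are all hypothetical.

```python
import random

# Hypothetical strata, defined ahead of time by the researcher.
strata = {
    "first_year": [f"Y1-{i}" for i in range(40)],
    "second_year": [f"Y2-{i}" for i in range(25)],
    "third_year": [f"Y3-{i}" for i in range(35)],
}

rng = random.Random(7)  # fixed seed so the sketch is repeatable
units_per_stratum = 5   # same number from every stratum (equal weighting)

# Sample at random within each stratum separately.
stratified_sample = {
    name: rng.sample(units, k=units_per_stratum)
    for name, units in strata.items()
}
```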

23
Q

Cluster Survey

A

-used to remove diversity in the statistical population that is not relevant to research question

-create groups where the non-relevant diversity is contained within each group

-can be done in one or two stage designs

24
Q

One stage cluster design

A

data is collected from ALL observation units in a cluster

25
Q

Two stage cluster design

A

a subset of observation units are randomly selected within each cluster

26
Q

Case-Control Survey

A

-used to compare data between 2 groups

-1st group is the “case” and contains sampling units with a particular response variable

-2nd group is the “control” and contains sampling units without the response variable of the case group

-purposely biased as it aims to select sampling units for the case group based on a measured response variable and compare that to the control group

***high spurious chance
**retrospective

27
Q

Cohort survey

A

-follow sampling units over time, looking for development of a particular response variable

-goal is to select a random set of sampling units and observe over time

**outcomes unknown when sampling units selected
**prospective

28
Q

Retrospective

A

outcome is already known, looking back in time

*increase risk of spurious relationships

29
Q

Prospective

A

outcome is not yet known, looking forward in time

30
Q

Cross-sectional

A

ones that study a response variable at only a single snapshot in time

31
Q

Longitudinal

A

studying a response variable at multiple points in time

32
Q

experimental studies

A

-treatment only starts once put in the category
-based on creating treatments where the researcher controls one or more variables
-establish cause-effect among variables
-each manipulated variable is called a factor (each factor has levels)

33
Q

the 2 steps when sampling units are selected at random in experimental studies

A
  1. Selection
  2. Assignment
34
Q

Replication

A

the idea that a treatment will be repeated a number of times to see how repeatable a measured outcome is

35
Q

Pseudoreplication

A

where the observation units are analyzed rather than the sampling units
*this is an error in the design of an experimental study

36
Q

Types of experimental study designs

A

-control treatment
-blocking
-blinding
-placebo
-sham treatment

37
Q

Control treatment

A

contains everything except the actual treatment; reference to compare treatment levels against

38
Q

Blocking

A

predefined groups where treatments are applied within each group; you can randomly allocate your sampling units to the treatments within a group, but not across groups

39
Q

Blinding

A

sampling unit does not know what treatments are applied within each group
(double blind: researcher does not know either)

40
Q

Placebo

A

given substance/treatment that has no effect on the response variable

41
Q

Sham Treatment

A

controls for treatments that require handling the sampling unit
(aims to account for effect of delivery of a treatment that is not of interest)

42
Q

3 pieces of information a variable contains

A

-what the variable represents
-the measurement unit
-description of the observation unit

43
Q

4 subtypes of variables

A

continuous, discrete, ordinal and nominal

44
Q

continuous variable

A

can take on continuous numbers (any value including fractions)

45
Q

discrete variable

A

can only take on whole numbers (integers)

46
Q

ordinal categorical variable

A

can take on qualitative values but where values are from a ranked scale

47
Q

nominal categorical variables

A

can take on qualitative values but where values have no particular order

48
Q

what is central tendency

A

describes typical values in a sample

*2nd quartile, the median

49
Q

what is dispersion

A

describes the spread of values

*range of inner-most 50% of the data, 3rd to 1st quartile (IQR)

50
Q

what do central tendency and dispersion depend on

A

whether the variable is numerical or categorical

51
Q

two ways to characterize categorical data

A

counts and proportions

52
Q

what are counts

A

the number of sampling units in each category

53
Q

what are proportions

A

the share of the total sampling units in each category
(frequency/ total)
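
As a small illustration, assuming a made-up categorical variable (preferred activity) recorded for six sampling units:

```python
from collections import Counter

# Hypothetical categorical observations, one per sampling unit.
observations = ["run", "swim", "run", "bike", "run", "swim"]

# Counts: number of sampling units in each category.
counts = Counter(observations)

# Proportions: frequency / total for each category.
total = sum(counts.values())
proportions = {level: n / total for level, n in counts.items()}
```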

54
Q

what do counts and proportions indicate

A

the central tendency of categorical data

55
Q

what is range used to indicate

A

dispersion

56
Q

what is mean used to describe

A

central tendency

57
Q

what is variance used to indicate

A

dispersion

58
Q

Steps to calculating the mean

A
  1. sum of all values in a sample
  2. divide by the number of data points in the sample
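
The two steps can be sketched as follows, using made-up values:

```python
# Hypothetical sample of four measurements.
values = [2.0, 4.0, 6.0, 8.0]

# Step 1: sum all values in the sample.
total = sum(values)

# Step 2: divide by the number of data points.
mean = total / len(values)
```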
59
Q

steps to calculating standard deviation

A
  1. calculate the mean for a sample
  2. calculate the difference between each data point and the mean, then square each difference
  3. sum the squared differences and divide by the number of observations (this gives the variance)
  4. take the square root of the variance

*dividing by the number of observations gives the population variance; dividing by n − 1 instead gives the sample variance
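
A worked sketch of the calculation with made-up values; the final square-root step is what turns a variance into a standard deviation, and the n − 1 version is shown only for comparison:

```python
import math

# Hypothetical sample of eight measurements.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Mean of the sample.
mean = sum(values) / len(values)

# Squared difference between each data point and the mean.
squared_diffs = [(x - mean) ** 2 for x in values]

# Dividing by n gives the population variance.
population_variance = sum(squared_diffs) / len(values)

# Dividing by n - 1 gives the sample variance.
sample_variance = sum(squared_diffs) / (len(values) - 1)

# Standard deviation is the square root of the variance.
population_sd = math.sqrt(population_variance)
```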

60
Q

what do quartiles show us

A

central tendency and dispersion

61
Q

steps to calculating quartiles

A
  1. Sort data lowest to highest
  2. find the second quartile by splitting data in half
  3. find the first quartile by subsetting the lower-valued half of the observations, then find middle value (the second quartile IS included if the # of observations is odd)
  4. find the third quartile by repeating step 3 for the upper valued half
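
The steps above can be sketched as below. This follows the convention on the card (the median is included in both halves when the number of observations is odd); other textbooks and software use slightly different quartile rules, and the function names here are illustrative.

```python
def median(sorted_values):
    """Middle value of an already sorted list (mean of the two middle values if even)."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)          # step 1: sort lowest to highest
    n = len(data)
    q2 = median(data)              # step 2: split the data in half
    half = n // 2
    if n % 2 == 1:
        lower = data[: half + 1]   # odd n: the median is included in both halves
        upper = data[half:]
    else:
        lower = data[:half]
        upper = data[half:]
    return median(lower), q2, median(upper)  # steps 3 and 4


q1, q2, q3 = quartiles([9, 1, 5, 3, 7])
iqr = q3 - q1  # range of the inner-most 50% of the data
```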
62
Q

when to use quartiles over mean

A

-for larger data sets

**because they are sensitive when a dataset is small, where a few values can change the median a lot

63
Q

When to use mean over quartiles

A

for smaller data sets

**sensitive to outliers that change the mean a lot

64
Q

What is effect size

A

the change in mean value of response variable among groups

*used to evaluate whether change in response variables is meaningful

65
Q

what does effect size allow for

A

allows us to put study results into context and look at change across groups

66
Q

two types of effect size

A

absolute and relative

67
Q

how can effect size be calculated

A

as either a difference or a ratio

**depends on study

68
Q

calculating effect size using difference method advantage

A

retains original scale

69
Q

calculating effect size using ratio method advantage

A

indicates relative change but loses original scale
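
A short sketch of the two calculations, assuming made-up group means (a control group and a treatment group measured in the same unit):

```python
# Hypothetical mean values of the response variable in two groups.
control_mean = 50.0
treatment_mean = 60.0

# Difference: keeps the original measurement unit.
difference = treatment_mean - control_mean

# Ratio: unitless, shows relative change (1.2 means 20% higher than control).
ratio = treatment_mean / control_mean
```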

70
Q

what are contingency tables

A

tables of data frequencies or proportions within different levels of categorical variable

71
Q

what do contingency tables show

A

frequency or proportion of sampling units in each level of a categorical variable

72
Q

what is frequency

A

number of sampling units that fall in each level

73
Q

one way vs two way categorical variables

A

one way: one categorical variable
two way: two categorical variables

74
Q

what are marginal distributions

A

-allow you to see patterns in contingency tables
-the row and column sums of a two-way contingency table

75
Q

what are conditional distributions

A

two way tables that show the interaction between the two variables in a contingency table

76
Q

difference between marginal and conditional distributions

A

conditional distributions look at the relative proportion of sampling units across the levels of one variable but within a single level of another variable
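
To make the distinction concrete, here is a sketch with a made-up two-way table (exposure by outcome group); all labels and counts are hypothetical:

```python
# Hypothetical two-way contingency table of frequencies.
table = {
    "smoker": {"case": 30, "control": 10},
    "non_smoker": {"case": 20, "control": 40},
}

# Marginal distributions: row sums and column sums.
row_totals = {row: sum(cols.values()) for row, cols in table.items()}
column_totals = {}
for cols in table.values():
    for col, n in cols.items():
        column_totals[col] = column_totals.get(col, 0) + n

# Conditional distribution: proportions across outcome levels
# within a single level of the other variable (here, smokers only).
conditional_given_smoker = {
    col: n / row_totals["smoker"] for col, n in table["smoker"].items()
}
```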

77
Q

When to use a bar graph

A

used to visualize categorical data (NOT for numerical data)

78
Q

emphasizing vertical vs horizontal bar graphs

A

vertical: emphasizes categorical variable

horizontal: focuses on the number of sampling units

79
Q

only time you can use a bar graph for numerical data

A

when each categorical level has a single numerical value
(data must be statistical in nature because they can't represent a subsample from a larger population)

80
Q

bar graphs with two categorical variables

A

-shows if one is impacting the other

*can be stacked or grouped

-1st variable is “grouping” ; usually ordinal
-2nd variable is secondary

81
Q

Histograms

A

split numerical data into bins and display the number of sampling units in each bin

82
Q

drawback of histograms

A

if we have a dataset that has a numerical variable and also has a categorical variable with many levels, it can be cumbersome to show a histogram for each level of the categorical variable

83
Q

what are Box plots used for

A

visualizing numerical data across groups

84
Q

5 descriptive statistics that box plots show

A

-min
-max
-median
-1st quartile
-3rd quartile

85
Q

Box plots vs histograms

A

box plots: easy to compare across multiple categorical groups but mask the shape of the distribution

histograms: richest info on the data distribution and its shape, but difficult to compare a numerical variable across categorical groups

86
Q

Scatter Plot

A

shows pattern between 2 numerical variables that are collected from different sampling units

*each point is a sampling unit

87
Q

axis naming conventions based on descriptive statistics

A

depend on whether the data are from experimental or observational studies, and whether the treatment variable is displayed.

88
Q

axis naming conventions based on inferential statistics

A

depend on whether the statistical analysis is looking at association between the variables or prediction.

89
Q

Axis naming descriptive statistics: when figure is intended to showcase sample data and experimental study showing treatment

A

x axis is IV, y axis is DV

90
Q

Axis naming descriptive statistics: when figure is intended to showcase sample data and experimental study not showing treatment or observational study

A

x-axis and y-axis are covariates (no causation)

91
Q

Axis naming for inferential statistics: when figure is intended to showcase inference and association (correlation test)

A

x-axis and y-axis are covariates

92
Q

Axis naming for inferential statistics: when figure is intended to showcase inference and prediction (regression test)

A

x-axis is predictor variable and y-axis is response variable

93
Q

Line plot

A

data is collected repeatedly from same sampling unit (2 numerical variables)

*each line represents a sampling unit

94
Q

what is probability

A

the proportion of times an event would occur if a random trial was repeated many times

*used to describe confidence in an outcome or anticipated frequency

95
Q

what is a random trial

A

any process with multiple outcomes but where the outcome on any particular trial is unknown

96
Q

what is sample space

A

a list of all possible outcomes

97
Q

what is event

A

the outcome of interest

98
Q

Frequentist statistics

A

random trial must be repeated many times to estimate probability (depends on how accurate you want probability to be)
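
A minimal simulation sketch of this idea, assuming a fair coin as the random trial (the seed and number of trials are arbitrary choices):

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_trials = 10_000       # more trials give a more accurate estimate

# Each trial is one coin flip; the event of interest is "heads".
heads = sum(rng.random() < 0.5 for _ in range(n_trials))

# Estimated probability = proportion of trials in which the event occurred.
estimate = heads / n_trials
```

The estimate should land close to 0.5, and it tends to get closer as `n_trials` grows.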

99
Q

Probability distributions

A

functions that describe the probability of all events; a tool for calculating probabilities

100
Q

where can probability be found on a graph

A

the area under the function

101
Q

three properties of probability distributions

A
  1. describe the probability for the entire sample space
  2. area under entire curve always sums to 1
  3. are for both continuous and discrete variables
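
As an illustration of the first two properties for a discrete case, assuming a fair six-sided die as the random trial:

```python
from fractions import Fraction

# Sample space of a fair die with the probability of each outcome.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities over the entire sample space sum to 1.
total = sum(die.values())

# Probability of an event (rolling an even number) is the sum over its outcomes.
p_even = sum(p for face, p in die.items() if face % 2 == 0)
```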
102
Q

2 types of probability distributions

A

discrete and continuous

103
Q

discrete probability distributions

A

-typically shown as vertical bars with no space
-y axis is probability mass

104
Q

continuous probability distributions

A

-typically shown as a line graph
-y axis is probability density

105
Q

what is the probability of a single event in continuous distribution

A

zero

106
Q

term when a continuous distribution has two peaks

A

bimodal

107
Q

standard normal distribution

A

used to answer any question that is based on probabilities from a normal distribution
*must convert to standard form

108
Q

when to use forwards and backwards equation of conversion to standard form

A

forwards: to estimate probabilities when given a range

backwards: to estimate ranges from probability
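
A sketch of both directions, assuming the usual conversion z = (x − mean) / standard deviation and its inverse x = mean + z × standard deviation. The mean of 170 and standard deviation of 10 are made-up values.

```python
from statistics import NormalDist

# Hypothetical normal distribution (e.g. heights in cm).
mu, sigma = 170.0, 10.0
standard_normal = NormalDist()  # mean 0, standard deviation 1

# Forwards: range -> z-scores -> probability.
z_low = (160.0 - mu) / sigma
z_high = (180.0 - mu) / sigma
prob_in_range = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

# Backwards: probability -> z-score -> value on the original scale.
z_95 = standard_normal.inv_cdf(0.95)
x_95 = mu + z_95 * sigma  # value below which 95% of the distribution falls
```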

109
Q

descriptive statistics

A

-used to describe attributes of a sample
-quantifiable characteristic of a sample
-values are NOT fixed (can change each time a sample is taken)

110
Q

Population parameters

A

-any quantifiable characteristic of a statistical population
-each measurement variable has its own set of population parameters
-values are FIXED (consistent each time a sample is taken)

111
Q

what is estimation in sampling distributions

A

descriptive statistics provide an estimate of population parameter

112
Q

what is sampling distribution

A

the probability distribution of a descriptive statistic that would emerge if a statistical population was sampled repeatedly a large number of times

113
Q

is the shape of the sampling distribution influenced by the shape of a statistical population

A

No. The shape of a sampling distribution does not depend on the shape of the statistical population

114
Q

what does variance depend on?

A

the variance of a sampling distribution depends on sample size: the larger the sample, the smaller the variance (inverse relationship)

115
Q

Central Limit theorem

A

the development of the principles behind the two key characteristics of sampling distributions

  1. a sampling distribution has a bell shape; independent of statistical population
  2. the variance of a sampling distribution decreases as sample size increases
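
A simulation sketch of the second characteristic, assuming a deliberately skewed (exponential) statistical population with mean 1. The sample sizes, number of repeats and the helper name `sample_means` are illustrative choices.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable


def sample_means(n, reps=2000):
    """Draw `reps` samples of size `n` and return the mean of each one."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]


means_small = sample_means(5)   # sampling distribution for n = 5
means_large = sample_means(50)  # sampling distribution for n = 50

# Variance of the sampling distribution shrinks as sample size grows.
var_small = statistics.pvariance(means_small)
var_large = statistics.pvariance(means_large)

# Its centre stays near the population mean (1.0 here).
centre_large = statistics.fmean(means_large)
```

Plotting a histogram of `means_large` would show a roughly bell-shaped curve even though the underlying population is skewed.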
116
Q

what does the central limit theorem add to shape independence?

A

-sampling distribution becomes a normal distribution
-mean of sampling distribution is the same as statistical population
-standard error can be calculated using standard deviation and sample size

117
Q

standard deviation of sampling distribution is called…

A

standard error

calculated by: standard deviation of statistical population divided by the square root of sample size
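
A one-line worked example, assuming a made-up population standard deviation of 12 and a sample size of 36:

```python
import math

population_sd = 12.0  # hypothetical standard deviation of the statistical population
n = 36                # hypothetical sample size

# Standard error = standard deviation / square root of sample size.
standard_error = population_sd / math.sqrt(n)
```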

118
Q

the chain of inference reinforces that…

A

statistical population and sampling distribution are not directly observed

119
Q

steps in chain of inference

A
  1. sample
  2. estimate statistical population
  3. calculate sampling distribution

**statistical population and sampling distribution are both based off an estimate

120
Q

what does the central limit theorem assume

A

that we know statistical population perfectly

121
Q

solution to uncertainty in estimation

A

Student's t-distribution

122
Q

what is Student's t-distribution?

A

used to describe the sampling distribution when the parameters of the statistical population are estimated from a sample

123
Q

attributes of t-distribution

A

-looks like normal distribution but tails are a bit fatter to account for uncertainty in estimate
-sample size influences shape (larger the sample size, the better the estimate and the more it looks like a normal distribution)

124
Q

confidence intervals

A

the range over a sampling distribution that brackets the centre-most probability of interest

125
Q

what is the purpose of confidence intervals

A

-used to convey uncertainty in descriptive statistics of a sample

*derived from sampling distributions
**the range over the x-axis of a sampling distribution that brackets where new samples may be found with a certain probability
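
A rough sketch of a 95% interval for a sample mean, using made-up data. Note the simplification: it uses the normal distribution for the multiplier rather than Student's t-distribution (which the Python standard library does not provide), so for a sample this small the interval is somewhat narrower than the t-based one would be.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical sample of eight measurements.
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]

sample_mean = mean(sample)
standard_error = stdev(sample) / math.sqrt(len(sample))  # estimated from the sample

# Multiplier bracketing the centre-most 95% of a normal distribution (about 1.96).
z = NormalDist().inv_cdf(0.975)

confidence_interval = (
    sample_mean - z * standard_error,
    sample_mean + z * standard_error,
)
```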

126
Q

interpreting confidence intervals

A

-they are an estimate
-give a sense of variation from sampling error