Term Test Flashcards

Question

response variable

Answer 1

response you are interested in - ex. lung cancer

Answer 2

factor you investigate - ex. tobacco

Answer 3

unobserved variables that affect a response variable

Answer 4

when the relationship between explanatory and response variables is thought to be driven by a confounding variable

Answer 5

sampling units are selected at random from the statistical population, where each sampling unit has the same probability of being in your sample

Answer 6

researcher creates strata, then takes samples within each stratum

Answer 7

name given to a subgroup within the statistical population in a stratified survey

Answer 8

used to remove diversity in the statistical population that's not relevant to the research question - cluster = sampling unit - units nested inside the cluster = observation units

Answer 9

data are collected from all observation units in a cluster

Answer 10

a subset of observation units is randomly selected within each cluster

Answer 11

used to compare data between 2 groups: case and control ***strong risk of spurious relationships

Answer 12

contains sampling units WITH a particular outcome of the response variable

Answer 13

contains sampling units WITHOUT the outcome that defines the case group

Answer 14

sampling units are selected and followed over time - use a simple random survey, then observe their fate over time

Answer 15

where the outcome is already known (increases risk of spurious relationships) ex. case-control studies

Answer 16

where the outcome is not yet known (requires more effort, but decreases risk of spurious relationships) ex. cohort studies

Answer 17

study a response variable at only a single snapshot in time

Answer 18

study a response variable at multiple points in time

Answer 19

based on creating treatments where the researcher controls one or more variables

Answer 20

study the effect of one or more manipulated variables on one or more response variables - establishes cause and effect

Answer 21

each manipulated variable has two levels/groups

Answer 22

number of times a treatment is repeated on randomly selected units - the number of replicates is the number of sampling units in an experimental study

Answer 23

an error in the design of an experimental study where observation units are analyzed rather than sampling units

Answer 24

different values of the factor

Answer 25

contains everything except the treatment

Answer 26

used to control for variation among sampling units that is not of interest but could alter the response variable ***PREDEFINED

Answer 27

a design where the sampling unit (usually a person) does not know what treatment they are being exposed to

Answer 28

sampling unit does not know the treatment they are assigned

Answer 29

neither the researcher nor the sampling unit knows which treatment the unit is assigned to ***removes accidental bias

Answer 30

method used for the control treatment that helps accomplish a blinded design - a substance or treatment that has no effect on the response variable

Answer 31

aims to account for the effect of delivering a treatment, which is not of interest to the researcher

Answer 32

one factor could be drug type and another is diet type

Answer 33

when two explanatory variables have effects that are different than the simple sum of each variable in isolation

Answer 34

any measurable characteristic of an observation unit (varies among sampling units)

Answer 35

1. what the variable represents 2. measurement unit 3. description of the observation units

Answer 36

value of a variable you measure

Answer 37

can take on continuous (fractional) values ex. weight = 107.23 kg

Answer 38

can take on only whole numbers (integers)

Answer 39

data is a qualitative description - no measurement units

Answer 40

categorical (qualitative) variables that have ORDERED levels ex. use emojis to describe how you feel

Answer 41

can take on qualitative values, but the values do not have any particular order

Answer 42

describes the typical value in your sample (ex. mean)

Answer 43

describes the spread of the values (ex. variance)

Answer 44

number of sampling units in each category

Answer 45

share of the total sampling units in each category

Answer 46

measure of the amount of variation in your sample

Answer 47

square root of variance
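A minimal sketch of the variance and standard deviation cards, using a made-up sample. It assumes the sample variance divides by n - 1 (the usual convention for a sample; the cards do not state the divisor) and cross-checks against Python's standard `statistics` module.

```python
import statistics

# Hypothetical sample of masses (kg)
masses = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(masses)
mean = sum(masses) / n

# Sample variance: sum of squared deviations from the mean, divided by n - 1
variance = sum((x - mean) ** 2 for x in masses) / (n - 1)

# Standard deviation is the square root of the variance
sd = variance ** 0.5

print(mean, variance, sd)
print(statistics.variance(masses), statistics.stdev(masses))  # same values
```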

Answer 48

specific values of the variable that divide your data into ranked groups

Answer 49

central tendency is given by the second quartile

Answer 50

describes how much variation there is in a sample

Answer 51

range between 1st and 3rd quartiles
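To make the quartile, median, and IQR cards concrete, here is a small sketch on hypothetical data using `statistics.quantiles`. Note that software packages use slightly different quartile rules, so other tools may give somewhat different Q1 and Q3 for the same data.

```python
import statistics

# Hypothetical sample; statistics.quantiles sorts the data itself
values = [3, 7, 8, 5, 12, 14, 21, 13, 18]

# n=4 returns the three cut points: Q1, Q2 (the median), Q3.
# method="exclusive" is the library default quartile rule.
q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")

# Interquartile range: spread of the middle half of the data
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 6.0 12.0 16.0 10.0
print(statistics.median(values))  # 12, same as the second quartile
```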

Answer 52

when data set is small

Answer 53

median and IQR are robust to extreme values

Answer 54

median and IQR become quite variable for samples with a small number of observations

Answer 55

mean and standard deviation are more robust when there's a small number of observations

Answer 56

mean and standard deviation are sensitive to extreme values
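A quick illustration of why the median is robust to extreme values while the mean is not: adding one hypothetical outlier to a small made-up sample.

```python
import statistics

# Hypothetical sample, then the same sample with one extreme value added
sample = [10, 11, 12, 13, 14]
with_outlier = sample + [100]

for label, data in [("original", sample), ("with outlier", with_outlier)]:
    print(label,
          "mean =", round(statistics.mean(data), 2),
          "median =", statistics.median(data))
# The mean jumps from 12 to about 26.67; the median only moves from 12 to 12.5
```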

Answer 57

used to evaluate whether changes in response variables are meaningful

Answer 58

simple change in mean value between groups - can be calculated as a difference or ratio

Answer 59

differences in mean values among groups - has advantage of retaining original scale

Answer 60

ratio of mean values among groups - has advantage of indicating a relative change, but loses the original scale
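The two ways of expressing a simple effect size can be sketched with hypothetical control and treatment groups: the difference keeps the original units, the ratio gives a unitless relative change.

```python
import statistics

# Hypothetical response values for a control group and a treatment group
control = [20.0, 22.0, 24.0]
treatment = [30.0, 33.0, 36.0]

mean_c = statistics.mean(control)
mean_t = statistics.mean(treatment)

difference = mean_t - mean_c  # same scale/units as the response
ratio = mean_t / mean_c       # relative change; original scale is lost

print(difference, ratio)  # 11.0 1.5
```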

Answer 61

summarizes data from categorical variables - shows frequency or proportion of sampling units in each level of a categorical variable

Answer 62

number of sampling units that fall in each level

Answer 63

help with visualizing the relative distribution of sampling units among levels

Answer 64

observe 1 categorical variable

Answer 65

observe 2 categorical variables

Answer 66

calculate row and column totals - these frequencies show the overall pattern

Answer 67

sum frequencies across all columns for each row

Answer 68

sum frequencies across all rows for each column

Answer 69

refers to categorical variables rather than the table

Answer 70

relative frequencies of one categorical variable within the other - shows interaction between two variables
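The two-way table cards (row totals, column totals, and relative frequencies of one variable within the other) can be sketched with plain Python. The treatment/outcome records below are hypothetical.

```python
from collections import Counter

# Hypothetical observations: (treatment, outcome) for each sampling unit
records = [
    ("drug", "recovered"), ("drug", "recovered"), ("drug", "not"),
    ("placebo", "recovered"), ("placebo", "not"), ("placebo", "not"),
]
counts = Counter(records)  # frequency of each combination of levels

rows = sorted({r for r, _ in records})  # levels of the first variable
cols = sorted({c for _, c in records})  # levels of the second variable

# Row totals: sum frequencies across all columns for each row
row_totals = {r: sum(counts[(r, c)] for c in cols) for r in rows}
# Column totals: sum frequencies across all rows for each column
col_totals = {c: sum(counts[(r, c)] for r in rows) for c in cols}
grand_total = sum(row_totals.values())

# Relative frequency of each outcome within each treatment level
conditional = {r: {c: counts[(r, c)] / row_totals[r] for c in cols} for r in rows}

print(row_totals, col_totals, grand_total)
print(conditional)
```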

Answer 71

used to visualize both single variable and two variable categorical data - NOT USED FOR NUMERICAL DATA - can be vertical or horizontal

Answer 72

depends on research question - most relevant information should be on the HORIZONTAL axis

Answer 73

forms base of the figure - typically use ordinal categorical variables

Answer 74

levels of variable are shown beside each other - levels of grouping variable are separated by LARGE gap - levels of other variable are separated by SMALL gap

Answer 75

levels of variable are stacked on top of each other - colour is used to separate levels

Answer 76

split numerical data into bins and display number of sampling units in each bin
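A minimal sketch of histogram binning, assuming equal-width, half-open bins [low, high); the helper `bin_counts` and the data are hypothetical. Changing the bin width shows how too few bins over-aggregate the pattern.

```python
# Hypothetical numerical data (e.g. masses in grams)
data = [3.1, 4.7, 5.0, 5.2, 6.8, 7.4, 7.9, 8.0, 9.6]

def bin_counts(values, start, width, n_bins):
    """Count values in each bin [start + i*width, start + (i+1)*width).

    Values outside the covered range are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

print(bin_counts(data, start=2, width=2, n_bins=4))  # [1, 3, 3, 2]
print(bin_counts(data, start=2, width=8, n_bins=1))  # [9] -> everything in one bin
```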

Answer 77

provide great way to visualize the pattern

Answer 78

complicated to display histograms when your dataset also has multiple levels of a categorical variable

Answer 79

pattern is lost because there's little variation in frequency

Answer 80

pattern is lost because of excessive aggregation

Answer 81

shows how the median value differs among groups, and how much variation there is in the data

Answer 82

based on quartiles and contains: 1. min 2. max 3. median 4. 1st quartile 5. 3rd quartile (and therefore the IQR)

Answer 83

1. a box 2. solid line 3. whiskers 4. extreme value

Answer 84

pair of imaginary lines drawn above and below box
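If these imaginary lines are taken to be the fences beyond which points are drawn as extreme values, they can be computed as below. The 1.5 × IQR multiplier is an assumption (a common convention for box plots); the cards do not specify it, and the data are hypothetical.

```python
import statistics

# Hypothetical sample with one unusually large value
values = [3, 5, 7, 8, 12, 13, 14, 18, 60]

q1, _, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
iqr = q3 - q1

# Assumed convention: fences sit 1.5 * IQR below Q1 and above Q3
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences would be plotted as extreme values
extreme = [v for v in values if v < lower_fence or v > upper_fence]
print(lower_fence, upper_fence, extreme)  # -9.0 31.0 [60]
```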

Answer 85

categorical group would be a measured categorical variable

Answer 86

categorical group would be the treatment factors

Answer 87

two categorical groups

Answer 88

provide the richest information about how your data are distributed - illustrate the shape of the distribution

Answer 89

difficult to look at a numerical variable across categorical groups

Answer 90

it is easy to compare across multiple categorical groups

Answer 91

convey much less about shape of distribution

Answer 92

used to show the pattern between two numerical variables collected from DIFFERENT sampling units *HR against age for a group of winners

Answer 93

used when data is collected repeatedly from SAME sampling units - data points are NOT INDEPENDENT of one another *HR during a run

Answer 94

horizontal - independent variable

Answer 95

vertical - dependent variable

Answer 96

experimental treatment that is manipulated

Answer 97

measured response under those treatments

Answer 98

when both numerical variables are measured quantities from the sampling units - evaluating a pattern, so not causal

Answer 99

correlation between two variables - typically covariates
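One common way to quantify a correlation between two covariates is Pearson's correlation coefficient r (an assumption here; the card does not name a specific measure). A sketch on hypothetical age and resting heart-rate values from the same sampling units:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical covariates measured on the same sampling units
age = [20, 30, 40, 50, 60]
resting_hr = [60, 64, 66, 70, 72]

# r near +1: strong positive pattern (a correlation, not evidence of causation)
print(round(pearson_r(age, resting_hr), 3))
```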

Answer 100

one variable predicts another - x-axis=predictor variable - y-axis=response variable

Answer 101

relative frequency of a particular outcome or event

Answer 102

any process that has multiple possible outcomes, but the result of any particular trial is unknown - can be discrete or continuous

Answer 103

the list or set of all possible outcomes - shown with {}

Answer 104

outcome you are interested in - can be single element in sample space - can be any subset of the sample space

Answer 105

value of any particular measurement is unknown prior to making the observation

Answer 106

random trial must be repeated many times to estimate probability

Answer 107

1. random trial: rolling a die 2. sample space: S = {1,2,3,4,5,6} 3. event: E = {1} 4. probability = 1/6 because every side has an equal chance
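The die example, combined with the card about repeating a random trial many times, can be sketched as a simulation. The helper name `estimate_probability` and the seed are illustrative; the estimate should approach 1/6 as the number of trials grows.

```python
import random

def estimate_probability(event, n_trials, seed=1):
    """Estimate P(event) for one roll of a fair six-sided die by repeated trials."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    sample_space = [1, 2, 3, 4, 5, 6]
    hits = sum(1 for _ in range(n_trials) if rng.choice(sample_space) in event)
    return hits / n_trials

# Event E = {1}; the exact probability is 1/6 (about 0.167)
for n in (10, 1_000, 100_000):
    print(n, estimate_probability({1}, n))
```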

Answer 108

functions that describe the probability over a range of events

Answer 109

1. describe probability for the entire sample space 2. area under a probability distribution always sums to one 3. are used to describe both continuous and discrete random variables

Answer 110

prob distributions for discrete random variable ex. number of times children ask for ice cream on a hot day

Answer 111

prob distributions for continuous random variables ex. mass of an ice cream cone in grams

Answer 112

series of vertical bars with no space between them - vertical axis=probability mass

Answer 113

single curve as a function of a continuous event - vertical axis = probability density

Answer 114

estimating a range, or calculating a probability

Answer 115

1. mean of SND is zero 2. standard deviation of SND is one 3. x-axis is called the z-score

Answer 116

a scale that measures number of standard deviations from the mean

Answer 117

probability and range are calculated as opposites
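A sketch of the standard normal cards using `statistics.NormalDist`: a z-score for a hypothetical measurement, then going from range to probability (`cdf`) and from probability back to range (`inv_cdf`), which are the "opposite" calculations.

```python
from statistics import NormalDist

snd = NormalDist(mu=0, sigma=1)  # standard normal distribution

def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd

# Hypothetical measurement of 130 from a population with mean 100, sd 15
print(z_score(130, mean=100, sd=15))  # 2.0

# Range -> probability: area between z = -1.96 and z = +1.96
prob = snd.cdf(1.96) - snd.cdf(-1.96)

# Probability -> range: z-scores bracketing the central 95%
lower, upper = snd.inv_cdf(0.025), snd.inv_cdf(0.975)

print(round(prob, 3), round(lower, 2), round(upper, 2))  # 0.95 -1.96 1.96
```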

Answer 118

describe attributes of the statistical population

Answer 119

distribution of a descriptive statistic that arises only if you repeatedly draw samples from the statistical population

Answer 120

bimodal: two peaks; unimodal: one peak

Answer 121

have the same mean value

Answer 122

sampling distribution is narrower than stat. pop.

Answer 123

1. shape of the sampling distribution is independent of the shape of the stat. pop. as long as sample size is large 2. variance decreases as the number of sampling units increases

Answer 124

smooth bell-shaped distribution (symmetrical)

Answer 125

1. sampling distribution tends towards a normal distribution as sample size increases 2. mean of a sampling distribution is the same as mean of stat. pop. 3. sampling error can be calculated from sd of stat. pop and sample size

Answer 126

standard deviation of a sampling distribution
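The sampling distribution and standard error cards can be checked by simulation. This sketch assumes a hypothetical normal statistical population (mean 50, sd 10), draws many samples of size 25, and compares the sd of the sample means with sd / sqrt(n).

```python
import random
import statistics

rng = random.Random(42)  # seeded for reproducibility

# Hypothetical statistical population: normal with mean 50 and sd 10
POP_MEAN, POP_SD = 50.0, 10.0
n = 25            # sampling units per sample
n_samples = 5000  # number of repeated samples

# Sampling distribution of the mean: one mean per repeated sample
sample_means = [
    statistics.fmean(rng.gauss(POP_MEAN, POP_SD) for _ in range(n))
    for _ in range(n_samples)
]

# The sd of the sampling distribution is the standard error
simulated_se = statistics.stdev(sample_means)
theoretical_se = POP_SD / n ** 0.5  # sd of stat. pop. / sqrt(sample size)

print(round(statistics.fmean(sample_means), 2), round(simulated_se, 3), theoretical_se)
```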

Answer 127

the descriptive statistics of a sample provide an estimate of stat. pop. parameters and therefore sampling distribution

Answer 128

similar to normal distribution but has a shape that depends on the sample size

Answer 129

it has fatter tails than the normal distribution to account for uncertainty - larger sample size = more certainty = t-distribution looks more like the normal distribution

Answer 130

statistical population and sampling distribution (inference, not used in practice)

Answer 131

describe range over x-axis of a sampling distribution that brackets a certain probability of where new samples may be found

Answer 132

provide gauge for how much uncertainty there is in a descriptive statistic
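One common way to build such a range for a mean is a t-based 95% confidence interval (an assumption; the cards do not name the method). The sample is hypothetical, and because Python's standard library has no t-distribution, the critical value for 9 degrees of freedom (about 2.262) is hard-coded from a t-table.

```python
import statistics

# Hypothetical sample of 10 measurements
sample = [12.1, 11.4, 13.0, 12.6, 11.9, 12.8, 12.2, 11.7, 12.5, 12.3]

n = len(sample)
mean = statistics.fmean(sample)
se = statistics.stdev(sample) / n ** 0.5  # estimated standard error

# t critical value for 95% with n - 1 = 9 degrees of freedom (from a t-table)
T_CRIT_95_DF9 = 2.262

lower = mean - T_CRIT_95_DF9 * se
upper = mean + T_CRIT_95_DF9 * se
print(round(mean, 2), round(lower, 2), round(upper, 2))
```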

Answer 133

experimental: causal observational: correlative

Answer 134

is unavoidable - helps make the statistical inference