STATS (BIOL 243) FALL 2024 Flashcards

Question

cross-sectional

Answer 1

studies a response variable at a single snapshot in time, e.g. a simple random survey

Answer 2

study a response variable at multiple points of time

Answer 3

c. Whether the survey design is retrospective or prospective (correct)

Answer 4

Whether strata are defined ahead of time or not

Answer 5

Stratified survey

Answer 6

Simple random survey

Answer 7

cluster survey

Answer 8

replication

Answer 9

number of replicates

Answer 10

an error in the design of an experimental study where the observation units are analyzed instead of the sampling units

Answer 11

1. control 2. blocking 3. blinded (single and double) 4. placebo 5. sham treatment

Answer 12

reference treatment to compare against the treatment levels

Answer 13

used to control variation among the sampling units (similar to stratified sampling it forms subgroups or "blocks")

Answer 14

when the sampling unit does not know what treatment they are being exposed to

Answer 15

both researcher and sample unit are unaware

Answer 16

often used in medical trials as the control treatment that helps accomplish a blinded design (has no effect)

Answer 17

method used in control treatments that accounts for the effect of delivering a treatment, which is not itself of interest; the comparison is then between the sham and the treatment

Answer 18

1 factor with 4 levels

Answer 19

Reduces the possibility of placebo effects Reduces biases in measurements stemming from the anticipation of a treatment effect

Answer 20

Reduces biases in measurements stemming from the anticipation of a treatment effect Reduces the possibility of placebo effects

Answer 21

Whether sampling units are randomly assigned to treatments or not.

Answer 22

There is one Factor (drug) with two Levels (raloxifene, no raloxifene).

Answer 23

any measurable characteristic of an observation

Answer 24

value of the variable

Answer 25

can take on any value (1.2 or 1/4 etc.)

Answer 26

can only be whole numbers

Answer 27

can take on qualitative values but the values are on a ranked scale

Answer 28

takes on qualitative values but they do not have any particular order eg. types of fruit

Answer 29

Continuous numerical

Answer 30

Ordinal categorical

Answer 31

Discrete numerical

Answer 32

Ordinal categorical

Answer 33

Continuous numerical

Answer 34

describes the typical values in our sample (e.g. the mean or the second quartile)

Answer 35

describes the spread of the values

Answer 36

categorical variable of observations in your sample that fall within a particular category

Answer 37

percentages

Answer 38

variance measures the amount of variation: the average squared distance of each data point from the sample mean (σ^2)

Answer 39

1. calculate the mean 2. find the difference between each data point and the mean 3. square each difference 4. sum the squares 5. divide by the # of observation points
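A minimal sketch of the steps on this card, using made-up data. The `sample` flag is an assumption added for illustration: the card divides by the number of observations, while the usual sample variance divides by n - 1, so both are shown.

```python
def variance(values, sample=True):
    """Average squared distance of each data point from the mean.

    sample=True divides by n - 1 (the usual sample variance);
    sample=False divides by n, as the card's wording describes.
    """
    n = len(values)
    mean = sum(values) / n                          # calculate the mean
    squared = [(x - mean) ** 2 for x in values]     # difference from the mean, squared
    return sum(squared) / (n - 1 if sample else n)  # sum the squares and divide


data = [2, 4, 4, 4, 5, 5, 7, 9]      # made-up observations
print(variance(data, sample=False))  # 4.0
print(variance(data))                # slightly larger, because it divides by n - 1
```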

Answer 40

ranked bins of data 1. sort from lowest to highest

Answer 41

split the data in half: a. if you have an odd number of observations, the second quartile is the middle value b. if you have an even number of observations, the second quartile is the average of the two middle values

Answer 42

subset the lower-valued half of observations, then use the rules for the second quartile to find the middle value; note that the 2nd quartile is included if the # of observations is odd

Answer 43

repeat steps for quartile 1 in the upper valued half

Answer 44

range of inner-most 50% of the data between Q1 and Q3 (Q3-Q1)
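The quartile cards above can be sketched as one small function. This assumes the "split into halves" rule described on the cards, with the middle value included in both halves when the number of observations is odd; other textbooks and software use slightly different quartile rules, so results may differ. The data are made up.

```python
def median(sorted_vals):
    """Middle value, or the average of the two middle values."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule."""
    s = sorted(values)                 # sort from lowest to highest
    n = len(s)
    half = n // 2
    if n % 2 == 1:
        lower, upper = s[:half + 1], s[half:]   # odd n: Q2 belongs to both halves
    else:
        lower, upper = s[:half], s[half:]
    return median(lower), median(s), median(upper)


q1, q2, q3 = quartiles([1, 3, 5, 7, 9, 11, 13])
print(q1, q2, q3)   # 4.0 7 10.0
print(q3 - q1)      # IQR: 6.0
```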

Answer 45

Mean is 9.9, median is 9.4

Answer 46

Variance is 5.5, IQR is 1.5

Answer 47

5 ≤ ANSWER < 7

Answer 48

19 ≤ ANSWER <23

Answer 49

the difference among groups important to your study

Answer 50

whether the change in the response variables is meaningful for a practical study

Answer 51

4.7 (0.56/0.12)

Answer 52

the actual difference in outcomes ie. 80%-60%=20%

Answer 53

Relative effect size compares the outcomes between two groups as a ratio or percentage. (80% / 60%) = 1.33, or a 33% increase
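A small sketch of the two effect-size cards, reusing the 80% and 60% outcomes from the cards. The function names are hypothetical.

```python
def absolute_effect(p_treatment, p_control):
    """Actual difference in outcomes between the two groups."""
    return p_treatment - p_control


def relative_effect(p_treatment, p_control):
    """Outcomes compared as a ratio."""
    return p_treatment / p_control


abs_eff = absolute_effect(0.80, 0.60)
rel_eff = relative_effect(0.80, 0.60)
print(f"absolute: {abs_eff:.2f}")   # absolute: 0.20
print(f"relative: {rel_eff:.2f}")   # relative: 1.33
```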

Answer 54

sum the values in each row; sum the values in each column; in the last box add up every row and column. This helps make proportions, shows how many sampling units are in each level of one categorical variable, and is a good way to describe patterns

Answer 55

shows the relationship between the columns and the rows; take the value of the cell you are interested in and divide by the total of its column or row
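A sketch of the marginal and conditional distribution cards for a two-way table of counts. The table is made up, and conditioning on the column total is just one of the two choices the card mentions (the row total works the same way).

```python
def marginal_totals(table):
    """Row totals, column totals, and the grand total of a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals, sum(row_totals)


def conditional_by_column(table):
    """Each cell divided by the total of its column."""
    _, col_totals, _ = marginal_totals(table)
    return [[cell / col_totals[j] for j, cell in enumerate(row)] for row in table]


counts = [[10, 20],
          [30, 40]]                 # made-up counts
print(marginal_totals(counts))      # ([30, 70], [40, 60], 100)
print(conditional_by_column(counts)[0][0])   # 0.25
```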

Answer 56

- gaps show the levels are categorical - whichever variable you are most interested in goes on the x axis - each bar is a level

Answer 57

- visualizes interactions between data sets

Answer 58

grouped bar graph stacked bar graph

Answer 59

bars are side by side (no gap) represent a small numerical range

Answer 60

based on quartiles and used when you have numerical data and categorical groups - whiskers - median: solid line - box: drawn from the first quartile to the third - extreme threshold

Answer 61

drawn from the box to the last data point before the extreme threshold

Answer 62

Q3 + (1.5IQR) and Q1-(1.5IQR)
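A sketch of how the extreme thresholds and whisker ends on the box plot cards could be computed. The function name and data are hypothetical, and Q1 and Q3 are passed in rather than computed so the quartile rule stays out of the way.

```python
def box_plot_limits(values, q1, q3):
    """Return the extreme thresholds, the whisker ends, and the extreme values."""
    iqr = q3 - q1
    low_thr = q1 - 1.5 * iqr
    high_thr = q3 + 1.5 * iqr
    # Whiskers stop at the last data points that are still inside the thresholds.
    inside = [x for x in values if low_thr <= x <= high_thr]
    extreme = [x for x in values if x < low_thr or x > high_thr]
    return (low_thr, high_thr), (min(inside), max(inside)), extreme


data = [1, 3, 5, 7, 9, 11, 30]      # made-up observations
thresholds, whiskers, extreme = box_plot_limits(data, q1=4, q3=10)
print(thresholds)  # (-5.0, 19.0)
print(whiskers)    # (1, 11)
print(extreme)     # [30]
```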

Answer 63

when you have two numerical variables and you want to look at the relationship between them; the x axis is the independent variable and the y axis is the dependent variable; in an observational study the x and y axes are covariates

Answer 64

two numerical variables that have been measured repeatedly from the same sampling unit; each line is a different sampling unit

Answer 65

Conditional distribution with game as the primary variable

Answer 66

z = (x - μ)/σ
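As a quick worked example of the z-score formula, with made-up values for the observation, the population mean, and the standard deviation:

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


print(z_score(130, 100, 15))  # 2.0
print(z_score(85, 100, 15))   # -1.0
```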

Answer 67

set of all possible outcomes

Answer 68

a subset of a sample space (2,4,6 of 1 through 6)

Answer 69

procedure or action that produces one outcome from a set of possible outcomes, where the result is uncertain and cannot be predicted in advance.

Answer 70

probability based on the frequency of events occurring in repeated experiments or trials: P(A) = Number of times event A occurs / Total number of trials
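A sketch of an empirical probability: count how often the event occurs and divide by the total number of trials. The die rolls are made up, and the event (an even number) is the same subset used as an example on the event card.

```python
def empirical_probability(outcomes, event):
    """Share of observed outcomes that fall inside the event."""
    hits = sum(1 for outcome in outcomes if outcome in event)
    return hits / len(outcomes)


rolls = [2, 5, 6, 1, 4, 4, 3, 6, 2, 5]   # made-up record of ten die rolls
even = {2, 4, 6}                         # event: a subset of the sample space 1..6
print(empirical_probability(rolls, even))  # 0.6
```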

Answer 71

numerical outcome of a random phenomenon. It assigns a number to each outcome in a sample space, allowing for the analysis of probabilities associated with different outcomes.

Answer 72

the probability of different possible values of a variable.

Answer 73

a function that gives the probability of a discrete random variable, X, being exactly equal to some value
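One way to picture a probability mass function is the sum of two fair dice. This example is an illustration chosen here, not one taken from the course; it builds the function by listing the sample space and counting the outcomes that give each total.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_pmf():
    """P(X = total) for the sum X of two fair six-sided dice."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(n, 36) for total, n in sorted(counts.items())}


pmf = two_dice_pmf()
print(pmf[7])             # 1/6
print(sum(pmf.values()))  # 1
```

Because the random variable is discrete, each value has its own probability and the probabilities sum to one.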

Answer 74

systematic error in a study or analysis that leads to incorrect conclusions or inferences about a population. the selection of one sample unit does not influence the selection of another.

Answer 75

1. all sampling units are selectable 2. selection is unbiased 3. selection is independent 4. all samples are possible

Answer 76

a situation where two variables appear to be correlated with each other but, in fact, are not directly related

Answer 77

are for data with a single categorical variable and are shown as a one-dimensional table of columns.

Answer 78

are for data with two categorical variables and are shown as a two-dimensional table of rows and columns.

Answer 79

the 60 parks in the region

Answer 80

viewer age

Answer 81

all at-least-weekly Canadian Viewers of MSNBC news programming who watch using bell satellite

Answer 82

both sampling and observation unit

Answer 83

sampling unit

Answer 84

the individual tick

Answer 85

measurement unit

Answer 86

simple random

Answer 87

case control study

Answer 88

sample unit selection is not independent

Answer 89

confounding factors

Answer 90

discrete numerical

Answer 91

categorical

Answer 92

discrete numerical

Answer 93

number of columns

Answer 94

a bar graph

Answer 95

list of all cards in a deck

Answer 96

list of all aces

Answer 97

Roughly 1 in a million people have won a national lottery over hundreds of draws, which means the probability is p=0.000001. (correct) The probability that a product fails can be calculated directly from repeated testing in a factory. (correct) The probability that I will buy my lunch today is 100% (correct)

Answer 98

Observing how much a random shopper spent in a particular store. (correct) Playing a 'scratch and win' lottery ticket. (correct) Rolling a die in a board game (correct)

Answer 99

Continuous distribution

Answer 100

Discrete distribution

Answer 101

Can be used to describe both discrete and continuous numerical variables (correct) The area beneath the function always sums to one (correct) The x-axis is the outcome, or event, of interest (correct)

Answer 102

The probability of a single event in a discrete distribution is always zero (correct) Probability distributions cannot be used for a range of events. (correct)

Answer 103

statement or position that is the skeptical view-point of the research question.

Answer 104

sampling distribution from an imaginary statistical population where the null hypothesis is true

Answer 105

conclusion that is unlikely to come from the null

Answer 106

used to evaluate statistical significance

Answer 107

the probability of seeing your data, or something more extreme, under the null hypothesis; helps quantify the evidence against the null hypothesis. It measures how compatible your data are with the assumption that the null is true. If α = 0.05, a p-value below 0.05 means rejecting H0 is justified. Example: p = 0.03, α = 0.05: the result is statistically significant because p < 0.05; you reject H0. Example: p = 0.10, α = 0.05: the result is not statistically significant because p > 0.05; you fail to reject H0.

Answer 108

probability of rejecting the null when it is true (false positive)

Answer 109

probability of failing to reject the null when its false (false negative)

Answer 110

probability of making a mistake

Answer 111

descriptive statistics of the sample quantifiable characteristics of a statistical pop labeled using the Greek alphabet values are fixed

Answer 112

shape is independent of the statistical pop if the sample size is sufficiently large bell shaped curve taking the mean of multiple sampling units averages out asymmetries in the statistical population the variance of a sampling distribution increases as the # of sampling units decreases

Answer 113

given a sufficiently large sample size, the distribution of the sample mean will approximate a normal distribution, regardless of the original population's distribution; the standard error can be calculated from the sd of the statistical pop and the sample size

Answer 114

σ (sd) / sqrt(n)
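A short sketch of the standard error formula with made-up numbers. It also shows the pattern from the sampling distribution card: the standard error shrinks as the sample size grows.

```python
import math


def standard_error(sd, n):
    """Standard deviation divided by the square root of the sample size."""
    return sd / math.sqrt(n)


print(standard_error(12, 36))   # 2.0
print(standard_error(12, 144))  # 1.0
```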

Answer 115

shape depends on the size of the sample (influential when the size is small); has fatter tails to account for the uncertainty in estimating the sd; continuous probability distribution; used when the sample size is small and the population standard deviation is unknown. As df increases, the t-distribution approaches the normal distribution.

Answer 116

the range over a sampling distribution that brackets the center most probability of interest

Answer 117

t = (x - m)/SE, so x = m + t * SE
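Rearranging the formula gives the two ends of a confidence interval, m - t * SE and m + t * SE. The sketch below assumes the critical t-score is looked up separately; 2.262 is the two-sided 95% value for 9 degrees of freedom, and the mean and standard error are made up.

```python
def confidence_interval(mean, se, t_crit):
    """Lower and upper bounds: mean minus and plus t_crit * SE."""
    margin = t_crit * se
    return mean - margin, mean + margin


# Assumed inputs: sample mean 50, SE 2, t critical value for df = 9 at 95%.
low, high = confidence_interval(50.0, 2.0, 2.262)
print(f"{low:.3f} to {high:.3f}")  # 45.476 to 54.524
```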

Answer 118

evaluates if the mean of your sample is different from some reference value compares numerical variable to a reference (sample mean - reference) / SE
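A minimal sketch of the one-sample t-score, (sample mean - reference) / SE, on made-up data. It assumes the standard error is estimated from the sample standard deviation with n - 1 in the denominator, and it returns n - 1 degrees of freedom alongside the score.

```python
import math


def one_sample_t(sample, reference):
    """Return (t, df) for comparing a sample mean against a reference value."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    se = sd / math.sqrt(n)
    return (mean - reference) / se, n - 1


t, df = one_sample_t([10, 12, 14, 16, 18], reference=12)
print(round(t, 3), df)  # 1.414 4
```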

Answer 119

if the difference in paired data of numerical variables is different from some reference value looks at how sampling units change across factors t= (mean of differences-reference)/SE

Answer 120

determines if the means of two groups are different from each other (m1-m2)/SEs
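A sketch of the two-sample t-score on made-up data. The card only writes "SEs" for the denominator; the pooled-variance standard error used here is an assumption, chosen because it matches the n1 + n2 - 2 degrees of freedom listed for this test elsewhere in the deck.

```python
import math


def sample_variance(values):
    """Sample variance with n - 1 in the denominator."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)


def two_sample_t(group1, group2):
    """Return (t, df) using a pooled estimate of the variance (assumed)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * sample_variance(group1)
              + (n2 - 1) * sample_variance(group2)) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


t2, df2 = two_sample_t([10, 12, 14], [13, 15, 17])
print(round(t2, 3), df2)  # -1.837 4
```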

Answer 121

summarized categorical data

Answer 122

the contingency table of expected frequencies under the null hypothesis compare observed vs. expected

Answer 123

one categorical variable with levels sum of observed counts must be the same as expected expectation counts are distributed equally is there a difference in counts among the level of that variable?

Answer 124

two categorical variable expected counts are distributed independently are the counts independent between variables?

Answer 125

calculate marginal distribution .........

Answer 126

(row total * column total) / table total, do it for each cell
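The expected-count rule can be sketched for a whole table at once. The observed counts are made up.

```python
def expected_counts(observed):
    """Expected count per cell: (row total * column total) / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


observed = [[10, 20],
            [30, 40]]              # made-up two-way table
print(expected_counts(observed))   # [[12.0, 18.0], [28.0, 42.0]]
```

The expected counts add up to the same table total as the observed counts, which is a handy check.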

Answer 127

used to determine whether there is a significant association between categorical variables or whether observed data matches expected data under a certain hypothesis. It works by comparing observed frequencies (data collected) to expected frequencies (based on a hypothesis).

Answer 128

distribution of chi-square scores expected from repeatedly sampling a statistical pop where the null is true can only have positive values (square everything) shape will vary depending on df's

Answer 129

take the difference between each observed and expected cell square the difference divide by the expected value sum over all cells in the table
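The four steps on this card translate almost line for line into code. The observed and expected tables are made up (the expected counts are the ones that independence would give for these observed counts), and the degrees of freedom are computed with the (r - 1)(c - 1) rule for a two-way table.

```python
def chi_square(observed, expected):
    """Sum over all cells of (observed - expected)^2 / expected."""
    score = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for obs, exp in zip(obs_row, exp_row):
            score += (obs - exp) ** 2 / exp
    return score


def two_way_df(table):
    """Degrees of freedom for a two-way table: (rows - 1) * (columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


obs_table = [[10, 20], [30, 40]]           # made-up observed counts
exp_table = [[12.0, 18.0], [28.0, 42.0]]   # expected counts under independence
print(round(chi_square(obs_table, exp_table), 4))  # 0.7937
print(two_way_df(obs_table))                       # 1
```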

Answer 130

(r-1)(c-1)

Answer 131

X - Variable independent variable predictor variable

Answer 132

the x variable the predictor variable

Answer 133

+0.0257 (correct)

Answer 134

response variable is a linear function of the predictor variable (well described by a linear relationship); the effect of the predictor variable on the response is additive and proportional

Answer 135

assumption that residuals are normally distributed

Answer 136

assumes that the residuals are sequentially independent of each other (vary between + and - numbers seemingly at random); when residuals are not independent there will be adjacent runs of positive and negative values; prevent violations by making sure units are selected at random and independently of each other

Answer 137

the variance of residuals (errors) should be constant across all levels of the predictor variable (spread should be equal)

Answer 138

3D normal distribution graph depicted as contours

Answer 139

r or rho (ρ) measures the strength of association: ρ = -1, ρ = 0, ρ = 1 (negative, no, positive association)
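A sketch of the sample correlation coefficient r, assuming the Pearson formula (the card does not name one). The data are made up.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled to lie in [-1, 1]."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))  # 0.775
print(pearson_r([1, 2, 3], [2, 4, 6]))                        # 1.0
print(pearson_r([1, 2, 3], [6, 4, 2]))                        # -1.0
```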

Answer 140

evaluates if changes in one numerical variable can predict changes in another

Answer 141

y = a (intercept) + b (slope) x
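A sketch of how the intercept a and slope b might be estimated from data. It assumes ordinary least squares, which the card does not state explicitly, and uses made-up data.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for the line y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx               # slope
    a = mean_y - b * mean_x     # intercept: the line passes through the means
    return a, b


a, b = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(a, 2), round(b, 2))  # 2.2 0.6
```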

Answer 142

describes the function used for predictions

Answer 143

describes the probability distribution for sampling error ( only occurs in the y variable)

Answer 144

connects the systematic to the random component

Answer 145

systematic component random component link function

Answer 146

calculate residual for each data point take the square of each residual sum the squared residuals across all data points divide by dfs (n-2)
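The residual-variance steps on this card can be sketched as follows. The data, intercept, and slope are made up for illustration; the intercept and slope are taken as already estimated.

```python
def residual_variance(x, y, intercept, slope):
    """Sum of squared residuals divided by the degrees of freedom (n - 2)."""
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    squared = [r ** 2 for r in residuals]
    return sum(squared) / (len(x) - 2)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(residual_variance(xs, ys, intercept=2.2, slope=0.6), 3))  # 0.8
```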

Answer 147

define the null and alternative hypothesis establish the null distribution conduct the statistical test draw scientific conclusions

Answer 148

determines the ratio of variance between two variables (no variance, F = 1)

Answer 149

residual variation (MSE) (correct)

Answer 150

test the difference in means between groups in an ANOVA test

Answer 151

secondary test used to evaluate what groups have different means in ANOVA only used if the F-test indicates to reject the null hypothesis

Answer 152

compares the means of all possible combinations of categorical levels in an ANOVA controls the family wise error rate by using a specialized null distribution that accounts for the number of contrasts

Answer 153

type 1 error rate for the family of contrasts; used to evaluate the adjusted p-values returned from the TukeyHSD test; if p > FWER (0.05) we fail to reject the null hypothesis

Answer 154

looks at the effect of two categorical variable on a numerical variable

Answer 155

questions about the differences among the levels of factor A averaging across the levels of factor B. These are comparisons among full columns

Answer 156

questions about the differences among the levels of factor B averaging across the levels of factor A. These are comparisons among full rows

Answer 157

differences among the levels of one factor within each level of the other factor; a deviation from the assumption that the levels of each factor simply add together

Answer 158

response from the two variables is the sum of the two

Answer 159

response is more than the two variables added together

Answer 160

response is less than the two variable combined

Answer 161

The effect of factor A depends on the level of factor B. (correct)

Answer 162

MSG = SSG (sum of squares) / dfG (k - 1), where k = number of groups

Answer 163

residual variation MSE = SSE / dfE (n-k)
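A sketch that puts the MSG and MSE cards together into a one-way ANOVA F-score. The three groups are made up, and the function name is hypothetical.

```python
def one_way_anova(groups):
    """Return (MSG, MSE, F) for a list of groups of numerical observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation among groups: group means around the grand mean.
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Residual variation: observations around their own group mean.
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msg = ssg / (k - 1)
    mse = sse / (n - k)
    return msg, mse, msg / mse


msg, mse, f_score = one_way_anova([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(msg, mse, f_score)  # 21.0 1.0 21.0
```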

Answer 164

variance reduces standard error becomes smaller

Answer 165

distribution values produced from the measurement of some parameter about each individual of a population

Answer 166

wide and similarly located narrow and located differently identical symmetrical wide and have similar medians

Answer 167

k - 1: variation between groups (ANOVA, MSG); N - k: variation within groups (ANOVA, residual variation (MSE)); n - 1: one-way table; n - 2: confidence intervals and residual analysis; (r - 1)(c - 1) or (a - 1)(b - 1): 2-way table; ab(n - 1): residual analysis (variation among sampling units within a cell); n1 + n2 - 2: two-sample t-test

Answer 168

the ratio of the variation among categorical groups divided by the residual variation within a group

Answer 169

represents the variation in a ratio you would expect from repeated sampling of a population where there was no true difference in means.

(203 cards)