STATS (BIOL 243) FALL 2024 Flashcards

1
Q

five Hierarchical scales

A

sampling unit
sample
observation unit
statistical population
population of interest

2
Q

sampling unit

A

the unit selected at random; it may be the same as the observation unit or contain multiple observation units

3
Q

sample

A

collection of the sampling units

4
Q

observation unit

A

scale of data collection, subject of study

5
Q

statistical population

A

collection of all sampling units that could have been in your sample, and represents the true scale in which your statistical conclusions are valid

6
Q

population of interest

A

collection of sampling units that you hope to draw conclusions about

scope of the research question

ideally the same as your statistical population

7
Q

measurement variable

A

what we want to know/measure about the observation unit

8
Q

measurement unit

A

the scale (units) in which the measurement variable is recorded, e.g. beats per minute for heart rate

9
Q

descriptive stats

A

set of tools used to describe data

10
Q

inferential statistics

A

uses information from the sample to make a probabilistic statement about the statistical population

11
Q

what is the rule for descriptive and inferential stats when there are multiple groups in a statistical population?

A

descriptive stats are repeated for each group but inferential stats are only done once and can be used to make statements about the differences between groups

12
Q
A
13
Q

ideal sampling design

A
  1. all sampling units have a nonzero probability of being included
  2. selection of sampling units is unbiased
  3. selection of sampling units is independent
  4. each possible sample has an equal chance of being selected
14
Q

observational studies

A
  • researchers have no control over the variables
  • it characterizes something about an existing statistical population
  • a tool for discovering associations, but cannot make statements about the involvement of the sampling unit (cannot establish causation because there is no way to know if the factor is governed by something else)
15
Q

response variables

A

variable the investigators are interested in

16
Q

explanatory variable

A

variable that the investigator believes may explain the response variable

17
Q

confounding variables

A

unobserved variables that affect the response variable

17
Q

simple random

A

starts by identifying every sampling unit in the statistical population and then selecting a random subset of those to be in your sample. Each sampling unit has the same probability of being included in your sample.
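
A minimal Python sketch of simple random sampling, assuming a hypothetical statistical population of 60 numbered sampling units and a sample size of 15 (both numbers are made up for illustration):

```python
import random

# Hypothetical statistical population: 60 sampling units identified by number.
population = list(range(1, 61))

rng = random.Random(243)  # fixed seed so the draw is repeatable

# Draw 15 units without replacement; every unit is equally likely to be chosen.
sample = rng.sample(population, k=15)

print(sorted(sample))
```

Because the draw is made from the full list of sampling units, no unit is excluded in advance.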

18
Q

stratified

A

used when the statistical population has some grouping (strata)
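
A rough sketch of stratified sampling in Python, assuming hypothetical strata of car models grouped by fuel type (the model labels and stratum sizes are placeholders, not real data):

```python
import random

rng = random.Random(243)  # fixed seed so the draw is repeatable

# Hypothetical strata: car models grouped by fuel type (labels are placeholders).
strata = {
    "electric": [f"E{i}" for i in range(30)],
    "hybrid": [f"H{i}" for i in range(25)],
    "gasoline": [f"G{i}" for i in range(80)],
    "diesel": [f"D{i}" for i in range(20)],
}

# Take a simple random sample of the same size within every stratum.
sample = {name: rng.sample(units, k=10) for name, units in strata.items()}

for name, units in sample.items():
    print(name, len(units))
```

Sampling inside each stratum guarantees that every group is represented, which a single simple random draw does not.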

19
Q

clustered

A

observation units are contained within a larger group that we can randomly sample (geographical or organizational)

20
Q

case control

A

when there is a known outcome we are trying to explain

21
Q

cohort

A

select sampling units and follow them through time to see whether they develop the outcome of interest

22
Q

retrospective

A

studies where the results are already known
e.g. case-control studies

23
Q

prospective

A

outcome is not yet known
e.g. cohort studies

24
Q

cross-sectional

A

study a response variable at only a single snapshot in time
e.g. simple random

25
Q

longitudinal surveys

A

study a response variable at multiple points of time

26
Q

which of the following distinguishes case control from cohort surveys

a. Whether the survey is cross-sectional or longitudinal

b. Whether strata are defined ahead of time or not

c. Whether the survey design is retrospective or prospective

d. Whether clusters of observation units were selected at random or not

A

c. Whether the survey design is retrospective or prospective (correct)

27
Q

Which of the following distinguish stratified from clustered surveys?

Whether the survey is cross-sectional or longitudinal

Whether strata are defined ahead of time or not

Whether the survey design is retrospective or prospective

Whether clusters of observation units were selected at random or not

A

Whether strata are defined ahead of time or not

28
Q

You design a study where you randomly select 10 car models from within each category of electric, hybrid electric-gas, gasoline, or diesel. For each model, you find the purchase cost and estimate how much it will cost you to drive the vehicle for the next 10 years. What type of survey design is this?

A

Stratified survey

28
Q

Your children are young teenagers and you hear them listening to an entirely new genre of music called Korean Pop. You are curious whether it is just your kids that are listening to Korean Pop or if other kids their age are as well. You decide to find out by approaching 15 parents at the next Parent Teacher Night. Being a bit of a statistical geek, you mentally number each of the parents while they are talking to teachers. You pull out your cell phone with a list of random numbers and use these numbers to randomly select the parents that you approach to ask. What type of survey design is this?

A

Simple random survey

29
Q

You are a researcher interested in the rates of mental illness in Canadian cities. You randomly select 120 cities across Canada, and conduct a survey of each to get a single estimate of per capita incidence of mental illness. The design of this surveying method is best characterized as:

A

cluster survey

30
Q

cornerstone of experimental studies

A

replication

31
Q

number of sampling units = ?

A

number of replicates

31
Q

pseudoreplicates

A

an error in the design of an experimental study where the observation units are analyzed as if they were replicates, instead of the sampling units

32
Q

the common design elements/types

A
  1. control
  2. blocking
  3. blinded (single and double)
  4. placebo
  5. sham treatment
33
Q

control treatment

A

reference treatment to compare against the treatment levels

34
Q

blocking

A

used to control variation among the sampling units (similar to stratified sampling it forms subgroups or “blocks”)

35
Q

single blinded

A

when the sampling unit does not know what treatment they are being exposed to

36
Q

double blinded

A

both the researcher and the sampling unit are unaware of which treatment is being applied

37
Q

placebo

A

often used in medical trials as the control treatment that helps accomplish a blinded design (has no effect)

38
Q

sham treatment

A

method used in control treatments; accounts for the effect of the treatment delivery, which is not of interest

compare and contrast between sham and treatment

39
Q

Imagine a study that evaluates the effectiveness of different over-the-counter pain relievers in alleviating the symptoms of arthritis: acetaminophen, ibuprofen and acetylsalicylic acid. Two hundred patients are randomly assigned to receive one of these three pain relievers, or to receive a placebo (control). How many factors and levels are evident in this study?

A

1 factor with 4 levels

40
Q

Blinding patients to the experimental treatment is a crucial part of a randomized clinical trial. Why?

A

Reduces the possibility of placebo effects

Reduces biases in measurements stemming from the anticipation of a treatment effect

41
Q

What is the reason for blinding the researcher to what experimental treatment a patient is going to receive?

A

Reduces biases in measurements stemming from the anticipation of a treatment effect

Reduces the possibility of placebo effects

42
Q

What design characteristic distinguishes experimental studies from observational studies?

A

Whether sampling units are randomly assigned to treatments or not.

43
Q

A researcher studied the effect of the prescription drug raloxifene on fracture risk in postmenopausal women. They found that women who took raloxifene over a five-year period reduced their risk of clinical vertebral fracture compared to women who did not take the drug. What are the factors and levels in this experiment?

A

There is one Factor (drug) with two Levels (raloxifene, no raloxifene).

44
Q

variable

A

any measurable characteristic of an observation

45
Q

datum

A

value of the variable

46
Q

continuous numerical variable

A

can take on any value within a range, including fractions (e.g. 1.2 or 1/4)

47
Q

discrete numerical

A

can only be whole numbers

48
Q

ordinal categorical variable

A

can take on qualitative values but the values are on a ranked scale

49
Q

nominal categorical variable

A

takes on qualitative values but they do not have any particular order

eg. types of fruit

50
Q

What is the data type for describing your age

A

Continuous numerical

51
Q

What is the data type for the description: child, teenager, adult?

A

Ordinal categorical

52
Q

What is the data type for the number of students in a class?

A

Discrete numerical

53
Q

What is the data type for the letter grade on your exam?

A

Ordinal categorical

54
Q

What is the data type for the percent grade on your exam?

A

Continuous numerical

55
Q

central tendency

A

describes the typical value in our sample (e.g. the mean or the median)

the median is the second quartile

56
Q

dispersion

A

describes the spread of the values

57
Q

counts

A

categorical variable

number of observations in your sample that fall within a particular category

58
Q

proportions

A

counts expressed as a fraction of the total (often reported as percentages)

59
Q

variance

A

variance measures the amount of variation

the average squared distance of each data point from the sample mean

σ^2

60
Q

calculating variance

A

calculate the mean

find the diff between each data point and the mean

square the value

sum the squares and divide by the number of observations
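
The steps on this card can be sketched in Python as follows. The data set is the one used in the variance practice card later in this deck. Dividing by the number of observations gives the population variance; a sample variance would divide by n − 1 instead.

```python
import statistics

data = [7.5, 8.6, 8.9, 8.5, 9.4, 10.7, 15.1]

# Step 1: calculate the mean.
mean = sum(data) / len(data)

# Steps 2 and 3: difference from the mean for each point, then square it.
squared_diffs = [(x - mean) ** 2 for x in data]

# Step 4: sum the squares and divide by the number of observations.
variance = sum(squared_diffs) / len(data)

print(round(variance, 1))  # 5.5
```

The standard library function `statistics.pvariance(data)` gives the same value and can be used as a check.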

61
Q

Quartiles

A

ranked bins of data
1. sort from lowest to highest

62
Q

finding the second quartile

A

split the sorted data in half:

a. if you have an odd number of observations, the second quartile is the middle value

b. if you have an even number of observations, the second quartile is the average of the two middle values

63
Q

finding the first quartile

A

subset the lower-valued half of observations, then use the rules in the second quartile to find the middle value

note the 2nd quartile is included if the # of observations is odd

64
Q

3rd quartile

A

repeat steps for quartile 1 in the upper valued half

65
Q

dispersion aka interquartile range

A

range of inner-most 50% of the data

between Q1 and Q3 (Q3-Q1)
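
A small Python sketch of the quartile rules from the preceding cards, assuming the convention stated there that the middle value is included in both halves when the number of observations is odd. The helper names `median` and `quartiles` are made up for this example; library functions may use other quartile conventions and give slightly different answers.

```python
def median(values):
    """Middle value, or the average of the two middle values."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the halves-of-the-data rule."""
    s = sorted(values)
    n = len(s)
    half = (n + 1) // 2      # includes the middle value when n is odd
    lower = s[:half]         # lower-valued half
    upper = s[n - half:]     # upper-valued half
    return median(lower), median(s), median(upper)


q1, q2, q3 = quartiles([7.5, 8.6, 8.9, 8.5, 9.4, 10.7, 15.1])
iqr = q3 - q1
print(round(iqr, 2))  # 1.5
```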

66
Q

Calculate the mean & median of the following data:

7.5 9.9 8.6 10.3 8.5 9.4 15.1

A

Mean is 9.9, median is 9.4

67
Q

Would the mean or median be a better descriptor of the ‘middle’ value for this set of data?

7.5 9.9 8.6 10.3 8.5 9.4 15.1

A

Median

68
Q

Calculate the population variance & interquartile range (IQR) of the following data:

7.5 8.6 8.9 8.5 9.4 10.7 15.1

A

Variance is 5.5, IQR is 1.5

69
Q

Calculate the interquartile range (IQR) for the following set of numbers and indicate what range the answer lies within.

10.1, 18.6, 19.8, 15.7, 21.9, 12.9, 11.8, 26.0, 13.0, 12.9

A

5 ≤ ANSWER < 7

70
Q

Calculate the interquartile range (IQR) for the following set of data and indicate what range the answer lies within.

46.7, 18.7, 39.4, 7.2, 19.8, 42.1, 2.6, 17.1, 30.7, 21.9

A

19 ≤ ANSWER <23

71
Q

meaningfulness

A

whether the difference among groups is important to your study

72
Q

effect size

A

the size of the change in the response variable, used to judge whether the change is meaningful in practice

73
Q

The rate of home ownership in Canada decreased from 46% in 2004 to 44% in 2011. What is the effect size as a difference between the years?

A

-2 percentage points (44% − 46%)

74
Q

do relative effect sizes have units

A

no

75
Q

In the United Kingdom, 56% of older adults (55+ years) get their news from the television whereas only 12% of youth (18-24 years) do. What is the relative effect size of youth compared to older adults?

A

4.7 (0.56/0.12)

76
Q

absolute effect size

A

the actual difference in outcomes

e.g. 80% − 60% = 20%

77
Q

relative effect size

A

Relative effect size compares the outcomes between two groups as a ratio or percentage.

(80% / 60%) = 1.33, or a 33% increase
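
A short Python sketch contrasting the two kinds of effect size, using the 80% and 60% figures from these cards. The function names are made up for illustration.

```python
def absolute_effect(group_a, group_b):
    """Difference between two outcomes; keeps the units of the outcome."""
    return group_a - group_b


def relative_effect(group_a, group_b):
    """Ratio of two outcomes; the units cancel, so it is unitless."""
    return group_a / group_b


a, b = 0.80, 0.60
print(round(absolute_effect(a, b), 2))  # 0.2, i.e. 20 percentage points
print(round(relative_effect(a, b), 2))  # 1.33, i.e. a 33% increase
```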

78
Q

marginal distributions

A

sum the values in each row

sum the values in each column

in the last box add up every row and column, this helps make proportions

shows how many sampling units are in each level of one categorical variable

good way to describe patterns

79
Q

conditional distributions

A

shows the relationship between the columns and the rows

take the value of the cell you are interested in and divide by the total amount of the column or row
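
As an illustration, here is a Python sketch that computes marginal totals and one conditional proportion from a two-way table. The counts (snack preference by card game) are invented for the example.

```python
# Hypothetical two-way table: rows are games, columns are preferred snacks.
table = {
    "poker": {"cookies": 30, "chips": 20},
    "bridge": {"cookies": 10, "chips": 40},
}

# Marginal distribution of game: sum the values in each row.
row_totals = {game: sum(counts.values()) for game, counts in table.items()}

# Marginal distribution of snack: sum the values in each column.
col_totals = {}
for counts in table.values():
    for snack, n in counts.items():
        col_totals[snack] = col_totals.get(snack, 0) + n

grand_total = sum(row_totals.values())

# Conditional distribution: cell count divided by its row total.
p_cookies_given_poker = table["poker"]["cookies"] / row_totals["poker"]

print(row_totals, col_totals, grand_total)
print(p_cookies_given_poker)  # 0.6
```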

80
Q

characteristics of single variable bar graphs

A
  • gaps show the levels are categorical
  • which ever variable you are most interested in goes on the x axis
  • each bar is a level
81
Q

two variable bar graphs

A
  • visualizes interactions between data sets
82
Q

types of two variable bar graphs

A

grouped bar graph

stacked bar graph

83
Q

histograms

A

bars are side by side (no gap)

represent a small numerical range

84
Q

box plots and its parts

A

based on quartiles and used when you have numerical data and categorical groups
- whiskers
- median: solid line
- box: drawn from the first quartile to the 3rd
- extreme threshold

85
Q

whiskers

A

drawn from the box to the last data point before the extreme threshold

86
Q

extreme thresholds

A

Q3 + (1.5 × IQR) and Q1 − (1.5 × IQR)
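
A minimal Python sketch of the thresholds, assuming the quartiles Q1 = 8.55 and Q3 = 10.05 that the deck's quartile rules give for the variance and IQR practice data set:

```python
data = [7.5, 8.6, 8.9, 8.5, 9.4, 10.7, 15.1]

# Quartiles of `data` under the deck's halves-of-the-data rule (assumed here).
q1, q3 = 8.55, 10.05
iqr = q3 - q1

lower_threshold = q1 - 1.5 * iqr
upper_threshold = q3 + 1.5 * iqr

# Points beyond the thresholds are drawn individually on a box plot.
extreme = [x for x in data if x < lower_threshold or x > upper_threshold]

# Whiskers end at the last data points that are still inside the thresholds.
inside = [x for x in data if lower_threshold <= x <= upper_threshold]
whiskers = (min(inside), max(inside))

print(extreme, whiskers)  # [15.1] (7.5, 10.7)
```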

87
Q

scatter plot

A

when you have two numerical variables and you want to look at the relationship between them

x axis is the independent variable

y axis is the dependent variable

in an observational study the x and y axes are covariates

88
Q

line plots

A

two numerical variables that have been measured repeatedly from the same sampling unit

each line is a different sampling unit

89
Q

Identify which type of summary information would answer the following question “What proportion of people like cookies when playing poker?”

A

Conditional distribution with game as the primary variable

90
Q

standard normal distribution

A

a normal distribution with mean 0 and standard deviation 1; values are standardized with z = (x − μ)/σ
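
A brief Python sketch of standardizing a value, with made-up numbers (a value of 115 from a population with mean 100 and standard deviation 15). `statistics.NormalDist` is in the standard library from Python 3.8 onward.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


z = z_score(115, 100, 15)
print(z)  # 1.0

# Probability of a standard normal value at or below z.
p_below = NormalDist(mu=0, sigma=1).cdf(z)
print(round(p_below, 3))
```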

91
Q

sample space

A

set of all possible outcomes

92
Q

event

A

a subset of a sample space (e.g. rolling a 2, 4, or 6 when the sample space is 1 through 6)

93
Q

random trial

A

procedure or action that produces one outcome from a set of possible outcomes, where the result is uncertain and cannot be predicted in advance.

94
Q

frequentist probability

A

probability based on the frequency of events occurring in repeated experiments or trials

P(A) = (number of times event A occurs) / (total number of trials)
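
A simulation sketch of the frequentist idea in Python, assuming a fair six-sided die and the event "roll an even number" (the seed and the number of trials are arbitrary choices):

```python
import random

rng = random.Random(243)  # fixed seed so the run is repeatable

trials = 100_000
event = {2, 4, 6}  # the event: rolling an even number

# Count how many random trials produce an outcome inside the event.
hits = sum(rng.randint(1, 6) in event for _ in range(trials))

p_estimate = hits / trials
print(p_estimate)  # close to the theoretical value of 0.5
```

The estimate gets closer to 0.5 as the number of trials grows.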

95
Q

random variable

A

numerical outcome of a random phenomenon. It assigns a number to each outcome in a sample space, allowing for the analysis of probabilities associated with different outcomes.

96
Q

probability distribution

A

the probability of different possible values of a variable.

97
Q

discrete distributions

A

a function that gives the probability of a discrete random variable, X, being exactly equal to some value

98
Q

define bias and sampling independence

A

bias: systematic error in a study or analysis that leads to incorrect conclusions or inferences about a population.

sampling independence: the selection of one sampling unit does not influence the selection of another.

99
Q

4 goals of an ideal sampling design

A
  1. all sampling units are selectable
  2. selection is unbiased
  3. selection is independent
  4. all samples are possible
100
Q

spurious relationships

A

a situation where two variables appear to be correlated with each other but, in fact, are not directly related

101
Q

one way contingency table

A

are for data with a single categorical variable and are shown as a one-dimensional table of columns.

102
Q

two way contingency table

A

are for data with two categorical variables and are shown as a two-dimensional table of rows and columns.

103
Q

You have been asked by a regional conservation authority to design a study to evaluate the risk that a tick will bite someone walking at one of the parks. They provide you enough money to survey 15 parks out of the 60 that are in the region. Your plan is to spend a day at each of the selected parks and survey all the people leaving the park to assess whether a tick bit them or not. You will then calculate the proportion of people bitten for each park sampled.

A

the 60 parks in the region

104
Q

According to USA Today (Dec 30, 1999), the average age of viewers of MSNBC cable television news programming is 50 years old. A Canadian network executive thinks this might not be true in Canada, and believes that the average age of these viewers in Canada is significantly less than 50 years old.

To test her hypothesis, the Canadian executive obtains a list of Bell satellite subscribers who included MSNBC in their channel package, and then conducts a phone poll of 2,000 of these subscribers across Canada. Anyone called who reports not watching MSNBC news programming at least once a week is left out of the survey; in the end 287 respondents watch MSNBC news programming at least weekly, and report their ages as part of the survey.

What is the variable of interest?

A

viewer age

105
Q

According to USA Today (Dec 30, 1999), the average age of viewers of MSNBC cable television news programming is 50 years old. A Canadian network executive thinks this might not be true in Canada, and believes that the average age of these viewers in Canada is significantly less than 50 years old.

To test her hypothesis, the Canadian executive obtains a list of Bell satellite subscribers who included MSNBC in their channel package, and then conducts a phone poll of 2,000 of these subscribers across Canada. Anyone called who reports not watching MSNBC news programming at least once a week is left out of the survey; in the end 287 respondents watch MSNBC news programming at least weekly, and report their ages as part of the survey.

What is the statistical population for this study?

A

all at-least-weekly Canadian viewers of MSNBC news programming who watch using Bell satellite

106
Q

A medical study wants to relate consumption of fat to heart conditions. 100 patients with heart conditions are randomly selected from clinics in the Kingston area, and each patient is asked to track their food consumption for 6 weeks. After the six weeks, each patient’s heart health is evaluated using a standard array of tests (blood pressure, heart rate, ECG, etc.).

What term best describes each patient in this study design?

A

both sampling and observation unit

107
Q

An ornithologist at Queen’s University is studying the development time of recently hatched black-capped chickadees on Wolfe Island. He randomly samples 20 nests from across the island and measures the weight of each new hatchling in the nest. He repeats this sampling after 1 week, and then again after 2 weeks.

What term best describes each nest included in this study?

A

sampling unit

108
Q

Lyme disease is caused by the bacterium Borrelia burgdorferi, carried primarily by black-legged ticks. A recent study assessed the percentage of black-legged ticks that carry Borrelia from 10 random sites across North America spanning a range of mean annual temperatures. The number of ticks carrying Borrelia was quantified by collecting 100 ticks from each site and screening each tick for the bacterium (either YES or NO). The goal was to quantify the relation between annual temperature among sites and the percentage of ticks with Borrelia.

What is the observation unit in this study?

A

the individual tick

109
Q

A medical study wants to relate consumption of fat to heart conditions. 100 patients with heart conditions are randomly selected from clinics in the Kingston area, and each patient is asked to track their food consumption for 6 weeks. After the six weeks, each patient’s heart health is evaluated using a standard array of tests (blood pressure, heart rate, ECG, etc.).

What term best describes the beats per minute of heart rate in this study design?

A

measurement unit

110
Q

You are interested in the growth potential of a new seed variety. You gather a random selection of 1,000 seeds from a field where the new variety is growing, and measure the final height of all the resulting plants.

What kind of study design is this?

A

simple random

111
Q

You are the quality assurance manager for a company that produces toasters. In post-production testing, you find that more toasters are failing than expected; the cause or source of the failures is not immediately clear though.

You ask your intern to gather a random selection of failed toasters, and a selection of toasters that do not fail in testing, and then to trace all those toasters back through the production process (employees that did which installation, source of the particular components, etc.)

What kind of study design is this?

A

case control study

112
Q

A psychology professor recruits 50 randomly selected Queen’s undergraduates, and asks them to recommend friends who would also be willing to participate in an introvert/extrovert personality study; overall, 93 students complete the study.

The results are 73% of the students are extroverts, 17% are introverts, and 10% are a mix.

What would the biggest concern or risk be about this sampling strategy?

A

sample unit selection is not independent

113
Q

A medical experiment, in which a treatment group is compared to a control group, is carried out to reduce the effect of

A

confounding factors

114
Q

Consider a survey being designed for customers of a tour company in Paris.

Determine whether the possible responses to the following question on their survey should be classified as categorical, continuous numerical or discrete numerical.

“How many escorted vacations have you taken prior to this one?”

A

discrete numerical

115
Q

Determine whether the possible responses to the following question should be classified as categorical, discrete numerical or continuous numerical.

“Whether you are a Canadian citizen.”

A

categorical

116
Q

Determine whether the possible responses to the following question should be classified as categorical, discrete numerical or continuous numerical.

“The number of students in a statistics course.”

A

discrete numerical

117
Q

number of observation units in a table

A

number of rows

118
Q

number of variables in a table

A

number of columns

119
Q

Customers finishing a free sample at Costco are asked to complete a survey asking whether they would be “Very interested”, “Interested” or “Not interested” in buying the food product in the future. In one day, 357 customers complete the survey.

What graph type would be most appropriate for displaying the resulting data all at once?

A

a bar graph

120
Q

two way contingency table

A
121
Q

What is the sample space for determining the probability of drawing a Jack of Clubs from a deck of cards in a game of poker?

A

list of all cards in a deck

122
Q

What is the event for drawing an ace from a deck of cards in a game of poker?

A

list of all aces

123
Q

Which of the following statements reflects a correct definition of probability?

There is a good probability of rain tomorrow

Roughly 1 in a million people have won a national lottery over hundreds of draws, which means the probability is p=0.000001.

The probability that a product fails can be calculated directly from repeated testing in a factory.

The probability that I will buy my lunch today is 100%

A

Roughly 1 in a million people have won a national lottery over hundreds of draws, which means the probability is p=0.000001. (correct)

The probability that a product fails can be calculated directly from repeated testing in a factory. (correct)

The probability that I will buy my lunch today is 100% (correct)

124
Q

Which of the following statements describe a random trial?

The weight of an orange is measured in grams.

Observing how much a random shopper spent in a particular store.

Playing a ‘scratch and win’ lottery ticket.

Finding out that your neighbour won a million dollars in the lotto

Playing a crossword puzzle

Rolling a die in a board game

A

Observing how much a random shopper spent in a particular store. (correct)

Playing a ‘scratch and win’ lottery ticket. (correct)

Rolling a die in a board game (correct)

125
Q

Would the following be a continuous or discrete distribution? ‘Length of time between shots on net in a soccer game’

A

Continuous distribution

126
Q

Would the following be a continuous or discrete distribution? ‘Number of shots on net in a soccer game’

A

Discrete distribution

127
Q

Which of the following statements about probability distributions are TRUE?

Can be used to describe both discrete and continuous numerical variables

The area beneath the function always sums to one

The y-axis of a continuous distribution is called probability mass

The x-axis is the outcome, or event, of interest

Probability distributions show the probability of some events, but they do not have to account for all possible events from a random trial.

A

Can be used to describe both discrete and continuous numerical variables (correct)

The area beneath the function always sums to one (correct)

The x-axis is the outcome, or event, of interest (correct)

128
Q

Which of the following statements about probability distributions are FALSE?

The probability of a single event in a continuous distribution is always zero

The probability of a single event in a discrete distribution is always zero

Regardless of whether the distribution is discrete or continuous, probability is the area under the curve.

Probability distributions cannot be used for a range of events.

A

The probability of a single event in a discrete distribution is always zero (correct)

Probability distributions cannot be used for a range of events. (correct)

129
Q

Null hypothesis

A

statement or position that is the skeptical view-point of the research question.

130
Q

Null distribution

A

sampling distribution from an imaginary statistical population where the null hypothesis is true

131
Q

statistical significance

A

conclusion that is unlikely to come from the null

132
Q

hypothesis testing

A

used to evaluate statistical significance

133
Q

p-value

A

the probability of seeing your data, or something more extreme, under the null hypothesis

helps quantify the evidence against the null hypothesis

It measures how compatible your data is with the assumption that the null is true.

If α = 0.05, a p-value below 0.05 means rejecting H0 is justified.

p = 0.03, α = 0.05: the result is statistically significant because p < 0.05. You reject H0.

p = 0.10, α = 0.05: the result is not statistically significant because p > 0.05. You fail to reject H0.

134
Q

type one error rate

A

probability of rejecting the null when it is true (false positive)

135
Q

type two error

A

probability of failing to reject the null when it is false (false negative)

136
Q

error rates

A

probability of making a mistake

137
Q

population parameters

A

descriptive statistics of the statistical population (not of the sample)

quantifiable characteristics of a statistical population

labeled using the Greek alphabet
values are fixed

138
Q

sampling distributions

A

shape is independent of the statistical population if the sample size is sufficiently large

bell-shaped curve

taking the mean of multiple sampling units averages out asymmetries in the statistical population

the variance of a sampling distribution increases as the number of sampling units decreases

139
Q

central limit theorem

A

given a sufficiently large sample size, the distribution of the sample mean will approximate a normal distribution, regardless of the original population’s distribution

standard error can be calculated from the sd of the statistical pop and the sample size
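
A small simulation can make this concrete. This is only a sketch under assumptions: the skewed exponential population, the seed, and the sample size of 30 are all made up for illustration. It draws many samples, takes the mean of each, and compares the spread of those means with sd / sqrt(n).

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical, strongly right-skewed statistical population (exponential).
population = [random.expovariate(1.0) for _ in range(20_000)]
sigma = statistics.pstdev(population)  # sd of the statistical population

n = 30  # sampling units per sample
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(2_000)
]

observed_se = statistics.stdev(sample_means)  # spread of the sampling distribution
predicted_se = sigma / n ** 0.5               # standard error from sd and sample size

print(f"observed SE  = {observed_se:.3f}")
print(f"predicted SE = {predicted_se:.3f}")
```

Even though the population is skewed, a histogram of `sample_means` would look roughly bell shaped, and the two standard errors should agree closely.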

140
Q

SE =

A

sigma (sd) / sqrt(n)

141
Q

student t distribution

A

shape depends on size of sample (influential when size is small)

has fatter tails to account for the uncertainty in estimating the sd

continuous probability distribution

used when the sample size is small and the population standard deviation is unknown.

As df increases, the t-distribution approaches the normal distribution.

142
Q

confidence intervals

A

the range over a sampling distribution that brackets the centermost probability of interest (e.g. the central 95%)

143
Q

confidence interval formulas

A

t = (x-m)/SE

x = m ± t * SE (lower and upper bounds of the interval)
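
A minimal sketch of turning these formulas into interval bounds, computed as the mean minus and plus t times SE. The data values and the function name are hypothetical; the critical t-score 2.262 is the standard table value for a 95% two-sided interval with 9 degrees of freedom and is passed in by hand because the Python standard library has no t-distribution.

```python
import statistics

def confidence_interval(values, t_critical):
    """Return (lower, upper) bounds: mean -/+ t_critical * SE."""
    m = statistics.fmean(values)
    se = statistics.stdev(values) / len(values) ** 0.5
    return m - t_critical * se, m + t_critical * se

# Hypothetical measurements from 10 sampling units (df = n - 1 = 9).
heights = [4.1, 5.0, 4.7, 5.3, 4.9, 5.6, 4.4, 5.2, 4.8, 5.0]
T_CRIT_95_DF9 = 2.262  # table value: 95% two-sided, df = 9

lower, upper = confidence_interval(heights, T_CRIT_95_DF9)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```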

144
Q

single sample t-test

A

evaluates if the mean of your sample is different from some reference value

compares numerical variable to a reference

t = (sample mean - reference) / SE

145
Q

paired sample t-test

A

evaluates if the mean difference between paired numerical measurements is different from some reference value

looks at how sampling units change across factors

t= (mean of differences-reference)/SE

146
Q

two sample t-test

A

determines if the means of two groups are different from each other

t = (m1 - m2) / SE of the difference
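
The three t-scores (single sample, paired, two sample) can be sketched side by side. The function names and data are hypothetical. For the two-sample case this sketch assumes a pooled variance with n1 + n2 - 2 degrees of freedom; that is one common choice, not necessarily the only version used in the course.

```python
import statistics
from math import sqrt

def standard_error(values):
    """SE of the mean: sample sd divided by sqrt(n)."""
    return statistics.stdev(values) / sqrt(len(values))

def one_sample_t(values, reference):
    """t = (sample mean - reference) / SE."""
    return (statistics.fmean(values) - reference) / standard_error(values)

def paired_t(before, after, reference=0.0):
    """t = (mean of differences - reference) / SE of the differences."""
    differences = [a - b for b, a in zip(before, after)]
    return one_sample_t(differences, reference)

def two_sample_t(group1, group2):
    """t = (m1 - m2) / SE of the difference, assuming a pooled variance."""
    n1, n2 = len(group1), len(group2)
    pooled_var = (
        (n1 - 1) * statistics.variance(group1)
        + (n2 - 1) * statistics.variance(group2)
    ) / (n1 + n2 - 2)
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.fmean(group1) - statistics.fmean(group2)) / se_diff

print(one_sample_t([2, 4, 4, 4, 5, 5, 7, 9], reference=3))
print(paired_t(before=[10, 12, 9, 11], after=[12, 15, 10, 13]))
print(two_sample_t([1, 2, 3], [4, 5, 6]))
```

Note that the paired test is just a single sample test applied to the differences.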

147
Q

contingency table

A

table that summarizes categorical data as counts

148
Q

expected contingency table

A

the contingency table of expected frequencies under the null hypothesis

compare observed vs. expected

149
Q

expected 1-way table

A

one categorical variable with levels

sum of observed counts must be the same as expected

expected counts are distributed equally among the levels

is there a difference in counts among the level of that variable?

150
Q

expected 2-way table

A

two categorical variables

expected counts are distributed independently

are the counts independent between variables?

151
Q

calculating independence

A

calculate marginal distribution ………

152
Q

calculating expected frequencies

A

(row total * column total) / table total, do it for each cell
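
A short sketch of this rule applied to every cell of a 2-way table. The observed counts and the function name are made up for illustration.

```python
def expected_table(observed):
    """Expected counts under independence: (row total * column total) / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]

# Hypothetical observed 2 x 2 contingency table.
observed = [[10, 20],
            [30, 40]]

expected = expected_table(observed)
print(expected)  # [[12.0, 18.0], [28.0, 42.0]]
```

The expected table keeps the same row totals, column totals, and table total as the observed table.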

153
Q

Chi-square test

A

used to determine whether there is a significant association between categorical variables or whether observed data matches expected data under a certain hypothesis. It works by comparing observed frequencies (data collected) to expected frequencies (based on a hypothesis).

154
Q

chi-square distribution

A

distribution of chi-square scores expected from repeatedly sampling a statistical pop where the null is true

can only have positive values (square everything)

shape will vary depending on df’s

155
Q

calculating chi-square (X^2)

A

take the difference between each observed and expected cell

square the difference

divide by the expected value

sum over all cells in the table
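
The four steps can be sketched for a 1-way table. The counts are hypothetical, and the expected counts are spread equally across the levels, as in the expected 1-way table card.

```python
def chi_square(observed, expected):
    """Sum over all cells of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for one categorical variable with three levels.
observed = [30, 50, 40]
expected = [sum(observed) / len(observed)] * len(observed)  # 40 in each level

score = chi_square(observed, expected)
df = len(observed) - 1  # degrees of freedom for a 1-way table
print(score, df)  # 5.0 2
```

The score is then compared with the chi-square distribution for that df to get a p-value.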

156
Q

dfs for 1 - way tables

A

n - 1 (n = number of levels of the categorical variable)

157
Q

dfs for 2-way tables

A

(r - 1)(c - 1) (r = number of rows, c = number of columns)

158
Q

names for the variable used to explain the change in the outcome of an experiment

A

X - Variable

independent variable

predictor variable

159
Q

names for the variable used to explain the change in the outcome of an observational study

A

the x variable

the predictor variable

160
Q

The relationship between number of beers consumed (x) and blood alcohol content (y) was studied in 16 adults by using linear regression. The following regression equation was obtained from the study:

y= -0.0127 + 0.0180x

If an individual had 4 beers and scored a blood alcohol content of 0.085, what is their residual variation?

A

+0.0257 (correct)
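
The arithmetic behind this answer, as a quick sketch: the residual is the observed value minus the value predicted by the regression equation. The variable and function names are just for illustration.

```python
INTERCEPT = -0.0127
SLOPE = 0.0180

def predict_bac(beers):
    """Predicted blood alcohol content from the regression equation."""
    return INTERCEPT + SLOPE * beers

observed_bac = 0.085
predicted_bac = predict_bac(4)            # -0.0127 + 0.0180 * 4 = 0.0593
residual = observed_bac - predicted_bac   # observed minus predicted

print(f"{residual:+.4f}")  # +0.0257
```

A positive residual means the individual scored above the regression line.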

161
Q

Linearity

A

response variable is a linear function of the predictor variable (well described by a linear relationship)

the effect of the predictor variable on the response is additive and proportional

162
Q

normality

A

assumption that residuals are normally distributed

163
Q

Independence

A

assumes that the residuals are sequentially independent of each other (vary between + and - numbers seemingly at random)

when residuals are not independent there will be runs of adjacent positive residuals and runs of adjacent negative residuals

prevent violations by making sure units are selected at random and independently of each other

164
Q

Homoscedasticity

A

the variance of residuals (errors) should be constant across all levels of the predictor variable (spread should be equal)

165
Q

bivariate normal distribution

A

3D normal distribution graph depicted as contours

166
Q

Pearsons correlation coefficient

A

r (sample) or rho, ρ (population)

measures the strength and direction of the linear association between two numerical variables

ρ = -1, ρ = 0, ρ = 1 (negative, no, positive association)
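
A minimal sketch of computing the sample correlation coefficient r by hand. The formula used is the standard one (sum of cross products divided by the square root of the product of the sums of squares); the card does not spell it out, and the data are made up.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two numerical variables."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4]
print(pearson_r(x, [2, 4, 6, 8]))  # perfect positive association
print(pearson_r(x, [8, 6, 4, 2]))  # perfect negative association
print(pearson_r(x, [1, 2, 2, 1]))  # no linear association
```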

167
Q

linear regression

A

evaluates if changes in one numerical variable can predict changes in another

168
Q

linear regression equation

A

y = a (intercept) + b (slope) x

169
Q

systematic component

A

describes the function used for predictions

170
Q

random component

A

describes the probability distribution for sampling error ( only occurs in the y variable)

171
Q

link function

A

connects the systematic to the random component

172
Q

3 parts of the statistical model

A

systematic component

random component

link function

173
Q

minimizing residual variance

A

calculate residual for each data point

take the square of each residual

sum the squared residuals across all data points

divide by dfs (n-2)
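
These steps can be sketched in code. The slope and intercept come from the usual closed-form least-squares formulas, which are an assumption here since the card only lists the residual variance steps; the data and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope (standard closed-form solution)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residual_variance(xs, ys, intercept, slope):
    """Sum of squared residuals divided by the degrees of freedom (n - 2)."""
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]  # step 1
    squared = [r ** 2 for r in residuals]                              # step 2
    return sum(squared) / (len(xs) - 2)                                # steps 3 and 4

xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
intercept, slope = fit_line(xs, ys)
print(intercept, slope, residual_variance(xs, ys, intercept, slope))
```

Any other line through these points would give a larger residual variance than the fitted one.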

174
Q

what are the four steps to the hypothesis test

A

define the null and alternative hypothesis

establish the null distribution

conduct the statistical test

draw scientific conclusions

175
Q

F-test

A

determines the ratio of variance between two variables (no difference in variance, F = 1)

176
Q

which sum of squares measures the variability of the observed values of the response variable around their respective treatment means in ANOVA

A

residual variation (MSE) (correct)

177
Q

contrast statement

A

test the difference in means between groups in an ANOVA test

178
Q

post Hoc test

A

secondary test used to evaluate which groups have different means in ANOVA

only used if the F-test indicates to reject the null hypothesis

179
Q

TukeyHSD test

A

compares the means of all possible combinations of categorical levels in an ANOVA

controls the family wise error rate by using a specialized null distribution that accounts for the number of contrasts

180
Q

family wise error rate

A

type 1 error rate for the family of contrasts

used to evaluate the adjusted p-values returned from the TukeyHSD test

P>FWER (0.05) we fail to reject
P<FWER (0.05) we reject

181
Q

Two factor ANOVA

A

looks at the effect of two categorical variables on a numerical variable

182
Q

main effects A

A

questions about the differences among the levels of factor A averaging across the levels of factor B. These are comparisons among full columns

183
Q

main effects B

A

questions about the differences among the levels of factor B averaging across the levels of factor A. These are comparisons among full rows

184
Q

Interactions

A

differences among the levels of one factor with each level of the other factor

deviation from the assumption that the levels of each factor simply add together

185
Q

additivity

A

response from the two variables is the sum of the two

186
Q

synergistic interaction

A

response is more than the two variables added together

187
Q

antagonistic interaction

A

response is less than the two variables combined
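
A toy sketch of the three cases (additive, synergistic, antagonistic). It assumes each effect is measured as a change relative to a control with neither factor applied; the function name and numbers are hypothetical.

```python
def classify_interaction(effect_a, effect_b, combined_effect, tolerance=1e-9):
    """Compare the combined response with the sum of the two separate effects."""
    additive_expectation = effect_a + effect_b
    if abs(combined_effect - additive_expectation) <= tolerance:
        return "additive"
    if combined_effect > additive_expectation:
        return "synergistic"
    return "antagonistic"

# Hypothetical effects: factor A alone adds 2, factor B alone adds 3.
print(classify_interaction(2, 3, 5))  # additive
print(classify_interaction(2, 3, 8))  # synergistic
print(classify_interaction(2, 3, 4))  # antagonistic
```

In a real two factor ANOVA the interaction term is tested statistically rather than compared by eye, so this only illustrates the definitions.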

188
Q

What does a significant AB interaction mean in a two-way ANOVA?

A

The effect of factor A depends on the level of factor B. (correct)

189
Q

What type of sum of squares measures the variability of the observed values of the response variable around their respective cell means?

A

residual

190
Q

Mean sum of squares for groups

A

MSG = SSG / dfG, where SSG = sum of squares for groups and dfG = k - 1

k = number of groups

191
Q

mean square Error

A

residual variation

MSE = SSE / dfE, where dfE = n - k (n = total number of observations, k = number of groups)
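
A sketch of how the two mean squares combine into an F-score for a one-way ANOVA. The sums of squares follow the usual definitions (SSG from group means around the grand mean, SSE from observations around their own group mean); those definitions, the function name, and the data are assumptions for illustration.

```python
def one_way_anova(groups):
    """Return (MSG, MSE, F) for a list of groups of numerical observations."""
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Variation among groups: group means around the grand mean.
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Residual variation: observations around their own group mean.
    sse = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)

    msg = ssg / (k - 1)   # dfG = k - 1
    mse = sse / (n - k)   # dfE = n - k
    return msg, mse, msg / mse

# Hypothetical data: three groups of three sampling units.
msg, mse, f_score = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(msg, mse, f_score)
```

A large F-score means the groups differ much more than the scatter within groups would suggest.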

192
Q

what happens when the sample size increases

A

the variance of the sampling distribution decreases

standard error becomes smaller

193
Q

population distribution

A

distribution of the values produced by measuring some variable on each individual of a population

194
Q

If the coefficient of correlation r = ± 1, then the best-fit linear equation will actually include all of the data points?

A

true

195
Q

The coefficient of correlation r is a number that indicates the direction and the strength of the relationship between the variable y and the variable x?

A

true

196
Q

We anticipate a small P value for an ANOVA F statistic if the box plots for the samples are

A

wide and similarly located

narrow and located differently (correct)

identical

symmetrical

wide and have similar medians

197
Q

t distributions can be used to test whether the difference between two sample means is different from zero?

A

true

198
Q

df formulas

A

K-1: variation between groups (ANOVA, MSG)
N-K: variation within groups (ANOVA, residual variation (MSE))
n-1: one-way table
n-2: confidence intervals and residual analysis
(r-1)(c-1), also written (a-1)(b-1): 2-way table
ab(n-1): residual analysis (variation among sampling units within a cell)
n1+n2-2: two sample t-test

199
Q

what is the F-score

A

the variation among categorical groups divided by the residual variation within groups (MSG / MSE)

200
Q

what is the null distribution of the F-score

A

represents the variation in a ratio you would expect from repeated sampling of a population where there was no true difference in means.