FALL 2024 Flashcards

1
Q

five hierarchical scales

A

observation unit
sampling unit
sample
statistical population
population of interest

2
Q

sampling unit

A

the unit that is selected at random; it may be the same as the observation unit or contain multiple observation units

3
Q

sample

A

collection of the sampling units

4
Q

observation unit

A

scale of data collection, subject of study

5
Q

statistical population

A

collection of all sampling units that could have been in your sample; it represents the true scale at which your statistical conclusions are valid

6
Q

population of interest

A

collection of sampling units that you hope to draw conclusions about

scope of the research question

ideally the same as your statistical population

7
Q

measurement variable

A

what we want to know/measure about the observation unit

8
Q

measurement unit

A

the scale (units) in which the measurement variable is recorded, e.g. beats per minute for heart rate

9
Q

descriptive stats

A

set of tools used to describe data

10
Q

inferential statistics

A

uses information from the sample to make a probabilistic statement about the statistical population

11
Q

what is the rule for descriptive and inferential stats when there are multiple groups in a statistical population?

A

descriptive stats are repeated for each group but inferential stats are only done once and can be used to make statements about the differences between groups

12
Q
A
13
Q

ideal sampling design

A
  1. all sampling units have a nonzero probability of being included
  2. selection of sampling units must be unbiased
  3. selection of sampling units is independent
  4. each possible sample has an equal chance of being selected
14
Q

observational studies

A
  • researchers have no control over the variables
  • it characterizes something about an existing statistical pop
  • a tool for discovering associations, but it cannot establish causation, because there is no way to know whether the factor of interest is itself governed by something else
15
Q

response variables

A

variable the investigators are interested in

16
Q

explanatory variable

A

variable that the investigator believes may explain the response variable

17
Q

confounding variables

A

unobserved variables that affect the response variable

17
Q

simple random

A

starts by identifying every sampling unit in the statistical population and then selecting a random subset of those to be in your sample. Each sampling unit has the same probability of being included in your sample.
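
A simple random draw can be sketched in a few lines of Python. This is only an illustration: the population of 60 numbered units, the sample size of 15, and the seed are assumptions made up for the example.

```python
import random

# Hypothetical statistical population: 60 sampling units labelled 1..60.
population = list(range(1, 61))

# A fixed seed (an arbitrary choice) makes the draw repeatable.
rng = random.Random(2024)

# Sample without replacement: every unit has the same chance of inclusion.
sample = rng.sample(population, k=15)

print(sorted(sample))
```

Because `sample` draws without replacement, no sampling unit can appear twice.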

18
Q

stratified

A

used when the statistical population has some grouping (strata); sampling units are randomly selected from within each stratum
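
A stratified draw might look like the sketch below, loosely modelled on the car-model question in this deck. The model labels, the 25 units per stratum, and the seed are hypothetical.

```python
import random

# Hypothetical strata: each vehicle category holds 25 made-up model labels.
strata = {
    category: [f"{category}-{i}" for i in range(1, 26)]
    for category in ("electric", "hybrid", "gasoline", "diesel")
}

rng = random.Random(2024)  # arbitrary seed for repeatability

# Draw 10 sampling units at random within each stratum, independently.
stratified_sample = {
    category: rng.sample(units, k=10) for category, units in strata.items()
}

for category, units in stratified_sample.items():
    print(category, len(units))
```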

19
Q

clustered

A

observation units are contained within a larger group (geographical or organizational) that we can randomly sample

20
Q

case control

A

used when there is a known outcome we are trying to explain; sampling units are selected based on that outcome

21
Q

cohort

A

select sampling units and follow them through time to see whether they develop the outcome of interest

22
Q

retrospective

A

studies where the outcome is already known
e.g. case control studies

23
Q

prospective

A

studies where the outcome is not yet known
e.g. cohort studies

24
Q

cross-sectional

A

study a response variable at only a single snapshot in time
e.g. simple random

25
Q

longitudinal surveys

A

study a response variable at multiple points of time

26
Q

Which of the following distinguishes case control from cohort surveys?

a. Whether the survey is cross-sectional or longitudinal

b. Whether strata are defined ahead of time or not

c. Whether the survey design is retrospective or prospective

d. Whether clusters of observation units were selected at random or not

A

c. Whether the survey design is retrospective or prospective (correct)

27
Q

Which of the following distinguish stratified from clustered surveys?

Whether the survey is cross-sectional or longitudinal

Whether strata are defined ahead of time or not

Whether the survey design is retrospective or prospective

Whether clusters of observation units were selected at random or not

A

Whether strata are defined ahead of time or not

28
Q

You design a study where you randomly select 10 car models from within each category of electric, hybrid electric-gas, gasoline, or diesel. For each model, you find the purchase cost and estimate how much it will cost you to drive the vehicle for the next 10 years. What type of survey design is this?

A

Stratified survey

28
Q

Your children are young teenagers and you hear them listening to an entirely new genre of music called Korean Pop. You are curious whether it is just your kids that are listening to Korean Pop or if other kids their age are as well. You decide to find out by approaching 15 parents at the next Parent Teacher Night. Being a bit of a statistical geek, you mentally number each of the parents while they are talking to teachers. You pull out your cell phone with a list of random numbers and use these numbers to randomly select the parents that you approach to ask. What type of survey design is this?

A

Simple random survey

29
Q

You are a researcher interested in the rates of mental illness in Canadian cities. You randomly select 120 cities across Canada, and conduct a survey of each to get a single estimate of per capita incidence of mental illness. The design of this surveying method is best characterized as:

A

cluster survey

30
Q

cornerstone of experimental studies

A

replication

31
Q

number of sampling units = ?

A

number of replicates

31
Q

pseudoreplicates

A

an error in the design of an experimental study where the observation units are analyzed as replicates instead of the sampling units

32
Q

the common design elements/types

A
  1. control
  2. blocking
  3. blinded (single and double)
  4. placebo
  5. sham treatment
33
Q

control treatment

A

reference treatment to compare against the treatment levels

34
Q

blocking

A

used to control variation among the sampling units; similar to stratified sampling, it forms subgroups or “blocks”

35
Q

single blinded

A

when the sampling unit does not know what treatment they are being exposed to

36
Q

double blinded

A

both the researcher and the sampling unit are unaware of which treatment is being applied

37
Q

placebo

A

often used in medical trials as the control treatment that helps accomplish a blinded design (has no effect)

38
Q

sham treatment

A

method used in control treatments; it accounts for the effect of delivering a treatment, where the delivery itself is not of interest

the comparison is between the sham and the treatment

39
Q

Imagine a study that evaluates the effectiveness of different over-the-counter pain relievers in alleviating the symptoms of arthritis: acetaminophen, ibuprofen and acetylsalicylic acid. Two hundred patients are randomly assigned to receive one of these three pain relievers, or to receive a placebo (control). How many factors and levels are evident in this study?

A

1 factor with 4 levels

40
Q

Blinding patients to the experimental treatment is a crucial part of a randomized clinical trial. Why?

A

Reduces the possibility of placebo effects

Reduces biases in measurements stemming from the anticipation of a treatment effect

41
Q

What is the reason for blinding the researcher to what experimental treatment a patient is going to receive?

A

Reduces biases in measurements stemming from the anticipation of a treatment effect

Reduces the possibility of placebo effects

42
Q

What design characteristic distinguishes experimental studies from observational studies?

A

Whether sampling units are randomly assigned to treatments or not.

43
Q

A researcher studied the effect of the prescription drug raloxifene on fracture risk in postmenopausal women. They found that women who took raloxifene over a five-year period reduced their risk of clinical vertebral fracture compared to women who did not take the drug. What are the factors and levels in this experiment?

A

There is one Factor (drug) with two Levels (raloxifene, no raloxifene).

44
Q

variable

A

any measurable characteristic of an observation

45
Q

datum

A

value of the variable

46
Q

continuous numerical variable

A

can take on any value within a range, including fractions (e.g. 1.2 or 1/4)

47
Q

discrete numerical

A

can only be whole numbers

48
Q

ordinal categorical variable

A

can take on qualitative values but the values are on a ranked scale

49
Q

nominal categorical variable

A

takes on qualitative values but they do not have any particular order

e.g. types of fruit

50
Q

What is the data type for describing your age?

A

Continuous numerical

51
Q

What is the data type for the description: child, teenager, adult?

A

Ordinal categorical

52
Q

What is the data type for the number of students in a class?

A

Discrete numerical

53
Q

What is the data type for the letter grade on your exam?

A

Ordinal categorical

54
Q

What is the data type for the percent grade on your exam?

A

Continuous numerical

55
Q

central tendency

A

describes the typical values in our sample (e.g. the mean)

the median (second quartile) is another measure

56
Q

dispersion

A

describes the spread of the values

57
Q

counts

A

used for categorical variables

the number of observations in your sample that fall within a particular category

58
Q

proportions

A

the count in a category divided by the total number of observations; often expressed as percentages

59
Q

variance

A

variance measures the amount of variation

the average squared distance of each data point from the sample mean

σ^2

60
Q

calculating variance

A

calculate the mean

find the diff between each data point and the mean

square the value

sum the squares and divide by the number of observations
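
The steps above can be sketched in Python. This is a minimal illustration, not a library routine: the function name is made up, it divides by the number of observations (the population variance), and the data are borrowed from a practice question later in this deck.

```python
def population_variance(values):
    """Average squared distance from the mean, dividing by the number of observations."""
    mean = sum(values) / len(values)                   # step 1: the mean
    squared_diffs = [(x - mean) ** 2 for x in values]  # steps 2-3: difference, then square
    return sum(squared_diffs) / len(values)            # step 4: sum and divide by n


data = [7.5, 8.6, 8.9, 8.5, 9.4, 10.7, 15.1]
print(round(population_variance(data), 1))  # 5.5
```

Dividing by n - 1 instead of n would give the sample variance, which is slightly larger for the same data.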

61
Q

Quartiles

A

ranked bins of data
1. sort from lowest to highest

62
Q

finding the second quartile

A

split the sorted data in half, according to:

a. if the data set has an odd number of observations, the second quartile is the middle value

b. if the data set has an even number of observations, the second quartile is the average of the two middle values

63
Q

finding the first quartile

A

subset the lower-valued half of observations, then use the rules in the second quartile to find the middle value

note the 2nd quartile is included if the # of observations is odd

64
Q

3rd quartile

A

repeat steps for quartile 1 in the upper valued half

65
Q

interquartile range (a measure of dispersion)

A

range of inner-most 50% of the data

between Q1 and Q3 (Q3-Q1)
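
The quartile rules in the cards above can be sketched as follows. The function names are hypothetical, and the sketch assumes the convention stated in this deck: when the number of observations is odd, the median is included in both halves. Other software may use different quartile conventions and return slightly different values.

```python
import statistics


def quartiles(values):
    """Return (Q1, Q2, Q3) using the split-in-half rule from this deck."""
    ordered = sorted(values)
    n = len(ordered)
    half = (n + 1) // 2          # each half includes the median when n is odd
    lower = ordered[:half]
    upper = ordered[n - half:]
    return (
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
    )


def iqr(values):
    """Interquartile range: Q3 - Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1


print(round(iqr([7.5, 8.6, 8.9, 8.5, 9.4, 10.7, 15.1]), 1))  # 1.5
print(round(iqr([10.1, 18.6, 19.8, 15.7, 21.9, 12.9, 11.8, 26.0, 13.0, 12.9]), 1))  # 6.9
print(round(iqr([46.7, 18.7, 39.4, 7.2, 19.8, 42.1, 2.6, 17.1, 30.7, 21.9]), 1))  # 22.3
```

The three data sets are the ones used in the practice questions that follow, so the printed values can be checked against those answers.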

66
Q

Calculate the mean & median of the following data:

7.5 9.9 8.6 10.3 8.5 9.4 15.1

A

Mean is 9.9, median is 9.4
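
As a quick check, the same numbers can be run through Python's standard `statistics` module; the variable names are only illustrative.

```python
import statistics

data = [7.5, 9.9, 8.6, 10.3, 8.5, 9.4, 15.1]

mean = statistics.mean(data)      # sum of the values divided by their count
median = statistics.median(data)  # middle value of the sorted data

print(round(mean, 1), median)  # 9.9 9.4
```

The single large value (15.1) pulls the mean above the median.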

67
Q

Would the mean or median be a better descriptor of the ‘middle’ value for this set of data?

7.5 9.9 8.6 10.3 8.5 9.4 15.1

A

Median

68
Q

Calculate the population variance & interquartile range (IQR) of the following data:

7.5 8.6 8.9 8.5 9.4 10.7 15.1

A

Variance is 5.5, IQR is 1.5

69
Q

Calculate the interquartile range (IQR) for the following set of numbers and indicate what range the answer lies within.

10.1, 18.6, 19.8, 15.7, 21.9, 12.9, 11.8, 26.0, 13.0, 12.9

A

5 ≤ ANSWER < 7

70
Q

Calculate the interquartile range (IQR) for the following set of data and indicate what range the answer lies within.

46.7, 18.7, 39.4, 7.2, 19.8, 42.1, 2.6, 17.1, 30.7, 21.9

A

19 ≤ ANSWER <23

71
Q

meaningfulness

A

whether the difference among groups is important to your study

72
Q

effect size

A

whether the change in the response variables is meaningful for a practical study

73
Q

The rate of home ownership in Canada decreased from 46% in 2004 to 44% in 2011. What is the effect size as a difference between the years?

A

-2 percentage points (44% - 46%)

74
Q

do relative effect sizes have units

A

no

75
Q

In the United Kingdom, 56% of older adults (55+ years) get their news from the television whereas only 12% of youth (18-24 years) do. What is the relative effect size of youth compared to older adults?

A

4.7 (0.56/0.12); older adults are about 4.7 times as likely as youth to get their news from television

76
Q

absolute effect size

A

the actual difference in outcomes

e.g. 80% - 60% = 20 percentage points

77
Q

relative effect size

A

Relative effect size compares the outcomes between two groups as a ratio or percentage.

(80% / 60%) = 1.33, or a 33% increase
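
The two kinds of effect size can be compared side by side in a short sketch. The function names are made up, and the 80% and 60% figures are the ones used in these cards.

```python
def absolute_effect(a, b):
    """Difference between two outcomes (keeps the units of the outcome)."""
    return a - b


def relative_effect(a, b):
    """Ratio of two outcomes (unitless)."""
    return a / b


print(round(absolute_effect(0.80, 0.60), 2))  # 0.2
print(round(relative_effect(0.80, 0.60), 2))  # 1.33
```

The ratio has no units because the units of the two outcomes cancel.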

78
Q

marginal distributions

A

shows how many sampling units are in each level of one categorical variable

sum the values in each row

sum the values in each column

in the last box, add up all the row totals (or all the column totals) to get the grand total; this helps when calculating proportions

a good way to describe patterns

79
Q

conditional distributions

A

shows the relationship between the columns and the rows

take the value of the cell you are interested in and divide by the total amount of the column or row
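
Marginal and conditional distributions can be computed from a two-way table as sketched below. The table of games and snacks, and every count in it, is hypothetical and exists only to make the arithmetic concrete.

```python
# Hypothetical two-way table of counts: rows are games, columns are snacks.
table = {
    "poker": {"cookies": 30, "chips": 20},
    "bridge": {"cookies": 10, "chips": 40},
}

# Marginal distributions: totals for each row and for each column.
row_totals = {game: sum(snacks.values()) for game, snacks in table.items()}
col_totals = {}
for snacks in table.values():
    for snack, count in snacks.items():
        col_totals[snack] = col_totals.get(snack, 0) + count
grand_total = sum(row_totals.values())

# Conditional distribution of snack given game: each cell divided by its row total.
conditional_on_game = {
    game: {snack: count / row_totals[game] for snack, count in snacks.items()}
    for game, snacks in table.items()
}

print(row_totals)  # {'poker': 50, 'bridge': 50}
print(col_totals)  # {'cookies': 40, 'chips': 60}
print(conditional_on_game["poker"]["cookies"])  # 0.6
```

Dividing by the column total instead would condition on the snack rather than on the game.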

80
Q

characteristics of single variable bar graphs

A
  • gaps show the levels are categorical
  • whichever variable you are most interested in goes on the x axis
  • each bar is a level
81
Q

two variable bar graphs

A
  • visualizes interactions between data sets
82
Q

types of two variable bar graphs

A

grouped bar graph

stacked bar graph

83
Q

histograms

A

bars are side by side (no gap)

each bar represents a small numerical range (a bin)

84
Q

box plots and its parts

A

based on quartiles and used when you have numerical data and categorical groups
- whiskers
- median: solid line
- box: drawn from the first quartile to the third quartile
- extreme thresholds

85
Q

whiskers

A

drawn from the box to the last data point before the extreme threshold

86
Q

extreme thresholds

A

Q3 + 1.5 × IQR and Q1 - 1.5 × IQR
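
A sketch of how the thresholds set the whisker ends is given below. The data come from one of the practice questions in this deck, and the quartile values are entered by hand on the assumption that the deck's split-in-half rule is used.

```python
data = [7.5, 8.6, 8.9, 8.5, 9.4, 10.7, 15.1]
q1, q3 = 8.55, 10.05  # quartiles for this data under the deck's quartile rule

iqr = q3 - q1
lower_threshold = q1 - 1.5 * iqr
upper_threshold = q3 + 1.5 * iqr

# Whiskers stop at the last data points that are still inside the thresholds.
inside = [x for x in data if lower_threshold <= x <= upper_threshold]
whisker_low, whisker_high = min(inside), max(inside)

# Points beyond the thresholds are drawn individually as extreme values.
extremes = [x for x in data if x < lower_threshold or x > upper_threshold]

print(whisker_low, whisker_high, extremes)  # 7.5 10.7 [15.1]
```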

87
Q

scatter plot

A

when you have two numerical variables and you want to look at the relationship between them

x axis is the independent variable

y axis is the dependent variable

in an observational study the x and y axes are covariates

88
Q

line plots

A

two numerical variables that have been measured repeatedly from the same sampling unit

each line is a different sampling unit

89
Q

Identify which type of summary information would answer the following question “What proportion of people like cookies when playing poker?”

A

Conditional distribution with game as the primary variable

90
Q

standard normal distribution

A

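a normal distribution with mean 0 and standard deviation 1; a value x from a normal distribution with mean μ and standard deviation σ is standardized with z = (x - μ)/σ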
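
The standardization formula can be tried out with a tiny sketch. The function name and the numbers (a value of 130 from a distribution with mean 100 and standard deviation 15) are hypothetical.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


print(z_score(130, 100, 15))  # 2.0
```

A positive z means the value lies above the mean; a negative z means it lies below.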

91
Q

sample space

A

set of all possible outcomes

92
Q

event

A

a subset of a sample space (e.g. rolling an even number, {2, 4, 6}, when the sample space is 1 through 6)

93
Q

random trial

A

procedure or action that produces one outcome from a set of possible outcomes, where the result is uncertain and cannot be predicted in advance.

94
Q

frequentist probability

A

probability based on the frequency of events occurring in repeated experiments or trials

P(A) = (number of times event A occurs) / (total number of trials)
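
The relative-frequency idea can be illustrated with a simulated die. The number of trials, the seed, and the choice of event (rolling an even number) are assumptions for this sketch.

```python
import random

rng = random.Random(2024)  # arbitrary seed so the run is repeatable
trials = 10_000

# Each roll is one random trial with sample space {1, ..., 6}.
rolls = [rng.randint(1, 6) for _ in range(trials)]

# Event A: the roll is even (2, 4, or 6).
even_count = sum(1 for r in rolls if r % 2 == 0)

# Frequentist estimate: occurrences of A divided by the total number of trials.
p_even = even_count / trials
print(p_even)  # close to 0.5, but not exactly
```

With more trials the estimate tends to settle closer to the theoretical value of 0.5.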

95
Q

random variable

A

numerical outcome of a random phenomenon. It assigns a number to each outcome in a sample space, allowing for the analysis of probabilities associated with different outcomes.

96
Q

probability distribution

A

a function that gives the probability of each possible value of a random variable

97
Q

discrete distributions

A

a function that gives the probability of a discrete random variable, X, being exactly equal to some value

98
Q

define bias and sampling independence

A

bias: systematic error in a study or analysis that leads to incorrect conclusions or inferences about a population.

sampling independence: the selection of one sampling unit does not influence the selection of another.

99
Q

4 goals of an ideal sampling design

A
  1. all sampling units are selectable
  2. selection is unbiased
  3. selection is independent
  4. all samples are possible
100
Q

spurious relationships

A

a situation where two variables appear to be correlated with each other but, in fact, are not directly related

101
Q

one way contingency table

A

are for data with a single categorical variable and are shown as a one-dimensional table of columns.

102
Q

two way contingency table

A

are for data with two categorical variables and are shown as a two-dimensional table of rows and columns.

103
Q

You have been asked by a regional conservation authority to design a study to evaluate the risk that a tick will bite someone walking at one of the parks. They provide you enough money to survey 15 parks out of the 60 that are in the region. Your plan is to spend a day at each of the selected parks and survey all the people leaving the park to assess whether a tick bit them or not. You will then calculate the proportion of people bitten for each park sampled.

A

the 60 parks in the region

104
Q

According to USA Today (Dec 30, 1999), the average age of viewers of MSNBC cable television news programming is 50 years old. A Canadian network executive thinks this might not be true in Canada, and believes that the average age of these viewers in Canada is significantly less than 50 years old.

To test her hypothesis, the Canadian executive obtains a list of Bell satellite subscribers who included MSNBC in their channel package, and then conducts a phone poll of 2,000 of these subscribers across Canada. Anyone called who reports not watching MSNBC news programming at least once a week is left out of the survey; in the end 287 respondents watch MSNBC news programming at least weekly, and report their ages as part of the survey.

What is the variable of interest?

A

viewer age

105
Q

According to USA Today (Dec 30, 1999), the average age of viewers of MSNBC cable television news programming is 50 years old. A Canadian network executive thinks this might not be true in Canada, and believes that the average age of these viewers in Canada is significantly less than 50 years old.

To test her hypothesis, the Canadian executive obtains a list of Bell satellite subscribers who included MSNBC in their channel package, and then conducts a phone poll of 2,000 of these subscribers across Canada. Anyone called who reports not watching MSNBC news programming at least once a week is left out of the survey; in the end 287 respondents watch MSNBC news programming at least weekly, and report their ages as part of the survey.

What is the statistical population for this study?

A

all Canadian viewers who watch MSNBC news programming at least weekly using Bell satellite

106
Q

A medical study wants to relate consumption of fat to heart conditions. 100 patients with heart conditions are randomly selected from clinics in the Kingston area, and each patient is asked to track their food consumption for 6 weeks. After the six weeks, each patient’s heart health is evaluated using a standard array of tests (blood pressure, heart rate, ECG, etc.).

What term best describes each patient in this study design?

A

both sampling and observation unit

107
Q

An ornithologist at Queen’s University is studying the development time of recently hatched black-capped chickadees on Wolfe Island. He randomly samples 20 nests from across the island and measures the weight of each new hatchling in the nest. He repeats this sampling after 1 week, and then again after 2 weeks.

What term best describes each nest included in this study?

A

sampling unit

108
Q

Lyme disease is caused by the bacterium Borrelia burgdorferi, carried primarily by black-legged ticks. A recent study assessed the percentage of black-legged ticks that carry Borrelia from 10 random sites across North America spanning a range of mean annual temperatures. The number of ticks carrying Borrelia was quantified by collecting 100 ticks from each site and screening each tick for the bacterium (either YES or NO). The goal was to quantify the relation between annual temperature among sites and the percentage of ticks with Borrelia.

What is the observation unit in this study?

A

the individual tick

109
Q

A medical study wants to relate consumption of fat to heart conditions. 100 patients with heart conditions are randomly selected from clinics in the Kingston area, and each patient is asked to track their food consumption for 6 weeks. After the six weeks, each patient’s heart health is evaluated using a standard array of tests (blood pressure, heart rate, ECG, etc.).

What term best describes the beats per minute of heart rate in this study design?

A

measurement unit

110
Q

You are interested in the growth potential of a new seed variety. You gather a random selection of 1,000 seeds from a field where the new variety is growing, and measure the final height of all the resulting plants.

What kind of study design is this?

A

simple random

111
Q

You are the quality assurance manager for a company that produces toasters. In post-production testing, you find that more toasters are failing than expected; the cause or source of the failures is not immediately clear though.

You ask your intern to gather a random selection of failed toasters, and a selection of toasters that do not fail in testing, and then to trace all those toasters back through the production process (employees that did which installation, source of the particular components, etc.)

What kind of study design is this?

A

case control study

112
Q

A psychology professor recruits 50 randomly selected Queen’s undergraduates, and asks them to recommend friends who would also be willing to participate in an introvert/extrovert personality study; overall, 93 students complete the study.

The results are 73% of the students are extroverts, 17% are introverts, and 10% are a mix.

What would the biggest concern or risk be about this sampling strategy?

A

sample unit selection is not independent

113
Q

A medical experiment, in which a treatment group is compared to a control group, is carried out to reduce the effect of

A

confounding factors

114
Q

Consider a survey being designed for customers of a tour company in Paris.

Determine whether the possible responses to the following question on their survey should be classified as categorical, continuous numerical or discrete numerical.

“How many escorted vacations have you taken prior to this one?”

A

discrete numerical

115
Q

Determine whether the possible responses to the following question should be classified as categorical, discrete numerical or continuous numerical.

“Whether you are a Canadian citizen.”

A

categorical

116
Q

Determine whether the possible responses to the following question should be classified as categorical, discrete numerical or continuous numerical.

“The number of students in a statistics course.”

A

discrete numerical

117
Q

number of observation units in a table

A

number of rows

118
Q

number of variables in a table

A

number of columns

119
Q

Customers finishing a free sample at Costco are asked to complete a survey asking whether they would be “Very interested”, “Interested” or “Not interested” in buying the food product in the future. In one day, 357 customers complete the survey.

What graph type would be most appropriate for displaying the resulting data all at once?

A

a bar graph

120
Q

two way contingency table

A
121
Q

What is the sample space for determining the probability of drawing a Jack of Clubs from a deck of cards in a game of poker?

A

list of all cards in a deck

122
Q

What is the event for drawing an ace from a deck of cards in a game of poker?

A

list of all aces

123
Q

Which of the following statements reflects a correct definition of probability?

There is a good probability of rain tomorrow

Roughly 1 in a million people have won a national lottery over hundreds of draws, which means the probability is p=0.000001.

The probability that a product fails can be calculated directly from repeated testing in a factory.

The probability that I will buy my lunch today is 100%

A

Roughly 1 in a million people have won a national lottery over hundreds of draws, which means the probability is p=0.000001. (correct)

The probability that a product fails can be calculated directly from repeated testing in a factory. (correct)

The probability that I will buy my lunch today is 100% (correct)

124
Q

Which of the following statements describe a random trial?

The weight of an orange is measured in grams.

Observing how much a random shopper spent in a particular store.

Playing a ‘scratch and win’ lottery ticket.

Finding out that your neighbour won a million dollars in the lotto

Playing a crossword puzzle

Rolling a die in a board game

A

Observing how much a random shopper spent in a particular store. (correct)

Playing a ‘scratch and win’ lottery ticket. (correct)

Rolling a die in a board game (correct)

125
Q

Would the following be a continuous or discrete distribution? ‘Length of time between shots on net in a soccer game’

A

Continuous distribution

126
Q

Would the following be a continuous or discrete distribution? ‘Number of shots on net in a soccer game’

A

Discrete distribution

127
Q

Which of the following statements about probability distributions are TRUE?

Can be used to describe both discrete and continuous numerical variables

The area beneath the function always sums to one

The y-axis of a continuous distribution is called probability mass

The x-axis is the outcome, or event, of interest

Probability distributions show the probability of some events, but they do not have to account for all possible events from a random trial.

A

Can be used to describe both discrete and continuous numerical variables (correct)

The area beneath the function always sums to one (correct)

The x-axis is the outcome, or event, of interest (correct)

128
Q

Which of the following statements about probability distributions are FALSE?

The probability of a single event in a continuous distribution is always zero

The probability of a single event in a discrete distribution is always zero

Regardless of whether the distribution is discrete or continuous, probability is the area under the curve.

Probability distributions cannot be used for a range of events.

A

The probability of a single event in a discrete distribution is always zero (correct)

Probability distributions cannot be used for a range of events. (correct)