Midterm 1 Flashcards

1
Q

What does it mean if a study is reproducible?

A
  • One can repeat the original study using THE SAME data, materials, and methods
  • The reproduction of the study confirms the soundness and reliability of original study
    conclusions.
2
Q

What does it mean if a study is replicable?

A
  • One can repeat the original study using the same materials and methods but DIFFERENT data
  • A study is deemed replicated if the replication study reaches the same statistical conclusions
3
Q

What is computational reproducibility?

A
  • All aspects of data processing, analysis, visualization, and presentation are entirely and independently reproducible, yielding the exact same outputs.
4
Q

What is the problem surrounding replication?

A

The typical research ecosystem promotes questionable research practices (QRPs) rather than rigorous, replicable research, ultimately yielding too many studies that cannot be replicated. It can also perpetuate inequities, injustices, and biases in science.

5
Q

What is the “Reproducibility Crisis”?

A

Replicability is not the norm; the vast majority of studies could not be repeated without extensive consultation with the original authors (only 46% of studies could be repeated).

6
Q

What does low statistical power lead to?

A

Lower power = more false negatives
A true effect was there, but it wasn’t detected.

7
Q

What can cognitive biases lead to?

A

Cognitive biases can hinder objectivity and can lead to false positives.

8
Q

What is the problem with poor or inaccessible documentation?

A

It prevents the experiment from being replicated or reproduced.

9
Q

What is P-hacking?

A

Data processing and analytical choices made after seeing and interacting with your data. Results become data dependent, and no longer adhere to the original hypothesis testing model.

10
Q

What is HARKing?

A

Hypothesizing After the Results are Known.
Ex. a researcher finds patterns through exploratory research and presents the findings as though they were part of confirmatory research.
The hypothesis is presented as if it were determined beforehand.
False positive rates are higher.

11
Q

What is confirmation bias?

A

Seeking out or favoring statistics and information that confirm the results the researcher expected to find.

12
Q

What is the file drawer problem?

A

The tendency for research results, especially negative ones, to remain unpublished.

13
Q

What is the solution to all the reproducibility problems?

A

Open Science
Scientific research conducted and communicated in an honest, accessible, and transparent way, such that, at a minimum, a study can be reproduced and, ideally, replicated. Replication builds strength of evidence.

14
Q

What are the benefits of open science?

A
  • Saves time and money by pursuing best leads and avoiding poor ones (because
    poor ones were documented!)
  • Re-use methods / code that work, avoid using ones already found to be ineffective
  • Avoids duplication while enabling replication
  • Facilitates meta-analyses
  • Promotes more rapid discovery*
  • Democratizes science and promotes equitable access and relevance to all stakeholders
  * As witnessed with COVID-19 research
15
Q

What are registered reports?

A

A publication format in which the study design is peer reviewed first, the completed study is then peer reviewed as well, and the study is published no matter what the results are.

16
Q

What are the two goals of statistics?

A

Goal 1: To estimate the values of important quantities in a population of interest.
Goal 2: To test specific claims, or statistical hypotheses, about those quantities in the population.

17
Q

Why do we need statistics?

A

Measuring everyone in the population is almost never feasible. Statistics provides the tools necessary to reliably describe populations and draw inferences about them.

18
Q

Population

A

All the individual units of interest

19
Q

Sample

A

A subset of units from the population that we measure and analyze.

20
Q

Estimation

A

The process of inferring an unknown quantity of a population using sample data.

21
Q

Parameter

A

A quantity describing a population.

22
Q

How do populations, parameters, samples, and estimates relate?

A

Populations <—> Parameters
Samples <—> Estimates

23
Q

What are some characteristics of parameters?

A

They are constant, fixed, the truth (which we almost never know)

24
Q

What are some characteristics of estimates?

A

They are random variables; they change from one sample to the next from the same population.

25
Q

What is a sample of convenience?

A

A collection of individuals that happen to be available at the time.

26
Q

What is sampling bias?

A

Sampling Bias is a systematic difference between a parameter and its estimate.
Sampling Bias arises when samples aren’t representative of the population.
Sampling Bias is typically difficult to deal with.

27
Q

What is volunteer bias?

A

Volunteers for a study are likely to be different from the population.

28
Q

What are the properties of a good sample?

A

A good sample is made up of a random and independent selection of a large number of individuals. In a random sample, each member of a population has an independent and equal chance of being selected.

29
Q

What is sampling error?

A
  • Discrepancy between the population parameter and the sample estimate caused by chance.
  • It is inevitable and expected, and can be managed and dealt with
  • Because an estimate is a random variable, the value of an estimate is influenced by chance.
  • Therefore, estimates will differ among random samples from the same population
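
A minimal simulation sketch of the points above, assuming a made-up population (the normal distribution, its mean of 50, the sample size of 20, and the seed are all illustrative choices, not course data). It draws several random samples from the same population and shows that each estimate of the mean lands in a slightly different place.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 1000 measurements (made up for illustration).
population = [random.gauss(50, 10) for _ in range(1000)]
parameter = statistics.mean(population)  # fixed, but usually unknown in practice

# Draw several random samples and compute the estimate from each one.
estimates = [statistics.mean(random.sample(population, 20)) for _ in range(5)]

for estimate in estimates:
    # The gap between each estimate and the parameter is the sampling error.
    print(f"estimate = {estimate:.2f}, sampling error = {estimate - parameter:+.2f}")
```

The parameter never changes between samples; only the estimates do, which is what makes them random variables.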
30
Q

What is sampling bias?

A

Systematic difference between estimates and parameters.

31
Q

What is sampling error (shortened version)?

A

Discrepancy between estimates and parameters caused by chance.

32
Q

What is a good random sample in terms of sampling error and sampling bias?

A

Good random samples minimize bias and make it possible to measure the amount of sampling error.

33
Q

Accurate and precise versus inaccurate and imprecise

A

Accuracy is how close the estimates are to the target (the true value); precision is how consistently repeated estimates hit the same spot.

34
Q

What is a variable?

A

A variable is a characteristic that differs among individuals or other sampling units.

35
Q

What is data?

A

Data are measurements of one or more variables made on a sample of individuals.

36
Q

What are nominal categorical variables? What are some examples?

A

No natural order to categories
- sex
- genotype
- drug treatment (e.g., aspirin vs. ibuprofen)
- province
- survival (i.e., live or die)

37
Q

What are ordinal categorical variables? What are some examples?

A

Natural ordering to categories
- severity (mild, moderate, severe)
- light intensity (dim, moderate, bright)

38
Q

What are discrete numerical variables? What are some examples?

A

They can be counted.
- Number of limbs
- Number of offspring

39
Q

What are continuous numerical variables? What are some examples?

A

They can be measured
- Arm length
- Height
- Salt concentration (mg/L)

40
Q

How are response (dependent) and explanatory (independent) variables connected?

A

We aim to predict response variables using explanatory variables.

41
Q

What is frequency distribution?

A

Describes the number of times each value of a variable occurs in a sample (categorical or numerical).

42
Q

What is probability distribution?

A

The distribution of the variable in the entire population (rarely known)

43
Q

What is an observational study?

A
  • Treatments are NOT assigned by researcher
  • Can only evaluate associations between variables
  • Cause and effect CANNOT be assessed
44
Q

What is an experimental study?

A
  • Treatments assigned randomly to individuals
  • CAN assess cause and effect relationships between variables (given good experimental design)
45
Q

What is a confounding variable?

A

A confounding variable is an unmeasured variable that changes in tandem with one or more of the measured variables in a study.

46
Q

Why should we be aware of reverse causation?

A

The hypothesized direction of causality may be backwards.
Ex. we hypothesize that feeding method affects infant growth rate when, in actuality, infant growth rate affects feeding method.

47
Q

What is meta-analysis? (not super important)

A

An analysis of analyses

48
Q

What is exploratory research?

A
  • Characterized by the use of data to generate hypotheses about why something occurred
  • Crucial for discovery, especially of unexpected patterns that then lead to new lines of research
  • Typically proceeds without rigid analysis path; flexibility to explore different leads and angles of inquiry
  • Open to an array of possible relationships and resulting interpretations
49
Q

What is confirmatory research?

A
  • Characterized by undertaking a study specifically designed to test an a priori hypothesis and associated predictions about what will occur
  • Crucial for establishing diagnostic evidence for explanatory claims
  • Proceeds with a clear study design and analysis plan that is strictly followed
50
Q

What is a common mistake people make about the relationship between correlation and causation?

A

Assuming that correlation means causation, when in reality correlation does not mean or require causation.

51
Q

What type of table should be used to show categorical data?

A

Frequency table

52
Q

What kind of graph is used for categorical data?

A

Bar graph

53
Q

What kind of graph is used for numerical data?

A

Histogram

54
Q

Why should you always visualize data early?

A

To check for data entry errors.

55
Q

What does a bell-shaped histogram look like?

A

It is symmetric, with a single peak in the center and tails that fall away evenly on both sides.

56
Q

What does a positively asymmetric (right-skewed) graph look like?

A

The left is very tall, then it slopes down towards the right, leaving a long tail on the right.

57
Q

What does a negatively asymmetric (left-skewed) graph look like?

A

The left is very short, then it slopes upward towards the right, leaving a long tail on the left.

58
Q

What does a bimodal graph look like?

A

It has two peaks.

59
Q

What type of plots can be used to show association between 2 categorical variables?

A
  • Grouped bar chart
  • Mosaic plot
60
Q

What is relative frequency?

A

Proportion of responses in a category.
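
A small sketch of how a frequency table with relative frequencies could be built. The observations and the helper name `frequency_table` are made up for illustration.

```python
from collections import Counter

# Hypothetical categorical observations (made up for illustration).
severity = ["mild", "severe", "mild", "moderate", "mild", "moderate"]

def frequency_table(values):
    """Return {category: (count, relative frequency)}."""
    counts = Counter(values)
    n = len(values)  # total sample size
    return {category: (count, count / n) for category, count in counts.items()}

table = frequency_table(severity)
for category, (count, rel_freq) in table.items():
    print(f"{category:10s} {count:3d} {rel_freq:.2f}")
```

The relative frequencies always add up to 1, because every observation falls in exactly one category.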

61
Q

What type of plots can be used to show association between 2 numerical variables?

A
  • Scatterplot
  • In some cases line graph (time series)
62
Q

What type of plots can be used to show association between numerical and categorical variables?

A
  • Stripchart (<=20 observations per group)
  • Violin plot (many observations)
  • Multiple histograms
63
Q

Flip for graphing tips

A
  • Three-dimensional bars / figures generally not good
  • Pie charts can be difficult to interpret (bar charts are better)
  • Be sure to know when to use BAR chart versus HISTOGRAM
  • R and the “ggplot2” package produce good graphs by default, but some tailoring is typically beneficial
64
Q

What are the qualities of a good graph? A bad graph?

A

Good
-Magnitudes honest
-No clutter
-Hollow, good size points
-All data points visible
-Ticks to the outside
-Clear axis labels
-Units in axis labels
-No points on axes lines
-Color only when necessary
Bad
-Magnitudes distorted

65
Q

What are measures of location?

A
  • Give an idea about a typical value in the dataset
  • Allow us to answer questions like “Which species is larger?”
66
Q

What are measures of spread?

A
  • Give an idea about variability in the data
  • Variability is key in biology! – individuals differ from one another, and this variation provides the basis for evolution
67
Q

What are measures of location for categorical variables?

A

Proportion

68
Q

How do you calculate proportion?

A

The number of items/individuals in a specific category divided by the total sample size (n).

69
Q

What are measures of location for numerical variables?

A
  • Mean
  • Median
  • Mode
70
Q

What is the mean?

A

The average of the given numbers

71
Q

How do you calculate mean?

A

The sum of all the given values divided by the number of values (n).

72
Q

What is the median?

A
  • The median is the middle measurement of a sample.
  • Calculate the median by taking the value halfway through an ordered list of observations.
73
Q

How do you calculate the median?

A
  • Sort observations from smallest to largest
  • Then determine the exact middle value (with an even number of observations, average the two middle values)
74
Q

What are percentiles?

A
  • The pth percentile is the value below which p% of the observations fall
  • The median is equivalent to the 50th percentile
75
Q

What is the mode?

A

The mode is the most frequent measurement, i.e. the value that shows up most often in the data.
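
The three measures of location for numerical variables can be sketched with Python's standard `statistics` module. The measurements are made up for illustration.

```python
import statistics

values = [2, 3, 3, 5, 7, 10]  # made-up measurements

# Mean: sum of all values divided by the number of values (n).
mean = sum(values) / len(values)

# Median: middle of the sorted values; with an even n, the two
# middle values (3 and 5 here) are averaged.
median = statistics.median(values)

# Mode: the most frequent value.
mode = statistics.mode(values)

print(mean, median, mode)
```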

76
Q

What are the 5 measures of spread?

A

-Variance
-Standard deviation
-Coefficient of variation
-Range
-Interquartile range (IQR)

77
Q

What do variance, standard deviation, and coefficient of variation all communicate?

A

How far individual observations typically deviate from the mean.

78
Q

What is variance?

A

The average of the squared difference between an observation and the mean
*The numerator is called the “sum of squares”

79
Q

How do you calculate variance?

A

Subtract the mean from each individual value, square each difference, add up the squares (the sum of squares), and divide by the total sample size minus 1: s^2 = Σ(Y - mean)^2 / (n - 1).

80
Q

When is variance an estimate versus a parameter?

A

It is an estimate (s^2) when it has n-1 on the bottom and uses the sample mean. It is a parameter (σ^2) when the denominator is just n and it uses μ (the population mean).

81
Q

What is standard deviation and how do you calculate it?

A

A measure of the amount of variation or dispersion of a set of values.
Calculate by taking the square root of the variance (s^2).
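
A short sketch of the variance and standard deviation calculations, written out by hand and checked against the standard `statistics` module. The data and function names are illustrative only.

```python
import math
import statistics

def sum_of_squares(values, center):
    """Numerator of the variance: squared deviations from the center, summed."""
    return sum((y - center) ** 2 for y in values)

def sample_variance(values):
    """Estimate s^2: uses the sample mean and divides by n - 1."""
    n = len(values)
    sample_mean = sum(values) / n
    return sum_of_squares(values, sample_mean) / (n - 1)

def population_variance(values):
    """Parameter: treats the values as the whole population and divides by n."""
    mu = sum(values) / len(values)
    return sum_of_squares(values, mu) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up measurements
s2 = sample_variance(data)
s = math.sqrt(s2)  # standard deviation = square root of the variance
print(f"s^2 = {s2:.3f}, s = {s:.3f}")
```

Dividing by n - 1 instead of n makes the sample variance slightly larger than the population formula would give for the same numbers.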

82
Q

What is interquartile range (IQR)?

A

The difference between the 75th and 25th percentile.

83
Q

How is IQR calculated?

A
  • Order observations from smallest to largest
  • 25th percentile (= 1st “quartile”) is the value at which 25% of observations are smaller
  • 75th percentile (= 3rd “quartile”) is the value at which 75% of observations are smaller
  • IQR = 75th percentile - 25th percentile
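
A hedged sketch of the IQR steps using `statistics.quantiles`. Software packages use slightly different conventions for locating quartiles; the "inclusive" method used here is one choice and may not match the course's hand calculation exactly. The data are made up.

```python
import statistics

def interquartile_range(values):
    """Return (Q1, Q3, IQR) for a list of numbers."""
    # quantiles() sorts the data and returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3, q3 - q1

q1, q3, iqr = interquartile_range([7, 1, 9, 3, 5, 2, 8, 4, 6])
print(q1, q3, iqr)
```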
84
Q

What is range?

A

The sample range is simply the difference between the largest and smallest values. It is not particularly useful because it is influenced by sample size, meaning it is a biased estimate.

85
Q

What is the coefficient of variation?

A

The ratio of the standard deviation to the mean, expressed as a percentage.

86
Q

How is the coefficient of variation calculated?

A

Standard deviation divided by the mean times 100 percent.
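
The calculation can be sketched as follows, using the sample standard deviation from the `statistics` module. The data are made up.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

cv = coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9])
print(f"CV = {cv:.1f}%")
```

Because the CV is a ratio, it has no units, which makes it handy for comparing variability between variables measured on different scales.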

87
Q

When do you use mean and standard deviation?

A

If the frequency distribution is approximately normal and free of outliers/extreme values.

88
Q

When do you use median and IQR?

A

If the frequency distribution is non-normal, and/or if there are extreme values/outliers.

89
Q

What happens to the mean on a non-normal distribution graph?

A

The mean gets pulled towards the skew (the long tail); it is the center of gravity of the distribution.

90
Q

What happens to standard deviation and IQR on a non-normal distribution graph?

A

The standard deviation gets much larger, while the IQR stays about the same.

91
Q

Why use mean & SD for normal values and median & IQR for non-normal values?

A

Because the latter are less influenced by extreme values, and thus more representative.
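
A small sketch of why this matters, comparing the four summaries before and after adding one extreme value. The data are made up, and the "inclusive" quartile method is just one of several conventions.

```python
import statistics

def summarize(values):
    """Mean and SD next to median and IQR for the same data."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }

typical = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up values
with_outlier = typical + [100]         # same values plus one extreme value

before = summarize(typical)
after = summarize(with_outlier)
for name in before:
    print(f"{name:6s} {before[name]:8.2f} -> {after[name]:8.2f}")
```

In this sketch the mean and standard deviation jump when the extreme value is added, while the median and IQR barely move.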

92
Q

What do μ and σ mean?

A

μ represents the population mean
σ represents the population standard deviation

93
Q

What is the sampling distribution?

A

The probability distribution of all values for an estimate that we might obtain when we sample a population.

94
Q

What is standard error?

A

The standard deviation of the estimate’s sampling distribution.

95
Q

How do you calculate standard error?

A

For the sample mean: standard deviation divided by the square root of the total sample size, SE = s / √n.
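
A minimal sketch of the standard error of the mean, assuming the sample standard deviation (n - 1 denominator) is the one intended. The data are made up.

```python
import math
import statistics

def standard_error_of_mean(values):
    """Sample standard deviation divided by the square root of n."""
    return statistics.stdev(values) / math.sqrt(len(values))

se = standard_error_of_mean([2, 4, 4, 4, 5, 5, 7, 9])
print(f"SE = {se:.3f}")
```

Because n is in the denominator, larger samples give smaller standard errors, i.e. more precise estimates.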

96
Q

What is the uncertainty added to the standard error?

A

Using the mean, move the decimal point once to the left.
*we only add uncertainty to standard error

97
Q

What is a confidence interval?

A

A range of values surrounding the sample estimate that is likely to contain the population parameter

98
Q

What is the interval formula/example?

A
  • 95% confidence interval ≈ estimate ± (2 x standard error)
  • lower 95% “confidence limit” = 2411.8 - (2 x 146.3) = 2119.2
  • upper 95% “confidence limit” = 2411.8 + (2 x 146.3) = 2704.4
  • 95% confidence interval: 2119.2 to 2704.4
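
The arithmetic above can be sketched in code. This reads 2411.8 as the sample estimate and 146.3 as its standard error, which is an assumption since the card does not label the two numbers. The function name is illustrative.

```python
def approx_95_ci(estimate, standard_error):
    """Rough 95% confidence interval: estimate plus or minus 2 standard errors."""
    margin = 2 * standard_error
    return estimate - margin, estimate + margin

# Assumption: 2411.8 is the estimate and 146.3 is its standard error.
lower, upper = approx_95_ci(2411.8, 146.3)
print(f"{lower:.1f} to {upper:.1f}")  # prints: 2119.2 to 2704.4
```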
99
Q

Out of these two, what is the correct language when describing the confidence interval?
“There is a 95% probability that the population mean is within a particular 95% confidence interval”
or
“We are 95% confident that the population mean lies within the 95% confidence interval”

A

CORRECT: “We are 95% confident that the population mean lies within the 95% confidence interval”

100
Q

What is margin of error?

A

A statistic expressing the amount of random sampling error in a survey’s results.
The larger the margin of error the less confidence in the results.

101
Q

What do error bars do?

A

Help illustrate uncertainty about the value of the parameter being estimated.

102
Q

What is a random trial?

A

A process or experiment that has two or more possible outcomes whose occurrence cannot be predicted with certainty.
Only one outcome is observed from each repetition of a random trial.

103
Q

What is an outcome?

A

One of the possible results of a random trial.
e.g. Rolling a “6” is one of 6 possible outcomes when rolling a single fair die.

104
Q

What is an event?

A

Any potential subset of all the possible outcomes; e.g. rolling a 4; rolling a number greater than 2

105
Q

What is probability?

A

The probability of an event is the proportion of times the event would occur if we repeated a random trial over and over again under the same conditions.
Probability ranges from 0 to 1.

106
Q

What does Pr[A] stand for?

A

The probability of event A.
e.g. Pr[rolling a 4] = 1/6
Pr[>2] = 4/6 = 2/3
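
The two die examples can be checked by treating an event as a set of outcomes and counting. This is a sketch that assumes a fair die, so every outcome is equally likely; `pr` is an illustrative helper name.

```python
from fractions import Fraction

OUTCOMES = frozenset(range(1, 7))  # one roll of a fair six-sided die

def pr(event):
    """Pr[event] = outcomes in the event / all possible outcomes."""
    return Fraction(len(OUTCOMES & set(event)), len(OUTCOMES))

rolling_a_4 = {4}
greater_than_2 = {outcome for outcome in OUTCOMES if outcome > 2}

print(pr(rolling_a_4), pr(greater_than_2))
```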

107
Q

What does it mean if two events are mutually exclusive?

A

They cannot both occur at the same time.
e.g. Pr[4 and 6] = 0

108
Q

What does it mean if Pr[A and B] ≠ 0?

A

Events A and B are not mutually exclusive; they may occur simultaneously.

109
Q

What is probability distribution?

A

A list of the probabilities of all mutually exclusive outcomes of a random trial.

110
Q

What is discrete probability distribution?

A
  • a discrete probability distribution gives the probability of each possible
    value of a discrete variable
  • eg, for the roll of a 6-sided die…
111
Q

When is discrete probability distribution used?

A

For random trials in which all possible mutually exclusive outcomes are able to be enumerated, the relevant probability distribution is a discrete probability distribution, and we can calculate the probability of the event.

112
Q

How do you calculate discrete probability distribution?

A

Pr[gene length = 2000]
= # genes with length 2000 ÷ total number of genes
= 4 ÷ 22385
≅ 1.79 x 10^-4
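
The same calculation as a short sketch, using the counts from the card. The function name is illustrative.

```python
def pr_from_counts(count_in_category, total_count):
    """Probability of a value = number of units with that value / total units."""
    return count_in_category / total_count

# 4 genes with length 2000 out of 22385 genes in total.
p = pr_from_counts(4, 22385)
print(f"{p:.2e}")
```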

113
Q

What is continuous probability distribution?

A

the probability distribution of a continuous variable is a continuous function, e.g. the normal distribution.

114
Q

How is continuous probability calculated?

A

The probability of a range of possible values is the area under the probability density curve over that range, found by integrating the probability density.
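
A sketch of the area-under-the-curve idea using the normal distribution from Python's `statistics` module. The length distribution (mean 4 mm, standard deviation 0.5 mm) is a made-up assumption for illustration.

```python
from statistics import NormalDist

def pr_between(dist, low, high):
    """Area under the density curve between low and high."""
    # The cumulative distribution function gives the area to the left of a value.
    return dist.cdf(high) - dist.cdf(low)

# Standard normal: about 95% of the area lies within 1.96 SD of the mean.
p_central = pr_between(NormalDist(mu=0.0, sigma=1.0), -1.96, 1.96)

# Hypothetical lengths in mm (assumed normal, mean 4, SD 0.5).
lengths = NormalDist(mu=4.0, sigma=0.5)
p_near_mean = pr_between(lengths, 4.0, 4.2)
p_far_tail = pr_between(lengths, 2.0, 2.2)

print(f"{p_central:.3f} {p_near_mean:.4f} {p_far_tail:.6f}")
```

Two ranges of the same width can have very different probabilities, depending on how much density sits above them.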

115
Q

If we know the continuous probability of one range, what judgement can we make?

A

Pr[length between 4 and 4.2mm] >> Pr[length between 2 and 2.2mm]

116
Q

What is the addition rule/principle?

A
  • if two events, A and B are mutually exclusive…
    Pr[A or B] = Pr[A] + Pr[B]
117
Q

How do you calculate the probability of an event not happening?

A

TIP: remember, the sum of all probabilities is 1
Pr[not 2] = 1 - Pr[2]
= 1 - 1/6 = 5/6

118
Q

What is the addition principle equation for events that can occur simultaneously?

A

Pr[A or B] = Pr[A] + Pr[B] – Pr[A and B]
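
A sketch that checks both forms of the addition rule, plus the "not" rule, by counting outcomes for one roll of a fair die. The events chosen are illustrative.

```python
from fractions import Fraction

OUTCOMES = frozenset(range(1, 7))  # one roll of a fair six-sided die

def pr(event):
    return Fraction(len(OUTCOMES & set(event)), len(OUTCOMES))

# Mutually exclusive events: a single roll cannot be both a 4 and a 6.
four, six = {4}, {6}
pr_4_or_6 = pr(four | six)

# Events that can occur together: "even" and "greater than 2" share 4 and 6.
even, greater_than_2 = {2, 4, 6}, {3, 4, 5, 6}
pr_union = pr(even | greater_than_2)
# Subtract Pr[A and B] so the shared outcomes are not counted twice.
by_rule = pr(even) + pr(greater_than_2) - pr(even & greater_than_2)

# Probability of an event not happening.
pr_not_2 = 1 - pr({2})

print(pr_4_or_6, pr_union, by_rule, pr_not_2)
```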

119
Q

What is the multiplication rule or what equation is used when finding the probability of two independent events?

A

Pr[A and B] = Pr[A] × Pr[B]
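
A sketch that checks the multiplication rule by listing every outcome of two independent rolls of a fair die. The events are illustrative.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
ROLLS = list(product(FACES, FACES))  # 36 equally likely (first, second) pairs

def pr(predicate):
    """Proportion of the 36 pairs for which the predicate is true."""
    favourable = sum(1 for roll in ROLLS if predicate(roll))
    return Fraction(favourable, len(ROLLS))

pr_a = pr(lambda roll: roll[0] == 4)                        # first roll is a 4
pr_b = pr(lambda roll: roll[1] == 6)                        # second roll is a 6
pr_a_and_b = pr(lambda roll: roll[0] == 4 and roll[1] == 6)  # both happen

print(pr_a, pr_b, pr_a_and_b)
```

The product shortcut only works because the first roll tells us nothing about the second; for dependent events it does not apply.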

120
Q

What is the difference between the addition rule and the multiplication rule?

A

Addition is A OR B
Multiplication is A AND B

121
Q

What are probability trees?

A

An image that can model multiple events.
Image in slides Chapter 5 part 2 slide 8

122
Q

Can probability trees show dependent events?

A

Yes, the example showed that the probability of washing hands was dependent on sex.

123
Q

What do probability trees help us calculate?

A

The probability of each mutually exclusive event.
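
A sketch of how a probability tree with dependent events could be worked through in code. All of the probabilities below are hypothetical placeholders, not the values from the slides; only the structure (sex first, then hand washing given sex) follows the example mentioned in the cards.

```python
from fractions import Fraction

# Hypothetical placeholder probabilities, NOT the values from the slides.
pr_sex = {"male": Fraction(1, 2), "female": Fraction(1, 2)}
pr_wash_given_sex = {"male": Fraction(7, 10), "female": Fraction(9, 10)}

# Multiply along each path of the tree to get the probability of each leaf.
leaves = {}
for sex, p_sex in pr_sex.items():
    p_wash = pr_wash_given_sex[sex]  # second branch depends on the first
    leaves[(sex, "washes")] = p_sex * p_wash
    leaves[(sex, "does not wash")] = p_sex * (1 - p_wash)

# The leaves are mutually exclusive, so they can be added together.
pr_washes = sum(p for (_, action), p in leaves.items() if action == "washes")

for path, p in leaves.items():
    print(path, p)
print("Pr[washes] =", pr_washes)
```

Because the leaves cover every possible path, their probabilities add up to 1, which is a handy check on a hand-drawn tree.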