Module 2 Flashcards
What is a case-control study?
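An observational study that compares people who have the outcome of interest (cases) with people who do not (controls), to see how their past exposures differ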
What is a retrospective case-control study?
Researchers look back in time at how subjects behaved (their past exposures), comparing the case group with the control group
Draw the causal model (i.e., the “directed acyclic graph”). Include the mediator, and use arrows to show “what we already know”, “what we can prove”, and “what we want to know”.
Refer to PCV 2.1
What is sampling variability/error?
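When you draw a random and representative sample from a population, you will not get the exact same sample every time you draw it, so statistics computed from it (e.g., the sample mean) vary from sample to sample. This sample-to-sample variation is the sampling variability (sampling error).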
Three students each draw a sample of 5 observations from the population. How many samples did each student draw from the population? What is the sample size used in the experiment?
Each student drew 1 sample, with a sample size of n = 5.
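A small simulation of the three-student example, as a sketch: the population (normal, mean 50, SD 10) and the seed are hypothetical choices, not from the notes. Each student draws exactly one sample of size n = 5, and the three sample means differ even though the population and n are identical, which is the sampling variability described above.

```python
import random
import statistics

# Hypothetical population (illustrative numbers, not from the notes):
# 10,000 values from a normal distribution with mean 50 and SD 10.
rng = random.Random(1)
population = [rng.gauss(50, 10) for _ in range(10_000)]

n = 5  # sample size = number of observations in ONE sample
samples = {}
for student in ("A", "B", "C"):
    # Each student draws exactly one sample of n observations, without replacement.
    samples[student] = rng.sample(population, n)

for student, sample in samples.items():
    # Same population, same n, yet each sample mean differs: sampling variability.
    print(f"Student {student}: 1 sample, n = {len(sample)}, mean = {statistics.mean(sample):.2f}")
```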
What is a sampling distribution?
The distribution of a statistic (e.g., the sample mean) over many repeated samples. First take several samples of the same size and compute the mean of each sample. Then treat the sample means as a new data set and plot its histogram.
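The recipe above can be sketched in a few lines. The population (normal, mean 50, SD 10), n = 5, 1,000 repetitions, and the helper names `sampling_distribution_of_mean` and `text_histogram` are all hypothetical choices for illustration; the printed text histogram stands in for a plotted one.

```python
import random
import statistics

rng = random.Random(2)

def sampling_distribution_of_mean(population_draw, n, reps):
    """Draw `reps` samples of size `n` and return the list of their sample means."""
    return [statistics.fmean([population_draw() for _ in range(n)]) for _ in range(reps)]

def text_histogram(values, bins=10, width=40):
    """Print a crude text histogram and return the count in each bin."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / step), bins - 1)] += 1  # clamp the maximum into the last bin
    peak = max(counts)
    for i, c in enumerate(counts):
        print(f"{lo + i * step:7.2f} | {'#' * round(width * c / peak)}")
    return counts

# Hypothetical population: normal with mean 50 and SD 10; samples of size n = 5.
means = sampling_distribution_of_mean(lambda: rng.gauss(50, 10), n=5, reps=1000)
counts = text_histogram(means)
```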
What is a sample distribution?
A sample distribution is the distribution (e.g., a histogram) of the individual values within one sample
As you increase the sample size, what happens to the standard deviation, graph shape, and mean of the sampling distribution when the population has a NORMAL distribution?
The graph tightens: variability decreases, so the standard deviation of the sampling distribution (the standard error, sigma/sqrt(n)) also decreases. The shape of the graph does not change; it stays normal.
The mean of the means does not change with an increased sample size.
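A quick simulation check of these three claims, as a sketch under assumed values: a hypothetical normal population with mu = 50 and sigma = 10, 5,000 repetitions per n. The mean of the means stays near 50 while the SD of the means shrinks roughly like sigma/sqrt(n).

```python
import math
import random
import statistics

rng = random.Random(3)
MU, SIGMA = 50, 10  # hypothetical normal population parameters
REPS = 5000         # number of samples drawn for each sample size

results = {}
for n in (4, 16, 64):
    means = [statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(n)]) for _ in range(REPS)]
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n={n:3d}  mean of means={results[n][0]:6.2f}  "
          f"SD of means={results[n][1]:5.2f}  sigma/sqrt(n)={SIGMA / math.sqrt(n):5.2f}")
```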
What is n?
The sample size
As you increase the sample size, what happens to the standard deviation, graph shape, and mean of the sampling distribution when the population has a UNIFORM distribution?
Mean of the means is unchanged
The standard deviation gets smaller
The shape of the sampling distribution becomes more and more normal
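One way to see the shape change numerically, as a sketch: draw from a hypothetical Uniform(0, 1) population and record the fraction of sample means within one SD of their centre. For a flat distribution that fraction is about 0.577; for a normal curve it is about 0.683, so moving toward 0.683 as n grows is a sign the shape is becoming normal. The sample sizes, repetitions, and seed are assumptions.

```python
import random
import statistics

rng = random.Random(4)
REPS = 20000  # number of samples per sample size (assumed)

def summarize(n):
    """Mean, SD, and fraction within 1 SD for the sample means of Uniform(0, 1) samples of size n."""
    means = [statistics.fmean([rng.random() for _ in range(n)]) for _ in range(REPS)]
    m, s = statistics.fmean(means), statistics.stdev(means)
    within_1sd = sum(abs(x - m) <= s for x in means) / REPS
    return m, s, within_1sd

results = {n: summarize(n) for n in (1, 2, 30)}
for n, (m, s, w) in results.items():
    print(f"n={n:2d}  mean={m:.3f}  SD={s:.3f}  within 1 SD={w:.3f}")
```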
What are some synonyms for a normal graph?
symmetric
gaussian
bell shaped
What is the central limit theorem?
When n is “sufficiently” large, the sampling distribution of a particular statistic (e.g., the sample mean) tends towards a normal distribution, even if the underlying population distribution is not Gaussian.
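A sketch of the central limit theorem with a clearly non-Gaussian population: a hypothetical right-skewed exponential population (rate 1). The skewness of the sample means (0 for a normal curve) falls toward 0 as n grows. The `skewness` helper, sample sizes, and seed are illustrative assumptions.

```python
import random
import statistics

rng = random.Random(5)
REPS = 10000  # samples per sample size (assumed)

def skewness(xs):
    """Population-style skewness: average cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs, m)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])

skews = {}
for n in (1, 5, 50):
    means = [statistics.fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(REPS)]
    skews[n] = skewness(means)
    print(f"n={n:2d}  skewness of sample means = {skews[n]:.2f}")
```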
What is the normal distribution? What are the two parameters for a normal distribution? If a random variable X is distributed normally, then we denote it as ….
The normal distribution is a continuous probability distribution for a real-valued random variable. Its two parameters are the mean (mu) and the variance (sigma squared).
X ~ N(mu, sigma^2) (full notation in notes)
How does the normal distribution differ from the binomial distribution?
The binomial distribution is a discrete probability distribution. This means that the values our random variable X can take on are clearly delineated integers, like the number of heads in five coin flips. X cannot be a fraction like three and a half heads.
In comparison, the normal distribution is continuous: the values X can take on include fractions of a whole unit (any real number).
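A sketch of the contrast using the coin-flip example: the binomial probabilities exist only for the integers 0 to 5 (and sum to 1), while draws from a normal distribution (a hypothetical N(0, 1) here) are fractional values. The helper name `binomial_pmf` is illustrative.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p); only integer k in 0..n are possible."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Number of heads in five fair coin flips: X can only be 0, 1, 2, 3, 4 or 5.
pmf = {k: binomial_pmf(k, 5, 0.5) for k in range(6)}
print(pmf)  # there is no entry for 3.5 heads: that outcome is impossible

# A normally distributed variable (hypothetical N(0, 1)) takes fractional values.
draws = NormalDist(0, 1).samples(3, seed=6)
print(draws)
```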
In a normal distribution, what do the mean and variance tell us about the graph?
- The mean sets the location (centre) of the distribution. Changing only the mean shifts the curve left or right; its shape stays the same.
- The variance sets how widely the values are spread. Changing only the variance keeps the centre the same but changes the height of the peak and the width of the curve: a larger variance gives a lower, wider curve.
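A sketch with Python's `statistics.NormalDist`, which is parameterized by the standard deviation, so the hypothetical helper `normal` converts a variance first. Shifting the mean leaves the peak height unchanged; quadrupling the variance (doubling the SD) halves the peak height and spreads the probability out.

```python
import math
from statistics import NormalDist

def normal(mu, variance):
    """Build N(mu, variance); NormalDist expects the standard deviation, so take sqrt."""
    return NormalDist(mu, math.sqrt(variance))

base = normal(0, 1)
shifted = normal(5, 1)  # same variance, different mean: same shape, new location
wider = normal(0, 4)    # same mean, larger variance: lower, wider curve

print(f"peak heights: base={base.pdf(0):.4f}, shifted={shifted.pdf(5):.4f}, wider={wider.pdf(0):.4f}")
print(f"P(-1 < X < 1): base={base.cdf(1) - base.cdf(-1):.4f}, wider={wider.cdf(1) - wider.cdf(-1):.4f}")
```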
The probability of observing random, normally distributed values within a given range is equal to
the associated area under the curve (AUC)
This works by considering an interval of potential values for X, as defined on the x-axis of the plot, then calculating the proportion of the total area under the curve (which is 1) that falls within that interval. That proportion is the probability of observing a random normal variable from this population with a value in that range.
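A sketch of the area-under-the-curve idea: the cumulative distribution function (CDF) gives the area to the left of a point, so the area over an interval is a difference of two CDF values. The example population N(100, 15^2) and the helper `prob_between` are hypothetical.

```python
from statistics import NormalDist

def prob_between(dist, a, b):
    """Area under the normal curve between a and b, i.e. P(a < X < b)."""
    return dist.cdf(b) - dist.cdf(a)

X = NormalDist(mu=100, sigma=15)  # hypothetical population: mean 100, SD 15
p = prob_between(X, 85, 115)      # one SD either side of the mean
total = prob_between(X, float("-inf"), float("inf"))  # the whole curve
print(f"P(85 < X < 115) = {p:.4f}; total area = {total:.4f}")
```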
Can you calculate the probability for a single value of X in a normal distribution?
No. For continuous distributions like the normal distribution, the probability of any single exact value of X is 0 (the area above a single point has no width), so a probability is always anchored to an interval.
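A sketch of why a single value has probability 0: shrink an interval around a point (x = 1 on a hypothetical standard normal) and the area over it shrinks toward 0; an interval of zero width has exactly zero area.

```python
from statistics import NormalDist

X = NormalDist(0, 1)  # hypothetical standard normal
x = 1.0
probs = []
for h in (0.5, 0.1, 0.01, 0.001):
    p = X.cdf(x + h) - X.cdf(x - h)  # area over the interval (x - h, x + h)
    probs.append(p)
    print(f"P({x - h:.3f} < X < {x + h:.3f}) = {p:.6f}")

print("P(X = 1 exactly) =", X.cdf(x) - X.cdf(x))  # zero-width interval
```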
If a random variable X is distributed normally, what do we know about the standard deviations?
- There is an approximately 68% chance X falls within one standard deviation of the mean
- There is an approximately 95% chance X falls within two standard deviations of the mean
- There is an approximately 99.7% chance X falls within three standard deviations of the mean
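The 68-95-99.7 figures can be checked from the standard normal CDF, as a sketch. Working with the standard normal is enough because any normal X can be standardized to z = (X - mu) / sigma, and "within k standard deviations" becomes -k < z < k.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, SD 1
coverage = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"P(within {k} SD of the mean) = {p:.4f}")
```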
The value you get for the probability density function is not a probability in and of itself. What do you have to do to calculate the actual probability?
Use calculus to integrate the PDF across the desired range; the resulting AUC is the probability that X falls in that range.
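A sketch of that integration done numerically with composite Simpson's rule (a standard numerical stand-in for the exact integral), compared against the CDF difference. The population N(100, 15^2), the interval, and the `simpson` helper are hypothetical. The last line shows a density value above 1, a reminder that a PDF value alone is not a probability.

```python
from statistics import NormalDist

def simpson(f, a, b, n=1000):
    """Approximate the integral of f from a to b with composite Simpson's rule (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

X = NormalDist(mu=100, sigma=15)   # hypothetical population
area = simpson(X.pdf, 85, 130)     # integrate the PDF over (85, 130)
exact = X.cdf(130) - X.cdf(85)     # same area from the CDF
print(f"Simpson: {area:.6f}  CDF: {exact:.6f}")

narrow = NormalDist(0, 0.1)
print(f"pdf at 0 for N(0, 0.1^2): {narrow.pdf(0):.3f}")  # > 1, so not a probability
```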
What is a transformation?
Transformations are functions that map a value in one space to a value in a second space. Usually we can identify a function to “back-transform” the new data to the original space (useful transformations allow us to do this).
What does the log transformation look like (i.e., the calculations)? What are log transformations useful for?
To go from original data to log-transformed data: take the log (base 10) of the x value you are transforming and keep the y value the same. To back-transform, raise 10 to the power of the (new) x value, i.e., original x = 10^(new x).
Log transformations are useful for mapping right-skewed distributions into more normal distributions in the transformed space.
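A sketch of the base-10 log transformation and its back-transform on hypothetical right-skewed data (values whose log10 is normal with mean 1 and SD 0.3). The skewness drops to near 0 after the transform, and 10 raised to each transformed value recovers the original data. The `skewness` helper and all numbers are illustrative assumptions.

```python
import math
import random
import statistics

def skewness(xs):
    """Average cubed z-score: positive for right skew, about 0 for symmetric data."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs, m)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])

rng = random.Random(12)
data = [10 ** rng.gauss(1, 0.3) for _ in range(5000)]  # hypothetical right-skewed data

logged = [math.log10(x) for x in data]  # transform: log10 of each value
back = [10 ** y for y in logged]        # back-transform: 10 to the power of the new value

print(f"skewness before: {skewness(data):.2f}, after log10: {skewness(logged):.2f}")
```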
What is the formula for calculating the 95% confidence interval for a population mean? Define each variable.
Refer to notes page
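The exact formula is in the notes. As a hedged sketch, assuming the population SD sigma is known, a common 95% interval is X bar ± 1.96 × sigma / sqrt(n), where X bar is the sample mean, n the sample size, and 1.96 the standard normal value with 2.5% in each tail. (When sigma is unknown, a t critical value and the sample SD are typically used instead; check which version the notes use.) The helper name and the five sample values are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def ci95_known_sigma(sample, sigma):
    """95% CI for the population mean, ASSUMING sigma is known: X_bar +/- z * sigma / sqrt(n)."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

sample = [12.1, 9.8, 11.4, 10.6, 10.1]  # hypothetical data, X bar = 10.8
lo, hi = ci95_known_sigma(sample, sigma=1.0)
print(f"95% CI: ({lo:.3f}, {hi:.3f})")
```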
Is the confidence interval random?
Yes. X bar (and therefore the interval built around it) differs depending on the specific random sample we take. For any single interval, the true mean is either covered or not covered; the randomness is in the interval, not in mu.
What is the coverage probability?
The long-run fraction of confidence intervals, each built from a new random sample from the population, that cover the true population mean (about 95% for a 95% confidence interval)
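A simulation sketch of coverage probability: draw many samples from a hypothetical normal population (mu = 50, sigma = 10), build a 95% interval from each (using the known-sigma form X bar ± 1.96 × sigma / sqrt(n), an assumption), and count how often mu lands inside. Each interval differs from the last, and roughly 95% of them cover mu.

```python
import math
import random
import statistics
from statistics import NormalDist

rng = random.Random(14)
MU, SIGMA, N, REPS = 50, 10, 20, 2000  # hypothetical population and design
Z = NormalDist().inv_cdf(0.975)         # about 1.96

intervals = []
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = statistics.fmean(sample)
    half = Z * SIGMA / math.sqrt(N)  # known-sigma half-width (assumption)
    intervals.append((x_bar - half, x_bar + half))

coverage = sum(lo <= MU <= hi for lo, hi in intervals) / REPS
print(f"{coverage:.3f} of {REPS} intervals covered mu = {MU}")
```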
What is the difference between mu and X bar?
mu is the population mean
X bar is the sample mean