Midterm 1 Flashcards
Statistics
science of collecting, organizing, and analyzing data
What do biostatisticians look to achieve?
attempt to gain insight and draw conclusions using data
Can stats lie?
No, but they can be wrong
What are some ways to chart categorical data?
bar graphs and pie charts
What are some ways to chart quantitative data?
histograms and scatterplots
What are some methods to organize and summarize raw data?
Graphically, numerically, and through exploratory data analysis
Variable?
Characteristic of an individual
Classifying variables?
Questions to ask when designing or reviewing an experiment
Categorical variable?
Places an individual into a category; arithmetic operations cannot be applied to these data
Quantitative variable?
Takes numerical values on which arithmetic operations (such as averaging) make sense
What does a pie chart represent?
How one categorical variable breaks down into components
What does a bar graph represent?
Each characteristic is represented by a bar
What does a histogram represent?
Summary graph of a single quantitative variable
What does a dot plot represent?
Raw data. Used to describe patterns in variability
What does a time plot represent?
Time goes on the horizontal axis. The line connecting the points shows how the variable changes over time.
What does the vertical axis represent in a histogram?
Frequency or relative frequency
What is an extreme point known as?
Outlier
What is the mean?
A measure of location (central tendency): the arithmetic average, found by
adding the observations and dividing by the number of observations.
What is the median?
midpoint of the distribution such that half of the
numbers are smaller and the other half are larger
What is the median if n is even?
The mean of the two middle values
What is the mode?
the most common or frequent value - a list can have more than
one mode
Is the median resistant to outliers?
yes
Is the mean resistant to outliers?
No
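The cards above on mean, median, mode, and resistance can be checked with a small sketch. The data values are made up for illustration, and the last value added is a deliberate outlier.

```python
import statistics

# Hypothetical sample; 50 is added as a deliberate outlier.
data = [2, 3, 3, 4, 5]
with_outlier = data + [50]

mean_before = statistics.mean(data)              # 17 / 5
median_before = statistics.median(data)          # middle value of 5 sorted numbers
mean_after = statistics.mean(with_outlier)       # pulled strongly toward 50
median_after = statistics.median(with_outlier)   # n is even: mean of the two middle values

# A list can have more than one mode.
modes = statistics.multimode([1, 1, 2, 2, 3])

print(mean_before, mean_after)
print(median_before, median_after)
print(modes)
```

The mean jumps from 3.4 to more than 11, while the median only moves from 3 to 3.5, which is what "resistant" means here.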
Quartiles?
The first quartile (Q1) is the median of the observations below the overall
median; the third quartile (Q3) is the median of the observations above it.
What is the five number summary?
Lowest number, Q1, median, Q3, and largest number
What is a graph with the five number summary?
Box plot
What is interquartile range?
Distance between first and third quartiles.
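A minimal sketch of the five-number summary and the interquartile range, assuming the convention that Q1 and Q3 are the medians of the lower and upper halves of the sorted data (software packages may use slightly different quartile rules). The function name and the data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]              # observations below the median position
    upper = xs[half + (n % 2):]    # observations above the median position
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

summary = five_number_summary([1, 3, 4, 6, 7, 8, 10, 12, 15])
iqr = summary[3] - summary[1]      # distance between Q1 and Q3
print(summary, iqr)
```

A box plot draws exactly these five numbers: the box spans Q1 to Q3, with a line at the median.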
What is the standard deviation?
Measures variation around the mean
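As a sketch of how the standard deviation measures variation around the mean, the function below uses the usual sample formula (divide the sum of squared deviations by n - 1, then take the square root). The name `sample_sd` and the data are illustrative assumptions.

```python
import math

def sample_sd(values):
    """Sample standard deviation: sqrt of squared deviations over (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

sd = sample_sd([2, 4, 4, 4, 5, 5, 7, 9])
print(sd)
```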
How do you organize a statistical problem?
State, plan, solve, conclude
What is a density curve?
A smooth curve drawn through the histogram that describes the overall pattern of the distribution
Is a density curve generalizable?
Yes, it is an idealized description that ignores outliers
What do bars of histograms represent?
Area
What is the area under a density curve always equal to?
1
Median of the density curve?
The point where half the observations lie above and half below: the point
with equal areas to the left and right of the median line
Mean of the density curve?
the balance point of the curve if it were made out of
a solid material
What Greek letters represent mean and standard deviation?
μ (mu) for the mean and σ (sigma) for the standard deviation
What are Normal distributions (curves)?
Bell-shaped curves
Why are Normal distributions important?
1) Good descriptions for some distributions of real data.
2) Good approximation to many chance outcomes.
3) Many statistical inference procedures are based on the
Normal distribution.
Where does 1 standard deviation fall on a bell curve?
At the inflection points, where the curvature changes, one σ on either side of the mean
What is the 68-95-99.7 rule?
About 68% of all observations are within 1 standard deviation (σ) of the mean (μ).
About 95% of all observations are within 2σ of the mean.
Almost all (99.7%) observations are within 3σ of the mean.
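The 68-95-99.7 rule can be checked numerically. This is a small sketch using `statistics.NormalDist` from the Python standard library; the helper name `proportion_within` is an assumption made for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Area under the standard Normal curve between -k and +k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
```

The three areas come out close to 0.68, 0.95, and 0.997, so the rule is a rounded version of the exact Normal areas.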
What is the shorthand for Normal distribution curves?
N(mean, standard deviation)
(standardization)
What does the z score represent?
How many standard deviations the observation falls from the mean, and in
which direction: z = (x - μ) / σ
How are x and z related?
When x is larger than the mean, z is positive.
When x is smaller than the mean, z is negative.
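A short sketch of standardization, z = (x - μ) / σ, showing the sign behaviour described above. The numbers (mean 100, standard deviation 15) are made up for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean (sign gives direction)."""
    return (x - mean) / sd

z_above = z_score(130, 100, 15)   # x larger than the mean, so z is positive
z_below = z_score(85, 100, 15)    # x smaller than the mean, so z is negative
print(z_above, z_below)
```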
What does the Standard normal table show?
Area under standard normal curve to the LEFT of the z value
What is cumulative proportion?
proportion of observations that lie at or below x
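What the standard Normal table does can be sketched in code: standardize x, then look up the area to the left of z. This uses `statistics.NormalDist`; the distribution N(100, 15) and the value x = 115 are hypothetical.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)   # hypothetical N(100, 15)
x = 115

# Route 1: cumulative proportion at or below x, computed directly.
p_direct = dist.cdf(x)

# Route 2: standardize, then use the standard Normal curve (the "table").
z = (x - dist.mean) / dist.stdev
p_table = NormalDist().cdf(z)

print(z, round(p_direct, 4), round(p_table, 4))
```

Both routes give the same area, which is why a single table for N(0, 1) is enough for any Normal distribution.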
What is a normal quantile plot?
z-scores on the x axis and the observed data values on the y axis, usually produced with technology. If the points fall close to a straight line, the data are approximately Normal.
Response variable?
dependent variable (y axis)
Explanatory Variable?
independent variable (x axis)
Bivariate Data
relationship between two variables
What is a common way to visualize the relationship between 2 variables?
Scatter plot (2 dimensions)
Where do either of the variables go on a scatter plot axis?
Response variable on vertical axis
Explanatory variable on horizontal axis
What three factors are there to look for in a scatter plot?
Form, direction, and strength (and outliers)
What measurement is important for strength and direction on a scatterplot?
Correlation coefficient
What is strength of a scatterplot dependent on?
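The scale of the axes: changing the scale changes how strong the pattern looks, which is why a numerical measure such as r is used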
What does r represent in a scatterplot?
The sign (+/-) gives the direction; the closer r is to 1 or -1, the stronger the linear relationship
Facts about r (correlation).
1) Correlation does not distinguish between explanatory and
response variables.
2) Both variables need to be quantitative.
3) r has no unit of measurement so for any given data set,
when the units of measure change, r does not.
4) Positive r indicates positive association between the
variables; Negative r indicates negative association.
5) r is always a number between -1 and +1. Values near 0
indicate a weak linear relationship. Values of 1 or -1 indicate a perfect linear
relationship.
6) r is not resistant - greatly affected by outliers - use with
caution with outliers.
7) r only measures strength of linear relationships - not
curved relationships.
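Several of the facts about r listed above can be checked with a small sketch. The `correlation` function is written out by hand here, and the height and weight values are invented for illustration.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

heights_cm = [150, 160, 170, 180, 190]   # hypothetical data
weights_kg = [52, 58, 69, 75, 90]

r = correlation(heights_cm, weights_kg)
r_swapped = correlation(weights_kg, heights_cm)                       # fact 1
r_inches = correlation([h / 2.54 for h in heights_cm], weights_kg)    # fact 3

# Fact 7: a perfect curved (quadratic) pattern can still give r = 0.
r_curved = correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r, r_swapped, r_inches, r_curved)
```

Swapping the variables or changing centimetres to inches leaves r unchanged, and the curved example shows that r near 0 does not mean "no relationship", only no linear one.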
What is the straight line fitted to a scatterplot known as?
Regression line
What does a regression line explain?
How y changes in terms of x
What method is used to have the best-fit regression line?
Least-squares method
What is the least squares regression line?
The line that makes the sum of the squared vertical distances from the data points to the line as small as possible
What does a slope in a regression line represent?
Rate of Change
What does an intercept in a regression line represent?
The predicted value of y when x = 0
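A minimal sketch of a least-squares fit, using the standard formulas slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The function name and the data (which lie exactly on y = 1 + 2x) are chosen for illustration.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares regression line of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx                  # rate of change of y per unit of x
    intercept = my - slope * mx        # predicted y when x = 0
    return intercept, slope

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]                  # exactly y = 1 + 2x
intercept, slope = least_squares(xs, ys)
print(intercept, slope)
```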
Should you use a regression line with an extreme outlier?
No
What is the coefficient of determination?
The correlation coefficient squared, r^2
What does r^2 represent?
the fraction of variance in y that can be explained by the regression model
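To see why r^2 is read as "fraction of variance explained", the sketch below fits a least-squares line to made-up data and compares r^2 with 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squares of y. All names and values are illustrative.

```python
xs = [1, 2, 3, 4, 5]        # hypothetical data
ys = [2, 4, 5, 4, 5]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
sst = sum((y - my) ** 2 for y in ys)          # total variation in y

slope = sxy / sxx
intercept = my - slope * mx
predicted = [intercept + slope * x for x in xs]
sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))   # unexplained variation

r_squared = sxy ** 2 / (sxx * sst)
explained_fraction = 1 - sse / sst
print(r_squared, explained_fraction)
```

Both numbers are 0.6 (up to rounding) for these data: the line accounts for 60% of the variation in y.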
What are residuals?
The vertical distances between the data points and the regression line: observed y minus predicted y
What are the vertical distances from the points to the regression line called?
Residuals
What does the +/- with residuals indicate?
A residual is positive if the point lies above the regression line.
A residual is negative if the point lies below the regression line.
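A small sketch of residuals as observed y minus predicted y. The data are hypothetical, and the intercept 2.2 and slope 0.6 are assumed to be the least-squares line for these five points.

```python
xs = [1, 2, 3, 4, 5]          # hypothetical data
ys = [2, 4, 5, 4, 5]
intercept, slope = 2.2, 0.6   # assumed least-squares line for these points

# residual = observed y - predicted y
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

for x, res in zip(xs, residuals):
    side = "above" if res > 0 else "below"
    print(x, round(res, 2), side, "the line")

print(round(sum(residuals), 10))   # least-squares residuals sum to (about) zero
```

Plotting these residuals against x, with a horizontal line at 0, gives the residual plot.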
What is a residual plot?
A scatterplot of the residuals against the explanatory variable; the regression line becomes a horizontal line at 0, which makes the residuals easier to compare
What is an influential individual?
An outlier that, if removed, changes the regression line substantially
Extrapolation?
Using the regression line to predict outside the range of x values in the data. Do not do this!
Lurking variable?
a variable that has an important
effect on the relationship but is not among
the variables studied.
Observational study?
Observing natural events without imposing a treatment. Results can be confounded by lurking variables.
Experiment?
Observation plus deliberate manipulation of variables. Can establish a cause-and-effect relationship
What is a sample?
The part of the population we actually examine
and for which we do have data
What is probability sampling?
individuals or units are randomly
selected; the sampling process is unbiased
What is convenience sampling?
individuals or units that are easiest to reach are
selected; the sampling process is often biased
What is simple random sampling?
Every individual, and every possible group of n individuals, has an equal chance of being selected
What is a probability sample?
a sample chosen by chance
Stratified random sampling?
population divided into
groups of similar individuals called strata.
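The difference between a simple random sample and a stratified random sample can be sketched with the standard `random` module. The population, the strata names, and the sample sizes are all hypothetical.

```python
import random

rng = random.Random(42)   # fixed seed so the sketch is repeatable

# Hypothetical population of 20 labelled individuals.
population = [f"person_{i}" for i in range(1, 21)]

# Simple random sample: every group of 5 is equally likely to be chosen.
srs = rng.sample(population, 5)

# Stratified random sample: split into strata of similar individuals,
# then take a separate simple random sample inside each stratum.
strata = {"under_40": population[:12], "40_and_over": population[12:]}
stratified = {name: rng.sample(members, 2) for name, members in strata.items()}

print(srs)
print(stratified)
```

Stratifying guarantees that each stratum is represented, which a simple random sample does not.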
What is inference?
using the sample to infer something about the
population
What are cohort studies?
enlist individuals of common demographic and
keep track of them over a long period of time (“prospective”).
Individuals who later develop a condition are compared to those
who don’t develop the condition
What are case-control studies?
start with 2 random samples of individuals
with different outcomes, and look for exposure factors in the
subjects’ past (“retrospective”)
What is an experimental unit?
The individuals on which an experiment is done
What is a factor?
Explanatory variable (independent variable)
What is a treatment?
specific experimental condition
What is a confounding factor?
an explanatory (independent) variable
that affects or distorts the relationship between another
explanatory variable and its response (dependent) variable
since it is related to both
What is a control group?
A treatment to which the other treatments are compared to
eliminate the effects of lurking variables on the
experimental outcome.
What is a placebo?
A fake treatment given to the control group. It keeps subjects from knowing which treatment they received, which makes a double-blind experiment possible.
What do randomized comparative experiments use?
Comparison and randomization
Why are Randomized comparative experiments considered the best designed experiments?
give good evidence that the treatments actually cause the
differences observed in the response.
What are the factors that create an ideal experiment?
Control, randomize, and sample size
What is something a well-designed experiment can result in that other types of studies cannot?
A causation statement. In a well-designed experiment, an observed association can be read as causation: A causes B.
What is realism?
Purpose of experiments. Discovering how the universe and world around us works.
What is a block design?
Subjects are divided into blocks (groups
sharing a given characteristic) before the randomization, in
order to account for possible differences between the
blocks. Blocking lets us choose how many individuals of
each block will receive each treatment.
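A block design can be sketched as "randomize separately inside each block". The function name, the blocks, and the treatment labels below are hypothetical, and the sketch assumes each block size is a multiple of the number of treatments so the split comes out even.

```python
import random

def randomize_within_blocks(blocks, treatments, seed=0):
    """Shuffle subjects inside each block, then deal treatments out in turn."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, subjects in blocks.items():
        shuffled = list(subjects)
        rng.shuffle(shuffled)                       # randomization happens per block
        for i, subject in enumerate(shuffled):
            assignment[subject] = (block_name, treatments[i % len(treatments)])
    return assignment

blocks = {"women": ["w1", "w2", "w3", "w4"], "men": ["m1", "m2", "m3", "m4"]}
assignment = randomize_within_blocks(blocks, ["drug", "placebo"])

for subject, (block_name, treatment) in sorted(assignment.items()):
    print(subject, block_name, treatment)
```

Because treatments are dealt out within each block, every block ends up with the same number of subjects on each treatment, so differences between blocks cannot be mistaken for treatment effects.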
What is a main outcome measure?
Most important result from experiment
What is a matched pairs design?
Combines randomization and matching
What is the placebo effect?
A response to a dummy treatment: people improve because they believe something is helping them
What is a double-blind experiment?
Neither the patients nor experimenters know who is getting a placebo and who is getting the real thing
What does bimodal mean?
Two peaks (two modes)