Exam 1 Flashcards
Endocrine system
Study designs
Non-human and human studies
Statistics
the collection, organization, analysis, visualization and interpretation of numerical data
The big picture
producing data, exploratory data analysis, probability, inference. [Population -> sample -> statistics -> parameters -> population]
study designs
Non-human studies, human studies
non-human studies
experimental: traditional bench research
Non-experimental: field studies
human studies
experimental: randomized controlled Trials
Non-experimental: observational. Generally only reveals correlation. “Correlation does not imply causation.”
Observational studies
Cohort Study: follow a group of subjects over time, disease incidence.
Case-control: subjects selected on the basis of disease status, “do the diseased differ from the healthy in other ways?”
Cross-sectional study: simultaneously assess disease/exposure in a cross-section of the population.
Randomized controlled trials
– Collect a random sample
– Subjects randomized into treatment or placebo group.
– Single- or double-blinded
– Very large samples, hundreds, thousands, etc.
– Can support causal conclusions
– Gold standard for study design
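The randomization step can be sketched in Python. This is a minimal illustration, not a trial protocol: the function name `randomize`, the subject IDs, and the fixed seed are all assumptions made for the example.

```python
import random

def randomize(subjects, seed=None):
    """Shuffle subjects and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)      # seeded only so the example is repeatable
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "placebo": shuffled[half:]}

subject_ids = list(range(1, 21))   # 20 hypothetical subject IDs
groups = randomize(subject_ids, seed=1)
print(len(groups["treatment"]), len(groups["placebo"]))
```

Because chance alone decides who gets the treatment, lurking variables tend to balance out between the two groups.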
Data
pieces of information about individuals organized into variables.
variable
a particular characteristic of the individual
Dataset
a set of data identified with particular circumstances. Datasets are typically displayed in tables, in which rows represent individuals and columns represent variables.
quantitative variable
takes a numerical value and represents some kind of measurement.
Categorical variable
takes a category or label value and places an individual into one of several groups.
Categorical variables are sometimes called qualitative variables.
Exploratory Data Analysis (EDA):
how we make sense of the data by converting them from their raw form to a more informative one.
EDA consists of
• organizing and summarizing the raw data,
• discovering important features and patterns in the data and any striking deviations from those patterns, and then
• interpreting our findings in the context of the problem
Distribution
what values the variable takes and how often the variable takes those values.
Graphical display of categorical variables
pie chart or bar chart, supplemented by numerical summaries (category counts and percentages).
Histogram
a graphical display of the distribution of a quantitative variable. It plots the number (count) of observations that fall in intervals of values. The histogram is the best graph to use to display the distribution of a quantitative variable. The variable goes on the x-axis and the frequency on the y-axis.
Stemplot
a graphical display of the distribution of a quantitative variable. It has additional unique
features, such as preserving the original data and sorting the data.
4 features of a distribution include:
- Center
- Spread
- Shape
- Outliers
2 dimensions of the shape of a distribution include:
1. Symmetry/skewness
2. Peakedness/modality: number of peaks (modes) the distribution has
Peakedness/modality: Number of peaks (modes) the distribution has
a. Unimodal distribution: one with one mode around which the observations are concentrated.
b. Bimodal distribution: one with two modes around which the observations are concentrated.
c. Uniform distribution: one that is roughly flat, with no pronounced mode.
Symmetry/skewness:
a. Symmetric distribution: the left and right sides of the distribution mirror each other. The normal distribution is a symmetric distribution with one peak (mode).
b. Skewed right distribution: the right tail of the histogram (larger values) is much longer than the
left tail (small values).
c. Skewed left distribution: the left tail of the histogram (smaller values) is much longer than the
right tail (larger values).
Midpoint:
the center of the distribution, or the value that divides the distribution so that approximately
half the observations take smaller values, and approximately half the observations take larger values.
Outliers
observations that fall outside the overall pattern
Center
of the distribution is its typical value. It can be described by the mean, the median, or the mode (the most commonly occurring value in the distribution)
Mean
describes the center as an average.
Weighted average
the mean is computed by “weighting” each value by its frequency. Some values will have more weight than others.
Median:
the middle value in a distribution (50th percentile), or the point above and below which half of the scores fall. Because the median is not affected by extreme scores, it is most appropriate for skewed distributions of quantitative data.
Mode:
the most commonly occurring value in a distribution.
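The three measures of center, and the weighted average, can be compared on a small example. The sketch below uses Python's standard `statistics` module; the `scores` list is hypothetical data invented for illustration.

```python
from collections import Counter
from statistics import mean, median, mode

scores = [2, 3, 3, 3, 4, 5, 40]   # hypothetical data; 40 is an extreme value

print(mean(scores))     # the average, pulled upward by the extreme value
print(median(scores))   # the middle value of the sorted data
print(mode(scores))     # the most commonly occurring value

# Weighted average: weight each distinct value by its frequency.
freq = Counter(scores)  # maps each value to how often it occurs
weighted = sum(value * count for value, count in freq.items()) / sum(freq.values())
print(weighted)         # same result as mean(scores)
```

The mean is well above the median here because of the single extreme value, which is why the median is preferred for skewed data.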
Spread
(also called variability) of the distribution can be described by the approximate range covered
by the data. Three measures of spread are: range, interquartile range, and standard deviation.
Range
the distance between the smallest data point (min) and the largest one (max).
Interquartile Range (IQR)
measures the variability of a distribution by giving us the range covered by
the middle 50% of the data. IQR = Q3 - Q1
Five Number Summary:
the combination of all five numbers (min, Quartile 1, Median, Quartile 3, Max)
that provides a quick numerical description of both the center and spread of a distribution.
Boxplot
graphically represents the distribution of a quantitative variable by visually displaying the five-number summary and any observation that was classified as a suspected outlier using the 1.5(IQR) criterion. Boxplots are most useful when presented side-by-side for comparing and contrasting distributions from two or more groups.
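The five-number summary and the 1.5(IQR) criterion can be sketched as follows. This assumes quartiles are taken as the medians of the lower and upper halves of the sorted data (software packages use several slightly different quartile conventions), and the function names and `values` list are hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); quartiles are medians of each half."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]         # values below the median position
    upper = xs[(n + 1) // 2 :]   # values above the median position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def suspected_outliers(data):
    """Flag observations beyond 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

values = [1, 3, 4, 5, 6, 7, 8, 9, 30]   # hypothetical data
print(five_number_summary(values))
print(suspected_outliers(values))
```

For these values Q1 = 3.5 and Q3 = 8.5, so IQR = 5 and anything below -4 or above 16 is flagged; only 30 qualifies.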
Standard deviation
measures the spread by reporting a typical (average) distance between the data points and their average (mean).
Properties of the SD
(1) It should be paired as a measure of spread with the mean as a measure of center.
(2) Mathematically, SD = 0 only when all the observations have the same value (Ex: 5, 5, 5, … , 5), in which case the deviations from the mean (which is also 5) are all 0.
(3) It is strongly influenced by outliers in the data.
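Properties (2) and (3) can be checked numerically. The sketch below assumes the sample standard deviation (dividing by n - 1), which is one common convention; the function name `sample_sd` and the data are hypothetical.

```python
from statistics import mean

def sample_sd(data):
    """Sample standard deviation: root of the summed squared deviations over n - 1."""
    m = mean(data)
    squared_deviations = [(x - m) ** 2 for x in data]
    return (sum(squared_deviations) / (len(data) - 1)) ** 0.5

print(sample_sd([5, 5, 5, 5]))    # all values equal, so every deviation is 0
print(sample_sd([4, 5, 6]))       # modest spread around the mean of 5
print(sample_sd([4, 5, 6, 50]))   # one outlier inflates the SD sharply
```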
The Standard Deviation Rule
For distributions with a normal (bell) shape:
• Approximately 68% of the observations fall within 1 standard deviation of the mean.
• Approximately 95% of the observations fall within 2 standard deviations of the mean.
• Approximately 99.7% (or virtually all) of the observations fall within 3 standard deviations of the mean.
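The three percentages can be recovered from the normal distribution itself. This short check uses `NormalDist` from Python's standard `statistics` module; the variable names are chosen for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution lying within k standard deviations of the mean.
shares = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} SD: {share:.1%}")
```

The printed shares round to the 68%, 95%, and 99.7% quoted in the rule. The rule is only a good guide when the data are roughly normal in shape.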
Role-Type Classification
When we look at relationships between two variables, each variable can be described in terms of its proposed role in the relationship and the type of information associated with that variable, which determines its designation as categorical or quantitative. While these designations DO NOT imply causality, designating each variable as either explanatory or response helps us understand the nature
of, or our underlying beliefs about, the relationship.
Role
Explanatory or Response
Type
categorical (C) or quantitative (Q)
Categorical variable (high)
mutually exclusive categories exist, e.g. gender, treatment group
also called factor variables in R
They are discrete or countable, not measured
Quantitative
something is measured or counted, e.g. height, family size
Independent Variable
Another way of describing the explanatory variable. Despite what the terms “Independent” and “dependent” would seem to suggest, they do not
necessarily mean that a causal relationship exists.
Dependent Variable:
Another way of describing the response variable.
Despite what the terms “Independent” and “dependent” would seem to suggest, they do not
necessarily mean that a causal relationship exists.
non- EBM studies
Non-evidence-based studies:
Non-medical human subject research:
- research to understand basic science
Animal & cell culture studies:
- mice ≠ people
- cells from other people ≠ people
The Conditions for Concluding Cause!
To conclude that Event A causes Event B
- Event A and Event B must be correlated
- Event A must precede Event B
- All other explanations of the relationship between Event A and Event B must be ruled out
Interval(sub)
Subcategory of quantitative Variables
“0” = placeholder. A quantity still exists at zero.
Math properties: addition and subtraction, NO multiplication or division
Ex: acidity (pH), intelligence (IQ)
Ratio(sub)
Subcategory of quantitative Variables
“0” = total absence of thing measured
Math properties: addition, subtraction, multiplication, division
Examples: wealth (dollars), weight (kg), height (m). These are measures, not counts.
quantitative variable(High variable)
also called continuous or scale. Something measured using a standard unit (meter, second, gram). Between any two values there are infinitely many possible values, so it involves decimals. "0" may or may not mean no amount.
Nominal variables(sub)
Categories are identified by name only.
- types of diseases
- Gender (male, female, etc.)
- ethnicity
Ordinal variable(sub)
Categories have name and an order.
- 1st, 2nd, 3rd
- Gold, silver, bronze
- small, medium, large
- pre/post treatment
Confounding Variables
variables associated with both the explanatory and the response variable that can ruin your experiment by producing results that falsely suggest correlation or causation. Ways to control them:
- Randomization
- Stratification, blocking & matched-pairs designs
- Standardization: eliminating the influence of potential confounding through division and multiplication
Bins on histogram
Intervals of values into which observations are grouped to be plotted together. In interval notation, "(" means the endpoint is not included and "[" means it is included.
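Binning can be sketched as counting observations in equal-width intervals. The sketch assumes each bin has the form [left, right), so the left endpoint is included and the right one is not; the function name `bin_counts` and the data are hypothetical.

```python
def bin_counts(data, start, width, n_bins):
    """Count observations in n_bins intervals [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for x in data:
        i = int((x - start) // width)   # index of the bin that contains x
        if 0 <= i < n_bins:             # values outside all bins are ignored
            counts[i] += 1
    return counts

exam_scores = [50, 55, 60, 60, 69, 70, 79, 80]   # hypothetical data
print(bin_counts(exam_scores, start=50, width=10, n_bins=4))
```

A score of exactly 60 is counted in [60, 70) and not in [50, 60), which is what the bracket notation encodes.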
C -> C
two-way table
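A two-way table is usually read through conditional percentages within each level of the explanatory variable. A minimal sketch, with hypothetical counts and a hypothetical helper `row_percent`:

```python
from collections import Counter

# Hypothetical (explanatory, response) observations, invented for illustration.
records = (
    [("treatment", "improved")] * 30
    + [("treatment", "not improved")] * 20
    + [("placebo", "improved")] * 15
    + [("placebo", "not improved")] * 35
)

table = Counter(records)   # the cell counts of the two-way table

def row_percent(table, row, col):
    """Percentage of the given row (explanatory level) falling in the given column."""
    row_total = sum(n for (r, _), n in table.items() if r == row)
    return 100 * table[(row, col)] / row_total

print(row_percent(table, "treatment", "improved"))
print(row_percent(table, "placebo", "improved"))
```

Comparing the two percentages, not the raw counts, shows whether the response differs between the explanatory groups.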
C -> Q
Bar & pie, side by side boxplots
Q -> Q
scatterplots, correlation and regression
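For two quantitative variables, the correlation and the least-squares regression line can be computed directly from the deviations around each mean. This is a sketch with hypothetical data and function names, not a replacement for a statistics package.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Slope and intercept of the least-squares line predicting y from x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

x = [1, 2, 3, 4, 5]   # hypothetical explanatory values
y = [2, 4, 5, 4, 5]   # hypothetical response values
print(round(pearson_r(x, y), 3))
print(least_squares_line(x, y))
```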
Sampling approaches
simple random, random, cluster, stratified, multistage sampling
simple random
All individuals and all groups of a given size have the same probability of being selected
random
All individuals, but not all groups, have the same probability of being selected
cluster
Natural clusters are selected
stratified
Random sampling within existing strata
multistage sampling
Combining the above sequentially
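Three of these approaches can be sketched with Python's standard `random` module. The population, the field names (`stratum`, `cluster`), the sample sizes, and the seed are all hypothetical and chosen only to make the example concrete.

```python
import random

rng = random.Random(0)   # fixed seed so the example is repeatable

# Hypothetical population of 60 people in 2 strata and 6 natural clusters.
population = [
    {"id": i, "stratum": "urban" if i % 3 else "rural", "cluster": i // 10}
    for i in range(60)
]

# Simple random sample: every group of 12 individuals is equally likely.
simple_random = rng.sample(population, 12)

# Stratified sample: a separate random sample inside each existing stratum.
stratified = []
for name in ("urban", "rural"):
    members = [p for p in population if p["stratum"] == name]
    stratified.extend(rng.sample(members, 4))

# Cluster sample: randomly pick whole clusters and keep everyone in them.
cluster_ids = sorted({p["cluster"] for p in population})
chosen = set(rng.sample(cluster_ids, 2))
cluster_sample = [p for p in population if p["cluster"] in chosen]

print(len(simple_random), len(stratified), len(cluster_sample))
```

Multistage sampling would chain these steps, for example choosing clusters first and then drawing a simple random sample within each chosen cluster.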
Blocking
arranging subjects into similar blocks prior to randomizing into experimental legs.
Matched-pairs
Pre/post (before and after) measurements on the same subject, or similar subjects are paired
Crossover
Placebo subjects later cross over to the treatment leg. Treatment subjects later cross over to the placebo leg.
aggregated
data combined across groups. A trend in aggregated data can be misleading when it is caused by a lurking variable.
disaggregated
data broken out by the levels of the lurking variable, so that the trend within each group is visible
Simpson’s Paradox
A trend that is reversed in direction, when the data are considered in either an aggregated form or a disaggregated form. The trend in the aggregated data is misleading, and is caused by a lurking variable that is only visible when examining the disaggregated data.
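The reversal is easiest to see with numbers. The counts below are hypothetical, invented only to show the effect: treatment A has the higher success rate within each severity group, yet B looks better once the groups are combined, because A was given mostly to severe cases.

```python
# Hypothetical counts, invented for illustration: (successes, patients).
results = {
    "A": {"mild": (90, 100), "severe": (300, 1000)},
    "B": {"mild": (800, 1000), "severe": (20, 100)},
}

def rate(successes, total):
    return successes / total

def aggregated_rate(by_severity):
    """Success rate after pooling all severity groups together."""
    successes = sum(s for s, _ in by_severity.values())
    total = sum(n for _, n in by_severity.values())
    return successes / total

# Disaggregated view: A has the higher rate in both severity groups.
for severity in ("mild", "severe"):
    a = rate(*results["A"][severity])
    b = rate(*results["B"][severity])
    print(severity, round(a, 2), round(b, 2))

# Aggregated view: B has the higher overall rate.
print(round(aggregated_rate(results["A"]), 2), round(aggregated_rate(results["B"]), 2))
```

Severity is the lurking variable here: it is related both to which treatment was given and to the chance of success.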
Extrapolation
Making predictions based on values of an explanatory variable that are outside those used to establish the relationship. Generally considered not valid.
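One way to guard against extrapolation is to refuse predictions outside the range of the data used to fit the line. In the sketch below, the function name `predict`, the slope and intercept, and the observed range of 1 to 5 are all hypothetical.

```python
def predict(x, slope, intercept, x_min, x_max):
    """Predict y from a fitted line, but only inside the observed range of x."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x={x} is outside the observed range [{x_min}, {x_max}]; "
            "predicting here would be extrapolation"
        )
    return slope * x + intercept

# Hypothetical line fitted on x values from 1 to 5.
print(predict(3, slope=0.6, intercept=2.2, x_min=1, x_max=5))

try:
    predict(10, slope=0.6, intercept=2.2, x_min=1, x_max=5)
except ValueError as err:
    print(err)
```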
modifications after sampling
blocking, matched pairs, crossover