Exam 1 Flashcards
Endocrine system
Study designs
Non-human and human studies
Statistics
the collection, organization, analysis, visualization and interpretation of numerical data
The big picture
producing data, exploratory data analysis, probability, inference. [Population -> sample -> statistics -> parameters -> population]
study designs
Non-human studies, human studies
Non-human studies
experimental: traditional bench research
Non-experimental: field studies
human studies
experimental: randomized controlled trials
Non-experimental: observational; generally reveals only correlation. “Correlation does not imply causation.”
Observational studies
Cohort study: follow a group of subjects over time and measure disease incidence.
Case-control study: subjects are selected on the basis of disease status, “do the diseased differ from the healthy in other ways?”
Cross-sectional study: simultaneously assess disease/exposure in a cross-section of the population.
Randomized controlled trials
– Collect a random sample
– Subjects randomized into treatment or placebo group
– Single- or double-blinded
– Very large samples: hundreds, thousands, etc.
– Can support causal conclusions
– Gold standard for study design
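The randomization step can be illustrated with a minimal Python sketch. The function name `randomize`, the subject IDs, and the equal-size split are assumptions made for illustration; real trials often use more elaborate schemes.

```python
import random

def randomize(subjects, seed=None):
    """Randomly split subjects into treatment and placebo groups.

    Hypothetical helper: shuffles the subjects, then cuts the list in half,
    so group sizes differ by at most one.
    """
    rng = random.Random(seed)      # a seed makes the assignment reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "placebo": shuffled[half:]}

# Ten made-up subject IDs
groups = randomize(range(1, 11), seed=1)
print(groups)
```

Because chance alone decides who gets the treatment, the two groups should be similar in every other respect on average, which is what lets a difference in outcome be attributed to the treatment.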
Data
pieces of information about individuals organized into variables.
variable
a particular characteristic of the individual
Dataset
a set of data identified with particular circumstances. Datasets are typically displayed in tables, in which rows represent individuals and columns represent variables.
quantitative variable
takes a numerical value and represents some kind of measurement.
Categorical variable
takes a category or label value and places an individual into one of several groups.
Categorical variables are sometimes called qualitative variables.
Exploratory Data Analysis (EDA):
how we make sense of the data by converting them from their raw form to a more informative one.
EDA consists of
• organizing and summarizing the raw data,
• discovering important features and patterns in the data and any striking deviations from those patterns, and then
• interpreting our findings in the context of the problem
Distribution
what values the variable takes and how often the variable takes those values.
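As a rough sketch, the distribution of a small categorical variable can be tabulated as counts and percentages. The helper name `distribution` and the blood-type data are made up for illustration.

```python
from collections import Counter

def distribution(values):
    """Map each observed value to (count, percentage of all observations)."""
    counts = Counter(values)
    n = len(values)
    return {value: (count, 100 * count / n) for value, count in counts.items()}

# Made-up data: blood types of eight individuals
blood_types = ["A", "O", "O", "B", "A", "O", "AB", "O"]
dist = distribution(blood_types)
for value, (count, pct) in sorted(dist.items()):
    print(f"{value}: {count} ({pct:.1f}%)")
```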
Graphical display of categorical variables
pie chart or bar chart, supplemented by numerical summaries (category counts and percentages).
Histogram
a graphical display of the distribution of a quantitative variable. It plots the number (count) of observations that fall in intervals of values. The histogram is the best graph to use to display the distribution of a quantitative variable. The variable goes on the x-axis and the frequency on the y-axis.
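The counting behind a histogram can be sketched in a few lines of Python. The function name `histogram_counts`, the interval width of 10, and the exam scores are assumptions chosen for illustration.

```python
def histogram_counts(data, start, width, n_bins):
    """Count observations in n_bins intervals [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for x in data:
        i = int((x - start) // width)   # index of the interval containing x
        if 0 <= i < n_bins:             # values outside the range are ignored
            counts[i] += 1
    return counts

# Made-up exam scores
scores = [52, 58, 61, 67, 68, 70, 73, 75, 79, 84, 91]
counts = histogram_counts(scores, start=50, width=10, n_bins=5)

# Text version of the histogram: one row per interval, one star per observation
for i, c in enumerate(counts):
    low = 50 + 10 * i
    print(f"{low}-{low + 9}: {'*' * c}")
```

Each row here corresponds to one bar; the interval is what would sit on the x-axis and the number of stars is the frequency on the y-axis.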
Stemplot
a graphical display of the distribution of a quantitative variable. It has additional unique features, such as preserving the original data and sorting the data.
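A minimal sketch of how a stemplot is built, assuming non-empty data of non-negative two-digit integers with the tens digit as the stem and the ones digit as the leaf. The function name `stemplot` and the data are hypothetical.

```python
from collections import defaultdict

def stemplot(data):
    """Return stemplot lines: stem = tens digit, leaves = sorted ones digits."""
    leaves = defaultdict(list)
    for x in sorted(data):              # sorting puts the leaves in order
        leaves[x // 10].append(x % 10)
    lines = []
    # Include empty stems so gaps in the data stay visible
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(d) for d in leaves[stem])
        lines.append(f"{stem} | {row}".rstrip())
    return lines

# Made-up, unsorted data
values = [73, 52, 91, 68, 58, 61, 79, 67, 70, 75]
for line in stemplot(values):
    print(line)
```

Every original value can be read back from the plot (stem 5 with leaves 2 and 8 gives 52 and 58), which is the sense in which a stemplot preserves the data.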
4 features of a distribution include:
- Center
- Spread
- Shape
- Outliers
2 dimensions of the shape of a distribution include:
1. Symmetry/skewness
2. Peakedness/modality: number of peaks (modes) the distribution has
Peakedness/modality: number of peaks (modes) the distribution has
a. Unimodal distribution: one with one mode around which the observations are concentrated.
b. Bimodal distribution: one with two modes around which the observations are concentrated.
c. Uniform distribution: one with no clear mode; the values occur with roughly equal frequency, so the histogram looks flat.
Symmetry/skewness:
a. Symmetric distribution: the left and right sides of the distribution mirror each other. The normal (bell-shaped) distribution is a common example, with one peak (mode).
b. Skewed right distribution: the right tail of the histogram (larger values) is much longer than the left tail (smaller values).
c. Skewed left distribution: the left tail of the histogram (smaller values) is much longer than the right tail (larger values).
Midpoint:
the center of the distribution, or the value that divides the distribution so that approximately half the observations take smaller values, and approximately half the observations take larger values.
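This midpoint is usually called the median. A minimal sketch of finding it, assuming the common convention of averaging the two middle values when the number of observations is even (the function name `midpoint` is hypothetical):

```python
def midpoint(data):
    """Return the value that splits the sorted data into two equal halves."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the middle pair

print(midpoint([7, 1, 3]))
print(midpoint([4, 1, 3, 2]))
```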
Outliers
observations that fall outside the overall pattern
Center
one way to describe the center of the distribution is the mode, the most commonly occurring value in the distribution
Mean
describes the center as an average: the sum of the observations divided by the number of observations.
Weighted average
the mean computed by “weighting” each value by its frequency, so values that occur more often carry more weight than others.
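A short sketch of the weighted average computed from a frequency table. The function name `weighted_mean` and the clinic-visit data are made up for illustration.

```python
def weighted_mean(freq):
    """Mean of a frequency table given as {value: frequency}."""
    total = sum(value * count for value, count in freq.items())  # each value weighted by its frequency
    n = sum(freq.values())                                       # total number of observations
    return total / n

# Made-up data: number of clinic visits -> number of patients
visits = {0: 3, 1: 5, 2: 2}
print(weighted_mean(visits))
```

This gives the same result as listing every observation individually (three 0s, five 1s, two 2s) and taking the ordinary mean.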