Analysis of a Single Variable (Exam 1) Flashcards

1
Q

what are the 4 steps in a statistical analysis?

A

1) identify a population of interest and a question about that population
2) collect sample data from the population
3) perform preliminary data analysis to summarize data (descriptive statistics)
4) draw conclusions from data (inferential statistics)

2
Q

distribution

A
  • tells us what values a variable takes and how often it takes those values
  • ways of describing a variable’s distribution (graphically/numerically) from a sample depend on the variable type (categorical/quantitative)
3
Q

categorical variables

A

variables that put the individual into one of several groups/categories

4
Q

distribution of a categorical variable

A
  • lists the categories and gives either the count or percentage of individuals who fall in each category
  • 2 graphical methods: pie chart, bar graph
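
A minimal sketch of how such a distribution could be tabulated before drawing a pie chart or bar graph. The blood-type sample and the helper name `categorical_distribution` are made up for illustration.

```python
from collections import Counter

def categorical_distribution(values):
    """Return {category: (count, percent)} for a list of category labels."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

# Hypothetical sample: blood types of 10 individuals
sample = ["A", "O", "O", "B", "A", "O", "AB", "O", "A", "O"]
dist = categorical_distribution(sample)

for cat, (n, pct) in sorted(dist.items()):
    print(f"{cat}: count={n}, percent={pct:.1f}%")
```

Because every individual is counted in exactly one category, the percentages add up to 100%.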
5
Q

pie chart

A
  • shows how a whole group (the variable) is subdivided into smaller groups (categories)
  • the size of the slice is proportional to the fraction of the sample/population in that category
  • the percentages shown by the slices must add up to 100%: every individual must be represented, so an “other” category may be needed
6
Q

bar graph

A
  • represents each category as a bar
    the height of each bar shows the category’s count or percentage
  • doesn’t have to include every individual in the sample (some categories can be left out)
  • to turn it into a pie chart, we’d need the “other” category so the percentages add up to 100%
  • has space between each bar because the categories aren’t in order or directly adjacent to each other
7
Q

distribution of a quantitative variable

A
  • tells us what values the variable takes and how often it takes these values
  • 4 graphical methods: histogram, stemplot, boxplot, time plot
8
Q

quantitative variables

A

variables that take values for which arithmetic operations make sense

9
Q

histogram

A
  • a graph of the distribution of a quantitative variable whose values are grouped together
  • take the full range of the data and divide it into classes (intervals)
  • bars touch because the classes are adjacent and ordered
  • nobody is excluded: every observation falls into one of the classes
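
The grouping step can be sketched as below. Equal-width classes are an assumption here (the card only says the range is divided up), and the exam scores and the function name `histogram_counts` are hypothetical.

```python
def histogram_counts(data, low, width, n_classes):
    """Count observations in n_classes equal-width classes starting at low.

    Class i covers the interval [low + i*width, low + (i+1)*width).
    """
    counts = [0] * n_classes
    for x in data:
        i = int((x - low) // width)
        if not 0 <= i < n_classes:
            # The classes must cover the full range: nobody is excluded.
            raise ValueError(f"{x} falls outside the classes")
        counts[i] += 1
    return counts

# Hypothetical exam scores
scores = [52, 67, 71, 74, 78, 80, 83, 85, 88, 91, 95, 99]
counts = histogram_counts(scores, low=50, width=10, n_classes=5)

# Crude text histogram: one row per class, one star per observation
for i, c in enumerate(counts):
    lo = 50 + 10 * i
    print(f"{lo}-{lo + 9}: {'*' * c}")
```

The class counts sum to the sample size, which is a quick check that every observation landed in some class.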
10
Q

stemplot

A
  • “sideways” histogram that shows the actual numbers of the distribution
  • stem: consists of all but the final digit
  • leaf: the final digit
  • good for small datasets (fewer than about 40-50 observations)
  • unlike histograms, stemplots show the actual values of the data
  • less flexible than histograms
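
A rough sketch of the stem/leaf split, assuming non-negative whole numbers so that the final digit is simply the remainder after dividing by 10. The data and the function name `stemplot` are illustrative only.

```python
from collections import defaultdict

def stemplot(data):
    """Return the rows of a stemplot for non-negative integers."""
    leaves = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)   # stem: all but the final digit; leaf: final digit
        leaves[stem].append(leaf)
    rows = []
    # Include stems with no leaves so gaps in the data stay visible
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(d) for d in leaves.get(stem, []))
        rows.append(f"{stem} | {row}".rstrip())
    return rows

# Hypothetical small dataset
rows = stemplot([23, 12, 40, 21, 15, 38, 23])
print("\n".join(rows))
```

Each leaf can be read back together with its stem to recover the original value, which a histogram does not allow.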
11
Q

shape

A
  • symmetric distribution: the right & left sides of the histogram are approx mirror images of each other
  • skewed to the right: the right side of the histogram extends much further out than the left
  • skewed to the left: the left side of the histogram extends much further out than the right
  • unimodal: 1 peak
  • bimodal: 2 peaks
  • multimodal: multiple peaks
12
Q

center

A
  • where is the middle?
  • mean: average of all observations of the variable; highly influenced by outliers & extreme values, but preferred unless the distribution is strongly skewed or has outliers
  • median: resistant to outliers & extreme values, midpoint (50th percentile) of the distribution of the variable
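
A small illustration of why the median is called resistant, using Python's standard `statistics` module. The income figures are invented for the example.

```python
from statistics import mean, median

incomes = [30, 32, 35, 38, 40]     # hypothetical values, in $1000s
with_outlier = incomes + [400]     # add one extreme observation

mean_before, median_before = mean(incomes), median(incomes)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(f"without outlier: mean={mean_before}, median={median_before}")
print(f"with outlier:    mean={mean_after:.1f}, median={median_after}")
```

The single extreme value pulls the mean from 35 to about 95.8, while the median only moves from 35 to 36.5.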
13
Q

spread

A
  • how much do the data vary about the center?

these are not resistant to outliers:
  • variance: measures the dispersion (spread about the mean) of all the observations
  • standard deviation: the square root of the variance; expressed in the same units as the data, it indicates how far observations typically lie from the mean

these describe the spread better when the data are strongly skewed or have outliers:
  • 5-number summary (percentiles): min, Q1, median, Q3, max
  • interquartile range (IQR): Q3 – Q1, can be used to identify outliers
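
A sketch of these measures using the standard `statistics` module. Several choices here are assumptions rather than anything the card specifies: the sample (n − 1) form of the variance, the "inclusive" quartile method (textbooks and software differ in how they compute Q1 and Q3), and the common 1.5 × IQR rule for flagging outliers. The data are made up.

```python
from statistics import quantiles, stdev, variance

data = [2, 4, 4, 5, 6, 7, 8, 9, 30]   # hypothetical sample

# Not resistant to outliers
var = variance(data)   # sample variance (divides by n - 1); an assumption
sd = stdev(data)       # square root of the sample variance

# Resistant alternatives
q1, med, q3 = quantiles(data, n=4, method="inclusive")
five_number = (min(data), q1, med, q3, max(data))
iqr = q3 - q1

# Assumed 1.5 * IQR rule: flag points far outside the central box
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print("variance:", round(var, 2), "sd:", round(sd, 2))
print("5-number summary:", five_number)
print("IQR:", iqr, "outliers:", outliers)
```

With this sample the value 30 inflates the variance and standard deviation, while the quartiles and IQR are unaffected by how extreme it is.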

14
Q

boxplot

A
  • graphical rendition of statistical data based on the min, Q1, median, Q3, max
  • the info in the 5-number summary can be graphically displayed in a boxplot
  • central box spans from Q1 to Q3
  • line marks the median within the central box
  • lines extend from the box to mark the min & max
  • special symbols can denote outliers
15
Q

time plot

A
  • for some quantitative data, we are measuring 1 subject at many time points (rather than measuring our variable across many subjects)
  • plots each observation against the time at which it was measured
  • 2 patterns: cycle (regular up & down movements over time) & trend (a long-term upward/downward movement over time)