Chapter 1 Flashcards
Individuals
The objects described by a set of data. Individuals may be people, animals, or things.
Variable
Any characteristic of an individual. A variable can take different values for different individuals.
Categorical Variable
Places an individual into one of several groups or categories.
Quantitative Variable
Takes numerical values for which it makes sense to find an average.
Discrete Variable
A quantitative variable that cannot take on every value between its minimum and maximum; it takes only separate, countable values. (Example: the number of heads in several coin flips. You cannot get 1/2 tails or 2.5 heads.)
Continuous Variable
A quantitative variable that can take on any value between its minimum value and its maximum value.
Univariate Data
Data that describe only one variable.
Bivariate Data
Data from a study that examines the relationship between two variables.
Population
The total set of observations that can be made. (For example, if you were studying the weight of adult women, the population would be the set of weights of all adult women in the world.)
Sample
A set of observations drawn from a population.
Census
A study that obtains data from every member of the population. Most of the time a census is not practical because of the cost and/or time required.
Distribution
The distribution of a variable tells you what values the variable takes and how often it takes these values.
Inference
The process of using data analysis to deduce properties of an underlying probability distribution, such as drawing conclusions about a population from a sample.
Frequency Table
A table that shows the frequency (count) of individuals in each category of a categorical variable.
Relative Frequency
The frequency count for a subgroup of a population divided by the total count, often written as a percent.
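A minimal Python sketch of a frequency table and the matching relative frequencies. The snack data are hypothetical and used only for illustration; `Counter` from the standard library does the counting.

```python
from collections import Counter

# Hypothetical categorical data (favorite snack of 8 students), for illustration only.
snacks = ["chips", "fruit", "chips", "candy", "fruit", "chips", "nuts", "fruit"]

# Frequency table: the count of individuals in each category.
freq = Counter(snacks)
total = sum(freq.values())

# Relative frequency: each category's count divided by the total, as a percent.
rel_freq = {category: 100 * count / total for category, count in freq.items()}

for category, count in freq.most_common():
    print(f"{category:<6} {count:>2} {rel_freq[category]:6.1f}%")
```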
Table
An arrangement of data in rows and columns.
Roundoff Error
The small discrepancy that occurs when rounded percents do not add to exactly 100% only because each result was rounded off. EX: percents that total 99.9%.
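A short Python illustration of roundoff error, assuming three hypothetical categories of equal size. Each percent rounds to 33.3, so the rounded percents total 99.9 even though the true percents total 100.

```python
# Hypothetical counts: three categories with 1 individual each (total 3).
counts = {"A": 1, "B": 1, "C": 1}
total = sum(counts.values())

# Round each percent to one decimal place, as a published table might.
rounded = {k: round(100 * v / total, 1) for k, v in counts.items()}
print(rounded)                          # each category shows 33.3
print(round(sum(rounded.values()), 1))  # 99.9, not 100.0: roundoff error
```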
Pie Chart
Shows the distribution of a categorical variable. Must include all categories that make up the whole, or an “Other” category.
Bar Graph
Represents each category as a bar. Bar heights show the category counts or percents. Can compare any set of quantities. The bars must all have equal width.
Two-Way Table
Organizes data about two categorical variables measured for the same set of individuals. Groups outcomes into categories.
Marginal Distribution
The marginal distribution of one of the categorical variables in a two-way table is a distribution of values of that variable among all individuals described by the table.
Conditional Distribution
A conditional distribution of a variable describes the values of that variable among individuals who have a specific value of another variable.
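A minimal Python sketch of marginal and conditional distributions, assuming a small hypothetical two-way table of class (Junior/Senior) by plan after school (College/Work). The marginal distribution divides column totals by the grand total; the conditional distribution divides each cell by its own row total.

```python
# Hypothetical two-way table (counts): rows = class, columns = plan after school.
table = {
    "Junior": {"College": 30, "Work": 20},
    "Senior": {"College": 45, "Work": 5},
}
plans = ["College", "Work"]
grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of plan: column totals divided by the grand total.
marginal_plan = {
    p: sum(row[p] for row in table.values()) / grand_total for p in plans
}

# Conditional distribution of plan given class: divide by that row's total.
conditional_plan = {
    cls: {p: row[p] / sum(row.values()) for p in plans}
    for cls, row in table.items()
}

print(marginal_plan)     # {'College': 0.75, 'Work': 0.25}
print(conditional_plan)  # Junior: College 0.6, Work 0.4; Senior: College 0.9, Work 0.1
```

Because the conditional distributions differ between juniors and seniors in these made-up counts, knowing a student's class would help predict their plan.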
Segmented Bar Graph
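Stacks the segments for each category into a single bar, so each bar shows a whole divided into parts. Can be harder to read than a side-by-side bar graph.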
Side-by-Side Bar Graph
Places the bars for different groups next to each other. Used to compare the distribution of a variable across groups.
Association
There is an association between two variables if knowing the value of one variable helps predict the value of the other.
Simpson’s Paradox
A phenomenon where a trend appears in several different groups of data but disappears or reverses when these groups are combined.
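A Python sketch of Simpson's paradox using made-up counts (not real study data). Treatment A has the higher success rate within both the mild and severe groups, yet the lower rate once the groups are combined, because A was given mostly to severe cases.

```python
# Hypothetical (successes, patients) for each treatment and group, for illustration only.
data = {
    "A": {"mild": (19, 20), "severe": (48, 80)},
    "B": {"mild": (72, 80), "severe": (10, 20)},
}

def rate(successes, total):
    return successes / total

# Within each group, treatment A has the higher success rate.
for group in ("mild", "severe"):
    a, b = rate(*data["A"][group]), rate(*data["B"][group])
    print(f"{group}: A={a:.0%}  B={b:.0%}")

# Combining the groups reverses the comparison.
overall = {
    t: rate(sum(s for s, _ in g.values()), sum(n for _, n in g.values()))
    for t, g in data.items()
}
print(f"overall: A={overall['A']:.0%}  B={overall['B']:.0%}")
```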
Dotplot
Each value is shown as a dot above its location on a number line, with repeated values stacked. Useful for small data sets to show the shape, center, and spread of a distribution.
Shape
(First part of SOCS: Shape, Outliers, Center, Spread.) Features to look for include:
- Peaks
- Gaps
- Clusters
Mode
Most frequently appearing value in a population or sample.
Center
The midpoint or typical value of the distribution, usually measured by the mean or median.
Spread
Measures of spread describe how similar or varied the set of values is for a particular variable (range, quartiles, IQR, variance, standard deviation).
Range
The difference between the highest and lowest values (maximum minus minimum).
Outlier
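A data point that diverges greatly from the overall pattern of the data. A common rule (the 1.5 x IQR rule) flags a value as an outlier if it falls more than 1.5 x IQR below Q1 or more than 1.5 x IQR above Q3.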
Symmetric
The left and right halves of the distribution are approximately mirror images of each other.
Skewed Right
A distribution whose right tail (toward higher values) stretches out much farther than its left tail, with fewer observations on the right. Also called positively skewed.
Skewed Left
A distribution whose left tail (toward lower values) stretches out much farther than its right tail, with fewer observations on the left. Also called negatively skewed.
Unimodal
A distribution with one clear peak.
Bimodal
Distribution with two clear peaks.
Multimodal
Distribution with several clear peaks.
Stemplot
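Displays quantitative data, generally from small data sets (fewer than about 50 values). Each value is split into a stem (all but the last digit) and a leaf (the last digit). Requires a key.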
Splitting Stems
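Splitting each stem into two (or more) stems to spread out the data and make the shape easier to see. With two stems, leaves 0 to 4 go on the first and leaves 5 to 9 on the second (0|1134 and 0|56677).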
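A minimal Python sketch of a stemplot with optional split stems. It assumes non-negative whole numbers, with the tens digit and above as the stem and the ones digit as the leaf; the function name `stemplot` is hypothetical. The sample data reproduce the 0|1134 and 0|56677 example from the card.

```python
from collections import defaultdict

def stemplot(values, split=False):
    """Return stemplot rows for non-negative whole numbers.

    Stem = all but the last digit, leaf = last digit. With split=True each stem
    gets two rows: leaves 0-4 on the first, leaves 5-9 on the second.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = int(leaf >= 5) if split else 0
        rows[(stem, half)].append(leaf)

    lines = []
    # Include every stem between the smallest and largest so gaps stay visible.
    for stem in range(min(values) // 10, max(values) // 10 + 1):
        for half in ((0, 1) if split else (0,)):
            leaves = "".join(str(leaf) for leaf in rows[(stem, half)])
            lines.append(f"{stem} | {leaves}")
    return lines

data = [1, 1, 3, 4, 5, 6, 6, 7, 7]
print("\n".join(stemplot(data)))              # one row: 0 | 113456677
print("\n".join(stemplot(data, split=True)))  # two rows: 0 | 1134 and 0 | 56677
print("Key: 1 | 3 means 13")
```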
Back-to-back Stem Plots
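Two stemplots that share the same stems, with one group's leaves on the left and the other group's leaves on the right. Used to compare two groups.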
Histogram
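Displays the distribution of a quantitative variable by dividing its range into equal-width intervals (classes) and drawing a bar for each. The height of each bar shows the count or percent of values in that interval.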
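A Python sketch of the counting step behind a histogram, assuming hypothetical quiz scores and classes of width 10 starting at 50. Each class includes its lower endpoint and excludes its upper one; values outside the chosen classes are simply skipped by this sketch.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in equal-width classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:  # values outside the classes are ignored
            counts[i] += 1
    return counts

# Hypothetical quiz scores, for illustration only.
scores = [52, 61, 64, 68, 70, 71, 75, 77, 78, 83, 85, 91]
counts = histogram_counts(scores, start=50, width=10, n_bins=5)

# Text "bars": labels like 50-59 assume whole-number scores.
for i, c in enumerate(counts):
    lo = 50 + 10 * i
    print(f"{lo}-{lo + 9}: {'*' * c}")
```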
Mean
The average value (x-bar). Take the sum of the individual values and divide by the number of individuals.
Median
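The middle value of the data. Arrange the values from smallest to largest and find the center; with an even number of values, the median is the mean of the two middle values.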
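A short Python sketch computing the mean, median, and mode of a hypothetical data set. The hand-written `median` follows the sort-and-find-the-center procedure on the card and is checked against the standard library's `statistics.median`.

```python
import statistics

def median(values):
    """Sort the values and take the middle one (or the mean of the two middle ones)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

# Hypothetical data set, for illustration only.
data = [10, 3, 5, 2, 7, 3]

x_bar = sum(data) / len(data)  # mean: sum of the values divided by how many there are
med = median(data)             # sorted: 2 3 3 5 7 10, so (3 + 5) / 2
mode = statistics.mode(data)   # most frequently appearing value

print(x_bar, med, mode)  # 5.0 4.0 3
assert med == statistics.median(data)
```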
IQR (Interquartile Range)
Q3 - Q1
A measure of variability: the spread of the middle 50% of the data.
Five-Number Summary
Minimum, Q1, Median, Q3, and Maximum.
Provides quick, overall description of the distribution.
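A Python sketch tying together the five-number summary, range, IQR, and the 1.5 x IQR outlier rule. It assumes one common quartile convention (Q1 and Q3 are the medians of the lower and upper halves, leaving out the overall median when n is odd); calculators and software packages may use slightly different methods. The data are hypothetical.

```python
def median(sorted_vals):
    n = len(sorted_vals)
    mid = n // 2
    if n % 2:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def five_number_summary(values):
    """Min, Q1, median, Q3, max (needs at least 2 values).

    Q1/Q3 = medians of the lower/upper halves, excluding the median when n is odd.
    """
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]
    return s[0], median(lower), median(s), median(upper), s[-1]

# Hypothetical data, for illustration only.
data = [1, 3, 4, 5, 6, 7, 9, 10, 30]
mn, q1, med, q3, mx = five_number_summary(data)

data_range = mx - mn
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print((mn, q1, med, q3, mx))  # (1, 3.5, 6, 9.5, 30)
print(data_range, iqr, outliers)  # 29 6.0 [30]
```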
Boxplot
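Displays the five-number summary and splits the data into quartiles.
The box spans Q1 to Q3, with a line at the median (Q2).
Lines (whiskers) extend from the box to the smallest and largest values that are not outliers.
Outliers are marked as separate dots beyond the whiskers.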
Standard Deviation
A numerical value used to indicate how widely individuals in a group vary.
Measures how spread out a group of numbers is around the mean; roughly, the typical distance of an observation from the mean.
If individual observations vary greatly from the group mean, the standard deviation is big.
Standard deviation = square root of the variance
Variance
A numerical value used to indicate how widely individuals in a group vary.
The average of the squared deviations from the mean. For a sample, the sum of squared deviations is divided by n - 1 instead of n.
If individual observations vary greatly from the group mean, the variance is big.
Variance = square of the standard deviation
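A Python sketch of variance and standard deviation for a hypothetical data set, showing both the sample version (divide by n - 1) and the population version (divide by n). The sample results are checked against `statistics.variance` and `statistics.stdev` from the standard library.

```python
import math
import statistics

# Hypothetical data, for illustration only (mean is 5).
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n
squared_devs = [(x - mean) ** 2 for x in data]  # squared deviations from the mean

sample_var = sum(squared_devs) / (n - 1)  # sample variance: divide by n - 1
pop_var = sum(squared_devs) / n           # population variance: divide by n
sample_sd = math.sqrt(sample_var)         # standard deviation = square root of variance

assert math.isclose(sample_var, statistics.variance(data))
assert math.isclose(sample_sd, statistics.stdev(data))
print(pop_var, math.sqrt(pop_var))  # 4.0 2.0
```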