Summer Work Flashcards
Individuals
objects described by a set of data
Variable
any characteristic of an individual
Categorical Variable
places an individual into one of several groups or categories (ex: zip code)
Quantitative Variable
Takes a numerical value for which it makes sense to find an average (GPA)
Discrete Variable
has a countable set of possible values with no in-between values (ex: number of siblings); for a discrete random variable, the probabilities of all possible values add up to 1
Continuous variable
value is obtained by measuring; it can take any value in an interval, so there are infinitely many possible values (ex: height)
Univariate Data
results from a study looking at only one variable
Bivariate Data
results from a study that examines the relationship between two variables
Population
refers to the total set of observations that can be made
Sample
refers to a set of observations drawn from the population
Census
a study that obtains data from every member of a population
Distribution
Tells us what values a variable takes and how often it takes such values
Marginal Distribution
in a two-way table, the distribution of values of one variable among all individuals described by the table (found from the row or column totals)
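A minimal Python sketch of a marginal distribution, assuming a hypothetical two-way table stored as a nested dict (the grade levels, snacks, and counts are made up for illustration). Each row total is divided by the grand total.

```python
# Hypothetical two-way table: rows = grade level, columns = preferred snack.
table = {
    "Freshman": {"Chips": 15, "Fruit": 5},
    "Senior": {"Chips": 30, "Fruit": 50},
}

def marginal_row_distribution(t):
    """Each row total divided by the grand total of the table."""
    row_totals = {row: sum(cells.values()) for row, cells in t.items()}
    grand_total = sum(row_totals.values())
    return {row: total / grand_total for row, total in row_totals.items()}

print(marginal_row_distribution(table))  # {'Freshman': 0.2, 'Senior': 0.8}
```

The marginal distribution of snack preference would be found the same way from the column totals.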
Inference
drawing conclusions that go beyond the data
Frequency Table
a table that shows frequency counts for categorical data
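A frequency table can be sketched in Python with `collections.Counter`, which tallies how many times each category appears. The list of favorite colors is hypothetical example data.

```python
from collections import Counter

# Hypothetical categorical data: each entry is one student's favorite color.
colors = ["red", "blue", "blue", "green", "red", "blue"]

# Counter maps each category to its count.
frequency_table = Counter(colors)

# Print categories from most to least frequent.
for category, count in frequency_table.most_common():
    print(f"{category:<6} {count}")
```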
Relative Frequency Table
shows relative frequency for different categories of categorical data
Round-Off Error
small discrepancies caused by rounding, such as relative frequencies that add up to 99.9% or 100.1% instead of exactly 100%
Pie Chart
shows the distribution of categorical data as parts of a whole; each slice is sized by the percent of individuals in that category
Bar Graph
represents categorical data with bars
Two Way Table
shows relationships between categorical variables
Relative Frequency
subgroup count/total count
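The formula above (count in a category divided by the total count) can be sketched in Python as follows; the color data is hypothetical.

```python
from collections import Counter

# Hypothetical categorical data.
colors = ["red", "blue", "blue", "green", "red", "blue", "blue", "green"]

def relative_frequencies(values):
    """Relative frequency = count in a category / total count."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}

print(relative_frequencies(colors))  # {'red': 0.25, 'blue': 0.5, 'green': 0.25}
```

The relative frequencies always sum to 1 (100%), apart from any round-off error once they are rounded for display.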
Conditional Distribution
the distribution of values of one variable among only the individuals who have a specific value of another variable (found from one row or column of a two-way table)
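A minimal Python sketch of conditional distributions, assuming the same kind of hypothetical two-way table as a nested dict. Each cell in one row is divided by that row's total, not by the grand total.

```python
# Hypothetical two-way table: rows = grade level, columns = preferred snack.
table = {
    "Freshman": {"Chips": 15, "Fruit": 5},
    "Senior": {"Chips": 30, "Fruit": 50},
}

def conditional_distribution(t, row):
    """Distribution of the column variable among only the individuals in `row`."""
    cells = t[row]
    row_total = sum(cells.values())
    return {col: count / row_total for col, count in cells.items()}

print(conditional_distribution(table, "Freshman"))  # {'Chips': 0.75, 'Fruit': 0.25}
print(conditional_distribution(table, "Senior"))    # {'Chips': 0.375, 'Fruit': 0.625}
```

Because the two conditional distributions differ, knowing a student's grade level helps predict snack preference in this made-up data, which is what an association means.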
Segmented Bar Graph
a stacked bar graph in which each bar represents 100% of one group and is divided into segments showing the percent in each category
Side by Side Bar Graph
uses bars to make side by side comparisons of data
Association
refers to the relationship that exists if knowing the value of one variable helps predict the value of the other
Simpson's Paradox
statistical phenomenon where an association between two variables in a population emerges, disappears or reverses when the population is divided into subpopulations.
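The reversal is easiest to see with numbers. This Python sketch uses entirely hypothetical counts for two treatments (A and B), split by case severity: A has the higher success rate within each subgroup, yet B has the higher rate once the subgroups are combined.

```python
# Hypothetical counts: (successes, patients) for each treatment, split by case severity.
data = {
    "A": {"mild": (9, 10), "severe": (30, 100)},
    "B": {"mild": (80, 100), "severe": (2, 10)},
}

def group_rate(treatment, group):
    """Success rate within one subgroup."""
    successes, total = data[treatment][group]
    return successes / total

def overall_rate(treatment):
    """Success rate after combining (adding) the counts from every subgroup."""
    groups = data[treatment].values()
    return sum(s for s, _ in groups) / sum(n for _, n in groups)

for group in ("mild", "severe"):
    print(f"{group}: A={group_rate('A', group):.2f}, B={group_rate('B', group):.2f}")
print(f"overall: A={overall_rate('A'):.2f}, B={overall_rate('B'):.2f}")
```

Treatment A looks worse overall only because most of its patients were severe cases, which have lower success rates under either treatment.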
Dot Plot
a graph that shows each data value as a dot above its location on a number line; useful for comparing frequency counts within categories or groups
Shape
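the overall form of a distribution's graph, described by its number of peaks, its symmetry or skewness, and any gaps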
Mode
value that is repeated most often in a population or sample
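In Python, `statistics.multimode` (Python 3.8+) returns every value tied for the highest count, which handles data sets with more than one mode. The quiz scores are hypothetical.

```python
from statistics import multimode

# Hypothetical quiz scores.
scores = [7, 8, 8, 9, 10, 10, 6]

# Both 8 and 10 appear twice, so the data set has two modes.
print(multimode(scores))  # [8, 10]
```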
Center
the middle of a distribution, usually measured by the mean or median
Spread
the extent to which a distribution is stretched or squeezed, usually measured by the range, IQR, or standard deviation
Range
difference between the largest and smallest values in the data set (maximum minus minimum)
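A one-line Python sketch of the range; the temperature data is hypothetical.

```python
# Hypothetical daily high temperatures (degrees F).
temps = [68, 74, 71, 80, 65, 77]

def data_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)

print(data_range(temps))  # 15
```

Because the range depends only on the two most extreme values, a single outlier can change it a lot.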
Outlier
a data point that diverges greatly from the overall pattern of data
Symmetric
can be divided at the center so that each half is a mirror image of the other
Skewed Right
distributions whose right tail is much longer than the left (most values are low, with a few unusually high values)
Skewed Left
distributions whose left tail is much longer than the right (most values are high, with a few unusually low values)
Unimodal
distributions with one clear peak
Bimodal
distributions with two clear peaks
Multimodal
distributions with more than one peak
Stemplot
used to display quantitative data, usually from smaller data sets, by splitting each value into a stem (all but the final digit) and a leaf (the final digit)
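A minimal Python sketch of a stemplot, assuming hypothetical non-negative two-digit whole numbers so that the stem is the tens digit and the leaf is the ones digit. Leaves are sorted, and empty stems are kept so gaps stay visible.

```python
from collections import defaultdict

# Hypothetical test scores (two-digit whole numbers).
scores = [72, 85, 91, 68, 77, 85, 93, 64, 79, 88]

def stemplot_lines(values):
    """Return stemplot rows: stem = tens digit, leaves = ones digits in order."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    # Walk every stem from smallest to largest, including stems with no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {leaf_text}".rstrip())
    return lines

for line in stemplot_lines(scores):
    print(line)
```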
Splitting Stems
method for spreading out a stemplot that has too few stems
Back to Back Stem Plots
compares two distributions using one shared column of stems in the middle; leaves for one data set go to the left and leaves for the other go to the right
Histogram
displays quantitative data by dividing the values into equal-width intervals (bins); each bar sits over one interval, and its height shows the count or percent of values in that interval
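The counting step behind a histogram can be sketched in Python. The bin start, width, and number of bins are assumptions chosen for this hypothetical homework-time data; a value exactly on a boundary goes into the higher bin, and values outside all bins are ignored.

```python
# Hypothetical quantitative data: minutes spent on homework.
minutes = [12, 25, 31, 8, 45, 22, 38, 15, 29, 41]

START, WIDTH, NUM_BINS = 0, 10, 5  # assumed bins: 0-<10, 10-<20, ..., 40-<50

def histogram_counts(values, start, width, num_bins):
    """Count values in equal-width bins [start + k*width, start + (k+1)*width)."""
    counts = [0] * num_bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < num_bins:
            counts[k] += 1
    return counts

# Text-based histogram: one '#' per observation in each bin.
for k, count in enumerate(histogram_counts(minutes, START, WIDTH, NUM_BINS)):
    low = START + k * WIDTH
    print(f"{low:>2} to <{low + WIDTH:>2}: {'#' * count}")
```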
Mean
the average: the sum of the observations divided by the number of observations
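A minimal Python sketch of the mean, using hypothetical observations (the standard library's `statistics.mean` gives the same result).

```python
# Hypothetical observations.
values = [4, 8, 6, 5, 7]

def mean(xs):
    """Sum of the observations divided by the number of observations."""
    return sum(xs) / len(xs)

print(mean(values))  # 6.0
```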
Median
middle value of the ordered data set, or the average of the two middle values when the number of observations is even
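The two cases in this definition (odd vs. even number of observations) can be sketched in Python:

```python
def median(xs):
    """Middle value of the sorted data; average the two middle values when n is even."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```

Sorting first matters: the median is the middle of the ordered data, not the middle of the list as it was recorded.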
Inter Quartile Range
The interquartile range is equal to Q3 minus Q1; it measures the spread of the middle half of the data
Five Number Summary
minimum, Q1, median, Q3, maximum
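A Python sketch of the five-number summary and IQR. It assumes the "median of each half" quartile convention (when n is odd, the overall median is left out of both halves). Other software, including `statistics.quantiles`, may use a different quartile method and report slightly different Q1 and Q3. The data set is hypothetical, and the function assumes at least two values.

```python
def _median(sorted_xs):
    """Median of an already-sorted list."""
    n = len(sorted_xs)
    mid = n // 2
    if n % 2 == 1:
        return sorted_xs[mid]
    return (sorted_xs[mid - 1] + sorted_xs[mid]) / 2

def five_number_summary(xs):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    s = sorted(xs)
    n = len(s)
    lower = s[: n // 2]        # values below the median position
    upper = s[(n + 1) // 2 :]  # values above the median position
    return (s[0], _median(lower), _median(s), _median(upper), s[-1])

data = [1, 3, 4, 6, 8, 9, 12, 15, 20]
mn, q1, med, q3, mx = five_number_summary(data)
print(mn, q1, med, q3, mx)  # 1 3.5 8 13.5 20
print("IQR =", q3 - q1)     # IQR = 10.0
```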
Box Plot
a graph of the five-number summary: a box spans from Q1 to Q3 with a line at the median, and whiskers extend out toward the minimum and maximum
Standard Deviation
numerical value used to indicate how widely individuals in a group vary around the mean; the square root of the variance
Variance
numerical value used to indicate how widely individuals in a group vary; the average squared deviation from the mean (for a sample, the sum of squared deviations is divided by n - 1 instead of n)
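A Python sketch of sample variance and standard deviation, checked against the standard library's `statistics.variance` and `statistics.stdev` (both of which divide by n - 1). The sample values are hypothetical.

```python
import math
import statistics

# Hypothetical sample.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

def sample_sd(xs):
    """Standard deviation = square root of the variance."""
    return math.sqrt(sample_variance(xs))

print(sample_variance(sample), statistics.variance(sample))
print(sample_sd(sample), statistics.stdev(sample))
```

For a whole population, the sum of squared deviations is divided by n instead; `statistics.pvariance` and `statistics.pstdev` use that version.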