Summer Work Flashcards
Individuals
the objects described by a set of data (people, animals, objects, etc.)
Variable
any characteristic of an individual. A variable can take a different value for different individuals.
Categorical Variables
places an individual into one of several groups or categories.
Quantitative Variables
takes numerical values for which it makes sense to find an average.
Discrete Variables
a variable that can take only certain separated values (often counts), not every value between its minimum and maximum.
Continuous
a variable that can take on any value between its minimum and maximum.
Univariate Data
data that only looks at one variable
Bivariate Data
data that looks at the relationship between two variables
Population
the total set of observations that can be made
Sample
a set of observations drawn from a population
Census
a study that obtains data from every member of a population. A census is not usually practical because of the cost and/or the time required.
Distribution
tells us what values the variable takes and how often it takes these values.
Inference
a conclusion that goes beyond the data at hand.
Frequency Table
displays the count (frequency) of individuals in each category of a variable.
Relative Frequency Table
shows the percent of individuals in each category of a variable.
Roundoff Error
The difference between the sum of the percents in a relative frequency table and 100%. Roundoff errors are caused by rounding off results.
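A minimal sketch of a frequency table, a relative frequency table, and the resulting roundoff error. The `colors` data is made up for illustration, and rounding to one decimal place is an assumption.

```python
from collections import Counter

# Hypothetical data: the favorite color of 7 individuals.
colors = ["red", "blue", "blue", "green", "red", "blue", "green"]

# Frequency table: count of individuals in each category.
frequency = Counter(colors)
total = sum(frequency.values())

# Relative frequency table: percent of individuals in each category,
# rounded to one decimal place (an assumed rounding rule).
relative = {color: round(100 * count / total, 1)
            for color, count in frequency.items()}

# Roundoff error: how far the rounded percents are from summing to 100%.
roundoff_error = round(sum(relative.values()) - 100, 1)

print(dict(frequency))
print(relative)
print(roundoff_error)
```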
Pie Chart
shows the distribution of a categorical variable as a “pie” whose slices are sized by the counts or percents for the categories. It includes all the categories that make up the whole.
Bar Graph
represents each category as a bar. Bar heights show the category counts or percents.
Marginal Distribution
the distribution of values of one of the categorical variables in a two-way table among all individuals described by that table.
Conditional Distribution
describes the values of one variable among individuals who have a specific value of another variable. There is a separate conditional distribution for each value of the other variable.
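A small sketch of how marginal and conditional distributions could be computed from a two-way table of counts. The table, its row labels, and its column labels are hypothetical.

```python
# Hypothetical two-way table of counts:
# rows are grade level, columns are preferred pet.
table = {
    "junior": {"cat": 30, "dog": 20},
    "senior": {"cat": 10, "dog": 40},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of pet: column totals divided by the grand total.
marginal_pet = {
    pet: sum(table[grade][pet] for grade in table) / grand_total
    for pet in ("cat", "dog")
}

# Conditional distribution of pet given grade: each row divided by its own
# row total. There is one conditional distribution per grade.
conditional_pet = {
    grade: {pet: count / sum(row.values()) for pet, count in row.items()}
    for grade, row in table.items()
}

print(marginal_pet)
print(conditional_pet)
```

Here the conditional distributions differ between the two rows, which is what an association between the two variables looks like in a two-way table.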
Association
two variables are associated if knowing the value of one variable helps predict the value of the other.
Simpson’s Paradox
a trend appears in several different groups but disappears or reverses when those groups are combined.
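A sketch of the reversal with made-up numbers: treatment A has the higher success rate in each group, yet treatment B has the higher rate once the groups are combined. The group names, treatments, and counts are all hypothetical.

```python
# Hypothetical (successes, trials) for two treatments in two groups.
data = {
    "group 1": {"A": (9, 10), "B": (80, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}

def rate(successes, trials):
    """Proportion of trials that were successes."""
    return successes / trials

# Within every group, A has the higher success rate.
a_wins_each_group = all(
    rate(*group["A"]) > rate(*group["B"]) for group in data.values()
)

# Combine the groups by adding successes and trials for each treatment.
combined = {}
for treatment in ("A", "B"):
    successes = sum(group[treatment][0] for group in data.values())
    trials = sum(group[treatment][1] for group in data.values())
    combined[treatment] = rate(successes, trials)

print(a_wins_each_group)
print(combined)
```

The reversal happens because the treatments are spread unevenly across the groups: most of A's trials fall in the group with low success rates, and most of B's fall in the group with high success rates.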
Dotplot
one of the simplest graphs to construct and interpret where each data point is shown as a dot above its location on a number line.
Shape
a way to describe the distribution of a set of data. Features of shape can include symmetry, skewed left or right, peaks, etc.
Mode
the most frequently appearing value in a population or sample.
Center
the middle of a distribution. There are different measures of center including median and mean.
Spread
the dispersion, or extent to which a distribution is stretched or squeezed. Measures of spread include IQR and standard deviation.
Range
a measure of variation in a set of data that is equal to the difference between the largest and smallest values.
Outlier
a data point that diverges greatly from the overall pattern of the data.
Symmetric
the right and left sides of the graph are approximately mirror images of each other.
Skewed Right
the right side of the graph is much longer than the left side
Skewed Left
the left side of the graph is much longer than the right side
Unimodal
the data has a single peak
Bimodal
the data has two peaks
Multimodal
the data has more than two peaks
Stemplot
a simple graphical display for smaller sets of data that gives a quick “picture” of the distribution while using the actual numerical values.
Splitting Stems Plot
a stemplot where each stem is split into two, with leaves 0-4 on the first stem and leaves 5-9 on the second.
Back-to-back Stems Plot
a stemplot that compares two sets of data using common stems, with the leaves for one set on the left side and the leaves for the other on the right side.
Histogram
made up of adjacent columns plotted on a graph, where the height of each column shows the count or percent of values in an interval. These graphs are usually used to display quantitative data.
Mean
the sum of a list of numbers, divided by the number of elements in the list.
Median
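the middle value in an ordered list. If the list has an even number of values, the median is the average of the two middle values.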
IQR
the interquartile range: the difference between Q3 and Q1 (Q3 - Q1).
Five Number Summary
includes the lowest value, Q1, the median, Q3, and the highest value. These 5 values can be used to make a box plot.
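A minimal sketch of the five-number summary and the IQR. The data is made up, and the quartiles are taken as the medians of the lower and upper halves of the ordered data (excluding the overall median when the count is odd). That is one common classroom convention; software packages may use slightly different rules and give slightly different quartiles.

```python
from statistics import median

# Hypothetical data, sorted so that positions correspond to rank.
data = sorted([2, 4, 4, 5, 7, 8, 9, 10, 12])
n = len(data)
half = n // 2

# Split into lower and upper halves; skip the middle value when n is odd.
lower_half = data[:half]
upper_half = data[half + 1:] if n % 2 == 1 else data[half:]

q1 = median(lower_half)
q3 = median(upper_half)

# Lowest value, Q1, median, Q3, highest value.
five_number_summary = (data[0], q1, median(data), q3, data[-1])
iqr = q3 - q1

print(five_number_summary)
print(iqr)
```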
Box plot
a method for depicting numerical data through its quartiles
Standard Deviation
the numerical value used to indicate how widely individuals in a group vary. If individual observations vary greatly from the group mean, the standard deviation is big, and vice versa. The Standard Deviation is the square root of the variance.
Variance
numerical value used to indicate how widely individuals in a group may vary. If individual observations vary greatly from the group mean, the variance is big, and vice versa. There is the variance of a sample and the variance of a population. The variance is the Standard Deviation squared.
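A short sketch of the variance and standard deviation for a population and for a sample, using made-up data. The only difference between the two is the divisor: n for a population, n - 1 for a sample.

```python
import math

# Hypothetical data set of 8 observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

mean = sum(data) / n

# Squared distance of each observation from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]

# Population variance divides by n; sample variance divides by n - 1.
population_variance = sum(squared_deviations) / n
sample_variance = sum(squared_deviations) / (n - 1)

# The standard deviation is the square root of the variance.
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(mean, population_variance, population_sd)
print(sample_variance, sample_sd)
```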
Two Way Table
a useful tool for examining relationships between categorical variables. Entries in the cells of two-way tables can be frequency counts or relative frequencies.
Segmented Bar Graph
a kind of stacked bar chart where each bar shows 100% of the value but shows the distinctions in observations. These can be used as a way to depict two-way tables and are generally useful for comparing values among different variables and categories.
Side-by-side bar graph
a chart where the bars for each group are drawn in different colors and placed next to each other within each category.