Summer Work Flashcards
individuals
the objects described by a set of data
variable
an attribute that describes a person, place, thing, or idea whose value can vary from one entity to another
categorical variable
variables that take on values that are names or labels
quantitative variable
variables that are numerical and represent a measurable quantity
discrete variable
a variable that can take on only a countable set of distinct values (such as whole-number counts), with gaps between the possible values
continuous variable
a variable that can take on any value between its minimum value and its maximum value
univariate data
data that only investigates one variable
bivariate data
data that investigates the relationship between two variables
population
the total set of observations that can be made.
sample
a set of observations drawn from a population.
census
a study that obtains data from every member of a population
distribution
a function that shows the possible values for a variable and how often they occur
inference
the process of using data analysis to infer properties of an underlying distribution of probability
frequency table
a table that shows frequency counts for a categorical variable
relative frequency
the proportion (or percent) of the time that an event occurs, found by dividing its frequency count by the total number of observations
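As a small illustration of how a frequency count differs from a relative frequency, the sketch below tallies a made-up list of responses. The data and variable names are hypothetical and only meant to show the arithmetic.

```python
from collections import Counter

# Hypothetical survey responses (made-up data for illustration only).
responses = ["red", "blue", "red", "green", "red", "blue"]

# Frequency table: how many times each category appears.
counts = Counter(responses)

# Relative frequency: each count divided by the total number of observations.
total = sum(counts.values())
relative = {category: count / total for category, count in counts.items()}

print(counts["red"], relative["red"])  # 3 0.5
```

The relative frequencies of all categories add up to 1 (or 100%).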
table
the values of the cumulative distribution functions, probability functions, or probability density functions of certain common distributions presented as reference tables for different values of their parameters
round-off error
a mathematical miscalculation or quantization error caused by altering a number to an integer or one with fewer decimals
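One place round-off error commonly shows up is in a table of percents that does not add to exactly 100. The sketch below uses three made-up categories of equal size to show the effect.

```python
# Hypothetical counts for three equally sized categories (made-up data).
counts = [1, 1, 1]
total = sum(counts)

# Each category is 33.33...% of the total; rounding to a whole percent gives 33.
percents = [round(100 * c / total) for c in counts]

print(percents)       # [33, 33, 33]
print(sum(percents))  # 99, not 100, because of round-off error
```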
pie chart
a circular statistical graphic that’s divided into slices to illustrate numerical proportion
bar graph
a chart made up of columns or rows plotted on a graph
two-way table
a useful tool for examining relationships between categorical variables in which the entries in the cells can be frequency counts or relative frequencies
marginal distribution
the distribution of one of the categorical variables among all individuals in a two-way table, given by the entries in the “Total” row or “Total” column
conditional distribution
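the relative frequencies of one variable among only those individuals who share a specific value of the other variable, computed within a single row or column of a two-way table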
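The sketch below shows one way to compute marginal and conditional distributions from a two-way table. The counts, category names, and variable names are all made up for illustration.

```python
# Hypothetical two-way table of counts (made-up data):
# rows are one categorical variable, columns are another.
table = {
    "female": {"soccer": 20, "tennis": 30},
    "male": {"soccer": 40, "tennis": 10},
}

# Row totals and the grand total (the "Total" column of the table).
row_totals = {row: sum(cols.values()) for row, cols in table.items()}
grand_total = sum(row_totals.values())

# Marginal distribution of the row variable: row totals over the grand total.
marginal = {row: t / grand_total for row, t in row_totals.items()}

# Conditional distribution of the column variable within each row:
# each cell divided by its own row total.
conditional = {
    row: {col: n / row_totals[row] for col, n in cols.items()}
    for row, cols in table.items()
}

print(marginal["female"])               # 0.5
print(conditional["female"]["tennis"])  # 0.6
```

Comparing the conditional distributions across rows (here 0.6 versus 0.2 for tennis) is a common way to check for an association between the two variables.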
segmented bar graph
a bar graph for categorical data in which each bar is divided into segments whose lengths show the relative frequency of each category within that bar
side-by-side bar graph
a graph that can be used to organise and display the data that arises when a group of individuals or things are categorised according to two or more criteria
association
any relationship between two variables
Simpson’s paradox
a phenomenon in probability and statistics in which a trend appears in several groups of data but disappears or reverses when the groups are combined
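A small numeric example can make Simpson’s paradox easier to see. The success counts below are invented for illustration: option X has the higher success rate inside each group, yet option Y has the higher rate once the groups are combined.

```python
# Hypothetical (successes, trials) for two options in two groups (made-up data).
data = {
    "group A": {"X": (8, 10), "Y": (70, 100)},
    "group B": {"X": (30, 100), "Y": (2, 10)},
}

# Within each group, compare the success rates of X and Y.
within = {}
for group, options in data.items():
    x_rate = options["X"][0] / options["X"][1]
    y_rate = options["Y"][0] / options["Y"][1]
    within[group] = x_rate > y_rate

# Combine the groups by adding successes and trials separately.
combined = {}
for option in ("X", "Y"):
    successes = sum(data[group][option][0] for group in data)
    trials = sum(data[group][option][1] for group in data)
    combined[option] = successes / trials

print(within)                        # {'group A': True, 'group B': True}
print(combined["X"] < combined["Y"])  # True: the trend reverses when combined
```

The reversal happens because the two options were tried very unequally across the groups, so the combined rates are dominated by different groups.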
dotplot
a type of graphic display used to compare frequency counts within categories or groups
shape
describes the distribution (or pattern) of the data within a dataset
mode
the most frequently appearing value in a population or sample
center
the middle of a distribution
spread
the extent to which a distribution is stretched or squeezed
range
the difference between the largest and smallest values in a data set
outlier
a data point that diverges greatly from the overall pattern of data
symmetry
an attribute used to describe the shape of a data distribution that, when graphed, can be divided at the center so that each half is a mirror image of the other
skewed right
distributions with fewer observations on the right (toward higher values)
skewed left
distributions with fewer observations on the left (toward lower values)
unimodal
distributions with one clear peak
bimodal
distributions with two clear peaks
multimodal
a probability distribution with more than one peak, or “mode”
stemplot
a graph used to display quantitative data, generally from small data sets (50 or fewer observations), in which the entries on the left are called stems and the entries on the right are called leaves
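The sketch below builds a simple text stemplot. It assumes non-negative whole numbers, with the tens digit (or digits) as the stem and the ones digit as the leaf; the function name and data are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return the lines of a text stemplot (stem = tens, leaf = ones).

    Assumes non-negative integers; a sketch, not a general-purpose tool.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)

    lines = []
    # Include every stem between the smallest and largest, even if it is empty,
    # so gaps in the data remain visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

# Made-up data for illustration.
for line in stemplot([12, 15, 21, 34, 38, 38]):
    print(line)
# 1 | 25
# 2 | 1
# 3 | 488
```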
splitting stems
a term used to describe stem-and-leaf plots that have more than 1 space on the stem for the same interval
back-to-back stemplots
a graphic option for comparing data from two populations in which the center consists of a column of stems with a vertical line on each side, while leaves representing one data set extend from the right, and leaves representing the other data set extend from the left.
plot
a graphical technique for representing a data set, usually as a graph showing the relationship between two or more variables
histogram
a graph made up of columns plotted on a graph in which the columns are positioned over a label that represents a continuous, quantitative variable and the height of the column indicates the size of the group defined by the column label
mean
an average score, often denoted by x̄ (“x-bar”), that is calculated by finding the sum of individual scores and dividing by the number of individuals
median
a simple measure of central tendency found by arranging the observations in order from smallest to largest value and finding the center value (if there are two center values, the median is the average of those two)
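As a quick check on the two definitions, the sketch below computes the mean and median of a small made-up data set using Python's standard `statistics` module. The value 10 pulls the mean above the median.

```python
import statistics

# Made-up scores for illustration.
scores = [2, 4, 4, 5, 10]

# Mean: sum of the scores divided by the number of scores.
mean = sum(scores) / len(scores)

# Median: the center value of the ordered data.
median = statistics.median(scores)

# With an even number of values, the median is the average of the two center values.
median_even = statistics.median([1, 3, 5, 7])

print(mean, median, median_even)  # 5.0 4 4.0
```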
interquartile range (iqr)
a measure of variability found by subtracting Q1 (the middle value in the first half of a data set) from Q3 (the middle value in the second half of a data set)
five-number summary
a set of descriptive statistics that provides information about a dataset consisting of the sample minimum (smallest observation), the lower quartile or first quartile, the median (the middle value), the upper quartile or third quartile, and the sample maximum (largest observation)
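The sketch below computes a five-number summary and the IQR, taking Q1 and Q3 as the medians of the lower and upper halves of the ordered data. It assumes that, when the number of observations is odd, the overall median is left out of both halves; other conventions exist and can give slightly different quartiles. The function name and data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method.

    Assumes at least two observations; for an odd count, the overall median
    is excluded from both halves (one common convention, not the only one).
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return (
        data[0],
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
        data[-1],
    )

# Made-up data for illustration.
minimum, q1, median, q3, maximum = five_number_summary([1, 3, 4, 6, 7, 9, 10, 12])
iqr = q3 - q1

print(minimum, q1, median, q3, maximum)  # 1 3.5 6.5 9.5 12
print(iqr)                               # 6.0
```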
boxplot
a type of graph used to display patterns of quantitative data that consists of a “box” (hence, the name), which goes from the first quartile (Q1) to the third quartile (Q3) and two “whiskers” that go from the ends of the box to the largest (for the right whisker) and smallest (for the left) non-outliers (outliers are plotted as separate points)
standard deviation
a numerical value used to indicate how widely individuals in a group vary that is equal to the square root of the variance
variance
a numerical value used to indicate how widely individuals in a group vary that is calculated by averaging the squared differences between each element and the mean (dividing by the number of elements for a population, or by one less than that number for a sample)
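To show how variance and standard deviation relate, the sketch below works through the calculation by hand on a made-up data set, then notes the matching functions in Python's standard `statistics` module. Whether to divide by n or by n - 1 depends on whether the data are treated as a whole population or as a sample.

```python
import math
import statistics

# Made-up data for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

mean = sum(data) / n                            # 5.0
squared_devs = [(x - mean) ** 2 for x in data]  # squared distance of each value from the mean

pop_var = sum(squared_devs) / n           # population variance: divide by n
sample_var = sum(squared_devs) / (n - 1)  # sample variance: divide by n - 1

pop_sd = math.sqrt(pop_var)        # standard deviation is the square root of the variance
sample_sd = math.sqrt(sample_var)

print(pop_var, pop_sd)  # 4.0 2.0

# The standard library provides the same quantities:
# statistics.pvariance / statistics.pstdev (population)
# statistics.variance / statistics.stdev (sample)
```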