Summer Assignment Flashcards
individuals
objects described by a set of data
variable
any characteristic of an individual
categorical variable
places an individual into one of several groups or categories
quantitative variable
takes numerical values where it makes sense to take an average
distribution
tells us what values the variable takes and how often it takes those values
discrete variables
a variable that can take only separate, countable values, not every value between its minimum and maximum. For example, when counting heads in coin flips, you can get 2 or 3 heads, but never 2.5 heads
Continuous variables
a variable that can take on any value between its minimum and maximum, such as height or time. The opposite of a discrete variable
Univariate Data
when you conduct a study that only looks at one variable; data that only contains one variable
Bivariate Data
data that contains two variables and examines the relationship between them. For example, height and weight.
Census
a study that obtains data from every member of a population. In most studies, a census is not practical, because of the cost and/or time required
Boxplot
Also known as a box-and-whisker plot. A boxplot displays the five-number summary, splitting the data set into quartiles.
Bar Graph
A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent.
Conditional
The conditional probability of an event B is the probability that the event will occur given the knowledge that an event A has already occurred.
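As a minimal sketch of the definition above, the conditional probability of B given A can be computed as P(A and B) / P(A). The counts below are made up for illustration, and the variable names are hypothetical.

```python
from fractions import Fraction

# Made-up counts for 100 hypothetical individuals.
total = 100
count_a = 40          # individuals for whom event A occurred
count_a_and_b = 10    # individuals for whom both A and B occurred

p_a = Fraction(count_a, total)
p_a_and_b = Fraction(count_a_and_b, total)

# P(B given A) = P(A and B) / P(A)
p_b_given_a = p_a_and_b / p_a
print(p_b_given_a)
```

Equivalently, this is the share of the 40 "A" individuals who also have B: 10 out of 40.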
Sample
a set of observations drawn from a population.
Population
population refers to the total set of observations that can be made
Inference
Statistical inference is the process of using data analysis to infer properties of an underlying probability distribution, typically drawing conclusions about a population from a sample
Frequency Table
When a table shows frequency counts for a categorical variable, it is called a frequency table
Relative Frequency
To compute relative frequency, one obtains a frequency count for the total population and a frequency count for a subgroup of the population. The relative frequency for the subgroup is:
Relative frequency = Subgroup count / Total count
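The formula above can be sketched in a few lines of Python. The responses are made-up data, and the names `responses`, `counts`, and `relative` are chosen here for illustration only.

```python
from collections import Counter

# Hypothetical survey responses (made-up data).
responses = ["dog", "cat", "dog", "fish", "dog", "cat", "dog", "cat"]

counts = Counter(responses)      # frequency table: category -> count
total = sum(counts.values())     # total count

# Relative frequency = subgroup count / total count
relative = {category: count / total for category, count in counts.items()}

for category, value in relative.items():
    print(category, counts[category], value)
```

The relative frequencies of all categories always add up to 1 (or 100%).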
Table
an arrangement of data in rows and columns, or possibly in a more complex structure.
Interquartile Range
a measure of variability equal to the difference between the upper and lower quartiles (Q3 - Q1); it spans the middle 50% of the data
Five-Number Summary
A five-number summary consists of five values: the most extreme values in the data set (the maximum and minimum values), the lower and upper quartiles, and the median
summary
The information that gives a quick and simple description of the data
Standard Deviation
The standard deviation is the square root of the variance. Its symbol is the Greek letter sigma (σ) for a population; the letter s is used for a sample
Variance
The variance is the average of the squared differences from the mean. It is a numerical value used to indicate how widely individuals in a group vary. If individual observations vary greatly from the group mean, the variance is large; if they stay close to the mean, the variance is small.
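The two cards above can be checked with a short calculation. This sketch follows the "average of the squared differences" wording, which divides by the number of values (the population variance); the sample variance divides by one less than the number of values instead. The data are made up.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data

mu = mean(data)
# Population variance: average of the squared differences from the mean.
pop_variance = sum((x - mu) ** 2 for x in data) / len(data)
# Standard deviation: square root of the variance.
pop_std_dev = sqrt(pop_variance)

# The standard library gives the same population values,
# plus the sample variance (divides by n - 1).
sample_variance = variance(data)
print(pop_variance, pop_std_dev, pvariance(data), pstdev(data), sample_variance)
```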
Roundoff Error
the difference between a rounded number and the actual value.
Pie Chart
A pie chart (or a circle chart) is a circular statistical graphic, which is divided into slices to illustrate numerical proportion. In a pie chart, the arc length of each slice (and consequently its central angle and area), is proportional to the quantity it represents.
Two-Way Table
A two-way table of counts organizes data about two categorical variables. Values of the row variable label the rows that run across the table, and values of the column variable label the columns that run down the table
Marginal Distribution
The distribution of one of the variables in a two-way table, ignoring the other variable. Its counts or percentages are found in the margins of the table (the total row or total column).
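As an illustration, the marginal distributions of a small two-way table can be computed by totaling its rows and columns. The table, its labels, and the variable names below are all made up for this sketch.

```python
# Hypothetical two-way table of counts: rows are grade levels,
# columns are how students get to school (made-up data).
table = {
    "grade_9":  {"bus": 30, "car": 20},
    "grade_10": {"bus": 10, "car": 40},
}

# Row totals: the right-hand margin of the table.
row_totals = {row: sum(cells.values()) for row, cells in table.items()}

# Column totals: the bottom margin of the table.
column_totals = {}
for cells in table.values():
    for column, count in cells.items():
        column_totals[column] = column_totals.get(column, 0) + count

grand_total = sum(row_totals.values())

# Marginal distributions expressed as proportions of the grand total.
marginal_rows = {row: t / grand_total for row, t in row_totals.items()}
marginal_columns = {col: t / grand_total for col, t in column_totals.items()}
print(marginal_rows, marginal_columns)
```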
Distribution
a listing or function showing all the possible values (or intervals) of the data and how often they occur
Segmented Bar Graph
A bar graph for two categorical variables. Each bar represents one category of the first variable and is a whole (100%), divided into segments in proportion to the conditional distribution of the second variable within that category.
Side-by-side Bar Graph
Like the segmented bar graph, but the segments are placed next to each other instead of stacked on top of each other.
graph
A graph is a picture that represents data in an organized manner.
Association
any relationship between two measured quantities that renders them statistically dependent.
Simpson’s Paradox
an effect that occurs when the marginal association between two categorical variables is qualitatively different from the partial association between the same two variables after controlling for one or more other variables.
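A small numeric sketch can make the reversal concrete. The counts below are illustrative, chosen so that treatment A has the higher success rate inside each subgroup while treatment B has the higher rate once the subgroups are combined; the labels are hypothetical.

```python
# (successes, trials) for two treatments within two subgroups (illustrative counts).
results = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

within = {}    # success rate inside each subgroup (partial association)
overall = {}   # success rate with subgroups combined (marginal association)
for treatment, groups in results.items():
    within[treatment] = {g: s / n for g, (s, n) in groups.items()}
    total_successes = sum(s for s, _ in groups.values())
    total_trials = sum(n for _, n in groups.values())
    overall[treatment] = total_successes / total_trials

print(within)
print(overall)
```

The reversal happens because the two treatments were applied to the subgroups in very different proportions, so the subgroup variable is tangled up with the comparison.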
Dotplot
a graph that shows each data value as a dot above its location on a number line; used to compare frequency counts within categories or groups.
Shape
The shape of a distribution is described by its number of peaks and by its possession of symmetry, its tendency to skew, or its uniformity
Mode
The mode of a set of data values is the value that appears most often.
Center
The center of data is a single number that summarizes the entire data set. It is important to use the correct method for finding the center of data so you can accurately summarize the data set. You can do this by using either the mean or the median.
spread
dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile range.
Range
the difference between the highest and lowest values in a data set (maximum minus minimum)
Outlier
a data point that differs significantly from other observations.
Symmetric
a distribution whose left and right sides are approximately mirror images of each other when it is split down the center
Skewed Right
a distribution whose right tail (the larger values) is much longer than its left tail; the mean is typically greater than the median
Skewed Left
a distribution whose left tail (the smaller values) is much longer than its right tail; the mean is typically smaller than the median
Unimodal
a distribution with one clear peak or most frequent value
Bimodal
a probability distribution with two different modes. These appear as distinct peaks
Multimodal
a distribution with several distinct modes or peaks
Stemplot
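Also known as a stem-and-leaf plot. A display of quantitative data that splits each value into a stem (the leading digits) and a leaf (the final digit), showing the shape of the distribution while keeping the individual values.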
Splitting Stems
When the leaves on a stem-and-leaf plot get too crowded, you can split each stem into two rows, one for leaves 0-4 and one for leaves 5-9, instead of a single row for leaves 0-9.
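A rough sketch of building a stemplot, with and without split stems, is shown here. It assumes non-negative whole numbers, with the tens digit (and above) as the stem and the ones digit as the leaf; the function name `stemplot` and the data are hypothetical.

```python
def stemplot(values, split=False):
    """Return the rows of a stemplot; with split=True each stem gets two rows (leaves 0-4, 5-9)."""
    rows = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        half = 1 if (split and leaf >= 5) else 0
        rows.setdefault((stem, half), []).append(leaf)

    lines = []
    # Include every stem between the smallest and largest, even if it has no leaves.
    for stem in range(min(values) // 10, max(values) // 10 + 1):
        for half in ((0, 1) if split else (0,)):
            leaves = rows.get((stem, half), [])
            lines.append(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}".rstrip())
    return lines

data = [12, 15, 21, 23, 24, 27, 28, 29, 35]   # made-up data
print("\n".join(stemplot(data)))
print()
print("\n".join(stemplot(data, split=True)))
```

With split stems, the crowded "2" row becomes two shorter rows, which makes the shape of the distribution easier to see.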
Back-to-back Stemplot
Compares two data sets by placing a shared stem in the middle, with the leaves of one data set extending to the left and the leaves of the other extending to the right. This makes the two distributions easy to compare.
Plots
how people display their data so it is easy to compare. EX: box plots, stem and leaf plots, scatterplots, etc.
Histogram
a bar-graph-like display of quantitative data that groups a range of values into bins (classes) along the x-axis; the height of each bar shows how many values fall in that bin
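The binning behind a histogram can be sketched as a simple counting step. This sketch assumes equal-width bins that include their left edge and exclude their right edge; the function name, parameters, and data are made up for illustration.

```python
def histogram_counts(values, start, width, bins):
    """Count how many values fall in each equal-width bin [start + i*width, start + (i+1)*width)."""
    counts = [0] * bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < bins:          # ignore values outside the binned range
            counts[index] += 1
    return counts

data = [52, 55, 61, 64, 67, 68, 70, 73, 75, 88]   # made-up test scores
counts = histogram_counts(data, start=50, width=10, bins=4)

# Print a text version of the histogram, one row per bin.
for i, count in enumerate(counts):
    low = 50 + i * 10
    print(f"{low}-{low + 9} | {'*' * count}")
```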
Mean
the arithmetic average: add all the values together, then divide by the number of data points
median
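the middle value of the data set when the values are arranged in order; with an even number of values, the average of the two middle values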
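A short sketch comparing the mean and the median, using Python's standard `statistics` module, follows. The three data sets are made up; the second includes one very large value to show how a long right tail pulls the mean above the median.

```python
from statistics import mean, median

symmetric = [3, 5, 7, 9, 11]       # made-up, roughly symmetric data
skewed_right = [3, 5, 7, 8, 100]   # made-up data with one very large value
even_count = [1, 2, 3, 4]          # even number of values

# Symmetric data: mean and median agree.
print(mean(symmetric), median(symmetric))

# Right-skewed data: the large value raises the mean but not the median.
print(mean(skewed_right), median(skewed_right))

# Even number of values: the median is the average of the two middle values.
print(median(even_count))
```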