Summer Work Flashcards
Individuals
Individuals are the objects described by a set of data. Individuals may be people,
animals, or things.
Variable
A variable is any characteristic of an individual. A variable can take different values
for different individuals.
Categorical Variable
A categorical variable places an individual into one of several groups or categories.
Quantitative Variable
A quantitative variable takes numerical values for which it makes sense to find an
average.
Discrete Variables
If a variable can take on any value between its minimum value and its maximum value, it is called a continuous variable; otherwise, it is called a discrete variable.
Univariate Data
When we conduct a study that looks at only one variable, we say that we are working with univariate data. Suppose, for example, that we conducted a survey to estimate the average weight of high school students. Since we are only working with one variable (weight), we would be working with univariate data.
Bivariate Data
When we conduct a study that examines the relationship between two variables, we are working with bivariate data. Suppose we conducted a study to see if there were a relationship between the height and weight of high school students. Since we are working with two variables (height and weight), we would be working with bivariate data.
Population
A population is the total set of observations that can be made.
Sample
A sample is a set of observations drawn from a population.
Census
A census is a study that obtains data from every member of a population.
Distribution
The distribution of a variable tells us what values the variable takes and how often
it takes these values.
Frequency Table
When a table shows frequency counts for a categorical variable, it is called a frequency table. A bar chart can display the same data as a frequency table.
Relative Frequency
A frequency count is a measure of the number of times that an event occurs. To compute relative frequency, one obtains a frequency count for the total population and a frequency count for a subgroup of the population. The relative frequency for the subgroup is the frequency count for the subgroup divided by the frequency count for the total population.
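As a rough illustration, relative frequency can be computed as each subgroup's count divided by the total count. The sketch below uses made-up survey responses; the function name `relative_frequencies` and the data are assumptions for illustration, not part of the original flashcards.

```python
from collections import Counter

def relative_frequencies(values):
    """Return {category: count / total} for a list of categorical values."""
    counts = Counter(values)   # frequency count for each category
    total = len(values)        # frequency count for the whole data set
    return {category: count / total for category, count in counts.items()}

# Hypothetical survey responses (made-up data).
colors = ["red", "blue", "red", "green", "red", "blue", "red", "green"]
rel_freq = relative_frequencies(colors)
print(rel_freq)
```

The relative frequencies across all categories should add up to 1, which is a quick way to check the result.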
Bar Graph
A bar graph (also called a bar chart) is made up of columns or rows plotted on a graph. Each bar represents one category, and the height (or length) of the bar shows the frequency count for that category.
Two-Way Table
A two-way table (also called a contingency table) is a useful tool for examining relationships between categorical variables. The entries in the cells of a two-way table can be frequency counts or relative frequencies.
Marginal Distribution
The marginal distribution of one of the categorical variables in a two-way table of
counts is the distribution of values of that variable among all individuals described by
the table.
Conditional Distribution
A conditional distribution of a variable describes the values of that variable among individuals who have a specific value of another variable. There is a separate conditional distribution for each value of the other variable.
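A small sketch may help separate marginal and conditional distributions. It assumes a hypothetical two-way table of counts stored as nested dictionaries; the table values and the helper names `marginal_rows` and `conditional_given_row` are invented for illustration.

```python
# Hypothetical two-way table of counts: rows are gender, columns are favorite sport.
table = {
    "male":   {"soccer": 10, "tennis": 30},
    "female": {"soccer": 40, "tennis": 20},
}

def marginal_rows(table):
    """Marginal distribution of the row variable: row totals / grand total."""
    grand_total = sum(sum(row.values()) for row in table.values())
    return {name: sum(row.values()) / grand_total for name, row in table.items()}

def conditional_given_row(table, row_name):
    """Conditional distribution of the column variable within one row."""
    row = table[row_name]
    row_total = sum(row.values())
    return {column: count / row_total for column, count in row.items()}

marginal = marginal_rows(table)
cond_male = conditional_given_row(table, "male")
cond_female = conditional_given_row(table, "female")
print(marginal, cond_male, cond_female)
```

In this made-up table the two conditional distributions differ (soccer is the less common choice among males and the more common choice among females), so knowing the row value helps predict the column value.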
Association
We say that there is an association between two variables if knowing the value of
one variable helps predict the value of the other.
Dotplot
A dotplot is a type of graphic display used to compare frequency counts within categories or groups. As you might guess, a dotplot is made up of dots plotted on a graph.
Mode
The mode is the most frequently appearing value in a population or sample.
Range
The range is a simple measure of variation in a set of values. It is the difference between the largest and smallest values.
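The mode and the range can both be found with a few lines of code. This is a minimal sketch on made-up scores; it uses `statistics.multimode` (Python 3.8 or later), which returns a list so that ties between equally frequent values are not hidden.

```python
from statistics import multimode

# Hypothetical quiz scores (made-up data).
scores = [3, 7, 7, 2, 9, 7, 4, 2]

modes = multimode(scores)                # most frequent value(s), as a list
data_range = max(scores) - min(scores)   # largest value minus smallest value

print(modes, data_range)
```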
Outlier
In regression analysis, a data point that diverges greatly from the overall pattern of data is called an outlier.
Symmetric
Symmetry is an attribute used to describe the shape of a data distribution. When it is graphed, a symmetric distribution can be divided at the center so that each half is a mirror image of the other.
Skewed Right
Distributions with fewer observations on the right (toward higher values) are said to be skewed right.
Skewed Left
Distributions with fewer observations on the left (toward lower values) are said to be skewed left.
Unimodal
Distributions of data can have few or many peaks. Distributions with one clear peak are called unimodal.
Bimodal
Distributions with two clear peaks are called bimodal.
Multimodal
Distributions with more than two clear peaks are called multimodal.
Stemplot
A stemplot is used to display quantitative data, generally from small data sets (50 or fewer observations).
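One way to see how a stemplot is built is to generate one as text. The sketch below assumes non-negative whole numbers, with the tens digit(s) as the stem and the ones digit as the leaf; the function name `stemplot` and the data are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return text lines of a stemplot: tens as stems, ones digits as leaves."""
    stems = defaultdict(list)
    for v in sorted(values):            # sorting keeps leaves in increasing order
        stems[v // 10].append(v % 10)
    lines = []
    # Walk every stem from smallest to largest so empty stems still appear.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

# Hypothetical small data set (made-up values).
data = [12, 15, 21, 23, 23, 30, 47]
for line in stemplot(data):
    print(line)
```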
Back-to-back Stemplot
Back-to-back stemplots are a graphic option for comparing data from two populations.
Histogram
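A histogram is made up of columns plotted on a graph. Each column represents a range of values of a quantitative variable, and the height of the column shows how many observations fall in that range.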
Mean
A mean score is an average score, often denoted by X̄ (read "X-bar"). It is the sum of individual scores divided by the number of individuals.
Median
The median is a simple measure of central tendency. To find the median, we arrange the observations in order from smallest to largest value. If there is an odd number of observations, the median is the middle value. If there is an even number of observations, the median is the average of the two middle values.
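The mean and median rules can be sketched directly in code. The function names and the data below are made up for illustration; Python's standard `statistics` module offers equivalent `mean` and `median` functions.

```python
def mean(values):
    """Sum of the values divided by how many values there are."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                       # odd count: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the middle pair

# Hypothetical data sets (made-up values).
odd_data = [5, 1, 9, 3, 7]
even_data = [5, 1, 9, 3]

print(mean(odd_data), median(odd_data), median(even_data))
```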
Interquartile Range
The interquartile range (IQR) is a measure of variability, based on dividing a data set into quartiles.
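The IQR is the third quartile minus the first quartile (Q3 - Q1). Textbooks and software use slightly different rules for locating quartiles; the sketch below assumes the "median of each half" rule, leaving out the middle value when the count is odd. The function name `quartiles` and the data are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-each-half rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd count, the middle value belongs to neither half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)

# Hypothetical data set (made-up values).
data = [1, 3, 5, 7, 9, 11, 13, 15]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

Other quartile conventions can give slightly different values on the same data, so results from a calculator or software package may not match this rule exactly.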
Boxplot
A boxplot, sometimes called a box and whisker plot, is a type of graph used to display patterns of quantitative data.
Standard Deviation
The standard deviation is a numerical value used to indicate how widely individuals in a group vary. If individual observations vary greatly from the group mean, the standard deviation is big; and vice versa.
Variance
The variance is a numerical value used to indicate how widely individuals in a group vary. If individual observations vary greatly from the group mean, the variance is big; and vice versa. The standard deviation is the square root of the variance.
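The following sketch shows how variance and standard deviation relate to each other. It assumes the usual conventions: divide the sum of squared deviations by n - 1 for a sample, or by n for a whole population. The function names, the `sample` flag, and the data are illustrative choices, not taken from the flashcards.

```python
import math

def variance(values, sample=True):
    """Average squared deviation from the mean (n - 1 divisor for a sample)."""
    n = len(values)
    m = sum(values) / n
    sum_of_squares = sum((x - m) ** 2 for x in values)
    return sum_of_squares / (n - 1) if sample else sum_of_squares / n

def std_dev(values, sample=True):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(values, sample))

# Hypothetical data set (made-up values).
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(variance(data, sample=False), std_dev(data, sample=False))
print(variance(data), std_dev(data))
```

Because the standard deviation is a square root, it is in the same units as the data, while the variance is in squared units.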