Chap I Flashcards
Individuals
The objects described by a set of data - can be people, animals, or things
Variable
Any characteristic of an individual - can take different values for different individuals
Categorical Variable
Places an individual into one of several groups or categories - values are names or labels
Quantitative Variable
Takes numerical values for which it makes sense to find an average - represents a measurable quantity
Discrete Variables
A variable that can take only certain separate values, not every value between its minimum and maximum - for example, when flipping a coin repeatedly, the number of heads can be any whole number from 0 upward, but it can never be a value such as 2.5.
Continuous Variable
A variable that can take on any value between its minimum and maximum value - for example, the weight of a firefighter between 150-250 pounds, because the firefighter’s weight could be any value between 150-250 pounds.
Univariate Data
Data from a study that looks at only one variable - e.g. the weights of high school students
Bivariate Data
Data from a study that examines the relationship between two variables - e.g. the heights and weights of high school students.
Population
The total set of observations that can be made
Sample
A set of observations drawn from a population
Census
A study that obtains data from every member of a population - often not practical because of the time/cost involved.
Distribution
Tells us what values the variable takes and how often it takes those values
Inference
Drawing conclusions that go beyond the data at hand - how far the conclusions can be trusted depends on how the data were produced
Frequency Table
Displays the count (frequency) of individuals in each category of a variable
Relative Frequency Table
Displays the percentage (relative frequency) of individuals in each category of a variable
Interquartile Range (IQR)
Measures the range of the middle 50% of the data - a measure of variability, equal to Q3 - Q1.
Five-Number Summary
Consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation - divides the distribution roughly into quarters.
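A minimal Python sketch of the five-number summary and IQR. It assumes Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the overall median when the count is odd; other quartile conventions exist and can give slightly different values. The function name and the data are made up for illustration.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    excluding the overall median when the number of observations is odd.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]             # observations below the median position
    upper = values[half + (n % 2):]   # observations above the median position
    return (values[0], median(lower), median(values), median(upper), values[-1])


scores = [2, 4, 5, 7, 8, 10, 12, 15, 20]  # made-up data
minimum, q1, med, q3, maximum = five_number_summary(scores)
iqr = q3 - q1  # interquartile range: Q3 - Q1
print(minimum, q1, med, q3, maximum, iqr)
```

Here the lower half is 2, 4, 5, 7 and the upper half is 10, 12, 15, 20, so the box of a boxplot for this data would run from 4.5 to 13.5.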
Boxplot
A type of graph used to display patterns of quantitative data by splitting the data into quartiles. It consists of a box running from Q1 to Q3, a line inside the box marking the median, and lines, or whiskers, extending from the box to the largest and smallest observations that aren’t outliers.
Standard Deviation
A numerical value used to indicate how widely individuals in a group vary - measures the typical deviation from the mean, and the formula differs for a population versus a sample. Standard deviation for a population is found using $\sigma = \sqrt{\sum (X_i - \mu)^2 / N}$ and standard deviation for a sample is found using $s = \sqrt{\sum (x_i - \bar{x})^2 / (n - 1)}$, where $\mu$ is the population mean and $\bar{x}$ is the sample mean.
Variance
A numerical value used to indicate how widely objects in a group vary - equal to the square of the standard deviation. Variance of a population is found using $\sigma^2 = \sum (X_i - \mu)^2 / N$ and variance of a sample is found using $s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$, where $\mu$ is the population mean and $\bar{x}$ is the sample mean.
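A short Python sketch of the population and sample versions of variance and standard deviation, using the standard library `statistics` module. The data values are made up, and `sum_squared_deviations` is a hypothetical helper included only to show the sum that both formulas share.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [4, 8, 6, 5, 3, 7]  # made-up observations


def sum_squared_deviations(values):
    """Sum of squared deviations from the mean (the numerator in both formulas)."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)


ss = sum_squared_deviations(data)

pop_var = ss / len(data)         # population variance: divide by N
samp_var = ss / (len(data) - 1)  # sample variance: divide by n - 1

# The library functions use the same two divisors.
print(pop_var, pvariance(data))
print(samp_var, variance(data))

# Standard deviation is the square root of the matching variance.
print(pstdev(data), stdev(data))
```

Because the sample version divides by n - 1 rather than N, it is always a little larger than the population version for the same numbers.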
Roundoff Error
When the exact percentages add up to 100%, but the rounded percentages only come close - does not indicate mistakes in work
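A small Python sketch of a relative frequency table that shows roundoff error. The category names and counts are made up for illustration.

```python
counts = {"A": 1, "B": 1, "C": 1}  # made-up frequency table
total = sum(counts.values())

# Relative frequencies as exact percentages.
exact = {category: 100 * count / total for category, count in counts.items()}

# The same percentages rounded to whole numbers.
rounded = {category: round(pct) for category, pct in exact.items()}

print(sum(exact.values()))    # essentially 100, up to floating-point precision
print(sum(rounded.values()))  # the rounded values fall short of 100
```

Each category is exactly one third of the total, so each rounded percentage is 33 and the rounded column sums to 99 even though nothing was miscounted.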
Pie Chart
Shows the distribution of a categorical variable as a pie, with slices sized by count or percentage per category - must include all the categories that make up the whole
Bar chart
Represents each category as a bar, where heights show the count or percentage - more flexible than a pie chart, since it can display the distribution of a categorical variable or compare quantities measured in the same units.
Two-Way Table
A table of counts that describes the relationship between two categorical variables - contains a row variable and a column variable
Marginal Distribution
The distribution of values of one of the categorical variables in a two-way table of counts, among all individuals described by the table. Percentages are often more informative than counts: divide each row/column total by the table total and convert to a percentage.
Conditional Distribution
Describes the values of one variable among individuals who have a specific value of another variable - there is a separate conditional distribution for each value of the other variable, and it is often given as relative frequencies
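A minimal Python sketch of marginal and conditional distributions computed from a two-way table. The table is hypothetical: the group labels, the yes/no response, and all counts are made up for illustration.

```python
# Hypothetical two-way table of counts: row variable = group, column variable = response.
table = {
    "female": {"yes": 30, "no": 20},
    "male": {"yes": 10, "no": 40},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of the row variable: row total / table total, as a percentage.
marginal_rows = {
    group: 100 * sum(row.values()) / grand_total
    for group, row in table.items()
}

# Conditional distribution of the response within each group: cell count / row total.
conditional = {
    group: {answer: 100 * count / sum(row.values()) for answer, count in row.items()}
    for group, row in table.items()
}

print(marginal_rows)
print(conditional)
```

The marginal distribution ignores the response entirely, while each conditional distribution looks at one group at a time. The two conditional distributions differ here, which is what an association between the variables looks like.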
Segmented Bar Graph
A bar graph with one bar for each category of one variable (ex: male/female), where each bar is divided into connected segments for the categories of a second variable and the segments in each bar add up to 100%.
Side-by-Side Bar Graph
A bar graph that compares groups (ex: male/female) by drawing a separate bar for each group next to one another, with the set of bars repeated for each category of a second variable
Association
When knowing the value of one variable helps to predict the value of the other
Simpson’s Paradox
An effect where the marginal association between two categorical variables is qualitatively different from (often the reverse of) the partial association between the same two variables once a third variable is taken into account - tldr - averages can be misleading
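A small Python sketch of how this reversal can happen. All counts are hypothetical and chosen only to produce the effect: two treatments, X and Y, each tried on mild and severe cases.

```python
# Hypothetical (successes, total) counts, made up for illustration.
counts = {
    "X": {"mild": (90, 100), "severe": (300, 500)},
    "Y": {"mild": (400, 500), "severe": (50, 100)},
}


def success_rate(successes, total):
    return successes / total


def overall_rate(groups):
    """Success rate after combining all subgroups into one."""
    successes = sum(s for s, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return successes / total


for severity in ("mild", "severe"):
    x_rate = success_rate(*counts["X"][severity])
    y_rate = success_rate(*counts["Y"][severity])
    print(severity, x_rate, y_rate)  # X is higher within each subgroup

print("overall", overall_rate(counts["X"]), overall_rate(counts["Y"]))  # Y is higher overall
```

X does better within mild cases and within severe cases, yet Y does better when the groups are combined, because X was used mostly on severe cases and Y mostly on mild ones.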
Dotplot
A plot where each data value is shown as a dot above its location on a number line.
Shape
Describes the way a graph looks - focus on the main features, such as major peaks, clusters, obvious gaps, and potential outliers
Mode
the most common value
Center
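The midpoint of the data - a single value, such as the mean or median, that describes where the middle of the distribution falls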
Spread
How much the data vary - similar to range, but stated as an interval rather than a single value: the data vary from __ to __
Range
A measure of variability that shows the full spread of the data - a single value found by subtracting the smallest value from the largest value
Outlier
Any observation that falls more than 1.5 x IQR above the third quartile or below the first quartile
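A minimal Python sketch of the 1.5 x IQR rule. The function names and data are made up, and the quartiles are passed in as given values because different quartile conventions can produce slightly different cutoffs.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) cutoffs from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """Return the observations that fall outside the fences."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]


values = [2, 4, 5, 7, 8, 10, 12, 15, 40]  # made-up data
q1, q3 = 4.5, 13.5                        # quartiles assumed known for this data

print(outlier_fences(q1, q3))
print(find_outliers(values, q1, q3))
```

With an IQR of 9, the fences sit 13.5 units beyond each quartile, so only the value 40 is flagged.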
Symmetric Distribution
When the right and left sides of a graph are approximately mirror images of each other
Skewed Right
When the right side of the graph is much longer than the left - the skew is named for the direction of the long tail
Skewed Left
When the left side of the graph is much longer than the right - the skew is named for the direction of the long tail
Unimodal
having a single peak
Bimodal
having two clear peaks
Multimodal
having more than two clear peaks
Stemplot
A plot used to display quantitative data, usually from smaller data sets, consisting of a stem (all but the final digit of an observation) and leaves (the final digit of an observation)
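A rough Python sketch of how a stemplot is built. It assumes non-negative whole-number data, and the `stemplot` function name and the data are made up for illustration.

```python
from collections import defaultdict


def stemplot(values):
    """Return the lines of a stemplot for non-negative integers.

    The stem is every digit except the last; the leaf is the last digit.
    """
    leaves = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)

    lines = []
    # Include stems with no leaves so that gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = "".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {leaf_text}".rstrip())
    return lines


for line in stemplot([12, 15, 21, 23, 23, 40]):  # made-up data
    print(line)
```

Keeping the empty stem for the 30s is a deliberate choice: dropping it would hide the gap between 23 and 40.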
Splitting Stems
Dividing a stem into further pieces - e.g. a stem with leaves 0-9 becomes two copies of the same stem, one holding leaves 0-4 and the other holding leaves 5-9
Back-to-Back Stem
A stemplot where leaves are on either side of the stem, often to compare two different groups of data
Plots
A graphing technique used to represent a data set, often showing the relationship between two or more variables
Histogram
A graph of the distribution of quantitative data where nearby values are grouped together into classes, each shown as a bar
Mean
An average score that shows how large each data value would be if the total were split equally amongst the observations - found by adding up the individual scores and dividing the sum by the number of individuals. Not a resistant measure of center.
Median
The midpoint of a distribution, where about half of the observations are smaller than the value and about half are larger. Resistant measure of center.
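A short Python sketch of what "resistant" means, comparing how the mean and the median react to one extreme value. The numbers are made up for illustration.

```python
from statistics import mean, median

salaries = [30, 32, 35, 38, 40]   # made-up values, in thousands
with_outlier = salaries + [400]   # add one extreme observation

print(mean(salaries), median(salaries))          # both sit at the middle of the data
print(mean(with_outlier), median(with_outlier))  # the mean jumps, the median barely moves
```

One extreme value pulls the mean from 35 to roughly 96, while the median only shifts from 35 to 36.5.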