Chapter 8: Elementary Quantitative Data Analysis Flashcards
Quantitative data analysis
Statistical techniques used to describe and analyze variation in quantitative measures
Statistic
a numerical description of some feature of a variable or variables in a sample from a larger population
Descriptive statistics
Statistics used to describe the distribution of and relationship among variables
Inferential statistics
Statistics used to estimate how likely it is that a statistical result based on data from a random sample is representative of the population from which the sample is assumed to have been selected
Data cleaning
The process of checking data for errors after the data have been entered in a computer file
Central tendency
The most common value (for variables measured at the nominal level) or the value around which cases tend to center (for a quantitative variable)
Variability
The extent to which cases are spread out through the distribution or clustered around just one value
Skewness
The extent to which cases are clustered more at one or the other end of the distribution of a quantitative variable rather than in a symmetric pattern around its center. Skew can be positive (a right skew), with the number of cases tapering off in the positive direction, or negative (a left skew), with the number of cases tapering off in the negative direction
Three features in describing the shape of the distribution
- Central Tendency
- Variability
- Skewness
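The direction of skew can be checked numerically. The sketch below uses the moment-based skewness coefficient (the average cubed standardized deviation), which is one common convention and an assumption here, since the card does not name a formula. The data and the function name `skewness` are hypothetical.

```python
import math

def skewness(values):
    """Moment-based skewness: the average cubed standardized deviation."""
    n = len(values)
    mean = sum(values) / n
    # Standard deviation computed with N in the denominator.
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sum(((x - mean) / sd) ** 3 for x in values) / n

symmetric = [1, 2, 3, 4, 5]              # cases balanced around the center
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10] # a long tail toward high values
left_skewed = [-x for x in right_skewed] # mirror image: tail toward low values

print(skewness(symmetric))     # approximately 0
print(skewness(right_skewed))  # positive: right (positive) skew
print(skewness(left_skewed))   # negative: left (negative) skew
```

A value near zero indicates a roughly symmetric distribution; the sign of the coefficient matches the direction in which the cases taper off.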
Bar chart
A graphic for qualitative variables in which the variable’s distribution is displayed with solid bars separated by spaces
Histogram
A graphic for quantitative variables in which the variable’s distribution is displayed with adjacent bars
Frequency polygon
A graphic for quantitative variables in which a continuous line connects data points representing the variable’s distribution
Frequency distribution
Numerical display showing the number of cases, and usually the percentage of cases (the relative frequencies), corresponding to each value or group of values of a variable
Percentage
The relative frequency, computed by dividing the frequency of cases in a particular category by the total number of cases and multiplying by 100
Base number (N)
The total number of cases in a distribution
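As an illustration of how a frequency distribution, its percentages, and the base number N fit together, here is a minimal sketch. The responses and the function name `frequency_distribution` are hypothetical.

```python
from collections import Counter

# Hypothetical responses to a nominal-level survey item (illustrative data only).
responses = ["yes", "no", "yes", "undecided", "yes", "no", "yes", "no"]

def frequency_distribution(values):
    """Return {value: (frequency, percentage)} and the base number N."""
    n = len(values)           # base number (N): total number of cases
    counts = Counter(values)  # frequency of each value
    table = {value: (freq, 100 * freq / n) for value, freq in counts.items()}
    return table, n

table, n = frequency_distribution(responses)
for value, (freq, pct) in sorted(table.items()):
    print(f"{value:<10} {freq:>3} {pct:6.1f}%")
print(f"{'N':<10} {n:>3}")
```

Each percentage is the category frequency divided by N and multiplied by 100, so the percentages sum to 100 across categories.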
Mode (probability average)
The most frequent value in a distribution; also termed the probability average
Bimodal
A distribution in which two nonadjacent categories have about the same number of cases and these categories have more cases than any others
Unimodal
A distribution of a variable in which only one value is the most frequent
Median
The position average, or the point, that divides a distribution in half (the 50th percentile)
Mean
The arithmetic, or weighted, average computed by adding the values of all the cases and dividing by the total number of cases
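The three measures of central tendency can be compared on one small data set. This sketch uses Python's standard `statistics` module; the income and rating values are hypothetical and chosen so that one outlier pulls the mean above the median.

```python
import statistics

# Hypothetical incomes in thousands; 175 is an exceptionally high value.
incomes = [20, 25, 25, 30, 35, 40, 175]

mode = statistics.mode(incomes)      # most frequent value
median = statistics.median(incomes)  # middle case of the ordered values
mean = statistics.mean(incomes)      # sum of values divided by number of cases

print(mode, median, mean)

# A hypothetical bimodal distribution: two nonadjacent values tie for most frequent.
ratings = [1, 1, 1, 2, 3, 4, 5, 5, 5]
modes = statistics.multimode(ratings)
print(modes)
```

Here the mode is 25, the median is 30, and the mean is 50. The mean is sensitive to the single high value, while the median and mode are not.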
Range
The true upper limit in a distribution minus the true lower limit (or the highest rounded value minus the lowest rounded value, plus 1)
Outlier
An exceptionally high or low value in a distribution
Interquartile range
The range in a distribution between the end of the 1st quartile and the beginning of the 3rd quartile
Quartiles
The points in a distribution corresponding to the first 25% of the cases, the first 50% of the cases, and the first 75% of the cases
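The range, quartiles, and interquartile range can be sketched as follows. The scores are hypothetical. Quartile cut points differ slightly between textbooks and software; the `"inclusive"` method of `statistics.quantiles` is an assumption made for this example, not a convention stated in these cards.

```python
import statistics

# Hypothetical scores, already rounded to whole numbers.
scores = [2, 4, 6, 8, 10, 12, 14, 16, 18]

# Range for rounded values: highest value minus lowest value, plus 1.
value_range = max(scores) - min(scores) + 1

# Cut points below which 25%, 50%, and 75% of the cases fall.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

# Interquartile range: distance between the first and third quartiles.
iqr = q3 - q1

print(value_range, q1, q2, q3, iqr)
```

With these data the range is 17, the quartiles are 6, 10, and 14, and the interquartile range is 8. The second quartile is the median.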
Variance
A statistic that measures the variability of a distribution as the average squared deviation of each case from the mean
Standard deviation
The square root of the average squared deviation of each case from the mean
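The two definitions above translate directly into a short calculation. This sketch follows the cards literally and divides by the number of cases N; many statistical packages instead divide by N - 1 when estimating from a sample, which gives a slightly larger result. The values are hypothetical.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

mean = sum(values) / len(values)
squared_deviations = [(x - mean) ** 2 for x in values]

# Variance: the average squared deviation of each case from the mean.
variance = sum(squared_deviations) / len(values)

# Standard deviation: the square root of the variance.
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)

# Cross-check against the standard library's population formulas.
print(statistics.pvariance(values), statistics.pstdev(values))
```

For these values the mean is 5, the variance is 4, and the standard deviation is 2.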
Normal distribution
A symmetric distribution shaped like a bell and centered around the population mean, with the number of cases tapering off in a predictable pattern on both sides of the mean
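The predictable pattern can be made concrete by computing the share of cases that fall within one, two, and three standard deviations of the mean. This is a minimal sketch using `statistics.NormalDist` (Python 3.8 or later); the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of cases within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * share_within(k):.1f}%")
```

The output shows roughly 68%, 95%, and 99.7% of cases within one, two, and three standard deviations, the same on both sides of the mean because the distribution is symmetric.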
Cross-tabulation (crosstab)
In the simplest case, a bivariate (two-variable) distribution showing the distribution of one variable for each category of another variable; can also be elaborated using three or more variables
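A bivariate cross-tabulation can be sketched with a counter over pairs of values. The variables (education and voting) and all counts below are hypothetical. Percentages are computed within each category of the first variable, so the distribution of the second variable can be compared across categories.

```python
from collections import Counter

# Hypothetical cases as (education, voting) pairs.
cases = (
    [("high school", "voted")] * 30
    + [("high school", "did not vote")] * 20
    + [("college", "voted")] * 40
    + [("college", "did not vote")] * 10
)

def crosstab_percentages(pairs):
    """Percentage distribution of the second variable within each category of the first."""
    cell_counts = Counter(pairs)
    group_totals = Counter(group for group, _ in pairs)
    return {
        (group, outcome): 100 * count / group_totals[group]
        for (group, outcome), count in cell_counts.items()
    }

percentages = crosstab_percentages(cases)
for (group, outcome), pct in sorted(percentages.items()):
    print(f"{group:<12} {outcome:<13} {pct:5.1f}%")
```

In this made-up table 60% of the high school group and 80% of the college group voted, a 20-point difference across categories.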
Measure of association
A type of descriptive statistic that summarizes the strength of an association
Gamma
A measure of association that is sometimes used in cross-tabular analysis
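The card does not give a formula. Assuming it refers to Goodman and Kruskal's gamma for ordinal variables, the statistic is (C - D) / (C + D), where C is the number of concordant pairs of cases and D the number of discordant pairs. A minimal sketch under that assumption, with a hypothetical table of counts whose rows and columns are both ordered from low to high:

```python
def gamma(table):
    """Goodman and Kruskal's gamma for a table of counts with ordered rows and columns."""
    concordant = 0
    discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            # Pair each cell with every cell in a lower row.
            for k in range(i + 1, n_rows):
                for l in range(n_cols):
                    if l > j:    # higher on both variables: concordant
                        concordant += table[i][j] * table[k][l]
                    elif l < j:  # higher on one, lower on the other: discordant
                        discordant += table[i][j] * table[k][l]
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical 2 x 2 table of counts.
observed = [[20, 10],
            [10, 20]]
print(gamma(observed))
```

For this table there are 400 concordant and 100 discordant pairs, so gamma is 0.6. Values range from -1 to +1, and 0 indicates no association.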
Chi-square
An inferential statistic used to test hypotheses about relationships between two or more variables in a cross-tabulation
Statistical significance
The mathematical likelihood that an association is not the result of chance, judged by a criterion the analyst sets (often that the probability is less than 5 out of 100, or p < .05)
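To show how chi-square and the 5-out-of-100 criterion work together, the sketch below computes the Pearson chi-square statistic for a hypothetical 2 x 2 cross-tabulation and compares it with 3.841, the standard critical value for one degree of freedom at the .05 level. The table, the function name, and the constant name are illustrative assumptions.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Frequency expected in this cell if the variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed_count - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical 2 x 2 table of counts.
observed = [[20, 10],
            [10, 20]]
statistic, df = chi_square(observed)

CRITICAL_VALUE_DF1_P05 = 3.841  # chi-square critical value, df = 1, .05 level
significant = statistic > CRITICAL_VALUE_DF1_P05

print(round(statistic, 3), df, significant)
```

Every expected frequency here is 15, the statistic is about 6.667 with 1 degree of freedom, and it exceeds 3.841, so the association would be judged statistically significant at the .05 level.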
Extraneous variable
A variable that influences both the independent and dependent variables to create a spurious association between them that disappears when the extraneous variable is controlled
Elaboration analysis
The process of introducing a third variable into an analysis to better understand - to elaborate - the bivariate (two-variable) relationship under consideration; additional control variables also can be introduced
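A small constructed example shows what elaboration can reveal. In the hypothetical counts below, X and Y appear related in the bivariate table, but the relationship vanishes within each category of a control variable Z, which is the pattern a spurious association produces. All names and counts are invented for illustration.

```python
# Each tuple: (control Z, independent X, dependent Y, number of cases).
cells = [
    ("z=0", 0, 0, 36), ("z=0", 0, 1, 4), ("z=0", 1, 0, 9), ("z=0", 1, 1, 1),
    ("z=1", 0, 0, 1), ("z=1", 0, 1, 9), ("z=1", 1, 0, 4), ("z=1", 1, 1, 36),
]

def percent_y1_by_x(rows):
    """Percentage of cases with Y = 1 within each category of X."""
    totals, y1 = {}, {}
    for _, x, y, count in rows:
        totals[x] = totals.get(x, 0) + count
        y1[x] = y1.get(x, 0) + (count if y == 1 else 0)
    return {x: 100 * y1[x] / totals[x] for x in totals}

# Bivariate relationship, ignoring Z.
bivariate = percent_y1_by_x(cells)

# The same relationship within each category of Z (Z controlled).
partials = {
    z: percent_y1_by_x([row for row in cells if row[0] == z])
    for z in ("z=0", "z=1")
}

print(bivariate)
print(partials)
```

Ignoring Z, 26% of the X = 0 cases and 74% of the X = 1 cases have Y = 1. Within each category of Z the percentages are identical (10% and 10%, then 90% and 90%), so the original association is accounted for by Z.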
Secondary data analysis
The method of using preexisting data in a different way or to answer a different research question than intended by those who collected the data
Secondary data
Previously collected data that are used in a new analysis
Big data
Data accessible in computer-readable form that are produced by people, available to social scientists, and manageable with today’s computers
Ngrams
Frequency graphs produced by Google’s database of all words printed in more than one third of the world’s books over time (with coverage still expanding).