Week 1 Flashcards
Population
Consists of all the members of a group about which you want to draw a conclusion
Sample
The portion of the population selected for analysis
Parameter
A numerical measure that describes a characteristic of a population.
Statistic
A numerical measure that describes a characteristic of a sample.
Descriptive Statistics
Collecting (e.g. survey), summarizing, presenting (e.g. tables and graphs) and characterizing (e.g. sample mean) data.
Inferential Statistics
Drawing conclusions about a population based on sample data (i.e. estimating a parameter based on a statistic).
Example of Inferential Statistics
Estimate the population mean weight (parameter) using the sample mean (statistic).
Hypothesis testing - e.g. Test the claim that the population mean weight is 100 pounds.
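A minimal Python sketch of the estimation example above. The population is simulated (10,000 made-up weights centered near 100 pounds) and the sample size of 50 is an arbitrary choice; none of these numbers come from the flashcards.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 simulated weights in pounds.
population = [random.gauss(100, 15) for _ in range(10_000)]

# Sample: a portion of the population selected for analysis.
sample = random.sample(population, 50)

# Parameter: describes the population (usually unknown in practice).
parameter = statistics.mean(population)

# Statistic: describes the sample and serves as the estimate of the parameter.
statistic = statistics.mean(sample)

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

The two values will usually be close but not identical; that gap is the sampling error that inferential methods try to quantify.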
Types of Data
Categorical, Numerical Discrete, Numerical Continuous
Categorical Data
Simply classifies data into categories (e.g. marital status, hair color, gender)
Numerical Discrete
Counted items - finite number of items (e.g. number of children, number of people who have type-O blood)
Numerical Continuous
Measured characteristics - infinite number of possible values (e.g. weight, height)
Levels of Measurement and Measurement Scales
Highest level - Ratio Data
*Differences between measurements are meaningful and a true zero exists (height, weight, age, weekly food spending)
Interval Data
*Differences between measurements are meaningful but there is no true zero (temperature in Celsius, standardized exam scores)
Ordinal Data
*Ordered categories (rankings, order or scaling - tournament rankings, student letter grades, Likert scales)
Lowest level - Nominal Data
*Categories (no ordering or direction - marital status, type of car owned, gender, hair color)
Categorical Data (tables and charts)
Summary table
Graphing data - bar charts, pie charts
Numerical Data (tables and charts)
Ordered array, stem and leaf display, histogram, frequency and cumulative distributions
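As a rough illustration of the ordered array, frequency distribution and cumulative distribution listed above, here is a short Python sketch. The ten values and the class width of 10 are made up for the example.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical numerical data (e.g. ages of ten people).
values = [23, 35, 41, 18, 27, 35, 52, 46, 31, 29]

# Ordered array: the data ranked from smallest to largest.
ordered = sorted(values)

# Frequency distribution: count the values falling in each class of width 10.
# A class is labeled by its lower boundary, e.g. 20 means "20 to under 30".
width = 10
counts = Counter((v // width) * width for v in values)
classes = sorted(counts)
freqs = [counts[c] for c in classes]

# Cumulative distribution: running total of the class frequencies.
cumulative = list(accumulate(freqs))

print("ordered array:", ordered)
for lower, f, cum in zip(classes, freqs, cumulative):
    print(f"{lower} to under {lower + width}: frequency {f}, cumulative {cum}")
```

Classes with no observations do not appear in this sketch; a fuller version would list them with a frequency of 0.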
Examples of describing central tendency
Mean, Median, Mode, Geometric mean
Examples of describing variation
Range, interquartile range, variance, standard deviation, coefficient of variation
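The measures of central tendency and variation named above can be computed with Python's standard `statistics` module. This is a sketch on made-up data (11 hypothetical weights in pounds). It treats the data as a sample, so variance and standard deviation divide by n - 1, and it uses the "exclusive" quantile method because that one places quartiles at multiples of (n+1)/4.

```python
import statistics

# Hypothetical sample of 11 weights in pounds (made up for illustration).
data = [92, 95, 98, 100, 100, 101, 103, 105, 107, 110, 150]

# Central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
geometric_mean = statistics.geometric_mean(data)

# Variation
data_range = max(data) - min(data)
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1                          # interquartile range
variance = statistics.variance(data)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)       # sample standard deviation
cv = std_dev / mean * 100              # coefficient of variation, in percent

print(f"mean={mean:.2f} median={median} mode={mode} geometric mean={geometric_mean:.2f}")
print(f"range={data_range} IQR={iqr} variance={variance:.2f} "
      f"std dev={std_dev:.2f} CV={cv:.1f}%")
```

The single large value (150) pulls the mean above the median, which hints at a right-skewed shape.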
Examples of describing shape
Skewness
Median
Main advantage over the mean is that it is not affected by extreme values
Mode
- Not affected by extreme values
- Unlike the mean and median, there may be no unique (single) mode for a given data set
- Used for either numerical or categorical (nominal) data
- Least used of the 3
Mean
Generally used most often, unless extreme values (outliers) exist.
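To see why outliers matter for the mean but not for the median or mode, the sketch below adds one extreme value to a small made-up data set and compares the three measures before and after.

```python
import statistics

# Hypothetical data without and with one extreme value (outlier).
base = [92, 95, 98, 100, 100, 101, 103, 105, 107, 110]
with_outlier = base + [400]

# How far does each measure move when the outlier is added?
mean_shift = statistics.mean(with_outlier) - statistics.mean(base)
median_shift = statistics.median(with_outlier) - statistics.median(base)
mode_shift = statistics.mode(with_outlier) - statistics.mode(base)

print(f"mean moves by   {mean_shift:.2f}")
print(f"median moves by {median_shift:.2f}")
print(f"mode moves by   {mode_shift}")
```

The mean jumps by a large amount, the median barely moves (only because the count changed from even to odd), and the mode does not move at all.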
Quartiles
Split the ranked data into four segments, with an equal number of values per segment
The first quartile (Q1)
The value for which 25% of the observations are smaller and 75% are larger.
Q1 position = (n+1)/4
The second quartile (Q2)
Q2 is the same as the median (50% are smaller, 50% are larger)
Q2 position = (n+1)/2 (median)
The third quartile (Q3)
Only 25% of the observations are greater than the third quartile.
Q3 position = 3(n+1)/4
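The position formulas above can be applied as in the following Python sketch. The helper names and the nine data values are made up for illustration. The cards do not say what to do when a position is not a whole number, so this sketch assumes linear interpolation between the two neighboring ranked values; some textbooks round the position instead, which can give slightly different answers. It also assumes n is at least 3 so that every position falls inside the data.

```python
def quartile_positions(n):
    """Return the 1-based ranked positions of Q1, Q2 and Q3 for n values."""
    return (n + 1) / 4, (n + 1) / 2, 3 * (n + 1) / 4


def value_at_position(ranked, position):
    """Return the value at a 1-based position in ranked data.

    Assumption: fractional positions are resolved by linear interpolation
    between the two neighboring ranked values.
    """
    lower = int(position)          # whole part of the position
    fraction = position - lower    # fractional part of the position
    if fraction == 0:
        return ranked[lower - 1]
    below, above = ranked[lower - 1], ranked[lower]
    return below + fraction * (above - below)


# Hypothetical data set with n = 9 values.
data = [16, 11, 21, 12, 18, 13, 22, 16, 17]
ranked = sorted(data)  # quartiles are defined on ranked data

positions = quartile_positions(len(ranked))
q1, q2, q3 = (value_at_position(ranked, p) for p in positions)

print("ranked data:", ranked)
print("positions of Q1, Q2, Q3:", positions)
print("Q1, Q2, Q3:", q1, q2, q3)
```

With n = 9 the positions are 2.5, 5 and 7.5, so Q1 and Q3 fall halfway between two ranked values and Q2 is the fifth value, the median.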