Week 1 Flashcards by Zoe G

Population

Consists of all the members of a group about which you want to draw a conclusion

Sample

The portion of the population selected for analysis

Parameter

A numerical measure that describes a characteristic of a population.

Statistic

A numerical measure that describes a characteristic of a sample.

Descriptive Statistics

Collecting (e.g. surveys), summarizing, presenting (e.g. tables and graphs) and characterizing (e.g. sample mean) data.

Inferential Statistics

Drawing conclusions about a population based on sample data (i.e. estimating a parameter based on a statistic).

Example of Inferential Statistics

Estimate the population mean weight (parameter) using the sample mean (statistic).

Hypothesis testing - e.g. test the claim that the population mean weight is 100 pounds.
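
The estimation example above can be sketched in a few lines of Python. This is only an illustration: the population of weights is made up, and the sample size of 50 and the random seed are arbitrary choices, not values from the course.

```python
import random
import statistics

# Hypothetical population: 1,000 weights in pounds (made-up values).
population = [90 + (i % 21) for i in range(1000)]
parameter = statistics.mean(population)  # population mean (normally unknown)

# Simple random sample of 50 members; the seed only makes the run repeatable.
rng = random.Random(0)
sample = rng.sample(population, 50)
statistic = statistics.mean(sample)  # sample mean, used as the estimate

print(f"population mean (parameter) = {parameter:.2f}")
print(f"sample mean (statistic)     = {statistic:.2f}")
```

The sample mean will generally be close to, but not exactly equal to, the population mean; that gap is what inferential methods try to quantify.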

Types of Data

Categorical, Numerical Discrete, Numerical Continuous

Categorical Data

Simply classifies data into categories (e.g. marital status, hair color, gender)

Numerical Discrete

Counted items - values come from a finite or countable set (e.g. number of children, number of people who have type-O blood)

Numerical Continuous

Measured characteristics - can take any of infinitely many values within an interval (e.g. weight, height)

Levels of Measurement and Measurement Scales

Highest level - Ratio Data
*Differences between measurements, true zero exists (Height, weight, age, weekly food spending)

Interval Data
*Differences between measurements but no true zero (temperature in Celsius, standardized exam scores)

Ordinal Data
*Ordered categories (rankings, order or scaling - tournament rankings, student letter grades, Likert scales)

Lowest level - Nominal Data
*Categories (no ordering or direction - marital status, type of car owned, gender, hair color)

Categorical Data (tables and charts)

Summary table
Graphing data - bar charts, pie charts

Numerical Data (tables and charts)

Ordered array, stem and leaf display, histogram, frequency and cumulative distributions
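
As a rough sketch of three of these summaries, the snippet below builds an ordered array, a frequency distribution and a cumulative distribution with the standard library. The data values are made up for illustration.

```python
from collections import Counter
from itertools import accumulate

data = [3, 1, 2, 3, 3, 2, 1, 3]  # made-up observations

ordered = sorted(data)                   # ordered array
freq = Counter(ordered)                  # value -> count
values = sorted(freq)                    # distinct values, ascending
counts = [freq[v] for v in values]       # frequency distribution
cumulative = list(accumulate(counts))    # cumulative frequencies
cumulative_pct = [100 * c / len(data) for c in cumulative]

for v, c, cum, pct in zip(values, counts, cumulative, cumulative_pct):
    print(f"value {v}: frequency {c}, cumulative {cum} ({pct:.0f}%)")
```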

Examples of describing central tendency

Mean, Median, Mode, Geometric mean
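
A minimal sketch of the four measures using Python's `statistics` module (`geometric_mean` needs Python 3.8 or later). The data set is made up for illustration.

```python
import statistics

data = [2, 4, 4, 8]  # made-up values

mean = statistics.mean(data)              # arithmetic mean: sum / count
median = statistics.median(data)          # middle of the ordered values
mode = statistics.mode(data)              # most frequent value
gmean = statistics.geometric_mean(data)   # n-th root of the product

print(mean, median, mode, round(gmean, 4))
```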

Examples of describing variation

range, interquartile range, variance, standard deviation, coefficient of variation
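
The measures listed above can be computed as follows. This is a sketch on made-up data; the quartiles use `statistics.quantiles` with `method="exclusive"`, which is one common convention and may differ slightly from the rule used in a given textbook.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values

data_range = max(data) - min(data)
q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1
variance = statistics.variance(data)        # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)            # square root of the sample variance
cv = std_dev / statistics.mean(data) * 100  # coefficient of variation, in %

print(f"range={data_range}, IQR={iqr}, S^2={variance:.3f}, "
      f"S={std_dev:.3f}, CV={cv:.1f}%")
```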

Examples of describing shape

Skewness

Median

Main advantage over mean is that it is not affected by extreme values
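
A small illustration of this point, using made-up numbers: adding one extreme value moves the mean a long way but barely moves the median.

```python
import statistics

values = [10, 12, 13, 15, 16]   # made-up data
with_outlier = values + [200]   # same data plus one extreme value

mean_shift = statistics.mean(with_outlier) - statistics.mean(values)
median_shift = statistics.median(with_outlier) - statistics.median(values)

print(f"mean moved by {mean_shift:.2f}, median moved by {median_shift:.2f}")
```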

Mode

Not affected by extreme values
Unlike for mean and median, there may be no unique (single) mode for a given data set
Used for either numerical or categorical (nominal) data
Least used of the 3

Mean

Generally used most often, unless extreme values (outliers) exist.

Quartiles

Split the ranked data into four segments, with an equal number of values per segment

The first quartile (Q1)

The value for which 25% of the observations are smaller and 75% are larger.

Q1 position = (n+1)/4

The second quartile (Q2)

Q2 is the same as the median (50% are smaller, 50% are larger)
Q2 position = (n+1)/2 (median)

The third quartile (Q3)

Only 25% of the observations are greater than the third quartile.

Q3 position = 3(n+1)/4
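
The position formulas for Q1, Q2 and Q3 can be sketched as a single helper. The function name `quartile` is hypothetical, and the handling of fractional positions (linear interpolation between the two neighbouring ranked values) is an assumption; some textbooks round the position instead.

```python
def quartile(sorted_values, k):
    """Return the k-th quartile (k = 1, 2, 3) from ranked data.

    Uses the 1-based position k(n+1)/4 and, as an assumption,
    interpolates linearly when the position is not a whole number.
    """
    n = len(sorted_values)
    position = k * (n + 1) / 4
    lower = int(position)            # whole part of the position
    fraction = position - lower      # fractional part of the position
    if lower < 1:
        return sorted_values[0]
    if lower >= n:
        return sorted_values[-1]
    below = sorted_values[lower - 1]  # value at the lower ranked position
    above = sorted_values[lower]      # value at the next ranked position
    return below + fraction * (above - below)


data = sorted([11, 12, 13, 16, 16, 17, 18, 21, 22])  # made-up data, n = 9
q1, q2, q3 = (quartile(data, k) for k in (1, 2, 3))
print(q1, q2, q3)
```

With n = 9 the positions are 2.5, 5 and 7.5, so Q1 and Q3 fall halfway between two ranked values and Q2 is the fifth value.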

Measures of variation

give information on the spread or variability of the data values

Range

Simplest measure of variation
Disadvantages - ignores the distribution of the data and is sensitive to outliers

Interquartile Range (IQR)

Like the median, Q1 and Q3, the IQR is a resistant summary measure (resistant to the presence of extreme values)
* Eliminates outlier problems, as high- and low-valued observations are removed from calculations
* IQR = 3rd quartile – 1st quartile

Sample Variance (S^2)

Measures average scatter around the mean (the sum of squared deviations from the mean, divided by n - 1); its units are the square of the original units

Sample Standard Deviation - S

Most commonly used measure of variation, shows variation about the mean, has the same units as the original data

Variance and Standard deviation - Advantages

* Each value in the data set is used in the calculation
* Values far from the mean are given extra weight, as deviations from the mean are squared

Variance and Standard deviation - Disadvantages

* Sensitive to extreme values (outliers)
* Measures of absolute variation, not relative variation

The Z Score

The difference between a given observation and the mean, divided by the standard deviation
An observation with a Z score above 3.0 or below -3.0 is considered an outlier
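
A minimal sketch of the Z score calculation and the outlier rule, on made-up data. It assumes the sample mean and sample standard deviation are used; with population values the formula is the same but uses the population mean and standard deviation.

```python
import statistics

# Made-up data: fourteen values near 50 plus one extreme value.
data = [48, 49, 50, 50, 51, 52, 50, 49, 51, 50, 50, 49, 51, 50, 90]

mean = statistics.mean(data)
std_dev = statistics.stdev(data)  # sample standard deviation

# Z score: (observation - mean) / standard deviation
z_scores = [(x - mean) / std_dev for x in data]

# Flag observations whose Z score is above 3.0 or below -3.0.
outliers = [x for x, z in zip(data, z_scores) if abs(z) > 3.0]
print(outliers)
```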

Shape of a Distribution

Describes how data are distributed
- Left-skewed
- Symmetric
- Right-skewed

Population summary measures

Population summary measures are called parameters
The population mean is the sum of the values in the population divided by the population size, N

Population Variance

The average of the squared deviations of values from the mean

Population Standard Deviation

Shows variation around the mean
- the square root of the population variance
- has the same units as the original data
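
To make the population formulas concrete, the sketch below computes the population mean, variance and standard deviation by hand and checks them against `statistics.pvariance` and `statistics.pstdev`. The eight values are made up and are treated as the entire population.

```python
import math
import statistics

population = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up; treated as the whole population
N = len(population)

mu = sum(population) / N                                   # population mean
pop_variance = sum((x - mu) ** 2 for x in population) / N  # divide by N
pop_std_dev = math.sqrt(pop_variance)                      # same units as the data

# The standard library gives the same results.
assert math.isclose(pop_variance, statistics.pvariance(population))
assert math.isclose(pop_std_dev, statistics.pstdev(population))

# For comparison, the sample variance divides by N - 1 and is larger.
sample_variance = statistics.variance(population)
print(mu, pop_variance, pop_std_dev, round(sample_variance, 3))
```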

The Empirical Rule

If the data distribution is approximately bell-shaped, then:
μ ± 1σ contains about 68% of the values in the population
μ ± 2σ contains about 95% of the values in the population
μ ± 3σ contains about 99.7% of the values in the population
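
The 68 / 95 / 99.7 figures can be checked against an exact normal distribution. This sketch uses `statistics.NormalDist` (Python 3.8+) and assumes a perfectly normal population; real bell-shaped data will only match approximately.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normal population lying within k standard deviations of the mean.
shares = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```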

Determining Outliers

Using the empirical rule - values above μ + 3σ or below μ − 3σ (the extreme values, fewer than 1% of a bell-shaped population)
Using Z scores - values with a Z score above 3 or below -3

Exploratory Data Analysis - Box-and-Whisker Plot

A graphical display of data using the five-number summary (minimum, Q1, median, Q3, maximum)
- can be used to determine skewness
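
The five numbers behind a box-and-whisker plot can be sketched as below. The data are made up, and the quartiles come from `statistics.quantiles` with `method="exclusive"`, which is an assumed convention. The skewness check is a rough reading of the plot, not a formal test.

```python
import statistics

data = sorted([1, 2, 2, 3, 3, 4, 5, 8, 15])  # made-up, right-skewed values

q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
five_number_summary = (min(data), q1, median, q3, max(data))
print(five_number_summary)

# Rough reading: a longer upper box half and upper whisker suggest right skew.
upper_box, lower_box = q3 - median, median - q1
upper_whisker, lower_whisker = max(data) - q3, q1 - min(data)
looks_right_skewed = upper_box > lower_box and upper_whisker > lower_whisker
print("right-skewed" if looks_right_skewed else "not clearly right-skewed")
```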

Week 1 Flashcards

(39 cards)