Week 1 Flashcards

1
Q

Population

A

Consists of all the members of a group about which you want to draw a conclusion

2
Q

Sample

A

The portion of the population selected for analysis

3
Q

Parameter

A

A numerical measure that describes a characteristic of a population.

4
Q

Statistic

A

A numerical measure that describes a characteristic of a sample.

5
Q

Descriptive Statistics

A

Collecting (e.g. surveys), summarizing and presenting (e.g. tables and graphs), and characterizing (e.g. sample mean) data.

6
Q

Inferential Statistics

A

Drawing conclusions about a population based on sample data (i.e. estimating a parameter based on a statistic).

7
Q

Example of Inferential Statistics

A

Estimate the population mean weight (parameter) using the sample mean (statistic).

Hypothesis testing - e.g. Test the claim that the population mean weight is 100 pounds.
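A minimal sketch of the first example, assuming a made-up population of weights (the values, the seed, and the sample size of 50 are illustrative only). It shows a statistic (the sample mean) being used to estimate a parameter (the population mean):

```python
import random
import statistics

# Hypothetical population of 10,000 weights in pounds (simulated, not real data).
rng = random.Random(42)
population = [rng.gauss(100, 15) for _ in range(10_000)]

# Parameter: the population mean, which is normally unknown in practice.
population_mean = statistics.fmean(population)

# Statistic: the mean of a random sample of 50 members of the population.
sample = rng.sample(population, 50)
sample_mean = statistics.fmean(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

The two numbers will usually be close but not identical; that gap is what inferential methods try to quantify.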

8
Q

Types of Data

A

Categorical, Numerical Discrete, Numerical Continuous

9
Q

Categorical Data

A

Simply classifies data into categories (e.g. marital status, hair color, gender)

10
Q

Numerical Discrete

A

Counted items - values come from counting, so only separate, countable values are possible (e.g. number of children, number of people who have type-O blood)

11
Q

Numerical Continuous

A

Measured characteristics - can take any of an infinite number of values within a range (e.g. weight, height)

12
Q

Levels of Measurement and Measurement Scales

A

Highest level - Ratio Data
*Differences between measurements, true zero exists (Height, weight, age, weekly food spending)

Interval Data
*Differences between measurements but no true zero (temperature in Celsius, standardized exam scores)

Ordinal Data
*Ordered categories (rankings, order or scaling - tournament rankings, student letter grades, Likert scales)

Lowest level - Nominal Data
*Categories (no ordering or direction - marital status, type of car owned, gender, hair color)

13
Q

Categorical Data (tables and charts)

A

Summary table
Graphing data - bar charts, pie charts
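A summary table for categorical data can be sketched as follows. The responses are made up for illustration, and the column layout is just one possible choice:

```python
from collections import Counter

# Hypothetical survey responses for a categorical variable (marital status).
responses = ["single", "married", "married", "divorced", "single", "married"]

counts = Counter(responses)   # frequency of each category
total = len(responses)

# Print category, frequency, and percentage, most frequent category first.
for category, count in counts.most_common():
    print(f"{category:<10}{count:>5}{count / total:>8.1%}")
```

The same counts are what a bar chart or pie chart would display.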

14
Q

Numerical Data (tables and charts)

A

Ordered array, stem and leaf display, histogram, frequency and cumulative distributions
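A small sketch of an ordered array with a frequency and cumulative distribution. The data values and the class intervals (width 10) are assumptions chosen for illustration:

```python
from itertools import accumulate

# Hypothetical numerical data.
data = [21, 12, 35, 27, 18, 24, 31, 15, 29, 22]

# Ordered array: the values sorted from smallest to largest.
ordered = sorted(data)

# Class intervals [low, high); the boundaries are an illustrative choice.
bins = [(10, 20), (20, 30), (30, 40)]

# Frequency distribution: how many values fall in each class.
frequencies = [sum(low <= x < high for x in data) for low, high in bins]

# Cumulative distribution: running total of the frequencies.
cumulative = list(accumulate(frequencies))

for (low, high), freq, cum in zip(bins, frequencies, cumulative):
    print(f"{low}-{high - 1}: frequency {freq}, cumulative {cum}")
```

A histogram is a plot of the `frequencies` list against the class intervals.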

15
Q

Examples of describing central tendency

A

Mean, Median, Mode, Geometric mean
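These four measures can be computed with Python's standard `statistics` module. The data set is made up for illustration:

```python
import statistics

# Hypothetical data set.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)                      # sum of values / number of values
median = statistics.median(data)                  # middle value of the ranked data
mode = statistics.mode(data)                      # most frequently occurring value
geometric_mean = statistics.geometric_mean(data)  # nth root of the product of n values

print(mean, median, mode, round(geometric_mean, 3))
```

The geometric mean is typically used for rates of change over time rather than for raw measurements.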

16
Q

Examples of describing variation

A

range, interquartile range, variance, standard deviation, coefficient of variation
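A short sketch of the range, sample variance, sample standard deviation, and coefficient of variation, using a made-up data set (the interquartile range needs quartiles and is not computed here):

```python
import statistics

# Hypothetical data set with mean 5.
data = [2, 4, 4, 5, 7, 8]

data_range = max(data) - min(data)    # largest value minus smallest value
variance = statistics.variance(data)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)      # square root of the sample variance

# Coefficient of variation: standard deviation relative to the mean, in percent.
cv = std_dev / statistics.mean(data) * 100

print(data_range, variance, round(std_dev, 3), round(cv, 1))
```

The coefficient of variation is the only relative measure in this list, which makes it useful for comparing data sets measured in different units.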

17
Q

Examples of describing shape

18
Q

Median

A

Main advantage over mean is that it is not affected by extreme values
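A quick illustration with made-up numbers: adding one extreme value moves the mean a lot but barely moves the median.

```python
import statistics

# Hypothetical data, then the same data with one extreme value added.
values = [10, 12, 13, 15, 16]
with_outlier = values + [200]

mean_before = statistics.mean(values)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(values)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after:.2f}")
print(f"median: {median_before} -> {median_after}")
```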

19
Q

Mode

A
  • Not affected by extreme values
  • Unlike for mean and median, there may be no unique (single) mode for a given
    data set
  • Used for either numerical or categorical (nominal) data
  • Used least often of the 3 measures
20
Q

Mean

A

Generally used most often, unless extreme values (outliers) exist.

21
Q

Quartiles

A

Split the ranked data into four segments, with an equal number of values per segment.

22
Q

The first quartile (Q1)

A

The value for which 25% of the observations are smaller and 75% are larger.

Q1 position = (n+1)/4

23
Q

The second quartile (Q2)

A

Q2 is the same as the median (50% are smaller, 50% are larger).

Q2 position = (n+1)/2

24
Q

The third quartile (Q3)

A

Only 25% of the observations are greater than the third quartile.

Q3 position = 3(n+1)/4
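The position formulas for Q1, Q2 and Q3 can be sketched as below. The helper `value_at_position` is hypothetical, and linear interpolation for fractional positions is an assumption (some textbooks round the position instead). The data are made up:

```python
import statistics

def value_at_position(ranked, position):
    """Value at a 1-based rank position in sorted data.

    Fractional positions are resolved by linear interpolation
    between the two neighbouring ranked values (an assumption).
    """
    lower = int(position)        # whole part of the rank
    fraction = position - lower  # fractional part of the rank
    if fraction == 0:
        return ranked[lower - 1]
    return ranked[lower - 1] + fraction * (ranked[lower] - ranked[lower - 1])

data = sorted([11, 12, 13, 16, 16, 17, 18, 21, 22])  # n = 9
n = len(data)

q1 = value_at_position(data, (n + 1) / 4)      # position 2.5
q2 = value_at_position(data, (n + 1) / 2)      # position 5 (the median)
q3 = value_at_position(data, 3 * (n + 1) / 4)  # position 7.5

print(q1, q2, q3)
```

For this data set the result matches `statistics.quantiles(data, n=4)`, whose default "exclusive" method is based on the same (n+1) positions.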

25
Measures of variation
Give information on the spread or variability of the data values.
26
Range
Simplest measure of variation: the largest value minus the smallest value.
Disadvantages - ignores how the data are distributed and is sensitive to outliers.
27
Interquartile Range (IQR)
Like the median, Q1 and Q3, the IQR is a resistant summary measure (resistant to the presence of extreme values).
  • Reduces outlier problems, because the highest- and lowest-valued observations are left out of the calculation
  • IQR = Q3 - Q1 (3rd quartile minus 1st quartile)
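A sketch of why the IQR counts as resistant, using made-up data. Replacing the largest value with an extreme one changes the range but leaves the IQR unchanged. Quartiles come from `statistics.quantiles`, whose default method is based on (n+1) positions:

```python
import statistics

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical data, and a copy whose largest value is replaced by an outlier.
data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
data_outlier = data[:-1] + [220]

iqr_plain = iqr(data)
iqr_outlier = iqr(data_outlier)
range_plain = max(data) - min(data)
range_outlier = max(data_outlier) - min(data_outlier)

print(f"IQR:   {iqr_plain} -> {iqr_outlier}")
print(f"range: {range_plain} -> {range_outlier}")
```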
28
Sample Variance (S^2)
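Measures average scatter around the mean: the sum of the squared deviations from the sample mean, divided by n - 1. Its units are the squared units of the original data.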
29
Sample Standard Deviation - S
Most commonly used measure of variation. It is the square root of the sample variance, shows variation about the mean, and has the same units as the original data.
30
Variance and Standard deviation - Advantages
  • Each value in the data set is used in the calculation
  • Values far from the mean are given extra weight, because deviations from the mean are squared
31
Variance and Standard deviation - Disadvantages
  • Sensitive to extreme values (outliers)
  • Measure absolute variation, not relative variation
32
The Z Score
The difference between a given observation and the mean, divided by the standard deviation: Z = (X - mean) / standard deviation.
An observation with a Z score above 3.0 or below -3.0 is considered an outlier.
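A minimal sketch of Z scores and the outlier rule, assuming the sample standard deviation is used and working with made-up data (19 values near 50 plus one value of 90):

```python
import statistics

def z_scores(values):
    """Z score of each value: (value - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)
    return [(x - mean) / std_dev for x in values]

# Hypothetical data: 19 values close to 50 and one extreme value.
data = [50, 52, 48, 51, 49, 50, 53, 47, 50, 51,
        49, 52, 48, 50, 51, 49, 50, 52, 48, 90]

scores = z_scores(data)

# Flag observations whose Z score is above 3.0 or below -3.0.
outliers = [x for x, z in zip(data, scores) if abs(z) > 3.0]

print(outliers)
```

In very small samples no value can reach a Z score of 3, so the rule only becomes useful with a reasonable number of observations.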
33
Shape of a Distribution
Describes how data are distributed:
  • Left-skewed
  • Symmetric
  • Right-skewed
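One common rule of thumb, which is an assumption here and not stated on the card, compares the mean with the median: mean below median suggests left skew, mean above median suggests right skew. The function name `describe_shape` and the data are hypothetical:

```python
import math
import statistics

def describe_shape(values):
    """Rough shape label from comparing the mean and the median (rule of thumb)."""
    mean = statistics.fmean(values)
    median = statistics.median(values)
    if math.isclose(mean, median):
        return "symmetric"
    return "right-skewed" if mean > median else "left-skewed"

# Hypothetical data sets, one for each shape.
print(describe_shape([1, 17, 18, 18, 19, 19, 19, 20]))  # long tail of low values
print(describe_shape([1, 2, 3, 4, 5]))                  # balanced around the middle
print(describe_shape([1, 2, 2, 3, 3, 3, 4, 20]))        # long tail of high values
```

This is only a heuristic; a histogram or box plot gives a more reliable picture of shape.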
34
Population summary measures
Population summary measures are called parameters.
The population mean is the sum of the values in the population divided by the population size, N.
35
Population Variance
The average of the squared deviations of values from the mean
36
Population Standard Deviation
Shows variation around the mean:
  • The square root of the population variance
  • Has the same units as the original data
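A sketch contrasting the population measures (divide by N) with the sample measures (divide by n - 1), treating a made-up list of values as the whole population:

```python
import statistics

# Hypothetical population of N = 8 values with mean 5.
population = [2, 4, 4, 4, 5, 5, 7, 9]

pop_variance = statistics.pvariance(population)  # squared deviations / N
pop_std_dev = statistics.pstdev(population)      # square root of population variance

# For comparison: the sample formulas divide by n - 1 and give larger values.
sample_variance = statistics.variance(population)
sample_std_dev = statistics.stdev(population)

print(pop_variance, pop_std_dev)
print(round(sample_variance, 3), round(sample_std_dev, 3))
```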
37
The Empirical Rule
If the data distribution is approximately bell-shaped, then:
  • μ ± 1σ contains about 68% of the values in the population
  • μ ± 2σ contains about 95% of the values in the population
  • μ ± 3σ contains about 99.7% of the values in the population
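The three percentages can be checked against a normal distribution, assuming "bell-shaped" is modelled as exactly normal. This sketch uses `statistics.NormalDist` from the standard library:

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Share of values within k standard deviations of the mean, for k = 1, 2, 3.
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in coverage.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

Real data that are only roughly bell-shaped will match these percentages approximately, not exactly.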
38
Determining Outliers
  • Using the empirical rule - values above μ + 3σ or below μ - 3σ (fewer than 1% of the values in a bell-shaped distribution)
  • Using Z scores - values with a Z score above 3 or below -3
39
Exploratory Data Analysis - Box-and-Whisker Plot
A graphical display of data using the five-number summary (smallest value, Q1, median, Q3, largest value). Can be used to judge skewness.
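The five numbers behind a box-and-whisker plot can be sketched as follows. The function name `five_number_summary` and the data are hypothetical, and the quartiles use the default (n+1)-based method of `statistics.quantiles`:

```python
import statistics

def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest) for a data set."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

# Hypothetical data set.
data = [11, 12, 13, 16, 16, 17, 18, 21, 22]

summary = five_number_summary(data)
print(summary)
```

In the plot, the box runs from Q1 to Q3 with a line at the median, and the whiskers extend to the smallest and largest values.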