Week 1 Flashcards

1
Q

Population

A

Consists of all the members of a group about which you want to draw a conclusion

2
Q

Sample

A

The portion of the population selected for analysis

3
Q

Parameter

A

A numerical measure that describes a characteristic of a population.

4
Q

Statistic

A

A numerical measure that describes a characteristic of a sample.

5
Q

Descriptive Statistics

A

Collecting (e.g. surveys), summarizing, presenting (e.g. tables and graphs), and characterizing (e.g. sample mean) data.

6
Q

Inferential Statistics

A

Drawing conclusions about a population based on sample data (i.e. estimating a parameter based on a statistic).

7
Q

Example of Inferential Statistics

A

Estimate the population mean weight (parameter) using the sample mean (statistic).

Hypothesis testing - e.g. Test the claim that the population mean weight is 100 pounds.
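
A minimal sketch of the estimation idea, assuming a made-up population of weights. The population values, the seed, and the sample size of 200 are illustrative assumptions, not course data.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 weights in pounds
population = [rng.gauss(100, 15) for _ in range(10_000)]

# Parameter: computed from the whole population (usually unknown in practice)
population_mean = statistics.mean(population)

# Statistic: computed from a random sample of 200 members
sample = rng.sample(population, 200)
sample_mean = statistics.mean(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

The sample mean will usually land close to, but not exactly on, the population mean; that gap is what inferential methods try to quantify.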

8
Q

Types of Data

A

Categorical, Numerical Discrete, Numerical Continuous

9
Q

Categorical Data

A

Simply classifies data into categories (e.g. marital status, hair color, gender)

10
Q

Numerical Discrete

A

Counted items - values can be listed and counted, usually whole numbers (e.g. number of children, number of people who have type-O blood)

11
Q

Numerical Continuous

A

Measured characteristics - can take any value within an interval, so there are infinitely many possible values (e.g. weight, height)

12
Q

Levels of Measurement and Measurement Scales

A

Highest level - Ratio Data
*Differences between measurements, true zero exists (Height, weight, age, weekly food spending)

Interval Data
*Differences between measurements but no true zero (temperature in Celsius, standardized exam scores)

Ordinal Data
*Ordered categories (rankings, order or scaling - tournament rankings, student letter grades, Likert scales)

Lowest level - Nominal Data
*Categories (no ordering or direction - marital status, type of car owned, gender, hair color)

13
Q

Categorical Data (tables and charts)

A

Summary table
Graphing data - bar charts, pie charts

14
Q

Numerical Data (tables and charts)

A

Ordered array, stem and leaf display, histogram, frequency and cumulative distributions

15
Q

Examples of describing central tendency

A

Mean, Median, Mode, Geometric mean
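
A short sketch of the four measures using Python's standard `statistics` module. The data set is made up for illustration.

```python
import statistics

data = [2, 4, 4, 8, 16]

print(statistics.mean(data))            # arithmetic mean: 34 / 5 = 6.8
print(statistics.median(data))          # middle value of the ordered data: 4
print(statistics.mode(data))            # most frequent value: 4
print(statistics.geometric_mean(data))  # 5th root of the product, about 5.28
```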

16
Q

Examples of describing variation

A

range, interquartile range, variance, standard deviation, coefficient of variation
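
The measures listed above can be sketched with the standard library as follows. The data set is invented, and the quartiles rely on the default "exclusive" method of `statistics.quantiles`; other software may use a slightly different quartile rule.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)             # 9 - 2 = 7
q1, q2, q3 = statistics.quantiles(data, n=4)   # quartiles (default "exclusive" method)
iqr = q3 - q1                                  # 6.5 - 4.0 = 2.5
variance = statistics.variance(data)           # sample variance: 32 / 7
std_dev = statistics.stdev(data)               # square root of the sample variance
cv = std_dev / statistics.mean(data) * 100     # coefficient of variation, in percent

print(data_range, iqr, round(variance, 3), round(std_dev, 3), round(cv, 1))
```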

17
Q

Examples of describing shape

A

Skewness

18
Q

Median

A

Main advantage over mean is that it is not affected by extreme values
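
A small illustration with made-up numbers: replacing the largest value with an extreme one moves the mean a lot but leaves the median unchanged.

```python
import statistics

data = [10, 12, 14, 16, 18]
with_outlier = [10, 12, 14, 16, 1000]  # largest value replaced by an extreme one

print(statistics.mean(data), statistics.median(data))                  # 14 14
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # 210.4 14
```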

19
Q

Mode

A
  • Not affected by extreme values
  • Unlike for mean and median, there may be no unique (single) mode for a given
    data set
  • Used for either numerical or categorical (nominal) data
  • Least used of the three
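
A brief sketch of the points above, using invented data: the mode works on categorical values, and a data set can have several modes or no single one.

```python
import statistics

hair = ["brown", "black", "brown", "blond", "black"]
print(statistics.multimode(hair))    # ['brown', 'black']: two modes

scores = [1, 2, 3, 4]
print(statistics.multimode(scores))  # every value ties, so no single mode stands out
```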
20
Q

Mean

A

Generally used most often, unless extreme values (outliers) exist.

21
Q

Quartiles

A

Split the ranked data into four segments, with an equal number of values per segment

22
Q

The first quartile (Q1)

A

The value for which 25% of the observations are smaller and 75% are larger.

Q1 position = (n+1)/4

23
Q

The second quartile (Q2)

A

Q2 is the same as the median (50% are smaller, 50% are larger)
Q2 position = (n+1)/2 (median)

24
Q

The third quartile (Q3)

A

Only 25% of the observations are greater than the third quartile.

Q3 position = 3(n+1)/4
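
The three position formulas can be sketched in one helper. The function name `quartile` is hypothetical, and interpolating between neighbors when the position is not a whole number is an assumption; some textbooks round the position instead.

```python
def quartile(ranked, k):
    """Return quartile k (1, 2 or 3) using position k * (n + 1) / 4."""
    n = len(ranked)
    position = k * (n + 1) / 4        # 1-based position in the ranked data
    lower = int(position)             # whole part of the position
    fraction = position - lower       # how far to move toward the next value
    if lower < 1:
        return ranked[0]
    if lower >= n:
        return ranked[-1]
    return ranked[lower - 1] + fraction * (ranked[lower] - ranked[lower - 1])

ranked = sorted([11, 12, 13, 16, 16, 17, 18, 21, 22])   # n = 9

print(quartile(ranked, 1))   # position 2.5 -> 12.5
print(quartile(ranked, 2))   # position 5   -> 16.0 (the median)
print(quartile(ranked, 3))   # position 7.5 -> 19.5
```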

25
Q

Measures of variation

A

Give information on the spread or variability of the data values

26
Q

Range

A

Simplest measure of variation
Disadvantages - ignores the distribution of the data, it is sensitive to outliers

27
Q

Interquartile Range (IQR)

A

Like the median, Q1, and Q3, the IQR is a resistant summary
measure (resistant to the presence of extreme values)

  • Reduces outlier problems, as the highest and lowest 25% of
    observations are left out of the calculation
  • IQR = 3rd quartile - 1st quartile
28
Q

Sample Variance (S^2)

A

Measures the average squared scatter around the mean; its units are the squared units of the original data

29
Q

Sample Standard Deviation - S

A

Most commonly used measure of variation, shows variation about the mean, has the same units as the original data
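
A step-by-step sketch of the sample variance and sample standard deviation on made-up data, dividing by n - 1 as is standard for a sample. The variable names are illustrative.

```python
import math
import statistics

data = [4, 8, 6, 2, 10]
n = len(data)
mean = sum(data) / n                                  # 30 / 5 = 6.0

squared_deviations = [(x - mean) ** 2 for x in data]  # [4, 4, 0, 16, 16]
sample_variance = sum(squared_deviations) / (n - 1)   # 40 / 4 = 10.0
sample_std_dev = math.sqrt(sample_variance)           # about 3.16

# Cross-check against the standard library
assert math.isclose(sample_variance, statistics.variance(data))
assert math.isclose(sample_std_dev, statistics.stdev(data))
```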

30
Q

Variance and Standard deviation - Advantages

A

-Each value in the data set is used in the calculation
-Values far from the mean are given extra weight, as deviations from the mean are squared

31
Q

Variance and Standard deviation - Disadvantages

A

-Sensitive to extreme values (outliers)
-Measure absolute variation, not relative variation

32
Q

The Z Score

A

The difference between a given observation and the mean, divided by the standard deviation

An observation with a z score above 3.0 or below -3.0 is generally considered an outlier
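
A minimal sketch of the z score calculation. The helper name `z_score` is hypothetical, and using the sample standard deviation here is an assumption; with population data the population standard deviation would take its place.

```python
import statistics

def z_score(value, data):
    """Return (value - mean) / standard deviation for the given data."""
    return (value - statistics.mean(data)) / statistics.stdev(data)

data = [4, 8, 6, 2, 10]      # mean 6, sample standard deviation about 3.16
print(round(z_score(10, data), 2))   # 1.26
```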

33
Q

Shape of a Distribution

A

Describes how data are distributed
-Left-skewed
-Symmetric
-Right-skewed

34
Q

Population summary measures

A

Parameters
The population mean is the sum of the values in the population divided by the population size, N

35
Q

Population Variance

A

The average of the squared deviations of values from the mean

36
Q

Population Standard Deviation

A

Shows variation around the mean
-the square root of the population variance
-has the same units as the original data
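
The population measures can be sketched the same way as the sample ones, except that the divisor is the population size N. The five values below are an invented population.

```python
import math
import statistics

population = [1, 2, 3, 4, 5]   # treated as the entire population
N = len(population)

mu = sum(population) / N                                     # 15 / 5 = 3.0
sigma_squared = sum((x - mu) ** 2 for x in population) / N   # 10 / 5 = 2.0
sigma = math.sqrt(sigma_squared)                             # about 1.41

# The standard library's population versions divide by N as well
assert math.isclose(sigma_squared, statistics.pvariance(population))
assert math.isclose(sigma, statistics.pstdev(population))
```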

37
Q

The Empirical Rule

A

If the data distribution is approximately bell-shaped, then:
μ ± 1σ contains about 68% of the values in the population

μ ± 2σ contains about 95% of the values in the population

μ ± 3σ contains about 99.7% of the values in the population
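
A rough check of the empirical rule on simulated bell-shaped data. The mean of 50, standard deviation of 10, seed, and size of 50,000 are arbitrary assumptions for the simulation.

```python
import random
import statistics

rng = random.Random(1)
values = [rng.gauss(50, 10) for _ in range(50_000)]  # simulated bell-shaped data

mu = statistics.mean(values)
sigma = statistics.pstdev(values)

def share_within(k):
    """Fraction of values within k standard deviations of the mean."""
    inside = sum(1 for v in values if abs(v - mu) <= k * sigma)
    return inside / len(values)

for k in (1, 2, 3):
    print(k, round(share_within(k), 3))
```

The printed shares should come out near 0.68, 0.95, and 0.997, with small differences due to random sampling.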

38
Q

Determining Outliers

A

Using the empirical rule
-values above or below μ ± 3σ (only about 0.3% of values in a bell-shaped distribution)

Using Z scores
-above 3 or below -3
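
A small sketch of the z score approach on invented data: 20 typical values and one extreme value. The sample standard deviation is used here as an assumption.

```python
import statistics

data = [48, 49, 50, 51, 52] * 4 + [120]   # 20 typical values plus one extreme value

mean = statistics.mean(data)
s = statistics.stdev(data)

# Keep only the observations whose z score is above 3 or below -3
outliers = [x for x in data if abs((x - mean) / s) > 3]
print(outliers)   # [120]
```

With very small data sets no z score can exceed 3 in absolute value, so this rule only makes sense for reasonably large samples.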

39
Q

Exploratory Data Analysis - Box-and-Whisker Plot

A

A graphical display of data using the five-number summary (minimum, Q1, median, Q3, maximum)
-can be used to judge skewness
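
A minimal sketch of the numbers behind a box-and-whisker plot, using made-up data and the default "exclusive" quartile method of `statistics.quantiles`. Comparing whisker lengths is one informal way to read skewness from the plot.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 20]

q1, median, q3 = statistics.quantiles(data, n=4)
summary = (min(data), q1, median, q3, max(data))
print(summary)   # (1, 2.5, 5.0, 7.5, 20)

# A much longer upper whisker hints at right skew
lower_whisker = q1 - min(data)   # 1.5
upper_whisker = max(data) - q3   # 12.5
print(upper_whisker > lower_whisker)   # True
```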