Chapter 1 Flashcards

1
Q

Cases

A

The objects described by a set of data.

Ex. Customers, companies, subjects in a study, stocks

2
Q

Label

A

Is a SPECIAL VARIABLE used in some data sets to distinguish the different cases

3
Q

Variable

A

Is a characteristic of the case–> different cases can have different values for variables

4
Q

Observation

A

Describes the data for a particular case

5
Q

Categorical Variable

A

Places a case into one of several groups or categories

Displayed with bar graphs, pie charts, and Pareto charts

6
Q

Quantitative Variable

A

Takes numerical values for which arithmetic operations, such as adding and averaging, make sense

7
Q

Statistical Software

A

In some statistical software, spaces are not allowed in variable names–> use an underscore instead

8
Q

Ordered Categorical Variable

A

A categorical variable whose categories have a natural order
Ex. Possible values for a grade: A, B, C, D, etc., because A is better than B, which is better than C, and so on

9
Q

Nominal Variable

A

A categorical variable that is not ordered

10
Q

Instruments

A

Different areas of application (e.g. marketing) can also have their own special variables–> these variables are measured with instruments

11
Q

Rate

A

Computing a rate is one of several ways of adjusting one variable to create another–> sometimes more meaningful than a count
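
A minimal sketch of adjusting a count into a rate. The function name `rate_per_100k` and all figures are hypothetical, chosen only to show how a rate can reverse the impression a raw count gives.

```python
def rate_per_100k(count: int, population: int) -> float:
    """Adjust a raw count by population size to get a rate per 100,000."""
    return count * 100_000 / population


# Hypothetical figures: the large region has more events (a bigger count)
# but fewer events per person (a smaller rate).
small_region = rate_per_100k(count=50, population=100_000)
large_region = rate_per_100k(count=200, population=1_000_000)

print(small_region, large_region)
```

Comparing the two rates rather than the two counts accounts for the difference in population size.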

12
Q

Distribution

A

Describes how the values of a variable vary from case to case

13
Q

Pareto Chart

A

A bar graph whose categories are ordered from MOST frequent to least frequent–> helps identify the most important categories for a categorical variable
Ex. frequently used in quality control settings
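
A small sketch of the ordering a Pareto chart relies on, using Python's `collections.Counter`. The defect log is made up for illustration.

```python
from collections import Counter

# Hypothetical defect log from a quality-control inspection.
defects = ["scratch", "dent", "scratch", "misprint", "scratch", "dent"]

# most_common() sorts categories from most frequent to least frequent,
# which is the bar order a Pareto chart uses.
pareto_order = Counter(defects).most_common()

# Print a rough text version of the chart, one bar per category.
for category, count in pareto_order:
    print(f"{category:<10}{'#' * count}")
```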

14
Q

Histogram

A

The most common graph of the distribution of a quantitative variable, where we group nearby values into classes–> for small data sets a stemplot can be used
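
A rough sketch of how values are grouped into classes for a histogram. The data and the class width of 10 are assumptions for illustration; real software usually picks the classes for you.

```python
from collections import Counter

values = [12, 15, 21, 22, 27, 33, 35, 38, 41, 58]  # hypothetical data
class_width = 10                                   # assumed class width

# Map each value to the lower bound of its class, e.g. 27 -> 20 (class 20-29).
class_counts = Counter((v // class_width) * class_width for v in values)

# Print a text histogram, one row per class that contains data.
for lower in sorted(class_counts):
    upper = lower + class_width - 1
    print(f"{lower:>3}-{upper:<3} {'*' * class_counts[lower]}")
```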

15
Q

How can you describe the overall pattern of a histogram?

A

You can describe the overall pattern of a histogram by its SHAPE, CENTER, and SPREAD

16
Q

Outlier

A

The most important type of deviation–> an individual value that falls outside the overall pattern

17
Q

When is a distribution symmetric?

A

If the right and left sides of the histogram are mirror images of each other

18
Q

Skewed to the right

A

If the right side of the histogram extends much farther out than the left side–> skewed to the left is the reverse

19
Q

Positively skewed

A

Data that skews to the right–> positive skewness is the MOST common type of skewness that we see in real data

20
Q

Time plot

A

Plots each observation against the time it was measured–> time on the horizontal scale and the variable you are measuring on the vertical scale

21
Q

Mean

A

The most common measure of center is the ordinary arithmetic average–> NOT a resistant measure of center as it can be influenced by outliers

22
Q

Median

A

The median is the midpoint of a distribution, the number such that half the observations are smaller and half are larger

23
Q

Median Odd

A

With an odd number of observations n, the median is the center observation in the ordered list, located (n+1)/2 observations up from the bottom

24
Q

Median Even

A

With an even number of observations, the median is the mean of the two center observations in the ordered list
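
The odd and even rules for the median can be sketched as one function. This is an illustrative implementation with made-up data; Python's built-in `statistics.median` does the same job.

```python
def median(values):
    """Median by the location rule: sort, then take the middle."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the center observation sits at position (n + 1) / 2
        # counting from 1, which is index `mid` counting from 0.
        return ordered[mid]
    # Even n: mean of the two center observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_result = median([7, 1, 5])       # ordered list: 1, 5, 7
even_result = median([7, 1, 5, 3])   # ordered list: 1, 3, 5, 7
print(odd_result, even_result)
```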

25
Q

Median vs Mean

A

The median is more resistant than the mean
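
A quick illustration of resistance, using made-up salary figures: adding one extreme value moves the mean a great deal but barely moves the median.

```python
from statistics import mean, median

salaries = [40, 42, 45, 47, 50]    # hypothetical salaries, in thousands
with_outlier = salaries + [400]    # the same data plus one extreme value

# Compare both measures of center before and after adding the outlier.
mean_before, median_before = mean(salaries), median(salaries)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```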

26
Q

Median and Mean in a Symmetric Distribution

A

They are close together–> if the distribution is exactly symmetric, they are exactly the same

27
Q

Median and Mean in a skewed distribution

A

The mean is farther out on the long tail than the median

28
Q

The five number summary

A

Consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest–> a boxplot is a graph of the five-number summary
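
A sketch of computing the five-number summary, assuming at least two observations. Quartile conventions differ between textbooks and software; this version assumes the common introductory rule that each quartile is the median of the observations on one side of the overall median's position. The function name and data are hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # observations below the median's position
    upper = ordered[half + n % 2:]    # observations above the median's position
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


summary = five_number_summary([9, 1, 4, 7, 3, 8, 6])
print(summary)
```

Other quartile rules, such as the ones behind `statistics.quantiles`, can give slightly different values for Q1 and Q3 on small data sets.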

29
Q

The five number summary vs. distribution

A

The five-number summary is not the most common numerical description of a distribution

30
Q

Most common numerical description of distribution

A

The mean to measure the center and the standard deviation to measure the spread

31
Q

Standard deviation

A

Measures spread by calculating how far the observations are from their mean–> should only be used when the mean is chosen as the measure of center

32
Q

n-1

A

Degrees of freedom of the variance or standard deviation
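
A minimal sketch of the sample standard deviation with the n-1 divisor, checked against Python's `statistics.stdev`. The function name `sample_std` and the data are made up for illustration.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation: divide the squared deviations by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    squared_deviations = sum((x - xbar) ** 2 for x in values)
    variance = squared_deviations / (n - 1)   # n - 1 degrees of freedom
    return math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations
s = sample_std(data)
print(s, statistics.stdev(data))  # the two values should agree
```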

33
Q

S=0

A

Only when there is no spread–> means all the observations have the same value, otherwise S is greater than 0

34
Q

What does it mean if the standard deviation is higher?

A

S gets larger when the observations are more spread out about their mean

35
Q

Units

A

S has the same units of measurement as the original observations

36
Q

S and the Mean

A

Like the mean, S is not resistant–> a few outliers or strong skewness can greatly increase S

37
Q

How do you measure risk in finance?

A

By looking at the standard deviation of returns–> large spread–> less predictable–> more risky
BUT the five-number summary would be more informative

38
Q

Density curve

A

A density curve is a mathematical model for the distribution of a quantitative variable

39
Q

What does a density curve describe?

A

The overall pattern of a distribution. The area under the curve and within any range of values is the proportion of all observations that fall within that range

40
Q

68-95-99.7 rule

A

In a Normal distribution, approximately:
68% of observations fall within 1 standard deviation of the mean
95% of observations fall within 2 standard deviations of the mean
99.7% of observations fall within 3 standard deviations of the mean
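
The three percentages can be checked numerically for a Normal distribution. This sketch uses `statistics.NormalDist` from the Python standard library; the variable names are arbitrary.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal: mean 0, standard deviation 1

# Area under the Normal curve between -k and +k standard deviations.
proportions = [z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)]

for k, p in zip((1, 2, 3), proportions):
    print(f"within {k} SD of the mean: {p:.1%}")
```

The rule rounds these areas, so the computed values are close to, not exactly, 68%, 95%, and 99.7%.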

41
Q

Z-Score

A

Standardized value–> tells us how many standard deviations the observation falls away from the mean and in which direction
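
As a small worked example, standardizing subtracts the mean and divides by the standard deviation. The mean of 100 and standard deviation of 15 are assumed values for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean (signed)."""
    return (x - mean) / sd


above = z_score(130, mean=100, sd=15)   # 30 above the mean -> 2.0
below = z_score(85, mean=100, sd=15)    # 15 below the mean -> -1.0
print(above, below)
```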

42
Q

Z-score positive

A

Observations larger than the mean

43
Q

Z-score negative

A

Observations smaller than the mean

44
Q

Sample survey

A

Collects data from a sample of cases that represent a larger population of cases

45
Q

Observation vs Experiment

A

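In an observational study, we observe cases and measure variables but do not attempt to influence the responses; in an experiment, we deliberately impose a treatment (change) to observe the responses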

46
Q

Training Data Set

A

In some studies we use one set of data to generate a set of results
Ex. building a model to predict something

47
Q

Database

A

An organized collection of data from which data sets for statistical analysis can be extracted

48
Q

Data warehouse

A

System for organizing, storing, and analyzing complex data

49
Q

Sampling frame

A

A list of the items from which the sample is chosen

50
Q

Response rate

A

The proportion of the original sample who actually provide usable data

51
Q

Undercoverage

A

Some groups in the population are left out of the process of choosing the sample

52
Q

Nonresponse

A

Occurs when a case chosen for the sample cannot be contacted or does not cooperate