Chapter 3 (Summarizing Distributions) Flashcards
Mode
the most frequently occurring value of a variable
Central tendency
values that are central in the distribution of a variable; describes what is typical
Variation
describes how dispersed the data are across the range of possible values; describes what is atypical
When a curve is bell-shaped (normal distribution), where does the mean, median, and mode lie?
They are all equal and lie in the middle of the distribution
Sample (arithmetic) mean or average
most common measure of centrality; applies only to data where adding and dividing the values makes sense (so not to nominal data); minimizes the sum of squared residuals (replacing it with any other number would increase the total squared deviation)
Sample mean formula
xbar = (1/n) * sum of xi from i=1 to n: the sum of all values of x, divided by the number of observations or sample size (n)
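The formula above can be sketched directly in Python; the data values are hypothetical:

```python
# Sample mean: sum of all x_i divided by the sample size n.
def sample_mean(xs):
    """Return the arithmetic mean (1/n) * sum(x_i)."""
    return sum(xs) / len(xs)

data = [2, 4, 6, 8]        # hypothetical sample
print(sample_mean(data))   # 5.0
```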
Weighted average
each observation gets a weight of 1/n, the proportion of the sample that it represents
Dummy or binary variable
a qualitative variable that indicates the presence or absence of an attribute; must be coded as 1=present and 0=absent; also has a mean despite being qualitative
Mean of a dummy variable
the proportion of the sample with the associated attribute
How do you describe central tendency for qualitative variables?
(1) Create a dummy variable for each level of the qualitative variable (2) Summarize the mean of the dummy variable
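The two steps above can be sketched as follows; the color data are a hypothetical sample:

```python
# One 0/1 dummy variable per level of the qualitative variable;
# the mean of each dummy is the proportion of the sample with that level.
colors = ["red", "blue", "red", "green", "red"]   # hypothetical sample

for level in sorted(set(colors)):
    dummy = [1 if c == level else 0 for c in colors]
    proportion = sum(dummy) / len(dummy)   # mean of the dummy
    print(level, proportion)               # blue 0.2, green 0.2, red 0.6
```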
Percentiles
a way of describing how extreme a particular observation is (median is not extreme); the s-th percentile is the value of x such that s% of the data lies below it
How do you get the median?
(1) Order x from smallest to largest (2) If n is odd, the median is the middle-most value. If n is even, the median is the average of the two middle-most values.
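The two-step recipe above, sketched in Python:

```python
def median(xs):
    """Sort the data, then take the middle value (n odd)
    or the average of the two middle values (n even)."""
    s = sorted(xs)           # step (1): order x smallest to largest
    n = len(s)
    mid = n // 2
    if n % 2 == 1:           # step (2): n odd
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2   # step (2): n even

print(median([3, 1, 2]))     # 2
print(median([4, 1, 3, 2]))  # 2.5
```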
Centrality of the median
the value that lies between two halves of all possible values
Residual (ei)
a measure of variation; the difference between an actual value and the proposed “typical” value (ei = xi - xbar, where xbar is the sample mean)
Centrality of the sample mean
the sample mean is the value that is, on average, as close as possible to the rest of the data; it is subject to leverage by very large or very small values (i.e. outliers)
How can you deal with outliers?
(1) Remove them from the dataset (2) Choose statistics that are robust to outliers like the median instead of the mean
How is the residual a measure of variation?
it measures how dispersed each value xi is about the center xbar
Bessel’s correction (n-1)
a statistical adjustment to make the sample variance and standard deviation more accurate or unbiased estimators of the population variance and standard deviation, particularly for small values of n
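Bessel's correction shows up as the n - 1 divisor in the sample variance; a minimal sketch with a hypothetical dataset:

```python
import math

def sample_variance(xs):
    """Sum of squared residuals divided by n - 1 (Bessel's correction)."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical sample, mean 5
print(sample_variance(data))             # 32/7 = 4.571...
print(math.sqrt(sample_variance(data)))  # sample standard deviation
```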
Interquartile range (IQR)
IQR = x75 - x25; robust to outliers because, like the median, it is a percentile-based measure
Sample standard deviation
square root of the sample variance
Range
R= x100 - x0 = max(xi) - min(xi); not robust to outliers
Percentile quintets
x0, x25, x50, x75, x100 (minimum, first quartile, median, third quartile, maximum)
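The quintets, IQR, and range can be computed with the standard library; exact quartile values depend on the interpolation method, and the data here are hypothetical:

```python
import statistics

data = [1, 3, 5, 7, 9, 11, 200]   # hypothetical sample; 200 is an outlier

# quartiles x25, x50, x75 (the "inclusive" interpolation method)
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                      # robust: the outlier barely matters
rng = max(data) - min(data)        # not robust: dominated by the 200
print(min(data), q1, q2, q3, max(data))  # 1 4.0 7.0 10.0 200
print(iqr, rng)                          # 6.0 199
```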
Covariance
Similar to variance but measures how two variables vary together (instead of one); has no equivalent of the standard deviation
Correlation or Pearson’s Correlation Coefficient (r)
a unitless statistic (just a number, good for interpretation) that is always between -1 and 1; the covariance between x and y divided by the product of the standard deviations of x and y
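A minimal sketch of the formula above; the 1/(n - 1) factors in the covariance and standard deviations cancel, so they can be left out:

```python
import math

def pearson_r(xs, ys):
    """Covariance of x and y divided by the product of their
    standard deviations (the n - 1 factors cancel)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(pearson_r([1, 2, 3], [2, 4, 6]))  # perfect positive correlation
print(pearson_r([1, 2, 3], [6, 4, 2]))  # perfect negative correlation
```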
Positive correlation
when r > 0, higher values of x tend to occur with higher values of y, and vice versa
Negative correlation
when r < 0, higher values of x tend to occur with lower values of y, and vice versa
No correlation
when r = 0, there is no linear relationship between the values of x and y
Perfect correlation
when r = +/- 1, the values of x and y can be perfectly predicted from one another
How are correlation (r) and dependence related?
the r value is a numerical measurement of dependence in the data: close to -1 means strong negative dependence, close to +1 means strong positive dependence, close to 0 means a lack of linear dependence (approximately “independent,” though r = 0 does not rule out nonlinear dependence)
Population parameters
properties of the population distribution of which the sample statistics are analogues; the population counterparts of the sample statistics, with the same interpretation
Correspondences between sample statistics and population parameters
sample mean and population mean, sample variance and population variance, sample covariance and population covariance, sample correlation and population correlation
Sampling distribution
shows every possible result a statistic can take in every possible (hypothetical) sample from a population and how often each result occurs; observations are unique samples and variables are statistics
Empirical distribution
the distribution of the observed data; a very good estimate of the population distribution that gets closer as the data becomes more representative and n grows
Bootstrapping
simulating samples using the empirical distribution
How do you implement the bootstrap?
(1) Randomly draw a new sample of the same size from the existing sample, with replacement (2) Repeat this hundreds or thousands of times to create a collection of bootstrap samples (3) Compute your statistic in each bootstrap sample; the distribution of these values approximates the sampling distribution
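The three steps above can be sketched with the standard library; the original sample is hypothetical and the seed is fixed only for reproducibility:

```python
import random
import statistics

random.seed(0)                     # reproducible sketch

data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical original sample, mean 5
B = 2000                           # number of bootstrap samples

# Steps (1)-(2): resample with replacement, same size, many times.
boot_means = [
    statistics.mean(random.choices(data, k=len(data)))
    for _ in range(B)
]

# Step (3): boot_means approximates the sampling distribution of the mean.
print(statistics.mean(boot_means))   # close to the sample mean, 5.0
print(statistics.stdev(boot_means))  # bootstrap estimate of the standard error
```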
Asymptotic behavior
how samples behave when n is large
Asymptotic behavior of bootstrap
it gives us a sense of what the sampling distribution might look like, though it is centered around the sample statistic rather than the population parameter
Law of Large Numbers (LLN)
if the observations are independent draws from the population (a representative sample), then as n becomes large, the sample mean becomes a very close approximation of the population mean
Central limit theorem (CLT)
given a population with a finite mean µ and a finite non-zero variance σ², the sampling distribution of the sample mean approaches a normal distribution with mean µ and variance σ²/n as the sample size n increases
Useful properties of the normal or gaussian distribution
(1) symmetrical about the mean (2) the mean, median, and mode coincide (3) quantiles are closely related to the standard deviations (empirical rule): 68% of data within 1 sd, 95% within 2 sd, 99.7% within 3 sd
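The empirical-rule percentages in property (3) can be checked with the standard library's normal distribution:

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, standard deviation 1

# Probability mass within k standard deviations of the mean.
for k in (1, 2, 3):
    p = z.cdf(k) - z.cdf(-k)
    print(k, round(p, 4))   # 1 0.6827, 2 0.9545, 3 0.9973
```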
Standard normal distribution
a normal distribution with a mean of 0 and a standard deviation of 1
What is the difference between standard deviation and standard error (CLT)?
SD measures variation within a sample, while SE measures variation of a statistic (e.g. the sample mean) between samples; since SE = SD/sqrt(n), the SD is always bigger (for n > 1)
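The SD/SE relationship from the CLT, sketched on a hypothetical sample:

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical sample
sd = statistics.stdev(data)        # variation within the sample
se = sd / math.sqrt(len(data))     # CLT: variation of the mean between samples
print(sd, se)                      # SE is smaller than SD for n > 1
```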