Statistics Flashcards

1
Q

3 basic classifications of statistics:

A

Classical statistics - parameters (mu, sigma^2) are unknown to us but fixed, and we want to make inferences about them from sample statistics such as X bar.

Bayesian statistics - parameters are not fixed but treated as random, more parametric, you have to impose a distribution on them

Non parametric Statistics - does not assume normality, has the least assumptions

2
Q

What are Descriptive Statistics?

A
  1. First approach to turn data into information
  2. Summarize large amounts of data - ease of interpretation
  3. It consists of tables, graphs, summary measures, images or
    anything that illustrates the information contained in the data.

-A picture is worth a thousand words

3
Q

Types of Statistical Variables:

A

1) Qualitative: sex, socioeconomic status, marital status
2) Quantitative:
a) Discrete: the number of times a particular phenomenon has happened.
b) Continuous: the result of a random experiment whose sample space (set of possibilities) is uncountable.

4
Q

Type of Statistical Data:

A

1) Ordinal: (1, 2, 3, …; A, B, C, …)
2) Non-ordinal: Married, Divorced, Single, Widowed …
3) Time Series: Poverty over time
4) Cross Section: Population in 200 countries at a given time (say for January 2010)
5) Panel Data: Population in 200 countries over the last 30 years.

5
Q

numerical measures:

A

• Location: Average, median, mode, quartiles, quintiles,
deciles, percentiles (quantiles in general), trimmed mean, weighted mean, geometric mean, harmonic mean, etc.
• Scale: Range, interquartile range, variance, pseudovariance, standard deviation, etc.
• Other: Coefficient of Variation, Sharpe Ratio, skewness, kurtosis…

6
Q

mean:

A

the arithmetic average

mean = ΣX / N

-it is important to remember that although the mean provides a useful piece of information, it does not tell you anything about how spread out the scores are (variance), whether outliers are skewing it, etc.
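
A minimal sketch of how one outlier pulls the mean while the median barely moves. The scores are made up for illustration.

```python
from statistics import mean, median

scores = [70, 72, 75, 78, 80]      # made-up scores
with_outlier = scores + [300]      # same scores plus one extreme value

# The mean shifts a lot once the outlier is added; the median hardly changes.
print("mean:", mean(scores), "median:", median(scores))
print("mean:", mean(with_outlier), "median:", median(with_outlier))
```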

7
Q

median

A

the number in the distribution that marks the 50th percentile/the number in the middle of the entire distribution

8
Q

mode

A

the number that has the highest frequency(occurs most often)

9
Q

Quantiles

A

quartiles: split the ranked data into 4 segments with an equal number of values per segment
quintiles: split the ranked data into 5 segments…
deciles: split the ranked data into 10 segments…
percentiles: split the ranked data into 100 segments…
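
A rough sketch using Python's standard library (3.8 or later). Splitting into n segments takes n - 1 cut points, and textbooks use slightly different interpolation rules, so other tools may return slightly different values. The data are made up.

```python
from statistics import quantiles

data = list(range(1, 101))          # made-up ranked data: 1, 2, ..., 100

quartiles = quantiles(data, n=4)    # 3 cut points -> 4 segments
quintiles = quantiles(data, n=5)    # 4 cut points -> 5 segments
deciles = quantiles(data, n=10)     # 9 cut points -> 10 segments
percentiles = quantiles(data, n=100)  # 99 cut points -> 100 segments

print("quartile cut points:", quartiles)
```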

10
Q

Trimmed mean/Truncated mean

A

A method of averaging that removes a small percentage of the largest and smallest values before calculating the mean.

  • After removing the specified observations, the trimmed mean is found using the arithmetic averaging formula (see the website below).
    https://www.easycalculation.com/statistics/learn-trimmed-mean.php
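
A minimal sketch of a trimmed mean, assuming the same proportion is cut from each tail. The helper name `trimmed_mean` and the data are hypothetical.

```python
def trimmed_mean(values, proportion):
    """Drop `proportion` of the observations from each end, then average."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)      # observations cut from each tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

data = [2, 4, 5, 6, 7, 8, 9, 10, 11, 100]   # made-up data with one outlier

print("plain mean:", sum(data) / len(data))
print("10% trimmed mean:", trimmed_mean(data, 0.10))
```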
11
Q

Weighted mean

A

Instead of each data point contributing equally to the final mean, some data points contribute more “weight” than others.

Formula: weighted mean = Σ(w x X) / Σw. For example, with weights that sum to 1: (X1 x .40) + (X2 x .30) + (X3 x .20) + (X4 x .10)

-If all the weights are equal, then the weighted mean equals the arithmetic mean (the regular “average” you’re used to)
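
A short sketch of the weighted mean with the weights .40, .30, .20, .10. The scores and the helper name `weighted_mean` are made up for illustration.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

scores = [80, 90, 70, 60]               # made-up scores X1..X4
weights = [0.40, 0.30, 0.20, 0.10]

print("weighted mean:", weighted_mean(scores, weights))
# With equal weights the result is the ordinary arithmetic mean.
print("equal weights:", weighted_mean(scores, [1, 1, 1, 1]))
```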

12
Q

range

A

The difference between the lowest and highest value.

Example: In {4, 6, 9, 3, 7} the lowest value is 3, and the highest is 9, so the range is 9 − 3 = 6.

13
Q

Interquartile Range(IQR)/H-spread

A

also called the midspread or middle fifty, it is a measure of statistical dispersion: IQR = Q3 - Q1. It “chops off” the top 25% and the bottom 25% of the ranked data (ignoring 50% of the data) and measures the spread of the middle 50%.
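
A small sketch comparing the range with the IQR on made-up data containing one outlier. It assumes Python 3.8+ and the "inclusive" quartile rule; other quartile conventions can give slightly different values.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 100]    # made-up data with one outlier

q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                            # spread of the middle 50%
data_range = max(data) - min(data)       # sensitive to the outlier

print("range:", data_range, "IQR:", iqr)
```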

14
Q

variance

A

the expectation of the squared deviation of a random variable from its mean, and it informally measures how far a set of (random) numbers are spread out from their mean(dispersion).

σ^2 = [ ∑(x-mean)^2] / N

15
Q

standard deviation

A

a measure that is used to quantify the average amount of variation or dispersion of a set of data values from the mean.

(represented by the Greek letter sigma σ or the Latin letter s)

Square root of variance (√[ ∑(x-mean)^2 / N ])
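
A minimal sketch of the population variance and standard deviation (dividing by N), checked against the standard library. The data are made up. Note that the sample versions, `statistics.variance` and `statistics.stdev`, divide by n - 1 instead.

```python
from math import sqrt
from statistics import pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]          # made-up values

n = len(data)
mean = sum(data) / n
variance = sum((x - mean) ** 2 for x in data) / n   # sigma^2, divides by N
std_dev = sqrt(variance)                            # sigma

print("variance:", variance, "library:", pvariance(data))
print("std dev:", std_dev, "library:", pstdev(data))
```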

16
Q

Parameters vs Estimators

A

Parameters correspond to the population. They are theoretical quantities, many times unknown

Estimators correspond to the sample. They are practical quantities. They can be computed from the data

17
Q

stochastic model

A

tool for estimating probability distributions for a collection of random variables over time

18
Q

3 methods assigning probability

A

Classical Method - based on the assumption of equally likely outcomes -> counting techniques

Relative Frequency Method - based on experimentation or historical data

Subjective Method - based on judgement, still can be scientific

19
Q

Complement vs Union vs Intersection

A

The complement of an event A is defined to be the event consisting of all sample points that are not in A
-it is denoted as A^c

The union of events A and B is the event containing all sample points that are in A or B (or both)
-denoted as A ∪ B

The intersection of events A and B is the set of all sample points that are in both A and B
-denoted as A ∩ B

20
Q

Addition Law

A

provides a way to compute the probability of event A, or B, or both A and B occurring

  • law is written as P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
  • P(A ∩ B) is subtracted so that sample points in both A and B are not counted twice
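
A quick check of the addition law on one roll of a fair die, assuming equally likely outcomes. The events A and B are chosen only for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}       # one roll of a fair die
A = {2, 4, 6}                            # "even number"
B = {4, 5, 6}                            # "greater than 3"

def prob(event):
    """Classical method: favourable outcomes / total outcomes."""
    return Fraction(len(event), len(sample_space))

lhs = prob(A | B)                        # P(A or B), from the union directly
rhs = prob(A) + prob(B) - prob(A & B)    # addition law

print(lhs, rhs)
```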
21
Q

Mutually exclusive events

A

have no sample points in common, cannot happen at the same time.

For example: when tossing a coin, the result can either be heads or tails but cannot be both.

22
Q

Conditional probability

A

The probability of an event given that another event has occurred
-Denoted as P(A|B), computed mathematically as follows: P(A|B) = P(A ∩ B) / P(B), where P(B) > 0
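
A small sketch of the conditional probability formula on one roll of a fair die. The events are picked only for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}       # one roll of a fair die
A = {2, 4, 6}                            # "even number"
B = {4, 5, 6}                            # "greater than 3"

def prob(event):
    return Fraction(len(event), len(sample_space))

# P(A|B) = P(A and B) / P(B): restrict attention to outcomes where B occurred.
p_a_given_b = prob(A & B) / prob(B)

print("P(A) =", prob(A), " P(A|B) =", p_a_given_b)
```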

23
Q

Independent Events

A

If the probability of event A is not changed by the occurrence of event B
-Denoted as P(A|B) = P(A); mathematically, to find out if they are independent, check whether P(A ∩ B) = P(A) x P(B)
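
A sketch of the product check for independence on one roll of a fair die. The helper `independent` and the events are hypothetical.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}       # one roll of a fair die
A = {2, 4, 6}                            # "even number"
B = {4, 5, 6}                            # "greater than 3"
C = {1, 2}                               # "at most 2"

def prob(event):
    return Fraction(len(event), len(sample_space))

def independent(e1, e2):
    """Events are independent when P(e1 and e2) = P(e1) x P(e2)."""
    return prob(e1 & e2) == prob(e1) * prob(e2)

print("A and B independent?", independent(A, B))
print("A and C independent?", independent(A, C))
```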

24
Q

Bayes’ Theorem

A

describes the probability of an event, based on conditions that might be related to the event.

-For example, suppose one is interested in whether a person has cancer, and knows the person’s age. If cancer is related to age, then, using Bayes’ theorem, information about the person’s age can be used to more accurately assess the probability that they have cancer.

Bayes’ theorem provides the means for revising the prior probabilities.

25
Q

Steps of Bayes Theorem

A

Begin the analysis with initial prior probabilities, then obtain new information, then apply Bayes’ theorem to get the posterior probabilities.
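
The three steps can be sketched as follows. All numbers here are hypothetical, chosen only to show the mechanics: a prior, the likelihood of a positive test result under each case, and the resulting posterior.

```python
# Step 1: prior probability (hypothetical): 1% of people have the condition.
prior = 0.01

# Step 2: new information (hypothetical test characteristics).
p_pos_given_cond = 0.90      # P(positive | condition)
p_pos_given_no_cond = 0.05   # P(positive | no condition)

# Step 3: Bayes' theorem gives the posterior P(condition | positive).
evidence = p_pos_given_cond * prior + p_pos_given_no_cond * (1 - prior)
posterior = p_pos_given_cond * prior / evidence

print("prior:", prior, "posterior:", round(posterior, 4))
```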

26
Q

Equally Likely Probability Spaces (or simple spaces)

A

When the probability of each possible outcome is the same

27
Q

Binomial Theorem

A

a formula for finding any power of a binomial without multiplying at length

28
Q

random variable

A

a numerical description of the outcome of an experiment. A function.

29
Q

discrete random variable

A

may assume either a finite number of values or an infinite sequence of values. Countable.

Example: number of TVs sold on one day

30
Q

Types of Fallacies:

A
  • Measurement Rules: changing the way you measure
  • Correlation vs Causation
  • Ceteris Paribus: if I drive fast, I’ll spend less time on the road, so less chance of an accident. All other things are not equal here.
  • Extrapolation Bias: taking a survey of just women or college students and applying it to the entire population
  • Selection Bias: some are not interviewed
31
Q

Discrete Probability Distributions

A

The required conditions are that no probability is negative (f(x) ≥ 0) and that the probabilities add up to 1 (Σf(x) = 1)

32
Q

expected value, of a random variable

A

E(x) = μ = Σ x f(x)
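
A minimal sketch of the expected value as a probability-weighted sum. The distribution (number of TVs sold in a day) uses made-up probabilities, and the code first checks that they are non-negative and sum to 1.

```python
from math import isclose

# Hypothetical distribution: x = TVs sold in a day, f(x) = probability.
distribution = {0: 0.40, 1: 0.25, 2: 0.20, 3: 0.05, 4: 0.10}

# Required conditions for a discrete probability distribution.
assert all(p >= 0 for p in distribution.values())
assert isclose(sum(distribution.values()), 1.0)

# Expected value: sum of x * f(x) over all values of x.
expected_value = sum(x * p for x, p in distribution.items())

print("E(x) =", expected_value)
```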

33
Q

Properties of a Binomial Probability Distribution experiment

A
  1. The experiment consists of a sequence of n identical trials.
  2. Two outcomes, success and failure, are possible on each trial.
  3. The probability of a success, denoted by p, does not change from trial to trial
  4. The trials are independent.

-Our interest is in the number of successes occurring in the n trials. We let x denote the number of successes occurring in the n trials.

34
Q

Binomial Probability Function

A

f(x) = (n choose x) p^x (1-p)^(n-x)

35
Q

What is the Expected value, Variance, and SD of a binomial.

A

EV(x)=np

Var(x)=np(1 – p)

SD = √[np(1 – p)]
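
A short sketch that builds the binomial probability function and checks np and np(1 - p) against the mean and variance computed directly from the distribution. The values n = 10 and p = 0.3 are arbitrary.

```python
from math import comb, isclose, sqrt

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.3                           # arbitrary illustration values
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

mean = sum(x * f for x, f in enumerate(pmf))
variance = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))

print("mean:", mean, "vs np =", n * p)
print("variance:", variance, "vs np(1-p) =", n * p * (1 - p))
print("SD:", sqrt(variance))
```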

36
Q

Poisson Probability Distribution

A

A Poisson distributed random variable is often useful in estimating the number of occurrences over a specified interval of time or space.

It is a discrete random variable (R.V.) that may assume an infinite sequence of values (x = 0, 1, 2, . . . ).

37
Q

Two Properties of a Poisson Experiment

A
  1. The probability of an occurrence is the same for any two intervals of equal length.
  2. The occurrence or nonoccurrence in any interval is independent of the occurrence or nonoccurrence in any other interval.
38
Q

Poisson Probability Function

A

f(x) = μ^x e^(-μ) / x!, where e ≈ 2.71828

39
Q

What is the Variance, and SD of a poisson distribution

A

A property of the Poisson distribution is that the mean and variance are equal. Of course, SD is just the square root.
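
A rough numerical check that the Poisson mean and variance are both equal to mu. The value mu = 3 is arbitrary, and the infinite sequence of values is cut off at x = 59, where the remaining probability is negligible.

```python
from math import exp, factorial, isclose, sqrt

def poisson_pmf(x, mu):
    """Probability of exactly x occurrences when the expected number is mu."""
    return mu**x * exp(-mu) / factorial(x)

mu = 3.0                                 # arbitrary illustration value
xs = range(60)                           # truncation of x = 0, 1, 2, ...
pmf = [poisson_pmf(x, mu) for x in xs]

mean = sum(x * f for x, f in zip(xs, pmf))
variance = sum((x - mean) ** 2 * f for x, f in zip(xs, pmf))

print("mean:", mean, "variance:", variance, "SD:", sqrt(variance))
```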

40
Q

Hypergeometric Probability Distribution

A
  • it is closely related to the binomial distribution, with 2 main differences:
    1. The trials are not independent
    2. The probability of success changes from trial to trial

Sampling is done without replacement, which is closer to the real world

41
Q

Hypergeometric Probability Function

A

f(x) = (r choose x) (N - r choose n - x) / (N choose n)

where: x = number of successes
n = number of trials
N = number of elements in the population
r = number of elements in the population labeled success

(see notes for example)
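
A minimal sketch of the hypergeometric probability function, f(x) = C(r, x) C(N - r, n - x) / C(N, n). The population sizes below are made up: 10 items, 4 labeled success, 3 drawn without replacement.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf(x, n, N, r):
    """P(exactly x successes in n draws without replacement)."""
    return Fraction(comb(r, x) * comb(N - r, n - x), comb(N, n))

N, r, n = 10, 4, 3                       # made-up population and sample sizes

pmf = [hypergeometric_pmf(x, n, N, r) for x in range(n + 1)]
for x, f in enumerate(pmf):
    print(x, f)
```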

42
Q

continuous random variable

A

can assume any value in a real interval or a collection of intervals. non-countable.

43
Q

How do we handle continuous random variables?

A

We don’t ask for the probability that a continuous random variable takes one exact value, because that probability is always 0. Instead we ask for the probability that it falls in some interval/range.

44
Q

Continuous Probability Distribution

A

The equation used to describe it is called a probability density function: it describes the relative likelihood of the random variable falling within a particular range of values (the density over that range).

45
Q

Characteristics of Uniform Probability Distribution

A

when the probability of any event is proportional to the length of the interval. The distribution looks very flat and even.

46
Q

Uniform probability density function:

A

f(x) = 1/(b-a) if a ≤ x ≤ b, and f(x) = 0 elsewhere

47
Q

Expected Value of X [Uniform Probability Distribution]

A

E(X) = (a + b)/2

We divide a+b by 2 to get the middle point of the distribution for expected value.

48
Q

Variance of X [Uniform Probability Distribution]

A

Var(X) = (b - a)^2 / 12
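
A small sketch of the uniform distribution on [a, b], assuming the density 1/(b - a) inside the interval and 0 outside. The endpoints a = 2 and b = 8 and the helper names are made up.

```python
def uniform_pdf(x, a, b):
    """Flat density 1/(b - a) on [a, b], zero elsewhere."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_prob(lo, hi, a, b):
    """P(lo <= X <= hi): length of the overlap with [a, b] over (b - a)."""
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0) / (b - a)

a, b = 2, 8                              # made-up endpoints

expected = (a + b) / 2                   # midpoint of the interval
variance = (b - a) ** 2 / 12

print("P(3 <= X <= 5) =", uniform_prob(3, 5, a, b))
print("E(X) =", expected, "Var(X) =", variance)
```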

49
Q

Normal Probability Density Function

A

f(x) = 1/(σ√(2π)) x e^(…very long)

-m3 pg 16
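
For reference, a sketch of the standard normal density formula, f(x) = 1/(σ√(2π)) x e^(-(x - μ)^2 / (2σ^2)), written out in full and compared with Python's built-in `statistics.NormalDist`. The values of μ, σ and x are arbitrary.

```python
from math import exp, isclose, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    coefficient = 1 / (sigma * sqrt(2 * pi))
    exponent = -((x - mu) ** 2) / (2 * sigma**2)
    return coefficient * exp(exponent)

mu, sigma = 100, 15                      # arbitrary illustration values

for x in (85, 100, 130):
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```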

50
Q

Characteristics of Normal Probability Distribution

A

The distribution is symmetric, and bell-shaped. It is centered around the mean which is also its highest point(median and mode), the width is defined by the standard deviation.

Probabilities are given by the area under the curve which is 1 so .5 each side.

51
Q

Normal distribution standard deviation breakdown

A
  1. About 68.3% of values of a normal random variable are within +/- 1 standard deviation of its mean.
  2. About 95.4% of values of a normal random variable are within +/- 2 standard deviations of its mean.
  3. About 99.7% of values of a normal random variable are within +/- 3 standard deviations of its mean.
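
The share of a normal distribution within 1, 2, and 3 standard deviations of the mean can be checked numerically. This sketch uses the standard normal CDF from Python's standard library.

```python
from statistics import NormalDist

z = NormalDist()                         # standard normal: mean 0, SD 1

# Area under the curve between -k and +k standard deviations.
shares = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within +/- {k} SD: {share:.2%}")
```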
52
Q

A random variable having a normal distribution with a mean of 0 and a standard deviation of 1 is said to have:

A

a standard normal probability distribution. X ~ N (0, 1)

53
Q

Converting to the Standard Normal Distribution

A

z = (x̄ - μ) / (σ/√n), or z = (x - μ) / σ for only 1 observation

We can think of z as a measure of the number of standard deviations x is from μ.

-This is done to compare them easily/standardize them, doing this will make the mean 0 and the sd 1
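
A minimal sketch of both conversions: one for a single observation and one for a sample mean. The function names and the numbers (μ = 100, σ = 15) are made up for illustration.

```python
from math import sqrt

def z_score(x, mu, sigma):
    """Standard deviations between one observation x and the mean mu."""
    return (x - mu) / sigma

def z_score_mean(x_bar, mu, sigma, n):
    """Standard errors between a sample mean x_bar (n observations) and mu."""
    return (x_bar - mu) / (sigma / sqrt(n))

mu, sigma = 100, 15                      # made-up population values

print("single observation 130:", z_score(130, mu, sigma))
print("sample mean 104, n = 36:", z_score_mean(104, mu, sigma, 36))
```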

54
Q

What does the standardized z-score tell us?

A

tells us how far the sample mean is above or below the population mean, in units of standard error.

55
Q

3 THINGS Central Limit Theorem tells us:

A
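  1. The sampling distribution of the sample mean is approximately normal when the sample size is large
  2. As the sample size increases, the variance of the sample mean decreases (it is more accurate)
  3. The sample mean is unbiased (its expected value is the true mean)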

therefore it permits us to draw conclusions about the population based strictly on sample data without having knowledge about the distribution of the underlying population.

56
Q

CLT Standard deviation/standard error mean.

A

σ_x̄ = σ/√n (the standard deviation of the sample mean, also called the standard error)
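
A rough simulation sketch of the standard error. It draws many samples of size n from a uniform population on [0, 1] (a non-normal population chosen only for illustration, with σ = √(1/12)) and compares the spread of the sample means with σ/√n. The seed, n, and number of replications are arbitrary.

```python
import random
from math import sqrt
from statistics import pstdev

random.seed(0)                           # fixed seed so the run is repeatable

n = 30                                   # observations per sample
replications = 5000                      # number of samples drawn
sigma = sqrt(1 / 12)                     # SD of a uniform(0, 1) population

sample_means = [
    sum(random.random() for _ in range(n)) / n
    for _ in range(replications)
]

observed_se = pstdev(sample_means)       # spread of the simulated sample means
theoretical_se = sigma / sqrt(n)         # sigma / sqrt(n)
mean_of_means = sum(sample_means) / replications

print("observed SE:", observed_se)
print("theoretical SE:", theoretical_se)
print("mean of sample means:", mean_of_means)
```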