Statistics and Distributions Flashcards

1
Q

Distributions

A
  • Representation of the way values tend to vary across a single attribute
  • Usually presented as a histogram
  • Where is the data concentrated? Which values are less likely? Which is most likely?
2
Q

Which single value best represents the data?

A

Central Tendency
Context dependent: the mean, median, or mode may be the best choice depending on the data
- On a histogram: affects the location on the x-axis

3
Q

Mean

A

Arithmetic mean:
sum of values / number of values

4
Q

Median

A

Middle value of sorted data
- Resistant to outliers and skew
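
A minimal sketch of why the median is called resistant, using Python's standard statistics module. The data values are made up for illustration: adding one extreme value moves the mean a lot and the median only slightly.

```python
from statistics import mean, median

# Hypothetical data: five typical values, then the same values plus one outlier.
values = [2, 3, 3, 4, 5]
with_outlier = values + [100]

print(mean(values), median(values))              # 3.4 3
print(mean(with_outlier), median(with_outlier))  # 19.5 3.5
```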

5
Q

Variability

A

How far does the data spread away from the mean?
Affects the width of the histogram

6
Q

Standard Deviation

A

Roughly, the typical distance of a value from the mean
If we pick a random value from the data, how far should we expect it to be from the mean?
Formally: the square root of the average squared distance from the mean

sd = sqrt( sum((x - mu)^2) / N )
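
A short sketch of the population standard deviation, sd = sqrt( sum((x - mu)^2) / N ), written out step by step and compared with the standard library's pstdev. The function name population_sd and the data are illustrative assumptions.

```python
import math
from statistics import pstdev

def population_sd(xs):
    """Square root of the average squared distance from the mean."""
    mu = sum(xs) / len(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))

# Hypothetical data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(data))  # 2.0
print(pstdev(data))         # 2.0, the standard library's population sd
```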

7
Q

Percentiles and Quartiles

A

25th Percentile : 1st Quartile
50th Percentile : 2nd Quartile (the median)
75th Percentile : 3rd Quartile

8
Q

IQR and Outliers

A

Interquartile Range : Q3-Q1
Lower/Upper Fences: [Q1 - (3/2) * IQR, Q3 + (3/2) * IQR]
Outlier: A value that falls outside of the fences.
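
A sketch of outlier detection with the conventional fences Q1 - 1.5 * IQR and Q3 + 1.5 * IQR. The data are made up, and the "inclusive" quartile method is one assumption among several conventions; other tools may give slightly different quartiles.

```python
from statistics import quantiles

# Hypothetical data with one suspiciously large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]

# Quartile conventions differ between tools; "inclusive" is one common choice.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr)               # 3.25 7.75 4.5
print(lower_fence, upper_fence)  # -3.5 14.5
print(outliers)                  # [50]
```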

9
Q

Boxplots

A

Excellent tool to display and compare measures of variability

They display:
- Median
- IQR
- Fences
- Outliers
- Range

10
Q

Normal Distribution

A
  • Gaussian Distribution or Bell Curve
    Fundamental to statistics
    Countless occurrences in nature
    Has a number of useful properties
11
Q

Normal Distribution Properties

A
  1. Symmetric
    Mean = Median = Mode
  2. 68-95-99.7 Rule
    About 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean
  3. Foundation of the Central Limit Theorem
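
The shares in the 68-95-99.7 rule can be checked numerically. This sketch uses the identity P(|Z| < k) = erf(k / sqrt(2)) for a standard normal variable Z; the helper name share_within is an illustrative assumption.

```python
import math

def share_within(k):
    """P(|Z| < k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))  # 0.6827, 0.9545, 0.9973
```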
12
Q

Random Experiment

A

A repeatable process that results in an outcome which cannot be predicted with certainty

13
Q

Outcome

A

The result of a single run (trial) of an experiment

14
Q

Sample Space

A

The set of all possible outcomes for an experiment

15
Q

Event

A

A subset of the sample space

16
Q

Probability

A

A number between 0 and 1 that measures the chance of an event occurring

17
Q

Sample Space

A

A sample space of an experiment is the set of all possible outcomes
Ex: Sample Space of a single die roll is: {1, 2, 3, 4, 5, 6}

18
Q

Event

A

An event, usually denoted by a single capital letter, is a subset of the sample space.
Ex: If you roll two dice, the simplest events are single outcomes, such as:
- (1,1), (1,2), (2,1), (1,6), (6,6)
An event can also contain several outcomes, such as "the sum is 7".

19
Q

Probability

A

For a single event A, when all outcomes are equally likely, the probability of A occurring, P(A), is defined as:

P(A) = number of outcomes in which A occurs / total number of possible outcomes
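
A minimal sketch of the counting definition for two fair dice. The event "the sum is 7" is chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two dice: 36 equally likely ordered pairs.
sample_space = list(product(range(1, 7), repeat=2))

# Event A (chosen for illustration): the two dice sum to 7.
event_a = [outcome for outcome in sample_space if sum(outcome) == 7]

p_a = Fraction(len(event_a), len(sample_space))
print(len(event_a), len(sample_space), p_a)  # 6 36 1/6
```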

20
Q

Addition Rule

A

Addition Rule states:
P(A or B) = P(A) + P(B) - P(A and B)
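
A quick check of the rule P(A or B) = P(A) + P(B) - P(A and B) on one roll of a fair die. The two events are arbitrary examples, and prob is an illustrative helper.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
a = {2, 4, 6}                        # event A: the roll is even
b = {4, 5, 6}                        # event B: the roll is greater than 3

def prob(event):
    """Share of equally likely outcomes that belong to the event."""
    return Fraction(len(event), len(sample_space))

lhs = prob(a | b)                       # P(A or B), counted directly
rhs = prob(a) + prob(b) - prob(a & b)   # addition rule
print(lhs, rhs)  # 2/3 2/3
```

Subtracting P(A and B) avoids counting the outcomes 4 and 6 twice.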

21
Q

Multiplication Rule

A

Two events are said to be independent if the outcome of one does not depend on the outcome of the other. Otherwise, they are dependent.

The multiplication rule states:
P(A and B) = P(A) * P(B, given that A occurred) = P(A) * P(B|A)

For independent events, this is simply:
P(A and B) = P(A) * P(B)
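
A sketch of both forms of the rule. The dependent case assumes a standard 52-card deck with two cards drawn without replacement; the independent case assumes two fair coin flips. Both scenarios are illustrative choices.

```python
from fractions import Fraction
from itertools import permutations

# Dependent events: draw two cards without replacement.
# A = first card is an ace, B = second card is an ace.
p_a = Fraction(4, 52)
p_b_given_a = Fraction(3, 51)   # one ace is already gone
p_both = p_a * p_b_given_a

# Cross-check by counting ordered pairs; cards 0-3 stand for the aces.
pairs = list(permutations(range(52), 2))
both_aces = [p for p in pairs if p[0] < 4 and p[1] < 4]
counted = Fraction(len(both_aces), len(pairs))

# Independent events: two fair coin flips both land heads.
p_two_heads = Fraction(1, 2) * Fraction(1, 2)

print(p_both, counted, p_two_heads)  # 1/221 1/221 1/4
```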

22
Q

Complements

A

P(A) + P(not A) = 1

23
Q

Deterministic Sampling

A

Rather than randomizing, you choose people by a fixed rule, for example the first people who walk by

24
Q

Uniform Random Sampling

A

Use software to number everyone in the population and pick a group of n people so that each person is equally likely to be chosen

25
Q

Random Sampling

A

Select members of the population by chance rather than by a fixed rule

26
Q

From random sampling, what do we know about the sample mean?

A

The sample mean is the mean of the data sampled, and approximates the true mean.

27
Q

Probability Distribution

A

The theoretical likelihood of each possible outcome, calculated without simulating or conducting the experiment

28
Q

Empirical Distribution

A

The proportion of times each value is observed in a simulation or experiment, relative to the total number of observations
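
A sketch of an empirical distribution for a simulated fair die: each face's count divided by the total number of rolls. The seed and the number of rolls are arbitrary assumptions.

```python
import random
from collections import Counter

rng = random.Random(0)   # fixed seed so the run is repeatable
n_rolls = 100_000
rolls = [rng.randint(1, 6) for _ in range(n_rolls)]

# Observed share of each face, relative to the total number of rolls.
counts = Counter(rolls)
empirical = {face: counts[face] / n_rolls for face in range(1, 7)}
theoretical = 1 / 6      # probability distribution of a fair die

for face, share in empirical.items():
    print(face, round(share, 4), round(theoretical, 4))
```

Each observed share should sit close to the theoretical value of 1/6.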

29
Q

Law of large numbers

A

As the sample size grows larger, the sample represents the population more accurately; for example, the sample mean gets closer to the population mean
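
A small simulation sketch of the law of large numbers: the mean of the first n rolls of a fair die tends toward the expected value 3.5 as n grows. The seed, the sample sizes, and the helper mean_of_first are illustrative assumptions.

```python
import random

rng = random.Random(1)   # fixed seed so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(100_000)]

def mean_of_first(n):
    """Sample mean of the first n simulated rolls."""
    return sum(rolls[:n]) / n

# The printed means should drift toward 3.5 as n increases.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, mean_of_first(n))
```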

30
Q

statistic

A

a calculated number which describes a characteristic of a sample

31
Q

parameter

A

A value that describes a characteristic of a population; it is usually unknown and estimated by a statistic

32
Q

statistical inference

A

The process of drawing conclusions about a population based on data from random samples.

33
Q

Central Limit Theorem

A

This theorem states:
Upon taking sufficiently large samples, the distribution of the sample means will approximate a normal distribution, regardless of the distribution sampled from.
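
A simulation sketch of the theorem, assuming samples from a strongly skewed exponential distribution with mean 1. The seed, the sample size of 50, and the number of samples are arbitrary choices for illustration.

```python
import random
from statistics import mean, pstdev

rng = random.Random(2)   # fixed seed so the run is repeatable
sample_size = 50
n_samples = 5_000

# Each entry is the mean of one sample drawn from a skewed distribution.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
    for _ in range(n_samples)
]

center = mean(sample_means)
spread = pstdev(sample_means)
within_one_sd = sum(abs(m - center) < spread for m in sample_means) / n_samples

print(round(center, 3))         # close to 1, the population mean
print(round(spread, 3))         # close to 1 / sqrt(50), about 0.141
print(round(within_one_sd, 3))  # close to 0.68 if the means are roughly normal
```

Although individual exponential values are far from bell-shaped, the sample means cluster symmetrically around the population mean.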
