Statistics and Distributions Flashcards by Oscar Ryu

Distributions

Representation of the way values tend to vary across a single attribute
Usually presented as a histogram
Where is the data concentrated? Which values are less likely? Which is most likely?

Which single value best represents the data?

Central Tendency
Context dependent
- On a histogram: affects the location on the x-axis

Mean

arithmetic mean:
sum of values/number of values

Median

Middle value of sorted data
- Resistant to outliers and skew
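To make the contrast between mean and median concrete, here is a minimal sketch using Python's standard `statistics` module. The data values are made up for illustration; the point is only that one extreme value moves the mean far more than the median.

```python
from statistics import mean, median

# Illustrative data (not from the flashcards)
values = [2, 3, 3, 4, 5]
with_outlier = values + [100]  # same data plus one extreme value

mean_plain = mean(values)              # sum of values / number of values
median_plain = median(values)          # middle value of the sorted data
mean_outlier = mean(with_outlier)      # pulled strongly toward 100
median_outlier = median(with_outlier)  # average of the two middle values

print(mean_plain, median_plain)
print(mean_outlier, median_outlier)
```

With an even number of values, `median` averages the two middle values, which is the usual convention.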

Variability

How far does the data spread away from the mean?
Affects the width of the histogram

Standard Deviation

This is the typical distance from the mean (formally, the square root of the average squared distance)
If we pick a random value from the data, how far should we expect it to be from the mean?

sd = sqrt( sum((x - mu)^2) / N )
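The formula can be checked by hand on a small data set. The sketch below assumes the population form of the standard deviation (dividing by N, as on the card) and compares it with `statistics.pstdev` from the standard library; the data values are illustrative.

```python
import math
from statistics import pstdev

# Illustrative data (not from the flashcards)
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)
mu = sum(values) / n  # mean of the data

# Sum the squared deviations first, divide by N, then take the square root
sd = math.sqrt(sum((x - mu) ** 2 for x in values) / n)

print(mu, sd)
print(pstdev(values))  # library version of the same population formula
```

Note that `statistics.stdev` (without the `p`) divides by N - 1 instead, which is the usual choice when the data are a sample rather than the whole population.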

Percentiles and Quartiles

25th Percentile : 1st Quartile
50th Percentile : 2nd Quartile
75th Percentile : 3rd Quartile

IQR and Outliers

Interquartile Range : Q3-Q1
Lower/Upper Fences: [Q1 - (3/2) * IQR, Q3 + (3/2) * IQR]
Outlier: A value that falls outside of the fences.
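As a sketch of how quartiles, the IQR, and the fences fit together, the example below uses `statistics.quantiles` with the `inclusive` method. Quartile conventions differ between textbooks and tools, so the exact Q1 and Q3 values are an assumption of this method; the fences are computed as Q1 - 1.5 * IQR and Q3 + 1.5 * IQR. The data values are illustrative.

```python
from statistics import quantiles

# Illustrative data with one suspicious value (not from the flashcards)
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]

# n=4 splits the data into quarters and returns the three cut points
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# An outlier is any value that falls outside the fences
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q2, q3, iqr)
print(lower_fence, upper_fence, outliers)
```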

Boxplots

Excellent tool to display and compare measures of variability

They display:
- Median
- IQR
- Fences
- Outliers
- Range

Normal Distribution

Gaussian Distribution or Bell Curve
Fundamental to statistics
Countless occurrences in nature
Has a number of useful properties

Normal Distribution Properties

Symmetric
Mean = Median = Mode
68-95-99.7 Rule: about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean
Foundation of the Central Limit Theorem
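The percentages in the 68-95-99.7 rule can be recovered from the normal cumulative distribution function. This is a minimal sketch using `statistics.NormalDist` from the standard library with a standard normal (mean 0, standard deviation 1); the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of the distribution within k standard deviations of the mean
shares = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in shares.items():
    print(k, round(share, 4))
```

Because the normal distribution is symmetric, the same shares apply to any mean and standard deviation once distances are measured in standard deviations.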

Random Experiment

A process that results in an outcome

Outcome

The value of the result of a single experiment

Sample Space

The set of all possible outcomes for an experiment

Event

A subset of the sample space

Probability

A number between 0 and 1 that measures how likely an event is to occur

Sample Space

A sample space of an experiment is the set of all possible outcomes
Ex: Sample Space of a single die roll is: {1, 2, 3, 4, 5, 6}

Event

An event, usually denoted by a single capital letter, is a subset of the sample space.
Ex: If you roll two dice, some possible single-outcome events include:
- (1,1), (1,2), (2,1), (1,6), (6,6)

Probability

For a single event A, the probability of A occurring, P(A), is defined (when all outcomes are equally likely) as:

P(A) = number of outcomes in which A occurs / total number of possible outcomes
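The counting definition can be sketched by enumerating a sample space directly. The example below assumes two fair six-sided dice, so all 36 outcomes are equally likely, and takes the hypothetical event A to be "the two dice sum to 7".

```python
from fractions import Fraction
from itertools import product

# Sample space: every ordered pair (first die, second die)
sample_space = list(product(range(1, 7), repeat=2))

# Event A: the subset of outcomes whose values sum to 7
event_a = [outcome for outcome in sample_space if sum(outcome) == 7]

# P(A) = outcomes in which A occurs / total possible outcomes
p_a = Fraction(len(event_a), len(sample_space))

print(len(sample_space), len(event_a), p_a)
```

Using `Fraction` keeps the result exact instead of rounding it to a float.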

Addition Rule

The addition rule states:
P(A or B) = P(A) + P(B) - P(A and B)
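The addition rule, P(A or B) = P(A) + P(B) - P(A and B), can be verified by counting. This sketch assumes two fair dice and two illustrative events: A is "the first die shows 6" and B is "the second die shows 6". The helper `prob` is a hypothetical name introduced here.

```python
from fractions import Fraction
from itertools import product

sample_space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a set of outcomes) under equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

a = {o for o in sample_space if o[0] == 6}  # first die shows 6
b = {o for o in sample_space if o[1] == 6}  # second die shows 6

p_union = prob(a | b)                        # counted directly
p_rule = prob(a) + prob(b) - prob(a & b)     # via the addition rule

print(p_union, p_rule)
```

Subtracting P(A and B) removes the outcome (6, 6), which would otherwise be counted twice.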

Multiplication Rule

Two events are said to be independent if the outcome of one does not depend on the outcome of the other. Otherwise, they are dependent.

The multiplication rule states:
P(A and B) = P(A) * P(B, given that A occurred) = P(A) * P(B|A)

For independent events, this is simply:
P(A and B) = P(A) * P(B)
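For dependent events, the conditional form of the rule matters. The sketch below assumes a standard 52-card deck and two cards drawn without replacement, with A as "the first card is an ace" and B as "the second card is an ace". It compares the rule against a brute-force count of all ordered two-card draws.

```python
from fractions import Fraction
from itertools import permutations

# Ranks 1..13 (1 = ace) in four suits
deck = [(rank, suit) for rank in range(1, 14) for suit in "SHDC"]

# Every ordered draw of two distinct cards (no replacement)
draws = list(permutations(deck, 2))
both_aces = [d for d in draws if d[0][0] == 1 and d[1][0] == 1]
p_enumerated = Fraction(len(both_aces), len(draws))

# Multiplication rule: P(A and B) = P(A) * P(B|A)
p_a = Fraction(4, 52)          # four aces among 52 cards
p_b_given_a = Fraction(3, 51)  # three aces left among 51 cards
p_rule = p_a * p_b_given_a

print(p_enumerated, p_rule)
```

If the first card were put back before the second draw, the events would be independent and the product would simply be P(A) * P(B) = (4/52) * (4/52).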

Complements

P(A) + P(not A) = 1
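Complements are most useful for "at least one" questions, where the opposite event is easier to compute. As an illustrative sketch, assume four independent rolls of a fair die and ask for the probability of at least one six.

```python
from fractions import Fraction

# "not A": no six appears in any of the four independent rolls
p_no_six_one_roll = Fraction(5, 6)
p_no_six_four_rolls = p_no_six_one_roll ** 4

# A: at least one six, found via P(A) = 1 - P(not A)
p_at_least_one_six = 1 - p_no_six_four_rolls

print(p_no_six_four_rolls, p_at_least_one_six)
```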

Deterministic Sampling

Rather than randomizing, you choose people by a fixed rule, e.g. taking the first people who walk by

Uniform Random Sampling

Every individual has the same chance of being chosen, e.g. use software to assign each person a number and pick n of those numbers at random

Random Sampling

Select individuals at random, so that chance rather than the sampler decides who is included

From random sampling, what do we know about the sample mean?

The sample mean is the mean of the data sampled, and approximates the true mean.

Probability Distribution

the theoretical probability of each possible outcome, calculated without simulating or conducting the experiment

Empirical Distribution

the proportion of times each value is observed in a simulation or experiment, relative to the total number of observations
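The difference between a probability distribution and an empirical distribution can be sketched with a simulated die. The example assumes a fair six-sided die, so the theoretical probability of each face is 1/6; the seed and the number of rolls are arbitrary choices for reproducibility.

```python
import random
from collections import Counter
from fractions import Fraction

rng = random.Random(0)  # fixed seed so the run is repeatable
n_rolls = 60_000

# Probability distribution: calculated, no experiment needed
probability = {face: Fraction(1, 6) for face in range(1, 7)}

# Empirical distribution: observed proportions from the simulation
rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
counts = Counter(rolls)
empirical = {face: counts[face] / n_rolls for face in range(1, 7)}

for face in range(1, 7):
    print(face, float(probability[face]), round(empirical[face], 4))
```

The observed proportions should land close to 1/6 but will not match it exactly, since they depend on the particular rolls.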

Law of large numbers

As our sample size grows larger, the sample average converges to the population (expected) value, so the data represents the population more accurately
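A simulation gives a feel for the law of large numbers. This sketch assumes a fair die, whose expected value is 3.5, and records the running mean at a few checkpoints; the seed and checkpoint sizes are arbitrary.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
expected_value = 3.5    # (1 + 2 + 3 + 4 + 5 + 6) / 6
checkpoints = (10, 100, 1_000, 10_000, 100_000)

running_means = {}
total = 0
for n in range(1, max(checkpoints) + 1):
    total += rng.randint(1, 6)
    if n in checkpoints:
        running_means[n] = total / n  # mean of the first n rolls

for n, m in running_means.items():
    print(n, round(m, 4), round(abs(m - expected_value), 4))
```

The gap to 3.5 tends to shrink as n grows, although it need not shrink at every single step.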

statistic

a calculated number which describes a characteristic of a sample

parameter

a value that describes a characteristic of a population; it is usually unknown and is estimated by a statistic

statistical inference

a conclusion about a population made based on data from one or more random samples.

Central Limit Theorem

This theorem states: When taking sufficiently large independent random samples, the distribution of the sample means will approximate a normal distribution, regardless of the distribution sampled from (provided it has a finite variance).
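The theorem can be illustrated by sampling from a clearly non-normal distribution. The sketch below assumes an exponential distribution with rate 1 (mean 1, standard deviation 1), a sample size of 50, and 5,000 repeated samples; these numbers and the seed are arbitrary. If the sample means are roughly normal, they should center near 1, spread by about 1 / sqrt(50), and roughly 68% should fall within one standard deviation of their center.

```python
import math
import random
from statistics import mean, pstdev

rng = random.Random(42)  # fixed seed so the run is repeatable
sample_size = 50
n_samples = 5_000

def one_sample_mean():
    """Mean of one sample drawn from a skewed exponential distribution."""
    return sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size

sample_means = [one_sample_mean() for _ in range(n_samples)]

center = mean(sample_means)
spread = pstdev(sample_means)
predicted_spread = 1 / math.sqrt(sample_size)  # sigma / sqrt(n)

# Share of sample means within one standard deviation of their center
within_one_sd = sum(abs(m - center) <= spread for m in sample_means) / n_samples

print(round(center, 3), round(spread, 3), round(predicted_spread, 3))
print(round(within_one_sd, 3))
```

Plotting `sample_means` as a histogram would show the bell shape directly, even though the individual exponential draws are strongly right-skewed.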

Statistics and Distributions Flashcards

(34 cards)