What are chances? Flashcards
How do you calculate probability?
Divide the number of ways the event can happen by the total number of possible outcomes (assuming every outcome is equally likely).
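A minimal sketch of this calculation, assuming a fair six-sided die (the die and the helper name `probability` are illustrative, not part of the original cards):

```python
from fractions import Fraction

def probability(favorable, total):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(favorable, total)

# Rolling an even number on a fair six-sided die:
# 3 favorable outcomes (2, 4, 6) out of 6 possible outcomes.
p_even = probability(3, 6)
print(p_even)         # 1/2
print(float(p_even))  # 0.5
```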
What is sampling with replacement?
The sample is placed back into the selection and can be chosen again.
What is independent probability?
Two events are independent if the probability of the second event does not change based on the outcome of the first event.
Sampling without replacement is what?
The value taken from the sample is not put back, so each pick depends on the picks before it.
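The difference between the two sampling schemes can be sketched with a hypothetical bag of 3 red and 2 blue marbles (the bag is an assumption made for illustration), comparing the chance of drawing two reds in a row:

```python
from fractions import Fraction

red, blue = 3, 2          # hypothetical bag of marbles
total = red + blue

# With replacement: the bag is unchanged for the second draw,
# so the two draws are independent.
p_with = Fraction(red, total) * Fraction(red, total)

# Without replacement: one red marble is gone after the first draw,
# so the second draw depends on the first.
p_without = Fraction(red, total) * Fraction(red - 1, total - 1)

print(p_with)     # 9/25
print(p_without)  # 3/10
```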
Dependent Events are what?
Probability of the second event is affected by the outcome of the first event
What is conditional probability?
The probability of an event given that another event has already occurred; it is used to calculate the probability of dependent events.
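A small sketch of conditional probability using the standard definition P(A | B) = P(A and B) / P(B). The events chosen here (rolling an even number, rolling more than 3 on a fair die) are assumptions for illustration only:

```python
from fractions import Fraction

outcomes = set(range(1, 7))                 # faces of a fair six-sided die
A = {x for x in outcomes if x % 2 == 0}     # event A: the roll is even
B = {x for x in outcomes if x > 3}          # event B: the roll is greater than 3

p_b = Fraction(len(B), len(outcomes))
p_a_and_b = Fraction(len(A & B), len(outcomes))

# P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)  # 2/3
```

Knowing that the roll is greater than 3 raises the chance of an even roll from 1/2 to 2/3, so these two events are dependent.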
A venn diagram is used for what?
Visualizing conditional probability
What is a venn diagram?
A technique used to display the possible outcomes of multiple events and the overlap where both events can occur.
What is a probability distribution?
Describes the probability of each possible outcome in a scenario
What is the expected value of a probability distribution?
The mean of the distribution.
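As a sketch, the expected value of a discrete distribution is each outcome weighted by its probability and summed. The fair die below is an assumed example:

```python
from fractions import Fraction

# Probability distribution of a fair six-sided die: outcome -> probability.
distribution = {face: Fraction(1, 6) for face in range(1, 7)}

# Expected value = sum of (outcome * probability of that outcome).
expected_value = sum(value * p for value, p in distribution.items())
print(expected_value)         # 7/2
print(float(expected_value))  # 3.5
```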
Discrete probability distribution is what?
Describes probabilities for discrete outcomes
What is the law of large numbers?
As the size of sample increases, the sample mean will approach the expected value
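A rough simulation of the law of large numbers, assuming repeated rolls of a fair die (expected value 3.5). The seed and sample sizes are arbitrary choices made so the run is repeatable:

```python
import random
from statistics import mean

random.seed(42)  # arbitrary seed, only to make the run repeatable

def sample_mean(n):
    """Mean of n simulated rolls of a fair six-sided die."""
    return mean(random.randint(1, 6) for _ in range(n))

# Larger samples should give sample means closer to the expected value of 3.5.
means = {n: sample_mean(n) for n in (10, 1_000, 100_000)}
for n, m in means.items():
    print(n, round(m, 3))
```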
We can use discrete distributions to model what?
situations that involve count or interval data
What is a binary outcome?
Two possible values can occur
What is binomial distribution?
The probability distribution of the number of successes in a sequence of independent events, each with a binary outcome and the same probability of success.
Binomial distribution can be represented by two parameters:
n: total number of events
p: probability of success
Expected value of binomial distribution can be calculated by:
expected value: n × p
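A minimal sketch of the binomial distribution using the standard formula P(X = k) = C(n, k) · p^k · (1 − p)^(n − k). The values n = 10 and p = 0.5 (ten fair coin flips) are assumed for illustration:

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent events."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5  # assumed example: 10 fair coin flips

print(binomial_pmf(5, n, p))  # 0.24609375

# Expected value computed two ways: the shortcut n * p,
# and the full sum over every possible number of successes.
expected_shortcut = n * p
expected_full = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
print(expected_shortcut)  # 5.0
```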
Normal distribution shape is commonly referred to as a what?
a bell curve
Bell curve properties:
Symmetrical
Area beneath the curve = 1
Curve never hits 0
Normal distribution is described by mean and standard deviation
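These properties can be checked with `statistics.NormalDist` from the Python standard library. The standard normal (mean 0, standard deviation 1) is an assumed example:

```python
from statistics import NormalDist

# A normal distribution is fully described by its mean and standard deviation.
standard_normal = NormalDist(mu=0, sigma=1)

# Symmetrical: the curve has the same height at equal distances from the mean.
left_height = standard_normal.pdf(-1.5)
right_height = standard_normal.pdf(1.5)

# The curve never reaches 0, even far out in the tail.
far_tail_height = standard_normal.pdf(10)

# Share of the total area (which is 1) within one standard deviation of the mean.
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

print(left_height, right_height)
print(far_tail_height > 0)      # True
print(round(within_one_sd, 4))  # 0.6827
```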
Skewness is what?
Describes the direction that data tails off
A positively skewed / right-skewed distribution is defined as
One where the plot peaks on the left and tails off to the right.
A negatively skewed / left-skewed distribution is defined as
One where the plot peaks on the right and tails off to the left.
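One common way to put a number on skewness is the moment coefficient m3 / m2^1.5, which is positive for a right tail and negative for a left tail. This is a sketch on made-up data, and the helper name `skewness` is hypothetical:

```python
def skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

right_skewed = [1, 1, 2, 2, 3, 10]        # made-up data: peak on the left, long right tail
left_skewed = [-x for x in right_skewed]  # mirror image: peak on the right, long left tail

print(skewness(right_skewed))  # positive value
print(skewness(left_skewed))   # negative value
```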
Kurtosis is what?
A way of describing the occurrence of extreme values in a distribution
Characterized by a sharp peak around the mean and heavy tails, so extreme values occur more often than in a normal distribution
Positive kurtosis / leptokurtic
Mesokurtic is the term used to describe what?
A normal distribution
Describes a distribution with a lower, flatter peak and light tails, so extreme values occur less often than in a normal distribution
negative kurtosis / platykurtic
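A rough sketch of how the three kurtosis categories can be compared numerically, using excess kurtosis (m4 / m2^2 − 3, which is about 0 for a normal distribution). The simulated samples, the seed, and the helper name `excess_kurtosis` are assumptions for illustration:

```python
import random

def excess_kurtosis(data):
    """Excess kurtosis: m4 / m2**2 - 3, roughly 0 for normally distributed data."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - m) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

random.seed(0)  # arbitrary seed for repeatability
n = 20_000

normal = [random.gauss(0, 1) for _ in range(n)]        # mesokurtic
uniform = [random.uniform(-1, 1) for _ in range(n)]    # platykurtic: no extreme values
# The difference of two exponential draws follows a Laplace distribution,
# which has heavier tails than the normal distribution (leptokurtic).
laplace = [random.expovariate(1) - random.expovariate(1) for _ in range(n)]

k_normal = excess_kurtosis(normal)
k_uniform = excess_kurtosis(uniform)
k_laplace = excess_kurtosis(laplace)
print(round(k_normal, 2), round(k_uniform, 2), round(k_laplace, 2))
```

The normal sample should land near 0, the uniform sample below 0, and the Laplace sample above 0.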
A distribution of a summary statistic such as the mean is called?
Sampling Distribution
What is the central limit theorem?
The sampling distribution of a statistic such as the mean will approach a normal distribution as the size of the sample increases.
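A simulation sketch of the central limit theorem, assuming fair die rolls (a flat, non-normal distribution). The sample size, number of samples, and seed are arbitrary choices:

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # arbitrary seed for repeatability

sample_size = 30
num_samples = 5_000

# Build a sampling distribution: the mean of each of many samples of die rolls.
sample_means = [
    mean(random.randint(1, 6) for _ in range(sample_size))
    for _ in range(num_samples)
]

mu = 3.5                 # expected value of one fair die roll
sigma = sqrt(35 / 12)    # standard deviation of one fair die roll
standard_error = sigma / sqrt(sample_size)

# For a normal distribution, about 68% of values fall within one
# standard deviation of the mean.
within_one_se = sum(abs(m - mu) <= standard_error for m in sample_means) / num_samples

print(round(mean(sample_means), 3), round(stdev(sample_means), 3))
print(round(standard_error, 3), round(within_one_se, 3))
```

Even though single rolls are uniformly distributed, the sample means cluster around 3.5 in a roughly bell-shaped pattern.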
A process where the average number of events in a given time period is known, but the time or space between events is random.
Poisson process
Describes the probability of some number of events happening over a fixed period of time.
Poisson distribution
Poisson distribution is described by a value called?
Lambda
Lambda represents what?
average number of events per time interval
Lambda changes the shape of the distribution
The distribution peaks at or near lambda
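A minimal sketch of the Poisson distribution using the standard formula P(X = k) = λ^k · e^(−λ) / k!. The value λ = 4.5 events per interval is an assumed example:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k events in an interval with average rate lam."""
    return lam**k * exp(-lam) / factorial(k)

lam = 4.5  # assumed average number of events per time interval

# Probabilities for 0..29 events; the tail beyond 29 is negligible for this lam.
probabilities = [poisson_pmf(k, lam) for k in range(30)]

# The most likely count sits next to lambda.
most_likely = max(range(30), key=lambda k: probabilities[k])
print(most_likely)  # 4

# The average count recovered from the distribution matches lambda.
average_events = sum(k * p for k, p in enumerate(probabilities))
print(round(average_events, 6))  # 4.5
```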
What is hypothesis testing?
Hypothesis testing is a group of theories, methods, and techniques to compare populations.
What is the null hypothesis?
The starting assumption that no difference exists between the populations.
Hypothesis testing workflow:
Define the target populations
Develop null and alternative hypotheses
Collect sample data
Perform statistical tests on sample data
Draw conclusions about the population
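The workflow above can be sketched with a permutation test, which is one of many possible statistical tests and is chosen here only because it needs no external libraries. The data, group names, and the 0.05 threshold are all made up for illustration:

```python
import random
from statistics import mean

random.seed(0)  # arbitrary seed for repeatability

# Steps 1-3: two target populations, with made-up sample data from each.
# Null hypothesis: no difference in means. Alternative: the means differ.
control = [12, 15, 11, 14, 13, 12, 16, 13]
treatment = [18, 17, 19, 16, 20, 18, 17, 19]

# Step 4: permutation test on the difference in sample means.
observed = mean(treatment) - mean(control)

pooled = control + treatment
num_permutations = 10_000
extreme = 0
for _ in range(num_permutations):
    random.shuffle(pooled)  # relabel the groups at random
    diff = mean(pooled[:len(treatment)]) - mean(pooled[len(treatment):])
    if abs(diff) >= abs(observed):
        extreme += 1

# Share of random relabelings at least as extreme as what was observed
# (the +1 terms count the observed labeling itself).
p_value = (extreme + 1) / (num_permutations + 1)

# Step 5: draw a conclusion using an assumed significance threshold.
alpha = 0.05
reject_null = p_value < alpha
print(observed, p_value, reject_null)
```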
What are experiments?
Experiments are a subset of hypothesis testing that involves performing statistical tests on sample data to draw conclusions about a population.
Experiments aim to answer:
What is the effect of the treatment on the response?
treatment = independent variable
response = dependent variable
What are controlled experiments?
A common type of experiment is a controlled experiment, where participants are randomly assigned to either the treatment group or the control group.
What is the gold standard of experiments?
An experiment designed to eliminate as much bias as possible.
Methods to help eliminate bias in controlled experiments?
Randomization
Blinding
double-blind randomized controlled trial
What is randomization?
Participants are assigned to the treatment or control group randomly, not based on any of their characteristics.
Randomization helps ensure that the groups are comparable.
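A minimal sketch of random assignment, assuming 20 hypothetical participants split evenly between the two groups (the participant labels and seed are made up):

```python
import random

random.seed(7)  # arbitrary seed for repeatability

# Hypothetical participant identifiers.
participants = [f"participant_{i}" for i in range(1, 21)]

# Shuffle a copy so assignment depends only on chance,
# not on any characteristic of the participants.
shuffled = participants[:]
random.shuffle(shuffled)

half = len(shuffled) // 2
treatment_group = shuffled[:half]
control_group = shuffled[half:]

print(len(treatment_group), len(control_group))  # 10 10
```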
What is blinding?
In a blind trial, the participants don’t know if they’re in the treatment or control group. This ensures that the effect of the treatment is due to the treatment itself, not the idea of getting the treatment.
What is a double-blind randomized controlled trial?
The person administering the treatment or running the experiment also doesn’t know whether they’re administering the actual treatment or a placebo.
What is A/B testing?
An experiment that splits participants into only two groups: a treatment group and a control group.
What is the Pearson correlation coefficient (often just called the correlation coefficient)?
It quantifies the strength and direction of the relationship between two variables, producing a value between -1 and 1.
The magnitude of the number corresponds to the strength of the relationship between the variables, and the sign, positive or negative, corresponds to the direction of the relationship.
The Pearson correlation coefficient can only be used for what type of relationships?
Linear Relationships
What are linear relationships?
Proportionate changes between dependent and independent variables
What does the magnitude of the correlation coefficient represent?
The strength of the relationship
What are some example correlation coefficients and their strengths?
0.99 (very strong relationship)
0.75 (strong relationship)
0.56 (moderate relationship)
0.2 (weak relationship)
When the correlation coefficient is close to zero, what does that mean?
x and y have no linear relationship, and the scatterplot looks completely random.
This means that knowing the value of x doesn’t tell us anything about the value of y.
What does the sign of the correlation coefficient represent?
The sign of the correlation coefficient corresponds to the direction of the relationship.
What is a positive and a negative correlation?
A positive correlation coefficient indicates that as x increases, y also increases.
A negative correlation coefficient indicates that as x increases, y decreases.
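A sketch of the Pearson correlation coefficient computed from its standard definition (covariance divided by the product of the standard deviations). The data sets and the helper name `pearson_r` are made up for illustration:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])  # y increases as x increases
r_negative = pearson_r(x, [10, 8, 6, 4, 2])  # y decreases as x increases

# A perfect but non-linear relationship (y = x squared) can still give r = 0,
# which is why the coefficient only suits linear relationships.
x_sym = [-2, -1, 0, 1, 2]
r_curve = pearson_r(x_sym, [v**2 for v in x_sym])

print(r_positive)  # 1.0
print(r_negative)  # -1.0
print(r_curve)     # 0.0
```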
What is a confounding variable?
Something that affects the data we are analyzing, but was not accounted for when assessing the relationship between variables.
What is p-value?
This is the probability of achieving a result at least as extreme as the one we have observed, assuming the null hypothesis is true.
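A small worked sketch of a p-value, assuming a hypothetical experiment where a coin lands heads 9 times in 10 flips and the null hypothesis is that the coin is fair. The one-sided p-value is the probability of 9 or more heads under that assumption:

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent events."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_flips = 10        # hypothetical experiment
observed_heads = 9  # hypothetical result
p_fair = 0.5        # null hypothesis: the coin is fair

# Probability of a result at least as extreme as the one observed
# (9 or 10 heads), assuming the null hypothesis is true.
p_value_one_sided = sum(
    binomial_pmf(k, n_flips, p_fair)
    for k in range(observed_heads, n_flips + 1)
)
print(p_value_one_sided)  # 0.0107421875
```

A result this unlikely under a fair coin (about 1%) would usually be taken as evidence against the null hypothesis.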
What is a type 1 error?
Wrongly rejecting the null hypothesis when it is actually true (a false positive).
What is a type 2 error?