Chapters 4, 5, 6, 7, 8 Flashcards
scatter plots
graphs that show the relationship between two quantitative variables measured on the same individual. Each point represents an individual.
The explanatory variable is plotted on the horizontal axis; the response variable is plotted on the vertical axis. It is not always clear which variable is which.
Two types of linearly-related variables
Positively associated: when one variable increases, the other also increases (trends uphill).
Negatively associated: when one variable increases, the other decreases (trends downhill).
Testing for a linear relation
- Find the absolute value of the correlation coefficient, |r|.
- Find the critical value (CV) in Table II, Appendix A, for the given sample size.
- If |r| > CV, we say a linear relation exists between the two variables. Otherwise, no linear relation exists.
Residual
The difference between the observed and predicted values of y is the error, or residual.
The criterion for choosing the line that best describes the relation between two variables is based on the residuals. The most popular technique for making the residuals as small as possible is the method of least squares.
Least-Squares Regression Criterion
The least-squares regression line is the line that minimizes the sum of the squared errors (residuals).
This line minimizes the sum of the squared vertical distances between the observed values of y and the values predicted by the line, “y-hat”.
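A minimal Python sketch of the least-squares criterion. The data set (xs, ys) and the comparison line are made up for illustration and are not from the text. It assumes the standard formulas slope = Sxy / Sxx and intercept = y-bar - slope * x-bar, then checks that another line gives a larger sum of squared residuals.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxy and Sxx: sums of cross-deviations and squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of (observed y - predicted y)^2 for the given line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Illustrative data only (an assumption, not from the flashcards).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(xs, ys)
best_sse = sum_squared_residuals(xs, ys, slope, intercept)

# Any other line, such as y-hat = 0.5x + 2.5, has a larger sum of squares.
other_sse = sum_squared_residuals(xs, ys, 0.5, 2.5)

print(slope, intercept)
print(best_sse, other_sse)
```

Each residual here is observed y minus predicted y, so the two printed sums compare how well each line fits the same points.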
linear correlation coefficient
measures the strength and direction of the linear relationship between two quantitative variables.
If r = +1, then a perfect positive linear relation exists between the two variables.
If r = -1, then a perfect negative linear relation exists between the two variables.
r is the statistic for a sample. Rho is the corresponding parameter for a population.
Unitless.
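A short Python sketch of computing r and applying the |r| > CV test. The data are made up, and the critical value 0.878 for n = 5 is an assumed table entry that should be confirmed against Table II before use.

```python
import math


def correlation(xs, ys):
    """Sample linear correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def linear_relation_exists(r, critical_value):
    """Apply the rule: a linear relation exists if |r| > CV."""
    return abs(r) > critical_value


# Illustrative data only (an assumption, not from the flashcards).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)

# Perfectly linear data give r = +1 or r = -1.
r_up = correlation(xs, [2 * x + 1 for x in xs])
r_down = correlation(xs, [-2 * x + 1 for x in xs])

# Assumed critical value for n = 5; look up the real one in the table.
assumed_cv = 0.878
print(r, linear_relation_exists(r, assumed_cv))
```

Because r divides by both standard-deviation-like terms, rescaling x or y (for example, inches to centimeters) leaves r unchanged, which is what "unitless" means here.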
sample space of a probability experiment
S
the collection of all possible outcomes.
Unusual event in a probability experiment
an event that has a low probability of occurring; by convention, less than 5%.
probability of an event
P(E)
An event is any collection of outcomes from a probability experiment.
“The probability of drawing an ace from a standard deck of cards is 1/13” is an example of…
theoretical (classical), empirical (experimental), or subjective probability?
“The probability of drawing an ace from a standard deck of cards is 1/13” is an example of theoretical (classical) probability.
“According to the Department of Public Safety, 75% of all car crashes are due to driver error” is an example of…
theoretical (classical), empirical (experimental), or subjective probability?
“According to the Department of Public Safety, 75% of all car crashes are due to driver error” is an example of empirical (experimental) probability.
“I’m 100% confident that you will win the match” is an example of…
theoretical (classical), empirical (experimental), or subjective probability?
“I’m 100% confident that you will win the match” is an example of subjective probability.
Independent vs. dependent probability events
For independent events, the outcome of one event does not affect the probability of the next. Example: rolling a die.
For dependent events, the outcome of one event affects the probability of the next. Example: drawing a card from a deck, without replacement.
Think: Does the given variable change the probability?
Does P(A) = P(A|B)?
P(A|B) notation means…
…the probability of A occurring, given that B has already occurred.
(A and B could be independent or dependent events).
Ex. What is the probability of someone having leprosy, given that they are from a low-income country? P(having leprosy | from L.I. country). Being from an L.I. country is the restriction (the given condition).
P(A and B) =
P(A and B) = P(A|B)P(B) = P(B|A)P(A)
If A and B are independent events, then P(A and B) =
= P(A) P(B)
i.e.
P(A) = P(A|B)
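A small Python sketch of the rules above, using exact fractions. The events A (even roll) and B (roll of 4 or less) on one fair die are chosen for illustration; the card example follows the without-replacement case.

```python
from fractions import Fraction

# One roll of a fair six-sided die (illustrative events, an assumption).
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}        # roll is even
B = {1, 2, 3, 4}     # roll is 4 or less


def prob(event):
    """Classical probability: favorable outcomes / total outcomes."""
    return Fraction(len(event), len(sample_space))


p_a = prob(A)
p_b = prob(B)
p_a_and_b = prob(A & B)

# Conditional probability of A given B, from the general multiplication rule.
p_a_given_b = p_a_and_b / p_b

# Independent: knowing B does not change the probability of A.
print(p_a_given_b == p_a)           # True
print(p_a_and_b == p_a * p_b)       # True

# Dependent: drawing two aces in a row without replacement.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)
p_both_aces = p_first_ace * p_second_ace_given_first
print(p_both_aces)                  # 1/221
```

In the card case the conditional probability 3/51 differs from 4/52, so the simple product P(A)P(B) would give the wrong answer.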
When P(A and B) = 0, then _____.
If P(A and B) = 0, then A and B are mutually exclusive events.
permutation
an ORDERED arrangement in which r objects are chosen from n distinct objects.
without replacement:
nPr = n! / (n-r)!
with replacement:
n ^ r
multiplication rule of counting
Given 4 choices of A, 3 choices of B, 4 choices of C, 8 choices of D, (one of each) how many combinations are there?
4 * 3 * 4 * 8 = 384
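A quick Python check of the multiplication rule for this card: multiply the number of choices at each step, then confirm by listing every possible selection. The dictionary labels are just names for the four groups.

```python
import math
from itertools import product

# Number of choices for each of A, B, C, D (one of each is picked).
choices = {"A": 4, "B": 3, "C": 4, "D": 8}

# Multiplication rule of counting: multiply the choice counts.
total = math.prod(choices.values())

# Brute-force check: enumerate every (A, B, C, D) selection.
enumerated = sum(1 for _ in product(*(range(k) for k in choices.values())))

print(total, enumerated)  # 384 384
```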
factorial symbol, n!
Used for counting without repetition, when the number of options is declining.
If n is an integer (n greater than or equal to zero), then
n! = n(n-1)(n-2)…(3)(2)(1), with 0! = 1
ex. Possible routes between 7 different schools. There will be 7 route choices for school A, 6 for B, etc.
7 * 6 * 5 * 4 * 3 * 2 * 1 = 7! = 5040
This could also be written 7P7, or P(7,7).
There is a factorial (!) key on my calculator.
Evaluate the expression P(14,3), equivalently 14P3.
This is a permutation problem, computed with factorials.
14P3 (equivalently P(14,3))
= 14! / (14-3)! = 14 * 13 * 12 = 2184
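The factorial and permutation values on these cards can be checked in Python. This sketch assumes Python 3.8 or later, where math.perm is available.

```python
import math

# 7! : ordered routes through 7 schools.
routes = math.factorial(7)

# nPr = n! / (n - r)! : ordered arrangements without replacement.
p_7_7 = math.perm(7, 7)
p_14_3 = math.perm(14, 3)

# Same value straight from the factorial formula.
p_14_3_formula = math.factorial(14) // math.factorial(14 - 3)

# With replacement, each of the r positions has n options: n^r.
with_replacement = 14 ** 3

print(routes, p_7_7, p_14_3, p_14_3_formula, with_replacement)
# 5040 5040 2184 2184 2744
```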
Combination
a selection of r objects from a set of n different objects when the order doesn’t matter and there is no repetition.
nCr = n! / (r!(n-r)!)
ex. 20C3, equivalently C(20,3) = 1140
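A quick Python check of the combination example, assuming Python 3.8 or later for math.comb. It also shows how nCr relates to nPr: each unordered selection of r objects corresponds to r! ordered arrangements.

```python
import math

# nCr = n! / (r! (n - r)!) : unordered selections without repetition.
c_20_3 = math.comb(20, 3)

# Same value from the factorial formula.
c_20_3_formula = math.factorial(20) // (math.factorial(3) * math.factorial(17))

# Relation to permutations: nCr = nPr / r!
c_from_perm = math.perm(20, 3) // math.factorial(3)

print(c_20_3, c_20_3_formula, c_from_perm)  # 1140 1140 1140
```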
Random variable
a numerical measure of the outcome of a probability experiment, so its value is determined by chance.
Denoted with capital letters such as X. Possible values are denoted with lower case letters such as x.
Discrete random variable
a numerical summary of the outcome of a probability experiment which has either a finite or countable number of values.
So, possible values of X are x=0,1,2,…
Continuous random variable
a numerical summary of the outcome of a probability experiment which has infinitely many possible values that are not countable (for example, any value in an interval).
notation for an event in which the random variable is exactly 2…
X = 2
random variable notation to say that the probability that the face value is exactly 4 is equal to 0.17
P(X=4)=0.17
random variable notation for the probability that the face value is exactly 3
P(X=3)
expected value E(X)
The expected value E(X) of a random variable is the same as the mean, because it represents what we would expect to happen in the long run.
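A minimal Python sketch of an expected value, assuming the usual formula for a discrete random variable: E(X) is the sum of each value x times P(X = x). The fair six-sided die is an illustrative assumption.

```python
from fractions import Fraction

# Probability distribution of X = face value of a fair die (assumed example).
distribution = {x: Fraction(1, 6) for x in range(1, 7)}

# The probabilities of a valid distribution must add up to 1.
total_probability = sum(distribution.values())

# E(X) = sum of x * P(X = x) over all possible values x.
expected_value = sum(x * p for x, p in distribution.items())

print(total_probability, expected_value)  # 1 7/2
```

No single roll can show 3.5; the expected value describes the long-run average over many rolls.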
binomial distribution
i.e., a fixed number of Bernoulli trials
A Bernoulli (binomial) trial is a random experiment with exactly two possible outcomes–success and failure. The number of successes across the trials follows a binomial probability distribution.
X = number of successes
Conditions for binomial distribution:
- the experiment is performed a fixed number of times under identical conditions; each repetition is a trial.
- the trials are independent.
- each trial has two mutually exclusive outcomes–success and failure.
- the probability of success is the same for each trial of the experiment.
p = the probability of success on one trial.
q= 1 - p = the probability of failure on one trial.
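A short Python sketch of binomial probabilities, assuming the standard formula P(X = x) = nCx * p^x * q^(n-x). The values n = 10 and p = 0.5 (for example, 10 fair coin flips) are illustrative assumptions.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    q = 1 - p  # probability of failure on one trial
    return comb(n, x) * p**x * q ** (n - x)


# Illustrative values (an assumption): 10 trials, p = 0.5.
n, p = 10, 0.5

p_exactly_five = binomial_pmf(5, n, p)

# Probabilities over all possible values x = 0, 1, ..., n add up to 1.
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))

# Mean (expected value) computed directly from the distribution.
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))

print(p_exactly_five, total, mean)
```

The mean computed this way matches the shortcut n * p for a binomial random variable.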
Poisson Distribution
A random variable X = The number of successes in a fixed interval of time or space.
Conditions:
- The probability of two or more successes in any sufficiently small subinterval is 0.
- The probability of success is the same for any two intervals of equal length.
- The number of successes in any interval is independent of the number of successes in any other interval, provided the intervals are not overlapping.
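A minimal Python sketch of Poisson probabilities, assuming the standard formula P(X = x) = e^(-lam) * lam^x / x!, where lam is the expected number of successes in the interval. The value lam = 2 is an illustrative assumption.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) when the expected number of successes in the interval is lam."""
    return exp(-lam) * lam**x / factorial(x)


# Illustrative rate (an assumption): 2 successes per interval on average.
lam = 2.0

p_zero = poisson_pmf(0, lam)
p_at_most_two = sum(poisson_pmf(x, lam) for x in range(3))

# X has no upper limit, but the probabilities beyond x = 50 are negligible here.
total = sum(poisson_pmf(x, lam) for x in range(51))

print(p_zero, p_at_most_two, total)
```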
Hypergeometric Distribution
A probability experiment is a hypergeometric experiment if…
- Finite population N with two subgroups; one of the subgroups is the group of interest and has a size k.
- For each trial, there are two possible outcomes–success and failure. k group = success.
- A sample of size n is taken from population N without replacement, so trials are dependent.
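A short Python sketch of hypergeometric probabilities, assuming the standard formula P(X = x) = C(k, x) * C(N-k, n-x) / C(N, n). The card example (N = 52 cards, k = 4 aces, n = 5 cards drawn) is an illustrative assumption.

```python
from fractions import Fraction
from math import comb


def hypergeometric_pmf(x, N, k, n):
    """P(X = x): x successes in a sample of n drawn without replacement
    from a population of N that contains k successes."""
    return Fraction(comb(k, x) * comb(N - k, n - x), comb(N, n))


# Illustrative values (an assumption): aces in a 5-card hand.
N, k, n = 52, 4, 5

p_no_aces = hypergeometric_pmf(0, N, k, n)
p_one_ace = hypergeometric_pmf(1, N, k, n)

# x can be at most min(k, n); the probabilities add up to exactly 1.
total = sum(hypergeometric_pmf(x, N, k, n) for x in range(min(k, n) + 1))

print(float(p_no_aces), float(p_one_ace), total)
```

Because the cards are not replaced, the chance of an ace changes from draw to draw, which is why the binomial formula does not apply here.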
The area under a normal distribution curve can be interpreted as…
A probability or a proportion.
The value of the total area under the normal distribution curve represents 1.
The area to the right of the mean represents 0.5.
Inflection points
the points on the normal curve at
x=mean + st dev
and at
x=mean - st dev
model
an equation, table, or graph used to describe reality
properties of a normal density curve
A normal density curve (normal probability distribution)…
- Is symmetric around the mean at a single peak.
- mean = median = mode.
- Inflections at
mean - 1 st dev, and at
mean + 1 st dev.
- Area under the curve = 1.
- Area under curve to the right and left of the mean = 0.5.
- As x increases or decreases, the graph approaches, but never reaches, the horizontal axis.
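These area properties can be checked with Python's statistics.NormalDist (Python 3.8 or later). The mean of 100 and standard deviation of 15 are illustrative assumptions; any normal curve gives the same areas.

```python
from statistics import NormalDist

# Illustrative parameters (an assumption): mean 100, standard deviation 15.
mean, st_dev = 100, 15
curve = NormalDist(mu=mean, sigma=st_dev)

# cdf(x) is the area under the curve to the left of x.
area_left_of_mean = curve.cdf(mean)
area_right_of_mean = 1 - curve.cdf(mean)

# Area between the two inflection points, mean - st dev and mean + st dev.
area_within_one_sd = curve.cdf(mean + st_dev) - curve.cdf(mean - st_dev)

# Symmetry: the curve has the same height at equal distances from the mean.
height_left = curve.pdf(mean - st_dev)
height_right = curve.pdf(mean + st_dev)

print(area_left_of_mean, area_right_of_mean, area_within_one_sd)
```

Each area can be read either as a probability or as a proportion of the population.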
Sampling distribution
The sampling distribution of a statistic is a probability distribution for all possible values of the statistic computed from samples of size n.
sampling distribution of the sample mean
The sampling distribution of the sample mean x-bar is the probability distribution of all possible values of the random variable x-bar computed from samples of size n from a population with a given mean and standard deviation.
Notes on inferring from samples to populations
- The spread of a population should be larger than that of a sample
- As the sample size n increases, the standard deviation of the sampling distribution of x-bar decreases.
The Central Limit Theorem
Regardless of the shape of the underlying population, the sampling distribution of x-bar becomes approximately normal as the sample size n increases.
Okay to assume normal distribution if one of these two conditions is met…
Assume normal distribution IF EITHER…
1. the sample size, n, is sufficiently large (n >= 30)
OR
2. the underlying population is known to be normally distributed.
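A small simulation sketch of the Central Limit Theorem in Python. The skewed population (exponential with mean 1 and standard deviation 1), the seed, and the number of repetitions are all illustrative assumptions. It relies on the standard result that the standard deviation of x-bar is the population standard deviation divided by the square root of n.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable


def sample_means(n, reps=2000):
    """Draw `reps` samples of size n from a skewed (exponential) population
    and return the mean of each sample."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


means_n5 = sample_means(5)
means_n30 = sample_means(30)

# Spread of the sampling distribution of x-bar shrinks as n grows.
sd_n5 = statistics.stdev(means_n5)
sd_n30 = statistics.stdev(means_n30)

# The center of the sampling distribution stays near the population mean (1).
center_n30 = statistics.fmean(means_n30)

print(sd_n5, sd_n30, center_n30)
```

Plotting a histogram of means_n30 would show a roughly bell-shaped curve even though the population itself is strongly skewed.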