3: Probability distributions Flashcards
Quantile
A specific value that marks off a particular part of a data set. A quantile tells you what proportion of the values in a distribution lie below (or above) a certain limit.
Inference
Drawing conclusions about a population from a sample
Probability + calculation
The chance of something happening (always between 0 and 1). For a continuous variable, it is the area under the distribution curve (e.g. the normal curve) over a range of values.
For instance, if the probability of a value being less than 1.8 is 0.85 (85%), then the probability of it being greater than 1.8 is 1 - 0.85 = 0.15 (15%).
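The complement rule and the idea of a quantile can be checked with a small sketch. It uses `statistics.NormalDist` from the Python standard library; the mean of 1.70 and SD of 0.10 are made-up values chosen only for illustration, not taken from the course material.

```python
from statistics import NormalDist

# Hypothetical example: a normally distributed variable with
# mean 1.70 and SD 0.10 (assumed values, for illustration only).
dist = NormalDist(mu=1.70, sigma=0.10)

# Probability of a value below 1.80: area under the curve left of 1.80.
p_below = dist.cdf(1.80)

# Complement rule: probability of a value above 1.80.
p_above = 1 - p_below

# Quantile: the value below which 85% of the distribution lies.
q85 = dist.inv_cdf(0.85)

print(f"P(X < 1.80) = {p_below:.3f}")
print(f"P(X > 1.80) = {p_above:.3f}")
print(f"85% quantile = {q85:.3f}")
```

The two probabilities always add up to 1, and feeding the quantile back into `cdf` returns 0.85.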
Probability distribution
Describes the chance of each possible outcome of a random variable.
Normal distribution + shape
A continuous probability distribution in which most data points cluster around the mean, while the rest taper off symmetrically toward either extreme. Bell/hill shaped.
Poisson distribution
Counts. A discrete probability distribution over non-negative whole numbers that can be right-skewed. It has only one parameter, the rate parameter λ (lambda): the average rate at which events occur, which is also the mean number of events.
The Poisson distribution models the number of events in a fixed interval of time or space when the events are independent (one doesn’t affect the other) and happen at a constant average rate.
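A minimal sketch of the Poisson probabilities, written from the standard formula P(X = k) = e^(-λ) · λ^k / k!. The rate λ = 3 is an assumed value for illustration, and `poisson_pmf` is a helper name introduced here.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 3.0  # assumed rate: on average 3 events per interval

# Probabilities for counts 0..49 (the tail beyond that is negligible for lam = 3).
probs = [poisson_pmf(k, lam) for k in range(50)]

total = sum(probs)                              # very close to 1
mean = sum(k * p for k, p in enumerate(probs))  # very close to lam

for k in range(6):
    print(f"P(X = {k}) = {probs[k]:.4f}")
print(f"sum of probabilities = {total:.6f}, mean = {mean:.6f}")
```

The probabilities sum to 1 and the mean of the distribution equals λ, which is why λ alone is enough to describe it.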
Binomial distribution
A fixed number of independent trials, each with two possible outcomes: success or failure. Used for ratios, fractions, binary data. Can be skewed left or right. Two parameters: the probability of success and the number of trials.
Difference from Poisson: the Binomial distribution deals with a fixed number of trials and a constant probability of success, while the Poisson distribution deals with the rate of events over time or space and is often used when the number of trials is very large or not fixed.
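The link between the two distributions can be illustrated with a sketch: when the number of trials is large and the success probability is small, Binomial probabilities are close to Poisson probabilities with λ = n · p. The values n = 1000 and p = 0.003 are made up, and both helper functions are introduced here for illustration.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) events for a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

n, p = 1000, 0.003  # many trials, small success probability (assumed values)
lam = n * p         # matching Poisson rate: expected number of successes

for k in range(7):
    print(f"k={k}: binomial={binomial_pmf(k, n, p):.4f}  poisson={poisson_pmf(k, lam):.4f}")

# Largest difference between the two distributions over k = 0..20.
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(21))
print(f"largest difference: {max_gap:.5f}")
```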
Variance
Shows the extent to which observations deviate from one another (large variance = large differences within the group).
The spread of the numbers in a data set: the average squared distance of each number from the mean. The more the numbers differ from the mean, and so from each other, the larger the variance.
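A worked sketch of the calculation on five made-up observations. It assumes the sample variance (dividing by n - 1), which is what `statistics.variance` computes; the standard deviation is the square root of that variance.

```python
import statistics

data = [4.0, 7.0, 6.0, 3.0, 5.0]  # made-up observations

n = len(data)
mean = sum(data) / n

# Squared distance of each observation from the mean.
squared_devs = [(x - mean) ** 2 for x in data]

# Sample variance: sum of squared deviations divided by n - 1.
sample_var = sum(squared_devs) / (n - 1)

# Standard deviation: square root of the variance.
sample_sd = sample_var ** 0.5

print(f"mean = {mean}, variance = {sample_var}, SD = {sample_sd:.3f}")
print("library check:", statistics.variance(data), round(statistics.stdev(data), 3))
```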
Random Variable
A variable whose outcome (value) is determined by a random process (chance). Example: flipping a coin gives heads or tails purely at random, with no other influence. A random variable can be either discrete (taking specific, separate values) or continuous (taking any value in a continuous range).
Random variables represent measurable properties of random processes, and their distributions give insight into variability.
Properties of a random variable
- We cannot predict the value of a random variable with absolute precision, because the outcome will be different in each sample.
- Functions based on random variables are also random variables.
The function calculating the mean uses random variables, so the mean is itself an RV. New samples can give different means.
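That the mean is itself a random variable can be seen in a small simulation. This is only a sketch: the population (normal, mean 50, SD 10), the sample size of 20 and the helper name `sample_mean` are all assumptions made for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

def sample_mean(n: int) -> float:
    """Draw n values from an assumed population (normal, mean 50, SD 10) and average them."""
    sample = [random.gauss(50, 10) for _ in range(n)]
    return statistics.fmean(sample)

# Five new samples of the same size give five different means.
means = [sample_mean(20) for _ in range(5)]

for i, m in enumerate(means, start=1):
    print(f"sample {i}: mean = {m:.2f}")
```

Every run of `sample_mean` uses a fresh sample, so the result changes from sample to sample even though the population stays the same.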
Statistics and RV
Measures like the mean, variance, and standard deviation are random variables themselves and have distributions. How good these estimates are is measured by the standard error (SE).
standard deviation (SD)
Tells you how much the data itself varies. A measure of spread.
The average amount of variability in your dataset: it tells you, on average, how far each value lies from the mean.
In a regression model: SD = √(residual standard error²) = √(residual variance), i.e. the residual standard error is the SD of the residuals.
standard error (SE)
A measure of the uncertainty in a sample statistic like the mean or a slope.
SE gets smaller as the sample size increases, because more data provides a better estimate of the population parameter, leading to reduced variability in the estimate.
SE = SD / √n (n = sample size)
or, for a regression coefficient, SE = coefficient / t-value
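Both formulas can be tried out in a short sketch. The population values (mean 50, SD 10), the sample sizes, and the coefficient and t-value are made-up numbers for illustration. The simulation part compares the formula with the spread actually seen across many sample means.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 50.0, 10.0  # assumed population, for illustration only

# Formula: SE of the mean = SD / sqrt(n). Larger samples give a smaller SE.
se_by_n = {n: POP_SD / math.sqrt(n) for n in (25, 100, 400)}
for n, se in se_by_n.items():
    print(f"n = {n:3d}: SE = {se:.2f}")

# Simulation: the SD of many sample means should be close to the formula value.
n = 25
means = [
    statistics.fmean([random.gauss(POP_MEAN, POP_SD) for _ in range(n)])
    for _ in range(2000)
]
se_simulated = statistics.stdev(means)
print(f"simulated SE for n = {n}: {se_simulated:.2f}")

# Regression output: SE = coefficient / t-value (made-up numbers).
coefficient, t_value = 1.5, 3.0
se_coef = coefficient / t_value
print(f"SE of coefficient = {se_coef}")
```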
Difference between SE and SD (stdev)
SD tells you how much the data itself varies. It is a measure of spread.
SE is a measure of the uncertainty in a sample statistic like the mean or a slope.
Degrees of Freedom + formula
The number of independent pieces of information you have to calculate a statistic. It’s calculated as the sample size minus the number of parameters estimated. With one estimated parameter (e.g. the mean): df = n - 1 (n = sample size).
When the DF run out, the model is too complicated for the number of observations.
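A sketch of how degrees of freedom show up in a simple linear regression, assuming an ordinary least-squares fit with two estimated parameters (intercept and slope), so df = n - 2. The data are made up, and the fit is written out by hand rather than taken from any particular library.

```python
import math

# Made-up data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares estimates of slope and intercept.
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

# Residuals: observed value minus fitted value.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
rss = sum(r**2 for r in residuals)

n_params = 2        # intercept and slope
df = n - n_params   # sample size minus number of parameters estimated

# Residual standard error: square root of the residual variance (RSS / df).
residual_se = math.sqrt(rss / df)

print(f"df = {df}, residual standard error = {residual_se:.3f}")
```

With only two observations the same model would have df = 2 - 2 = 0, and the residual variance could no longer be calculated: the model has used up all the information in the data.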