3: Probability distributions Flashcards
Quantile
A specific value that marks off a particular part of a data set. A quantile determines what fraction of the values in a distribution lie above or below a certain limit.
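A minimal Python sketch of the idea, using the standard-library `statistics` module. The data values are made up for illustration and are not taken from the course material.

```python
import statistics

# Hypothetical sample of 9 measurements (made up for illustration).
data = [2, 4, 4, 5, 6, 7, 8, 9, 10]

# Cut points that split the data into 4 groups of equal probability (quartiles).
q1, q2, q3 = statistics.quantiles(data, n=4)

# Count how many values fall below and above the middle cut point.
below_q2 = sum(1 for x in data if x < q2)
above_q2 = sum(1 for x in data if x > q2)

print("quartiles:", q1, q2, q3)
print("values below q2:", below_q2, "values above q2:", above_q2)
```

Here `q2` is the 0.5 quantile (the median), so the same number of values lies below it as above it.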
Inference
Drawing conclusions about a population from a sample
Probability + calculation
The chance of something happening (always between 0 and 1). For a continuous distribution such as the normal, it is the area under the distribution curve over the range of interest.
For instance, if the probability of a value being less than 1.8 is 0.85 (85%), then the probability of it being greater than 1.8 is 1 - 0.85 = 0.15 (15%).
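The complement rule can be sketched in Python with `statistics.NormalDist`. The mean and standard deviation below are assumed values chosen for illustration, so the probability below 1.8 comes out at roughly 0.84 rather than the 0.85 used in the example.

```python
from statistics import NormalDist

# Hypothetical normal distribution: mean 1.7, standard deviation 0.1 (assumed).
dist = NormalDist(mu=1.7, sigma=0.1)

p_below = dist.cdf(1.8)   # area under the curve to the left of 1.8
p_above = 1 - p_below     # complement: area to the right of 1.8

print("P(X < 1.8) =", round(p_below, 4))
print("P(X > 1.8) =", round(p_above, 4))
```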
probability distribution
describes the chance of different outcomes of a random variable
Normal distribution + shape
continuous probability distribution in which most data points cluster toward the mean, while the rest taper off toward either extreme. Bell/hill shaped
Poisson distribution
Counts. A discrete, non-negative probability distribution that can be right skewed. It has only one parameter, the rate parameter λ (lambda): the average rate at which events occur, which is also the mean number of events.
The Poisson distribution exactly models the number of events in a fixed time or space when the events are independent (one doesn’t affect the other) and happen at a constant rate.
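As a sketch, the Poisson probabilities can be computed directly from the standard formula P(X = k) = e^(-λ) · λ^k / k!. The rate λ = 3 is an assumed value for illustration.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean number of events is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 3.0  # hypothetical rate: on average 3 events per time interval

# Probabilities for 0..49 events; anything beyond that is negligible for lam = 3.
probs = [poisson_pmf(k, lam) for k in range(50)]
total = sum(probs)

for k in range(6):
    print(k, round(probs[k], 4))
```

The probabilities over all possible counts add up to 1, as they must for any probability distribution.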
Binomial distribution
A fixed number of independent trials, each with two possible outcomes: success or failure. Used for ratios, fractions, and binary data. Can be skewed left or right. Two parameters: the probability of success and the number of trials.
The difference is that the binomial distribution deals with a fixed number of trials and a constant probability of success, while the Poisson distribution deals with the rate of events over time or space and is often used when the number of trials is very large or not fixed.
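A rough Python sketch of the comparison: when the number of trials is large and the probability of success is small, the binomial probabilities are close to Poisson probabilities with λ = n · p. The values of `n` and `p` are assumptions chosen for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean number of events is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

n, p = 1000, 0.003   # hypothetical: many trials, small chance of success
lam = n * p          # matching Poisson rate (mean number of successes)

# Largest difference between the two distributions over k = 0..20.
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(21))
print("largest difference:", max_gap)
```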
Variance
Shows the extent to which observations deviate from one another (variance large = differences in group large)
the spread between numbers in a data set (used to determine how far each number is from the mean and from every other number in the set).
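A small worked sketch in Python, assuming the usual sample variance (sum of squared deviations from the mean, divided by n - 1). The data values are made up.

```python
import statistics

data = [4, 8, 6, 5, 3, 7]   # hypothetical observations

mean = statistics.mean(data)

# How far each number is from the mean, squared so that signs do not cancel.
squared_deviations = [(x - mean) ** 2 for x in data]

# Sample variance: divide by n - 1 rather than n.
sample_variance = sum(squared_deviations) / (len(data) - 1)

print("mean:", mean)
print("variance:", sample_variance)
```

The hand calculation agrees with `statistics.variance(data)`.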
Random Variable
A variable whose value is determined by a random process (chance). For example, a coin flip gives heads or tails at random, with no other influence. A random variable can be either discrete (taking specific, separate values) or continuous (taking any value in a continuous range).
Random variables represent measurable properties of random processes, and their distributions give insight into variability.
properties of Random variable
- We cannot predict the value of a random variable with absolute precision, as the result in each sample will be different.
- Functions based on random variables are also random variables.
The function calculating the mean uses random variables, so the mean is itself a random variable (RV). New samples can give different means.
Statistics and RV
Measures like the mean, variance, and standard deviation, when calculated from a sample, are random variables themselves and have distributions. How good these estimates are is measured by the standard error (SE).
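A simulation sketch of this idea in Python: many samples are drawn from the same population, and each sample gives a different mean. The population mean, population SD, sample size, and number of repeats are all assumed values for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 10.0, 2.0   # hypothetical population (assumed values)
N = 25                         # observations per sample
REPEATS = 2000                 # number of samples drawn

def sample_mean(n):
    """Draw one sample of size n from the population and return its mean."""
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    return statistics.mean(sample)

# Each new sample gives a different mean, so the mean has its own distribution.
means = [sample_mean(N) for _ in range(REPEATS)]

spread_of_means = statistics.stdev(means)   # observed spread of the sample means
theoretical_se = POP_SD / N ** 0.5          # SD / sqrt(n)

print("first three sample means:", [round(m, 2) for m in means[:3]])
print("spread of the means:", round(spread_of_means, 3))
print("SD / sqrt(n):", theoretical_se)
```

The spread of the simulated means should come out close to SD / √n, which is what the standard error estimates.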
standard deviation (SD)
tells you how much the data itself varies. A measure of spread
The spread of data. The average amount of variability in your dataset. It tells you, on average, how far each value lies from the mean.
SD = √(residual standard error²) or SD = √(residual variance)
standard error (SE)
A measure of how much uncertainty there is in a sample statistic like the mean or a slope.
SE gets smaller as the sample size increases because more data provides a better estimate of the population parameter, leading to reduced variability in the estimate.
SE = SD / √(sample size n)
or SE = coefficient/t-value
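Both formulas can be sketched in a few lines of Python. The data, the slope, and the t-value are hypothetical numbers, and `standard_error` is a helper name introduced here for illustration.

```python
import math
import statistics

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

data = [4, 8, 6, 5, 3, 7]        # hypothetical observations
sd = statistics.stdev(data)      # spread of the data itself
se = standard_error(sd, len(data))

# Same SD but four times as many observations: the SE is halved.
se_4n = standard_error(sd, 4 * len(data))

# Hypothetical regression output: slope of 2.5 with a t-value of 5.
se_slope = 2.5 / 5.0

print("SD:", round(sd, 3), "SE:", round(se, 3), "SE with 4n:", round(se_4n, 3))
print("SE of slope:", se_slope)
```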
Difference SE and SD(stdev)
SD tells you how much the data itself varies. It is a measure of spread.
SE is a measure of how much uncertainty there is in a sample statistic like the mean or a slope.
Degrees of Freedom + formula
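The number of independent pieces of information available to calculate a statistic. It is calculated as the sample size minus the number of parameters estimated. When one parameter is estimated (such as the mean), df = n - 1 (n = sample size).
When the degrees of freedom run out, the model is too complicated for the number of observations.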
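A tiny Python sketch of the calculation. The sample size of 10 and the models listed are assumptions for illustration, and `degrees_of_freedom` is a hypothetical helper name.

```python
def degrees_of_freedom(n_observations, n_parameters):
    """Sample size minus the number of parameters estimated."""
    return n_observations - n_parameters

n = 10  # hypothetical sample size

df_mean = degrees_of_freedom(n, 1)    # only the mean is estimated: n - 1
df_line = degrees_of_freedom(n, 2)    # straight line: intercept and slope
df_none = degrees_of_freedom(n, 10)   # as many parameters as observations

print(df_mean, df_line, df_none)
```

In the last case no degrees of freedom are left, which is the situation where the model is too complicated for the number of observations.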
Describe the relation between a random variable, degrees of freedom, and a statistical model
A random variable represents outcomes of a random process, a degree of freedom refers to the number of independent values that can vary in a calculation, and a statistical model uses random variables and degrees of freedom to estimate parameters and make inferences about data.
Residuals
The difference between the actual outcome and that predicted by the model -> the sample estimate of the error.
Residuals represent the differences between the observed data points and the predicted values from a model. Residual = Observed value - Predicted value.
They show how well the model fits the data: smaller residuals mean a better fit, while larger residuals indicate that the model is not fully capturing the data’s pattern.
Residuals are also a reflection of the random, unexplained variability in the data (also referred to as error or noise).
The random part of a model that accounts for the unpredictability or unexplained variation (error).
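A short Python sketch of Residual = Observed value - Predicted value. The model y = 1 + 2x and the observed values are assumptions made up for illustration; no model is actually fitted here.

```python
def predict(x):
    """Hypothetical model: y = 1 + 2x (assumed, not estimated from data)."""
    return 1.0 + 2.0 * x

x_values = [1, 2, 3, 4]
observed = [3.2, 4.9, 7.1, 8.8]   # made-up observations

predicted = [predict(x) for x in x_values]

# Residual = observed value - predicted value, one per data point.
residuals = [obs - pred for obs, pred in zip(observed, predicted)]

for x, obs, pred, res in zip(x_values, observed, predicted, residuals):
    print(x, obs, pred, round(res, 2))
```

Small residuals scattered on both sides of zero, as here, suggest the model follows the pattern in the data.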
What does it mean that, for the Poisson distribution, the mean is equal to the variance?
In a Poisson distribution, which counts the number of events happening in a specific time or space, the mean (average number of events, λ) is the same as the variance (the spread of the data, also λ). This means that as the average number of events goes up, the variation in how many events actually occur also increases. This property is important for understanding how Poisson distributions work, especially for rare events.
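This property can be checked numerically. The sketch below computes the mean and variance directly from the Poisson probabilities for two assumed rates (λ = 2 and λ = 8); the cut-off `k_max` is an assumption that is large enough for these rates.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean number of events is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def poisson_mean_and_variance(lam, k_max=100):
    """Mean and variance computed from the probabilities for k = 0..k_max."""
    ks = range(k_max + 1)
    mean = sum(k * poisson_pmf(k, lam) for k in ks)
    variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
    return mean, variance

low_mean, low_var = poisson_mean_and_variance(2.0)
high_mean, high_var = poisson_mean_and_variance(8.0)

print("lambda = 2:", round(low_mean, 6), round(low_var, 6))
print("lambda = 8:", round(high_mean, 6), round(high_var, 6))
```

For both rates the mean and the variance match λ, so a higher average count comes with a wider spread.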
Skew
refers to the asymmetry of a probability distribution. It indicates whether data points tend to fall to the left or right of the mean
Binomial:
1. Right Skewed (Positive Skew)
Occurs when the probability of success is small. When p is low, there are many more ways to get a low number of successes than to get a high number. Thus, the distribution has a longer tail on the right side.
2. Left Skewed (Negative Skew)
Occurs when the probability of success is large. When p is high, there are many more ways to get a high number of successes than to get a low number. Therefore, the distribution has a longer tail on the left side.
Normal and Poisson:
The normal distribution is symmetric and cannot be skewed. The Poisson distribution is right skewed, noticeably so only when the mean (λ) is low. In this case, there are more occurrences of low counts (like 0 or 1 events), and the tail on the right side (for larger counts) is longer.
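The direction of skew can be sketched with the standard skewness formulas, which are not part of the flashcards themselves: (1 - 2p) / √(n · p · (1 - p)) for the binomial and 1 / √λ for the Poisson. The values of n, p, and λ are assumed for illustration.

```python
import math

def binomial_skewness(n, p):
    """Skewness of a binomial distribution with n trials and success probability p."""
    return (1 - 2 * p) / math.sqrt(n * p * (1 - p))

def poisson_skewness(lam):
    """Skewness of a Poisson distribution with mean lam."""
    return 1 / math.sqrt(lam)

n = 20  # hypothetical number of trials

skew_low_p = binomial_skewness(n, 0.1)    # small p: positive (right skew)
skew_half = binomial_skewness(n, 0.5)     # p = 0.5: symmetric
skew_high_p = binomial_skewness(n, 0.9)   # large p: negative (left skew)

skew_pois_low = poisson_skewness(1.0)     # low mean: strong right skew
skew_pois_high = poisson_skewness(25.0)   # high mean: much weaker right skew

print(skew_low_p, skew_half, skew_high_p)
print(skew_pois_low, skew_pois_high)
```

A positive value means a longer tail on the right, a negative value a longer tail on the left, and zero means a symmetric distribution.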
Difference variance and Residuals
Variance shows the differences between the data points (their spread around the mean), while residuals show the differences between the values predicted by a model and the observed values.