3: Probability distributions Flashcards

1
Q

Quantile

A

A specific value that marks off a particular part of a data set. A quantile determines what proportion of the values in a distribution lie above or below a certain limit.
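As a small illustration of the definition above, the sketch below uses Python's standard `statistics.quantiles` on a made-up data set (the numbers are assumptions, not course data) to find the quartiles and the share of values below the middle one.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 5, 7, 8, 9, 11, 12]

# n=4 splits the data into four equally likely groups,
# so three cut points (the quartiles) are returned.
q1, q2, q3 = statistics.quantiles(data, n=4)

# Share of values strictly below the middle quantile (the median).
share_below_median = sum(value < q2 for value in data) / len(data)

print(q1, q2, q3)
print(share_below_median)
```

Other choices of `n` give other quantiles, for example `n=10` for deciles.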

2
Q

Inference

A

Drawing conclusions about a population from a sample

3
Q

Probability + calculation

A

The chance of something happening (always between 0 and 1). For a continuous variable it is the area under the density curve, such as the normal curve, over a range of values.

For instance, if the probability of a value being less than 1.8 is 0.85 (85%), then the probability of it being greater than 1.8 is 1 - 0.85 = 0.15 (15%).
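The complement calculation above can be sketched with Python's standard `statistics.NormalDist`. The mean of 1.7 and standard deviation of 0.1 are assumptions made up for this example, so the result is about 0.84 rather than the 0.85 used on the card.

```python
from statistics import NormalDist

# Assumed distribution for illustration only: mean 1.7, standard deviation 0.1.
heights = NormalDist(mu=1.7, sigma=0.1)

p_less = heights.cdf(1.8)   # area under the curve to the left of 1.8
p_greater = 1 - p_less      # complement: area to the right of 1.8

print(round(p_less, 2), round(p_greater, 2))
```

The two areas always add up to 1, because the total area under the curve is 1.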

4
Q

probability distribution

A

describes the chance of different outcomes of a random variable

5
Q

Normal distribution + shape

A

A continuous probability distribution in which most data points cluster around the mean, while the rest taper off symmetrically toward either extreme. Bell/hill shaped.

6
Q

Poisson distribution

A

Counts. A discrete probability distribution over non-negative whole numbers that can be right skewed. It has only one parameter, the rate parameter λ (lambda): the average rate at which events occur, which is also the mean number of events.

The Poisson distribution exactly models the number of events in a fixed time or space when the events are independent (one doesn’t affect the other) and happen at a constant rate.
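To make the single parameter concrete, here is a minimal sketch of the Poisson probability formula P(X = k) = λ^k e^(-λ) / k!. The rate of 2 events per interval and the helper name `poisson_pmf` are assumptions for illustration.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean number of events is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 2.0  # assumed average of 2 events per fixed interval

# Probabilities of observing 0, 1, ..., 9 events.
probabilities = [poisson_pmf(k, lam) for k in range(10)]

for k, prob in enumerate(probabilities):
    print(k, round(prob, 4))
```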

7
Q

Binomial distribution

A

Models the number of successes in a fixed number of independent trials, each with two possible outcomes: success or failure. Used for ratios, fractions, and binary data. Can be skewed, left or right. Two parameters: the probability of success and the number of trials.

Difference from Poisson: the Binomial distribution deals with a fixed number of trials and a constant probability of success, while the Poisson distribution deals with the rate of events over time or space and is often used when the number of trials is very large or not fixed.
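The comparison above can be illustrated with a short sketch. With many trials and a small success probability, Binomial probabilities come close to Poisson probabilities with λ = n × p, which is a common approximation. The values n = 1000 and p = 0.003 and both helper names are assumptions for this example.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean number of events is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

n, p = 1000, 0.003   # assumed: many trials, rare success
lam = n * p          # matching Poisson mean

for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
```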

8
Q

Variance

A

Shows the extent to which observations deviate from one another (large variance = large differences within the group).
The spread of the numbers in a data set, calculated as the average squared distance of each number from the mean.
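A worked sketch of the calculation, using a made-up data set. It assumes the sample variance (dividing by n - 1), which is what Python's `statistics.variance` computes.

```python
import statistics

# Hypothetical observations, for illustration only.
data = [4, 8, 6, 5, 7]

mean = sum(data) / len(data)

# Squared distance of each observation from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]

# Sample variance: sum of squared deviations divided by n - 1.
sample_variance = sum(squared_deviations) / (len(data) - 1)

print(mean, sample_variance)
```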

9
Q

Random Variable

A

A variable whose outcome (value) is subject to a random process, i.e. determined by chance. Flipping a coin is an example: heads or tails is random and nothing else influences it. A random variable can be either discrete (taking specific values) or continuous (taking any value in a continuous range).

Random variables represent measurable properties of random processes, and their distributions give insight into variability.

10
Q

properties of Random variable

A
  • We cannot predict the value of a random variable with absolute precision, because the outcome in each sample will be different.
  • Functions based on random variables are also random variables.
    The function calculating the mean uses random variables, so the mean is itself a random variable. New samples can give different means.
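The second property can be seen in a small simulation. This is a sketch only: the Normal(50, 10) population, the sample size of 20, and the helper name `sample_mean` are all assumptions.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

def sample_mean(sample_size):
    """Draw one sample from an assumed Normal(50, 10) population and return its mean."""
    sample = [rng.gauss(50, 10) for _ in range(sample_size)]
    return statistics.mean(sample)

# Five new samples give five different means: the mean is a random variable.
means = [sample_mean(20) for _ in range(5)]

for m in means:
    print(round(m, 2))
```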
11
Q

Statistics and RV

A

Sample statistics like the mean, variance, and standard deviation are random variables themselves and have distributions. How precise these estimates are is measured by the standard error (SE).

12
Q

standard deviation (SD)

A

Tells you how much the data itself varies. A measure of spread.

The average amount of variability in your dataset. It tells you, on average, how far each value lies from the mean.

SD = √variance. For a fitted model, the residual SD (residual standard error) = √(residual variance).

13
Q

standard error (SE)

A

A measure of how much uncertainty there is in a sample statistic like the mean or a slope.

SE gets smaller as the sample size increases because more data provides a better estimate of the population parameter, leading to reduced variability in the estimate.
SE of the mean = SD / √n (n = sample size)
or, for a model coefficient, SE = coefficient / t-value
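The formula for the SE of the mean can be sketched as follows. The measurements and the helper name `standard_error` are assumptions for illustration.

```python
import math
import statistics

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of the sample size."""
    return sd / math.sqrt(n)

# Hypothetical measurements.
data = [12.1, 9.8, 11.4, 10.3, 10.9, 11.7, 9.5, 10.6]

sd = statistics.stdev(data)           # spread of the data itself
se = standard_error(sd, len(data))    # uncertainty of the sample mean

# Same SD, four times the sample size: the SE is halved.
se_n25 = standard_error(2.0, 25)
se_n100 = standard_error(2.0, 100)

print(round(sd, 3), round(se, 3), se_n25, se_n100)
```

Because the sample size sits under a square root, halving the SE takes four times as many observations.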

14
Q

Difference SE and SD(stdev)

A

SD tells you how much the data itself varies. It is a measure of spread.
SE measures how much uncertainty there is in a sample statistic like the mean or a slope.

15
Q

Degrees of Freedom + formula

A

The number of independent pieces of information you have to calculate a statistic. It is calculated as the sample size minus the number of parameters estimated. When only the mean is estimated: df = n - 1 (n = sample size).

When the degrees of freedom run out, the model is too complicated for the number of observations.
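A minimal sketch of the counting rule, with a hypothetical helper `degrees_of_freedom` and an assumed sample of 10 observations:

```python
def degrees_of_freedom(n_observations, n_parameters):
    """Hypothetical helper: sample size minus the number of estimated parameters."""
    return n_observations - n_parameters

df_mean = degrees_of_freedom(10, 1)        # only the mean is estimated
df_line = degrees_of_freedom(10, 2)        # intercept and slope are estimated
df_exhausted = degrees_of_freedom(10, 10)  # as many parameters as observations

print(df_mean, df_line, df_exhausted)
```

In the last case no degrees of freedom remain, which is the situation described above as the model being too complicated for the data.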

16
Q

Describe the relation between a random variable, degree of freedom and a statistical model;

A

A random variable represents the outcomes of a random process. Degrees of freedom are the number of independent values that can vary in a calculation. A statistical model describes the data using random variables, and its degrees of freedom determine how much information is left to estimate parameters and make inferences about the data.

17
Q

Residuals

A

The difference between the actual outcome and that predicted by the model -> the sample estimate of the error.

Residuals represent the differences between the observed data points and the predicted values from a model. Residual = Observed value - Predicted value.
They show how well the model fits the data: smaller residuals mean a better fit, while larger residuals indicate that the model is not fully capturing the data’s pattern.
Residuals are also a reflection of the random, unexplained variability in the data (also referred to as error or noise).

The random part of a model that accounts for the unpredictability or unexplained variation (error).
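The formula Residual = Observed value - Predicted value can be sketched with a straight-line fit. The data points are made up, and ordinary least squares is used here only as an example model.

```python
# Hypothetical observations.
observed_x = [1.0, 2.0, 3.0, 4.0, 5.0]
observed_y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(observed_x)
mean_x = sum(observed_x) / n
mean_y = sum(observed_y) / n

# Least-squares slope and intercept of the line y = intercept + slope * x.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed_x, observed_y)) / sum(
    (x - mean_x) ** 2 for x in observed_x
)
intercept = mean_y - slope * mean_x

predicted_y = [intercept + slope * x for x in observed_x]

# Residual = observed value - predicted value, one per data point.
residuals = [y - y_hat for y, y_hat in zip(observed_y, predicted_y)]

print([round(r, 3) for r in residuals])
```

For a least-squares line with an intercept the residuals sum to zero, so their size rather than their total indicates how well the model fits.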

18
Q

What does it mean that for the Poisson distribution, the mean is equal to the variance

A

In a Poisson distribution, which counts the number of events happening in a specific time or space, the mean (average number of events, λ) is the same as the variance (the spread of the data, also λ). This means that as the average number of events goes up, the variation in how many events actually occur also increases. This property is important for understanding how Poisson distributions work, especially for rare events.
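The equality of mean and variance can be checked numerically from the Poisson probabilities. This is a sketch with an assumed rate of λ = 3; the sum is cut off at 60 events, beyond which the probabilities are negligible.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean number of events is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 3.0          # assumed rate
counts = range(60) # probabilities beyond this are negligible for lam = 3

# Mean: each count weighted by its probability.
mean = sum(k * poisson_pmf(k, lam) for k in counts)

# Variance: squared distance from the mean, weighted by probability.
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in counts)

print(round(mean, 6), round(variance, 6))
```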

19
Q

Skew

A

refers to the asymmetry of a probability distribution. It indicates whether data points tend to fall to the left or right of the mean

Binomial:
1. Right Skewed (Positive Skew)
Occurs when the probability of success is small. When p is low, there are many more ways to get a low number of successes than to get a high number. Thus, the distribution has a longer tail on the right side.
2. Left Skewed (Negative Skew)
Occurs when the probability of success is large. When p is high, there are many more ways to get a high number of successes than to get a low number. Therefore, the distribution has a longer tail on the left side.

The Normal distribution cannot be skewed. The Poisson distribution is right skewed, and noticeably so only when the mean (λ) is low. In that case there are more occurrences of low counts (like 0 or 1 events), and the tail on the right side (for larger counts) is longer. As λ grows, the distribution becomes nearly symmetric.
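The direction of the skew can be checked with the standard skewness formulas: (1 - 2p) / √(np(1 - p)) for the Binomial and 1 / √λ for the Poisson. The helper names and example values below are assumptions for illustration.

```python
import math

def binomial_skewness(n, p):
    """Skewness of a Binomial(n, p): positive = right skewed, negative = left skewed."""
    return (1 - 2 * p) / math.sqrt(n * p * (1 - p))

def poisson_skewness(lam):
    """Skewness of a Poisson(lam): always positive, shrinking as lam grows."""
    return 1 / math.sqrt(lam)

print(binomial_skewness(20, 0.1))  # low p: positive (right skewed)
print(binomial_skewness(20, 0.9))  # high p: negative (left skewed)
print(binomial_skewness(20, 0.5))  # p = 0.5: symmetric
print(poisson_skewness(1), poisson_skewness(100))
```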

20
Q

Difference variance and Residuals

A

Variance shows how much the data points differ from one another (their spread around the mean). Residuals show the difference between the value predicted by the model and the actual value.