Probability Flashcards
probability
provides a mathematical framework for reasoning about the uncertainty of future events
compound events
independence, conditional probability
random variable
Probability density function
Cumulative distribution function
quantiles and percentiles
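A minimal sketch of how a density function, a cumulative distribution function, and quantiles/percentiles relate, using Python's standard-library NormalDist. The standard normal distribution is chosen here only for illustration; it is not prescribed by the cards.

```python
from statistics import NormalDist

# Assumption: a standard normal (mean 0, standard deviation 1) as the example.
std_normal = NormalDist(mu=0.0, sigma=1.0)

density_at_zero = std_normal.pdf(0.0)       # height of the density curve at x = 0
prob_below_one = std_normal.cdf(1.0)        # P(X <= 1), the area under the density up to 1
median = std_normal.inv_cdf(0.5)            # 50th percentile
percentile_975 = std_normal.inv_cdf(0.975)  # 97.5th percentile

# quantiles(n=4) returns the three cut points that split the distribution into quarters.
quartiles = std_normal.quantiles(n=4)

print(density_at_zero, prob_below_one, median, percentile_975, quartiles)
```

The inverse CDF is what turns a percentile into a value: feeding percentile_975 back into cdf recovers 0.975.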
Classical Probability distributions
binomial (discrete), negative binomial (discrete), Poisson (discrete), uniform (continuous), normal/Gaussian (continuous), exponential (continuous), power law (continuous)
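As a rough illustration, each of these distributions can be sampled with Python's standard library. The parameter values, the helper names, and the use of a Pareto distribution as the power-law example are all assumptions made for this sketch.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def binomial(n, p):
    """Number of successes in n independent trials with success probability p."""
    return sum(rng.random() < p for _ in range(n))

def negative_binomial(r, p):
    """Number of failures observed before the r-th success."""
    failures = successes = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

def poisson(lam):
    """Knuth's multiplication method; adequate for small lam."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

N = 5000
samples = {
    "binomial": [binomial(10, 0.3) for _ in range(N)],                    # mean n*p = 3
    "negative binomial": [negative_binomial(3, 0.5) for _ in range(N)],   # mean r*(1-p)/p = 3
    "Poisson": [poisson(3.0) for _ in range(N)],                          # mean lam = 3
    "uniform": [rng.uniform(0.0, 6.0) for _ in range(N)],                 # mean (a+b)/2 = 3
    "normal": [rng.gauss(3.0, 1.0) for _ in range(N)],                    # mean mu = 3
    "exponential": [rng.expovariate(1 / 3) for _ in range(N)],            # mean 1/rate = 3
    "power law (Pareto)": [rng.paretovariate(3.0) for _ in range(N)],     # mean a/(a-1) = 1.5
}

means = {name: sum(values) / len(values) for name, values in samples.items()}
for name, mean in means.items():
    print(f"{name}: sample mean {mean:.2f}")
```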
binomial (discrete)
tbc
negative binomial (discrete)
tbc
Poisson (discrete)
tbc
Uniform (continuous)
tbc
normal/Gaussian (continuous)
tbc
exponential (continuous)
tbc
power law (continuous)
tbc
statistics
provides a mathematical framework to collect, analyse, interpret, and present data
descriptive statistics
used to describe and summarise the data
inferential statistics
used to make predictions by taking samples of data from a population and making generalisations about that population
Central limit theorem (CLT)
if we repeatedly take independent random samples of size n from a population with finite variance, then when n is large the distribution of the sample means will approach a normal distribution. This allows us to make inferences about a population from a sample without needing the characteristics of the whole population. Confidence intervals, hypothesis testing, and p-value analysis are all based on the CLT.
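The CLT can be checked with a small simulation. This is a sketch under assumed settings: an exponential population with mean 1 (deliberately non-normal), sample size n = 50, and 2000 repeated samples.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

def sample_mean(n):
    """Mean of n independent draws from an exponential distribution with mean 1."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

n = 50
sample_means = [sample_mean(n) for _ in range(2000)]

centre = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)
# The CLT suggests centre close to 1 and spread close to 1 / sqrt(n), about 0.141.

# For a normal distribution roughly 68% of values lie within one standard deviation.
within_one_sd = sum(abs(m - centre) <= spread for m in sample_means) / len(sample_means)

print(centre, spread, within_one_sd)
```

Even though single draws from an exponential distribution are strongly skewed, the sample means cluster symmetrically around the population mean.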
law of large numbers (LLN)
if an experiment is repeated independently a large number of times, the average of the results should be close to the expected value.
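A quick sketch of the LLN, assuming a fair six-sided die as the experiment (expected value 3.5):

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable
expected_value = 3.5    # (1 + 2 + 3 + 4 + 5 + 6) / 6

rolls = [rng.randint(1, 6) for _ in range(100_000)]

def running_average(k):
    """Average of the first k rolls."""
    return sum(rolls[:k]) / k

for k in (10, 1_000, 100_000):
    print(k, running_average(k))
```

The average over the first few rolls can be far from 3.5; the average over all 100,000 should sit very close to it.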
sampling
the process of collecting and selecting data; the goal is to make a statistical inference about a population from a small set of observations
random sampling
every member of your population has an equal chance of being sampled
stratified sampling
the population is split into groups and then members are randomly sampled from each group
cluster sampling
the population is first split into groups or clusters and then some clusters are randomly selected to be in your sample
Systematic sampling
every member of your population is ordered into a list; you then choose a random starting point and select every kth member
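The four sampling schemes above can be sketched side by side. The population, its "group" field, and the function names are hypothetical, invented only for this illustration.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 members, each belonging to one of four groups.
population = [{"id": i, "group": "ABCD"[i % 4]} for i in range(100)]

def split_by_group(members):
    """Map each group name to the list of its members."""
    groups = {}
    for member in members:
        groups.setdefault(member["group"], []).append(member)
    return groups

def simple_random_sample(members, size):
    """Every member has the same chance of being picked."""
    return rng.sample(members, size)

def stratified_sample(members, per_group):
    """Randomly sample the same number of members from every group."""
    return [m for group in split_by_group(members).values()
            for m in rng.sample(group, per_group)]

def cluster_sample(members, n_clusters):
    """Randomly pick whole groups and keep all of their members."""
    groups = split_by_group(members)
    chosen = rng.sample(sorted(groups), n_clusters)
    return [m for name in chosen for m in groups[name]]

def systematic_sample(members, k):
    """Pick a random starting point among the first k, then every k-th member."""
    start = rng.randrange(k)
    return members[start::k]

print(len(simple_random_sample(population, 10)))
print(len(stratified_sample(population, 5)))
print(len(cluster_sample(population, 2)))
print([m["id"] for m in systematic_sample(population, 10)])
```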
sampling errors
errors introduced to your data via sampling that skew the data in some way, so that it is not reflective of the real-world distribution. This can be combated by increasing the sample size and ensuring the sample accurately represents the entire population.
hypothesis testing
a tool used to decide whether sample data provide enough evidence to reject a given hypothesis
hypothesis testing process
formulate the null hypothesis
choose a test statistic
calculate the p-value
compare the p-value to a significance level alpha
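The four steps can be walked through with a two-sided Z-test. All numbers here are made up for illustration: a hypothesised population mean of 100, a known population standard deviation of 15, and a sample of 36 observations with mean 106.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: null hypothesis, "the population mean equals mu_0" (hypothetical values).
mu_0 = 100.0
sigma = 15.0         # population standard deviation, assumed known
n = 36
sample_mean = 106.0

# Step 2: test statistic, the distance of the sample mean from mu_0 in standard errors.
z = (sample_mean - mu_0) / (sigma / sqrt(n))

# Step 3: two-sided p-value, the probability of a |z| at least this large if the null holds.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 4: compare with the significance level.
alpha = 0.05
reject_null = p_value < alpha

print(z, p_value, reject_null)
```

A p-value below alpha means the data would be unusual if the null hypothesis were true, so the null is rejected at that level; it is not the probability that the null is true.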
Z-test
test to determine if a sample mean differs significantly from a population mean, typically used when the population variance is known or the sample is large
T-test (one-sample)
test to determine if the mean of a normally distributed population is different from a hypothesised value
T-test (two-sample)
test to determine if the means of two populations are significantly different from one another
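A sketch of the two-sample test statistic, assuming Welch's variant (which does not require equal variances); the cards do not say which variant is intended. Only the statistic and degrees of freedom are computed here, because the standard library has no t-distribution CDF for the p-value.

```python
from math import sqrt
from statistics import fmean, variance

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (fmean(sample_a) - fmean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical data: two small samples whose means differ by 1.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t_stat, dof)
```

The statistic would then be compared against a t distribution with dof degrees of freedom, for example with a statistics library.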
chi-squared test (goodness of fit)
test to determine how well the observed data fit some given probability distribution
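A minimal goodness-of-fit sketch, assuming 60 hypothetical die rolls tested against a fair die. The critical value 11.07 is the chi-squared cutoff for 5 degrees of freedom at a 0.05 significance level.

```python
# Hypothetical observed counts for faces 1..6 over 60 rolls.
observed = [8, 12, 9, 11, 10, 10]
total = sum(observed)

# Under the hypothesised distribution (a fair die) every face is expected equally often.
expected = [total / 6] * 6

# Chi-squared statistic: sum of (observed - expected)^2 / expected over all categories.
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

degrees_of_freedom = len(observed) - 1
critical_value = 11.07  # chi-squared, 5 degrees of freedom, alpha = 0.05

# A statistic below the critical value gives no evidence against the fair-die hypothesis.
consistent_with_fair_die = statistic < critical_value

print(statistic, degrees_of_freedom, consistent_with_fair_die)
```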
chi-squared test (for independence)
test to determine if two categorical variables are related
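A sketch of the independence test on a hypothetical 2x2 contingency table. The counts are invented; 3.841 is the chi-squared critical value for 1 degree of freedom at a 0.05 significance level.

```python
# Hypothetical contingency table: rows are one categorical variable, columns the other.
table = [
    [30, 10],
    [20, 40],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# Expected count for each cell if the two variables were independent.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

statistic = sum(
    (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(table))
    for j in range(len(table[0]))
)

degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
critical_value = 3.841  # chi-squared, 1 degree of freedom, alpha = 0.05

# A statistic above the critical value suggests the variables are related.
variables_related = statistic > critical_value

print(statistic, degrees_of_freedom, variables_related)
```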
Correlation
describes the strength and direction of the relationship between two variables; on its own it does not show that one variable affects the other
Pearson coefficient
measures the degree of the relationship between linearly related variables
Spearman rank coefficient
computed on ranks and depicts monotonic relationships; equivalent to the Pearson correlation coefficient between the rank variables
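The link between the two coefficients can be sketched directly: Spearman is Pearson applied to ranks. The function names and the cubic example data are assumptions made for this illustration.

```python
from math import sqrt
from statistics import fmean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return covariance / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rank coefficient: Pearson computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: y grows monotonically with x, but not linearly.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]

print(pearson(xs, ys))   # below 1, because the relationship is not linear
print(spearman(xs, ys))  # 1, because the relationship is perfectly monotonic
```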
linear algebra
provides a mathematical framework to operate on matrices