Statistics Flashcards
Descriptive statistics
Methods for organising, summarising, and presenting data in an informative way - Graphs, tables, and numbers
Inferential statistics
Methods for drawing conclusions about a population, from a sample
Qualitative (Categorical)
Nominal - Categories that cannot be ordered (e.g. male, female)
Ordinal - Categories that can be ordered, but the numerical difference between groups cannot always be determined (e.g. low-income, middle-income, high-income)
Quantitative (Numerical)
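Discrete - Counted values, usually whole numbers (e.g. number of children)
Continuous - Measured values. Interval data has no true zero (e.g. temperature in °C); ratio data has a true zero (e.g. height)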
Raw data
Collected data that has not been organised numerically or grouped
Frequency
How many times a value or category appears in the data. Can be expressed as a count of individuals or as a fraction/percentage of the total
Quartiles
Values that split the ordered data into four equal parts
1st quartile (Q1) - located where 25% of all data points are equal to or lower than this value and 75% are equal to or higher
Percentiles
Values that split the ordered data into 100 equal parts
Second quartile (Q2)
The median, 50th percentile
Interquartile range (IQR)
Q3 - Q1, the spread of the middle 50% of the data
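A minimal sketch of the quartile and IQR cards, using Python's standard `statistics` module. The data values are hypothetical, and the choice of `method="inclusive"` is an assumption: textbooks and software use several slightly different quartile conventions.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical sample, already ordered

# n=4 asks for the three cut points that split the data into quarters.
# "inclusive" is one common convention; the default "exclusive" method
# can give slightly different cut points for the same data.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print("Q1:", q1, "Q2 (median):", q2, "Q3:", q3, "IQR:", iqr)
```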
Sturges' rule
A rule for choosing the number of classes (bins) to use in a histogram or frequency distribution table
k = 1 + 3.33*log10(n), where n is the number of observations and k is rounded to a whole number
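A short sketch of the rule above. The function name `sturges_bins` is hypothetical, and rounding up to the next whole number is an assumed convention (some texts round to the nearest integer instead).

```python
import math


def sturges_bins(n: int) -> int:
    """Number of classes suggested by k = 1 + 3.33*log10(n), rounded up."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.ceil(1 + 3.33 * math.log10(n))


for n in (30, 100, 1000):
    print(n, "observations ->", sturges_bins(n), "classes")
```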
Mean calculation
x̄ = ∑xᵢ / n
Median
Middle value of the ordered data set; it splits the data into two equal halves
Mode
Value which occurs with the greatest frequency
Deviation from the mean
Difference between each observation and the mean (e.g. between each price and the average price)
xᵢ - x̄
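The mean, median, mode, and deviation cards can be checked on a small example. This is a sketch with hypothetical prices, using the standard `statistics` module.

```python
import statistics

prices = [2, 3, 3, 5, 7]  # hypothetical prices

mean = statistics.mean(prices)      # sum of values divided by n
median = statistics.median(prices)  # middle value of the ordered data
mode = statistics.mode(prices)      # most frequent value

# Deviation from the mean for each observation.
deviations = [x - mean for x in prices]

print("mean:", mean, "median:", median, "mode:", mode)
print("deviations:", deviations, "sum:", sum(deviations))
```

The deviations from the mean always sum to zero, which is why the variance squares them before averaging.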
Symmetric distribution
Graph is a mirror image about its centre; mean = median
Left skewed
Typically mode > median > mean
Negative skew (long tail on the left)
Right skewed
Typically mean > median > mode
Positive skew (long tail on the right)
Variance
The average of the squared deviations from the mean (dividing by n - 1 for a sample)
s^2 = ∑(xᵢ - x̄)^2 / (n - 1)
Standard deviation
A quantity expressing how much the members of a group typically differ from the group mean; the square root of the variance
s = SQRT(∑(xᵢ - x̄)^2 / (n - 1))
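A worked sketch of the sample variance and standard deviation, computed step by step and compared with the standard `statistics` functions (which also divide by n - 1). The data values are hypothetical.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
n = len(data)
mean = sum(data) / n

# Squared deviations from the mean.
squared_devs = [(x - mean) ** 2 for x in data]

# Sample variance divides by n - 1, not n.
sample_var = sum(squared_devs) / (n - 1)
sample_sd = math.sqrt(sample_var)

print("variance:", sample_var, "library:", statistics.variance(data))
print("std dev: ", sample_sd, "library:", statistics.stdev(data))
```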
Skewness
Pearson's second coefficient of skewness = 3(mean - median) / standard deviation
Kurtosis
Measure of the tailedness of a distribution - how often outliers occur
= (∑(xᵢ - x̄)^4 / n) / s^4
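A sketch of the skewness and kurtosis formulas on these cards. The function names are hypothetical, and other definitions exist (for example moment-based skewness, or excess kurtosis, which subtracts 3), so results may differ from what a statistics package reports.

```python
import statistics


def pearson_skewness(data):
    """3 * (mean - median) / standard deviation."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    return 3 * (mean - median) / statistics.stdev(data)


def kurtosis(data):
    """Average fourth-power deviation divided by s^4 (s = sample std dev)."""
    n = len(data)
    mean = statistics.mean(data)
    fourth_moment = sum((x - mean) ** 4 for x in data) / n
    return fourth_moment / statistics.stdev(data) ** 4


symmetric = [1, 2, 3, 4, 5]      # hypothetical data, mean equals median
right_skewed = [1, 2, 2, 3, 10]  # hypothetical data with a long right tail

print(pearson_skewness(symmetric), pearson_skewness(right_skewed))
print(kurtosis(symmetric))
```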
Cross-sectional data
Observations from a particular point in time, containing different variables
Time series data
Observations of the same variable(s) recorded over successive time periods
Heteroskedasticity
The variability (variance) of the data is not constant, e.g. calm periods followed by periods of high volatility
Serial Correlation
Correlation between a time series and its own past values, so successive observations are not independent
Growth factor
Xₜ / Xₜ₋₁
The approximate average growth rate
Arithmetic mean of the period-by-period growth rates. With n data points there are only n - 1 growth rates (the first data point has no previous value), so divide by n - 1
The accurate growth rate
Geometric mean of the growth factors, minus 1
= (g₂ × g₃ × ... × gₙ)^(1/(n-1)) - 1, where gₜ = Xₜ / Xₜ₋₁ is the growth factor for period t
Approximate average log equation
= (ln(Xₙ) - ln(X₁)) / (n - 1)
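The three growth-rate cards can be compared on one series. This is a sketch with hypothetical values that grow by 10% each period, and it assumes n data points give n - 1 growth factors.

```python
import math

series = [100.0, 110.0, 121.0]  # hypothetical values, 10% growth per period
n = len(series)

# Growth factor for each period after the first: X_t / X_(t-1).
factors = [series[t] / series[t - 1] for t in range(1, n)]

# Approximate average growth rate: arithmetic mean of the n - 1 growth rates.
arithmetic_rate = sum(f - 1 for f in factors) / (n - 1)

# Accurate growth rate: geometric mean of the growth factors, minus 1.
geometric_rate = math.prod(factors) ** (1 / (n - 1)) - 1

# Log approximation: difference of logs of last and first value over n - 1.
log_rate = (math.log(series[-1]) - math.log(series[0])) / (n - 1)

print(arithmetic_rate, geometric_rate, log_rate)
```

With constant growth the arithmetic and geometric rates agree; the log rate equals ln(1.1), which is slightly below 10%.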
Probability
How likely an outcome is, expressed as a number between 0 and 1
A chance experiment
A procedure carried out under controlled conditions which has a well-defined set of possible results
Sample space
All possible outcomes
Simple event
A single outcome
Compound event
Collection of possible outcomes
The law of large numbers
The greater the number of trials, the closer the observed relative frequency of an outcome gets to its probability
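A small simulation sketch of the law of large numbers, using fair coin flips. The seed and flip counts are arbitrary choices for illustration, and `heads_proportion` is a hypothetical helper.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable


def heads_proportion(num_flips: int) -> float:
    """Simulate fair coin flips and return the observed share of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips


# The observed proportion tends to settle near the true probability, 0.5.
for num_flips in (10, 1_000, 100_000):
    print(num_flips, heads_proportion(num_flips))
```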
Independent events
P(A|B)=P(A)
P(B|A)=P(B)
P(A&B)=P(A)P(B)
Complement rule
P(A)=1-P(A’)
Multiplication rules (Joint Probability) And
Dependent: P(A&B)=P(A)P(B|A)
Independent: P(A&B)=P(A)P(B)
Mutually exclusive: P(A&B)=0
Addition Rules (Union of Events) Or
Non mutually exclusive events: P(A or B)=P(A) + P(B) - P(A&B)
Mutually exclusive: P(A or B)=P(A) + P(B)
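The addition and complement rules can be verified by counting outcomes for one roll of a fair die. This is a sketch; the events A and B are hypothetical examples, and `prob` is a helper defined here.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die


def prob(event: set) -> Fraction:
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


A = {2, 4, 6}  # roll is even
B = {4, 5, 6}  # roll is greater than 3

# Addition rule for events that are not mutually exclusive.
p_a_or_b = prob(A) + prob(B) - prob(A & B)

# Complement rule: P(A) = 1 - P(A').
p_a_via_complement = 1 - prob(sample_space - A)

print(p_a_or_b, prob(A | B), p_a_via_complement)
```

Here P(A&B) = 1/3 while P(A)P(B) = 1/4, so A and B are not independent.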
Bayes’ Theorem
P(A|B)=P(B|A)*P(A)/P(B)
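A worked sketch of Bayes' theorem. All the probabilities are made-up numbers for illustration, and P(B) is obtained from the law of total probability, P(B) = P(B|A)P(A) + P(B|A')P(A'), which is not on the card itself.

```python
def bayes(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be greater than zero")
    return p_b_given_a * p_a / p_b


# Hypothetical inputs: A is rare, B is strong but imperfect evidence for A.
p_a = 0.01
p_b_given_a = 0.90
p_b_given_not_a = 0.05

# Law of total probability (assumption added for this example).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

posterior = bayes(p_b_given_a, p_a, p_b)
print(round(posterior, 4))
```

Even with P(B|A) = 0.9, the posterior P(A|B) is only about 0.15 because A is rare to begin with.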
Factorial
n! - Product of all positive whole numbers from 1 up to and including n
Permutation
Number of ways of arranging r items chosen from a set of n, where order matters
Calculated using factorials
P(n,r)=n!/(n-r)!
Example: P(6,2) = 6!/(6-2)! = 6!/4! = 30
Combination
Binomial Coefficient
Number of ways of choosing k items from a set of n, where order does not matter
C(n,k) = n! / (k!(n-k)!)
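A sketch of the permutation and combination formulas built from factorials, checked against `math.perm` and `math.comb` (available from Python 3.8). The function names and the values n = 6, r = 2 are illustrative.

```python
import math


def permutations(n: int, r: int) -> int:
    """Ordered arrangements of r items from n: n! / (n - r)!"""
    return math.factorial(n) // math.factorial(n - r)


def combinations(n: int, k: int) -> int:
    """Unordered selections of k items from n: n! / (k! * (n - k)!)"""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))


n, r = 6, 2
print(permutations(n, r), math.perm(n, r))
print(combinations(n, r), math.comb(n, r))
```

Each combination of r items can be ordered in r! ways, so the permutation count is r! times the combination count.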
Discrete Random variables
Variable whose value is random but can only take a countable number of distinct values
Probability distribution function
Gives the probability of each value the random variable can take; the probabilities sum to 1
Expected value
Long-term average or mean of the random variable: E(X) = ∑x*P(x)
Law of large numbers
The higher the number of trials, the closer the observed average gets to the expected value
Standard deviation of a probability function
=SQRT(∑(x-E(X))^2*P(x))
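A sketch of the expected value and standard deviation of a discrete random variable, using a fair six-sided die as a hypothetical distribution. Note that the deviations are squared before being weighted by P(x).

```python
import math
from fractions import Fraction

# Hypothetical distribution: a fair die, each face with probability 1/6.
distribution = {x: Fraction(1, 6) for x in range(1, 7)}

# Expected value: sum of x * P(x).
expected = sum(x * p for x, p in distribution.items())

# Variance: sum of (x - E(X))^2 * P(x); the standard deviation is its root.
variance = sum((x - expected) ** 2 * p for x, p in distribution.items())
std_dev = math.sqrt(variance)

print("E(X):", expected, "Var(X):", variance, "SD(X):", round(std_dev, 4))
```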
Characteristics of Binomial distribution
Fixed number of trials
Only two possible, mutually exclusive outcomes
The trials are independent, with the same probability of success p in every trial
X~B(n,p)
Binomial sample space
(number of outcomes per trial)^n; with two outcomes per trial there are 2^n possible sequences
Binomial probability formula
P(X=x) = nCx * p^x * (1-p)^(n-x)
nCx = n! / (x!(n-x)!)
The number of combinations, multiplied by the probability of success raised to the number of successes, multiplied by the probability of failure raised to the number of failures
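A sketch of the binomial probability calculation. The function name `binomial_pmf` and the values n = 4, p = 0.5 are illustrative; `math.comb` requires Python 3.8 or later.

```python
import math


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ B(n, p): nCx * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 4, 0.5  # e.g. four tosses of a fair coin

# Probability of each possible number of successes, 0 through n.
probabilities = [binomial_pmf(x, n, p) for x in range(n + 1)]

print(probabilities)
print("total:", sum(probabilities))
```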
Probability density function
Represents the distribution of a continuous random variable
Cumulative distribution function
Area under the probability density curve to the left of a value: F(x) = P(X ≤ x)
Used to evaluate the probability of X assuming values in a particular interval
Probability = Area
Properties of Continuous probability distributions
The outcomes of random variable X are measured, not counted
The entire area under the curve equals 1
P(c<x<d) is the probability that the random variable X takes a value x in the interval between the values c and d. P(c<x<d) is the area under the curve, above the x-axis, to the right of c and the left of d
P(x = c) = 0 The probability that x takes on any single individual value is zero. The area below the curve, above the X-axis, and between x = c and x = c has no width, and therefore no area (area = 0)
P(c < x < d) is the same as P(c ≤ x ≤ d) because probability is equal to area (and the probability that X is equal to the end points c or d is 0)
Bell curve distribution
Mean=Median=Mode
X~N(μ,σ)
μ is the mean, σ is the standard deviation
The standard normal distribution
Denoted by Z
Transforms any normally distributed variable X into Z such that Z has a mean of 0 and a standard deviation of 1
X~N(a,b), where a is the mean of X and b is the SD of X
To convert P(X < x), subtract the mean from both sides, and then divide both sides by the standard deviation: Z = (X - a) / b
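A sketch of standardising a value and checking that the probability is unchanged, using `statistics.NormalDist` from the standard library (Python 3.8+). The mean of 100, SD of 15, and x = 130 are hypothetical.

```python
from statistics import NormalDist

mean, sd = 100, 15  # hypothetical X ~ N(100, 15), written as (mean, SD)
x = 130

# Standardise: subtract the mean, then divide by the standard deviation.
z = (x - mean) / sd

# P(X < x) on the original scale equals P(Z < z) on the standard scale.
p_original = NormalDist(mean, sd).cdf(x)
p_standard = NormalDist().cdf(z)  # NormalDist() defaults to mean 0, SD 1

print("z:", z)
print(p_original, p_standard)
```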
Bell curve distribution rule (empirical rule)
About 68% of values lie within one standard deviation of the mean
About 95% lie within two standard deviations
About 99.7% lie within three standard deviations
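The three percentages can be reproduced from the standard normal cumulative distribution function. This is a sketch using `statistics.NormalDist` (Python 3.8+); `within_k_sd` is a hypothetical helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def within_k_sd(k: float) -> float:
    """P(-k < Z < k): area under the curve within k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, "SD:", round(within_k_sd(k), 3))
```

The rule's 68%, 95%, and 99.7% are rounded versions of these areas.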