10 Descriptive Statistics Flashcards
What do stats allow us to do?
- understand real-world phenomena using available data
- obtain tools to summarise and interpret data
- make sound inferences and forecasts
Descriptive statistics
Used to summarise information which would otherwise be too complex to take in
Statistical inference
The drawing of lessons about a population from studying a sample of data drawn from that population
Variable
A specific characteristic of a unit
What are the two different types of variable?
Numerical- where each observation takes a numerical value
Categorical- records which of a series of categories are observed
What are the types of numerical variables?
Discrete- possible values are limited to a sequence of numbers (usually natural numbers)
Continuous- can take on any value within a range of real numbers
What are the types of categorical variables?
Nominal- the categories have no ordering or ranking
Ordinal- the categories have a ranking
Types of data
Cross section data- data on several units at one point in time
Time series data- data on one unit across several points in time
Panel data- data on several units across several points in time
Population
Describes the complete set of all units of interest to an investigator (N)
Sample
An observed subset of the population (n)
Simple random sampling
- each member of the population is chosen strictly by chance
- each member of the population is equally likely to be chosen
- every possible sample of n objects is equally likely to be chosen
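A minimal sketch of simple random sampling, assuming a made-up population of N = 10 labelled units and a sample size of n = 3. The population, the seed and the variable names are illustrative only.

```python
import random

# Hypothetical population of N = 10 unit labels (illustrative only).
population = list(range(1, 11))

# A fixed seed makes the draw repeatable; drop it for a fresh draw each run.
rng = random.Random(42)

# random.sample draws without replacement, so every subset of size n
# is equally likely to be selected.
sample = rng.sample(population, k=3)

print(sample)
```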
Xi
An observation in a sample
Sigma
The sum of the values
Frequency distribution
A list or table containing groupings and the corresponding frequencies with which the data fall within each group
K
The number of possible groups (classes) into which the data could fall
Absolute frequency
The number of observations belonging to a group
Relative frequency
The proportion of observations belonging to that group
Cumulative frequency
The total number of observations in that and any previous class
Cumulative relative frequency
The proportion of observations in that and any previous class
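The four kinds of frequency can be tabulated together. This is a sketch on invented data, with each distinct value treated as its own group; the variable names are assumptions made for the example.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical sample of n = 10 observations of a discrete variable.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
n = len(data)

counts = Counter(data)
groups = sorted(counts)                    # the K groups, in order
absolute = [counts[g] for g in groups]     # absolute frequencies
relative = [f / n for f in absolute]       # relative frequencies
cumulative = list(accumulate(absolute))    # cumulative frequencies
cumulative_relative = [c / n for c in cumulative]

# One row per group: value, absolute, relative, cumulative, cumulative relative.
for row in zip(groups, absolute, relative, cumulative, cumulative_relative):
    print(row)
```

The relative frequencies sum to 1, and the last cumulative frequency equals n.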
Arithmetic mean
The sum of all the values divided by the number of values
Median
The middle observation when the data are sorted. If the number of values is even, the median is the mean of the two middle values
Mode
Most common value
Which one out of mean, median and mode is most affected by outliers?
Mean
Left skewed data
Data with a longer tail on the left; the mean is typically less than the median
Right skewed data
Data with a longer tail on the right; the mean is typically greater than the median
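A short sketch of the three measures of central tendency using Python's standard `statistics` module. The values are invented; adding one large outlier shows how the mean moves while the median and mode barely change.

```python
import statistics

data = [2, 3, 3, 4, 5, 7]        # hypothetical values
with_outlier = data + [100]      # same data plus one extreme value

mean_before = statistics.mean(data)        # 24 / 6 = 4
median_before = statistics.median(data)    # mean of the two middle values, 3 and 4
mode_before = statistics.mode(data)        # 3 appears most often

mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)
mode_after = statistics.mode(with_outlier)

print(mean_before, median_before, mode_before)
print(mean_after, median_after, mode_after)
```

With the outlier included the mean ends up well above the median, which is the pattern expected of right-skewed data.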
Geometric mean
Used to measure the average rate of change of a variable over time. It is the nth root of the product of n numbers
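As an illustration, the sketch below averages three made-up yearly growth factors (+10%, +20%, -10%). It computes the nth root of the product directly and compares it with `statistics.geometric_mean`, which is available from Python 3.8.

```python
import math
import statistics

# Hypothetical growth factors: +10%, +20%, -10% (illustrative only).
factors = [1.10, 1.20, 0.90]

# Definition: the nth root of the product of n numbers.
manual = math.prod(factors) ** (1 / len(factors))

# The standard library computes the same quantity.
library = statistics.geometric_mean(factors)

# Average growth rate per period, as a percentage.
average_rate = (library - 1) * 100
print(manual, library, average_rate)
```

Growing at the geometric-mean factor in every period gives the same overall change as the three original factors applied in turn.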
Range
The difference between the smallest and largest value of the data
Interquartile range
The range of the middle 50% of the data, calculated as Q3 - Q1
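A sketch of both spread measures on invented data. Quartile conventions differ between textbooks and software; this example assumes the default ("exclusive") method of `statistics.quantiles`, so other conventions may give slightly different quartiles for the same data.

```python
import statistics

# Hypothetical sample of 11 ordered observations.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# n=4 splits the data into quarters and returns the cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

# Interquartile range: the spread of the middle 50% of the data.
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```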
Variance
The average of squared deviations of values from the mean.
What is the sample variance divided by and why?
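n-1 rather than n. The deviations are measured from the sample mean, which is itself computed from the same data and so sits closer to the observations than the population mean does. Dividing by n would therefore tend to underestimate the population variance; dividing by n-1 corrects for this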
Standard deviation
The square root of the variance
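The sketch below contrasts dividing by n with dividing by n-1, using invented data. In Python's `statistics` module, `pvariance` and `pstdev` divide by n, while `variance` and `stdev` divide by n-1.

```python
import statistics

# Hypothetical data with mean 5 and squared deviations summing to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = statistics.pvariance(data)    # 32 / 8: divides by n
sample_var = statistics.variance(data)  # 32 / 7: divides by n - 1

pop_sd = statistics.pstdev(data)        # square root of pop_var
sample_sd = statistics.stdev(data)      # square root of sample_var

print(pop_var, sample_var, pop_sd, sample_sd)
```

The n-1 version is always slightly larger, and the gap shrinks as n grows.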
Why is standard deviation more useful than variance?
Standard deviation measures the spread around the mean in the same units as the data, whereas variance is in squared units
Advantages of variance and standard deviation
- each value of the data set is used in the calculations
- values far from the mean are given extra weight
Coefficient of variation
Measures relative variation (the standard deviation relative to the mean) and can be used to compare two or more sets of data. It is usually given as a percentage
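A small sketch, assuming the sample standard deviation is used and the result is reported as a percentage. The helper name `coefficient_of_variation` and both data sets are invented for illustration.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Two hypothetical data sets with the same standard deviation (2)
# but very different means.
small_scale = [10, 12, 14]
large_scale = [100, 102, 104]

cv_small = coefficient_of_variation(small_scale)
cv_large = coefficient_of_variation(large_scale)

print(cv_small, cv_large)
```

Both sets have the same absolute spread, but relative to its mean the first set varies far more.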
Covariance
Measures the joint variability of two variables
What does the sign of the covariance indicate?
The direction of the relationship between the two variables.
Cov(x,y) > 0: x and y have positive correlation
Cov(x,y) < 0: x and y have negative correlation
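The sign can be checked on a small example. This sketch computes the sample covariance by hand (dividing by n-1); the helper `sample_covariance` and the data are assumptions made for illustration.

```python
def sample_covariance(x, y):
    """Average product of paired deviations from the means, using n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    products = ((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sum(products) / (n - 1)

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]      # tends to rise as x rises
y_down = [5, 4, 5, 4, 2]    # tends to fall as x rises

cov_up = sample_covariance(x, y_up)
cov_down = sample_covariance(x, y_down)

print(cov_up, cov_down)
```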
Coefficient of correlation
Measures the relative strength of the linear relationship between two variables
What values does the coefficient of correlation take?
-1 to 1
Why is coefficient of correlation more useful than covariance?
It gives both the direction and strength of the relationship
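One way to see the difference is to change the units of one variable. In this sketch, on invented data, the covariance is multiplied by 100 when x is rescaled, while the correlation coefficient stays the same. The helper names are hypothetical.

```python
import math

def sample_covariance(x, y):
    """Sample covariance, dividing by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = math.sqrt(sample_covariance(x, x))
    sd_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (sd_x * sd_y)

# Hypothetical paired observations; x_scaled is x in units 100 times smaller.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_scaled = [100 * value for value in x]

cov = sample_covariance(x, y)
cov_scaled = sample_covariance(x_scaled, y)
r = correlation(x, y)
r_scaled = correlation(x_scaled, y)

print(cov, cov_scaled, r, r_scaled)
```

Because the correlation coefficient is unit-free and bounded by -1 and 1, its size can be read as the strength of the linear relationship, which the covariance alone cannot show.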