Week 1 Probabilities and Interpretations Flashcards
define the two types of data
qualitative - non numeric
quantitative - numeric
define the two types of quantitative data
discrete - data can only take certain values
continuous - data can take any value within a range
what is the first step for data visualisation
Create the simplest graph that conveys the information you want to convey
what is the 2nd step for data visualisation
consider the type of encoding object and attribute used to create a plot
what is the 3rd step for data visualisation
focus on visualising patterns or on visualising details, depending on the purpose of the plot
what is the 4th step for data visualisation
select meaningful axis ranges
what is the 5th step for data visualisation
data transformations and graph aspect ratios can be used to emphasise rates of change
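One possible illustration of a data transformation, sketched with made-up values: a series that grows by a constant factor has very unequal raw steps, but equal steps after a log10 transform, so its rate of change becomes easy to read. The `values` list is an assumption for illustration only.

```python
import math

# Hypothetical series that grows tenfold at every step.
values = [1, 10, 100, 1000]

# Raw differences are dominated by the last step.
raw_steps = [b - a for a, b in zip(values, values[1:])]

# After a log10 transform each step has the same size,
# so a constant growth rate would plot as a straight line.
log_values = [math.log10(v) for v in values]
log_steps = [b - a for a, b in zip(log_values, log_values[1:])]

print(raw_steps)  # [9, 90, 900]
print(log_steps)
```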
what is the 6th step for data visualisation
plot overlapping points in a way that makes differences in density apparent
what is the 7th step for data visualisation
use lines when connecting sequential data in time-series plots
what is the 8th step for data visualisation
aggregate larger datasets
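A minimal sketch of one way to aggregate, assuming simple block averaging is acceptable for the data at hand. The helper name `block_means` and the `readings` values are hypothetical.

```python
from statistics import mean

def block_means(values, block_size):
    """Average consecutive blocks of block_size values (the last block may be shorter)."""
    return [mean(values[i:i + block_size]) for i in range(0, len(values), block_size)]

# Eight hypothetical readings reduced to two plotted points.
readings = [1, 3, 2, 4, 10, 12, 11, 13]
print(block_means(readings, 4))  # [2.5, 11.5]
```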
what is the 9th step for data visualisation
keep axis ranges as similar as possible
what is the 10th step for data visualisation
select an appropriate colour scheme
when are central tendency values useful
when a dataset needs to be described by a single representative value
define the arithmetic mean
the sum of all data values divided by the number of values
what is the geometric mean
the N-th root of the product of the N elements in the dataset
what is the harmonic mean
reciprocal of the arithmetic mean of the reciprocals of the data values
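The three means can be compared side by side with Python's standard `statistics` module (Python 3.8 or later for `geometric_mean`). The `data` values are made up for illustration.

```python
import statistics

data = [1, 2, 4]  # illustrative values only

arithmetic = statistics.mean(data)           # (1 + 2 + 4) / 3
geometric = statistics.geometric_mean(data)  # cube root of 1 * 2 * 4
harmonic = statistics.harmonic_mean(data)    # 3 / (1/1 + 1/2 + 1/4)

print(arithmetic, geometric, harmonic)
```

For positive data that are not all equal, the harmonic mean is the smallest of the three and the arithmetic mean the largest.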
what is the root mean square and its equation
the square root of the mean of the squares of the data values
sqrt[ Σx^2 / n ]
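A short sketch of the root mean square as a function. The name `root_mean_square` is chosen here for illustration and is not a library call.

```python
import math

def root_mean_square(values):
    """Square root of the mean of the squared values."""
    return math.sqrt(sum(x * x for x in values) / len(values))

# (3^2 + 4^2) / 2 = 12.5, so the RMS is sqrt(12.5)
print(root_mean_square([3, 4]))
```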
what is the median
the middle value of the sorted data, which separates the data into an upper and a lower half
what is the mode
the value that appears most frequently in a dataset
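Median and mode can be checked with the standard `statistics` module. The datasets are illustrative.

```python
import statistics

odd_data = [1, 3, 3, 6, 7, 8, 9]
even_data = [1, 2, 3, 4]

median_odd = statistics.median(odd_data)    # middle value of the sorted data
median_even = statistics.median(even_data)  # average of the two middle values
most_common = statistics.mode(odd_data)     # most frequent value

print(median_odd, median_even, most_common)  # 6 2.5 3

# multimode returns every value tied for most frequent (Python 3.8+).
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2]
```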
what are the 7 measures of dispersion that can be used to characterise a dataset
variance/standard deviation, mean absolute deviation, skewness, kurtosis, covariance, correlation, covariance matrix
what is variance and its equation
a measure of the spread of a distribution: the mean squared deviation from the mean
V(x) = (1/N) Σ(xi - μ)^2
where μ is the true mean and N is the number of data values
what is standard deviation in terms of variance
the square root of the variance
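A minimal sketch of the population variance (the mean of the squared deviations from the mean) and the standard deviation, cross-checked against `statistics.pvariance` and `statistics.pstdev`. The function name and the data are illustrative, and the mean of the data stands in for the true mean μ.

```python
import math
import statistics

def population_variance(values):
    """Mean of the squared deviations from the mean (divides by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5

variance = population_variance(data)
std_dev = math.sqrt(variance)

print(variance, std_dev)  # 4.0 2.0
```

Note that `statistics.variance` and `statistics.stdev` divide by N - 1 instead, which is the usual choice when the mean is estimated from a sample.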
what is the mean absolute deviation equation
MAD = (1/N) Σ|xi - ⟨x⟩|, where ⟨x⟩ is the mean of the data
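The mean absolute deviation can be sketched in a few lines: take the absolute distance of each value from the mean, then average. The function name and the data are illustrative.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5
print(mean_absolute_deviation(data))  # 1.5
```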
what is skewness and its equation
measure of asymmetry of a distribution
γ = (1/σ^3) ⟨(x - ⟨x⟩)^3⟩ = (1/(N σ^3)) Σ(xi - ⟨x⟩)^3
what is kurtosis and its equation
measure of tailedness of a distribution
κ = (1/σ^4) ⟨(x - ⟨x⟩)^4⟩ - 3 = (1/(N σ^4)) Σ(xi - ⟨x⟩)^4 - 3
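A sketch of skewness and excess kurtosis computed from the third and fourth central moments, using the population (1/N) convention. The function names and datasets are illustrative. Statistical libraries may apply bias corrections, so their results can differ slightly.

```python
import math

def central_moment(values, order):
    """Mean of (x - mean)^order over the data."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)

def skewness(values):
    sigma = math.sqrt(central_moment(values, 2))
    return central_moment(values, 3) / sigma ** 3

def excess_kurtosis(values):
    sigma = math.sqrt(central_moment(values, 2))
    # Subtracting 3 makes a normal distribution score 0.
    return central_moment(values, 4) / sigma ** 4 - 3

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 10]

print(skewness(symmetric))     # zero: no asymmetry
print(skewness(right_tailed))  # positive: long tail to the right
print(excess_kurtosis(symmetric))
```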
when are covariance, correlation and covariance matrix more useful as measures of dispersion
when dealing with multiple variables
what is covariance and its equation
measure of the joint variability of two random variables
cov(x,y) = (1/N) Σ(xi - ⟨x⟩)(yi - ⟨y⟩)
what is correlation and its equation
the covariance normalised by the product of the two standard deviations, giving a value between -1 and 1
ρ(x,y) = cov(x,y) / (σx σy)
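A minimal sketch of covariance and correlation using the population (1/N) convention. The function names and the paired data are illustrative. Because `y` is exactly twice `x`, the correlation comes out as 1 up to rounding.

```python
import math

def covariance(xs, ys):
    """Mean of the products of paired deviations from the two means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    sigma_x = math.sqrt(covariance(xs, xs))  # cov(x, x) is the variance of x
    sigma_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sigma_x * sigma_y)

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]

print(covariance(x, y))   # 2.5
print(correlation(x, y))  # approximately 1
```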
what is the covariance matrix
a square, symmetric matrix whose (i, j) entry is the covariance between variables i and j; the diagonal entries are the variances
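A sketch of building a covariance matrix from several variables, using the population (1/N) convention. Entry (i, j) holds the covariance between variable i and variable j. The function names and data are illustrative.

```python
def covariance(xs, ys):
    """Mean of the products of paired deviations from the two means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

def covariance_matrix(variables):
    """Nested list whose entry [i][j] is cov(variables[i], variables[j])."""
    return [[covariance(a, b) for b in variables] for a in variables]

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
z = [4, 3, 2, 1]

matrix = covariance_matrix([x, y, z])
for row in matrix:
    print(row)
```

The diagonal holds each variable's variance, and the matrix is symmetric because cov(x, y) equals cov(y, x).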