Data analysis Flashcards
Key forms of data analysis
Descriptive
Inferential
Predictive
Descriptive analysis
Presents data in a simpler format that is more easily understood by the user
Describes the data actually presented
Key measures/parameters used in a descriptive analysis
Measure of central tendency
Measure of dispersion
(Also the shape of the (empirical) distribution)
Measures of central tendency
Mean
Median
Mode
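As a minimal sketch, the three measures of central tendency above can be computed with Python's standard `statistics` module. The claim amounts are invented for illustration; the single large value shows how the mean is pulled upwards while the median and mode are not.

```python
from statistics import mean, median, mode

# Hypothetical sample of claim amounts (illustrative only)
claims = [120, 150, 150, 200, 310, 150, 980]

average = mean(claims)      # arithmetic mean: sum of values / number of values
middle = median(claims)     # middle value once the data are sorted
most_common = mode(claims)  # most frequently occurring value

print(f"mean={average:.2f}, median={middle}, mode={most_common}")
```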
Measures of dispersion
Standard deviation
Ranges such as the interquartile range
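A similar sketch for the measures of dispersion, again on invented claim amounts. `statistics.stdev` uses the sample (n - 1) divisor, and `statistics.quantiles` defaults to its "exclusive" method; other quartile conventions can give a slightly different interquartile range.

```python
from statistics import stdev, quantiles

# Hypothetical sample of claim amounts (illustrative only)
claims = [120, 150, 150, 200, 310, 150, 980]

sd = stdev(claims)                     # sample standard deviation (n - 1 divisor)
data_range = max(claims) - min(claims) # full range: largest minus smallest value
q1, q2, q3 = quantiles(claims, n=4)    # quartiles (default "exclusive" method)
iqr = q3 - q1                          # interquartile range: spread of the middle 50%

print(f"sd={sd:.2f}, range={data_range}, IQR={iqr}")
```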
Inferential analysis
Gathers data from a sample, which is used to represent the wider population
Measures/parameters of inferential analysis
Measure of central tendency
Measure of dispersion
(Hypothesis testing)
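A minimal sketch of inferential analysis, assuming a large random sample so that a normal approximation is reasonable (for small samples a t distribution would normally be used instead). The sample values, the population parameters used to simulate them and the hypothesised mean of 50 are all hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample of 200 observations, assumed drawn at random from the population
rng = random.Random(1)
sample = [rng.gauss(52, 8) for _ in range(200)]

n = len(sample)
x_bar = mean(sample)   # point estimate of the population mean
s = stdev(sample)      # estimate of the population standard deviation
se = s / sqrt(n)       # standard error of the sample mean

# Approximate two-sided 95% confidence interval (normal approximation, large n)
z = NormalDist().inv_cdf(0.975)
ci = (x_bar - z * se, x_bar + z * se)

# Hypothesis test of H0: population mean = 50 against H1: population mean != 50
mu0 = 50.0
z_stat = (x_bar - mu0) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(f"mean={x_bar:.2f}, 95% CI=({ci[0]:.2f}, {ci[1]:.2f}), p={p_value:.4f}")
```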
Predictive analysis
Extends the principles behind inferential analysis so that the user can analyse past data and make predictions about future events
How is predictive analysis used to make projections?
It uses an existing set of data with known attributes/features (the training set) to discover potentially predictive relationships.
Those relationships are then tested on a different set of data (the test set) to assess their strength
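The training/test idea can be sketched as follows. The helper `train_test_split`, the simulated data and the simple proportional model y = b * x are hypothetical choices for illustration: the relationship is estimated on the training set only and then scored on the held-out test set using mean squared error.

```python
import random

def train_test_split(records, test_fraction=0.25, seed=0):
    """Hypothetical helper: randomly partition records into (training set, test set)."""
    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical data: y is roughly 3 * x plus random noise
rng = random.Random(42)
data = [(x, 3 * x + rng.gauss(0, 2)) for x in range(1, 41)]

train, test = train_test_split(data)

# Discover a relationship on the training set only: y = b * x (least squares through the origin)
b = sum(x * y for x, y in train) / sum(x * x for x, _ in train)

# Assess the strength of that relationship on unseen test data via mean squared error
mse = sum((y - b * x) ** 2 for x, y in test) / len(test)
print(f"fitted b = {b:.3f}, test MSE = {mse:.3f}")
```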
Typical example of a predictive analysis
Regression analysis
Linear regression
The relationship between a scalar dependent variable and an explanatory (independent) variable is assumed to be linear, and the training set is used to estimate the slope and intercept of the line
E.g. a car's speed and its braking distance
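A minimal sketch of fitting the regression line by ordinary least squares. `fit_line` is a hypothetical helper, and the speed and braking-distance figures are invented for illustration, not real measurements.

```python
def fit_line(xs, ys):
    """Hypothetical helper: least-squares (intercept, slope) for y = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxy and Sxx: sums of cross-products and squares of deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the fitted line passes through (x_bar, y_bar)
    return intercept, slope

# Hypothetical training set: speed (km/h) and braking distance (m)
speeds = [30, 40, 50, 60, 70, 80]
distances = [14, 21, 29, 38, 49, 60]

intercept, slope = fit_line(speeds, distances)
predicted_at_65 = intercept + slope * 65
print(f"distance = {intercept:.2f} + {slope:.3f} * speed; at 65 km/h: {predicted_at_65:.1f} m")
```

The fitted line can then be used to predict the braking distance at a speed that does not appear in the training data, as with the 65 km/h case above.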
Data Analysis Process
Develop a well-defined set of objectives
Identify the data items required for the analysis
Collection of the data from appropriate sources
Processing and formatting data for analysis
Cleaning data
Exploratory data analysis (descriptive/inferential/predictive)
Modelling the data
Communicating the results
Monitoring the process, updating the data and repeating the process if necessary (actuarial control cycle)
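The processing/formatting and cleaning steps of the process can be sketched as below. The CSV layout, the field names and the validity rules (age between 0 and 120, positive premium) are hypothetical assumptions chosen only to illustrate typical checks for missing, duplicate and implausible values.

```python
import csv
import io

# Hypothetical raw extract, as it might arrive from a source system
RAW = """policy_id,age,premium
P001,34,250.00
P002,,310.50
P003,41,275.25
P001,34,250.00
P004,430,199.99
P005,29,-10.00
"""

def parse(text):
    """Processing/formatting: convert raw CSV text into typed records."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        (
            row["policy_id"],
            int(row["age"]) if row["age"] else None,            # blank -> missing
            float(row["premium"]) if row["premium"] else None,  # blank -> missing
        )
        for row in rows
    ]

def clean(records):
    """Cleaning: drop incomplete, implausible and duplicate records (hypothetical rules)."""
    kept_ids = set()
    cleaned = []
    for policy_id, age, premium in records:
        if age is None or premium is None:          # missing values
            continue
        if not (0 <= age <= 120) or premium <= 0:   # implausible values
            continue
        if policy_id in kept_ids:                   # policy ID already kept
            continue
        kept_ids.add(policy_id)
        cleaned.append((policy_id, age, premium))
    return cleaned

cleaned = clean(parse(RAW))
print(cleaned)
```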
Responsibilities of the modelling team throughout the data analysis process
Ensure that any relevant professional guidance has been complied with
Ensure any relevant legal requirements are complied with
Possible issues with the data collection process that the analyst should be aware of
Whether the process was manual or automated
Limitations on the precision of the data collected
Whether there was any validation at source
If data was not collected automatically, how was it converted to an electronic form?
Why is randomisation used?
Reduce the effect of bias
Reduce the effect of confounding variables (a confounding variable influences both the dependent and the independent variable, creating a false association between them)
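A minimal sketch of randomisation in an experiment: subjects are shuffled and split into treatment and control groups, so that any confounding variable tends to be spread evenly across both groups rather than lining up with the treatment. The function name `randomise` and the subject IDs are hypothetical.

```python
import random

def randomise(subjects, seed=None):
    """Hypothetical helper: randomly allocate subjects to (treatment, control) groups."""
    shuffled = list(subjects)
    random.Random(seed).shuffle(shuffled)  # every ordering equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject IDs 1..20
treatment, control = randomise(range(1, 21), seed=7)
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```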
Random sampling schemes
Simple random sampling
Stratified sampling
Another sampling method
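The simple random and stratified sampling schemes listed above can be sketched with Python's `random` module. The population of policyholders, the region labels and the helper names are hypothetical. In the stratified version the population is split into strata and a simple random sample is taken within each, in proportion to the stratum's size.

```python
import random
from collections import defaultdict

def simple_random_sample(population, k, rng):
    """Every subset of size k is equally likely to be chosen."""
    return rng.sample(population, k)

def stratified_sample(population, stratum_of, fraction, rng):
    """Proportional stratified sample: simple random sampling within each stratum."""
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)  # this stratum's share of the sample
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 1000 policyholders, 600 in "North" and 400 in "South"
population = [(i, "North" if i < 600 else "South") for i in range(1000)]
rng = random.Random(0)

srs = simple_random_sample(population, 100, rng)
strat = stratified_sample(population, lambda p: p[1], 0.1, rng)
print(len(srs), len(strat))
```

Because each stratum's sample size is rounded, the total can differ slightly from fraction * N when the stratum sizes are not convenient multiples.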