Lecture 2 REVISED Flashcards
what are common applications of statistics?
- predictive modelling
- pattern recognition
- anomaly detection
- classification
- sentiment analysis
what are common business use cases of statistics?
- customer analytics
- targeted advertising
- website personalisation
- risk management
- investment optimisation
- fraud detection
examples of challenges in statistics?
- varied/massive amounts of data
- varied types of data (structured, semi-structured and unstructured)
- eliminating bias
“numbers don’t lie” ?
even when numbers are correct, people and organisations with their own agendas may use them to mislead
they can skew the story and hide relevant facts
so ‘numbers don’t lie’ is false
unethical uses of statistics
- biased sampling
- discarding data that doesn’t support your views
- discarding data without a justifiable reason
- using jargon
- deliberately using the wrong method of analysis
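a minimal sketch of why biased sampling misleads, assuming a made-up population of ten customers (all figures are hypothetical, not from the lecture). it compares the mean of a sample that only includes the biggest spenders with a sample taken at regular intervals:

```python
from statistics import mean

# Hypothetical population: monthly spend (in pounds) of 10 customers.
population = [20, 25, 30, 35, 40, 45, 50, 200, 250, 300]

# Biased sample: only the three biggest spenders are surveyed.
biased_sample = sorted(population)[-3:]

# Less biased sample: every third customer, regardless of spend.
systematic_sample = population[::3]

population_mean = mean(population)
biased_mean = mean(biased_sample)
systematic_mean = mean(systematic_sample)

print(f"population mean: {population_mean}")   # 99.5
print(f"biased mean:     {biased_mean}")       # 250
print(f"systematic mean: {systematic_mean}")   # 101.25
```

every number in the biased sample is correct, yet its mean is more than double the population mean, which is how correct numbers can still mislead.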
statistical enquiry cycle? (PPDAC)
- problem
- plan
- data
- analysis
- conclusion
primary/secondary data?
primary = data collected directly from the source
secondary = data previously collected by someone else
differences between data and information?
data = raw facts/figures, input, meaningless unless contextualised
information = processed data with context, meaningful, easier to understand, output
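a small sketch of the data/information distinction, assuming hypothetical figures and a made-up context (neither comes from the lecture). the same raw numbers only become information once they are processed and given context:

```python
from statistics import mean

# Data: raw figures with no context (hypothetical values).
raw_data = [12, 15, 11, 30, 14]

# Context (assumed for illustration) that gives the figures meaning.
context = {"what": "daily website sign-ups", "unit": "sign-ups", "period": "Mon-Fri"}

# Information: the data processed (averaged) and presented with its context.
information = (
    f"Average {context['what']} ({context['period']}): "
    f"{mean(raw_data):.1f} {context['unit']}; peak of {max(raw_data)}."
)

print(information)
# Average daily website sign-ups (Mon-Fri): 16.4 sign-ups; peak of 30.
```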
qualitative/quantitative data?
quantitative data = represents measures/counts - always numeric (interval/ratio scale)
qualitative data = names or labels used to identify an attribute (nominal/ordinal scale)
what does level of measurement determine?
the amount of information contained in the data
what are the 4 levels of measurement?
- nominal
- ordinal
- interval
- ratio
nominal data
- consists of labels/names used for identification
- can be numeric or non-numeric
- categories are in no logical order and have no particular relationship
ordinal data
- has the properties of nominal data, and the categories can also be ranked/ordered
interval data
represented by numbers with equal intervals between values, but doesn’t have a true 0 (e.g. temperature in °C)
ratio data
represented by numbers and has a true 0, so ratios are meaningful (e.g. height, weight)
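a short worked example of why the true 0 matters, using temperature as an assumed illustration (Celsius as an interval scale, Kelvin as a ratio scale). differences are meaningful on both scales, but "twice as much" only makes sense on the ratio scale:

```python
# Celsius is an interval scale: 0 °C does not mean "no temperature".
# Kelvin is a ratio scale: 0 K is a true zero.
def celsius_to_kelvin(c):
    return c + 273.15

cold_c, warm_c = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference_c = warm_c - cold_c

# This ratio can be computed, but it is NOT meaningful on an interval scale.
naive_ratio = warm_c / cold_c

# Converting to a ratio scale first gives the meaningful ratio.
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print(difference_c)          # 10.0
print(naive_ratio)           # 2.0
print(round(true_ratio, 3))  # 1.035
```

so 20 °C is 10 degrees warmer than 10 °C, but it is only about 3.5% hotter in absolute terms, not "twice as hot".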
how do measurement levels work?
there are qualitative and quantitative measurement levels
qualitative = nominal & ordinal
quantitative = interval & ratio
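a rough sketch of how the level of measurement limits which summaries make sense, following the common rule of thumb (mode for nominal, median for ordinal, mean for interval, ratios for ratio data). all values and variable names below are hypothetical:

```python
from statistics import mean, median, mode

# Hypothetical example values for each level of measurement.
eye_colour = ["brown", "blue", "brown", "green"]   # nominal: labels only
satisfaction = [1, 3, 2, 3, 5]                     # ordinal: 1 = very unhappy ... 5 = very happy
temperature_c = [18.0, 21.0, 24.0]                 # interval: no true zero
height_cm = [150.0, 160.0, 170.0]                  # ratio: true zero

most_common_colour = mode(eye_colour)              # nominal: only counting makes sense
middle_rating = median(satisfaction)               # ordinal: order makes sense, so median does too
mean_temperature = mean(temperature_c)             # interval: differences and means make sense
height_ratio = max(height_cm) / min(height_cm)     # ratio: "x times as tall" makes sense

print(most_common_colour)      # brown
print(middle_rating)           # 3
print(mean_temperature)        # 21.0
print(round(height_ratio, 2))  # 1.13
```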
the higher the level of measurement…
the more precise the data is
precision doesn’t ensure accuracy
big data
refers to large, diverse sets of information
3 V’s of big data?
volume, velocity, variety
- the volume of information
- the velocity/speed at which data is created/collected
- the variety of data available
structured & unstructured data?
structured = follows a predefined format (e.g. rows and columns), easily stored and searched
unstructured = free form (e.g. text, images, video), less quantifiable
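a minimal sketch of the practical difference, assuming a tiny made-up CSV and a made-up customer comment. the structured data can be parsed and totalled directly, while the free text only supports crude measures (such as a word count) without further processing:

```python
import csv
import io

# Structured: fixed columns, so values can be read and summed directly.
structured = "customer_id,amount\n101,25.50\n102,40.00\n"
rows = list(csv.DictReader(io.StringIO(structured)))
total_amount = sum(float(row["amount"]) for row in rows)

# Unstructured: free text with no fixed fields to query.
unstructured = "Customer 101 said the delivery was late but the product was great."
word_count = len(unstructured.split())

print(total_amount)  # 65.5
print(word_count)    # 12
```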