13: Data Analysis Flashcards
What are data and information?
Data are raw facts and figures that have been recorded but not yet processed
Information is data that has been processed in a way that is meaningful
Data + meaning = information
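A minimal sketch of "data + meaning = information", assuming a hypothetical list of raw sales records (the regions, amounts and the function name are invented for illustration). The raw pairs are data; the totals per region are information a manager could act on.

```python
from collections import defaultdict

# Hypothetical raw data: (region, sale amount) pairs, recorded but not yet processed.
raw_sales = [("North", 120.0), ("South", 80.0), ("North", 200.0), ("South", 50.0)]


def summarise_by_region(records):
    """Process raw records into total sales per region."""
    totals = defaultdict(float)
    for region, amount in records:
        totals[region] += amount
    return dict(totals)


# The processed totals carry meaning that the raw list does not.
information = summarise_by_region(raw_sales)
print(information)
```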
Three reasons we need information?
Assist in planning
Assist in decision-making
Controlling day to day operations
Four types of data?
Quantitative
Qualitative
Discrete - takes only distinct, separate values
- counted
Continuous - can take any value in a range, with no gaps
- measured
What are the sources of data?
Internal (from software etc)
External
- formally gathered (from research and specialists etc)
- informally gathered (ongoing basis)
Internet of things
What are the ACCURATE qualities of good information?
Accurate
Complete
Cost-beneficial
User-targeted
Relevant
Authoritative
Timely
Easy to use
What are the 5 stages of data analysis?
Identify the information needs
Collect the data
Analyse the data
Present the information
Use the information
5 ways of analysing the data?
Inferential statistics
- uses a random sample to make inferences about the wider population
Exploratory data analysis
- identifies patterns in a data set
Confirmatory data analysis
- tests whether the data support a hypothesis
Population
Sampling (Random, Systematic, Surveys, Stratified)
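The random, systematic and stratified sampling methods named above can be sketched as follows. This is an illustration only: the customer population, the segment names and the function names are assumptions, and a fixed seed is used just to make the draws repeatable.

```python
import random


def simple_random_sample(population, n, seed=0):
    """Random sampling: every item has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, step, start=0):
    """Systematic sampling: take every step-th item from a starting point."""
    return population[start::step]


def stratified_sample(population, key, fraction, seed=0):
    """Stratified sampling: split into groups, then sample the same fraction of each."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one item per stratum
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 100 customers, 60 "retail" and 40 "trade".
population = [{"id": i, "segment": "retail" if i < 60 else "trade"} for i in range(100)]

random_ids = [c["id"] for c in simple_random_sample(population, 10)]
systematic_ids = [c["id"] for c in systematic_sample(population, 10)]
stratified = stratified_sample(population, lambda c: c["segment"], 0.1)
print(random_ids, systematic_ids, len(stratified))
```

Stratified sampling keeps the 60/40 split of the population in the sample, which a simple random draw of 10 does not guarantee.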
What are 5 functions of spreadsheets?
What if analysis
Budgeting and forecasting
Reporting performance
Variance analysis
Inventory valuation
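Two of these functions, variance analysis and what-if analysis, reduce to simple arithmetic that a spreadsheet formula would perform. A rough sketch, with invented budget lines, figures and function names:

```python
def variance_report(budget, actual):
    """Variance analysis: actual minus budget for each budget line."""
    return {line: actual[line] - budget[line] for line in budget}


def what_if_profit(units, price, unit_cost, fixed_costs):
    """What-if analysis: recompute profit for a given set of assumptions."""
    return units * (price - unit_cost) - fixed_costs


# Hypothetical figures for one period.
budget = {"Sales": 1000, "Materials": 400}
actual = {"Sales": 1100, "Materials": 450}
variances = variance_report(budget, actual)

# Compare a base case with a scenario where the selling price rises from 20 to 22.
base_profit = what_if_profit(units=100, price=20, unit_cost=12, fixed_costs=500)
scenario_profit = what_if_profit(units=100, price=22, unit_cost=12, fixed_costs=500)
print(variances, base_profit, scenario_profit)
```

Whether a positive variance is favourable depends on the line: higher actual sales are favourable, higher actual material costs are adverse.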
Advantages and disadvantages of spreadsheets?
Advantages:
- manipulate large volumes of data
- quicker processing
- can be shared
- easier to read
Disadvantages:
- time consuming
- input errors
- sharing violations
- difficult to spot errors
- cyber attacks
- finite records
7 types of data bias?
Selection
Self selection
Observer
Omitted variable
Cognitive
Confirmation
Survivorship
Hypothesis testing?
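Data is used to test whether an idea (hypothesis) is supported
Null hypothesis - there is no difference between certain characteristics
Statistical significance - the result is unlikely to be due to chance alone, so it is attributed to a specific cause
Type I and II errors
I: null hypothesis is true but rejected (false positive)
II: null hypothesis is false but not rejected (false negative)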
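One way to make the null hypothesis and statistical significance concrete is an exact permutation test on two small groups. This is a swapped-in illustrative technique, not one the flashcards name; the data, the 0.05 threshold and the function name are assumptions. The p-value is the share of all possible regroupings whose difference in means is at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean


def permutation_test(group_a, group_b):
    """Exact two-sided permutation test for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    # Under the null hypothesis the group labels are interchangeable,
    # so every way of splitting the pooled data is equally likely.
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        if abs(mean(a) - mean(b)) >= observed:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical daily sales under two shop layouts.
old_layout = [12, 14, 11, 13, 15]
new_layout = [18, 20, 17, 19, 21]

ALPHA = 0.05  # assumed significance level
p_value = permutation_test(old_layout, new_layout)
reject_null = p_value < ALPHA
print(p_value, reject_null)
```

Rejecting the null here would be a Type I error if the layouts truly made no difference; failing to reject when they do differ would be a Type II error.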
3 ways to present information?
Bar charts - easiest to use
Pie charts - show relative contribution to a total
Line graphs - show change over a continuous period of time
What is big data and what are the four features?
Data sets whose size is beyond the ability of typical database software to capture, store, manage and analyse
Volume (amount of data)
Variety (various formats)
Velocity (speed of data)
Veracity (reliability of data)
What is data science?
Collecting, preparing, interpreting and visualising large and complex data sets
Scientific approach which applies mathematical and statistical ideas to process big data