Data Flashcards
What are data outliers?
Observations that are abnormal and can significantly distort the results
Can be removed from the data set
BUT there must be a clear and valid reason; otherwise there is a risk of data manipulation
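A minimal Python sketch of one common way to spot candidate outliers: Tukey's 1.5 × IQR rule. This is an assumed method, since the card does not name one, and the function flag_outliers and the sales figures are hypothetical. The sketch only flags values for review. Removing them still needs a clear and valid reason.

```python
# Sketch: flag (not delete) outliers using Tukey's 1.5 x IQR rule.
# The rule and the 1.5 multiplier are an assumed convention, not from the flashcard.
from statistics import quantiles

def flag_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # lower quartile, median, upper quartile
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

monthly_sales = [120, 125, 118, 130, 122, 127, 950, 124]  # hypothetical data
print(flag_outliers(monthly_sales))  # -> [950]
```

The 950 is flagged for investigation. It might be a keying error, or it might be a genuine one-off order that should stay in the data set.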
What is big data?
Data sets so large and varied that they are beyond the capability of traditional data-processing software
Obtained in addition to traditional management information data
Provides a deeper understanding of customers’ needs
The 4Vs of big data?
Volume: scale and amount of information
Velocity: speed at which data is generated and processed (timeliness of the data)
Variety: formats, including structured and unstructured data
Veracity: reliability of the information; keeping it clean and free from bias and subjective interpretation
The purposes of big data in budgeting?
Identify trends and other correlations
Improve forecasting
Improve overall profitability
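A minimal Python sketch of how an identified trend can feed a forecast, assuming a simple straight-line trend fitted by least squares. Real big-data forecasting tools use far richer models. The functions fit_trend and forecast_next and the demand figures are hypothetical.

```python
# Sketch: fit a straight-line trend y = a + b*t by ordinary least squares,
# then extrapolate it one period ahead. Data below is hypothetical.

def fit_trend(ys):
    """Return (a, b) for the least-squares line through points (t, y), t = 0, 1, 2, ..."""
    n = len(ys)
    ts = range(n)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    sxx = sum((t - t_mean) ** 2 for t in ts)
    b = sxy / sxx           # slope: change in y per period
    a = y_mean - b * t_mean  # intercept at t = 0
    return a, b

def forecast_next(ys):
    """Extrapolate the fitted trend to the next period, t = n."""
    a, b = fit_trend(ys)
    return a + b * len(ys)

quarterly_demand = [100, 110, 120, 130]  # hypothetical data
print(forecast_next(quarterly_demand))  # -> 140.0
```

At least two data points are needed. With one point the slope is undefined, and this sketch would fail with a division by zero.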
What is data analytics and data mining?
Analytics: process of collecting, organising and analysing data to identify trends and aid decision making
Data mining: sorting through data to identify patterns and relationships, using algorithms
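A minimal Python sketch of data mining in its simplest form: an algorithm that checks every pair of variables for a strong linear relationship using the Pearson correlation coefficient. The 0.8 threshold, the function names and the data are assumptions for illustration only.

```python
# Sketch: scan every pair of variables and report strongly correlated pairs.
# Threshold, names and data are hypothetical.
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient r, between -1 and +1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def strong_relationships(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair of columns with |r| >= threshold."""
    found = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            found.append((a, b, round(r, 3)))
    return found

data = {
    "ad_spend":    [10, 20, 30, 40, 50],
    "sales":       [105, 198, 310, 395, 502],
    "temperature": [15, 22, 14, 21, 18],
}
print(strong_relationships(data))
```

Only the ad_spend and sales pair is reported. A strong correlation is a pattern worth investigating, not proof that one variable causes the other.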
Structured v unstructured data?
Structured: data held in a defined field within a data record or file (e.g. a database table or spreadsheet)
Unstructured: data not easily contained within structured data fields (e.g. emails, social media posts, images, video)
Advantages of big data?
A substantial amount of information can be processed
Data can be drawn from many different sources
More accurate model of future demand
Better understanding of customers’ preferences
Supports both short-term and long-term decisions
Provides real-time information
Disadvantages of big data?
Company needs to be seen as trustworthy in how it uses data
Lack of forecasting tools
Risk of infringing privacy
Security is required to protect the information held
Risk of incorrect data
Lack of skilled data analysts
AI versus Machine Learning?
AI - use of computers to do tasks which are thought to require human intelligence
ML - field of AI in which computers learn from data and improve, rather than following pre-programmed rules
The 7 types of data bias?
Selection - sample doesn’t represent the population
Self-selection - individuals choose for themselves whether to be in the sample
Observer - researcher allows assumptions to influence the observation
Omitted variable - key data not included
Cognitive bias - presentation of data is misleading
Confirmation - people look for, or interpret, data in a way that confirms their existing beliefs
Survivorship - sample only contains items that survived a previous event
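A minimal Python sketch of survivorship bias using hypothetical fund returns. Averaging only the funds still trading at the end of the period makes performance look much better than the average across all funds, because the poor performers that closed drop out of the sample.

```python
# Sketch: survivorship bias. All figures are hypothetical.
from statistics import mean

# (annual return %, still trading at end of period?)
funds = [
    (8.0, True),
    (6.0, True),
    (7.0, True),
    (-15.0, False),  # closed after poor performance
    (-9.0, False),   # closed after poor performance
]

all_funds = mean(r for r, _ in funds)
survivors_only = mean(r for r, survived in funds if survived)

print(f"All funds:      {all_funds:.1f}%")       # All funds:      -0.6%
print(f"Survivors only: {survivors_only:.1f}%")  # Survivors only: 7.0%
```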