Chapter 13 - Data Analysis Flashcards
What is Data?
Consists of numbers, letters, symbols, raw facts, events and transactions which have been recorded but not yet processed into a form which is suitable for use by management
What is information?
Data which have been processed in such a way that they are meaningful to the person who receives them
Why is information useful to management?
- Helps planning
- Helps making decisions
- Helps controlling day-to-day operations, for example by comparing actual results with those planned
What are the four types of data?
- Quantitative data = numerical data that provides measurements or quantities. Expressed as numbers, e.g. the number of kg needed to make a unit of product
- Qualitative data = Cannot be expressed as numbers or values, and is much harder to analyse
- Discrete data = Non-continuous data that can take only certain values
- Continuous data = Can take on any value (within a range), e.g. time or distance
What are internal sources of data?
- Accounting records
- HR/payroll records
- Machine logs/computer systems
- Procurement data systems
- Timesheets
- Communication to/from staff
What are the sources of external information?
Formally gathered
- Market research e.g. new trends, customer tastes, competitor products
- Research and development
- Tax and accounting specialists
- Legal specialists
Informally gathered
- Any information gathered on an ongoing basis e.g. newspapers, internet, meetings with external business colleagues
What is the Internet of Things (IoT)?
Internet-connected devices that continually collect and exchange data
Using the mnemonic ACCURATE - What are qualities of good information?
A - Accurate e.g. no typos, appropriate rounding, correct categorisation, assumptions stated
C - Complete e.g. all information needed for the purpose is provided
C - Cost-beneficial e.g. benefit > cost of producing information
U - User-targeted e.g. understandable and useful to recipient
R - Relevant for purpose intended
A - Authoritative e.g. genuine, highest quality for purpose, source should be known and reliable
T - Timely e.g. produced in advance of when needed
E - Easy to use e.g. clear, concise, constructive, communicated appropriately
What is Data analysis?
- Identify the information needs
- Collect the data
- Analyse the data
- Present the information
- Use the information
What are ways in which data can be analysed?
- Inferential statistics e.g. using a sample of data taken from a population to describe, and make inferences about, that population
- Exploratory data analysis e.g. when patterns are identified in the data. This type of analysis may use regression and correlation analysis.
- Confirmatory data analysis - confirms (or not) a hypothesis using statistical methods. For example a price increase of 3% will reduce demand by 5%
- Sample e.g. a group of items drawn from a population. The population may consist of items such as metal bars, invoices, packets of tea
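Worked example (correlation): exploratory analysis may use correlation to spot a pattern between two variables. The sketch below computes the Pearson correlation coefficient by hand. The price and demand figures are hypothetical, made up purely for illustration, and pearson_r is an illustrative helper name rather than anything from the course material.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: unit price and units demanded in five periods
prices = [10, 11, 12, 13, 14]
demand = [200, 190, 180, 170, 160]

r = pearson_r(prices, demand)
print(r)  # close to -1 means demand falls steadily as price rises
```

A value near +1 or -1 suggests a strong linear relationship, and a value near 0 suggests little or none. Correlation alone does not show that one variable causes the other.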
What is sampling?
Collecting a sample by selecting units (e.g. people, organisations), then using the information to generalise to the wider population
What are the three main reasons why sampling is necessary?
- Whole population may not be known
- Even if the population is known the process of testing every item can be extremely costly in time and money e.g. gaining information about the popularity of TV programmes by interviewing every viewer
- Items being tested may be completely destroyed in the process, e.g. in order to check the lifetime of an electric light bulb it is necessary to leave the bulb burning until it breaks and is of no further use
What are the rules involved with sampling?
Sample must be chosen in such a way that is representative of the population
Sample must be of a certain size. In general, the larger the sample, the more reliable the results will be
What are the four types of sampling?
- Random
- Systematic
- Surveys
- Stratified
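Worked example (sampling): a minimal sketch of random, systematic and stratified sampling using Python's standard random module. Surveys are not sketched here. The invoice numbers, the small/large strata and the function names are all hypothetical, chosen only for illustration, and the systematic version assumes the population size is at least the sample size.

```python
import random

def random_sample(population, n, seed=0):
    """Random sampling: every item has the same chance of being picked."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=0):
    """Systematic sampling: random starting point, then every k-th item."""
    k = len(population) // n                  # sampling interval
    start = random.Random(seed).randrange(k)  # random start within first interval
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, seed=0):
    """Stratified sampling: random sample of the same fraction from each stratum."""
    rng = random.Random(seed)
    sample = []
    for items in strata.values():
        size = max(1, round(len(items) * fraction))
        sample.extend(rng.sample(items, size))
    return sample

# Hypothetical population: invoice numbers 1 to 100
invoices = list(range(1, 101))
# Hypothetical strata: 80 small invoices and 20 large ones
strata = {"small": list(range(1, 81)), "large": list(range(81, 101))}

print(random_sample(invoices, 10))
print(systematic_sample(invoices, 10))
print(stratified_sample(strata, 0.1))
```

With a 10% fraction the stratified sample takes 8 small and 2 large invoices, so each group appears in the same proportion as in the population.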
What are spreadsheets?
- Computer package used to manipulate data
What is the SUM function used for?
Totals the values in the list
What is the AVERAGE function used for?
Average of the values in the list
What is the MAX function used for?
Highest value in the list
What is the MIN function used for?
Lowest value in the list
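Worked example (SUM, AVERAGE, MAX, MIN): the same four calculations sketched in Python, as a rough equivalent of what the spreadsheet functions return. The sales figures are hypothetical and stand in for a column of cells.

```python
# Hypothetical weekly sales figures, standing in for a spreadsheet column
values = [120, 95, 143, 110, 132]

total = sum(values)                  # like SUM: adds every value in the list
average = sum(values) / len(values)  # like AVERAGE: arithmetic mean
highest = max(values)                # like MAX: largest value
lowest = min(values)                 # like MIN: smallest value

print(total, average, highest, lowest)
```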
What are the disadvantages of using spreadsheets?
- Can be time consuming
- Not able to identify data input errors or prevent accidental deletion so training of staff is important
- Sharing violations among users wishing to view or change data at the same time
- Difficult to identify an error in the design of the spreadsheet as some formulae are very complicated.
- Spreadsheets are open to cyber-attack through viruses, hackers and general system failure
- Spreadsheets are restricted to a finite number of records and they may not be a true reflection of the ‘real’ world.
What are the two problems with data?
- Comparability: is it possible to compare data from different sources?
- Data bias: when a sample is chosen, does it truly represent the population?
What are the 7 types of bias?
- Selection bias
- Self-selection bias
- Observer bias
- Omitted variable bias
- Cognitive bias
- Confirmation bias
- Survivorship bias
What is selection bias?
When selecting a sample all items in a population should have the same chance of being picked - true random sampling
If the sample is not random, selection bias can occur and the sample may not be representative
What is self-selection bias?
When an individual selects whether or not to include themselves as part of a sample
What is observer bias?
When the assumptions of a researcher unintentionally influence observations
What is omitted variable bias?
When a variable that could affect the analysis is left out when the data is analysed, such as age or gender when analysing shopping habits
What is cognitive bias?
How data is perceived can influence the understanding of the results and lead someone to misinterpret the information
What is confirmation bias?
Confirmation bias can occur when information is processed that favours previously existing beliefs. It can lead to inconsistent information being ignored
What is survivorship bias?
If a sample only contains items that have survived a previous event survivorship bias can occur.
The act of focussing on successful people, businesses or strategies and ignoring those that have failed
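Worked example (survivorship bias): a minimal sketch comparing the average return of all funds with the average of only those still trading. The six funds and their returns are hypothetical, invented purely to show the effect.

```python
# Hypothetical annual returns (%) for six funds; "survived" marks funds still trading
funds = [
    {"name": "A", "annual_return": 8.0, "survived": True},
    {"name": "B", "annual_return": 6.0, "survived": True},
    {"name": "C", "annual_return": 7.0, "survived": True},
    {"name": "D", "annual_return": -12.0, "survived": False},
    {"name": "E", "annual_return": -9.0, "survived": False},
    {"name": "F", "annual_return": -6.0, "survived": False},
]

def mean_return(items):
    """Arithmetic mean of the annual_return field."""
    return sum(f["annual_return"] for f in items) / len(items)

all_mean = mean_return(funds)
survivors_mean = mean_return([f for f in funds if f["survived"]])

print(all_mean, survivors_mean)
```

Looking only at the survivors gives a far more favourable picture than the full population, because the failures have dropped out of the sample.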
What is hypothesis testing?
Where data is used to confirm if an idea or hypothesis is true.
What is a null hypothesis?
Type of hypothesis used in statistics that proposes there is no difference between certain characteristics of a population
What is statistical significance?
Where the results are deemed to have occurred due to a specific cause rather than by chance
What is a Type I error?
A false positive error: occurs when a null hypothesis is rejected even though it is true and should not be rejected
What is a Type II error?
A false negative error: occurs when a null hypothesis is false but is accepted (not rejected)
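Worked example (Type I and Type II errors): a sketch under stated assumptions, not a prescribed method. Suppose the null hypothesis is that a coin is fair, it is tossed 100 times, and the null hypothesis is rejected if the number of heads falls outside 40 to 60. The decision rule, the 100 tosses and the alternative of a coin with a 0.6 chance of heads are all hypothetical choices made for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k heads in n tosses with head probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_not_rejected(n, p, low, high):
    """Probability the head count lands in [low, high], where H0 is not rejected."""
    return sum(binom_pmf(k, n, p) for k in range(low, high + 1))

# Hypothetical test: 100 tosses, reject "coin is fair" if heads < 40 or heads > 60
N, LOW, HIGH = 100, 40, 60

# Type I error rate: the coin really is fair (p = 0.5) but H0 is rejected
type_1_rate = 1 - prob_not_rejected(N, 0.5, LOW, HIGH)

# Type II error rate: the coin is biased (assumed p = 0.6) but H0 is not rejected
type_2_rate = prob_not_rejected(N, 0.6, LOW, HIGH)

print(type_1_rate, type_2_rate)
```

Widening the acceptance range would make Type I errors rarer but Type II errors more common, which is the usual trade-off between the two.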
What is data visualisation?
Use of charts and diagrams to present information
What are forms of data visualisations?
- Bar charts
- Pie charts
- Line graph (time)
What is big data?
Datasets with sizes beyond the ability of typical database software to capture, store, manage, and analyse
What are the key features of big data (the four Vs)?
- Volume: Considers the amount of data fed into the organisation
- Variety: Considers the variety of formats of data received
- Velocity: Considers the speed that data is fed into the organisation
- Veracity: Considers the reliability of the data being received
What relevance does volume have on big data?
- Does the organisation have resources to store and manage data?
- Does it have the financial resources required to invest in or upgrade IT/IS?
What relevance does variety have on big data?
- Are systems compatible and capable of accepting various forms of data?
- Legally, is the data owned by the organisation or by the third party?
What relevance does Velocity have on big data?
- Are systems able to capture and process ‘real time’ data?
- Does the organisation have the skills to provide timely analysis of this data?
What relevance does Veracity have on big data?
- Can the organisation challenge data received from a third party?
- Is the data received fully representative of the whole data population?
What is the importance of big data?
- Potential to achieve competitive advantage
- Huge array of new data sources: Social media, Internet of things
- Exponential growth in computing power and storage capacity
- New avenues of knowledge, such as crowdsourcing and open-source software
What is data science?
Collecting, preparing, managing, analysing, interpreting and visualising large and complex datasets
What is data analytics?
- Value extracted from big data by data scientists through the process of data analytics
- Source data is analysed to turn it into information that is useful to the business
What are the benefits of big data, data science and data analytics?
- Decision making: real time analysed information allows managers to make better decisions
- Customer analysis: Market segmentation and customisation can occur from having a greater insight into customer needs
- Innovation: Analysed big data can reveal completely new ideas and lead to innovation
- Risk management: Big data can assist with the identification, quantification and management of risk
What are the risks of big data, data science and data analytics?
- Storage e.g. systems must be reviewed and upgraded to cope with the data and processing required
- Skills e.g. data scientists and analysts are in short supply making it difficult for organisations to recruit and retain the right staff
- Data dependency e.g. data led decisions lead to significant risk should the data be weak, erroneous or corrupted
- Overload e.g. too much information and analysis can make businesses lose sight of the key data and also slow down decision making and responsiveness
- Data privacy e.g. there is a risk that data privacy legislation could be breached
- Data security e.g. protection needs to be put in place to protect any data from cyber security risks