Week 1 Flashcards
What are the three Vs that characterise big data?
Volume - this is the sheer size of the dataset
Velocity - the speed at which data is generated and has to be processed
Variety - the different types of data in the data set
What factors contribute to the increasing volume of data?
Transaction-based data stored in relational databases over the years makes up part of the volume
Unstructured data that is being streamed from social media also plays a role
Sensor and machine-to-machine generated data is increasing over time
Storage was an issue in the past; however, the cost of storage has decreased
Velocity: examples of high-speed data
Radio-frequency identification (RFID) tags, sensors and smart metering generate large amounts of data within a short period. Reacting fast enough to deal with data velocity is one of the challenges.
The speed of data can be inconsistent and can have peaks. This is especially true on social media when something trends.
Daily, seasonal and event-triggered peaks in data loads can be difficult to manage, especially when unstructured data is involved.
Variety: what are the different types of data?
Structured data - traditional relational databases and file systems
Unstructured data - text documents, email, video, audio, log files, etc.
Data comes from various sources; the challenge lies in managing, merging and governing the different varieties of data
What is data mining?
Data mining is the process of analysing large data sets to identify patterns and trends. This can give better insight into customer behaviour, which can then be used to drive down costs and increase revenue.
What use does data mining have in business?
Data mining could transform business in the future. Businesses could use the data to analyse the buying patterns of customers, investigate any anomalies that were not predicted, and forecast future possibilities. Data mining could also support more direct marketing campaigns.
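As a rough illustration of analysing buying patterns, the sketch below counts which pairs of items are bought together most often. The baskets are invented for this example, and pair counting is only one simple technique among many; it is not a full data mining method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
    {"milk", "cereal", "bread"},
]


def pair_counts(baskets):
    """Count how many baskets contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives every pair a consistent order, e.g. ("bread", "butter")
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)
# Show the three most frequently co-purchased pairs.
for pair, n in counts.most_common(3):
    print(pair, n)
```

A pair that appears in many baskets, such as bread and butter here, is the kind of pattern a business might use when planning promotions.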
How do auditors use data mining?
Auditors have been using data mining techniques to analyse large sets of data in full, rather than relying on the traditional sampling techniques used to gain assurance over large balances.
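The difference between testing a whole population and testing a sample can be sketched as follows. The ledger, the invoice ids and the planted duplicates are all hypothetical, and checking for duplicate invoices is just one possible audit test chosen for illustration.

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical ledger: 10,000 invoices as (invoice_id, amount) pairs.
ledger = [(f"INV{i:05d}", round(random.uniform(10, 5000), 2)) for i in range(10_000)]

# Plant five duplicated invoices, the kind of anomaly an auditor looks for.
planted = random.sample(ledger, 5)
ledger.extend(planted)
random.shuffle(ledger)


def duplicated_ids(entries):
    """Return the set of invoice ids that appear more than once in entries."""
    counts = Counter(invoice_id for invoice_id, _ in entries)
    return {invoice_id for invoice_id, n in counts.items() if n > 1}


# Full-population test: every ledger entry is examined.
found_in_full = duplicated_ids(ledger)

# Traditional approach: examine a random sample of 100 entries only.
sample = random.sample(ledger, 100)
found_in_sample = duplicated_ids(sample)

print("duplicates found in full population:", len(found_in_full))
print("duplicates found in sample of 100:", len(found_in_sample))
```

The full scan finds every planted duplicate. The sample can only flag a duplicate when both copies happen to be drawn, so it will usually catch few or none.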
How would management accountants use data mining?
They may be required to produce forecasts, and the ability to analyse both financial and non-financial data can help to improve their understanding of cost drivers.
Tax data analytics
One example of how data analytics might be used is to predict the tax consequences of a potential merger or acquisition (M&A).
Benefits of big data
The sheer amount of data that can be collected means that sampling error and sampling bias can be avoided, as you are reviewing all of the data.
The quality of individual data items might be less important when analysing larger data sets; however, you still need to consider how reliable the data is and how that affects the quality of the conclusions.
Issues with big data
Are we drawing the wrong conclusion from the patterns? An example of this is where investors who tried to deduce sales trends at Walmart from satellite photos of its car parks found that many of the motorists were visiting rival stores.
It is important to distinguish between correlation and causation. A famous example is the correlation between ice cream sales and crime: neither causes the other, as both tend to rise in hot weather.
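The ice cream example can be sketched with synthetic data. Everything below is an assumption made for illustration: the figures are randomly generated, and the model deliberately makes temperature drive both ice cream sales and crime, with no direct link between the two. The sketch compares the raw correlation with the correlation left once the effect of temperature has been removed.

```python
import random
from math import sqrt

random.seed(0)  # fixed seed so the sketch is repeatable


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def residuals(ys, xs):
    """What is left of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Synthetic data: temperature is the hidden common cause (an assumption).
temperature = [random.uniform(0, 30) for _ in range(2000)]
ice_cream = [10 * t + random.gauss(0, 30) for t in temperature]
crime = [2 * t + random.gauss(0, 10) for t in temperature]

raw = pearson(ice_cream, crime)
controlled = pearson(residuals(ice_cream, temperature), residuals(crime, temperature))

print(f"raw correlation: {raw:.2f}")
print(f"correlation after removing temperature: {controlled:.2f}")
```

The raw correlation is strong, yet it largely disappears once temperature is accounted for, which is what a correlation without causation looks like in this simple setup.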