Big Data Flashcards
Because of big data…
More data has been created in the last 2 years than in the entire previous history of the human race
A typical Fortune 1000 company…
Just a 10% increase in data accessibility will result in more than $65 million additional net income (Marr)
John Mashey
Credited with coining the term ‘big data’ in 1998 - but the term was mentioned before then
Big data is
Generally used to describe data sets that are so big we can’t analyse them with conventional methods (e.g. Excel)
However, what counts as ‘conventional methods’ is changing all the time
3 V’s of big data (Gartner)
Volume - scale of data (most US companies have at least 100,000 gigabytes of data stored)
Velocity - analysis of streaming data (modern cars have around 100 sensors to monitor things)
Variety - different forms of data (30 billion pieces of content shared on Facebook every month)
Debate over a 4th V
IBM introduced ‘veracity’, which describes the uncertainty of data - how accurate is the data?
However, the original creator of the 3 V’s was not happy and even tweeted about how he disagreed
He said it can’t be applied to traditional data sets
Similar arguments have been made about ‘value’
Under Armour
Spent $170 million buying fitness apps (like MyFitnessPal), gaining the data of more than 120 million athletes
CEO explained that this strategy would put them directly in the path of where big data is heading - wearable technology that goes beyond watches
In future it will be built into clothing - accurately recording movement, performance, route and location
Data has been transformative
Changed marketing landscape
Innovations in key areas: data collection, data storage, data analysis and data communication
Objects generating big data - referred to as the Internet of Things (IoT)
Smartphones, virtual assistants (Alexa), computers, cars, wearables, in-home devices (lightbulbs, Hive heating), smart mirrors (Unilever), sensors for infrastructure
Tesla example
Interested in big data and AI
Tesla crowdsources its data from its vehicles and stores it in the cloud
Data is used to: diagnose and patch problems with car operations; generate maps showing the speed of traffic and hazards; and support Autopilot vehicles
The cars upload info to the ‘fleet’
Data collected by vehicles could be worth $750 billion by 2030 (McKinsey)
Innovation can happen in the way data is collected
Object type - through all the different types of object used to collect data
Data type - through the different types of data collected: blogs, comments, loyalty cards, GPS location
Data storage
Innovation is needed to store big data and cope with the volumes, whilst also being scalable and providing enough input/output operations per second (IOPS)
Companies which deal with the largest amounts of data run what are known as hyperscale computing environments (e.g. Google)
Google’s data storage
Server farms
Google’s server farms are energy efficient and environmentally friendly: they use 50% less energy than a typical data centre, use outside air to cool servers, and share performance data to move the industry forward
Data analysis
How you do this depends on whether the data is structured or unstructured
Structured - stored in a nice organised way, easily searchable
Unstructured - machine or human generated data which is not organised or easily searchable
Popular languages and tools for analysis include: Python, Java, MATLAB
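Worked example (not from the original notes): a minimal Python sketch of why the structured/unstructured split changes how you analyse data. The loyalty-card records, customer comments and all variable names are made up for illustration. Structured data can be filtered by field straight away; unstructured text needs some structure extracted first, here by a crude keyword search.

```python
import csv
import io
import re

# Structured data: rows with named columns (hypothetical loyalty-card records).
structured_csv = io.StringIO(
    "customer_id,store,spend\n"
    "101,Leeds,25.50\n"
    "102,York,12.00\n"
    "101,Leeds,8.25\n"
)
rows = list(csv.DictReader(structured_csv))

# Every row has the same fields, so a query is a simple filter on a column.
spend_101 = sum(float(r["spend"]) for r in rows if r["customer_id"] == "101")

# Unstructured data: free text (hypothetical customer comments).
comments = [
    "Loved the new trainers, delivery was fast!",
    "Delivery took two weeks. Not happy.",
    "Great app, but the delivery tracking is confusing.",
]

# There are no fields to filter on, so structure has to be extracted first.
# A keyword search is the simplest possible version of that step.
delivery_mentions = sum(
    1 for c in comments if re.search(r"\bdelivery\b", c, re.IGNORECASE)
)

print("Total spend for customer 101:", spend_101)
print("Comments mentioning delivery:", delivery_mentions)
```

In practice unstructured data usually needs far more than a keyword search (for example text mining or image recognition), which is part of why it is harder to analyse.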
Algorithms for data analysis
Supervised learning: classification
Unsupervised learning: clustering
Semi-supervised learning: a mix of the two above
Reinforcement learning: machine continually trains itself using trial and error
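Worked example (not from the original notes): a small pure-Python sketch contrasting supervised classification with unsupervised clustering. The session counts, the labels ‘casual’ and ‘keen’, and the function names are all hypothetical. The nearest-mean classifier and two-group k-means are chosen only because they are short, not because the notes prescribe them. Semi-supervised and reinforcement learning are not shown.

```python
# Hypothetical data: weekly app sessions for six customers.
sessions = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

# --- Supervised learning (classification): labels are provided. ---
labels = ["casual", "casual", "casual", "keen", "keen", "keen"]


def train_centroids(values, value_labels):
    """Learn the mean value of each labelled class."""
    groups = {}
    for v, lab in zip(values, value_labels):
        groups.setdefault(lab, []).append(v)
    return {lab: sum(vs) / len(vs) for lab, vs in groups.items()}


def classify(value, centroids):
    """Predict the class whose mean is closest to the new value."""
    return min(centroids, key=lambda lab: abs(value - centroids[lab]))


centroids = train_centroids(sessions, labels)
prediction = classify(9.0, centroids)  # a new, unlabelled customer


# --- Unsupervised learning (clustering): no labels, just find groups. ---
def two_means(values, iterations=10):
    """Tiny 1-D k-means with k=2, started from the min and max values.

    Assumes the input contains at least two distinct values.
    """
    c1, c2 = min(values), max(values)
    for _ in range(iterations):
        # Assign each value to its nearest centre, then recompute the centres.
        g1 = [v for v in values if abs(v - c1) <= abs(v - c2)]
        g2 = [v for v in values if abs(v - c1) > abs(v - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return g1, g2


cluster_a, cluster_b = two_means(sessions)

print("Supervised prediction for 9 sessions:", prediction)
print("Unsupervised clusters:", cluster_a, cluster_b)
```

The classifier can name its answer because it was trained on labelled examples. The clustering step finds the same two groups but cannot say what they mean; a person still has to interpret them.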