Big Data Flashcards
Big Data
Exponential growth in the amount of data between 2009 and 2020; the sheer scale of that growth is what makes big data big.
Your data
Social media platforms scrape everything you type or read to target ads. Any service you sign up for or use for free will use or sell your data.
What is big data?
All-encompassing term covering data, data frameworks, and the tools and techniques used to process and analyse data
5 ‘V’s of Big Data
Volume, Velocity, Variety, Veracity and Value
Volume
Data at Rest: huge volumes of data generated from various sources such as social media, machines, networks and interactions
Velocity
Data in Motion: refers to the speed at which data is being created in real time
Variety
Data in Many Forms: data arrives in many formats, such as PDFs, emails, video, social media posts, location data etc.
Unstructured sources of data pose issues for data analysis and storage
Veracity
Data in Doubt: uncertainty due to inconsistency and incompleteness, ambiguities, latency, deception and approximations
Value
Data in Worth: value of the data to a business in terms of its ability to generate profit
Structured Data
Guarantees that every entry of data has the same format, e.g. spreadsheet/CSV columns
Unstructured Data
Search results: website links, images, videos, all in different formats and structures
Semi-structured data
XML document: a combination of both structured and unstructured data. Structured in form, but with fewer constraints than structured data
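To make the structured versus semi-structured distinction concrete, here is a minimal Python sketch using only the standard library. The sample records, field names and helper functions are hypothetical, made up purely for illustration. It shows that every CSV row carries the same fields, while records in an XML document are tagged but free to differ.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: every row has the same columns (hypothetical sample data).
CSV_TEXT = "name,age\nAda,36\nAlan,41\n"

# Semi-structured: tagged and parseable, but records may carry different fields.
XML_TEXT = """
<people>
  <person><name>Ada</name><age>36</age></person>
  <person><name>Alan</name><email>alan@example.com</email></person>
</people>
"""

def csv_fields(text):
    """Return the set of field names present in each CSV row."""
    return [set(row.keys()) for row in csv.DictReader(io.StringIO(text))]

def xml_fields(text):
    """Return the set of child tag names present in each <person> element."""
    root = ET.fromstring(text.strip())
    return [{child.tag for child in person} for person in root.findall("person")]

if __name__ == "__main__":
    print("CSV rows:", csv_fields(CSV_TEXT))
    print("XML records:", xml_fields(XML_TEXT))
```

Unstructured data (images, video, free text) has no such field names to list at all, which is why it is harder to store and analyse.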
Data Ubiquity
Automatic data capture, opening up of existing data, simulations, approximations, synthetic data, exponential growth in storage, increase in bandwidth, faster algorithms
Reasons for the exponential growth of data
Examples of Big Data
NYSE generates one terabyte of new trade data per day
Facebook generates 4 petabytes of data per day
A single jet engine can generate 10 terabytes of data per flight
Walmart processes 40 petabytes of data per day
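As a rough sanity check on these figures, the daily volumes can be converted into average per-second rates, which links Volume to Velocity. This is only a sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and a perfectly even flow over 24 hours, neither of which the cards state. The helper name is hypothetical.

```python
# Assumption: decimal (SI) units; the flashcards do not say which convention is used.
TB = 10**12
PB = 10**15
SECONDS_PER_DAY = 24 * 60 * 60

def bytes_per_second(bytes_per_day):
    """Average rate, assuming the data arrives evenly across the day."""
    return bytes_per_day / SECONDS_PER_DAY

# Daily volumes taken from the cards above.
DAILY_VOLUMES = {
    "NYSE": 1 * TB,
    "Facebook": 4 * PB,
    "Walmart": 40 * PB,
}

if __name__ == "__main__":
    for name, volume in DAILY_VOLUMES.items():
        rate_gb = bytes_per_second(volume) / 10**9
        print(f"{name}: about {rate_gb:.2f} GB per second")
```

Under those assumptions, 4 petabytes per day works out to roughly 46 GB every second on average.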
Data-Information-Knowledge-Wisdom Model
- Data: raw data (red)
- Information: meaning of the data (the traffic light has turned red)
- Knowledge: context of the data (the traffic light I’m driving towards has turned red)
- Wisdom: the data is applied (stop the car)