BIG DATA Flashcards
TYPES OF DATA
The 4 V’s
Volume, Velocity, Variety, Veracity
Volume
A large amount of data that increasingly requires more storage space
Velocity
Data that are growing exponentially fast
Variety
Data that are generated in different formats
Veracity
Data are generated by the public rather than by employees; therefore, they have varying levels of accuracy
Where does data originate?
Data originate from sensors and from anything that has been scanned, entered, and released to the internet
Collected Data can be:
categorized as structured or unstructured
Structured Data
are created by applications that use fixed-format input, such as spreadsheets. They may need to be converted into a common format such as CSV
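As a rough illustration of converting structured records into a common format, the sketch below writes a few fixed-format rows out as CSV using Python's standard `csv` module. The `records` list, its field names, and the `to_csv` helper are hypothetical, invented only for this example.

```python
import csv
import io

# Hypothetical structured records: every row has the same fixed fields.
records = [
    {"customer_id": 1, "item": "widget", "quantity": 3},
    {"customer_id": 2, "item": "gadget", "quantity": 1},
]


def to_csv(rows):
    """Serialize a list of same-shaped dicts to CSV text (header + rows)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(rows[0].keys()),
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(records)
print(csv_text)
```

Because each record has the same fields, the rows map directly onto CSV columns; unstructured data such as audio or free text has no such fixed layout.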
Unstructured data
are generated in a freeform style such as audio, video, web pages, and tweets
Huge Data
Each day we create ____ bytes of data
2.5 quintillion
To calculate the size of a database (see example in answer)
Assume 1,000 bytes/transaction. 1,000 bytes/transaction * 30 billion transactions/quarter * 4 quarters/year * 10 years = 1.2 * 10^15 bytes = 1.2 petabytes (1,200 terabytes)
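The estimate can be checked with a short script. This is a minimal sketch using the card's assumptions (1,000 bytes per transaction, 30 billion transactions per quarter, 10 years) and decimal (SI) units, where 1 terabyte = 10^12 bytes and 1 petabyte = 10^15 bytes. The constant and function names are illustrative only. Under these assumptions the product is 1.2 * 10^15 bytes, which is 1,200 terabytes or 1.2 petabytes.

```python
# Assumptions taken from the flashcard.
BYTES_PER_TRANSACTION = 1_000
TRANSACTIONS_PER_QUARTER = 30_000_000_000  # 30 billion
QUARTERS_PER_YEAR = 4
YEARS = 10

# Decimal (SI) unit sizes.
BYTES_PER_TERABYTE = 10**12
BYTES_PER_PETABYTE = 10**15


def estimate_bytes(bytes_per_txn, txns_per_quarter, quarters_per_year, years):
    """Total storage in bytes for the given transaction volume and period."""
    return bytes_per_txn * txns_per_quarter * quarters_per_year * years


total_bytes = estimate_bytes(
    BYTES_PER_TRANSACTION, TRANSACTIONS_PER_QUARTER, QUARTERS_PER_YEAR, YEARS
)
total_terabytes = total_bytes / BYTES_PER_TERABYTE
total_petabytes = total_bytes / BYTES_PER_PETABYTE

print(f"{total_bytes:,} bytes")
print(f"{total_terabytes:,.0f} TB")
print(f"{total_petabytes:.1f} PB")
```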
Big Data Storage (5 major storage problems with big data)
Management: cloud or on-premises; Security: ensuring good security policies are in place and followed; Redundancy: need good backups; Scale: data storage needs may change at any time; Access: data need to be easy to access through a friendly user interface
Big Data Storage (Benefits of Big Data)
Analyzing a large amount of data for data-driven decisions; businesses can utilize other big data warehouses for decision making; improved customer service; increased operational efficiency of manufacturing, products, and services