Module 2 Flashcards
Big data…
-the sources of digital data used for analytics are growing _______
-big data is an _____ of data
-this is the ______ revolution
-exponentially
-explosion
-information
bit
byte is ____
kilobyte ____
megabyte ___
gigabyte ____
terabyte____
petabyte ___
8 bits
1,000 bytes
> 1 million bytes
> 1 billion bytes
> 1 trillion bytes
> 1,000 terabytes
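The size ladder above can be checked with a little arithmetic. This is a minimal sketch that assumes decimal (SI) prefixes, where each unit is 1,000 times the previous one. It also shows the binary convention (1,024 per step) for comparison. Under the binary convention each unit is slightly larger than its decimal counterpart (for example 1,048,576 bytes vs. 1,000,000), which fits the "> 1 million" style answers on the cards. The names `UNITS` and `unit_sizes` are hypothetical.

```python
# Sketch: byte counts for each unit under the decimal (SI, x1000) and
# binary (x1024) conventions. Names here are hypothetical, for illustration.

UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte", "petabyte"]
BITS_PER_BYTE = 8  # a byte is 8 bits

def unit_sizes(base: int) -> dict[str, int]:
    """Return the number of bytes in each unit for a step size of 1000 or 1024."""
    return {name: base ** (i + 1) for i, name in enumerate(UNITS)}

decimal = unit_sizes(1000)
binary = unit_sizes(1024)

for name in UNITS:
    print(f"{name:>9}: {decimal[name]:>25,} bytes (decimal) | {binary[name]:>25,} bytes (binary)")

# A petabyte is 1,000 terabytes under the decimal convention.
print(decimal["petabyte"] // decimal["terabyte"])  # 1000
# Bits in a decimal kilobyte.
print(BITS_PER_BYTE * decimal["kilobyte"])  # 8000
```

Each step multiplies by the same base, so the unit size is just `base ** step`. Switching `base` from 1000 to 1024 is the only difference between the two conventions.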
The scale of “big” in big data:
big data operates on data larger than a _____
Gigabyte
From a statistical perspective, big data means:
- large number of observations (N)
- large number of variables (K)
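This links the N × K view to the gigabyte threshold on the previous card by estimating how much memory a table of N observations and K numeric variables needs. It is a minimal sketch that assumes every value is stored as an 8-byte number (such as a 64-bit float) with no compression or overhead. The function name and example sizes are hypothetical, chosen only for illustration.

```python
# Sketch: rough in-memory size of an N x K table of numbers.
# Assumes 8 bytes per value (e.g., a 64-bit float) and ignores any overhead.
# The function name and example sizes are hypothetical.

BYTES_PER_VALUE = 8
GIGABYTE = 1000 ** 3  # decimal convention

def table_size_bytes(n_obs: int, n_vars: int, bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Bytes needed to store n_obs rows (observations) x n_vars columns (variables)."""
    return n_obs * n_vars * bytes_per_value

small = table_size_bytes(n_obs=10_000, n_vars=20)        # 1,600,000 bytes (1.6 MB)
large = table_size_bytes(n_obs=10_000_000, n_vars=20)    # large N: 1,600,000,000 bytes
wide = table_size_bytes(n_obs=100_000, n_vars=2_000)     # large K: 1,600,000,000 bytes

for label, size in [("small", small), ("large N", large), ("large K", wide)]:
    print(f"{label}: {size:,} bytes = {size / GIGABYTE:.4f} GB, over 1 GB? {size > GIGABYTE}")
```

Either many rows (large N) or many columns (large K) can push a table past the gigabyte mark. Real storage formats add overhead or use compression, so treat these numbers as ballpark figures.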
examples of big data:
- the internet
-browsing pages
-social media - large, digital corps
- mountains of transactional data
-who bought it and why - from paper to electronics
-corporate filings, public records - the future
-sensors
big data is so large and complex that it’s difficult to process using on-hand database management tools or traditional data-processing applications. What needs to change?
- the mindset
- the technology
-the analysis
Data mining is more __________ than other forms of analytics
- exploratory
-letting the computers lead the way to look for interesting findings
-can be used for raw, unstructured data
Big data and society. Three roles:
- role of individual
-corporation
-government
five principles of data ethics in analytics
- ownership: an individual has ownership over their personal information
- Transparency: in addition to owning their personal info, data subjects have a right to know how you plan to collect, store, and use it
- Privacy: another ethical responsibility that comes with handling data is ensuring data subjects’ privacy
- intention: intentions matter: why you need the info, what you’ll gain from it, and what changes you’ll be able to make after the analysis
- outcomes: even when intentions are good, the outcomes of data analysis can cause inadvertent and unintentional harm to individuals or groups
5 V’s of big data
- volume: size and amounts of big data
-veracity: the accuracy
-velocity: the speed
-value: (most important) pattern recognition
-variety: diversity and range including unstructured, semi and raw data