Big Data Flashcards
Definition of Big Data
extremely large data sets that may be analysed computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions
Which characteristics does big data have in relation to security, privacy and welfare concerns? (5)
1) Volume
2) Velocity
3) Variety
4) Variability
5) Complexity
Explain Volume
- huge amounts of data from a wide range of sources (transactions, unstructured streaming from text, images, audio, voice, VoIP, video, TV and other media, sensor and machine-to-machine data)
Explain Velocity
- some data is time-sensitive
- for such data, speed is more important than volume
- it needs to be stored, processed and analyzed quickly
Explain Variety
- data comes in various formats:
- structured, numeric data in traditional databases; unstructured text documents, email, video, audio, financial transactions
Explain Variability
- data flows can vary greatly, with periodic peaks and troughs
- related to social media trends, daily, seasonal and event-triggered peak data loads and other factors
Explain Complexity
- data comes from multiple sources, which requires different data preparation strategies such as linking, cleansing, and transforming across different systems
Explain Collection/storing for the characteristic volume
- high volume -> higher attractiveness for cybercriminals
- amplified technical impact of a security breach
- transparency principles might be violated
Explain Collection/storing for the characteristic velocity
- customer concerns over privacy are increasing because of behavioral advertising based on real-time profiling and tracking technologies such as cookies
- individual participation principle of the FIPPs is violated (individuals cannot give or deny consent to data usage)
What is PII?
- personally identifiable information
- can be used to distinguish or trace an individual's identity (e.g. name, social security number, biometric records)
- highly personal
What are the FIPPs?
The Fair Information Practice Principles (FIPPs) are a set of eight principles regarding data usage, collection, and privacy. They were published by the Organisation for Economic Co-operation and Development (OECD).
Explain Collection/storing for the characteristic variety
- unstructured data is more likely to conceal PII
- large variety makes it more difficult to detect security breaches, react and respond appropriately
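A minimal sketch of why unstructured data can conceal PII: a free-text note has no dedicated "SSN" or "email" column, so PII has to be searched for. The pattern set and the sample note below are hypothetical and deliberately simple; real PII detection needs far broader coverage.

```python
import re

# Hypothetical, simplified patterns for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text):
    """Return a dict mapping each pattern name to the matches found in text."""
    hits = {}
    for name, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits


# Made-up free-text note, as it might appear in an unstructured support log.
note = "Customer called from the store, SSN 123-45-6789, reply to jane.doe@example.com."
print(find_pii(note))
```

Anything the patterns do not anticipate (names, addresses, photos) stays hidden, which is the point of the card above.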
Explain Collection/storing for the characteristic variability
- Organizations may lack capabilities to securely store huge amounts of data and manage the collected data during peak data traffic
- Attractiveness as a crime target increases during peak data traffic
Explain Collection/storing for the characteristic complexity
- prepared, complex data is often more personal than the data a person would consent to give
- Data collected from illicit sources is more likely to have information on technologically less savvy consumers
Explain sharing/accessibility by third parties and various user types for the characteristic volume
- firms may need to outsource data analysis to cloud service providers, which may give rise to privacy and security issues
Explain sharing/accessibility by third parties and various user types for the characteristic velocity (fast data)
- increase in supply of and demand for location-based, real-time personal information (stalking people in real time)
- risk of intruding on a person's private sphere, plus physical security risks
Explain sharing/accessibility by third parties and various user types for the characteristic variety
- most organizations lack mechanisms to ensure that employees and third parties have appropriate access to unstructured data and that they comply with data protection regulations
Explain sharing/accessibility by third parties and various user types for the characteristic variability
- peak data traffic may increase the need to outsource to cloud service providers, which may lead to security issues
Explain sharing/accessibility by third parties and various user types for the characteristic complexity
- data from different sources can be combined
- this allows de-identified data to be re-identified
- -> violation of the FIPPs
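A minimal sketch of how combining sources can re-identify de-identified data: records without names are linked to a second source that has names, using shared attributes (quasi-identifiers). All records, field names and the choice of quasi-identifiers are made up for illustration.

```python
# "De-identified" records: no names, but quasi-identifiers remain.
deidentified_records = [
    {"zip": "02138", "birth_year": 1965, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1980, "sex": "M", "diagnosis": "diabetes"},
]

# A second, hypothetical source that contains names plus the same attributes.
public_records = [
    {"name": "Alice Example", "zip": "02138", "birth_year": 1965, "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_year": 1980, "sex": "M"},
    {"name": "Carol Example", "zip": "02139", "birth_year": 1980, "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def reidentify(anonymous, public):
    """Link anonymous rows to names when the quasi-identifiers match exactly one person."""
    index = {}
    for row in public:
        key = tuple(row[q] for q in QUASI_IDENTIFIERS)
        index.setdefault(key, []).append(row["name"])

    linked = []
    for row in anonymous:
        key = tuple(row[q] for q in QUASI_IDENTIFIERS)
        names = index.get(key, [])
        if len(names) == 1:  # a unique match re-identifies the record
            linked.append((names[0], row["diagnosis"]))
    return linked


print(reidentify(deidentified_records, public_records))
```

Neither source reveals a named diagnosis on its own; only the combination does.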
Give a company as an example for the characteristic volume
- Amazon
- has a massive database with all the customer details, search history and purchase activities
- has its own cloud service AWS (Amazon Web Services) to handle data and offer services to other customers
Give a company as an example for the characteristic velocity
- Starbucks
- uses geo-push to make highly targeted offers
- tracks users via GPS or cell towers
- geo-fences: virtual boundaries around Starbucks shops
- crossing them triggers a specific action, such as sending a coupon to your phone
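A minimal sketch of a geo-fence check, assuming a circular fence around a shop and a coupon sent only when the user crosses from outside to inside. The coordinates, the 100 m radius and the function names are hypothetical; this is not how any particular company implements it.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def inside_geofence(position, shop, radius_m):
    """True if position lies within radius_m of the shop."""
    return haversine_m(*position, *shop) <= radius_m


def on_location_update(previous, current, shop, radius_m=100):
    """Return 'send_coupon' when the user crosses into the fence, else None."""
    was_inside = previous is not None and inside_geofence(previous, shop, radius_m)
    is_inside = inside_geofence(current, shop, radius_m)
    return "send_coupon" if is_inside and not was_inside else None


# Made-up coordinates: a shop, a point about 1 km away, and a point a few dozen metres away.
shop = (47.6097, -122.3331)
far_away = (47.6200, -122.3331)
nearby = (47.6100, -122.3331)

print(on_location_update(far_away, nearby, shop))  # user enters the fence
print(on_location_update(nearby, nearby, shop))    # user was already inside
```

Triggering only on the outside-to-inside transition avoids sending a coupon on every location update while the user stays in the shop.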
Give a company as an example for the characteristic complexity
- Starbucks
- combines geo data (geo-fences) with users' purchase history to anticipate user desires and lure potential customers into the stores