Chapter 13 Flashcards
IoT and Big Data Facts and Background
Big Data is a term used to describe the nearly ubiquitous collection of data about individuals from multitudinous sources, coupled with the low costs to store such data and the new data mining techniques used to draw connections and make predictions based on this collected information.
IoT: The data analyzed by analytics programs, algorithms, machine learning, and other data mining techniques—the underpinnings of the term Big Data—are often gathered by devices collectively known as the Internet of Things (IoT). This next evolution of interaction with computing devices combines sensors placed almost anywhere with a connection to the Internet.
The number of sensors connected to the Internet is now counted in the tens of billions.
By 2025, it is estimated that the amount of data will double every 12 hours.
Big Data is characterized by the “three Vs”: velocity (how fast the data is coming in), volume (the amount of data coming in), and variety (what different forms of data are being analyzed).
Microsoft’s CEO Satya Nadella proposed a list of AI design principles, including several that focus on privacy issues: “AI must be designed to help humanity, AI must be designed for intelligent privacy, AI must be transparent, and AI needs algorithmic accountability so humans can undo unintended harm.”
Friends and Family Test
Would the managers feel comfortable if data on themselves and their family and friends were in the database, subject to possible breach? For instance, would managers at the bank feel comfortable with their own families’ data going into the ACF database? If not, that is a reason to take greater precautions from a cybersecurity perspective.
Means of preventing a Big Data breach
Data minimization
Segmentation
De-identification
Collection, purpose, and use limitations (FIPPs)
Access controls
Direct and indirect identifiers
Direct identifiers = data that identify an individual with little or no additional effort.
Examples: address, phone number
Indirect identifiers = data such as age or gender that can increase the likelihood of identifying an individual.
De-identification terms: pseudonymous, de-identified, anonymous
- Pseudonymous data: Information from which the direct identifiers have been eliminated. Indirect identifiers remain intact.
- De-identified data: Direct and known indirect identifiers have been removed.
- Anonymous data: Direct and indirect identifiers have been removed or technically manipulated to prevent re-identification.
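As a rough illustration of these three categories, the sketch below strips direct and indirect identifiers from a hypothetical customer record (all field names and values are invented for this example):

```python
# Hypothetical record with direct identifiers, indirect identifiers,
# and a non-identifying attribute.
record = {
    "name": "Jane Doe",        # direct identifier
    "phone": "555-0100",       # direct identifier
    "age": 42,                 # indirect identifier
    "gender": "F",             # indirect identifier
    "purchase_total": 129.95,  # non-identifying attribute
}

DIRECT = {"name", "phone"}
INDIRECT = {"age", "gender"}

def pseudonymize(rec):
    # Pseudonymous data: direct identifiers removed, indirect intact.
    return {k: v for k, v in rec.items() if k not in DIRECT}

def de_identify(rec):
    # De-identified data: direct and known indirect identifiers removed.
    return {k: v for k, v in rec.items() if k not in DIRECT | INDIRECT}
```

Truly anonymous data would go further, technically manipulating what remains (for example by blurring or perturbation, below) so that re-identification is prevented, not merely made harder.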
These categories do not result from a single method of reducing the identifiability of data. Instead, reduction of the risk of re-identification results from a collection of techniques that can be applied to different kinds of data with differing levels of effectiveness.
Blurring
• Blurring. This technique reduces the precision of disclosed data to reduce the certainty of individual identification. For example, date of birth is highly identifying (because a small portion of people are born on a particular day of a particular year), but year of birth is less identifying. Similarly, a broader set of years (such as 1971-1980, or 1981-1990) is less identifying than year of birth.
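The date-of-birth example above can be sketched as a pair of blurring functions, each returning a coarser (and therefore less identifying) value:

```python
from datetime import date

def blur_to_year(dob: date) -> int:
    # Year of birth is far less identifying than a full date of birth.
    return dob.year

def blur_to_decade(dob: date) -> str:
    # A range of years (e.g. 1981-1990) is less identifying still.
    start = (dob.year - 1) // 10 * 10 + 1
    return f"{start}-{start + 9}"

dob = date(1984, 3, 17)
print(blur_to_year(dob))    # 1984
print(blur_to_decade(dob))  # 1981-1990
```

Each step trades analytic precision for a larger "crowd" of individuals sharing the disclosed value.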
Masking
• Masking. This technique obscures the original values in a data set in order to protect privacy. One way this may be accomplished is through perturbation—making small changes to the data while maintaining overall averages—to make it more difficult to identify individuals.
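A minimal perturbation sketch, using invented salary figures: each value is shifted by small random noise, and the noise is then re-centered so the overall average is unchanged.

```python
import random

def perturb(values, scale=1.0, seed=0):
    # Add small random noise to each value, then subtract the mean of the
    # noise so the overall average of the data set is preserved.
    rng = random.Random(seed)
    noise = [rng.uniform(-scale, scale) for _ in values]
    drift = sum(noise) / len(noise)
    return [v + n - drift for v, n in zip(values, noise)]

salaries = [52000, 61000, 58000, 70000]  # hypothetical data
masked = perturb(salaries, scale=500)
# Individual values now differ from the originals, but the mean of
# `masked` matches the mean of `salaries` to floating-point precision.
```

Note that perturbation alone is not anonymization: with few records or repeated releases, individual values can sometimes still be estimated.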
Differential Privacy
• Differential Privacy. This technique uses a mathematical approach to ensure that the risk to an individual’s privacy is not substantially increased as a result of being part of the database.
FTC Characterization of a Data Broker in 2014
The FTC characterized the data broker industry as: collecting consumer data from numerous sources, usually without consumers’ knowledge or consent; storing billions of data elements on nearly every U.S. consumer; analyzing data about consumers to draw inferences about them; and combining online and offline data to market to consumers online.
FTC broad categories of products offered by data brokers - 2014
(1) marketing (such as appending data to customer information that a marketing company already has),
(2) risk mitigation (such as information that may reduce the risk of fraud) and
(3) location of individuals (such as identifying an individual from partial information).
For each of these segments of the industry, the FTC suggested that data brokers engage in data minimization practices, review collection practices carefully as they relate to children and teens, and take reasonable precautions to ensure that downstream users did not use the data for discriminatory or criminal purposes.
FTC Report on Big Data (2016)
The agency expressed its understanding that Big Data brought with it significant benefits coupled with significant risks. Examples of the benefits identified included providing healthcare tailored to individual patients, enhancing educational opportunities by tailoring the experience to the individual student, and increasing equal access to employment. Examples of the risks included: exposing sensitive information; reinforcing existing disparities; and creating new justifications for exclusion.
IoT Background
In 2016, estimates for the number of IoT devices in use topped 15 billion worldwide, with spending on these devices approaching $1 trillion globally.
By 2020, the number of wearable device shipments is estimated to be more than 200 million.
Much of the data generated by IoT—such as temperature readings, traffic statistics, and sensors around industrial production—often does not implicate PII.
IoT devices share two characteristics that are important for privacy and security discussions
(1) the devices interact with software running elsewhere (often in the cloud) and function autonomously and
(2) when coupled with data analysis, the devices may take proactive steps and make decisions about or suggest next steps for users.
Concerns regarding privacy and cybersecurity with respect to IoT devices stem from
(1) limited user interfaces in the products;
(2) lack of industry experience with privacy and cybersecurity;
(3) lack of incentives in the industries to deploy updates after products are purchased; and
(4) limitations of the devices themselves, such as lack of effective hardware security measures.
Wearables - Issues
Most of this information is not protected by HIPAA, because HIPAA applies only to the activities of covered entities such as providers and health insurance plans
Challenges include:
- Right to be forgotten - it is hard to remember to delete data
- Impact of location disclosure - stalking
- Screens read by those nearby
- Video/audio recording without knowledge - e.g., Google Glass
- Lack of control over data - how will it be used?
- Automatic syncing with social media - without controls
- Facial recognition.