Data Classification Flashcards
Name 4 Terms and Roles of Data Ownership
- Data Owner (Data Controller)
- Data Custodian
- Data Stewards
- Data Processor
Explain Data Owner (Data Controller).
The data owner is the organization that has collected or created the data, in general terms.
Within the organization, we often assign a specific data owner as being the individual with rights and responsibilities for that data; this is usually the department head or business unit manager for the office that has created or collected a certain dataset.
From a cloud perspective, the cloud customer is usually the data owner. Many international treaties and frameworks refer to the data owner as the data controller.
Data Owners: What is the essential point about rights and responsibilities?
Data owners remain legally responsible for all data they own. This is true even if data is compromised by a data processor several times removed from the data owner.
What is a Data Custodian?
The data custodian is any person or entity that is tasked with the daily maintenance and administration of the data. The custodian also has the role of applying the proper security controls and processes as directed by the data owner.
Within an organization, the custodian might be a database administrator.
Explain Data Stewards.
Data stewards are tasked with ensuring that the data’s context and meaning are understood, and they use that knowledge to make certain the data they are responsible for is used properly.
Explain Data Processors.
The data processor is any organization or person who manipulates, stores, or moves the data on behalf of the data owner.
Processing is anything that can be done to data:
- copying it
- printing it
- destroying it
- utilizing it
From an international law perspective, the cloud provider is a data processor.
What could count as data processing?
Processing is anything that can be done to data:
- copying it
- printing it
- destroying it
- utilizing it
Data Processors: What is an essential point to remember about the rights and responsibilities of data ownership and custody?
Data processors do not necessarily all have direct relationships with data owners; processors can be third parties, or even further removed down the supply chain.
Data Categorization: Categorization is commonly driven by which 3 factors?
- Regulatory Compliance
- Business Function
- By Project
Data Categorization: Explain the impact of regulatory compliance.
Different business activities are governed by different regulations.
The organization may want to create categories based on which regulations or requirements apply to a specific dataset.
This might include the:
- Gramm-Leach-Bliley Act (GLBA)
- Payment Card Industry Data Security Standard (PCI DSS)
- Sarbanes-Oxley Act (SOX)
- Health Insurance Portability and Accountability Act (HIPAA)
- EU’s General Data Protection Regulation (GDPR)
- or other international, national, state, or local requirements
Data Categorization: Explain the impact of business function.
The organization might want to have specific categories for different uses of data. Perhaps the data is tagged based on its use in billing, marketing, or operations; by types of customers; or by some other functional requirement or descriptor.
Data Categorization: Explain the impact of By Project.
Some organizations might define datasets by the projects they are associated with as a means of creating discrete, compartmentalized projects.
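The three categorization drivers above can be combined as tags on each dataset. The following is a minimal Python sketch, assuming a hypothetical `Dataset` record with one field per driver; the dataset names, tags, and the `in_category` helper are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Dataset:
    # Hypothetical inventory record with one field per categorization driver.
    name: str
    regulations: Set[str] = field(default_factory=set)  # regulatory compliance
    functions: Set[str] = field(default_factory=set)    # business function
    project: str = ""                                    # by project


def in_category(datasets: List[Dataset], *, regulation: str = "",
                function: str = "", project: str = "") -> List[str]:
    # Return the names of datasets that match every filter supplied;
    # an empty filter argument is ignored.
    out = []
    for ds in datasets:
        if regulation and regulation not in ds.regulations:
            continue
        if function and function not in ds.functions:
            continue
        if project and ds.project != project:
            continue
        out.append(ds.name)
    return out


# Illustrative inventory (hypothetical names and tags).
categorized = [
    Dataset("cardholder_txns", {"PCI DSS", "GLBA"}, {"billing"}, "payments-v2"),
    Dataset("patient_visits", {"HIPAA"}, {"operations"}),
    Dataset("campaign_leads", {"GDPR"}, {"marketing"}, "spring-launch"),
]
```

A query such as `in_category(categorized, regulation="PCI DSS")` then returns only the datasets subject to that standard.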
What is Data Classification?
Much like data categorization, data classification is the responsibility of the data owner and is assigned according to an overall organizational policy based on specific characteristics of a given dataset.
The classification, like the categorization, can take any form defined by the organization and should be uniformly applied.
Data Classification: Name three types of classifications?
- Sensitivity
- Jurisdiction
- Criticality
Data Classification: Explain Sensitivity.
This is the classification model used by the U.S. military. Data is assigned a classification according to its sensitivity, based on the negative impact an unauthorized disclosure would cause. In models of this kind, every item must be classified, including material that is not sensitive: data not deemed sensitive must be explicitly assigned the “unclassified” label.
We will discuss labeling shortly.
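The key property of a sensitivity model is that levels are ordered by impact and that no item is left without a label. The following is a minimal Python sketch, assuming a hypothetical four-level ordering loosely modeled on U.S. government levels; the `classify` and `may_read` helpers are illustrative, not part of any standard.

```python
from enum import IntEnum
from typing import Optional


class Sensitivity(IntEnum):
    # Hypothetical levels, ordered so a higher value means greater harm
    # from unauthorized disclosure.
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


def classify(level: Optional[Sensitivity]) -> Sensitivity:
    # Every item must carry a classification: material not deemed sensitive
    # is explicitly labeled UNCLASSIFIED instead of being left blank.
    return Sensitivity.UNCLASSIFIED if level is None else level


def may_read(clearance: Sensitivity, item: Sensitivity) -> bool:
    # Because the levels are ordered, an access check reduces to a comparison:
    # a reader may see items at or below their clearance.
    return clearance >= item
```

Using an ordered enum rather than free-text strings is one design choice that keeps comparisons between levels unambiguous.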
Data Classification: Explain Jurisdiction.
The geographic location of the source or storage point of the data might have significant bearing on how that data is treated and handled.
For instance, personally identifiable information (PII) collected from individuals in the European Union (EU) is subject to EU privacy law, which is much stricter and more comprehensive than privacy law in the United States.
Data Classification: Explain Criticality.
Data that is deemed critical to organizational survival might be classified in a manner distinct from trivial, basic operational data.
The Business Impact Analysis (BIA) helps determine which data should be classified this way.
What is Data Mapping?
Data that is shared between organizations (or sometimes even between departments) must be normalized and translated so that it is meaningful to both parties. This is typically referred to as data mapping.
When used in the context of classification efforts, mapping is necessary so that data that is known as sensitive (and in need of protection) in one system/organization is recognized as such by the receiving system/organization so that those protections can continue.
Without proper mapping efforts, data that is classified at a specific level might be exposed to undue risk or threats.
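Mapping classification labels between two organizations amounts to a translation table plus a rule for labels that have no match. The following is a minimal Python sketch, assuming hypothetical label vocabularies for a sender and a receiver; it fails closed, so an unmapped label gets the receiver's most restrictive level rather than losing its protection.

```python
# Hypothetical label vocabularies: the sender's levels on the left,
# the receiving organization's equivalents on the right.
SENDER_TO_RECEIVER = {
    "Public": "Open",
    "Internal": "Restricted",
    "Confidential": "Highly Restricted",
}
MOST_RESTRICTIVE = "Highly Restricted"


def map_classification(sender_label: str) -> str:
    # Translate the sender's label into the receiver's vocabulary.
    # Unknown or missing labels fail closed so protections are never
    # silently dropped when data crosses the boundary.
    return SENDER_TO_RECEIVER.get(sender_label, MOST_RESTRICTIVE)


def map_record(record: dict) -> dict:
    # Return a copy of the record with its classification translated.
    mapped = dict(record)
    mapped["classification"] = map_classification(record.get("classification", ""))
    return mapped
```

Failing closed is a design choice made for this sketch; an organization might instead reject unmapped records and route them for manual review.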
Do privacy-based regulations now require data mapping?
Yes!
An increasing number of privacy-based regulations now require data mapping. That means you may be legally required to identify data like **personally identifiable information (PII)** to meet compliance requirements.
Examples of laws that include data mapping requirements include the European Union’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act of 2018 (CCPA).
What is Data Labeling?
When the data owner creates, categorizes, and classifies the data, the data also needs to be labeled.
The label should indicate who the data owner is, usually in terms of the office or role instead of an individual name or identity (because, of course, personnel can change roles within an organization or leave for other organizations).
The label should take whatever form is necessary for it to be enduring, understandable, and consistent; for instance, while labels on data in hard copy might be printed headers and footers, labels on electronic files might be embedded in the filename or added as metadata.
What kinds of information could labels include?
- Date of creation
- Date of scheduled destruction/disposal
- Confidentiality level
- Handling directions
- Dissemination/distribution instructions
- Access limitations
- Source
- Jurisdiction
- Applicable regulation
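A label with the fields listed above can be stored as metadata on an electronic file. The following is a minimal Python sketch, assuming a hypothetical `DataLabel` record; the field names and the flattening in `as_metadata` are illustrative, not a standard labeling schema.

```python
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DataLabel:
    # The owner is recorded as an office or role, not a person, so the
    # label stays valid when staff change positions or leave.
    owner_role: str
    created: date
    confidentiality: str
    destroy_after: Optional[date] = None
    handling: str = ""
    distribution: str = ""
    access_limits: List[str] = field(default_factory=list)
    source: str = ""
    jurisdiction: str = ""
    regulations: List[str] = field(default_factory=list)

    def as_metadata(self) -> dict:
        # Flatten to string key/value pairs (e.g. for file or object metadata),
        # skipping fields that were left empty.
        out = {}
        for key, value in asdict(self).items():
            if value in (None, "", []):
                continue
            if isinstance(value, list):
                value = ",".join(value)
            out[key] = str(value)
        return out


# Illustrative label (hypothetical values).
example_label = DataLabel(
    owner_role="Billing Manager",
    created=date(2024, 1, 15),
    confidentiality="Confidential",
    regulations=["PCI DSS", "GLBA"],
)
```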
Explain Data Flow
In a simplified model of a subscription service, an account is created and data is sent to an analytics platform, where everything but the password is used to conduct data analysis and reporting for the organization.
At the same time, key user information is sent to an account renewal system that sends notifications when subscriptions are expiring. If the account is not renewed, the organization will use the canceled account process that removes some data from the active accounts list but is likely to retain things like the UserID and other data in case the subscriber chooses to resubscribe in the future.
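The three paths in this flow (analytics, renewal notices, canceled accounts) each pass a different subset of the account data downstream. The following is a minimal Python sketch, assuming a hypothetical account dictionary; the 30-day notice window and the retained fields (`user_id`, `email`) are assumptions for illustration.

```python
from datetime import date, timedelta


def to_analytics(account: dict) -> dict:
    # Everything except the password goes to the analytics platform.
    return {k: v for k, v in account.items() if k != "password"}


def renewal_notices(accounts: list, today: date, window_days: int = 30) -> list:
    # Return user IDs whose subscriptions expire within the (assumed) notice window.
    cutoff = today + timedelta(days=window_days)
    return [a["user_id"] for a in accounts if today <= a["expires"] <= cutoff]


def cancel_account(account: dict, retained=("user_id", "email")) -> dict:
    # Hypothetical retention rule: drop everything except the fields
    # needed if the subscriber later chooses to resubscribe.
    return {k: v for k, v in account.items() if k in retained}


# Illustrative account record (hypothetical values).
sample_account = {
    "user_id": "u1",
    "email": "a@example.com",
    "password": "not-a-real-hash",
    "expires": date(2024, 6, 10),
}
```

Writing each path as a separate filter makes it easy to see, and audit, exactly which fields leave the account system at each step.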
What are Data Discovery Methods?
Data discovery is a term that can be used to refer to several kinds of tasks. It might mean that the organization is attempting to create an initial inventory of the data it owns, or that the organization is involved in electronic discovery (e-discovery), the legal term for how electronic evidence is collected as part of an investigation or lawsuit. It can also mean the modern use of data mining tools to discover trends and relations in the data already in the organization’s inventory.
Name 3 Data Discovery Methods?
- Label-Based Discovery
- Metadata-Based Discovery
- Content-Based Discovery
What is Label-Based Discovery?
Obviously, the labels created by data owners will greatly aid any data discovery effort. With accurate and sufficient labels, the organization can readily determine what data it controls and what amounts of each kind.
This is another reason the habit and process of labeling is so important.
Labels can be especially useful when the discovery effort is undertaken in response to a mandate with a specific purpose, such as a court order or a regulatory demand: if all data related to X is required, and all such data is readily labeled, it is easy to select and disclose all the appropriate data, and only the appropriate data.
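When every item carries a label, responding to a request like "all data related to X" becomes a simple filter. The following is a minimal Python sketch, assuming a hypothetical inventory in which each item has a `labels` dictionary; the `matter` label key and its values are illustrative.

```python
from typing import Iterable, List


def discover_by_label(items: Iterable[dict], key: str, value: str) -> List[str]:
    # Select exactly the items whose label matches, and only those items;
    # unlabeled items are never returned.
    return [item["name"] for item in items if item.get("labels", {}).get(key) == value]


# Illustrative inventory (hypothetical names and labels).
labeled_inventory = [
    {"name": "q1.xlsx", "labels": {"matter": "Case-1234", "owner": "Finance"}},
    {"name": "memo.docx", "labels": {"owner": "Legal"}},
    {"name": "notes.txt"},
]
```

The unlabeled `notes.txt` shows the limitation: label-based discovery finds only what was labeled, which is why the labeling habit matters.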
What is Metadata-Based Discovery?
Colloquially referred to as “data about data,” metadata is a listing of traits and characteristics about specific data elements or sets.
Metadata is often automatically created at the same time as the data, typically by the hardware or software used to create the parent data.
Data discovery can therefore use metadata in the same way labels might be used; specific fields of the metadata might be scanned for particular terms and all matching data elements collected for a certain purpose.
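Scanning specific metadata fields for a term can be sketched in a few lines. The following minimal Python sketch assumes a hypothetical catalog of metadata records (as a document management system might expose them); the field names `author`, `title`, and `created_by_app` are illustrative. Only the named fields are searched, never the file content.

```python
from typing import Dict, Iterable, List


def discover_by_metadata(catalog: Iterable[Dict[str, str]],
                         fields: List[str], term: str) -> List[str]:
    # Scan only the named metadata fields for a case-insensitive term
    # and collect the IDs of every matching data element.
    term = term.lower()
    return [
        entry["id"]
        for entry in catalog
        if any(term in entry.get(f, "").lower() for f in fields)
    ]


# Illustrative metadata catalog (hypothetical values).
metadata_catalog = [
    {"id": "doc-1", "author": "Payroll Office", "title": "Q3 salaries", "created_by_app": "Excel"},
    {"id": "doc-2", "author": "Marketing", "title": "Campaign plan", "created_by_app": "Word"},
    {"id": "doc-3", "author": "payroll office", "title": "Bonus schedule", "created_by_app": "Excel"},
]
```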
Note
Labels are often a type of metadata, so it is important to remember that these discovery methods may overlap.
What is Content-Based Discovery?
Even without labels or metadata, discovery tools can be used to locate and identify specific kinds of data by delving into the content of datasets. This technique can be as basic as term searches or can use sophisticated pattern-matching technology.
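As an example of pattern matching beyond plain term search, a content scanner can look for payment card numbers and then confirm candidates with the Luhn checksum used in card numbers. The following is a minimal Python sketch; the regular expression (13 to 16 digits with optional space or hyphen separators) is a simplifying assumption, and real tools use more refined patterns.

```python
import re
from typing import List

# Assumed candidate pattern: 13-16 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(digits: str) -> bool:
    # Luhn checksum: from the right, double every second digit,
    # subtract 9 from any result above 9, and require a total divisible by 10.
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> List[str]:
    # Match candidates by pattern, then keep only those passing the checksum,
    # which filters out most arbitrary digit strings.
    found = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found
```

Combining a broad pattern with a validity check is a common way to reduce false positives in content-based discovery.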
What kinds of data exist?
- Structured Data
- Unstructured Data
- Semi-Structured Data
What is Structured Data?
Data that is organized according to meaningful, discrete types and attributes, such as data in a relational database, is said to be structured data.
What is Unstructured Data?
Unsorted data (such as the content of various emails in a user’s Sent folder, which could include discussions of any topic or contain all types of content) is considered unstructured data.
It is typically much easier to perform data discovery actions on structured data because that data is already situated and arranged.
What is Semi-Structured Data?
Semi-structured data uses tags or other elements to create fields and records within data without requiring the rigid structure that structured data relies on.
Examples of semi-structured data include XML (Extensible Markup Language) and JSON (JavaScript Object Notation), both of which provide flexible structures that still allow data descriptions.
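The difference from structured data shows up when records share tags but not a fixed schema. The following is a minimal Python sketch using the standard `json` module, assuming hypothetical customer records; the `field_paths` helper is illustrative and inventories every field, including nested ones, as a discovery tool might.

```python
import json

# Hypothetical records: each uses named fields (tags), but the set of fields
# varies per record and one record nests an address object.
raw = """
[
  {"id": 1, "name": "Ana", "email": "ana@example.com"},
  {"id": 2, "name": "Ben", "address": {"city": "Lyon", "country": "FR"}},
  {"id": 3, "name": "Cai", "notes": "Prefers phone contact"}
]
"""


def field_paths(obj: dict, prefix: str = ""):
    # Yield a dotted path for every field, descending into nested objects.
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from field_paths(value, path + ".")
        else:
            yield path


records = json.loads(raw)
all_fields = sorted({p for r in records for p in field_paths(r)})
```

No single record contains every field, yet each value is still described by its tag; a relational table would instead require one fixed set of columns for all rows.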
Data Discovery: What challenges can data location and structure create in the discovery process?
- **Laws and regulations** may limit the types or methods of discovery you can engage in, or what you can do with the data, as well as where and how you can store it.
- **Technical hurdles:** Data location can also create technical hurdles to discovery. If data is stored in unstructured form, or in a service that handles data in ways that make it challenging to conduct discovery, you may have to design around those constraints.
- **Costs:** Location can also have a bearing on costs because cloud ingress and egress costs can vary greatly, potentially affecting both where you process data and whether you transfer it or process it in place.
- **Unstructured data:** It will be far easier to conduct some types of discovery actions on structured data. Unstructured data with data embedded inside of it, like freeform text from customers, can require far more complex queries, which makes missed data more likely.
What are common types of data analytics methods?
- Data Mining
- Real Time
- Business Intelligence
What is Data Mining?
Data mining is the term for the family of activities from which the other methods on this list derive. This kind of data analysis grew out of the possibilities offered by regular use of the cloud, which makes it practical to collect and process the very large datasets known as “big data.”
When the organization has collected various data streams and can run queries across these various feeds, it can detect and analyze previously unknown trends and patterns that can be extremely useful.
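One classic data mining task is finding items that frequently occur together across many transactions (market-basket analysis). The following is a minimal Python sketch of that one technique, chosen here as an illustration rather than taken from the text; the basket data and the `min_support` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def frequent_pairs(baskets: Iterable[Iterable[str]],
                   min_support: int) -> List[Tuple[Tuple[str, str], int]]:
    # Count how often each pair of items appears together across transactions
    # and keep pairs seen at least `min_support` times, most frequent first.
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    kept = [(pair, n) for pair, n in counts.items() if n >= min_support]
    return sorted(kept, key=lambda x: (-x[1], x[0]))


# Illustrative transactions (hypothetical data).
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["milk", "bread"],
    ["butter", "bread", "milk"],
]
```

Here `frequent_pairs(baskets, 2)` surfaces bread with butter (3 times) and bread with milk (2 times), a pattern no single transaction reveals on its own.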
What is Real-Time Data Analytics?
Real-time analytics: in some cases, tools can provide data mining functionality concurrently with data creation and use. These tools rely on automation and must be efficient to perform properly.
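The efficiency requirement means each new event should be processed in constant time rather than by re-scanning history. The following is a minimal Python sketch, assuming a hypothetical metric stream; the `SlidingWindowMonitor` class, window size, and spike threshold are illustrative. It keeps a running mean over the last few readings and flags spikes as each value arrives.

```python
from collections import deque


class SlidingWindowMonitor:
    # Maintains a running mean over the last `size` readings with O(1) work
    # per event, so results are available as data arrives.
    def __init__(self, size: int, threshold: float):
        self.window = deque()
        self.size = size
        self.total = 0.0
        self.threshold = threshold  # assumed alert ratio relative to the running mean

    def update(self, value: float) -> bool:
        # Flag `value` if it exceeds threshold * mean of the current window
        # (computed before the new value is added).
        is_spike = bool(self.window) and value > self.threshold * (self.total / len(self.window))
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:
            self.total -= self.window.popleft()
        return is_spike

    @property
    def mean(self) -> float:
        return self.total / len(self.window) if self.window else 0.0


# Illustrative stream (hypothetical readings).
monitor = SlidingWindowMonitor(size=3, threshold=2.0)
flags = [monitor.update(v) for v in [10, 12, 11, 40, 12]]
```

Only the reading of 40 is flagged, and the monitor never revisits older data: the running total is adjusted as values enter and leave the window.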
Explain Business Intelligence?
Business intelligence: state-of-the-art data mining involves recursive, iterative tools and processes that can detect trends in trends and identify even more oblique patterns in both historical and recent data.