Explain Data Discovery Flashcards by Eduardo de Miranda

Generally speaking, there are several discovery approaches. The simplest one is looking at

metadata conditions, like the permissions of a file, to see if it’s accessible to all. This is a simple check that can help companies identify over-exposed objects.
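
As an illustration of a metadata-only check, the sketch below flags a file whose POSIX permission bits grant access to everyone ("other"). The function names are hypothetical and a POSIX file system is assumed; this is not a description of how any particular product performs the check.

```python
import os
import stat

def mode_is_world_accessible(mode: int) -> bool:
    """True if the permission bits grant read or write access to 'other'."""
    return bool(mode & (stat.S_IROTH | stat.S_IWOTH))

def is_world_accessible(path: str) -> bool:
    """Check a file using only its metadata; the contents are never read."""
    return mode_is_world_accessible(os.stat(path).st_mode)
```

A file with mode `0o644` would be flagged as over-exposed, while one with mode `0o600` would not.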

Another discovery approach goes into the data itself and is pattern-based, which

usually leverages regular expressions to find PII. You cannot use it to connect the data to its owner. You can, however, use it to see if a certain data source contains a certain type of data, and it produces significant value with very little effort.
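
A minimal sketch of pattern-based discovery is shown below. The two patterns (a simplified email address and a US Social Security number format) and the `classify` helper are illustrative assumptions, not a complete or production-grade PII rule set. Note that the result says which types of data are present, not whose data it is.

```python
import re

# Hypothetical, simplified patterns for illustration only.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def classify(text: str) -> set[str]:
    """Return the names of all patterns found at least once in the text."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}
```

For example, `classify("Reach me at jane.doe@example.com")` reports the `email` type without linking the address to any known person.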

Going one level up, we can leverage NLP (natural language processing) to

identify names, addresses, phone numbers and other contextual data. It also cannot connect data to its owner but would be enough for some legacy regulations.

The highest level of data discovery, which is the most complex to implement but also the most extensive, uses

smart value matching and machine learning to correlate entities with their data. This level is required for modern privacy regulations like GDPR and CCPA and is really the only way to fulfil use cases like DSAR and breach response.

On top of all discovery approaches, a company would greatly benefit from

a catalog/registry that shows all the results

from all those discovery levels in one place.

BigID Discovery Types - Correlation:

find connected and associated data to an entity or person

BigID Discovery Types - Classification:

locate specific format or type of data

BigID Discovery Types - Clustering:

find duplicate and related data by topic

BigID Discovery Types - Catalog:

metadata collection for fast PII catalog view

BigID Discovery Methods - Reference Set (IDSoR)

Discovery Algorithm? Value Matching

When? Scan

Correlated? Yes

BigID Discovery Methods - Enrichment

Discovery Algorithm? Proximity Analysis

When? Scan

Correlated? Yes

BigID Discovery Methods - Data Classification

Discovery Algorithm? Pattern Matching

When? Scan

Correlated? No

BigID Discovery Methods - Advanced Classification

Discovery Algorithm? Machine Learning (NLP)

When? Scan

Correlated? No

BigID Discovery Methods - Document Classification

Discovery Algorithm? Machine Learning

When? Scan

Correlated? No

BigID Discovery Methods - Subject Access Request

Discovery Algorithm? Index, Value Matching and Proximity

When? Report

Correlated? Yes

There are different ways in which BigID could identify personal data. The default method is

using value matching and leveraging correlation. For each data source one can also choose to enable enrichment and/or classification.

Machine learning is used in different areas of the platform:

Correlation
Cleansing the information we find
Advanced Classification
Document classification

The value matching method requires

the use of attributes from entity sources.

When the correlation process reveals unknown personal data (i.e. “dark data”), the BigID ML automatically correlates this data to an entity based on

parameters like uniqueness, proximity, frequency, etc., and then calculates the quality of the correlation using only metadata and not the private data itself.

BigID uses intelligent correlation algorithms utilizing entity sources to

understand basic identifiers, relationships, and distributions in other data stores.

Confidence levels are only calculated for

structured data, not unstructured data.

For unstructured data, we do rely on what we discovered during the initial scan because with unstructured data we

try not to sample

Value Matching Logic

Break the data field into segments (a segment starts at string start or a delimiter, and ends at string end or a delimiter).

A delimiter is a whitespace or a punctuation character.

Segments of 4 characters or less are ignored.

Perform case-insensitive match of the (as-is) entity field to each segment.
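
The value matching steps above can be sketched as follows. This is a minimal illustration, not BigID's implementation: the function names are hypothetical, and it assumes that "whitespace or punctuation" can be approximated by any non-alphanumeric character (including the underscore).

```python
import re

# Assumption: a delimiter is any run of non-word characters or underscores.
_DELIMITERS = re.compile(r"[\W_]+")
MIN_SEGMENT_LEN = 5  # segments of 4 characters or less are ignored

def segments(field_value: str) -> list[str]:
    """Split a data field on delimiters and drop segments that are too short."""
    return [s for s in _DELIMITERS.split(field_value) if len(s) >= MIN_SEGMENT_LEN]

def value_matches(field_value: str, entity_value: str) -> bool:
    """Case-insensitive match of the as-is entity value against each segment."""
    target = entity_value.casefold()
    return any(seg.casefold() == target for seg in segments(field_value))
```

With this sketch, the field `"Contact: John.Smith@example.com, NYC"` yields the segments `Contact`, `Smith` and `example`, so the entity value `smith` matches while `John` does not, because its segment is only 4 characters long.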

Enrichment is based on ____, and is only applicable to ____.

proximity - structured data.

Enrichment is an option that can be enabled for each data source. It enables BigID to

surface additional potential personal data findings that are highly identifiable in the proximity of attributes identified via value matching
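
To make the idea of proximity-based enrichment concrete, here is a rough sketch for a structured table. Everything in it is an assumption made for illustration: "proximity" is taken to mean a column within `window` positions of a value-matched column, and "highly identifiable" is taken to mean a high ratio of distinct values. The function and parameter names are hypothetical and do not reflect BigID's actual algorithm.

```python
def enrichment_candidates(rows, matched_columns, window=1, min_uniqueness=0.9):
    """Return indexes of columns near a matched column whose values are mostly distinct."""
    if not rows:
        return []
    candidates = []
    for col in range(len(rows[0])):
        if col in matched_columns:
            continue  # already identified via value matching
        near = any(abs(col - m) <= window for m in matched_columns)
        values = [row[col] for row in rows]
        uniqueness = len(set(values)) / len(values)  # share of distinct values
        if near and uniqueness >= min_uniqueness:
            candidates.append(col)
    return candidates

# Column 0 (email) was found by value matching; column 1 sits next to it
# and holds a distinct value per row, so it is surfaced as a candidate.
rows = [
    ("alice@example.com", "A-1001", "NY", "gold"),
    ("bob@example.com", "A-1002", "NY", "gold"),
    ("carol@example.com", "A-1003", "CA", "gold"),
]
print(enrichment_candidates(rows, matched_columns={0}))
```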

The results from data classification are not correlated to ____, but they will still show as standalone findings in the ____ on the BigID UI.

entities - inventory

Data classification results can also be reviewed using

the Scan Results Details report and the Entity Correlation page.

Data Subject Access Request is another method to identify and report more potential personal information beyond

the default reference set scan and Enrichment.

Discovery Flow

Configure Data Sources ➤ Configure Entity Sources ➤ Scan Entity Sources ➤ Scan Data Sources ➤ Correlate ➤ Scan Results ➤ Inventory ➤ Apps / Use Cases

Configure Entity Sources: set relevant data sources as entity sources, testing first (on the first 1000 records) to get initial identifiability, select

entity attributes, override sure match, and set sampling.

Explain Data Discovery Flashcards

(30 cards)