Explain Data Discovery Flashcards

1
Q

Generally speaking, there are several discovery approaches. The simplest one is looking at

A

metadata conditions, like the permissions of a file, to see if it’s accessible to all. This is a simple check that can help companies identify over-exposed objects.
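
As an illustration of this kind of metadata check, here is a minimal sketch that walks a directory and lists files whose POSIX "others" read bit is set. The function names `is_world_readable` and `find_over_exposed` are hypothetical, and the sketch assumes a POSIX file system; it is not how any particular product implements the check.

```python
import os
import stat


def is_world_readable(path: str) -> bool:
    """Return True if the 'others' read permission bit is set on the file."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IROTH)


def find_over_exposed(root: str) -> list[str]:
    """Walk `root` and collect files that any user on the system can read."""
    exposed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            if is_world_readable(full_path):
                exposed.append(full_path)
    return sorted(exposed)
```

Note that only metadata is inspected here; the file contents are never opened.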

2
Q

Another discovery approach goes into the data itself and is pattern-based, which

A

usually leverages regular expressions to find PII. You cannot use it to connect the data to its owner. You can however use it to see if a certain data source contains a certain type of data, and it produces significant value with very little effort.
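
A minimal sketch of pattern-based discovery, assuming two simplified, illustrative patterns (an email address and a US Social Security number in `NNN-NN-NNNN` form). The `PATTERNS` table and the `classify` function are hypothetical names, and real classifiers would need far more robust expressions and validation.

```python
import re

# Illustrative patterns only; production classifiers are considerably stricter.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def classify(text: str) -> dict[str, int]:
    """Count matches per data type found in the text.

    The result says which types of data are present, not whose data it is.
    """
    counts = {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}
    return {name: n for name, n in counts.items() if n > 0}
```

The output tells you that a source contains, say, email addresses, but nothing in it links a finding to a specific person, which is the limitation the card describes.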

3
Q

Going one level up, we can leverage NLP (natural language processing) to

A

identify names, addresses, phone numbers and other contextual data. It also cannot connect data to its owner but would be enough for some
legacy regulations.

4
Q

The highest level of data discovery, which is the most complex to implement but also the most extensive, uses

A

smart value matching and machine learning to correlate entities with their data. This level is required for modern privacy regulations like GDPR and CCPA and is really the only way to fulfil use cases like DSAR and breach response.

5
Q

On top of all discovery approaches, a company would greatly benefit from

A

a catalog/registry that shows all the results from all those discovery levels in one place.

6
Q

BigID Discovery Types - Correlation:

A

find connected and associated data to an entity or person

7
Q

BigID Discovery Types - Classification:

A

locate specific format or type of data

8
Q

BigID Discovery Types - Clustering:

A

find duplicate and related data by topic

9
Q

BigID Discovery Types - Catalog:

A

metadata collection for fast PII catalog view

10
Q

BigID Discovery Methods - Reference Set (IDSoR)

A

Discovery Algorithm? Value Matching

When? Scan

Correlated? Yes

11
Q

BigID Discovery Methods - Enrichment

A

Discovery Algorithm? Proximity Analysis

When? Scan

Correlated? Yes

12
Q

BigID Discovery Methods - Data Classification

A

Discovery Algorithm? Pattern Matching

When? Scan

Correlated? No

13
Q

BigID Discovery Methods - Advanced Classification

A

Discovery Algorithm? Machine Learning (NLP)

When? Scan

Correlated? No

14
Q

BigID Discovery Methods - Document Classification

A

Discovery Algorithm? Machine Learning

When? Scan

Correlated? No

15
Q

BigID Discovery Methods - Subject Access Request

A

Discovery Algorithm? Index, Value Matching and Proximity

When? Report

Correlated? Yes

16
Q

There are different ways in which BigID could identify personal data. The default
method is

A

using value matching and leveraging correlation. For each data source one
can also choose to enable enrichment and/or classification.

17
Q

Machine learning is used in different areas of the platform:

A
  • Correlation
  • Cleansing the information we find
  • Advanced Classification
  • Document classification
18
Q

The value matching method requires

A

the use of attributes from entity sources.

19
Q

When the correlation process reveals unknown personal data (i.e. “dark data”), the BigID ML automatically correlates this data to an entity based on

A

parameters like uniqueness, proximity, frequency, etc., and then calculates the quality of the correlation using only metadata and not the private data itself.

20
Q

BigID uses intelligent correlation algorithms utilizing entity sources to

A

understand basic identifiers, relationships, and distributions in other data stores.

21
Q

Confidence levels are only calculated for

A

structured data, not unstructured data.

22
Q

For unstructured data, we do rely on what we discovered during the initial scan because with unstructured data we

A

try not to sample

23
Q

Value Matching Logic

A

Break the data field into segments (a segment starts at string start or a delimiter, and ends at string end or a delimiter).

A delimiter is a whitespace or a punctuation character.

Segments of 4 characters or less are ignored.

Perform case-insensitive match of the (as-is) entity field to each segment.
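
The rules above can be sketched roughly as follows. This is an illustrative reading, not BigID's actual implementation: the names `segments` and `value_matches` are hypothetical, "punctuation" is assumed to mean the ASCII set in Python's `string.punctuation`, and "match" is assumed to mean whole-segment equality.

```python
import re
import string

# Assumption: a delimiter is any run of whitespace or ASCII punctuation.
_DELIMITERS = re.compile(r"[\s" + re.escape(string.punctuation) + r"]+")


def segments(data_field: str) -> list[str]:
    """Split a data field on delimiters, ignoring segments of 4 characters or less."""
    return [part for part in _DELIMITERS.split(data_field) if len(part) > 4]


def value_matches(entity_value: str, data_field: str) -> bool:
    """Case-insensitively compare the as-is entity value to each segment."""
    target = entity_value.casefold()
    return any(segment.casefold() == target for segment in segments(data_field))
```

For example, with the field `"Order for Johnson, acct AB123456"` the surviving segments are `Order`, `Johnson`, and `AB123456`, so the entity value `johnson` matches while `acct` does not. Under this reading, an entity value that itself contains a delimiter could never equal a single segment.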

24
Q

Enrichment is based on ____ , and is only applicable to ____.

A

proximity - structured data.

25
Q

Enrichment is an option that can be enabled for each data source. It enables BigID to

A

surface additional potential personal data findings that are highly identifiable in the proximity of attributes identified via value matching

26
Q

The results from data classification are not correlated to ____, but they will still
show as standalone findings in the ____ on the BigID UI.

A

entities - inventory

27
Q

Data classification results can also be reviewed using

A

the Scan Results Details report and the Entity Correlation page.

28
Q

Data Subject Access Request is another method to identify and report more potential
personal information beyond

A

the default reference set scan and Enrichment.

29
Q

Discovery Flow

A

Configure Data Sources ➤ Configure Entity Sources ➤ Scan Entity Sources ➤ Scan Data Sources ➤ Correlate ➤ Scan Results ➤ Inventory ➤ Apps / Use Cases

30
Q

To configure entity sources, set the relevant data sources as entity sources, test first (on the first 1000 records) to get initial identifiability, select

A

entity attributes, override sure match, and set sampling.