Explain Data Discovery Flashcards
Generally speaking, there are several discovery approaches. The simplest one looks at metadata conditions, such as a file's permissions, to see whether it is accessible to everyone. This simple check helps companies identify over-exposed objects.
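A minimal sketch of this metadata-only check follows. It assumes files on a POSIX file system, where "accessible to all" is approximated by the world-readable ("other" read) permission bit. The function name `find_world_readable` is hypothetical, and real connectors would read each data source's own ACLs instead. The point is that no file contents are opened.

```python
import os
import stat


def find_world_readable(root: str) -> list[str]:
    """Return paths under `root` whose mode bits let any user read them.

    Only metadata (POSIX permission bits) is inspected; file contents are never read.
    """
    exposed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.stat(path).st_mode
            except OSError:
                continue  # broken symlink or file removed mid-scan
            # S_IROTH is the "read by others" bit, i.e. readable by everyone.
            if mode & stat.S_IROTH:
                exposed.append(path)
    return sorted(exposed)
```

For example, `find_world_readable("/srv/share")` would list every file in that (hypothetical) share that anyone on the system can read.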
Another discovery approach goes into the data itself and is pattern-based, usually leveraging regular expressions to find PII. It cannot connect the data to its owner. It can, however, show whether a given data source contains a given type of data, and it delivers significant value for very little effort.
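A minimal sketch of pattern-based discovery follows. It assumes two illustrative regular expressions, one for an email address and one for the US Social Security number format. The pattern set and `classify_text` are hypothetical, and production classifiers use far more careful rules. The output says which types of data a source holds, not whom they belong to.

```python
import re

# Illustrative patterns only (hypothetical); real rule sets are much stricter.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def classify_text(text: str) -> dict[str, int]:
    """Count matches per PII type: tells *what kind* of data is present, not *whose* it is."""
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        n = len(pattern.findall(text))
        if n:
            counts[label] = n
    return counts
```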
Going one level up, we can leverage NLP (natural language processing) to identify names, addresses, phone numbers and other contextual data. It also cannot connect data to its owner, but it can be enough for some legacy regulations.
The highest level of data discovery, which is the most complex to implement but also the most extensive, uses smart value matching and machine learning to correlate entities with their data. This level is required for modern privacy regulations like GDPR and CCPA, and it is really the only way to fulfil use cases like DSAR (data subject access request) and breach response.
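A minimal sketch of the value-matching part of correlation follows. It assumes a hypothetical reference set of known identities. The names `REFERENCE_SET`, `build_value_index` and `correlate` are illustrative, not BigID's API. It only does exact matching on normalized values, and the machine-learning side of correlation is not shown. Unlike pattern matching, every hit is tied back to a specific person.

```python
from collections import defaultdict

# Hypothetical reference set: known identity attributes per person (entity).
REFERENCE_SET = [
    {"id": "P1", "email": "jane.doe@example.com", "phone": "5550100"},
    {"id": "P2", "email": "john.roe@example.com", "phone": "5550199"},
]


def normalize(value: str) -> str:
    # Simple normalization so "Jane.Doe@Example.com " matches the reference value.
    return value.strip().lower()


def build_value_index(reference_set):
    """Map each normalized identity value to the entity ids that own it."""
    index = defaultdict(set)
    for person in reference_set:
        for attr, value in person.items():
            if attr != "id":
                index[normalize(value)].add(person["id"])
    return index


def correlate(records, index):
    """Return {entity_id: [(source, field), ...]} for every field value found in the index."""
    findings = defaultdict(list)
    for source, record in records:
        for field, value in record.items():
            for entity_id in index.get(normalize(str(value)), ()):
                findings[entity_id].append((source, field))
    return dict(findings)
```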
On top of all these discovery approaches, a company would greatly benefit from a catalog/registry that shows the results of every discovery level in one place.
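A minimal sketch of such a catalog follows. It assumes each discovery level emits simple findings tagged with their source and level. The `Finding` and `Catalog` types and the level names are hypothetical. The idea is a single per-source view that merges metadata, pattern, NLP and correlation results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    source: str  # data source, e.g. a table or file share (hypothetical names)
    level: str   # discovery level that produced it: "metadata", "pattern", "nlp", "correlation"
    detail: str  # what was found


class Catalog:
    """Collects findings from every discovery level into one per-source view."""

    def __init__(self):
        self._by_source = defaultdict(list)

    def add(self, finding: Finding) -> None:
        self._by_source[finding.source].append(finding)

    def view(self, source: str) -> dict[str, list[str]]:
        """Group one source's findings by discovery level."""
        grouped = defaultdict(list)
        for f in self._by_source.get(source, []):
            grouped[f.level].append(f.detail)
        return dict(grouped)
```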
BigID Discovery Types - Correlation:
find data connected to and associated with an entity or person
BigID Discovery Types - Classification:
locate specific format or type of data
BigID Discovery Types - Clustering:
find duplicate and related data by topic
BigID Discovery Types - Catalog:
metadata collection for fast PII catalog view
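The clustering card can be illustrated with a rough sketch that swaps in a simple technique: Jaccard similarity over word sets with greedy grouping. This is an assumption for illustration, not how BigID actually clusters data. The 0.5 threshold and all names are hypothetical. Near-duplicate files end up in the same group.

```python
import re


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of distinct words two documents have in common."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(docs: dict[str, str], threshold: float = 0.5) -> list[list[str]]:
    """Greedy single pass: a doc joins the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed_tokens, [doc names])
    for name, text in docs.items():
        t = tokens(text)
        for seed, members in clusters:
            if jaccard(seed, t) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((t, [name]))
    return [members for _seed, members in clusters]
```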
BigID Discovery Methods - Reference Set (IDSoR)
Discovery Algorithm? Value Matching
When? Scan
Correlated? Yes
BigID Discovery Methods - Enrichment
Discovery Algorithm? Proximity Analysis
When? Scan
Correlated? Yes
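The proximity idea behind enrichment can be sketched as follows, under a stated assumption. Once a value has been correlated to a person, other sensitive values found within a few tokens of it are attributed to that same person. The anchor map, the SSN pattern, the window of 5 tokens and `enrich` are all hypothetical. This is not BigID's actual algorithm.

```python
import re

SSN = re.compile(r"\d{3}-\d{2}-\d{4}")  # illustrative sensitive-value pattern


def enrich(text: str, anchors: dict[str, str], window: int = 5) -> dict[str, list[str]]:
    """Attribute SSN-like tokens to the nearest already-correlated value within `window` tokens.

    `anchors` maps an identity value (e.g. an email) to the entity id it was correlated to.
    """
    words = [w.strip(".,;:()") for w in text.split()]
    anchor_pos = [(i, anchors[w]) for i, w in enumerate(words) if w in anchors]
    found = {}
    for i, w in enumerate(words):
        if not SSN.fullmatch(w):
            continue
        near = [(abs(i - j), entity) for j, entity in anchor_pos if abs(i - j) <= window]
        if near:
            _dist, entity = min(near)  # nearest anchor wins
            found.setdefault(entity, []).append(w)
    return found
```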
BigID Discovery Methods - Data Classification
Discovery Algorithm? Pattern Matching
When? Scan
Correlated? No
BigID Discovery Methods - Advanced Classification
Discovery Algorithm? Machine Learning (NLP)
When? Scan
Correlated? No
BigID Discovery Methods - Document Classification
Discovery Algorithm? Machine Learning
When? Scan
Correlated? No
BigID Discovery Methods - Subject Access Request
Discovery Algorithm? Index, Value Matching and Proximity
When? Report
Correlated? Yes
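The Subject Access Request card can be illustrated with a minimal sketch. It assumes an inverted index is built during the scan and queried by value when the report is generated. Returning the whole matching row is a crude same-row stand-in for proximity. All names (`build_index`, `sar_report`) and the record layout are hypothetical.

```python
from collections import defaultdict


def normalize(v) -> str:
    return str(v).strip().lower()


def build_index(records):
    """Scan time: map every normalized field value to the (source, row_id) locations holding it."""
    index = defaultdict(set)
    for source, row_id, row in records:
        for value in row.values():
            index[normalize(value)].add((source, row_id))
    return index


def sar_report(subject_values, index, records):
    """Report time: value-match the subject's identity values against the index and return
    each matching row, including the other fields stored alongside the match."""
    rows = {(source, row_id): row for source, row_id, row in records}
    hits = set()
    for value in subject_values:
        hits |= index.get(normalize(value), set())
    return {loc: rows[loc] for loc in sorted(hits)}
```

Building the index at scan time is what lets the report be produced on demand without rescanning every source.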