Domain 2 - Cloud Data Security Flashcards

Question

What is data obfuscation?

Answer 1

- Similar to data masking/de-identification of data.
- Used when sensitive data needs to be used in a different context, e.g. obfuscated data from production used on test systems (to simulate real-world data).

Answer 2

- Substitution - swap some information for other information.
- Shuffling - rearrange the data, e.g. change "Chris" to "Hrisc".
- Value variance - add or subtract an amount, e.g. +/- $1,000.
- Deletion or nullification - change values to nulls.
- Encryption.

Note: it is important to maintain data integrity when obfuscating (e.g. credit card numbers must still be 16-digit numbers).
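A minimal sketch of several of these techniques applied to one record. The record, field names, and replacement table are hypothetical; encryption is left out to keep the example stdlib-only. Note that the obfuscated balance stays an integer, in line with the integrity note above.

```python
import random

# Hypothetical customer record; field names are illustrative only.
record = {"name": "Chris", "city": "Boston", "balance": 5200, "ssn": "123-45-6789"}

def substitute(value, replacements):
    """Substitution: swap the real value for a plausible stand-in."""
    return replacements.get(value, value)

def shuffle_chars(value, rng):
    """Shuffling: rearrange the characters (e.g. 'Chris' -> 'Hrisc')."""
    chars = list(value)
    rng.shuffle(chars)
    return "".join(chars).capitalize()

def vary(amount, rng, spread=1000):
    """Value variance: add or subtract up to `spread` (here $1,000)."""
    return amount + rng.randint(-spread, spread)

def obfuscate(rec, seed=0):
    # Seeded only so test data is repeatable; builds a new dict, original untouched.
    rng = random.Random(seed)
    return {
        "name": shuffle_chars(rec["name"], rng),
        "city": substitute(rec["city"], {"Boston": "Springfield"}),
        "balance": vary(rec["balance"], rng),  # still an int: format preserved
        "ssn": None,                           # nullification
    }

print(obfuscate(record))
```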

Answer 3

Obfuscating data in a way that is meant to be reversed later, e.g. substituting an index value for PII before storing it in the cloud.
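A minimal sketch of reversible substitution, assuming a hypothetical lookup table (`PseudonymVault`) kept on-premises while only the index values are stored in the cloud. Whoever holds the table can reverse the substitution; the cloud copy alone cannot.

```python
import secrets

class PseudonymVault:
    """Hypothetical on-premises lookup table; only tokens leave the organization."""

    def __init__(self):
        self._to_token = {}
        self._to_value = {}

    def pseudonymize(self, value):
        # Reuse the same index value for repeated inputs so records stay joinable.
        if value not in self._to_token:
            token = "ID-" + secrets.token_hex(8)
            self._to_token[value] = token
            self._to_value[token] = value
        return self._to_token[value]

    def reidentify(self, token):
        # Reversal is only possible with access to the vault.
        return self._to_value[token]

vault = PseudonymVault()
cloud_row = {"patient": vault.pseudonymize("Jane Doe"), "visit_reason": "checkup"}
print(cloud_row)
```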

Answer 4

- Direct identifiers - e.g. a name or identification number.
- Indirect identifiers - information that can be combined with other information to identify a person (e.g. demographic information and shopping history).

Answer 5

Also known as anonymization: removing direct and indirect PII identifiers from data sets. Unlike pseudonymization, this de-identification is not meant to be reversed.
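A minimal sketch, assuming a hypothetical list of direct-identifier fields: direct identifiers are dropped and indirect ones (ZIP code, age) are generalized. No mapping table is kept, so there is nothing to reverse. Generalization reduces, but does not eliminate, re-identification risk.

```python
# Hypothetical field list; a real project derives this from its data inventory.
DIRECT_IDS = {"name", "ssn", "email"}

def anonymize(record):
    """Drop direct identifiers and coarsen indirect ones (generalization)."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDS}
    if "zip" in out:
        out["zip"] = out["zip"][:3] + "**"   # keep only the region
    if "age" in out:
        low = (out["age"] // 10) * 10
        out["age"] = f"{low}-{low + 9}"    # 10-year age band
    return out

row = {"name": "Chris", "ssn": "123-45-6789", "email": "c@example.com",
       "zip": "02139", "age": 37, "purchases": 12}
print(anonymize(row))  # {'zip': '021**', 'age': '30-39', 'purchases': 12}
```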

Answer 6

- Data discovery by DLP tools for inventory.
- Business intelligence tools that analyze data for trends, etc. (e.g. understanding trends to build inventory).

Answer 7

- Data lake - unstructured data/blobs
- Data warehouse - structured or normalized data

Answer 8

Extract, Transform, Load (ETL): data from multiple disparate sources is normalized and loaded into a data warehouse. ETL improves searchability.
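A minimal ETL sketch: two hypothetical sources (a CSV export and a JSON feed) with different field names and casing are normalized into one schema and loaded into an in-memory SQLite table standing in for the data warehouse.

```python
import csv
import io
import json
import sqlite3

# Two hypothetical sources describing customers in different formats.
crm_csv = "CustomerName,Country\nAlice,us\nBob,DE\n"
web_json = '[{"user": "carol", "country_code": "fr"}]'

def extract():
    rows = list(csv.DictReader(io.StringIO(crm_csv)))
    events = json.loads(web_json)
    return rows, events

def transform(rows, events):
    # Normalize both sources into one schema: (name, country), consistent casing.
    unified = [(r["CustomerName"].title(), r["Country"].upper()) for r in rows]
    unified += [(e["user"].title(), e["country_code"].upper()) for e in events]
    return unified

def load(records, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS customers (name TEXT, country TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load(transform(*extract()), conn)
print(conn.execute("SELECT name, country FROM customers ORDER BY name").fetchall())
# [('Alice', 'US'), ('Bob', 'DE'), ('Carol', 'FR')]
```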

Answer 9

Data that has been warehoused, analyzed and made available for specific use by a particular business unit.

Answer 10

Involves discovering, analyzing, and extracting patterns in data.

Answer 11

Online Analytical Processing (OLAP): analytics that include consolidation, drill-down, and slice-and-dice functions. E.g. security incidents require forensic analysis that uses OLAP to extract relevant information from log files.
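A minimal in-memory sketch of the OLAP operations over hypothetical security log events, with day, host, and event type as the dimensions. A real deployment would use an OLAP engine or a SIEM's query language; `collections.Counter` is used here only to show the idea.

```python
from collections import Counter

# Hypothetical security log events: (day, host, event_type)
events = [
    ("2024-05-01", "web01", "login_failure"),
    ("2024-05-01", "web01", "login_failure"),
    ("2024-05-01", "db01", "login_success"),
    ("2024-05-02", "web01", "login_failure"),
    ("2024-05-02", "db01", "privilege_change"),
]

def roll_up(evts):
    """Consolidation: totals per event type across all days and hosts."""
    return Counter(t for _, _, t in evts)

def drill_down(evts, event_type):
    """Drill-down: break one event type out by day and host."""
    return Counter((d, h) for d, h, t in evts if t == event_type)

def slice_(evts, day):
    """Slice: fix one dimension (day). Trailing _ avoids shadowing built-in slice."""
    return [e for e in evts if e[0] == day]

def dice(evts, days, hosts):
    """Dice: restrict several dimensions at once to get a sub-cube."""
    return [e for e in evts if e[0] in days and e[1] in hosts]

print(roll_up(events)["login_failure"])   # 3
print(drill_down(events, "login_failure"))
print(len(slice_(events, "2024-05-02")))  # 2
```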

Answer 12

AI/ML systems learn from data sets; machine learning improves computer algorithms through experience, using training data.

Answer 13

- Has a consistent format (the same information in the same place), e.g. a database record.
- Data may also be structured using markup or serialization formats such as XML, JSON, or YAML.
- Data schema or data model - an abstract view of the data in a system.
- PII/PHI is easier to identify and protect if data is structured.
- Metadata is data that describes data.
- Semantics is the meaning of the data; it is described in the data schema or data model.
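A minimal sketch of why structure helps: a hypothetical schema carries metadata flags marking PII/PHI fields, so the fields to protect are found from the schema rather than by inspecting values. The schema layout and flag names are assumptions for illustration.

```python
import json

# Hypothetical schema (data model) whose metadata marks PII/PHI fields.
SCHEMA = {
    "patient_id": {"type": "string", "pii": True},
    "name": {"type": "string", "pii": True},
    "blood_type": {"type": "string", "phi": True},
    "visit_count": {"type": "integer"},
}

def sensitive_fields(schema):
    """Use the schema's metadata, not the values, to find fields to protect."""
    return sorted(f for f, meta in schema.items() if meta.get("pii") or meta.get("phi"))

record = json.loads('{"patient_id": "P-001", "name": "Chris", "blood_type": "O+", "visit_count": 4}')
to_protect = set(sensitive_fields(SCHEMA))
protected = {k: ("<redacted>" if k in to_protect else v) for k, v in record.items()}
print(protected)
```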

Answer 14

Information stored without following a common format. Data labeling is one approach to dealing with such data; content analysis is another.

Answer 15

- Pattern matching - e.g. known credit card or SSN formats.
- Lexical analysis - may be suited to email/IMs.
- Hashing - identifies known data, e.g. system files; compares the hash of data with known hashes of sensitive files.
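A minimal sketch of two of these content-analysis methods. The regular expressions are illustrative assumptions (US SSN format, 16-digit card numbers) and a Luhn checksum is added to cut false positives; the "known sensitive file" is hypothetical. Lexical analysis is not shown.

```python
import hashlib
import re

# Pattern matching: illustrative patterns for SSNs and 16-digit card numbers.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){15}\d\b")

def luhn_ok(number):
    """Luhn checksum: reduces false positives on card-number matches."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_patterns(text):
    hits = [("ssn", m.group()) for m in SSN_RE.finditer(text)]
    hits += [("card", m.group()) for m in CARD_RE.finditer(text) if luhn_ok(m.group())]
    return hits

# Hashing: compare content digests against digests of known sensitive files.
KNOWN_SENSITIVE = {hashlib.sha256(b"Q3 board minutes - confidential").hexdigest()}

def is_known_sensitive(data: bytes):
    return hashlib.sha256(data).hexdigest() in KNOWN_SENSITIVE

print(find_patterns("SSN 123-45-6789, card 4111 1111 1111 1111"))
```

Hash matching only finds exact copies: changing a single byte of a file produces a different digest.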

Answer 16

- The act of forming classes or groups by identifying common attributes.
- Data classification determines the controls/protection level needed, e.g. public data can be left unencrypted, but "Internal Use Only" data may need to be encrypted.
- Speeds up security decision-making by applying risk-based security controls according to classification.

Answer 17

- Data type - PII, financial, PHI, legal, educational, etc.; determines which regulations apply.
- Legal constraints - e.g. GDPR for data on individuals in the EU.
- Ownership - shared ownership of data may constrain how the data is used.
- Value/criticality - how loss of that data would affect an organization's operations.
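A minimal sketch of how criteria like these might drive a classification level, which in turn selects controls. The levels, rules, and control flags are hypothetical; each organization defines its own scheme, and legal constraints and ownership are not modeled here.

```python
# Hypothetical scheme: levels, rules, and controls are illustrative assumptions.
REGULATED_TYPES = {"PII", "PHI", "Financial"}

CONTROLS = {
    "Public": {"encrypt_at_rest": False, "dlp_monitoring": False},
    "Internal Use Only": {"encrypt_at_rest": True, "dlp_monitoring": False},
    "Confidential": {"encrypt_at_rest": True, "dlp_monitoring": True},
}

def classify(data_type, criticality, public_release=False):
    """Apply the data-type and value/criticality criteria to pick a level."""
    if data_type in REGULATED_TYPES or criticality == "high":
        return "Confidential"
    if public_release:
        return "Public"
    return "Internal Use Only"

level = classify("PHI", "low")
print(level, CONTROLS[level])
```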

Answer 18

The application of scientific methods and algorithms to identify knowledge or useful information from data sets.

Answer 19

Identifying data and mapping its location. Most DLP tools do this (scanning IP addresses and domains, reading tags, etc.).

Answer 20

1. Hard copy materials - labeled with a printed watermark.
2. Physical assets (disk drives, servers, removable media, etc.). Risks: the content of media may change; found assets may be difficult to label (they cannot be read, or there is a danger of malware).
3. Digital files - labeled with metadata, a digital watermark, or a footer showing the classification.
4. Complex or shared systems - require training and reference materials.

DLP tools are the biggest consumers of labels.

Answer 21

1. Compliance requirements inherent at various classification levels
2. Data retention and disposal requirements
3. What is considered sensitive or regulated data
4. Appropriate or approved uses of data
5. Access control and authorization
6. Encryption needs

Answer 22

Digital or Information Rights Management (DRM/IRM) controls access and usage rights when data is meant to be shared but not freely distributed. Two types:
a) Consumer-grade IRM, also known as DRM - e.g. copyrighted material such as movies and music; DRM tools prevent copying.
b) Enterprise-grade IRM - e.g. images/documents; also enforces usage restrictions and access rights (e.g. only users with a signed URL).
IRM/DRM tools may use ACLs to enforce restrictions.
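A minimal sketch of a signed URL with an expiry, assuming a hypothetical secret held by the IRM or storage service. Cloud object stores have their own signed-URL mechanisms with different formats; this generic HMAC version only shows the idea: the signature covers the path, user, and expiry, so changing any of them invalidates the link.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-only-secret"  # hypothetical key held by the IRM/storage service

def _sign(path, user, expires):
    msg = f"{path}|{user}|{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def sign_url(path, user, ttl_seconds, now=None):
    expires = int((time.time() if now is None else now) + ttl_seconds)
    query = urlencode({"user": user, "expires": expires, "sig": _sign(path, user, expires)})
    return f"{path}?{query}"

def verify_url(url, now=None):
    parts = urlparse(url)
    q = {k: v[0] for k, v in parse_qs(parts.query).items()}
    expected = _sign(parts.path, q["user"], q["expires"])
    if not hmac.compare_digest(expected, q["sig"]):
        return False  # tampered path, user, or expiry
    return (time.time() if now is None else now) < int(q["expires"])

url = sign_url("/docs/plan.pdf", "alice", 3600, now=1000)
print(url)
```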

Answer 23

1. Persistence - ACLs must follow the data, e.g. a password to open a document.
2. Dynamic policy control - the owner can update policies/restrictions even after sharing the data; requires a connection back to an IRM server.
3. Expiration - time-limited access (e.g. a movie playable for only 24 hours).
4. Continuous audit trail - for accountability.
5. Interoperability - across different databases, operating systems, email servers, etc.

Answer 24

1. Discretionary Access Control (DAC) - the owner defines access restrictions per document or data set.
2. Mandatory Access Control (MAC) - the owner assigns a classification level and the IRM system does the rest, e.g. all sensitive data is viewable only by users with the required clearance.
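A minimal sketch contrasting the two models. Under DAC the owner maintains the reader list; under MAC the owner only sets a label and the system compares it with each user's clearance. Document names, users, and the level ordering are hypothetical.

```python
LEVELS = {"Public": 0, "Internal": 1, "Secret": 2}  # hypothetical ordering

# DAC: the owner lists who may read each document.
dac_acl = {"roadmap.docx": {"owner": "alice", "readers": {"alice", "bob"}}}

def dac_can_read(user, doc):
    return user in dac_acl[doc]["readers"]

def dac_share(owner, doc, user):
    # Only the owner may change the ACL (the "discretionary" part).
    if dac_acl[doc]["owner"] != owner:
        raise PermissionError("only the owner can share")
    dac_acl[doc]["readers"].add(user)

# MAC: the owner sets a label; the system enforces it against clearances.
mac_labels = {"merger.xlsx": "Secret"}
clearances = {"alice": "Secret", "bob": "Internal"}

def mac_can_read(user, doc):
    return LEVELS[clearances[user]] >= LEVELS[mac_labels[doc]]

print(dac_can_read("bob", "roadmap.docx"), mac_can_read("bob", "merger.xlsx"))
```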

Answer 25

- Organization's operational needs (e.g. to support analysis for business intelligence).
- Compliance (e.g. retention of PII and financial records).

Data retention must balance organizational needs with cost. The cloud is generally used as a backup location and offers different storage options that trade off retrieval speed against cost.

Answer 26

- Schedules - how long data is to be retained.
- Integrity checking - ensuring that data remains readable and has not been altered by environmental factors (e.g. media degradation); uses hashing.
- Retrieval procedures - who gets access to the data (e.g. live data is needed by more users than archived data).
- Data formats - ensuring that archived data can still be read (file formats and hardware change over time).
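A minimal sketch of hash-based integrity checking for archived data: a SHA-256 digest is recorded when each object is archived and recomputed during periodic checks. The archive contents are hypothetical. Hashing detects alteration but cannot repair it; a mismatch means restoring from another copy, and the manifest itself must be protected from tampering.

```python
import hashlib

def sha256_digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# At archive time: record a digest for every object (hypothetical contents).
archive = {"2019-ledger.csv": b"id,amount\n1,100\n"}
manifest = {name: sha256_digest(blob) for name, blob in archive.items()}

def verify_archive(archive, manifest):
    """Periodic check: list objects whose content no longer matches its digest."""
    return [name for name, blob in archive.items() if sha256_digest(blob) != manifest.get(name)]

print(verify_archive(archive, manifest))              # []
archive["2019-ledger.csv"] = b"id,amount\n1,900\n"    # simulated corruption
print(verify_archive(archive, manifest))              # ['2019-ledger.csv']
```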

Answer 27

1. Clear - remove data from user-addressable storage (e.g. delete files, empty the trash bin). Renders data invisible but still recoverable with tools; only for non-sensitive data.
2. Purge - overwrite drives with dummy data or change their physical state (e.g. magnetic degaussing). An expensive operation that may also degrade the physical media (e.g. up to 35 overwrite passes). Data may still be recoverable with very specialized tools. Crypto-shredding is an alternative.
3. Destroy - pulverization/melting; not possible for customers to do in the cloud, so more appropriate for CSPs.
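A minimal sketch of crypto-shredding: each data set is encrypted under its own key, and sanitization destroys only the key. To stay stdlib-only, the sketch uses a one-time-pad XOR with a random key as long as the data; real systems would use AES keys held in a KMS or HSM. `KeyStore` is a hypothetical stand-in for such a service.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    # One-time pad: secure only if the key is random, as long as the data, and never reused.
    return bytes(d ^ k for d, k in zip(data, key))

class KeyStore:
    """Hypothetical stand-in for a KMS: one key per data set."""

    def __init__(self):
        self._keys = {}

    def new_key(self, dataset, length):
        self._keys[dataset] = secrets.token_bytes(length)
        return self._keys[dataset]

    def get(self, dataset):
        return self._keys[dataset]

    def shred(self, dataset):
        del self._keys[dataset]  # crypto-shredding: destroy only the key

kms = KeyStore()
plaintext = b"customer records"
ciphertext = xor_bytes(plaintext, kms.new_key("tenant-42", len(plaintext)))
assert xor_bytes(ciphertext, kms.get("tenant-42")) == plaintext
kms.shred("tenant-42")
# The ciphertext may remain on CSP disks and backups, but it can no longer be decrypted.
```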

Answer 28

The normal retention schedule (i.e. scheduled deletion) is suspended indefinitely: the data must be retained even after, e.g., a seven-year retention period has passed. Challenges with legal hold:
a) identifying which data to hold
b) exception procedures so that the held data is not deleted
c) ensuring supporting systems, such as crypto keys, are not deleted either
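A minimal sketch of a deletion check that respects legal holds, assuming a hypothetical seven-year schedule and a set of held record IDs. The same check would need to cover supporting items such as the crypto keys for held data, or the data becomes unreadable even though it was kept.

```python
from datetime import date

RETENTION_YEARS = 7  # hypothetical schedule from the retention policy

def retention_expiry(created: date) -> date:
    try:
        return created.replace(year=created.year + RETENTION_YEARS)
    except ValueError:  # created on Feb 29 and the target year is not a leap year
        return date(created.year + RETENTION_YEARS, 3, 1)

def may_delete(created: date, today: date, legal_holds: set, record_id: str) -> bool:
    """Deletion is allowed only after retention expires AND no legal hold applies."""
    if record_id in legal_holds:
        return False  # the hold overrides the schedule indefinitely
    return today >= retention_expiry(created)

holds = {"email-2015-001"}
print(may_delete(date(2015, 1, 10), date(2024, 1, 1), holds, "email-2015-001"))  # False
```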

Answer 29

1. Synchronize time across all servers and devices - timestamps are important to establish the chain of activity.
2. Differing classification schemes - e.g. some operating systems classify a user login as "informational" while others call it a "user event".
3. Identity attribution - who did what, and when?
4. Application-specific logs - they may be in unique formats that need to be normalized.
5. Integrity of log files - application and OS logs are stored on the same hosts as the software generating them, so they are susceptible to tampering.
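A minimal normalization sketch: two hypothetical log sources with different formats, time zones, and event names are mapped onto one schema with UTC timestamps and a common event name, so events can be ordered into a chain of activity. Normalization cannot correct clocks that were never synchronized, so time sync is still required.

```python
from datetime import datetime, timezone

# Hypothetical raw lines from two sources with different formats and time zones.
RAW = [
    ("linux", "2024-05-01T14:03:07+02:00 sshd login accepted user=alice"),
    ("app", "01/05/2024 12:03:09 UTC|LOGIN_OK|alice"),
]

def normalize(source, line):
    """Map each source's format onto one schema with UTC timestamps."""
    if source == "linux":
        ts, _, rest = line.partition(" ")
        when = datetime.fromisoformat(ts)
        user = rest.split("user=")[1]
    elif source == "app":
        ts, _, user = line.split("|")
        when = datetime.strptime(ts, "%d/%m/%Y %H:%M:%S UTC").replace(tzinfo=timezone.utc)
    else:
        raise ValueError(f"no parser for {source}")
    # One event name regardless of how each source labels a login.
    return {"time": when.astimezone(timezone.utc), "event": "user_login",
            "user": user, "source": source}

events = sorted((normalize(s, l) for s, l in RAW), key=lambda e: e["time"])
for e in events:
    print(e["time"].isoformat(), e["source"], e["user"])
# 2024-05-01T12:03:07+00:00 linux alice
# 2024-05-01T12:03:09+00:00 app alice
```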

Answer 30

SIEM = Security Information and Event Management (e.g. Splunk)
- Log centralization and aggregation - the SIEM platform gathers log data from multiple locations.
- Data integrity - the SIEM runs on a separate host with its own access permissions.
- Normalization.
- Automated or continuous monitoring (aka correlation) to identify potential attacks.
- Alerting - generate tickets/emails/pages; take automated IPS actions (e.g. suspend a user account).
- Investigative monitoring (log file queries, reporting).
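A minimal sketch of a correlation rule of the kind a SIEM automates: alert when a user has several failed logins followed by a success within a time window. The threshold, window, and event tuples are hypothetical; real SIEMs express such rules in their own query or rule languages, and this is not any product's syntax.

```python
from collections import defaultdict, deque

WINDOW = 300    # seconds; hypothetical rule parameters
THRESHOLD = 5   # failures before a success is suspicious

def correlate(events):
    """Alert when a user has >= THRESHOLD failures, then a success, within WINDOW."""
    failures = defaultdict(deque)
    alerts = []
    for ts, user, outcome in sorted(events):
        recent = failures[user]
        while recent and ts - recent[0] > WINDOW:
            recent.popleft()  # forget failures outside the window
        if outcome == "fail":
            recent.append(ts)
        elif outcome == "success" and len(recent) >= THRESHOLD:
            alerts.append((ts, user, "possible brute force"))
            recent.clear()
    return alerts

events = [(t, "bob", "fail") for t in range(0, 50, 10)] + [(60, "bob", "success"),
                                                           (70, "alice", "success")]
print(correlate(events))  # [(60, 'bob', 'possible brute force')]
```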

Answer 31

Holding a user accountable for a particular action in a way that they cannot deny having done it. Normally done via log files with unique IDs and timestamps.

Answer 32

A defensible record of how evidence was handled and by whom from its collection to its presentation as evidence.

Answer 33

- Encryption and key management (various levels of encryption - storage, volume, object, file, application, database)
- Hashing
- Masking
- Tokenization
- Data loss prevention
- Data obfuscation
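A minimal sketch contrasting two of these strategies. Masking hides all but the last four digits for display; keyed hashing (HMAC-SHA-256) is one-way but consistent, so equal inputs still match. A keyed hash is used because plain hashes of low-entropy values such as card numbers can be brute-forced; the key shown is hypothetical.

```python
import hashlib
import hmac

def mask_card(number: str) -> str:
    """Masking: hide all but the last four digits, keeping the digit count."""
    digits = [c for c in number if c.isdigit()]
    return "*" * (len(digits) - 4) + "".join(digits[-4:])

def keyed_hash(value: str, key: bytes) -> str:
    """Keyed hashing: one-way, but equal inputs give equal outputs."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()

print(mask_card("4111 1111 1111 1111"))  # ************1111
print(keyed_hash("4111111111111111", b"demo-only-key"))
```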


(57 cards)