Domain 2: Cloud Data Security Flashcards

Question 1

Q

What are the phases in a secure data lifecycle?

Answer

A

Create - data is created when it is first entered into a system or modified.
Store - act of saving data in a retrievable location (e.g. SSD)
Use - accessing, viewing, processing of data (data handling)
Share - access to data is granted to others
Archive - data reaches end of useful life, but still needs to be retained (e.g. legal or compliance); placed in long-term retrievable storage
Destroy - data is destroyed by overwriting, crypto-shredding, or physical destruction (physical destruction is not possible for customers in the cloud)

Question 2

Q

In the Create phase of a secure data lifecycle what are the controls?

Answer

A

Data Classification
- Done by the creator (e.g. header/footer in a document)
- Done by system owner (e.g. all data stored in an email system is automatically classified as confidential).

Question 3

Q

In the Store phase of a secure data lifecycle what are the controls?

Answer

A

Protection for data in transit to storage (TLS, SSH, VPN)
Location of data storage based on classification - governed by policies and procedures
Access Controls (determining who has access, how it is granted)
Encrypting data at rest
Backups to preserve integrity and availability

Question 4

Q

In the Use phase of a secure data lifecycle what are the controls?

Answer

A

Data Loss Prevention (DLP) controls
Information Rights Management (IRM)
System access controls
Network monitoring tools
Logging and Monitoring

Question 5

Q

In the Share phase of a secure data lifecycle what are the controls?

Answer

A

Proactive Access Controls - role-based authorization, access granting
Reactive Access Controls - DLP, IRM, access review

Question 6

Q

In the Archive phase of a secure data lifecycle what are the controls?

Answer

A

Similar controls to Storage phase
Additionally, periodic encryption key rotation needed
Risk of storage medium/format becoming degraded or obsolete (affects integrity/availability)

Question 7

Q

What is Data Dispersion?

Answer

A

Data dispersion, used in cloud computing, refers to breaking data into smaller chunks and storing them across different physical storage devices.
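
Example - a minimal sketch of the idea, assuming in-memory lists stand in for physical storage devices. The helper names `disperse` and `reassemble` are hypothetical, not part of any cloud API.

```python
def disperse(data: bytes, chunk_size: int, device_count: int) -> list[list[bytes]]:
    """Split data into fixed-size chunks and place them round-robin on devices."""
    devices: list[list[bytes]] = [[] for _ in range(device_count)]
    for offset in range(0, len(data), chunk_size):
        chunk_number = offset // chunk_size
        devices[chunk_number % device_count].append(data[offset:offset + chunk_size])
    return devices


def reassemble(devices: list[list[bytes]]) -> bytes:
    """Read chunks back in the same round-robin order they were written."""
    chunks = []
    longest = max(len(device) for device in devices)
    for row in range(longest):
        for device in devices:
            if row < len(device):
                chunks.append(device[row])
    return b"".join(chunks)
```

No single device holds the whole data set, which is the property the flashcard describes.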

Question 8

Q

What is Erasure Coding?

Answer

A

Similar to the idea of parity bit coding
Ability to reconstruct a lost segment of data from other segments and parity bits.
Like solving for an unknown in algebra.
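
Example - a minimal sketch using a single XOR parity segment, the simplest case of the idea. Production systems typically use stronger codes (e.g. Reed-Solomon) that survive more than one lost segment. The function names are illustrative.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(segments: list[bytes]) -> bytes:
    """Parity segment = XOR of all equal-length data segments."""
    return reduce(xor_bytes, segments)


def recover(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing segment from the survivors plus parity."""
    return reduce(xor_bytes, surviving, parity)
```

This is the "solve for the unknown" step: since parity = A xor B xor C, the missing B equals parity xor A xor C.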

Question 9

Q

What are the benefits/tradeoffs of Data Dispersion and Erasure Coding?

Answer

A

Increases availability (if there is a failure of a single disk, data can be reconstructed)
Decreases risk of compromise (if one segment of data is compromised not all is lost)

Downsides

Ensuring that data location does not violate data residency requirements.
Additional latency needed to reconstruct data.

Question 10

Q

What are the storage types in an IaaS?

Answer

A

Ephemeral - on the same physical host as the instance (e.g. EC2 Instance Store). Only exists for the lifetime of the VM. Faster access. Typically used for cache buffers, system files and memory swap files.
Raw - Raw Device Mapping (RDM); VMs access a particular portion of overall storage (marked by a Logical Unit Number) allocated to them.
Long-term - durable, persistent storage for data archiving (e.g. S3/Glacier).
Volume - like a traditional drive; stores data in blocks (e.g. EBS).
Object - like a Windows file server; for unstructured data (music, video files); data is stored as objects (e.g. S3).

Question 11

Q

What are the storage types in a PaaS?

Answer

A

Disk - may be volume or object store
Databases - both a storage and a PaaS offering
Binary Large Object (blob): for unstructured data; e.g. S3; access via URLs.

Question 12

Q

What are the storage types in SaaS?

Answer

A

Information storage and management; user enters data, SaaS stores it in a database managed by the CSP.
Content and file storage (e.g. content sharing apps, ticketing systems that allow file attachments)
Content delivery network

Question 13

Q

What are threats to the cloud storage types?

Answer

A

Unauthorized Access
Unauthorized Provisioning - shadow IT
Regulatory noncompliance - cloud services not meeting regulatory requirements, such as mandated encryption algorithms.
Jurisdictional - data crossing borders
Denial of Service
Data corruption or destruction
Theft or media loss
Malware or ransomware
Improper disposal of media

Question 14

Q

What is Kerckhoffs's principle?

Answer

A

A cryptosystem should be secure even if everything about the system, except the key, is public knowledge.

Question 15

Q

What are the stages in the lifecycle of a cryptographic key?

Answer

A

Same as data lifecycle.

Create - use strong random number generators
Store - encrypted and stored in key vault
Use - access control and accountability
Share - not common, but using PKI to share symmetric keys
Archive - keys no longer needed for routine use, but needed for older encrypted data
Destroy - destruction of keys no longer needed.
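
Example - a minimal sketch of the Create, Archive, and Destroy stages, assuming a hypothetical `ManagedKey` class. Key material comes from Python's `secrets` module (a cryptographically strong random source). A real deployment would keep keys in a key vault or HSM, and Python cannot guarantee that old key bytes are wiped from memory.

```python
import secrets
from dataclasses import dataclass, field
from enum import Enum


class KeyState(Enum):
    ACTIVE = "active"        # may encrypt and decrypt
    ARCHIVED = "archived"    # decrypt-only, for older encrypted data
    DESTROYED = "destroyed"  # key material discarded


@dataclass
class ManagedKey:
    # Create: 32 random bytes (256 bits) from a strong random source.
    material: bytes = field(default_factory=lambda: secrets.token_bytes(32))
    state: KeyState = KeyState.ACTIVE

    def can_encrypt(self) -> bool:
        return self.state is KeyState.ACTIVE

    def can_decrypt(self) -> bool:
        return self.state in (KeyState.ACTIVE, KeyState.ARCHIVED)

    def archive(self) -> None:
        self.state = KeyState.ARCHIVED

    def destroy(self) -> None:
        # Illustrative only: drops the reference, does not scrub memory.
        self.material = b""
        self.state = KeyState.DESTROYED
```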

Question 16

Q

What are the various levels of encryption options?

Answer

A

Storage level encryption - encryption as data is written to storage with keys controlled by CSP; normally protects data in case of theft of storage device
Volume level encryption - encryption as data is written to the volume attached to an instance; can only be accessed through the OS; keys controlled by customer; protects against theft, external admin access and storage level exfiltration.
Object level encryption - can use the following types
1. File level encryption - e.g. MS Word/Adobe PDF using passwords or an IRM system; client encrypts
2. Application level encryption - application encrypts data before writing to object store
Database level encryption - may be either file level (e.g. whole database file) or transparent encryption by the DBMS, which encrypts specific columns or whole tables.

Question 17

Q

What is hashing?

Answer

A

One-way function (not reversible, unlike encryption) used to verify integrity of data.
Used as part of digital signatures.
Digital signatures verify both authenticity and integrity of a message.
The Secure Hash Algorithm (SHA) family is approved by NIST: SHA-2 is specified in the Secure Hash Standard (SHS, FIPS 180-4) and SHA-3 in FIPS 202.
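
Example - a minimal integrity check using SHA3-256 from Python's standard `hashlib`. The helper names are illustrative.

```python
import hashlib
import hmac


def sha3_digest(data: bytes) -> str:
    """Return the SHA3-256 digest of data as a hex string."""
    return hashlib.sha3_256(data).hexdigest()


def is_unaltered(data: bytes, expected_hex: str) -> bool:
    """Recompute the digest and compare it to the stored one."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(sha3_digest(data), expected_hex)
```

Any change to the data, even one byte, produces a different digest, so a mismatch signals loss of integrity.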

Question 18

Q

What is masking?

Answer

A

Obfuscation of part of the data to keep it secure.
e.g. displaying just the last 4 digits of the SSN.
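
Example - a minimal sketch of masking an SSN for display, assuming the common 9-digit format. `mask_ssn` is a hypothetical helper.

```python
def mask_ssn(ssn: str) -> str:
    """Show only the last 4 digits of a 9-digit SSN."""
    digits = [c for c in ssn if c.isdigit()]
    if len(digits) != 9:
        raise ValueError("expected a 9-digit SSN")
    return "***-**-" + "".join(digits[-4:])
```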

Question 19

Q

What is tokenization?

Answer

A

Non-sensitive representation of sensitive data.
Token is a substitute.
Normally managed via a tokenization service (which implements access controls).
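
Example - a minimal in-memory sketch of a tokenization service with a caller allow-list. The `TokenVault` class and the `tok_` prefix are assumptions for illustration; a real service would persist the mapping securely and authenticate callers.

```python
import secrets


class TokenVault:
    """Hypothetical tokenization service: maps sensitive values to random tokens."""

    def __init__(self, authorized_callers: set[str]) -> None:
        self._authorized = set(authorized_callers)
        self._value_by_token: dict[str, str] = {}
        self._token_by_value: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        """Return a token with no mathematical relationship to the value."""
        if value in self._token_by_value:
            return self._token_by_value[value]
        token = "tok_" + secrets.token_hex(8)
        self._value_by_token[token] = value
        self._token_by_value[value] = token
        return token

    def detokenize(self, token: str, caller: str) -> str:
        """Only allow-listed callers may recover the original value."""
        if caller not in self._authorized:
            raise PermissionError(f"{caller} may not detokenize")
        return self._value_by_token[token]
```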

Question 20

Q

What controls comprise Data Loss Prevention?

Answer

A

Detective Controls (identify where sensitive data is being stored and used)
Preventative Controls (enforcing policy requirements on the storage and use of sensitive data)
Corrective Controls (displaying alert to user informing them of policy violation and preventing inappropriate actions).

Question 21

Q

What are the major components of a DLP?

Answer

A

Discovery
Monitoring
Enforcement

Question 22

Q

What happens in the Discovery phase of the DLP?

Answer

A

Identify, categorize and inventory data assets.
Typically using network scans (IP address range, domain search)
Data scan to identify sensitive info (e.g. SSN or “Confidential” tags, or PII/PHI or Payment Info)

Question 23

Q

What happens in the Monitoring phase of the DLP?

Answer

A

Enables security team to identify how data is being used and prevent inappropriate use.
Network-based vs. host-based DLP monitoring
In-motion data monitoring can be done either by network-based tools (e.g. proxy/firewall/email server) or host-based tools (e.g. laptop agent)
At-rest monitoring normally done by agent-based tool
In-use monitoring - agent-based; e.g. on a user’s laptop
Agent-based DLP tools compatibility with OS is a concern.
Network-based DLP concerns (remote users who bypass network, encrypted comms not visible).

Question 24

Q

What happens in the Enforcement phase of the DLP?

Answer

A

DLP applies rules based on the results of monitoring to enforce security policies.
e.g. prevent a user from saving a file to removable media (USB)
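
Example - a minimal sketch of rule-based enforcement, assuming monitoring produces events that carry a classification label and a destination. The `FileEvent` fields and the blocked pairs are hypothetical policy values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEvent:
    user: str
    classification: str  # label assigned during discovery
    destination: str     # e.g. "usb", "network_share"


# Assumed policy: (classification, destination) pairs that must be blocked.
BLOCKED = {
    ("confidential", "usb"),
    ("confidential", "personal_email"),
}


def enforce(event: FileEvent) -> str:
    """Return the action the DLP agent should take for this event."""
    if (event.classification, event.destination) in BLOCKED:
        return "block"
    return "allow"
```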

Question 25

Q

What is data obfuscation?

Answer

A

Similar to data masking/de-identification of data
Used when sensitive data needs to be used in a different situation
e.g. obfuscated data from production used for test systems (to simulate real world data).

Question 26

Q

What are some techniques to obfuscate data?

Answer

A

Substitution - swap some information for other
Shuffling - e.g. change Chris to Hrisc
Value Variance - add or subtract a random amount (e.g. +/- $1,000)
Deletion or nullification (change to null values)
Encryption

Note: important to maintain data integrity when obfuscating (e.g. credit card numbers must still be 16 digit numbers).
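
Example - a minimal sketch of four of the techniques above, following the flashcard's description of shuffling as rearranging characters within a value. All function names are illustrative, and the random substitute card number keeps the 16-digit format but is not a valid card number.

```python
import random


def shuffle_chars(value: str, rng: random.Random) -> str:
    """Shuffling: rearrange the characters of a value."""
    chars = list(value)
    rng.shuffle(chars)
    return "".join(chars)


def vary_amount(amount: float, max_delta: float, rng: random.Random) -> float:
    """Value variance: add or subtract a random amount up to max_delta."""
    return round(amount + rng.uniform(-max_delta, max_delta), 2)


def substitute_card(rng: random.Random) -> str:
    """Substitution: swap in a random value that keeps the 16-digit format."""
    return "".join(rng.choice("0123456789") for _ in range(16))


def nullify(record: dict, fields: set[str]) -> dict:
    """Nullification: replace the named fields with None."""
    return {k: (None if k in fields else v) for k, v in record.items()}
```

Passing in a seeded `random.Random` makes test data reproducible; production obfuscation would not use a fixed seed.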

Question 27

Q

What is pseudo-anonymization?

Answer

A

Obfuscating data with the goal of reversing it later when required, e.g. substituting an index value for PII before storing in the cloud. Also called pseudonymization.
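
Example - a minimal sketch of index substitution, assuming records are plain dicts and the lookup table stays outside the cloud. `pseudonymize` and `reidentify` are hypothetical helpers.

```python
def pseudonymize(records: list[dict], field: str) -> tuple[list[dict], dict[int, str]]:
    """Replace a PII field with an index; return cloud-safe records and the lookup."""
    index_of: dict[str, int] = {}
    cloud_records = []
    for record in records:
        # Reuse the index if this value was seen before, else assign the next one.
        idx = index_of.setdefault(record[field], len(index_of) + 1)
        cloud_records.append({**record, field: idx})
    lookup = {idx: value for value, idx in index_of.items()}
    return cloud_records, lookup


def reidentify(record: dict, field: str, lookup: dict[int, str]) -> dict:
    """Reverse the substitution using the privately held lookup table."""
    return {**record, field: lookup[record[field]]}
```

Whoever holds the lookup table can reverse the process, so the table needs the same protection as the original PII.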

Question 28

Q

In PII data, what are direct and indirect identifiers?

Answer

A

Direct IDs - identify a person on their own (e.g. name or identification number).
Indirect IDs - information that can be combined with other information to identify a person (e.g. demographic information and shopping history).

Question 29

Q

What is data de-identification?

Answer

A

Also known as anonymization.
Removing direct and indirect PII identifiers from datasets.
Unlike pseudo-anonymization, de-identification is not meant to be reversed.

Question 30

Q

What are the two types of data discovery?

Answer

A

Data discovery by DLP tools for inventory.
Business Intelligence tools that analyze data for trends etc. (e.g. understanding trends to build inventory).

Question 31

Q

What’s the difference between Data Lake and Data Warehouse?

Answer

A

Data Lake - unstructured data/blob
Data warehouse - structured data or normalized data

Question 32

Q

What is ETL?

Answer

A

Extract, Transform, Load
Data from multiple disparate sources is normalized and loaded into a data warehouse.
ETL improves searchability
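
Example - a minimal sketch of the three steps, assuming two hypothetical sources with different field names and date formats, and an in-memory SQLite table standing in for the data warehouse.

```python
import sqlite3
from datetime import datetime

# Extract: rows as they might arrive from two disparate (made-up) sources.
CRM_ROWS = [{"Customer": "Ann Lee", "Signup": "03/15/2023"}]
SHOP_ROWS = [{"name": "bob ray", "created": "2023-04-01"}]


def transform(rows, name_key, date_key, date_format):
    """Normalize names to title case and dates to ISO 8601."""
    normalized = []
    for row in rows:
        iso_date = datetime.strptime(row[date_key], date_format).date().isoformat()
        normalized.append((row[name_key].title(), iso_date))
    return normalized


def load(conn, rows):
    """Insert normalized rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS customers (name TEXT, signup_date TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    conn.commit()


def run_etl(conn):
    rows = transform(CRM_ROWS, "Customer", "Signup", "%m/%d/%Y")
    rows += transform(SHOP_ROWS, "name", "created", "%Y-%m-%d")
    load(conn, rows)
```

After loading, one query covers both sources, which is the searchability benefit noted above.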

Question 33

Q

What is a Data Mart?

Answer

A

Data that has been warehoused, analyzed and made available for specific use by a particular business unit.

Question 34

Q

What is Data Mining?

Answer

A

Involves discovering, analyzing, and extracting patterns in data.

Question 35

Q

What is OLAP?

Answer

A

Online Analytical Processing
Analytics includes consolidation, drill-down, and slice-and-dice functions.
E.g. security incidents require forensic analysis that uses OLAP to extract relevant information from log files.

Question 36

Q

What is AI/ML?

Answer

A

AI/ML systems learn from data sets, improving algorithms through experience.
Training data is used.

Question 37

Q

What is structured data?

Answer

A

Has a consistent format (Same info in same place)

Example - database record

Data may also be structured using markup or serialization formats (e.g. XML tags, JSON, YAML)
Data Schema or data model - abstract view of the data in a system.
Easier to identify and protect PII/PHI if data is structured.
Metadata is data that describes data
Semantics is the meaning of the data - it is described in the data schema or data model.

Question 38

Q

What is unstructured data?

Answer

A

Information stored without following a common format.
Data labeling is one approach to dealing with such data.
Content analysis is another approach.

Question 39

Q

What are the different types of content analysis in dealing with unstructured data?

Answer

A

Pattern Matching - e.g. known credit card formats, or SSN formats.
Lexical analysis - may be suited for email/IMs.
Hashing - identifies known data types - e.g. system files, compares hash of data to known hashes of sensitive files.
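
Example - a minimal sketch of pattern matching and hash matching. The regular expressions assume US-style SSNs and 16-digit card numbers, the Luhn checksum is added to cut false positives on card matches, and SHA-256 is an assumed choice of hash. All names are illustrative.

```python
import hashlib
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){15}\d\b")  # 16 digits, optional separators


def luhn_valid(number: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> list[tuple[str, str]]:
    """Pattern matching: return (kind, match) pairs found in free text."""
    hits = [("ssn", m.group()) for m in SSN_PATTERN.finditer(text)]
    hits += [("card", m.group()) for m in CARD_PATTERN.finditer(text)
             if luhn_valid(m.group())]
    return hits


def matches_known_file(content: bytes, known_hashes: set[str]) -> bool:
    """Hashing: compare a file's digest to digests of known sensitive files."""
    return hashlib.sha256(content).hexdigest() in known_hashes
```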

Question 40

Q

What is data classification?

Answer

A

The act of forming classes or groups, by identifying common attributes.
Data classification determines the controls/protection level needed - e.g. Public data can be unencrypted, but “Internal Use Only” data may need to be encrypted.
Speeds up security decision making by providing risk-based security controls based on classification.

Question 41

Q

What factors drive data classification?

Answer

A

Data Type - PII, Financial, PHI, Legal, Educational etc. determines what regulations apply
Legal Constraints - e.g. GDPR for data on EU citizens
Ownership - shared ownership of data may impose restrictions on how data is used
Value/criticality - how loss of that data would affect an organization’s operations.

Question 42

Q

What is data science?

Answer

A

The application of scientific methods and algorithms to identify knowledge or useful information from data sets.

Question 43

Q

What is data mapping?

Answer

A

Identifying data and mapping its location.
Most DLP tools do this (scan of IP address, domain, reading tags etc.)

Question 44

Q

What are the different asset types that can be labeled?

Answer

A

Hard Copy Materials - labeled with a printed watermark
Physical assets (disk drives, servers, removable media etc.)
Risks: content of media may change; found assets may be difficult to label (cannot be read, or danger of malware)
Digital Files - labeled with metadata, digital watermark, footer with classification
Complex or shared systems - requires training and reference materials.

DLP tools are the biggest consumers of labels.

Question 45

Q

What elements must a data classification policy specify?

Answer

A

Compliance requirements inherent at various classification levels
Data retention and disposal requirements
What is considered sensitive or regulated data
Appropriate or approved uses of data
Access control and authorization
Encryption needs

Question 46

Q

What is the purpose of DRM or IRM?

Answer

A

Digital or Information Rights Management controls access and usage rights when data is meant to be shared but not freely distributed.

Two types:
a) Consumer Grade IRM also known as DRM - e.g. copyrighted material such as movies, music etc. DRM tools prevent copying

b) Enterprise Grade IRM - e.g. images/documents. Also enforces usage restrictions and access rights (e.g. only users with a signed URL).

IRMs/DRMs may use ACLs to enforce restrictions.

Question 47

Q

What are the attributes of an IRM/DRM system?

Answer

A

Persistence - ACLs must follow the data e.g. password to open a document
Dynamic policy control - owner can update policy/restrictions even after sharing data; needs a connection back to an IRM server
Expiration - Time limited access (e.g. movie only playable for 24 hours)
Continuous Audit trail - for accountability
Interoperability - different DBs, OSs, email servers etc.

Question 48

Q

What are the access control models that IRM/DRMs use?

Answer

A

Discretionary Access Control (DAC) - owner defines access restrictions based on document or data set.
Mandatory Access Control (MAC) - owner defines a classification level, and the IRM system does the rest - e.g. all sensitive data is only viewable by users with clearance.
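
Example - a minimal sketch contrasting the two models. The level names and their ordering are assumed for illustration.

```python
# Assumed classification levels, lowest to highest.
LEVELS = {"public": 0, "internal": 1, "confidential": 2}


def dac_allows(acl: set[str], user: str) -> bool:
    """DAC: the owner lists exactly who may access this document."""
    return user in acl


def mac_allows(user_clearance: str, data_classification: str) -> bool:
    """MAC: the system compares clearance to classification; no per-user list."""
    return LEVELS[user_clearance] >= LEVELS[data_classification]
```

With DAC the decision lives in a per-document list; with MAC the owner only sets the label and the system applies the same rule to every user.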

Question 49

Q

What factors drive data retention?

Answer

A

Organization’s operational need (e.g. to support analysis for business intelligence)
Compliance (e.g. retention of PII, Financial records).
Data retention must balance organizational needs with cost.
The cloud is often used as a backup location. It offers different storage options that balance speed of retrieval and cost.

Question 50

Q

What factors must data retention practices consider?

Answer

A

Schedules - how long is data to be retained
Integrity checking - ensuring that data is readable and unaltered due to environmental factors; use of hashing
Retrieval procedures - who gets access to the data (e.g. live data needed by more users than archived data)
Data formats - ensuring that archived data can still be read (file formats and hardware change over time).

Question 51

Q

Per NIST 800-88 (Guidelines for Media Sanitization) what are the three categories of delete action for defensible destruction?

Answer

A

Clear - sanitize data in all user-addressable storage locations using logical techniques (e.g. overwriting via standard read/write commands). Protects against simple, non-invasive recovery; merely deleting files or emptying the trash bin leaves data recoverable using tools. Suitable only for less sensitive data.
Purge - physical or logical techniques that make recovery infeasible even with specialized laboratory tools (e.g. repeated overwriting with dummy data, magnetic degaussing). Expensive operation that may also degrade or disable the physical media. Crypto-shredding (cryptographic erase) is an alternative.
Destroy - pulverization/melting; not possible for customers to do when in cloud. More appropriate for CSPs.

Question 52

Q

What is a Legal Hold?

Answer

A

Normal retention schedules are suspended indefinitely, i.e. data must be retained even if its retention period (e.g. seven years) has passed.
Challenges with a legal hold: a) identifying which data to hold; b) exception procedures so that the data is not deleted; c) ensuring that supporting systems, such as crypto keys, are also not deleted.
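
Example - a minimal sketch of a deletion check that honors a legal hold, assuming retention is tracked as a number of days. `may_delete` is a hypothetical helper.

```python
from datetime import date, timedelta


def may_delete(created: date, retention_days: int, today: date,
               on_legal_hold: bool) -> bool:
    """Data may be deleted only after its retention period and if not on hold."""
    if on_legal_hold:
        return False  # a hold overrides the normal retention schedule
    return today >= created + timedelta(days=retention_days)
```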

Question 53

Q

What are OWASP guidelines on logging?

Answer

A

Synchronize time across all servers and devices - timestamping is important to establish chain of activity
Differing classification schemes - e.g. some OSs classify user login as “informational” while others call it “user events”
Identity attribution - who did what when?
Application specific logs - they may be in unique formats that need to be normalized
Integrity of log files - app and OS log files are often stored on the same hosts as the software generating the logs, and are hence susceptible to tampering.

Question 54

Q

What are SIEM tools and what do they do?

Answer

A

SIEM = Security Information and Event Management (e.g. Splunk)

Log centralization and aggregation (SIEM platform gathers log data from multiple locations)
Data Integrity - SIEM runs on a separate host with access permissions
Normalization
Automated or continuous monitoring (aka correlation) to identify potential attacks
Alerting - generate tickets/emails/pages; take automated IPS actions (e.g. suspend user account)
Investigative monitoring (log file queries, reporting)
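
Example - a minimal sketch of normalization and correlation, assuming two made-up log formats (an application dict with an epoch timestamp and a space-separated OS line). The threshold of 3 failed logins within 5 minutes is an arbitrary illustrative rule, not a vendor default.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def from_app(entry: dict) -> dict:
    """Normalize a hypothetical app log entry with an epoch timestamp."""
    return {"time": datetime.fromtimestamp(entry["ts"], tz=timezone.utc),
            "user": entry["user"], "event": entry["evt"].lower()}


def from_os(line: str) -> dict:
    """Normalize a hypothetical OS log line: '<ISO time>Z <EVENT> <user>'."""
    stamp, event, user = line.split()
    time = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return {"time": time, "user": user, "event": event.lower()}


def brute_force_suspects(events, threshold=3, window=timedelta(minutes=5)):
    """Correlation: users with `threshold` failed logins inside `window`."""
    failures = defaultdict(list)
    for event in sorted(events, key=lambda e: e["time"]):
        if event["event"] == "failed_login":
            failures[event["user"]].append(event["time"])
    suspects = set()
    for user, times in failures.items():
        for i in range(len(times) - threshold + 1):
            if times[i + threshold - 1] - times[i] <= window:
                suspects.add(user)
    return suspects
```

Normalizing every timestamp to UTC first is what lets events from different hosts be ordered and correlated.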

Question 55

Q

What is non-repudiation?

Answer

A

Holding a user accountable for a particular action in a way that they cannot deny doing it.
Normally done via log files with unique IDs and timestamps.

Question 56

Q

What is chain of custody?

Answer

A

A defensible record of how evidence was handled and by whom from its collection to its presentation as evidence.

Question 57

Q

What are some of the data security technologies and strategies?

Answer

A

Encryption and key management (various levels of encryption - storage, volume, object, file, app, DB)
Hashing
Masking
Tokenization
Data Loss Prevention
Data Obfuscation