Domain 2: Cloud Data Security Flashcards

1
Q

What are the phases in a secure data lifecycle?

A
  1. Create - data is created when it is first entered into a system or modified.
  2. Store - act of saving data in a retrievable location (e.g. SSD)
  3. Use - accessing, viewing, processing of data (data handling)
  4. Share - access to data is granted to others
  5. Archive - data reaches end of useful life, but still needs to be retained (e.g. legal or compliance); placed in long-term retrievable storage
  6. Destroy - data is destroyed; overwriting, crypto-shredding, physical destruction (not possible for cloud customers)
2
Q

In the Create phase of a secure data lifecycle what are the controls?

A
  • Data Classification
    • Done by the creator (e.g. header/footer in a document)
    • Done by system owner (e.g. all data stored in an email system is automatically classified as confidential).
3
Q

In the Store phase of a secure data lifecycle what are the controls?

A

  • Protection for data in transit to storage (TLS, SSH, VPN)
  • Location of data storage based on classification - governed by policies and procedures
  • Access Controls (determining who has access, how it is granted)
  • Encrypting data at rest
  • Backups to preserve integrity and availability
4
Q

In the Use phase of a secure data lifecycle what are the controls?

A
  • Data Loss Prevention (DLP) controls
  • Information Rights Management (IRM)
  • System access controls
  • Network monitoring tools
  • Logging and Monitoring
5
Q

In the Share phase of a secure data lifecycle what are the controls?

A
  • Proactive Access Controls (role-based authorization, access granting)
  • Reactive Access Controls - DLP, IRM, access review
6
Q

In the Archive phase of a secure data lifecycle what are the controls?

A
  • Similar controls to Storage phase
  • Additionally, periodic encryption key rotation needed
  • Risk of storage medium/format becoming degraded or obsolete (affects integrity/availability)
7
Q

What is Data Dispersion?

A

Data dispersion, used in cloud computing, refers to breaking data into smaller chunks and storing them across different physical storage devices.

8
Q

What is Erasure Coding?

A

  • Similar to the idea of parity bit coding
  • Ability to reconstruct a lost segment of data from other segments and parity bits.
  • Like solving for an unknown in algebra.
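The reconstruction idea can be sketched with a single XOR parity segment, the simplest erasure code. This is a minimal illustration, assuming equal-size segments held in memory; split_with_parity and rebuild are hypothetical helper names, and real cloud storage typically uses Reed-Solomon-style codes that can survive several lost segments.

```python
from __future__ import annotations

from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length segments."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-size segments plus one XOR parity segment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\x00")  # zero-pad so all segments match in length
    segments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, segments)  # parity = seg0 ^ seg1 ^ ... ^ seg(k-1)
    return segments + [parity]


def rebuild(segments: list[Optional[bytes]]) -> list[bytes]:
    """Reconstruct at most one missing segment (None) by XOR-ing the survivors."""
    missing = [i for i, s in enumerate(segments) if s is None]
    if len(missing) > 1:
        raise ValueError("a single XOR parity segment can recover only one loss")
    restored = list(segments)
    if missing:
        # XOR of everything that survived "solves for the unknown" segment
        survivors = [s for s in segments if s is not None]
        restored[missing[0]] = reduce(xor_bytes, survivors)
    return restored
```

Losing any one of the k + 1 segments (data or parity) is survivable; losing two is not. Stripping the zero padding after reassembly also assumes the original data does not itself end in zero bytes.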
9
Q

What are the benefits/tradeoffs of Data Dispersion and Erasure Coding?

A
  • Increases availability (if there is a failure of a single disk, data can be reconstructed)
  • Decreases risk of compromise (if one segment of data is compromised, not all of the data is exposed)

Downsides

  • Ensuring that data location does not violate data residency requirements.
  • Additional latency to reconstruct data.
10
Q

What are the storage types in an IaaS?

A
  1. Ephemeral - (e.g. EC2 Instance Store) - on the same physical host as the instance. Only exists for the lifetime of the VM. Faster access. Typically used for cache buffers, system files and memory swap files.
  2. Raw: Raw Device Mapping (RDM) - VMs access a particular portion of overall storage (marked by a Logical Unit Number) allocated to them.
  3. Long-term: Durable, persistent storage - for data archiving - e.g. S3/Glacier
  4. Volume: behaves like a traditional drive; stores data in blocks (e.g. EBS).
  5. Object: like a Windows file server, for unstructured data (music, video files); data is stored as objects (e.g. S3)
11
Q

What are the storage types in a PaaS?

A

  1. Disk - may be volume or object store
  2. Databases - both a storage and a PaaS offering
  3. Binary Large Object (blob): for unstructured data; e.g. S3; access via URLs.
12
Q

What are the storage types in SaaS?

A
  1. Information storage and management - user enters data; SaaS stores it in a database managed by the CSP.
  2. Content and file storage (e.g. content sharing apps, ticketing systems that allow file attachments)
  3. Content delivery network
13
Q

What are threats to the cloud storage types?

A
  1. Unauthorized Access
  2. Unauthorized Provisioning - shadow IT
  3. Regulatory noncompliance - cloud services not meeting regulatory requirements (e.g. approved encryption algorithms).
  4. Jurisdictional - data crossing borders
  5. Denial of Service
  6. Data corruption or destruction
  7. Theft or media loss
  8. Malware or ransomware
  9. Improper disposal of media
14
Q

What is Kerckhoffs’s principle?

A

A cryptosystem should be secure even if everything about the system, except the key, is public knowledge.

15
Q

What are the stages in the lifecycle of a cryptographic key?

A

Same as the data lifecycle.

  1. Create - use cryptographically secure random number generators
  2. Store - encrypted and stored in a key vault
  3. Use - access control and accountability
  4. Share - not common, but e.g. using PKI to share symmetric keys
  5. Archive - keys no longer needed for routine use, but still needed to decrypt older encrypted data
  6. Destroy - destruction of keys no longer needed.
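For the Create stage, a minimal Python sketch: the standard-library secrets module draws from the operating system's cryptographically secure generator. create_key is a hypothetical helper name; in practice a cloud KMS or HSM would usually generate the key inside the vault rather than in application code.

```python
import secrets


def create_key(bits: int = 256) -> bytes:
    """Create a symmetric key using the OS's cryptographically secure RNG.

    The random module must NOT be used for keys: its Mersenne Twister
    output is predictable once enough of it has been observed.
    """
    if bits <= 0 or bits % 8:
        raise ValueError("key size must be a positive multiple of 8 bits")
    return secrets.token_bytes(bits // 8)


key = create_key()  # 32 random bytes, i.e. a 256-bit key (e.g. for AES-256)
```

The later stages (storing the key in a vault, rotating, archiving, destroying) depend on the key-management service in use and are not shown.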
16
Q

What are the various levels of encryption options?

A
  1. Storage level encryption - encryption as data is written to storage with keys controlled by CSP; normally protects data in case of theft of storage device
  2. Volume level encryption - encryption as data is written to a volume attached to an instance; data can only be accessed through the OS; keys controlled by customer; protects against theft, external admin access and storage-level exfiltration.
  3. Object level encryption - can use the following types
    1. File level encryption - e.g. MS Word/Adobe PDF using passwords or an IRM system; client encrypts
    2. Application level encryption - application encrypts data before writing to object store
  4. Database level encryption - may be either file level (e.g. encrypting the whole database file) or transparent encryption by the DBMS, which encrypts specific columns or whole tables.
17
Q

What is hashing?

A
  • One-way function (not encryption, since it cannot be reversed) used to verify integrity of data.
  • Used as part of digital signatures
  • Digital signatures verify both authenticity and integrity of a message.
  • SHA-2 (specified in FIPS 180-4, the Secure Hash Standard (SHS)) and SHA-3 (specified in FIPS 202) are examples of FIPS-approved hashing algorithms.
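A minimal sketch of integrity verification with Python's hashlib: record a digest when data is stored, then recompute and compare later. digest and verify_integrity are hypothetical helper names. A bare hash only detects changes; it does not prove who made them, which is what signing the hash with a private key (a digital signature) adds.

```python
import hashlib
import hmac


def digest(data: bytes, algorithm: str = "sha256") -> str:
    """Hex digest of data; e.g. 'sha256' (SHA-2) or 'sha3_256' (SHA-3)."""
    return hashlib.new(algorithm, data).hexdigest()


def verify_integrity(data: bytes, expected_hex: str, algorithm: str = "sha256") -> bool:
    """True if data still hashes to the digest recorded earlier."""
    # compare_digest compares in constant time, avoiding timing side channels
    return hmac.compare_digest(digest(data, algorithm), expected_hex)


original = b"quarterly-report-v1"
recorded = digest(original)                                    # stored alongside the data
assert verify_integrity(original, recorded)                    # unchanged data passes
assert not verify_integrity(b"quarterly-report-v2", recorded)  # any change fails
```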
18
Q

What is masking?

A
  • Obfuscation of part of the data to keep it secure.
  • E.g. displaying only the last 4 digits of an SSN.
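Masking the SSN as described can be sketched as follows, assuming the value is a string whose non-digit characters (dashes, spaces) should be kept for readability. mask is a hypothetical helper name.

```python
def mask(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Replace every digit except the last `visible` digits; keep separators."""
    total_digits = sum(c.isdigit() for c in value)
    to_mask = max(total_digits - visible, 0)
    out = []
    for c in value:
        if c.isdigit() and to_mask > 0:
            out.append(mask_char)  # hide this digit
            to_mask -= 1
        else:
            out.append(c)          # separator, or one of the last visible digits
    return "".join(out)
```

mask("123-45-6789") returns "***-**-6789", and mask("4111 1111 1111 1111") returns "**** **** **** 1111". Masking is typically applied at display time, so the full value still exists in the underlying store and needs its own protection.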
19
Q

What is tokenization?

A
  • Non-sensitive representation of sensitive data.
  • Token is a substitute.
  • Normally managed via a tokenization service (which implements access controls).
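A toy tokenization service, assuming an in-memory mapping and a simple allow-list of callers; TokenVault and its methods are hypothetical, not a real product's API. The token is random rather than derived from the value, so it cannot be reversed mathematically; only the vault's mapping, guarded by its access control, links it back.

```python
from __future__ import annotations

import secrets


class TokenVault:
    """Toy in-memory tokenization service (hypothetical, for illustration only)."""

    def __init__(self, authorized_callers: set[str]) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}
        self._authorized = set(authorized_callers)

    def tokenize(self, value: str) -> str:
        """Return the existing token for value, or issue a new random one."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        while token in self._token_to_value:  # regenerate on the (unlikely) collision
            token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str, caller: str) -> str:
        """Map a token back to the real value, but only for authorized callers."""
        if caller not in self._authorized:
            raise PermissionError(f"{caller!r} is not allowed to detokenize")
        return self._token_to_value[token]
```

A real service would persist the mapping in an encrypted, audited store and authenticate callers properly rather than trusting a name string.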
20
Q

What controls comprise Data Loss Prevention?

A
  • Detective Controls (identify where sensitive data is being stored and used)
  • Preventative Controls (enforcing policy requirements on the storage and use of sensitive data)
  • Corrective Controls (displaying an alert informing the user of the policy violation and preventing inappropriate actions).
21
Q

What are the major components of a DLP?

A
  1. Discovery
  2. Monitoring
  3. Enforcement
22
Q

What happens in the Discovery phase of the DLP?

A

  • Identify, categorize and inventory data assets.
  • Typically using network scans (IP address range, domain search)
  • Data scan to identify sensitive info (e.g. SSNs, “Confidential” tags, PII/PHI or payment info)
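The data-scan step can be sketched with regular expressions, assuming files have already been located and read into a dictionary of name to text. RULES and scan are hypothetical; the two rules shown (a US SSN pattern and a “Confidential” tag) are illustrative and would produce both false positives and misses on real data.

```python
from __future__ import annotations

import re

# Hypothetical rules; commercial DLP tools ship much larger, tuned rule libraries.
RULES = {
    "US SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "Confidential tag": re.compile(r"\bconfidential\b", re.IGNORECASE),
}


def scan(documents: dict[str, str]) -> dict[str, list[str]]:
    """Return {document name: [matching rule labels]} for each flagged document."""
    findings = {}
    for name, text in documents.items():
        hits = [label for label, pattern in RULES.items() if pattern.search(text)]
        if hits:
            findings[name] = hits  # becomes part of the sensitive-data inventory
    return findings
```

The network-scan part of discovery (finding hosts and shares by IP range or domain) happens before this and is not shown.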
23
Q

What happens in the Monitoring phase of the DLP?

A
  • Enables security team to identify how data is being used and prevent inappropriate use.
  • Network-based vs. host-based DLP monitoring
  • In-motion data monitoring can be done either by network-based tools (e.g. proxy/firewall/email server) or host-based tools (e.g. laptop agent)
  • At-rest monitoring is normally done by agent-based tools
  • In-use monitoring - agent-based; e.g. on a user’s laptop
  • Agent-based DLP tools: compatibility with the OS is a concern.
  • Network-based DLP concerns: remote users who bypass the network; encrypted traffic is not visible.
24
Q

What happens in the Enforcement phase of the DLP?

A

  • DLP applies rules based on the results of monitoring to enforce security policies.
  • e.g. prevent a user from saving a file to removable media (USB)
25
Q

What is data obfuscation?

A
  • Similar to data masking/de-identification of data
  • Used when sensitive data needs to be used in a different situation
  • e.g. obfuscated data from production used for test systems (to simulate real world data).
26
Q

What are some techniques to obfuscate data?

A
  • Substitution - swap some information for other values
  • Shuffling - e.g. change Chris to Hrisc
  • Value Variance - e.g. add or subtract up to $1,000
  • Deletion or nullification (change to null values)
  • Encryption

Note: important to maintain data integrity when obfuscating (e.g. credit card numbers must still be 16-digit numbers).
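Four of the techniques above applied to one record, as a sketch: the field names, the CITIES substitution pool and the helper functions are hypothetical, and encryption is omitted. The fake card number illustrates the integrity note: it is still a 16-digit number, so test systems that validate the format keep working.

```python
from __future__ import annotations

import random

CITIES = ["Springfield", "Riverton", "Lakeside"]  # hypothetical substitution pool


def shuffle_chars(value: str, rng: random.Random) -> str:
    """Shuffling: rearrange the characters (e.g. Chris -> Hrisc)."""
    chars = list(value)
    rng.shuffle(chars)
    return "".join(chars)


def vary_value(amount: float, rng: random.Random, max_delta: float = 1000.0) -> float:
    """Value variance: add or subtract a random amount up to max_delta."""
    return round(amount + rng.uniform(-max_delta, max_delta), 2)


def fake_card_number(rng: random.Random) -> str:
    """Format-preserving substitution: still a 16-digit number."""
    return "".join(str(rng.randint(0, 9)) for _ in range(16))


def obfuscate(record: dict, rng: random.Random) -> dict:
    return {
        "name": shuffle_chars(record["name"], rng),  # shuffling
        "city": rng.choice(CITIES),                  # substitution
        "salary": vary_value(record["salary"], rng), # value variance
        "card": fake_card_number(rng),               # substitution, format kept
        "notes": None,                               # nullification
    }


rng = random.Random(2024)  # seeded only so test runs are repeatable
test_row = obfuscate({"name": "Chris", "city": "Boston", "salary": 85000.0,
                      "card": "4111111111111111", "notes": "VIP"}, rng)
```

A real obfuscation job should not use a fixed, guessable seed. Character shuffling of short names is also weak on its own, since the original can often be guessed from the letters.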

27
Q

What is pseudonymization (pseudo-anonymization)?

A

Obfuscating data with the goal of reversing it later when required, e.g. substituting an index value for PII before storing it in the cloud.

28
Q

In PII data, what are direct and indirect identifiers?

A
  • Direct IDs - e.g. name or identification number
  • Indirect IDs - are information that can be combined with other information to identify a person (e.g. demographic information and shopping history).
29
Q

What is data de-identification?

A

  • Also known as anonymization.
  • Removing direct and indirect PII identifiers from datasets.
  • Unlike pseudonymization, de-identification is not meant to be reversed.
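A de-identification sketch, assuming hypothetical field names: direct identifiers are dropped, and two indirect identifiers (age, ZIP code) are generalized into coarser values, one common way to reduce re-identification risk. No lookup table is kept, so unlike pseudonymization there is nothing to reverse.

```python
DIRECT_IDENTIFIERS = {"name", "ssn", "email"}  # hypothetical field names


def deidentify(record: dict) -> dict:
    """Drop direct identifiers and coarsen (generalize) indirect ones."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in out:  # indirect ID: exact age -> 10-year band
        low = (out["age"] // 10) * 10
        out["age"] = f"{low}-{low + 9}"
    if "zip" in out:  # indirect ID: 5-digit ZIP -> 3-digit prefix
        out["zip"] = out["zip"][:3] + "**"
    return out
```

Generalization alone does not guarantee anonymity; whether a dataset is truly de-identified depends on what other data it could be combined with.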
30
Q

What are the two types of data discovery?

A
  • Data discovery by DLP tools for inventory.
  • Business Intelligence tools that analyze data for trends etc. (e.g. understanding trends to build inventory).
31
Q

What’s the difference between Data Lake and Data Warehouse?

A
  • Data Lake - raw data stored in its native format; often unstructured (e.g. blobs), though it can also hold structured data
  • Data warehouse - structured data or normalized data
32
Q

What is ETL?

A
  • Extract, Transform, Load
  • Data from multiple disparate sources is normalized and loaded into a data warehouse.
  • ETL improves searchability
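A small ETL sketch using Python's built-in sqlite3 as a stand-in for the data warehouse. The two source formats (a CRM export with ISO dates, a billing export with MM/DD/YYYY dates and upper-case names) and all field names are hypothetical; the Transform step normalizes names and dates so the loaded rows can be searched consistently.

```python
from __future__ import annotations

import sqlite3
from datetime import datetime


def transform_crm(row: dict) -> tuple[str, str]:
    # CRM export: names in mixed case, dates already ISO 8601 (YYYY-MM-DD)
    return row["Customer"].strip().title(), row["Signup"]


def transform_billing(row: dict) -> tuple[str, str]:
    # Billing export: upper-case names, US-style MM/DD/YYYY dates
    iso = datetime.strptime(row["signup_date"], "%m/%d/%Y").date().isoformat()
    return row["cust_name"].strip().title(), iso


def run_etl(crm_rows: list[dict], billing_rows: list[dict], conn: sqlite3.Connection) -> None:
    # Extract: rows from two disparate sources are passed in here
    # Transform: normalize both into one (name, signup) shape
    rows = [transform_crm(r) for r in crm_rows] + [transform_billing(r) for r in billing_rows]
    # Load: write the normalized rows into the warehouse table
    conn.execute("CREATE TABLE IF NOT EXISTS customers (name TEXT, signup TEXT)")
    conn.executemany("INSERT INTO customers (name, signup) VALUES (?, ?)", rows)
    conn.commit()
```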
33
Q

What is a Data Mart?

A
  • Data that has been warehoused, analyzed and made available for specific use by a particular business unit.
34
Q

What is Data Mining?

A

Involves discovering, analyzing, and extracting patterns in data.

35
Q

What is OLAP?

A
  • Online Analytical Processing
  • Analytics includes consolidation, drill-down, and slice-and-dice functions.
  • E.g. security incidents require forensic analysis that uses OLAP to extract relevant information from log files.
36
Q

What is AI/ML?

A
  • AI/ML systems learn from data sets; their algorithms improve through experience.
  • Training data is used.
37
Q

What is structured data?

A
  • Has a consistent format (Same info in same place)

Example - database record

  • Data may also be structured using markup or data-serialization formats (e.g. XML tags, JSON, YAML)
  • Data Schema or data model - abstract view of the data in a system.
  • Easier to identify and protect PII/PHI if data is structured.
  • Metadata is data that describes data
  • Semantics is the meaning of the data - it is described in the data schema or data model.
38
Q

What is unstructured data?

A
  • Information stored without following a common format.
  • Data labels are one approach to dealing with such data.
  • Content analysis is another approach.
39
Q

What are the different types of content analysis in dealing with unstructured data?

A
  • Pattern Matching - e.g. known credit card or SSN formats.
  • Lexical analysis - may be suited to email/IMs.
  • Hashing - identifies known data (e.g. system files or known sensitive files) by comparing the hash of the data to known hashes.
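Pattern matching for card numbers can be sketched with a regular expression for the known 16-digit format. The Luhn checksum filter is an addition not mentioned above: it is the standard check-digit scheme for payment card numbers, and applying it after the regex is one common way to cut false positives. CARD_PATTERN, luhn_valid and find_card_numbers are hypothetical names.

```python
from __future__ import annotations

import re

# Known format: 16 digits, optionally grouped in fours by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")


def luhn_valid(number: str) -> bool:
    """Luhn (mod 10) checksum used by payment card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Pattern match first, then keep only candidates that pass the Luhn check."""
    return [m.group() for m in CARD_PATTERN.finditer(text) if luhn_valid(m.group())]
```

find_card_numbers("card 4111 1111 1111 1111, order 1234-5678-9012-3456") returns ["4111 1111 1111 1111"]: both strings match the pattern, but only the first passes the Luhn check.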
40
Q

What is data classification?

A
  • The act of forming classes or groups, by identifying common attributes.
  • Data classification determines the controls/protection level needed - e.g. Public data can be unencrypted, but “Internal Use Only” data may need to be encrypted.
  • Speeds up security decision making by providing risk-based security controls based on classification
41
Q

What factors drive data classification?

A
  • Data Type - PII, Financial, PHI, Legal, Educational etc. determines what regulations apply
  • Legal Constraints - e.g. GDPR for personal data of individuals in the EU
  • Ownership - shared ownership of data may constrain how data is used
  • Value/criticality - how loss of that data would affect an organization’s operations.
42
Q

What is data science?

A

The application of scientific methods and algorithms to identify knowledge or useful information from data sets.

43
Q

What is data mapping?

A
  • Identifying data and mapping its location.
  • Most DLP tools do this (scan of IP address, domain, reading tags etc.)
44
Q

What are the different asset types that can be labeled?

A
  1. Hard Copy Materials - labeled with a printed watermark
  2. Physical assets (disc drives, servers, removable media etc.)
    Risks: content of media may change; found assets may be difficult to label (cannot be read, or danger of malware)
  3. Digital Files - labeled with metadata, a digital watermark, or a footer showing the classification
  4. Complex or shared systems - requires training and reference materials.

DLP tools are the biggest consumers of labels.

45
Q

What elements must a data classification policy specify?

A
  1. Compliance requirements inherent at various classification levels
  2. Data retention and disposal requirements
  3. What is considered sensitive or regulated data
  4. Appropriate or approved uses of data
  5. Access control and authorization
  6. Encryption needs
46
Q

What is the purpose of DRM or IRM?

A

Digital Rights Management (DRM) or Information Rights Management (IRM) controls access and usage rights when data is meant to be shared but not freely distributed.

Two types:
a) Consumer-grade IRM, also known as DRM - e.g. copyrighted material such as movies and music. DRM tools prevent copying.

b) Enterprise Grade IRM - e.g. images/documents. Also enforces usage restrictions and access rights (e.g. only users with a signed URL).

IRMs/DRMs may use ACLs to enforce restrictions.

47
Q

What are the attributes of an IRM/DRM system?

A
  1. Persistence - ACLs must follow the data (e.g. a password required to open a document)
  2. Dynamic policy control - owner can update policy/restrictions even after sharing data; needs a connection back to an IRM server
  3. Expiration - Time limited access (e.g. movie only playable for 24 hours)
  4. Continuous Audit trail - for accountability
  5. Interoperability - different DBs, OSs, email servers etc.
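The expiration attribute, and the signed URL mentioned earlier for enterprise IRM, can be illustrated with an HMAC-signed link that carries its own expiry time. This is a sketch under stated assumptions: SECRET, sign_url and verify_url are hypothetical, and a real signing key would live in a key vault. It covers expiration only; once a file is downloaded, persistence and usage restrictions need the IRM client itself.

```python
from __future__ import annotations

import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-signing-key"  # hypothetical; a real key would come from a key vault


def _sign(path: str, user: str, expires: int) -> str:
    message = f"{path}|{user}|{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(path: str, user: str, ttl_seconds: int, now: float | None = None) -> str:
    """Issue a link for one user that stops working ttl_seconds from now."""
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)
    query = urlencode({"user": user, "expires": expires, "sig": _sign(path, user, expires)})
    return f"{path}?{query}"


def verify_url(url: str, now: float | None = None) -> bool:
    """Accept the link only if the signature matches and it has not expired."""
    parsed = urlparse(url)
    params = {k: v[0] for k, v in parse_qs(parsed.query).items()}
    try:
        user, expires, sig = params["user"], int(params["expires"]), params["sig"]
    except (KeyError, ValueError):
        return False
    if not hmac.compare_digest(_sign(parsed.path, user, expires), sig):
        return False  # path, user or expiry was tampered with
    current = time.time() if now is None else now
    return current < expires
```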
48
Q

What are the access control models that IRM/DRMs use?

A
  1. Discretionary Access Control (DAC) - owner defines access restrictions per document or data set.
  2. Mandatory Access Control (MAC) - owner defines a classification level, and the IRM system does the rest - e.g. all sensitive data is only viewable by users with clearance.
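The difference between the two models in code, as a sketch: LEVELS is a hypothetical classification scheme, and the MAC check implements only a simple “clearance must be at least the data's classification” read rule.

```python
from __future__ import annotations

# Hypothetical classification scheme, ordered from least to most sensitive.
LEVELS = {"Public": 0, "Internal": 1, "Confidential": 2, "Secret": 3}


def dac_allows(acl: dict[str, set[str]], user: str, action: str) -> bool:
    """DAC: the owner maintains an ACL per document; check the user's entry."""
    return action in acl.get(user, set())


def mac_allows(clearance: str, classification: str) -> bool:
    """MAC: the owner only sets the classification; the system compares levels."""
    return LEVELS[clearance] >= LEVELS[classification]
```

With DAC the owner must edit each document's ACL; with MAC, labelling a document “Confidential” is enough for the system to restrict it to users with sufficient clearance.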
49
Q

What factors drive data retention?

A
  • Organization’s operational need (e.g. to support analysis for business intelligence)
  • Compliance (e.g. retention of PII, Financial records).
  • Data retention must balance organizational needs with cost.
  • Cloud is generally used as a backup location. Cloud offers different storage options that balance speed of retrieval and cost.
50
Q

What factors must data retention practices consider?

A
  • Schedules - how long is data to be retained
  • Integrity checking - ensuring that data is readable and has not been altered (e.g. by environmental factors); use of hashing
  • Retrieval procedures - who gets access to the data (e.g. live data needed by more users than archived data)
  • Data formats - ensuring that archived data can still be read (file formats and hardware change over time).
51
Q

Per NIST 800-88 (Guidelines for Media Sanitization) what are the three categories of delete action for defensible destruction?

A
  1. Clear - logical techniques (e.g. overwriting all user-addressable storage, or a factory reset) that protect against simple, non-invasive data recovery. Simply deleting files or emptying the trash bin is not enough, since that data remains recoverable with common tools. Suited to lower-sensitivity data
  2. Purge - physical or logical techniques (e.g. overwriting, block erase, magnetic degaussing) that make recovery infeasible even with state-of-the-art laboratory techniques. Can be expensive, and some methods (e.g. degaussing) may also damage the physical media. Crypto-shredding (cryptographic erase) is an alternative
  3. Destroy - pulverization/melting; not possible for customers to do when in cloud. More appropriate for CSPs.
52
Q

What is a Legal Hold?

A
  • The normal retention/disposal schedule is suspended indefinitely: the data must be kept until the hold is lifted, even if (for example) its seven-year retention period has passed.
  • Challenges with legal holds: a) identifying which data to hold; b) exception procedures so that the held data is not deleted; c) ensuring supporting systems (e.g. crypto keys) are not deleted either
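The interaction between a retention schedule and a legal hold can be sketched as a disposal check, assuming each record carries a creation date, a retention period in years and a legal-hold flag (all hypothetical fields). The hold simply overrides the schedule.

```python
from datetime import date


def disposal_due(created: date, retention_years: int, on_legal_hold: bool, today: date) -> bool:
    """True if the record may be disposed of under its retention schedule."""
    if on_legal_hold:
        return False  # a legal hold overrides the schedule until it is lifted
    try:
        expiry = created.replace(year=created.year + retention_years)
    except ValueError:  # created on Feb 29 and the target year is not a leap year
        expiry = created.replace(year=created.year + retention_years, day=28)
    return today >= expiry
```

Note challenge c) above: if the record is encrypted, the same check must also protect its keys, or crypto-shredding the keys would destroy the held data.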
53
Q

What are OWASP guidelines on logging?

A
  1. Synchronize time across all servers and devices - timestamping is important to establish chain of activity
  2. Differing classification schemes - e.g. some OSs classify a user login as “informational” while others call it a “user event”
  3. Identity attribution - who did what, and when?
  4. Application-specific logs - they may be in unique formats that need to be normalized
  5. Integrity of log files - app and OS log files are often on the same hosts as the software generating them, and hence are susceptible to tampering.
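One way to address the time-sync, identity-attribution and log-integrity points together is a hash-chained log with UTC timestamps, sketched below. append_entry and verify_chain are hypothetical names. Each entry includes the hash of the previous one, so editing, reordering or removing an earlier entry breaks verification; truncating the end of the log is not caught by the chain alone, and an attacker with full write access could rebuild the whole chain, which is why logs are also shipped to a separate host.

```python
from __future__ import annotations

import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(body: dict) -> str:
    # sort_keys gives a stable serialization, so the same entry always hashes the same
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log: list[dict], user: str, action: str,
                 ts: datetime | None = None) -> dict:
    """Record who did what and when (UTC), chained to the previous entry."""
    body = {
        "ts": (ts or datetime.now(timezone.utc)).isoformat(),
        "user": user,
        "action": action,
        "prev": log[-1]["hash"] if log else GENESIS,
    }
    entry = {**body, "hash": _entry_hash(body)}
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash and link; False if an entry was altered, reordered or removed mid-chain."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body.get("prev") != prev or _entry_hash(body) != entry.get("hash"):
            return False
        prev = entry["hash"]
    return True
```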
54
Q

What are SIEM tools and what do they do?

A

SIEM = Security Information and Event Management (e.g. Splunk)

  • Log centralization and aggregation (SIEM platform gathers log data from multiple locations)
  • Data Integrity - SIEM runs on a separate host with its own access permissions, so logs are harder to tamper with
  • Normalization
  • Automated or continuous monitoring (aka correlation) to identify potential attacks
  • Alerting - generate tickets/emails/pages; take automated IPS actions (e.g. suspend user account)
  • Investigative monitoring (log file queries, reporting)
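A simple correlation rule of the kind a SIEM evaluates continuously, sketched in Python: alert when one user has a threshold number of failed logins inside a sliding time window. The event tuple format, the threshold of 5 and the 5-minute window are assumptions for illustration, not Splunk syntax or any product's defaults.

```python
from __future__ import annotations

from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Iterable


def failed_login_alerts(
    events: Iterable[tuple[datetime, str, str]],
    threshold: int = 5,
    window: timedelta = timedelta(minutes=5),
) -> list[tuple[datetime, str, int]]:
    """Return (time, user, count) alerts for bursts of failed logins.

    events are (timestamp, user, outcome) tuples already sorted by time.
    """
    recent: dict[str, deque] = defaultdict(deque)  # per-user failure timestamps
    alerts = []
    for ts, user, outcome in events:
        if outcome != "failure":
            continue
        q = recent[user]
        q.append(ts)
        while ts - q[0] > window:  # drop failures that fell out of the window
            q.popleft()
        if len(q) >= threshold:
            alerts.append((ts, user, len(q)))
            q.clear()  # reset so one burst raises one alert
    return alerts
```

In a SIEM the alert would then feed the alerting step, e.g. opening a ticket or suspending the account.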
55
Q

What is non-repudiation?

A

Holding a user accountable for a particular action in a way that they cannot deny doing it.
Normally done via log files with unique user IDs and timestamps.

56
Q

What is chain of custody?

A

A defensible record of how evidence was handled and by whom from its collection to its presentation as evidence.

57
Q

What are some of the data security technologies and strategies?

A
  • Encryption and key management (various levels of encryption - storage, volume, object, file, app, DB)
  • Hashing
  • Masking
  • Tokenization
  • Data Loss Prevention
  • Data Obfuscation