Databricks Fundamentals Flashcards

1
Q

Open Data Lake

A

Also known as a data lakehouse. The Databricks Data Intelligence Platform is built on this. It is the first part of the pyramid.

  • Data Ingestion and storage
  • Data processing and support for continuous data engineering
  • Data Access and Consumption
  • Data Governance – Discoverability, Security, and Compliance
  • Infrastructure and operations
  • All raw data (logs, text, audio, video, and images)
2
Q

Delta Lake

A

-Unified data storage for reliability and sharing

  • A file-based, open source storage format with ACID transaction guarantees

1st piece of the Data Intelligence Engine funnel/pyramid (after Open Data Lake)

  • Data layout is automatically optimized based on usage patterns
  • ACID transaction guarantees
  • Scalable data and metadata handling
  • Audit history and time travel
  • Unified streaming and batch processing
  • Schema enforcement and schema evolution

-Features: Predictive I/O, Predictive Optimization, Liquid Clustering
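
To make schema enforcement and time travel more concrete, here is a toy sketch in plain Python. It is a teaching model only, not how Delta Lake is implemented: the class `ToyVersionedTable` and its `append` and `read` methods are hypothetical names invented for this example. It only loosely mirrors the idea of a versioned commit log, where each successful write becomes a new version and older versions stay readable.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVersionedTable:
    """Toy append-only commit log (hypothetical; not Delta Lake's implementation)."""

    schema: dict  # column name -> expected Python type
    commits: list = field(default_factory=list)  # one list of rows per version

    def append(self, rows):
        """Validate every row first, then record them all as one new version."""
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} do not match the schema")
            for col, typ in self.schema.items():
                if not isinstance(row[col], typ):
                    raise ValueError(f"column {col!r} expects {typ.__name__}")
        # Nothing is recorded unless every row passed validation (all or nothing).
        self.commits.append([dict(row) for row in rows])
        return len(self.commits) - 1  # version number of this commit

    def read(self, version=None):
        """Return the rows as of `version` (latest if None) by replaying commits."""
        if version is None:
            version = len(self.commits) - 1
        elif not 0 <= version < len(self.commits):
            raise IndexError(f"no such version: {version}")
        return [dict(row) for commit in self.commits[: version + 1] for row in commit]


table = ToyVersionedTable(schema={"id": int, "event": str})
v0 = table.append([{"id": 1, "event": "login"}])
v1 = table.append([{"id": 2, "event": "logout"}])

# Schema enforcement: a row with the wrong type is rejected and no version is created.
try:
    table.append([{"id": "3", "event": "click"}])
    rejected = False
except ValueError:
    rejected = True

print(len(table.read(version=0)))  # rows visible at version 0 ("time travel")
print(len(table.read()))           # rows visible at the latest version
```

In this sketch the bad write leaves the table at version 1, and reading version 0 still returns only the first row.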

3
Q

Unity Catalog

A

Unified security, governance, and cataloging

  • Context-aware search, auto-describe tables and columns, automated lineage, end-to-end observability and monitoring, sharing AI models

3rd piece of the data lakehouse funnel/pyramid (after Delta Lake)

  • Securely get insights in natural language
4
Q

Data Intelligence Engine

A

Uses generative AI to understand the semantics of your data

  1. Delta Lake
  2. Unity Catalog
5
Q

ACID Transaction

A
  • Atomicity: A transaction is treated as a single atomic unit. All steps that make up the transaction must succeed or the entire transaction rolls back. If they all succeed, the changes made by the transaction are permanently committed to the managing system. Consider a transaction that transfers $200 from a savings account to a checking account. For the transaction to be committed to the database, the $200 must be successfully deducted from the savings account and added to the checking account. The funds in both accounts must be verified to ensure their accuracy. If any of these tasks fail, all changes roll back and none are committed.
  • Consistency: A transaction must preserve the consistency of the underlying data. The transaction should make no changes that violate the rules or constraints placed on the data. For instance, a database supporting banking transactions might include a rule stating that a customer’s account balance can never be negative. If a transaction attempts to withdraw more money from an account than is available, the transaction will fail, and any changes made to the data will roll back.
  • Isolation: A transaction is isolated from all other transactions. Transactions can run concurrently only if they don’t interfere with each other. Returning to the transfer example, if another transaction attempted to withdraw funds from the same savings account at the same time, isolation would keep it from seeing or acting on the first transaction’s uncommitted changes. Without isolation, the second transaction might withdraw more funds than remain in the account once the first transaction completes.
  • Durability: A transaction that is committed is guaranteed to remain committed – that is, all changes are made permanent and will not be lost if an event such as a power failure should occur. This typically means persisting the changes to nonvolatile storage. If durability were not guaranteed, it would be possible for some or all changes to be lost, affecting the data’s reliability.
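
The transfer example can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions, not a banking implementation: the `accounts` table, the starting balances, and the helper names `transfer` and `balances` are made up for the example. A `CHECK` constraint stands in for the "balance can never be negative" rule (consistency), and the `with conn:` block commits on success or rolls back if an error escapes (atomicity).

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first so that a later failure has something to roll back.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit was undone.
        return False


def balances(conn):
    """Return the current balances as a dict of name -> balance."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("savings", 500), ("checking", 100)],
)
conn.commit()

first = transfer(conn, "savings", "checking", 200)    # fits within the balance
second = transfer(conn, "savings", "checking", 1000)  # would overdraw savings
print(first, second, balances(conn))
```

The second transfer fails on the debit step, so the credit that had already been applied to the checking account is rolled back and both balances stay as they were after the first transfer. An in-memory database is used here for brevity, so the sketch does not demonstrate durability.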
6
Q

Elements of Data Governance

A
  1. Data cataloging
  2. Data Classification
  3. Auditing data entitlements and access
  4. Data discovery
  5. Data sharing and collaboration
  6. Data Lineage
  7. Data Security
  8. Data quality
7
Q

Databricks Data governance

A

Unity Catalog: Unified governance and security

Delta Sharing: Sharing between organizations. Share live data without copying it, open cross-platform sharing, centralized administration and governance

Databricks Marketplace: Commercialization of data assets

Databricks Clean Rooms: Private, secure computing

8
Q

Databricks Security Architecture

A
  • Control plane
  • Data Plane
9
Q

Data Plane

A
  • One of the two parts of the Databricks security architecture

-Handles the movement of data packets within and between cloud environments

-Where the data is processed by clusters of compute resources

10
Q

Control plane

A
  • One of the two parts of the Databricks security architecture
11
Q

Photon

A

-Speeds up ETL and ingestion on the data lake. Compatible with Apache Spark APIs

  • Loading data into Delta and Parquet, IoT use cases, SQL-based use cases
12
Q

Data Warehousing

A
  • Databricks SQL
    • Text to SQL
    • AI-driven queries
    • AI-driven serverless computing that scales for cost efficiency and peak performance
    • AI-driven debugging and remediation
13
Q

Delta Live Tables (DLT)

A

ETL & Real-Time Analytics

-Automated and scalable streaming ingestion and transformation
-Workload-specific autoscaling
-Intelligent orchestration, error handling, and optimization

14
Q

Orchestration

A
  • Workflows

Intelligent ETL processing, AI-driven debugging and remediation, end-to-end observability and monitoring, broad ecosystem integration

15
Q
A

GEN AI
- Custom Models
- Model serving
- RAG

End-to-End AI
- MLOps (MLflow)
- AutoML
- Monitoring
- Governance
