Azure Databricks Flashcards

1
Q

What is Apache Spark?

A

It is an open-source distributed computing system for big data and analytics, supporting both batch and streaming workloads. It can process big data across multiple nodes using Python, Scala, Java, SQL, and R.

2
Q

What is the difference between Hadoop and Apache Spark?

A

Hadoop MapReduce is a batch system that writes intermediate results to disk between steps. Spark supports both batch and stream processing, and its nodes keep data in memory where possible.

3
Q

Is Apache Spark in memory or on disk?

A

Primarily in-memory; data that does not fit in memory can be spilled to disk.

4
Q

Can you stream data into Apache Spark, analyze it, store it, and then reanalyze it using Spark?

A

Yes, you can store the analyzed streamed data in different storage systems, such as Hive, HDFS, cloud storage, etc. You can then query it using Spark and explore it further.

5
Q

Where are Spark results stored?

A

HDFS or another distributed file system, cloud storage, SQL databases, NoSQL databases

6
Q

What are the layers of the Apache Spark architecture?

A
  1. Libraries (Spark SQL, MLlib, SparkR)
  2. Spark Core
  3. Resource Management
  4. Datastore
7
Q

In Apache Spark, what are the three workspaces?

A
  1. Spark SQL
  2. MLlib
  3. SparkR
8
Q

What is Azure Databricks?

A

It is an Apache Spark-based big data and machine learning platform.

9
Q

We are primarily an Azure organization but need a big data solution to run on AWS; what are my options?

A

Databricks is also available on AWS, so the same platform can be run there. Azure Databricks itself is the Azure-hosted offering.

10
Q

For Azure Databricks, what are the three environments supported?

A
  1. SQL (Run SQL queries on a data lake, create visualizations and dashboards)
  2. Data science and engineering (Collaborative workspace for working on big data pipelines and analytics)
  3. Machine Learning (End-to-end ML: training and serving)
11
Q

For Azure Databricks, what are the pricing models?

A
  1. Standard (All common Databricks capabilities, fully hosted)
  2. Premium (Additional management, RBAC, monitoring and logs, AD passthrough)
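
The tier choice above can be sketched as a small lookup. This is only an illustration: the feature labels are hypothetical names for the capabilities listed on this card, not identifiers from any Azure API.

```python
# Hypothetical labels for the Premium-only capabilities named on this card.
PREMIUM_ONLY_FEATURES = {
    "additional_management",
    "rbac",
    "monitoring_and_logs",
    "ad_passthrough",
}


def required_tier(needed_features):
    """Return "Premium" if any needed feature is Premium-only, else "Standard"."""
    needed = set(needed_features)
    return "Premium" if needed & PREMIUM_ONLY_FEATURES else "Standard"


print(required_tier(["notebooks", "jobs"]))
print(required_tier(["notebooks", "ad_passthrough"]))
```

Any requirement for AD passthrough or RBAC pushes the choice to Premium, which matches the passthrough cards later in this deck.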
12
Q

Is Azure Databricks fully hosted?

A

Yes

13
Q

Is Azure Databricks fully managed?

A

Yes

14
Q

I want to use Azure Databricks and want my users to log in using their AD credentials. What options do I have?

A

Use the Premium pricing tier; it includes AD passthrough.

15
Q

I require passthrough authentication so that Databricks can use Microsoft Entra ID. Can I use the Standard tier?

A

No, you need the Premium tier to support passthrough.

16
Q

What is Azure Databricks?

A

Azure Databricks is an Apache Spark-based analytics platform for building, deploying, sharing, and maintaining enterprise-grade data, analytics, and AI solutions at scale.

17
Q

Where is data stored using Azure Databricks?

A
  1. It is stored in Azure Data Lake Storage, which is HDFS-compatible
  2. Temporary storage is local to the cluster nodes
18
Q

What external storage can Azure Databricks use apart from HDFS with Azure Data Lake?

A

External databases such as Azure SQL Database and Azure Synapse.

19
Q

How does Azure Databricks analyze the streamed data?

A
  It uses code written in one of these languages:
  1. Python (using PySpark)
  2. Scala
  3. Java
  4. R
20
Q

What are the use cases for Azure Databricks?

A
  1. Data processing scheduling and management, in particular ETL
  2. Generating dashboards and visualizations
  3. Managing security, governance, high availability, and disaster recovery
  4. Data discovery, annotation, and exploration
  5. Machine learning (ML) modeling, tracking, and model serving
  6. Generative AI solutions
21
Q

Can Azure Databricks be used for Gen AI?

A

Yes

22
Q

Can Azure Databricks be used for ML?

A

Yes

23
Q

Can Azure Databricks handle near-real-time data ingestion into Databricks Lakehouse?

A

Yes, Azure Databricks can ingest data in near real time into the Databricks Lakehouse.

24
Q

What is Azure Databricks Lakehouse?

A

It combines the best of a data lake and a data warehouse.

25
Q
A