Azure Databricks Flashcards

1
Q

What is Apache Spark?

A

It is an open-source distributed computing engine for big data processing and analytics, including streaming workloads. It processes data in parallel across multiple nodes and can be programmed in Python, Scala, Java, SQL, and R.

2
Q

What is the difference between Hadoop and Apache Spark?

A

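Hadoop (MapReduce) is a batch system: data is processed in batches, and intermediate results are written to disk between stages. Spark supports both batch and stream processing, and its nodes keep intermediate data in memory where possible.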
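
The difference in how intermediate data is handed between stages can be illustrated with a minimal sketch. This is plain Python, not Hadoop or Spark code; the function names (`stage_one`, `stage_two`, `run_disk_based`, `run_in_memory`) are hypothetical and exist only for this illustration.

```python
import json
import os
import tempfile


def stage_one(numbers):
    # First stage of a toy two-stage job: double every value.
    return [n * 2 for n in numbers]


def stage_two(numbers):
    # Second stage: aggregate the output of the first stage.
    return sum(numbers)


def run_disk_based(numbers):
    # MapReduce-style handoff (illustrative only): the intermediate
    # result is written to disk and read back before the next stage.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "intermediate.json")
        with open(path, "w") as f:
            json.dump(stage_one(numbers), f)
        with open(path) as f:
            intermediate = json.load(f)
        return stage_two(intermediate)


def run_in_memory(numbers):
    # Spark-style handoff (illustrative only): the intermediate
    # result stays in memory and is passed straight to the next stage.
    return stage_two(stage_one(numbers))


data = [1, 2, 3, 4]
disk_result = run_disk_based(data)
memory_result = run_in_memory(data)
```

Both paths compute the same answer; the in-memory path simply skips the write and read between stages, which is where much of Spark's speed advantage on multi-stage jobs comes from.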

3
Q

Is Apache Spark in memory or on disk?

A

Primarily in-memory. Spark can spill to disk when the data does not fit in memory.

4
Q

Can you stream data into Apache Spark, analyze it, store the results, and then reanalyze the stored data using Spark?

A

Yes. You can store the analyzed streamed data in different storage systems, such as Hive, HDFS, or cloud storage. You can then query it using Spark and explore it further.
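
The stream, store, and re-query pattern described above can be sketched in a few lines. This is a plain-Python stand-in under stated assumptions, not Spark API code: a local JSON-lines file plays the role of the storage system, and the helper names (`micro_batches`, `summarize`, `store`, `requery`) and the sample events are hypothetical.

```python
import json
import os
import tempfile


def micro_batches(records, size):
    # Split an incoming sequence of events into small batches,
    # loosely mimicking micro-batch stream processing.
    for start in range(0, len(records), size):
        yield records[start:start + size]


def summarize(batch):
    # Analyze one batch: total value per sensor.
    totals = {}
    for event in batch:
        totals[event["sensor"]] = totals.get(event["sensor"], 0) + event["value"]
    return totals


def store(path, totals):
    # Append the analyzed batch to storage (here: a JSON-lines file).
    with open(path, "a") as f:
        f.write(json.dumps(totals) + "\n")


def requery(path):
    # Later analysis: read the stored results back and combine them.
    combined = {}
    with open(path) as f:
        for line in f:
            for sensor, total in json.loads(line).items():
                combined[sensor] = combined.get(sensor, 0) + total
    return combined


# Made-up sample events, used only for this illustration.
events = [
    {"sensor": "a", "value": 1},
    {"sensor": "b", "value": 2},
    {"sensor": "a", "value": 3},
    {"sensor": "b", "value": 4},
    {"sensor": "a", "value": 5},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.jsonl")
    for batch in micro_batches(events, size=2):
        store(path, summarize(batch))
    with open(path) as f:
        stored_lines = len(f.readlines())
    combined = requery(path)
```

In a real deployment the file would be replaced by a Hive table, HDFS path, or cloud storage location, and both the streaming step and the later query would run on Spark.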

5
Q

Where are Spark results stored?

A

HDFS or another distributed file system, cloud storage, SQL databases, and NoSQL databases.

6
Q

What are the layers of the Apache Spark architecture?

A
  1. Libraries (Spark SQL, MLlib, SparkR)
  2. Spark Core
  3. Resource Management
  4. Datastore
7
Q

In Apache Spark, what are the three libraries?

A
  1. Spark SQL
  2. MLlib
  3. SparkR
8
Q

What is Azure Databricks?

A

It is an Apache Spark-based big data and machine learning platform.

9
Q

We are primarily an Azure organization but need a big data solution that runs on AWS. What are our options?

A

Databricks is also available on AWS, so the same platform can be run there. Azure Databricks itself is the Azure-hosted offering.

10
Q

For Azure Databricks, what are the three environments supported?

A
  1. SQL (run SQL queries on a data lake; create visualizations and dashboards)
  2. Data Science and Engineering (collaborative workspace for building big data pipelines and analytics)
  3. Machine Learning (end-to-end ML, from training to serving)
11
Q

For Azure Databricks, what are the pricing tiers?

A
  1. Standard (all common Databricks capabilities, fully hosted)
  2. Premium (additional management, RBAC, monitoring and logs, AD passthrough)
12
Q

Is Azure Databricks fully hosted?

A

Yes

13
Q

Is Azure Databricks fully managed?

A

Yes

14
Q

I want to use Azure Databricks, and I want my users to log in using their AD credentials. What options do I have?

A

Use the Premium pricing tier, which includes AD passthrough.
