Azure Databricks Flashcards
What is Apache Spark?
It is an open-source distributed computing engine for big data and analytics. It processes data in parallel across multiple nodes, handles both batch and streaming workloads, and offers APIs in Python, Scala, Java, R, and SQL.
What is the difference between Hadoop and Apache Spark?
Hadoop (MapReduce) is a batch system that writes intermediate results to disk between steps. Spark keeps intermediate data in memory on the nodes and can process both batches and streams.
Is Apache Spark in memory or on disk?
Primarily in memory; it spills to disk when the data does not fit in memory.
Can you stream data into Apache Spark, analyze it, store the results, and then reanalyze them using Spark?
Yes. You can store the analyzed stream data in storage systems such as Hive, HDFS, or cloud storage, then query it with Spark and explore it further.
Where are Spark results stored?
HDFS or another distributed file system, cloud storage, SQL databases, or NoSQL databases.
What are the layers of the Apache Spark architecture?
- Libraries (Spark SQL, MLlib, SparkR)
- Spark Core
- Resource Management
- Datastore
In Apache Spark, what are the three libraries?
- Spark SQL
- MLlib
- SparkR
What is Azure Databricks?
It is an Apache Spark-based big data and machine learning platform.
We are primarily an Azure organization but need a big data solution to run on AWS; what are my options?
Databricks is also offered on AWS, so the same platform can be run there.
For Azure Databricks, what are the three environments supported?
- SQL (Run SQL queries on a data lake, create visualizations and dashboards)
- Data science and engineering (Collaborative workspace for building big data pipelines and analytics)
- Machine Learning (End-to-end ML: training and serving)
For Azure Databricks, what are the pricing tiers?
- Standard (All common Databricks capabilities, fully hosted)
- Premium (Additional management, RBAC, monitoring and logs, AD passthrough)
Is Azure Databricks fully hosted?
Yes
Is Azure Databricks fully managed?
Yes
I want to use Azure Databricks and have my users log in with their AD credentials. What options do I have?
Use the Premium pricing tier, which includes AD passthrough.
I require passthrough authentication for Databricks to use Microsoft Entra ID. Can I use the Standard tier?
No, you need the Premium tier to support passthrough.
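The tier cards can be condensed into a small lookup. This is a sketch that encodes only what the cards in this deck state; it is not an official feature matrix, and the feature keys and the helper `minimum_tier` are hypothetical names chosen for illustration.

```python
# Features per tier as listed on the cards in this deck (assumed, not official).
TIER_FEATURES = {
    "standard": {"core_capabilities", "fully_hosted"},
    "premium": {
        "core_capabilities",
        "fully_hosted",
        "additional_management",
        "rbac",
        "monitoring_and_logs",
        "ad_passthrough",
    },
}


def minimum_tier(required):
    """Return the lowest tier whose feature set covers every required feature."""
    needed = set(required)
    for tier in ("standard", "premium"):
        if needed <= TIER_FEATURES[tier]:
            return tier
    raise ValueError(f"No tier covers: {sorted(needed)}")


if __name__ == "__main__":
    print(minimum_tier({"ad_passthrough"}))
    print(minimum_tier({"fully_hosted"}))
```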