Core Data Concepts Flashcards
Which of the following technologies is not an example of a real-time message ingestion engine?
- Azure IoT Hub
- Azure Event Hubs
- Azure SQL Database
- Apache Kafka
- Azure IoT Hub, Azure Event Hubs, and Apache Kafka are message brokers that can ingest millions of events per second from one or more message producers. They can then queue messages before sending them either to a cold data store such as Azure Data Lake Storage Gen2 or to a stream processing engine such as Azure Stream Analytics. Azure SQL Database is a relational database that is used to store structured data.
Is the underlined portion of the following statement true, or does it need to be replaced with one of the other fragments that appear below?
DML statements include _INSERT, UPDATE, and DELETE_ commands.
- GRANT, DENY, and REVOKE.
- SELECT, INSERT, UPDATE, and DELETE.
- BEGIN TRANSACTION, COMMIT TRANSACTION, and ROLLBACK TRANSACTION.
- No change is needed.
- Data manipulation language commands are used to manipulate data that is stored in a relational database. These commands include SELECT, INSERT, UPDATE, and DELETE.
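As a quick T-SQL sketch (the `dbo.Players` table is hypothetical), the four DML commands look like this; by contrast, GRANT, DENY, and REVOKE are data control language (DCL) commands, and the TRANSACTION commands control transactions rather than manipulate data.

```sql
-- Hypothetical table used only for illustration.
SELECT PlayerId, GamerTag FROM dbo.Players;                      -- read rows
INSERT INTO dbo.Players (PlayerId, GamerTag) VALUES (1, 'Nova'); -- add a row
UPDATE dbo.Players SET GamerTag = 'NovaPrime' WHERE PlayerId = 1;
DELETE FROM dbo.Players WHERE PlayerId = 1;                      -- remove the row
```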
You are the data architect for a game company and are designing the database tier for a new game. The game will be released globally and is expected to be well received with potentially millions of people concurrently playing online. Gamers are expecting the game to be able to integrate with social media platforms so that they can stream their sessions and in-game scores in real time. Which of the following database platforms is the most appropriate for this scenario?
1. Azure Cosmos DB supports millisecond reads and writes to avoid lags during gameplay and can easily integrate with social features.
2. Azure SQL Database is necessary because this workload is transactional by nature.
3. Azure Synapse Analytics dedicated SQL pool is necessary to analyze data from millions of users at the same time.
4. Azure Cache for Redis to support in-memory storage of each player and their related gamer metadata.
- Azure Cosmos DB not only supports millisecond reads and writes to avoid lag, but its flexible schema also makes it easy to store a player’s membership information if they are part of any gaming or social media communities.
You are developing a real-time streaming solution that processes data streamed from different brands of IoT devices. The solution must be able to retrieve metadata about each device to determine the unit of measurement each device uses. Which of the following options would serve as a valid solution for this use case?
1. Process the IoT data on demand and store it in micro-batches in Azure Blob Storage with the static reference data. Azure Databricks Structured Streaming can process both datasets from Azure Blob Storage to retrieve the required information.
2. Process the IoT data live and use static data stored in Azure Blob Storage to provide the necessary metadata for the solution. Azure Stream Analytics supports Azure Blob Storage as the storage layer for reference data.
3. This cannot be done in real time.
4. Either option 1 or 2 will work.
- Azure Databricks Structured Streaming and Azure Stream Analytics can be used to create live and on-demand stream processing solutions. Both technologies can use data stored in Azure Blob Storage as reference data.
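As a sketch of option 2, an Azure Stream Analytics query can join the live stream against a reference-data input backed by Azure Blob Storage; the input and column names below are hypothetical.

```sql
-- 'IoTStream' is a streaming input; 'DeviceMetadata' is a reference-data
-- input backed by a blob container. Joins against reference data do not
-- require a temporal window.
SELECT
    s.DeviceId,
    s.Reading,
    r.UnitOfMeasure
INTO [Output]
FROM IoTStream s
JOIN DeviceMetadata r
    ON s.DeviceId = r.DeviceId
```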
Which of the following is an example of a nonrelational data store?
1. Azure Blob Storage
2. Azure Cosmos DB
3. MongoDB
4. All of the above
- All of these options are nonrelational data stores. While Azure Blob Storage is not a database, it is still a nonrelational data store because of its ability to store nonrelational data such as binary, JSON, and Parquet files.
You are a data architect at a manufacturing company. You were recently given a project to design a solution that will make it easier for your company’s implementation of Azure Cognitive Search to analyze relationships between employees and departments. Which of the following is the most efficient solution for this project?
1. Use the Azure Cosmos DB Gremlin API to store the entities and relationships for fast query results.
2. Store the departments and employees as values in an Azure SQL Database relational model.
3. Store the data as Parquet files in Azure Data Lake Storage Gen2 and then query the relationships using Azure Databricks.
4. Denormalize the data into column families using the Azure Cosmos DB Cassandra API.
- The Azure Cosmos DB Gremlin API is the best choice for storing relationships between employee and department entities. While this can be accomplished with a relational model, graph databases such as the Azure Cosmos DB Gremlin API are better options since they do not require applications to perform complex queries with several join operations.
You are designing a solution that will leverage a machine learning model to identify different endangered species that inhabit different wildlife reservations. Part of this solution will require you to train the model against images of these animals so that it knows which animal is which. What storage solution should you use to store the images?
1. Azure SQL Database’s FILESTREAM feature
2. Azure Blob Storage
3. Azure Data Lake Storage Gen2
4. Azure Cosmos DB Gremlin API
- Azure Blob Storage is optimized for storing massive amounts of binary data such as images and can be accessed by several machine learning development platforms.
You are the administrator of a data warehouse that is hosted in an Azure Synapse Analytics dedicated SQL pool instance. You choose to transform and load data using ELT to reduce the number of hops data must go through to get from your data lake environment to the data warehouse. Which of the following technologies provides the most efficient way to load data into the Azure Synapse Analytics dedicated SQL pool instance through ELT?
1. Azure Databricks
2. Azure Stream Analytics
3. COPY statement
4. Azure Data Factory
- The COPY statement provides the most flexibility for high-throughput data loading from external storage accounts into Azure Synapse Analytics.
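A minimal sketch of loading Parquet files from a data lake with the COPY statement (the account, container, and table names are hypothetical):

```sql
-- Load all Parquet files under a hypothetical path into a staging table.
COPY INTO dbo.StageSales
FROM 'https://mydatalake.blob.core.windows.net/raw/sales/*.parquet'
WITH (
    FILE_TYPE = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Managed Identity')
);
```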
Is the underlined portion of the following statement true, or does it need to be replaced with one of the other fragments that appear below?
_Azure SQL Database_ is an example of an MPP system.
- Azure Data Factory.
- Azure Synapse Analytics dedicated SQL pool.
- Azure Batch.
- No change is needed.
- Azure Synapse Analytics dedicated SQL pool is a massively parallel processing (MPP) system: it leverages a scale-out architecture by distributing data and data processing across multiple nodes.
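For illustration, a hash-distributed table in a dedicated SQL pool spreads rows across the pool’s distributions so that queries are processed by many nodes in parallel (the table and column names are hypothetical):

```sql
-- Rows are assigned to distributions by hashing CustomerId.
CREATE TABLE dbo.FactSales
(
    SaleId     BIGINT        NOT NULL,
    CustomerId INT           NOT NULL,
    Amount     DECIMAL(18,2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerId),
    CLUSTERED COLUMNSTORE INDEX
);
```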
You are a data engineer for a retail company that sells athletic wear. Company decision makers rely on updated sales information to make decisions based on buying trends. New data must be processed every night so that reports have the most recent sales information by the time decision makers examine the reports. What type of processing does this describe?
1. Transactional processing
2. Stream processing
3. Scheduled processing
4. Batch processing
- Batch processing is the practice of transforming groups of data at scheduled periods of time.
As the data architect for your retail firm, you have been asked to design a solution that will process large amounts of customer and transaction data every night and store it in your Azure Synapse dedicated SQL pool data warehouse for analysis. There are multiple sources of data that must be processed to ensure that analysts are able to make the most appropriate business decisions based on these datasets. The solution must also be easy to maintain and have minimal operational overhead. Which of the following is the most appropriate choice for this solution?
1. Create Azure Data Factory mapping data flows to process each entity and add them to a control flow in ADF to be processed in the correct order every night.
2. Develop the data flows in Azure Databricks and schedule them through an Azure Databricks job to run every night.
3. Create SSIS jobs in an Azure VM to process each entity and add them to a control flow in SSIS to be processed in the correct order every night.
4. Create workflows in Azure Logic Apps to process each entity every night.
- Azure Data Factory mapping data flows allow data engineers to build data processing pipelines with a graphical user interface. Because no code is involved, this solution is the easiest to maintain and has the least operational overhead.
Is the underlined portion of the following statement true, or does it need to be replaced with one of the other fragments that appear below?
_Descriptive_ analytics answer questions about why things happened.
- Predictive.
- Cognitive.
- Diagnostic.
- No change is needed.
- Diagnostic analytics answer questions about why things happened, and descriptive analytics answer questions about what has happened.
Is the underlined portion of the following statement true, or does it need to be replaced with one of the other fragments that appear below?
_Matrices_ can be used to display totals and subtotals for different groups of categorical data.
- Tables.
- Bar charts.
- Scatter plots.
- No change is needed.
- Matrices are useful visuals for clearly displaying numerical totals and subtotals across different groups of categories.
You are responsible for designing a report platform that will provide your leadership team with the information necessary to build the company’s long-term and short-term strategy. Analysts must be able to build interactive visualizations with the least amount of complexity to provide executives recommendations based on business performance and customer trends. Analysts will also need to be able to create views of the most critical pieces of information for executives to consume. These views are the only pieces of the platform that executives need to have access to. Which of the following is the most appropriate choice given the requirements?
1. Give analysts the ability to create and interact with reports in Power BI while also having them create dashboards for executives. Executives will only need access to Power BI dashboards.
2. Give analysts and executives the ability to create and interact with reports in Power BI. This will give executives the ability to build dashboards for time-sensitive decision making.
3. Recommend that analysts build infographics in a Jupyter Notebook with Python or R as this is the only way to build the dashboards the executives require.
4. Give analysts the ability to build static reports with SSRS and pin the most important SSRS visualizations to a Power BI dashboard.
- Analysts will need to be able to create and interact with reports as well as be able to pin the most relevant visualizations from those reports to dashboards for executives.
You are a report designer for a retail company that relies on online sales. Your boss has requested that you add a visualization to the executive performance dashboard that will show sales patterns over the last three years. Which of the following is the most appropriate option?
- Column chart
- Line chart
- Scatter plot
- Matrix
- Line charts are useful for displaying how data has changed over time.