2.10 Architecture Scenario Flashcards
So your customer needs to aggregate data from two separate systems for better reporting.
One system exports its data to a CSV file every night. That data often needs cleaning, as it sometimes contains gaps. The second system runs a nightly process that normalizes its data and stores it in a SQL database.
Your customer would like to be able to build reports and view them in Microsoft Power BI. What components might you use in a solution and how can you automate the process?
There are multiple ways we can achieve this.
However, data analytics workloads can largely be grouped into a set of phases: ingest, clean, train, model, and serve.
In this example, our data comes from two sources: a normalized SQL table and a blob store containing CSV files.
We can use Azure Data Factory to pull in, aggregate, and store that data in a new centralized location such as Azure Data Lake. This is the ingest phase.
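In practice the ingest step is defined as a Data Factory Copy activity rather than custom code, but as a rough sketch of what that copy does, the following Python snippet moves the nightly CSV from blob storage into a data lake landing zone. The connection strings, container names, and file paths are hypothetical placeholders.

```python
# Illustrative only: a Data Factory Copy activity would normally perform this
# step on a schedule without custom code. All names below are placeholders.
from azure.storage.blob import BlobServiceClient
from azure.storage.filedatalake import DataLakeServiceClient

# Pull the nightly CSV export from the source system's blob container
blob_client = (
    BlobServiceClient.from_connection_string("<source-connection-string>")
    .get_blob_client(container="exports", blob="nightly/sales.csv")
)
csv_bytes = blob_client.download_blob().readall()

# Land it unchanged in the data lake's raw zone for later cleaning
lake_file = (
    DataLakeServiceClient.from_connection_string("<lake-connection-string>")
    .get_file_system_client("raw")
    .get_file_client("sales/2024-01-01/sales.csv")
)
lake_file.upload_data(csv_bytes, overwrite=True)
```

A scheduled Data Factory pipeline would also copy the normalized SQL table into the same raw zone, so both sources end up in one centralized location.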
Next, we can use Azure Databricks to clean the data and, where needed, train models on it.
The cleaned data could be written back to the data lake, but in this instance we've chosen a SQL pool in Azure Synapse, which helps us model the data and perform further analysis on it. A sketch of this step follows.
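As a minimal illustration of the clean-and-load step, the PySpark sketch below (as it might appear in a Databricks notebook, where `spark` is predefined) removes duplicate and incomplete rows and writes the result to a Synapse SQL pool via the Azure Synapse connector. The paths, column names, table name, and connector options are assumptions; the connector also requires storage credentials configured on the cluster.

```python
# Databricks notebook sketch (PySpark). Paths, columns, and options are
# hypothetical placeholders, not the exact values used in this scenario.
from pyspark.sql import functions as F

# Read the raw CSVs landed by the ingest phase
raw = spark.read.option("header", "true").csv(
    "abfss://raw@<lake>.dfs.core.windows.net/sales/"
)

# Basic cleaning: drop duplicates, drop rows missing the key, fill numeric gaps
cleaned = (
    raw.dropDuplicates()
       .na.drop(subset=["order_id"])
       .withColumn("amount", F.coalesce(F.col("amount").cast("double"), F.lit(0.0)))
)

# Write to a Synapse SQL pool using the Azure Synapse connector
(cleaned.write
    .format("com.databricks.spark.sqldw")
    .option("url", "jdbc:sqlserver://<server>.sql.azuresynapse.net:1433;database=analytics")
    .option("dbTable", "dbo.SalesClean")
    .option("tempDir", "abfss://staging@<lake>.dfs.core.windows.net/tmp")
    .option("forwardSparkAzureStorageCredentials", "true")
    .mode("overwrite")
    .save())
```

Once the cleaned table lives in the SQL pool, Power BI can connect to it directly for the serve phase.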
Finally, we use Power BI to create dashboards and reports that serve the data in meaningful ways. Again, it's important to reiterate that we could achieve this task in multiple ways. For example, because Azure Synapse includes Spark pools much like Databricks, as well as its own pipeline orchestration, we could in theory have performed many of these steps within a single tool.