Data Visualization Flashcards
Microsoft Power BI
Microsoft Power BI is a suite of tools and services that data analysts can use to build interactive data visualizations for business users to consume.
A typical workflow for creating a data visualization solution in Power BI:
Power BI Desktop -> Power BI Service -> Web Browser/Power BI Phone App
Power BI Desktop
A Microsoft Windows application in which you can import data from a wide range of data sources, combine and organize the data from these sources in an analytics data model, and create reports that contain interactive visualizations of the data.
Power BI service
A cloud service to which reports are published and in which business users can interact with them.
Users can consume reports, dashboards, and apps in the Power BI service through a web browser, or on mobile devices by using the Power BI phone app.
Data cube
A theoretical model for data modeling.
Numeric values (measures) are aggregated by various attributes (dimensions).
e.g., A data cube may contain measures for revenue and quantity, and dimensions for products, customers, and time. You can aggregate these measures across one or more dimensions.
It is called a cube because it has multiple dimensions. It is not limited to three, but the concept is harder to visualize in higher dimensions.
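A minimal Python sketch of aggregating measures across one or more dimensions. The sales rows, the `DIMENSIONS` mapping, and the `aggregate` helper are hypothetical illustrations, not part of Power BI.

```python
from collections import defaultdict

# Hypothetical sales events: (product, customer, year, quantity, revenue)
sales = [
    ("Bike", "Alice", 2023, 1, 500.0),
    ("Bike", "Bob", 2023, 2, 1000.0),
    ("Helmet", "Alice", 2024, 3, 90.0),
    ("Helmet", "Bob", 2024, 1, 30.0),
]

# Position of each dimension inside a row (assumed layout for this sketch).
DIMENSIONS = {"product": 0, "customer": 1, "year": 2}


def aggregate(rows, by):
    """Sum the quantity and revenue measures, grouped by the named dimensions."""
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        key = tuple(row[DIMENSIONS[d]] for d in by)
        totals[key][0] += row[3]  # quantity
        totals[key][1] += row[4]  # revenue
    return {key: tuple(value) for key, value in totals.items()}


by_product = aggregate(sales, ["product"])               # one dimension
by_product_year = aggregate(sales, ["product", "year"])  # two dimensions
grand_total = aggregate(sales, [])                       # no dimensions: overall total
```

Each list of dimensions passed to `aggregate` corresponds to one "slice" of the cube. An empty list collapses every dimension into a single grand total.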
Dimension Tables
Dimension tables represent the entities by which you want to aggregate numeric measures – for example, product or customer.
Each entity is represented by a row with a unique key value. The remaining columns represent attributes of an entity – for example, products have names and categories, and customers have addresses and cities. It’s common in most analytical models to include a Time dimension so that you can aggregate numeric measures associated with events over time.
e.g.,
Customer (Dimension Table)
Key, Name, Address, City
Product (Dimension Table)
Key, Name, Category
Fact Tables
The numeric measures that will be aggregated by the various dimensions in the model are stored in Fact tables.
Each row in a fact table represents a recorded event that has numeric measures associated with it.
Sales (Fact table)
Key, ProductKey, CustomerKey, Quantity, Revenue
Note: ProductKey and CustomerKey are foreign keys to two dimension tables (Product and Customer respectively)
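The Customer, Product, and Sales tables above can be sketched in Python as follows. The row values and the `orphaned_rows` helper are made up for illustration. The helper checks that every foreign key in the fact table points at an existing dimension row.

```python
# Dimension tables: one row per entity, looked up by its unique key.
customers = {
    1: {"Name": "Alice", "Address": "1 Main St", "City": "Seattle"},
    2: {"Name": "Bob", "Address": "2 Oak Ave", "City": "Portland"},
}
products = {
    10: {"Name": "Road Bike", "Category": "Bikes"},
    20: {"Name": "Helmet", "Category": "Accessories"},
}

# Fact table: one row per recorded sale, with foreign keys and numeric measures.
sales = [
    {"Key": 1, "ProductKey": 10, "CustomerKey": 1, "Quantity": 1, "Revenue": 500.0},
    {"Key": 2, "ProductKey": 20, "CustomerKey": 1, "Quantity": 2, "Revenue": 60.0},
    {"Key": 3, "ProductKey": 10, "CustomerKey": 2, "Quantity": 1, "Revenue": 500.0},
]


def orphaned_rows(facts, product_dim, customer_dim):
    """Return the keys of fact rows whose foreign keys match no dimension row."""
    return [
        row["Key"]
        for row in facts
        if row["ProductKey"] not in product_dim
        or row["CustomerKey"] not in customer_dim
    ]


orphans = orphaned_rows(sales, products, customers)  # empty when all keys resolve
```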
Star Schema
A schema in which one fact table is related to one or more dimension tables. In an analytical model built on this schema, aggregations across the dimensions can be pre-calculated to improve query performance.
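A rough Python sketch of querying a star schema: the fact table is joined to a dimension table through its foreign key, and a measure is summed by a dimension attribute. The data and the `revenue_by` helper are hypothetical.

```python
from collections import defaultdict

# Dimension tables, keyed by their unique key.
customers = {
    1: {"Name": "Alice", "City": "Seattle"},
    2: {"Name": "Bob", "City": "Portland"},
}
products = {
    10: {"Name": "Road Bike", "Category": "Bikes"},
    20: {"Name": "Helmet", "Category": "Accessories"},
}

# Fact rows: (ProductKey, CustomerKey, Quantity, Revenue)
sales = [
    (10, 1, 1, 500.0),
    (20, 1, 2, 60.0),
    (10, 2, 1, 500.0),
    (20, 2, 1, 30.0),
]


def revenue_by(dimension, attribute, key_index):
    """Join each fact row to one dimension table and sum revenue by an attribute."""
    totals = defaultdict(float)
    for row in sales:
        entity = dimension[row[key_index]]  # follow the foreign key
        totals[entity[attribute]] += row[3]
    return dict(totals)


revenue_by_category = revenue_by(products, "Category", 0)
revenue_by_city = revenue_by(customers, "City", 1)
```

Every dimension is one lookup away from the fact table, which is what gives the schema its star shape.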
Snowflake Schema
A more complex schema that is based on the star schema.
The dimension tables in the star schema can also have additional related tables.
e.g., Sales is the fact table, Product is a dimension table, and Category is a dimension table that is related to the Product table.
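A small Python sketch of the Sales, Product, and Category example, assuming the Product table holds a hypothetical `CategoryKey` foreign key. Reaching the category name takes two lookups instead of one.

```python
from collections import defaultdict

# Category is a dimension table related to Product, not directly to Sales.
categories = {
    100: {"Name": "Bikes"},
    200: {"Name": "Accessories"},
}
products = {
    10: {"Name": "Road Bike", "CategoryKey": 100},
    20: {"Name": "Helmet", "CategoryKey": 200},
    30: {"Name": "Mountain Bike", "CategoryKey": 100},
}
sales = [
    {"ProductKey": 10, "Revenue": 500.0},
    {"ProductKey": 30, "Revenue": 700.0},
    {"ProductKey": 20, "Revenue": 30.0},
]

revenue_by_category = defaultdict(float)
for row in sales:
    product = products[row["ProductKey"]]          # first hop: Sales -> Product
    category = categories[product["CategoryKey"]]  # second hop: Product -> Category
    revenue_by_category[category["Name"]] += row["Revenue"]

revenue_by_category = dict(revenue_by_category)
```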
Attribute hierarchies
Attribute hierarchies let you quickly drill up or drill down to find aggregated values at different levels in a hierarchical dimension table.
e.g., In a Product table, you can form a hierarchy in which each category might include multiple named products.
Key, Name, Category, Subcategory
Category and Subcategory form an attribute hierarchy, and aggregations at each level can be pre-computed, allowing quick drill-down and drill-up analysis.
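One way to picture pre-computed hierarchy aggregates is the Python sketch below. It assumes a Category > Subcategory > Name hierarchy, and the data, `rollups` structure, and `drill_down` / `drill_up` helpers are all hypothetical. Totals are computed once per level, so moving between levels is a lookup rather than a rescan of the fact rows.

```python
from collections import defaultdict

products = {
    1: {"Name": "Road Bike", "Category": "Bikes", "Subcategory": "Road"},
    2: {"Name": "Trail Bike", "Category": "Bikes", "Subcategory": "Mountain"},
    3: {"Name": "Helmet", "Category": "Accessories", "Subcategory": "Safety"},
    4: {"Name": "Gloves", "Category": "Accessories", "Subcategory": "Apparel"},
}
# Fact rows: (ProductKey, Revenue)
sales = [(1, 500.0), (2, 700.0), (1, 500.0), (3, 30.0), (4, 20.0)]

HIERARCHY = ["Category", "Subcategory", "Name"]  # top level first

# Pre-compute revenue totals for every level of the hierarchy.
# rollups[depth] maps a path of length `depth` to its total revenue.
rollups = {depth: defaultdict(float) for depth in range(1, len(HIERARCHY) + 1)}
for product_key, revenue in sales:
    product = products[product_key]
    path = tuple(product[level] for level in HIERARCHY)
    for depth in rollups:
        rollups[depth][path[:depth]] += revenue


def drill_down(path):
    """Return the totals one level below the given path."""
    children = rollups[len(path) + 1]
    return {key: total for key, total in children.items() if key[: len(path)] == path}


def drill_up(path):
    """Return the total for the parent of the given path."""
    return rollups[len(path) - 1][path[:-1]]


bikes_by_subcategory = drill_down(("Bikes",))
bikes_total = drill_up(("Bikes", "Road"))
```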
Analytical modeling in Microsoft Power BI
You can use Power BI to define an analytical model from tables of data, which can be imported from one or more data sources. You can then use the data modeling interface on the Model tab of Power BI Desktop to define your analytical model by creating relationships between fact and dimension tables, defining hierarchies, setting data types and display formats for fields in the tables, and managing other properties of your data that help define a rich model for analysis.