Labs Flashcards
Which 4 activities fall into data engineering (data pre-processing)?
- data integration
- data transformation
- data reduction
- data cleaning
What is ETL?
Extraction, Transformation & Load
What are the definitions of Extraction, Transformation & Load (ETL)?
- Extraction: selecting data from one or more sources and reading the selected data
- Transformation: converting data from their original form to whatever form the DW needs. This step often also includes cleansing of the data to remove as many errors as possible.
- Loading: putting the converted (transformed) data into the DW
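To make the three steps concrete, here is a minimal sketch in Python using only the standard library. The sample data, the `orders` table, and the function names are made up for illustration, and an in-memory SQLite database stands in for the DW.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would be a file or a source system.
RAW_CSV = """order_id,amount,country
1, 19.90 ,be
2,,nl
3, 5.00 ,BE
"""

def extract(text):
    """Extraction: read the selected data from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transformation: convert to the target form and cleanse obvious errors."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:  # cleansing: skip rows with a missing amount
            continue
        clean.append(
            (int(row["order_id"]), float(amount), row["country"].strip().upper())
        )
    return clean

def load(rows, conn):
    """Loading: put the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the DW
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, amount, country FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

The row with the missing amount is dropped during transformation, so only two rows reach the target table.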
Thinking from a Power BI view, what are the different activities in the data integration process (5 in total)?
- Data Loading – read and convert from data sources
- Data Transformations – join, aggregate, filter, convert data
- Data de-duplication – finds multiple records referring to the same entity, merges them
- Data Profiling – builds tables, histograms, etc. to summarize data
- Data Quality – test against master values, known business rules, constraints, etc.
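As a rough illustration of the last three activities, the sketch below de-duplicates, profiles, and quality-checks a small list of records in plain Python. The records, the matching rule (normalized email), and the set of master values are assumptions made up for this example. For simplicity it keeps the first record of each duplicate group instead of merging fields.

```python
from collections import Counter

# Hypothetical customer records.
records = [
    {"id": 1, "email": "Ann@Example.com", "country": "BE"},
    {"id": 2, "email": "ann@example.com ", "country": "BE"},
    {"id": 3, "email": "bob@example.com", "country": "XX"},
]
VALID_COUNTRIES = {"BE", "NL", "FR"}  # assumed master values

# De-duplication: records with the same normalized email are treated as
# the same entity; the first one seen is kept.
unique = {}
for rec in records:
    key = rec["email"].strip().lower()
    unique.setdefault(key, rec)
deduplicated = list(unique.values())

# Profiling: a simple frequency table of one column.
country_counts = Counter(rec["country"] for rec in deduplicated)

# Quality: test each record against the master values.
invalid_ids = [
    rec["id"] for rec in deduplicated if rec["country"] not in VALID_COUNTRIES
]

print(deduplicated, country_counts, invalid_ids)
```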
What does the Power Query Editor do?
- you can connect to one or many data sources,
- shape and transform the data to meet your needs,
- then load that model into Power BI Desktop
What is appending in Power BI?
When you have additional rows of data that you’d like to add to an existing query, you append the query.
What is merging in Power BI?
When you have one or more columns that you’d like to add to another query, you merge the queries on a matching column.
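The difference between appending and merging can be sketched in plain Python. The table contents and names here are invented. The merge is modeled as a left outer join on `product_id`, which is one of several join kinds Power BI offers.

```python
# Hypothetical queries, each represented as a list of rows.
sales_jan = [{"product_id": 1, "qty": 2}, {"product_id": 2, "qty": 1}]
sales_feb = [{"product_id": 1, "qty": 5}]
products = [{"product_id": 1, "name": "Pen"}, {"product_id": 2, "name": "Ink"}]

# Append: stack the rows of one query under the rows of another.
appended = sales_jan + sales_feb

# Merge: add columns from another query by matching on a key column.
names = {p["product_id"]: p["name"] for p in products}
merged = [{**row, "name": names.get(row["product_id"])} for row in appended]

print(len(appended), merged[0])
```

Appending changes the number of rows, while merging changes the number of columns.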
What is M language used for?
M is the query language that can be viewed in the Power Query Editor.
Power Query, whose formula language is M, is an ETL (Extraction, Transformation, Load) tool, so it gets data and transforms it into tables that can be loaded and structured in a data model.
e.g. after filtering, if you go to the Advanced Editor, you can find the underlying M code that Power BI wrote for filtering the values.
What is DAX language used for?
DAX (Data Analysis Expressions) is the formula language you use to define calculations on the data model in the main Power BI Desktop views, rather than in the Power Query Editor.
DAX creates Measures & Calculated columns.
What is the difference between calculated columns and measures?
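A calculated column is evaluated row by row, and its values are stored in the table.
A measure is evaluated only when it is used, within the current filter context (for example, the filters and slicers applied to a visual).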
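The distinction can be mimicked in plain Python. This is only an analogy, not DAX: the rows, the `line_total` column, and the `total_sales` function are made up, and the `region` argument stands in for a filter context.

```python
# Hypothetical sales table.
rows = [
    {"region": "North", "qty": 2, "price": 10.0},
    {"region": "North", "qty": 1, "price": 4.0},
    {"region": "South", "qty": 3, "price": 5.0},
]

# Calculated column: computed once for every row and stored with the row.
for row in rows:
    row["line_total"] = row["qty"] * row["price"]

# Measure: an aggregation evaluated on demand over whatever rows the
# current filter leaves visible.
def total_sales(table, region=None):
    return sum(
        r["line_total"] for r in table if region is None or r["region"] == region
    )

print(total_sales(rows), total_sales(rows, region="North"))
```

The same measure returns a different value for each filter, while the calculated column values stay fixed per row.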
What are some of Power BI’s built-in time intelligence functions?
- SAMEPERIODLASTYEAR
- TOTALYTD
- PARALLELPERIOD
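To show the idea behind year-to-date and same-period-last-year calculations, here is a simplified Python sketch. It is not how the DAX functions are implemented: the real functions work on a date column within a filter context, and the sales figures and function names below are invented.

```python
from datetime import date

# Hypothetical daily sales amounts.
sales = {
    date(2023, 1, 15): 100,
    date(2023, 3, 10): 40,
    date(2024, 1, 20): 120,
    date(2024, 2, 5): 30,
    date(2024, 6, 1): 70,
}

def total_ytd(data, as_of):
    """Sum from 1 January of as_of's year up to and including as_of."""
    return sum(v for d, v in data.items() if d.year == as_of.year and d <= as_of)

def same_period_last_year(data, start, end):
    """Sum over the same date range shifted back one year.

    Simplification: replace(year=...) raises for 29 February, so the
    range boundaries must not fall on a leap day.
    """
    prev_start = start.replace(year=start.year - 1)
    prev_end = end.replace(year=end.year - 1)
    return sum(v for d, v in data.items() if prev_start <= d <= prev_end)

ytd = total_ytd(sales, date(2024, 3, 31))
last_year = same_period_last_year(sales, date(2024, 1, 1), date(2024, 3, 31))
print(ytd, last_year)
```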
Which three views do you have in Power BI Desktop?
- Report view
- Data view
- Model view (formerly called Relationship view)
What is a dimensional model used for?
A dimensional model is designed to read, summarize, and analyze numeric information such as values, balances, counts, and weights in a data warehouse.
What are relational models used for?
Relational models are optimized for adding, updating, and deleting data in real-time Online Transaction Processing (OLTP) systems (operational databases).
What is a dimensional model?
- A Dimensional Model is a data structure technique optimized for Data warehousing tools. It is the architecture behind a DW.
- The concept of Dimensional Modelling was developed by Ralph Kimball and is comprised of “fact” and “dimension” tables.
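A small fact/dimension layout can be sketched with an in-memory SQLite database. The table names (`fact_sales`, `dim_product`), columns, and values are invented for illustration. The query shows the typical usage: numeric facts are summarized by an attribute of a dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension table: descriptive attributes used to group and filter.
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        name TEXT,
        category TEXT
    );
    -- Fact table: numeric measurements plus keys pointing at dimensions.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        qty INTEGER,
        amount REAL
    );
    INSERT INTO dim_product VALUES
        (1, 'Pen', 'Stationery'),
        (2, 'Ink', 'Stationery'),
        (3, 'Mug', 'Kitchen');
    INSERT INTO fact_sales VALUES
        (1, 2, 4.0), (2, 1, 6.0), (3, 1, 8.0), (1, 3, 6.0);
    """
)

# Summarize the facts by a dimension attribute.
by_category = conn.execute(
    """
    SELECT d.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_key = f.product_key
    GROUP BY d.category
    ORDER BY d.category
    """
).fetchall()
print(by_category)
```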