Data Analysis and Mining Flashcards
Use Cases
Prompts the use of data analysis.
Data Technology
Prompts the use of data warehouses.
Aggregates in SQL
- count()
- max()
- min()
- avg()
Complex SQL Queries
Typically contain the use of aggregates, and examines a large amount of the database.
Data warehouses
Type of database systems to support data analysis.
Typically is not updated constantly, but could be updated every day or so.
Extract transform load
Transforms data into a specific schema/pattern.
OLAP
Online analytic processing.
Refers to the process of analysing complex data stored in a data warehouse.
OLAP Query
Complicated queries that touch a lot of data, discover trends and patterns in data and typically take a large amount of time to compute.
OLTP
Online transactions processing.
Typical DBMS tasks, where queries are typically fast and touch a small portion of the database.
Unique Fact Tables
Used in OLAPs to store events and objects of interest for the analysis.
May be thought as representing a data cube, where the length, width and depth all represent different variables, like product, or dates, or store etc.
Star Schemas
One of the more common data warehouse architectures, containing unique fact tables which contain points in a data cube.
We could also have dimensional tables.
Dimensional Tables
Describe values along each axis.
Star Schema Example
Lets say we have the fact table Sales(productNo, date, store, price). We can split this further:
- productNo comes from Products(productNo, type, model)
- date comes from Days(date, day, week, month, year)
- store comes from Stores(name, city, country, phone) where “name” is being changed to store.
- price can be its own dependent attribute.
All of these attributes are called dimensions, and the dimension tables are what make up the fact table itself.
Denormalised Schemas
We save the data in multiple areas because we care more about making complex queries faster.
The main data is in one table, the fact table, and the rest of the data can be joined with fact table very quickly.
And all in all, joins are not required as much.
Slice and Dice technique
Another way to find data from a data cube.