Data-warehousing/analysis/mining Flashcards
What do queries run by big stores or hospitals tend to feature?
- they tend to be complex: use aggregates and other advanced features
- they tend to examine large parts of the data
- they tend to block large parts of the database
What types of queries should we avoid executing on DBMS that serve a lot of customers simultaneously?
- Running very complex queries that examine and block large parts of the database
What are data warehouses?
- Systems designed to support data analysis to answer very complex queries that examine large parts of the database
How are data warehouses updated?
- By getting the updated/new information from the smaller servers
What is OLAP?
(Online Analytic processing) OLAP refers to the process of analysing complex data stored in a data warehouse
what is an OLAP query?
- An SQL query that is for an OLAP process?
What is OLTP?
- Online transaction processing is used in traditional DBMS
- OLTP queries only touch small (and often different) parts of the database that probably don’t overlap so you can therefore make a lot of fast queries simultaneously
What do OLAP applications typically feature?
- A unique fact cube that represents events and objects of interest for the analysis.
Describe an example fact table with schema Sales(productNo, date, store, price)?
- if it only has 3 attributes it has 3 dimensions so we can represent it at a data cube
- so on one side of the cube the height could be productNo, the length could be date and the width could be price
- this way, each point in the cube corresponds to the sale of a product
What do we get back when we query a data cube (fact table)with a range for each edge?
- as it’s represented as a cube, we get a slice of the cube back that all satisfies the ranges in the query
- we can then further query this smaller section of information
What are star schemas?
- they have unique fact tables (contains points in the data cube)
- Dimension tables: describe values along each axis of the cube
- use star schemas to represent dimension tables
What are star schemas made up of?
- name of the table
- keys to virtual fact tables that answer queries (known as dimensions)
- a dependent attribute that there is no other information about elsewhere
- dimensions are attributes we do have other information about elsewhere
What exactly does a star schema a describe?
A database consisting of:
a fact table R(A1,…An, B1…Bm)
-What are star schemas?
- In essence A1 to An are foreign keys
- Each dimension can also have further dimensions of their own
- B1…Bm are dependent attributes
What are the Characteristics of Star Schemas?
- They are denormalised (they have duplicate data)
- gains: they don’t require many joins
- faster and easier aggregation of data
Why don’t we mind that the data is denormalised in star schemas?
We don’t mind storing data in multiple places because we care more about making these type of queries fast