w12d2 - OLAP and Data Mining Flashcards
Main characteristics of an OLAP application
- Fast (delivers results to the user in about 5 seconds)
- Analysis (copes with the business logic and statistical analysis relevant to the application)
- Shared (system implements all security requirements for confidentiality)
- Multidimensional (system must provide multidimensional views of the data, including full support for hierarchies and multiple hierarchies)
- Information (all needed data and derived information is available)
OLAP is used to ________
of _______
and of _______
produce reports
what is (especially trends)
what might be (by extrapolating and forecasting)
Normalization and other database design techniques focus on _______
Whereas reports usually _______
designing single records
combine multiple records
A traditional report takes ______
that describes data in _______
The third dimension can be described by
a grid structure
up to 3 dimensions
The contents of cells at different {x,y} coordinates
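A minimal sketch of that grid idea, using made-up product names, quarters and unit counts (all hypothetical). Products form the x axis and quarters the y axis, and the number stored in each {x,y} cell carries the third dimension (units sold).

```python
# Hypothetical sales records: (product, quarter, units_sold).
records = [
    ("widget", "Q1", 10),
    ("widget", "Q2", 12),
    ("gadget", "Q1", 4),
    ("gadget", "Q2", 6),
]

def grid_report(rows):
    """Arrange records as a grid: x = product (rows), y = quarter (columns).
    The value stored in each {x, y} cell acts as the third dimension."""
    products = sorted({p for p, _, _ in rows})
    quarters = sorted({q for _, q, _ in rows})
    cells = {(p, q): units for p, q, units in rows}
    table = [[cells.get((p, q), 0) for q in quarters] for p in products]
    return table, products, quarters

table, xs, ys = grid_report(records)
print("product", *ys)
for name, row in zip(xs, table):
    print(name, *row)
```

A fourth dimension (for example, region) would no longer fit on one grid, which is why a traditional report tops out at about 3 dimensions.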
Dimensions provide _______
come from _______
may or may not have _______
may have _______
may share _______
Different ways of looking at a set of data
Different attributes of data
A particular ordering
Their own subdimensions
Data with other dimensions
Temporal dimension includes
Events occur at
Activities occur over
Activities and events
- a specific time only
- a range of times (bounded by a starting event and an ending event)
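One way to model the distinction, sketched under the assumption that each event has a single calendar date: an event is a point in time, while an activity is bounded by a starting event and an ending event. The class names, fields and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Event:
    """Something that occurs at a single point in time."""
    name: str
    at: date

@dataclass
class Activity:
    """Something that occurs over a range of time, bounded by a starting and an ending event."""
    name: str
    start: Event
    end: Event

    def duration_days(self) -> int:
        return (self.end.at - self.start.at).days

    def covers(self, day: date) -> bool:
        # True if the given day falls inside the activity's time range (inclusive).
        return self.start.at <= day <= self.end.at

order_placed = Event("order placed", date(2024, 3, 1))
order_shipped = Event("order shipped", date(2024, 3, 5))
fulfilment = Activity("order fulfilment", order_placed, order_shipped)
print(fulfilment.duration_days())           # 4
print(fulfilment.covers(date(2024, 3, 3)))  # True
```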
Temporal dimension
Ordered linearly in time, from the start of an organization or activity up to the present and on into the future
Temporal dimension includes a wide range of
granularities (often stored in different records)
The customer dimension is often composed of
a number of discrete customers without a required ordering principle
The customer dimension can be ordered based on
name, number, or other dimensions
Coding within customer numbers may also provide additional dimensions, such as
customer type dimension
vendor dimension
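A sketch of how coded customer numbers can yield extra dimensions. The format "<type>-<vendor>-<sequence>" (e.g. "R-V07-00123") and the type codes are a hypothetical scheme invented for illustration, not a standard.

```python
# Hypothetical coding scheme (an assumption, not a standard):
#   <type code>-<vendor code>-<sequence>, e.g. "R-V07-00123"
CUSTOMER_TYPES = {"R": "retail", "W": "wholesale"}

def decode_customer_number(number: str) -> dict:
    """Split a coded customer number into the extra dimensions it carries."""
    type_code, vendor_code, seq = number.split("-")
    return {
        "customer_type": CUSTOMER_TYPES[type_code],
        "vendor": vendor_code,
        "sequence": int(seq),
    }

customers = ["R-V07-00123", "W-V02-00045", "R-V02-00311"]
decoded = [decode_customer_number(c) for c in customers]

# Group customers along the derived customer-type dimension.
by_type = {}
for c, d in zip(customers, decoded):
    by_type.setdefault(d["customer_type"], []).append(c)
print(by_type)
```

Grouping on the "vendor" field instead would give the vendor dimension in the same way.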
The location dimension is actually _________
Location may be limited to ________
Location is usually ordered relative to _____
multidimensional
a grouping of addresses based on some characteristic
other locations
The financial dimension generally contains ______
And is ordered _______
Measures for the financial dimension
numeric information
linearly
exact $ values, values within some range, a number of sales slips
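The three kinds of financial measure can be computed from the same data. This is a sketch over hypothetical sales slips; the range edges are arbitrary assumptions, and `Decimal` is used so dollar amounts stay exact.

```python
from decimal import Decimal

# Hypothetical sales slips: (slip_id, amount in dollars).
slips = [("S1", Decimal("19.99")), ("S2", Decimal("250.00")), ("S3", Decimal("75.50"))]

def bucket(amount, edges=(Decimal(0), Decimal(50), Decimal(100), Decimal(500))):
    """Return the [low, high) range an amount falls within, or None if outside all ranges."""
    for low, high in zip(edges, edges[1:]):
        if low <= amount < high:
            return (low, high)
    return None

exact_total = sum(a for _, a in slips)         # measure 1: exact $ value
ranges = {sid: bucket(a) for sid, a in slips}  # measure 2: being within some range
slip_count = len(slips)                        # measure 3: number of sales slips
```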
Subdimensions
Involve choices between different types of measures
Different attributes are from different subdimensions if
Neither attribute is an instance of the other attribute
Granularities
Involve choices between different units for a given type of measure within a dimension
Different attributes are of different granularities if ________
While normalization avoids ____________
It may not deal with ______________
- one attribute can be converted, with or without loss of some exactness, into the other attribute
- multiple granularities of numeric values
- other types of granularities
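Two illustrative conversions between granularities, using hypothetical attributes: an amount in cents converts to dollars with no loss (the original is recoverable), while a date converts to a (year, month) pair with loss (the day is discarded).

```python
from datetime import date
from decimal import Decimal

def cents_to_dollars(cents: int) -> Decimal:
    """Finer -> coarser unit with no loss: the original value is recoverable."""
    return Decimal(cents) / 100

def dollars_to_cents(dollars: Decimal) -> int:
    return int(dollars * 100)

def to_month(d: date) -> tuple[int, int]:
    """Finer -> coarser granularity with loss: the day of the month is discarded."""
    return (d.year, d.month)

amount = 123456
assert dollars_to_cents(cents_to_dollars(amount)) == amount  # round trip is exact

d1, d2 = date(2024, 5, 3), date(2024, 5, 28)
print(to_month(d1) == to_month(d2))  # True: two different days map to one month
```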
OLAP starts with data in a ________
And progressively reduces it by _______
data warehouse
Slicing (removing whole columns)
Dicing (removing records based on attribute values or ranges of attribute values)
Consolidating (combining data by using coarser granularities)
Drilling-down (examining more detailed granularities)
Processing (adding or replacing attributes based on the results of performing various computations on the data)
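A minimal sketch of the five operations on an in-memory list of records, following the definitions on these cards (slicing removes whole columns; note that many OLAP texts instead describe a slice as fixing one dimension to a single value). The records, field names and 10% tax rate are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact records extracted from a data warehouse.
rows = [
    {"year": 2023, "quarter": 1, "region": "east", "product": "widget", "amount": 100},
    {"year": 2023, "quarter": 2, "region": "east", "product": "gadget", "amount": 150},
    {"year": 2023, "quarter": 1, "region": "west", "product": "widget", "amount": 80},
    {"year": 2024, "quarter": 1, "region": "east", "product": "widget", "amount": 120},
    {"year": 2024, "quarter": 2, "region": "west", "product": "gadget", "amount": 90},
]

def slice_columns(data, drop):
    """Slicing: remove whole columns (attributes)."""
    return [{k: v for k, v in r.items() if k not in drop} for r in data]

def dice(data, keep):
    """Dicing: remove records whose attribute values fail the predicate."""
    return [r for r in data if keep(r)]

def consolidate(data, by, measure="amount"):
    """Consolidating: combine records at a coarser granularity by summing a measure."""
    totals = defaultdict(int)
    for r in data:
        totals[tuple(r[k] for k in by)] += r[measure]
    return dict(totals)

def process(data, name, fn):
    """Processing: add an attribute computed from each record."""
    return [{**r, name: fn(r)} for r in data]

east = dice(rows, lambda r: r["region"] == "east")
no_product = slice_columns(east, {"product"})
by_year = consolidate(no_product, ["year"])                # coarse view
by_quarter = consolidate(no_product, ["year", "quarter"])  # drill down to a finer view
with_tax = process(east, "tax", lambda r: r["amount"] // 10)  # hypothetical 10% tax

print(by_year)     # {(2023,): 250, (2024,): 120}
print(by_quarter)  # {(2023, 1): 100, (2023, 2): 150, (2024, 1): 120}
```

Drilling down here works only because the quarter attribute was kept in the sliced data; once a finer granularity is consolidated away, it cannot be recovered from the result.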
OLAP should allow users to easily work with the data by
Extracting copies of the data warehouse
Identifying dimensions in the data
Selecting and working with dimensions, subdimensions, and granularities
Slicing, dicing, consolidating, drilling-down, and processing data
Saving the results from multiple stages for further processing
Producing reports
Data mining goes beyond _______
To _______
It operates on _______
It looks for _______
And suggests _______
Answering user questions
Identifying questions that users should consider
Data warehouses with good metadata
Trends and correlations across dimensions
Models and visualizations that can be explored by the user
Data mining techniques
Predictive modelling
Database segmentation
Link Analysis
Deviation Detection
Predictive modelling
Classification - identifying groups based on common properties
Value Prediction - Extrapolating trends based on historical data
Database segmentation
clustering based on multiple properties
Link analysis
establishing associations between linked records
Deviation detection
Identifying records that deviate from the norm
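As one simple illustration of deviation detection, the sketch below flags values more than a chosen number of standard deviations from the mean (a z-score test). This is a swapped-in textbook technique, not one prescribed by these cards; the sales figures and the threshold of 2.0 are assumptions.

```python
from statistics import mean, pstdev

def deviations(values, threshold=2.0):
    """Return indexes of values more than `threshold` population standard
    deviations away from the mean (a simple z-score test)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

daily_sales = [102, 98, 101, 99, 100, 97, 103, 250]
print(deviations(daily_sales))  # [7]
```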
To be successful with OLAP and data mining
We need more than just tools
We need to know some things about the data
The actual database contains
user data
data records in tables
metadata linked to individual data records in other tables
Data schemas, including data types, constraints, relationships, and permissions
User programs typically
only interact with user data
leave data schema interactions to the DBA and DBMS
OLAP and data mining programs need to interact with
the actual data warehouse that contains user data
- data records in tables
- metadata linked to individual data records in other tables
Meta-information that describes the attributes of the database in terms of data definitions, dimensions, granularities, transformations, and other allowable types of processing
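A sketch of how an OLAP program might consult such meta-information. The metadata layout (a dictionary recording each attribute's dimension, granularity, and the coarser attribute it may be consolidated into) and all attribute names are hypothetical.

```python
# Hypothetical metadata for a sales warehouse: each attribute's dimension,
# granularity, and the coarser attribute it may be consolidated into.
METADATA = {
    "sale_date": {"dimension": "time", "granularity": "day", "rolls_up_to": "sale_month"},
    "sale_month": {"dimension": "time", "granularity": "month", "rolls_up_to": "sale_year"},
    "sale_year": {"dimension": "time", "granularity": "year", "rolls_up_to": None},
    "store_city": {"dimension": "location", "granularity": "city", "rolls_up_to": "store_region"},
    "store_region": {"dimension": "location", "granularity": "region", "rolls_up_to": None},
    "amount": {"dimension": "financial", "granularity": "cents", "rolls_up_to": None},
}

def attributes_in(dimension):
    """List the attributes that belong to one dimension."""
    return [a for a, m in METADATA.items() if m["dimension"] == dimension]

def rollup_path(attribute):
    """Follow 'rolls_up_to' links from an attribute to its coarsest granularity."""
    path = [attribute]
    while METADATA[path[-1]]["rolls_up_to"]:
        path.append(METADATA[path[-1]]["rolls_up_to"])
    return path

print(attributes_in("time"))     # ['sale_date', 'sale_month', 'sale_year']
print(rollup_path("store_city"))  # ['store_city', 'store_region']
```

Without this kind of description, a program has no way to know that "sale_month" is a coarser granularity of "sale_date" rather than an unrelated attribute.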