Recent Developments in Data Warehousing - Watson Flashcards

1
Q

Recent Developments in Data Warehousing - Watson

INTRODUCTION and aim of the paper?

A
  • data warehouses are created to provide dedicated source of data to support decision-making applications  data repositories play important role in understanding customer behaviour, connecting trading partners along supply chain, supporting performance management systems etc
  • aim of paper: tutorial on data warehousing that introduces newest concepts in field
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Recent Developments in Data Warehousing - Watson

What is a Data warehouse?

A

collection of data created to support decision making; provides data infrastructure & thus eliminates failures that are due to lack of quality  ‘single version of truth’

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Recent Developments in Data Warehousing - Watson

What are 4 DATA WAREHOUSE CHARACTERISTICS ?

A

1) Subject oriented: organised around specific subjects (sales, customer, products)
2) Integrated: data collected from multiple systems (internal/external) and integrated around subjected (eg in the end data around one customer identifier)
3) Time variant: maintains historical data
4) Non-volatile: users cannot change or update data; updates happen through IT controlled load processes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Recent Developments in Data Warehousing - Watson

What is Data warehousing?

A

Data warehousing: process that encompasses range of activities in data warehouse

extract data from source system -> transform the data -> load data to data stores -> User and applications access the data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Recent Developments in Data Warehousing - Watson

What is a Data mart?

A

Data mart: similar to data warehouse but stores data only for limited number of subject areas (smaller in scope than data warehouses)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Recent Developments in Data Warehousing - Watson

What is a Independent data mart?

A

built directly from source systems; often ‘point solution’ that solves immediate problem but creates new ones in long run when firm tries to implement enterprise-wide data infrastructure (eg data warehouse)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Recent Developments in Data Warehousing - Watson

What is a Dependent data mart?

A

created with data drawn from data warehouse; provides copy of data extracted from data warehouse to give users customised view of specific data; are preferred over independent ones as they come from organisation-wide source - ‘single version of truth’ is maintained

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Recent Developments in Data Warehousing - Watson

What is an Operational data store (ODS): ?

A

consolidates data from multiple source systems & provides near real-time, integrated view of volatile, current (max 60 day old) data; aim to provide integrated data for organisational purposes; often used to avoid ERP implementation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Recent Developments in Data Warehousing - Watson

What is are Oper marts?

A

created when current operational data needs to be analysed multi-dimensionally; data is only subset of data in ODS; when analysis is completed, oper mart is deleted

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Recent Developments in Data Warehousing - Watson

What is Architecture for data warehousing: ?

A

includes the component parts and the relationships among the parts  Data sources, ETL software, data stores, data analysis tools & applications, users

-> * data are extracted from the various sources and then fed into the staging arear where it is transformed (cleaned)  then loaded into data warehouse where it is stored and later accessed by various users with tools and applications

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Recent Developments in Data Warehousing - Watson

What is the Enterprise data warehouse approach ?

A

Enterprise data warehouse approach: by Bill Inmon; top-down approach of developing

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Recent Developments in Data Warehousing - Watson

What is the Data mart strategy: ?

A

by Ralph Kimball; bottom-up approach of developing
 both approaches provide benefits but involve limitations

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Recent Developments in Data Warehousing - Watson

What is the Data mart strategy: ?

A

‘start small, think big approach’  begins with specific business need for data and grows after time
* initial data mart contains data for only limited number of subject areas, draws from limited sources
* recognising that data mart will grow: needs to be planned and exercised beforehand as mart will grow if it is successful  if not: difficult to integrate data logically & physically
* advantages: due to limited scope initially developed quickly, at low cost, with less financial risk
* disadvantages: difficult to successfully grow a data mart while integrating new subject areas, data, users, applications along the way

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Recent Developments in Data Warehousing - Watson

What is the Enterprise data warehouse approach ?

A

Data warehouse approach: traditional approach t warehousing; does not preclude creation of data marts but mart are created after warehouse is built & pull data from warehouse, not from source systems (dependent data)
* advantages: faster system response time as marts + simpler data view as marts are dependent; results in integrated data warehouse containing many subject areas for multiple users
* disadvantages: risk that it is never completed or fails to meet end users needs

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Recent Developments in Data Warehousing - Watson

EXTRACTION, TRANSFORMATION & LOADING PROCESSES (ETL) meaning?

A

ETL: data extraction, transformation, loading  takes fata from source systems, prepares it for decision-support purposes & places it in target data base; ‘plumbing’ – dirty, complex, time-consuming, expensive

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Recent Developments in Data Warehousing - Watson

DATA SOURCES: applications?

A

data in applications are difficult to access for decision-support decision as accessing slows the processing of transactions

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Recent Developments in Data Warehousing - Watson

DATA SOURCES: ERP systems?

A

ERP systems: important source of support data but software often stores data in complex structure which makes extracting difficult  use of ERP vendor solutions

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Recent Developments in Data Warehousing - Watson

DATA SOURCES: clickstream data?

A

gathered from customers’ visits to firm website (IP address, download time, user agent, cookie data etc); very voluminous – must be filtered and cleaned up; used to support operational processes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Recent Developments in Data Warehousing - Watson

DATA SOURCES: external data?

A

provided by third party organisations

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

Recent Developments in Data Warehousing - Watson

DATA EXTRACTION: Custom-write data extraction programs?

A

firm writes own ETL software when: wanting to avoid costs of purchasing, knows source systems well, understand complexities  not the trend

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

Recent Developments in Data Warehousing - Watson

DATA EXTRACTION: Purchasing commercial ETL software?

A

Purchasing commercial ETL software: available from major database vendors & firms specialising in ETL software; allows firms to specify source systems, indicate tables and columns to be used, move data to specified targets & automate process

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Recent Developments in Data Warehousing - Watson

DATA STAGGING: Staging area?

A

work-in-process area in which data is processes prior to being loaded into warehouse; main action – data transformation

23
Q

Recent Developments in Data Warehousing - Watson

Data transformation?

A

several ways - cleansing the data, integrating the data, other transformations

  1. Parse the data (converting data) 2. correct and enhance data 3. standardise data 4. match the records 5. consolidate matching records
24
Q

Recent Developments in Data Warehousing - Watson

Data transformation: DIRTY DATA?

A

Dirty data: result of poor data quality practice which little can be done about the data that is already there
* sources of dirty data: dummy values - inappropriate values entered; absence of data; cryptic data; multipurpose fields; contradicting data; violations of business rules; non-unique identifiers
* solutions to data cleansing: 1) rely on basic cleaning capabilities of ETL software 2) custom-write cleansing routines 3) use special-purpose data cleansing software

25
Q

Recent Developments in Data Warehousing - Watson

DATA INTEGRATION ?

A

DATA INTEGRATION
* integrating data from multiple sources & systems  then integrating around common identifier
OTHER TRANSFORMATIONS
* replacing values, calculating derived values, preparing aggregates  create more understanding

26
Q

Recent Developments in Data Warehousing - Watson

DATA LOADING

A
  • first loading provides the initial data for warehouse; second loadings can be handled differently
27
Q

Recent Developments in Data Warehousing - Watson

DATA LOADING: What is Bulk loading the warehouse?

A

Bulk loading the warehouse: all of the data (old and new) is loaded each time; simple processing logic but impractical as volume increases

28
Q

Recent Developments in Data Warehousing - Watson

DATA LOADING: What is Change data capture/refreshing?

A

Change data capture/refreshing: refreshing the warehouse with only newly generated data; more complicated
* when refreshing, must be decided how many times to load  based on business need for data & business cycle that provides the data = trend: continuous updating (trickle loading)

29
Q

Recent Developments in Data Warehousing - Watson

What is meta data?

A

Metadata: data about the data; needed by both IT personnel & users who access data
* IT personnel: data sources and targets, database, table, column names; refresh schedules etc
* data users: attribute definitions, report tools available, report distribution info, held desk info
* meta data has not received much attention yet: uncertainty about what metadata should be stored & lack of methods to share it across vendors’ products

  • developments improving ability to create metadata:
    1) attempts to create standards for metadata models
    2) use of application program interfaces APIs
30
Q

Recent Developments in Data Warehousing - Watson

What are DATA STORES ?

A

where data is loaded to: data mart, operational data store, warehouse; use either relational &/or multidimensional database technology
* operational data stores support operational applications that need current data, whenever data is aged it is passed on to regular data warehouse

31
Q

Recent Developments in Data Warehousing - Watson

DATA STORES: Relational technology?

A

Relational technology: stores data in tables with attributes & rows; norm for data storage

32
Q

Recent Developments in Data Warehousing - Watson

DATA STORES: Multidimensional database technology?

A

Multidimensional database technology: stores data organised around dimensionality of data (cube); eg: how many television sold stored based on product, retail outlet & tune
* relational databases can provide dimensional view by using star schema data model

33
Q

Recent Developments in Data Warehousing - Watson

DATA STORES: Star schema data model?

A

Star schema data model: resembles star & is made up of fact table in center of star and dimension tables as points of star; provides a non-normalised data model

34
Q

Recent Developments in Data Warehousing - Watson

DATA STORES: Fact table?

A

Fact table: stores numerical data that were recorded for each transaction; eg: retail – number units sold; banking – average monthly balance etc

35
Q

Recent Developments in Data Warehousing - Watson

DATA STORES: Dimension tables?

A

Dimension tables: provide dimensions through which data in fact table can be analysed (here: payer, patient, physician, service, time periods)

36
Q

Recent Developments in Data Warehousing - Watson

DATA STORES: Conformed dimensions?

A

Conformed dimensions: whenever a firm that has multiple star schemas uses many of same dimensions and measures across star schemas  allows meaningful queries to be made across different subject areas

37
Q

Recent Developments in Data Warehousing - Watson

DATA STORES: SQL queries?

A

run against data stored in star schema  fact and dimension tales are joined = answer

38
Q

Recent Developments in Data Warehousing - Watson

DATA STORES: Online analystical processing OLAP?

A

Online analystical processing OLAP: associated with multidimensional analysis but may forms (ROLAP – relational database, MOLAP – multidimensional database, DOLAP – transfer data cubes to desktop)

39
Q

Recent Developments in Data Warehousing - Watson

DATA ACCESS TOOLS AND APPLICATIONS FOR END USERS: Users of data differ -> What the result?

A

result: organisations implement multiple data access tools & applications
* when selecting tool there is flexibility/ease of use trade-off

40
Q

Recent Developments in Data Warehousing - Watson

DATA ACCESS TOOLS AND APPLICATIONS FOR END USERS: Business Intelligence IB?

A

Business Intelligence IB: term commonly used to describe set of tools and applications used for decision support processes

41
Q

Recent Developments in Data Warehousing - Watson

MOST IMPORTANT DATA ACCESS TOOLS & APPLICATIONS: SQL queries?

A

SQL queries: require user to understand warehouse data model & know how to write SQL code

42
Q

Recent Developments in Data Warehousing - Watson

MOST IMPORTANT DATA ACCESS TOOLS & APPLICATIONS: Management Reporting Systems?

A

Management Reporting Systems: data in warehouse (single truth) helps managing reporting systems as it provides single and consistent, integrated source of data to draw upon

43
Q

Recent Developments in Data Warehousing - Watson

MOST IMPORTANT DATA ACCESS TOOLS & APPLICATIONS: Managed query environments?

A

Managed query environments: users are presented with interface that allows them to specify what needs to be done with data; once analysis is performed, it can be saved and made available as report/icon on screen; available with thin (browser which is mostly chosen) & fat clients

44
Q

Recent Developments in Data Warehousing - Watson

MOST IMPORTANT DATA ACCESS TOOLS & APPLICATIONS: DSS/EIS?

A

DSS/EIS: decision support systems (DSS) provide info to support decision making tasks & executive information systems (EIS) provide targeted info at broad information needs of senior management

45
Q

Recent Developments in Data Warehousing - Watson

MOST IMPORTANT DATA ACCESS TOOLS & APPLICATIONS: Enterprise intelligence portals?

A

Enterprise intelligence portals: integration of structured and unstructured data (videos, email etc) by seamlessly integrating data from variety of sources; portals are customisable to meet each user’s information needs

46
Q

Recent Developments in Data Warehousing - Watson

MOST IMPORTANT DATA ACCESS TOOLS & APPLICATIONS: Data mining?

A

Data mining: broad interpretation – any kind of data analysis; narrow & preferred implication – automated search of data to discover hidden relationships; involves use of specialised algorithms; needs teams of 3 analysts combining skills of 1) knowledge of data algorithms & software 2) ability to work with large amounts of data 3) domain knowledge

47
Q

Recent Developments in Data Warehousing - Watson

MOST IMPORTANT DATA ACCESS TOOLS & APPLICATIONS: Customer relationship management?

A

Customer relationship management: purpose of attracting, tenaing, enhancing customers by maintaining relationships to increase profits; integration of data into warehouse from every application/touchpoint with customer; operational CRM – applications use data to support operational activities; analytic CRM – data analysed by using data mining methods #

48
Q

Recent Developments in Data Warehousing - Watson

DIFFERENT TYPES OF INFORMATION USERS – 3 categories by Inmon: Information producers?

A

Information producers: like analysts, create info for own and for others use

49
Q

Recent Developments in Data Warehousing - Watson

DIFFERENT TYPES OF INFORMATION USERS – 3 categories by Inmon: Information consumers?

A

Information consumers: like executives, consume info that is prepared for them

50
Q

Recent Developments in Data Warehousing - Watson

DIFFERENT TYPES OF INFORMATION USERS – 3 categories by Inmon: Tourists?

A

1) Tourists: do not know what they want; look at overview of things with little deep data analysis; design of database is not issue as they do not do much with it; need metadata

51
Q

Recent Developments in Data Warehousing - Watson

DIFFERENT TYPES OF INFORMATION USERS – 3 categories by Inmon: Explorers?

A

2) Explorers: have idea what they are looking for but do not know how to find ‘priceless gem’ in data; biggest problem is vast amount of data; require metadata to get started

52
Q

Recent Developments in Data Warehousing - Watson

DIFFERENT TYPES OF INFORMATION USERS – 3 categories by Inmon: Farmers?

A

3) Farmers: know what they want, where and how to get it; find ‘flakes of gold’ in data rather than ‘gem’; use the same data regularly and thus do not rely on metdata

53
Q

Recent Developments in Data Warehousing - Watson

DIFFERENT TYPES OF INFORMATION USERS – 3 categories by Inmon: Conclusion?

A
  • future for data warehousing is bright  necessity in many companies
    REASONS
    1) organisations become more information intensive = need for reliable decision support
    2) through use of operational data stores, it is used with operational applications and leads to better understanding of market segments, buying behaviour and other changes
    3) data warehousing research is on rise
    4) industry is starting to provide resources to help in teaching about data warehousing