Midterm Flashcards

1
Q

What are the major challenges to large corporations in terms of information management for supporting decision making?

A

The pyramid

  • Data Mining (Pattern Discovery and Evaluation)
  • Data Exploration (OLAP)
  • Data Warehouses/Data Marts (Data selection)
2
Q

What are the main limitations of conventional information systems, i.e. the DBMS technology, in terms of information queries?

A
  • Data is growing at a phenomenal rate (“the yotta world”)
  • Data types are more mixed and complicated
  • Data rich but information poor!
  • New strategies are needed for Decision Support Systems (DSS) or Business Intelligence (BI) systems

UNCOVER HIDDEN INFORMATION AND PATTERNS

3
Q

Name three information processes for Decision Support Systems (DSS).

A
  • On Line Transactional (information) Process (OLTP)
  • On Line Analytical (information) Process (OLAP)
  • Knowledge discovery from data (KD)
4
Q

What is OLTP and what does it do?

A

On Line Transactional (information) Process (OLTP)

Query is viewed as a read-only transaction

Track/record/retrieve original data records of everyday business operations for answering “what, when, where” type of questions: Operational databases (Relational DB and SQL)

5
Q

What is OLAP and what does it do?

A

On Line Analytical (information) Process (OLAP)

Summarization, consolidation, and aggregation

Store & manipulate summaries of various groupings of original data records for answering “what happened to the business” type of questions - Analytical databases: Data warehouses and OLAP
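
To make the OLTP/OLAP contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `sales` table, its columns, and the figures are hypothetical and only for illustration; a real data warehouse would be a separate analytical store rather than a query on the operational table.

```python
import sqlite3

# Hypothetical sales table; names and figures are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, day TEXT, store TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (day, store, amount) VALUES (?, ?, ?)",
    [
        ("2024-01-01", "Halifax", 120.0),
        ("2024-01-01", "Truro", 80.0),
        ("2024-01-02", "Halifax", 200.0),
    ],
)

# OLTP-style query: retrieve one original record ("what, when, where").
record = conn.execute(
    "SELECT day, store, amount FROM sales WHERE id = ?", (1,)
).fetchone()

# OLAP-style query: summarize many records by a grouping
# ("what happened to the business").
totals = conn.execute(
    "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
).fetchall()

print(record)
print(totals)
```

The first query returns a single stored row unchanged; the second returns one aggregated row per store.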

6
Q

What is KD and what does it do?

A

Knowledge discovery from data

Discover/analyze hidden patterns of abstractive information (knowledge) for answering “why, and what will happen next” type of questions: Data Mining (DM)

7
Q

As a common user of a DB system, such as Dal online or RBC online banking, which type of information process do you deal with & why?

A

On Line Transactional (information) Process (OLTP)

Users such as students only need a read-only view of data from the database, without any manipulation.

8
Q

For the president of Dalhousie Univ. or the dean of FCS, what type of information are they interested in getting?

A

On Line Analytical (information) Process (OLAP)

The president and dean could benefit from aggregations of Dal’s data such as “total male and female students” or “total first year students”

9
Q

What is the “data rich but information poor” situation?

A

The abundance of data, coupled with the need for powerful data analysis tools

10
Q

As a store manager of Wal-Mart or Superstore, what type of information do you need to know all the time?

A
????
Abstractive information (Knowledge discovery)
11
Q

What is data?

A
  • raw measures
  • unprocessed
  • some relevancy
  • has no structure (per se)
12
Q

What is information?

A
  • Structured data
  • Processed data that brings meaning
  • Information can be used to answer questions
13
Q

What is knowledge?

A
  • Laws or rules
  • Generalized to higher levels
  • Corresponds to regularity patterns hidden in datasets
  • Explain what happened
  • Predict what’s next
14
Q

Use examples to explain the differences between terms Data, Information and Knowledge. How does each term/concept link to business information queries according to three types of information processes?

A

Data –> OLTP
Information –> OLAP
Knowledge –> KD

15
Q

Why can’t you ask questions and get answers from a dataset?

A

They are data, but not information!

You need information, that is, structured data that has been processed and provides meaning.

16
Q

Why does the IT industry need to develop DM and DW, considering that RDBMS/SQL are already available for storing and querying information?

A
  • RDBMS/SQL is efficient for data retrieval
  • Not efficient for grouping large data sets
  • Difficult to use SQL to define complex queries
  • Analyzing data and exploring relationships are not part of the SQL vocabulary
  • It is constrained to retrieving information from a single database
17
Q

Why and in what way is the DW model more advanced than the RDB model in supporting business management queries?

A
  • Integrated data from one or more disparate sources
  • Current and historical summarized data
  • -> Can create reports
  • -> Complex queries
18
Q

What is data mining (DM)?

A

Finding unknown, valid and actionable knowledge (patterns and regularities) from large data.

19
Q

What are the basic knowledge discovery tasks?

A
  • Classification
  • Association
  • Clustering
  • Generalization
20
Q

What are the simple rules for choosing solution tools for getting different types of business information?

A

?????? (Slide 1.3)

21
Q

What are the two general purposes of DM (or any scientific research)?

A

– Explanation: understanding/explaining current behaviours

– Prediction: predicting for future outcomes

22
Q

How are DM technologies categorized?

A
  • Predictive (Classification and prediction)
  • Descriptive (Clustering, summarization and association rules)

23
Q

Can you name three major DM tasks & what is each task about?

A
  • Classification is learning a function that maps (classifies) a data item into one of several predefined classes.
  • Clustering is a common descriptive task where one seeks to identify a finite set of categories or clusters to describe the data.
  • Summarization involves methods for finding a compact description for a subset of data.
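
The three tasks above can be sketched on a toy example. This is only an illustration under stated assumptions: the income figures and class labels are made up, the classifier is a plain 1-nearest-neighbour lookup, and `cluster_by_gap` is a naive gap-based grouping standing in for a real clustering algorithm.

```python
from statistics import mean

def classify_1nn(labelled, x):
    """Classification: map x to the predefined class of its nearest labelled example."""
    _value, label = min(labelled, key=lambda pair: abs(pair[0] - x))
    return label

def cluster_by_gap(values, max_gap):
    """Clustering: group sorted 1-D values, starting a new cluster at each large gap."""
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= max_gap:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return clusters

def summarize(values):
    """Summarization: a compact description of a subset of data."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }

incomes = [20, 22, 25, 60, 65, 110]  # illustrative numbers
labelled = [(20, "low"), (60, "medium"), (110, "high")]  # predefined classes

label = classify_1nn(labelled, 58)
groups = cluster_by_gap(incomes, max_gap=10)
summary = summarize(incomes)
```

Note the difference: classification needs the classes in advance (`labelled`), whereas clustering discovers the groups from the data alone.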
24
Q

What is the Empirical Cycle Model (ECM) of scientific research? Describe each stage of the process.

A
  1. Observation: we start with a number of observations.
  2. Analysis: we try to find patterns in these observations.
  3. Theory: if we have found some regularities, we formulate a theory (hypothesis) explaining the data.
  4. Prediction: our theory will predict new phenomena that can be verified by new observations.
25
Q

How are ML & DM associated with the ECM?

A

ML and DM both observe data to find patterns and make predictions.

26
Q

Why is it said that “a discovered knowledge only has temporary value”?

A

This is said because it cannot be considered proven until it is backed up with actual statistics.

27
Q

Why does discovered knowledge need to be corroborated by statistics?

A

To validate it: statistics state whether the results are significant or not.

28
Q

What is the main difference and relationship between Information Retrieval (IR) and DM?

A

Information retrieval and data mining both retrieve information:

  • IR gets existing data
  • DM derives new information
29
Q

What are the main differences between statistical analysis and DM?

A
  • Statistics forms the hypothesis first, then observes and collects data
  • DM observes and collects data, then creates a hypothesis
30
Q

Give three reasons why data need to be processed for DW and DM tasks.

A
  • No quality data, no quality DM/DW results. Raw data can be:
  • Incomplete
  • Noisy
  • Inconsistent
31
Q

What are the typical tasks for DP (provide a brief description for each)?

A
  • Data cleaning: fill in missing values, correct errors, smooth noisy data.
  • Data integration: integration of multiple DBs, data cubes, or files
  • Data transformation: Normalization and aggregation.
  • Data reduction: Obtains reduced representation in volume but produces the
    same or similar analytical results
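
A minimal sketch of two of these preprocessing steps, cleaning and transformation, on a single numeric attribute. The `ages` values are invented, and filling gaps with the mean is only one of several possible strategies, chosen here for brevity.

```python
from statistics import mean

def fill_missing(values):
    """Data cleaning: replace missing entries (None) with the mean of the known values."""
    known = [v for v in values if v is not None]
    avg = mean(known)
    return [avg if v is None else v for v in values]

def min_max_normalize(values):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

ages = [20, None, 40, 30]  # illustrative attribute with one missing value
cleaned = fill_missing(ages)
normalized = min_max_normalize(cleaned)

print(cleaned)
print(normalized)
```

Here the missing age is filled with 30 (the mean of 20, 40 and 30), and normalization then maps 20 to 0.0 and 40 to 1.0.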
32
Q

Choose a DM/DW project as an example, such as from Dec/Thesis or from the Internet, to examine & explain its DP tasks.

A

Data cleaning
Data integration
Data transformation
Data reduction

33
Q

Why is it said that businesses are the drivers of DW and OLAP technologies?

A

Because they are data rich, but information poor. Ability to have quick ad hoc queries:

  • Analytical information
  • Aggregated information
  • Efficient management on enterprise-wide data
  • Focus on the usage of information/analytics (reports!!)
34
Q

For decision makers of a large enterprise to see the big picture of the organization and its business, what are the two general approaches for integrating information from different databases? Describe each.

A

Data cleaning, transformation, integration techniques are applied:

– Ensure accurate data with consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources

– When data is moved to the warehouse, it is converted

35
Q

What are the general DW properties that distinguish the DW model from the relational DB model, and how do they fit the main objectives of DW?

A

A decision support DB that is maintained separately from the organization’s operational DB.

  • Subject-oriented (sales, supplies, customers)
  • Integrated (multiple sources, files)
  • Time variant (historical data 5+ years - DB only 3 months)
  • Non-volatile (separate from actual databases)
36
Q

What are the 3 typical DW schemas (describe each)?

A

  • Star schema (fact table)
  • Snowflake schema (normalized star schema)
  • Fact constellations (multiple fact tables)

37
Q

What are the four OLAP operations for a data cube?

A
  • Roll up (summarize)
  • Drill down (reverse roll-up)
  • Slice (project)
  • Dice (select)
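
A small sketch of these operations on a list of fact rows. It assumes the common convention that a slice fixes one dimension to a single value and a dice restricts two or more dimensions; the rows, dimension names and figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, city, product, sales)
facts = [
    (2023, "Q1", "Halifax", "milk", 10),
    (2023, "Q2", "Halifax", "milk", 15),
    (2023, "Q1", "Truro", "bread", 5),
    (2024, "Q1", "Halifax", "bread", 8),
]

def roll_up(rows):
    """Roll up: summarize from quarter level to year level, per city."""
    totals = defaultdict(int)
    for year, _quarter, city, _product, sales in rows:
        totals[(year, city)] += sales
    return dict(totals)

def slice_year(rows, year):
    """Slice: fix one dimension (year) to a single value."""
    return [r for r in rows if r[0] == year]

def dice(rows, years, cities):
    """Dice: select a sub-cube by restricting two dimensions at once."""
    return [r for r in rows if r[0] in years and r[2] in cities]

by_year = roll_up(facts)
only_2023 = slice_year(facts, 2023)
sub_cube = dice(facts, {2023}, {"Halifax"})
```

Drill down is the reverse of `roll_up`: going from the yearly totals in `by_year` back to the quarter-level rows in `facts`.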
38
Q

What is the “Multi-Dimensional Space” model for DW/OLAP technology, and how is it described by the data cube lattice (know how to draw it)?

A

This logical space is also called a hypercube (data cube). Each dimension of the cube represents an aspect of the possible business events which is divided into discrete values representing attribute domain of the dimension.

39
Q

What are “Cuboids” in a MDS model? How to calculate the total number of cuboids in a DW? How to estimate the complexity of a DW design?

A

???????
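
A rough sketch of the cuboid lattice, not taken from the course slides. It assumes the usual textbook convention that every subset of the n dimensions defines one cuboid, giving 2^n cuboids when no dimension has a concept hierarchy, and the product of (L_i + 1) when dimension i has L_i hierarchy levels (the extra 1 is the virtual "all" level). The dimension names are illustrative.

```python
from itertools import combinations

def cuboids(dimensions):
    """List every cuboid as the tuple of dimensions it groups by."""
    result = []
    for k in range(len(dimensions) + 1):
        result.extend(combinations(dimensions, k))
    return result

def cuboid_count(levels_per_dimension):
    """Total cuboids when dimension i has levels_per_dimension[i] hierarchy levels."""
    count = 1
    for levels in levels_per_dimension:
        count *= levels + 1  # +1 for the virtual top level "all"
    return count

lattice = cuboids(["time", "item", "location"])
for cuboid in lattice:
    print(cuboid)
```

The first entry, the empty tuple, is the apex cuboid (everything aggregated); the last entry, with all three dimensions, is the base cuboid. With one level per dimension, `cuboid_count([1, 1, 1])` agrees with the 2^3 = 8 entries in `lattice`.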

40
Q

How is a DW model different from a conventional DB model, and why?

A

DW is subject-oriented, whereas the DB model is just a data store for retrieval, not for information.

41
Q

What are the main differences between OLAP and OLTP process operations for answering users’ queries?

A
  • OLTP uses SQL queries
  • OLAP operations (tools) enable users to analyze multidimensional data interactively from multiple perspectives.

42
Q

What is the difference between the logical dimensional space for OLAP analysis and the physical space for 3D computer graphics?

A

As distinguished from physical dimensions, which are based on angles and limited to three, logical dimensions have no such limits.

43
Q

How to define OLAP queries? (It may be answered based on the Starnet query model.)

A

????
Ad hoc business queries

creating dimensions (e.g. day, week, month, year)

44
Q

What is the visualization metaphor for displaying OLAP results of a multi-dimensional logical space (data cube), and how to best use it?

A

?????????

45
Q

Use the demo example of wealth vs. health analysis (http://www.youtube.com/watch?v=jbkSRLYSojo) to explain how DW/OLAP may be able to support DM applications.

A

Process the data for the warehouse.

Generate a cube to roll up, drill down, slice and dice.

46
Q

Both DW and DM technologies are about deriving new information from the stored operational data. What are the differences between them (describe at least two aspects)?

A

?????????

47
Q

Name three major general DM tasks. What general property do they share, and how do they differ from each other?

A

????????

48
Q

Why is it said that association pattern mining is a backbone technology for CRM recommender systems?

A

Because finding relationships between users’ purchases or viewing history is critical for recommendations (e.g. Amazon).

49
Q

What is frequent pattern analysis? How can a rule be derived from a frequent itemset (provide two examples)?

A

?????????

50
Q

What are the usefulness measures for association rules? Define each measure.

A

???????
Support rate: The percentage of transactions in D containing both X and Y.

Confidence rate: The percentage of transactions containing X that also contain Y.
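
The two measures can be computed directly from their definitions. A minimal sketch, assuming each transaction is a Python set of items; the `baskets` data is made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, x, y):
    """Fraction of the transactions containing X that also contain Y.

    Assumes X occurs at least once; otherwise this divides by zero.
    """
    return support(transactions, set(x) | set(y)) / support(transactions, x)

baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

s = support(baskets, {"bread", "milk"})       # rule: bread -> milk
c = confidence(baskets, {"bread"}, {"milk"})

print(s, c)
```

For the rule bread -> milk, two of the four baskets contain both items (support 0.5), and two of the three baskets with bread also contain milk (confidence 2/3).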

51
Q

What are the two major procedures of mining association rules in the Apriori Algorithm?

A

Find frequent itemsets

Generate association rules
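
The two procedures can be sketched as follows. This is a simplified illustration rather than an optimized Apriori implementation: `min_support` and `min_confidence` are fractions, the `baskets` data is invented, and candidate pruning relies on the standard Apriori property that every subset of a frequent itemset must itself be frequent.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Step 1: level-wise search for itemsets whose support >= min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        # Count each candidate k-itemset and keep the frequent ones.
        level = {}
        for cand in current:
            sup = sum(cand <= t for t in transactions) / n
            if sup >= min_support:
                level[cand] = sup
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent

def rules(frequent, min_confidence):
    """Step 2: split each frequent itemset into X -> Y and keep confident rules."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for x in combinations(sorted(itemset), r):
                x = frozenset(x)
                conf = sup / frequent[x]  # x is frequent because itemset is
                if conf >= min_confidence:
                    out.append((set(x), set(itemset - x), conf))
    return out

baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
freq = frequent_itemsets(baskets, min_support=0.5)
found = rules(freq, min_confidence=0.6)
```

With these toy baskets, {butter, milk} is not frequent, so the three-item candidate is pruned without ever being counted.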

52
Q

What is the apriori knowledge which may help AR mining?

A

???????
Each association rule is a piece of knowledge describing a statistically sound relationship between two particular item sets.

53
Q

How to estimate a search space given n unique items?

A

??????

2^n - 1 (the number of non-empty itemsets that can be formed from n items)
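
A quick check of this count by brute-force enumeration, assuming the search space is every non-empty subset of the n items. The item names are illustrative.

```python
from itertools import combinations

def nonempty_itemsets(items):
    """Enumerate every non-empty subset of the given items."""
    items = sorted(items)
    return [
        set(combo)
        for k in range(1, len(items) + 1)
        for combo in combinations(items, k)
    ]

candidates = nonempty_itemsets({"bread", "milk", "butter"})
print(len(candidates), 2 ** 3 - 1)
```

The count doubles with every extra item, which is why exhaustive enumeration stops being practical for realistic catalogues.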

54
Q

What is the Apriori knowledge/property, and why is it significant for the algorithm?

A

????