Lesson 1 Flashcards

Question 1

Q

What is data mining?

Answer

A

The process of discovering patterns, relationships, and useful information in large data sets.

Question 2

Q

Importance of data mining

Answer

A

1. Extracting valuable insights
2. Improving decision making
3. Enhancing customer experience
4. Fraud detection and risk management

Question 3

Q

Steps in knowledge discovery

Answer

A

1. Data preparation (data cleaning, integration, transformation, and selection)
2. Data mining (intelligent methods are applied to extract patterns)
3. Pattern evaluation
4. Knowledge presentation
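
The four steps can be sketched as a tiny pipeline. This is only an illustration: the shopping transactions, the function names, and the 0.5 support threshold are all assumptions made up for the example, and counting item pairs stands in for whatever mining method is actually used.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw transactions; None marks a missing record.
raw_transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    None,
    ["milk", "butter"],
    ["bread", "milk", "bread"],  # contains a duplicate item to clean
]

def prepare(transactions):
    """Step 1: data preparation (drop missing records, remove duplicate items)."""
    return [sorted(set(t)) for t in transactions if t]

def mine(transactions):
    """Step 2: data mining (count how often each pair of items occurs together)."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(t, 2))
    return counts

def evaluate(pair_counts, n_transactions, min_support=0.5):
    """Step 3: pattern evaluation (keep pairs whose support meets the threshold)."""
    return {
        pair: count / n_transactions
        for pair, count in pair_counts.items()
        if count / n_transactions >= min_support
    }

def present(patterns):
    """Step 4: knowledge presentation (format patterns as readable lines)."""
    return [f"{a} + {b}: support {s:.2f}" for (a, b), s in sorted(patterns.items())]

clean = prepare(raw_transactions)
patterns = evaluate(mine(clean), len(clean))
report = present(patterns)
for line in report:
    print(line)
```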

Question 4

Q

Pyramid method of data mining

Answer

A

Data
Information
Knowledge
Wisdom

Question 5

Q

What is a data warehouse?

Answer

A

A centralised repository designed to store large amounts of structured data from multiple sources for analysis and reporting.

Question 6

Q

Purpose of data warehouse

Answer

A

Data integration (combines data from different sources in one location)
Data consistency and quality
Historical data storage
Support for business intelligence (BI)

Question 7

Q

Differences between an operational database and a data warehouse

Answer

A

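1. An operational database supports day-to-day business operations, while a data warehouse supports analysis and reporting.
2. An operational database holds current, real-time transaction data, while a data warehouse holds historical and aggregated data.
3. An operational database is highly normalised, while a data warehouse is denormalised for faster queries.
4. An operational database handles CRUD (create, read, update, delete) operations, while a data warehouse handles OLAP (online analytical processing).
5. An operational database is optimised for quick inserts and updates, while a data warehouse is optimised for complex queries and summaries.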
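
The contrast in workloads can be sketched with Python's built-in sqlite3 module. The `sales` table and its values are hypothetical, and a single in-memory database is used for both halves purely to show the two query styles; a real warehouse would be a separate system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)

# Operational-style workload: small, frequent inserts and updates on single rows.
conn.execute("INSERT INTO sales (region, year, amount) VALUES ('north', 2023, 100.0)")
conn.execute("INSERT INTO sales (region, year, amount) VALUES ('north', 2024, 150.0)")
conn.execute("INSERT INTO sales (region, year, amount) VALUES ('south', 2024, 80.0)")
conn.execute("UPDATE sales SET amount = 90.0 WHERE id = 3")

# Warehouse-style workload: a read-only summary over all historical rows.
summary = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(summary)
conn.close()
```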

Question 8

Q

Components of data warehouse architecture

Answer

A

1. Data sources
2. ETL (extract, transform, and load) layer
3. Data storage layer
4. Metadata and management layer
5. Data access layer
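
A rough sketch of how data might flow through these layers, using only the standard library. The CSV text, the `fact_sales` table name, and the cleaning rules are assumptions for illustration; the metadata and management layer is not modelled here.

```python
import csv
import io
import sqlite3

# Data source: a hypothetical CSV export with inconsistent formatting.
source_csv = "customer,amount\n alice ,10.5\nBOB,4\n"

# Extract: read the raw records from the source.
records = list(csv.DictReader(io.StringIO(source_csv)))

# Transform: trim and lower-case names, convert amounts to numbers.
cleaned = [(r["customer"].strip().lower(), float(r["amount"])) for r in records]

# Load: write the cleaned rows into the storage layer.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_sales (customer TEXT, amount REAL)")
warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?)", cleaned)

# Data access: a reporting query against the stored data.
total = warehouse.execute("SELECT SUM(amount) FROM fact_sales").fetchone()[0]
print(total)
warehouse.close()
```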

Question 9

Q

Types of data

Answer

A

1. Structured data: organised in a predefined format of rows and columns.
2. Semi-structured data: has some organising structure but does not follow a strict schema.
3. Unstructured data: not in any predefined format.

Question 10

Q

Examples of types of data

Answer

A

1. Structured: phone numbers, addresses
2. Semi-structured: XML, web pages
3. Unstructured: images, text documents, social media videos
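
To make the three kinds concrete, here is a small sketch that handles one made-up sample of each. The contact names, phone numbers, and the sentence are invented for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, and every row has the same fields.
structured = io.StringIO("name,phone\nAmina,0700000001\nBrian,0700000002\n")
rows = list(csv.DictReader(structured))

# Semi-structured: tagged data, but records need not share the same fields.
semi = ET.fromstring(
    "<contacts>"
    "<contact><name>Amina</name><email>amina@example.com</email></contact>"
    "<contact><name>Brian</name></contact>"
    "</contacts>"
)
fields_per_contact = [[child.tag for child in c] for c in semi.findall("contact")]

# Unstructured: free text with no predefined fields at all.
unstructured = "Called Amina today; she prefers email over phone."
word_count = len(unstructured.split())

print(rows[0], fields_per_contact, word_count)
```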

Question 11

Q

Types of data attributes

Answer

A

1. Nominal (categories without a meaningful order; names of things or symbols)
2. Numeric (quantitative; integer or real values)
3. Binary (a nominal attribute with only two categories, 0 or 1)
4. Ordinal (categories with a meaningful order or ranking)
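
The attribute type decides which summaries make sense. The sketch below uses invented values and an assumed size ranking of small < medium < large to show one sensible summary per type.

```python
from statistics import mean, mode

colors = ["red", "blue", "red", "green"]                  # nominal
sizes = ["small", "large", "medium", "small", "medium"]   # ordinal
smoker = [0, 1, 0, 0]                                     # binary
heights_cm = [150.0, 160.0, 170.0]                        # numeric

# Nominal: only counting is meaningful, so report the most frequent value.
most_common_color = mode(colors)

# Ordinal: values can be ranked, so a median (middle value) is meaningful.
size_order = ["small", "medium", "large"]  # assumed ranking
ranked = sorted(sizes, key=size_order.index)
median_size = ranked[len(ranked) // 2]

# Binary: the share of 1s summarises the attribute.
smoker_share = sum(smoker) / len(smoker)

# Numeric: arithmetic such as the mean is meaningful.
average_height = mean(heights_cm)

print(most_common_color, median_size, smoker_share, average_height)
```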

Question 12

Q

What is a cluster?

Answer

A

A collection of data objects that are similar to one another and dissimilar to the objects in other clusters.
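
One way to check this property is to compare distances inside a cluster with distances between clusters. The two point sets below are made up, and Euclidean distance is an assumed similarity measure; other measures are possible.

```python
from itertools import combinations, product
from math import dist
from statistics import mean

# Two hypothetical clusters of 2-D points.
cluster_a = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
cluster_b = [(8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]

def mean_within(cluster):
    """Average distance between pairs of objects in the same cluster."""
    return mean(dist(p, q) for p, q in combinations(cluster, 2))

def mean_between(first, second):
    """Average distance between objects taken from different clusters."""
    return mean(dist(p, q) for p, q in product(first, second))

within_a = mean_within(cluster_a)
within_b = mean_within(cluster_b)
between = mean_between(cluster_a, cluster_b)

# Small within-cluster distances and a large between-cluster distance
# indicate well-separated clusters.
print(within_a, within_b, between)
```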

Question 13

Q

Question 14

Q

Methods for handling missing values

Answer

A

1. Ignore the tuple
2. Fill in the missing value manually
3. Use a global constant, such as "unknown", to fill in the missing value
4. Use a measure of central tendency, such as the mean or median
5.

Question 15

Q

What is noise?

Answer

A

A random error or variance in a measured variable.

Question 16

Q

What is data cleaning?

Answer

A

The process of removing noisy data, filling in missing values, and identifying outliers in the data.

Question 17

Q

How can missing data be filled in?

Answer

A

Ignore the record
Fill it in manually
Use a constant value
Use a measure of central tendency (mean or median)
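
A minimal sketch of the automatic options, assuming a made-up list of ages in which None marks a missing value. Manual filling is left out because it cannot be automated.

```python
from statistics import mean, median

# Hypothetical attribute values; None marks a missing entry.
ages = [25, None, 31, 40, None, 24]
observed = [a for a in ages if a is not None]

# Option: ignore the records that have a missing value.
dropped = observed

# Option: fill with a constant such as "unknown".
filled_constant = [a if a is not None else "unknown" for a in ages]

# Option: fill with a measure of central tendency.
filled_mean = [a if a is not None else mean(observed) for a in ages]
filled_median = [a if a is not None else median(observed) for a in ages]

print(filled_mean)
print(filled_median)
```

The mean and the median differ here (30 versus 28) because the value 40 pulls the mean upward; the median is usually the safer choice for skewed data.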

Question 18

Q

Question 19

Q

What is PCA?

Answer

A

Principal component analysis (PCA) is a linear method that projects the original data onto a smaller (lower-dimensional) space.
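
A hand-rolled sketch for the 2-D case only, reducing made-up points from two dimensions to one. It uses the closed-form angle of the main axis of a 2x2 covariance matrix; real projects would normally rely on a library routine, and the data and variable names here are assumptions.

```python
from math import atan2, cos, sin

# Hypothetical 2-D data with correlated x and y values.
points = [(2.0, 1.0), (3.0, 3.0), (4.0, 3.0), (5.0, 5.0)]
n = len(points)

# Centre the data so each attribute has mean zero.
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
centered = [(x - mean_x, y - mean_y) for x, y in points]

# Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
sxx = sum(x * x for x, _ in centered) / (n - 1)
syy = sum(y * y for _, y in centered) / (n - 1)
sxy = sum(x * y for x, y in centered) / (n - 1)

# Direction of maximum variance (first principal component) for the 2x2 case.
theta = 0.5 * atan2(2 * sxy, sxx - syy)
direction = (cos(theta), sin(theta))

# Project each 2-D point onto that direction to get 1-D coordinates.
projected = [x * direction[0] + y * direction[1] for x, y in centered]

# Share of the total variance kept after the reduction.
projected_var = sum(p * p for p in projected) / (n - 1)
retained = projected_var / (sxx + syy)
print(f"variance retained: {retained:.1%}")
```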

Question 20

Q