Lesson 1 Flashcards

1
Q

What is data mining?

A

Data mining is the process of discovering patterns, relationships, and useful information in large data sets.

2
Q

Importance of data mining

A

Extracting valuable insights
Improving decision making
Enhancing the customer experience
Detecting fraud and managing risk

3
Q

Steps in knowledge discovery

A

1. Data preparation (data cleaning, integration, transformation, and selection)
2. Data mining (intelligent methods are applied to extract patterns)
3. Pattern evaluation
4. Knowledge presentation
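The four steps above can be sketched as a tiny pipeline. This is only an illustration under assumed data: the shopping baskets, the function names, and the 0.5 minimum support threshold are all hypothetical, and counting item pairs stands in for whatever mining method a real project would use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw transactions; None marks a corrupt record.
raw = [["milk", "bread"], ["milk", "bread", "eggs"], None, ["bread", "eggs"], ["milk", "bread"]]

def prepare(records):
    """Step 1: data preparation (drop corrupt records, de-duplicate and sort items)."""
    return [sorted(set(r)) for r in records if r]

def mine(transactions):
    """Step 2: data mining (count how often each pair of items occurs together)."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(t, 2))
    return counts

def evaluate(counts, n_transactions, min_support=0.5):
    """Step 3: pattern evaluation (keep pairs that reach the minimum support)."""
    supports = {pair: c / n_transactions for pair, c in counts.items()}
    return {pair: s for pair, s in supports.items() if s >= min_support}

def present(patterns):
    """Step 4: knowledge presentation (human-readable summary lines)."""
    return [f"{a} + {b}: support {s:.2f}" for (a, b), s in sorted(patterns.items())]

transactions = prepare(raw)
patterns = evaluate(mine(transactions), len(transactions))
report = present(patterns)
print("\n".join(report))
```

Each function maps to one step, so a step can be swapped out (for example, a different mining method) without touching the others.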

4
Q

Pyramid model of data mining (DIKW, from base to top)

A

Data
Information
Knowledge
Wisdom

5
Q

What is a data warehouse?

A

A data warehouse is a centralised repository designed to store large amounts of structured data from multiple sources for analysis and reporting.

6
Q

Purpose of a data warehouse

A

Data integration (combines data from different sources in one location)
Data consistency and quality
Historical data storage
Support for business intelligence (BI)

7
Q

Difference between an operational database and a data warehouse

A

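1. Purpose: an operational database supports day-to-day business operations, while a data warehouse supports analysis and reporting.
2. Data: an operational database holds current, real-time transaction data, while a data warehouse holds historical and aggregated data.
3. Design: an operational database is highly normalised, while a data warehouse is denormalised for faster queries.
4. Workload: an operational database handles CRUD operations (create, read, update, delete), while a data warehouse handles OLAP (online analytical processing).
5. Optimisation: an operational database is optimised for quick inserts and updates, while a data warehouse is optimised for complex queries and summaries.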
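The contrast between the two workloads can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch: the `orders` table and its values are hypothetical, and both styles of query run against one in-memory table only for brevity, whereas a real warehouse would be a separate store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)")

# Operational (CRUD) style: small, quick writes that touch one row at a time.
conn.execute("INSERT INTO orders (region, year, amount) VALUES ('north', 2023, 120.0)")
conn.execute("INSERT INTO orders (region, year, amount) VALUES ('north', 2024, 80.0)")
conn.execute("INSERT INTO orders (region, year, amount) VALUES ('south', 2024, 200.0)")
conn.execute("UPDATE orders SET amount = 100.0 WHERE id = 2")

# Analytical (OLAP) style: one read-only query that summarises many rows.
summary = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(summary)
```

The writes each affect a single row, while the final query reads every row to produce one summary line per region.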

8
Q

Components of data warehouse architecture

A

1. Data sources
2. ETL (extract, transform, and load) layer
3. Data storage layer
4. Metadata and management layer
5. Data access layer
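A rough sketch of how the source, ETL, storage, and access layers fit together, using only the Python standard library. The CSV content, the `sales` table, and the function names are assumptions made for illustration; the metadata and management layer is left out.

```python
import csv
import io
import sqlite3

# Data source: a hypothetical CSV export from an operational system.
SOURCE_CSV = "customer,amount,currency\nalice, 10.50 ,usd\nbob,,usd\ncarol,7,USD\n"

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: skip rows with no amount, trim whitespace, normalise case and types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue
        cleaned.append((row["customer"].strip().title(), float(amount), row["currency"].strip().upper()))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into the storage layer."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, currency TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

warehouse = sqlite3.connect(":memory:")  # stand-in for the data storage layer
load(transform(extract(SOURCE_CSV)), warehouse)

# Data access layer: a reporting query over the loaded data.
row_count = warehouse.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
total = warehouse.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(row_count, total)
```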

9
Q

Types of data

A

1. Structured data: organised in a predefined format of rows and columns.
2. Semi-structured data: has some organising structure but does not follow a strict schema.
3. Unstructured data: not in any predefined format.

10
Q

Examples of types of data

A

1. Structured: phone numbers, addresses
2. Semi-structured: XML, web pages
3. Unstructured: images, text documents, social media videos
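The three types can be seen side by side in a short Python sketch. The contact records and the sentence of free text are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, so every row has the same fields.
table = list(csv.DictReader(io.StringIO("name,phone\nAda,555-0100\nLin,555-0101\n")))
same_columns = all(set(row) == {"name", "phone"} for row in table)

# Semi-structured: tags describe the data, but records may differ in shape.
doc = ET.fromstring(
    "<contacts>"
    "<contact><name>Ada</name><phone>555-0100</phone></contact>"
    "<contact><name>Lin</name><email>lin@example.com</email></contact>"
    "</contacts>"
)
fields_per_record = [sorted(child.tag for child in contact) for contact in doc]

# Unstructured: free text with no predefined fields to query.
note = "Ada called yesterday about her order and asked for a refund."

print(same_columns, fields_per_record, len(note.split()))
```

The CSV rows all share one set of columns, while the two XML records carry different fields, which is what "no strict schema" means in practice.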
11
Q

Types of data attributes

A

1. Nominal: categories without a meaningful order (names of things or symbols)
2. Numeric: quantitative values (integers or real numbers)
3. Binary: a nominal attribute with only two categories (0 or 1)
4. Ordinal: categories with a meaningful order or ranking
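The attribute type decides which operations are meaningful. The sketch below uses made-up values, and the `SIZE_RANK` mapping is a hypothetical way to encode an ordinal scale.

```python
from statistics import mean, mode

colours = ["red", "blue", "red", "green"]       # nominal
smoker = [0, 1, 0, 0]                           # binary
sizes = ["small", "large", "medium", "small"]   # ordinal
heights_cm = [160.0, 172.5, 181.0, 168.5]       # numeric

# Nominal: only equality and counting make sense, so summarise with the mode.
most_common_colour = mode(colours)

# Binary: two categories; the share of 1s is a simple summary.
smoker_rate = sum(smoker) / len(smoker)

# Ordinal: an explicit rank makes comparison and sorting meaningful.
SIZE_RANK = {"small": 0, "medium": 1, "large": 2}
sorted_sizes = sorted(sizes, key=SIZE_RANK.get)
largest = max(sizes, key=SIZE_RANK.get)

# Numeric: arithmetic such as the mean is meaningful.
average_height = mean(heights_cm)

print(most_common_colour, smoker_rate, sorted_sizes, largest, average_height)
```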

12
Q

What is a cluster?

A

A cluster is a collection of data objects such that the objects within the cluster are similar to one another and dissimilar to the objects in other clusters.
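One way to make "similar within, dissimilar across" concrete is to compare distances. In this sketch the points and their grouping are hypothetical, and Euclidean distance is assumed as the similarity measure: a small distance means similar objects.

```python
from itertools import combinations, product
from math import dist
from statistics import mean

# Hypothetical 2-D points that have already been grouped into two clusters.
cluster_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
cluster_b = [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]

def mean_within(cluster):
    """Average distance between pairs of points in the same cluster."""
    return mean(dist(p, q) for p, q in combinations(cluster, 2))

def mean_between(c1, c2):
    """Average distance between points taken from two different clusters."""
    return mean(dist(p, q) for p, q in product(c1, c2))

within = max(mean_within(cluster_a), mean_within(cluster_b))
between = mean_between(cluster_a, cluster_b)

# For a good grouping, the within-cluster distance is much smaller.
print(within < between)
```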

13
Q
A
14
Q

Methods for handling missing values

A

1. Ignore the tuple
2. Fill in the missing value manually
3. Use a global constant, such as "Unknown", to fill in the missing value
4. Use a measure of central tendency, such as the mean
5.
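Methods 1, 3, and 4 from the list above can be sketched in a few lines of Python. The records are hypothetical, and `None` is assumed to mark a missing value.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 25, "city": "Nairobi"},
    {"age": None, "city": "Kampala"},
    {"age": 35, "city": None},
    {"age": None, "city": None},
]

# Method 1, ignore the tuple: drop any row that has a missing value.
complete_rows = [r for r in rows if None not in r.values()]

# Method 3, constant: replace a missing category with a fixed label.
cities = [r["city"] if r["city"] is not None else "Unknown" for r in rows]

# Method 4, central tendency: replace a missing number with the mean of the known ones.
known_ages = [r["age"] for r in rows if r["age"] is not None]
age_mean = mean(known_ages)
ages = [r["age"] if r["age"] is not None else age_mean for r in rows]

print(len(complete_rows), cities, ages)
```

Ignoring tuples is the simplest option but here it discards three of the four records, which shows why filling is often preferred when many values are missing.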

15
Q

What is noise?

A

Noise is a random error or variance in a measured variable.
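A small simulation can make the definition concrete: each measurement is the true value plus a random error. The true value of 50.0, the Gaussian error with standard deviation 2.0, and the sample size are all assumptions chosen for illustration.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable
TRUE_VALUE = 50.0

# Each measurement = true value + a random error (the noise).
measurements = [TRUE_VALUE + random.gauss(0, 2.0) for _ in range(1000)]
errors = [m - TRUE_VALUE for m in measurements]

noise_spread = pstdev(errors)   # spread of the random error, near the chosen 2.0
estimate = mean(measurements)   # averaging many readings cancels much of the noise

print(round(noise_spread, 2), round(estimate, 2))
```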

16
Q

What is data cleaning?

A

Data cleaning is the process of removing noisy data, filling in missing values, and identifying outliers in the data.
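For the outlier part of that definition, one common rule of thumb flags values that lie more than 1.5 times the interquartile range (IQR) outside the quartiles. The card does not name a method, so the IQR rule here is an assumption, and the readings are made up.

```python
from statistics import quantiles

# Hypothetical sensor readings with one suspicious value.
readings = [10, 12, 11, 13, 12, 95, 11]

# Quartiles of the data (default "exclusive" method of statistics.quantiles).
q1, _, q3 = quantiles(readings, n=4)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged as outliers.
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in readings if x < low or x > high]
cleaned = [x for x in readings if low <= x <= high]

print(outliers, cleaned)
```

Flagged values still need a human decision: an outlier may be a measurement error to remove or a genuine rare event worth keeping.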

17
Q

How to fill in missing data?

A

Ignore the record that contains the missing value
Fill in the values manually
Use a constant value
Use a measure of central tendency (mean or median)

19
Q

What is PCA?

A

PCA (principal component analysis) is a linear method that projects the original data onto a smaller (lower-dimensional) space.
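A minimal sketch of the idea, reducing 2-D points to 1-D with only the Python standard library. The points are hypothetical, and the closed-form eigenvector used here works only for a 2x2 covariance matrix with a non-zero off-diagonal term; real projects usually rely on a numerical library instead.

```python
from math import hypot, sqrt
from statistics import mean

# Hypothetical 2-D points lying roughly along the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]

# 1. Centre the data on the mean of each attribute.
mx = mean(p[0] for p in points)
my = mean(p[1] for p in points)
centred = [(x - mx, y - my) for x, y in points]

# 2. Build the 2x2 sample covariance matrix [[a, b], [b, c]].
n = len(centred)
a = sum(x * x for x, _ in centred) / (n - 1)
b = sum(x * y for x, y in centred) / (n - 1)
c = sum(y * y for _, y in centred) / (n - 1)

# 3. Largest eigenvalue and its eigenvector (closed form for a symmetric 2x2 matrix).
lam = (a + c) / 2 + sqrt(((a - c) / 2) ** 2 + b ** 2)
vx, vy = b, lam - a  # valid because b != 0 for this data
norm = hypot(vx, vy)
component = (vx / norm, vy / norm)

# 4. Project each centred point onto the first principal component: 2-D -> 1-D.
projected = [x * component[0] + y * component[1] for x, y in centred]

# Share of the total variance kept by the single retained dimension.
explained = lam / (a + c)
print(len(projected), explained > 0.99)
```

Because the points sit close to a straight line, one dimension keeps almost all of the variance, which is the situation where PCA loses little information.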