Data Mining and Warehouse Flashcards

1
Q

Data Mining

A

Data mining is the process of discovering patterns, correlations, and insights in large datasets using techniques from statistics, machine learning, and database systems.

2
Q

Applications of data mining

A

Marketing:
Data mining is used to analyze customer behavior, identify patterns in purchasing habits, and segment customers based on their preferences. This helps businesses target their marketing efforts more effectively.

Healthcare:
In healthcare, data mining is used to analyze patient records, identify trends in diseases, predict patient outcomes, and improve treatment strategies.

Finance:
In finance, data mining is used for fraud detection, risk management, customer segmentation, and stock market analysis.

Retail:
Retailers use data mining to analyze sales data, forecast demand, optimize pricing strategies, and improve inventory management.

Telecommunications:
In telecommunications, data mining is used to analyze call detail records, detect network faults, and optimize network performance.

3
Q

Data Warehouse

A

A data warehouse is a centralized repository that stores structured, historical data from multiple sources for analytical processing. It supports decision-making by providing a consolidated view of that data for analysis and reporting.

4
Q

Difference between a Data Warehouse and a Database:

A
  1. Purpose: A database is primarily used for transaction processing, storing and retrieving data in real-time to support day-to-day operations. A data warehouse, on the other hand, is used for analytical processing, storing historical data for analysis and reporting.
  2. Data Structure: Databases typically store normalized data, which is optimized for transactional efficiency and reduces data redundancy. In contrast, data warehouses often use denormalized or dimensional data models, which are optimized for query performance and analytical processing.
  3. Data Usage: Databases are used for online transaction processing (OLTP), which involves inserting, updating, and deleting small amounts of data. Data warehouses are used for online analytical processing (OLAP), which involves complex queries that analyze large volumes of data.
  4. Schema: Databases typically use a normalized schema to reduce redundancy and improve data integrity. Data warehouses typically use a dimensional schema, such as a star schema (denormalized dimensions) or a snowflake schema (dimensions split into further tables), to facilitate analysis and reporting.
  5. Performance: Databases are optimized for transactional performance, ensuring fast response times for individual transactions. Data warehouses are optimized for analytical performance, enabling complex queries to be processed efficiently over large datasets.
  6. Data Freshness: Databases often contain the most up-to-date data, reflecting the current state of the organization. Data warehouses contain historical data, which may be periodically refreshed from transactional systems.
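
The OLTP/OLAP contrast in the list above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration only: the table names (orders, fact_sales, dim_product, dim_date) and all rows are hypothetical, and a real warehouse would normally run on a separate system from the transactional database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# OLTP side: a transactional table that is read and written one row at a time.
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, status TEXT)"
)
conn.execute("INSERT INTO orders VALUES (1, 'alice', 'new')")
conn.execute("UPDATE orders SET status = 'shipped' WHERE order_id = 1")
status = conn.execute(
    "SELECT status FROM orders WHERE order_id = 1"
).fetchone()[0]

# OLAP side: a small star schema (one fact table, two dimension tables).
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount REAL
);
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "toys"), (2, "food")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?)", [(1, 2022), (2, 2023)])
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 10.0), (2, 1, 2, 20.0), (3, 2, 1, 5.0), (4, 2, 2, 15.0), (5, 2, 2, 30.0)],
)

# Analytical query: scan the fact table and aggregate by dimension attributes.
totals = conn.execute("""
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()

print(status)
for row in totals:
    print(row)
```

The transactional statements each touch a single row by primary key, while the analytical query reads every fact row and groups the totals by category and year.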

5
Q

Similarities between a Data Warehouse and a Database

A

Similarities:

  1. Both databases and data warehouses store data.
  2. Both can be queried to retrieve information.
  3. Both use a structured format for storing data.
6
Q

Data Warehouse Models:

A

Enterprise Data Warehouse (EDW): A centralized repository for all enterprise data that integrates data from various sources.
Data Mart: A subset of the data warehouse focused on a specific business function or department.
Operational Data Store (ODS): Short-term storage for current operational data, often used as a staging area before the data is loaded into the data warehouse.

7
Q

Association Rule

A

Association rule mining is a data mining technique that discovers interesting relationships, or associations, between variables in large datasets. It is often used in market basket analysis to identify patterns in consumer behavior, such as what products are frequently purchased together.

The main goal of association rule mining is to find associations between items that occur together more frequently than would be expected by chance. These associations are represented in the form of “if-then” rules, where one set of items (the antecedent) implies another set of items (the consequent).

For example, a common association rule in market basket analysis might be: {Diapers} -> {Beer}, meaning that customers who buy diapers are also likely to buy beer.
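
One common way to quantify "more frequently than would be expected by chance" is with the standard support, confidence, and lift measures. The sketch below computes them for the {Diapers} -> {Beer} rule. The transactions are invented for illustration, and the function names are this example's own, not taken from any particular library.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy transactions, invented for illustration only.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"diapers", "beer", "milk"}),
    frozenset({"diapers", "beer"}),
    frozenset({"diapers", "bread"}),
    frozenset({"beer", "chips"}),
    frozenset({"milk", "bread"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Estimated P(consequent | antecedent) over the transactions."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent: Iterable[str], consequent: Iterable[str],
         transactions: List[FrozenSet[str]]) -> float:
    """Confidence divided by the consequent's baseline support.

    A value above 1 means the items co-occur more often than they
    would if they were bought independently.
    """
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))


rule = ({"diapers"}, {"beer"})
print(f"support    = {support(rule[0] | rule[1], TRANSACTIONS):.2f}")
print(f"confidence = {confidence(*rule, TRANSACTIONS):.2f}")
print(f"lift       = {lift(*rule, TRANSACTIONS):.2f}")
```

In this toy data the rule's lift is slightly above 1, so diapers and beer appear together a little more often than independence would predict. Algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds rather than scoring a single rule.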
