Data Warehouse, Big Data And KMS - Lecture 3-3 Flashcards
Data warehouse
A repository of historical data that are organized by dimensions to support decision makers in the organization.
◦ Organized toward data analytics
Data mart
◦ A scaled-down version of a DW, designed for the end-user needs of an individual department.
Characteristics of data warehouse
- Subject oriented
  - customer
  - location
  - product
- Integrated (sources are combined in the data warehouse)
  - POS terminal
  - ERP
  - Website
  - Clickstream
- Time variant
  - days, weeks, months, quarters
- Non-volatile
  - can’t edit or change it
Data warehouse framework
Sources
- can be in all different formats
Integration
- combining all different formats
E - extract
T - transform (into a uniform format)
L - load
Warehouse
- data warehouse
Sources, table, summary data
This creates the Data Mart
Ex.: profit and loss -> financial analysis
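The E, T, L steps and the data mart summary can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source extracts, their field names, the `sales` table, and the `build_data_mart` helper are all hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import json
import sqlite3

# Hypothetical source extracts in two different formats (assumed data).
POS_CSV = "date,department,amount\n2024-01-05,finance,120.50\n2024-01-06,sales,80.00\n"
WEB_JSON = '[{"day": "2024-01-05", "dept": "Sales", "total": "45.25"}]'


def extract():
    """Extract: read the raw rows from each source as they are."""
    pos_rows = list(csv.DictReader(io.StringIO(POS_CSV)))
    web_rows = json.loads(WEB_JSON)
    return pos_rows, web_rows


def transform(pos_rows, web_rows):
    """Transform: map both layouts onto one uniform (date, department, amount) format."""
    uniform = [(r["date"], r["department"].lower(), float(r["amount"])) for r in pos_rows]
    uniform += [(r["day"], r["dept"].lower(), float(r["total"])) for r in web_rows]
    return uniform


def load(conn, rows):
    """Load: append the uniform rows to the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (date TEXT, department TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def build_data_mart(conn, department):
    """Summarize the warehouse for one department (a tiny data mart)."""
    cur = conn.execute(
        "SELECT date, SUM(amount) FROM sales WHERE department = ? "
        "GROUP BY date ORDER BY date",
        (department,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load(conn, transform(*extract()))
sales_mart = build_data_mart(conn, "sales")
print(sales_mart)
```

The warehouse table keeps every row from every source, while the mart is just a filtered summary for one department.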
Characteristics of BIG DATA
- Volume -> quantity
- Velocity -> rate/speed
- Variety -> types (video, text, audio, image)
  - structured (TPS, ERP, IS)
  - unstructured (textual - social media)
  - semi-structured
Technical Challenges
- Volume -> storage -> media/devices -> hardware optimization
- Velocity -> rate of processing -> techniques, architecture
  NoSQL, Hadoop, and MapReduce are used to deal with big data
- Data quality
  (errors, missing info, noise, spelling mistakes)
  -> machine learning -> decision making
Why NoSQL for big data
NoSQL means "Not only SQL"; it is used for processing unstructured data
Big data
Data so large and complex that it cannot be managed by traditional systems
Hadoop
A framework for storing & processing Big Data in a distributed environment
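The MapReduce idea that Hadoop is known for can be illustrated with a toy word count. This sketch runs in a single Python process and does not use any Hadoop API; the function names and sample documents are made up for illustration. In a real cluster the map and reduce steps would run in parallel on different machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Assumed sample input, standing in for files spread across a cluster.
documents = ["big data needs big storage", "data velocity and data variety"]
counts = word_count(documents)
print(counts["data"], counts["big"])
```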
Data quality
Data could be “dirty,” i.e. inaccurate, incomplete, incorrect, duplicate or erroneous (e.g. incorrect spelling)
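A rough sketch of how dirty records might be screened, assuming a hypothetical record layout (`name`, `city`) and a hypothetical list of known city names used to catch misspellings. Real cleaning pipelines are far more involved; this only shows the incomplete, misspelled, and duplicate cases.

```python
def clean(records, known_cities):
    """Split records into cleaned rows and rejected rows with a reason."""
    seen = set()
    cleaned, rejected = [], []
    for rec in records:
        # Normalize whitespace and capitalization before checking anything.
        name = (rec.get("name") or "").strip()
        city = (rec.get("city") or "").strip().title()
        if not name or not city:
            rejected.append((rec, "incomplete"))
        elif city not in known_cities:
            rejected.append((rec, "unknown city (possible misspelling)"))
        elif (name.lower(), city) in seen:
            rejected.append((rec, "duplicate"))
        else:
            seen.add((name.lower(), city))
            cleaned.append({"name": name, "city": city})
    return cleaned, rejected


# Assumed sample data, invented for this illustration.
records = [
    {"name": "Ana", "city": "montreal"},
    {"name": "ana ", "city": "Montreal"},  # duplicate after normalization
    {"name": "Ben", "city": None},         # incomplete
    {"name": "Cy", "city": "Montrael"},    # misspelled city
]
cleaned, rejected = clean(records, known_cities={"Montreal", "Toronto"})
print(cleaned)
print([reason for _, reason in rejected])
```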
Management challenges
- Consumption
  - context recommendations
  - content creation
  - personalized profile
  - content classification
  (Production)
- Team
  - hybrid skill set
- Privacy and control -> GDPR
  General Data Protection Regulation
Knowledge management systems
Knowledge management (KM): a process that helps organizations manipulate important knowledge that is part of the organization’s memory
Data
- organized
Information (comm213)
- processed
Knowledge
- explicit knowledge -> policies, vision, standard operating procedures
- tacit knowledge -> skills, experience
Data
The raw bits and pieces of information with no context.
- Data can be quantitative or qualitative
Information
Data in context. For example, “15, 23, 14, and 85” are just data; stating that they are the numbers of students who registered for upcoming classes turns them into information.
Knowledge
Once we have put our data into context, aggregated it, and analyzed it, we can use it to make decisions for our organization. This consumption of information produces knowledge.
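The progression from data to information to knowledge can be traced with the registration numbers from the Information card. The class names and the capacity of 40 seats are assumptions added only for illustration; they are not part of the lecture.

```python
# Data: raw numbers with no context.
data = [15, 23, 14, 85]

# Information: the same numbers placed in context.
# The class names are hypothetical labels.
classes = ["Class A", "Class B", "Class C", "Class D"]
information = dict(zip(classes, data))

# Knowledge: aggregate and analyze the information to support a decision.
# CAPACITY is an assumed threshold, not taken from the source.
CAPACITY = 40
total_registered = sum(information.values())
over_capacity = [name for name, n in information.items() if n > CAPACITY]

print(total_registered)
print(over_capacity)
```

Here the decision the analysis supports might be to open a second section for any class over capacity.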