Week 4 Flashcards
What is the distinction between data, information and knowledge?
Data: Raw facts, building blocks for information
Information: Processed data so that it has meaning to the user
Knowledge: How to interpret information and make decisions based on it.
Data: Air pressure at a moment
Information: Time series of air pressure, showing how it evolves
Knowledge: Deciding to shut off a gauge, but also knowing which gauge.
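A minimal sketch of the air pressure example in Python. The readings, the unit (bar) and the 0.3 threshold are made-up assumptions, and the rule is only a stand-in for what an experienced operator would know.

```python
# Data: raw air pressure readings (hypothetical values in bar), one per minute.
readings = [2.0, 2.1, 2.3, 2.6, 3.0]

# Information: the readings processed into something meaningful,
# here the change in pressure from one minute to the next.
changes = [round(b - a, 1) for a, b in zip(readings, readings[1:])]

# Knowledge: knowing how to act on the information.
# The threshold of 0.3 bar per minute is an assumption for this sketch.
def should_shut_off(changes, threshold=0.3):
    return any(change >= threshold for change in changes)

decision = should_shut_off(changes)
print(changes, decision)
```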
How can you obtain knowledge?
- Training
- Experience
- Others
What is a database?
A shared integrated computer structure that stores data.
What two (simple) types of data can a database store?
- End-user data
- Metadata
What is metadata?
Data about the data, such as timestamps or who last modified a record.
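A small sketch of the two kinds of data, using Python's built-in sqlite3 module. The student table is a made-up example. Here the data about the data is the column names and types; timestamps or modification records would be other examples. PRAGMA table_info is specific to SQLite, and other DBMSs expose this differently.

```python
import sqlite3

# In-memory database; the "student" table is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_number INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("INSERT INTO student VALUES (1001, 'Alice')")

# End-user data: the rows themselves.
end_user_data = conn.execute("SELECT student_number, name FROM student").fetchall()

# Data about the data: column names and declared types.
# Each row of PRAGMA table_info is (cid, name, type, notnull, dflt_value, pk).
metadata = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

print(end_user_data)
print(metadata)
```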
What is a database management system (DBMS)?
- Collection of programs that manage multiple databases;
- Also manages the structure of these databases
- Think about Oracle
What is the main purpose of a DBMS?
To make data management more efficient and effective.
What are advantages of a DBMS?
- Better access to better structured data for end-users;
- Integrated view on operations;
- Reduced number of data inconsistencies due to normalization;
- Quick answers with ad-hoc queries
What are the disadvantages of a DBMS?
- Everything needs to be integrated;
- Might not be efficient with big data
What are two types of databases?
- Transactional databases (operational)
- Data warehouses
What is a transactional database?
- A database containing meaningful business events.
- Used to support daily business operations
- Think of SAP
What is a data warehouse?
A database that stores data to generate information to make tactical and strategic decisions
What are the key characteristics of a data warehouse?
- Subject oriented
- Large
- Historic data
- De-normalized data
- Batch Updates
- Complex queries
What are the key characteristics of a transactional database?
- Transaction oriented
- Small
- Current data
- Normalized data
- Continuous updates
- Simple to complex queries
What are the main data abstraction levels?
According to the lecture there are three, but only two were discussed:
- Physical model
- Conceptual model
What is the physical model data abstraction layer?
The data as it is, as it was collected and uploaded to the database. The data has no meaning at this level; it is just a set of tables. This layer can be modified by the information manager without end-users noticing.
What is the conceptual model data abstraction layer?
The layer you work with when you are interested in the meaning of the data. As an end-user you can get different interface screens and call up only the information you are interested in. You do not see or care how the database is structured or stored.
Why is database design important?
- It defines the expected use of the database
- You can avoid redundant data
- Poor design leads to errors which leads to poor decision making
What is a relational database?
The standard in database design. It is built up of one or more tables, with a focus on the relations between entities to avoid data redundancies.
What are the key components in a relational database?
Entity: A category or object (say: student)
Attribute: A characteristic that describes an entity in a table (say: name, class, age)
Key attribute: The attribute by which individual instances of an entity are told apart (say: student number)
What kind of keys are there in a relational database?
Candidate key: Possible attributes that can become primary key
Primary key: Primary attribute by which an entity is identified
Foreign key: An attribute in table B that is the primary key in table A –> way to link tables
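The three kinds of keys can be sketched with Python's built-in sqlite3 module. The student and enrollment tables, the email column and all values are made up for this example. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Table A: student_number is the primary key.
# email is also a candidate key (unique), but was not chosen as primary key.
conn.execute("""
    CREATE TABLE student (
        student_number INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE,
        name TEXT NOT NULL
    )
""")

# Table B: student_number is a foreign key that points at table A.
conn.execute("""
    CREATE TABLE enrollment (
        course_code TEXT NOT NULL,
        student_number INTEGER NOT NULL REFERENCES student(student_number),
        PRIMARY KEY (course_code, student_number)
    )
""")

conn.execute("INSERT INTO student VALUES (1001, 'alice@example.org', 'Alice')")
conn.execute("INSERT INTO enrollment VALUES ('DB101', 1001)")

# The foreign key links the tables, so they can be joined.
joined = conn.execute("""
    SELECT s.name, e.course_code
    FROM enrollment AS e
    JOIN student AS s ON s.student_number = e.student_number
""").fetchall()

# A row that refers to a non-existent student is rejected.
try:
    conn.execute("INSERT INTO enrollment VALUES ('DB101', 9999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(joined, rejected)
```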
What are the advantages of data integration?
- It is always best (golden rule)
- Reduces maintenance and overhead cost compared to having many separate databases
What is a disadvantage of data integration?
- You take away data ownership. That can cause someone not to want to cooperate with the integration
What is a problem with IoT and data integration?
IoT gathers too much data to integrate it all in a central database
What is data normalization?
Methodology for organizing attributes into tables to eliminate data redundancy among non-key attributes
What is the result of data normalization?
Properly structured relational database
What is your input for data normalization?
- All attributes that need to be incorporated
- A list of all functional dependencies between attributes
What is a functional dependency?
Attribute B is functionally dependent on attribute A if each value of A determines exactly one value of B (zipcode –> city)
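A small Python sketch of checking a functional dependency against sample rows. The function name holds and the rows are made up for this example. Sample data can only show that a dependency is violated; it cannot prove that it holds in general.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent is not violated by the rows."""
    seen = {}
    for row in rows:
        key, value = row[determinant], row[dependent]
        # setdefault stores the first value seen for this key and returns the stored one
        if seen.setdefault(key, value) != value:
            return False
    return True

rows = [
    {"zipcode": "1012", "city": "Amsterdam", "name": "Alice"},
    {"zipcode": "1012", "city": "Amsterdam", "name": "Bob"},
    {"zipcode": "3011", "city": "Rotterdam", "name": "Carol"},
]

zip_to_city = holds(rows, "zipcode", "city")  # same zipcode always gives the same city
zip_to_name = holds(rows, "zipcode", "name")  # zipcode 1012 goes with two different names
print(zip_to_city, zip_to_name)
```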
What is a determinant?
An attribute whose value determines the value of another attribute (employee number –> employee name)
What are the steps (just name them) of the data normalization process?
- First normal form
- Second normal form
- Third normal form
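What is the first normal form in data normalization?
Starting point: the data is in table format with a primary key identified, and multi-value attributes have been split so that each field holds a single value. Partial and transitive dependencies may still exist.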
What is the second normal form in data normalization?
- There are no partial functional dependencies. If there are, you need to split the table up into more tables.
- Every non-key attribute must depend on the whole primary key, not just part of it.
- Multiple tables
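Removing a partial dependency can be sketched in plain Python. The table, its composite key (student_number, course_code) and all values are made up for this example. student_name depends only on student_number, which is part of the key, so it moves to its own table.

```python
# One table with composite key (student_number, course_code).
# student_name depends only on student_number: a partial dependency.
enrollments = [
    {"student_number": 1001, "course_code": "DB101", "student_name": "Alice", "grade": 8},
    {"student_number": 1001, "course_code": "IS201", "student_name": "Alice", "grade": 7},
    {"student_number": 1002, "course_code": "DB101", "student_name": "Bob", "grade": 6},
]

# Table 1: attributes that depend on student_number alone.
students = {row["student_number"]: row["student_name"] for row in enrollments}

# Table 2: attributes that depend on the whole key.
grades = {
    (row["student_number"], row["course_code"]): row["grade"] for row in enrollments
}

print(students)
print(grades)
```

After the split, Alice's name is stored once instead of once per course.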
What is the third normal form in data normalization?
- End stage
- No more transitive dependencies. That means that a non-key attribute may not depend on other non-key attributes
- No data redundancy
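Removing a transitive dependency can be sketched in plain Python. The table and values are made up for this example. city depends on zipcode, which is itself a non-key attribute, so city moves to its own table with zipcode as the key.

```python
# student_number is the key; city depends on zipcode, a non-key attribute.
students = [
    {"student_number": 1001, "name": "Alice", "zipcode": "1012", "city": "Amsterdam"},
    {"student_number": 1002, "name": "Bob", "zipcode": "1012", "city": "Amsterdam"},
    {"student_number": 1003, "name": "Carol", "zipcode": "3011", "city": "Rotterdam"},
]

# New table for the transitive dependency: zipcode is the key, city depends on it.
cities = {row["zipcode"]: row["city"] for row in students}

# The student table keeps zipcode only as a foreign key.
students_split = [
    {key: value for key, value in row.items() if key != "city"} for row in students
]

# Joining on zipcode recovers the original rows, so no information is lost.
rejoined = [{**row, "city": cities[row["zipcode"]]} for row in students_split]

print(cities)
print(rejoined == students)
```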
What is a drawback of data normalization?
- Queries could be less efficient or slower because everything is split up over several tables. BUT, nowadays databases are pretty fast.
What is meant by "data normalization is progressive"?
The second form adheres to the rules of the first form and builds on it. The same goes for the third form and the second.
What is data mining?
Process of looking for patterns and relationships in large data sets
What is business intelligence?
Analyzing collected data hoping to obtain competitive advantage