Module 6 Flashcards
Data mangement, modelling and manipulation
Data
Raw facts and figures without any meaning.
Information
Data shaped into a meaningful form through context
Knowledge
Results from experience and education. The info collected by an entity and the ability to locate additional info
DNA of a Tweet
- data on hashtags used and users mentioned
- geo-data
- data on videos and pictures
- user-data
Relational Database
Database organises data into one or more tables. Tables consist of columns called attributes or fields and rows called tuples or records.
Keys
Unique Primary Key identify each row. Keys can be used to link different tables.
SQL
Standard Query Language is the defacto standard language used to communicate with database systems. We can create, delete and manipulate databases using SQL.
Categories of commands in SQL
- DDL data definition language: creating, deleting, altering databases and tables
- DML data manipulation language: inserting, deleting, changing, retreiving tuples
- DCL data control language: access management
SQLite
SQLite stores the whole database in a single file. Once you have access to the file, you have access to the database. DCL is not part of SQLite
Creating tables
CREATE TABLE Table Name (Column1 DataType (optional length), Column2 DataType (optional length),… PRIMARY KEY (KeyColumn1, KeyColumn2)
Creates a table and declares a composite primary key across multiple columns.
You can’t have spaces in table name and column names unless you put [].
Data Types
TEXT: character string
INTEGER/INT: signed integer
BLOB: binary large object, videos?photos stored in binary language
REAL: floating point numbers
NUMERIC/NUM: anything not covered by the rest, includes BOOLEAN, DATE, DATETIME
Redundancy
Redundancy=repetition should be avoided in databases since it:
- Increases the required amount of storage
- Is a source for data inconsistencies
BUT: it can improve databases since the join operator becomes uneccessary
Data Normalization
Normalization is a technique that manages the trade-off between redundancy and ease of use/speed of database. It is a standardized technique that transforms a database iteriatively accordingly to a strict set of rules.
A database satisfies a normal form, if all tables in that database satisfy the normal form.
First Normal Form 1NF
- The domain (space from wich attributes take their value from) of each attribute contains only atomic values. Atomic values: values that can’t be split into multiple logically distinct parts
- Each attribute contains only one value from the domain.
After first normalization redundancies remain
How to solve when there is multiple players with the same name?
a) create a composite key (player, date of birth)
b) create a unique player ID
Second Normal Form
- It satisfies 1NF
- In case there is a composite key there are no partial dependencies (a non-key attribute always depends on all components of the key and not just a subset.
[place depends on tournament, but not on year, it only depends on part of the key, 2NF not satisfied]
How to solve partial dependencies?
a) we can create a new column that provides an artificial key to the table such as ID (the primary key consists of a single value so 2NF is satisfied)-> doesn’t reduce redundancy
b) we split the table into 2 (with the second table containing the tournament information)-> reduces redundancy
Third Normal Form
- It satisfies 2NF
- There are no transitive dependencies (attributes may not be functionally dependant on other non-key attributes)
E.g: Country and date of birth are both dependant on player, which is not a primary key. They are only linked to the primary key because the player is linked to it. (this is called transitive dependency)
How to solve transitive dependencies?
By splitting the table (with the second table containing player info)
Results of Normalization
- Less suceptible to human error
- More complex SQL queries to join the tables to produce a complete statement
Normalization also resolves many-to-many relationships between entity sets
Entity Relationship Models
Entity Relationship Models describe the relationship between things of interest. They are visualized as Entity Relationship Diagrams. ERD
Components of ERD
- Entities are things of interest to the organization, may be physical or immaterial concepts.
- Attributes describe an entity (chracteristics of a person may be gender, age, name…)
- An entity set corresponds to the table notion, while an entity corresponds to a row,
Relationships and Cardinalities
Relationships explain the link between two entities. Cardinalities refer to the number or entities in one entity set that are related to entities in another entity set.
Types of Cardinalities
- One-to-one (1:1): Exactly one entity in set A is linked to one in set B. These are very rare as they usually get combined in one entity set.
- One-to-many (1:M): One entity in set A is linked to many in set B
- Many-to-many (M:N): Many entities in set A are linked to many (not always the same nr.) entities in set B. These are eliminated through normalization.
Optional Relationship
One or both ends of the relationship can have zero entities. Optionality refers to the minimum nr.of cardinalities (1 or 0)
Upper cardinality
Or multiplicities are about the maximum number of cardinalities.
E.g: Not all employees are given a car.
Not all players have won a tournament. (but every tournament needs a champion, so this is not optional).
Sets & Keys
- an entity set contains all entities of a similar kind
- a relationship set contains all relationships of a similar kind
To uniquely identify each entity in an entity set we need a key. - A key is an attribute that uniquely identifies each entity in an entity set, respectively each record
Phases of an ERD
- Conceptual model: identify the things of interest (entities)
- Logical model: identify the attributes and keys, decide the data type for each entity
- Physical model: database proof, no many to many relationships, implementable
How to solve M:N relationships?