Module 6 Flashcards
Data mangement, modelling and manipulation
Data
Raw facts and figures without any meaning.
Information
Data shaped into a meaningful form through context
Knowledge
Results from experience and education. The info collected by an entity and the ability to locate additional info
DNA of a Tweet
- data on hashtags used and users mentioned
- geo-data
- data on videos and pictures
- user-data
Relational Database
Database organises data into one or more tables. Tables consist of columns called attributes or fields and rows called tuples or records.
Keys
Unique Primary Key identify each row. Keys can be used to link different tables.
SQL
Standard Query Language is the defacto standard language used to communicate with database systems. We can create, delete and manipulate databases using SQL.
Categories of commands in SQL
- DDL data definition language: creating, deleting, altering databases and tables
- DML data manipulation language: inserting, deleting, changing, retreiving tuples
- DCL data control language: access management
SQLite
SQLite stores the whole database in a single file. Once you have access to the file, you have access to the database. DCL is not part of SQLite
Creating tables
CREATE TABLE Table Name (Column1 DataType (optional length), Column2 DataType (optional length),… PRIMARY KEY (KeyColumn1, KeyColumn2)
Creates a table and declares a composite primary key across multiple columns.
You can’t have spaces in table name and column names unless you put [].
Data Types
TEXT: character string
INTEGER/INT: signed integer
BLOB: binary large object, videos?photos stored in binary language
REAL: floating point numbers
NUMERIC/NUM: anything not covered by the rest, includes BOOLEAN, DATE, DATETIME
Redundancy
Redundancy=repetition should be avoided in databases since it:
- Increases the required amount of storage
- Is a source for data inconsistencies
BUT: it can improve databases since the join operator becomes uneccessary
Data Normalization
Normalization is a technique that manages the trade-off between redundancy and ease of use/speed of database. It is a standardized technique that transforms a database iteriatively accordingly to a strict set of rules.
A database satisfies a normal form, if all tables in that database satisfy the normal form.
First Normal Form 1NF
- The domain (space from wich attributes take their value from) of each attribute contains only atomic values. Atomic values: values that can’t be split into multiple logically distinct parts
- Each attribute contains only one value from the domain.
After first normalization redundancies remain
How to solve when there is multiple players with the same name?
a) create a composite key (player, date of birth)
b) create a unique player ID