132-databases Flashcards
Database
persistent, organised store of data. Persistent: stored on permanent storage (e.g hard drive) & not stored in memory-only data like data structures array, queue/list). Organised: certain standardised catalogued way, stored in records&fields. Store of data: raw facts/figures an application needs
Organising data into a database benefits & from paper-based to electronic databases:
-easier to efficiently search, manipulate, update and make data copies.
- can be accessed by multiple people at same time from different locations.
- Avoids inconsistencies and duplication of data, storing and managing data in a reliable and efficient way.
Database terminology (table, record, field, entity, attributes, field name, tuple)
A table is a collection of records representing one set of data in a database. A record is data fields about a single entity. A field is a piece of data about an entity in a record. A entity is an object, place, person, or thing that can be distinguished from others and has data stored about it
Attributes: Properties/characteristics of an entity.
Field name: An identifier for a piece of data.
Tuple: A record in a table.
Database concepts
- database is a collection of tables that store data about entities in the real world.
- Each table consists of records (rows/tuples) and fields (columns/attributes)
- database is managed by a database application like a DBMS.
Flat file database
- all data stored in only one table (a single file, often CSV and based around a single entity & it’s attributes)
- Typical uses: when only one type of entity is being stored, storing contact details, small product database, maintaining music/game collection
- Created using database/spreadsheet software but can be created manually using text editor/other tools
Flat file database pros
- very simple, easy & quick set up and design
- little expertise to maintain
- suitable for storing small amounts of data
-easier to import into another system
-easier to use for an inexperienced user
Flat file database cons
-very inefficient, a lot of repetitive data, takes up unnecessary space
-gets worse as database grows: difficult to maintain, slow to query
Relational database
- database worganise data into multiple tables, each representing an entity in the system.
- to minimize redundancy, repetitive data is split and stored in separate related tables, which are linked together using relationship
Relational database pros
- reduces data duplication - saves storage
- easier to change data or the format of data
- easier to add data
- improves data integrity and consistency
- allows different access levels/security levels
Relational database cons
- becomes complex as amount of data grows - decrease in performance - slow query responses
- expensive - set up and maintenance
- requires lot of physical storage.
Primary key
field, has a unique identifier (value) for each record in table. which is never repeated within a table so ensures each record is unique.
Secondary key
- field that has some identifying information but typically not uniquely identifies record
- allows database to be searched quicker for records/speeding up sorting - if key maintains a logical order to the records
Foreign key
- primary key (attribute) of another table(entity), to link tables together
Entry relation/entity relationships
- One-to-one: one entity can only be linked to one entity. e.g each customer’s firstname has one surname.
- One-to-many: one entity can be associated with many other entities .e.g customer can have many orders but each order belongs to only one customer.
- Many-to-many: many entities can be associated with many other entities. Each customer can buy many items, and each item can have many customers.
Data capture
- collecting data by extracting info from paper/electronic documents and converting it into data to place into database.
- manual- paper forms: human reading form & manually typing info into computer based system, don’t require technology/internet connection
- automated methods - helps speed up & automate data input, very fast n efficient, less prone to human error): smart card readers, barcode readers, scanners, OCR, OMR, magnetic ink character recognition, sensors
Data capture OCR (optical character recognition)
- Automatically reads text by interpreting letters shapes.
- Works better w/ printed text than handwriting.
- Used by post office to read postcode & route mail, road cameras to handle congestion charging n identify speeding drivers (form of OCR: ANPR automatic number plate recognition software)
Data capture OMR (optical mark recognition)
- detects the presence of marks or shaded areas.
- Provides more accuracy than OCR.
- Used by multiple choice tests, lottery tickets, surveys, ballots
Data capture - Inputting Data
- once data collected, can be transferred to database automatically using DBMS software/typing it in customised form/importing it from a spreadsheet or file/using EDI
Selecting data
- methods of querying the correct data and retrieving it.
- Using query languages e.g SQL/QBE.
- Both allow to specify tables/fields/different search criteria/output sorting (e.g ascending/descending order, & use boolean expressions.
QBE (Query By Example)
- makes use of visual tables where user enter commands n conditions
- doesn’t need to remember syntax finer details, just need to understand how to use graphical interface.
Managing data
- managing, manipulating, adding, editing and deleting data.
- Can be managed by SQL to sort, restructure and select certain sections - DML to modify (add new n edit existing data n delete data) a database once it has been initially set up(commands update delete insert) or by using built in facilities of database.
Database applications
- software programs designed to retrieve, manage, distribute and present information required by the end users effectively from the raw data of the database.
- Examples: database management system (DBMS), CRM systems, web applications, social media websites
Database management systems (DBMS)-
- software application, allows database administrator to maintain one/more relational databases.
- Hides underlying complex structure of data, provides layer of abstraction for user, so can define the database structures at logical level.
- Examples: MySQL, SQLite
Features of a DBMS
- allows different applications to access data at the same time, controls access to data
- ensures it remains integral (enforce referential integrity) by preventing creation of duplicate primary keys n any operations that could damage relationships between tables of data
- enforce validation rules, provides secure access (limits access), provides encryption
- provides program data independence
- manages multiple users
- provides backups n ability to restore from a backup if disaster occurs
- supports a query language n other languages (to extract/add/amend data & alter database structure)
- controls concurrency (lock data whilst someone is working on it so someone else cannot change it at same time)
Exchanging data -
- Process of transferring the collected data w/ other people/electronic systems
- Manual methods: memory stick, optical media, removable hard disk, email, paper-based
- Automatic methods: bypass manual data exchange and have 2 database interface with one another (reading n writing directly to and from eachother’s table) through live connection e.g EDI
EDI (electronic data interchange)
- Protocol between two systems to facilitate exchange of data.
- Doesn’t require human interact, enables data transfer from one computer to another. Increases data transmission speed & efficiency of processes for users. Process can be entirely automated without any human involvement
- though, any error in data will be replicated across multiple systems.
- Uses e.g automatic order placement system when item stocks low.
Exchanging data considerations
making sure can exchange data between different databases n other systems/application self spreadsheet software n accounting programs. Common formats: human readable, open formats for structuring data. Standards for storing n transporting data, systems allow. Examples: CSV, JSON, EDI, SOAP, RSS, APIs, SQL
CSV file (commas separated value)
- stored in text file data is in a list with a comma between each one.
- Used for excel documents.
- Database: each record is stored on separate line n each field separated by comma.
- Structure fixed and known, important routines can be written to extract data from file.
- Many systems allow data to be output in CSV format.
JSON file (JavaScript object notation)
- json strings can be easily converted into an object
- Common format for databases
- can export and output data in JSON format. [{“id”:1, “last name”:”Smith”,”firstname”:”Sara”}, … }
Indexing a database
- process of creating a database index, data structure that improves the speed of data retrieval
- method used to store position of each record ordered by certain attribute
- makes it quicker to query/search through n find required items on those particular attributes, as items not need to be searched sequentially.
- When field becomes indexed==secondary key. Primary key is automatically indexed so can be located quickly n easily, but almost never queried as not normally remembered so secondary keys used (fields used as search criteria).
Normalisation
- process of removing redundancy within a database
- way of structuring the data in a relational database according to formal rules
- process of coming up w/ best possible layout for relational database by splitting tables and arranging data to normalised forms( need to be normalised to ensure system operate properly
- data integrity maintained n unnecessary duplicates avoided).
- Tries to accomplish: no redundancy, consistent data throughout linked tables, records added n removed w/o issues, complex queries can be carried out.
- makes accessing and storing data easier and more efficient
- makes maintaining the database contents more quicker and easier
- reduces the chances of data errors or duplication