Database Principles Flashcards
What is a surrogate key and when is it used
A single column that you fill with artificial values. Used when a table doesn’t have data that supports creating a natural primary key
What is a candidate key
A minimal super key that is, a key that does not contain a subset of attributes
What is an alternate key
A key that contains all the properties needed to become a candidate key. Can be a set of single attributes or multiple attribute
What is a superkey
A key that can uniquely identify any row in the table. Time attribute is a good example of a super key
Does the order of creation matter
Yes cant create an intersection table cust_order before table customer
What is DDL
- Data Definition Language (DDL)
Allows the DBA or users to describe and name entities, attributes and relationships required for the applications that access it
Examples
CREATE TABLE, ALTER TABLE, DROP TABLE, RENAME TABLE
What is DML
- Data Manipulation Language (DML)
Provides the ability to manipulate data within the database.
Examples
SELECT, INSERT, UPDATE, DELETE
What is a join and how does it work
A JOIN links tables by selecting only the rows with common values in their common
How does inner join work
Only returns matched records from the tables that are being joined
How does outer join work
Matched pairs would be retained, and any unmatched values in the other table would be left NULL
What is data control language
Language used to control access to data stored in a database
Example
REVOKE
GRANT
What is a transaction
Any request for a piece of information from a database
What is transaction analysis
the process of ensuring that multiple database operations (like updates or inserts) are done correctly and completely as a single
What is the purpose of TA and what does it allow
- Purpose of transaction analysis is to be able to see which tables are used the most in transactions
- With this information we can optimize the tables
It guarantees data consistency and reliability
. Can help identify problems
. can improve performance
What are the 3 main categories of databases
○ According to the number of users
○ According to the type of use
○ According to Database site location
What are 2 categories for users in a database
Single User database systems
Multi User database systems
What is a Single User database systems
. Database is on the PC and hard disk
. All applications run on the same PC and directly access the database
. A single user accesses the application
What is a Multiuser database systems
. many PC’s connected through a LAN
. a file server stores a copy of the database files
. each user runs a copy of the same application and accesses the same files.
What are the 4 categories for type of use in a database
Production or Transactional Database system
Data Warehouse
Decision Support Database System
Research
What is Production or Transactional Database system
. Used for management of supply chain and for tracking production of items in factories, inventories of items in warehouses/stores and orders for items
. Used for purchases on credit cards and generation of monthly statements
. Used in Banks for customer information, accounts, loans and banking transaction
What is a data warehouse
. Historical collection of data and transaction
. Can be used to find trends
What is a Decision Support Database System
. Its a hybrid
. Aids users in judgement and choice activities
. Used in environments like business, health care and military where you have to make decisions
What are the 4 categories for database location
Centralised database system
Parallel database system
Distributed database system
Client/Server database system
What is a centralised database system
Single processor together with its associated data storage devices and other peripherals
What is a parallel database system
○ A database system implemented on a tightly coupled multiprocessor or on a loosely coupled multiprocessor
○ They link multiple smaller machines to achieve the same throughput as a single larger machine
What is a distributed database system
○ Data is spread across a variety of different databases
○ Managed by a variety of DBMS’s that are running on various types of machines having different operating systems
○ Each machine can have is own data and applications, and can access data stored on other machines
○ Each machine acts as a server as well as client
What is a client/server database system
○ Client-server technology instead of centralized system
○ There is a server which acts as a whole data base management system
Name some concurrency transaction problems
The lost update
The uncommitted dependency
The inconsistent analysis
What is the lost update
An apparently successfully completed update operation by one user can be overridden by another user
What is the uncommitted dependency
occurs when one transaction is allowed to see the intermediate results of another transaction before it has committed.
What is the inconsistent analysis
○ Occurs when a transaction reads several values from the database but a second transaction updates some of them during the execution of the first
○ For example, a transaction that is summarizing data in a database (for example, totalling balances) will obtain inaccurate results if, while it is executing, other transactions are updating the database
Name some transaction control techniques
Timestamping
Serializability
Locking
What is timestamping
○ A concurrency control protocol that orders transactions in such a way that older transactions, transactions with smaller timestamps, get priority in the event of conflict
○ A unique identifier created by the DBMS that indicates the relative starting time of a transaction
What is Serializability
○ Ensuring that a set of transactions in a database system produce the same results as if they were executed one after the other in some order, even though they may be executed concurrently.
What is locking
Prevents deadlock
○ A procedure used to control concurrent access to data. When one transaction is accessing the database, a lock may deny access to other transactions to prevent incorrect results.
○ Ensures serializability of concurrent transactions
§ Shared lock - If a transaction has a shared lock on a data item, it can read the item but not update it.
§ Exclusive Lock - If a transaction has an exclusive lock on a data item, it can both read and update the item.
What is a deadlock
A situation arises when two (or more) sessions are waiting to acquire a lock on a shared resource, and none of them can proceed because a second session also has a lock on some other resource that is required by the first session.
HAVE A LOOK AT RELATIONAL ALGEBRA
HAVE A LOOK AT RELATIONAL ALGEBRA
What are the 3 things we need to understand when starting a database
- Business rules
- Optimisation
- The correct data types
What do data types do
- This limits violations of data integrity
- Defines the kinds of values that can be used or stored
What do numeric data types do
Numeric data types allow us to specify the range of digits for specific values
What are the differences between char and varchar
- Char is fixed varchar is not
- They are stored on the disk differently
- Var char takes 2 bytes per character whereas char only takes 1
- Char saves more memory
What is one way we can optimise databases
Using less joins
What are the first things we look at when optimising a database
ERD
Data
Tables
What is an index
a logical structure that improves the speed of data retrieval
Name some indexes in postgres
B Tree
Hash
What is the code to create an index
○ CREATE INDEX index_name ON table_name(col_name)
Do we just apply indexes anywhere
You need to be able to identify which column is the most suitable for an index
The best candidates for indexes are columns that you filter on regularly
What is a view
A view is a virtual table whose contents are defined by a query.
It holds temporary data
How do we create a view
CREATE VIEW comedies AS
SELECT *
FROM films
WHERE kind = ‘Comedy’;
How does explain work
- With the explain keywords we can see the cost and runtime of queries
- Useful to see if indexes made a difference
What type of views are there
Standard view - normal one in postgres
Materialized view - cache the result of a complex and expensive query. Useful when you need fast data access often used in data warehouse and business
Recursive view - come in handy when working with self-referential data
What is a subquery
A query inside a query.
Th order if execution : Does the inner most query first
What is a trigger
- A trigger is procedural SQL code that is automatically invoked by the RDBMS upon the occurrence of a given data manipulation event.
- A trigger is invoked before or after a data row is inserted, updated, or deleted.
- A trigger is associated with a database table.
- A trigger is executed as part of the transaction that triggered it.
What can triggers be used for
- Triggers can be used to enforce constraints
- Triggers can provide warnings
- Triggers can be used to update table values, insert records in tables, and call other stored procedures.
What are the 3 main types of security violation in databases
- Unauthorized modification of data
- Unauthorized deletion of data
- Unauthorized reading of data
What are the security measures at their given levels to protect the database from violations
○ Database system - Have validation to check they’ve entered the correct thing. Prevent them from entering wrong thing
○ Human factor - Apply correct constraints for data to reduce human errors. Take the approach the user is an idiot and cannot be trusted. Users are a major threat to data integrity
○ Operating System - Operating system must be secured to unauthorised access
○ Network - Security within the network e.g. firewalls
Physical Security - e.g. tailgating, blocking USB ports
Name and define some types of authorization
○ Read access - Only allows you to read data
○ Update access - Only allows you to update
○ Insert access - Only allows the addition of new data not modification
○ Delete access - Only allows deletion of data
○ Index access - Allows the creation and deletion of indexes
○ Alteration access - It allows the addition or deletion of attributes in a relation
○ Resource access - Allows the creation of new relations
○ Drop access - Allows the deletion of relations
Should you give users drop access
No we try to avoid this
What keywords do we use to give and take away users access in a database
GRANT and REVOKE
What is the principle of least privileges
Never give the user more access than necessary to complete a task
What is a policy
○ Policies are general statements of direction or action that communicate and support DBA goals
What is a standard
Standards describe the minimum requirements of a given DBA activity; they are more detailed and specific than policies
They are rules/requirements that need to be met
What is a procedure in database security
○ Procedures are written instructions that describe a series of steps to be followed during the performance of a given activity
Give examples of policies ,procedures and standards
- Policies - All user must have a password that needs to be changed every 6 months
- Standard - The password must have a minimum have of 5 characters
- Procedures - To create a password follow these steps
Name some aggregate functions
MAX,MIN,SUM,AVG,COUNT
What is the difference between dynamic sql and embedded sql
- Dynamic SQL - adapting based on your input
- Embedded SQL - Can not give you flexibility outside of what is given.
What is the difference between a trigger and a procedure
- Triggers work at a table level e.g. listen for changes in the table and if there’s an insert check if it already exists
- Procedure is at the database level e.g. if attack is detected encrypt the data
What do functions start and end with
$$
Do we call both functions and procedures
No In SQL we CALL procedures but functions are usually parts of select statements
What privilege is needed for users to use procedures
Execute privilege
Name 1 language for procedures and functions
PSM
What happens in table replication
It creates a permanent table with the data your selecting
Show how you would replicate a table
CREATE TABLE new_table AS
(SELECT col1, col2, col3
FROM
existing_table
WHERE
condition);
What is DQL
Data Query Language is used for fetching data. E.g. SELECT
What is the order of execution of queries
1 FROM, JOIN
2 WHERE
3 GROUP BY
4 HAVING
5 SELECT
6 ORDER BY
7 LIMIT
What is the difference between having and where
- Having is filtering at the group level
- Where is filtering at the row level
What is the difference between procedural operators and declarative operators
○ procedural system, we tell the system what to do (e.g. arithmetic operations, ordering, substring comparison)
○ in a declarative system, we tell it what result we want (e.g. LIKE, DISTINCT, IN, EXISTS etc).
What do we always prioritize in a database
Security
Why do we monitor a database
For performance and more importantly security
Name a monitoring software for any database
PGadmin
What is overclocking
- Overclocking generates heat so faster computer means more heat
What is a benchmark
standardized sets of tasks that help to characterize the performance of database systems
* Benchmarks are needed to find out how fast the database is running
What is tuning the performance
involves adjusting various parameters and design choices to improve its performance for a specific application
Name some performance issues
- Security
- Long delays and timeouts
- Queries performing poorly
What is a bottle neck
Things that you dont have control over that affects the performance of the system e.g.
* You cant control the client or internet
What is a transaction manager
A Transaction manager is a specialized piece of software that guides the transactions on where to go.
What are the 3 ways Tuning is realised in a database:
○ Includes the schema and queries and is system independent.
○ Database-system parameters (e.g. as buffer size and checkpointing intervals)
○ Hardware
What does the recovery subsystem do
Responsible for making sure that the database is restored to the state it was in before the program started executing.
Ensure that the program is resumed from the point at which it was interrupted so that its full effect is recorded in the database
What is a full back up
Back up everything since the start
What is an incremental back up
You only back up the most recent changes
What is a partial back up
You only back up specific things for example you only want to back up orders
What are the 3 types of recovery
Crash recovery
Disaster recovery
Version recovery
Roll forward recovery
What is crash recovery
protects a database from being left in an inconsistent, or unusable, state when transactions (also called units of work) are interrupted unexpectedly.
What is disaster recovery
process to restore a database in the event of a fire, earthquake, vandalism, or other catastrophic events
What is version recovery
the restoration of a previous version of the database, using an image that was created during a backup operation (roll back)
What is roll forward recovery
Patterned based. You analyze what went wrong e.g. database crashes when customer pressed purchase
What does the log record do
Every transaction in the database creates a log record prior to modifying the database
. allow the system to undo changes made by a transaction in the event that the transaction must be aborted
. allow the system also to redo changes made by a transaction if the transaction has committed but the system crashed before those changes could be stored on the disk
What is deferred modification
If a transaction does not modify the database until it has committed
What is immediate modification
If database modifications occur while the transaction is still active
Name some ethical issues
Intellectual property
Copyleft
Privacy
What is intellectual property
Intellectual property is something that you create using your mind
You own intellectual property if you:
* Created it
* Bought intellectual property rights from the creator or a previous owner
* Have a brand that could be a trade mark, for example, a well-known product name
What is copyleft
a general method for making a program (or other work) free , and requiring all modified and extended versions of the program to be free as well
What is the privacy issue
Involves storing data
○ If you don’t need it don’t collect(e.g. dob)
○ You should justify why you need the data
Name some privacy legislation
- UK Data protection Act 2018
- General Data Protection Regulation (GDPR) of European Union
Name some of the data protection principles
Data is :
○ used fairly, lawfully and transparently
○ used for specified purposes
○ accurate and, where necessary, kept up to date
○ kept for no longer than is necessary
○ handled in a way that ensures appropriate security, including protection against unlawful or unauthorised processing, access, loss, destruction or damage
* There’s stronger protection for things like race, ethnicity, gender, relegion,biometrics and etc