Databases IB SL + HL Flashcards

1
Q

Distinction between Data and Information

A

Data is just raw facts, numbers and figures
(e.g. 1975)

Information is data that has been given a context and meaning.
(e.g. The Vietnam war ended in 1975)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Database

A

An organized collection of structured information.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is the software called that creates, defines, and manipulates a database?

A

Database Management System (DBMS)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What are the things at the top of columns in a database that describe what’s inside that column?

A

Fields/Attributes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is a row in a database referred to?

A

A record

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

3 Benefits of a database

A
  • Enables for processing/querying of data
  • Can easily generate reports
  • Standardized (can be used by multiple systems)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Data Verification (in context of databases)

A

This is the checking that input matches expected values in database

E.g. login details of a user matches those stored in the database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Data Validation (in context of databases)

A

Checks that input follows rules for the type of input entered

For example: CVV code should be 3 numerical digits

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Entity

A

A real-world object or person represented by a record in a database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Primary Key

A

A unique key used to identify a given record in a database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Foreign Key

A

A field in one table that refers to the primary key field in another table. It is used to connect (relational) databases.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

6 different types of data types in a database

A
  • Text:
    Combination of letters, numbers or symbols. Basically a string.
  • Character:
    In individual letter, number or symbol
  • Boolean:
    A binary value (True/False, Yes/No)
  • Integer:
    A whole number
  • Real:
    A decimal number
  • Datetime:
    A date and/or time
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Write the SQL to get ProductName from the table Products where the price is more than 20 but less than 25

A

SELECT ProductName FROM Products WHERE Price > 20 AND Price < 25

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Secondary Key

A

Keys that are also capable of functioning as a primary key. Also known as the alternate key.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

2 differences between Primary and Secondary Key

A

Primary Key - Cannot be Null, only one primary key possible

Secondary Key - Can be Null, multiple secondary keys possible

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Candidate Key

A

Umbrella term for keys that can uniquely identify each record in a database

Both primary and secondary keys are candidate keys

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Composite Primary Key

A

When one field is not enough to uniquely identify records, two fields can function together as the primary key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Database schema

A

A diagram of how data is organized in a relational database.

This means a diagram showing table names, fields, data types of fields, and relationships between tables

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Records can also be referred to as?

A

Tuples

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

Referential Integrity

A

This is a feature of relational databases. What this means is every row has a primary key, and that every value in a foreign key column will be found in the primary key of the table from which it originated.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

Concurrency, and what does it use to prevent more than one user accessing the same row/record at the same time?

A

This is the process referred to managing simultaneous updates or transactions at the same time. It ensures updates/transactions occur sequentially to avoid collision between users’ actions.

Row locking

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Data Dictionary

A

A file or set of files that stores how the information and tables should be organized and stored in a database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

How does DBMS provide security to a database? (5 ways)

A
  • Access Rights
  • Audit trail (Record of any changes to the database)
  • Data/Row Locking
  • Encryption
  • Backups - Continually updated copes that can be accessed in case the original database is lost or corrupted
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

Transaction

A

A sequence of one or more SQL statements that are executed as a single unit of work. A transaction can either be committed (applied to the database) or rolled back (undoing something). Essentially, it allows us to make temporary changes, making data integrity easier.

25
Q

Different states in transactions?

A

Active State
Partially Commited State
Commited State
Failed State
Aborted State
Terminated State

26
Q

What is the acronym for ACID?

A

Atomicity:
All tasks in a transaction are performed if successful. Else no tasks are performed.

Consistency:
All data written must be valid according to existing rules.

Isolation:
No transactions will collide/interfere with another. Sequential execution of transactions.

Durability:
Once a transaction is complete, the change to the database is permanent, even if there is system failure.

27
Q

Why do we have transactions?

A

They make sure there is never a case where there were incomplete changes.

28
Q

Log

A

Records every change making it easier to roll back

29
Q

Data Integrity

A

Accuracy:
The correct data and relationships are retained and preserved

Completeness:
All data needed is available

Validity:
Data meets all predetermined rules

30
Q

Audit Trail

A

This records all changes, login attempts, and other security related events to help detect and investigate security breaches.

31
Q

Data Redundancy

A

This is what happens where the same piece of data is stored in multiple places. It can happen in different or the same tables.

32
Q

Data Redundancy Pros & Cons

A

Pros:
- Faster data access speeds
(more locations from which data can be accessed)

  • Better data protection:
    In case of loss of data, it can be replaced from another location

Cons:
- Data inconsistency:
If data is updated at one location, other locations may not be updated as well

  • Takes more memory (more expensive)
33
Q

Normalization

A

This is the process in which larger tables in a database are divided into smaller tables to reduce data redundancy while ensuring data integrity.

34
Q

What are the different Normal forms, and at each step what do they seek to eliminate?

A

1st Normal form (1NF)
- Eliminate duplicate columns, columns with multiple types of values
- Create separate tables for each group of related data (with unique primary keys)

2nd Normal Form (2NF)
- Meet all the requirements for 1NF
- Eliminate partial dependency (Columns that are dependent on one attribute of a composite primary key)

3rd Normal Form (3NF)
- Meet all the requirements for 2NF
- Eliminates transitive dependency

35
Q

Why do we normalize databases?

A
  • More data storage
  • Increased Data integrity
  • Makes queries simpler and more efficient
36
Q

3 types of anomalies that can occurs without normalization

A

Insertion anomaly

Deletion anomaly

Update anomaly

37
Q

What are DBA, and what are they responsible for?

A

Database Administrators

They are responsible for:

Updating the database - adding/updating new data, relations, fields and tables

Maintaining security - Assigning different levels of access to users

Managing and creating backups

38
Q

What are DDL commands

A

Database Definition Language

DDL commands are a part of SQL. They are a set of commands that allow us to define and modify the structure and metadata of a database. They are only available to DBAs.

Examples such as:

CREATE
ALTER
DROP
TRUNCATE

39
Q

What is Data Modelling in databases

A

This is the process of creating a visual representation of all or some parts of a database. They help all stakeholders have a shared understanding of a system.

40
Q

What does Data Modelling set out to illustrate (at a minimum)

A

Type of data used
Relationships between those types of data
How data is grouped and organized
Data attributes

41
Q

3 types of data models

A

Conceptual:
Essentially just the entities and how they are going to be related to each other
Logical
Entities, attributes and foreign keys
Physical
Essentially, a complete table.

42
Q

Advantages of data modelling reduced data ____ (3 different answers)

A

Reduces data:
redundancy
lack of integrity
lack of consistency

43
Q

What is OOD?

A

Object-oriented database

This is when data is stored as objects, and each object contains both data and methods that can be performed on this data. SQL is not required, data is manipulated by these self-contained methods. The format and methods used in the OOD are determined by templates called classes.

44
Q

3 advantages of OOD

A
  • Can store a larger number of data types
  • Reusability (through inheritance of classes)
  • More accurately represents real-world entities
45
Q

3 disadvantages of OOD

A
  • OODs are less secure (No levels of access)
  • Not universally agreed upon
  • Lower efficiency when data and relationships are simple
46
Q

Data warehouse

A

Essentially a relational database on a much larger scale. Often used by businesses for data mining/trend analysis

47
Q

3 Advantages of Data warehousing

A
  • Better performance (Data warehouses are optimized for quick retrieval and analysis of data)
  • Standardization of Data due to ETL

-Access to both Historical data and current data

48
Q

ETL

A

Extraction, Transformation, Loading

1) Extraction - Structured and unstructured data is imported from multiple sources into a single repository
2) Transformation- Data is transformed to ensure data quality and accessibility
(removing duplicate data, standardizing)
3) Loading - Data is transmitted all at once or in intervals to the data warehouse

49
Q

3 differences between data warehouse and a database

A

Database:
- Mainly stores current data
- GBs of Data
- Deals with thousands of users

Data Warehouse:
- Mainly stores historical data
- TBs of data
- Deals with relatively smaller number of users

50
Q

Data mining

A

This is the process of analyzing a large data set with the goal of finding new trends or relationships

51
Q

Data Mining 3 pros

A
  • Helps to understand behaviors and trends
  • Helps detect risk and fraud
  • Helps to analyze large amounts of data very quickly
52
Q

Data Mining 3 Cons

A
  • Expensive
  • May use sensitive personal information
  • Requires a large amount of data
53
Q

Cluster Analysis

A

A method of data mining.

This segments data into groups that have similar characteristics (called “clusters”)

It uses unsupervised learning (no prior training of the algorithm required)

For example,
Netflix similar watching habits

54
Q

Classification

A

A method of data mining.

This is when a model is developed using training data (supervised learning) to make predictions of outcomes for new data added to the model based on the training data.

55
Q

Association Analysis

A

A method of data mining

Association analysis breaks up data sets by variables (such as gender, location, etc.) and aims to find interesting relationships in datasets where one data item is often dependent on other.

For example:
Doctors see what symptoms often lead to which diseases

56
Q

Link Analysis

A

A method of data mining.

Often used in an anti-criminal situation. It establishes relationships between different entities in a data set. It does this by first seeing what could link two entities. It then checks which entities match this criteria, to what extent, and which do not.

57
Q

Deviation Detection

A

A method of data mining

Deviation detection’s purpose is to detect the most significant anomalies in a data set. It uses unsupervised learning.

58
Q

Spatial Databases, and what can it be used to do?

A

Databases that are used to store spatial data and data linked to location.

For example:
Points, polygons, coordinates, 3D shapes

The data in a spatial database can be used to model the structure of geometric/3D objects and other geometric functions

59
Q

Data Segmentation

A

This is the process of taking your data and diving up and grouping similar data together based on chosen characteristics so it can be used more efficiently. Often used in marketing with demographics.