WSI Flashcards
Introduction to Information
refers to databases that don’t follow the traditional relational model used by SQL databases. The term can be interpreted as “non-SQL” or “not only SQL.” NoSQL databases offer more flexibility and natural data storage compared to SQL databases. NoSQL is a broader database management approach, while SQL is specifically a query language.
NoSQL
database
Key-value database
Document database
Wide-column database
Graph database
is information that’s set up for easy access, management and updating.
database
HISTORY: Early databases used flat files, which were unstructured and inefficient. Hierarchical and network models were developed to improve organization and relationships between data. The relational database model, introduced by Edgar F. Codd, revolutionized database management by using tables with rows and columns. His ideas led to the development of SQL (Structured Query Language).
flat files -> Hierarchical -> network -> relational (Edgar F. Codd)
define what constitutes a true relational database management system. The rules cover various aspects of data storage, access, manipulation, and integrity.
While no commercial database fully implements all 12 rules, they
serve as an ideal standard for relational database systems.
Codd’s rules
Pioneered relational concepts and SQL.
IBM System R
Early commercial RDBMS inspired PostgreSQL
Ingres
First commercial RDBMS, gaining market dominance.
Oracle
Popularized relational databases for microcomputers
BASE II
Robust, flexible, and advanced features.
PostgreSQL
Fast, simple, and widely used for web applications.
MySQL
Fork of MySQL with additional features and optimizations
MariaDB
Microsoft’s RDBMS for enterprise environments.
MS SQL Server
IBM’s RDBMS for high availability and scalability.
DB2
The central component that stores, manages,
and processes data.
Server
Applications that interact with the database
by sending requests to the server.
Client
Clients send requests to the server,
and the server processes them and returns results.
Communication
Centralized management, improved security,
better resource allocation, and scalability
Benefits
- Developed by IBM researchers Donald D. Chamberlin and Raymond F. Boyce in the 1970s
- Based on Codd’s relational model.
- Designed for querying, manipulating, and managing data in relational databases.
- Standardized by ANSI and ISO.
- Widely used in RDBMS such as MySQL, PostgreSQL, Oracle, and Microsoft SQL
SQL - Structured Query Language
- Unique Features: AUTO_INCREMENT, LIMIT, REPLACE INTO
- Use Case: Web development and applications
MySQL
- Unique Features: JSON/JSONB, WINDOW FUNCTIONS, RETURNING clause
- Use Case: Complex queries, data integrity, high concurrency
PostgreSQL
- Unique Features: TOP, TRY-CATCH, MERGE, CTEs
- Use Case: Enterprise-level applications, Microsoft ecosystem
Microsoft SQL Server (T-SQL)
- Unique Features: CURSORS, TRIGGERS, SEQUENCES, procedural extensions
- Use Case: Large enterprises requiring complex business logic
Oracle SQL (PL/SQL)
- Unique Features: Single-file database, limited concurrency, no user management
- Use Case: Embedded systems, mobile apps, small-scale software
SQLite
- Unique Features: Virtual columns, storage engines, compatibility with MySQL
- Use Case: Performance-enhanced alternative to MySQL
MariaDB
- Unique Features: PureXML, multi-dimensional clustering, RCAC
- Use Case: Large organizations requiring high throughput and complex queries
IBM DB2
For text fields like Title, Author, Category, ISBN, Publisher, Address, and Phone.
NVARCHAR(200):
The process of organizing data to minimize redundancy and ensure data integrity.
Normalization
A set of rules for achieving normalization.
Normal Forms
Data in a row-column intersection should be indivisible
Atomicity
- Unique Identifier: A primary key uniquely identifies each record.
- Avoid Changing: Primary keys should be stable to prevent errors.
- Artificial Primary Key: A generated integer or other unique value.
Primary Key
Separate Related Data: Move columns related to other entities into separate tables (e.g., publisher, author).
Redundancy
- Foreign Key Constraints: Automatically enforce referential integrity by ensuring foreign key values point to existing primary key values.
- Data Integrity: Prevent orphaned records and broken references.
Constraints: Ensuring Data Integrity
- SQL is declarative, allowing users to specify the desired result without detailing the process.
- This simplifies querying and makes it efficient for large datasets.
Declarative Language
- Different database systems have their own SQL dialects with variations in syntax and features.
- This provides flexibility and optimization for specific use cases.
SQL Dialects
- DQL (Data Query Language): Retrieves data.
- DDL (Data Definition Language): Defines database structure.
- DML (Data Manipulation Language): Modifies data.
- DCL (Data Control Language): Manages permissions and security.
- TCL (Transaction Control Language): Manages transactions.
SQL Sublanguages
- DQL (Data Query Language):
Retrieves data.
- DDL (Data Definition Language):
Defines database structure.
- DML (Data Manipulation Language):
Modifies data.
- DCL (Data Control Language):
Manages permissions and security.
- TCL (Transaction Control Language):
Manages transactions.
- CREATE TABLE: Creates a new table.
- ALTER TABLE: Modifies an existing table.
- DROP TABLE: Deletes a table.
- CREATE INDEX: Creates an index for faster querying.
- DROP INDEX: Deletes an index.
DDL Statements
Returns rows that have a match in both tables.
INNER JOIN
Returns all rows from the left table, along with matching rows from the right table.
LEFT JOIN
Returns all rows from the right table, along with matching rows from the left table.
RIGHT JOIN
Returns all rows from both tables, whether or not they have a match in the other table.
FULL OUTER JOIN
Types of Data Models
Hierarchical Data Model
Network Data Model
Relational Data Model
Entity-Relationship (ER) Model
Object-Oriented Data Model
NoSQL Data Models
- Tree-like structure with parent-child relationships.
- Limited flexibility due to rigid hierarchy.
Hierarchical Data Model
- Extends hierarchical model with multiple parents per child.
- Complex and challenging to manage.
Network Data Model
- Organizes data into tables with rows and columns.
- Most widely used model for structured data.
- Offers simplicity, flexibility, and powerful querying capabilities.
Relational Data Model
- Visual representation of entities and their relationships.
- Used for database design and understanding data structures.
Entity-Relationship (ER) Model
- Represents data as objects with properties and methods.
- Supports complex data structures and object-oriented programming paradigms
Object-Oriented Data Model
- Designed for large-scale, unstructured or semi-structured data.
- Includes various types:
o Key-value stores
o Document databases
o Columnar databases
o Graph databases - Offers scalability, high performance, and flexibility for specific use cases.
NoSQL Data Models
o Composed of rows and columns forming cells.
o Cell addresses are unique (e.g., A1).
o Number Formats
General, Number, Currency, Accounting, Date, Time, Percentage, Fraction, Scientific, Text.
Adjusts based on cell position.
o Relative
Remains constant (uses “$”).
o Absolute
Combines relative and absolute addressing.
o Mixed
Ideal for visualizing mathematical functions and trends.
- Scatter Charts
Effective for displaying proportions and percentages.
- Pie Charts
- Suitable for plotting functions with x and y values.
- Requires careful data entry and series selection.
- Can plot multiple functions on the same chart.
Scatter Charts:
- Represent data as proportions of a whole.
- Customize with data labels, legends, and 3D effects.
Pie Charts:
: Powerful tool for data analysis and summarization.
Pivot Tables
- Creation:
o Select data in the spreadsheet.
o Go to “Insert” > “PivotTable.”
o Choose location and click “OK.”
- Fields:
o Drag fields into “Row Fields,” “Column Fields,” “Values,” and “Filters” areas.
o “Values” can be summarized (sum, average, count, etc.).
- Data Manipulation:
o Filtering: Select specific categories or values.
o Sorting: Arrange data in ascending or descending order.
o Calculations: Perform calculations (sum, average, etc.) within the pivot table.
- Data Updates:
o Refresh the pivot table to reflect changes in the source data.
- Data Preparation:
o Ensure data is clean and organized (no empty rows/columns, consistent headers).
o Convert text to numbers if necessary.
- Calculated Fields:
o Create new fields within the pivot table using formulas.
o Combine existing fields to derive new insights.
o Enhance data analysis by adding custom calculations.
- Grouping:
o Group data into ranges (e.g., sales ranges, date ranges).
o Customize group intervals (equal or unequal).
o Combine and ungroup groups as needed.
- Data Manipulation:
o Sort, filter, and rearrange data within the pivot table.
o Modify field names and formats.
o Hide and show details for better readability.
- Advanced Techniques:
o Use GetTableData function for more flexible formula creation.
o Combine grouping with other pivot table features.
is a graphical representation of a database that shows the entities (important objects), their attributes (descriptive details), and the relationships between them. It helps us understand the structure of a database before it is actually built.
An Entity-Relationship Diagram (ERD)
- Represent real-world objects or concepts (e.g., Human, Pet, Vet).
- Shown as rectangles in the ERD.
Entities
- Provide details about the entities (e.g., Name, Surname, Pesel for
Human). - Shown inside the entity rectangles.
Attributes
- Show how entities are connected (e.g., a Human owns a Pet).
- Represented by lines connecting the entity rectangles.
Relationships
- A unique identifier for each entity record (e.g., Pesel for Human).
Primary Key
- A reference to a primary key in another entity (e.g., Owner_Pesel
in Pet points to Pesel in Human).
Foreign Key
- A process of organizing the database to minimize data redundancy and improve data integrity.
Normalization
- A way to create new entities that inherit attributes from existing entities (e.g., Owner and Vet inherit attributes from Person).
Inheritance
- An entity used to represent a many-to-many relationship between two entities (e.g., Owns table to connect Humans and Pets).
Associative Entity
- When the same data is stored in multiple places in a database, which can lead to inconsistencies.
Data Redundancy
Used to retrieve data from one or more columns.
- SELECT
select all columns
o SELECT * FROM table
select specific columns
o SELECT column1, column2 FROM table
Specifies the table from which to retrieve data
- FROM
Filters data based on conditions.
- WHERE
o WHERE column = value
o WHERE column > value
o WHERE column BETWEEN value1 AND value2
o WHERE column IN (‘value1’, ‘value2’)
o WHERE column IS NULL / IS NOT NULL
Sorts the results.
- ORDER BY
Removes duplicate values.
- DISTINCT
o SELECT DISTINCT column FROM table
Renames columns.
- AS
o SELECT column AS new_column_name FROM table;
Concatenates strings.
- ||
o SELECT column1 || ‘ ‘ || column2 FROM table;
Used for pattern matching.
- LIKE
contains pattern
o WHERE column LIKE ‘%pattern%’
starts with pattern
o WHERE column LIKE ‘pattern’
second character is pattern
o WHERE column LIKE ‘_pattern’
Extracts the year from a date.
- EXTRACT(YEAR FROM column):
- Use NVL() function to handle NULL values.
- SQL commands are case-insensitive (e.g., SELECT and select are the same).
Combines data from two or more tables based on a related column.
- JOIN
o SELECT * FROM table1 JOIN table2 ON table1.column = table2.column;
Adds new rows to a table.
- INSERT
o INSERT INTO table (column1, column2) VALUES (value1, value2);
Modifies existing data in a table.
- UPDATE
o UPDATE table SET column1 = value1, column2 = value2 WHERE condition;
Removes rows from a table.
- DELETE
o DELETE FROM table WHERE condition;
(DDL):
Data Definition Language
Creates a new table with specified columns and data types
- CREATE TABLE
o CREATE TABLE table_name (column1 data_type, column2 data_type, …);
Modifies the structure of an existing table.
- ALTER TABLE
o ALTER TABLE table_name ADD column data_type;
o ALTER TABLE table1 ADD CONSTRAINT fk_name FOREIGN KEY (column1) REFERENCES table2(column2);
Deletes an entire table.
- DROP TABLE
o DROP TABLE table_name;
- DML commands manipulate data within existing tables.
- DDL commands change the structure of the database itself.
- Use DML commands with caution, as they can permanently alter data.
- Always test DDL commands carefully before executing them on a production database.
Automatically generates a list of all the headings and subheadings within a document, along with their corresponding page numbers.
Table of Contents
Creates a list of all the figures (images, charts, diagrams) used in the document, along with their captions and page numbers.
Table of Figures
Creates a formatted list of all the sources cited within the document.
Bibliography