Database Domain Flashcards
I: DATABASE
is an organized collection of data.
Databases are classified by the way they store this data.
SQL vs. NoSQL
I: DATA MODELING
is the process of diagramming data flows. When creating a new or alternate database structure, the designer starts with a diagram of how data will flow into and out of the database.
I: INDEX
is the way to get an unordered table into an order that will maximize the query’s efficiency while searching.
I: STATEFUL DB
remembers client data (state) from one request to the next.
Stateful servers do store session state.
I: SHARDING
is a type of database horizontal partitioning that separates very large databases the into smaller, faster, more easily managed parts called data shards.
I: DB SCALING
SQL databases, they are vertically scalable which means that you can increase the load on a single server by increasing components like RAM, SSD, or CPU.
NoSQL databases are horizontally scalable which means that they can handle increased traffic simply by adding more servers to the database.
I: DB HIGH AVAILABILITY
Amazon RDS Multi-AZ deployments.
When you provision a Multi-AZ DB instance, Amazon RDS automatically creates a primary DB instance and synchronously replicates the data to a standby instance in a different Availability Zone (AZ).
Amazon Aurora increases availability by replicating your data six ways across three Availability Zones. This means that your DB cluster can tolerate a failure of an Availability Zone without any loss of data and only a brief interruption of service.
I: DB MIRRORING
is a primarily software solution for increasing database availability. Mirroring is implemented on a per-database basis and works only with databases that use the full recovery model.
I: TABLE SCANS
is the reading of every row in a table and is caused by queries that don’t properly use indexes.
I: OLAP
OLAP is an online system that reports to multidimensional analytical queries like financial reporting, forecasting, etc.
I: IN-MEMORY DB
In-memory databases are designed to attain minimal response time by eliminating the need to access disks.
Elasticache for Redis/Memcached – real-time bidding, gaming leaderboards, caching
I: DATABASE PERFORMANCE
Query Tuning
Improving Indexing
Input/Output Tuning
Scaling
Caching
I: DATABASE TROUBLESHOOTING
Gathering the Facts – you need to know what kind of problem is happening.
Test in Different environments and Machines – you might find only one environment is affected, a set of environments or all environments.
Review the SQL Server Error Log – SQL Server creates a log file called “ERRORLOG”. A new ERRORLOG file is created every time SQL Server starts up.
Review the Event Log – use the Event Viewer to look at the different event log records. The event log contains both informational warnings and error events.
Review the Default Trace – this trace captures all configuration changes to an instance.
Review the Change Log – a change log is some centralized location that identifies all changes the have been introduced.
Develop a testing plan – identify how the application is connecting to SQL Server and the T-SQL code that is being executed.
Backup Database – this backup will provide you a recovery point should you want to start tweaking SQL Server as part of your diagnostics troubleshooting steps.
I: CACHING
Caching allows you to efficiently reuse previously retrieved or computed data.
Trading off capacity for speed, a cache typically stores a subset of data transiently, in contrast to databases whose data is usually complete and durable.
I: RAID FOR DB
Use RAID 10 for high IO databases, when possible, but keep in mind that 50% of the total disk space is used for mirroring.
RAID 5 / RAID 6 have slower write performance, but fast read performance, and are suitable for read-only databases.
Logs should be written to separate RAID arrays from data files.
I: SQL (Relational DB)
Relational Database (Aurora, MySql, Oracle, SQL Server, MariaDB):
is a collection of data items with pre-defined relationships between them.
These items are organized as a set of tables with columns and rows.
Tables are used to hold information about the objects to be represented in the database.
Each column in a table holds a certain kind of data and a field stores the actual value of an attribute.
I: READ REPLICAS
allow data to be available for reading across any number of servers, called “slaves”. One server remains the “master” and accepts any incoming write requests, along with read requests. This technique is common for relational databases.
I: DATABASE PORTS
Aurora/MySQL/MariaDB = 3306
PostgeSQL = 5432
Oracle = 1521
SQL Server = 1433
I: NORMALIZATION OF DATA
the process of eliminating unstructured data and redundancy in order to ensure logical data storage.
I: DENORMALIZE TABLE
Denormalization is a database optimization technique in which we add redundant data to one or more tables.
Can help with avoiding costly joins in a relational DB.
I: CAP THEOREM
that a distributed system can deliver only two of three desired characteristics: consistency, availability, and partition tolerance (the ‘C,’ ‘A’ and ‘P’ in CAP).
Consistency – means that all clients see the same data at the same time, no matter which node they connect to.
Availability – means that that any client making a request for data gets a response, even if one or more nodes are down.
A partition tolerance – is a communications break within a distributed system—a lost or temporarily delayed connection between two nodes.
I: DATA CHARACTERISTICS
Accuracy
Completeness
Consistency
Timeliness
Validity
Uniqueness
I: SAGA PATTERN
is a way to manage data consistency across microservices in distributed transaction scenarios.
I: STATELESS
A Stateless server keeps no state information. Stateless file servers do not store any session state.
I: DB CLUSTERING
is a technology that provides high availability for the entire SQL Server instance. This means that it’s possible to have to servers with identical structure (OS, applications, services, service packs, patches…etc) and with shared storage (the data itself).
I: OLTP
is a system that manages transaction-oriented applications on the internet for example, ATM.
I: KEY-VALUE DB
is a type of non-relational database that uses a simple key-value method to store data. A key-value database stores data as a collection of key-value pairs in which a key serves as a unique identifier.
DynamoDB/Cassandra – session state, shopping cart
I: NOSQL (Non-Relational)
is particularly useful for storing unstructured data, which is growing far more rapidly than structured data and does not fit the relational schemas of RDBMS. Common types of unstructured data include: user and session data; chat, messaging, and log data; time series data such as IoT and device data; and large objects such as video and images.
DynamoDB
Cassandra
MongoDB
Document databases.
Key-value stores.
Column-oriented databases.
Graph databases.