General Study Pt2. Flashcards
things i missed out
Phantom Phenomena
Any transaction processing related commands need to be translated into basic and explicit reads and writes on database objects.
Sharing Phenomena
Running concurrent transaction through different isolation levels:
Serializable Read - Not possible Dirty Read/Inconsistent Read/Phantom Read
Repeatable Read - Not possible Inconsistent Read/Phantom Read
Read Committed - Possible Inconsistent Read/Phantom Read
Read Uncommitted - Possible Dirty Read/Inconsistent Read/Phantom Read
Wait for Graph
Draw a graph of the processes and what processes are waiting for what resources. If there are cycles then there is a deadlock.
Two Phase Locking (2PL)
2PL is a concurrency control mechanism used to manage locks, it is in charge of assigning locks and unlocks. It has two phases, growing and shrinking. During the growing phase it hordes locks, and during the shrinking phase it releases locks. This is a strict phase, after a lock is released, no more locks may be attained.
Conflict Serializable
A serializable schedule is conflict serializable if it can be transferred to a serial schedule by swapping non-conflicting operations.
Serial Schedules
A serial schedule executes transactions in their totality. It is correct. It doesn’t allow interleaving.
Recoverable Schedules
it is recoverable, no need to roll back. If no transaction T in S commits until all transactions Ti that have written an item that T reads have committed.(all changes are made before reading)
Cascading Rollback:
Abort a transaction, To avoid this, make sure that every transaction reads only transactions that were written by committed transactions.
Strict Schedule:
On top of cascading rollbacks, writes delayed until all transactions previously written are either committed or aborted.
List of common ports at…
Can find list of common ports at /etc/services on UNIX.
IP
Internet Protocol is part of OS. Address in dotted decimal.
IP: TCP Specification in OS
Allocate local resources -> Define local and remote endpoints -> initiate communication -> send and receive data.
Socket
API to TCP. Created on UNIX as a set of OS calls:
Communication connection point which can be named and addressed
Data Structure
Set of API functions
COM/OLE
COM (Component Object Model) and OLE (Object Linking and Embedding) are technologies developed by Microsoft for creating modular and extensible software components. In the context of distributed databases, COM/OLE can be used to facilitate communication and interaction between different software components or applications.
How does a JAVA RMI work
It does as so:
1. Server sets up service and registers objects
2. Client looks for objects in registry
3. Client retrieves object stub
4. As client uses stub, communicates with server
Interoperability
Interoperability refers to the ability to be able to run a process or referencing a process on another platform or address space.
Producer Consumer
Producers generate tasks or data and consumers process them. Communicate asynchronously through a buffer, preventing resource conflicts.
Master Slave
Master slave is a model in which there is one node which is a master and a number of nodes which are the slaves. The master nodes distributes tasks amongst the slave nodes, in the form of parallel programming.
Creating Parallel Program
1) First we want to break down our task into sub tasks
2) We want to assign each of these sub-tasks to a process
3) We then want to orchestrate synchronization, communication and data access
4) Finally map processes to processors
Technology needed to design Physical DDB
DB Connection (FDW, stored procedures…)
Triggers
Views
Replication
Perils of Replication
Security
Consistency
Performance
Recovery
2PC
Standards for two phase commit. Allows programmers to commit the entire transaction in one request with this request spanning over various systems and networks.
Procedural Replication
Proc. Replication queues calls to procedure and then delivers to all relevant sites. Conflict handlers required.
Multimaster Replication
Used if there is need to maintain a table at numerous sites with updates being reflected at all sites. Not good for massive data updates.
Uses synchronous for online data and asynchronous with queue for eventually delivered. Needs conflict resolution in async.
Updatable Remote Views
If local changes to a remote view are required, we have to send these changes to the master site which builds the remote view.
Remote Read Only Views
Copy part of a database into local database, used when connection dependency is high.
In logical replication, a publisher can:
- Broadcast its updates
- Can be a number of subscribers to the same broadcast
- Can control what updates to show
- Response time and latency at subscriber are OK.
Logical Replication
Can publish row changes over one or more persistent tables in a source database to a number of interested parties through a subscription. Source DB must declare a named object as a publication.
Client Set Up
- Install a client software on a client machine
- Check client machine can ping servers
- Check current assigned IP is in range
- Add a connection
Server Set Up
- Install postgreSQL server
1.1 Adjust start-up files - Restart PostgreSQL server
- Extract the following details
3.1 Server Ip and listening port
3.2 DB name
3.3 DB username - Check local instance
PGSQL Shell \q
\q Remark quit shell
PGSQL Shell \I
\l Remark list of databases …
PGSQL Shell \c
\c scott Remark … connect to database scott
How do you create a table
CREATE TABLE name
How do you create a view
CREATE OR REPLACE VIEW name
AS SELECT * FROM scott.emp
WHERE emp.no = 10;
Input text Postgres in SQL
COPY table_name FROM ‘/path/to/input/file.csv’ WITH CSV HEADER;
Output text postgres in SQL
COPY table_name TO ‘/path/to/output/file.csv’ WITH CSV HEADER;
Foreign Data Wrapper enable functionality
CREATE EXTENSION postgres_fdw;
Create FDW server
CREATE SERVER fdw_database_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS ( host ‘127.0.0.1’, port ‘5432’, dbname ‘dm’ );
Create DBlink Extension SQL
CREATE EXTENSION dblink;
Get DBLink connections SQL
SELECT dblink_get_connections();
Disconnect DBlink SQL
SELECT dblink_disconnect(‘’con2tastrade’);
Copy data from other DB using DBLink SQL
SELECT * FROM dblink( ‘port=5432 dbname=server_b user=postgres password=?’, ‘select * from test.t1’) AS t1 (a int, b varchar(50));
Design Quirk
In Postgres, all databases are isolated from each other, even if they are on the same server.
State of Play
Use a middleman to connect components and apps to ease communications.
Speedup
Measures Decrease in Response Time when adding more distributed nodes. Linear speedup is ideal.
Scale Up
Measures how long response time can be maintained when adding more workload and nodes. Ideally remains the same.
Size Up
Measures the proportionality of response time to increase of workload
Design Considerations
3 Dimensions of design considerations:
Level of Sharing: We can choose between no sharing, data sharing or data+program sharing.
Behaviour of Access Patterns: Static or Dynamic
Level of Knowledge on Access Pattern Behaviour
Components of DDBMS
- User Processor
- UI: I/O of end users
- Semantic Data Controller: Integrity
- constraints & authorizations
- Global Query Optimizer & Decomposer
- Distributed Execution Monitor
- Data Processor
- Local Query Optimizer
- Local Recovery Manager
- Run Time Support Processor
DBMS Architectural Model
Distribution of Data: Each node is a fully fledged DBMS
Autonomy of Data: DBMS manages DBMSs that run in total isolation.
Heterogeneity: DDB is made up of DBMS’s that vary in data modelling, query languages etc.