General Study Flashcards
Explain the Pub/Sub Model (4 Features)
- Decoupling
- Messaging Server
- Asynchronous Communication
- Pub/Sub Model
What does the use of a queue model do to a Publish/Subscribe Model?
The use of a strict queue means that only one consumer will be able to read each message.
Give an example of a publish/subscribe model.
YouTube subscribers
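Sketch (Python, illustrative only): the difference between pub/sub delivery and a strict queue, using a toy in-memory broker. The class names Broker and StrictQueue and the topic "new-video" are made up for this example and do not come from any real messaging server.

```python
from collections import defaultdict, deque


class Broker:
    """Toy in-memory pub/sub broker (hypothetical, for illustration only)."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Pub/sub: every subscriber of the topic receives its own copy.
        for callback in self.subscribers[topic]:
            callback(message)


class StrictQueue:
    """Strict queue: each message is handed to exactly one consumer."""

    def __init__(self):
        self.messages = deque()

    def put(self, message):
        self.messages.append(message)

    def get(self):
        # Removing the message means no other consumer can read it.
        return self.messages.popleft() if self.messages else None


alice, bob = [], []
broker = Broker()
broker.subscribe("new-video", alice.append)
broker.subscribe("new-video", bob.append)
broker.publish("new-video", "episode 1")  # both subscribers receive it

queue = StrictQueue()
queue.put("episode 1")
first = queue.get()   # the first consumer takes the message
second = queue.get()  # nothing is left for a second consumer
```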
What is a Local Conceptual Schema
A fragment of the global conceptual schema stored locally on site. Has its own local physical schema.
Advantages of Distribute + Replicate
Performance, Fault Tolerance, Scale Up, Application Related Aspects
When not to replicate
- When there is low replication transparency
- When there are data consistency issues
What is Update Propagation Protocol?
Set of rules for how changes/updates to data are propagated.
Synchronous/Primary Copy UPP:
Primary Copy
On Read: Read Locally and Return to User
On Write: Write locally, multicast to replicas
On Commit Req: Run 2PC Coordinator
On Abort: Abort + Inform other sites.
Secondary Copy
On Read: Read Locally
On Write from Client: Refuse or forward to primary.
On Write from Prim.: Write locally, multicast to replicas
On Commit Req: Commit locally
Participant of 2PC on primary
Synchronous/Primary Copy Advantages
- Updates don’t have to be coordinated
- No inconsistencies or Deadlocks
Synchronous/Primary Copy Disadvantages
- Long response time
- Useful only with few updates
- Local copies almost useless
- Not used in practice
Synchronous/Update Everywhere Advantages:
- No inconsistencies
- Elegant (Updates applied uniformly)
- Data Consistency
- Fault Tolerance
Asynchronous/Primary Copy
Primary Copy
On Read: Read Locally and Return to User
On Write: Write locally, return to user
On Commit/Abort: Terminate locally
After Commit: Multicast changed objects in single array.
Secondary Copy
On Read: Read Locally
On Write from Client: Refuse or forward to primary.
On Message from Prim.: Install changes in order
On Commit/Abort: Commit locally
Only local deadlocks
Asynch/Primary Copy Advantages
- No coordination needed
- Short response time
- Good Performance
Asynch/Primary Copy Disadvantages
- Local copies not up to date
- Inconsistencies
- Limited Fault Tolerance
Asynch/Update Everywhere
Primary Copy
On Read: Read Locally and Return to User
On Write: Write locally, return to user
On Commit/Abort: Terminate locally
On msg from other site: Detect Conflicts
After Commit: Multicast changed objects in single array.
Secondary Copy
On Read: Read Locally
On Write from Client: Refuse or forward to primary.
On Message from Prim.: Install changes in order
On Commit/Abort: Commit locally
Only local deadlocks
Explain ROWA
Read One, Write All. Replication strategy where each site keeps local data copy and rules govern how operations are handled.
Concepts:
- Each Site uses 2PL
- Read Ops performed locally
- Write Ops performed at all sites
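Sketch (Python, illustrative only): ROWA with a toy in-memory cluster. Site and RowaCluster are hypothetical names, and the 2PL locking that each site would use is left out to keep the read-one / write-all rule visible.

```python
class Site:
    """One replica site holding a local copy of the data (toy model)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class RowaCluster:
    """Read One, Write All over a fixed set of sites (locking omitted)."""

    def __init__(self, sites):
        self.sites = sites

    def read(self, local_site, key):
        # Read One: served from the local copy only.
        return local_site.data.get(key)

    def write(self, key, value):
        # Write All: the update is applied at every site.
        for site in self.sites:
            site.data[key] = value


sites = [Site("A"), Site("B"), Site("C")]
cluster = RowaCluster(sites)
cluster.write("x", 42)
local_reads = [cluster.read(site, "x") for site in sites]
```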
Reconciliation patterns
1) Latest Update Wins (Most recent update preferred when conflict occurs)
2) Site Priority (Prioritize Updates from HQ)
3) Largest Value (Prioritize Largest transaction)
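Sketch (Python, illustrative only): the three reconciliation patterns as functions that pick a winner between two conflicting updates. The Update record, its fields, and the tie-breaking choices are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Update:
    site: str        # site that produced the update
    timestamp: int   # logical or wall-clock time of the update
    value: float     # value written


def latest_update_wins(a, b):
    # Prefer the most recent update (ties go to a; an arbitrary choice).
    return a if a.timestamp >= b.timestamp else b


def site_priority(a, b, priority_sites=("HQ",)):
    # Prefer the update from a priority site; otherwise fall back to latest.
    a_priority = a.site in priority_sites
    b_priority = b.site in priority_sites
    if a_priority != b_priority:
        return a if a_priority else b
    return latest_update_wins(a, b)


def largest_value(a, b):
    # Prefer the update carrying the larger value.
    return a if a.value >= b.value else b


hq = Update(site="HQ", timestamp=1, value=50.0)
branch = Update(site="Branch", timestamp=2, value=80.0)
```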
Ad-hoc Reconciliation strategies
1) Identify Changes and try combine them
2) Analyze and eliminate unimportant transactions
3) Create your own priority schemas
Replication Protocols to deal with replications with failures
Site Failures -> Use ROWAA
Network/Comm errors -> Use Quorums
Quorum
Needs to reach a certain threshold of successful responses before considering a transaction committed.
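Sketch (Python, illustrative only): a quorum check. The card only says "a certain threshold"; the simple-majority threshold and the R + W > N overlap rule below are common choices assumed here, not something the notes specify.

```python
def quorum_reached(successful_responses, threshold):
    """True once enough replicas have answered successfully."""
    return successful_responses >= threshold


def majority(n_replicas):
    # Assumption: a simple majority is used as the threshold.
    return n_replicas // 2 + 1


def quorums_overlap(n_replicas, read_quorum, write_quorum):
    # Common rule: R + W > N, so every read quorum shares at least
    # one replica with every write quorum.
    return read_quorum + write_quorum > n_replicas
```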
ROWAA (Primary Site)
Read -> Read any copy, if timed out read another copy
Write -> Send write(x) to all copies. If a site rejects, abort. All sites that don’t respond are missing writes.
Validation -> To commit a transaction. Check if missing writes are still down, if no then abort. Also check if available sites are still available, if no then abort.
ROWAA (Update Everywhere)
Read -> Read any copy from available sites
Update -> Update any copy from available sites
Modify -> Run a special atomic transaction at all sites: Make sure no concurrent views exist. Make sure sites are of the highest version.
Recovery -> Get missed updates from active nodes.
What does NOTIFY do?
Raises a notification event on a certain channel to clients that subscribed. If no session is listening, notifications are lost.
Notification Structure
Channel, Process ID, Payload
Notification Syntax
NOTIFY channel [, msg_payload]
OR
pg_notify(chn::text, msg::text);
What does Listen do
Registers the spawning session to a notification channel
Listen Operations:
- When notification raised, registered sessions are notified
- Sessions can issue ‘unlisten’ to server
- Registrations automatically UNLISTENED when session ends.
Listen Syntax
LISTEN channel
UNLISTEN channel
or
pg_listen conn notifyName
Timing of Producer sending messages
Calling Notify
- If raised during transaction, queue notification until after commit.
- If transaction rolled back, notification never delivered
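Sketch (Python, illustrative only): the producer-side timing rule modelled with a toy session object. ToySession is hypothetical and is not a database driver; it only mimics "queue until commit, drop on rollback".

```python
class ToySession:
    """Models only the commit/rollback timing of notifications."""

    def __init__(self, delivered):
        self.delivered = delivered  # list standing in for listening sessions
        self.pending = []           # notifications raised inside a transaction
        self.in_transaction = False

    def begin(self):
        self.in_transaction = True

    def notify(self, channel, payload=""):
        if self.in_transaction:
            self.pending.append((channel, payload))    # held until commit
        else:
            self.delivered.append((channel, payload))  # no open transaction

    def commit(self):
        self.delivered.extend(self.pending)  # delivered only after commit
        self.pending.clear()
        self.in_transaction = False

    def rollback(self):
        self.pending.clear()  # rolled back: never delivered
        self.in_transaction = False
```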
Timing of Consumer seeing messages
Calling Listen
- If called during a transaction, no access until the local session's transaction commits or rolls back.
- The channel is accessed after the local session's transaction terminates.
What is the Notify Listen Model
Implementation mechanism of Pub/Sub Models. NOTIFY and LISTEN are PostgreSQL commands (a PostgreSQL extension, not standard SQL).
Generate Json from a table
COPY (SELECT row_to_json(r) FROM (SELECT * FROM scott.dept) r) TO 'filename.json';
Generate XML from data
SELECT table_to_xml('scott.dept', true, false, '');
SELECT query_to_xml('SELECT * FROM scott.dept', true, false, '');
Show how XPATH is used
SELECT data FROM test_table WHERE CAST(xpath('//root/s_node/text()', data) AS text[]) = '{a_value}';
Get a resource using CURL
curl --insecure --request GET "resource url&key"
What is REST (Just definition)
Representational State Transfer is an architectural style used for interfacing with web services. Any API following REST principles is considered RESTful.
What are the Principles of REST (There are 6)
- Client Server Architecture
- Stateless Operation (receive enough info to understand msg in isolation)
- Resource Caching (Requests may be answered through a cache)
- Uniform Interface (Server announces available actions and resources for ease)
- Layered System (Client can’t tell if it’s connected directly or through middleman)
- Code On Demand (optional; server can send executable code to clients)
What is an idempotent operation
An operation that can be carried out multiple times while leaving the server in the same state as carrying it out once, ex. x=1.
Safe to retry, which helps fault tolerance. Not the same as a safe operation, which doesn’t alter state at all.
What is the function and idempotency of the following:
GET
POST
PUT
DELETE
GET - returns a representation of a resource. Idempotent and safe.
POST - creates new resource; server decides the URL and returns it. Neither idempotent nor safe.
PUT - creates or replaces a resource at a URL decided by the client. Idempotent but not safe.
DELETE - removes specified resource (not always physical). Idempotent but not safe.
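Sketch (Python, illustrative only): idempotency shown with a toy in-memory resource store instead of a real HTTP server. ToyResourceStore and the /items/... URLs are invented for this example; the method named post stands for the HTTP method that creates a resource at a server-chosen URL.

```python
class ToyResourceStore:
    """In-memory stand-in for a REST server's resources (hypothetical)."""

    def __init__(self):
        self.resources = {}
        self.next_id = 1

    def get(self, url):
        # Safe and idempotent: never changes state.
        return self.resources.get(url)

    def post(self, body):
        # Server chooses the URL; every call adds another resource,
        # so repeating it changes the state again (not idempotent).
        url = f"/items/{self.next_id}"
        self.next_id += 1
        self.resources[url] = body
        return url

    def put(self, url, body):
        # Client chooses the URL; repeating the call leaves the same state.
        self.resources[url] = body

    def delete(self, url):
        # Repeating the call leaves the same state (resource stays absent).
        self.resources.pop(url, None)
```

Calling put or delete twice with the same arguments ends in the same state as calling it once, while calling post twice creates two resources.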
What are the prerequisites of interfacing web services to application functionality.
- Service info transparent to language
- Service discovered across large collection of services and servers.
- Data exchange between machines.
- Dealing with errors, server comm. unavailable/busy.
What is SOAP? Is it industry Grade?
Simple Object Access Protocol is a protocol for inter-app communication with web services, used to exchange structured messages (ex. XML).
Not industry grade on its own, as it lacks guarantees of features such as transactionality. It is extensible to become industry grade.
What is the SOAP messaging structure?
Envelope: The root of the message.
Header: Holds info about processing of the message, ex. whether an entry must be understood by the receiver (mustUnderstand). Can have multiple header entries. Optional.
Body: contains the actual message in XML. Either app-specific data or a fault message
Fault: Communicates errors. Standardized codes/messages containing; Fault code, fault string, fault actor, details.
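Sketch (Python, illustrative only): building the Envelope / optional Header / Body structure with the standard library. The namespace is the SOAP 1.1 envelope namespace; the GetDept payload and the build_envelope helper are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope


def build_envelope(body_element, header_elements=()):
    """Wrap an application payload in Envelope / (optional) Header / Body."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")  # root of the message
    if header_elements:                               # Header is optional
        header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
        header.extend(header_elements)
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(body_element)                         # app data or a fault
    return envelope


# Hypothetical application payload.
request = ET.Element("GetDept")
ET.SubElement(request, "deptno").text = "10"

envelope = build_envelope(request)
xml_text = ET.tostring(envelope, encoding="unicode")
```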
What is SOAP Binding
Can bind to different protocols ex. SMTP
- Put SOAP envelope inside protocol as a payload to that protocol
- With HTTP, the envelope is sent using POST, as the envelope is big.
What is WSDL
Web Services Description Language is a contract between service and consumer. It specifies the location and methods of a service.
What is UDDI
Universal Description, Discovery and Integration is an online directory to query and search for web services. New services can be published to it.
Explain ACID
Atomicity: Either all operations of a transaction go through or none of them do
Consistency: Data should remain correct
Isolation: To preserve consistency, two conflicting operations are not permitted
Durability: After completion of transaction, impact should persist even if system fails.
CAP Theorem and Solution
- Consistency
- Availability
- Partition Tolerance
Can work around it by relaxing consistency to eventual or causal consistency
What is BASE
- Basically Available (Available through replication)
- Soft State (Values in DB can change over time)
- Eventually Consistent (DB can become consistent in long run)
What is Eventual Consistency?
Given some DDB, replicas are guaranteed to converge to a consistent state at some point in time once no new writes arrive.
Allows us to have availability and partition tolerance, however no guarantees on conflicts or order.
Can exhibit data inconsistency, transaction inconsistency, and integrity invariant violation.
What is a Causally Consistent DB?
Order of causally related operations guaranteed, order of concurrent writes not guaranteed.
- CAP Free
- Strongest consistency model in fault tolerant DDB
May slow performance, doesn’t guarantee durability
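Sketch (Python, illustrative only): one common way to tell causally ordered writes from concurrent ones is to compare vector clocks. The notes do not say which mechanism is used, so treat the dictionary-based clocks and function names below as assumptions.

```python
def happened_before(a, b):
    """True if vector clock a causally precedes vector clock b."""
    keys = set(a) | set(b)
    no_entry_greater = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    some_entry_smaller = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    return no_entry_greater and some_entry_smaller


def concurrent(a, b):
    """True if neither clock precedes the other and they are not equal."""
    keys = set(a) | set(b)
    differ = any(a.get(k, 0) != b.get(k, 0) for k in keys)
    return differ and not happened_before(a, b) and not happened_before(b, a)


# Clocks map a site name to the number of writes seen from that site.
w1 = {"A": 1}            # first write at site A
w2 = {"A": 1, "B": 1}    # write at site B after seeing w1
w3 = {"A": 2}            # second write at site A, has not seen w2
```

Here w1 must be ordered before w2 (order guaranteed), while w2 and w3 are concurrent (order not guaranteed).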
Some approaches to a causally consistent DB
- Use RDBMS as prim. storage
- Deliver DDB as middleware
- Deliver CC+ with read only transactions/invariant preservation.
Define CQS
Command Query Separation (An operation either changes system state or reads it, not both)
Define CQRS
Command Query Responsibility Segregation (Favours separate data models for Read + Write)
Define ES
Event Sourcing (data changes captured as sequence of events, in a log)
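Sketch (Python, illustrative only): event sourcing with an append-only log that is replayed to rebuild state. The account-balance domain and the event names "deposited" and "withdrawn" are invented for this example.

```python
def apply_event(state, event):
    """Apply one event to the current state and return the new state."""
    kind, amount = event
    if kind == "deposited":
        return state + amount
    if kind == "withdrawn":
        return state - amount
    raise ValueError(f"unknown event: {kind}")


def replay(events, initial=0):
    """Rebuild the state by replaying the log from the beginning."""
    state = initial
    for event in events:
        state = apply_event(state, event)
    return state


event_log = []                          # changes are only ever appended
event_log.append(("deposited", 100))
event_log.append(("withdrawn", 30))
balance = replay(event_log)
```

Replaying a prefix of the log gives the state at an earlier point in time, which is one reason to store events rather than only the latest value.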
What is D-Thespis
Middleware that delivers a CCDBMS for data. Accessible via REST API.
Improvements:
- Elastic horizontal scalability
- Improved update visibility latency
Features of D-Thespis Model (7)
- Rest Client API (Clients to R/W)
- Middleware Engine (NO knowledge of model but serves READ requests)
- Actor Provider (encapsulates logic to be executed)
- Data Centre Clock (other layers use to obtain physical timestamp)
- Cluster Clock (Access to read and maintain stable version vector)
- Data Replication Job (Periodically executes data replication protocol)
- Data Snapshotting Job (Periodically identify data entities with no more dependencies)
What are the two forms of client server communication, define them.
TCP/IP: Connection oriented, guarantees data transfer, has checksum.
UDP/IP: Connectionless protocol, single datagram sent and no acknowledgement for delivery.
What are the steps for implementing IP communication using TCP/IP in an OS?
Allocate local resources, Specify local+remote endpoints, initiate connection, send or receive data.
What command is used to allocate port number
bind()
Socket
API to TCP. Developed on UNIX as a set of OS calls:
- Comms. connection point that can be named and addressed in network
- Data structure
- Set of API functions
What are the three steps for Client-Software Communication?
1) Client prepares the endpoint address of the server in the corresponding data structure and returns a reference to it
2) Client issues socket() call to create socket
3) TCP client issues connect call, fills socket data and attempts connection. UDP client uses sendmsg().
List the steps in the TCP/IP connection flow.
1) Server creates socket and waits for remote clients to connect (listen())
2) Client calls connect(), server issues accept(). TCP 3 way handshake occurs (SYN, SYN/ACK, ACK)
3) Clients + Server exchange data over socket using read() or write(). Typically ACK each write.
4) Either party closes (FIN).
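Sketch (Python, illustrative only): the four steps above on the loopback interface, with a one-shot server thread that upper-cases whatever the client sends. The function names are made up, and port 0 is used so the OS picks a free port.

```python
import socket
import threading


def serve_once(server_sock):
    # Step 2 (server side): accept() returns the established connection.
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)       # step 3: read the client's bytes
        conn.sendall(data.upper())   # step 3: write a reply
    # Leaving the with-block closes the connection (step 4).


def run_demo():
    # Step 1: server creates a socket, binds it, and listens.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
    server.listen()
    port = server.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()

    # Step 2 (client side): connect() triggers the 3 way handshake.
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", port))
    client.sendall(b"hello")         # step 3: exchange data
    reply = client.recv(1024)
    client.close()                   # step 4: close the connection

    worker.join()
    server.close()
    return reply
```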
What are the differences between TCP and UDP?
Reliability: TCP uses ACK, has retransmission and timeouts. UDP has none of those.
Order: TCP is ordered, messages received in order sent. UDP has no order or guarantees.
Overhead: TCP has higher overhead due to the 3 way handshake, acknowledgements and connection state. UDP has very little overhead.
Method of Transfer: TCP data read in streams of bytes with no message boundaries. UDP reads data as packets with boundaries.
Applications: TCP has HTTP, SMTP, FTP… UDP has Video streaming and VOIP
How do you define a new socket in Java?
Socket s = new Socket(IP, Port);
What is the conceptual server algorithm?
Create socket, bind to service port, then repeat indefinitely until closed: accept the next client request, process it, send the reply.
Define Iterative and Concurrent Servers.
Iterative - Serve one client at a time until termination.
Concurrent - Serve several clients, using time slicing to not block clients.
Define and mention the differences between HTTPS and WebSockets
HTTPS - Protocol above TCP, used by REST. Client requests, server responds. Server push is emulated with long polling (the server only responds if a new message is available or a timeout is reached).
WebSockets - Protocol whose URLs start with ws:// or wss://. Allows either side to send messages at any time, similar to UDP but with TCP reliability. HTTPS is used as the initial transport mechanism: the client requests to open a WebSocket and the server accepts if possible. If successful, both parties use the existing TCP connection as a WebSocket connection.
What can unrestricted sharing lead to (3 - and explain them)
- Lost Updates (Two concurrent updates on same data, only one goes through)
- Inconsistent Read (When a transaction reads data that is being modified)
- Dirty Read (When data is read that has been modified but not committed, if rolled back it is a dirty read)
What is a transaction and what are its components?
Sequence of actions that realize a logical operation. Components:
- Disk read and write operations
- Actions supplemented by control instructions (start/commit/rollback)
What does a transaction model assume?
- R/W order is unchanged when being processed
- Transaction assumed to be serial and correct in its totality
What does a Transaction processing system do?
Applies and processes transactions over a DB. Must push for highest level of transaction throughput.
What makes a good TP system?
Generates good interleaving or avoids bad interleaving. We want good results without having to know what each transaction is up to. Each update activity broken down into a primitive R/W operation.
Comment on recoverable schedules.
Straightforward recovery process:
- Recoverable
- Cascading rollback
- Strict Schedule
What is the serializability theory?
A schedule that executes transactions in their totality is serial. Assume serial schedules of independent transactions are correct. A schedule of N transactions is serializable if equivalent to a serial schedule of the same N transactions.
What is a precedence graph and what does it do?
Directed graph in which nodes represent the transactions in the schedule and the directed edges represent conflicting operations between two transactions.
1) Looks at only r(x) or w(x) ops.
2) Constructs precedence/serialization graph
3) Edge created from Ti to Tj if an operation of Ti appears before a conflicting operation of Tj
4) Schedule serializable if and only if precedence graph has no cycles.
Topological sort extracts a serial schedule.
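Sketch (Python, illustrative only): building the precedence graph from a schedule and extracting an equivalent serial order. The schedule encoding as (transaction, operation, item) tuples and the function names are assumptions for this example; graphlib needs Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter


def precedence_graph(schedule):
    """Return {transaction: set of transactions that must come before it}."""
    predecessors = {txn: set() for txn, _, _ in schedule}
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Two operations conflict if they come from different
            # transactions, touch the same item, and one is a write.
            if ti != tj and item_i == item_j and "w" in (op_i, op_j):
                predecessors[tj].add(ti)  # edge ti -> tj
    return predecessors


def equivalent_serial_order(schedule):
    """Serial order from a topological sort, or None if there is a cycle."""
    try:
        return list(TopologicalSorter(precedence_graph(schedule)).static_order())
    except CycleError:
        return None  # cycle: the schedule is not conflict serializable


serializable = [("T1", "r", "x"), ("T1", "w", "x"),
                ("T2", "r", "x"), ("T2", "w", "x")]
not_serializable = [("T1", "r", "x"), ("T2", "w", "x"), ("T1", "w", "x")]
```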
What are the conditions for view equivalence?
- Corresponding read operations in each schedule return the same values
- Both schedules must produce the same final DB state (last write on each item must be the same)
What’s the difference between Monolith and Microservices?
Monolith is one large service. Microservices are multiple small systems collated into a large one.
What can arise when multiple clients work with the same data?
Stale Data: Data which has changed since being retrieved by the current process.
Pessimistic Locking: Resource is locked as soon as it is accessed and released as soon as all intended changes are committed. Prevents conflicts.
Optimistic Locking: Resources can be read and changed freely (assumed changes won’t conflict). Then check for conflict when committing changed result and act according to specified conflict resolution protocol. Avoids overhead of locking resource for a long period of time.
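Sketch (Python, illustrative only): optimistic locking with a version number that is checked at write time. VersionedRecord and StaleDataError are hypothetical names, and raising an error is just one possible conflict resolution protocol.

```python
class StaleDataError(Exception):
    """Raised when a writer's copy is older than the stored record."""


class VersionedRecord:
    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        # No lock is taken; the reader remembers the version it saw.
        return self.value, self.version

    def write(self, new_value, expected_version):
        # Conflict check happens only when committing the change.
        if expected_version != self.version:
            raise StaleDataError("record changed since it was read")
        self.value = new_value
        self.version += 1
```

If two clients read version 0 and both try to write, the first write succeeds and bumps the version, and the second is rejected because its data is stale.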
Java RMI abstractions
- Remote interfaces (allows remote invocation)
- Stubs and Skeletons
- Use proxy for the remote object on the client, and skeleton on the server side to receive incoming method calls, making remote method calls appear as if they were local method calls.
- Serialization - Remote passing of objects and data without needing to serialize or deserialize it manually.
- Naming Service - Use a registry to locate the objects, simplify look-ups
What is transparency
Refers to the separation of the higher level semantics of a system from the lower level implementation issues.
Explain 3 different types of transparency.
Network transparency - Hide existence of network from end user. Isolation from network artefact’s implementation details.
Location Transparency - The process is independent of the processor that executes it.
Naming Transparency - Each DB object has a unique name.
Fragmentation Transparency: Queries must be broadcast to all fragments and results collated.
Replication Transparency: Should end users be aware that the DB uses replication?
How can we provide for transparency?
- Access Layer for data resources
- Operating system for network resources
- DDBMS takes role of DBMS, OS etc.
Explain the ANSI/SPARC Model.
Layers:
- End User
- External Level (Interacts with users and represents data)
- Global Conceptual Schema (To link local conceptual schemas)
- Conceptual Schema (Defines structure of DB w/o implementation details)
- Internal Level (Deals with physical storage and devices. Low Level Details)
2 Advantages of ANSI/SPARC
Data independence
Modularity & Flexibility
2 Disadvantage of ANSI/SPARC
Complexity & Overhead
Low Performance
What are the architectural components of a DDBMS?
- User Processor
- Data Processor
Name some global directory issues.
Global vs Local
Central vs Distributed
Single vs Multiple copies
RPO
Recovery Point Objective (Maximum acceptable data loss, measured as time back from the start of data loss to the last recoverable point)
RTO
Recovery Time Objective (Maximum acceptable time from the failure until data is available again/recovered)
Consensus Issue
When all data servers present agree on a value, we have a consensus
Paxos
Family of consensus algorithms. Asynchronous. Totally orders transactions across data servers by consensus. Allows selection and appointment of a leader by consensus.
Characteristics of Paxos
Accommodate a certain level of failures (some units). Messages can be lost, delayed, re-received but never corrupted.
Requirements for Paxos
Single Replica Semantics, Data Consistency between values, Progress expectation
Transactions outcomes
Either commit or rollback.
Is 2PC fault tolerant and why?
No, because it blocks all participants when the coordinator fails or is blocked: a participant that has voted cannot decide on its own. The coordinator is a consensus broker.
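Sketch (Python, illustrative only): the two phases of 2PC with in-process toy participants. ToyParticipant and two_phase_commit are hypothetical names, and failures, timeouts and logging are deliberately left out.

```python
class ToyParticipant:
    """In-process stand-in for a site taking part in 2PC."""

    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Vote yes or no; after voting yes the participant must wait
        # for the coordinator's decision.
        if self.can_commit:
            self.state = "prepared"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


def two_phase_commit(participants):
    # Phase 1 (voting): the coordinator asks every participant to prepare.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): commit only if every vote was yes.
    if all(votes):
        for p in participants:
            p.commit()
        return "commit"
    for p in participants:
        p.rollback()
    return "rollback"
```

Between the two phases every participant that voted yes sits in the "prepared" state, so if the coordinator stopped at that point they would all remain stuck there.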