Chapter 3 - Software architectures and their trade-offs Flashcards
What is a Distributed System?
Collection of independent Computers that appear to be a single coherent system
What does Reliability mean in the context of a distributed System?
Probability of a system to perform its required functions under stated conditions for a specified period of time.
What does Availability mean in the context of distributed System?
Proportion of the time a system is in a functioning state, i.e., can be used.
Five nines is 0.99999 or 99.999% available
How is Availability usually measured (nines)?
By class of nines, i.e. the maximum downtime per year:
1 nine (90%): up to 36.5 days / year
4 nines (99.99%): up to 52.56 min / year
9 nines (99.9999999%): up to 31.536 ms / year
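The downtime budget for a class of nines follows directly from the unavailability. A minimal sketch, assuming a 365-day year; the function name `max_downtime_seconds` is illustrative and not from the course material:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: no leap years


def max_downtime_seconds(nines: int) -> float:
    """Maximum yearly downtime for an availability with `nines` nines."""
    # n nines means an unavailability of 10^-n (e.g. 4 nines -> 0.0001).
    return SECONDS_PER_YEAR / 10 ** nines


for n in (1, 4, 9):
    seconds = max_downtime_seconds(n)
    print(f"{n} nines: {seconds} s = {seconds / 60} min = {seconds / 86400} days")
```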
Availability vs reliability
A server may never be down (high availability) but still return false results (low reliability)
What does Transparency mean in the context of distributed System?
Making the complexity of the system invisible (transparent) to the user. We want to create an illusion of simplicity,
e.g. hide the data location info,
hide the fact that a server went down and the session was moved to a different one, etc.
What is Middleware?
Services and abstractions that help accommodate the distributed application, e.g. the messaging we implemented in CBDP, or web services
What does a typical Mainframe Model look like?
Network in the middle
Terminals at the edge
Main server (mainframe) does everything
What are Pros of a Typical Mainframe Model
Single point of administration
Simple architecture and low bandwidth
Little hardware maintenance
What are Cons of a Typical Mainframe Model
Single point of failure
Bottlenecks due to sharing of the data
Mainly console programs
What does a three-layered client / server arch look like?
Client uses browser to render data
(Web / app) Server has:
1. Presentation Layer
2. Business Layer
3. Data Layer
What is a part of Presentation Layer of three layered client / server arch
UI Components and UI all together
What does the Business Layer of the three-layered client / server arch consist of
Application facade,
Business Components, entities(DTO) and workflow
What does the Data Layer of the three-layered client / server arch consist of
Data access (JPA)
Data utilities(BE)
Service agents(repo in bierkasse)
What are the pros of a Three layered client server arch
No single point of failure
Scalability of the architecture
Flexibility of the architecture
What are Subprograms in DB
PL/SQL blocks (procedures / functions) that take parameters. Difference between the two: a function returns a value, a procedure does not (comparable to a void method)
What are pros of Stored Procedures
Better reusability,
Centralization - stuff meant to be done on the DB is done there,
and with it come security, maintainability etc.
Less communication - no need for sending detailed instructions
What are cons of Stored Procedures
Potentially easier to change (no need for deployment)
Portability issues
Testing - requires a special setup
What is data warehousing (DW)
A collection of tools, methods and other means that let managers and analysts conduct data analysis in decision-making processes
Biggest problems of Data Analysis
Heterogeneous sources with similar / same data
bad Data quality
Data is volatile
What is data mart?
It’s a subset or aggregation of data stored in the primary warehouse (like a part athena)
Difference between OLTP (online transaction processing) and OLAP (online analytical processing)
OLTP DBs are medium-sized, have many users, and handle structured, repetitive work; good for application-oriented day-to-day operations.
OLAP DBs are very big, have fewer users, and process ad hoc, complex, mostly read-only queries; they help with decision making on a particular subject.
What is a Data cube?
A tool in OLAP based on group-by queries; allows for good visualization by slicing the data
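The group-by and slicing idea can be sketched in a few lines. This is a toy model, not an OLAP engine: the sales facts, the dimension names and the helpers `group_by` and `slice_cube` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, region, product, amount).
facts = [
    (2023, "EU", "beer", 100),
    (2023, "EU", "wine", 50),
    (2023, "US", "beer", 70),
    (2024, "EU", "beer", 120),
]
DIMENSIONS = ("year", "region", "product")


def group_by(rows, *dims):
    """Sum the amount per combination of the chosen dimensions (a GROUP BY)."""
    positions = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in positions)] += row[-1]
    return dict(totals)


def slice_cube(rows, dim, value):
    """Slice: fix one dimension to a single value and keep the rest."""
    position = DIMENSIONS.index(dim)
    return [row for row in rows if row[position] == value]


print(group_by(facts, "region"))
print(group_by(slice_cube(facts, "year", 2023), "product"))
```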
What is meant by ETL (Extract, transform, load) and what problems come with it?
Data originates in different clusters / tables and has different formats
It needs to be extracted and transformed into a single format that can be loaded into the DW
It is by far the most complex part of DW development as almost 80% of dev time is spent here.
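A minimal sketch of the three ETL steps, assuming two hypothetical sources that store the same kind of record in different formats. All names, fields and values are invented for illustration; a real pipeline would also handle bad data, which is where much of the effort goes.

```python
from datetime import date

# Hypothetical source records in two different formats.
crm_rows = [{"customer": "Ada", "signup": "2023-01-15", "revenue_eur": "120.50"}]
shop_rows = [("Bob", "15.02.2023", 8000)]  # (name, DD.MM.YYYY, revenue in cents)


def extract():
    """Extract: pull the raw records from every source unchanged."""
    return crm_rows, shop_rows


def transform(crm, shop):
    """Transform: map both source formats onto one unified record layout."""
    unified = []
    for row in crm:
        unified.append({
            "customer": row["customer"],
            "signup": date.fromisoformat(row["signup"]),
            "revenue_eur": float(row["revenue_eur"]),
        })
    for name, signup, cents in shop:
        day, month, year = (int(part) for part in signup.split("."))
        unified.append({
            "customer": name,
            "signup": date(year, month, day),
            "revenue_eur": cents / 100,
        })
    return unified


def load(records, target):
    """Load: append the unified records to the warehouse table."""
    target.extend(records)


warehouse = []  # stand-in for a DW table
load(transform(*extract()), warehouse)
print(warehouse)
```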
Central DW Arch
Simple, all data is in one central DW
Federated DW Arch
Data is stored in Data marts which contain the parts of data relevant to different e.g departments
main DW is logical a.k.a virtual
It allows for better performance due to distribution but is more complex to maintain
Tiered DW Arch
Central DW is materialized, data is then distributed to data marts that can be further divided into more data marts creating a type of a redundant tree.
Offers great performance on reads as it is redundant and well distributed, however it is hard to manage and complex to implement.
Big issues in DW
Metadata is needed as in OLAP the information how the data was created is relevant
Creation of a DW takes long time and resources,
Often fails due to lack of knowledge, costs, ethical issues etc.
What is Big data?
It refers to datasets so big that managing them in the usual way becomes awkward and problematic.
Key computing resources in Big Data
Memory, Storage, Network
Big data - Challenges
Most important - Scalability!!!!
What data is important to collect? (in times of AI) - definitely not what we already know
Creating system with little overhead
Big Data - Vertical Scaling
lower / higher capacity
e.g 2GB -> 4GB RAM
Big data - Horizontal scaling
more /less units (usually on demand)
1 EC2-> 2 EC2
Compare Horizontal & Vertical Scaling
Horizontal scaling is usually less expensive, instantly available, can be automated and is not limited by hardware capacity.
Vertical scaling is more expensive (specialized servers), usually requires additional setup and might be limited by hardware capacity (there is only so much RAM a machine can handle)
Horizontal Partitioning (Sharding)
!= data pool
Used to partition data without replicating it. Used in NoSQL-based DBs like MongoDB etc.
Each shard represents single node in a cluster.
Requires a central lookup e.g hash table to know where an element is located
Sharding - Name strategies
Lookup, Range, Hash
Sharding - Lookup Strategy
Map of all shards
Sharding - Range Strategy
Every Shard responsible for different ranges ordered by a shard key.
Sharding - Hash Strategy
A hash function of the key points to a shard
The idea is to counteract potential hotspots, which can occur with the range strategy
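The three strategies can be compared side by side. A minimal sketch, assuming three hypothetical shards and string keys; the shard names, the lookup table and the range bounds are made up, and real databases use more elaborate schemes (e.g. rebalancing).

```python
import bisect
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical node names

# Lookup strategy: an explicit map from key to shard.
LOOKUP = {"alice": "shard-0", "bob": "shard-2"}


def lookup_shard(key: str) -> str:
    return LOOKUP[key]


# Range strategy: each shard owns a contiguous range of the shard key.
# Exclusive upper bounds of the first two ranges; the last shard takes the rest.
RANGE_BOUNDS = ["h", "q"]


def range_shard(key: str) -> str:
    return SHARDS[bisect.bisect_right(RANGE_BOUNDS, key)]


# Hash strategy: a stable hash spreads neighbouring keys across the shards.
def hash_shard(key: str) -> str:
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


for name in ("alice", "maria", "zoe"):
    print(name, range_shard(name), hash_shard(name))
```

With the range strategy, keys that sort next to each other land on the same shard, which is what can create a hotspot; the hash strategy trades that locality for a more even spread.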
NoSQL Databases idea
Schema free (non-relational),
horizontally scalable
Key-Value-Store Redis
Example of a fast key-value-store
Used e.g in real-time analytics, stock-prices and many more
used by Twitter, Github, StackOverflow
Key-value-store Bigtable (Google)
The Bigtable map looks as follows:
(row key, column key, timestamp) -> value
Good for additions, but not good for modifications
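The map can be modelled as a dictionary keyed by a triple. This is a toy in-memory illustration only, not the Bigtable API; the helper names `put` and `get_latest` and the example keys are assumptions.

```python
# Toy model of the map: (row key, column key, timestamp) -> value.
table = {}


def put(row: str, column: str, timestamp: int, value: str) -> None:
    """A write with a new timestamp adds a version instead of replacing the old one."""
    table[(row, column, timestamp)] = value


def get_latest(row: str, column: str):
    """Return the value with the highest timestamp for a cell, or None."""
    versions = [(ts, v) for (r, c, ts), v in table.items() if r == row and c == column]
    if not versions:
        return None
    return max(versions, key=lambda version: version[0])[1]


put("com.example/index", "contents:html", 1, "<html>v1</html>")
put("com.example/index", "contents:html", 2, "<html>v2</html>")
print(get_latest("com.example/index", "contents:html"))
```

Both versions stay in the table, which illustrates why appending is cheap while in-place modification is not the natural operation.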
How does a document-oriented DB store information?
JSON-like storage, e.g. MongoDB
What is better in document-oriented DB
Documents are independent meaning their structure can be changed as we go
Application logic is easy as it transforms the entities from code directly to documents in the database and vice versa (no mapping needed)
Semi-structured data allows more intercompatibility in case of a migration - no need to know its information schema
Graph Databases
The idea is to create free-floating objects that are interconnected by meaning.
Thanks to the materialization of relationships at the creation level there is no penalty for browsing them at a later stage, allowing constant access time.
E.g Neo4j
What are the steps in Processing Big Data
- Iterate over the data (large number of records)
- Extract sth of interest from each
- Shuffle and sort for intermediate results
- Aggregate back for a full view
- Generate the final output
=> MapReduce
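The steps above can be sketched as a single-process word count. This only illustrates the map, shuffle and reduce phases; a real MapReduce framework distributes them across many machines, and the function names here are illustrative.

```python
from collections import defaultdict


def map_phase(records):
    """Map: iterate over the records and emit a (word, 1) pair per word."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle and sort: group the intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(groups):
    """Reduce: aggregate every group into one result."""
    return {key: sum(values) for key, values in groups}


def word_count(records):
    return reduce_phase(shuffle_phase(map_phase(records)))


print(word_count(["big data", "Big cloud"]))
```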
What is a webservice?
Component wrapped behind a standardized interface e.g REST / SOAP
Biggest advantage of Webservices
Can be called across platforms and operating systems regardless of programming language, at the same time allowing for cross use from different applications.
How does SOAP / WSDL work?
A SOAP message consists of an Envelope that contains a Header and a Body. The Body then contains the information in the form of XML.
It is sent over HTTP / SMTP
Hint: Used in critical services
What is SOA?
Service - oriented Architecture
It's the design and its scale that matter here.
e.g. Event-based interaction
Language independence
What are the principles of SOA?
Standardized Contract - default way of accessing all services
Abstraction - hide as much as possible
Reusability - services should be reusable resources
Loose Coupling - little dependencies
Stateless - stateful only when needed
Discoverability
How does service discovery work?
When a service launches, it registers itself with a registry / load balancer and can be found and used from there.
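A minimal in-memory sketch of the register-then-resolve flow. The class `ServiceRegistry`, its methods and the addresses are hypothetical; real registries additionally need health checks and deregistration.

```python
class ServiceRegistry:
    """Toy registry that hands out registered instances round-robin."""

    def __init__(self):
        self._instances = {}  # service name -> list of addresses
        self._next = {}       # service name -> index of the next instance

    def register(self, name: str, address: str) -> None:
        """Called by a service instance when it launches."""
        self._instances.setdefault(name, []).append(address)

    def resolve(self, name: str) -> str:
        """Called by a client; spreads requests over all instances."""
        instances = self._instances.get(name)
        if not instances:
            raise LookupError(f"no instance registered for {name!r}")
        index = self._next.get(name, 0) % len(instances)
        self._next[name] = index + 1
        return instances[index]


registry = ServiceRegistry()
registry.register("billing", "10.0.0.1:8080")
registry.register("billing", "10.0.0.2:8080")
print(registry.resolve("billing"), registry.resolve("billing"))
```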
Pros / Cons of SOA
Pros:
Multi-language, loosely coupled, independent of vendor or tech
Cons:
No service ecosystem, complex, not easily scalable, less agile through hardcoding and dependencies
What is REST?
architectural STYLE that exposes resources on a networked system
not a protocol or specification
What is REST Resource ?
A thing that:
is unique,
more than just an ID,
provides context,
is reachable within addressable universe (URL / URN)
e.g Website, resume, aircraft, employee, application, printer, song etc
REST: Statelessness
The URI (URL/URN) needs to contain the state within it, or be given it, e.g.:
https://www.google.de/search?q=cloud&ie=utf-8&oe=utf-8&client=firefox-b-ab&…
No client application state should be stored by the server.
Important: Resource state != Application state
What are possible REST: Resource representations?
current state of a resource, e.g. a list of open tickets in XML/JSON/HTML/CSV etc.
metadata of the resource - cover image, reviews, stock-price etc
What does a Safe Operation on REST mean?
Read-only - nothing will change based on the operation
E.g GET
What does an Idempotent Operation on REST mean?
The operation has the same effect no matter how many times it is executed.
E.g PUT, DELETE
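The difference between safe and idempotent operations can be shown on an in-memory resource store. This is a sketch, not an HTTP server: the functions only mimic the semantics of the verbs, and `post` is added as a contrast (it is neither safe nor idempotent) that the card itself does not mention.

```python
import itertools

store = {}                 # resource id -> representation
_ids = itertools.count(1)  # id generator used by post()


def get(resource_id):
    """Safe: reads the state and never changes it."""
    return store.get(resource_id)


def put(resource_id, representation):
    """Idempotent: repeating the call leaves the same final state."""
    store[resource_id] = representation


def delete(resource_id):
    """Idempotent: deleting twice has the same effect as deleting once."""
    store.pop(resource_id, None)


def post(representation):
    """Not idempotent: every call creates another resource."""
    resource_id = f"auto-{next(_ids)}"
    store[resource_id] = representation
    return resource_id
```

Calling `put("ticket-1", ...)` twice with the same body leaves exactly one resource, while calling `post(...)` twice creates two.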
What is Devops?
“DevOps is a set of practices intended to reduce the time between committing a change to a system and the change being placed into normal production, while ensuring high quality.”
What is a microservice architecture?
Software suites made of services that can be independently deployed, with certain common characteristics such as organization around business capability, automated deployment or the decentralized control of data.
How is a microservice architecture built up?
Each user request is satisfied by some sequence of services
Most services are private and not visible on the outside
All services are independently deployable and updatable
Services are organized around business capabilities instead of resources etc
Differences between Monoliths and Microservices?
Monoliths put all functionality into one big process.
Microservices split it by different services.
This makes microservices easier to scale: the monolith must be replicated as a whole, while with microservices individual services can be duplicated.
How do microservices communicate with each other
All Services communicate via interfaces (REST, or lightweight “dumb” pipe messaging e.g RabbitMQ / ZeroMQ)
Patterns in Microservices
To address the common problems, some patterns have been developed.
E.g service-per-container, service-discovery, db-per-service or shared db
Who are the actors in the Cloud
auditor,
broker,
carrier,
consumer,
provider
What does a cloud broker do?
Manages the use, performance and delivery of cloud services. Also negotiates relationships between cloud providers and consumers.
What does a cloud auditor do?
Conducts assessments of cloud services, IT systems, performance and security
What does a cloud carrier do?
Provides connectivity and transport to and from the cloud
What are essential characteristics of cloud?
on-demand self-service
broad network access,
resource pooling
rapid elasticity
measured service
What does On-demand self-service mean in the context of cloud?
Computing capabilities are easily provisioned as needed without human interaction with the service provider
What does Resource pooling mean in the context of cloud?
The provider's resources serve multiple consumers using a multi-tenant model.
What does rapid elasticity mean in the context of cloud?
Quick scaling in/out is possible, even automatically, e.g. via DNS / load balancing
What does Measured service mean in the context of cloud?
Automatic control and optimization of resources
The consumer should make sure to have a process that detects idle resources
What is a dynamic scalability architecture?
A system with predefined scaling conditions which trigger the allocation of IT resources from resource pools.
What are types of dynamic scalability architecture?
horizontal - e.g more instances
vertical - e.g more RAM
relocation - e.g move to different device, that e.g has more I/O capacity
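The three types can be sketched as predefined conditions on monitored metrics. Everything here is an assumption made for illustration: the `Pool` fields, the metric names, the thresholds and the host names do not come from any cloud provider's API.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    instances: int = 1    # horizontal dimension
    ram_gb: int = 2       # vertical dimension
    host: str = "host-a"  # relocation target


def scale(pool: Pool, cpu_load: float, memory_use: float, io_wait: float) -> str:
    """Apply the first predefined scaling condition that matches."""
    if cpu_load > 0.80:       # assumed threshold
        pool.instances += 1   # horizontal: add an instance
        return "horizontal"
    if memory_use > 0.90:     # assumed threshold
        pool.ram_gb *= 2      # vertical: give the instance more RAM
        return "vertical"
    if io_wait > 0.50:        # assumed threshold
        pool.host = "host-b"  # relocation: move to a host with more I/O capacity
        return "relocation"
    return "none"


pool = Pool()
print(scale(pool, cpu_load=0.95, memory_use=0.10, io_wait=0.10), pool)
```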