Web Services 2 Flashcards

1
Q

System Architecture

A

client
web server
application server
database server
DMS

2
Q

Goals and Approaches System Architecture

A
  • Internet-based E-Commerce has made system growth more rapid and dynamic.
  • Improving performance and reliability means providing:
    A. Higher throughput
    B. Lower latency (i.e., response time)
    C. Higher availability
  • Approaches
    a. Scaling network and system infrastructure
    b. Understanding how performance, redundancy, and reliability relate to scalability
    c. Load balancing
    d. Web caching

  • The goals are to make the E-Commerce system faster, more reliable, and always available.
  • The approaches involve strategies like expanding infrastructure, ensuring consistent performance as the system grows, distributing workload evenly, and storing frequently accessed data for quicker access.
  • These efforts ensure that users have a smooth and seamless experience when shopping online.
3
Q

Restricts traffic based on rules and can “protect” the internal network from intruders

A

Firewall

4
Q

Directs traffic to a destination based on the “best” path; can communicate between subnets

A

Router

5
Q

Provides a fast connection between multiple servers on the same subnet

A

Switch

6
Q

Takes incoming requests for one “virtual” server and redirects them to multiple “real” servers

A

Load Balancer

7
Q

SPOF meaning

A

single point of failure

8
Q

Scaling Servers: Two Approaches

A
  1. Multiple smaller servers
  2. Fewer larger servers to add more internal resources
9
Q

Fewer larger servers to add more internal resources

A
  • Add more processors, memory, and disk space
  • Most commonly done with database servers
10
Q

Where to Apply Scalability

A
  • To the network
  • To individual servers
  • Make sure the network has capacity before scaling by adding servers
11
Q
  • People scale because they want better performance
  • But a fast site that goes down because of one component is a Bad Thing
  • Incorporate all three into your design from the beginning; it is more difficult to do after the eBusiness is live
A

Performance, Redundancy, and Scalability

12
Q

Application Service Providers (sites) grow by

A
  • scale up: replacing servers with larger servers
  • scale out: adding extra servers
13
Q

Approaches to Scalability

A
  1. Farming
  2. Clone
  3. RACS
  4. Partitioning
  5. RAPS
14
Q

the collection of all the servers, applications, and data at a particular site.

A

Farming

15
Q
  • A service can be cloned on many replica nodes, each having the same software and data.
  • offers both scalability and availability.
A

Cloning

16
Q

Two Clone Design Styles

A
  1. Shared Nothing
  2. Shared Disc
17
Q

Simpler to implement; scales I/O bandwidth as the site grows.

A

Shared Nothing

18
Q

design is more economical for large or update-intensive databases.

A

Shared Disc

19
Q
  • grows a service by duplicating the hardware and software and dividing the data among the nodes (by object), e.g., mail servers partitioned by mailbox
  • should be transparent to the application; requests to a partitioned service are routed to the partition holding the relevant data
  • does not improve availability, because each piece of data is stored in only one place
  • partitions are implemented as a pack of two or more nodes that provide access to the storage
A

Partition

Partitioning splits up the workload so everything runs smoothly, but it doesn't make the system invincible to failures, since each piece of data is kept in just one spot.
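As a sketch of object-based partitioning in the mailbox example above (the node names are made up), each object can be hashed to exactly one node:

```python
import hashlib

NODES = ["mail-a", "mail-b", "mail-c"]  # hypothetical partition nodes

def partition_for(mailbox: str) -> str:
    """Route a mailbox to exactly one node. Because the data lives only
    on that node, partitioning adds capacity but not availability."""
    digest = hashlib.sha256(mailbox.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# The mapping is deterministic and transparent to the application:
# a request for a mailbox is always routed to the same partition.
print(partition_for("alice"), partition_for("bob"))
```

Any deterministic mapping works; hashing just spreads mailboxes roughly evenly without a lookup table.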

20
Q

What is the problem with load sharing

A
  • Too much load
  • Need to replicate servers
21
Q

Load Sharing Strategies

A
  • flat architecture
    (DNS rotation, switch based)
  • hierarchical architecture
22
Q
  • rotates the IP addresses of a Web site; treats all nodes equally
  • Hot-standby machine (failover); expensive, inefficient
  • Problem: not sufficient for dynamic content
A

Flat Architecture - DNS Rotation

DNS rotation helps balance the load among multiple servers and ensures your website stays online even if one server fails. However, it might not be the best solution for websites that change a lot or show different things to different people.
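As a sketch of the rotation itself (the hostname and addresses below are illustrative): a round-robin DNS server returns the whole record set, rotated one step per query, so successive clients start with different servers.

```python
from collections import deque

# Hypothetical A records for one site.
records = deque(["203.0.113.10", "203.0.113.11", "203.0.113.12"])

def resolve(hostname: str) -> list:
    answer = list(records)
    records.rotate(-1)  # the next query sees a different ordering
    return answer

first = resolve("www.example.com")[0]
second = resolve("www.example.com")[0]
print(first, second)  # two clients start with different servers
```

Note that client-side IP caching defeats the rotation, which is exactly the load-imbalance con of this scheme.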

23
Q

Flat Architecture - DNS Rotation Pros

A

A simple clustering strategy

24
Q

Flat Architecture - DNS Rotation Cons

A

Client-side IP caching: load imbalance, connection to down node

25
Q
  • Switching products
  • Cluster servers behind one virtual IP
  • Distribute workload (load balancing)
    • typically round-robin
  • Failure detection
  • Vendors: Cisco, Foundry Networks, and F5 Labs
  • Problem: not sufficient for dynamic content
A

Flat Architecture - Switch Based

This setup helps manage internet traffic by spreading it out among multiple servers, but it might not be the best fit for content that changes a lot or is personalized for each user.
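A minimal sketch of the virtual-server idea with failure detection, assuming nothing about any vendor's product (the server names are made up):

```python
from itertools import cycle

class VirtualServer:
    """One 'virtual' IP fronting real servers: round-robin distribution,
    skipping nodes that failure detection has marked as down."""
    def __init__(self, servers):
        self._servers = servers
        self._rr = cycle(servers)
        self._down = set()

    def mark_down(self, server):
        # A health checker would call this when a node stops responding.
        self._down.add(server)

    def route(self):
        for _ in range(len(self._servers)):
            server = next(self._rr)
            if server not in self._down:
                return server
        raise RuntimeError("no live servers")

vs = VirtualServer(["web1", "web2", "web3"])
vs.mark_down("web2")
routes = [vs.route() for _ in range(4)]
print(routes)  # web2 is skipped: ['web1', 'web3', 'web1', 'web3']
```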

26
Q

Flat Architectures in General Problems

A
  • Not sufficient for dynamic content
  • Adding/removing nodes is difficult; manual configuration is required
  • Limited load balancing in the switch
27
Q
  • Master/slave architecture
  • Two levels
    • Level I. Master: static and dynamic content
    • Level II. Slave: dynamic content only
A

Hierarchical Architecture

This division ensures that responsibilities are clearly split and that tasks are handled efficiently at different levels.
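A toy sketch of the two-level dispatch (the slave names and file extensions are illustrative):

```python
from itertools import cycle

class Master:
    """Hierarchical (master/slave) dispatch sketch: the master serves
    static files itself and hands dynamic requests to slave nodes."""
    STATIC = (".html", ".css", ".png", ".jpg")

    def __init__(self, slaves):
        self._slaves = cycle(slaves)

    def dispatch(self, path):
        if path.endswith(self.STATIC):
            return ("master", path)        # static content: level I
        return (next(self._slaves), path)  # dynamic content: level II

m = Master(["slave1", "slave2"])
print(m.dispatch("/index.html"))    # handled by the master
print(m.dispatch("/cgi-bin/cart"))  # handed off to a slave
```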

28
Q

Hierarchical Architecture Benefits

A
  • Better failover support (the master restarts a job if a slave fails)
  • Separate dynamic and static content (resource-intensive jobs, such as CGI scripts, run on slaves)

This helps ensure smooth operation and efficiency by distributing tasks among different levels of servers. If one server fails, the others can step in to keep things running, and tasks are divided based on their nature to make the best use of resources.

29
Q

Web performance and scalability issues

A
  • network congestion
  • server overloading
30
Q
  • A large distributed information system
  • Inexpensive and fast access to information
  • Rapid growth of WWW (15% per month)
  • Web performance and scalability issues
A

World Wide Web

31
Q

Web Architecture WWW

A

Web architecture is about how your browser asks for web pages from a server and how the server sends those pages back for you to see.

32
Q
  • Intermediary between clients and Web servers
  • Used to implement a firewall
  • To improve performance, a cache can be placed at the proxy
A

Web Proxy

33
Q

Web Architecture Web Proxy

A

A browser helps you access websites, a proxy can hide your identity or enhance security, a web server stores and serves websites, and a firewall protects your computer or network from threats. Each plays a role in your online experience and in keeping you safe while browsing the internet.

34
Q
  • Caching popular objects is one way to improve Web performance.
  • Web caching at clients, proxies, and servers
A

Web Caching System

35
Q

Advantages of Web Caching

A
  1. Reduces bandwidth consumption (decrease network traffic)
  2. Reduces access latency in the case of cache hit
  3. Reduces the workload of the Web server
  4. Enhances the robustness of the Web service
  5. Usage history collected by Proxy cache can be used to determine the usage patterns and allow the use of different cache replacement and prefetching policies.
36
Q

Disadvantages of Web Caching

A

1. Stale data can be served due to the lack of proper updating
2. Latency may increase in the case of a cache miss
3. A single proxy cache is always a bottleneck
4. A single proxy is a single point of failure
5. Client-side and proxy caches reduce the hit counts seen by the original server (a problem for sites that meter usage)

37
Q

Web Caching Issues

A
  • Cache replacement
  • Prefetching
  • Cache coherency
  • Dynamic data caching
38
Q

Characteristics of Web objects

A

They differ in size, access cost, and access pattern.

39
Q

There are new replacement policies for Web objects:

A
  • key-based
  • cost-based
40
Q

Traditional replacement policies do not work well

A
  1. LRU (Least Recently Used)
  2. LFU (Least Frequently Used)
  3. FIFO (First In, First Out)
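For reference, classic LRU can be sketched in a few lines; it works poorly for web caches precisely because it ignores object size and fetch cost:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used entry."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._store = OrderedDict()

    def get(self, key):
        if key not in self._store:
            return None                   # cache miss
        self._store.move_to_end(key)      # mark as most recently used
        return self._store[key]

    def put(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("/a", "A"); cache.put("/b", "B")
cache.get("/a")         # touch /a so /b becomes the LRU entry
cache.put("/c", "C")    # evicts /b
print(cache.get("/b"))  # None
```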
41
Q

Two Replacement Schemes

A
  1. Key-based replacement policies
  2. Cost-based replacement policies

Key-based replacement policies help decide which items to remove from the cache when room is needed for new ones; each uses its own primary factor, such as size, usage history, or download speed.

42
Q

Cost-based replacement policies

A
  • Use a cost function of factors such as last access time, cache entry time, transfer time cost, and so on
  • Least Normalized Cost Replacement: based on access frequency, transfer time cost, and size
  • Server-assisted scheme: based on fetching cost, size, next request time, and cache prices during request intervals

Cost-based policies help determine which items to keep or remove from the cache based on factors like how often they're used, how long they take to fetch, and any associated costs. Each policy uses different criteria, aiming to optimize cache efficiency and performance.
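A toy normalized-cost score along these lines (the fields and the exact formula are illustrative, not taken from any specific scheme): keep objects that are accessed often and expensive to re-fetch, penalize large ones.

```python
def retention_value(obj):
    """Higher score = more worth keeping. Combines access frequency,
    re-fetch cost, and size, loosely in the spirit of
    Least Normalized Cost Replacement."""
    return (obj["access_count"] * obj["fetch_cost_s"]) / obj["size_bytes"]

objects = [
    {"url": "/logo.png",   "access_count": 50, "fetch_cost_s": 0.1, "size_bytes": 2_000},
    {"url": "/report.pdf", "access_count": 2,  "fetch_cost_s": 2.0, "size_bytes": 5_000_000},
]

# Evict the object with the lowest retention value first.
victim = min(objects, key=retention_value)
print(victim["url"])  # the big, rarely used PDF goes first
```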

44
Q

Prefetching

A
  • The benefit from caching alone is limited.
  • The maximum cache hit rate is typically no more than 40-50%.
  • To increase the hit rate, anticipate future document requests and prefetch those documents into caches.
  • Documents to prefetch:
    • those considered popular at servers
    • those predicted to be accessed by the user soon, based on the access pattern
  • Prefetching can reduce client latency at the expense of increased network traffic.

  • Prefetching is like getting ready for a party by preparing snacks before guests arrive.
  • By predicting which documents users will want and fetching them in advance, prefetching reduces wait times for users, but it also means more data traveling over the network.
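One simple way to predict "accessed soon" from the access pattern is to count which document usually follows each document; a sketch (the URLs are made up):

```python
from collections import Counter, defaultdict

class SuccessorPredictor:
    """Predicts the next document from past access pairs; a cache could
    prefetch the top prediction while the current page is being read."""
    def __init__(self):
        self._next = defaultdict(Counter)
        self._last = None

    def record(self, url):
        if self._last is not None:
            self._next[self._last][url] += 1  # count the transition
        self._last = url

    def predict(self, url):
        counts = self._next[url]
        return counts.most_common(1)[0][0] if counts else None

p = SuccessorPredictor()
for url in ["/home", "/news", "/home", "/news", "/home", "/sports"]:
    p.record(url)
print(p.predict("/home"))  # '/news' (seen twice, vs '/sports' once)
```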
45
Q
  • A cache may provide users with stale documents.
  • HTTP commands for cache coherence:
    • GET: retrieves a document given its URL
    • Conditional GET: a GET combined with the If-Modified-Since header
    • Pragma: no-cache: this header indicates that the object must be reloaded from the server
    • Last-Modified: returned with every GET response; indicates the last modification time of the document
  • Two possible semantics:
    • Strong cache consistency
    • Weak cache consistency
A

Cache Coherence

  • Cache coherence ensures everyone accessing the same document gets the most recent version.
  • HTTP mechanisms like GET, Conditional GET, and Pragma: no-cache help achieve this by fetching updated versions of documents as needed, and Last-Modified tracks when a document was last updated.
  • There are two approaches to cache consistency: strong, where everyone gets the same version immediately, and weak, where updates may take longer to propagate.
46
Q

HTTP commands for cache coherence

A
  1. GET: retrieves a document given its URL
  2. Conditional GET: a GET combined with the If-Modified-Since header
  3. Pragma: no-cache: this header indicates that the object must be reloaded from the server
  4. Last-Modified: returned with every GET response; indicates the last modification time of the document
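The server side of a Conditional GET can be sketched as follows (the resource record and date values are made up): if the document has not changed since the client's cached copy, the server answers 304 Not Modified with no body.

```python
from email.utils import parsedate_to_datetime

def handle_get(resource, if_modified_since=None):
    """Conditional GET sketch: compare Last-Modified against the
    client's If-Modified-Since date."""
    last_modified = resource["last_modified"]
    if if_modified_since is not None:
        cached = parsedate_to_datetime(if_modified_since)
        current = parsedate_to_datetime(last_modified)
        if current <= cached:
            return 304, None, last_modified  # Not Modified: reuse cache
    return 200, resource["body"], last_modified

page = {"body": "<html>...</html>",
        "last_modified": "Mon, 01 Jan 2024 00:00:00 GMT"}

status, body, _ = handle_get(page, "Tue, 02 Jan 2024 00:00:00 GMT")
print(status)  # 304: the client's copy is still fresh
```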
47
Q

Strong cache consistency

A
  • Client validation (polling-every-time)
    • the client sends an If-Modified-Since header with each access to the resource
    • the server responds with a Not Modified message if the resource has not changed
  • Server invalidation
    • whenever a resource changes, the server sends invalidations to all clients that may have cached the resource
    • the server must keep track of which clients to notify
    • the server may send invalidations to clients that are no longer caching the resource

  • Strong consistency ensures that clients always have the latest version of a webpage.
  • With client validation, clients ask the server if a webpage has changed since the last visit; if it has, the server sends the updated version, and if not, the server replies that it's still the same, saving unnecessary data transfer.
  • With server invalidation, the server notifies every client that might have cached the page whenever it changes, which can include clients that no longer cache it at all.
48
Q

Weak Cache Consistency

A
  • Adaptive TTL (time-to-live)
    • adjusts the TTL based on the document's lifetime (age): if a file has not been modified for a long time, it tends to stay unchanged
    • this approach can be shown to keep the probability of stale documents within reasonable bounds (< 5%)
    • most proxy servers use this mechanism
    • no strong guarantee as to document staleness
  • Piggyback Invalidation
    • Piggyback Cache Validation (PCV): whenever a client communicates with a server, it piggybacks a list of cached, but potentially stale, resources from that server for validation
    • Piggyback Server Invalidation (PSI): a server piggybacks on a reply to a client the list of resources that have changed since the client's last access
    • if access intervals are small, PSI is good; if the gaps are long, PCV is good

  • Adaptive TTL adjusts how long files stay in the cache based on how often they change.
  • Piggybacking adds extra information to messages exchanged between clients and servers to help keep caches up to date.
  • These mechanisms keep the probability of serving stale documents low, so the cache stays relatively fresh.
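Adaptive TTL can be sketched as "TTL proportional to age" (the factor and cap below are illustrative; real proxies tune these values):

```python
def adaptive_ttl(age_seconds, factor=0.1, cap=86400):
    """A document unchanged for a long time is assumed likely to stay
    unchanged, so its TTL grows with its age, up to a cap."""
    return min(factor * age_seconds, cap)

# A file untouched for 10 days gets a far longer TTL than one
# modified an hour ago.
print(adaptive_ttl(10 * 86400))  # capped at one day
print(adaptive_ttl(3600))        # a few minutes
```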
49
Q

Dynamic Data Caching

A
  • Non-cacheable data
    • authenticated data, server-side dynamically generated data, etc.
    • how to make more data cacheable
    • how to reduce the latency of access to non-cacheable data
  • Active Cache
    • allows servers to supply cache applets that are attached to documents
    • the cache applets are invoked on cache hits to finish the necessary processing without contacting the server
    • bandwidth savings at the expense of CPU costs
    • due to significant CPU overhead, user access latencies can be much larger than without caching dynamic objects
  • Web server accelerator
    • resides in front of one or more Web servers
    • provides an API that allows applications to explicitly add, delete, and update cached data
    • the API allows both static and dynamic data to be cached
    • an example: the official Web site for the 1998 Olympic Winter Games, where updated Web pages reflecting new content were made available within seconds
  • Data Update Propagation (DUP, IBM Watson) is used to improve performance
    • maintains data dependence information between cached objects and the underlying data that affects their values
    • upon any change to the underlying data, determines which cached objects are affected by the change
    • the affected cached objects are then either invalidated or updated
    • with DUP, the 1998 Olympic Winter Games official Web site achieved a cache hit rate of about 100%; without DUP, the 1996 Olympic Games site achieved 80%

  • Dynamic data caching is about finding ways to speed up access to data that changes frequently, like personalized information.
  • Active Cache adds little helpers to cached documents to do some processing without contacting the server each time; while this saves bandwidth, it can also increase user access times due to the extra processing overhead.
  • DUP spreads updates across the caching system quickly and efficiently, so visitors get the most up-to-date information without lag.
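The core DUP idea, a dependence graph from underlying data to cached objects, can be sketched as follows (the page and table names are made up; real DUP can also update affected objects rather than only invalidating them):

```python
from collections import defaultdict

class DependenceGraph:
    """DUP-style sketch: track which cached pages depend on which
    underlying data items, and invalidate only the affected pages."""
    def __init__(self):
        self._affects = defaultdict(set)  # data item -> dependent pages
        self.cache = {}

    def add(self, page, content, depends_on):
        self.cache[page] = content
        for item in depends_on:
            self._affects[item].add(page)

    def data_changed(self, item):
        # Invalidate every cached page whose value derives from `item`.
        affected = self._affects[item]
        for page in affected:
            self.cache.pop(page, None)
        return affected

g = DependenceGraph()
g.add("/medals.html", "<html>...</html>", depends_on=["medals_table"])
g.add("/schedule.html", "<html>...</html>", depends_on=["events_table"])
g.data_changed("medals_table")
print(sorted(g.cache))  # only the unaffected page remains cached
```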
50
Q

Web Caching and Replication Products: Design Goals

A
  • Performance
    • minimize document retrieval latency
  • Scalability
    • reduce the amount of data transferred
    • balance bandwidth use
  • Manageability
    • easy to deploy and support
    • maintain backward compatibility with existing client software
  • Availability
    • maximize document availability (fault tolerance)
  • Flexibility
    • run on multiple hardware and OS platforms; support multiple protocols (HTTP, FTP, etc.)
    • retain transparency to the user