System Architecture Flashcards

1
Q

What is RESTful?

A

Representational State Transfer. It is a set of design principles for making network communication more scalable and flexible.

2
Q

Discuss RESTful Fielding Constraint: Client-server

A

Client-server
The first Fielding Constraint specifies that the network must be made up of clients and servers. A server is a computer that has resources of interest, and a client is a computer that wants to interact with the resources stored on the server. When you browse the Internet, your computer is acting as a client and sends HTTP requests to a server in order to access and manipulate information. A RESTful system has to operate in the client-server model, even if a component sometimes acts like a client and sometimes acts like a server.

A non-RESTful alternative to client-server architecture is event-based integration architecture. In this model, each component continuously broadcasts events while listening for pertinent events from other components. There’s no one-to-one communication, only broadcasting and eavesdropping. REST requires one-to-one communication, so event-based integration architecture would not be RESTful.

3
Q

Discuss RESTful Fielding Constraint: Stateless

A

Stateless does not mean that servers and clients have no state; it means that they do not need to keep track of each other’s state. When a client is not interacting with the server, the server has no idea of its existence. The server also does not keep a record of past requests, so each request is treated as standalone and must carry everything the server needs to handle it.

4
Q

Discuss RESTful Fielding Constraint: Uniform Interface

A

The “uniform interface” constraint ensures that there is a common language between servers and clients that allows each part to be swapped out or modified without breaking the entire system. This is achieved through 4 sub-constraints: identification of resources, manipulation of resources through representations, self-descriptive messages, and hypermedia.

5
Q

Discuss RESTful Fielding Constraint: Caching

A

Caching refers to the constraint that server responses should be labelled as either cacheable or non-cacheable. Caching occurs when the client stores previous responses it received from the server, so that when that data is needed again, it can save a round trip over the network by using the cached data. The ability to cache is made possible by the interface constraint of “self-descriptive messages”, since the client knows that all the relevant data about a single resource is being sent in one response. It doesn’t have to worry about accidentally only caching part of the information it needs, and missing other parts.
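
To make the cacheable / non-cacheable label concrete, here is a minimal sketch of a client-side cache. It assumes the label arrives in a Cache-Control header using the max-age and no-store directives; the ResponseCache class and the fetch callable are hypothetical names invented for this illustration.

```python
import time


def max_age_seconds(cache_control):
    """Return max-age in seconds, or 0 when the response is not cacheable."""
    directives = [d.strip().lower() for d in cache_control.split(",")]
    if "no-store" in directives:
        return 0
    for directive in directives:
        if directive.startswith("max-age="):
            value = directive.split("=", 1)[1]
            return int(value) if value.isdigit() else 0
    return 0


class ResponseCache:
    """Toy client-side cache keyed by URL (illustrative only)."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch   # hypothetical callable: url -> (headers dict, body)
        self._clock = clock
        self._store = {}      # url -> (expiry time, body)

    def get(self, url):
        entry = self._store.get(url)
        if entry and entry[0] > self._clock():
            return entry[1]   # fresh cached copy: no network round trip
        headers, body = self._fetch(url)
        max_age = max_age_seconds(headers.get("Cache-Control", ""))
        if max_age:
            self._store[url] = (self._clock() + max_age, body)
        return body
```

A response labelled no-store is fetched again on every call, while one labelled max-age=60 is served from the cache until it expires.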

6
Q

Discuss RESTful Fielding Constraint: Layered System

A

Layered system refers to the fact that there can be more components than just servers and clients. This means there can be more than one layer in the system. However, each component is constrained to only see and interact with the very next layer. A proxy is an additional component, and it relays HTTP requests to servers or other proxies. Proxies can be useful for load balancing and security checks. A proxy acts like a server to the initial client that sends the request, and then acts like a client when it relays that request. A gateway is another additional component and it translates an HTTP request into another protocol, propagates that request, and then translates the response it receives back into HTTP. A client can simply treat a gateway as a regular server. An example use case for gateways is a system that needs to download a file from an FTP server.

7
Q

Discuss RESTful Fielding Constraint: Code on demand

A

Code on demand is the only optional constraint and refers to the ability of a server to send executable code to the client. This is what happens with HTML’s <script> tag. When the HTML document is loaded, the browser automatically fetches the JavaScript from the server and executes it locally.

8
Q

RESTful Fielding constraint: Uniform Interface - Identification of resources

A

The first sub-constraint of “uniform interface” affects the way that resources are identified. In REST terminology, a resource could be anything: an HTML document, an image, information about a particular user, etc. Each resource must be uniquely identified by a stable identifier. A “stable” identifier does not change across interactions, and it does not change even when the state of the resource changes. If a resource does get moved to another identifier, the server should give the client a redirect response (for example, 301 Moved Permanently) that includes a link to the new location of the resource.

The Web uses URIs to identify resources, and HTTP as its communication standard. To get a resource stored on a server, a client makes an HTTP GET request to the URI that identifies that resource. Every time you type an address into your browser, your browser makes a GET request to that URI. If it receives a 200 OK response and an HTML document back, then it renders the page in the window so that you can view it.

9
Q

RESTful Fielding constraint: Uniform Interface - manipulation of resources through representations

A

The second sub-constraint of “uniform interface” says that the client manipulates resources through sending representations to the server–usually a JSON object containing the content that it would like to add, delete, or modify. In REST, the server has full control of the resources, and is responsible for making any changes. When a client wishes to make changes to resources, it sends the server a representation of what it would like the resulting resource to look like. The server takes the request as a suggestion, but still has ultimate control.

Let’s think about the case of a blog on the Web. When a user makes a new blog post, their computer wants to tell the server that a new blog post needs to be added. To do this, it sends an HTTP POST or PUT request with the content for the new blog post. The server sends back a response indicating whether the post was created, or if there was a problem. In a non-REST world, the client may literally be sending instructions for operations such as add a new line and make the title of the blog “What RESTful Actually Means”, instead of simply sending a representation of what it would like the final product to look like.
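
As a rough sketch of “sending a representation”, the snippet below builds (but does not send) a PUT request whose body is a JSON description of the desired blog post. The URL, the post id, and the field names are hypothetical, chosen only for illustration.

```python
import json
from urllib.request import Request


def build_put(url, representation):
    """Build (but do not send) a PUT request carrying a JSON representation."""
    body = json.dumps(representation).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical blog URL and fields: the client describes the final state it
# wants, rather than sending step-by-step editing instructions.
req = build_put(
    "http://www.example.com/posts/42",
    {"title": "What RESTful Actually Means", "body": "..."},
)
```

The server remains free to accept, adjust, or reject the representation it receives.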

10
Q

RESTful Fielding constraint: Uniform Interface - self-descriptive messages

A

Self-descriptive messages are another constraint that ensures a uniform interface between clients and servers. A self-descriptive message is one that contains all the information that the recipient needs to understand it. There should not be additional information in separate documentation or in another message.

To understand how this applies to the Web, let’s analyze a set of HTTP requests and responses.

When a user types http://www.example.com in the address bar of their web browser, the browser sends the following HTTP request:

GET / HTTP/1.1
Host: www.example.com

This request is self-descriptive because it tells the server which HTTP method is being used and which protocol version (HTTP/1.1).

The server’s response is self-descriptive in the same way: it tells the client how to interpret the message body by indicating that Content-Type is text/html. The client has everything it needs in this single message to handle it appropriately.
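
To illustrate how a client can rely on a single message, here is a small sketch that reads everything it needs from one raw response. The response text is made up for this example (the header values and body are assumptions), and parse_response is a hypothetical helper, not a library function.

```python
# Illustrative response; the header values and the body are assumptions.
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "Content-Length: 20\r\n"
    "\r\n"
    "<html>Hello!</html>\n"
)


def parse_response(raw):
    """Split a raw HTTP response into (status code, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


code, headers, body = parse_response(RAW_RESPONSE)
# The Content-Type header alone tells the client to treat the body as HTML.
is_html = headers["content-type"].startswith("text/html")
```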

11
Q

RESTful Fielding constraint: Uniform Interface - hypermedia

A

Hypermedia is a fancy word for data sent from the server to the client that contains information about what the client can do next–in other words, what further requests it can make. In REST, servers should be sending only hypermedia to clients.

HTML is a type of hypermedia. To understand this better, consider two elements from the HTML in a server response:

● <a href="http://www.recurse.com">Check out the Recurse Center!</a> tells the client that it should make a GET request to http://www.recurse.com if the user clicks on the link.

● <img src="http://www.example.com/awesome-pic.jpg"></img> tells the client to immediately make a GET request to http://www.example.com/awesome-pic.jpg so it can display the image to the user.

When a system has identifiers for each resource, manipulates them through sending representations from the client to the server, and has messages that are self-descriptive and composed of hypermedia, it is said to have a uniform interface. This is perhaps the most important attribute of a RESTful system, as it allows clients to adapt intelligently to changes. A server can change the underlying implementation without breaking all the clients that interact with it, because each interaction is self-contained, identifiers do not change when the underlying state or implementation changes, and hypermedia tells clients which state transitions they can make next. The server does not need to remember anything about the client or do anything special to cater to it, and vice versa.

12
Q

What happens when you type in “google.com” - 8 key points:

A
  1. You type maps.google.com into the address bar of your browser.
  2. The browser checks the cache for a DNS record to find the corresponding IP address of maps.google.com.
  3. If the requested URL is not in the cache, the ISP’s DNS server initiates a DNS query to find the IP address of the server that hosts maps.google.com.
  4. The browser initiates a TCP connection with the server.
  5. The browser sends an HTTP request to the web server.
  6. The server handles the request and sends back a response.
  7. The server sends out an HTTP response.
  8. The browser displays the HTML content (for HTML responses, which are the most common).
13
Q

The browser checks the cache for a DNS record to find the corresponding IP address of maps.google.com. What does this mean?

A

DNS (Domain Name System) is a database that maps the name of a website (the domain name in its URL) to the IP address it links to. Every domain name that resolves on the internet maps to at least one IP address. The IP address belongs to the computer that hosts the server of the website we are requesting. For example, www.google.com has at one point had an IP address of 209.85.227.104, so you could reach it by typing http://209.85.227.104 in your browser. DNS is a list of domain names and their IP addresses, just like a phone book is a list of names and their corresponding phone numbers.

The main purpose of DNS is human-friendly navigation. You could access a website by typing its IP address in your browser, but imagine having to remember a different set of numbers for every website you regularly access. It is easier to remember the name of the website and let DNS do the work of mapping it to the correct IP address.

In order to find the DNS record, the browser checks four caches.

● First, it checks the browser cache. The browser maintains a repository of DNS records for a fixed duration for websites you have previously visited. So, it is the first place to run a DNS query.

● Second, the browser checks the OS cache. If the record is not found in the browser cache, the browser makes a system call (e.g. gethostbyname or getaddrinfo) to your computer’s OS to fetch the record, since the OS also maintains a cache of DNS records.

● Third, it checks the router cache. If the record is not found on your computer, the browser communicates with the router, which maintains its own cache of DNS records.

● Fourth, it checks the ISP cache. If all previous steps fail, the browser moves on to the ISP. Your ISP maintains its own DNS server, which includes a cache of DNS records that the browser checks as a last hope of finding your requested URL.

You may wonder why there are so many caches maintained at so many levels. Although our information being cached somewhere doesn’t make us feel very comfortable when it comes to privacy, caches are important for regulating network traffic and improving data transfer times.
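
The layered lookup described above can be sketched as a loop over caches, falling back to a full query only on a miss. This is a toy model: the caches are plain dictionaries, the full_query callable is hypothetical, and the address used is from a range reserved for documentation.

```python
def resolve(hostname, caches, full_query):
    """Check each cache in order; fall back to a full DNS query on a miss.

    caches is an ordered list of (layer name, dict) pairs, nearest layer first.
    """
    for layer, cache in caches:
        if hostname in cache:
            return cache[hostname], layer
    address = full_query(hostname)
    for _, cache in caches:
        cache[hostname] = address   # populate every layer for next time
    return address, "dns query"


# Hypothetical, empty caches for the four layers named in the card.
caches = [("browser", {}), ("os", {}), ("router", {}), ("isp", {})]
first = resolve("maps.google.com", caches, lambda host: "192.0.2.10")
second = resolve("maps.google.com", caches, lambda host: "192.0.2.10")
```

The first call misses every layer and triggers the query; the second is answered by the browser cache. In real Python code, socket.getaddrinfo hands the lookup to the operating system's resolver instead.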

14
Q

If the requested URL is not in the cache, ISP’s DNS server initiates a DNS query to find the IP address of the server that hosts maps.google.com. Que?

A

As mentioned earlier, in order for my computer to connect with the server that hosts maps.google.com, I need the IP address of maps.google.com. The purpose of a DNS query is to search multiple DNS servers on the internet until it finds the correct IP address for the website. This type of search is called a recursive search since the search will continue repeatedly from DNS server to DNS server until it either finds the IP address we need or returns an error response saying it was unable to find it.

In this situation, we would call the ISP’s DNS server a DNS recursor whose responsibility is to find the proper IP address of the intended domain name by asking other DNS servers on the internet for an answer. The other DNS servers are called name servers since they perform a DNS search based on the domain architecture of the website domain name.

Many website URLs we encounter today contain a third-level domain, a second-level domain, and a top-level domain. Each of these levels has its own name server, which is queried during the DNS lookup process.

For maps.google.com, the DNS recursor first contacts the root name server. The root name server redirects it to the .com name server. The .com name server redirects it to the google.com name server. The google.com name server finds the matching IP address for maps.google.com in its DNS records and returns it to your DNS recursor, which sends it back to your browser.

These requests are sent using small data packets which contain information such as the content of the request and the IP address it is destined for (the IP address of the DNS recursor). These packets travel through multiple pieces of networking equipment between the client and the server before they reach the correct DNS server. This equipment uses routing tables to figure out the fastest way for each packet to reach its destination. If these packets get lost, you’ll get a request failed error. Otherwise, they will reach the correct DNS server, grab the correct IP address, and come back to your browser.
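
The root → .com → google.com walk can be modelled with a few dictionaries. This is only a toy: the server names and the address are made up, and a real recursor speaks the DNS protocol over the network rather than reading local tables.

```python
# Toy delegation data; the server names and the address are made up.
ROOT = {"com": "com-nameserver"}
ZONES = {
    "com-nameserver": {"google.com": "google-nameserver"},
    "google-nameserver": {"maps.google.com": "192.0.2.44"},
}


def recursor_lookup(hostname):
    """Follow referrals from the root down to the authoritative name server."""
    labels = hostname.split(".")
    trail = ["root"]

    server = ROOT.get(labels[-1])                  # root refers us to the TLD server
    if server is None:
        raise LookupError(f"unknown top-level domain: {labels[-1]}")
    trail.append(server)

    second_level = ".".join(labels[-2:])
    server = ZONES[server].get(second_level)       # TLD server refers us onward
    if server is None:
        raise LookupError(f"unknown domain: {second_level}")
    trail.append(server)

    address = ZONES[server].get(hostname)          # authoritative answer
    if address is None:
        raise LookupError(f"no record for {hostname}")
    return address, trail
```

Calling recursor_lookup("maps.google.com") returns the address together with the list of servers that were asked along the way.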

15
Q

Browser initiates a TCP connection with the server. Huh?

A

Once the browser receives the correct IP address, it builds a connection with the server at that IP address to transfer information. Browsers use internet protocols to build such connections. There are a number of different protocols that can be used, but TCP is the most common one for HTTP requests.

In order to transfer data packets between your computer (the client) and the server, a TCP connection must be established. This is done using a process called the TCP/IP three-way handshake, a three-step process in which the client and the server exchange SYN (synchronize) and ACK (acknowledge) messages.

  1. Client machine sends a SYN packet to the server over the internet asking if it is open for new connections.
  2. If the server has open ports that can accept and initiate new connections, it’ll respond with an ACKnowledgment of the SYN packet using a SYN/ACK packet.
  3. The client will receive the SYN/ACK packet from the server and will acknowledge it by sending an ACK packet.

Then a TCP connection is established for data transmission!
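
The three steps can be written out as data to see how the acknowledgements line up. This sketch adds one detail not stated in the card: each side picks an initial sequence number (ISN), and an ACK carries the other side's number plus one. The function name and the tuple layout are assumptions for illustration; in real programs the operating system performs the handshake, for example when socket.create_connection is called.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three segments exchanged, as (sender, flags, seq, ack)."""
    syn = ("client", "SYN", client_isn, None)
    # The server acknowledges the client's SYN and sends its own SYN at once.
    syn_ack = ("server", "SYN/ACK", server_isn, client_isn + 1)
    # The client acknowledges the server's SYN; the connection is now open.
    ack = ("client", "ACK", client_isn + 1, server_isn + 1)
    return [syn, syn_ack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
```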

16
Q

The browser sends an HTTP request to the web server.

A

Once the TCP connection is established, it is time to start transferring data! The browser will send a GET request asking for the maps.google.com web page. If you’re entering credentials or submitting a form, this could be a POST request. This request will also contain additional information such as browser identification (User-Agent header), the types of responses that it will accept (Accept header), and connection headers asking the server to keep the TCP connection alive for additional requests. It will also pass information taken from cookies the browser has in store for this domain.
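
A request like the one described can be assembled as plain text. The sketch below is illustrative only: build_get_request is a hypothetical helper, and the User-Agent value and the cookie are made up.

```python
def build_get_request(host, path="/", cookies=None):
    """Assemble the text of a simple HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "User-Agent: example-browser/1.0",   # hypothetical browser identification
        "Accept: text/html",                 # response types the client will accept
        "Connection: keep-alive",            # ask to reuse the TCP connection
    ]
    if cookies:
        pairs = "; ".join(f"{name}={value}" for name, value in cookies.items())
        lines.append(f"Cookie: {pairs}")
    # A blank line marks the end of the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


request_text = build_get_request("maps.google.com", cookies={"session": "abc"})
```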

17
Q

The server handles the request and sends back a response.

A

The server runs a web server (e.g. Apache, IIS) which receives the request from the browser and passes it to a request handler to read it and generate a response. The request handler is a program (written in ASP.NET, PHP, Ruby, etc.) that reads the request, its headers, and cookies to check what is being requested, and also updates the information on the server if needed. Then it assembles a response in a particular format (JSON, XML, HTML).

18
Q

The server sends out an HTTP response.

A

The server response contains the web page you requested as well as the status code, compression type (Content-Encoding), how to cache the page (Cache-Control), any cookies to set, privacy information, etc.

Example HTTP server response:

The first line of an HTTP response shows a status code. This is quite important, as it tells us the status of the response. There are five classes of status, identified by the first digit of the numerical code.

● 1xx indicates an informational message only

● 2xx indicates success of some kind

● 3xx redirects the client to another URL

● 4xx indicates an error on the client’s part

● 5xx indicates an error on the server’s part

So, if you encountered an error you can take a look at the HTTP response to check what type of status code you have received.
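
The five classes map directly onto the first digit of the code, which a short sketch can show. The status_class helper is a hypothetical name; HTTPStatus comes from Python's standard library and supplies the usual reason phrase.

```python
from http import HTTPStatus

CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Return the class of an HTTP status code based on its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return CLASSES[code // 100]


summary = f"{HTTPStatus(404).phrase}: {status_class(404)}"
```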

19
Q

The browser displays the HTML content (for HTML responses, which are the most common).

A

The browser displays the HTML content in phases. First, it renders the bare-bones HTML skeleton. Then it checks the HTML tags and sends out GET requests for additional elements on the web page, such as images, CSS stylesheets, JavaScript files, etc. These static files are cached by the browser so it doesn’t have to fetch them again the next time you visit the page. At the end, you’ll see maps.google.com appearing in your browser.

That’s it!

Although this seems like a very tedious, prolonged process, we know that it takes less than a few seconds for a web page to render after we hit enter on our keyboard. All of these steps happen within milliseconds, before we can even notice. I sincerely hope this article helps you answer the question “What happens when you type a URL in the browser and press enter?”.

20
Q

What is DNS and how does it work?

A

The Domain Name System (DNS) is often referred to as the backbone of the internet. It’s run by many engineers and their organizations, and it ultimately shapes the future of the internet.

The internet is more of a design philosophy … if all parties agree on a protocol, the data gets sent seamlessly.

All devices on the internet have a number: an IP address.

How it works:
  1. A user asks their browser to visit freecodecamp.com.
  2. The browser queries a DNS Resolver (usually their ISP): “where’s freecodecamp.com?”
  3. The DNS Resolver queries the Root servers (which have a big important list that keeps this information): “where is .COM?” They reply with Verisign.
  4. The DNS Resolver then queries Verisign: “where is freecodecamp.com?” Verisign replies with the nameservers ns1.cloudflare.com and the IP address 192.168.178.1.
  5. Hosting servers are queried with the IP address: “Give me the files for IP address 192.168.178.1 (please)”.
  6. Website files are delivered and rendered on the page so the user can learn to code…or whatever they were doing.

21
Q

What’s the biggest registry in the world?

A

Verisign is by far the biggest registry in the world, running .com, .net, .cc, .tv, and .name. It shows you, in a nice way, how the protocol works through the sequential queries and responses across the DNS structure.

22
Q

Discuss Packets - Routers - Reliability

How is data delivered reliably?

A

Data travels on the internet in an indirect fashion.
The way the data travels need not be a fixed path. It goes from one computer to another in a “packet of information”. Similar to driving in traffic, it takes the least congested route.

The computer breaks large files into packets. Each packet has the internet address of where it came from and where it is going.

Internet Protocol: Every router keeps track of the different paths available and chooses the cheapest available path for each packet of data.

Fault tolerant: The network keeps sending packets even if something goes wrong.

What if we request data and want to make sure that 100% of it is delivered?
TCP (Transmission Control Protocol) manages the sending and receiving of packets. Data is often broken up into many packets. TCP does a full inventory and sends back acknowledgements of each packet received.

If all packets are there, TCP will “sign them off”. Otherwise, it won’t, and the sender will resend the missing packets.

TCP and router systems are scalable (8 devices or 8 billion). The more routers we add, the more reliable the internet becomes.

In short, the data connects, communicates, and collaborates based on the agreed-upon standards for how data is sent.
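
The acknowledge-and-resend idea can be simulated in a few lines. This is a simplified sketch, not how TCP is actually implemented: packets are numbered chunks of bytes, loss is scripted through the hypothetical drop_first_attempt parameter, and anything not acknowledged in one round is sent again in the next.

```python
def send_reliably(data, packet_size, drop_first_attempt=()):
    """Deliver data over a lossy channel by resending unacknowledged packets."""
    packets = {
        seq: data[start:start + packet_size]
        for seq, start in enumerate(range(0, len(data), packet_size))
    }
    received = {}                  # receiver's buffer, keyed by sequence number
    unacked = set(packets)         # packets the sender has no acknowledgement for
    to_drop = set(drop_first_attempt)
    rounds = 0
    while unacked:
        rounds += 1
        for seq in sorted(unacked, reverse=True):   # arrive out of order on purpose
            if seq in to_drop:
                to_drop.discard(seq)                # lost in transit: no acknowledgement
                continue
            received[seq] = packets[seq]
        unacked -= set(received)                    # acknowledgements reach the sender
    # The receiver reorders by sequence number before handing the data over.
    reassembled = b"".join(received[seq] for seq in sorted(received))
    return reassembled, rounds


message, rounds_needed = send_reliably(b"hello world!", 4, drop_first_attempt={1})
```

Here the middle packet is lost once, so delivery takes two rounds, and the data still arrives complete and in order.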

23
Q

Discuss TCP/IP:
Transmission Control Protocol
Internet Protocol

A

Internet traffic has to be routed to the correct place. There are several layers and concepts:

1) Application (browser - protocols: HTTP, SMTP for email)
2) Transport (TCP, UDP for games) - talks to the application layer through a port.
3) Port - where the application layer and the transport layer communicate. EX: HTTP traffic uses port 80 by default.
4) Packets - TCP chops up data into small packets.
5) Header - TCP packets have a header which contains instructions on how to reassemble the packets.
6) Internet/network layer - adds the origin and destination IP addresses so it knows where the data is coming from and where it is going.
7) Packet switching - allows data to take a different, cheaper route.
8) Link - Wi-Fi
9) Physical - Ethernet cable

** Your IP address tells sites where you are visiting from; a VPN can mask it. (EX: TunnelBear)

24
Q

UDP vs TCP

A
5-layer structure:
Application
Transport: UDP, TCP
Network
Link
Physical

Why can 2 applications use the same connection?
We have about 65k ports on our computer, and 1 app can use multiple ports if it wants.

UDP advantages -
Packets are smaller.
Connectionless - doesn’t need a connection to send data.
More control over when data is sent out.

When UDP detects corruption, it might keep the section but put up a warning. It does not compensate for lost packets, nor does it guarantee in-order packet delivery. No congestion control.

UDP is lightweight, but less reliable. TCP is more reliable.

TCP is connection based, can do in-order delivery. They can arrive out of order, but TCP rearranges them.

TCP reliability features:
3-way handshake
Delivery acknowledgements
Retransmission

TCP downsides - needs a bigger header (at least 20 bytes, versus 8 bytes for UDP)
- data doesn’t always get sent out immediately.
(A Skype conversation would have lots of latency - too slow for things like live video streaming)
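
The header sizes can be checked by laying out the fixed fields of each header. This sketch assumes the minimum TCP header with no options; the field comments summarize the standard layouts, and the constant names are made up for this example.

```python
import struct

# "!" means network byte order with no padding; H = 2 bytes, I = 4 bytes.
UDP_HEADER = struct.Struct("!HHHH")       # source port, destination port, length, checksum
TCP_HEADER = struct.Struct("!HHIIHHHH")   # ports, sequence number, acknowledgement number,
                                          # data offset + flags, window, checksum, urgent pointer

udp_bytes = UDP_HEADER.size
tcp_bytes = TCP_HEADER.size
# Fraction by which the UDP header is smaller than the minimum TCP header.
udp_saving = 1 - udp_bytes / tcp_bytes
```

Under these assumptions the UDP header is 8 bytes and the minimum TCP header is 20 bytes, so the UDP header is 60% smaller.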

UDP makes more sense for things like video streaming. “Message oriented” - sends data in distinct chunks.

Uses for TCP - when data loss can’t be tolerated.

  • File transfers
  • Remote access
  • Some firewalls block UDP for security reasons.
Uses for UDP -
  • Less overhead is needed
  • Send delay is undesirable
  • Data loss can be masked
  • DNS lookups
  • No need to create and close a connection.
25
Q

HTTP Methods:

A

1 GET
The GET method is used to retrieve information from the given server using a given URI. Requests using GET should only retrieve data and should have no other effect on the data.

2 HEAD
Same as GET, but transfers the status line and header section only.

3 POST
A POST request is used to send data to the server, for example, customer information, file upload, etc. using HTML forms.

4 PUT
Replaces all current representations of the target resource with the uploaded content.

5 DELETE
Removes all current representations of the target resource given by a URI.

6 CONNECT
Establishes a tunnel to the server identified by a given URI.

7 OPTIONS
Describes the communication options for the target resource.

8 TRACE
Performs a message loop-back test along the path to the target resource.

26
Q

HTTP Status Code:

200 OK

A

The request has succeeded. The information returned with the response is dependent on the method used in the request, for example:

GET: an entity corresponding to the requested resource is sent in the response;
HEAD: the entity-header fields corresponding to the requested resource are sent in the response without any message-body;
POST: an entity describing or containing the result of the action;
TRACE: an entity containing the request message as received by the end server.
Wikipedia
Standard response for successful HTTP requests. The actual response will depend on the request method used. In a GET request, the response will contain an entity corresponding to the requested resource. In a POST request the response will contain an entity describing or containing the result of the action.

27
Q

HTTP Status Code:

201 Created

A

The request has been fulfilled and resulted in a new resource being created. The newly created resource can be referenced by the URI(s) returned in the entity of the response, with the most specific URI for the resource given by a Location header field. The response SHOULD include an entity containing a list of resource characteristics and location(s) from which the user or user agent can choose the one most appropriate. The entity format is specified by the media type given in the Content-Type header field. The origin server MUST create the resource before returning the 201 status code. If the action cannot be carried out immediately, the server SHOULD respond with 202 (Accepted) response instead.

A 201 response MAY contain an ETag response header field indicating the current value of the entity tag for the requested variant just created, see section 14.19.

Wikipedia
The request has been fulfilled and resulted in a new resource being created.

Successful creation occurred (via either POST or PUT). Set the Location header to contain a link to the newly-created resource (on POST). Response body content may or may not be present.

28
Q

HTTP Status Code:

204 No content

A

The server has fulfilled the request but does not need to return an entity-body, and might want to return updated metainformation. The response MAY include new or updated metainformation in the form of entity-headers, which if present SHOULD be associated with the requested variant.

If the client is a user agent, it SHOULD NOT change its document view from that which caused the request to be sent. This response is primarily intended to allow input for actions to take place without causing a change to the user agent’s active document view, although any new or updated metainformation SHOULD be applied to the document currently in the user agent’s active view.

The 204 response MUST NOT include a message-body, and thus is always terminated by the first empty line after the header fields.

Wikipedia
The server successfully processed the request, but is not returning any content.

Status when wrapped responses (e.g. JSEND) are not used and nothing is in the body (e.g. DELETE).

29
Q

HTTP Status Code:

304 Not modified

A

If the client has performed a conditional GET request and access is allowed, but the document has not been modified, the server SHOULD respond with this status code. The 304 response MUST NOT contain a message-body, and thus is always terminated by the first empty line after the header fields.

The response MUST include the following header fields:

● Date, unless its omission is required by section 14.18.1. If a clockless origin server obeys these rules, and proxies and clients add their own Date to any response received without one (as already specified by [RFC 2068], section 14.19), caches will operate correctly.

● ETag and/or Content-Location, if the header would have been sent in a 200 response to the same request

● Expires, Cache-Control, and/or Vary, if the field-value might differ from that sent in any previous response for the same variant

If the conditional GET used a strong cache validator (see section 13.3.3), the response SHOULD NOT include other entity-headers. Otherwise (i.e., the conditional GET used a weak validator), the response MUST NOT include other entity-headers; this prevents inconsistencies between cached entity-bodies and updated headers.

If a 304 response indicates an entity not currently cached, then the cache MUST disregard the response and repeat the request without the conditional.

If a cache uses a received 304 response to update a cache entry, the cache MUST update the entry to reflect any new field values given in the response.

Wikipedia
Indicates the resource has not been modified since last requested. Typically, the HTTP client provides a header like the If-Modified-Since header to provide a time against which to compare. Using this saves bandwidth and reprocessing on both the server and client, as only the header data must be sent and received in comparison to the entirety of the page being re-processed by the server, then sent again using more bandwidth of the server and client.

Used for conditional GET calls to reduce bandwidth usage. If used, the Date, Content-Location, and ETag headers must be set to what they would have been on a regular GET call. There must be no body on the response.
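
A server-side decision between 200 and 304 might look like the sketch below. It assumes the client validates with an If-None-Match header holding a previously received ETag (the card's Wikipedia text mentions If-Modified-Since, which works along the same lines); conditional_get and the sample ETag value are hypothetical.

```python
def conditional_get(request_headers, current_etag, body):
    """Return (status, headers, body) for a GET that may carry If-None-Match."""
    headers = {"ETag": current_etag}
    if request_headers.get("If-None-Match") == current_etag:
        # The client's copy is still current: send headers only, no body.
        return 304, headers, b""
    return 200, headers, body


page = b"<html>Hello!</html>"
first_visit = conditional_get({}, '"v1"', page)
revisit = conditional_get({"If-None-Match": '"v1"'}, '"v1"', page)
after_change = conditional_get({"If-None-Match": '"v1"'}, '"v2"', page)
```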

30
Q

HTTP Status Code:

400 Bad Request

A

The request could not be understood by the server due to malformed syntax. The client SHOULD NOT repeat the request without modifications.

Wikipedia
The request cannot be fulfilled due to bad syntax.

General error when fulfilling the request would cause an invalid state. Domain validation errors, missing data, etc. are some examples.

31
Q

HTTP Status Code:

401 Unauthorized

A

The request requires user authentication. The response MUST include a WWW-Authenticate header field (section 14.47) containing a challenge applicable to the requested resource. The client MAY repeat the request with a suitable Authorization header field (section 14.8). If the request already included Authorization credentials, then the 401 response indicates that authorization has been refused for those credentials. If the 401 response contains the same challenge as the prior response, and the user agent has already attempted authentication at least once, then the user SHOULD be presented the entity that was given in the response, since that entity might include relevant diagnostic information. HTTP access authentication is explained in “HTTP Authentication: Basic and Digest Access Authentication”.

Wikipedia
Similar to 403 Forbidden, but specifically for use when authentication is possible but has failed or not yet been provided. The response must include a WWW-Authenticate header field containing a challenge applicable to the requested resource. See Basic access authentication and Digest access authentication.

Error code response for missing or invalid authentication token.

32
Q

HTTP Status Code:

403 Forbidden

A

The server understood the request, but is refusing to fulfill it. Authorization will not help and the request SHOULD NOT be repeated. If the request method was not HEAD and the server wishes to make public why the request has not been fulfilled, it SHOULD describe the reason for the refusal in the entity. If the server does not wish to make this information available to the client, the status code 404 (Not Found) can be used instead.

Wikipedia
The request was a legal request, but the server is refusing to respond to it. Unlike a 401 Unauthorized response, authenticating will make no difference.

Error code for user not authorized to perform the operation or the resource is unavailable for some reason (e.g. time constraints, etc.).

33
Q

HTTP Status Code:

404 Not Found

A

The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent. The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address. This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.

Wikipedia
The requested resource could not be found but may be available again in the future. Subsequent requests by the client are permissible.

Used when the requested resource is not found, whether it doesn’t exist or if there was a 401 or 403 that, for security reasons, the service wants to mask.
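
How a service might choose between 401, 403 and 404 can be sketched like this. Everything here is hypothetical (the `TOKENS` and `DOCUMENTS` tables, the `hidden` flag and the `status_for` helper); it only illustrates the decision order described on these cards.

```python
# Hypothetical in-memory data, for illustration only.
TOKENS = {"token-alice": "alice", "token-bob": "bob"}
DOCUMENTS = {
    "doc1": {"owner": "alice", "hidden": False},
    "doc2": {"owner": "alice", "hidden": True},
}

def status_for(token, doc_id):
    """Pick a status code for a read request on a document."""
    user = TOKENS.get(token)
    if user is None:
        return 401  # missing or invalid authentication token
    doc = DOCUMENTS.get(doc_id)
    if doc is None:
        return 404  # resource does not exist
    if doc["owner"] != user:
        # Authenticated but not allowed; mask as 404 if existence is sensitive.
        return 404 if doc["hidden"] else 403
    return 200
```

In this sketch, bob asking for doc1 gets 403, while bob asking for the hidden doc2 gets 404, so he cannot tell whether it exists.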

34
Q

HTTP Status Code:

409 Conflict

A

The request could not be completed due to a conflict with the current state of the resource. This code is only allowed in situations where it is expected that the user might be able to resolve the conflict and resubmit the request. The response body SHOULD include enough information for the user to recognize the source of the conflict. Ideally, the response entity would include enough information for the user or user agent to fix the problem; however, that might not be possible and is not required.

Conflicts are most likely to occur in response to a PUT request. For example, if versioning were being used and the entity being PUT included changes to a resource which conflict with those made by an earlier (third-party) request, the server might use the 409 response to indicate that it can’t complete the request. In this case, the response entity would likely contain a list of the differences between the two versions in a format defined by the response Content-Type.

Wikipedia
Indicates that the request could not be processed because of conflict in the request, such as an edit conflict.

Whenever a resource conflict would be caused by fulfilling the request. Duplicate entries and deleting root objects when cascade-delete is not supported are a couple of examples.
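
A versioned PUT that answers 409 on a stale update might look like the sketch below. The `store` layout, the `put_item` helper and the idea of the client sending the version it last saw are assumptions made for illustration.

```python
# Hypothetical versioned store: each item carries a version counter.
store = {"item1": {"version": 3, "data": {"name": "widget"}}}

def put_item(item_id, base_version, new_data):
    """Replace an item only if the client edited the current version."""
    current = store.get(item_id)
    if current is not None and current["version"] != base_version:
        # Someone else changed the item first: report the conflict.
        return 409, {"error": "version conflict",
                     "current_version": current["version"]}
    next_version = 1 if current is None else current["version"] + 1
    store[item_id] = {"version": next_version, "data": new_data}
    return 200, store[item_id]
```

The 409 body carries the current version so the client has enough information to re-fetch, merge and resubmit.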

35
Q

HTTP Status Code:

500 Internal Server Error

A

The server encountered an unexpected condition which prevented it from fulfilling the request.

Wikipedia
A generic error message, given when an unexpected condition was encountered and no more specific message is suitable.

The general catch-all error when the server side throws an exception.

36
Q

What is the difference between HTTPS and HTTP?

A

HTTPS is HTTP sent over an encrypted TLS (formerly SSL) connection; plain HTTP is sent unencrypted.

The connection is protected with a combination of asymmetric and symmetric key cryptography.
Asymmetric cryptography works like an open box with a snap lock: anyone can put a message in and lock the box (encrypt), but only the holder of the key can open it again (decrypt).
In technical terms, the box is known as the public key and the key to open it is known as the private key.

A certificate authority vouches that the public key really belongs to the server, so the client knows whose box it is locking.

Client and server use the box method (asymmetric cryptography) only to agree on a key, which is then used to encrypt the messages with symmetric cryptography (remember the Caesar cipher?).

This way they get the best of both worlds: the reliability of asymmetric cryptography and the efficiency of symmetric cryptography.
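
The hybrid idea can be sketched with toy numbers. This is not how TLS is implemented and is not secure: the RSA key is the tiny textbook pair (p=61, q=53), the symmetric cipher is an XOR keystream invented for illustration, and all function names are hypothetical.

```python
import hashlib
import secrets

# Textbook RSA numbers (p=61, q=53); far too small for real use.
N, E, D = 3233, 17, 2753

def rsa_encrypt(data: bytes) -> list:
    """Lock each byte in the 'box' using the public key (E, N)."""
    return [pow(b, E, N) for b in data]

def rsa_decrypt(blocks: list) -> bytes:
    """Open the 'box' using the private key (D, N)."""
    return bytes(pow(c, D, N) for c in blocks)

def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256-derived keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# Client: pick a random session key and send it locked with the public key.
session_key = secrets.token_bytes(16)
wrapped_key = rsa_encrypt(session_key)

# Server: unlock the session key with the private key.
server_key = rsa_decrypt(wrapped_key)

# Both sides now use the fast symmetric cipher with the shared key.
ciphertext = xor_stream(session_key, b"GET /account HTTP/1.1")
plaintext = xor_stream(server_key, ciphertext)
```

The slow asymmetric step runs once, on 16 bytes of key material; every later message uses only the cheap symmetric step.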

37
Q

What is a “Man in the middle attack”?

A

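An attack in which a third party secretly places itself between the client and the server and intercepts (and possibly alters) the traffic. If the key is transferred along with the hypertext and the third party intercepts it, they can read everything the key was meant to protect.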

38
Q

HTTP Methods: GET vs POST

A

Two commonly used methods for a request-response between a client and server are: GET and POST.

GET - Requests data from a specified resource
POST - Submits data to be processed to a specified resource

The GET Method
Note that the query string (name/value pairs) is sent in the URL of a GET request:

/test/demo_form.php?name1=value1&name2=value2
Some other notes on GET requests:

GET data is displayed in the URL
GET requests can be cached
GET requests remain in the browser history
(e.g. going back simply re-requests the page)
GET requests can be bookmarked
GET requests should never be used when dealing with sensitive data
GET requests have length restrictions
GET requests should be used only to retrieve data
GET data is restricted to ASCII characters

The POST Method
Note that the query string (name/value pairs) is sent in the HTTP message body of a POST request:

POST /test/demo_form.php HTTP/1.1
Host: w3schools.com

name1=value1&name2=value2
Some other notes on POST requests:

POST data is not displayed in the URL
POST requests are normally not cached
POST requests do not remain in the browser history
(e.g. on reload, the browser must resubmit the data)
POST requests cannot be bookmarked
POST requests have no restrictions on data length
POST requests have no restrictions on data type; binary data is allowed too
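
The difference in where the name/value pairs travel can be shown with Python's standard library. The requests below are only constructed, never sent, and the host example.com is a placeholder rather than the host used on this card.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"name1": "value1", "name2": "value2"}
query = urlencode(params)  # name/value pairs as a query string

# GET: the pairs are appended to the URL, and there is no body.
get_request = Request("http://example.com/test/demo_form.php?" + query)

# POST: the same pairs travel in the message body instead.
post_request = Request(
    "http://example.com/test/demo_form.php",
    data=query.encode("ascii"),
    method="POST",
)
```

Inspecting `get_request.full_url` and `post_request.data` shows the same string in two different places.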

39
Q

HTTP Methods: PUT vs PATCH

A

The PUT method requests that the enclosed entity be stored under the supplied Request-URI. If the Request-URI refers to an already existing resource, the enclosed entity SHOULD be considered as a modified version of the one residing on the origin server. If the Request-URI does not point to an existing resource, and that URI is capable of being defined as a new resource by the requesting user agent, the origin server can create the resource with that URI.

The PATCH method requests that a set of changes described in the request entity be applied to the resource identified by the Request-URI.

PUT handles it by replacing the entire entity, while PATCH only updates the fields that were supplied, leaving the others alone.
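
A minimal sketch of the difference, treating a resource as a dictionary. The helpers `apply_put` and `apply_patch` and the sample `user` record are hypothetical, and the patch here is a simple field merge, which is only one of several possible PATCH formats.

```python
def apply_put(stored: dict, payload: dict) -> dict:
    """PUT: the payload replaces the stored entity entirely."""
    return dict(payload)

def apply_patch(stored: dict, payload: dict) -> dict:
    """PATCH (simple merge): only supplied fields change."""
    return {**stored, **payload}

user = {"name": "Ada", "email": "ada@example.com", "role": "admin"}
after_put = apply_put(user, {"email": "new@example.com"})
after_patch = apply_patch(user, {"email": "new@example.com"})
```

After the PUT, the name and role fields are gone because they were not supplied; after the PATCH, they are still there.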

40
Q

What are the technical pros and cons of localStorage, sessionStorage, session and cookies, and when would I use one over the other?

A

localStorage and sessionStorage
localStorage and sessionStorage are relatively new APIs (meaning not all legacy browsers will support them) and are near identical (both in APIs and capabilities) with the sole exception of persistence. sessionStorage (as the name suggests) is only available for the duration of the browser session (and is deleted when the tab or window is closed) - it does however survive page reloads (source DOM Storage guide - Mozilla Developer Network).

localStorage and sessionStorage are perfect for persisting non-sensitive data needed within client scripts between pages (for example: preferences, scores in games). The data stored in localStorage and sessionStorage can easily be read or changed from within the client/browser so should not be relied upon for storage of sensitive or security related data within applications.

Cookies - On the positive side cookies can have a degree of protection applied from security risks like Cross-Site Scripting (XSS)/Script injection by setting an HTTP only flag which means modern (supporting) browsers will prevent access to the cookies and values from JavaScript (this will also prevent your own, legitimate, JavaScript from accessing them).

As cookies are used for authentication purposes and persistence of user data, all cookies valid for a page are sent from the browser to the server for every request to the same domain. This includes the original page request, any subsequent Ajax requests, and all images, stylesheets, scripts and fonts. For this reason cookies should not be used to store large amounts of information. Browsers may also impose limits on the size of information that can be stored in cookies. Typically cookies are used to store identifying tokens for authentication, session and advertising tracking. The tokens are typically not human-readable information in and of themselves, but opaque or encrypted identifiers linked to your application or database.
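
Setting the HTTP-only flag can be sketched with Python's standard `http.cookies` module. The cookie name and value are made up for illustration, and the Secure flag is an extra assumption not discussed on this card.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session_id"] = "opaque-token-123"   # hypothetical identifier
cookie["session_id"]["httponly"] = True     # hide from JavaScript
cookie["session_id"]["secure"] = True       # send over HTTPS only
cookie["session_id"]["path"] = "/"

# The header line a server would add to its response.
header = cookie.output()
```

The resulting Set-Cookie header carries the HttpOnly attribute, which supporting browsers enforce by keeping the cookie out of `document.cookie`.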

41
Q

Cross-site Scripting (XSS) Attack

A

Cross-site Scripting (XSS) refers to a client-side code injection attack wherein an attacker can inject malicious scripts (also commonly referred to as a malicious payload) into a legitimate website or web application. XSS is amongst the most rampant of web application vulnerabilities and occurs when a web application makes use of unvalidated or unencoded user input within the output it generates.

By leveraging XSS, an attacker does not target a victim directly. Instead, an attacker would exploit a vulnerability within a website or web application that the victim would visit, essentially using the vulnerable website as a vehicle to deliver a malicious script to the victim’s browser.

While XSS can be taken advantage of within VBScript, ActiveX and Flash (although now considered legacy or even obsolete), unquestionably, the most widely abused is JavaScript – primarily because JavaScript is fundamental to most browsing experiences.

42
Q

How does XSS work?

A

In order to run malicious JavaScript code in a victim’s browser, an attacker must first find a way to inject a payload into a web page that the victim visits. Of course, an attacker could use social engineering techniques to convince a user to visit a vulnerable page with an injected JavaScript payload.

In order for an XSS attack to take place the vulnerable website needs to directly include user input in its pages. An attacker can then insert a string that will be used within the web page and treated as code by the victim’s browser.
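
The difference between including user input directly and encoding it first can be sketched as follows. The two render functions are hypothetical; `html.escape` is the standard-library encoder, and real applications would normally rely on a templating engine that escapes by default.

```python
import html

def render_comment_unsafe(comment: str) -> str:
    """Vulnerable: user input is placed in the page as-is."""
    return "<p>" + comment + "</p>"

def render_comment_safe(comment: str) -> str:
    """Encoded: special characters become harmless entities."""
    return "<p>" + html.escape(comment) + "</p>"

payload = "<script>alert(1)</script>"
unsafe_page = render_comment_unsafe(payload)
safe_page = render_comment_safe(payload)
```

In `unsafe_page` the browser would see a real script element; in `safe_page` it sees only text.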

43
Q

Discuss the anatomy of an XSS attack.

A

An XSS attack needs three actors — the website, the victim and the attacker.

The attacker’s goal is to impersonate the victim by stealing the victim’s cookie. Sending the cookie to a server the attacker controls can be achieved in a variety of ways, one of which is for the attacker to run JavaScript in the victim’s browser, through an XSS vulnerability, that reads document.cookie and sends it to that server.

1) The attacker injects a payload into the website’s database by submitting a vulnerable form with some malicious JavaScript.
2) The victim requests the web page from the website.
3) The website serves the victim’s browser the page with the attacker’s payload as part of the HTML body.
4) The victim’s browser executes the malicious script inside the HTML body. In this case it sends the victim’s cookie to the attacker’s server. The attacker now simply needs to extract the victim’s cookie when the HTTP request arrives at the server, after which the attacker can use the victim’s stolen cookie for impersonation.

44
Q

What tags are used in an XSS attack?

A

<script> tag
The <script> tag is the most straight-forward XSS payload. A script tag can either reference external JavaScript code, or embed the code within the script tag.

<body> tag
An XSS payload can be delivered inside <body> tag by using the onload attribute or other more obscure attributes such as the background attribute.

<img> tag
Some browsers will execute JavaScript when it is found in the <img> tag.

<iframe> tag
The <iframe> tag allows the embedding of another HTML page into the parent page. An iframe can contain JavaScript; however, it’s important to note that JavaScript in an iframe loaded from a different origin does not have access to the DOM of the parent’s page, due to the browser’s same-origin policy. Even so, iframes are still a very effective means of pulling off phishing attacks.

<input> tag
In some browsers, if the type attribute of the <input> tag is set to image, it can be manipulated to embed a script.

<link> tag
The <link> tag, which is often used to link to external style sheets, could contain a script.

<table> tag
The background attribute of the table and td tags can be exploited to refer to a script instead of an image.

<div> tag
The <div> tag, similar to the <table> and <td> tags can also specify a background and therefore embed a script.

<object> tag
The <object> tag can be used to include a script from an external site.

45
Q

CSRF Attacks

A

“Cross-Site Request Forgery”. CSRF only allows state changes to occur, so it cannot be used for attacks that require the attacker to receive the contents of the HTTP response. In this respect, CSRF attacks are less lethal than XSS attacks.

CSRF is an attack where a malicious entity tricks a victim into performing actions on behalf of the attacker.

The actions being perpetrated by the attacker will have a greater effect if the victim performing them is at an administrative level rather than a low-level user with fewer privileges. CSRF attacks take advantage of the fact that a web application completely trusts a user once it can confirm that the user is indeed who they say they are.

46
Q

What are the two main parts of a CSRF attack?

A

1) Trick the victim into clicking a link or loading a page
2) Send a crafted request in the victim’s browser

When a request is made to a web application, the browser will check if it has any Cookies associated with the web application’s origin that will need to be sent with the request. If it is the case, this authentication data, such as Cookies for example, will be included in any request being sent to this web application. This is done to provide the victim with a seamless experience, so they would not be required to re-authenticate for every page that they visit. If the website approves of the Cookie being sent and considers the session as still being valid, an attacker may use CSRF to send requests as if the victim is sending them, without the website being able to distinguish between requests being sent by the attacker or by the victim, since requests are always being sent by the victim with their own Cookie.

A CSRF attack simply takes advantage of the fact that the browser sends the Cookie to the web application automatically with each and every request.

47
Q

How is a CSRF carried out?

A

The victim needs to be authenticated (logged-in, for example). It’s not as big a deal if the victim is utilizing publicly accessible things like a contact form. The problem arises when a victim with additional privileges would be performing actions that are not accessible to everyone, which is when CSRF attacks are utilised.

48
Q

CSRF - using GET request

A

GET requests are by their very nature meant to be safe (read-only), which means that they should not be used to perform state changes; sending a GET request should not change any data. Of course, some web applications still use GET instead of the more appropriate POST to perform state changes for operations such as changing a password or adding a user.

When the malicious link that we mentioned earlier is clicked, the attacker may direct the victim to their own malicious web application that will execute a script that will in turn trigger the victim’s browser to send an illegal request. This request is defined as illegal since the victim is not aware that it is being sent, even though it appears to the web server as if the user sent it, because it includes the necessary Cookies that the web server needs to verify that a victim is who they say they are.

Imagine if www.example.com processed fund transfers through a GET request that includes two parameters: the amount that is to be transferred and the identifier of the account to which the money will be transferred. The example below shows a legitimate URL, which requests that the web application transfer 1,000,000 units of the appropriate currency to Fred’s account.

http://example.com/transfer?amount=1000000&account=Fred

The request will include the Cookie for the authenticated user, so there is no need to define which account the money will be transferred from. This means that if a normal user accessed this URL, they would be required to authenticate, so that the web application knows from which account the funds will be withdrawn. Now that we know how this request can be used for legitimate reasons, we can figure out how to trick a victim into sending the request that the attacker wants, while authenticated as the victim.

If the web application being exploited is expecting a GET request, then the attacker can include an <img> tag on their own website that, instead of linking to an image, sends a request to the bank’s web application:

<img src="http://example.com/transfer?amount=1000000&account=Fred">
The browser, under normal circumstances, will automatically send the Cookies that are related to that web application, therefore allowing the victim to perform a state change on behalf of the attacker, where the state change is a transfer of funds.
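
One narrow mitigation for this specific variant is for the server to refuse state changes that arrive via GET. The handler below is a hypothetical sketch (the `balances` table and `handle_transfer` are made up), and it only blocks the image-tag trick; it does not stop CSRF via POST.

```python
# Hypothetical account balances, for illustration only.
balances = {"victim": 5000, "Fred": 0}

def handle_transfer(method: str, amount: int, account: str) -> int:
    """Perform a transfer only for POST; GET must never change state."""
    if method != "POST":
        return 405  # Method Not Allowed
    balances["victim"] -= amount
    balances[account] += amount
    return 200
```

A request triggered by an image tag is always a GET, so in this sketch it is answered with 405 and the balances stay untouched.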

49
Q

CSRF - POST requests

A

Attackers can also make use of the POST method to send requests. In fact, most state-changing requests will be done through POST requests, which send any parameters and values in the request body and not in the URL itself as in a GET request. This means that exploitable web applications will be more likely to accept POST instead of GET requests when a state change is involved.

Tricking a victim into sending a POST request may be slightly trickier than sending a GET request. With the GET request, the attacker only needed the victim to send a request that had all the necessary information in the URL, whereas POST requests require a request body to be appended to the request. With a bit of JavaScript, an attacker can lure a victim onto their malicious web application, that as soon as the webpage loads, the illegal POST request will be sent automatically.

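Take the following example, which uses the onload function to automatically send a request from the victim’s browser as soon as the page loads:

<body onload="document.csrf.submit()">
<form name="csrf" action="http://example.com/transfer" method="POST">
<input type="hidden" name="amount" value="1000000">
<input type="hidden" name="account" value="Fred">
</form>
</body>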

As soon as the page loads, the onload function ensures that the form named csrf is submitted, which will in turn send the POST request. Taking a look at the csrf form, we can see that it includes two parameters and their values, that have been statically set up by the attacker, where example.com will identify the request as legitimate, since it will include the victim’s Cookies.

50
Q

How do we prevent CSRF attacks?

A

Anti-CSRF Tokens
The most popular implementation to prevent Cross-site Request Forgery (CSRF), is to make use of a challenge token that is associated with a particular user and can be found as a hidden value in every state changing form which is present on the web application. This token, called a CSRF Token or a Synchronizer Token, works as follows:

1) The web server generates a token
2) The token is statically set as a hidden input on the protected form
3) The form is submitted by the user
4) The token is included in the POST data
5) The web application compares the token generated by the web application with the token sent in through the request
If these tokens match, the request is valid, as it has been sent through the actual form in the web application
If there is no match, the request will be considered as illegal and will be rejected.
This protects the form against Cross-site Request Forgery (CSRF) attacks, because an attacker crafting a request would also need to guess the anti-CSRF token in order to trick a victim into sending a valid request. In addition, the token should be invalidated after some time and after the user logs out.

For the anti-CSRF mechanism to be implemented properly, it will also need to be cryptographically secure, so that the token itself cannot be easily guessed, which is a possibility if the token is being generated based on a predictable pattern.
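
The synchronizer-token flow can be sketched with Python's standard `secrets` and `hmac` modules. The in-memory `session_tokens` table and both function names are hypothetical, and expiry and logout invalidation are left out; in practice a framework's built-in protection is the better choice.

```python
import hmac
import secrets

# Hypothetical server-side storage: session id -> issued CSRF token.
session_tokens = {}

def issue_csrf_token(session_id: str) -> str:
    """Generate an unpredictable token to embed as a hidden form input."""
    token = secrets.token_urlsafe(32)
    session_tokens[session_id] = token
    return token

def is_valid_csrf(session_id: str, submitted) -> bool:
    """Compare the submitted token with the one issued for this session."""
    expected = session_tokens.get(session_id)
    if expected is None or submitted is None:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

A forged request carries the victim's Cookie but not the token from the form, so `is_valid_csrf` rejects it.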

It is also recommended that you make use of the readily available option in popular frameworks that will defend against CSRF attacks for you, meaning that you should refrain from rolling your own anti-CSRF mechanism, if possible. This allows less room for error, while making the implementation quicker and easier.

CSRF attacks are only possible because Cookies are always sent with any request made to the origin that the Cookie belongs to. To counter this, a flag can be set on a Cookie, turning it into a same-site Cookie. A same-site Cookie is a Cookie that the browser only sends if the request is being made from the same site that the Cookie belongs to. Broadly, two pages are considered same-site if they share the same registrable domain (for example, www.example.com and app.example.com); this is a looser test than same-origin, where the protocol, port (if applicable) and host must all be the same.

A current limitation of same-site Cookies is that not all browsers support them, and older browsers simply ignore the flag. Because of this limitation, same-site Cookies are at the moment better suited as an additional defense-in-depth layer, used alongside other CSRF protection mechanisms.
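
Setting the same-site flag can be sketched with the standard `http.cookies` module, which supports the `samesite` attribute from Python 3.8 onward. The cookie name and value are made up, and choosing `Strict` over `Lax` is an assumption for illustration.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session_id"] = "opaque-token-123"    # hypothetical identifier
cookie["session_id"]["samesite"] = "Strict"  # never send cross-site
cookie["session_id"]["httponly"] = True

# The header line a server would add to its response.
header = cookie.output()
```

With `Strict`, a supporting browser omits the cookie from requests initiated by other sites, so a forged cross-site request arrives unauthenticated.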

51
Q

CSRF: General

A

Cookies are intrinsically vulnerable as they are automatically sent with each request, allowing attackers to easily craft malicious requests leading to CSRF. Although the attacker cannot obtain the response body or the Cookie itself, the attacker can perform actions with the victim’s elevated rights. The impact of a CSRF vulnerability is also related to the privilege of the victim, whose Cookie is being sent with the attacker’s request. While data retrieval is not the main scope of a CSRF attack, state changes will surely have an adverse effect on the web application being exploited.