Security - CL1 Flashcards
Authentication & Authorization
Authentication is the act of proving an assertion, such as the identity of a computer system user. In contrast with identification, the act of indicating a person or thing’s identity, authentication is the process of verifying that identity. It might involve validating personal identity documents, verifying the authenticity of a website with a digital certificate, determining the age of an artifact by carbon dating, or ensuring that a product or document is not counterfeit.
Authorization is the function of specifying access rights/privileges to resources, which is related to information security and computer security in general and to access control in particular. More formally, “to authorize” is to define an access policy. For example, human resources staff are normally authorized to access employee records and this policy is usually formalized as access control rules in a computer system. During operation, the system uses the access control rules to decide whether access requests from (authenticated) consumers shall be approved (granted) or disapproved (rejected). Resources include individual files or an item’s data, computer programs, computer devices and functionality provided by computer applications. Examples of consumers are computer users, computer software, and other hardware on the computer.
Links:
https://en.wikipedia.org/wiki/Authentication
https://en.wikipedia.org/wiki/Authorization
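The access control rules described above can be sketched as a simple policy lookup. This is a minimal illustration, not a production authorization system; all role, resource, and operation names here are hypothetical.

```python
# Minimal sketch of authorization as access control rules (all names
# hypothetical). After a principal is authenticated, the system consults
# its policy to grant or reject each access request.

ACCESS_POLICY = {
    # (role, resource) -> set of permitted operations
    ("hr_staff", "employee_records"): {"read", "update"},
    ("engineer", "employee_records"): set(),  # no access
    ("engineer", "source_code"): {"read", "update"},
}

def is_authorized(role: str, resource: str, operation: str) -> bool:
    """Return True only if the access policy permits the request."""
    return operation in ACCESS_POLICY.get((role, resource), set())

# HR staff may read employee records; engineers may not.
assert is_authorized("hr_staff", "employee_records", "read")
assert not is_authorized("engineer", "employee_records", "read")
```

Note that the lookup defaults to an empty set of permissions: a request for any (role, resource) pair not explicitly listed in the policy is rejected, which is the fail-safe default you generally want in access control.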
Confidentiality
The goal of confidentiality is to keep the contents of a transient communication or data on temporary or persistent storage secret.
If Alice and Bob want to exchange some information that they do not want Eve to see, the challenge is to make sure that Eve is not able to understand that information, even if Eve can see the bits that are being transferred over the network.
Suppose Eve is an eavesdropper who may be able to listen in on the contents of Alice and Bob’s secret conversations. If Alice and Bob are communicating over a network, then Eve is able to see the bits—the zeros and ones—that make up Alice and Bob’s conversation go back and forth over the wires (or over the air, in the case that Alice and Bob are using a wireless network).
A real-world Eve might employ various existing software tools to eavesdrop. On an Ethernet network that uses a hub (as opposed to a switch), for instance, each computer is capable of actually seeing all the network traffic that is generated and received by any other computer. A computer’s operating system is typically responsible for only allowing applications running
on that computer to access traffic that is directed to or from that computer, and filtering out traffic that originates or is destined for other computers on the same network. However, if a user has root or administrator privileges on a computer, that user can use a software package such as Ethereal, tcpdump, or dsniff to access network traffic. These software packages are run in a “promiscuous mode,” in which the operating system provides the software access to all traffic on the network instead of providing filtered traffic that is just directed to or from the computer on which it is running. While such packages exist to help network administrators and engineers debug problems, they can be used for eavesdropping. Attackers may not have
administrator privileges, but can obtain them by first getting access to some account, and then exploiting software vulnerabilities in the operating system to gain such privileges.
Usually, some kind of encryption technology is used to achieve confidentiality. Most encryption technologies use a key to encrypt the communication between Alice and Bob. A key is a secret sequence of bits that Alice and Bob know (or share) that is not known to potential attackers. A key may be derived from a password that is known to both Alice and Bob. An encryption algorithm will take the key as input, in addition to the message that Alice wants to transfer to Bob, and will scramble the message in a way that is mathematically dependent on
the key. The message is scrambled such that when Eve sees the scrambled communication, she will not be able to understand its contents. Bob can use the key to unscramble the message by computing the mathematical inverse of the encryption algorithm. If Alice and Bob use good encryption technology and keep the key secret, then Eve will not be able to understand their communication.
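The encrypt/decrypt round trip described above can be illustrated with a toy XOR scheme. To be clear, this is only a sketch of the idea that scrambling depends mathematically on a shared key; a short repeating XOR key is trivially breakable, and real systems use vetted ciphers such as AES.

```python
# Toy illustration of key-based encryption (NOT secure -- a short
# repeating XOR key is easily broken; real systems use ciphers like AES).
# Alice and Bob share a secret key; Eve sees only the scrambled bytes.

def xor_crypt(data: bytes, key: bytes) -> bytes:
    """Scramble (or unscramble) data by XORing it with a repeating key.
    XOR is its own inverse, so the same function encrypts and decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = b"shared-secret"                     # known only to Alice and Bob
ciphertext = xor_crypt(b"Meet at noon", key)   # what Eve sees on the wire
plaintext = xor_crypt(ciphertext, key)         # Bob inverts with the key
assert plaintext == b"Meet at noon"
assert ciphertext != b"Meet at noon"           # Eve cannot read it directly
```

Decryption here is literally “computing the mathematical inverse of the encryption algorithm,” as the text says; XOR happens to be its own inverse, which keeps the sketch short.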
Data Integrity
When Alice and Bob exchange messages, they do not want a third party such as Mallory to be able to modify the contents of their messages.
Mallory has capabilities similar to Eve, but Eve is a passive eavesdropper while Mallory is an active eavesdropper. Though Eve is able to see the zeros and ones go by, she is unable to modify them. Eve therefore cannot modify any part of the conversation. On the other hand, Mallory has the ability to modify, inject, or delete the zeros and ones, and thus change the contents of the conversation—a potentially more significant kind of attack. Mallory is sometimes referred to as a man in the middle.
Alice and Bob can use an integrity check to detect if an active eavesdropper like Mallory has modified the messages in an attempt to corrupt or disrupt their conversation. That is, Alice and Bob want to protect the message integrity of their conversation. One approach that they can take to ensure message integrity is to add redundancy to their messages.
Consider a hypothetical scenario in which Alice wants to send an “I owe you” (IOU) message such as “I, Alice, owe you, Bob, $1.00,” and Mallory has the ability to change only one character in the message. If Mallory wants Alice to be in more debt to Bob, she could change the message to “I, Alice, owe you, Bob, $1000” by changing the dot to a zero. On the other hand, if Mallory wants to cheat Bob out of his dollar, she could change the message to “I, Alice, owe you, Bob, $0.00.” Assuming Mallory can only change a single character in a message, Alice could add redundancy to her message by repeating the dollar amount twice so that Bob could detect tampering. For example, if Alice sends the message “I, Alice, owe you, Bob, $1.00. Confirm, $1.00,” then Mallory would not be able to change both of the dollar values in the message, and Bob would be able to detect tampering by Mallory. If Mallory changes one of the amounts in the message, Bob will see a mismatch between the two dollar amounts and discard the message. In this manner, redundancy can be used to provide message integrity.
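Alice’s redundancy scheme can be sketched in a few lines. The message format and the verification rule are hypothetical, chosen to mirror the IOU example above.

```python
# Sketch of the redundancy-based integrity check from the IOU example
# (message format hypothetical). Bob discards any message whose two
# copies of the dollar amount disagree.
import re

def make_iou(amount: str) -> str:
    """Alice repeats the amount so single-character tampering is detectable."""
    return f"I, Alice, owe you, Bob, {amount}. Confirm, {amount}"

def verify_iou(message: str) -> bool:
    """Bob's check: the message must contain two matching dollar amounts."""
    amounts = re.findall(r"\$\d+\.\d{2}", message)
    return len(amounts) == 2 and amounts[0] == amounts[1]

msg = make_iou("$1.00")
assert verify_iou(msg)

# Mallory changes the dot to a zero in the first amount only:
tampered = "I, Alice, owe you, Bob, $1000. Confirm, $1.00"
assert not verify_iou(tampered)   # Bob sees the mismatch and discards it
```

As the text goes on to note, this only works against an attacker limited to changing one copy; it does not stop Mallory from corrupting every message (denial of service), and repeating data wastes bandwidth, which is why real protocols use MACs instead.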
While Mallory may not be able to tamper with Alice’s IOU if she uses redundancy, she may still be able to conduct a denial-of-service attack. If Mallory changes one of the dollar amounts in the IOU each time Alice tries to send it to Bob, and Bob is forced to discard the message each time because of the mismatched dollar amounts, Bob will never receive the IOU he rightly deserves! (Denial-of-service attacks are discussed further in Section 1.7.)
Unfortunately, a real-world active eavesdropper will typically have the power to change much more than a single character in a message, and the simple approach of repeating the dollar amount will not work. In addition, repeating information more than once requires extra
communications bandwidth and is not terribly efficient.
In networking communications protocols, approaches such as CRCs (cyclic redundancy checks) can be used to achieve integrity and detect when bits in a message have been lost or altered due to inadvertent communications failures. These techniques compute short codes that are functions of the message being sent. Alice can attach a short code to the message
such that if the message or code are modified, Bob can determine whether they were tampered with.
However, while CRCs are sufficient to detect inadvertent communications failures, they are typically not good enough to deal with adversaries such as Mallory. If Mallory knows that a CRC is being used, and she has no restrictions on how many bytes she can modify, she can
also change the short code to match her modified message.
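This weakness of CRCs is easy to demonstrate: the code is a public function of the message, so anyone, including Mallory, can recompute a matching code for a modified message.

```python
# CRCs detect accidental corruption but not deliberate tampering: the
# CRC is a public, keyless function of the message, so an attacker can
# simply recompute it over her modified message.
import zlib

message = b"I, Alice, owe you, Bob, $1.00"
code = zlib.crc32(message)

# A bit flip in transit is detected, because the CRC no longer matches:
corrupted = b"I, Alice, owe you, Bob, $1.10"
assert zlib.crc32(corrupted) != code

# But Mallory recomputes the CRC over her modified message, and the
# (message, code) pair she sends looks perfectly valid to Bob:
forged = b"I, Alice, owe you, Bob, $1000"
forged_code = zlib.crc32(forged)
assert zlib.crc32(forged) == forged_code
```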
Instead, message authentication codes (MACs) are typically used to achieve message integrity in real-world security protocols. A MAC is not only a function of the message itself, but is also a function of a key known only to Alice and Bob, such that even if Mallory is able to
modify the bytes of a message, she will not be able to appropriately modify the corresponding MAC. (MACs are covered in more detail in Chapter 15.)
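A MAC of this kind can be sketched with Python’s standard `hmac` module. Because the tag depends on both the message and the key shared by Alice and Bob, Mallory cannot produce a valid tag for a modified message.

```python
# Minimal MAC sketch using the standard library's hmac module. The tag
# is a function of the message AND a shared secret key, so Mallory
# cannot recompute a matching tag for a message she has modified.
import hmac
import hashlib

key = b"key-shared-by-alice-and-bob"   # unknown to Mallory

def tag(message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(message: bytes, mac: bytes) -> bool:
    # compare_digest performs a constant-time comparison to avoid
    # leaking information through timing side channels
    return hmac.compare_digest(tag(message), mac)

message = b"I, Alice, owe you, Bob, $1.00"
mac = tag(message)
assert verify(message, mac)                               # Bob accepts
assert not verify(b"I, Alice, owe you, Bob, $1000", mac)  # tampering detected
```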
While the goal in confidentiality is to make sure that the contents of Alice and Bob’s communication cannot be understood by a third party like Eve or Mallory, there is no such requirement for message integrity. For message integrity to be achieved, it does not matter whether the eavesdropper can see the data in the message so long as she is unable to change it undetected. The goal of message integrity is to make sure that even if Mallory can “look,” she cannot “touch” the contents of the message.
Accountability
While authentication and authorization are important, accountability is another key security goal (especially for a company’s internal systems). The goal of accountability is to ensure that you are able to determine who the attacker or principal is in the case that something goes wrong or an erroneous transaction is identified. In the case of a malicious incident, you want to be able to prosecute and prove that the attacker conducted illegitimate actions. In the case of an erroneous transaction, you want to identify which principal made the mistake. Most computer systems achieve accountability through authentication and the use of logging and audit trails. To obtain accountability, you can have a system write log entries every time a user authenticates, and use the log to keep a list of all the actions that the user conducted.
The chief financial officer (CFO) of a company may have the authority to transfer money from the company’s bank account to any other account, but you want to hold the CFO accountable for any actions that could be carried out under her authority. The CFO should have the ability
to transfer money from the company account to other accounts because the company may have certain financial commitments to creditors, vendors, or investors, and part of the CFO’s job may involve satisfying those commitments. Yet, the CFO could abuse that capability. Suppose the CFO, after logging into the system, decides to transfer some money from the
company’s bank account to her own personal account, and then leave the country. When the
missing funds are discovered, the system log can help you ascertain whether or not it was the
CFO who abused her privileges. Such a system log could even potentially be used as evidence
in a court of law.
It is also crucial to make sure that when the logging is done and audit trails are kept, the logs cannot be deleted or modified after the fact. For example, you would not want the CFO to be able to transfer money into her own personal account and then delete or change the audit trail so that transaction no longer appears, or is covered up in any way to appear as if the
transaction had a different recipient. To prevent logs from being deleted or altered, they could immediately be transferred to another system that hopefully an attacker would not be able to access as easily. Also, Chapter 15 discusses how MACs (message authentication codes) can be used to construct integrity check tokens that can either be added to each entry of a log or associated with an entire log file to allow you to detect any potential modifications to the system log. You can also use write once, read many (WORM) media to store system logs, since once written, these logs may be hard (or even physically impossible) to modify—short of
destroying the media completely.
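One common construction for a tamper-evident log is a hash chain, sketched below. Each entry’s token covers the entry text and the previous token, so editing or deleting any earlier entry breaks the chain from that point on. This is a simplified illustration: without a secret key (a MAC, as in Chapter 15) or the latest token stored somewhere the attacker cannot reach, such as WORM media, an attacker who can rewrite the entire log could simply recompute the whole chain.

```python
# Sketch of a tamper-evident audit trail using a hash chain. Each
# entry's token = SHA-256(previous token + entry), so modifying any
# entry invalidates every token from that point forward.
import hashlib

GENESIS = "0" * 64   # fixed starting token for an empty log

def append_entry(log: list, entry: str) -> None:
    prev = log[-1][1] if log else GENESIS
    token = hashlib.sha256((prev + entry).encode()).hexdigest()
    log.append((entry, token))

def verify_log(log: list) -> bool:
    prev = GENESIS
    for entry, token in log:
        if hashlib.sha256((prev + entry).encode()).hexdigest() != token:
            return False
        prev = token
    return True

log = []
append_entry(log, "2006-03-01 09:12 CFO logged in")
append_entry(log, "2006-03-01 09:15 CFO transferred $500,000 to acct 1234")
assert verify_log(log)

# A cover-up attempt: the CFO edits the entry but cannot fix the token.
log[1] = (log[1][0].replace("1234", "9999"), log[1][1])
assert not verify_log(log)
```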
A good logging or audit trail facility also provides for accurate timestamping. When actions are written to an entry in a log, the part of the entry that contains the time and date at which the action occurred is called a timestamp. You need to ensure that no user can modify
timestamps recorded in the log. The operating system, together with all the other computers on the network, must be in agreement on the current time. Otherwise, an attacker can log into a computer whose clock is ahead or behind the real time to cause confusion about when certain actions actually occurred. A protocol such as Network Time Protocol (NTP) can be used to
keep the clocks of multiple computers synchronized.
One problem with many of today’s systems is that logging facilities do not have secure timestamping and integrity checking facilities. As a result, after attackers hack into a system, they can change the logs such that no one can detect that they hacked in. Therefore, it is especially important to think carefully about a secure audit trail facility when you design secure systems. If existing or third-party software tools are used when constructing systems, they may have to be instrumented or modified to satisfy accountability goals.
Availability
An available system is one that can respond to its users’ requests in a reasonable timeframe. While availability is typically thought of as a performance goal, it can also be thought of as a security goal. If an attacker is able to make a system unavailable, a company may lose its ability to earn revenue. For example, if an online bookstore’s web site is attacked, and legitimate customers are unable to make purchases, the company will lose revenue. An attacker that is interested in reducing the availability of a system typically launches a denial-of-service (DoS) attack. If the online bookstore web site were run on a single web server, and an attacker transmitted data to the web server to cause it to crash, it would result in a DoS attack in which legitimate customers would be unable to make purchases until the web server was started again. Most web sites are not run using just a single web server, but even multiple web servers running a web site can be vulnerable to an attack against availability.
In a distributed denial-of-service (DDoS) attack, perpetrators commandeer weakly protected personal computers and install malicious software (malware) on them that sends excessive amounts of network traffic to the victim web sites. The servers running the victim
web sites are then overwhelmed with the large number of packets arriving from the commandeered computers, and are unable to respond to legitimate users.
In February 2000, the eBay, E*TRADE, Amazon, CNN, and Yahoo web sites were victims of DDoS attacks, and some were disabled for almost an entire business day. This meant lost revenues and interruption of service for legitimate users. One study by the Yankee Group estimated the damage due to lost capitalization, lost revenues, and cost of security upgrades to be $1.2 billion (Kovar 2000); this cost figure was also cited in an FBI congressional statement on cybercrime (Gonzalez 2000).
We include availability as a security goal because it is sometimes difficult to provide a system that is both highly secure and available all the time. There is sometimes an interesting trade-off between availability and security. For example, if a computer is disconnected from the Internet and stored in a physically secure location where no one is allowed to access it, the
computer will be very secure. The problem is that such a computer is not readily available to anyone for use.
You want to design systems whose functionality is available to the largest possible intended audience while being as secure as possible. A service like PayPal (www.paypal.com), which supports person-to-person payments, is an example of a system that generates more revenue the more users take advantage of it, and as such, its availability is critical—users may get very upset if they cannot access their funds at a moment’s notice.
How does one achieve availability in a system? One method is to add redundancy to eliminate any single point of failure. For example, consider a telephone network. In such a network, phones connect to a switch (central office) that directs calls. If someone wants to
attack your ability to place phone calls, he might cut the telephone line that connects to that particular central office, and as a result you would not be able to make calls. Attackers sometimes cut off a victim’s ability to communicate prior to launching an attack.
One potential way to avoid single points of failure is to add redundancy. (Note that we are referring to a different type of redundancy than the redundancy we referred to in our discussion of message integrity.) A second switch can be added to the network so that if an attacker disables the first switch, the system will automatically connect you to the second.
Another potential DoS attack can be conducted by filling up a system’s disk. Suppose users are sharing a disk on a server that is used to store their photos. That server may be running critical processes that need some disk space themselves. If an attacker can sign up as a user (or compromise an existing account) and fill up the shared disk with his own photos (or garbage data), then the critical processes may not be able to properly function, and system failure may ensue. If you impose limits on the amount of disk space that each user can use, then even if the attacker is able to compromise one user’s account, he will only be able to use up a certain amount of disk space. The attacker would need to compromise additional accounts to use up more disk space. In such a system, even if a user is a legitimate, paying customer, that user should not be trusted with more than her fair share of disk space because her account could
be compromised.
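The per-user quota defense can be sketched as a check at upload time. The class, method names, and the 1 GB limit below are all hypothetical, chosen only to mirror the shared photo-server example.

```python
# Sketch of per-user disk quotas as a DoS defense (names and the limit
# are hypothetical). Even a compromised account can consume at most its
# own quota, so the attacker must compromise many accounts to fill the
# shared disk.

QUOTA_BYTES = 1_000_000_000   # e.g., 1 GB per user

class PhotoStore:
    def __init__(self):
        self.usage = {}   # user -> bytes currently stored

    def upload(self, user: str, size: int) -> bool:
        """Accept the upload only if the user stays within quota."""
        if self.usage.get(user, 0) + size > QUOTA_BYTES:
            return False   # reject rather than fill the shared disk
        self.usage[user] = self.usage.get(user, 0) + size
        return True

store = PhotoStore()
assert store.upload("mallory", 900_000_000)        # within quota
assert not store.upload("mallory", 200_000_000)    # over quota, rejected
assert store.upload("alice", 100_000_000)          # other users unaffected
```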
Non-repudiation
The goal of non-repudiation is to ensure undeniability of a transaction by any of the parties involved. A trusted third party, such as Trent, can be used to accomplish this.
For example, let us say Alice interacted with Bob at some point, and she does not want Bob to deny that she interacted with him. Alice wants to prove to some trusted third party (i.e., Trent) that she did communicate with Bob. If, for instance, Alice sent a payment for a bill to Bob over the Web, she may want her payment to be non-repudiable. That is, she does not want Bob to be able to deny that he received the payment at some later point for any reason.
Alice, for example, may feel comfortable sending money to Trent, but not directly to Bob. Bob also trusts Trent. Trent may say to Bob, “Yes, Alice gave me the $500, so you can ship her the goods, and then I will pay you.” In such an example, Trent is playing the role of an escrow
agent, but trusted third parties may be able to serve in many other types of trusted roles beyond being escrow agents. Because Alice and Bob trust Trent, they may be able to conduct certain types of transactions that they could not have accomplished otherwise.
To illustrate another example in which Alice and Bob use the help of Trent, consider that Alice might want to sign a contract to be employed by Bob. Alice might want Trent to serve as a judge so that if Bob ever tries to pay her less than the salary specified by the contract, she can call on Trent to help enforce the contract. At the same time, Bob might not want Alice to show the employment contract to another potential employer to try to get a higher offer.
Alice and Bob can accomplish both of their goals by using Trent’s help. Bob can give Trent the employment contract. Trent tells Alice the amount of the offer, and agrees not to show the employment contract to other employers. Then, Alice can decide whether to accept the contract, but will not be able to use it to negotiate higher offers with other employers. Also, if Bob ever tries to cheat Alice by not issuing payment, Trent can intervene. Note that we assume that Trent is trusted to be impartial and will not collude with either Alice or Bob. To summarize, trusted third parties can help conduct non-repudiable transactions.
In general, non-repudiation protocols in the world of security are used to ensure that two parties cannot deny that they interacted with each other. In most non-repudiation protocols, as Alice and Bob interact, various sets of evidence, such as receipts, are generated. The
receipts can be digitally signed statements that can be shown to Trent to prove that a transaction took place.
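Such a digitally signed receipt can be illustrated with a toy textbook-RSA signature. The numbers below are far too small to be secure (real deployments use RSA-2048 or larger, or elliptic-curve signatures, with proper padding); the point is only the asymmetry: because only Bob knows the private exponent, a receipt that verifies under his public key is evidence to Trent that Bob issued it, which Bob cannot later repudiate.

```python
# Toy textbook-RSA signature for illustrating signed receipts. The key
# is tiny and unpadded (INSECURE); real systems use RSA-2048+ or ECDSA.
import hashlib

p, q = 10007, 10009                  # toy primes (insecure!)
n = p * q                            # public modulus
e = 65537                            # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent, known only to Bob

def digest(msg: bytes) -> int:
    """Hash the message down to an integer modulo n."""
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n

def sign(msg: bytes) -> int:
    """Done by Bob with his private key d."""
    return pow(digest(msg), d, n)

def verify(msg: bytes, sig: int) -> bool:
    """Anyone -- e.g., Trent -- can check using Bob's public key (e, n)."""
    return pow(sig, e, n) == digest(msg)

receipt = b"Received $500 payment from Alice"
sig = sign(receipt)
assert verify(receipt, sig)                               # Trent accepts
assert not verify(b"Received $5 payment from Alice", sig) # forgery rejected
```

Contrast this with a MAC: a MAC key is shared between Alice and Bob, so either party could have produced the tag, which is exactly why MACs provide integrity but not non-repudiation.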
Unfortunately, while non-repudiation protocols sound desirable in theory, they end up being very expensive to implement, and are not used often in practice.
Understanding Threats: Defacement, Infiltration, Phishing, Pharming, Insider Threats, Click Fraud, DoS, Data Theft and Data Loss
- Defacement
Consider what might be the most significant types of threats to a civil liberties web site or the White House web site. Since these web sites are created by organizations that advocate a particular political stance, an attacker is probably interested in making some kind of political statement against these organizations. Therefore, the most significant threat against such sites
may be defacement.
Defacement is a form of online vandalism in which attackers replace legitimate pages of an organization’s web site with illegitimate ones. In the years 1999 and 2001, for example, the White House web site was defaced by supposed anti-NATO activists (Dennis and Gold 1999) and Chinese hackers (Anderson 2001). In such defacement attacks, the attackers usually
replace the front page of a web site with one of their own choice.
Defacement is a very different type of threat than what other web sites, such as financial institutions or e-commerce vendors, might face. The attackers of these web sites may be most interested in compromising bank accounts or conducting credit card fraud. Therefore, how we design systems to be secure against attacks is dependent on the type of threats that we expect
them to face.
In the case of a politically oriented web site, say, www.whitehouse.gov, there may be a database where all of the content for that web site is stored. The owner of the web site may not care if an attacker gains read-only access to the information in that database—however, they do not want the attacker changing the information in that database. On the other hand, a financial institution or e-commerce web site does not want the attacker to be able to even read the information in the back-end database. If this happened, the credit card or account numbers of clients might be compromised.
- Infiltration
In general, infiltration is an attack in which an unauthorized party gains full access to the resources of a computer system (including, but not limited to, use of the CPUs, disks, and network bandwidth). In later chapters, we study how buffer overflow, command injection, and
other software vulnerabilities can be used by attackers to infiltrate and “own” computers. In some defacement attacks, an attacker may have to infiltrate a web server to conduct the defacement. But the threat from infiltration can be quite different than that of defacement, depending on the type of web site. Consider the threat from an infiltration in which an attacker is able to write to a database running behind, say, a financial web site, but not be able to read its contents. If the attacker is able to write information to the database without reading
it, the situation might not be as bad as you might think. So long as you can detect that the attacker’s write took place, the situation can be mitigated. You can always restore the correct account numbers and balances from a backup database, and redo all transactions that occurred after the unauthorized writes to prevent your users from being affected. (For the purposes of
this example, we assume that even if an attacker is able to write the database content, the attacker would not be able to rewrite logs. In the real world, attackers can sometimes also rewrite logs, which presents greater problems.) So, in the case of the political web site, you
most importantly need to defend against an attacker who attempts to gain write capability, while in the case of a financial web site, it is most important to defend against an attacker who attempts to gain read capability.
The preceding example illustrates that different types of web sites are going to have different security goals. In the case of a political web site, the integrity of the web site content is the most significant concern, while in the case of a financial web site, integrity and confidentiality
of customer data are both of high importance.
Military web sites have still different security sensitivities. If a military web site is defaced, it might simply be embarrassing for them. Infiltration of a military web site, in which confidential or classified data is acquired by the attacker, however, could be a threat to national security.
- Phishing
Phishing is an attack in which an attacker (in this case, a phisher) sets up a spoofed web site that looks similar to a legitimate web site. The attacker then attempts to lure victims to the spoofed web site and enter their login credentials, such as their usernames and passwords.
In a phishing attack, attackers typically lure users to the spoofed web site by sending them e-mails suggesting that there is some problem with their account, and that the user should click a link within the e-mail to “verify” their account information. The link included in the
e-mail, of course, is to the attacker’s web site, not the legitimate site. When unsuspecting users click the link, they arrive at the spoofed site and enter their login credentials. The site simply logs the credentials, and either reports an error to the user or redirects the user to the legitimate site (or both). The attacker later uses the logged credentials to log into the user’s account and transfer money from the user’s account to their own.
Why do users fall for clicking such links in e-mails sent by phishers? Phishers use various techniques to hide the fact that the link is to their illegitimate, spoofed site. Following is an example. First, in HTML documents, a link is constructed as follows:
<a href="http://www.destination-site.com/">
Click here
</a>
When the e-mail is rendered by a browser, the link will look like this: Click here, and the destination address will not be apparent to an unsuspecting user.
An attacker can use code such as the following in an HTML e-mail sent to the victim:
<a href="http://www.evil-site.com/">
http://www.legitimate-site.com/
</a>
The browser displays http://www.legitimate-site.com/, but when the user clicks the link, the browser loads the front page of www.evil-site.com since that is what is specified by the hyperlink reference (HREF) in the anchor (A) tag in the HTML e-mail. In real phishing attacks, the phisher might have the browser display www.paypal.com or www.google.com, and have the hyperlink reference point to www.paypa1.com (with a “1” instead of a “l”) or
www.gogole.com (“google” misspelled), respectively.
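The mismatch a careful user (or a mail filter) looks for can be checked mechanically: an anchor whose visible text is a URL for one host while its `href` points to a different host is suspicious. Below is a small sketch of that check; the class and function names are hypothetical.

```python
# Sketch of a phishing heuristic: flag anchors whose visible text is a
# URL for a different host than the one the href actually points to.
from html.parser import HTMLParser
from urllib.parse import urlparse

class AnchorAudit(HTMLParser):
    """Collect the href and visible text of an <a> element."""
    def __init__(self):
        super().__init__()
        self.href, self.text = None, ""

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href")

    def handle_data(self, data):
        self.text += data.strip()

def looks_spoofed(anchor_html: str) -> bool:
    audit = AnchorAudit()
    audit.feed(anchor_html)
    shown = urlparse(audit.text).hostname          # host the text claims
    actual = urlparse(audit.href or "").hostname   # host the link visits
    return shown is not None and shown != actual

phish = '<a href="http://www.evil-site.com/">http://www.legitimate-site.com/</a>'
assert looks_spoofed(phish)
honest = '<a href="http://www.legitimate-site.com/">http://www.legitimate-site.com/</a>'
assert not looks_spoofed(honest)
```

Note this heuristic only fires when the link text itself looks like a URL; a “Click here” link is not flagged, and lookalike domains such as www.paypa1.com defeat it entirely, which is why it is a filter aid rather than a defense.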
Slightly more sophisticated users may position their mouse over the link prior to clicking it. Many browsers will display the address of the destination site at the bottom of the browser window or in a pop-up tool tip. Such users may decide not to click the link if the actual destination site does not match their expectation.
- Pharming
Pharming is another attack in which a user can be fooled into entering sensitive data into a spoofed web site. It is different than phishing in that the attacker does not have to rely on the user clicking a link in an e-mail. With pharming, even if the user correctly enters a URL
(uniform resource locator)—or web address—into a browser’s address bar, the attacker can still redirect the user to a malicious web site. When a user enters a URL—say, www.google.com/index.html—the browser needs to first figure out the IP address of the machine to which to connect. It extracts the domain name, www.google.com, from the URL, and sends the domain name to a domain name server (DNS). The DNS is responsible for translating the domain name to an IP address. The browser then connects to the IP address returned by the DNS and issues an HTTP request for index.html.
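The first step of that resolution process can be sketched with the standard library. The actual DNS lookup is shown but commented out to keep the example self-contained and offline; it is the step that a pharming attack subverts.

```python
# Sketch of what a browser does with a URL before connecting. The DNS
# lookup itself is commented out so the example runs offline.
from urllib.parse import urlparse

url = "http://www.google.com/index.html"
parts = urlparse(url)
domain = parts.hostname    # "www.google.com" -- sent to the DNS
path = parts.path          # "/index.html"    -- requested over HTTP

# import socket
# ip = socket.getaddrinfo(domain, 80)[0][4][0]   # DNS: name -> IP address
# In a pharming attack, this is the step the attacker subverts: the DNS
# returns the attacker's IP address instead of the legitimate one.

print(domain, path)   # www.google.com /index.html
```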
In a pharming attack, an attacker interferes with the machine name–to–IP address translation for which the DNS is responsible. The attacker can do so by, for instance, compromising the DNS server, and coaxing it into returning the attacker’s IP address instead of the legitimate one. If the user is browsing via HTTP, the attack can be unnoticeable to the user. However, if a user connects to a site using SSL, a pharming attack (in most cases) will result in a dialog box from the browser complaining that it was not able to authenticate the server due to a “certificate mismatch.”
- Insider Threats
A surprisingly large percentage of attacks take place with the cooperation of insiders. Insiders could be, for instance, employees at a corporation who abuse their privileges to carry out malicious deeds. Employees are sometimes trusted with access to databases with customer information and employee records, copies of financial reports, or confidential information concerning product launches. Such information can be abused in the obvious ways: employee data could be sold to headhunters, customer credit card numbers could be sold on the black market, financial reports could facilitate insider trading, and product launches could be
leaked to the press.
As such, it is sometimes important to defend a system against the very people that are responsible for using it on a daily basis. Database administrators, for example, have traditionally been given the “keys to the entire kingdom,” and have complete access to all employee
and customer data stored in a database. System administrators similarly are given “superuser” access to all resources and data under the control of an operating system. Additional features are needed in both database and operating systems to provide for separation of privilege, the concept that an individual should only be given the privileges that he needs, without also being given unrestricted access to all data and resources in the system.
- Click Fraud
Prior to the advent of pay-per-click advertising, the threat of click fraud never existed. Pay-per-click advertising is an Internet advertising model in which advertisers provide advertisements to search engines. Search engines work with web site publishers to insert advertisements not
only on search result pages, but also on publisher’s content pages. The idea is that the entire page of content on a publisher’s site is considered a “query” for the search engine, and relevant ads are inserted on the publisher’s web page. Advertisers pay the search engine whenever users click on those advertisements. Web site publishers typically receive a revenue share for clicks on ads that occur on their site. Advertisers usually set a maximum daily budget for their advertising campaigns so that their advertising costs do not go unbounded.
Such a pay-per-click advertising system can be abused in several ways. We will describe two of them. In one type of click fraud, an advertiser will click a competitor’s ad with the intention of “maxing out” their competitor’s budget. Once their competitor’s budget has been exhausted, their ads may exclusively be shown to legitimate users. Such an attack ends up wasting the competitor’s financial resources, and allows the attacker to receive all the legitimate ad clicks that their competitor might have received. In another type of click fraud, a web site publisher will click on ads shown on their own web site in an attempt to receive the revenue share for those clicks. In some cases, the fraudulent publisher can hire a third-party firm or deploy malware to click on the ads.
Click fraud only became a relevant business threat when pay-per-click advertising started becoming big business. Similarly, credit and ATM card fraud only became an issue when credit card and electronic banking started to take off. Identity theft became a more serious issue
when enough electronic commerce took place that it became possible to do transactions based on exchanging numbers online or over the phone.
- Denial-of-Service (DoS)
Another significant threat that e-commerce and financial institutions face is the DoS attack. In one type of DoS attack, the attacker sends so many packets to a web site that it cannot service the legitimate users that are trying to access it. A financial institution or e-commerce site can
end up losing money and revenue as the result of such a DoS attack because its customers will not be able to conduct transactions or make online purchases.
- Data Theft and Data Loss
In 2005 and 2006 alone, there were several incidents in which major organizations with reputable brands had significant amounts of sensitive data lost or stolen. Bank of America, ChoicePoint, and the Veteran’s Administration (VA) were among them. A list of data breaches since 2005 is available on the Privacy Rights Clearinghouse web page (www.privacyrights.org/ar/ChronDataBreaches.htm).
In Bank of America’s case, backup data tapes with sensitive information for over one million customers were lost as they were being transported from one location to another
(CNN/Money 2005; Lemos 2005). Bank of America provided one year of free credit monitoring services to all affected customers. ChoicePoint, one of the largest data aggregators in the United States, was scammed by fraudsters who set up approximately 50 impostor accounts and used them to query ChoicePoint’s database for social security numbers, dates of birth, and other sensitive information for 163,000 people (Hines 2005; PRC ChoicePoint 2005). ChoicePoint was fined $10 million by the Federal Trade Commission (FTC), and was forced to set up a $5 million fund to help identity theft victims (Sullivan 2006).
In the case of the VA, an employee who worked for Unisys, one of the VA’s subcontractors, took home computer equipment that had personal information for 26.5 million veterans stored on it, and the employee’s home was burglarized. The employee, who was not authorized to take the computer equipment home, was dismissed, and the employee’s supervisor resigned.
Due in part to a California state law passed in 2003, these companies were required to notify customers when these incidents occurred. It is possible that significant data theft had occurred prior to 2003, but companies were not required to report the theft to those affected.
The California law requires that companies report data breaches in which unencrypted data is accessed by an unauthorized party.
However, the original law, as written, may not apply if the customer data is encrypted. This is worrisome because the decryption key could be stored on the same media as the encrypted data, in which case an attacker would simply need to use that key to access the sensitive data! It would have been better if the law also covered encrypted data and required that decryption keys be stored on media separate from the data that they protect. A corresponding federal bill relating to data theft was in development at the time of writing of this book, although it is unclear whether it will be more or less stringent than the California law.
HTTP & HTTPS protocols. Differences and Similarities
HTTP
Hypertext transfer protocol (HTTP) is the core communications protocol used to access the World Wide Web and is used by all of today’s web applications. It is
a simple protocol that was originally developed for retrieving static text-based resources. It has since been extended and leveraged in various ways to enable
it to support the complex distributed applications that are now commonplace.
HTTP uses a message-based model in which a client sends a request message and the server returns a response message. The protocol is essentially
connectionless: although HTTP uses the stateful TCP protocol as its transport mechanism, each exchange of request and response is an autonomous transaction and may use a different TCP connection.
HTTPS
The HTTP protocol uses plain TCP as its transport mechanism, which is unencrypted and therefore can be intercepted by an attacker who is suitably positioned on the network. HTTPS is essentially the same application-layer protocol as HTTP but is tunneled over the secure transport mechanism, Secure Sockets Layer (SSL). This protects the privacy and integrity of data passing over the network, reducing the possibilities for noninvasive interception attacks. HTTP requests and responses function in exactly the same way regardless of whether
SSL is used for transport.
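As a small illustration of this point (a sketch using Python’s standard library; the host name is arbitrary), the same HTTP machinery is used for both schemes, and the difference shows up only in the transport layer and its default port:

```python
import http.client

# Same application-layer protocol, different transport: an HTTPSConnection
# wraps the TCP socket in TLS before any HTTP bytes are exchanged, while an
# HTTPConnection sends them in the clear.
plain = http.client.HTTPConnection("example.com")
secure = http.client.HTTPSConnection("example.com")

# The default ports reflect the two URL schemes.
print(plain.port)   # 80
print(secure.port)  # 443
```

Requests and responses would be issued identically on either connection object.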
Link:
https://techdifferences.com/difference-between-http-and-https.html
Basic HTTP Authentication
In the context of an HTTP transaction, basic access authentication is a method for an HTTP user agent (e.g., a web browser) to provide a user name and password when making a request. In basic HTTP authentication, a request contains a header field in the form Authorization: Basic &lt;credentials&gt;, where &lt;credentials&gt; is the Base64 encoding of the ID and password joined by a single colon (:).
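A minimal sketch in Python of how such a header value is constructed (the credential pair “Aladdin” / “open sesame” is the well-known example from the HTTP Basic authentication specification):

```python
import base64

def basic_auth_header(user_id, password):
    # The credentials are "id:password", Base64-encoded.
    credentials = base64.b64encode(f"{user_id}:{password}".encode("utf-8"))
    return "Authorization: Basic " + credentials.decode("ascii")

print(basic_auth_header("Aladdin", "open sesame"))
# Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

Note that Base64 is an encoding, not encryption: anyone who intercepts the header can recover the password, which is why basic authentication should only be used over HTTPS.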
Link:
https://en.wikipedia.org/wiki/Basic_access_authentication
Password Security
Hashing
In an attempt to remedy the situation, you could decide not to store passwords “in the clear.” Instead, you could store an encrypted version of the passwords, and decrypt the passwords in the file whenever you need to check them. To do so, you could use a symmetric encryption algorithm, such as AES (Advanced Encryption Standard). You would need to keep track of a key used to encrypt the passwords, and then you would need to determine where to store the key. Storing the key in the password file itself would be a bad idea, since then an attacker that gets hold of the password file could also decrypt all of the passwords in the file. If the key is stored anywhere on the same system as the password file, in fact, that system still becomes an extremely valuable attack target.
Instead of two-way, symmetric encryption, it’s better to have a mechanism that allows you to store an “encrypted” version of the password in the file, and lets you verify the password that the user enters upon login. You really don’t need to decrypt the password so long as you can verify that the user typed in the correct one. When the user enters a password in an attempt to log in, you can encrypt the user-entered password and compare it to the one in the file. What you need is sort of a “one-way encryption,” in which you can only encrypt the user’s password, but are never able to decrypt the version of the password stored in the password
file. If you store only one-way encrypted passwords in the password file, even if an attacker were to get hold of the password file, he would not be able to decrypt any of the users’ passwords.
To help you securely implement a password file, a more suitable cryptographic primitive than two-way, symmetric encryption is a one-way hash function. A hash function, h, takes a string p as input, and produces h(p). Due to the nature of how a hash function works, it is computationally infeasible to determine p from h(p). Two commonly used hash functions are SHA-1 and MD5, but both have been weakened by published attacks, and it is advisable to use hash functions such as SHA-256 and SHA-512 instead.
An example of a password file that stores one-way hashed passwords is the following:
john:9Mfsk4EQh+XD2lBcCAvputrIuVbWKqbxPgKla7u67oo=
mary:AEd62KRDHUXW6tp+XazwhTLSUlADWXrinUPbxQEfnsI=
joe:J3mhF7Mv4pnfjcnoHZ1ZrUELjSBJFOo1r6D6fx8tfwU=
For each user listed in the preceding password file, a SHA-256 hash of the password is stored. For example, instead of directly storing John’s password, “automobile,” in the password file, the file stores 9Mfsk4EQ… in place of it.
When John’s password needs to be checked, the hash of the password that is entered is computed and compared against the hash in the password file, as shown in Figure 9-1. The
advantage of storing hashed passwords in the password file is that even if an attacker were to steal the password file, she would not be able to determine that John’s password is “automobile” just by looking at the file.
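A minimal sketch of this verification scheme in Python, assuming (as the example file suggests) that the file stores Base64-encoded SHA-256 digests; the function names are illustrative:

```python
import base64
import hashlib

def hash_password(password):
    # One-way hash: SHA-256 digest, Base64-encoded for storage in the file.
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")

def check_password(entered, stored_hash):
    # Verify by re-hashing what the user typed; the stored hash is never
    # "decrypted" -- there is nothing to decrypt.
    return hash_password(entered) == stored_hash

stored = hash_password("automobile")
print(check_password("automobile", stored))  # True
print(check_password("bicycle", stored))     # False
```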
Offline Dictionary Attacks
Even with the preceding slightly more sophisticated mini–password manager that uses hashing, given the password file, the attacker can still attempt to determine some users’ passwords due to the fact that most users do not choose good passwords. Often, users will choose passwords that happen to be words in the dictionary (such as “automobile” or “balloon”), street names, company names, or other well-known strings. A good attacker can easily build a dictionary of words, common street names, common names of companies, and so forth; and use such a dictionary to mount an attack, as shown in Figure 9-2. If the attacker knows that you are using the SHA-256 hash function to store one-way encrypted versions of passwords, the attacker can iterate through all the words in a dictionary and compute the SHA-256 hashes of them. For instance, the attacker’s dictionary might be as follows:
automobile
aardvark
balloon
doughnut
…
The attacker can compute the following dictionary of hashes:
automobile:9Mfsk4EQh+XD2lBcCAvputrIuVbWKqbxPgKla7u67oo=
aardvark:z5wcuJWEv4xBdqN8LJVKjcVgd9O6Ze5EAR5iq3xjzi0=
balloon:AEd62KRDHUXW6tp+XazwhTLSUlADWXrinUPbxQEfnsI=
doughnut:tvj/d6R4b9t7pzSzlYDJZV4w2tmxBZn7YSmUCoNVx/E=
…
Now, the attacker will simply look for matches between the hashes in the password file and the hashes that she has computed! For example, since AEd62KRD… appears in the password file as Mary’s hashed password, the attacker knows that “balloon” must be Mary’s password!
Such an attack is called an offline dictionary attack, and is usually geared at determining some user’s password. The attacker may not care which user’s password is determined so long as she can determine some user’s password.
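The attack above can be sketched in a few lines of Python (the hashes here are computed on the fly rather than copied from the example file):

```python
import base64
import hashlib

def sha256_b64(word):
    return base64.b64encode(hashlib.sha256(word.encode("utf-8")).digest()).decode("ascii")

# The stolen password file: username -> hashed password.
password_file = {
    "john": sha256_b64("automobile"),
    "mary": sha256_b64("balloon"),
}

# The attacker hashes every word in her dictionary once...
dictionary = ["automobile", "aardvark", "balloon", "doughnut"]
hashed_dictionary = {sha256_b64(w): w for w in dictionary}

# ...then simply looks up each stored hash in the precomputed table.
cracked = {}
for user, stored in password_file.items():
    if stored in hashed_dictionary:
        cracked[user] = hashed_dictionary[stored]
        print(user, "->", cracked[user])
```

The whole attack runs offline; no login attempts against the real system are ever made.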
The attack is called “offline” because the attacker is not required to actually try username and password combinations online against a real system to conduct her attack, as she has
possession of the password file. It would be ideal if the only way for the attacker to guess passwords were for her to try them against the online running system. By ensuring this, you can
detect the attacker’s attempts to guess the passwords for particular usernames. However, if an attacker gains possession of the password file, she will be able to conduct a dictionary attack without your knowledge.
A natural question to ask is whether there might be some way to defend against an offline dictionary attack even when the attacker gets hold of the password file containing the hashed
passwords. While it might be difficult to make the offline dictionary attack impossible, you can raise the level of effort required on the part of the attacker with a technique called salting.
Salting
Salting is the practice of including additional information in the hash of the password. To illustrate how salting works and why it makes the attacker’s job harder, we first modify the structure of the password file. Instead of just having the password file store a username and a hashed password, we include a third field for a random number in the password file. When a user—for example, John—creates his account, instead of just storing John’s username and hashed password, we choose a random number called the salt (see Figure 9-3). Instead of just storing the hash of John’s password, “automobile” in this case, we create a string that is
the concatenation of John’s password and the salt, and store the hash of that string in a file.
The entry in the password file may look as follows:
john:ScF5GDhWeHr2q5m7mSDuGPVasV2NHz4kuu5n5eyuMbo=:1515
In the preceding entry, ScF5GDhW… is the hash of John’s password, “automobile,” concatenated with the salt, 1515. That is, h(automobile | 1515) = ScF5GDhW…
Before we used salting, all that an attacker needed to do was go through a dictionary and hash all of the words to look for matches in the password file. However, what the attacker has to do now is a bit more complicated. The passwords are now hashed together with a salt. The attacker now needs to try combinations of dictionary words concatenated with salts to look for matches in the password file. Whereas the attacker just had to compute the hash of “automobile” before, and look for matches for the hash of “automobile” somewhere in the password file, the attacker now needs to hash the word “automobile” together with salts to look for
matches in the password file.
The good news about salting is that if the attacker is interested in compromising some arbitrary user’s account in the password file, she now needs to build a dictionary of hashes for every possible value of salt. If the dictionary is of size n, and the salts are k bits in length, then the attacker now has to hash 2^k · n strings instead of only n (in the case that salts are not used). So, it makes the attacker’s job 2^k times harder, with only a constant number of additional operations required on behalf of the server to verify passwords. Password salting raises the bar of effort an attacker must expend, while keeping the amount of work the password system has to do approximately the same.
The bad news is that if the attacker is interested in compromising a particular victim’s account, she just needs to hash every possible dictionary word with the salt used for that victim’s account. Password salting has its limitations in that it does not absolutely prevent offline dictionary attacks, and is most effective against an attacker that does not have a particular victim account in mind. Password salting only makes the attacker’s job harder, as an attacker that can easily compute 2^k · n hashes will still be able to conduct an offline dictionary attack to crack into some user account. Also, while salting helps with a brute-force, offline dictionary attack against some user account, it does not do as well against a chosen-victim attack in which the attacker wants to determine the password for a particular user’s account—in that case, the attacker only has to compute hashes for each word in the dictionary using the victim’s salt.
Online Dictionary Attacks
In online dictionary attacks, the attacker actively tries username and password combinations using a live, running system, instead of, say, computing hashes and comparing them against those in some acquired password file. If an attacker cannot acquire a copy of the password file, and is limited to conducting online dictionary attacks, it at least allows you to monitor the attacker’s password guessing. As we mentioned in Section 3.2.3, if a large number of failed logins are coming from one or more IP addresses, you can mark those IP addresses as suspicious. Subsequent login attempts from suspicious IPs can be denied, and additional steps can be taken to mitigate the online dictionary attack.
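A minimal sketch of such per-IP failure tracking in Python (the threshold and the IP addresses are made up for illustration):

```python
from collections import defaultdict

FAILURE_THRESHOLD = 10  # illustrative cutoff

failed_attempts = defaultdict(int)
suspicious_ips = set()

def record_failed_login(ip_address):
    # Count failed logins per source IP; once past the threshold, mark the
    # IP suspicious so subsequent attempts can be denied or challenged.
    failed_attempts[ip_address] += 1
    if failed_attempts[ip_address] >= FAILURE_THRESHOLD:
        suspicious_ips.add(ip_address)

def is_blocked(ip_address):
    return ip_address in suspicious_ips

for _ in range(10):
    record_failed_login("203.0.113.5")
print(is_blocked("203.0.113.5"))   # True
print(is_blocked("198.51.100.7"))  # False
```

Monitoring of this kind is only possible for online attacks; an attacker with the password file never touches the live system.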
In the password security schemes that we have considered thus far, if the user is logging in from a client, the user’s password is sent over the network to the server. The server sees the password in the clear. (Even if the password is transmitted over SSL and encrypted in transit to the server, the password is decrypted and made available to the server for verification.) If the server can be impersonated, as in a phishing attack (see Section 2.1.3), the impersonator will receive the user’s password. The impersonator can then log into the real server claiming to be the legitimate user. Hence, it may be worthwhile to use approaches in which the server can verify the client’s possession of the password without requiring the client to explicitly transmit the password to the server. Password-authenticated key exchange (PAKE) and zero-knowledge
proofs are examples of cryptographic protocols that can allow a client to prove its knowledge of a password without disclosing the password itself (Jakobsson, Lipmaa, and Mao 2007).
However, such protocols have not proved to be efficient or commercially viable yet, and are beyond the scope of this chapter.
Additional Password Security Techniques
In addition to the basic hashing and salting techniques for password management, we also cover a number of other approaches that can help you manage passwords more securely. Not all of them may be appropriate for your application, so you may want to select the ones that make the most sense for protecting your specific user base. Some of the enhancements that follow can be used to increase the difficulty of mounting an attack.
1. Strong Passwords
It is important to encourage users to choose strong passwords that cannot be found in a dictionary and that are not simple concatenations of dictionary words. Requiring users to choose
strong passwords is an important part of thwarting dictionary attacks.
Some suggestions for creating strong passwords include making them as long as possible; including letters, numbers, and special characters; and using passwords that are different from those that you have used on other systems. You can also create strong passwords from long phrases. For example, consider the phrase “Nothing is really work unless you would rather be doing something else” (a quote by J.M. Barrie). If it is easy for you to remember such a quote, you can transform it into a password such as n!rWuUwrbds3. The first letter of each word in the phrase has been used, and some of the characters have been transformed to
punctuation marks, uppercase and phonetically similar characters, and numbers.
However, since some users may not choose strong passwords, it is important to protect the password file from falling into the attacker’s hands, even if salting is used. In older versions of UNIX, the password file used to be readable by all and stored in /etc/passwd. The /etc/passwd file is still present in newer versions of UNIX, but does not store password hashes or salts. Instead, password hashes and salts are stored in a /etc/shadow file that is only accessible
to the system administrator and other privileged users.
2. “Honeypot” Passwords
To help catch attackers trying to hack into a password security system, you can use simple passwords and usernames as “honey” to attract the attackers. For instance, many systems have a default username called “guest” that has the password “guest.” You do not expect normal users to use this guest account, so you can set up your system such that any login attempt against it triggers a notification to your system administration staff. When somebody logs into the honeypot account, it may be an indication that an attacker is trying to break into your system.
Once the system administration staff is notified that somebody might be trying to break into the system, you can then take action to identify which IP address the attacker is coming from. You can also allow the attacker to continue using the guest account to help you learn more about what the attacker is trying to get at.
3. Password Filtering
Since most users might not like to have passwords chosen or even suggested for them, you could let the users choose passwords for themselves. However, if a user chooses a password that is in the dictionary or identified by your password security system as easy to guess, you could then filter that password and require the user to choose another one.
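A toy password filter in Python (the word list and the rules are illustrative assumptions, not a recommended policy):

```python
# A hypothetical dictionary of disallowed, easily guessed passwords.
weak_passwords = {"automobile", "balloon", "password", "123456", "guest"}

def is_acceptable(password):
    # Reject short passwords, dictionary words, and purely numeric strings;
    # a real filter would check a much larger dictionary and more rules.
    if len(password) < 8:
        return False
    if password.lower() in weak_passwords:
        return False
    if password.isdigit():
        return False
    return True

print(is_acceptable("balloon"))       # False (too short, dictionary word)
print(is_acceptable("12345678"))      # False (purely numeric)
print(is_acceptable("n!rWuUwrbds3"))  # True
```

When is_acceptable returns False, the system would prompt the user to choose a different password.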
4. Aging Passwords
Even if the user chooses a good password, you might not want the user to use that password for the entire time that they are going to use your system. Every time that the user enters the password, there is a potential opportunity that an attacker can be looking over the user’s shoulder. Therefore, you could encourage users to change their passwords at certain time intervals—every month, every three months, or every year, for example. Another way to “age” passwords is to only allow each password that the user chooses to work a certain number of times.
Note that if you require users to change their passwords too often, they might start writing them down or doing other potentially insecure things to try to remember what their current password is. At the same time, if you do not require them to change their passwords often enough, the attacker has more opportunities within a given time period to attempt to acquire their passwords.
5. Pronounceable Passwords
Password security system designers noticed that users sometimes want to choose dictionary words because they are easy to remember. Hence, they decided to create pronounceable
passwords that may be easy to remember because users can sound them out, but would not be words in the dictionary. Pronounceable passwords are made up of syllables and vowels
connected together that are meant to be easy to remember. Some examples of pronounceable passwords—generated by a package called Gpw (www.multicians.org/thvv/gpw.html)—are
ahrosios, chireckl, and harciefy.
6. Limited Login Attempts
You could give your users a limited number of login attempts before you disable or lock their account. The advantage of limited login attempts is that if an attacker is trying to break into a
particular user’s account, he will only be given a fixed number of tries (say, three or four). The downside of using this approach is that if a legitimate user happens to incorrectly enter her password just a few times, then her account will be locked. A legitimate user may then need to call a system administrator or customer service number to have her password reset.
Another disadvantage of account locking is that it gives an attacker the ability to launch a DoS attack against one or more accounts. For instance, if the attacker gets a lot of usernames in the system and tries a couple of random guesses for each username’s password, the attacker can end up locking a large fraction of the user accounts in a system.
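A sketch of account locking in Python, illustrating both the mechanism and why it can be abused as a DoS (the limit of three attempts follows the text; the data structures are illustrative):

```python
MAX_ATTEMPTS = 3  # illustrative limit

accounts = {"mary": {"failures": 0, "locked": False}}

def attempt_login(username, password_is_correct):
    acct = accounts[username]
    if acct["locked"]:
        return "locked"
    if password_is_correct:
        acct["failures"] = 0  # a successful login resets the counter
        return "ok"
    acct["failures"] += 1
    if acct["failures"] >= MAX_ATTEMPTS:
        # Locking after repeated failures also lets an attacker deliberately
        # lock out a legitimate user (a DoS against the account).
        acct["locked"] = True
    return "denied"

for _ in range(3):
    attempt_login("mary", False)
print(attempt_login("mary", True))  # "locked": even the right password fails now
```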
7. Artificial Delays
You could introduce increasing artificial delays when users attempt to log into a system over the network. The first time that a user attempts to log in, you present the user with the username and password prompt immediately. If the user enters an incorrect password, you can have the system wait 2 seconds before allowing the user to try again. If the user still gets the username or password wrong, you can have it wait 4 seconds before allowing that user to try again. Generalizing, you can exponentially increase the amount of time before letting a particular client with a particular IP address try to log into your network.
For regular users, this might introduce an inconvenience. If a regular user happens to get his password wrong three times, he may have to wait on the order of 8 seconds before being
allowed to try again.
An online password-guessing attack against a system that introduces artificial delays may require many IPs to try many different combinations of a user’s password before the attacker
gets one right. By introducing artificial delays into the system, you decrease the number of different guesses that the attacker can try in a given unit of time.
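The exponentially increasing delay described above can be sketched as follows (a minimal model; a real system would track failures per client IP):

```python
def login_delay_seconds(failed_attempts):
    # No delay on the first try; then 2, 4, 8, ... seconds, doubling with
    # each consecutive failed attempt from the same client IP.
    if failed_attempts == 0:
        return 0
    return 2 ** failed_attempts

for n in range(5):
    print(n, login_delay_seconds(n))
```

With three failures the client waits 8 seconds, a minor inconvenience for a legitimate user but a severe throttle on an automated guessing attack.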
8. Last Login
Another enhancement that you can employ to increase the security of your password system is that every time a user logs in, you can display the last date, time, and potentially even location from which the user logged in. You could educate users to pay attention to when and where their last login attempts were from. If a user ever notices an inconsistency between when and where she last logged in and when and where the system reported that she last
logged in, she can notify the system administration staff or customer service.
For example, if a user usually logs in once a month from her home in California, but upon login, the system informs her that the last time she logged in was at 3 a.m., two weeks ago in Russia, she will realize that something is wrong. She can then notify the appropriate personnel, and the security issue can be reactively dealt with. If the last login mechanism did not exist, then the occurrence of the attack may not have been noticed.
9. Image Authentication
One recent attempt at making password systems less susceptible to phishing attacks has been to use images as a second factor in conducting authentication. Upon account creation, a user
is asked to choose an image in addition to a username and password. When the user is presented with a login page, the user is asked for his username first. Upon entering a username, the user is shown the image that he chose when signing up, in addition to being prompted for his password.
The intent of using image authentication is to prevent the user from providing his password to an impostor web site. While an impostor web site may be able to spoof a legitimate web site’s home page, the impostor will not know what image a user has selected. Hence,
after a user enters his username into a web site that uses image authentication, he should not enter his password if the web site does not display the same image that he selected when he
signed up.
At the time of writing of this book, image authentication schemes have only recently been deployed by companies such as PassMark (acquired by RSA Security, which was acquired by EMC) on web sites such as www.bankofamerica.com. So far, their true effectiveness still remains to be seen. Many financial institutions implement image authentication to satisfy the FFIEC (Federal Financial Institutions Examination Council) guidance that requires two-factor authentication. However, many users are often not provided enough up-front education about why they are asked to select images, and do not know the purpose of the image when
they see it on the login page of a given web site that they might use. If a phisher were to simply not show an image, and fall back to prompting the user for a username and password, it is unclear as to how many users would fall prey to the phishing attack.
10. One-Time Passwords
The final type of password system we would like to touch upon is called a one-time password system. In all of the approaches that we have talked about so far, one of the things that gives the
attacker some advantage is that each user uses their password multiple times to log into a system. Therefore, when an account is created for a user and that user chooses a password, the
user is allowed to use that password multiple times to log in. However, every time that a user logs into a system, there is a potential opportunity for that password to be eavesdropped on or
found by an attacker. This is especially a problem if that password has not changed over a long period of time.
In a one-time password system, every time a user logs in, the user is expected to log in with a different password. The one-time password system used to be implemented by giving users lists of passwords. These lists were essentially small books full of passwords customized for users each time they would log in. For example, the first time that the user logs in, she would use the first password on the list. The next time she logs in, she would be instructed to use the second password on the list. The system could also choose a random password number and expect the user to enter that number. These lists, however, became cumbersome for users.
Most one-time password systems today are ones in which the user is given some device with a small amount of computing power that is used to compute passwords. The device can be used as a source of passwords. The users, when they log into a system, take out the one-time password device, read off the password from that device, and enter it into the computer system. All the passwords that are generated by this device are based off of some cryptographic algorithm. There is typically some seed (initial value) that is used to generate an entire stream of many passwords over time. That seed is also known by the server. Therefore, given the current time and the seed, the server can check that the password the user is entering is correct.
The functionality provided in these one-time password devices is now integrated into PDAs, cell phones, and other mobile devices that users already carry. One-time passwords end up being a very good system for ensuring password security. In fact, some banks have started to give one-time password devices to some of their users in order to log into their web-based bank accounts. Hopefully, there will be more usage of one-time passwords in the future.
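As a sketch of the time-based idea (this is a simplified TOTP-style construction, not any particular vendor’s algorithm; the seed and parameters are illustrative):

```python
import hashlib
import hmac
import struct
import time

def one_time_password(seed, timestamp=None, step=30, digits=6):
    # Both the device and the server derive the password from the shared
    # seed and the current 30-second time window.
    now = timestamp if timestamp is not None else time.time()
    counter = int(now // step)
    msg = struct.pack(">Q", counter)
    digest = hmac.new(seed, msg, hashlib.sha256).digest()
    code = int.from_bytes(digest[:4], "big") % (10 ** digits)
    return f"{code:0{digits}d}"

seed = b"shared-secret-seed"  # hypothetical seed known to device and server

# Device and server agree as long as they are in the same time window:
print(one_time_password(seed, timestamp=1_000_000) ==
      one_time_password(seed, timestamp=1_000_010))  # True
```

Because the server knows the seed and the current time, it can recompute the expected code and compare it to what the user typed; an eavesdropped code is useless once its time window passes.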
Core Defense Mechanisms
The fundamental security problem with web applications — that all user input is untrusted — gives rise to a number of security mechanisms that applications use to defend themselves against attack. Virtually all applications employ mechanisms that are conceptually similar, although the details of the design and the effectiveness of the implementation vary greatly.
The defense mechanisms employed by web applications comprise the following
core elements:
- Handling user access to the application’s data and functionality to prevent users from gaining unauthorized access
- Handling user input to the application’s functions to prevent malformed input from causing undesirable behavior
- Handling attackers to ensure that the application behaves appropriately when being directly targeted, taking suitable defensive and offensive measures to frustrate the attacker
- Managing the application itself by enabling administrators to monitor its activities and configure its functionality
Because of their central role in addressing the core security problem, these mechanisms also make up the vast majority of a typical application’s attack surface. If knowing your enemy is the first rule of warfare, then understanding these mechanisms thoroughly is the main prerequisite for being able to attack applications effectively. If you are new to hacking web applications (and even if you are not), you should be sure to take time to understand how these core mechanisms work in each of the applications you encounter, and identify the
weak points that leave them vulnerable to attack.
Handling User Access
A central security requirement that virtually any application needs to meet is controlling users’ access to its data and functionality. A typical situation has several different categories of user, such as anonymous users, ordinary authenticated users, and administrative users. Furthermore, in many situations different users are permitted to access a different set of data. For example, users of a web mail application should be able to read their own e-mail but not other people’s.
Most web applications handle access using a trio of interrelated security mechanisms:
- Authentication
- Session management
- Access control
Each of these mechanisms represents a significant area of an application’s attack surface, and each is fundamental to an application’s overall security
posture. Because of their interdependencies, the overall security provided by the mechanisms is only as strong as the weakest link in the chain. A defect in any single component may enable an attacker to gain unrestricted access to the application’s functionality and data.
Authentication
The authentication mechanism is logically the most basic dependency in an application’s handling of user access. Authenticating a user involves establishing that the user is in fact who he claims to be. Without this facility, the application would need to treat all users as anonymous — the lowest possible level of trust.
The majority of today’s web applications employ the conventional authentication model, in which the user submits a username and password, which the application checks for validity. Figure 2-1 shows a typical login function. In security-critical applications such as those used by online banks, this basic model is usually supplemented by additional credentials and a multistage login process. When security requirements are higher still, other authentication models may be used, based on client certificates, smartcards, or challenge-response
tokens. In addition to the core login process, authentication mechanisms often employ a range of other supporting functionality, such as self-registration,
account recovery, and a password change facility.
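The username-and-password model can be sketched in a few lines. The following Python sketch is illustrative only: the in-memory user store, the salt handling, and the PBKDF2 iteration count are all assumptions, not a recommended production configuration.

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes) -> bytes:
    # Derive a slow, salted hash of the password; parameters are illustrative.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

# Hypothetical user store: username -> (salt, password hash).
salt = os.urandom(16)
users = {"alice": (salt, hash_password("correct horse", salt))}

def authenticate(username: str, password: str) -> bool:
    record = users.get(username)
    if record is None:
        return False  # unknown user: treat as anonymous, the lowest trust level
    stored_salt, stored_hash = record
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(stored_hash, hash_password(password, stored_salt))
```

The constant-time comparison and the salted, iterated hash address two of the implementation defects discussed later in this chapter.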
Despite their superficial simplicity, authentication mechanisms suffer from a wide range of defects in both design and implementation. Common problems
may enable an attacker to identify other users’ usernames, guess their passwords, or bypass the login function by exploiting defects in its logic. When you are attacking a web application, you should devote significant attention to the various authentication-related functions it contains. Surprisingly frequently, defects in this functionality enable you to gain unauthorized access to sensitive data and functionality.
Session Management
The next logical task in the process of handling user access is to manage the authenticated user’s session. After successfully logging in to the application, the
user accesses various pages and functions, making a series of HTTP requests from his browser. At the same time, the application receives countless other requests
from different users, some of whom are authenticated and some of whom are anonymous. To enforce effective access control, the application needs a way to identify and process the series of requests that originate from each unique user.
Virtually all web applications meet this requirement by creating a session for each user and issuing the user a token that identifies the session. The session itself is a set of data structures held on the server that track the state of the user’s interaction with the application. The token is a unique string that the application maps to the session. When a user receives a token, the browser automatically submits it back to the server in each subsequent HTTP request, enabling the application to associate the request with that user. HTTP cookies are the standard method for transmitting session tokens, although many applications use hidden form fields or the URL query string for this purpose. If a user does not make a request for a certain amount of time, the session is ideally expired, as shown in Figure 2-2.
In terms of attack surface, the session management mechanism is highly dependent on the security of its tokens. The majority of attacks against it seek to
compromise the tokens issued to other users. If this is possible, an attacker can masquerade as the victim user and use the application just as if he had actually
authenticated as that user. The principal areas of vulnerability arise from defects in how tokens are generated, enabling an attacker to guess the tokens issued to other users, and defects in how tokens are subsequently handled, enabling an attacker to capture other users’ tokens.
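Token-generation defects are avoided by drawing tokens from a cryptographic random source. A minimal sketch, assuming an in-memory session store (real applications would persist this server-side state elsewhere):

```python
import secrets

# In-memory session store: token -> server-side session state (a sketch).
sessions = {}

def create_session(username: str) -> str:
    # secrets.token_urlsafe draws from the OS CSPRNG, so a token cannot
    # be predicted from previously issued tokens.
    token = secrets.token_urlsafe(32)
    sessions[token] = {"user": username}
    return token

def lookup_session(token: str):
    # Returns the session state, or None for an unrecognized token.
    return sessions.get(token)
```

Guessable alternatives such as sequential counters or timestamps are exactly the generation defects described above.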
A small number of applications dispense with the need for session tokens by using other means of reidentifying users across multiple requests. If HTTP’s built-in authentication mechanism is used, the browser automatically resubmits the user’s credentials with each request, enabling the application to identify the user directly from these. In other cases, the application stores the state information on the client side rather than the server, usually in encrypted form to prevent tampering.
Access Control
The final logical step in the process of handling user access is to make and enforce correct decisions about whether each individual request should be permitted or
denied. If the mechanisms just described are functioning correctly, the application knows the identity of the user from whom each request is received. On this basis, it needs to decide whether that user is authorized to perform the action, or access the data, that he is requesting, as shown in Figure 2-3.
The access control mechanism usually needs to implement some fine-grained logic, with different considerations being relevant to different areas of the
application and different types of functionality. An application might support numerous user roles, each involving different combinations of specific privileges.
Individual users may be permitted to access a subset of the total data held within the application. Specific functions may implement transaction limits and other
checks, all of which need to be properly enforced based on the user’s identity.
Because of the complex nature of typical access control requirements, this mechanism is a frequent source of security vulnerabilities that enable an attacker to gain unauthorized access to data and functionality. Developers often make flawed assumptions about how users will interact with the application and frequently make oversights by omitting access control checks from some application functions. Probing for these vulnerabilities is often laborious, because essentially the same checks need to be repeated for each item of functionality.
Because of the prevalence of access control flaws, however, this effort is always a worthwhile investment when you are attacking a web application. Chapter
8 describes how you can automate some of the effort involved in performing rigorous access control testing.
Handling User Input
Recall the fundamental security problem described in Chapter 1: All user input is untrusted. A huge variety of attacks against web applications involve submitting unexpected input, crafted to cause behavior that was not intended by the application’s designers. Correspondingly, a key requirement for an application’s
security defenses is that the application must handle user input in a safe manner.
Input-based vulnerabilities can arise anywhere within an application’s functionality, and in relation to practically every type of technology in common use.
“Input validation” is often cited as the necessary defense against these attacks.
However, no single protective mechanism can be employed everywhere, and defending against malicious input is often not as straightforward as it sounds.
Varieties of Input
A typical web application processes user-supplied data in many different forms. Some kinds of input validation may not be feasible or desirable for all these forms of input. Figure 2-4 shows the kind of input validation often performed by a user registration function.
In many cases, an application may be able to impose very stringent validation checks on a specific item of input. For example, a username submitted to a login function may be required to have a maximum length of eight characters and contain only alphabetical characters. In other cases, the application must tolerate a wider range of possible input.
For example, an address field submitted to a personal details page might legitimately contain letters, numbers, spaces, hyphens, apostrophes, and other characters. However, for this item, restrictions still can be feasibly imposed. The data should not exceed a reasonable length limit (such as 50 characters) and should
not contain any HTML markup.
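The two checks just described can be sketched as simple validators; the exact character limits and patterns here mirror the examples in the text and are otherwise arbitrary.

```python
import re

def valid_username(value: str) -> bool:
    # Stringent check: at most eight characters, letters only.
    return re.fullmatch(r"[A-Za-z]{1,8}", value) is not None

def valid_address(value: str) -> bool:
    # Looser check: a reasonable length limit and no HTML markup characters,
    # while still permitting letters, digits, spaces, hyphens, and apostrophes.
    return len(value) <= 50 and not re.search(r"[<>]", value)
```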
In some situations, an application may need to accept arbitrary input from users. For example, a user of a blogging application may create a blog whose subject is web application hacking. Posts and comments made to the blog may quite legitimately contain explicit attack strings that are being discussed. The application may need to store this input in a database, write it to disk, and display it back to users in a safe way. It cannot simply reject the input just because it looks potentially malicious without substantially diminishing the application’s value to some of its user base.
In addition to the various kinds of input that users enter using the browser interface, a typical application receives numerous items of data that began their
life on the server and that are sent to the client so that the client can transmit them back to the server on subsequent requests. This includes items such as
cookies and hidden form fields, which are not seen by ordinary users of the application but which an attacker can of course view and modify. In these cases,
applications can often perform very specific validation of the data received. For example, a parameter might be required to have one of a specific set of known values, such as a cookie indicating the user’s preferred language, or to be in a specific format, such as a customer ID number. Furthermore, when an application detects that server-generated data has been modified in a way that is not possible for an ordinary user with a standard browser, this often indicates that the user is attempting to probe the application for vulnerabilities. In these cases, the application should reject the request and log the incident for potential investigation (see the “Handling Attackers” section later in this chapter).
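Validation of server-originated items can be far stricter than validation of free-form user input. In this sketch, the allowed language values and the customer ID format are hypothetical, and the incident log is a simple list standing in for a real logging mechanism:

```python
import re

ALLOWED_LANGUAGES = {"en", "fr", "de"}  # hypothetical set of known values

incident_log = []

def check_language_cookie(value: str) -> bool:
    # The cookie must be one of a specific set of known values.
    return value in ALLOWED_LANGUAGES

def check_customer_id(value: str) -> bool:
    # Hypothetical format: "C" followed by exactly eight digits.
    return re.fullmatch(r"C\d{8}", value) is not None

def handle_language_cookie(value: str) -> bool:
    if not check_language_cookie(value):
        # A standard browser cannot produce this modification, so reject
        # the request and log the incident for potential investigation.
        incident_log.append(f"possible probe: lang={value!r}")
        return False
    return True
```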
Approaches to Input Handling
Various broad approaches are commonly taken to the problem of handling user input. Different approaches are often preferable for different situations and different types of input, and a combination of approaches may sometimes be desirable.
“Reject Known Bad”
This approach typically employs a blacklist containing a set of literal strings or patterns that are known to be used in attacks. The validation mechanism blocks any data that matches the blacklist and allows everything else.
In general, this is regarded as the least effective approach to validating user input, for two main reasons. First, a typical vulnerability in a web application can be exploited using a wide variety of input, which may be encoded or represented in various ways. Except in the simplest of cases, it is likely that a blacklist will omit some patterns of input that can be used to attack the application. Second, techniques for exploitation are constantly evolving. Novel methods for exploiting existing categories of vulnerabilities are unlikely to be blocked
by current blacklists.
Many blacklist-based filters can be bypassed with almost embarrassing ease by making trivial adjustments to the input that is being blocked. For example:
- If SELECT is blocked, try SeLeCt
- If ' or 1=1-- is blocked, try ' or 2=2--
- If alert('xss') is blocked, try prompt('xss')
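The fragility of the blacklist approach is easy to demonstrate with a toy filter. This sketch is deliberately naive; it exists only to show how the trivial adjustments listed above defeat literal string matching:

```python
def blacklist_allows(value: str) -> bool:
    # A toy case-sensitive blacklist of the kind just described:
    # block any input containing one of these literal strings.
    blocked = ["SELECT", "alert("]
    return not any(pattern in value for pattern in blocked)
```

A case change or a synonymous function name slips straight past the filter while remaining just as dangerous to the application.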
In other cases, filters designed to block specific keywords can be bypassed by using nonstandard characters between expressions to disrupt the tokenizing performed by the application. For example:
SELECT/*foo*/username,password/*foo*/FROM/*foo*/users
<img%09onerror=alert(1)%20src=a>
Finally, numerous blacklist-based filters, particularly those implemented in web application firewalls, have been vulnerable to NULL byte attacks. Because of the different ways in which strings are handled in managed and unmanaged execution contexts, inserting a NULL byte anywhere before a blocked expression can cause some filters to stop processing the input and therefore not identify
the expression. For example:
%00<script>alert(1)</script>
“Accept Known Good”
This approach employs a whitelist containing a set of literal strings or patterns, or a set of criteria, that is known to match only benign input. The validation mechanism allows data that matches the whitelist and blocks everything else.
For example, before looking up a requested product code in the database, an application might validate that it contains only alphanumeric characters and is exactly six characters long. Given the subsequent processing that will be done
on the product code, the developers know that input passing this test cannot possibly cause any problems.
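The product code example translates directly into a whitelist validator. The six-character alphanumeric rule is the one given in the text; anything that fails the test is blocked outright:

```python
import re

def valid_product_code(value: str) -> bool:
    # Accept only exactly six alphanumeric characters; block everything else.
    return re.fullmatch(r"[A-Za-z0-9]{6}", value) is not None
```

Because the rule describes what is allowed rather than what is forbidden, there is no pattern an attacker can adjust or encode to slip past it.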
In cases where this approach is feasible, it is regarded as the most effective way to handle potentially malicious input. Provided that due care is taken in constructing the whitelist, an attacker will be unable to use crafted input to
interfere with the application’s behavior. However, in numerous situations an application must accept data for processing that does not meet any reasonable criteria for what is known to be “good.” For example, some people’s names contain an apostrophe or hyphen. These can be used in attacks against databases, but it may be a requirement that the application should permit anyone to register under his or her real name. Hence, although it is often extremely effective, the whitelist-based approach does not represent an all-purpose solution to the problem of handling user input.
Sanitization
This approach recognizes the need to sometimes accept data that cannot be guaranteed as safe. Instead of rejecting this input, the application sanitizes it in various ways to prevent it from having any adverse effects. Potentially malicious characters may be removed from the data, leaving only what is known to be safe, or they may be suitably encoded or “escaped” before further processing is performed.
Approaches based on data sanitization are often highly effective, and in many situations they can be relied on as a general solution to the problem of malicious input. For example, the usual defense against cross-site scripting attacks is to HTML-encode dangerous characters before these are embedded into pages of the application (see Chapter 12). However, effective sanitization may be difficult to
achieve if several kinds of potentially malicious data need to be accommodated within one item of input. In this situation, a boundary validation approach is desirable, as described later.
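HTML encoding, the usual anti-XSS sanitization mentioned above, is available directly in Python's standard library:

```python
import html

def sanitize_for_html(value: str) -> str:
    # Encode the characters that are significant in HTML so user input
    # is rendered as literal text rather than interpreted as markup.
    return html.escape(value, quote=True)
```

The encoded string can then be embedded safely into a page, because characters such as < and " have lost their syntactic meaning.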
Safe Data Handling
Many web application vulnerabilities arise because user-supplied data is processed in unsafe ways. Vulnerabilities often can be avoided not by validating the input itself but by ensuring that the processing that is performed on it is
inherently safe. In some situations, safe programming methods are available that avoid common problems. For example, SQL injection attacks can be prevented through the correct use of parameterized queries for database access
(see Chapter 9). In other situations, application functionality can be designed in such a way that inherently unsafe practices, such as passing user input to an operating system command interpreter, are avoided.
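The parameterized-query defense looks like this in practice. The sketch uses an in-memory SQLite database with a hypothetical users table purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

def find_user(username: str):
    # The ? placeholder keeps the user-supplied value as pure data;
    # it can never alter the structure of the SQL statement itself.
    return conn.execute(
        "SELECT username FROM users WHERE username = ?", (username,)
    ).fetchall()
```

An injection payload passed as the parameter simply matches no rows, because it is compared as a literal string rather than parsed as SQL.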
This approach cannot be applied to every kind of task that web applications need to perform. But where it is available, it is an effective general approach to handling potentially malicious input.
Semantic Checks
The defenses described so far all address the need to defend the application against various kinds of malformed data whose content has been crafted to interfere with the application’s processing. However, with some vulnerabilities the input supplied by the attacker is identical to the input that an ordinary, nonmalicious user may submit. What makes it malicious is the different circumstances under which it is submitted. For example, an attacker might seek to gain access to another user’s bank account by changing an account number transmitted in a hidden form field. No amount of syntactic validation will distinguish between the user’s data and the attacker’s. To prevent unauthorized access, the application needs to validate that the account number submitted belongs to the user who has submitted it.
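The account-number example reduces to an ownership check. The mapping of accounts to owners here is a hypothetical stand-in for a database lookup:

```python
# Hypothetical mapping of account numbers to the users who own them.
account_owners = {"12345678": "alice", "87654321": "bob"}

def authorize_account_access(session_user: str, account_number: str) -> bool:
    # The account number may be perfectly well-formed; the semantic check
    # is whether it belongs to the user who submitted the request.
    return account_owners.get(account_number) == session_user
```

This is the check that syntactic validation alone cannot perform: both account numbers are valid input, but only one is valid for this user.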
Handling Attackers
Anyone designing an application for which security is remotely important must assume that it will be directly targeted by dedicated and skilled attackers. A key function of the application’s security mechanisms is being able to handle and react to these attacks in a controlled way. These mechanisms often incorporate a mix of defensive and offensive measures designed to frustrate an attacker as
much as possible and give the application’s owners appropriate notification and evidence of what has taken place. Measures implemented to handle attackers typically include the following tasks:
- Handling errors
- Maintaining audit logs
- Alerting administrators
- Reacting to attacks
Handling Errors
However careful an application’s developers are when validating user input, it is virtually inevitable that some unanticipated errors will occur. Errors resulting from the actions of ordinary users are likely to be identified during functionality and user acceptance testing. Therefore, they are taken into account before the application is deployed in a production context. However, it is difficult to anticipate every possible way in which a malicious user may interact with the
application, so further errors should be expected when the application comes under attack.
A key defense mechanism is for the application to handle unexpected errors gracefully, and either recover from them or present a suitable error message to the user. In a production context, the application should never return any system-generated messages or other debug information in its responses. As you will see throughout this book, overly verbose error messages can greatly assist malicious users in furthering their attacks against the application. In some
situations, an attacker can leverage defective error handling to retrieve sensitive information within the error messages themselves, providing a valuable channel for stealing data from the application. Figure 2-6 shows an example of
an unhandled error resulting in a verbose error message.
Most web development languages provide good error-handling support through try-catch blocks and checked exceptions. Application code should make extensive use of these constructs to catch specific and general errors and
handle them appropriately. Furthermore, most application servers can be configured to deal with unhandled application errors in customized ways, such as by presenting an uninformative error message.
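The catch-and-mask pattern can be sketched as follows; the request parameters and the order-processing logic are invented for illustration:

```python
import logging

log = logging.getLogger("app")

def handle_request(params: dict) -> str:
    try:
        quantity = int(params["quantity"])
        return f"Ordered {quantity} items"
    except Exception:
        # Record full details internally for the application's owners...
        log.exception("unhandled error processing request")
        # ...but return only an uninformative message to the user.
        return "An error occurred. Please try again later."
```

Whatever goes wrong, the response reveals nothing about the underlying failure, while the log retains the full stack trace for diagnosis.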
Effective error handling is often integrated with the application’s logging mechanisms, which record as much debug information as possible about unanticipated errors. Unexpected errors often point to defects within the application’s defenses that can be addressed at the source if the application’s owner has the required information.
Maintaining Audit Logs
Audit logs are valuable primarily when investigating intrusion attempts against an application. Following such an incident, effective audit logs should enable the application’s owners to understand exactly what has taken place, which vulnerabilities (if any) were exploited, whether the attacker gained unauthorized access to data or performed any unauthorized actions, and, as far as possible, provide evidence of the intruder’s identity.
In any application for which security is important, key events should be logged as a matter of course. At a minimum, these typically include the following:
- All events relating to the authentication functionality, such as successful and failed login, and change of password
- Key transactions, such as credit card payments and funds transfers
- Access attempts that are blocked by the access control mechanisms
- Any requests containing known attack strings that indicate overtly malicious intentions
In many security-critical applications, such as those used by online banks, every client request is logged in full, providing a complete forensic record that can be used to investigate any incidents.
Effective audit logs typically record the time of each event, the IP address from which the request was received, and the user’s account (if authenticated).
Such logs need to be strongly protected against unauthorized read or write access. An effective approach is to store audit logs on an autonomous system that accepts only update messages from the main application. In some situations, logs may be flushed to write-once media to ensure their integrity in the event of a successful attack.
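A minimal audit-entry writer might look like the following sketch. The event names are hypothetical, and the in-memory list stands in for the separate, update-only logging system described above:

```python
import json
import time

audit_log = []  # stands in for an append-only store on a separate system

def audit(event, ip, user=None):
    # Record at minimum the time of the event, the source IP address,
    # and the user's account (None for unauthenticated requests).
    entry = {"ts": time.time(), "event": event, "ip": ip, "user": user}
    audit_log.append(json.dumps(entry))
```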
In terms of attack surface, poorly protected audit logs can provide a gold mine of information to an attacker, disclosing a host of sensitive information such as session tokens and request parameters. This information may enable the attacker
to immediately compromise the entire application, as shown in Figure 2-7.
Alerting Administrators
Audit logs enable an application’s owners to retrospectively investigate intrusion attempts and, if possible, take legal action against the perpetrator. However, in many situations it is desirable to take much more immediate action, in real time,
in response to attempted attacks. For example, administrators may block the IP address or user account an attacker is using. In extreme cases, they may even take the application offline while investigating the attack and taking remedial action. Even if a successful intrusion has already occurred, its practical effects may be mitigated if defensive action is taken at an early stage.
In most situations, alerting mechanisms must balance the conflicting objectives of reporting each genuine attack reliably and of not generating so many alerts that these come to be ignored. A well-designed alerting mechanism can
use a combination of factors to diagnose that a determined attack is under way and can aggregate related events into a single alert where possible. Anomalous events monitored by alerting mechanisms often include the following:
- Usage anomalies, such as large numbers of requests being received from a single IP address or user, indicating a scripted attack
- Business anomalies, such as an unusual number of funds transfers being made to or from a single bank account
- Requests containing known attack strings
- Requests where data that is hidden from ordinary users has been modified
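The first anomaly in the list, a request flood from a single IP address, is straightforward to monitor. The threshold below is purely illustrative; a real mechanism would tune it per application and combine several such factors:

```python
from collections import Counter

REQUEST_THRESHOLD = 100  # illustrative limit; tuned per application in practice

request_counts = Counter()
alerts = []

def record_request(ip):
    request_counts[ip] += 1
    # Aggregate related events into a single alert: fire once when an IP
    # first crosses the threshold, not once per subsequent request.
    if request_counts[ip] == REQUEST_THRESHOLD:
        alerts.append(f"possible scripted attack from {ip}")
```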
Some of these functions can be provided reasonably well by off-the-shelf application firewalls and intrusion detection products. These typically use a mixture of signature- and anomaly-based rules to identify malicious use of the application and may reactively block malicious requests as well as issue alerts to administrators. These products can form a valuable layer of defense protecting a web application, particularly in the case of existing applications known to contain problems but where resources to fix these are not immediately available. However, their effectiveness usually is limited by the fact that each web application is different, so the rules employed are inevitably generic to some extent. Web application firewalls usually are good at identifying the
most obvious attacks, where an attacker submits standard attack strings in each request parameter. However, many attacks are more subtle than this. For example, perhaps they modify the account number in a hidden field to access
another user’s data, or submit requests out of sequence to exploit defects in the application’s logic. In these cases, a request submitted by an attacker may be identical to that submitted by a benign user. What makes it malicious are the
circumstances under which it is made.
In any security-critical application, the most effective way to implement real-time alerting is to integrate this tightly with the application’s input validation mechanisms and other controls. For example, if a cookie is expected to have one of a specific set of values, any violation of this indicates that its value has
been modified in a way that is not possible for ordinary users of the application. Similarly, if a user changes an account number in a hidden field to identify a different user’s account, this strongly indicates malicious intent. The application
should already be checking for these attacks as part of its primary defenses, and these protective mechanisms can easily hook into the application’s alerting mechanism to provide fully customized indicators of malicious activity.
Because these checks have been tailored to the application’s actual logic, with a fine-grained knowledge of how ordinary users should be behaving, they are much less prone to false positives than any off-the-shelf solution, however
configurable or easy-to-learn that solution may be.
Reacting to Attacks
In addition to alerting administrators, many security-critical applications contain built-in mechanisms to react defensively to users who are identified as potentially malicious.
Because each application is different, most real-world attacks require an attacker to probe systematically for vulnerabilities, submitting numerous requests containing crafted input designed to indicate the presence of various common
vulnerabilities. Effective input validation mechanisms will identify many of these requests as potentially malicious and block the input from having any undesirable effect on the application. However, it is sensible to assume that some bypasses to these filters exist and that the application does contain some actual vulnerabilities waiting to be discovered and exploited. At some point, an attacker working systematically is likely to discover these defects.
For this reason, some applications take automatic reactive measures to frustrate the activities of an attacker who is working in this way. For example, they might respond increasingly slowly to the attacker’s requests or terminate the
attacker’s session, requiring him to log in or perform other steps before continuing the attack. Although these measures will not defeat the most patient and determined attacker, they will deter many more casual attackers and will buy additional time for administrators to monitor the situation and take more
drastic action if desired.
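The "respond increasingly slowly" measure can be sketched as an escalating delay per suspected attacker. The base delay, growth factor, and cap here are all arbitrary assumptions:

```python
failed_probes = {}  # attacker identifier -> count of suspicious requests seen

def response_delay(attacker_id):
    # Double the delay with each suspicious request, up to a cap, so that
    # systematic probing becomes progressively slower and more expensive.
    count = failed_probes.get(attacker_id, 0) + 1
    failed_probes[attacker_id] = count
    return min(0.5 * (2 ** (count - 1)), 30.0)  # seconds, illustrative figures
```

A real implementation would sleep for (or schedule) the returned delay and combine this with session termination after repeated offenses.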
Reacting to apparent attackers is not, of course, a substitute for fixing any vulnerabilities that exist within the application. However, in the real world, even the most diligent efforts to purge an application of security flaws may leave some exploitable defects. Placing further obstacles in the way of an attacker is an effective defense-in-depth measure that reduces the likelihood that any residual vulnerabilities will be found and exploited.
Managing the Application
Any useful application needs to be managed and administered. This facility often forms a key part of the application’s security mechanisms, providing a way for administrators to manage user accounts and roles, access monitoring and audit functions, perform diagnostic tasks, and configure aspects of the application’s functionality.
In many applications, administrative functions are implemented within the application itself, accessible through the same web interface as its core nonsecurity functionality, as shown in Figure 2-8. Where this is the case, the
administrative mechanism represents a critical part of the application’s attack surface. Its primary attraction for an attacker is as a vehicle for privilege escalation. For example:
- Weaknesses in the authentication mechanism may enable an attacker to gain administrative access, effectively compromising the entire application.
- Many applications do not implement effective access control of some of their administrative functions. An attacker may find a means of creating a new user account with powerful privileges.
- Administrative functionality often involves displaying data that originated from ordinary users. Any cross-site scripting flaws within the administrative interface can lead to compromise of a user session that is guaranteed to have powerful privileges.
- Administrative functionality is often subjected to less rigorous security testing, because its users are deemed to be trusted, or because penetration testers are given access to only low-privileged accounts. Furthermore, the functionality often needs to perform inherently dangerous operations,
involving access to files on disk or operating system commands. If an attacker can compromise the administrative function, he can often leverage it to take control of the entire server.