Start Secure. Stay Secure.™
Automated Cookie Analysis
Are your web applications vulnerable?
By Darrin Barrall

Table of Contents
Introduction
The Basics
Collecting Cookies
Sets
Randomness
Encoding
Subcookies
Graphical Analysis
Prediction
Summary
References
Disclaimer
The Business Case for Application Security
About SPI Labs
About SPI Dynamics
About the WebInspect Product Line
About the Author
Contact Information

© 2005 SPI Dynamics, Inc. All Rights Reserved. No reproduction or redistribution without written permission.

Introduction
This paper examines a series of mathematical and theoretical attacks against Web applications that use cookies, and discusses the feasibility of automating the analysis and exploiting any weaknesses it detects. The methodology reflects the perspective of an attacker who has no prior knowledge of how cookies are designed or implemented for a specific Web application.
A Web-based system typically uses a cookie as a reference to data already stored on the server, and operates under the assumption that only a specific user knows the contents of that cookie. Such a system is vulnerable to attack if a malicious user can predict the cookie that will be assigned to another user. The attacker can then hijack a legitimate user's session by using the counterfeit cookie.
When a Web server generates cookies in a truly random fashion, an intruder has little (if any) chance of correctly fabricating a cookie that can be used to attack the site. However, if an attacker can identify a pattern in the way cookie values are assigned, he may easily generate an HTTP request that includes a cookie the server will accept. For example, if the Web server generates a series of four cookies having values of 2, 4, 6, and 8, it's a good bet that the fifth cookie will have a value of 10.
Therefore, the ability to predict the value of a valid cookie is inversely related to the degree of randomness with which the cookie was generated.

The Basics
At a minimum, a cookie is a name and value pair used to retain a value for future reference. For example:

SmallCookie=1234

The example shows a cookie named "SmallCookie" having a value of 1234. Frequently, several cookies are used, so each cookie needs a distinct name; otherwise, a cookie set later under the same name would replace the value stored earlier, and that value would be lost. A cookie may have other attributes, such as an expiration date. The attribute values may play a role during analysis, but this paper focuses on the value portion of the cookie.
The most common use of cookies is to manage the state of a remote user's interaction with a web server. Keep this in mind when collecting cookies for analysis: a cookie collected before signing in to a system may be radically different from the cookie assigned after signing in. Mixing the two kinds could invalidate the analysis, so accurate data collection is an important first step.

Collecting Cookies
Several variables determine or limit how many cookie samples are necessary for testing. These include:
- Algorithms used
- Acquisition time
- Computing resources
Some statistical methods require thousands of data points or more, while simpler procedures may require only two. If it takes 10 minutes to acquire each sample, you may need to adapt your algorithms to work with a small number of samples. Brute-force number crunching requires adequate computing hardware (unless you happen to be unnaturally patient), and reducing the amount of data to be processed can speed up the analysis.
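Python's standard library can separate a cookie's name and value from its attributes, which makes collecting samples easy to automate. The following is a minimal sketch, not a definitive collector: `CookieSample` and `parse_sample` are hypothetical names, and using the server's Date response header as each sample's timestamp is an assumption (any consistent clock would do).

```python
from dataclasses import dataclass
from datetime import datetime
from email.utils import parsedate_to_datetime
from http.cookies import SimpleCookie


@dataclass
class CookieSample:
    """One collected sample (hypothetical structure): issue time, name, value."""
    timestamp: datetime
    name: str
    value: str


def parse_sample(set_cookie: str, date_header: str, name: str) -> CookieSample:
    """Extract cookie `name` from a Set-Cookie header value and pair it with
    the server's Date header, so values can later be analyzed against time."""
    jar = SimpleCookie()
    jar.load(set_cookie)   # separates name=value from attributes such as Path
    morsel = jar[name]     # raises KeyError if this cookie was not set
    return CookieSample(parsedate_to_datetime(date_header), morsel.key, morsel.value)


sample = parse_sample("SmallCookie=1234; Path=/", "Tue, 15 Nov 1994 08:12:31 GMT", "SmallCookie")
print(sample.name, sample.value)  # SmallCookie 1234
```

Samples collected before and after signing in should go into separate lists, for the reason given above.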
While collecting the cookies, record a time associated with each cookie. This could be the server's timestamp in the HTTP response header (the Date header), your local time, or some other convenient time. Having a known range of time values and associated cookie values helps when predicting the cookie value for a specific time in the future.
Another factor to consider is the reproducibility of the conditions used to collect the cookies. Your collection software may use several different local network ports. You could remove this influence by analyzing the cookies in groups, according to the port that received them. However, nothing requires the targeted web server to use the remote port number when it generates cookies, so grouping the cookies this way may gain nothing. Since the process used to compose the cookie is unknown, we can only hypothesize and then test.

Sets
Since we are taking a zero-knowledge approach to what the cookie may represent, the first step is to figure out what the cookie, or parts of it, may represent. We begin by defining some sets:

Set               Members
Numbers (n)       0-9
Hexadecimal (h)   0-9, A-F, a-f
Alphabetic (a)    A-Z, a-z
Text (t)          0-9, A-Z, a-z
Delimiters (d)    !0-9, !A-Z, !a-z (not a digit or letter)
Other             everything else

For each byte of a cookie, record the smallest set to which the byte belongs. For the cookie value "12345:DBD", the set list would be "nnnnndhhh". The last three characters are members of both the hexadecimal and alphabetic sets, but are recorded as hexadecimal because that is the smallest set containing them. Processing a second cookie value, "12349:SCV", yields "nnnnndaha", and a third, "12355:MFA", yields "nnnnndahh". Combining the three results (taking, for each position, the smallest set that contains every character seen there) gives us "nnnnndaha".
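The classification and combining steps just described are mechanical enough to automate. A minimal sketch in Python, assuming that the delimiter set means printable ASCII punctuation (Python's `string.punctuation`) and that anything else, including a column that mixes delimiters with letters or digits, falls into a hypothetical "o" (other) code; the function names are illustrative.

```python
import string
from itertools import zip_longest

# Set codes from the table. The alphanumeric sets are listed smallest first;
# "d" (assumed: printable ASCII punctuation) is disjoint from them, so its
# position in the list does not matter. Anything not covered is "o".
SETS = [
    ("n", set(string.digits)),
    ("h", set(string.hexdigits)),                      # 0-9, a-f, A-F
    ("a", set(string.ascii_letters)),
    ("t", set(string.ascii_letters + string.digits)),
    ("d", set(string.punctuation)),
]


def smallest_set(chars):
    """Code of the smallest set that contains every character in `chars`."""
    chars = set(chars)
    for code, members in SETS:
        if chars <= members:
            return code
    return "o"


def set_list(cookie):
    """Classify each character of a single cookie value."""
    return "".join(smallest_set(ch) for ch in cookie)


def combined_set_list(cookies):
    """Classify each column (character position) across all cookies."""
    columns = zip_longest(*cookies)   # shorter cookies leave None in later columns
    return "".join(smallest_set(ch for ch in col if ch is not None) for col in columns)


print(set_list("12345:DBD"))                                         # nnnnndhhh
print(combined_set_list(["12345:DBD", "12349:SCV", "12355:MFA"]))    # nnnnndaha
```

Because "h" is tested before "a", the letters A-F are recorded as hexadecimal, matching the rule that a character goes into the smallest set that contains it.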
Cookie       Set list     Combined so far
12345:DBD    nnnnndhhh    nnnnndhhh
12349:SCV    nnnnndaha    nnnnndaha
12355:MFA    nnnnndahh    nnnnndaha

Notice that two of the last three columns of characters (DSM and DVA) mix hexadecimal and alphabetic characters, while the middle column (BCF) is all hexadecimal. This raises two possibilities: the middle character is actually alphabetic and our sample is simply too small to show it, or it really is a hexadecimal value with some special significance. Let's explore the first possibility.
Suppose we adjust the character set of each column to reflect the character types of the columns that surround it. Numeric characters surrounded by more numeric characters are probably numeric values and do not need adjustment. If a few hexadecimal characters were scattered among the numeric ones, we could consider the whole sequence to be a hexadecimal value and adjust the set accordingly. The last three columns show the same effect, but with hexadecimal and alphabetic types. Adjusting their character set accordingly gives "nnnnndaaa". The unknown data has become a number and a text string, separated by a delimiter character. Since the second part probably has no hexadecimal properties, text-specific tests can be applied to it.

Randomness
Part of the cookie from the previous example is a number. What kind of number could it be? From an informational standpoint, there are two types of numbers: random and non-random. Mathematical analysis can help detect non-random sequences. One such test is the chi-square test, one of the tests performed by the ENT program from Fourmilab, which its author describes as "useful for those evaluating pseudorandom number generators for encryption and statistical sampling applications, compression algorithms, and other applications where the information density of a file is of interest."
In other words, this nifty number cruncher gives an indication of non-randomness: the likelihood that the number was computed rather than randomly generated. When a cookie is the result of a computational process, an outsider may be able to reproduce that process.
Another test for randomness, or the lack of it, is correlation. A correlation coefficient indicates how closely the cookie values follow a straight line when the value of each cookie is plotted (on the Y axis) against the time the cookie was received (on the X axis). A scattered distribution indicates randomness, whereas a pattern approaching a line indicates predictability. When this occurs, the cookie value for any future time can be predicted to lie within a range, with a known statistical confidence.
Randomness can also be evaluated by checking character frequencies. Given enough random cookies, each character in the set would have an equal chance of appearing. If a cookie were determined to consist of numeric values only, then each character (0-9) should make up 10 percent of the total number of bytes of all cookies. For the 52-character alphabetic set, each character would be expected to make up approximately 1.9 percent (1/52). A deviation from the expected distribution could indicate that only a subset of the character set is used. As an example, consider values in the range 1 to 32767. Since all of the digits 0-9 are in use, a 10 percent distribution would be expected. But in five-digit values the leftmost column is limited to 1, 2, or 3, so there will be noticeably more ones and twos, and a few more threes, than expected, skewing the results. Since a skewed distribution can reveal the range of values in play, a brute-force attack may become a viable alternative, because we have some knowledge of the range of valid data.
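The character-frequency expectations above can be checked with a chi-square statistic. A minimal sketch in Python: it compares observed character counts with a uniform distribution over the cookie's character set, which is one straightforward way to apply the test, not necessarily how the Fourmilab tool applies it. The function name and the sample data are hypothetical.

```python
import string
from collections import Counter


def chi_square(cookies, alphabet):
    """Pearson chi-square statistic of the observed character counts against a
    uniform distribution over `alphabet` (each character equally likely).
    Characters outside `alphabet` are ignored."""
    counts = Counter(ch for cookie in cookies for ch in cookie)
    total = sum(counts[ch] for ch in alphabet)
    expected = total / len(alphabet)
    return sum((counts[ch] - expected) ** 2 / expected for ch in alphabet)


# Hypothetical data. With 10 digits there are 9 degrees of freedom; a value
# above about 16.92 would occur less than 5% of the time for uniform digits.
balanced = ["0123456789"] * 50
skewed = [str(n) for n in range(10000, 32768, 7)]  # five-digit values up to 32767
print(chi_square(balanced, string.digits))         # 0.0
print(chi_square(skewed, string.digits) > 16.92)   # True: extra leading 1s and 2s
```

A large statistic only says the characters are not uniformly distributed; as in the 1 to 32767 example, the skew itself hints at the range of values in use.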
Yet another method is to check the number of bits that change between one cookie and the next. Here again, the character set of the cookie data determines the expected level of change. For random data, about half of the bits needed to represent the character set should change in each character position from one cookie to the next, which maximizes the number of possible results. When a cookie consists of hexadecimal (0-9, A-F or a-f) characters, four bits are needed to represent the character set, so typically two bits should change per character from one cookie to the next. The larger alphabetic set requires approximately 4.7 bits (log2 26) to represent the A-Z range, and 5.7 bits (log2 52) for upper and lower case together. Comparing the expected bit changes to the measured bit changes will point out the more static portions of a value.
An interesting side effect of tracking the bit changes is the detection of incrementing or decrementing values. In an incrementing hexadecimal value, the lowest bit of a digit changes at every step, the next bit at every second step, and the next at every fourth step: a 1, 2, 4 pattern. The ratio holds even where values are missing, although more samples are necessary to see it.

Encoding
Character frequencies can also reveal base-n encoding. The most common methods are base-16 (hexadecimal) and base-64 (as used in MIME). Hexadecimal encoding is normally represented with the familiar 0-9/A-F set, but those characters can be translated into a completely different set of characters (A-P, for instance). To detect possible encoding, count the number of unique characters used among all of the cookies. When the count is approximately a power of 2 (8, 16, 32, and so on), some sort of encoding may be in use. The estimate is only approximate, because a character or two may be reserved as a delimiter or a padding character.
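Both checks from this part of the paper, counting bit changes and counting unique characters, are short to automate. A minimal sketch in Python; the function names and the `decode` parameter are hypothetical. Passing `ord` compares raw character codes, while `hex_value` first converts hexadecimal characters to their 4-bit values, which is what the two-bits-per-character expectation assumes.

```python
import math


def bit_changes(a, b, decode=ord):
    """Bits that differ between same-position characters of two cookies.
    `decode` maps a character to the integer whose bits are compared."""
    return [bin(decode(x) ^ decode(y)).count("1") for x, y in zip(a, b)]


def mean_bit_changes(cookies, decode=ord):
    """Average bit changes per character position over consecutive cookies.
    Random hexadecimal data should average about 2 per position (half of 4
    bits); a position averaging near 0 is static."""
    per_pair = [bit_changes(a, b, decode) for a, b in zip(cookies, cookies[1:])]
    return [sum(col) / len(per_pair) for col in zip(*per_pair)]


def encoding_hint(cookies):
    """Number of distinct characters used, and the nearest power of two.
    A count close to 16, 32 or 64 suggests some base-n encoding."""
    distinct = len(set("".join(cookies)))
    return distinct, 2 ** round(math.log2(distinct))


def hex_value(c):
    """Decode one hexadecimal character to its 4-bit value."""
    return int(c, 16)


print(bit_changes("9", "A", hex_value))                  # [2]: 1001 vs 1010
print(encoding_hint(["KOPA", "BCDE", "FGHI", "JLMN"]))   # (16, 16)
```

The second call uses exactly the letters A-P, so it reports 16 distinct characters, consistent with a translated hexadecimal encoding.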
Building the decoding matrix becomes a cryptanalysis exercise if the encoding matrix is not known. A simple decoding matrix for the A-P example assigns a unique four-bit value to each character:

0000  A
0001  B
0010  C
...
1110  O
1111  P

Each character in a cookie is replaced by its associated four-bit value, and the bits are concatenated until they form a value large enough for convenient analysis.
Encoding is not limited to powers of two. Perhaps it is simply the 0-9 range expressed as text, or 1-12 (as in months) expressed as unique characters. The possibilities are limited only by the craftiness of the cookie-creating coders.

Subcookies
When a cookie comprises several independent pieces of data, each individual portion of the cookie can be called a subcookie. Subcookies may conveniently have their own names and delimiters, or the separation may be very subtle. When there are names and delimiters, break up the cookies and restart the analysis with each piece.
A subtler way to delimit subcookies is to use a character that does not belong in the encoding matrix but appears to be part of the character set used by the rest of the cookie. For example, a hexadecimal cookie encoded with the A-P matrix might use W as a delimiter. The number of W's would then be a multiple of the number of cookies being processed, while the remaining characters would follow the character frequencies expected of random values.
The next easiest subcookie to detect is one marked by an abrupt change in the character sets used. The most striking example is a transition from numeric data to text data. Simply cut on the dotted line and analyze the separated pieces.
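The A-P matrix and the W delimiter from this example can be combined into a small decoder. A minimal sketch in Python, assuming the A=0000 through P=1111 matrix shown above; the function names and the sample cookies are hypothetical. The delimiter test looks for characters that occur the same, non-zero number of times in every cookie, which makes their total an exact multiple of the cookie count.

```python
from collections import Counter

# Decoding matrix from the text: A=0000, B=0001, ..., P=1111.
DECODE = {chr(ord("A") + i): i for i in range(16)}


def decode_ap(text):
    """Concatenate the four-bit values of A-P characters into one integer."""
    value = 0
    for ch in text:
        value = (value << 4) | DECODE[ch]   # KeyError: character outside the matrix
    return value


def delimiter_candidates(cookies):
    """Characters that occur the same, non-zero number of times in every cookie."""
    counts = [Counter(cookie) for cookie in cookies]
    common = set(counts[0]).intersection(*counts[1:])
    return sorted(ch for ch in common if len({c[ch] for c in counts}) == 1)


def split_and_decode(cookie, delimiter):
    """Split a cookie into subcookies and decode each A-P piece."""
    return [decode_ap(part) for part in cookie.split(delimiter)]


cookies = ["BAFWKPC", "OCWDDAJ", "MWHBNEG"]   # hypothetical samples
print(delimiter_candidates(cookies))          # ['W']
print(split_and_decode("BAFWKPC", "W"))       # [261, 2802]
```

With only a few short samples, an ordinary character can pass the delimiter test by chance, so candidates are worth confirming against more cookies and against the bit-change results.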
Another type of subcookie is manufactured, not found. When the cookies being tested have varying lengths, delimited columns may not line up well enough to stand out. Padding the cookies and retesting them may expose some interesting aspects. Choosing a padding character is simply a matter of picking a character that is unused in the cookies, or one that will not affect the value of the cookie.

Graphical Analysis
It is said that a picture is worth a thousand words, but with cookies, a thousand cookies are worth a picture. Once decoded, cookie values can be graphed in several ways that may betray hidden patterns.
Graphing the character set of cookies is a visual way of comparing the relative sizes of the parts of a cookie, or of pointing out the delimiters. The following graph shows a numeric value followed by a text value.
A graph of the frequencies of characters in the cookies could expose several anomalies. How often a particular character appears relative to the other characters can indicate that it is used differently from the others. The following graph shows cookies that have an unusual frequency of the letter W: the number of W's is exactly the same as the number of cookies.
Before exploring the implications of this information, let's look at another graph.
Considering the position of the steep dip in the previous graph (no bit changes), and that the frequency graph shows the number of W characters is equal to the number of cookies, this column may have some special meaning within the cookie. Since it does not change, perhaps it is a boundary of some kind. A dip like this would be a good candidate to treat as the boundary between two subcookie values.
More bit changes do not always mean more randomness. Consider the two seven-bit ASCII characters J and 5. From one character to the next, all seven bits are different, but the only way to change all seven bits again is to alternate between the two characters. Though many bits change, the result is far from random. The maximum randomness for a seven-bit value would show an average of 3.5 bits changing from one value to the next. A hexadecimal value, represented by just four bits, would show an average change of two bits from one character to the next.
The graph of bit changes shows a quickly recognizable curve when the cookie values increment somewhat regularly.
A simple "value versus time" graph plainly shows an incrementing cookie value as a line, or something that approximates a line. The correlation value indicates how well the cookie values follow the line. This graph shows a cookie that is very likely an incrementing count. The jump near the middle was caused by pausing the cookie collection for a short time, though the process generating the values did not pause.
The following graph shows the presence of multiple cookie sources. At first glance, this set of cookies looks quite random, but there appear to be three separate bands of values. The strategy here is to separate the cookies into three groups according to their relative location on the graph, and to test each group independently. Testing only the cookies found across the lowest band of the graph reveals a nicely sequential set of cookies where there appeared to be chaos.
Separating and graphing the other two groups of cookies produces two more similar graphs. This result suggests that there are three independent cookie sources, perhaps three servers behind a load-balancing system.
Randomness shows itself nicely in a disk plot, in which each cookie's value is mapped to a point using the sine and cosine functions. When random data is plotted, the points are evenly distributed around the plotting area.
Misuse of a random-number generator, or a weak random-number source, will draw a visually interesting picture, as shown in the following graph.
The values have separated into discrete bands because of the limited range of values that results when a smaller value is multiplied by a larger one. Some systems use a decimal-based random-number generator whose result is in the range 0 to 1 (such as 0.705152).
If this number were multiplied by 1,000,000,000, the result would be 705,152,000. Notice that the last three digits are 000: the random source supplies only six decimal digits, while the multiplier shifts the decimal point nine places. Every result therefore ends in 000, so all values generated this way are multiples of 1,000, spaced 1,000 apart. An attacker who spots this pattern can cut the number of trials needed for a brute-force attack a thousandfold by testing only every one-thousandth value.
A simple incrementing value becomes an interesting spiral when graphed as a disk plot.

Prediction
In those rare cases where the correlation approaches 1 (or -1, for a decrementing value), or the chi-square test indicates the values are not random, and the cookies appear predictable, run the numbers. Sometimes it is a simple case of linear regression, where the result is accompanied by a range of possible values. At other times, you may need to call up your college roommate who now works at the NSA.
Once you have settled on a value, or a range of possible cookie values, use a tool such as SPI Dynamics' HTTP Editor to insert your calculated cookie into the request in place of the one assigned by the server. If you have a long history of winning lotteries and manage to pick the right cookie value, you will become (to the web server) whoever legitimately received that same cookie.

Summary
Cookie analysis and prediction is an intensive application of mathematics, information theory, and even cryptanalysis. Though cookies are hidden from most users and can be obfuscated, methodical analysis can expose a pattern that is useful to an intruder who can crunch the numbers.
While the techniques covered here lend themselves to unattended operation, adding a little interaction and ingenuity gives a determined intruder another avenue of attack.

References
http://mathworld.wolfram.com/
http://www.fourmilab.ch/

Disclaimer
Most of the cookie values used to generate the graphs were produced by software, to illustrate the concepts clearly. However, all of the concepts have been observed in use on various real websites around the Internet.

The Business Case for Application Security
Whether a security breach is made public or confined internally, the fact that a hacker has accessed your sensitive data should be a huge concern to your company, your shareholders and, most importantly, your customers. SPI Dynamics has found that the majority of companies that are vigilant and proactive in their approach to application security are better protected. In the long run, these companies enjoy a higher return on investment for their e-business ventures.

About SPI Labs
SPI Labs is the dedicated application security research and testing team of SPI Dynamics. Composed of some of the industry's top security experts, SPI Labs is focused specifically on researching security vulnerabilities at the web application layer. The SPI Labs mission is to provide objective research to the security community and all organizations concerned with their security practices.
SPI Dynamics uses direct research from SPI Labs to provide daily updates to WebInspect, the leading Web application security assessment software. SPI Labs engineers comply with the standards proposed by the Internet Engineering Task Force (IETF) for responsible security vulnerability disclosure.
SPI Labs policies and procedures for disclosure are outlined on the SPI Dynamics web site at: http://www.spidynamics.com/spilabs.html.

About SPI Dynamics
SPI Dynamics, the expert in web application security assessment, provides software and services to help enterprises protect against the loss of confidential data through the web application layer. The company's flagship product line, WebInspect, assesses the security of an organization's applications and web services, the most vulnerable yet least secure IT infrastructure component. Since its inception, SPI Dynamics has focused exclusively on web application security. SPI Labs, the internal research group of SPI Dynamics, is recognized as the industry's foremost authority in this area.
Software developers, quality assurance professionals, corporate security auditors and security practitioners use WebInspect products throughout the application lifecycle to identify security vulnerabilities that would otherwise go undetected by traditional measures. The security assurance provided by WebInspect helps Fortune 500 companies and organizations in regulated industries (including financial services, health care and government) protect their sensitive data and comply with legal mandates and regulations regarding privacy and information security.
SPI Dynamics is privately held with headquarters in Atlanta, Georgia.

About the WebInspect Product Line
The WebInspect product line ensures the security of your entire network with intuitive, intelligent, and accurate processes that dynamically scan standard and proprietary web applications to identify known and unidentified application vulnerabilities. WebInspect products provide a new level of protection for your critical business information.
With WebInspect products, you find and correct vulnerabilities at their source, before attackers can exploit them.
Whether you are an application developer, security auditor, QA professional or security consultant, WebInspect provides the tools you need to ensure the security of your web applications through a powerful combination of unique Adaptive-Agent™ technology and SPI Dynamics' industry-leading and continuously updated vulnerability database, SecureBase™. Through Adaptive-Agent technology, you can quickly and accurately assess the security of your web content, regardless of your environment. WebInspect enables users to perform security assessments for any web application, including these industry-leading application platforms:
- Macromedia ColdFusion
- Lotus Domino
- Oracle Application Server
- Macromedia JRun
- BEA Weblogic
- Jakarta Tomcat

About the Author
Darrin Barrall is a researcher at large and special projects software developer with the SPI Labs division of SPI Dynamics. He may be reached via e-mail at dbarrall@spidynamics.com.

Contact Information
SPI Dynamics
115 Perimeter Center Place
Suite 1100
Atlanta, GA 30346
Telephone: (678) 781-4800
Fax: (678) 781-4850
Email: info@spidynamics.com
Web: www.spidynamics.com