Consecutive numbers considered harmful:
Compaq rebate tracking exposes customer names and addresses

by Andrew Schulman
undoc@sonic.net
http://www.undoc.com
September 25, 2000

===================

Compaq's rebate tracking service at www.web-rebates.com is exposing tens of thousands of customer names and addresses. The Compaq rebate system uses an insecure method of creating customer ID numbers. This reflects a more general problem at many web sites; something as simple as the way customers are assigned numbers can result in privacy problems.

A Compaq Presario 1200 laptop that I recently bought came with an offer of a $100 "Portable Power Rebate," for which consumers can apply online:

http://www.web-rebates.com/compaq/ppr2c00

After you apply with your name, address, phone number, email address (and indicate if it's ok for them and their partners to send you email), along with the laptop's serial number, they display a barcoded form that you mail in, and they email you a confirmation.

In the email Compaq gives you a URL you can use to check the status of your rebate. For example, I received email with something like the following:

    You can track your rebate status online by going directly to
    http://www.web-rebates.com/compaq/status/results.asp?o=ppr&i=98765

[Actual number is 41231; another one is 44670]

Clicking on this link takes you to a web page that tells you your own name and address, with a note basically to the effect that the check's in the mail. Like so many features that are easy to implement, this one's not of much use, really. Of course, if you forwarded the Compaq confirmation email to someone else, they could click on the link to see your name and address, and could infer that you recently bought something from Compaq. Big deal.

Now, notice the five-digit number on the end of this URL. I've changed it, because if I revealed the correct one, I would effectively be giving out my address.
But more important, I've changed it because, if I revealed the actual five-digit number for tracking my rebate, I would also be making it trivially easy for you to see the names and addresses of lots of other people who recently purchased Compaq laptops. How? Just change a digit in the URL. If 98765 were valid, 98764 would be too.

Yep, Compaq just assigns each customer a consecutive five-digit number, and tacks it on the end of the URL. This is an open invitation to change the number and see who else recently bought a Compaq computer. And it would be trivial to write a script that worked backward from a known valid number, sucking up a whole list of names and addresses.

Compaq is exposing the names and addresses of its customers -- at least, the ones who apply for rebates. Does it really want HP to have its customer list? Wouldn't such a list be a useful part of some con?

It's important to note that Compaq is not exposing phone numbers, credit card numbers, email addresses, or the serial number of the computer. Furthermore, names and addresses are already public information (see, for example, www.555-1212.com, which even offers reverse lookup "to find all residents on the street"). But some people prefer post-office boxes, and Compaq's rebate program *requires* a street address, so the customer information it has exposed often includes both someone's PO Box and their street address -- and this association certainly isn't public. That someone recently bought a Compaq computer shouldn't be public either. Indeed, Compaq's privacy statement declares that the company will "seek to take appropriate steps to protect the information you share with us..."

Actually, it's not so much Compaq that's exposing customer information, as the company that handles Compaq's online rebates. If you go to http://www.web-rebates.com, you're automatically taken to www.gfsinc.com (Global Fulfillment Services, based in Arizona).
An email to webmaster@gfsinc.com, notifying them of the Compaq rebate tracking problem, has gone unanswered. [They did eventually answer, and fixed the problem.] Since the company offers general rebate handling services, perhaps other companies' customer lists are also being exposed. GFS's "About Us" page says they have "processed over 10,000 promotions totaling over $200,000,000 in billings." [Looks like Kensington also uses GFS for rebates?]

In addition to the "Portable Power Rebate," Compaq has other rebate programs, covering products purchased since June 2000 (see http://athome.compaq.com/showroom/static/promo/index.asp, including ones with the usual $400 for locking into MSN for three years -- but these don't go through web-rebates.com). I did not have a valid tracking number for these other rebate programs, but I was able to guess one for the "Grand Slam" rebate (which offers up to $270 back on Compaq PCs, monitors, and printers). From a quick check, it appears that the "Grand Slam" rebate tracking alone is exposing tens of thousands of Compaq customer names and addresses.

[http://www.web-rebates.com/compaq/status/results.asp?o=gs&recnum=1&i=71811]

[I wrote a script, tried all consecutive numbers 41711-41811, got only 13 errors, the rest were names and addresses.]

By using only five digits for the tracking number, Compaq and GFS are making it easy to guess tracking numbers even if you don't start with a known valid one.

This is a very common problem. From press reports, it appears that until recently customer data was similarly exposed at Eve.com (CNET, Sept. 13, 2000) and Netmarket (CNET, May 10, 2000). According to CNET's report on the Netmarket problem, an engineer "noticed that his order number was incorporated into the URL. By replacing his order number with random numbers, he obtained access to pages revealing information about other customers."

[Netmarket and web-rebates.com both running Microsoft IIS 4.0; anything in IIS (ASP etc.?)
that facilitates this sort of numbering scheme? But eve.com running Netscape Enterprise 4.1 so probably not. Still, worth asking if there is some "canned" code that these sites are running.]

This same problem keeps appearing because, in the rush to go online, and in a desire to not inconvenience customers, web developers are neglecting a few simple guidelines:

* Don't use consecutive numbers
* Don't use a small number of digits
* Given one number, make it difficult to arrive at other valid numbers
* Don't make it easy to "forge" customer numbers or order numbers

These are really all just variations of a single rule. Basically, you want a sparse, random address space. You want to make it difficult to leverage one (possibly luckily-guessed) number to uncover the whole numbering scheme, and hence the whole customer list. Think of it as Russian Roulette: don't put bullets in every chamber.

Note that incrementing the numbers by some fixed amount other than 1 is equivalent to, and just as bad as, consecutive numbering. For example, from cursory examination of the US Postal Service's "Delivery Confirmation" service, it appears that subtracting 7 from a valid delivery confirmation number results in another valid number.

Similarly (and surprisingly), Fedex tracking numbers do not appear to be randomized. For example, if a tracking number ending in 410 is valid, then ones ending in 421, 432, etc. (incrementing by 11) are also valid. Fortunately, Fedex only discloses the pick-up and drop-off cities, and the name of whoever signed for the package (though you can also request a photo of the signature).

Sites that ignore the need for a sparse, random address space compound the problem when they put the numbers in URLs, and allow submission via GET rather than POST. With safe numbering, POST is unnecessary, but with unsafe numbering, GET just compounds the problem by openly inviting someone to change a digit or two in the URL to see what will happen.
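
As a rough illustration of the sparse, random address space described above, the following Python sketch draws tracking numbers from a cryptographically strong random source and estimates how often a blind guess would hit a valid number. The 12-digit length, the in-memory `issued` set, and the function names are assumptions made for this example, not anything Compaq or GFS is known to use.

```python
import secrets

# Assumed length for illustration; the article does not prescribe one.
ID_DIGITS = 12


def new_tracking_number(issued: set, digits: int = ID_DIGITS) -> str:
    """Draw a random fixed-width decimal ID that has not been issued yet.

    `issued` is a hypothetical in-memory stand-in for whatever table
    a real site would use to record numbers already handed out.
    """
    while True:
        # secrets.randbelow is unpredictable, unlike a counter or random.random().
        candidate = f"{secrets.randbelow(10 ** digits):0{digits}d}"
        if candidate not in issued:
            issued.add(candidate)
            return candidate


def guess_hit_rate(issued_count: int, digits: int) -> float:
    """Chance that a single blind guess lands on a valid ID."""
    return issued_count / 10 ** digits


if __name__ == "__main__":
    issued = set()
    print(new_tracking_number(issued))
    # Illustrative figures only: 50,000 customers in a 5-digit vs. 12-digit space.
    print(guess_hit_rate(50_000, 5))
    print(guess_hit_rate(50_000, 12))
```

Under these assumed figures, 50,000 numbers issued in a five-digit space means every second guess succeeds, while the same 50,000 spread over twelve digits leaves roughly one hit per 20 million guesses. Knowing one valid number also tells you nothing about its neighbors, which is the point of the guidelines.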
The solutions are relatively simple. The following is ordered from most convenient to least:

* Include at least two randomly-generated digits in any customer number or order number (one is insufficient, because it would mean that 1 in 10 guesses would still work). [Need to suggest using a number of at least x digits, with at least y of these random.] [Also, perhaps suggest adding a one-second delay to inhibit random probing of the address space?]

* Require one additional piece of information that the customer would easily know, such as the customer's postal code or phone number. For example, a US ZIP code in effect adds five additional digits to the number. Of course, customers will wonder, "Why should I enter my ZIP code? You know that already!" For example, Fedex package tracking *used to* require entering the zip code of the destination as well as the tracking number; this was of course an added inconvenience. Compaq currently requires a tracking number *or* the corresponding phone number. If it required *both*, everything would be okay. [This is exactly what GFS ended up doing.]

* At the risk of inconveniencing customers, require a password-based login. (See, for example, half.com, which uses consecutively-numbered transaction IDs, but which requires a login name and password to view the transaction.)

Note that a checksum digit is inadequate for the same reason that a single random digit is, and because the formula the web developer used to generate the checksum can probably be easily derived.

Interestingly, the bad-numbering-scheme problem was referred to back in 1997 in an anonymous "Risks" article that told the tale of a businessman who believed "his competitors are using the UPS tracking system to uncover his customers.
Despite not being a computer person, he deciphered the simple checksum system after some day's thinking about it, and showed me how he is able to fetch a random package destination of his competitor's in under a minute of effort" (Risks #19.2, April 1, 1997).

But April Fool's tales aside, consecutive (and other bad) numbering schemes really are a problem. As long as web developers continue to vote for expediency and excessive customer convenience, they will not only expose their customers' names and addresses and purchases -- they will also expose themselves to an unwanted privacy spotlight.

[They fixed it by requiring both tracking number and postal code. In effect, this adds five non-consecutive digits to the tracking number. How does it handle addresses outside USA?]

[Similar problem found at Buy.com site, managed with UPS: http://news.cnet.com/news/0-1007-200-3177584.html?tt.abc.newsfeed.body]
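
The fix noted above, requiring both the tracking number and the postal code, might look something like the following Python sketch. Everything here is hypothetical: the `REBATES` table, its sample records, and the function names are invented for illustration, and the normalization step is just one guess at how postal codes from outside the USA could be accommodated.

```python
import hmac

# Hypothetical in-memory store: tracking number -> (postal code, status).
# The records are made-up sample data, not real customers.
REBATES = {
    "482913": ("85001", "Check mailed"),
    "739204": ("K1A 0B1", "Processing"),
}


def normalize_postal(code: str) -> str:
    """Strip whitespace and uppercase, so 'k1a0b1' matches 'K1A 0B1'."""
    return "".join(code.split()).upper()


def lookup_status(tracking_number: str, postal_code: str):
    """Return the rebate status only if BOTH values match, else None.

    An unknown tracking number and a wrong postal code give the same
    answer, so a prober cannot learn which tracking numbers exist.
    """
    record = REBATES.get(tracking_number)
    if record is None:
        return None
    stored_postal, status = record
    supplied = normalize_postal(postal_code).encode()
    expected = normalize_postal(stored_postal).encode()
    # Constant-time comparison from the standard library.
    if hmac.compare_digest(supplied, expected):
        return status
    return None


if __name__ == "__main__":
    print(lookup_status("482913", "85001"))   # both match
    print(lookup_status("482913", "85002"))   # wrong postal code
    print(lookup_status("000000", "85001"))   # unknown tracking number
```

Even with consecutive tracking numbers, a guesser now has to get a second, unrelated value right for each one. Pairing this with the delay on repeated failures suggested earlier would slow probing further.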