Content-Length (Mostly) Does Not Matter: The Reverse Bob Barker Rule
Posted by Larry Karnowski Fri, 23 May 2008 15:30:00 GMT
Your webapp user needs to download a file, but you don't know the exact size of this file ahead of time. So what do you send as the content-length? Is an incorrect content-length okay? Do you even need a content-length? The Internet is pretty silent on the subject.
For example, my webapp users want to download several files as one zip file, but I don't want to create the zip file ahead of time just to know the content-length. I'd rather create the zip file on the fly and stream it to the user a chunk at a time, as it's created. This saves me time, process memory, code to clean up temporary zip files, etc. However, the problem here is -- how big will this zip file be? What do I send as the content-length?
More importantly, what does content-length really do? Is it important? What happens if you don't set it?
Effects of Content Length
- The web browser will NOT read more bytes than the content-length value. If you accidentally set your content-length too low, the download is halted prematurely and the downloaded file will most likely be corrupt. Do NOT set it too low. This is the only case where content-length can really hurt you.
- If the content-length is set, the web browser will show a progress bar while downloading. This is a very important usability feature for medium and large files, and you really want it. You want your users to know how far along they are, so they don't cancel the download and start it over, or worse, just abandon your site.
- If the content-length is not set, then the user gets an "Unknown Size" message while downloading, and they won't get a progress bar. Avoid this if possible, but it is sometimes okay for very small files that get downloaded so fast the user won't care they didn't see a progress bar. (Still, you want to avoid this.)
- If you set the content-length to zero (usually only by accident), then the web browser will usually gleefully report that it downloaded the file successfully, and save a zero-length, empty file. Do not set the content-length to zero.
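The four effects above can be restated as a small decision function. This is only a sketch of the behavior described in the list, not browser code; `describe_download` and its return strings are made-up names for illustration.

```python
from typing import Optional


def describe_download(declared: Optional[int], actual: int) -> str:
    """Summarize what the list above says the browser will do.

    declared: the Content-Length value sent, or None if the header is omitted.
    actual:   the number of bytes the server really sends.
    """
    if declared is None:
        # No header at all: the download works, but with "Unknown Size".
        return "unknown size, no progress bar"
    if declared == 0:
        # The browser reads nothing and saves an empty file.
        return "empty file saved"
    if declared < actual:
        # The browser stops reading at the declared length.
        return "truncated, probably corrupt"
    # Exact or over: the whole file arrives and a progress bar is shown.
    return "progress bar shown"
```

Only the `declared < actual` branch actually damages the file; the other branches are usability problems.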
The Reverse Bob Barker Rule
That first effect is the most important: Don't set the content-length too low, or your users will suffer corrupted files. So this brings us to the reverse-Bob-Barker rule.
"Get as close to the real content-length as possible without going UNDER."
Why Not Go WAY Over?
Okay, so we know not to go under, but why not just go way over? Why not, for example, just double our estimated size? Well, nothing terrible will happen. The browser will keep trying to read content, and when the server says "no more," the browser will safely finish the file and say done, even if it thought there was more content coming.
The only negative is that you want the user to get the best download experience possible. So if you guessed your content-length to be 100 MB, but the real download is actually 50 MB, the user will see a progress bar showing they are 50% of the way done, with probably another minute to download, and then suddenly the browser goes from 50% to 100%, from one minute left to done! The user will be confused.
We call this the "Goldilocks Effect" in usability. In this case, the porridge was too hot -- the user will be left wondering why the download dramatically sped up at the last minute. Did the file download successfully? Did the download really fail but the browser didn't tell us? Although nothing bad actually happened, the user is still left with a sense of unease. We definitely want to avoid this.
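The arithmetic behind that jump is simple enough to sketch. The helper below (a hypothetical name, assuming the browser computes progress as bytes received divided by the declared content-length) returns the percentage on screen at the moment the server stops sending.

```python
def final_progress_percent(declared: int, actual: int) -> float:
    """Percentage the progress bar shows when the last real byte arrives."""
    if declared <= 0:
        raise ValueError("declared content-length must be positive")
    # The bar can never show more than 100%, even if we guessed too low.
    return min(100.0, 100.0 * actual / declared)


# The 100 MB guess for a 50 MB file from the example above.
print(final_progress_percent(100_000_000, 50_000_000))  # 50.0
```

The closer the declared value is to the real size, the closer this number gets to 100 and the smaller the final jump the user sees.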
How Close is Close Enough?
So, all this boils down to:
If you don't know the exact content-length ahead of time, determine a good cheating algorithm and pad for safety.
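As one illustration of a "cheating algorithm" for the zip case, here is a sketch that guesses the archive size from the file names and sizes alone. It assumes the entries are stored uncompressed, with no zip64 records, extra fields, or comments, and it budgets a 16-byte data descriptor per entry, which streaming zip writers commonly append. The function names are made up; the fixed sizes are the standard zip header lengths.

```python
import io
import zipfile

LOCAL_HEADER = 30        # fixed part of each local file header
DATA_DESCRIPTOR = 16     # budgeted per entry for streamed output (assumption)
CENTRAL_HEADER = 46      # fixed part of each central directory entry
END_OF_CENTRAL_DIR = 22  # end-of-central-directory record, no comment


def estimate_stored_zip_size(entries):
    """Estimate the size of an uncompressed zip from (name, size) pairs."""
    total = END_OF_CENTRAL_DIR
    for name, size in entries:
        name_len = len(name.encode("utf-8"))
        # Local header, file name, raw data, then an optional descriptor.
        total += LOCAL_HEADER + name_len + size + DATA_DESCRIPTOR
        # The name appears a second time in the central directory.
        total += CENTRAL_HEADER + name_len
    return total


def actual_stored_zip_size(files):
    """Build the archive in memory, purely to sanity-check the estimate."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return len(buffer.getvalue())
```

With compression turned on, the real size depends on the data, so an estimate like this becomes a looser upper bound and the safety padding matters more.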
But how much padding do you need? How much is enough? What's too much? My experience is this:
Aim for a padding of 1%.
Here's my scale:
- Padding over 10% is so inaccurate that the user won't believe the progress bar. "We jumped from 90% to done immediately? Is this thing broken? Is my file corrupt?"
- Padding between 3% and 10% is less outlandish, but still pretty bad.
- Padding between 1% and 3% is the best you can expect in most cases. Depending on the size of the file, the download percentage usually jumps in 2-5% increments anyway. This is only unacceptable for the largest of files, where web browsers show download percentages in 1% increments. (And even there this is sometimes the best you can do.)
- Padding under 1% is a bit dangerous unless you have a lot of confidence in your guessing algorithm. Remember, the only way to really lose here is to guess under the real content-length.
So this brings us to 1% padding. In most cases, this should be fairly easy to predict, and the web browser will show a 99% to 100% download progress, which is what the user wants to see anyway.
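A minimal sketch of the padding step, assuming you already have a size estimate in bytes. `padded_content_length` is a hypothetical helper, and it uses integer math so the result is always rounded up, never down.

```python
def padded_content_length(estimated_bytes: int, padding_percent: int = 1) -> int:
    """Return the estimate plus padding_percent, rounded up to a whole byte."""
    if estimated_bytes < 0 or padding_percent < 0:
        raise ValueError("estimate and padding must not be negative")
    # Ceiling division: negate, floor-divide, negate again.
    extra = -(-estimated_bytes * padding_percent // 100)
    return estimated_bytes + extra


# A 50 MB estimate with the default 1% padding.
print(padded_content_length(50_000_000))  # 50500000
```

The returned number is what goes into the Content-Length header; rounding up keeps even tiny files on the safe side of the rule.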
Where Does This Work?
I've tested this "extra content-length" technique on the following platforms, all with success:
- Internet Explorer 6 on Windows XP (on VMWare on my Mac)
- Internet Explorer 7 on Windows XP (on VMWare on my Mac)
- Firefox 2 on Macintosh
- Firefox 3 RC1 on Macintosh
- Safari 3.1.1