The "protocol violation" with the misleading error message

Before reading this, refresh your memory of Joel Spolsky's wonderful post on the Law of Leaky Abstractions.

After determining it was possible to write a replacement HttpClient for WP8.1 Silverlight, I wanted to go a bit deeper and propose an actual fix for the devs at Mashape. Although submitting the error message (The server committed a protocol violation. Section=ResponseHeader Detail=CR must be followed by LF) seemed sufficient, given that the problem persisted a month later I wondered if something was wrong with the .Net framework (hah!)

Strings, Bytes, and CRLF

The relevant code in my "custom" HTTP client that read data from the StreamSocket.InputStream (via a DataReader) looked like this (simplified):

await reader.LoadAsync(10240);
data += 
  reader.ReadString(reader.UnconsumedBufferLength);

Viewing the value in the debugger or outputting it to the debug window didn't show any obvious missing line-feeds, but perhaps the reader was doing something weird.

Swapping the code to

await reader.LoadAsync(bytesToRead);
var buffer = new byte[reader.UnconsumedBufferLength];
reader.ReadBytes(buffer);

also showed that everything was as it should be. Every single line in the response header ended in 0D0A.

Just in case the reader was doing something weird, I double-checked in Fiddler. Sure enough, there were only 0D0A pairs.

Huh. I figured at this point it was time to go step through the Framework code.

ParseHeadersStrict

I knew from reading Mehdi's extremely in-depth article on the same problem that problem probably stemmed from somewhere in ParseHeadersStrict, although it took me a while to get there. Given that I had verified from two different sources that every header line did end in CRLF, I thought perhaps something else in the Framework was acting up. As it turns out, it was exactly where I should have looked in the first place.

Here's the relevant code from WebHeaderCollection.cs. At this point we are looping through a byte[] buffer containing the response.

// Find the header name; only regular characters allowed.
int iBeginName = i;
for (; i < effectiveSize && 
      (ch = 
         byteBuffer[i] > 127 ? 
           RfcChar.High :
           RfcCharMap[byteBuffer[i]])
       == RfcChar.Reg; 
       i++);    

Gotta love the ternary operator and assignment used for the comparison inside the condition section of a for loop :) Basically, this is looping through each line of the response header looking for lines like

Content-Type: application/json; charset=UTF-8

and ensuring they're valid. The above loop is ensuring that the header name (before the colon) doesn't have any weird characters in it.

The next portion of code ensures that we read the colon and header value and Most Importantly that the line ends with CRLF. Given how many times I had seen the error message CR must be followed by LF, I was quite curious to see how this was going to fail.

// Read to a colon.
int iEndName = i - 1;
int crlf = 0;  // 1 = cr, 2 = crlf
for (; i < effectiveSize && 
      (ch = 
         byteBuffer[i] > 127 ? 
           RfcChar.High : 
           RfcCharMap[byteBuffer[i]]) 
      != RfcChar.Colon; 
      i++)
{
  switch (ch)
  {
    case RfcChar.WS:
    if (crlf == 1)
    {
      break;
    }
    crlf = 0;
    continue;

    case RfcChar.CR:
    if (crlf == 0)
    {
      crlf = 1;
      continue;
    }
    break;

    case RfcChar.LF:
    if (crlf == 1)
    {
      crlf = 2;
      continue;
    }
    break;
  }
  parseStatus = DataParseStatus.Invalid;
  parseErrorCode = WebParseErrorCode.CrLfError;
  goto quit;
}

And yet it blew up. Again. Even though I could see in the VS Memory window that every single 0D was followed by OA.

After a round of tracing, I noticed that the header was mostly being parsed. And then it jumped out at me. The line it was failing on looked like this:

X-RateLimit-API Requests-Limit: 7500

and there was definitely something different about that header name compared to all the others.

The Devil, the details, and RFCs

Those of you who have worked with HTTP at this level will immediately know the issue here, but I'm brand new to it so off to read the spec I went. It didn't take long to discover these very pertinent sections:

Message Headers

message-header = field-name ":" [ field-value ]
field-name     = token

Augmented BNF

SP             = <US-ASCII SP, space (32)>

token          = 1*<any CHAR except CTLs or separators>
separators     = "(" | ")" | "<" | ">" | "@"
                  | "," | ";" | ":" | "\" | <">
                  | "/" | "[" | "]" | "?" | "="
                  | "{" | "}" | SP | HT

I can't highlight code, but the point is that you can't have a space in the header name. Very simple. And although Microsoft was correct to disallow this in their strict interpretation of RFC2616, it would have been nicer to have a different error message for this scenario instead of:

The server committed a protocol violation. Section=ResponseHeader Detail=CR must be followed by LF`

I sent all this info off to Mashape this evening and am hoping they can fix it soon.

Update: Mashape fixed this issue on August 21.

comments powered by Disqus