When you write pretty much any code dealing with text, you get used to viewing raw bytes. Probably the most annoying part of it all, once you get your head around Unicode, is dealing with newlines in a tolerant, cross-platform fashion.

From today's Wikipedia:

0x9B? Good heavens. And if "OverDose On Arrival" (0D 0A, the Windows CR LF pair) isn't stuck in your head somewhere, you haven't really parsed text yet in your programming career. To these you might also add 0D 0D 0A (CR CR LF), which apparently comes from a Notepad bug.

The real headache is that you can't treat a newline as any fixed number of bytes in cross-platform text: it might be one byte (LF or CR) or two (CR LF). And you should probably always assume you're dealing with cross-platform text.
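One tolerant approach, sketched in the same C#: instead of splitting on a fixed sequence, let StringReader.ReadLine do the work, since it accepts CR LF, a lone LF or a lone CR as a line end. The NewlineSketch class and SplitLines method are hypothetical names, and this assumes the text is already decoded into a .NET string. It does not special-case the Acorn LF CR order (which reads as two breaks) or oddballs like 0x9B.

```csharp
using System.Collections.Generic;
using System.IO;

public static class NewlineSketch
{
    // Hypothetical helper: split text into lines, accepting CR LF, lone LF
    // and lone CR as terminators. StringReader.ReadLine already treats all
    // three as line ends and consumes a CR LF pair as a single break.
    public static List<string> SplitLines(string text)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

With that, "one\r\ntwo\nthree\rfour" comes back as four lines, and the Notepad-style "a\r\r\nb" as "a", "" and "b", because the first CR ends a line on its own.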


So when I'm breaking things down into informal test cases, I often shove the parsing code I'm writing into a console app test project to see what's going on. Why I never did the following for newlines before, I have no idea.

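// Echo a string to the console with visible markers for every CR and LF.
public static void NLWriteLine(string toWrite, bool bookend = true)
{
    // Wrap the string in # so leading and trailing newlines stay visible.
    if (bookend)
    {
        toWrite = "#" + toWrite + "#";
    }

    // Print a literal \r just BEFORE each real CR and a literal \n just AFTER
    // each real LF, so a CR LF pair shows "\r" at the end of one line and "\n"
    // at the start of the next. Caveat: a lone CR still returns the cursor to
    // column 0, so the text after it may overwrite that line in the console.
    Console.WriteLine(toWrite.Replace("\r", "\\r\r").Replace("\n", "\n\\n") + "\n\n");
}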

If I ignore Acorn BBC and RISC OS (...and done), which put LF before CR, every \n (LF) either follows another newline byte or stands alone. So the text version goes right after it, duh. Vice versa for \r: its text version goes right before it.

So much nicer to have...

#
\nstring with some\r
\nnewlines#

... rather than trying to figure out whether a substring caught the CR just before that \n by printing int values or something similarly overcomplicated.
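When all you need is which terminators a string actually contains, not where they sit, a small census also avoids printing int values. This is a hypothetical helper (NewlineCensus and Count are illustrative names, not a library API), assuming the text is already a decoded .NET string; it counts CR LF pairs, lone LFs and lone CRs.

```csharp
public static class NewlineCensus
{
    // Hypothetical helper: count each kind of line terminator in a string.
    // A CR immediately followed by LF is counted once, as CrLf; any other
    // CR or LF is counted as a lone Cr or lone Lf.
    public static (int CrLf, int Lf, int Cr) Count(string text)
    {
        int crlf = 0, lf = 0, cr = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '\r')
            {
                if (i + 1 < text.Length && text[i + 1] == '\n')
                {
                    crlf++;
                    i++; // skip the LF that belongs to this pair
                }
                else
                {
                    cr++;
                }
            }
            else if (text[i] == '\n')
            {
                lf++;
            }
        }
        return (crlf, lf, cr);
    }
}
```

Under this scheme the Notepad CR CR LF shows up as one lone CR plus one CR LF, and the Acorn LF CR order as one lone LF plus one lone CR.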
