Put the knife down and take a green herb, dude. (c) Ruffin Bailey 2001-2021

When you write pretty much any code dealing with text, you get used to viewing raw bytes. Probably the most annoying part of it all, once you get your head around Unicode, is dealing with newlines in a tolerant, cross-platform fashion.

From today's Wikipedia:

LF:    Multics, Unix and Unix-like systems (Linux, OS X, FreeBSD, AIX, Xenix, etc.), BeOS, Amiga, RISC OS, and others^[1]

CR:    Commodore 8-bit machines, Acorn BBC, ZX Spectrum, TRS-80, Apple II family, Oberon, Mac OS up to version 9, MIT Lisp Machine and OS-9

RS:    QNX pre-POSIX implementation

0x9B: Atari 8-bit machines using ATASCII variant of ASCII (155 in decimal)

CR+LF: Microsoft Windows, DOS (MS-DOS, PC DOS, etc.), DEC TOPS-10, RT-11, CP/M, MP/M, Atari TOS, OS/2, Symbian OS, Palm OS, Amstrad CPC, and most other early non-Unix and non-IBM OSes

LF+CR: Acorn BBC and RISC OS spooled text output.

0x9B? Good heavens. And if "OverDose On Arrival" (0D 0A, the bytes for CR+LF) isn't stuck in your head somewhere, you haven't really parsed text yet in your programming career. To these, you might also add 0D0D0A from an apparent Notepad bug.

The most annoying part is probably that you can't easily treat a newline as any fixed number of bytes when you're dealing with cross-platform text. And you should probably always assume you're dealing with cross-platform text.
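One way to stay tolerant is to split on every common newline sequence at once, longest first. Here's a minimal sketch, assuming CR+LF, LF, and CR are the only flavors you care about. The `TolerantLines` class and `SplitLines` method are names I made up for illustration, not anything from a library.

```csharp
using System;

public static class TolerantLines
{
    // Hypothetical helper. "\r\n" is listed first so a CR+LF pair is consumed
    // as one separator instead of leaving an empty line between the CR and LF.
    private static readonly string[] Newlines = { "\r\n", "\n", "\r" };

    public static string[] SplitLines(string text)
    {
        return text.Split(Newlines, StringSplitOptions.None);
    }

    public static void Main()
    {
        // Windows, Unix, and classic Mac line endings mixed in one string.
        string[] lines = SplitLines("one\r\ntwo\nthree\rfour");
        Console.WriteLine(lines.Length); // 4
        foreach (string line in lines)
        {
            Console.WriteLine("[" + line + "]");
        }
    }
}
```

Note that something like 0D0D0A still comes out of this as a lone CR followed by a CR+LF, so you'd get an extra empty line there.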

So when I'm breaking things down into informal test cases, I often shove the parsing code I'm writing into a testing console app project to see what's going on. Why I never did the below before when dealing with newlines, I have no idea.

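// Prints toWrite with every CR and LF also spelled out as visible text.
public static void NLWriteLine(string toWrite, bool bookend = true)
{
    if (bookend)
    {
        // Bookend with # so leading and trailing newlines are easy to spot.
        toWrite = "#" + toWrite + "#";
    }

    // Visible "\r" goes before each real CR; visible "\n" goes after each real LF.
    // A CR+LF pair shows up as "\r" ending one line and "\n" starting the next.
    Console.WriteLine(toWrite.Replace("\r", "\\r\r").Replace("\n", "\n\\n") + "\n\n");
}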

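If I ignore the LF+CR spooled output from Acorn BBC and RISC OS (...and done), every \n or LF either comes after the other newline bytes or stands by itself. So put the text version of it right after the real one, duh. Vice versa for the \r: its text version goes right before the real one.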

So much nicer to have...

#
\nstring with some\r
\nnewlines#

... rather than trying to figure out if a cut caught the CR previous to that \n by printing int values or something similarly overcomplicated.
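If you'd rather check the result in a test than eyeball the console, here's a minimal sketch of the same replacement logic returning the marked-up string instead of printing it. `NewlineMarker` and `Mark` are hypothetical names for illustration, and this version leaves off the trailing blank lines that `NLWriteLine` adds.

```csharp
using System;

public static class NewlineMarker
{
    // Hypothetical variant of NLWriteLine that returns the marked-up
    // string instead of writing it, so the result can be compared in a test.
    public static string Mark(string toMark, bool bookend = true)
    {
        if (bookend)
        {
            toMark = "#" + toMark + "#";
        }

        // Text "\r" goes before the real CR; text "\n" goes after the real LF.
        return toMark.Replace("\r", "\\r\r").Replace("\n", "\n\\n");
    }

    public static void Main()
    {
        string marked = Mark("\nstring with some\r\nnewlines");
        Console.WriteLine(marked == "#\n\\nstring with some\\r\r\n\\nnewlines#"); // True
        Console.WriteLine(marked);
    }
}
```

One caveat: a CR with no LF after it will, in most consoles, send the cursor back to the start of the line, so whatever follows can overwrite what was printed. Comparing the returned string sidesteps that.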

Labels: c#, coding, newline, text editors


Friday, August 12, 2016
Visualizing newlines in C#
posted by ruffin at 8/12/2016 01:05:00 PM
