Good heavens. Another fun afternoon of coding. I was going back to refactor my application's digest email parsing routine and make it use a more generic regular expression to find where one email began and another ended. Here's a quick word to the wise: "Not all regexp packages are the same."

I was originally using a fairly old regexp package from Apache's Jakarta project (the people who brought you Tomcat). When I tried a few nice regexps that included newline characters - my original logic embarrassingly took things one line at a time - I started getting stack overflow errors. Bad news.

Java 1.4 has its own regexp package now, java.util.regex, and I thought I'd give it a whirl, even though 1.4 is not officially supported on Mac OS X just yet and will never be supported on Mac Classic - not to mention the headache of making sure someone has installed 1.4 and not, say, 1.2 on their Linux or Windows box. Welp, the fine folk at Sun have apparently put on the blinders again. As far as I could tell, a \n in a java.util.regex pattern is a UNIX newline any way you slice it: it matches a line feed and nothing else, so Windows (\r\n) and old Mac (\r) line endings don't match cleanly. I was going to start substituting (\r\n|\r|\n) for each newline, but since I couldn't quite recall off the top of my head if that's exactly what I wanted and because 1.4 is hardly widespread, I decided to go looking for yet another regexp package. This, of course, is the smart thing to do. I almost stopped right then and there and wrote a custom method that just read bytes and got what I wanted, but I figured that would make for some particularly wack code.
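
For the record, here's roughly what that newline substitution would look like in java.util.regex. This is only a minimal sketch, not my actual parsing routine: the class name DigestSplitter, the splitDigest method, and the idea that messages are separated by a line of ten or more dashes are all made up for illustration. The one thing worth noticing is that the two-character Windows sequence is \r\n and has to come first in the alternation, otherwise it gets matched as two separate line breaks.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class DigestSplitter {
    // One line break in any flavor. CRLF is listed first so that a
    // Windows line ending is consumed as a single break.
    static final String LINE_BREAK = "(?:\\r\\n|\\r|\\n)";

    // Assumed delimiter: a line made up of ten or more dashes.
    // A real digest format may well use something else.
    static final Pattern SEPARATOR =
        Pattern.compile(LINE_BREAK + "-{10,}" + LINE_BREAK);

    // Splits the digest text into individual messages at each separator line.
    static String[] splitDigest(String digest) {
        return SEPARATOR.split(digest);
    }

    public static void main(String[] args) {
        String digest =
            "Subject: first\r\nbody one\r\n"
            + "--------------------\r\n"
            + "Subject: second\nbody two\n";
        String[] messages = splitDigest(digest);
        for (int i = 0; i < messages.length; i++) {
            System.out.println("[" + i + "] " + messages[i].trim());
        }
    }
}
```

Spelling out the alternation by hand keeps the pattern from depending on whichever platform the JVM happens to be running on, which was the whole headache in the first place.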

Back to Jakarta, it turns out. The ORO project is yet another regexp package supported by the Jakarta project. There's a cop-out on the site saying that they keep both because "Options are a good thing and we will leave which one you choose to use up to you". Sheesh. If there's something useful in one and not the other, either roll the code together or at least tell us why we'd want to use one over the other.

Anyhow, the ORO project seems to have a pretty good regexp package, even better than the one that comes with Sun's JVM in 1.4.1. Kinda sad. It's a really daunting task to weed through everything out there to find useful code these days. Though the parallel isn't perfect, it is nice to have Microsoft-tested code behind many standard Visual Basic widgets. As a whole, I've found these to be more reliable than what I manage to dig up in the open source (but not GPL) world.

And as a final random blurb, having two screens is much, much better than one. If you're a programmer and have never tried using multiple monitors, do. Having email, IMs, and "research" URLs open on another screen away from your IDE and "real work" makes for a much more productive environment. You can get cheap CRTs for about $100 these days, and even 1024x768 makes for some good real estate for your "support apps".