
Put the knife down and take a green herb, dude.

One feller's views on the state of everyday computer science & its application (and now, OTHER STUFF) who isn't rich enough to shell out for

FOR ENTERTAINMENT PURPOSES ONLY!!! Back up your data and always wear white.
URLs I want to remember:
* Atari 2600 programming on your Mac
* joel on software (tip pt)
* Professional links: resume, github, paltry StackOverflow
* Regular Expression Introduction (copy)
* The hex editor whose name I forget
* JSONLint to pretty-ify JSON
* Using CommonDialog in VB 6
* Free zip utils
* that hardware vendor review site I forget about is here
* Javascript 1.5 ref
* Read the bits about the zone
* Find column in sql server db by name
* Giant ASCII Textifier in Stick Figures (in Ivrit)
* Quick intro to Javascript
* Don't [over-]sweat "micro-optimization"
* Parsing str's in VB6
* .ToString("yyyy-MM-dd HH:mm:ss.fff", CultureInfo.InvariantCulture); (src)
email if ya gotta, RSS if ya wanna RSS, ¢ if you're keypadless
Wednesday, April 01, 2015

More fun thinking aloud about MVC architectures.

After reading David Hansson on "Russian doll caching", I think I'm coming around on why you'd use entities to put together piecemeal views, though I'm not sure I'm buying yet.

I'll have to find the post again, but one of the links I put up yesterday said that full-page caching was caching's holy grail. Compare the full-page mentality to how Hansson describes the issues of caching at serious scale:

> This Russian doll approach to caching means that even when content
> changes, you're not going to throw out the entire cache. Only the bits
> you need to and then you reuse the rest of the caches that are still
> good.

This implicitly means that you're going to have extra cost piecing together every page, even if you're just stitching cached content, and the pseudo-formula to compare to C<strike>R</strike>UD is pretty easy to stub out. If the cost of rebuilding every cache that depends on some reusable subset of those cached views' information is greater than the cost of piecing together pages from incomplete/non-monolithic cache objects on each request, then you go with the [actually fairly conventional] "Russian doll" approach.

And this largely depends on how many of your widgets appear on more than one page where some [other] subset of the content churns.
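The break-even rule in the last two paragraphs can be written down as a toy cost model. It's only a sketch under my own assumptions: `Workload` and the cost functions are hypothetical names, cache hits are treated as equally cheap in both schemes (so they drop out), and the per-unit costs below are made-up numbers you'd replace with profiling data.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    requests: int               # page requests served in the window
    churns: int                 # content changes in the same window
    pages_per_churn: int        # full pages that embed the changed data
    page_rebuild_ms: float      # cost to re-render one whole page
    fragment_rebuild_ms: float  # cost to re-render just the changed fragment (plus cheap wrappers)
    stitch_ms: float            # per-request cost of assembling a page from cached fragments

def full_page_cost(w: Workload) -> float:
    # Each change throws out every cached page that shows the changed data.
    # Cache hits are assumed to cost about the same in both schemes, so they're ignored.
    return w.churns * w.pages_per_churn * w.page_rebuild_ms

def russian_doll_cost(w: Workload) -> float:
    # The shared fragment is rebuilt once per change, but every request pays to stitch.
    return w.churns * w.fragment_rebuild_ms + w.requests * w.stitch_ms

def prefer_russian_doll(w: Workload) -> bool:
    return full_page_cost(w) > russian_doll_cost(w)

# A widget on many pages, churning constantly (made-up numbers):
busy_widget = Workload(10_000, 500, 40, 50.0, 5.0, 0.5)
# Content that rarely changes, read a million times (made-up numbers):
quiet_page = Workload(1_000_000, 2, 3, 50.0, 5.0, 0.5)
print(prefer_russian_doll(busy_widget))  # True
print(prefer_russian_doll(quiet_page))   # False
```

The knob that matters is requests per churn: raise `requests` relative to `churns` and full-page caching wins, which is the "how many pages you serve between cache churns" point.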

> The only way we can get complex pages to take less than 50ms is to make
> liberal use of caching. We went about forty miles north of liberal and
> ended up with THE MAX. Every stand-alone piece of content is cached in
> Basecamp Next. The todo item, the todo lists, the block of todo lists,
> and the project page that includes all of it.
> To improve the likelihood that you're always going to hit a warm cache,
> we're reusing the cached pieces all over the place. There's one
> canonical template for each piece of data and we reuse that template in
> every spot that piece of data could appear.

Still, it's easy enough to conceive of each of these reusable chunks as embedded views, and then you're back to where you started. Pages might be Russian dolls of views, but then so what? (Though that's the wrong metaphor beyond expressing the Herbertian concept of "views within views within views." Once you understand that views can be made up of views that can be made up of views, *ad infinitum*, you then have to remember that any number of "dolls" can live at any level, rather than the Russian dolls' one-within-one-within-one. Perhaps your main view has five "dolls" inside of it, and those have 2, 3, 0, 1, and 0 dolls inside of them, respectively, and those have...)
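One way to picture "any number of dolls at any level" is key-based expiration over a tree of fragments: each fragment's cache key folds in its own data version plus its children's keys, so changing one leaf produces new keys only for that leaf and its ancestors, while everything else stays warm. This is a hypothetical sketch of the general technique (`Fragment`, `FragmentCache` and the version counters are my inventions), not Basecamp's or Rails' actual code.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class Fragment:
    name: str
    version: int = 0  # bump whenever this fragment's own data changes
    children: list = field(default_factory=list)

    def cache_key(self) -> str:
        # A parent's key folds in its children's keys, so a change anywhere below
        # yields a new key for that fragment and its ancestors, and nowhere else.
        parts = [self.name, str(self.version)] + [c.cache_key() for c in self.children]
        return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()

class FragmentCache:
    def __init__(self) -> None:
        self.store: dict = {}
        self.renders = 0  # fragments actually (re)rendered, i.e. cache misses

    def render(self, frag: Fragment) -> str:
        key = frag.cache_key()
        if key in self.store:
            return self.store[key]  # warm: reuse this whole subtree as-is
        inner = "".join(self.render(child) for child in frag.children)
        self.renders += 1
        html = f"<div id='{frag.name}'>{inner}</div>"
        self.store[key] = html
        return html

# The example above: five dolls holding 2, 3, 0, 1 and 0 dolls respectively.
page = Fragment("page", children=[
    Fragment("a", children=[Fragment("a1"), Fragment("a2")]),
    Fragment("b", children=[Fragment("b1"), Fragment("b2"), Fragment("b3")]),
    Fragment("c"),
    Fragment("d", children=[Fragment("d1")]),
    Fragment("e"),
])
cache = FragmentCache()
cache.render(page)
cold = cache.renders                         # 12: every fragment rendered once
page.children[1].children[2].version += 1    # some data behind "b3" changes
cache.renders = 0
cache.render(page)
warm = cache.renders                         # 3: b3, b and page; the other 9 are reused
```

Entries under old keys are simply never asked for again; a real cache store with LRU eviction would age them out.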

If you get to the point that one of your embedded views only takes data from one table, great. I guess the only way this is useful is if *the same information appears more than once* on a composite page of subviews. I still think you're often going to end up with a specialized DTO for each view, and then you should have an equally specialized Read and mapping that populates that DTO, unless the price of querying a cache for reused information across many views is less than the price of rebuilding each cache that would be invalidated when that information changes. And that depends directly on the number of pages you serve between cache churns.

That is, you can call it an entity, but I think it's more useful to call it a ViewModel. **Stop mapping database tables to entities. Always read exactly what you're about to put onto the page directly from the database. That's what it's there for.** Really. Smart folks are working hard to optimize your queries. I realize caching makes you think you've already got the data on hand, but at some point your hand-rolled (or, worse, your ORM's automatic) execution plan isn't going to be nearly as good as stating what you need in targeted SQL sent to your real RDBMS.
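Here's roughly what "read exactly what you're about to put onto the page" might look like, sketched with Python's built-in `sqlite3` standing in for a real RDBMS. The todo schema and the `TodoListSummary` ViewModel are hypothetical, borrowed loosely from the Basecamp quote; the point is that the join and the count happen in SQL, and rows map straight into the ViewModel with no entity in between.

```python
import sqlite3
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class TodoListSummary:
    # Hypothetical ViewModel: exactly what one page shows, nothing more.
    list_name: str
    open_items: int

def load_summaries(conn: sqlite3.Connection, project_id: int) -> List[TodoListSummary]:
    # Let the database do the join and aggregation; map rows straight to the ViewModel.
    rows = conn.execute(
        """
        SELECT l.name, COUNT(i.id)
        FROM todo_lists AS l
        LEFT JOIN todo_items AS i ON i.list_id = l.id AND i.done = 0
        WHERE l.project_id = ?
        GROUP BY l.id, l.name
        ORDER BY l.name
        """,
        (project_id,),
    ).fetchall()
    return [TodoListSummary(name, count) for name, count in rows]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE todo_lists (id INTEGER PRIMARY KEY, project_id INTEGER, name TEXT);
    CREATE TABLE todo_items (id INTEGER PRIMARY KEY, list_id INTEGER, title TEXT, done INTEGER);
    INSERT INTO todo_lists VALUES (1, 1, 'Launch'), (2, 1, 'Backlog'), (3, 2, 'Other project');
    INSERT INTO todo_items VALUES (1, 1, 'Write copy', 0), (2, 1, 'Ship it', 1),
                                  (3, 1, 'Tweet', 0), (4, 3, 'Nope', 0);
""")
print(load_summaries(conn, 1))
# [TodoListSummary(list_name='Backlog', open_items=0), TodoListSummary(list_name='Launch', open_items=2)]
```

Putting `i.done = 0` in the `ON` clause rather than the `WHERE` clause is what keeps empty lists (like "Backlog") on the page with a count of zero.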

So (and I'm perhaps overusing Atwood's micro-optimization theater post a little), without a clear *a priori* winner in the "monolithic refresh vs. stitched page composition" formula, what's important to me is making the system easy to support. And there, certainly, C<strike>R</strike>UD is a heck of a lot easier than SQL>>>NHib/Caching>>>AutoMapper>>>ORM>>>Repository>>>MVVM.

(Worth adding that I'm unfairly equating Hansson with SQL/Cache/AutoMap/ORM/Repo/MVVM (SCAORM?) here. Totally unfair; he never says he's ORMing in these posts, afaict. I think the beef here is that he's serving modular pages, and I wonder if it's worth the extra complexity short of MAX SCALE!!1! -- and even then, when you get to displaying logically disparate information, we might be saying something similar anyhow.)

That's enough thinking aloud today. Way too many tedious box-watching style chores this week, sorry.

posted by ruffin at 4/01/2015 12:21:00 PM
Tuesday, March 31, 2015

Note to self on your recent MVC recommendations: Don't forget caching/data persistence. I think caching's the only thing the conventional library-crazy MVC overhead[1] gets you "for free" that's worth saving, but it is worth saving. Our cache was used an absolute ton in the big-ish system I hacked on at my previous job.

Need to look at something like these...
* (is missing locking on write/null cache)
* (has locking)
* (hello, old and uncomplicated. Note that almost all of these have magic strings of some sort, which is Wrong.)
* (man, that's tuned towards large, complex, multi-associative (ha) entities)

... but within the context of this...

Guessing this probably ultimately wins:

Also interesting...

... though with the second I was left wondering something similar to what this guy was...

[1] That is, the benefits of a "typical MVC stack" like I was using previously: SQL Server to NHibernate with MemCache, Repository model to access NHibernate, QueryOver, and your typical MVC/MVVM setup on the other side. I want to kill off NHibernate, QueryOver, and use of the Repository model. The only baby in the bath is caching. Though, wow, caching is easier when you have a 1-to-1 and onto relationship between Views and queries. I'm not saying that makes things smarter, but it does reduce the caching complexity.
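The two things flagged above as missing from some cache wrappers, locking around the write and caching a null result, are easy to show in a sketch. This is a hypothetical in-process read-through cache, not memcached or any particular library: `ReadThroughCache`, the TTL policy and the per-key lock are all my assumptions.

```python
import threading
import time
from typing import Callable, Hashable

_MISSING = object()  # sentinel, so a cached None still counts as a hit

class ReadThroughCache:
    """Caches loader results (None included) for ttl seconds; one load per key at a time."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: dict = {}           # key -> (expires_at, value)
        self._key_locks: dict = {}      # key -> lock serializing loads for that key
        self._guard = threading.Lock()  # protects both dicts

    def _lookup(self, key: Hashable):
        with self._guard:
            entry = self._data.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]
        return _MISSING

    def get(self, key: Hashable, loader: Callable):
        value = self._lookup(key)
        if value is not _MISSING:
            return value
        with self._guard:
            key_lock = self._key_locks.setdefault(key, threading.Lock())
        with key_lock:
            # Double-check: another thread may have loaded it while we waited.
            value = self._lookup(key)
            if value is not _MISSING:
                return value
            value = loader(key)  # None ("no such row") gets cached too
            with self._guard:
                self._data[key] = (self._clock() + self._ttl, value)
            return value

    def invalidate(self, key: Hashable) -> None:
        # Call this from the write path after the database commit succeeds.
        with self._guard:
            self._data.pop(key, None)
```

Without the null caching, every request for a row that doesn't exist goes straight to the database; without the per-key lock, a popular key expiring sends every concurrent request to the database at once. A shared cache like memcached would need something other than an in-process lock, and this sketch never prunes `_key_locks`.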

posted by ruffin at 3/31/2015 05:41:00 PM
Monday, March 30, 2015

I've been considering using AutoMapper in what should initially be a pretty simple MVC project that I may have to spin up in the next month or two, as I've got some experience using AutoMapper from the project I mentioned a post or two ago. I found a pretty good refresher in this CodeProject post, but was a little surprised to find this observation from its author:

I ran my tests many times and one of the possible outputs could be:

AutoMapper: 2117
Manual Mapping: 293

It looks like manual mapping is 7 times faster than automatic. But hey, it took 2 secs to map hundred thousands of customers.

Wow, HEH-lo. Not a big deal for simple pages, but probably not something you want underpinning the architecture of an app that could grow (which is to say "any app that calls for MVC").
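To see where convention-based mapping overhead can come from, here's a rough Python analogue: a generic mapper that discovers matching field names by reflection on every call, next to the hand-written LHS-RHS version. It's a hypothetical illustration of the technique (the `Customer`/`CustomerDto` classes are made up), not how AutoMapper itself is implemented, so don't read its timings as AutoMapper numbers.

```python
import timeit
from dataclasses import dataclass, fields

@dataclass
class Customer:
    id: int
    first_name: str
    last_name: str
    email: str
    password_hash: str  # should never reach the view

@dataclass
class CustomerDto:
    id: int
    first_name: str
    last_name: str
    email: str

def map_by_convention(src, dest_type):
    # Generic: discover the destination's fields and copy same-named attributes, every call.
    return dest_type(**{f.name: getattr(src, f.name) for f in fields(dest_type)})

def map_by_hand(c: Customer) -> CustomerDto:
    # The "boring LHS-RHS code": explicit, with nothing to discover at runtime.
    return CustomerDto(id=c.id, first_name=c.first_name, last_name=c.last_name, email=c.email)

if __name__ == "__main__":
    customers = [Customer(i, "Ada", "Lovelace", f"ada{i}@example.com", "x") for i in range(100_000)]
    auto = timeit.timeit(lambda: [map_by_convention(c, CustomerDto) for c in customers], number=1)
    manual = timeit.timeit(lambda: [map_by_hand(c) for c in customers], number=1)
    print(f"convention: {auto:.3f}s  manual: {manual:.3f}s")
```

Your ratio will depend on the machine and the Python version; the point is only that generic discovery costs something per object, and that cost is multiplied by every row you map.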

A little googling showed that atrocious AutoMapper performance isn't uncommon, not at all. Here's an interesting quote from the accepted answer to the first question:

Also you have mentioned NHibernate in your question. Make sure that your source object along with its collections is eagerly loaded from the database before passing it to the mapping layer or you cannot blame AutoMapper for being slow because when it tries to map one of the collections of your source object it hits the database because NHibernate didn't fetch this collection.
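The lazy-loading trap in that answer is the classic N+1 query problem. Here's a minimal sketch using Python's built-in `sqlite3` that counts the statements sent to the database: the lazy version issues one query for the orders and then one more per order as each collection gets touched, while the eager version gets everything in one JOIN. The schema, `CountingDb` and the loader functions are hypothetical, and NHibernate's own fetch strategies are configured differently.

```python
import sqlite3

def make_db(n_orders: int) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
        CREATE TABLE order_lines (id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
    """)
    for i in range(1, n_orders + 1):
        conn.execute("INSERT INTO orders VALUES (?, ?)", (i, f"cust{i}"))
        conn.executemany("INSERT INTO order_lines (order_id, sku) VALUES (?, ?)",
                         [(i, "A"), (i, "B")])
    return conn

class CountingDb:
    """Wraps a connection and counts round trips, standing in for an ORM's SQL log."""
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.queries = 0

    def query(self, sql: str, params: tuple = ()) -> list:
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()

def load_lazy(db: CountingDb) -> dict:
    orders = {}
    for (order_id,) in db.query("SELECT id FROM orders ORDER BY id"):
        # What a lazy collection does when the mapper touches it: one more trip per order.
        rows = db.query("SELECT sku FROM order_lines WHERE order_id = ? ORDER BY id", (order_id,))
        orders[order_id] = [sku for (sku,) in rows]
    return orders

def load_eager(db: CountingDb) -> dict:
    orders = {}
    rows = db.query("""
        SELECT o.id, l.sku
        FROM orders o LEFT JOIN order_lines l ON l.order_id = o.id
        ORDER BY o.id, l.id""")
    for order_id, sku in rows:
        lines = orders.setdefault(order_id, [])
        if sku is not None:  # LEFT JOIN keeps orders that have no lines
            lines.append(sku)
    return orders
```

Both return the same data; with 50 orders the lazy path makes 51 round trips and the eager one makes 1.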

In other words, as this article entitled "Stop using AutoMapper in your Data Access Code" explains...

Whilst I am a big fan of AutoMapper and use it in most projects I work on, especially for Domain to ViewModel mapping, when in comes to data access code, AutoMapper is not so useful. To put it simply, AutoMapper only works with in memory data, not the IQueryable interface which is more typically used in DAL scenarios. In the data access layer, whether we are using Entity Framework, LINQ to SQL or NHibernate, we often use the IQueryable interface, specifiying [sic] what we want to query before the OR/M engine translates this into SQL and returns our data. If you use AutoMapper in your DAL however, you are almost certainly returning more data than you need from the database, as the mapping will not occur until AFTER the original query has executed and the data has been returned. [emphasis mine -mfn]

Obviously there are ways around this, namely making sure that the query that pulls your data only returns what you want for that specific data load, but then you're right back to my complaints about using a repository in the first place. Once you're hand-rolling optimizations, you've left the realm of reusable generic code. Stop trying to backport a square peg into a round hole.
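Here's the projection point in miniature, again with `sqlite3` and hypothetical table and class names. `load_then_map` mimics mapping after the fact: it pulls whole rows (including big columns the page never shows) and discards most of them in memory, while `project_in_sql` names only the columns the DTO needs, so the unused data never leaves the database. This is plain SQL, not LINQ or any AutoMapper API, and `bytes_fetched` is only a rough stand-in for network traffic.

```python
import sqlite3
from dataclasses import dataclass

@dataclass(frozen=True)
class ProductRow:  # DTO for a product listing page
    name: str
    price_cents: int

def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE products (
        id INTEGER PRIMARY KEY, name TEXT, price_cents INTEGER,
        description TEXT, spec_sheet BLOB)""")
    conn.executemany(
        "INSERT INTO products (name, price_cents, description, spec_sheet) VALUES (?, ?, ?, ?)",
        [(f"widget {i}", 100 * i, "long marketing copy " * 50, b"\x00" * 10_000)
         for i in range(1, 21)],
    )
    return conn

def bytes_fetched(rows) -> int:
    # Rough size of everything that came back: string/blob lengths, 8 per number.
    return sum(len(v) if isinstance(v, (str, bytes)) else 8 for row in rows for v in row)

def load_then_map(conn: sqlite3.Connection):
    rows = conn.execute("SELECT * FROM products ORDER BY id").fetchall()  # everything comes back...
    dtos = [ProductRow(r[1], r[2]) for r in rows]                         # ...then most is discarded
    return dtos, bytes_fetched(rows)

def project_in_sql(conn: sqlite3.Connection):
    rows = conn.execute("SELECT name, price_cents FROM products ORDER BY id").fetchall()
    return [ProductRow(name, price) for name, price in rows], bytes_fetched(rows)
```

Both functions produce identical DTOs; only the amount of data hauled out of the database differs.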

DevTrends links to a post by Bogard that says:

For a lot of read-only scenarios, loading up a tracked, persistent entity is a bit of a waste. And unless you're doing CQRS with read-specific tables [tables meaning "Queries, views, tables, all SQL-specific" - Bogard in comments], you're doing projection somehow from the write tables.

But many LINQ query providers help with this by parsing expression trees to craft specific SQL queries projecting straight down at the SQL layer. Additionally, projecting in to these DTOs skips loading persistent, tracked entities into memory. Unfortunately, we're then forced to write our boring LHS-RHS code when we drop to this layer...

Exactly. Though I'm not absolutely sure CQRS requires different sets of tables to gain the initial important architectural improvements I'm arguing for here.

No, I didn't know what CQRS was off-hand myself. It's apparently Command Query Responsibility Segregation. It's nice to see Martin Fowler essentially arguing the point from my previous post on ending conventional repository use for reads:

The rationale is that for many problems, particularly in more complicated domains, having the same conceptual [CRUD] model for commands and queries leads to a more complex model that does neither well.
The other main benefit is in handling high performance applications. CQRS allows you to separate the load from reads and writes allowing you to scale each independently. If your application sees a big disparity between reads and writes this is very handy. Even without that, you can apply different optimization strategies to the two sides. An example of this is using different database access techniques for read and update. [emphasis added -mfn]

That said, Fowler's not quite so contra-CRUD as I am, and seems to believe there are many real-world use cases for CRUD. "So while CQRS is a pattern I'd certainly want in my toolbox, I wouldn't keep it at the top." Really? Writing a lot of APIs maybe?

I just don't see using CRUD as the best, scalable route to build even a typical MVC app.

Though in one respect Fowler seems even less CRUD-y than I am: he quickly jumps to divorcing reads from your database of record by putting them into reporting databases instead, which seems like overkill if you're doing that from the start. That is, I think Fowler sees CQRS as a second step you take if CRUD lets you down. I think you should use C<strike>R</strike>UD from the start.

Just to be clear, I'm using C**R**UD with a bold "R" to indicate a conventional CRUD system, and C<strike>R</strike>UD (with a struck-through "R") for what I'm proposing everyone do from the start when making an MVC app, where reads are all-but-always done with custom SQL and, given AutoMapper's inefficiencies, hand-rolled mappings to DTOs.
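A minimal sketch of the C<strike>R</strike>UD split as I read it: writes go through one validated command path, while each page gets its own read function with hand-written SQL mapped straight into a DTO. Everything here (`RenameProject`, `handle_rename`, `project_header`, the schema) is hypothetical, and it's one database with two code paths, not Fowler's separate reporting store.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

# ---- Command side: one validated path for every write ----
@dataclass(frozen=True)
class RenameProject:
    project_id: int
    new_name: str

class ValidationError(Exception):
    pass

def handle_rename(conn: sqlite3.Connection, cmd: RenameProject) -> None:
    name = cmd.new_name.strip()
    if not name:
        raise ValidationError("Project name can't be blank.")
    if len(name) > 80:
        raise ValidationError("Project name must be 80 characters or fewer.")
    with conn:  # commits on success, rolls back if anything below raises
        cur = conn.execute("UPDATE projects SET name = ? WHERE id = ?", (name, cmd.project_id))
        if cur.rowcount != 1:
            raise ValidationError("No such project.")

# ---- Query side: per-page SQL, mapped straight to a DTO ----
@dataclass(frozen=True)
class ProjectHeader:
    name: str
    open_todos: int

def project_header(conn: sqlite3.Connection, project_id: int) -> Optional[ProjectHeader]:
    row = conn.execute(
        """SELECT p.name, COUNT(t.id)
           FROM projects p LEFT JOIN todos t ON t.project_id = p.id AND t.done = 0
           WHERE p.id = ?
           GROUP BY p.id, p.name""",
        (project_id,),
    ).fetchone()
    return ProjectHeader(*row) if row else None
```

The read side never loads a `Project` entity at all; if a page needs different columns, it gets a different query and a different DTO.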

There's also an implicit argument in Fowler that the write database would have a different model than the reporting dbms. I don't know that the extra overhead of two domains, one for write and one for read, is going to be worthwhile. I can understand the reporting server being a sort of "permanent temp table with periodic (and ad hoc) updates" setup, but you've still got to base it on the data that's on your write side.

That is, I don't see how you break out of CRUD and entities, though, again, I want that entity business logic on the database first. If you optimize reads -- through, I propose, views and sprocs and maybe/probably temp tables, or, as Fowler seems to assume, some export process to a reporting database; it doesn't matter -- fine, but you're still basing that information on the content of your "CRUD" database setup.

Fowler's "hyper CQRS" with a reporting database is interesting, but, to me, moving to one or more reporting databases is a DevOps issue that's possible to insert well down the line, once you know reads are so out of proportion to writes that you need the support of another database server (or several distributed ones) -- a much easier move to accomplish in the future than ripping out an architecture based on Repository and automapping models. That is, you don't have to decide to use a reporting server when you hit File >>> New Project. You do need to decide not to get wrapped up with repositories and automapping.

Maybe we're saying similar things, just with Fowler giving more emphasis to the work as a unit rather than to many entities being affected at once. It's just that, in my limited experience, optimizing writes (outside of batch-like use cases, but users seem conditioned to accept that they kick off batches and get notified when they're done) is rarely your primary performance bottleneck. Reads? That's what hobbles your systems seemingly anytime you're big enough that you're making money.

Getting back to automapping... The summary from the DevTrends post, above, is pretty good.

When hitting a database, as a developer, it is important to only return the data that you need and no more. When using modern ORM's, this is typically achieved by using projections when writing IQueryable queries. Typing this type of projection code can be tiresome and some people use AutoMapper instead, believing that it achieves the same thing. Unfortunately, AutoMapper knows nothing about IQueryable and only works with in-memory data, making it less than ideal in DAL scenarios. In order for AutoMapper to do its mapping, it needs to retreive all source data from the database, resulting in much more data being returned than is necessary, reducing performance and increasing database load and network traffic.

If your app has the potential to grow -- and let's just stipulate that any MVC app does -- you want to keep an eye on performance. And the more of this overhead you integrate into your architecture -- repositories, automapping in your data access layer -- the more tech debt you're going to have once that growth happens.

Anyhow, the tasks I was waiting on are done, so enough architecture discussion. Code time.

Bottom line: KISS now or you'll have debt to pay later.


posted by ruffin at 3/30/2015 11:12:00 AM

Apparently a quote by Tony Hoare, the inventor of quicksort, speaking at the "1980 Turing Award Lecture":

"There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult."

Caught that on the website. Can't say I'm a huge fan of Kompozer, and I'm hoping NVu eats less memory than having a full Seamonkey open just for the editor. The quote's a nice find.

(In other news, I can't believe NVu 1.0 is nearly ten years old and we still don't have a great replacement that tends to follow Hoare's "first method". A good Markdown editor might be as close as we've gotten.)


posted by ruffin at 3/30/2015 08:58:00 AM
Friday, March 27, 2015

I think I'm finally putting my foot down on the side against the repository pattern in MVC. I spent two years working on a reasonably complex MVC app that sold out almost completely to repositories, yet there were very few pages that didn't eventually require a DTO and, in the poorest performing pages (which eventually were nearly all of them), a custom QueryOver "query" to populate them.

Once you're using DTOs and not entities to populate your Views, and especially when you're using, in my past case, QueryOver (which means you *are* creating custom SQL, just one strange head rethread away from creating the SQL directly), you lose too much of the repository pattern's benefits. That is, once you turn the DTO corner, you're already pushing out a custom, usually one-off-ish solution. Why not simply write the right SQL and map from its results directly to your DTO, skipping the insane overhead and obfuscated performance bottlenecks that are conventional entity mappers, especially when they're building implicit relationships between entities? Oh, the memory of and *for* an entity JOIN gone awry. ORM relationships are, if not difficult, *insanely* tedious, and are essentially never going to be as good as even your server's standard execution plan for the SQL equivalent. SQL's not difficult. If a coder can't understand sets, you were in BIG trouble waaaay before you got to the question of what pattern to use. Why move that logic anywhere but the server? And if you're going to take the time to hand-roll relationships, why ever do *any* of that in entities rather than SQL? It's like Dr. No trying to perform surgery.

There's only one time that entities are particularly important in the middle tier, though that "one time" crops up with *any* save: Validation, aka "business logic". You want to keep your validation process DRY, and that means putting a nice, reusable validation step between your users' newly inputted data and your data tier is absolutely crucial. You certainly don't want to support more than one filter for data going into the same place. That'll kill you when you're debugging, and cause bad data to seep in when you're not looking. And no matter how complicated your user interface and its initial SELECTs, you're always going to reduce what's entered or edited back to your table (or whatever other serialization method you've got) structure. That's just a truism. (Unless you're a crazy NoSQL-er. I get it, though if your data model changes considerably over time, well, godspeed, man.)

But would it be wrong to put validation logic only in the database too? It'd be a pain in the rear to have a set of "CUD" [sic -- CRUD without the Read] sprocs that return validation codes [that you'd look up and turn into human-readable errors in your middleware] and related info, but you need to have some *complete* sort of validation in your dbms any way you cut it. Why get unDRY there? Too many people stop with validation in the middleware and trick themselves into thinking they're DRY. They're not only **not** DRY, their database is naked too. (Insert picture of Walter delivering the ringer... You didn't think I was rolling out of here naked, did you?) As I was told years ago, "Put the logic in the database." I don't think I'm [just] showing my age there.
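Here's a small sketch of "validation lives in the database, the middleware only translates." SQLite has no stored procedures, so a trigger stands in for the validating CUD sproc: it aborts with a short code, and the application looks that code up to build the user-facing message. The table, trigger and codes are hypothetical; in SQL Server you'd more likely `THROW` or `RAISERROR` from the sproc itself.

```python
import sqlite3

# Hypothetical schema. The trigger plays the role of a validating "CUD" sproc:
# bad data is rejected inside the database no matter which app sends it.
SCHEMA = """
CREATE TABLE line_items (
    id  INTEGER PRIMARY KEY,
    sku TEXT NOT NULL,
    qty INTEGER NOT NULL
);
CREATE TRIGGER line_items_validate_insert
BEFORE INSERT ON line_items
BEGIN
    SELECT RAISE(ABORT, 'E_QTY_RANGE') WHERE NEW.qty < 1 OR NEW.qty > 999;
    SELECT RAISE(ABORT, 'E_SKU_BLANK') WHERE trim(NEW.sku) = '';
END;
"""
# (A matching BEFORE UPDATE trigger would be needed to cover edits too.)

# The middleware's only validation job: turn codes into human-readable text.
MESSAGES = {
    "E_QTY_RANGE": "Quantity must be between 1 and 999.",
    "E_SKU_BLANK": "Please choose a product.",
}

def add_line_item(conn: sqlite3.Connection, sku: str, qty: int):
    """Returns None on success, or a user-facing message built from the DB's code."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO line_items (sku, qty) VALUES (?, ?)", (sku, qty))
        return None
    except sqlite3.DatabaseError as exc:
        text = str(exc)
        for code, message in MESSAGES.items():
            if code in text:
                return message
        return "Something went wrong saving your item."
```

The rule is written once, in the database; a second app hitting the same table can't skip it, and the C# (or Python) side only owns the wording.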

But at this point, once you hit your Controllers, you're very nearly back to the same ADO.NET pattern we started with years ago, which makes me worry that I'm missing an obvious benefit.

But, honestly, for SELECT, repos are dead weight. Even ORM is dead weight. Good SELECT SQL, packaged into a well-protected sproc, mapped to a DTO seems like it's *always* The Right Thing to do. There's almost never a time when you're dealing with straight, unadulterated entities in complex pages, and even when you are, it's not like repos give you a huge leg up over smart Service reuse in your Controllers. That is, there's nothing particular to repositories that's a sole-source dealmaking advantage.

CUD is a slightly different animal, but for me the argument for entities here is more about convenience (running unit tests and coding great validation logic that's easy to send back to your end users with that validation only in your dbms is more difficult for Average Joe Programmer than, say, hacking it in C#) than Doing Things Right. I'm not sure I'm a big fan of ORMs and entities for CUD either, though, again, here I'm willing to be convinced I'm wrong.

Edit: See my slightly more recent discussion of Command Query Responsibility Segregation, something I didn't know I was "rediscovering" here. The difference is that I think you have to start with separation of your reads, something I kinda jokingly call "C<strike>R</strike>UD" in that post.


posted by ruffin at 3/27/2015 06:36:00 PM
Thursday, March 19, 2015

I always wish Humble Bundle would tell you more about the books they offer in their bundles without all the clicking. It's barely worth the time it takes to look up which of the books I might like -- at that point, I might as well just spend the cash on something I know I'd like. Why they don't get the info (descriptions and prices) off of Amazon or elsewhere when they put up a new bundle, I have no idea.

So I decided I'd do it this time for the current SciFi bundle. Here we go. No obvious gems (like the time an unpublished Frank Herbert novel was included in the "pay more than the average" bundle extras. Wow. Yes, please), but these are the books you get for <strike>a dollar</strike> five dollars (you also get a SciFi fiction magazine, Lightspeed, issue 28):

The Rock by Bob Mayer
Kindle price: $4.99; Amazon rating: 4.2 (91 customer reviews)
In Australia...a U.S. Air Force computer operator receives a terrifying radio transmission out of Ayers Rock that knocks out the world's communications. In New Mexico...a boozy college engineering professor is suddenly escorted into a waiting car by two armed military men. In New York...a tall, slender woman leaves her husband a note on the refrigerator saying that she's leaving -- but can't tell him where she's going. In Colombia...a Special Forces officer breaks into a drug kinpin's bedroom and puts a bullet into a woman's brain before running to a waiting helicopter. And in England...a beautiful twenty-three-year-old mathematician prepares for a journey that will change her life forever.

The team assembles in Australia on a mission that can save the world. But first, they must figure out who is the message coming from? What are they trying to tell mankind? And can it be deciphered in time before Armageddon overtakes the world?

It’s about a bomb, it’s about a world about to be shattered. It’s about the last few days they have to save us all.

Bob Mayer is a West Point Graduate and a former Green Beret.
Fiction River: Time Streams (apparently a magazine)
Amazon rating: 4.8 (5 customer reviews)
Time-travel stories open the entire world and all of time to writers’ imaginations. The fifteen writers in this third original anthology in the Fiction River line explore everything from Chicago gangsters to Japanese tsunamis, and travel from 2013 to the nineteenth century to a vast future. Featuring work from award winners to bestsellers to a few newcomers whose time will come, Time Streams turns the time-travel genre on its head. "Fiction River is off to an auspicious start. It's a worthy heir to the original anthology series of the 60s and 70s. ... It's certainly the top anthology of the year to date." —Amazing Stories on Fiction River: Unnatural Worlds
Parallelogram Book 1: Into the Parallel by Robin Brande
Amazon rating: 4.6 (5 customer reviews)
Audie Masters is a 17-year-old girl who doesn't have it all: not a lot of money, not much confidence around guys, she can't seem to pass algebra, and she's desperate to get into the college she's been dreaming of for years.

But what Audie does have is a knack for physics that's about to catapult her into the adventure of a lifetime. 

Now all she has to do is survive it.
Time's Mistress by Steven Savile (apparently a short story collection)
Amazon rating: 5.0 (3 customer reviews)
Steven Savile's popular fantasy stories embrace all aspects of the fantastic. Be it the wonder of magical realism, the darkness of the macabre, or the mythological, these stories have one thing in common: faith. Savile offers up tales of hope and wonder in equal measure, whilst treating sadness as a long lost friend. Nothing in his world is quite as it seems. The world you think you know isn't the world you're about to enter. Everything you think you've learned about life is about to be unlearned. These are stories of love. These are stories of loss. In some you will find redemption, in others the simple act of memory is treacherous and cannot be trusted. But in all of them there is an aching sense of loss and love. Savile's stories here speak to the part in all of us who still dares to fall in love again after a broken heart.
Alternitech by Kevin J. Anderson
Amazon rating: 3.7 (3 customer reviews)
“Alternitech” sends prospectors into alternate but similar timelines where tiny differences yield significant changes: a world where the Beatles never broke up, or where Lee Harvey Oswald wasn’t gunned down after the Kennedy assassination, where an accidental medical breakthrough offers the cure to a certain disease, where a struggling author really did write the great American novel, or where a freak accident reveals the existence of a serial killer. Alternitech finds those differences—and profits from them.

Well, it's useful to me. ;^) Probably worth a dollar. In the last scifi bundle I got, Anderson's was the only decent book in the bunch (though I may have only spent a dollar), but it was obviously his first novel.


posted by ruffin at 3/19/2015 08:21:00 PM
Friday, March 13, 2015

Here's a very good (though oversimplified and not exactly chock full of programming examples) introduction to Big O notation:

I said that the idea behind Big O notation is simple and useful. It's simple because it lets us ignore lots of details in favor of descriptions like "each conference attendee adds more work than the last" or "each one adds the same amount of work". And it's useful because it lets us answer questions like "can we actually solve this problem in a reasonable amount of time?"

Speaking of which, if you're a programmer, you probably use hashes (also known as dictionaries) all the time. Did you know that no matter how many items are in a hash, their lookup and insertion time never bogs down? It's O(1). How do they do that?
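To make the quote's two conference-attendee descriptions concrete, here's a toy count under an assumption of mine rather than the article's: handing each attendee a name tag costs the same for every person, while having every attendee shake hands with everyone already in the room means each new arrival adds more work than the last.

```python
def name_tags(n: int) -> int:
    # One unit of work per attendee, no matter how many came before: O(n).
    return n

def handshakes(n: int) -> int:
    # The k-th arrival shakes hands with the k-1 people already there,
    # so the total is 0 + 1 + ... + (n-1) = n(n-1)/2: O(n^2).
    return sum(k - 1 for k in range(1, n + 1))

for n in (10, 100, 1000):
    print(n, name_tags(n), handshakes(n))
# 10 10 45
# 100 100 4950
# 1000 1000 499500
```

Ten times the attendees means ten times the name tags but roughly a hundred times the handshakes, which is the "can we actually solve this in a reasonable amount of time?" question in miniature.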

I finally learned what hashing really was last year (just out of curiosity) -- and duh, and brilliant, and duh. Hashtables take about 30 seconds to explain, if you don't read through the snippet from Moby-Dick; or, The Whale, as we like to call it, below.

Create a hashing function that produces a hash significantly smaller than the average length of the values you're storing in your hashtable (md5 is fine, for example). Now use each value's hash as the location where you store that value on insertion.

So this line of text...

Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, and methodically knocking people's hats off--then, I account it high time to get to sea as soon as I can. This is my substitute for pistol and ball.

... will always be found at location 0e92fa0b51b9f8879eae58ad30bc943c. ALWAYS. There's no look-up at all. Wow. Neat. And, in retrospect, duh. (Brilliance to me is always something that's so painfully obvious after you hear it that you can never again forget it -- literally life-[view-]changing on some level.)

Now by virtue of having fewer bytes in the hash than the value (ignoring that we could zip that text and have less already, etc etc), we will have collisions, where more than one value will have the same hash as another. Oh noes! How can we store the value if something's already at its index? Push it from the U[nsolved] drawer to the underused X files?

No no. If that happens in your hashtable, you simply string each content piece along in a list that you store at that key. Now your lookup time for collisions is slightly longer than straight O(1), but it's still so good it's barely worth worrying about. Read the above to know why that means it's still essentially O(1). Also now obvious: Why Hashtables don't guarantee order.
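Here's the whole idea as a toy Python class, with a couple of assumptions beyond the description above: the md5 digest is reduced to a bucket index with a modulo (real tables don't keep 2^128 slots), keys rather than whole values are what get hashed, and each bucket is a plain list of (key, value) pairs for the collision chaining. `ChainedHashTable` is illustrative only; Python's own `dict` uses open addressing, not chaining.

```python
import hashlib

class ChainedHashTable:
    def __init__(self, buckets: int = 8):
        self._buckets = [[] for _ in range(buckets)]
        self._size = 0

    def _index(self, key: str) -> int:
        digest = hashlib.md5(key.encode("utf-8")).digest()
        # Reduce the 128-bit hash to a slot number: same key, same slot, always.
        return int.from_bytes(digest, "big") % len(self._buckets)

    def put(self, key: str, value) -> None:
        bucket = self._buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:  # same key: overwrite in place
                bucket[i] = (key, value)
                return
        bucket.append((key, value))  # empty slot or collision: chain it on
        self._size += 1
        if self._size > 2 * len(self._buckets):
            self._grow()  # keep chains short so lookups stay essentially O(1)

    def get(self, key: str):
        # Only the one chain at this slot is scanned, never the whole table.
        for k, v in self._buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

    def _grow(self) -> None:
        pairs = [pair for bucket in self._buckets for pair in bucket]
        self._buckets = [[] for _ in range(2 * len(self._buckets))]
        for k, v in pairs:
            self._buckets[self._index(k)].append((k, v))

    def __len__(self) -> int:
        return self._size
```

Walking `_buckets` in order would hand you the keys in hash order, not insertion order, which is the "why hashtables don't guarantee order" point.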

Again, hashing is brilliant. And, well, after you hear it, duh.


posted by ruffin at 3/13/2015 02:48:00 PM
Thursday, March 12, 2015

If you're using Cscript to execute JavaScript on Windows...

This doesn't work:


This does:


This is not fun. It's one thing to learn idiosyncratic JavaScript engines in different browsers on different platforms, but another when you're dealing with engines that just support whatever the heck they wanna. (He says without having checked the ECMAScript standard...)

I mean, Powershell is idiosyncratic like mad, but it's solidly idiosyncratic in consistent ways. Cscript might not quite understand that JavaScript is understood outside of Cscript context... (Why not just use Node? I would, but this is code we had delivered from YA Contractor, and it's Windows-objected-up all over the place.)


posted by ruffin at 3/12/2015 04:52:00 PM


Powered by Blogger