I've been considering using AutoMapper in what should initially be a pretty simple MVC project that I may have to spin up in the next month or two, as I've got some experience using AutoMapper in the project I mentioned a post or two ago. I found a pretty good refresher in this CodeProject post, but was a little surprised to find this observation from its author:

I ran my tests many times and one of the possible outputs could be:

AutoMapper: 2117
Manual Mapping: 293

It looks like manual mapping is 7 times faster than automatic. But hey, it took 2 secs to map hundred thousands of customers.

Wow, HEH-lo. Not a big deal for simple pages, but probably not something you want underpinning the architecture of an app that could grow (which is to say "any app that calls for MVC").
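To make the comparison concrete, here's a minimal sketch of the two styles being benchmarked -- type and member names are mine, not the CodeProject author's:

```csharp
using AutoMapper;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

public class CustomerDto
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

public static class MappingDemo
{
    // Automatic: one line per call, but pays AutoMapper's per-object overhead.
    public static CustomerDto Auto(IMapper mapper, Customer c) =>
        mapper.Map<CustomerDto>(c);

    // Manual: the "boring" left-hand-side/right-hand-side assignments,
    // but it's just property copies -- no mapping engine in the loop.
    public static CustomerDto Manual(Customer c) => new CustomerDto
    {
        Id = c.Id,
        Name = c.Name,
        Email = c.Email
    };
}
```

Multiply that per-object overhead by a hundred thousand customers and you get numbers like the ones above.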

A little googling showed that atrocious AutoMapper performance isn't uncommon, not at all. Here's an interesting quote from the accepted answer to the first question:

Also you have mentioned NHibernate in your question. Make sure that your source object along with its collections is eagerly loaded from the database before passing it to the mapping layer or you cannot blame AutoMapper for being slow because when it tries to map one of the collections of your source object it hits the database because NHibernate didn't fetch this collection.

In other words, as this article entitled "Stop using AutoMapper in your Data Access Code" explains...

Whilst I am a big fan of AutoMapper and use it in most projects I work on, especially for Domain to ViewModel mapping, when in comes to data access code, AutoMapper is not so useful. To put it simply, AutoMapper only works with in memory data, not the IQueryable interface which is more typically used in DAL scenarios. In the data access layer, whether we are using Entity Framework, LINQ to SQL or NHibernate, we often use the IQueryable interface, specifiying [sic] what we want to query before the OR/M engine translates this into SQL and returns our data. If you use AutoMapper in your DAL however, you are almost certainly returning more data than you need from the database, as the mapping will not occur until AFTER the original query has executed and the data has been returned. [emphasis mine -mfn]

Obviously there are ways around this, namely making sure that the query that pulls your data only returns what you want for that specific data load, but then you're right back to my complaints about using a repository in the first place. Once you're hand-rolling optimizations, you've left the realm of reusable generic code. Stop trying to backport a square peg into a round hole.
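In code, the difference the quoted article is describing looks something like this -- a hedged sketch assuming an Entity Framework-style context, with illustrative names:

```csharp
// Projection inside IQueryable: the Select is translated into the SQL itself,
// so only Id and Name ever come back from the database.
var slim = db.Customers
    .Where(c => c.IsActive)
    .Select(c => new CustomerDto { Id = c.Id, Name = c.Name })
    .ToList();   // roughly: SELECT Id, Name FROM Customers WHERE IsActive = 1

// Mapping after the query: the query runs first, returning every column of
// every matching Customer (and, with lazy loading, possibly extra round trips
// per collection), and only then does AutoMapper do its in-memory work.
var fat = db.Customers
    .Where(c => c.IsActive)
    .ToList();   // roughly: SELECT * FROM Customers WHERE IsActive = 1
var mapped = mapper.Map<List<CustomerDto>>(fat);
```

The first version is exactly the hand-rolled optimization that takes you out of reusable generic repository territory.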

DevTrends links to a post by Bogard that says:

For a lot of read-only scenarios, loading up a tracked, persistent entity is a bit of a waste. And unless you're doing CQRS with read-specific tables [tables meaning "Queries, views, tables, all SQL-specific" - Bogard in comments], you're doing projection somehow from the write tables.

But many LINQ query providers help with this by parsing expression trees to craft specific SQL queries projecting straight down at the SQL layer. Additionally, projecting in to these DTOs skips loading persistent, tracked entities into memory. Unfortunately, we're then forced to write our boring LHS-RHS code when we drop to this layer...

Exactly. Though I'm not absolutely sure CQRS requires different sets of tables to gain the initial important architectural improvements I'm arguing for here.
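For what it's worth, AutoMapper grew queryable extensions aimed at exactly the complaint Bogard raises -- ProjectTo builds the mapping into the expression tree so the ORM can translate it to SQL. A hedged sketch (availability and behavior vary by AutoMapper version and query provider; names are illustrative):

```csharp
using AutoMapper.QueryableExtensions;

// Instead of materializing entities and then mapping in memory...
var dtos = db.Customers
    .Where(c => c.IsActive)
    .ProjectTo<CustomerDto>(mapper.ConfigurationProvider)
    .ToList();   // the projection happens in the generated SQL, not in memory
```

Whether that translation holds up for your particular provider and mapping configuration is something you'd want to verify against the actual SQL it emits.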

No, I didn't know what CQRS was off-hand myself. It's apparently Command Query Responsibility Segregation. It's nice to see Martin Fowler essentially arguing the point from my previous post on ending conventional repository use for reads:

The rationale is that for many problems, particularly in more complicated domains, having the same conceptual [CRUD] model for commands and queries leads to a more complex model that does neither well.
The other main benefit is in handling high performance applications. CQRS allows you to separate the load from reads and writes allowing you to scale each independently. If your application sees a big disparity between reads and writes this is very handy. Even without that, you can apply different optimization strategies to the two sides. An example of this is using different database access techniques for read and update. [emphasis added -mfn]

That said, Fowler's not quite so contra-CRUD as I am, and seems to believe there are many real-world use cases for CRUD. "So while CQRS is a pattern I'd certainly want in my toolbox, I wouldn't keep it at the top." Really? Writing a lot of APIs maybe?

I just don't see using CRUD as the best, scalable route to build even a typical MVC app.

Though Fowler seems less CRUD-y than I am, too, in that he quickly jumps to divorce reads from your database of record by putting them into reporting databases instead, which seems like overkill if you're doing that from the start. That is, I think Fowler sees CQRS as a second step you take if CRUD lets you down. I think you should take that step from the start.

Just to be clear, I'm using CRUD with a bold "R" to indicate a conventional CRUD system, and CRUD with a struck-through "R" for what I'm proposing everyone do from the start when making an MVC app: reads are all-but-always done with custom SQL and, given AutoMapper's inefficiencies, hand-rolled mappings to DTOs.

There's also an implicit argument in Fowler that the write database would have a different model than the reporting database. I don't know that the extra overhead of two domains, one for write and one for read, is going to be worthwhile. I can understand the reporting server being a sort of "permanent temp table with periodic (and ad hoc) updates" setup, but you've still got to base it on the data that's on your write side.

That is, I don't see how you break out of CRUD and entities, though, again, I want that entity business logic on the database first. If you optimize reads -- through, I propose, views and sprocs and maybe/probably temp tables, or, as Fowler seems to assume, some export process to a reporting database; it doesn't matter -- fine, but you're still basing that information on the content of your "CRUD" database setup.

Fowler's "hyper CQRS" with a reporting database is interesting, but, to me, moving to one or more reporting databases is a DevOps issue that's possible to insert well down the line, once you know reads are so out of proportion to writes that you need the support of additional or distributed database servers -- a much easier move to accomplish in the future than ripping out an architecture based on Repository and automapping models. That is, you don't have to decide to use a reporting server when you hit File >>> New Project. You do need to decide not to get wrapped up with repositories and automapping.

Maybe we're describing similar things, with Fowler just giving more emphasis to the work as a unit rather than to the many entities being affected at once. In my limited experience, though, optimizing writes is rarely your primary performance bottleneck (outside of batch-like use cases, but users seem conditioned to accept that they kick off batches and get notified when they're done). Reads? That's what hobbles your systems, seemingly anytime you're big enough to be making money.

Getting back to automapping... The summary from the DevTrends post, above, is pretty good.

When hitting a database, as a developer, it is important to only return the data that you need and no more. When using modern ORM's, this is typically achieved by using projections when writing IQueryable queries. Typing this type of projection code can be tiresome and some people use AutoMapper instead, believing that it achieves the same thing. Unfortunately, AutoMapper knows nothing about IQueryable and only works with in-memory data, making it less than ideal in DAL scenarios. In order for AutoMapper to do its mapping, it needs to retreive [sic] all source data from the database, resulting in much more data being returned than is necessary, reducing performance and increasing database load and network traffic.

If your app has the potential to grow -- and let's just stipulate that any MVC app does -- you want to keep an eye on performance. And the more of this overhead you integrate into your architecture -- repositories, automapping in your data access layer -- the more tech debt you're going to have once that growth happens.

Anyhow, the tasks I was waiting on are done, so enough architecture discussion. Code time.

Bottom line: KISS now or you'll have debt to pay later.
