Put the knife down and take a green herb, dude.
One feller's views on the state of everyday computer science & its application (and now, OTHER STUFF) who isn't rich enough to shell out for www.myfreakinfirst-andlast-name.com
|FOR ENTERTAINMENT PURPOSES ONLY!!! Back-up your data and always wear white.|
|Wednesday, April 01, 2015|
More fun thinking aloud about MVC architectures.
After reading [David Hansson on "Russian doll caching"](https://signalvnoise.com/posts/3112-how-basecamp-next-got-to-be-so-damn-fast-without-using-much-client-side-ui), I think I'm coming around on why you'd use entities to put together piecemeal views, though I'm not sure I'm buying it yet.
I'll have to find the post again, but one of the links I put up yesterday said that full-page caching was caching's holy grail. Compare that full-page mentality to how Hansson describes the issues of caching at serious scale:
> This Russian doll approach to caching means that even when content
This implicitly means that you're going to have extra cost piecing together every page, even if you're just stitching cached content, and the pseudo-formula to compare to C<strike>R</strike>UD is pretty easy to stub out. If the cost of rebuilding every cache that depends on some reusable subset of those cached views' information is greater than the cost of piecing together pages from incomplete/non-monolithic cache objects on each request, then you go with the [actually fairly conventional] "Russian doll" approach.
And this largely depends on how many of your widgets appear on more than one page where some [other] subset of the content churns.
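That break-even pseudo-formula is easy enough to stub out literally. Here's an illustrative sketch in Python, with every number made up, just to show the shape of the comparison:

```python
# Illustrative back-of-envelope comparison (all costs are invented units):
# monolithic full-page caching vs. "Russian doll" fragment stitching.

def monolithic_cost(requests_between_churns, rebuild_cost, cached_read_cost):
    """One churn invalidates the whole page: pay a full rebuild once,
    then cheap cached reads until the next churn."""
    return rebuild_cost + (requests_between_churns - 1) * cached_read_cost

def stitched_cost(requests_between_churns, fragments,
                  stitch_cost_per_fragment, fragment_rebuild_cost):
    """One churn invalidates only the touched fragment; every request
    pays a small stitching tax for assembling cached fragments."""
    per_request = fragments * stitch_cost_per_fragment
    return fragment_rebuild_cost + requests_between_churns * per_request

# With frequent churns (few requests between them), stitching wins;
# with rare churns, the monolithic full-page cache wins.
for requests in (5, 5000):
    mono = monolithic_cost(requests, rebuild_cost=200, cached_read_cost=1)
    stitched = stitched_cost(requests, fragments=5,
                             stitch_cost_per_fragment=0.5,
                             fragment_rebuild_cost=40)
    print(requests, "requests between churns ->",
          "monolithic" if mono < stitched else "stitched", "wins")
```

The crossover point is exactly the "pages served between cache churns" number: the whole argument reduces to which side of it your traffic sits on.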
> The only way we can get complex pages to take less than 50ms is to make
Still, it's easy enough to conceive of each of these reusable chunks as embedded views, and then you're back to where you started. Pages might be Russian dolls of views (though that's the wrong metaphor beyond expressing the Herbertian concept of "views within views within views". Once you understand views can be made up of views can be made up of views, *ad infinitum*, you then have to remember that any number of "dolls" can live at any level, rather than the Russian dolls' one-within-one-within-one. Perhaps your main view has five "dolls" inside of it, and those have 2, 3, 0, 1, and 0 dolls inside of them, respectively, and those have...), but then so what?
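The "dolls within dolls" idea can be sketched in a few lines. This is a toy, not Rails' actual implementation; the trick (key-based expiration) is that a parent's cache key incorporates its children's keys, so touching a leaf changes every ancestor's key automatically, while sibling subtrees stay cached:

```python
# Toy sketch of nested fragment caching: each view node caches its
# rendered output under a key built from its own version plus its
# children's keys. Touching one leaf re-renders only that leaf and its
# ancestors; untouched siblings are served from cache.

cache = {}
renders = []  # tracks which nodes actually re-render

class View:
    def __init__(self, name, children=()):
        self.name, self.children, self.version = name, list(children), 0

    def key(self):
        # Parent keys embed child keys, so a child's version bump
        # cascades upward with no explicit invalidation.
        child_keys = "".join(c.key() for c in self.children)
        return f"{self.name}:v{self.version}({child_keys})"

    def render(self):
        k = self.key()
        if k not in cache:
            renders.append(self.name)
            inner = "".join(c.render() for c in self.children)
            cache[k] = f"<{self.name}>{inner}</{self.name}>"
        return cache[k]

    def touch(self):
        self.version += 1

a, b = View("a"), View("b")
page = View("page", [a, b])
page.render()     # cold cache: renders page, a, and b
renders.clear()
a.touch()         # edit something inside fragment "a"
page.render()     # re-renders only "a" and "page"; "b" stays cached
print(renders)    # -> ['page', 'a']
```

Note that, per the post's point, the nesting is a tree, not one-within-one-within-one: any node can hold any number of "dolls."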
If you get to the point that one of your embedded views only takes data from one table, great. I guess the only way this is useful is if *the same information appears more than once* on a composite page of subviews. I still think you're often getting yourself to a specialized DTO for each view, and then you should have an equally specialized Read and mapping that populates that DTO. Unless the price of querying a cache for reused information across many views is less than the price of rebuilding each cache that would be invalidated when that information changes. And that's directly dependent on the number of pages you serve between cache churns.
That is, you can call it an entity, but I think it's more useful to call it a ViewModel. **Stop mapping database tables to entities. Always read exactly what you're about to put onto the page directly from the database. That's what it's there for.** Really. Smart folks are working hard to optimize your queries. I realize caching makes you think you've already got the data on hand, but your hand-rolled (or, worse, your ORM's automatically generated) execution plan is, at some point, not going to be nearly as good as stating what you need in targeted SQL sent to your real RDBMS.
So, and I'm perhaps overusing [Atwood's micro-optimization theater post](http://blog.codinghorror.com/the-sad-tragedy-of-micro-optimization-theater/) a little, without a clear winner to the "monolithic refresh vs. stitched page composition" formula *a priori*, what's important to me is making the system easy to support. And then, certainly, C<strike>R</strike>UD is a heck of a lot easier than SQL>>>NHib/Caching>>>Automapper>>>ORM>>>Repository>>>MVVM.
(Worth adding that I'm unfairly equating Hansson with SQL/Cache/AutoMap/ORM/Repo/MVVM (SCAORM?) here. Totally unfair; he never says he's ORMing in these posts, afaict. I think the beef here is that he's serving modular pages, and I wonder if it's worth the extra complexity short of MAX SCALE!!1! -- and even then, when you get to displaying logically disparate information, we might be saying something similar anyhow.)
That's enough thinking aloud today. Way too many tedious box-watching style chores this week, sorry.
posted by ruffin at 4/01/2015 12:21:00 PM
|Tuesday, March 31, 2015|
Note to self on your recent MVC recommendations: Don't forget caching/data persistence. I think caching's the only thing the conventional library-crazy MVC overhead gets you "for free" that's worth saving, but it is worth saving. Our cache was used an absolute ton in the big-ish system I hacked on in the previous job.
Need to look at something like these...
http://stackoverflow.com/a/349111/1028230 (is missing locking on write/null cache)
... but within the context of this...
Guessing this probably ultimately wins:
... though with the second I was left wondering something similar to what this guy was...
That is, take the benefits of a "typical MVC stack" like I was using previously: SQL Server to NHibernate with MemCache, the Repository model to access NHibernate, QueryOver, and your typical MVC/MVVM setup on the other side. I want to kill off NHibernate, QueryOver, and use of the Repository model. The only baby in the bathwater is caching. Though, wow, caching is easier when you have a 1-to-1 and onto relationship between Views and queries. I'm not saying that makes things smarter, but it does reduce the caching complexity.
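The gap noted against the first link above (no locking on write, no way to cache a null) is worth spelling out. Here's a sketch of the pattern in Python, since the idea is language-agnostic: double-checked locking so concurrent misses don't all run the expensive loader, plus a sentinel so a legitimately null result gets cached instead of re-queried forever:

```python
# Sketch of a get-or-add cache helper with the two fixes the linked
# answer is missing: a lock guarding writes, and a sentinel object that
# distinguishes "not cached yet" from "cached None".

import threading

_MISSING = object()   # sentinel: "no entry", as opposed to a cached None

class LockingCache:
    def __init__(self):
        self._store = {}
        self._lock = threading.Lock()

    def get_or_add(self, key, loader):
        value = self._store.get(key, _MISSING)
        if value is not _MISSING:          # fast path: no lock on a hit
            return value
        with self._lock:
            value = self._store.get(key, _MISSING)
            if value is _MISSING:          # double-check under the lock
                value = loader()           # may legitimately return None
                self._store[key] = value
        return value

calls = []
cache = LockingCache()

def load():
    calls.append(1)
    return None        # a "null" result that should still be cached

cache.get_or_add("k", load)
cache.get_or_add("k", load)    # cache hit; the loader does not run again
print(len(calls))              # -> 1
```

Without the sentinel, a null result looks identical to a cache miss, and you silently re-run the query on every request.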
posted by ruffin at 3/31/2015 05:41:00 PM
|Monday, March 30, 2015|
I've been considering using AutoMapper in what should initially be a pretty simple MVC project that I may have to spin up in the next month or two, as I've got some experience using AutoMapper in the project I mentioned a post or two ago. I found a pretty good refresher in this CodeProject post, but was a little surprised to find this observation from its author:
Wow, HEH-lo. Not a big deal for simple pages, but probably not something you want underpinning the architecture of an app that could grow (which is to say "any app that calls for MVC").
In other words, as this article entitled "Stop using AutoMapper in your Data Access Code" explains...
Obviously there are ways around this, namely making sure that the query that pulls your data only returns what you want for that specific data load, but then you're right back to my complaints about using a repository in the first place. Once you're hand-rolling optimizations, you've left the realm of reusable generic code. Stop trying to backport a square peg into a round hole.
DevTrends links to a post by Bogard that says:
Exactly. Though I'm not absolutely sure CQRS requires different sets of tables to gain the initial important architectural improvements I'm arguing for here.
No, I didn't know what CQRS was off-hand myself. It's apparently Command Query Responsibility Segregation. It's nice to see Martin Fowler essentially arguing the point from my previous post on ending conventional repository use for reads:
That said, Fowler's not quite so contra-CRUD as I am, and seems to believe there are many real-world use cases for C**R**UD.
I just don't see using CRUD as the best, scalable route to build even a typical MVC app.
Though Fowler seems less C<strike>R</strike>UD-inclined than I am.
Just to be clear, I'm using CRUD with a bold "R" to indicate a conventional CRUD system, and C<strike>R</strike>UD, with a struck-through "R", to indicate a system whose reads happen outside of the conventional CRUD/entity pipeline.
There's also an implicit argument in Fowler that the write database would have a different model than the reporting dbms. I don't know that the extra overhead of two domains, one for write and one for read, is going to be worthwhile. I can understand the reporting server being a sort of "permanent temp table with periodic (and ad hoc) updates" setup, but you've still got to base it on the data that's on your write side.
That is, I don't see how you break out of C<strike>R</strike>UD's dependence on the write-side model: however you set up the reporting side, it's still derived from the data your writes maintain.
Fowler's "hyper CQRS" with a reporting database is interesting, but, to me, moving to one or more reporting databases is a DevOps issue that can be inserted well down the line, once you know reads are so out of proportion to writes that you need the support of another (or several distributed) database servers -- a much easier move to accomplish in the future than ripping out an architecture based on Repository and automapping models. That is, you don't have to decide to use a reporting server when you hit File >>> New Project. You do need to decide not to get wrapped up with repositories and automapping.
Maybe we're saying similar things, just with Fowler giving more emphasis to the work as a unit rather than to the many entities being affected at once. It's just that, in my limited experience, optimizing writes (outside of batch-like use cases, though users seem conditioned to accept that they kick off batches and get notified when they're done) is rarely your primary performance bottleneck. Reads? That's what hobbles your systems, seemingly anytime you're big enough that you're making money.
Getting back to automapping... The sum from the DevTrends post, above, is pretty good.
If your app has the potential to grow -- and let's just stipulate that any MVC app does -- you want to keep an eye on performance. And the more of this overhead you integrate into your architecture -- repositories, automapping in your data access layer -- the more tech debt you're going to have once that growth happens.
Anyhow, the tasks I was waiting on are done, so enough architecture discussion. Code time.
Bottom line: KISS now or you'll have debt to pay later.
posted by ruffin at 3/30/2015 11:12:00 AM
Apparently [a quote by Tony Hoare](http://en.wikiquote.org/wiki/C._A._R._Hoare), the inventor of quicksort, from his 1980 Turing Award lecture:
"There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult."
Caught that on the Nvu.org website. Can't say I'm a huge fan of KompoZer, and I'm hoping Nvu eats less memory than having a full SeaMonkey open just for the editor. The quote's a nice find.
(In other news, I can't believe Nvu 1.0 is nearly ten years old and we still don't have a great replacement that tends to follow Hoare's "first method". A good Markdown editor might be as close as we've gotten.)
posted by ruffin at 3/30/2015 08:58:00 AM
|Friday, March 27, 2015|
I think I'm finally putting my foot down on the side against the repository pattern in MVC. I spent two years working on a reasonably complex MVC app that sold out almost completely to repositories, yet there were very few pages that didn't eventually require a DTO and, for the poorest-performing pages (which were eventually nearly all of them), a custom QueryOver "query" to populate them.
Once you're using DTOs and not entities to populate your Views, and especially when you're using, as in my past case, QueryOver (which means you *are* creating custom SQL, just one strange head-rethread away from writing the SQL directly), you lose too much of the repository pattern's benefits. That is, once you turn the DTO corner, you're already pushing out a custom, usually one-off-ish solution. Why not simply write the right SQL and map from its results directly to your DTO, skipping the insane overhead and obfuscated performance bottlenecks that are conventional entity mappers, especially when they're building implicit relationships between entities? Oh, the memory of and *for* an entity JOIN gone awry. ORM relationships are, if not difficult, *insanely* tedious, and are essentially never going to be as good as even your server's standard execution plan for the SQL equivalent. SQL's not difficult. If a coder can't understand sets, you were in BIG trouble waaaay before you got to the question of which pattern to use. Why move that logic anywhere but the server? And if you're going to take the time to hand-roll relationships, why ever do *any* of that in entities and not SQL? It's like Dr. No trying to perform surgery.
There's only one time that entities are particularly important in the middle tier, though that "one time" crops up with *any* save: Validation, aka "business logic". You want to keep your validation process DRY, and that means putting a nice, reusable validation step between your users' newly inputted data and your data tier is absolutely crucial. You certainly don't want to support more than one filter for data going into the same place. That'll kill you when you're debugging, and cause bad data to seep in when you're not looking. And no matter how complicated your user interface and its initial SELECTs, you're always going to reduce what's entered or edited back to your table (or whatever other serialization method you've got) structure. That's just a truism. (Unless you're a crazy NoSQL-er. I get it, though if your data model changes considerably over time, well, godspeed, man.)
But would it be wrong to put validation logic only in the database too? It'd be a pain in the rear to have a set of "CUD" [sic -- CRUD without the Read] sprocs that return validation codes [that you'd look up and turn into human-readable errors in your middleware] and related info, but you need to have some *complete* sort of validation in your dbms any way you cut it. Why get unDRY there? Too many people stop with validation in the middleware and trick themselves into thinking they're DRY. They're not only **not** DRY, their database is naked too. (Insert picture of Walter delivering the ringer... "You didn't think I was rolling out of here naked, did you?") As I was told years ago, "Put the logic in the database." I don't think I'm [just] showing my age there.
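A minimal sketch of "put the logic in the database," using Python and SQLite for portability (a real system would use sprocs returning validation codes, as above; the table, constraints, and messages here are all invented for illustration). The constraints live in the schema, and the middleware's only validation job is translating the database's refusal into something human-readable:

```python
# Validation lives in the schema; the app layer just maps the database's
# rejection to a friendly message. (Names and messages are made up.)

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        email TEXT NOT NULL
            CONSTRAINT valid_email CHECK (email LIKE '%_@_%'),
        age INTEGER NOT NULL
            CONSTRAINT valid_age CHECK (age BETWEEN 0 AND 150)
    )
""")

# Middleware-side lookup table: constraint name -> human-readable error.
FRIENDLY = {
    "valid_email": "Please enter a valid email address.",
    "valid_age":   "Age must be between 0 and 150.",
}

def insert_user(email, age):
    """Returns None on success, or an error message on rejection."""
    try:
        conn.execute("INSERT INTO users VALUES (?, ?)", (email, age))
        return None
    except sqlite3.IntegrityError as e:
        # Depending on the SQLite version, the error text may name the
        # violated constraint; otherwise fall back to a generic message.
        for name, message in FRIENDLY.items():
            if name in str(e):
                return message
        return "Invalid data."

print(insert_user("a@b.com", 30))       # -> None (row accepted)
print(insert_user("not-an-email", 30))  # -> a validation message
```

However unDRY the lookup table feels, the database is the one place the rule cannot be bypassed, which is the post's point.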
But at this point, you're very nearly back to the same ADO.NET pattern we started with years ago once you hit your Controllers, which worries me that I'm missing an obvious benefit.
But, honestly, for SELECT, repos are dead weight. Even ORM is dead weight. Good SELECT SQL, packaged into a well-protected sproc, mapped to a DTO seems like it's *always* The Right Thing to do. There's almost never a time when you're dealing with straight, unadulterated entities in complex pages, and even when you are, it's not like repos give you a huge leg up over smart Service reuse in your Controllers. That is, there's nothing particular to repositories that's a sole-source dealmaking advantage.
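The whole "good SELECT SQL mapped to a DTO" pipeline fits in a few lines. A sketch using Python's sqlite3 with a namedtuple standing in for the DTO (the tables and columns are invented for illustration):

```python
# Sketch: one targeted query, with the JOIN and aggregation done by the
# database, mapped straight onto a DTO that holds exactly what the view
# displays -- no entities, no repository, no automapper.

import sqlite3
from collections import namedtuple

# The "DTO": only the two fields the page actually renders.
OrderSummary = namedtuple("OrderSummary", "customer total")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ahab'), (2, 'Ishmael');
    INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 2.5);
""")

# Let the server's execution plan do the work; select only what's needed.
SQL = """
    SELECT c.name, SUM(o.amount)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
"""

summaries = [OrderSummary(*row) for row in conn.execute(SQL)]
print(summaries)   # -> [OrderSummary(customer='Ahab', total=15.0),
                   #     OrderSummary(customer='Ishmael', total=2.5)]
```

In the real stack that SELECT would live in a well-protected sproc, but the shape is the same: SQL in, DTO out, nothing in between.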
CUD is a slightly different animal, but for me the argument for entities here is more about convenience (running unit tests and coding great validation logic that's easy to send back to your end users with that validation only in your dbms is more difficult for Average Joe Programmer than, say, hacking it in C#) than Doing Things Right. I'm not sure I'm a big fan of ORMs and entities for CUD either, though, again, here I'm willing to be convinced I'm wrong.
posted by ruffin at 3/27/2015 06:36:00 PM
|Thursday, March 19, 2015|
I always wish Humble Bundle would tell you more about the books they offer in their bundles without all the clicking. It's barely worth the time it takes to look up which of the books I might like -- at that point, I might as well just spend the cash on something I know I'd like. Why they don't get the info (descriptions and prices) off of Amazon or elsewhere when they put up a new bundle, I have no idea.
So I decided I'd do it this time for the current SciFi bundle. Here we go. No obvious gems (like the time an unpublished Frank Herbert novel was included in the "pay more than the average" bundle extras. Wow. Yes, please), but these are the books you get for
Well, it's useful to me. ;^) Probably worth a dollar. The last scifi bundle I got, Anderson's was the only decent book in the bunch (though I may have only spent a dollar), but it was obviously his first novel.
posted by ruffin at 3/19/2015 08:21:00 PM
|Friday, March 13, 2015|
A very good, though oversimplified and not exactly chock-full of programming examples, introduction to Big O notation here:
I finally learned what hashing really was last year (just out of curiosity) -- and duh, and brilliant, and duh. Hashtables take about 30 seconds to explain, if you don't read through the snippet from *Moby-Dick; or, The Whale*, as we like to call it, below.
Create a hashing function that produces a hash that's significantly smaller than the average length of the values being stored in your hashtable (md5 is fine, for example). Now use those hashes as the keys for your values on insertion.
So this line of text...
Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, and methodically knocking people's hats off--then, I account it high time to get to sea as soon as I can. This is my substitute for pistol and ball.
... will always be found at location 0e92fa0b51b9f8879eae58ad30bc943c. ALWAYS. There's no look-up at all. Wow. Neat. And, in retrospect, duh. (Brilliance to me is always something that's so painfully obvious after you hear it that you can never again forget it -- literally life-[view-]changing on some level.)
Now by virtue of having fewer bytes in the hash than the value (ignoring that we could zip that text and have less already, etc etc), we will have collisions, where more than one value will have the same hash as another. Oh noes! How can we store the value if something's already at its index? Push it from the U[nsolved] drawer to the underused X files?
No no. If that happens in your hashtable, you simply string each content piece along in a list that you store at that key. Now your lookup time for collisions is slightly longer than straight O(1), but it's still so good it's barely worth worrying about. Read the above to know why that means it's still essentially O(1). Also now obvious: Why Hashtables don't guarantee order.
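The whole scheme described above fits in a short sketch. Here's a toy chained hashtable in Python, using md5 as the hash (overkill for a real table, but it matches the post); the bucket count is kept tiny on purpose so collisions actually happen and get strung along in lists:

```python
# Toy chained hashtable: hash the key, reduce it to a bucket index, and
# on collision just append to the bucket's list ("chaining").

import hashlib

class ToyHashTable:
    def __init__(self, buckets=8):   # tiny on purpose, to force collisions
        self._buckets = [[] for _ in range(buckets)]

    def _index(self, key):
        digest = hashlib.md5(key.encode()).hexdigest()
        return int(digest, 16) % len(self._buckets)   # hash -> bucket

    def put(self, key, value):
        bucket = self._buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # collision? string it along

    def get(self, key):
        for k, v in self._buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

table = ToyHashTable()
table.put("ishmael", "call me")
table.put("queequeg", "harpooneer")
print(table.get("ishmael"))    # -> call me
```

As long as the buckets stay short, lookups are the hash plus a walk down a tiny list: still essentially O(1). And since bucket order depends on the hash, not insertion, you can also see why hashtables don't guarantee order.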
Again, hashing is brilliant. And, well, after you hear it, duh.
posted by ruffin at 3/13/2015 02:48:00 PM
|Thursday, March 12, 2015|
This doesn't work:
posted by ruffin at 3/12/2015 04:52:00 PM