Friday, April 10, 2015

From the Appbot's blog, "Dissecting The App Store Top Charts":

In my mind games have always dominated the App Store, both in downloads and revenue, but what is the truth?

This inspired me to dig into the US top 200 charts (free, paid and grossing) to check out how the categories and age of the apps compared. The data is a snapshot taken on April 8, 2015.

For me, the most interesting revelation was the make-up of paid apps:

  • 42% Games
  • 12% Photo and video
  • 11% Health and fitness
  • 6% Entertainment
  • 4% Each for Utilities, Business, Weather, Music
  • 3% Each for Reference, Education, Productivity

I think that's percentage by app count, with no weighting for price. It's just the number of apps on the store. Still, pretty telling.

Notice, too, what's fallen out almost completely: in free apps, Social Networking is 12% of the pie. In paid, zippo -- it doesn't even make the 3% cutoff.

Of course, what'd be really useful would be how people buy, not what's on the shelf. There can be dozens of brands of cookies, but if 98% of folks are buying Oreos, I'm not sure I want to be in the fig newton market, if you get my meaning. I mean, it could just be that every new Objective-C homebrewer brews a game first, trying to be the next Flappy Clash Birds.


posted by ruffin at 4/10/2015 02:15:00 PM
Thursday, April 09, 2015

So far, I hate the new Photos for OS X. Three beach balls on startup (looks like every reviewer used an SSD), and it immediately imported iPhoto without asking. Thanks.

I deleted both photo libraries and started over. Eventually I created a new, blank (or so I hoped) library so I could drag in my photo folders manually. And I've chosen not to import photos. I already have them in folders. I don't need them twice.

Then, suddenly, I start getting about eight random pictures. What the heck? Where are these coming from? I didn't want them in there. I hide them, since I apparently can't delete them.

Then I choose to import a folder. It doesn't, afaict, recurse directories. Wth?

Now I've got more randomly found photos. What the freakin' heck is Photos doing? Who's running this ride? These are not the photos I tried to import just a second ago.

Okay, so I start over again. I delete the photo library. I put the new photo library that I create on next startup into its own folder. Now there's nothing. Let's see if I can drag lots of year folders over. I can, but Photos doesn't tell me anything. No, "I see your folders, and I'm importing now." Nothing. It sits there. It's still sitting there. It either grabs random photos I don't want, or it sits. Nice.

So I try again. ONLY NOW does it act like it knows I tried to import something a few minutes ago.

Beauteous. Just beauteous.

I read the iMore review that makes things sound pretty rosy. Ain't true for the "start from scratch" use case. Not yet.

This stinks. STINKS. I guess Picasa, which does actually do what I tell it, still wins.

EDIT: Oh, wait. I guess I was supposed to see this horribly informative "alert" to know importing was underway:

I call it, "The Universally Recognized Circle o' Importing".

And now my late 2014 iMac goes from super responsive to crawling, thanks to the spinning platters. I hate the way OS X is tuned only for SSDs at this point. Its performance really depends on them. I installed an SSD on a Late 2009 MacBook -- the unibody white one -- and it does great now. But my quad-core iMac? Crawling in molasses. Looks like Photos is creating all those thumbnails, which is going wonderfully.

Then this happened.
Great job, Photos. Guess I'll go finish my Node testing on my Lenovo.


posted by ruffin at 4/09/2015 12:36:00 PM
Friday, April 03, 2015

As we continue to think aloud about MVC patterns... When you get rid of the Repository, you also get rid of the "sad tragedy" of repository architecture debate theater. If you want to see today's real lesson, go ahead and skip to the end.

CodeBetter -- DDD: The Generic Repository

Consider the following code:

Repository<Customer> repository = new Repository<Customer>();
foreach (Customer c in repository.FetchAllMatching(CustomerAgeQuery.ForAge(19))) { }

The intent of this code is to enumerate all of the customers in my repository that match the criteria of being 19 years old. This code is fairly good at expressing its intent in a readable way to someone who may have varying levels of experience dealing with the code. This code also is highly factored allowing for aggressive reuse.

Especially due to the aggressive reuse the above code is commonly seen in domains. Developers are trained that reuse is good and therefore tend towards designs where reuse is applied

It bugs me that anyone could use "code reuse" as a positive when talking about repositories (but skip to the end to see what's really going on here). By definition, all this jive is repeated code -- or at least code that runs through a Rube Goldberg machine before it becomes SQL, which is worse.

Let me say again that I believe entities make some sense when you're looking to enforce business logic, but then I'll challenge you again: tell me why that isn't better handled -- ONCE! -- by your rdbms. Your entities are your data objects, and your capital-r Reads don't give a flying flip about them other than joining them together to produce their views.

CodeBetter -- DDD: Specification or Query Object

One of the nice benefits of a Specification is that one could write some code like the following:

IEnumerable<Customer> customers =

Writing code like this has allowed the developer to reuse a specification from the domain within their repository as a method for querying. While this may seem to be a good thing at the outset this mentality introduces a host of problems.


The first and largest problem that one will run into when dealing with this type of API is that the Repository is necessarily a leaky abstraction. The GoldCustomerSpecification is a piece of code, it represents a predicate for whether a single customer is or is not a gold customer. In order to return a set of customers that represents all of the customers matching the GoldCustomerSpecification the repository will need to run the specification on every customer. ... On the read side of your domain (a different layer if you use cqs) you want clients to be able to pass query objects directly to your repositories. Keep in mind that these are not the repositories on the transactional side (read: domain) but are supporting the complex reporting behaviors needed. It is often times not possible to completely isolate every type of report you may like to run (but you should still try to do this where possible as the strong contract has benefits).
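The leak he's describing is easy to demonstrate. Here's a minimal sketch (entirely mine, not from the CodeBetter post; the table and the "gold" threshold are invented, with SQLite standing in for the real database):

```python
import sqlite3

# An illustration (mine, not from the CodeBetter post) of the leak: a
# specification is opaque code, so the repository has to materialize every
# customer before the predicate can run. The table and threshold are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, total_spent REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Ann", 12000.0), (2, "Bob", 90.0), (3, "Cho", 5500.0)])

def is_gold(row):
    """A stand-in GoldCustomerSpecification: the database can't see inside it."""
    return row[2] >= 5000.0

# Specification style: fetch every row, then filter in memory.
all_rows = conn.execute("SELECT id, name, total_spent FROM customers").fetchall()
gold_in_memory = [r for r in all_rows if is_gold(r)]

# Query-object style: the same rule expressed where the engine can use it.
gold_in_sql = conn.execute(
    "SELECT id, name, total_spent FROM customers WHERE total_spent >= ?",
    (5000.0,)).fetchall()

assert gold_in_memory == gold_in_sql  # same answer; wildly different I/O
```

Same three rows here, but against a million customers the specification version drags the entire table across the wire before throwing most of it away.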

CodeBetter -- CQRS and Event Sourcing
related: Martin Fowler -- Event Sourcing

If we were to say use a relational database, object database, or anything else that only keeps current state we would have a slight issue. The issue is that we have two different models that we cannot keep in sync with each other. Consider that we are publishing events to the read model/other integration points, we are also saving our current state with a tool like nhibernate. How can we rationalize that what nhibernate saved to the database is actually the same meaning as the events we published, what if they are not?

Ayende -- Repository is the new Singleton

The most commonly used definition of Repository is the one in Patterns of Enterprise Application Architecture:

A system with a complex domain model often benefits from a layer, such as the one provided by Data Mapper, that isolates domain objects from details of the database access code. In such systems it can be worthwhile to build another layer of abstraction over the mapping layer where query construction code is concentrated. This becomes more important when there are a large number of domain classes or heavy querying. In these cases particularly, adding this layer helps minimize duplicate query logic.

That's actually pretty interesting -- I mean, query repetition is very obviously the problem with what I'm proposing (a SQL query per controller action), but it's worded fairly well. Of course my response is that there's nothing wrong with a defensive separation of logic. Think smartly self-contained microservice. So what's all this about the repository being an anti-pattern?

Complex queries should be placed into query objects according to his article. So do we really need a repository? This article tries to answer this question.

No. ;^D

SapiensWorks -- The Generic Repository Is An Anti-Pattern

A repository is a concept to abstract the access to the persistence, that is not to depend on data access implementation details. There is no formula and no rules. ... Other offender in regard to generic repositories is the fact that lots of developers just use it to wrap the DAO (Database Access Object) or an underlying ORM (like EF or Nhibernate). Doing so they add only a useless abstraction, pretty much just making the code more complex with no benefits. A DAO makes it easy to work with a database, an ORM makes it easy to access a database as an OOP virtual storage and to eventually abstract the access to a specific database.

Emphasis mine. Thanks for that line. Phew. Though I still dislike most ORM-based implementations, I think.

Moneyball for today

Also from the above link:

But the repository should abstract the whole persistence layer, hiding implementation details like database engine or what DAO or ORM the app is using but also providing a contract that makes sense from the application point of view. The repository serves the application needs, NOT the database needs.

Now we're getting somewhere, aren't we? THIS, not DRYness, is a repository's real advantage. And who the heck really swaps out the datastore of a mature app? Bueller? Then why abstract it?!?!!!1!

If you're not going to abstract the engine from the application, you don't use a repository. And if you want performance, you don't want to abstract the engine. Trust me.

That is, in brief: my bets are on SQL (though SQL matters less than your data persistence model -- and I'm leaving myself open to situationally microservice my way away from whatever persistence model I initially pick), not on the convoluted code overhead and repetition of Repositories.

If you're honest with yourself, you're very likely already betting on [your data persistence model]. If you're not factoring your persistence engine into your code, you're almost certainly going to see performance problems at scale. That is, if you're "hiding implementation details like database engine or what DAO or ORM the app is using", you've already eliminated too many possibilities for optimization and made your codebase more difficult to maintain. Lose lose, man, lose lose.


posted by ruffin at 4/03/2015 03:06:00 PM

Watching files grow to see when a process that's writing to them ends is a little like watching grass grow. But there are easier ways than ls -alF followed by arrow up, return, followed by arrow up, return, followed by arrow up, return... This is really neat -- and if you add -d (differences) as the first option, you get the changes in the command's output highlighted in realish time too, which is awesome.

You can use the very handy command watch

watch -n 10 "ls -ltr"

And you will get an ls every 10 seconds.

And if you add a tail -10 you will only get the 10 newest.

watch -n 10 "ls -ltr|tail -10"

So watch -d -n 10 "ls -ltr|tail -10" ftw.

Also neat was learning a bit more about tail, and how it's less "the tail end of the file" and more "put a tail on that file and tell me where it goes." Every time the file updates, bam, you get the lines that were appended. That's cool.

You can use tail command with -f :

tail -f /var/log/syslog

It's a good solution for real-time viewing.


posted by ruffin at 4/03/2015 02:00:00 PM
Thursday, April 02, 2015

Getting the text of existing sprocs is apparently pretty easy: EXEC sp_helptext N'sp_get_composite_job_info';


I've been having trouble ordering the results of a stored procedure, which probably means putting its results into a temp table. Seeing the sproc's code should make it easier, in this case, to create a temp table matching what it gives back.

Although, in my case, no dice. I ended up cheating and trivially rewriting the sproc and the sproc it called.

sp_help_job calls sp_get_composite_job_info, which ends with a statement with its own ORDER BY, which is all I was interested in changing.

So a quick change there...

-- ...
FROM @filtered_jobs fj
LEFT OUTER JOIN msdb.dbo.sysjobs_view sjv ON (fj.job_id = sjv.job_id)
LEFT OUTER JOIN msdb.dbo.sysoperators so1 ON (sjv.notify_email_operator_id = so1.id)
LEFT OUTER JOIN msdb.dbo.sysoperators so2 ON (sjv.notify_netsend_operator_id = so2.id)
LEFT OUTER JOIN msdb.dbo.sysoperators so3 ON (sjv.notify_page_operator_id = so3.id)
LEFT OUTER JOIN msdb.dbo.syscategories sc ON (sjv.category_id = sc.category_id)
--ORDER BY sjv.job_id

... and a quick change in `sp_help_job`...

-- Generate results set...
--EXECUTE sp_get_composite_job_info @job_id,
EXECUTE WACK_sp_get_composite_job_info @job_id,

... and I'm working. (Or I could have just overwritten sp_get_composite_job_info with one that sorted differently, but that's obviously destructive, and usually A Very Bad Idea.)


posted by ruffin at 4/02/2015 11:23:00 AM
Wednesday, April 01, 2015

More fun thinking aloud about MVC architectures. After reading David Hansson on "Russian doll caching", I think I'm coming around on why you'd use entities to put together piecemeal views, though I'm not sure I'm buying yet.

I'll have to find the post again, but there was one in the links I put up yesterday that said that full page caching was caching's holy grail. Compare the full page mentality to how Hansson describes the issues of caching at serious scale:

This Russian doll approach to caching means that even when content changes, you're not going to throw out the entire cache. Only the bits you need to and then you reuse the rest of the caches that are still good.

This implicitly means that you're going to have extra cost piecing together every page, even if you're just stitching cached content, and the pseudo-formula to compare to CRUD is pretty easy to stub out. If the cost of rebuilding every cache that depends on some reusable subset of those cached views' information is greater than the cost of piecing together pages from incomplete/non-monolithic cache objects on each request, then you go with the [actually fairly conventional] "Russian doll" approach.
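Stubbed out, that pseudo-formula might look something like this (every name and number here is hypothetical; costs are arbitrary time units, nothing is measured):

```python
# Hypothetical break-even stub for "rebuild whole-page caches on churn" vs.
# "stitch cached fragments on every request". All numbers are invented.
def cost_monolithic(churns, dependent_pages, page_rebuild_cost):
    # Full-page caching: hits are ~free, but each churn invalidates and
    # rebuilds every page that depends on the changed content.
    return churns * dependent_pages * page_rebuild_cost

def cost_russian_doll(requests, fragments_per_page, stitch_cost,
                      churns, fragment_rebuild_cost):
    # Doll caching: every request pays a stitching tax, but a churn only
    # rebuilds the one fragment that actually changed.
    return (requests * fragments_per_page * stitch_cost
            + churns * fragment_rebuild_cost)

# Heavy traffic, rare churn: the monolithic page cache wins.
assert cost_monolithic(1, 50, 100) < cost_russian_doll(10_000, 10, 1, 1, 100)

# Constant churn across widely shared widgets: the dolls win.
assert cost_monolithic(500, 50, 100) > cost_russian_doll(1_000, 10, 1, 500, 100)
```

Which side of the inequality you land on is exactly the "pages served between cache churns" question.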

And this largely depends on how many of your widgets appear on more than one page where some [other] subset of the content churns.

The only way we can get complex pages to take less than 50ms is to make liberal use of caching. We went about forty miles north of liberal and ended up with THE MAX. Every stand-alone piece of content is cached in Basecamp Next. The todo item, the todo lists, the block of todo lists, and the project page that includes all of it. ... To improve the likelihood that you're always going to hit a warm cache, we're reusing the cached pieces all over the place. There's one canonical template for each piece of data and we reuse that template in every spot that piece of data could appear.

Still, it's easy enough to conceive of each of these reusable chunks as embedded views, and then you're back to where you started. Pages might be Russian dolls of views (though that's the wrong metaphor beyond expressing the Herbertian concept of "views within views within views". Once you understand views can be made up of views can be made up of views, ad infinitum, you then have to remember that any number of "dolls" can live at any level, rather than the Russian dolls' one-within-one-within-one. Perhaps your main view has five "dolls" inside of it, and those have 2, 3, 0, 1, and 0 dolls inside of them, respectively, and those have...), but then so what?

If you get to the point that one of your embedded views only takes data from one table, great. I guess the only way this is useful is if the same information appears more than once on a composite page of subviews. I still think you're often getting yourself to a specialized DTO for each view, and then you should have an equally specialized Read and mapping that populates that DTO. Unless the price of querying a cache for reused information across many views is less than the price of rebuilding each cache that would be invalidated when that information changes. And that's directly dependent on the number of pages you serve between cache churns.

That is, you can call it an entity, but I think it's more useful to call it a ViewModel. Stop mapping database tables to entities. Always read exactly what you're about to put onto the page directly from the database. That's what it's there for. Really. Smart folks are working hard to optimize your queries. I realize caching makes you think you've already got the data on hand, but at some point your hand-rolled (or, worse, your ORM's automatically generated) execution plan isn't going to be nearly as good as stating what you need in targeted SQL sent to your real rdbms.
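As a toy version of that "one targeted query per view" idea (schema, names, and query all invented; SQLite standing in for the real rdbms):

```python
import sqlite3
from dataclasses import dataclass

# A toy sketch (all names and schema invented) of reading a page's data
# straight into a ViewModel with one targeted query. No entity layer in sight.
@dataclass
class OrderSummaryViewModel:
    customer_name: str
    order_count: int

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

def read_order_summaries(conn):
    # The join and the count happen in the database, which is good at both.
    rows = conn.execute("""
        SELECT c.name, COUNT(o.id)
        FROM customers c
        LEFT JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.name
    """).fetchall()
    return [OrderSummaryViewModel(name, n) for name, n in rows]

summaries = read_order_summaries(conn)
assert summaries == [OrderSummaryViewModel("Ann", 2), OrderSummaryViewModel("Bob", 1)]
```

The ViewModel holds exactly what the page renders, and the database does the relational work it was built for.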

So, and I'm perhaps overusing Atwood's micro-optimization theater post a little, without a clear winner to the "monolithic refresh vs. stitched page composition" formula a priori, what's important to me is making the system easy to support. And then, certainly, CRUD is a heck of a lot easier than SQL>>>NHib/Caching>>>Automapper>>>ORM>>>Repository>>>MVVM.

(Worth adding that I'm unfairly equating Hansson with SQL/Cache/AutoMap/ORM/Repo/MVVM (SCAORM?) here. Totally unfair; he never says he's ORMing in these posts, afaict. I think the beef here is that he's serving modular pages, and I wonder if it's worth the extra complexity short of MAX SCALE!!1! -- and even then, when you get to displaying logically disparate information, we might be saying something similar anyhow.)

That's enough thinking aloud today. Way too many tedious box-watching style chores this week, sorry.


posted by ruffin at 4/01/2015 12:21:00 PM
Tuesday, March 31, 2015

NOTE: Some of the links aren't the full URL in text, but they are in href for the link tag. Shortened so they wouldn't extend the blog template.

Note to self on your recent MVC recommendations: Don't forget caching/data persistence. I think caching's the only thing the conventional library-crazy MVC overhead[1] gets you "for free" that's worth saving, but it is worth saving. Our cache was used an absolute ton in the big-ish system I hacked on in the previous job.
Need to look at something like these... (is missing locking on write/null cache) (has locking)
(hello, old and uncomplicated. Note that almost all of these have magic strings of some sort, which is Wrong.) (man, that's tuned towards large, complex, multi-associative (ha) entities)
... but within the context of this...
Guessing this probably ultimately wins:
Also interesting...
... though with the second I was left wondering something similar to what this guy was...
[1] That is, the benefits of a "typical MVC stack" like I was using previously: SQL Server to NHibernate with MemCache, Repository model to access NHibernate, QueryOver, and your typical MVC/MVVM setup on the other side. I want to kill off NHibernate, QueryOver, and use of the Repository model. The only baby in the bath is caching. Though, wow, caching is easier when you have a 1-to-1 and onto relationship between Views and queries. I'm not saying that makes things smarter, but it does reduce the caching complexity.
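For reference, the "locking on write / null cache" distinction those links worry about guards against a race like this one (a from-scratch sketch, entirely my own code, not from any of those libraries):

```python
import threading

# A from-scratch sketch of the two concerns above: one lock so two threads
# can't build the same missing entry at once, and cached None results so
# "no data" doesn't re-hit the backing store every time.
class SimpleCache:
    _MISSING = object()  # sentinel, so a cached None still counts as a hit

    def __init__(self):
        self._lock = threading.Lock()
        self._store = {}

    def get_or_add(self, key, factory):
        with self._lock:
            value = self._store.get(key, self._MISSING)
            if value is self._MISSING:
                value = factory()
                self._store[key] = value  # caches None, too
            return value

cache = SimpleCache()
calls = []
factory = lambda: calls.append(1)  # returns None, like a query with no rows
cache.get_or_add("report", factory)
cache.get_or_add("report", factory)
assert len(calls) == 1  # the cached None satisfied the second lookup
```

Holding the lock while factory() runs is the lazy, contention-prone version; the fancier implementations exist largely to narrow that window.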


posted by ruffin at 3/31/2015 05:41:00 PM
Monday, March 30, 2015

I've been considering using AutoMapper in what should initially be a pretty simple MVC project that I may have to spin up in the next month or two, as I've got some experience using AutoMapper in the project I mentioned a post or two ago. I found a pretty good refresher in this CodeProject post, but was a little surprised to find this observation from its author:

I ran my tests many times and one of the possible outputs could be:

AutoMapper: 2117
Manual Mapping: 293

It looks like manual mapping is 7 times faster than automatic. But hey, it took 2 secs to map hundreds of thousands of customers.

Wow, HEH-lo. Not a big deal for simple pages, but probably not something you want underpinning the architecture of an app that could grow (which is to say "any app that calls for MVC").

A little googling showed atrocious AutoMapper performance isn't uncommon, not at all. Interesting quote from the accepted answer to the first question:

Also you have mentioned NHibernate in your question. Make sure that your source object along with its collections is eagerly loaded from the database before passing it to the mapping layer or you cannot blame AutoMapper for being slow because when it tries to map one of the collections of your source object it hits the database because NHibernate didn't fetch this collection.

In other words, as this article entitled "Stop using AutoMapper in your Data Access Code" explains...

Whilst I am a big fan of AutoMapper and use it in most projects I work on, especially for Domain to ViewModel mapping, when in comes to data access code, AutoMapper is not so useful. To put it simply, AutoMapper only works with in memory data, not the IQueryable interface which is more typically used in DAL scenarios. In the data access layer, whether we are using Entity Framework, LINQ to SQL or NHibernate, we often use the IQueryable interface, specifiying [sic] what we want to query before the OR/M engine translates this into SQL and returns our data. If you use AutoMapper in your DAL however, you are almost certainly returning more data than you need from the database, as the mapping will not occur until AFTER the original query has executed and the data has been returned. [emphasis mine -mfn]

Obviously there are ways around this, namely making sure that the query that pulls your data only returns what you want for that specific data load, but then you're right back to my complaints about using a repository in the first place. Once you're hand-rolling optimizations, you've left the realm of reusable generic code. Stop trying to backport a square peg into a round hole.
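A toy before/after of the point (table and columns invented; SQLite in place of EF or NHibernate):

```python
import sqlite3

# A toy before/after (table and columns invented) of mapping after the
# query runs vs. projecting inside it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, bio TEXT, avatar BLOB)")
conn.execute("INSERT INTO customers VALUES (1, 'Ann', 'a very long biography', x'00')")

# AutoMapper-after-the-fact style: fetch whole rows, keep two fields.
full_rows = conn.execute("SELECT * FROM customers").fetchall()
mapped = [{"id": r[0], "name": r[1]} for r in full_rows]

# Projection style: ask the database for only the two fields the view needs.
projected = [{"id": i, "name": n}
             for i, n in conn.execute("SELECT id, name FROM customers")]

assert mapped == projected     # identical output...
assert len(full_rows[0]) == 4  # ...but the first path also shipped bio and avatar
```

Same dictionaries either way; only one path made the database ship the bio and the avatar blob it never needed.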

DevTrends links to a post by Bogard that says:

For a lot of read-only scenarios, loading up a tracked, persistent entity is a bit of a waste. And unless you're doing CQRS with read-specific tables [tables meaning "Queries, views, tables, all SQL-specific" - Bogard in comments], you're doing projection somehow from the write tables.

But many LINQ query providers help with this by parsing expression trees to craft specific SQL queries projecting straight down at the SQL layer. Additionally, projecting in to these DTOs skips loading persistent, tracked entities into memory. Unfortunately, we're then forced to write our boring LHS-RHS code when we drop to this layer...

Exactly. Though I'm not absolutely sure CQRS requires different sets of tables to gain the initial important architectural improvements I'm arguing for here.

No, I didn't know what CQRS was off-hand myself. It's apparently Command Query Responsibility Segregation. It's nice to see Martin Fowler essentially arguing the point from my previous post on ending conventional repository use for reads:

The rationale is that for many problems, particularly in more complicated domains, having the same conceptual [CRUD] model for commands and queries leads to a more complex model that does neither well.
The other main benefit is in handling high performance applications. CQRS allows you to separate the load from reads and writes allowing you to scale each independently. If your application sees a big disparity between reads and writes this is very handy. Even without that, you can apply different optimization strategies to the two sides. An example of this is using different database access techniques for read and update. [emphasis added -mfn]

That said, Fowler's not quite so contra-CRUD as I am, and seems to believe there are many real-world use cases for CRUD. "So while CQRS is a pattern I'd certainly want in my toolbox, I wouldn't keep it at the top." Really? Writing a lot of APIs maybe?

I just don't see using CRUD as the best, scalable route to build even a typical MVC app.

Though Fowler seems less CRUD-y than I am too in that he quickly jumps to divorce reads from your database of record by putting them into reporting databases instead, which seems like overkill if you're doing that from the start. That is, I think Fowler sees CQRS as a second step you take if CRUD lets you down. I think you should use CRUD from the start.

Just to be clear, I'm using CRUD with a bold "R" to indicate a conventional CRUD system, and CRUD (with a struck-through "R") for what I'm proposing everyone do from the start when making an MVC app where reads are all-but-always done with custom SQL and, in the context of Automapper inefficiencies, hand-rolled mappings to DTOs.

There's also an implicit argument in Fowler that the write database would have a different model than the reporting dbms. I don't know that the extra overhead of two domains, one for write and one for read, is going to be worthwhile. I can understand the reporting server being a sort of "permanent temp table with periodic (and ad hoc) updates" setup, but you've still got to base it on the data that's on your write side.

That is, I don't see how you break out of CRUD and entities, though, again, I want that entity business logic on the database first. If you optimize reads -- through, I propose, views and sprocs and maybe/probably temp tables, or, as Fowler seems to assume, some export process to a reporting database; it doesn't matter -- fine, but you're still basing that information on the content of your "CRUD" database setup.

Fowler's "hyper CQRS" with a reporting database is interesting, but, to me, moving to one or more reporting databases is a DevOps issue that's possible to insert well down the line, once you know reads are so out of proportion to writes that you need the support of another/distributed database servers -- a much easier move to accomplish in the future than ripping out an architecture based on Repository and automapping models. That is, you don't have to decide to use a reporting server when you hit File >>> New Project. You do need to decide not to get wrapped up with repositories and automapping.

Maybe we're saying similar things, just with Fowler putting more emphasis on the work as a unit rather than on many entities being affected at once. Just that, in my limited experience, optimizing writes (outside of batch-like use cases, but users seem conditioned to accept that they kick off batches and get notified when they're done) is rarely your primary performance bottleneck. Reads? That's what hobbles your systems, seemingly anytime you're big enough that you're making money.

Getting back to automapping... The sum from the DevTrends post, above, is pretty good.

When hitting a database, as a developer, it is important to only return the data that you need and no more. When using modern ORMs, this is typically achieved by using projections when writing IQueryable queries. Typing this type of projection code can be tiresome and some people use AutoMapper instead, believing that it achieves the same thing. Unfortunately, AutoMapper knows nothing about IQueryable and only works with in-memory data, making it less than ideal in DAL scenarios. In order for AutoMapper to do its mapping, it needs to retrieve all source data from the database, resulting in much more data being returned than is necessary, reducing performance and increasing database load and network traffic.

If your app has the potential to grow -- and let's just stipulate that any MVC app does -- you want to keep an eye on performance. And the more of this overhead you integrate into your architecture -- repositories, automapping in your data access layer -- the more tech debt you're going to have once that growth happens.

Anyhow, the tasks I was waiting on are done, so enough architecture discussion. Code time.

Bottom line: KISS now or you'll have debt to pay later.


posted by ruffin at 3/30/2015 11:12:00 AM

Apparently a quote by Tony Hoare, the inventor of quicksort, from his 1980 Turing Award lecture:

"There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult."

Caught that on the website. Can't say I'm a huge fan of Kompozer, and I'm hoping NVu eats less memory than having a full Seamonkey open just for the editor. The quote's a nice find.

(In other news, I can't believe NVu 1.0 is nearly ten years old and we still don't have a great replacement that tends to follow Hoare's "first method". A good Markdown editor might be as close as we've gotten.)


posted by ruffin at 3/30/2015 08:58:00 AM
