Very good, though oversimplified and not exactly chock full of programming examples, introduction to Big O notation here:

I said that the idea behind Big O notation is simple and useful. It's simple because it lets us ignore lots of details in favor of descriptions like "each conference attendee adds more work than the last" or "each one adds the same amount of work". And it's useful because it lets us answer questions like "can we actually solve this problem in a reasonable amount of time?"

Speaking of which, if you're a programmer, you probably use hashes (also known as dictionaries) all the time. Did you know that no matter how many items are in a hash, their lookup and insertion time never bogs down? It's O(1). How do they do that?

I finally learned what hashing really was last year (just out of curiosity) -- and duh, and brilliant, and duh. Hashtables take about 30 seconds to explain, if you don't read through the snippet from or, The Whale, as we like to call it, below.

Create a hashing function that produces a hash that's significantly smaller than the average length of the value being stored in your hashtable (md5 is fine, for example). Now use those as keys for your values on insertion.

So this line of text...

Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, and methodically knocking people's hats off--then, I account it high time to get to sea as soon as I can. This is my substitute for pistol and ball.

... will always be found at location 0e92fa0b51b9f8879eae58ad30bc943c. ALWAYS. There's no look-up at all. Wow. Neat. And, in retrospect, duh. (Brilliance to me is always something that's so painfully obvious after you hear it that you can never again forget it -- literally life-[view-]changing on some level.)

Now by virtue of having fewer bytes in the hash than the value (ignoring that we could zip that text and have less already, etc etc), we will have collisions, where more than one value will have the same hash as another. Oh noes! How can we store the value if something's already at its index? Push it from the U[nsolved] drawer to the underused X files?

No no. If that happens in your hashtable, you simply string each content piece along in a list that you store at that key. Now your lookup time for collisions is slightly longer than straight O(1), but it's still so good it's barely worth worrying about. Read the above to know why that means it's still essentially O(1). Also now obvious: Why Hashtables don't guarantee order.

Again, hashing is brilliant. And, well, after you hear it, duh.

Labels: