Guidelines and rules for GetHashCode - Fabulous Adventures In Coding - Site Home - MSDN Blogs:

How do hash tables and similar data structures use GetHashCode?

Consider a "set" abstract data type. Though there are many operations you might want to perform on a set, the two basic ones are insert a new item into the set, and check to see whether a given item is in the set. We would like these operations to be fast even if the set is large.

I'm using SqlDbSharp to store lots of records and want a quicker way to determine if an entry is already in a table than pulling out and comparing strings.  I think storing hash codes in the table for the strings I'm worried about is smarter, especially when things get large, though I'm not absolutely sold on the idea.

Regardless, interesting reading, as pretty much always, from Lippert.

EDIT: My idea is bad.

Guideline: the integer returned by GetHashCode should never change

Ideally, the hash code of a mutable object should be computed from only fields which cannot mutate, and therefore the hash value of an object is the same for its entire lifetime.

However, this is only an ideal-situation guideline; the actual rule is:

Rule: the integer returned by GetHashCode must never change while the object is contained in a data structure that depends on the hash code remaining stable

So there's no guarantee the serialized values will be useful going forward.  Likely, but not guaranteed.  I can, of course, make my own implementation, which I might.  It's a good idea in general -- two equal values mean possible equality and different values mean "not same".  Saves a lot of time if we really push this into an index.

Or, as Lippert says about putting things into hash dependent data structures...

Now if we have ten thousand items in the set then we are looking through one of a hundred buckets, each with on average a hundred items; the Contains operation just got a hundred times cheaper.

On average.

We hope.

Labels: , ,