Unit hash

Classes

THash -

Functions

Types

THashFunc
THashItem

Constants

NumOfPrimes
Primes

Variables


Functions


Types


THashFunc  = function( Sender: TObject; const Key: string): longint

THashItem = record
Key : string;
Value : Pointer;
end;

Constants

NumOfPrimes = 288

class THash 1.0 Luu Tran Last revised: 8-Jul-97 This is a free component. Use it as you like, at your own risk. Please send me any bug fix and improvement you make. ------------------------ THash: Implement open hashing with separate chaining. THash works like Perl's associative array, i.e., an array indexed by a Key of type string with an associated Value of type Pointer, which means Value is anything that will fit into a Pointer type -- you just need to do the appropriate typecasting. There is a default hash size (number of buckets) of 17, so you can start adding items right away if you wish. By default, I double the hash size and rebuild it whenever number of items > 2 x number of buckets. The rebuild is expensive and best avoided by setting an appropriate hash size at the outset. As usual, there's a speed vs. space tradeoff. Note: you can turn off the automatic resize feature by setting AutoGrow to false. You can add an item or change its value by simply assigning to the hash. (Just like an array) You can delete an item by assigning NIL as its value or use the explicit .Delete method. You can also explicitly add with the Add method. The Add method does not check whether the item already exists so it is slightly faster; it's useful when you are populating the hash at the start. There is also an explicit Delete method. Examples: zip:=THash.Create; zip.Size:=100; zip[ 'Pasadena']:=Pointer( 91107); City:='Boston'; zip[ City]:=Pointer( 02935); TStudent = class Name: string; Age: integer; Major: string; end; students:=THash.Create; students.size:=967; std:=TStudent.Create; with std do begin Name:='jack'; Age:=30; Major:='cs'; end; students[std.Name]:=std; Freeing items: As with TList, you are responsible for freeing the value objects in the hash. Hence, you need to free std in the example above. Walking through the hash: You can walk through the hash with the .First and .Next method. Adding or removing items while walking through the hash yields unpredictable results. Example: var HI:THashItem; HI:=students.First; while HI.Value<>nil do begin with TStudent( HI.Value) do writeln( 'Name = '+Name+'; Age = '+IntToStr( Age)); HI:=students.Next; end; Custom hash function: THash has a simple hash function which may or may not suit you. In the latter case, you can specify your own hash function (e.g., MyHash.HashFunc:=MyHashFunc). The hash function must a) take a string Key and return a longint hash value; and b) return the same hash value for a given Key (of course). (I always take the absolute value so it doesn't matter if the hash function returns a negative number.) Needless to say, the efficiency of the hash hinges on selecting a mapping function such that the input set is distributed evenly among the range 0..MAXINT-1. Note: assigning a hash function after you have added some items to the hash will force a rebuild. Statistic properties: There are several statistic properties to help you evaluate your hash function and choice of hash sizes: Count: number of hash items. Used: number of used buckets. You want Count/Used to be as close to 1 as possible. What's more, you want this ratio to be more or less constant regardless of number of hash items. Chains: number of chains, i.e., buckets with more than 1 item. LongestChain: size of the longest chain. AvgChainSize: number of items in chains / number of chains. iCount[ B]: number of items in bucket B. Implementation notes: The hash is a TStringList (FHash). When the size is specified, I add FSize blank items ('', nil) to FHash. When there is only one item in a bucket, then no chaining is necessary and the item is simply assigned to the bucket. When a collision occurs, then I create a chain, which is another TStringList and add to the chain the item currently in the bucket plus the new item. I can tell that something in a bucket is a hash item versus a chain if its value is not nil but its key is ''. (That's why I don't allow blank keys.) This scheme saves having to create lots of chains with just 1 item in them (which should be the case if the hash is efficient). When the last item is deleted from a chain, the chain is freed and the bucket becomes available. Caveat: one side effect of this scheme is that you cannot hash a value that is indistinguishable from a nil pointer, i.e., you canNOT do something like this: value:=0; myHash['foo']:=Pointer( value); There are at least two ways to handle the above: 1) if all your values are positive, simply add 1 to value before assigning it to the hash and subtract 1 when retrieving it from the hash; or 2) put a record or class wrapper around the value. I use a const array of 288 prime numbers to determine the size of the hash table. So if you request a size of say, 600, then I use the next greatest prime number in the array as the actual size, namely, 607. If the requested size N is larger than the greatest number in the array, then N itself is used as the size. This is rather awkward, I know. If someone has a better idea, please let me know.

Primes = (17 ,31 ,127 ,233 ,353 ,467, 607 ,739 ,877 ,1019 ,1153 ,1297 , 1381 ,1523 ,1663 ,1823 ,1993 ,2131 , 2221 ,2371 ,2539 ,2689 ,2833 ,3001 , 3083 ,3259 ,3433 ,3581 ,3733 ,3911 , 4001 ,4153 ,4327 ,4507 ,4663 ,4861 , 4943 ,5099 ,5281 ,5449 ,5641 ,5801 , 5861 ,6067 ,6229 ,6373 ,6577 ,6763 , 6841 ,7001 ,7211 ,7417 ,7573 ,7727 , 7841 ,8039 ,8221 ,8389 ,8599 ,8747 , 8837 ,9013 ,9203 ,9391 ,9539 ,9739 , 9817 ,10009 ,10181 ,10357 ,10589 ,10753 , 10861 ,11069 ,11257 ,11447 ,11677 ,11839 , 11939 ,12113 ,12301 ,12491 ,12647 ,12841 , 12941 ,13121 ,13313 ,13513 ,13709 ,13883 , 13997 ,14207 ,14423 ,14621 ,14771 ,14951 , 15077 ,15263 ,15413 ,15619 ,15787 ,15973 , 16073 ,16273 ,16487 ,16693 ,16921 ,17099 , 17203 ,17393 ,17579 ,17789 ,17977 ,18149 , 18251 ,18433 ,18661 ,18911 ,19139 ,19333 , 19427 ,19577 ,19801 ,19993 ,20161 ,20359 , 20477 ,20707 ,20899 ,21089 ,21283 ,21493 , 21569 ,21757 ,21961 ,22129 ,22343 ,22549 , 22651 ,22853 ,23039 ,23209 ,23459 ,23633 , 23747 ,23911 ,24097 ,24317 ,24527 ,24781 , 24889 ,25111 ,25307 ,25523 ,25703 ,25919 , 26003 ,26209 ,26407 ,26641 ,26813 ,26993 , 27091 ,27337 ,27581 ,27773 ,27953 ,28163 , 28289 ,28513 ,28657 ,28843 ,29063 ,29269 , 29383 ,29581 ,29833 ,30071 ,30253 ,30491 , 30577 ,30809 ,31013 ,31189 ,31379 ,31607 , 31723 ,31981 ,32173 ,32359 ,32533 ,32719 , 32833 ,33029 ,33223 ,33457 ,33617 ,33809 , 33911 ,34159 ,34351 ,34543 ,34747 ,34963 , 35089 ,35311 ,35521 ,35771 ,35977 ,36187 , 36293 ,36523 ,36697 ,36887 ,37061 ,37313 , 37409 ,37579 ,37831 ,38053 ,38287 ,38557 , 38651 ,38833 ,39043 ,39229 ,39439 ,39671 , 39779 ,39979 ,40177 ,40459 ,40693 ,40879 , 40993 ,41189 ,41389 ,41609 ,41809 ,41983 , 42083 ,42293 ,42461 ,42683 ,42853 ,43063 , 43201 ,43457 ,43669 ,43943 ,44111 ,44281 , 44453 ,44647 ,44851 ,45083 ,45317 ,45541 , 45659 ,45863 ,46103 ,46327 ,46559 ,46751 , 46853 ,47123 ,47339 ,47533 ,47737 ,47947 , 48073 ,48311 ,48523 ,48751 ,48947 ,49139 , 49223 ,49451 ,49667 ,49871 ,50077 ,50287 , 50383 ,50599 ,50873 ,51131 ,51341 ,51503 )


Variables