Indexing and Hashing
Indexing
Allows for the efficient location of records. An index consists of two fields:
- data value of a column in a table
- RID for a storage record (or first record of a pointer chain for the same data value)
Any column can be indexed.
Combinations of columns can be indexed.
Indexes can be heavy on processing and maintenance.
Indexes can be:
- direct
- sequential
Indexes can be indexed aka multi-level indexing.
Inserting rows into multi-levelled indexes can require complex changes.
Hashing
Computes location of a desired row from the value of some column and the size of the block.
Given a value from a column as 38 (entry number 38 in a sequential list of entries i.e. 1 to 39) and given a size of block as 13 (rows of data in the block) we can work out block number and the offset. This holds where we
column value/block size = (block number + 1) remainder offset 38/13 = (2 + 1) remainder 12 therefore block number is 3 - offset is 12
One is added to the result to avoid block 0 which is used to hold DBMS information for accessing the file.
Remember the offset is from the end of the block and is used to access the field in the footer to locate the offset from the top of the block.
In a situation where the entries may not run sequentially, this method would reserve empty rows. An alternative is to allow the DBMS to specify its own offset for the record. A disadvantage here is that a subsequent attempt to store a record may result in the targeted space being filled (a collision). The file manager will require an algorithm to place records where a collision has occurred.
To Index or to Hash
Hashing is superior to indexing as there are no overheads in traversing an index structure. However indexing is used in preference due to the difficulties in designing good hashing schemes making the best use of space.
Comments, suggestions, ideas to
Stuart Banner
