Wednesday, August 17, 2022

what is Hashing and Types of hashing?

 Hashing:   

Hashing is yet another method used for making retrieval faster. It provides direct access to record on the basis of the value of a specific field called the hash_field. Here, when a new record is inserted, it is physically stored at an address which is computed by applying a mathematical function (hash function) to the value of the hash field. Thus, for every new record, hash address=f (hash_field), where f is the hash function.

 Later, when a record is to be retrieved, the same hash function is used to compute the address where the record is stored. Retrievals are faster since a direct access is provided and there is no search involved in the process.

 As example of the typical hash function is given by a numeric hash field, say an id, modulus a very large prime number. As hashing relates a field value to the address of the record, multiple hash fields will map a record to multiple addresses at the same time. Hence, there can be only one hash field per file.

 Example: Consider the example of the Student table. Let Stuid be the hash field and the hash function be defined as ((Stuid mod 10000)*64+1025). The records with Stuid 10001, 10002, 10003 etc. will be stored at addresses 1089, 1153, 1217 etc. respectively. 

Types of hashing: 

There are two types of hashing: 

 i) Static hashing 

 ii) Dynamic hashing

i) Static Hashing has the number of fixed primary pages in the directory. Thus, when a bucket is full, we need an overflow bucket to store any additional records that hash to the full bucket. This can be done with a link to an overflow page, or a linked list of overflow pages. The linked list can be separate for each bucket, or the same for all buckets that overflow. When searching for a record, the original bucket is accessed first, then the overflow buckets. Provided there are many keys that hash to the same bucket, locating a record may require accessing multiple pages on disk, which greatly degrades performance.

ii) Dynamic hashing: The problem of lengthy searching of overflow buckets is solved by Dynamic Hashing. In Dynamic Hashing the size of the directory grows with the number of collisions to accommodate new records and avoid long overflow page chains. Extendible and Linear Hashing are two dynamic hashing techniques.


Indexing: Indexing is another common method for making retrievals faster. Consider the example of CUSTOMER table. The following query is based on Customer's city. “Retrieve the records of all customers who reside in Delhi" Here a sequential search on the CUSTOMER table has to be carried out and all records with the value 'Delhi' in the Cust_City field have to be retrieved. 

The time taken for this operation depends on the number of pages to be accessed. If the records are randomly stored, the page accesses depends on the volume of data. If the records are stored physically together, the number of pages depends on the size of each record alsoIf such queries based on Cust_City field are very frequent in the application, steps can be taken to improve the performance of these queries. Creating an Index on Cust_City is one such method. This results in the scenario as shown below.





A new index file is created. The number of in file is same as that of the data file. The index file has two fields in each record. One field contains the value of the Cust_City field and the second contains a pointer to the actual data record in the CUSTOMER table.


Whenever a query based on Cust_City field occurs, a search is carried out on the Index file. Here, it is to be noted that this search will be much faster than a sequential search in the CUSTOMER table, if the records are stored physically together. This is because of the much smaller size of the index record due to which each page will be able to contain number of records.


When the records with value 'Delhi' in the Cust_City field in the index file are located, the pointer in the second field of the records can be followed to directly retrieve the corresponding CUSTOMER records.


Thus the access involves a Sequential access on the index ne and a Direct access on the actual data file.

Retrieval Speed v/s Update Speed : Though indexes help making retrievals faster, they slow down updates on the table since updates on the base table demand update on the index field as well.


It is possible to create an index with multiple fields i.e., index on field combinations. Multiple indexes can also be created on the same table simultaneously though there may be a limit on the maximum number of indexes that can be created on a table.

 

No comments:

Post a Comment