CSCI 331
Database Assignment
Disk Access

 

  1. Consider a disk with the following properties: Block size, B = 512 Bytes, number of blocks per track = 20, number of tracks per surface = 400, disks consists of 15 double sided platters. The seek time is 12 msec and the rotational speed is 5400 RPMs.

 

  1. What is the total capacity of the disk?
  2. 512B/block*20 blocks/track*400 tracks/surface*30 surfaces = 122,880,000 Bytes = 117 MB disk

    (1024*1024 = 1,048,576Bytes/MByte)

     

  3. How many cylinders are there?
  4. 400

     

  5. What is the total capacity of a cylinder?
  6. 122,880,000/400 = 307,200 Bytes

     

  7. What is the total capacity of a track?
  8.  

    307,200/30= 10240Bytes

     

  9. What is the transfer rate?
  10.  

    5400 RPM => 11.11msec/rev (60 *1000)/5400

    512*20/11.11 = 921.7B/msec

     

  11. How much time does it take to locate and transfer a single block on average?
  12.  

    btt = s+rd+bt = 12 + 11.11/2 + 11.11/20 = 18.12msec

  13. How long would it take to transfer:
  1. 20 random blocks?

20 * 18.12 = 362.4

 

  1. ii. 20 blocks on the same cylinder?

12 + (20* 11.11/2 +11.11/20) = 134.4

 

iii. 20 consecutive blocks?

12 + 5.56 + 20 * .56 = 28.76

 

  1. A file has 20,000 records of fixed length with the following fields: Name (30 bytes), SSN (9 bytes) – key field, Address (40 bytes). Phone (9 bytes), birthday (8 bytes), gender (1 byte), majordeptcode (4 bytes), classcode (4 bytes). Assume the records are stored on the disk described in problem 1 with random blocks.

 

  1. What is the record size?
  2. 105 Bytes

     

  3. What is the blocking factor?
  4. floor(512/105) = 4

     

  5. What is the number of blocks assuming unspanned organization?
  6. 20000/4 = 5000

     

  7. What is the average time to find a record using SSN if the records are stored unordered?
  8. linear search => must retrieve 5000/2 blocks => 2500 * 18.12ms = 45,300 msec = 45.3 sec (3/4 of a minute!)

     

  9. What is the average time to find a record using SSN if the records are stored ordered by SSN?
  10. Binary search => log2 of 5000 = 13 blocks => 13 * 18.12 = 235.56 msec (1/4 of a second!)

     

  11. What is the average time to find a record using SSN if the records are stored ordered by Name?

linear search => same answer as d => 45.3 sec

 

 

  1. A Parts file with Part# as the hash key includes records with the following parts values: 2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115, 1620, 2428, 3943, 4750, 6975, 4981, 9208. The file uses eight buckets numbered 0 to 7. Each bucket uses 1 disk block and contains a maximum of two records. Using the hash function h (K) = K mod 8, load the records into the file in the given order. Use open addressing to handle overflow. Show how the records would be stored.

 

Bucket Records on disk block

  1. 3760 6975
  2. 2369 9208
  3. 1074
  4. 5659 7115
  5. 4692 1620
  6. 1821 2428
  7. 4750 4981
  8. 4871 3943

 

 

 

  1. Suppose we have and ordered file of fixed-length records and an unordered overflow file to handle insertion. Both files use unspanned records. Write algorithms in pseudo code for order insertion and order deletion. Note any assumptions that you make.
  2. Assumptions:

    we have a table of n pointers to the n blocks used for the ordered file and m pointers for the overflow file.

    the number of records per block is small (e.g. 5)

     

    Binary_Search( Record R)

    low = 0

    high = n

    while (low < high)

    mid = (low + high)/2

    if ( last_record_in_block(Ordered_file(mid)) < R)

    low = mid + 1

    else high = mid

    return low

    int Deletion(Record R))

    Found = false

    Block_Number = binary_Search (R)

    if (in_block(R, Ordered(Block))) //If record is found in the ordred block

    remove(R) //Remove the record

    Found = true // indicate found

    else

    i = 0

    while(!Found && i < m) //While not found and more blocks available

    if (in_block(R, Unordered(i)))

    remove(R)

    Found = true

    i++

    return found

     

     

    int Insertion(Record R)

    Block_Number = binary_Search (R) //This part is not required

    if(!Full(Ordered(Block_Number ))) // if you assume that all

    Block_insert (R, Ordered(Block_Numbered)) //insertions are done into

    else //the overflow area

    i = 0

    while(Full(Unordered(i))

    i ++

    Block_insert(R, Unordered(i))

     

     

  3. Suppose that the file in problem 2 was ordered by the key field, SSN and we want to construct a primary index on SSN. Calculate the number of blocks required to hold the index. Calculate the number of block accesses required on average to retrieve a record by SSN.
  4.  

    Each index record would need to hold a SSN (9 Bytes) and a pointer to the physical address. A physical address must identify the surface, track, and block which would take a minimum of 19 bits. Assume a 3 Byte address.

    We need 1 index record for each block (the first SSN on the block) and since we have 5000 blocks (from problem 2) we would have 5000 records. Since each block holds 512 Bytes and we use 12 Bytes per record, we can store 42 records per block and require 120 blocks to hold the index.

     

    A binary search can be done on the index to determine the block . This would take 7 accesses on average + 1 for the data access, 8 accesses in all.

     

    If instead, you assumed a table of physical address exists in memory, the index could simply indicate a block number which could be done handled with a 2 byte integer value, resulting in 11 Bytes for each record and 46 records on each block, 109 blocks for the index, which still requires 7 accesses to find the block that holds the data and 1 to retrieve the data = 8 accesses in all.

     

    Since a binary search of the data itself would take 13 accesses, the time savings would be approximately 46% while increasing our space needs by 2.4%

     

     

  5. Suppose the file in problem 2 is ordered by majordeptcode and a secondary index is constructed on SSN. Calculate the number of blocks required to hold the index. Calculate the number of block accesses required on average to retrieve a record by SSN.

 

In this case, we would need an index record for each SSN instead of each block resulting in 20,000 records in the index file. This would require 477 blocks 20,000/42)-. A binary search of the index would take 9 accesses plus 1 additional access to retrieve the data, 10 accesses in all. Without the secondary index, a linear search would need to be done on the data requiring on average 2,500 accesses.

 

In this case, we would have a 99.6% reduction in the amount of time at a cost of 10% in space!!!!