CSCI 331
Database Assignment
Disk Access
512B/block*20 blocks/track*400 tracks/surface*30 surfaces = 122,880,000 Bytes = 117 MB disk
(1024*1024 = 1,048,576Bytes/MByte)
400
122,880,000/400 = 307,200 Bytes
307,200/30= 10240Bytes
5400 RPM => 11.11msec/rev (60 *1000)/5400
512*20/11.11 = 921.7B/msec
btt = s+rd+bt = 12 + 11.11/2 + 11.11/20 = 18.12msec
20 * 18.12 = 362.4
12 + (20* 11.11/2 +11.11/20) = 134.4
iii. 20 consecutive blocks?
12 + 5.56 + 20 * .56 = 28.76
105 Bytes
floor(512/105) = 4
20000/4 = 5000
linear search => must retrieve 5000/2 blocks => 2500 * 18.12ms = 45,300 msec = 45.3 sec (3/4 of a minute!)
Binary search => log2 of 5000 = 13 blocks => 13 * 18.12 = 235.56 msec (1/4 of a second!)
linear search => same answer as d => 45.3 sec
Bucket Records on disk block
Assumptions:
we have a table of n pointers to the n blocks used for the ordered file and m pointers for the overflow file.
the number of records per block is small (e.g. 5)
Binary_Search( Record R)
low = 0
high = n
while (low < high)
mid = (low + high)/2
if ( last_record_in_block(Ordered_file(mid)) < R)
low = mid + 1
else high = mid
return low
int Deletion(Record R))
Found = false
Block_Number = binary_Search (R)
if (in_block(R, Ordered(Block))) //If record is found in the ordred block
remove(R) //Remove the record
Found = true // indicate found
else
i = 0
while(!Found && i < m) //While not found and more blocks available
if (in_block(R, Unordered(i)))
remove(R)
Found = true
i++
return found
int Insertion(Record R)
Block_Number = binary_Search (R) //This part is not required
if(!Full(Ordered(Block_Number ))) // if you assume that all
Block_insert (R, Ordered(Block_Numbered)) //insertions are done into
else //the overflow area
i = 0
while(Full(Unordered(i))
i ++
Block_insert(R, Unordered(i))
Each index record would need to hold a SSN (9 Bytes) and a pointer to the physical address. A physical address must identify the surface, track, and block which would take a minimum of 19 bits. Assume a 3 Byte address.
We need 1 index record for each block (the first SSN on the block) and since we have 5000 blocks (from problem 2) we would have 5000 records. Since each block holds 512 Bytes and we use 12 Bytes per record, we can store 42 records per block and require 120 blocks to hold the index.
A binary search can be done on the index to determine the block . This would take 7 accesses on average + 1 for the data access, 8 accesses in all.
If instead, you assumed a table of physical address exists in memory, the index could simply indicate a block number which could be done handled with a 2 byte integer value, resulting in 11 Bytes for each record and 46 records on each block, 109 blocks for the index, which still requires 7 accesses to find the block that holds the data and 1 to retrieve the data = 8 accesses in all.
Since a binary search of the data itself would take 13 accesses, the time savings would be approximately 46% while increasing our space needs by 2.4%
In this case, we would need an index record for each SSN instead of each block resulting in 20,000 records in the index file. This would require 477 blocks 20,000/42)-. A binary search of the index would take 9 accesses plus 1 additional access to retrieve the data, 10 accesses in all. Without the secondary index, a linear search would need to be done on the data requiring on average 2,500 accesses.
In this case, we would have a 99.6% reduction in the amount of time at a cost of 10% in space!!!!