Database Management Systems - Prof. Holowczak
Zicklin School of Business - Baruch College
City University of New York
Database Management Systems
File Organization and Storage
What You'll Learn
Data Structures
| Elmasri/Navathe (3rd) ed. | Kroenke (7th ed.) | Hoffer, Prescott & McFadden (6th ed.)
|
|---|
| Chapter 5 and 6
| Appendix A
| Chapter 6 and Appx C
|
Data Storage Characteristics
- For a significant amount of data, we
require persistent, inexpensive,
reliable and sharable
storage methods with relatively
rapid access time.
- Persistent - Data persists (lives on)
after power is removed.
- Inexpensive - typically measured on
a $ per Gigabyte basis.
- Reliable - Should not have to be replaced
due to excessive errors.
- Sharable - Should facilitate sharing
of data among many users.
- Access time - Data should be
accessible in a relatively short period
of time.
Data Storage Hierarchy
| Processor Registers | 1 - 5 ns | $1000's / GB
|
| Cache memory | 15 - 30 ns | $100's / GB
|
| Main Memory (Core) | 30 - 100 ns | $10's / GB
|
| Magnetic Disk (hard disk) | 5 - 30 ms | $1's / GB
|
| Optical Disk (CD-ROM) | 50 - 100 ms | $0.10's / GB
|
| Magnetic Tape | 100's ms to seconds | less than $0.10 / GB
|
Magnetic Disk Characteristics
- We focus on magnetic disk
- Access time is the dominant cost to consider
- Access time consists of:
- Seek time - moving the disk
read/write head to the right track
- Disk Rotation time - waiting for
the disk to rotate the track under the head
- Transfer time - time to actually read
the data (blocks) from the disk and place it on the
bus for main memory.
- The goal is to minimize seek and disk rotation
delay by orienting related data on the same or
adjacent tracks.
- Block - The smallest unit of memory a disk
can read or write.
- Block Size - the size of the block.
Typically 512 bytes, 1024, 2048, ... 32 Kilobytes.
Record Storage on Disk
- Relations (records) are stored on
disk with each tuple written one
after the other (end to end).
- Blocking Factor - the number of
tuples (records) that can fit into a single
block.
- Example: EMPLOYEE takes 100 bytes to store one
tuple (record).
If the Block Size is 2,000 bytes, then we can
store 20 EMPLOYEE tuples (records) in one block.
Thus the Blocking factor is 2000/100 = 20
- f = B/R
- Fixed length records: Each record is of fixed length.
Pad with spaces, etc.
- Variable Length records: Each record is only as long
as the data it contains.
- Unspanned Records: A record is found in one and
only one block. i.e., records do not span
across block boundaries.
- Spanned Records: Records are allowed to
span across block boundaries.
File Operations
- Consider four basic File Operations:
| Operation | Similar SQL Statement
|
|---|
| Find | Select
|
| Insert | Insert
|
| Modify | Update
|
| Delete | Delete
|
- Unordered file - New record is inserted
at the end of the file.
- Insert takes constant time.
- Select, Update and Delete take n/2 time.
(n is the number of records)
- Ordered file - New record is inserted
in order, in the file.
- Insert takes log2n plus this time to
re-organize records.
- Select, Update, Delete take at least log2n
- Indexed file - New record is inserted at
the end of the file.
- An index is
maintained that points to the location on
disk where the record is found.
- Insert takes constant time for the data itself
plus log2n for the index
- Select, Update, Delete take log2n
lookup on the index followed by constant time
to access data record.
Types of Indexing
- An index is made up of two components:
A key and a pointer
- The key is typically the key value for
the relation and is mainly used to identify and
look up records.
- The pointer is an address on disk where the
rest of the data in the record can be found.
- Two types of indexes discussed here: Ordered index and
Hashing.
Ordered Index
- Records are stored as they are inserted.
- Key attribute is stored in order in the index.
Hashing
- Identify a function f that takes as input,
the key for a relation and returns, as output,
the physical disk address for the rest of the
data in the record.
- Example: Assume employee records.
Function f takes the ascii values of the
first and last name and adds them.
The numeric result is the physical address for the
record.
- Selection time is constant.
- It is possible function f can map two
different keys to the same address. In this case,
we use a series of hash buckets.
File: index.htm Date: 10:50 AM 10/25/2011
All materials Copyright, 1997-2011 Richard Holowczak