COMP9315 19T2 Week 2 Lecture 1 (bilibili)
C:\python\COMP9315-master\19T2\Lectures\weel02.pdf
COMP9315 24T1 - Course Notes (unsw.edu.au)
The first third of the lecture goes through a Week 1 exercise, which is a stored procedure.
COMP9315-master\19T2\Lecture Exercises\week01\ex05\schema.sql — this is the stored procedure.
Week 02 Lectures
Storage Manager
Storage Management 2/68
Levels of DBMS related to storage management
... Storage Management 3/68
Aims of storage management in DBMS:
- provide view of data as collection of pages/tuples
- map from database objects (e.g. tables) to disk files
- manage transfer of data to/from disk storage
- use buffers to minimise disk/memory transfers
- interpret loaded data as tuples/records
- basis for file structures used by access methods
Views of Data in Query Evaluation
... Views of Data in Query Evaluation 5/68
Representing database objects during query execution:
- DB (handle on an authorised/opened database)
- Rel (handle on an opened relation)
- Page (memory buffer to hold contents of disk block)
- Tuple (memory holding data values from one tuple)
Addressing in DBMSs:
- PageID = FileID+Offset ... identifies a block of data
- where Offset gives location of block within file
- TupleID = PageID+Index ... identifies a single tuple
- where Index gives location of tuple within page
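As a rough illustration only (the field names below are assumptions, not an actual DBMS encoding), these two kinds of address could be represented in C as:

   // a PageID names one block within one file
   typedef struct {
      unsigned int fileID;    // which file the block belongs to
      unsigned int offset;    // block number within that file
   } PageID;

   // a TupleID names one tuple within one page
   typedef struct {
      PageID       pageID;    // the page holding the tuple
      unsigned int index;     // position of the tuple within that page
   } TupleID;

In the single-file storage manager later in these notes, PageId collapses to just a block number, since there is only one file.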
Storage Management 6/68
Topics in storage management ...
- Disks and Files
- performance issues and organisation of disk files
- Buffer Management
- using caching to improve DBMS system throughput
- Tuple/Page Management
- how tuples are represented within disk pages
- DB Object Management (Catalog)
- how tables/views/functions/types, etc. are represented
Storage Technology
Storage Technology 8/68
Persistent storage is
- large, cheap, relatively slow, accessed in blocks
- used for long-term storage of data
Computational storage is
- small, expensive, fast, accessed by byte/word
- used for all analysis of data
Access cost HDD:RAM ≅ 100000:1, e.g.
- 10ms to read block containing two tuples
- 1μs to compare fields in two tuples
... Storage Technology 9/68
Hard disks are well-established, cheap, high-volume, ...
Alternative bulk storage: SSD
- faster than HDDs, no seek/rotational delay
- can read single items
- update requires block erase then write
- over time, writes "wear out" blocks
- require controllers that spread write load
Feasible for long-term, high-update environments?
... Storage Technology 10/68
Comparison of HDD and SSD properties:
                  HDD                 SSD
Cost/byte         ~ 4c / GB           ~ 13c / GB
Read latency      ~ 10ms              ~ 50μs
Write latency     ~ 10ms              ~ 900μs
Read unit         block (e.g. 1KB)    byte
Writing           write a block       write on empty block
Will SSDs ever replace HDDs?
Cost Models 11/68
Throughout this course, we compare costs of DB operations
Important aspects in determining cost:
- data is always transferred to/from disk as whole blocks (pages)
- cost of manipulating tuples in memory is negligible
- overall cost determined primarily by #data-blocks read/written
Complicating factors in determining costs:
- not all page accesses require disk access (buffer pool)
- tuples typically have variable size (how many tuples fit per page?)
More details later ...
File Management 12/68
Aims of file management subsystem:
- organise layout of data within the filesystem
- handle mapping from database ID to file address
- transfer blocks of data between buffer pool and filesystem
- also attempts to handle file access errors (e.g. by retrying)
Builds higher-level operations on top of OS file operations
... File Management 13/68
Typical file operations provided by the operating system:
fd = open(fileName,mode)
// open a named file for reading/writing/appending
close(fd)
// close an open file, via its descriptor
nread = read(fd, buf, nbytes)
// attempt to read data from file into buffer
nwritten = write(fd, buf, nbytes)
// attempt to write data from buffer to file
lseek(fd, offset, seek_type)
// move file pointer to relative/absolute file offset
fsync(fd)
// flush contents of file buffers to disk
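For concreteness, a minimal sketch (the file name and block size are assumptions) showing how these calls combine to fetch the i-th fixed-size block of a file; this lseek-then-read pattern is exactly what the storage manager's get_page() below is built on:

   #include <fcntl.h>
   #include <unistd.h>
   #include <stdio.h>
   #include <stdlib.h>

   #define BLOCKSIZE 4096                        // assumed block size

   int main(void) {
      char buf[BLOCKSIZE];
      long i = 3;                                // fetch the block with index 3
      int fd = open("mydb.data", O_RDONLY);      // hypothetical data file
      if (fd < 0) { perror("open"); exit(1); }
      if (lseek(fd, i*BLOCKSIZE, SEEK_SET) < 0) { perror("lseek"); exit(1); }
      ssize_t n = read(fd, buf, BLOCKSIZE);      // may return fewer bytes near EOF
      if (n < 0) { perror("read"); exit(1); }
      printf("read %zd bytes of block %ld\n", n, i);
      close(fd);
      return 0;
   }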
DBMS File Organisation 14/68
How is data for DB objects arranged in the file system?
Different DBMSs make different choices, e.g.
- bypass the file system and use a raw disk partition
- have a single very large file containing all DB data
- have several large files, with tables spread across them
- have multiple data files, one for each table
- have multiple files for each table
etc.
Single-file DBMS 15/68
Consider a single file for the entire database (e.g. SQLite)
Objects are allocated to regions (segments) of the file.
If an object grows too large for allocated segment, allocate an extension.
What happens to allocated space when objects are removed?
... Single-file DBMS 16/68
Allocating space in Unix files is easy:
- simply seek to the place you want and write the data
- if nothing there already, data is appended to the file
- if something there already, it gets overwritten
If the seek goes way beyond the end of the file:
- Unix does not (yet) allocate disk space for the "hole"
- allocates disk storage only when data is written there
With the above, a disk/file manager is easy to implement
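A small sketch of the "hole" behaviour just described (the file name is illustrative): seek far past end-of-file, write one block, and the file's reported size is large while its allocated space is tiny:

   #include <fcntl.h>
   #include <unistd.h>
   #include <sys/stat.h>
   #include <stdio.h>

   int main(void) {
      char block[4096] = {0};
      int  fd = open("sparse.demo", O_CREAT|O_RDWR, 0644);
      lseek(fd, 100*1024*1024, SEEK_SET);     // seek 100MB beyond the start
      write(fd, block, sizeof(block));        // disk space is allocated only here
      struct stat st;
      fstat(fd, &st);
      // st_size includes the hole; st_blocks counts 512-byte units actually allocated
      printf("size = %lld bytes, allocated = %lld bytes\n",
             (long long)st.st_size, (long long)st.st_blocks * 512);
      close(fd);
      return 0;
   }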
Single-file Storage Manager 17/68
Consider the following simple single-file DBMS layout:
E.g.
SpaceMap = [ (0,10,U), (10,10,U), (20,600,U), (620,100,U), (720,20,F) ]
TableMap = [ ("employee",20,500), ("project",620,40) ]
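One plausible C representation of these maps (the layouts below are assumptions for illustration, not the actual implementation): the SpaceMap as an array of (offset, length, status) entries, and the name table (TableMap above) as an array of (name, start, npages) entries; the DB structure on the next slides holds one of each.

   #define MAXNAME    64
   #define MAXENTRIES 100

   typedef struct {            // one SpaceMap entry
      int  offset;             // first page of the chunk
      int  length;             // number of pages in the chunk
      char status;             // 'U' = used, 'F' = free
   } SpaceMapEntry;

   typedef struct {            // one name-table (TableMap) entry
      char name[MAXNAME];      // table name, e.g. "employee"
      int  start;              // first page of the table's data
      int  npages;             // number of pages of table data
   } NameTableEntry;

   typedef struct {            // both maps live in header pages of the DB file
      int nentries;
      SpaceMapEntry entry[MAXENTRIES];
   } SpaceMapRec, *SpaceMap;

   typedef struct {
      int nentries;
      NameTableEntry entry[MAXENTRIES];
   } NameTableRec, *NameTable;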
... Single-file Storage Manager 18/68
Each file segment consists of a number of fixed-size blocks
The following data/constant definitions are useful
#define PAGESIZE 2048 // bytes per page
typedef long PageId; // PageId is block index
// pageOffset=PageId*PAGESIZE
typedef char *Page; // pointer to page/block buffer
Typical PAGESIZE values: 1024, 2048, 4096, 8192
... Single-file Storage Manager 19/68
Storage Manager data structures for opened DBs & Tables
typedef struct DBrec {
   char *dbname;     // copy of database name
   int fd;           // the database file
   SpaceMap map;     // map of free/used areas
   NameTable names;  // map names to areas + sizes
} *DB;

typedef struct Relrec {
   char *relname;    // copy of table name
   int start;        // page index of start of table data
   int npages;       // number of pages of table data
   ...
} *Rel;
Example: Scanning a Relation 20/68
With the above disk manager, our example:
select name from Employee
might be implemented as something like
DB db = openDatabase("myDB");
Rel r = openRelation(db,"Employee");
Page buffer = malloc(PAGESIZE*sizeof(char));

for (int i = 0; i < r->npages; i++) {
   PageId pid = r->start+i;
   get_page(db, pid, buffer);
   for each tuple in buffer {
      get tuple data and extract name
      add (name) to result tuples
   }
}
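The inner loop is left as pseudocode because the tuple layout inside a page has not been defined yet. A hedged sketch of what it might look like, assuming a very simple (hypothetical) page format: a 4-byte tuple count followed by fixed-size tuple records whose first field is a NUL-terminated name:

   #include <string.h>

   #define MAXTUPLEN 200      // assumed fixed tuple size for this sketch

   // Assumed page layout: [int ntuples][tuple 0][tuple 1]...
   // Extract the name from each tuple and hand it to a caller-supplied callback.
   void scanPage(Page buffer, void (*addResult)(char *name)) {
      int ntuples;
      memcpy(&ntuples, buffer, sizeof(int));
      char *tup = buffer + sizeof(int);
      for (int t = 0; t < ntuples; t++) {
         addResult(tup);       // name assumed to be the first field of the tuple
         tup += MAXTUPLEN;
      }
   }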
Single-File Storage Manager 21/68
// start using DB, buffer meta-data
DB openDatabase(char *name) {
   DB db = new(struct DBrec);
   db->dbname = strdup(name);
   db->fd = open(name,O_RDWR);
   db->map = readSpaceTable(db->fd);
   db->names = readNameTable(db->fd);
   return db;
}

// stop using DB and update all meta-data
void closeDatabase(DB db) {
   writeSpaceTable(db->fd,db->map);
   writeNameTable(db->fd,db->names);
   fsync(db->fd);
   close(db->fd);
   free(db->dbname);
   free(db);
}
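The new(...) call above is not standard C; a plausible definition, assumed here to match how it is used, is just a thin allocation macro:

   #include <stdlib.h>

   // assumed: allocate (and zero) one object of the given struct type
   #define new(T) ((T *)calloc(1, sizeof(T)))

With that definition, new(struct DBrec) yields a zeroed struct DBrec *, which is what DB is.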
... Single-File Storage Manager 22/68
// set up struct describing relation
Rel openRelation(DB db, char *rname) {
   Rel r = new(struct Relrec);
   r->relname = strdup(rname);
   // get relation data from map tables
   r->start = ...;
   r->npages = ...;
   return r;
}

// stop using a relation
void closeRelation(Rel r) {
   free(r->relname);
   free(r);
}
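The elided r->start / r->npages lookups would consult the name table loaded by openDatabase(). A sketch, assuming the NameTable layout suggested earlier (lookupTable is a hypothetical helper):

   #include <string.h>

   // find rname in the name table; fill in its start page and page count
   // returns 1 if found, 0 otherwise
   int lookupTable(NameTable names, char *rname, int *start, int *npages) {
      for (int i = 0; i < names->nentries; i++) {
         if (strcmp(names->entry[i].name, rname) == 0) {
            *start  = names->entry[i].start;
            *npages = names->entry[i].npages;
            return 1;
         }
      }
      return 0;
   }

openRelation() would then use it as lookupTable(db->names, rname, &r->start, &r->npages).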
... Single-File Storage Manager 23/68
// assume that Page = byte[PAGESIZE]
// assume that PageId = block number in file

// read page from file into memory buffer
void get_page(DB db, PageId p, Page buf) {
   lseek(db->fd, p*PAGESIZE, SEEK_SET);
   read(db->fd, buf, PAGESIZE);
}

// write page from memory buffer to file
void put_page(DB db, PageId p, Page buf) {
   lseek(db->fd, p*PAGESIZE, SEEK_SET);
   write(db->fd, buf, PAGESIZE);
}
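A short usage sketch (the page choice and modification are purely illustrative): read a page, change the in-memory copy, write it back. Note that the versions above do not check the return values of lseek/read/write; real code would.

   Page buf = malloc(PAGESIZE);
   PageId pid = r->start;        // e.g. first page of an open relation r
   get_page(db, pid, buf);       // fetch the page into memory
   buf[0] = 'X';                 // modify the in-memory copy (illustrative only)
   put_page(db, pid, buf);       // write the modified page back to the file
   free(buf);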
... Single-File Storage Manager 24/68
Managing contents of space mapping table can be complex:
// assume an array of (offset,length,status) records

// allocate n new pages in database db
PageId allocate_pages(DB db, int n) {
   if (no existing free chunks are large enough) {
      int endfile = lseek(db->fd, 0, SEEK_END);
      addNewEntry(db->map, endfile, n);
   } else {
      grab "worst fit" chunk
      split off unused section as new chunk
   }
   // note that file itself is not changed
}
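A hedged concrete version of the worst-fit strategy above, using the SpaceMap representation assumed earlier (addEntry and endOfMappedPages are hypothetical helpers):

   // allocate n new pages in database db, returning the first PageId
   PageId allocate_pages(DB db, int n) {
      SpaceMap m = db->map;
      int worst = -1;
      // find the largest ("worst fit") free chunk with at least n pages
      for (int i = 0; i < m->nentries; i++) {
         if (m->entry[i].status == 'F' && m->entry[i].length >= n)
            if (worst < 0 || m->entry[i].length > m->entry[worst].length)
               worst = i;
      }
      if (worst < 0) {
         // no free chunk is big enough: record a new used chunk
         // starting just past the last page currently mapped
         int end = endOfMappedPages(m);         // hypothetical helper
         addEntry(m, end, n, 'U');              // hypothetical helper
         return (PageId)end;
      }
      // grab the chunk; split off any unused tail as a new free chunk
      int start = m->entry[worst].offset;
      int spare = m->entry[worst].length - n;
      m->entry[worst].length = n;
      m->entry[worst].status = 'U';
      if (spare > 0)
         addEntry(m, start + n, spare, 'F');
      return (PageId)start;
   }

As in the pseudocode, only the in-memory map changes; the file itself is updated when closeDatabase() writes the map back.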
... Single-File Storage Manager 25/68
Similar complexity for freeing chunks
// drop n pages starting from p in database db
void deallocate_pages(DB db, PageId p, int n) {
   if (no adjacent free chunks) {
      markUnused(db->map, p, n);
   } else {
      merge adjacent free chunks
      compress mapping table
   }
   // note that file itself is not changed
}
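Likewise, a sketch of freeing a chunk and merging it with a following free chunk, again over the assumed SpaceMap array (findEntryAt and removeEntry are hypothetical helpers; merging with a preceding free chunk is symmetric and omitted):

   // drop n pages starting from page p in database db
   void deallocate_pages(DB db, PageId p, int n) {
      SpaceMap m = db->map;
      int i = findEntryAt(m, p);          // hypothetical: entry starting at page p
      m->entry[i].status = 'F';
      int j = findEntryAt(m, p + n);      // entry starting just after this chunk
      if (j >= 0 && m->entry[j].status == 'F') {
         m->entry[i].length += m->entry[j].length;
         removeEntry(m, j);               // hypothetical: compress the mapping table
      }
   }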
Changes take effect when closeDatabase() is executed.
Exercise 1: Relation Scan Cost 26/68
Consider a table R(x,y,z) with 10^5 tuples, implemented as described above, where:

  number of tuples                r  = 10,000
  average size of tuples          R  = 200 bytes
  size of data pages              B  = 4096 bytes
  time to read one data page      Tr = 10ms
  time to check one tuple              1μs
  time to form one result tuple        1μs
  time to write one result page   Tw = 10ms
Calculate the total time-cost for answering the query:
insert into S select * from R where x > 10;
if 50% of the tuples satisfy the condition.
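A hedged sketch of how such a cost might be computed, following the cost model earlier (page reads/writes dominate); the arithmetic below simply plugs in the values from the table above and assumes tuples do not span pages:

   #include <math.h>
   #include <stdio.h>

   int main(void) {
      double r   = 10000;      // number of tuples (as listed above)
      double R   = 200;        // average tuple size (bytes)
      double B   = 4096;       // data page size (bytes)
      double Tr  = 0.010;      // read one data page (seconds)
      double Tw  = 0.010;      // write one result page (seconds)
      double Tc  = 1e-6;       // check one tuple (seconds)
      double Tf  = 1e-6;       // form one result tuple (seconds)
      double sel = 0.5;        // fraction of tuples with x > 10

      double c    = floor(B / R);      // tuples per page
      double b    = ceil(r / c);       // data pages to read
      double rOut = r * sel;           // result tuples formed
      double bOut = ceil(rOut / c);    // result pages to write

      double cost = b*Tr + r*Tc + rOut*Tf + bOut*Tw;
      printf("read %.0f pages, write %.0f pages, total = %.3f sec\n",
             b, bOut, cost);
      return 0;
   }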