MIT6.830-2022-lab1实验思路详细讲解

news2025/1/16 18:46:29

文章目录

  • 前言
  • 一、6.830/Lab1 Start
  • 二、Exercise
    • 2.1、Exercise1:Fields and Tuples
    • 2.2、Exercise2:Catalog
    • 2.3、Exercise3:BufferPool
    • 2.4、Exercise4:HeapFile access method
    • 2.5、Exercise5:HeapFile
    • 2.6、Exercise6:Operators
    • 2.7、A simple query
  • 三、总结


前言

最近在从公司回家后一直在看《MYSQL时怎样运行的》这本书,但是纸上得来终觉浅..为此在某个周末提前开始了,之前一直想写的lab,也就是6.830这一门关于database的这门课。


一、6.830/Lab1 Start

课程地址
lab1地址

课程语言使用的是java,因此笔者直接使用了idea作为lab的开发ide,debug断点调试啥的都比较方便。

对于lab1其实是简单实现的是一个简易数据库中关于访问磁盘数据的一些核心模块。

In the lab assignments in 6.5830/6.5831 you will write a basic database management system called SimpleDB. For this lab, you will focus on implementing the core modules required to access stored data on disk;

而对于具体Exercise的要实现的主要是:Fields 、Tuple、TupleDesc、Catalog、BufferPool、HeapFile等主要概念,这些会在后面进行进一步说明。

二、Exercise

2.1、Exercise1:Fields and Tuples

Tuples in SimpleDB are quite basic. They consist of a collection of Field objects, one per field in the Tuple. Field is an interface that different data types (e.g., integer, string) implement. Tuple objects are created by the underlying access methods (e.g., heap files, or B-trees), as described in the next section. Tuples also have a type (or schema), called a tuple descriptor, represented by a TupleDesc object. This object consists of a collection of Type objects, one per field in the tuple, each of which describes the type of the corresponding field.

简单来说:对于一个Tuples可以理解为就是表中的一条记录,而fields则是Tuples中这条记录的具体各个字段内容。而TupleDesc则是各个字段对应的schema描述信息

  • Tuple Class:

/**
 * Tuple maintains information about the contents of a tuple. Tuples have a
 * specified schema specified by a TupleDesc object and contain Field objects
 * with the data for each field.
 */
public class Tuple implements Serializable {

    private static final long serialVersionUID = 1L;

    /**
     *  类似于元组的schema信息
     */
    TupleDesc td;

    /**
     * 代表元组在disk的位置
     */
    RecordId rid;

    /**
     * 标示这一条记录的所有字段
     */
    CopyOnWriteArrayList<Field> fields;


    /**
     * Create a new tuple with the specified schema (type).
     *
     * @param td the schema of this tuple. It must be a valid TupleDesc
     *           instance with at least one field.
     */
    public Tuple(TupleDesc td) {
        // some code goes here
        this.td = td;
        this.fields = new CopyOnWriteArrayList<>();
    }

    /**
     * @return The TupleDesc representing the schema of this tuple.
     */
    public TupleDesc getTupleDesc() {
        // some code goes here
        return td;
    }

    /**
     * @return The RecordId representing the location of this tuple on disk. May
     *         be null.
     */
    public RecordId getRecordId() {
        // some code goes here
        return rid;
    }

    /**
     * Set the RecordId information for this tuple.
     *
     * @param rid the new RecordId for this tuple.
     */
    public void setRecordId(RecordId rid) {
        // some code goes here
        this.rid = rid;
    }

    /**
     * Change the value of the ith field of this tuple.
     *
     * @param i index of the field to change. It must be a valid index.
     * @param f new value for the field.
     */
    public void setField(int i, Field f) {
        // some code goes here
        if(i >= 0 && i < fields.size()){
            fields.set(i,f);
        } else if (i == fields.size()) {
            fields.add(f);
        }


    }

    /**
     * @param i field index to return. Must be a valid index.
     * @return the value of the ith field, or null if it has not been set.
     */
    public Field getField(int i) {
        // some code goes here
        if(fields == null || i >= fields.size()){
            return null;
        }
        return fields.get(i);
    }

    /**
     * Returns the contents of this Tuple as a string. Note that to pass the
     * system tests, the format needs to be as follows:
     * <p>
     * column1\tcolumn2\tcolumn3\t...\tcolumnN
     * <p>
     * where \t is any whitespace (except a newline)
     */
    public String toString() {
        // some code goes here
        StringBuilder stringBuilder = new StringBuilder();
        Iterator<TupleDesc.TDItem> tdItems = this.td.iterator();
        int i = 0;
        while (tdItems.hasNext()) {
            TupleDesc.TDItem item = tdItems.next();
            stringBuilder.append("FiledName: ").append(item.fieldName);
            stringBuilder.append("==> Value: ").append(fields.get(i).toString());
            stringBuilder.append("\n");
            i++;
        }
        return stringBuilder.toString();
    }

    /**
     * @return An iterator which iterates over all the fields of this tuple
     */
    public Iterator<Field> fields() {
        // some code goes here
        return fields.iterator();
    }

    /**
     * reset the TupleDesc of this tuple (only affecting the TupleDesc)
     */
    public void resetTupleDesc(TupleDesc td) {
        // some code goes here
        this.td = td;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof  Tuple))
        {
            return false;
        }

        Tuple other = (Tuple) obj;
        if (this.rid.equals(other.getRecordId()) &&
                this.td.equals(other.getTupleDesc())) {
            for (int i = 0; i < this.fields.size(); i++) {
                if (!this.fields.get(i).equals(other.getField(i))) {
                    return false;
                }
            }
            return true;
        }

        return false;
    }
}

  • TupleDesc Class:
/**
 * TupleDesc describes the schema of a tuple.
 */
public class TupleDesc implements Serializable ,Cloneable{

    /**
     * 表的别名
     */
    private String tableAlias;

    CopyOnWriteArrayList<TDItem> tdItems;

    public CopyOnWriteArrayList<TDItem> getTdItems() {
        return tdItems;
    }

    @Override
    public TupleDesc clone() {
        try {
            return (TupleDesc) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError();
        }
    }

    /**
     * A help class to facilitate organizing the information of each field
     */

    public static class TDItem implements Serializable {

        private static final long serialVersionUID = 1L;


        /**
         * The type of the field
         */
        public final Type fieldType;

        /**
         * The name of the field
         */
        public final String fieldName;

        public TDItem(Type t, String n) {
            this.fieldName = n;
            this.fieldType = t;
        }



        public String toString() {
            return fieldName + "(" + fieldType + ")";
        }
    }

    /**
     * @return An iterator which iterates over all the field TDItems
     *         that are included in this TupleDesc
     */
    public Iterator<TDItem> iterator() {
        // some code goes here
        if(tdItems == null){
            return null;
        }
        return tdItems.iterator();
    }

    private static final long serialVersionUID = 1L;

    /**
     * Create a new TupleDesc with typeAr.length fields with fields of the
     * specified types, with associated named fields.
     *
     * @param typeAr  array specifying the number of and types of fields in this
     *                TupleDesc. It must contain at least one entry.
     * @param fieldAr array specifying the names of the fields. Note that names may
     *                be null.
     */
    public TupleDesc(Type[] typeAr, String[] fieldAr) {
        // some code goes here
        tdItems = new CopyOnWriteArrayList<>();
        for (int i = 0; i < typeAr.length; i++) {
            tdItems.add(new TDItem(typeAr[i],fieldAr[i]));
        }
    }

    /**
     * Constructor. Create a new tuple desc with typeAr.length fields with
     * fields of the specified types, with anonymous (unnamed) fields.
     *
     * @param typeAr array specifying the number of and types of fields in this
     *               TupleDesc. It must contain at least one entry.
     */
    public TupleDesc(Type[] typeAr) {
        // some code goes here
        tdItems = new CopyOnWriteArrayList<>();
        for (int i = 0; i < typeAr.length; i++) {
            tdItems.add(new TDItem(typeAr[i],null));
        }

    }
    public TupleDesc(){
        tdItems = new CopyOnWriteArrayList<>();
    }

    /**
     * @return the number of fields in this TupleDesc
     */
    public int numFields() {
        // some code goes here
        return tdItems.size();
    }

    /**
     * Gets the (possibly null) field name of the ith field of this TupleDesc.
     *
     * @param i index of the field name to return. It must be a valid index.
     * @return the name of the ith field
     * @throws NoSuchElementException if i is not a valid field reference.
     */
    public String getFieldName(int i) throws NoSuchElementException {
        if(tableAlias != null){
            return tableAlias+"."+tdItems.get(i).fieldName;
        }
        // some code goes here
        return tdItems.get(i).fieldName;
    }

    /**
     * Gets the type of the ith field of this TupleDesc.
     *
     * @param i The index of the field to get the type of. It must be a valid
     *          index.
     * @return the type of the ith field
     * @throws NoSuchElementException if i is not a valid field reference.
     */
    public Type getFieldType(int i) throws NoSuchElementException {
        // some code goes here
        if(i < 0 || i >= tdItems.size()){
            throw new NoSuchElementException("i is not a valid field reference.");
        }
        return tdItems.get(i).fieldType;

    }

    /**
     * Find the index of the field with a given name.
     *
     * @param name name of the field.
     * @return the index of the field that is first to have the given name.
     * @throws NoSuchElementException if no field with a matching name is found.
     */
    public int indexForFieldName(String name) throws NoSuchElementException {
        // some code goes here
        if(name == null){
            throw new NoSuchElementException("no field with a matching name is found.");
        }
        for (int i = 0; i < tdItems.size(); i++) {
            if(name.equals(tdItems.get(i).fieldName)){
                return i;
            }
        }
        throw new NoSuchElementException("no field with a matching name is found.");
    }

    /**
     * @return The size (in bytes) of tuples corresponding to this TupleDesc.
     *         Note that tuples from a given TupleDesc are of a fixed size.
     */
    public int getSize() {
        // TODO: some code goes here
        int size = 0;
        for (TDItem item : tdItems) {
            size += item.fieldType.getLen();
        }
        return size;
    }

    /**
     * Merge two TupleDescs into one, with td1.numFields + td2.numFields fields,
     * with the first td1.numFields coming from td1 and the remaining from td2.
     *
     * @param td1 The TupleDesc with the first fields of the new TupleDesc
     * @param td2 The TupleDesc with the last fields of the TupleDesc
     * @return the new TupleDesc
     */
    public static TupleDesc merge(TupleDesc td1, TupleDesc td2) {
        // some code goes here
        if (td1 == null) {
            return td2;
        }

        if (td2 == null) {
            return td1;
        }
        TupleDesc tupleDesc = new TupleDesc();
        for(int i = 0 ; i < td1.numFields() ; i++){
            tupleDesc.tdItems.add(td1.tdItems.get(i));
        }
        for (int i = 0; i < td2.numFields(); i++) {
            tupleDesc.tdItems.add(td2.tdItems.get(i));
        }
        return tupleDesc;
    }

    /**
     * Compares the specified object with this TupleDesc for equality. Two
     * TupleDescs are considered equal if they have the same number of items
     * and if the i-th type in this TupleDesc is equal to the i-th type in o
     * for every i.
     *
     * @param o the Object to be compared for equality with this TupleDesc.
     * @return true if the object is equal to this TupleDesc.
     */

    public boolean equals(Object o) {
        if (!(o instanceof TupleDesc)) {
            return false;
        }

        TupleDesc other = (TupleDesc) o;
        if (other.getSize() != this.getSize() || other.numFields() != this.numFields()) {
            return false;
        }

        for (int i = 0; i < this.numFields(); i++) {
            if (!this.getFieldType(i).equals(other.getFieldType(i))) {
                return false;
            }
        }

        return true;
    }

    public int hashCode() {
        // If you want to use TupleDesc as keys for HashMap, implement this so
        // that equal objects have equals hashCode() results
        int result = 0;
        for (TDItem item : tdItems) {
            result += item.toString().hashCode() * 41 ;
        }
        return result;
    }

    /**
     * Returns a String describing this descriptor. It should be of the form
     * "fieldType[0](fieldName[0]), ..., fieldType[M](fieldName[M])", although
     * the exact format does not matter.
     *
     * @return String describing this descriptor.
     */
    public String toString() {
        // some code goes here
        StringBuilder stringBuilder = new StringBuilder();
        for (int i = 0; i < this.numFields(); i++) {
            TDItem tdItem =  tdItems.get(i);
            stringBuilder.append("(")
                    .append(tdItem.fieldType.toString())
                    .append("[").append(i).append("]")
                    .append("(").append(tdItem.fieldName)
                    .append("[").append(i).append("])");
        }
        return stringBuilder.toString();
    }

    public void setTableAlias(String tableAlias){
        this.tableAlias = tableAlias;
    }
}
  • 测试结果:
    在这里插入图片描述
    在这里插入图片描述

2.2、Exercise2:Catalog

The catalog (class Catalog in SimpleDB) consists of a list of the tables and schemas of the tables that are currently in the database. You will need to support the ability to add a new table, as well as getting information about a particular table. Associated with each table is a TupleDesc object that allows operators to determine the types and number of fields in a table.
The global catalog is a single instance of Catalog that is allocated for the entire SimpleDB process. The global catalog can be retrieved via the method Database.getCatalog(), and the same goes for the global buffer pool ( using Database.getBufferPool()).

数据库中管理表的入口,类似于“数据库”的概念,但更多的是存储各个表的与其id或者name的映射,以便更好的去访问这些文件(表),也因此叫做目录。

  • Catalog Class:
/**
 * The Catalog keeps track of all available tables in the database and their
 * associated schemas.
 * For now, this is a stub catalog that must be populated with tables by a
 * user program before it can be used -- eventually, this should be converted
 * to a catalog that reads a catalog table from disk.
 *
 * @Threadsafe
 */
public class Catalog {

    ConcurrentHashMap<Integer,Table> tableIdMap;

    ConcurrentHashMap<String,Integer> tableNameMap;


    /**
     * Constructor.
     * Creates a new, empty catalog.
     */
    public Catalog() {
        // some code goes here
        tableIdMap = new ConcurrentHashMap<>();
        tableNameMap = new ConcurrentHashMap<>();
    }

    /**
     * Add a new table to the catalog.
     * This table's contents are stored in the specified DbFile.
     *
     * @param file      the contents of the table to add;  file.getId() is the identfier of
     *                  this file/tupledesc param for the calls getTupleDesc and getFile
     * @param name      the name of the table -- may be an empty string.  May not be null.  If a name
     *                  conflict exists, use the last table to be added as the table for a given name.
     * @param pkeyField the name of the primary key field
     */
    public void addTable(DbFile file, String name, String pkeyField) {
        // some code goes here
        tableIdMap.put(file.getId(),new Table(file,name,pkeyField));
        tableNameMap.put(name,file.getId());
    }

    public void addTable(DbFile file, String name) {
        addTable(file, name, "");
    }

    /**
     * Add a new table to the catalog.
     * This table has tuples formatted using the specified TupleDesc and its
     * contents are stored in the specified DbFile.
     *
     * @param file the contents of the table to add;  file.getId() is the identfier of
     *             this file/tupledesc param for the calls getTupleDesc and getFile
     */
    public void addTable(DbFile file) {
        addTable(file, (UUID.randomUUID()).toString());
    }

    /**
     * Return the id of the table with a specified name,
     *
     * @throws NoSuchElementException if the table doesn't exist
     */
    public int getTableId(String name) throws NoSuchElementException {
        // some code goes here

        if(name != null && tableNameMap.containsKey(name)){
            return tableNameMap.get(name);
        }
        throw new NoSuchElementException("table doesn't exist!");

    }

    /**
     * Returns the tuple descriptor (schema) of the specified table
     *
     * @param tableid The id of the table, as specified by the DbFile.getId()
     *                function passed to addTable
     * @throws NoSuchElementException if the table doesn't exist
     */
    public TupleDesc getTupleDesc(int tableid) throws NoSuchElementException {
        // some code goes here
        if(tableIdMap.containsKey(tableid)){
            return tableIdMap.get(tableid).getFile().getTupleDesc();
        }
        throw new NoSuchElementException("table doesn't exist!");

    }

    /**
     * Returns the DbFile that can be used to read the contents of the
     * specified table.
     *
     * @param tableid The id of the table, as specified by the DbFile.getId()
     *                function passed to addTable
     */
    public DbFile getDatabaseFile(int tableid) throws NoSuchElementException {
        // some code goes here
        if(tableIdMap.containsKey(tableid)){
            return tableIdMap.get(tableid).getFile();
        }
        throw new NoSuchElementException("table doesn't exist!");

    }

    public String getPrimaryKey(int tableid) {
        // some code goes here
        if(tableIdMap.containsKey(tableid)){
            return tableIdMap.get(tableid).getPkeyField();
        }
        throw new NoSuchElementException("table doesn't exist!");
    }

    public Iterator<Integer> tableIdIterator() {
        // some code goes here
        return tableIdMap.keySet().iterator();
    }

    public String getTableName(int id) {
        // some code goes here
        if(tableIdMap.containsKey(id)){
            return tableIdMap.get(id).getName();
        }
        throw new NoSuchElementException("table doesn't exist!");
    }

    /**
     * Delete all tables from the catalog
     */
    public void clear() {
        // some code goes here
        tableIdMap.clear();
    }

    /**
     * Reads the schema from a file and creates the appropriate tables in the database.
     *
     * @param catalogFile
     */
    public void loadSchema(String catalogFile) {
        String line = "";
        String baseFolder = new File(new File(catalogFile).getAbsolutePath()).getParent();
        try {
            BufferedReader br = new BufferedReader(new FileReader(catalogFile));

            while ((line = br.readLine()) != null) {
                //assume line is of the format name (field type, field type, ...)
                String name = line.substring(0, line.indexOf("(")).trim();
                //System.out.println("TABLE NAME: " + name);
                String fields = line.substring(line.indexOf("(") + 1, line.indexOf(")")).trim();
                String[] els = fields.split(",");
                List<String> names = new ArrayList<>();
                List<Type> types = new ArrayList<>();
                String primaryKey = "";
                for (String e : els) {
                    String[] els2 = e.trim().split(" ");
                    names.add(els2[0].trim());
                    if (els2[1].trim().equalsIgnoreCase("int"))
                        types.add(Type.INT_TYPE);
                    else if (els2[1].trim().equalsIgnoreCase("string"))
                        types.add(Type.STRING_TYPE);
                    else {
                        System.out.println("Unknown type " + els2[1]);
                        System.exit(0);
                    }
                    if (els2.length == 3) {
                        if (els2[2].trim().equals("pk"))
                            primaryKey = els2[0].trim();
                        else {
                            System.out.println("Unknown annotation " + els2[2]);
                            System.exit(0);
                        }
                    }
                }
                Type[] typeAr = types.toArray(new Type[0]);
                String[] namesAr = names.toArray(new String[0]);
                TupleDesc t = new TupleDesc(typeAr, namesAr);
                HeapFile tabHf = new HeapFile(new File(baseFolder + "/" + name + ".dat"), t);
                addTable(tabHf, name, primaryKey);
                System.out.println("Added table : " + name + " with schema " + t);
            }
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(0);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("Invalid catalog entry : " + line);
            System.exit(0);
        }
    }
}

这里自己定义了一个Table Class:

@Data
public class Table {

    /**
     * the contents of the table to add;  file.getId() is the identfier of
     * this file/tupledesc param for the calls getTupleDesc and getFile
     */
    private DbFile file;

    /**
     * the name of the table -- may be an empty string.  May not be null.  If a name
     * conflict exists, use the last table to be added as the table for a given name.
     */
    private String name;

    /**
     * the name of the primary key field
     */
    private String pkeyField;

    public Table(DbFile file,String name,String pkeyField){
        this.file = file;
        this.name = name;
        this.pkeyField = pkeyField;
    }

    public Table(DbFile file,String name){
        new Table(file,name,"");
    }


}
  • 测试结果:
    在这里插入图片描述

2.3、Exercise3:BufferPool

The buffer pool (class BufferPool in SimpleDB) is responsible for caching pages in memory that have been recently read from disk. All operators read and write pages from various files on disk through the buffer pool. It consists of a fixed number of pages, defined by the numPages parameter to the BufferPool constructor. In later labs, you will implement an eviction policy. For this lab, you only need to implement the constructor and the BufferPool.getPage() method used by the SeqScan operator. The BufferPool should store up to numPages pages. For this lab, if more than numPages requests are made for different pages, then instead of implementing an eviction policy, you may throw a DbException. In future labs you will be required to implement an eviction policy.
The Database class provides a static method, Database.getBufferPool(), that returns a reference to the single BufferPool instance for the entire SimpleDB process.

虽然数据存储在磁盘中,但是也不能每次都从磁盘中读数据,这样性能是极差的。也因此为了提升查询性能,后续数据从磁盘中取出后,缓存在内存中,下次查询同样的数据的时候,直接从内存中读取、处理。而这读取的数据,则被分为若干的页,以页作为磁盘和内存之间的交互的基本单位。(对于InnoDB也是如此,并且页的基本单位为16KB)。也就是下一个Exercise要实现的内容:堆页

  • BufferPool Class
/**
 * BufferPool manages the reading and writing of pages into memory from
 * disk. Access methods call into it to retrieve pages, and it fetches
 * pages from the appropriate location.
 * <p>
 * The BufferPool is also responsible for locking;  when a transaction fetches
 * a page, BufferPool checks that the transaction has the appropriate
 * locks to read/write the page.
 *
 * @Threadsafe, all fields are final
 */
public class BufferPool {
    /**
     * Bytes per page, including header.
     */
    private static final int DEFAULT_PAGE_SIZE = 4096;

    private static int pageSize = DEFAULT_PAGE_SIZE;

    /**
     * Default number of pages passed to the constructor. This is used by
     * other classes. BufferPool should use the numPages argument to the
     * constructor instead.
     */
    public static final int DEFAULT_PAGES = 50;

    private final int numPages;
    private final Map<PageId, Page> bufferPool = new ConcurrentHashMap<>();
    /**
     * Creates a BufferPool that caches up to numPages pages.
     *
     * @param numPages maximum number of pages in this buffer pool.
     */
    public BufferPool(int numPages) {
        // some code goes here
        this.numPages = numPages;
    }

    public static int getPageSize() {
        return pageSize;
    }

    // THIS FUNCTION SHOULD ONLY BE USED FOR TESTING!!
    public static void setPageSize(int pageSize) {
        BufferPool.pageSize = pageSize;
    }

    // THIS FUNCTION SHOULD ONLY BE USED FOR TESTING!!
    public static void resetPageSize() {
        BufferPool.pageSize = DEFAULT_PAGE_SIZE;
    }

    /**
     * Retrieve the specified page with the associated permissions.
     * Will acquire a lock and may block if that lock is held by another
     * transaction.
     * <p>
     * The retrieved page should be looked up in the buffer pool.  If it
     * is present, it should be returned.  If it is not present, it should
     * be added to the buffer pool and returned.  If there is insufficient
     * space in the buffer pool, a page should be evicted and the new page
     * should be added in its place.
     *
     * @param tid  the ID of the transaction requesting the page
     * @param pid  the ID of the requested page
     * @param perm the requested permissions on the page
     */
    public Page getPage(TransactionId tid, PageId pid, Permissions perm)
            throws TransactionAbortedException, DbException {
        // some code goes here
        // TODO 事务..
        // bufferPool应直接放在直接内存
       if(!bufferPool.containsKey(pid)){
           DbFile file = Database.getCatalog().getDatabaseFile(pid.getTableId());
           Page page = file.readPage(pid);
           bufferPool.put(pid,page);
        }
       return bufferPool.get(pid);

    }

    /**
     * Releases the lock on a page.
     * Calling this is very risky, and may result in wrong behavior. Think hard
     * about who needs to call this and why, and why they can run the risk of
     * calling it.
     *
     * @param tid the ID of the transaction requesting the unlock
     * @param pid the ID of the page to unlock
     */
    public void unsafeReleasePage(TransactionId tid, PageId pid) {
        // TODO: some code goes here
        // not necessary for lab1|lab2
    }

    /**
     * Release all locks associated with a given transaction.
     *
     * @param tid the ID of the transaction requesting the unlock
     */
    public void transactionComplete(TransactionId tid) {
        // TODO: some code goes here
        // not necessary for lab1|lab2
    }

    /**
     * Return true if the specified transaction has a lock on the specified page
     */
    public boolean holdsLock(TransactionId tid, PageId p) {
        // TODO: some code goes here
        // not necessary for lab1|lab2
        return false;
    }

    /**
     * Commit or abort a given transaction; release all locks associated to
     * the transaction.
     *
     * @param tid    the ID of the transaction requesting the unlock
     * @param commit a flag indicating whether we should commit or abort
     */
    public void transactionComplete(TransactionId tid, boolean commit) {
        // TODO: some code goes here
        // not necessary for lab1|lab2
    }

    /**
     * Add a tuple to the specified table on behalf of transaction tid.  Will
     * acquire a write lock on the page the tuple is added to and any other
     * pages that are updated (Lock acquisition is not needed for lab2).
     * May block if the lock(s) cannot be acquired.
     * <p>
     * Marks any pages that were dirtied by the operation as dirty by calling
     * their markDirty bit, and adds versions of any pages that have
     * been dirtied to the cache (replacing any existing versions of those pages) so
     * that future requests see up-to-date pages.
     *
     * @param tid     the transaction adding the tuple
     * @param tableId the table to add the tuple to
     * @param t       the tuple to add
     */
    public void insertTuple(TransactionId tid, int tableId, Tuple t)
            throws DbException, IOException, TransactionAbortedException {
        // TODO: some code goes here
        // not necessary for lab1
    }

    /**
     * Remove the specified tuple from the buffer pool.
     * Will acquire a write lock on the page the tuple is removed from and any
     * other pages that are updated. May block if the lock(s) cannot be acquired.
     * <p>
     * Marks any pages that were dirtied by the operation as dirty by calling
     * their markDirty bit, and adds versions of any pages that have
     * been dirtied to the cache (replacing any existing versions of those pages) so
     * that future requests see up-to-date pages.
     *
     * @param tid the transaction deleting the tuple.
     * @param t   the tuple to delete
     */
    public void deleteTuple(TransactionId tid, Tuple t)
            throws DbException, IOException, TransactionAbortedException {
        // TODO: some code goes here
        // not necessary for lab1
    }

    /**
     * Flush all dirty pages to disk.
     * NB: Be careful using this routine -- it writes dirty data to disk so will
     * break simpledb if running in NO STEAL mode.
     */
    public synchronized void flushAllPages() throws IOException {
        // TODO: some code goes here
        // not necessary for lab1

    }

    /**
     * Remove the specific page id from the buffer pool.
     * Needed by the recovery manager to ensure that the
     * buffer pool doesn't keep a rolled back page in its
     * cache.
     * <p>
     * Also used by B+ tree files to ensure that deleted pages
     * are removed from the cache so they can be reused safely
     */
    public synchronized void removePage(PageId pid) {
        // TODO: some code goes here
        // not necessary for lab1
    }

    /**
     * Flushes a certain page to disk
     *
     * @param pid an ID indicating the page to flush
     */
    private synchronized void flushPage(PageId pid) throws IOException {
        // TODO: some code goes here
        // not necessary for lab1
    }

    /**
     * Write all pages of the specified transaction to disk.
     */
    public synchronized void flushPages(TransactionId tid) throws IOException {
        // TODO: some code goes here
        // not necessary for lab1|lab2
    }

    /**
     * Discards a page from the buffer pool.
     * Flushes the page to disk to ensure dirty pages are updated on disk.
     */
    private synchronized void evictPage() throws DbException {
        // TODO: some code goes here
        // not necessary for lab1
    }

}

2.4、Exercise4:HeapFile access method

Access methods provide a way to read or write data from disk that is arranged in a specific way. Common access methods include heap files (unsorted files of tuples) and B-trees; for this assignment, you will only implement a heap file access method, and we have written some of the code for you.
A HeapFile object is arranged into a set of pages, each of which consists of a fixed number of bytes for storing tuples, (defined by the constant BufferPool.DEFAULT_PAGE_SIZE), including a header. In SimpleDB, there is one HeapFile object for each table in the database. Each page in a HeapFile is arranged as a set of slots, each of which can hold one tuple (tuples for a given table in SimpleDB are all of the same size). In addition to these slots, each page has a header that consists of a bitmap with one bit per tuple slot. If the bit corresponding to a particular tuple is 1, it indicates that the tuple is valid; if it is 0, the tuple is invalid (e.g., has been deleted or was never initialized.) Pages of HeapFile objects are of type HeapPage which implements the Page interface. Pages are stored in the buffer pool but are read and written by the HeapFile class.

  • 简单来说,对于堆页:内部有多个槽的存储空间,每个槽中可以储存一个元组。同时还有一个head的bitmap位图,每个位上用来判断每个槽是否已经被存储了一个元组。标题为Heapile access method,其实可以看出来堆页储存的就是下一个Exercise堆文件要实现的具体数据。

回到lab中,HeapPage这里还有要注意的可能就是TuplesNum、Header的计算。introduce中也给出了具体的计算公式:

  • tuples per page = floor((page size * 8) / (tuple size * 8 + 1))
  • header bytes = ceiling(tuples per page / 8)

只要记住这里的计算都要考虑到bit、与byte的换算就没太大问题。

  • HeapPageId Class:
    这个类主要也是储存HeapFile与HeaPage的映射关系。每个tableId,都应该有多个pgNo也就是多个页,也就是一对多的关系。
/**
 * Unique identifier for HeapPage objects.
 */
public class HeapPageId implements PageId {

    /**
     * The table that is being referenced
     */
    private int tableId;

    /**
     * The page number in that table.
     */
    private int pgNo;
    /**
     * Constructor. Create a page id structure for a specific page of a
     * specific table.
     * 理解为是表id(HeapPageId)与其子页的关系映射的数据结构
     * 如表1-1 表1-2...
     * @param tableId The table that is being referenced
     * @param pgNo    The page number in that table.
     */
    public HeapPageId(int tableId, int pgNo) {
        //  some code goes here
        this.tableId = tableId;
        this.pgNo = pgNo;
    }

    /**
     * @return the table associated with this PageId
     */
    public int getTableId() {
        // TODO: some code goes here
        return tableId;
    }

    /**
     * @return the page number in the table getTableId() associated with
     *         this PageId
     */
    public int getPageNumber() {
        // TODO: some code goes here
        return pgNo;
    }

    public void setTableId(int tableId){
        this.tableId = tableId;
    }

    public void setPgNo(int pgNo){
        this.pgNo = pgNo;
    }

    /**
     * @return a hash code for this page, represented by a combination of
     *         the table number and the page number (needed if a PageId is used as a
     *         key in a hash table in the BufferPool, for example.)
     * @see BufferPool
     */
    public int hashCode() {
        // TODO: some code goes here
        // throw new UnsupportedOperationException("implement this");
        return getPageNumber()*1000 + getTableId();
    }

    /**
     * Compares one PageId to another.
     *
     * @param o The object to compare against (must be a PageId)
     * @return true if the objects are equal (e.g., page numbers and table
     *         ids are the same)
     */
    public boolean equals(Object o) {
        // TODO: some code goes here
        if (!(o instanceof  HeapPageId)) { return false; }

        HeapPageId other = (HeapPageId) o;
        if (this.getPageNumber() == other.getPageNumber() && this.getTableId() == other.getTableId()) {
            return true;
        }
        return false;
    }

    /**
     * Return a representation of this object as an array of
     * integers, for writing to disk.  Size of returned array must contain
     * number of integers that corresponds to number of args to one of the
     * constructors.
     */
    public int[] serialize() {
        int[] data = new int[2];

        data[0] = getTableId();
        data[1] = getPageNumber();

        return data;
    }

}
  • HeapPage Class:
/**
 * Each instance of HeapPage stores data for one page of HeapFiles and
 * implements the Page interface that is used by BufferPool.
 *
 * @see HeapFile
 * @see BufferPool
 */
public class HeapPage implements Page {

    final HeapPageId pid;

    final TupleDesc td;

    /**
             * the lowest bit of the first byte represents whether or not the first slot in the page is in use.
     * The second lowest bit of the first byte represents whether or not the second slot in the page is in use
     */
    final byte[] header;
    final Tuple[] tuples;

    /**
     * Each page in a HeapFile is arranged as a set of slots, each of which can hold one tuple (tuples for a given table
     * in SimpleDB are all of the same size)
     */
    final int numSlots;

    byte[] oldData;
    private final Byte oldDataLock = (byte) 0;

    /**
     * Create a HeapPage from a set of bytes of data read from disk.
     * The format of a HeapPage is a set of header bytes indicating
     * the slots of the page that are in use, some number of tuple slots.
     * Specifically, the number of tuples is equal to: <p>
     * floor((BufferPool.getPageSize()*8) / (tuple size * 8 + 1))
     * <p> where tuple size is the size of tuples in this
     * database table, which can be determined via {@link Catalog#getTupleDesc}.
     * The number of 8-bit header words is equal to:
     * <p>
     * ceiling(no. tuple slots / 8)
     * <p>
     *
     * @see Database#getCatalog
     * @see Catalog#getTupleDesc
     * @see BufferPool#getPageSize()
     */
    public HeapPage(HeapPageId id, byte[] data) throws IOException {
        this.pid = id;
        this.td = Database.getCatalog().getTupleDesc(id.getTableId());
        this.numSlots = getNumTuples();
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data));

        // allocate and read the header slots of this page
        header = new byte[getHeaderSize()];
        for (int i = 0; i < header.length; i++)
            header[i] = dis.readByte();

        tuples = new Tuple[numSlots];
        try {
            // allocate and read the actual records of this page
            for (int i = 0; i < tuples.length; i++)
                tuples[i] = readNextTuple(dis, i);
        } catch (NoSuchElementException e) {
            e.printStackTrace();
        }
        dis.close();

        setBeforeImage();
    }

    /**
     * Retrieve the number of tuples on this page.
     *
     * @return the number of tuples on this page
     */
    private int getNumTuples() {
        // some code goes here
        return (BufferPool.getPageSize() * 8) / (td.getSize() * 8 +1);

    }

    /**
     * Computes the number of bytes in the header of a page in a HeapFile with each tuple occupying tupleSize bytes
     *
     * @return the number of bytes in the header of a page in a HeapFile with each tuple occupying tupleSize bytes
     */
    private int getHeaderSize() {
        // some code goes here
        return (int) Math.ceil((double) getNumTuples() / 8);

    }

    /**
     * Return a view of this page before it was modified
     * -- used by recovery
     */
    public HeapPage getBeforeImage() {
        try {
            byte[] oldDataRef = null;
            synchronized (oldDataLock) {
                oldDataRef = oldData;
            }
            return new HeapPage(pid, oldDataRef);
        } catch (IOException e) {
            e.printStackTrace();
            //should never happen -- we parsed it OK before!
            System.exit(1);
        }
        return null;
    }

    public void setBeforeImage() {
        synchronized (oldDataLock) {
            oldData = getPageData().clone();
        }
    }

    /**
     * @return the PageId associated with this page.
     */
    public HeapPageId getId() {
        // some code goes here
        return this.pid;
    }

    /**
     * Suck up tuples from the source file.
     */
    private Tuple readNextTuple(DataInputStream dis, int slotId) throws NoSuchElementException {
        // if associated bit is not set, read forward to the next tuple, and
        // return null.
        if (!isSlotUsed(slotId)) {
            for (int i = 0; i < td.getSize(); i++) {
                try {
                    dis.readByte();
                } catch (IOException e) {
                    System.out.println("slotId:"+slotId+" is empty;");
                    throw new NoSuchElementException("error reading empty tuple");
                }
            }
            return null;
        }

        // read fields in the tuple
        Tuple t = new Tuple(td);
        RecordId rid = new RecordId(pid, slotId);
        t.setRecordId(rid);
        try {
            for (int j = 0; j < td.numFields(); j++) {
                Field f = td.getFieldType(j).parse(dis);
                t.setField(j, f);
            }
        } catch (java.text.ParseException e) {
            e.printStackTrace();
            throw new NoSuchElementException("parsing error!");
        }

        return t;
    }

    /**
     * Generates a byte array representing the contents of this page.
     * Used to serialize this page to disk.
     * <p>
     * The invariant here is that it should be possible to pass the byte
     * array generated by getPageData to the HeapPage constructor and
     * have it produce an identical HeapPage object.
     *
     * @return A byte array correspond to the bytes of this page.
     * @see #HeapPage
     */
    public byte[] getPageData() {
        int len = BufferPool.getPageSize();
        ByteArrayOutputStream baos = new ByteArrayOutputStream(len);
        DataOutputStream dos = new DataOutputStream(baos);

        // create the header of the page
        for (byte b : header) {
            try {
                dos.writeByte(b);
            } catch (IOException e) {
                // this really shouldn't happen
                e.printStackTrace();
            }
        }

        // create the tuples
        for (int i = 0; i < tuples.length; i++) {

            // empty slot
            if (!isSlotUsed(i)) {
                for (int j = 0; j < td.getSize(); j++) {
                    try {
                        dos.writeByte(0);
                    } catch (IOException e) {
                        e.printStackTrace();
                    }

                }
                continue;
            }

            // non-empty slot
            for (int j = 0; j < td.numFields(); j++) {
                Field f = tuples[i].getField(j);
                try {
                    f.serialize(dos);

                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }

        // padding
        int zerolen = BufferPool.getPageSize() - (header.length + td.getSize() * tuples.length); //- numSlots * td.getSize();
        byte[] zeroes = new byte[zerolen];
        try {
            dos.write(zeroes, 0, zerolen);
        } catch (IOException e) {
            e.printStackTrace();
        }

        try {
            dos.flush();
        } catch (IOException e) {
            e.printStackTrace();
        }

        return baos.toByteArray();
    }

    /**
     * Static method to generate a byte array corresponding to an empty
     * HeapPage.
     * Used to add new, empty pages to the file. Passing the results of
     * this method to the HeapPage constructor will create a HeapPage with
     * no valid tuples in it.
     *
     * @return The returned ByteArray.
     */
    public static byte[] createEmptyPageData() {
        int len = BufferPool.getPageSize();
        return new byte[len]; //all 0
    }

    /**
     * Delete the specified tuple from the page; the corresponding header bit should be updated to reflect
     * that it is no longer stored on any page.
     *
     * @param t The tuple to delete
     * @throws DbException if this tuple is not on this page, or tuple slot is
     *                     already empty.
     */
    public void deleteTuple(Tuple t) throws DbException {
        // TODO: some code goes here
        // not necessary for lab1
    }

    /**
     * Adds the specified tuple to the page;  the tuple should be updated to reflect
     * that it is now stored on this page.
     *
     * @param t The tuple to add.
     * @throws DbException if the page is full (no empty slots) or tupledesc
     *                     is mismatch.
     */
    public void insertTuple(Tuple t) throws DbException {
        // TODO: some code goes here
        // not necessary for lab1
    }

    /**
     * Marks this page as dirty/not dirty and record that transaction
     * that did the dirtying
     */
    public void markDirty(boolean dirty, TransactionId tid) {
        // TODO: some code goes here
        // not necessary for lab1
    }

    /**
     * Returns the tid of the transaction that last dirtied this page, or null if the page is not dirty
     */
    public TransactionId isDirty() {
        // TODO: some code goes here
        // Not necessary for lab1
        return null;      
    }

    /**
     * Returns the number of unused (i.e., empty) slots on this page.
     * 计算未使用的slot:计算header数组中bit为0
     */
    public int getNumUnusedSlots() {
        // some code goes here
        int cnt = 0;
        for(int i=0;i<numSlots;++i){
            if(!isSlotUsed(i)){
                ++cnt;
            }
        }
        return cnt;

    }

    /**
     * Returns true if associated slot on this page is filled.
     */
    public boolean isSlotUsed(int i) {
        // some code goes here
        // 计算在header中的位置
        int iTh = i / 8;
        // 计算具体在bitmap中的位置
        int bitTh = i % 8;
        int onBit = (header[iTh] >> bitTh) & 1;
        return onBit == 1;
    }

    /**
     * Abstraction to fill or clear a slot on this page.
     */
    private void markSlotUsed(int i, boolean value) {
        // TODO: some code goes here
        // not necessary for lab1
    }

    /**
     * @return an iterator over all tuples on this page (calling remove on this iterator throws an UnsupportedOperationException)
     *         (note that this iterator shouldn't return tuples in empty slots!)
     */
    public Iterator<Tuple> iterator() {
        // some code goes here
        List<Tuple> tupleList = new ArrayList<>();
        // 判断是否在empty slots
        for(int i = 0; i < numSlots ;i++){
            if(isSlotUsed(i)) tupleList.add(tuples[i]);
        }
        return tupleList.iterator();
    }

}

  • RecordId Class:
/**
 * A RecordId is a reference to a specific tuple on a specific page of a
 * specific table.
 */
public class RecordId implements Serializable {

    private static final long serialVersionUID = 1L;

    /**
     * the pageid of the page on which the tuple resides
     */
    private PageId pid;

    /**
     * the tuple number within the page.
     */
    private int tupleNo ;

    /**
     * Creates a new RecordId referring to the specified PageId and tuple
     * number.
     *
     * @param pid     the pageid of the page on which the tuple resides
     * @param tupleno the tuple number within the page.
     */
    public RecordId(PageId pid, int tupleno) {
        // TODO: some code goes here
        this.pid = pid;
        this.tupleNo = tupleno;
    }

    /**
     * @return the tuple number this RecordId references.
     */
    public int getTupleNumber() {
        // TODO: some code goes here
        return tupleNo;
    }

    /**
     * @return the page id this RecordId references.
     */
    public PageId getPageId() {
        // TODO: some code goes here
        return pid;
    }

    /**
     * Two RecordId objects are considered equal if they represent the same
     * tuple.
     *
     * @return True if this and o represent the same tuple
     */
    @Override
    public boolean equals(Object o) {
        // TODO: some code goes here
        //throw new UnsupportedOperationException("implement this");
        if (!(o instanceof  RecordId))
        {
            return false;
        }

        RecordId other = (RecordId) o;
        if (this.pid.equals(other.getPageId()) && this.getTupleNumber() == other.getTupleNumber()) {
            return true;
        } else {
            return false;
        }
    }

    /**
     * You should implement the hashCode() so that two equal RecordId instances
     * (with respect to equals()) have the same hashCode().
     *
     * @return An int that is the same for equal RecordId objects.
     */
    @Override
    public int hashCode() {
        // TODO: some code goes here
        return this.pid.getTableId()*100+this.pid.getPageNumber()*10+this.tupleNo;

    }

}

  • 测试结果:
    在这里插入图片描述
    在这里插入图片描述
    在这里插入图片描述

2.5、Exercise5:HeapFile

To read a page from disk, you will first need to calculate the correct offset in the file. Hint: you will need random access to the file in order to read and write pages at arbitrary offsets. You should not call BufferPool instance methods when reading a page from disk.

  • 简单来说,堆文件是DBFile的一种实现,储存在磁盘上属于随机存储的,也因此如果被读取到BufferPool上,这个文件则被存储BufferPool的多个页上
  • 这里还有一个点就是,对于一个DBFile就是就是一个table,而上文的catalog的table其实就是对应的这里的文件id。(具体见DBFile的doc),也因此每个文件都有一个TupleDesc对应着表的schema。
    这里笔者先画一个图总结一下大致关系:
    在这里插入图片描述
    回到lab中,因为是随机存储的,所以就要去计算文件的offset(偏移量),也就是数据位于具体的哪一页中。
  • 而这里要注意的一个细节则是之前BufferPool缓存的应该是完整的页数据。(就如同innoDB中小于16KB的数据同样不会被加载到内存中耗费资源,每次从内存中刷进磁盘的数据基本是大于16KB)。
/**
 * HeapFile is an implementation of a DbFile that stores a collection of tuples
 * in no particular order. Tuples are stored on pages, each of which is a fixed
 * size, and the file is simply a collection of those pages. HeapFile works
 * closely with HeapPage. The format of HeapPages is described in the HeapPage
 * constructor.
 *
 * @author Sam Madden
 * @see HeapPage#HeapPage
 */
public class HeapFile implements DbFile {

    /**
     * f the file that stores the on-disk backing store for this heap file.
     */
    private final File f;

    /**
     * 文件描述(consist of records)
     */
    private final TupleDesc td;


    /**
     * 写在内部类的原因是:DbFileIterator is the iterator interface that all SimpleDB Dbfile should
     */
    private static final class HeapFileIterator implements DbFileIterator {
        private final HeapFile heapFile;
        private final TransactionId tid;

        /**
         * 存储了堆文件迭代器
         */
        private Iterator<Tuple> tupleIterator;
        private int index;

        public HeapFileIterator(HeapFile file,TransactionId tid){
            this.heapFile = file;
            this.tid = tid;
        }
        @Override
        public void open() throws DbException, TransactionAbortedException {
            index = 0;
            tupleIterator = getTupleIterator(index);
        }

        private Iterator<Tuple> getTupleIterator(int pageNumber) throws TransactionAbortedException, DbException{
            if(pageNumber >= 0 && pageNumber < heapFile.numPages()){
                HeapPageId pid = new HeapPageId(heapFile.getId(),pageNumber);
                HeapPage page = (HeapPage)Database.getBufferPool().getPage(tid, pid, Permissions.READ_ONLY);
                return page.iterator();
            }else{
                throw new DbException(String.format("heapFile %d  does not exist in page[%d]!", pageNumber,heapFile.getId()));
            }
        }

        @Override
        public boolean hasNext() throws DbException, TransactionAbortedException {
            // TODO Auto-generated method stub
            if(tupleIterator == null){
                return false;
            }

            if(tupleIterator.hasNext()){
                return true;
            }else{
                if(index < (heapFile.numPages()-1)){
                    index++;
                    tupleIterator = getTupleIterator(index);
                    return tupleIterator.hasNext();
                }else{
                    return false;
                }
            }
        }

        @Override
        public Tuple next() throws DbException, TransactionAbortedException, NoSuchElementException {
            if(tupleIterator == null || !tupleIterator.hasNext()){
                throw new NoSuchElementException();
            }
            return tupleIterator.next();
        }

        @Override
        public void rewind() throws DbException, TransactionAbortedException {
            close();
            open();
        }

        @Override
        public void close() {
            tupleIterator = null;
        }

    }

    /**
     * Constructs a heap file backed by the specified file.
     *
     * @param f the file that stores the on-disk backing store for this heap
     *          file.
     */
    public HeapFile(File f, TupleDesc td) {
        // some code goes here
        this.f = f;
        this.td = td;
    }

    /**
     * Returns the File backing this HeapFile on disk.
     *
     * @return the File backing this HeapFile on disk.
     */
    public File getFile() {
        // some code goes here
        return f;
    }

    /**
     * Returns an ID uniquely identifying this HeapFile. Implementation note:
     * you will need to generate this tableid somewhere to ensure that each
     * HeapFile has a "unique id," and that you always return the same value for
     * a particular HeapFile. We suggest hashing the absolute file name of the
     * file underlying the heapfile, i.e. f.getAbsoluteFile().hashCode().
     *
     * @return an ID uniquely identifying this HeapFile.
     */
    public int getId() {
        // some code goes here
        return f.getAbsoluteFile().hashCode();
    }

    /**
     * Returns the TupleDesc of the table stored in this DbFile.
     *
     * @return TupleDesc of this DbFile.
     */
    public TupleDesc getTupleDesc() {
        // some code goes here
        return this.td;
    }

    // see DbFile.java for javadocs
    public Page readPage(PageId pid) {
        // some code goes here
        int tableId = pid.getTableId();
        int pgNo = pid.getPageNumber();
        int offset = pgNo * BufferPool.getPageSize();
        RandomAccessFile randomAccessFile = null;

        try{
            randomAccessFile = new RandomAccessFile(f,"r");
            // 起码有pgNo页那么大小就应该大于pgNo
            if((long) (pgNo + 1) *BufferPool.getPageSize() > randomAccessFile.length()){
                randomAccessFile.close();
                throw new IllegalArgumentException(String.format("table %d page %d is invalid", tableId, pgNo));
            }
            byte[] bytes = new byte[BufferPool.getPageSize()];
            // 移动偏移量到文件开头,并计算是否change
            randomAccessFile.seek(offset);
            int read = randomAccessFile.read(bytes,0,BufferPool.getPageSize());
            // Do not load the entire table into memory on the open() call
            // -- this will cause an out of memory error for very large tables.
            if(read != BufferPool.getPageSize()){
                throw new IllegalArgumentException(String.format("table %d page %d read %d bytes not equal to BufferPool.getPageSize() ", tableId, pgNo, read));
            }
            HeapPageId id = new HeapPageId(pid.getTableId(),pid.getPageNumber());
            return new HeapPage(id,bytes);
        }catch (IOException e){
            e.printStackTrace();
        }finally {
            try{
                if(randomAccessFile != null){
                    randomAccessFile.close();
                }
            }catch (Exception e){
                e.printStackTrace();
            }
        }
        throw new IllegalArgumentException(String.format("table %d page %d is invalid", tableId, pgNo));
    }

    // see DbFile.java for javadocs
    public void writePage(Page page) throws IOException {
        // TODO: some code goes here
        // not necessary for lab1
    }

    /**
     * Returns the number of pages in this HeapFile.
     */
    public int numPages() {
        // some code goes here
        // 通过文件长度算出所在bufferPool所需的页数
        System.out.println(String.format("file length :「 %d 」,BufferPool PageSize :「 %d 」 ",getFile().length(),BufferPool.getPageSize()));
        return (int) Math.floor(getFile().length() * 1.0 / BufferPool.getPageSize());
    }

    // see DbFile.java for javadocs
    public List<Page> insertTuple(TransactionId tid, Tuple t)
            throws DbException, IOException, TransactionAbortedException {
        // TODO: some code goes here
        return null;
        // not necessary for lab1
    }

    // see DbFile.java for javadocs
    public List<Page> deleteTuple(TransactionId tid, Tuple t) throws DbException,
            TransactionAbortedException {
        // TODO: some code goes here
        return null;
        // not necessary for lab1
    }

    // see DbFile.java for javadocs

    /**
     * HeapFile与Heap为一一对应的关系,所以其实是获取BufferPool中对应页的元组迭代器
     */
    public DbFileIterator iterator(TransactionId tid) {
        //some code goes here
        return new HeapFileIterator(this,tid);

    }

}

  • 测试结果:
    在这里插入图片描述

2.6、Exercise6:Operators

Operators are responsible for the actual execution of the query plan. They implement the operations of the relational algebra. In SimpleDB, operators are iterator based; each operator implements the DbIterator interface.
Operators are connected together into a plan by passing lower-level operators into the constructors of higher-level operators, i.e., by “chaining them together.” Special access method operators at the leaves of the plan are responsible for reading data from the disk (and hence do not have any operators below them).

这里的Operators其实更像于后端是怎么进行具体的查询,也就是query plan。在本次Exercise实现的更像是包装了一个扫描数据的入口,也就是又一层对外开放的迭代器。

  • SeqScan Class:
/**
 * SeqScan is an implementation of a sequential scan access method that reads
 * each tuple of a table in no particular order (e.g., as they are laid out on
 * disk).
 */
public class SeqScan implements OpIterator {

    private TransactionId tid;
    private int tableId;
    private String tableAlias;
    private DbFile dbFile;
    private DbFileIterator dbFileIterator;

    /**
     * Creates a sequential scan over the specified table as a part of the
     * specified transaction.
     *
     * @param tid        The transaction this scan is running as a part of.
     * @param tableid    the table to scan.
     * @param tableAlias the alias of this table (needed by the parser); the returned
     *                   tupleDesc should have fields with name tableAlias.fieldName
     *                   (note: this class is not responsible for handling a case where
     *                   tableAlias or fieldName are null. It shouldn't crash if they
     *                   are, but the resulting name can be null.fieldName,
     *                   tableAlias.null, or null.null).
     */
    public SeqScan(TransactionId tid, int tableid, String tableAlias) {
        this.tid = tid;
        this.tableId = tableid;
        this.tableAlias = tableAlias;
        this.dbFile = Database.getCatalog().getDatabaseFile(tableid);
        this.dbFileIterator = dbFile.iterator(tid);
    }

    public SeqScan(TransactionId tid, int tableId) {
        this(tid, tableId, Database.getCatalog().getTableName(tableId));
    }

    /**
     * @return return the table name of the table the operator scans. This should
     * be the actual name of the table in the catalog of the database
     */
    public String getTableName() {
        return Database.getCatalog().getTableName(this.tableId);
    }

    /**
     * @return Return the alias of the table this operator scans.
     */
    public String getAlias() {
        return this.tableAlias;
    }

    /**
     * Reset the tableid, and tableAlias of this operator.
     *
     * @param tableid    the table to scan.
     * @param tableAlias the alias of this table (needed by the parser); the returned
     *                   tupleDesc should have fields with name tableAlias.fieldName
     *                   (note: this class is not responsible for handling a case where
     *                   tableAlias or fieldName are null. It shouldn't crash if they
     *                   are, but the resulting name can be null.fieldName,
     *                   tableAlias.null, or null.null).
     */
    public void reset(int tableid, String tableAlias) {
        this.tableId = tableid;
        this.tableAlias = tableAlias;
    }


    @Override
    public void open() throws DbException, TransactionAbortedException {
        dbFileIterator.open();
    }

    /**
     * Returns the TupleDesc with field names from the underlying HeapFile,
     * prefixed with the tableAlias string from the constructor. This prefix
     * becomes useful when joining tables containing a field(s) with the same
     * name.  The alias and name should be separated with a "." character
     * (e.g., "alias.fieldName").
     *
     * @return the TupleDesc with field names from the underlying HeapFile,
     * prefixed with the tableAlias string from the constructor.
     */
    @Override
    public TupleDesc getTupleDesc() {
        return Database.getCatalog().getTupleDesc(this.tableId);
    }

    @Override
    public boolean hasNext() throws TransactionAbortedException, DbException {
        return dbFileIterator.hasNext();
    }

    @Override
    public Tuple next() throws NoSuchElementException,
            TransactionAbortedException, DbException {
        return dbFileIterator.next();
    }

    @Override
    public void close() {
        dbFileIterator.close();
    }

    @Override
    public void rewind() throws DbException, NoSuchElementException,
            TransactionAbortedException {
        dbFileIterator.rewind();
    }
}
  • Test结果:
    在这里插入图片描述

2.7、A simple query

这里的A simple query就是为了将前面的各个组件进行组合起来。先创建一个some_data_file.txt
的文件,然后利用SimpleDb.class将其编译为二进制的dat文件。最后再利用lab中提供的代码熟悉整个流程并测试dat文件。

package simpledb;
import java.io.*;

public class test {

    public static void main(String[] argv) {

        // construct a 3-column table schema
        Type types[] = new Type[]{ Type.INT_TYPE, Type.INT_TYPE, Type.INT_TYPE };
        String names[] = new String[]{ "field0", "field1", "field2" };
        TupleDesc descriptor = new TupleDesc(types, names);

        // create the table, associate it with some_data_file.dat
        // and tell the catalog about the schema of this table.
        HeapFile table1 = new HeapFile(new File("some_data_file.dat"), descriptor);
        Database.getCatalog().addTable(table1, "test");

        // construct the query: we use a simple SeqScan, which spoonfeeds
        // tuples via its iterator.
        TransactionId tid = new TransactionId();
        SeqScan f = new SeqScan(tid, table1.getId());

        try {
            // and run it
            f.open();
            while (f.hasNext()) {
                Tuple tup = f.next();
                System.out.println(tup);
            }
            f.close();
            Database.getBufferPool().transactionComplete(tid);
        } catch (Exception e) {
            System.out.println ("Exception : " + e);
        }
    }

}
  • test结果:

这边再扩展一下正常sqlite的流程:
在这里插入图片描述

  • 如图所示,其中Tokenizer,Parser,Code Generator都属于front-end(前端部分)。输入一个sql query能被解析成为一个能被VM解析的byte文件。(本质上是可以在数据库上运行的编译文件)。对于本次lab1其实也是直接尽可能略过了front-end部分,专注于后端存储部分。
    而对于 back-end 后面几个部分简单介绍下(引用:SQLite Database System: Design and Implementation.)
  • The virtual machine takes bytecode generated by the front-end as instructions. It can then perform operations on one or more tables or indexes, each of which is stored in a data structure called a B-tree. The VM is essentially a big switch statement on the type of bytecode instruction.
  • Each B-tree consists of many nodes. Each node is one page in length. The B-tree can retrieve a page from disk or save it back to disk by issuing commands to the pager.
  • The pager receives commands to read or write pages of data. It is responsible for reading/writing at appropriate offsets in the database file. It also keeps a cache of recently-accessed pages in memory, and determines when those pages need to be written back to disk.
  • The os interface is the layer that differs depending on which operating system sqlite was compiled for.

三、总结

写这次的Lab1感觉就稍微轻松一些了,可能也是因为笔者还是对java更熟悉些,而对于lab中的一些点,其实反复看看doc的注释,和introduction或者直接深入看test源码,通过测试还是比较容易。具体的bug点就不提了(其实是笔者断断续续写了一周,总结梳理的时候忘了个大概orz…)后面的Lab也会在实习空闲时继续完善。

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/23991.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

DJ12-2-2 算术运算指令

目录 1. 加法指令 &#xff08;1&#xff09;ADD 指令 &#xff08;2&#xff09;ADC 指令 &#xff08;3&#xff09;INC 指令 2. 减法指令 &#xff08;1&#xff09;SUB 指令 &#xff08;2&#xff09;SBB 指令 &#xff08;3&#xff09;DEC 指令 &#xff08;4&…

Alluxio 2.9新版发布 | 重塑架构,支持大规模多租户环境

/ Alluxio宣布正式发布数据编排平台2.9版本 / Alluxio 2.9 版本的主要新增功能包括&#xff1a; 新增跨环境集群同步功能、增强Alluxio在Kubernetes上的可管理性、提高S3 API 安全性和用户体验 2022年11月17日&#xff0c;全球首创的开源数据编排软件开发商Alluxio宣布正式发…

1.2 分布

测度理论 (Durrett) 第五版 个人笔记 答案 Durrett高等概率论教材 (Probability) 攻读概率及统计/机器学习应用方向博士学位. Measure TheoryProbability SpacesDistributionsRandom VariablesIntegrationProperties of the IntegralExpected ValueProduct Measures, Fubini’…

[计算机毕业设计]机器视觉指纹识别图像识别

前言 &#x1f4c5;大四是整个大学期间最忙碌的时光,一边要忙着准备考研,考公,考教资或者实习为毕业后面临的就业升学做准备,一边要为毕业设计耗费大量精力。近几年各个学校要求的毕设项目越来越难,有不少课题是研究生级别难度的,对本科同学来说是充满挑战。为帮助大家顺利通过…

2022Q3运动户外行业数据分析(高增长概念解读)

本篇我们将继续来分析22年Q3季度中&#xff0c;运动户外行业的高增长概念。 在运动户外行业中&#xff0c;我们发现了3个高增长品类&#xff0c;分别是&#xff1a;泳镜、运动休闲鞋、运动包。 一、游泳用品高增长概念——泳镜 在功能上区分&#xff0c;泳镜可以分为竞速泳镜、…

十秒钟搞懂linux的软硬链接细节图解和目录结构文件的基本命令

秒懂linux链接图解一&#xff0c;软硬链接的分析1&#xff0c;软链接的图解&#xff1a;2&#xff0c;硬链接的图解3&#xff0c;软硬链接的区别4,目录创建软链接的语法格式二&#xff0c;linxu的根目录结构示意图和各部分的功能&#xff0c;可以根据单词部分记忆&#xff08;部…

SDN环境搭建(超详细)

文章目录前言一. 安装VMware workstation二、Ubuntn安装三、Mininet安装四、RYU安装五、WireShark安装六、在Mininet中启动可视化界面常见问题总结写在后面前言 最近在做SDN这方面的实验&#xff0c;在这里记录一下自己的学习过程和踩过的坑。 具体环境&#xff1a; VMware-wo…

RepGhost

轻量级的CNN模块重参数化技术构建硬件高效的 Ghost 模块,通过结构重新参数化技术开发一种硬件高效的 RepGhost 模块&#xff0c;以实现特征的隐式重用。RepGhostNet 把 Concat 操作去掉&#xff0c;同时修改现有结构以满足重参数化的规则。最终得到的 RepGhostNet 是一个高效的…

swift-类属性-MachO读取

上一篇 swift-类属性 为源码层面类属性结构剖析&#xff0c;接下来从MachO层面验证读取类属性内容 极简类结构 class IFLPerson2 {var age: Int 20var heigh: Double 180}MachO-__swift5_types读取 var size: UInt 0//__swift5_types section 的pFilevar ptr getsectdata(…

Alibaba内部传出的面试秘技,秋招offer尽收囊中

又逢“金九银十”&#xff0c;年轻的毕业生们满怀希望与忐忑&#xff0c;去寻找、竞争一个工作机会。已经在职的开发同学&#xff0c;也想通过社会招聘或者内推的时机争取到更好的待遇、更大的平台。 然而&#xff0c;面试人群众多&#xff0c;技术市场却相对冷淡&#xff0c;…

修改Cmder默认命令提示符

修改Cmder默认命令提示符常规操作第二种方法我的方法参考常规操作 打开在Cmder目录下的vendor文件夹&#xff0c;编辑里面的clink.lua文件找到local lambda “λ”&#xff0c;将“λ”修改为“$”&#xff0c;如图&#xff08;图片来源&#xff1a;https://www.jianshu.com/…

11月“图无处不在”线上直播 - Neo4j宣布下一代图数据平台Neo4j 5上线

中国北京&#xff0c;2022年11月10日—— 图技术的领导者Neo4j 宣布下一代可用于云端的图数据平台Neo4j 5上线。在传统数据库的基础上&#xff0c;扩大了原生图的性能优势&#xff0c;同时在本地、云、混合云或多云部署中实现更高可扩展性&#xff0c;从而使企业能够更快地创建…

Java 集合---尚硅谷Java入门视频学习

问题&#xff1a;什么时候需要一个容纳数据的容器&#xff0c;也就是集合对象&#xff1f; Java集合框架中就包含了对不确定个数的数据处理的集合类问题&#xff1a;如果只是为了容纳数据&#xff0c;可以是直接使用数组&#xff0c;为什么要学习集合&#xff1f; 数组使用起来…

【论文阅读】多模态模型CoCa

Introduction 在这项工作中&#xff0c;我们统一了单编码器、双编码器和编码器-解码器范式&#xff0c;并训练了一个包含三种方法优点的图像-文本基础模型。我们提出了对比Captioner模型(CoCa)&#xff0c;该模型采用经过对比损失和captioning损失训练的编码器-解码器架构。如图…

C语言源代码系列-管理系统之职工工资管理系统

往期文章分享点击跳转>《导航贴》- Unity手册&#xff0c;系统实战学习点击跳转>《导航贴》- Android手册&#xff0c;重温移动开发 &#x1f449;关于作者 众所周知&#xff0c;人生是一个漫长的流程&#xff0c;不断克服困难&#xff0c;不断反思前进的过程。在这个过…

数据存储策略——lsm-tree

文章目录一、背景二、lsm-tree简介三、lsm-tree设计思想四、lsm-tree原理1.写操作2.读操作3.有序表持久化4.后台压缩五、lsm-tree的应用六、lsm-tree优缺点分析总结一、背景 由于传统机械磁盘的原理&#xff0c;它在读写时有个寻道的操作&#xff0c;在读写时都需要消耗一个寻…

基于PHP+MySQL网上报名系统的设计与实现

一直以来如何更好的实现校园现代化和信息化是当前很多高校一直探索的问题&#xff0c;随着时代的发展&#xff0c;高校内各类考试和报名也越来越多&#xff0c;如何通过互联网直接进行在线报名是本系统研究的一个重点内容。 本系统是一个网上报名系统&#xff0c;为了能够更加灵…

[计算机毕业设计]机器学习的数据驱动股票价格预测

前言 &#x1f4c5;大四是整个大学期间最忙碌的时光,一边要忙着准备考研,考公,考教资或者实习为毕业后面临的就业升学做准备,一边要为毕业设计耗费大量精力。近几年各个学校要求的毕设项目越来越难,有不少课题是研究生级别难度的,对本科同学来说是充满挑战。为帮助大家顺利通过…

财政政策与货币政策(下)

财政政策与货币政策(下) – 潘登同学的宏观经济学笔记 文章目录财政政策与货币政策(下) -- 潘登同学的宏观经济学笔记粘性价格下的货币经济总供给曲线总供给曲线斜率与价格粘性菲利普斯曲线的消失货币政策的“动态不一致”财政政策与货币政策的配合财政主导 vs. 货币主导恶性通…

【学习记录】实例分割的发展与区别

【学习记录】实例分割的发展与区别 参考于《The Evolution Of Instantce Segmentation》 文章目录【学习记录】实例分割的发展与区别发展历程RCNNFast RCNNMultipath NetworkFaster RCNNMask RCNN发展历程 RCNN 开发集成了RCNN技术产生了AlexNet&#xff0c;以及使用选择性搜索…