CMU-15-445

2022-04-14 约 5242 字预计阅读 11 分钟

CMU-15-445

Project 1

LRUReplacer

理解Pin与Unpin: 参考《数据库系统概念》，当有线程在读取一个 Page 时，这个 Page 是不能被淘汰的，因此需要Pin操作将它移出 LRUReplacer 。同时 Page 类中会有计数器记录有几个线程正在读取它的内容，当计数器降为 0 时，用 Unpin 操作将其添加进 LRUReplacer 。
实现方式是采用双端队列与哈希表，思路与分布式缓存中的相同。

BufferPoolManager

我的刷盘策略: 懒刷盘，只有当一个页面被驱逐或者被上层显式调用刷盘时才进行刷盘。
frame_id和获得页框的方法: frame_id 就是pages_数组的下标，代表一个内存“页框”，用于存放物理页。一开始所有frame_id均在free_list_中，当占用一个 frame_id 时，将它从free_list_取出，并在page_table_建立映射，注意此时并不将其放入replacer_，因为此时上层调用者会使用返回的Page，pin_count_不为零，只有可以进行驱逐的frame_id才会放入replacer_。当后期free_list_为空时，获得一个新页框的方法是从replacer_中驱逐一个页面，得到它对应的页框，此时应注意对驱逐的页面刷盘。
易错点:
1.UnpinPgImp()操作中，page_table_找不到要Unpin的页面时，应当返回 true 。
2.DeletePgImp()中如果成功删除掉一个页，应该 Pin 一下对应的 frame_id 。保证一个 frame_id 只能出现在free_list_或replacer_中的一个地方，当然也有可能均不出现(此时这个 frame_id 是正在被使用的，pin_count_大于零，无法被驱逐)。

ParallelBufferPoolManager

用std::vector<BufferPoolManagerInstance*> bpms_记录管理的 BufferPoolManagerInstance ，用size_t next_instance_记录下一个页面应当分配在哪一个 BufferPoolManagerInstance 。注意只有NewPgImp()操作需要加锁，这个 ParallelBufferPoolManager 就是为了提高并发度的。

得分情况与优化

因 BufferPoolManager 中的易错点一开始只有 42 分，修改完成后达到满分，Time 13.01 ，排名 212 。后进行优化，将双向队列中的智能指针改为原始指针，并在FlushPgImp()和FlushAllPgsImp()中将 Page 的is_dirty_设为 false ，避免重复刷盘。优化后 Time 9.63 ，排名 84 。

Project 2

参考资料

Extendible Hash Table
Extendible Hashing (Dynamic approach to DBMS)
一定要仔细看Project提供的已有代码！

组织结构说明

一个Page的大小在config.h中给出，为 4096 byte，那么如何确定BUCKET_ARRAY_SIZE呢？BUCKET_ARRAY_SIZE是一页中可以存放的键值对数，每一个键值对还需要两 bit 的 readable 和 occupied 标志，所以可以通过算式PAGE_SIZE/(sizeof (MappingType) + 0.25)分子分母同乘4得到hash_table_page_defs.h中的#define BUCKET_ARRAY_SIZE (4 * PAGE_SIZE / (4 * sizeof(MappingType) + 1))。
可以注意到Page类中data_的大小就是一个 PAGE_SIZE ，同时也注意到HashTableBucketPage没有构造函数和析构函数并且存放数据的array_长度为零，那如何产生一个HashTableBucketPage对象？在课程官网有提到reinterpret_cast，分配一个HashTableBucketPage对象时调用auto bucket_page = reinterpret_cast<HashTableBucketPage<int, int, IntComparator> *>(bpm->NewPage(&bucket_page_id, nullptr)->GetData());，这时会按照Page类中GetData()返回的大小为 PAGE_SIZE 的数据分配内存，然后被HashTableBucketPage指针指向，HashTableBucketPage指针指向的内存大小就是一个 PAGE_SIZE ，因为array_变量在类的尾部声明，就可以使用array_来访问页中数据，即使array_的长度被声明为 0 。HashTableDirectoryPage的内存分配同理。
页中的键值对是以std::pair<KeyType, ValueType>的形式存放的，KeyType是一个模板类，内部是一个长度为KeySize的定长 char 数组。ValueType是RID(Record ID)类，存放长度为 32 bit 的slot_num_与 32 bit 的page_id_，其实就是对应数据页的 id 和该 Record 在数据页内的偏移，和 Project 3 有关，现在可以不管。
当一个键值对加入 bucket 后，如果要删除它，是将它对应的 readable 标志置为 0 ，意味着此时这个位置是一个 tombstone 。
DIRECTORY_ARRAY_SIZE为 512 ，所以GlobalDepth最大为 9 。
HashTableDirectoryPage与ExtendibleHashTable共用一把table_latch_。
截止当前 Project ，可见的项目结构如图：

易错点

1.千万记住 Unpin 不再使用的 Page ，包括HashTableDirectoryPage。
2.课程要求使用 least-significant bits 作为 index ，这和很多参考资料上画的是相反的，要注意。
3.我给Insert()加的是读锁，多个线程同时执行Insert()，全部 insert 到一个满的桶上都会到SplitInsert()扩容，为了避免重复扩容，SplitInsert()先检查是否可以插入成功并且不扩容。至于为什么Insert()加读锁，是因为大部分insert操作不改变 HashTable 的全局状态，加读锁性能明显更好。Merge()同理。
4.桶分裂时当前桶可能存在于bucket_page_ids_中的多个位置，注意全部设成正确的新值。
5.对目录进行 Shrink 之后应该再对所有空桶进行检查是否能合并，因为 Shrink 之后刚刚合并的桶的 SplitBucket 发生了改变，可能继续与空桶合并。

神奇的位运算

HashTableDirectoryPage和HashTableBucketPage中有很多函数用到了几个比较妙的位运算，并且这几个函数对于ExtendibleHashTable中的几个骚操作很有帮助。根据参考材料和课程提供的注释使用这些函数即可。下面是其中几个比较有意思的实现。

1
2
3


uint32_t HashTableDirectoryPage::GetSplitImageIndex(uint32_t bucket_idx) {
  return bucket_idx ^ (1 << (local_depths_[bucket_idx] - 1));
}

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24


template <typename KeyType, typename ValueType, typename KeyComparator>
void HASH_TABLE_BUCKET_TYPE::RemoveAt(uint32_t bucket_idx) {
  readable_[bucket_idx >> 3] &= ~(1 << (bucket_idx & 7));
}

template <typename KeyType, typename ValueType, typename KeyComparator>
bool HASH_TABLE_BUCKET_TYPE::IsOccupied(uint32_t bucket_idx) const {
  return occupied_[bucket_idx >> 3] & (1 << (bucket_idx & 7));
}

template <typename KeyType, typename ValueType, typename KeyComparator>
void HASH_TABLE_BUCKET_TYPE::SetOccupied(uint32_t bucket_idx) {
  occupied_[bucket_idx >> 3] |= 1 << (bucket_idx & 7);
}

template <typename KeyType, typename ValueType, typename KeyComparator>
bool HASH_TABLE_BUCKET_TYPE::IsReadable(uint32_t bucket_idx) const {
  return readable_[bucket_idx >> 3] & (1 << (bucket_idx & 7));
}

template <typename KeyType, typename ValueType, typename KeyComparator>
void HASH_TABLE_BUCKET_TYPE::SetReadable(uint32_t bucket_idx) {
  readable_[bucket_idx >> 3] |= 1 << (bucket_idx & 7);
}

得分情况

一共两次有效提交，第一次因为没有注意到易错点 5 只有 75 分，改正后达到满分，Time 32.96 ，排名 38 ，还算是比较靠前的，有时间再优化一下，很多逻辑都还可以简化，但是重构比较麻烦。

Project 3

参考资料

课程PPT重点看 Processing Models 和 Expression Evaluation 。
CMU15-445 数据库实验全满分通过笔记
 Hash Join

组织结构说明

执行相关:

ExecutionEngine是一个执行计划被执行的地方，它接收上层的执行计划并执行后返回结果集，执行时会调用ExecuteFactory的静态方法CreateExecutor(ExecutorContext *, const AbstractPlanNode *)新建一个std::unique_ptr<AbstractExecutor>用以进行执行操作。它的成员变量如下，其中的Catalog *catalog_是用于进行 table creation, table lookup, index creation, and index lookup 的。([[maybe_unused]]是一个 C++17 新特性，详见Attributes)

1
2
3
4
5
6
7


private:
  /** The buffer pool manager used during query execution */
  [[maybe_unused]] BufferPoolManager *bpm_;
  /** The transaction manager used during query execution */
  [[maybe_unused]] TransactionManager *txn_mgr_;
  /** The catalog used during query execution */
  [[maybe_unused]] Catalog *catalog_;

ExecutorContext保存了一个语句的执行过程中可能用到的上下文信息,ExecuteFactory创建 Executor 的时候会将ExecutorContext传递给Executor。成员变量如下。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


private:
  /** The transaction context associated with this executor context */
  Transaction *transaction_;
  /** The datbase catalog associated with this executor context */
  Catalog *catalog_;
  /** The buffer pool manager associated with this executor context */
  BufferPoolManager *bpm_;
  /** The transaction manager associated with this executor context */
  TransactionManager *txn_mgr_;
  /** The lock manager associated with this executor context */
  LockManager *lock_mgr_;

每个执行都会有一个执行计划类，ExecuteFactory创建 Executor 的时候也会把对应的执行计划类传递给Executor。

索引和表相关:

Tuple可以看作是一个数据行，有着唯一的 RID(Record ID) ，即 Project 2 中哈希表的ValueType，RID类，存放长度为 32 bit 的slot_num_与 32 bit 的page_id_，其实就是对应TablePage的 id 和该 Record 在数据页内的偏移。哈希表中查到 RID 之后就可以根据 RID 获取对应的数据行。
Schema schema_保存了一张表中的列信息，主要包含std::vector<Column> cols，Column即记录一个列的信息，包括这一列的名字、数据类型和在Tuple中的偏移量，并且保存了创建一个列的表达式const AbstractExpression *expr_。
TableInfo保存一张表的信息，其中主要包含Schema schema_、std::string name、std::unique_ptr<TableHeap> table_。TableHeap代表了一个存在于磁盘的表，用于对表进行真正的修改操作。TablePage是表的存储单元，保存了Tuple，TableHeap中有TablePage的双向链表，表内容修改操作发生于TablePage。
TablePage中的注释很好地说明了数据是如何保存在表中的。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17


/**
 * Slotted page format:
 *  ---------------------------------------------------------
 *  | HEADER | ... FREE SPACE ... | ... INSERTED TUPLES ... |
 *  ---------------------------------------------------------
 *                                ^
 *                                free space pointer
 *
 *  Header format (size in bytes):
 *  ----------------------------------------------------------------------------
 *  | PageId (4)| LSN (4)| PrevPageId (4)| NextPageId (4)| FreeSpacePointer(4) |
 *  ----------------------------------------------------------------------------
 *  ----------------------------------------------------------------
 *  | TupleCount (4) | Tuple_1 offset (4) | Tuple_1 size (4) | ... |
 *  ----------------------------------------------------------------
 *
 */

IndexInfo记录了一个索引的信息，成员如下，其中std::unique_ptr<Index> index_使用的就是 Project 2 实现的 ExtendibleHashTable 。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


  /** The schema for the index key */
  Schema key_schema_;
  /** The name of the index */
  std::string name_;
  /** An owning pointer to the index */
  std::unique_ptr<Index> index_;
  /** The unique OID for the index */
  index_oid_t index_oid_;
  /** The name of the table on which the index is created */
  std::string table_name_;
  /** The size of the index key, in bytes */
  const size_t key_size_;

Catalog使用TableInfo与IndexInfo两个结构来对表和索引进行操作。

Nested Loop Join

Nested Loop Join 写起来感觉怪怪的，贴一下我的代码吧，不知道逻辑对不对，测试是过了。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38


void NestedLoopJoinExecutor::Init() {
    left_executor_->Init();
    right_executor_->Init();
    first_ = true;
}

bool NestedLoopJoinExecutor::Next(Tuple *tuple, RID *rid) {
    if (first_ && !left_executor_->Next(&left_tuple_, &left_rid_)) {
        return false;
    }
    first_ = false;

    Tuple right_tuple;
    RID right_rid;

    while (true) {
        while (right_executor_->Next(&right_tuple, &right_rid)) {
            if (plan_->Predicate() == nullptr || plan_->Predicate()
                    ->EvaluateJoin(&left_tuple_, left_executor_->GetOutputSchema(),
                                   &right_tuple, right_executor_->GetOutputSchema())
                    .GetAs<bool>()) {
                std::vector<Value> output;
                for (const auto &col : GetOutputSchema()->GetColumns()) {
                    output.push_back(col.GetExpr()->EvaluateJoin(&left_tuple_, left_executor_->GetOutputSchema(), &right_tuple,
                                                                 right_executor_->GetOutputSchema()));
                }
                *tuple = Tuple(output, GetOutputSchema());
                return true;
            }
        }
        if (!left_executor_->Next(&left_tuple_, &left_rid_)) {
            break;
        }
        right_executor_->Init();
    }

    return false;
}

Hash Join

这个自己按照我的 Nested Loop Join 的方法写出来线上测试过不了，后来按照CMU15-445 数据库实验全满分通过笔记的修改，代码如下。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53


void HashJoinExecutor::Init() {
    left_child_->Init();
    right_child_->Init();

    // 初始化哈希表阶段
    Tuple left_tuple;
    RID left_rid;
    while (left_child_->Next(&left_tuple, &left_rid)) {
        HashJoinKey hash_key{plan_->LeftJoinKeyExpression()->Evaluate(&left_tuple, left_child_->GetOutputSchema())};
        if (map_.count(hash_key) != 0) {
            map_[hash_key].emplace_back(left_tuple);
        } else {
            map_[hash_key] = std::vector{left_tuple};
        }
    }

    left_index_ = -1;
}

bool HashJoinExecutor::Next(Tuple *tuple, RID *rid) {
    // 探测阶段
    HashJoinKey cur_key;
    if (left_index_ != -1) {
        cur_key.key_ = plan_->RightJoinKeyExpression()->Evaluate(&right_tuple_, right_child_->GetOutputSchema());
    }
    if (left_index_ == -1  // 判断是否第一次执行Next()
        || map_.find(cur_key) == map_.end()  // 判断当前是否已经指向一个能join的右表tuple
        || left_index_ == static_cast<int>(map_[cur_key].size()) // 判断这个能join的右表tuple是否还能跟左表tuple相结合
            ) {
        while (true) {
            if (right_child_->Next(&right_tuple_, &right_rid_)) {
                cur_key.key_ = plan_->RightJoinKeyExpression()->Evaluate(&right_tuple_, right_child_->GetOutputSchema());
                if (map_.find(cur_key) != map_.end()) {
                    left_index_ = 0;
                    break;
                }
            } else {
                return false;
            }
        }
    }

    auto cur_vector = map_.find(cur_key)->second;
    auto cur_left_tuple = cur_vector[left_index_];
    std::vector<Value> output;
    for (const auto &col : GetOutputSchema()->GetColumns()) {
        output.push_back(col.GetExpr()->EvaluateJoin(&cur_left_tuple, left_child_->GetOutputSchema(),
                                                     &right_tuple_, right_child_->GetOutputSchema()));
    }
    *tuple = Tuple(output, GetOutputSchema());
    ++left_index_;
    return true;
}

得分情况

这个 Project 的线上测试因为测试框架的更新导致编译失败，详见Issue #227。向提交的压缩包中加入 /src/include/storage/page/tmp_tuple_page.h 后即可编译成功。第一次提交因为 Hash Join 超时得到 330 分，修改后达到满分，Time 3.87 ，排名 30 。

Project 4

参考资料

《数据库系统概念》
CMU15-445 数据库实验全满分通过笔记
 2021 CMU 15-445 实验笔记

三种隔离级别

READ_UNCOMMITED: 读取不需要获得共享锁，写入需要获得排他锁，用完直接放锁。
READ_COMMITTED: 要解决脏读的问题，解决方案就是读时上读锁，读完解读锁；写时上写锁，但等到commit时才解写锁；读时上读锁，读完解读锁。这样，永远不会读到未commit的数据，因为上面有写锁。
REPEATABLE_READ: 读取和写入均需要锁，需要遵守强两阶段锁规则。

事务状态

注意: Any failed lock operation should lead to an ABORTED transaction state (implicit abort) and throw an exception. The transaction manager would further catch this exception and rollback write operations executed by the transaction.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


 * Transaction states for 2PL:
 *     _________________________
 *    |                         v
 * GROWING -> SHRINKING -> COMMITTED   ABORTED
 *    |__________|________________________^
 *
 * Transaction states for Non-2PL:
 *     __________
 *    |          v
 * GROWING  -> COMMITTED     ABORTED
 *    |_________________________^
 *

加锁解锁逻辑

按照《数据库系统概念》18.1.4 的描述和2021 CMU 15-445 实验笔记的思路实现即可。下面是课程官网的提示。

LockShared(Transaction, RID): Transaction txn tries to take a shared lock on record id rid. This should be blocked on waiting and should return true when granted. Return false if transaction is rolled back (aborts).
LockExclusive(Transaction, RID): Transaction txn tries to take an exclusive lock on record id rid. This should be blocked on waiting and should return true when granted. Return false if transaction is rolled back (aborts).
LockUpgrade(Transaction, RID): Transaction txn tries to upgrade a shared to exclusive lock on record id rid. This should be blocked on waiting and should return true when granted. Return false if transaction is rolled back (aborts). This should also abort the transaction and return false if another transaction is already waiting to upgrade their lock.
Unlock(Transaction, RID): Unlock the record identified by the given record id that is held by the transaction.

得分情况和疑惑

这个 Project 真写麻了，是四个 Project 里最痛苦的，一开始只有 55 分，和死锁预防的有关的测试全部超时，后来在CMU15-445 数据库实验全满分通过笔记看到“如果老事务想上锁，队列中如果有只是在等待并没有获得锁的新事务请求，也要abort它。如果老事务请求继续等待，测试就会超时”，但是这个和《数据库系统概念》讲的又不一样了。虽然最后达到了满分，但是很多实现和教材的有冲突并且我的很多代码逻辑并不完善。Project 4 无排名。

总结

简单了解C++加上做完 CMU 15-445 共计 13 天，对数据库底层原理加深了理解，也知道了《数据库系统概念》这一本很好的教材，还需要仔细看看。共计 44 次提交，感谢连续十多天熬夜 debug 的自己。