<style>
details {
margin-top: 0.5em;
margin-bottom: 0.5em;
}
summary {
/* font-weight: bolder; */
}
summary:hover {
text-decoration: underline;
}
blockquote {
font-size: 16px;
}
.todo {
color: #ff00ff;
border: 2px dashed #ff00ff;
padding: 0em 1em;
border-radius: 5px;
margin-top: 1em;
margin-bottom: 1em;
/* display: none; */ /* UNCOMMENT TO HIDE TODOs */
}
</style>
**[« Back to the main CS 0300 website](https://csci0300.github.io/)**
# Project 5A: Concurrent Store
**Due: Friday, April 25 at 8:00 PM EDT**
# Introduction
A key-value store is a storage system in which data (a *value*) is addressed by a *key*; common examples are Python's dictionaries or Java/C++ hash maps. Key-value stores are popular for large-scale web applications because they are fast and scalable.
Clients interface with the key-value store through a server. Once client(s) connect to the server, they can issue requests to get, put, append, and delete key-value pairs from the store.
In this project, you'll implement two kinds of key-value stores. You'll then extend your implementation to support concurrent client requests.
Production applications often handle so much data (and so many client connections) that one key-value store is not enough -- they need to divide (i.e., *shard*) their data between multiple stores. In Project 5B, you'll extend your implementation to support such a distributed system.
## Learning Objectives
This assignment will help you:
* **Familiarize yourself with synchronization primitives**, namely mutexes.
* **Think about problems introduced by concurrency**, like deadlocks and race conditions on shared data structures.
* **Learn about key networking abstractions**, the "client" and "server," and the roles of each.
* **Understand the implications of privacy laws for storage systems**, and how developers need to implement technical functionality to give users rights over their data.
* **Reason about trade-offs between stakeholder interests and public goods** relating to privacy and data protection.
## Assignment Installation
To get started on the project, you will need to pull some additional stencil code into your project repo. You can do this as follows:
1. Open a terminal in your container environment and `cd` into the directory you set up as your project repo from Lab 0 (e.g., `cd projects`)
2. Run the following command to pull the latest code from the `handout` remote (see below if you get errors):
```
cs300-user@9899143429a2:~/projects$ git pull handout main
```
If this reports an error, run:
```bash
git remote add handout git@github.com:csci0300/cs300-s25-projects.git
```
followed by:
```bash
git pull
git pull handout main
```
This will merge our Project 5 (`kvstore`) stencil code with your repository.
:::warning
**Note: If you see a warning about "divergent branches"...**
<details> <summary><b>Expand for instructions</b> </summary>

This means you need to set your Git configuration to merge our code by default. To do so, run (within the same directory in your project repository):
```console
$ git config pull.rebase false
$ git pull handout main
```
</details>
:::
:::danger
**If your terminal window changes to something like this**:

<details> <summary><b>Expand for instructions</b> </summary>
Your terminal has asked you to add a commit message to explain what code was merged in, and opened a text editor so you can write the message. The top line is a default message, which you can use as-is. To use the message and finish the commit, do the following:
- **If your terminal looks like the image above**, press `Ctrl+s` to save, then `Ctrl+q` to quit. (These are shortcuts for the text editor `micro`, which is the program that is running now)
- **If your terminal instead has a bar at the top that says `GNU nano`**, you are using a text editor called `nano`. To save your commit and exit, press `Ctrl+x`, then press `y`, and then press `Enter`.
</details>
:::
Once you have a local working copy of the repository that is up to date with our stencils, you are good to proceed! You should now see a new directory called `kvstore` inside your projects repo: you should do all your work for this project in this directory.
## :eyes: NEW: Extra initial setup for this project
:::danger
:warning::warning::warning: **Warning: Don't skip this step!** This project requires you to install some libraries before you can compile the code or navigate it properly in VSCode. **Be sure to follow the instructions here before continuing!**
:::
This project has a rather large stencil that requires some extra libraries not present in our normal container, so we need to install them in order to build the project. To do this:
1. If you have not done so already, open a terminal in your container environment and `cd` into the `kvstore` directory for this project
2. Run the following command to install the packages we need:
```
$ sudo apt-get update && sudo apt-get -y install libstdc++-14-pic libstdc++-14-dev cmake
```
This command will run for a minute or so as it downloads and installs a few libraries. The last line should say something like `Setting up ....` or `Processing triggers...`. If you're not sure if the command worked, try to compile your project in the next step and see how it goes.
To make sure the installation worked, and to show you how to run the tests, we can try to compile the stencil now. To do this:
1. `cd` into the `build` directory in your repository's `kvstore` directory
```shell
$ cd build
build$
```
:::success
**What to expect**: For the rest of this assignment, remember that you should generally run all commands (`make`, `make check`, all other testing commands) from the `build` directory. In the handout, we'll use the following syntax to note that you should be in the `build` directory before running the command:
```shell
build$ make -j # Compile (from the build directory)
```
... so be sure to check for this!
:::
2. Next, compile the code!
```
build$ make -j
```
:::info
**What's the `-j`?** The `-j` option turns on **parallel builds**: this causes `make` to run multiple processes to speed up the compilation (yay, parallelism!). Throughout this project, you can add the `-j` option to any `make` command to speed things up!
- **If you use parallel builds, you need to remember one thing**: *if you get a compilation error that doesn't make sense, run `make` one more time without `-j`*. When parallel builds are turned on, so many processes print to the screen at once that it can be hard to see what went wrong--running again with only one process will help you see the error.
:::
If you encounter a compilation error, make sure you ran the `apt-get` command in the previous step. If you continue encountering issues, please let us know in office hours or on Edstem.
3. Finally, to make sure everything works, we can run the tests (here, for part A1):
```shell
build$ make check A1
```
At this point, the implementation is missing, so you should see the tests fail (which is okay!)--specifically, your output should look like this:

However, running the tests proves your stencil is set up properly, yay! Your repository should now be set up to work on both projects 5A and 5B! :tada:
:::warning
**Important note (remember for later)**: if you ever delete your container, switch systems, or refresh your container by running:
```
./cs300-run-docker --clean
```
... you will need to do this setup process again in order to reinstall the packages.
If you have needed to run `./cs300-run-docker --clean` often in order to work on your projects, there may be an issue with your container setup: please make a post on EdStem and we will work with you to ensure you don't need to keep doing this.
:::
# Stencil organization
Project 5B also uses this stencil, so there is more code in the repository than you will need for this assignment. Here are the components you'll need for Project 5A:
| Directories | Contents |
| :--------------------- | :----------------- |
| `build/` | See [Compile and Run](#Compile-and-Run). |
| `tests/` and `test_utils/` | See [Testing](#Testing). |
| `client/` | The `simple_client` implementation you'll run in this assignment, along with the commands it supports. Also has the stencil for `shardkv_client`, which you'll implement in Project 5B. |
|`gdpr/` | The `database.txt` file you'll work with in the Tweeter portion of the assignment. |
|`kvstore/` | The definition of the abstract `KvStore` interface, as well as the `SimpleKvStore` and `ConcurrentKvStore` classes. |
| `server/` | The definition of the `KvServer` class and support code. |
If you're curious, descriptions of the rest of the directories are below:
| Directories | Contents |
| :--------------------- | :----------------- |
| `cmd/` | The runner code that starts the client, server, and shardcontroller. You should not modify these. |
| `net/` | The networking support code used by the clients and servers. You should not modify these. |
| `repl/` | Utilities for running the client, server, and shardcontroller on the command line. You should not modify these. |
| `common/` | The `Shard` class and some helpers for managing or operating on them. Also contains miscellaneous utility functions, such as color printing. These will both be useful in Project 5B. |
| `shardcontroller/` | The definition of the `shardcontroller` class, and support code to process client requests. You'll implement the shardcontroller methods in Project 5B. |
# Testing
We strongly recommend you test your code often while implementing, by compiling and running a server with a few clients (as shown above).
To run tests, **you must be in the `build/` directory**. Then, run `make check concurrent_store`.
To run the tests for an *individual step* `#` , you can run `make check A#`. For example, to run the tests for step 1, you can run `make check A1`.
The tests are in the `tests/` directory, with each subdirectory corresponding to a part of the assignment. After implementing each step, the testing harness will run all tests related to that step. Specifically, the steps will run all tests in the following directories:
* Step 1: `kvstore_sequential_tests` (A1)
* Step 2: `kvstore_parallel_tests` (A2)
* Step 3: `kvstore_sequential_tests` (A3)
* Step 4: `kvstore_parallel_tests`(A4) and `kvstore_performance_tests` (A5)
:::danger
If you're failing tests, your first step to debug should be to run them individually. Each test prints why it failed, but `make check concurrent_store` suppresses that output.
:::
To run an individual test, run that specific test's executable. For instance, for `test_put_get_delete`, run in the `build/` directory:
```
make test_put_get_delete
./test_put_get_delete
```
If nothing prints, the test passed. Otherwise, you'll see an error message.
:::warning
<details><summary>Example Error Message</summary>
```
$ ./test_put_get_delete
Assert failed in file test_put_get_delete.cpp on line 26:
Expected first argument to equal second argument x1Ha4M0dLCkK
```
The error message directs us to this assertion:
```cpp=26
ASSERT_EQ(get_res.value, vals[i]);
```
so the error message is telling us that `get_res.value` (the first argument) is empty, but the test expected it to be equal to the second argument, `vals[i]`, or `x1Ha4M0dLCkK`.
</details>
:::
The correctness tests can also optionally specify which *type* of KVStore to run (`SimpleKvStore` or `ConcurrentKvStore`) by passing in either `simple` or `concurrent` as an argument:
```
./test_put_get_delete simple
```
The tests default to the `ConcurrentKvStore`.
If you want to write your own tests, add new files to the subdirectories in `tests/`. Be sure to take a look at the functions in `test_utils/test_utils.hpp` --- they will make writing your own tests much easier.
## Running the client manually
To compile, `cd` into the `build/` directory and run `make`. This will create two executables: `simple_client` and `server`.
:::info
**To run the server**, specify the port, and optionally the number of workers:
```
$ ./server <port> [n_workers]
```
**To run the simple client**, specify the server address:
```
$ ./simple_client <server hostname:port>
```
<details>
<summary><b><span>Example Usage</span></b></summary>
<br />
To connect multiple clients to a server, start by opening multiple terminals (or use [***`tmux`***](https://www.hamvocke.com/blog/a-quick-and-easy-guide-to-tmux/)).
**Terminal 1**: runs `server` on port 5000 with the default number of workers:
```
$ ./server 5000
```
Your server will then be listening on `<hostname>:5000`; the server's full address will be displayed in the terminal.
**Terminal 2**: runs `simple_client` to connect to the server, using the address displayed above (in this example, the hostname is `65a4af2c28ae`):
```
$ ./simple_client 65a4af2c28ae:5000
```
</details>
:::
The server defaults to using your `SimpleKvStore`. To change that, edit the line `this->store = std::make_unique<SimpleKvStore>();` in `server.cpp` to use a `ConcurrentKvStore`.
### Using the Client
Once connected to the server, you can use one or more clients to run standard key-value store operations, like `get`, `put`, `append`, and `delete`. You can also use two additional requests, `multiget` and `multiput` (for multiple key-value pairs). For specific usage instructions, type `help` in the command prompt.
To exit the client (and any other executable), press `Ctrl-D`; this sends `End-Of-File` (`EOF`) to the program, causing it to exit cleanly. If your program hangs, use `Ctrl-C` to force-quit the program.
# Part 1: Implementing a Key-Value store
## Step 1: Single-Threaded Simple KV Store
In this section, you'll implement a single-threaded simple key-value store. By "single-threaded," we mean that only one thread can safely operate on the KVStore at a time. **This means that for Step 1, you should not worry about concurrency. You may safely assume that only one thread will ever be operating on your KVStore at a time.**
Take a look at how we define a `SimpleKvStore` (in `kvstore/simple_kvstore.hpp`):
:::info
```cpp
class SimpleKvStore : public KvStore {
public:
SimpleKvStore() = default;
~SimpleKvStore() = default;
bool Get(const GetRequest *req, GetResponse *res) override;
bool Put(const PutRequest *req, PutResponse *) override;
bool Append(const AppendRequest *req, AppendResponse *) override;
bool Delete(const DeleteRequest *req, DeleteResponse *res) override;
bool MultiGet(const MultiGetRequest *req, MultiGetResponse *res) override;
bool MultiPut(const MultiPutRequest *req, MultiPutResponse *) override;
std::vector<std::string> AllKeys() override;
private:
// Your internal key-value store implementation!
};
```
:::
:::success
**Task**: Define your internal key-value store. What data structure might be useful here? You may find [this reference page](https://en.cppreference.com/w/cpp/container) useful.
:::
Now it's time to implement each of these methods. First, look at how we define the `Request` and `Response` structs in `net/server_commands.hpp`. The `Request` object contains the information being requested. The `Response` object is where your methods should store the response to the request (if a response is needed). For the `AllKeys` operation, return a vector of the keys. For the other operations, return `true` if the operation is successful; otherwise, return `false`.
<details>
<summary><b><span>Give me an example!</span></b></summary>
<br/>
A call to `Get(const GetRequest *req, GetResponse *res)` takes in a pointer to a `GetRequest`, which has a `key` field. Your method should get the value corresponding to `key` from your internal key-value store and store it in the `value` field of the `GetResponse` that `res` points to.
If a `Response` struct is empty, that means you don't need to provide an explicit confirmation that you fulfilled the request. For example, for a call to `Put(const PutRequest *req, PutResponse *res)`, you should insert the `key` and `value` from the PutRequest that `req` points to into your internal key-value store. Since `PutResponse` is an empty struct, you don't need to do anything with it.
</details>
<br />
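As an illustrative sketch only -- using simplified stand-ins for the stencil's request/response structs and a hypothetical `std::map` internal store, not the real definitions from `net/server_commands.hpp` -- `Get` might look something like this:

```cpp
#include <map>
#include <string>

// Simplified stand-ins for the stencil's request/response structs.
struct GetRequest { std::string key; };
struct GetResponse { std::string value; };

// Hypothetical internal store; yours lives inside SimpleKvStore.
static std::map<std::string, std::string> store;

bool Get(const GetRequest *req, GetResponse *res) {
  auto it = store.find(req->key);
  if (it == store.end()) {
    return false;  // key not present: the operation fails
  }
  res->value = it->second;  // store the value in the response
  return true;              // success
}
```

The pattern is the same for the other operations: read what you need from the `Request`, update the internal store, and fill in the `Response` fields (if any).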
Your key-value store will support the following operations:
| Function | Arguments | Summary |
| :---------------------: | :-----------------: | :-----------------: |
| `Get` | `<key>` | Gets the value associated with `<key>` from the key-value store. Fails if it does not exist. |
| `Put` | `<key>`, `<value>`| Sets `<key>` to `<value>`; replaces existing values. |
| `Append` | `<key>`, `<value>`| Appends `<value>` to the current value associated with `<key>`. Creates a new key-value pair if it does not already exist. |
| `Delete` | `<key>` | Deletes `<key>` from the key-value store, returning the last saved value. Fails if it does not exist. |
| `MultiGet` | `<key_1> ... <key_n>` | Gets the values associated with keys `<key_1> ... <key_n>` from the key-value store. Fails if any key does not exist.|
| `MultiPut` | `<key_1> ... <key_n>, <value_1> ... <value_n>` | Sets `<key_1> ... <key_n>` to `<value_1> ... <value_n>`, respectively.|
| `AllKeys` | None | Returns all of the keys from the database.|
:::success
**Task**: Implement the methods in `kvstore/simple_kvstore.cpp`.
When you are finished, you should pass all `kvstore_sequential_tests`. You can check that your implementation is correct by running **in the `build/` directory**:
```shell
build$ make check A1 # Remember to run from `build` directory!
```
**If you're failing a test**, you should run it individually to see debug output (replace test_name with the name of the test):
```shell
build$ make test_name # Replace test_name with your test
build$ ./test_name
```
The tests default to using the `ConcurrentKvStore` (which you'll implement in Steps 3 and 4), so you'll need to explicitly tell each test to use the `SimpleKvStore` by passing in `simple` as an argument.
:::
## Step 2: Concurrent Simple KV Store
We would like our server to be able to handle multiple requests in parallel. However, our simple key-value store can't safely handle concurrent requests yet! In this step, you'll make your simple key-value store thread-safe.
#### Implementation
:::success
**Task**: Add a synchronization primitive to `SimpleKvStore` in `kvstore/simple_kvstore.hpp` to make it thread-safe. You should use at most **one** mutex in this part. (We will explore more efficient locking schemes in later steps.)
:::
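As a hedged sketch of this coarse-grained approach (with hypothetical names -- your code goes in `SimpleKvStore`, not a standalone class), every operation acquires the same single `std::mutex` before touching the map:

```cpp
#include <map>
#include <mutex>
#include <string>

// Illustrative coarse-grained locking: one mutex guards the whole map,
// so only one thread can operate on the store at a time.
class LockedStore {
 public:
  void Put(const std::string &key, const std::string &value) {
    std::lock_guard<std::mutex> guard(mtx_);  // unlocks when guard leaves scope
    store_[key] = value;
  }
  bool Get(const std::string &key, std::string *out) {
    std::lock_guard<std::mutex> guard(mtx_);
    auto it = store_.find(key);
    if (it == store_.end()) return false;
    *out = it->second;
    return true;
  }
 private:
  std::mutex mtx_;
  std::map<std::string, std::string> store_;
};
```

Using `std::lock_guard` (or `std::unique_lock`) instead of bare `lock()`/`unlock()` calls guarantees the mutex is released on every return path.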
:::success
**Task**: Modify the methods in `kvstore/simple_kvstore.cpp` to enforce synchronized access, using the primitive you defined in the previous task.
When you are finished, you should pass all `kvstore_parallel_tests`. You can check that your implementation is correct by running **in the `build/` directory**:
```shell
build$ make check A2
```
**If you're failing a test**, you should run it individually to see more debugging output (replace `test_name` with the name of the test):
```shell
build$ make test_name
build$ ./test_name
```
:::
## Step 3: Single-Threaded Bucket-Based KV Store
#### Overview
Operations on your simple key-value store are *thread-safe* and *concurrent*, but there is no actual *parallelism*. The store is thread-safe because two threads cannot modify it at the same time. However, the entire map gets locked every time a thread wants to perform an operation on the store, so no two operations ever run in parallel (i.e., simultaneously).
This single mutex protection strategy is an example of **coarse-grained locking**: one lock protects access to a large section of code or an entire data structure. **Fine-grained locking** attempts to reduce lock contention by splitting the data structure or code into smaller, potentially disjoint pieces that each have their own lock. Multiple threads can then operate on separate pieces in parallel.
However, we have to be careful that a fine-grained locking scheme doesn't introduce subtle concurrency bugs. Also, having too many mutexes can introduce overhead that offsets the benefits of the fine-grained locking strategy.
:::info
<details><summary><span><b>Example of Coarse vs. Fine-Grained locking</b></span></summary>
<br/>
One data structure for which you could implement either coarse-grained or fine-grained locking is an array.
Coarse-grained locking would imply locking the entire array when any of it needs to be modified. This means that even if only a small part of the array will be modified, no other threads can access any part of the array.
This would look something like having an array-specific mutex.
```cpp=
// This may appear somewhat tedious
my_array_mtx.lock();
my_array[0] = 123;
my_array_mtx.unlock();

// But this locking is a lot easier on the computer
my_array_mtx.lock();
for (int i = 0; i < sizeof(my_array) / sizeof(int); i++) {
    my_array[i] = i + 3;
}
my_array_mtx.unlock();
```
Alternatively, you could have fine-grained locking. This would involve ensuring that, instead of the entire array being unavailable, only the specific components changed will be unavailable to other threads. Thus, every "slot" in the array would need a corresponding mutex. While this allows for greater flexibility for threads, this structure may be a lot heavier on the system (because one would need to create more mutexes).
This would look something like:
```cpp=
// This is a lot simpler than before
my_array_mtx_array[0].lock();
my_array[0] = 123;
my_array_mtx_array[0].unlock();

// But this is a lot more tedious
for (int i = 0; i < sizeof(my_array) / sizeof(int); i++) {
    my_array_mtx_array[i].lock();
    my_array[i] = i + 3;
    my_array_mtx_array[i].unlock();
}
```
Notice that in the above example, the scalability isn't great either -- this is because we chose too fine a grain 🌾 Thus, it's important to find a balance between the two approaches.
Another very cool example of fine and coarse-grained locking is locking a linked list! Feel free to examine this [link](https://www.usenix.org/system/files/login/articles/login_fall20_14_kelly.pdf) for more information.
</details>
:::
#### Implementation
If the single mutex locking strategy from Step 2 is too coarse-grained, and one lock for each key-value pair is too fine-grained, how can we design an efficient and scalable KVStore? We need to revisit our internal store design to support a new locking scheme--one where there's more than one lock, but fewer locks than the number of keys.
For this part, you will implement the key-value store API as a bucket-based hash table.
Take a look at how we define a `ConcurrentKvStore` (in `kvstore/concurrent_kvstore.hpp`):
```cpp=90
class ConcurrentKvStore : public KvStore {
public:
ConcurrentKvStore() = default;
~ConcurrentKvStore() = default;
bool Get(const GetRequest *req, GetResponse *res) override;
bool Put(const PutRequest *req, PutResponse *res) override;
bool Append(const AppendRequest *req, AppendResponse *res) override;
bool Delete(const DeleteRequest *req, DeleteResponse *res) override;
bool MultiGet(const MultiGetRequest *req, MultiGetResponse *res) override;
bool MultiPut(const MultiPutRequest *req, MultiPutResponse *res) override;
std::vector<std::string> AllKeys() override;
private:
// Your internal key-value store implementation!
DbMap store;
};
```
Notice that your concurrent key-value store will support the same operations as the `SimpleKvStore`, so refer to Step 1 for descriptions of each method. Your internal store is defined as type `DbMap`. We have already implemented the `DbMap` struct for you. Your responsibility is to write the methods in `kvstore/concurrent_kvstore.cpp`.
**For this part, you may assume that only one thread will operate on your bucket-based KVStore at a time**. We will add support for synchronized access in Step 4.
:::warning
You are required to keep your internal store as type `DbMap`, and to maintain the internal structure of the `DbMap` (i.e., no changing this line):
```cpp
std::array<std::list<DbItem>, BUCKET_COUNT> buckets;
```
However, you are *not required* to use `getIfExists`, `insertItem`, or `removeItem` if you don't want to. We provide them because we think they will be helpful helper methods for you to work with your internal store, but if you'd rather write different ones (or do all of your work in `kvstore/concurrent_kvstore.cpp`), feel free.
:::
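As an illustrative sketch of how a lookup in such a bucket array might work (all names here are made up -- this is not the stencil's `DbMap` or its helpers): hash the key to pick a bucket, then scan only that bucket's list.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <list>
#include <string>

constexpr size_t kBuckets = 16;  // illustrative; the stencil's BUCKET_COUNT may differ

struct Item {
  std::string key;
  std::string value;
};

// A bucket array in the spirit of DbMap: each bucket holds a list of items.
std::array<std::list<Item>, kBuckets> buckets;

bool lookup(const std::string &key, std::string *out) {
  // The hash function determines which single bucket could hold this key.
  size_t idx = std::hash<std::string>{}(key) % kBuckets;
  for (const Item &item : buckets[idx]) {
    if (item.key == key) {
      *out = item.value;
      return true;
    }
  }
  return false;  // not in its bucket, so not in the store
}
```

Inserts and deletes follow the same shape: compute the bucket index once, then operate only on that bucket's list. This per-bucket structure is what makes the fine-grained locking in Step 4 possible.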
:::success
**Task**: Use the given `DbMap` implementation to fill in the function bodies for `Get`, `Put`, `Append`, `Delete`, `MultiGet`, `MultiPut`, and `AllKeys` in `kvstore/concurrent_kvstore.cpp`.
When you are finished, you should pass all `kvstore_sequential_tests`. You can check that your implementation is correct by running **in the `build/` directory**:
```shell
build$ make check A3
```
If you're failing a test, you should run it individually to see debug output (replace test_name with the name of the test):
```shell
build$ make test_name
build$ ./test_name
```
For Steps 3 and 4, you can either specify `concurrent` as an argument or just leave it blank (as in the example above).
:::
## Step 4: Concurrent Bucket-Based KV Store
#### Overview
In this part, you'll modify your bucket-based key-value store implementation to support concurrent *and* parallel access using readers-writer locks.
#### Readers-writer locks
Recall from class the basic rule of synchronization:
> If two or more threads can concurrently access an object, and at least one of the accesses is a write, a race condition can occur and synchronization is required.
Observe that synchronization is only required if at least one thread is *writing* to shared data. If both threads are reading, then they can safely read concurrently. Regular mutexes, however, do not address this distinction--only one thread can access a critical section at a time, so such concurrent reads are impossible.
One potential optimization is a **readers-writer lock**, or **RWLock** ([`std::shared_mutex`](https://en.cppreference.com/w/cpp/thread/shared_mutex) in C++). RWLocks allow for **concurrent** access for read-only operations, but **exclusive** access for write operations. Before entering a critical section, threads may attempt to acquire either a *shared* (or *read*) lock or an *exclusive* (or *write*) lock. Multiple threads may hold a shared lock at once, but like regular mutexes, only one thread may hold an exclusive lock. To ensure safe concurrent or exclusive access, reader threads cannot acquire a shared lock until there is no writer thread holding an exclusive lock.
As with mutexes, it is up to the programmer to properly protect shared data and lock critical sections within the program. RWLocks present an additional challenge over mutexes: programmers must also ensure that threads acquire the correct *type* of lock before entering a critical section or potentially modifying data. For example, acquiring a shared lock and then modifying shared data is undefined behavior. As you implement key-value store functions, keep in mind which functions are amenable to concurrent access, and which require exclusive access.
:::info
<details><summary><span><b>Example of RWLocks</b></span></summary>
<br/>
To acquire exclusive access to a critical section, use the lock and unlock functions. To acquire shared access, use lock_shared and unlock_shared:
```cpp
shared_mtx.lock_shared(); // acquire/release a shared lock
/*
* Do work here. Do not modify any shared data in this critical section!
*/
shared_mtx.unlock_shared();
shared_mtx.lock(); // acquire/release an exclusive lock
/*
* Do work here. You can access and modify shared data here.
*/
shared_mtx.unlock();
```
See the [`std::shared_mutex`](https://en.cppreference.com/w/cpp/thread/shared_mutex) class for complete documentation.
In addition, like regular mutexes, C++ provides a wrapper class, [`std::shared_lock`](https://en.cppreference.com/w/cpp/thread/shared_lock), that automatically unlocks the shared mutex once it goes out of scope. This has the same API as `std::unique_lock`:
```cpp
void foo() {
// On initialization, a shared lock is acquired on `shared_mtx`.
std::shared_lock guard(shared_mtx);
/*
* Do work here. Do not modify any shared data here!
*
* On function exit (where `guard` goes out of scope), the shared lock is automatically released.
*/
}
```
</details>
:::
<details><summary><i><span style="color:#0000EE">Optional: Why not just use RWLocks all the time?</span></i></summary>
<br/>
Especially in applications that experience read-heavy workloads, RWLocks permit more concurrency than regular mutexes. However, **RWLocks should not be used as a drop-in replacement for mutexes**: RWLocks come with additional overhead over simple mutexes, as they must contain additional state and logic to handle readers and writers. RWLocks also must avoid *reader* or *writer* starvation, where one operation is disproportionately favored. [**Wikipedia's article**](https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock) provides an introduction to potential RWLock implementations and problems.
</details>
<br/>
::::success
**Task**: Add RWLocks to `DbMap` to synchronize access to your buckets.
:::danger
You may not use a single, global mutex. If you find yourself writing code like this:
```cpp
global_mtx.lock();
// < your code here >
global_mtx.unlock();
```
you are doing something wrong. You will not pass the performance tests, and *may face further deductions during grading*. The purpose of this step is to support thread-safe, *parallel* access to the store--a single lock prevents parallelism!
<details><summary><b><span>Hints</span></b></summary>
<br/>
Think about a more fine-grained approach that makes concurrent access to your store more efficient! See Step 3 for an explanation of coarse-grained vs. fine-grained locking. How can you use multiple RW locks such that multiple threads can operate on your store in parallel?
</details>
:::
::::
#### Synchronization Considerations
Now, it's time to use the primitives you defined to synchronize access to your key-value store. Here are two hints before you get started:
1. **Enforce a consistent locking order.** For example, if a given operation requires `lock1` and `lock2`, you should enforce that every operation always attempts to lock `lock1` before `lock2` (or vice versa). Think about why this strategy is necessary to prevent deadlock!
2. **Acquire all relevant locks before accessing or modifying the store.** We want each operation in our key-value store to be *atomic*, meaning that they are each executed as a single, indivisible unit. In this context of a key-value store, other operations should not modify relevant buckets while the `MultiGet`, `MultiPut`, and `AllKeys` operations are running.
Consider two concurrent operations: one MultiGet that requires `lock1` and `lock2`, and one Delete that requires `lock1`. Imagine the following execution order:
* The MultiGet acquires `lock1`, gets `value1` associated with `key1`, then releases `lock1`.
* The Delete acquires `lock1`, deletes the `key1: value1` entry, then releases `lock1`.
* The MultiGet acquires `lock2`, gets some values, then releases `lock2`.
When the MultiGet returns, it will return a value (`value1`) that is no longer in the store. We can prevent this bug by ensuring that the MultiGet owns both `lock1` and `lock2` before proceeding, and does not release either until it has finished entirely.
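A hedged sketch of both hints together (with illustrative names and bucket count, not the stencil's): collect the bucket indices an operation touches, then lock them in ascending order and hold every lock until the whole operation finishes.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <set>
#include <shared_mutex>
#include <string>
#include <vector>

constexpr size_t kBuckets = 8;  // illustrative only
std::array<std::shared_mutex, kBuckets> bucket_locks;

// MultiGet only reads, so shared locks suffice.
void multiget_locked(const std::vector<std::string> &keys) {
  std::set<size_t> idxs;  // std::set de-duplicates and iterates in sorted order
  for (const std::string &k : keys) {
    idxs.insert(std::hash<std::string>{}(k) % kBuckets);
  }
  // Consistent ascending order: every operation locks the same way, so no deadlock.
  for (size_t i : idxs) bucket_locks[i].lock_shared();
  /* ... read every requested value while all locks are held ... */
  for (size_t i : idxs) bucket_locks[i].unlock_shared();
}
```

A write operation would follow the same pattern with `lock()`/`unlock()` (exclusive) instead of the shared variants.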
::::success
**Task**: Modify the methods in `kvstore/concurrent_kvstore.cpp` to enforce synchronized access, using the primitives you defined in the previous task.
When you are finished, you should pass all `kvstore_parallel_tests`. You can check that your implementation is correct by running **in the `build/` directory**:
```shell
build$ make check A4
```
**Performance requirement**: If you implemented fine-grained locking correctly, you should also pass the performance tests, which must show at least 2x speedup relative to the single-threaded implementation. You can run the performance tests by running **in the `build/` directory**:
```shell
build$ make perf
```
For full-credit on the performance tests, your multi-threaded solution with fine-grained locking needs to be at least **2x faster** than (i.e., achieve twice the throughput of) the single-threaded, sequential implementation.
:::danger
:warning: **Warning**: the performance test scripts will report results in two ways:
- The `PASSED`/`FAILED` status at the top reports on *correctness*--i.e., did all of the put/get/multiput/multiget operations produce the expected results?
- The red/green bars measure *performance*. **If the bar is red, it means you are not yet meeting the performance requirement, even if the test shows as `PASSED`!**
:::
::::
At this point, your implementation should pass all of the Part A tests! You can test this by running **in the `build/` directory**:
```shell
build$ ./run_tests concurrent_store
```
or
```bash
build$ make check concurrent_store
```
# Part 2: Tweeter
### Context
Now that you've implemented a simple key-value store, let's build upon it with a use case!
In this part of the assignment, you’ll be working with a specialized and more complicated key-value store for a new social media platform, Tweeter. On Tweeter, users can choose their usernames, write posts that appear on their profiles, and respond to other users’ posts (which appear on both users’ profiles).
In Tweeter’s database, there are five kinds of key-value pairs.
| Key-Value Pair Structure | Example |
| -------- | -------- |
| `user_id` → username | “user_14” → “malte” |
| `post_id` → post content | “post_59” → “Hello, Tweeter!” |
| `all_users` → comma-separated list of `user_ids` | “all_users” → “user_13,user_14,user_160” |
| `user_id_posts` → comma-separated list of `post_ids` that user has posted | “user_14_posts” → “post_59,post_1” |
| `post_id_replies` → comma-separated list of `post_ids` that respond to post | “post_59_replies” → “post_60, post_61” |
Even though there can be multiple users with the same usernames, every `user_id` is unique. The same goes for posts: even though there can be multiple posts with the same text, every post has a unique `post_id`.
### Loading your KVStore with data
We provide you with the data stored on Tweeter's instance of KVStore **[here](https://docs.google.com/spreadsheets/d/1F0v4PTkbPRzdsqgWmFxmBSZywUV_06T88pIYZM44a6k/edit?usp=sharing)**. The same data is also in `gdpr/database.txt`, and you can load it into your KVStore as follows:
1. In one terminal, `cd` to the `build` directory and build your KVStore with `make`
2. Then, from the `build` directory, run the following command to start a KVStore server listening on port 1234 (you can pick any port number above 1024; it just sets up a rendezvous point for your client to connect with the server):
```shell
build$ make # Compile the server and client
build$ ./server 1234
```
3. **In another terminal**, run the client and feed in the data:
```shell
build$ ./simple_client 127.0.0.1:1234 < ../gdpr/database.txt
```
After running this command, typing `print store` into the first terminal (which runs the server) should show that your KVStore contains our dataset. Note that since the KVStore is an in-memory store, you will need to re-load the dataset every time you restart the server, as it loses its contents on shutdown.
Now, a client can connect to the server and make API requests; for instance, to fetch `user_01`'s data from the server, start up a new client and make a `Get` request:
```shell
build$ ./simple_client 127.0.0.1:1234
get user_01
```
### Regulatory Background
When implementing a real-world storage system that handles personal data, like Tweeter, it is essential to understand the legal frameworks that govern data privacy. In this section of the assignment, you will implement features pertaining to privacy regulations, with a particular focus on the European Union's General Data Protection Regulation (GDPR). This regulation is designed to ensure that users have agency over their data, including the right to both access and delete their information online.
:::success
**Task:** Please read the specific requirements of the GDPR at the following links.
* [Art. 15 GDPR - Right of access by the data subject](https://gdpr.eu/article-15-right-of-access/)
* [Art. 17 GDPR - Right to erasure ('Right to be forgotten')](https://gdpr.eu/article-17-right-to-be-forgotten/)
:::
:::warning
**What about the United States?**
The US has moved more slowly than the EU on data protection laws, but some states have created GDPR-like regulations, such as the California Consumer Privacy Act (CCPA).
You can read about its equivalent policies here:
* [CCPA Right to Delete](https://www.oag.ca.gov/privacy/ccpa#sectiond)
* [CCPA Right to Know](https://www.oag.ca.gov/privacy/ccpa#sectionc)
:::
::::info
<details><summary><strong>In a nutshell… 🥜</strong></summary>
### Right to Access
_GDPR Right of Access (Article 15)_ – Users can receive a copy of their own data upon request, as well as the source of any data that was not collected from the subject. When relevant, they can also request: the reason their data has been processed, the categories of data being processed, who can access this data, and the duration for which their data will be stored. They must also be informed when their data is transferred to a third party.
:::warning
<details><summary> <strong>And the United States?</strong></summary>
_CCPA Right to Know_ – Users can request access to personal data from companies, as well as: the categories of their data, the categories of sources, the purpose of collecting that data, and information about what categories of data are being transferred to what types of third parties.
#### What’s the difference?
Not much – for our application they share broadly similar data access rights!
</details>
:::
### Right to Delete:
_GDPR Right to be Forgotten (Article 17)_ – Users can request the deletion of their personal data under six specific circumstances, including when it is no longer needed for the purpose it was collected, or when the user has revoked consent. Companies are not required to comply if the data is essential for: avoiding infringing upon freedom of expression, complying with legal requirements, engaging in scientific or historical research, or reasons in the interest of public health.
:::warning
<details><summary> <strong>And the United States?</strong></summary>
_CCPA Right to Delete_ – Users can request the deletion of all personal data at any time. Companies are not required to comply if the data is essential for: fulfilling the purpose for which it was collected, ensuring security and integrity, identifying and repairing bugs in functionality, avoiding infringing upon the rights of another consumer (e.g. to free speech), engaging in scientific, historical, or statistical research, “enabling solely internal uses that are reasonably aligned with the expectations of the consumer,” or complying with legal requirements.
#### What’s the difference?
The primary difference is that “the GDPR right only applies if the request meets one of six specific conditions while the CCPA right is broad. However, the CCPA also allows businesses to refuse the request on much broader grounds than the GDPR” ([Practical Law](https://www.bakerlaw.com/webfiles/Privacy/2018/Articles/CCPA-GDPR-Chart.pdf)).
</details>
:::
</details>

::::
### Stakeholders
Tweeter has users who are in the EU, so Tweeter must comply with the GDPR’s right to access and the right to be forgotten. You are in charge of making a decision on how to handle a particular user's request to exercise their right to access and their right to be forgotten.
::::success
**Task:** Check EdStem! You will find a pinned post containing information about which stakeholder pair (or pairs) you have been assigned to complete this portion of the assignment.
If you have been assigned two stakeholder pairs, **please use the first one for all parts of the assignment.** You will use the second pair to answer question 3B later!
::::
You will write a function `GDPRDelete()` that performs a delete request for your assigned stakeholder. There are several ways one might implement privacy-conscious deletion (think about what we discussed in section for ideas!). While many different kinds of delete can be acceptable for your stakeholder pair, **you cannot opt out of implementing some form of delete and reject their request, although your implementation may not meet all of their hopes and expectations**. (In the real world, there are cases where deletion requests have been rejected completely, but for the scope of this assignment, we’ve selected the dataset and the stakeholders such that you should be able to implement some kind of delete, even if it is an unsatisfactory one for your stakeholder.)
The caveat with this particular assignment is that you are *also* accountable to an ethical auditor who is charged with scrutinizing your company’s handling of GDPR compliance. Therefore, when satisfying your stakeholder’s request for deletion in this assignment, you must also consider other stakeholders that could be affected by this deletion and whose legitimate claims and interests might be infringed upon.
The following introduces each stakeholder pair and explains the context in which the deletion request occurs:
**PAIR #1:**
* **Data Subject: Congressperson Kirby** — Congressperson Kirby is currently the elected congressperson for Rhode Island’s first district and is running for reelection in the current election cycle. Recently, some of their tweets from when they were in college have resurfaced. These tweets were written 10 years ago concerning a pandemic that occurred at the time. While the congressperson has issued apologies, this has not been enough to stop the ongoing discourse from users all across the political spectrum. Because of this, the congressperson has made a request to exercise their right to be forgotten. They wish to delete their account, which they hope will delete their original tweets and related public discourse (other users quoting or rewording the original posts) about the controversy.
* **Opposing stakeholder:** [Freedom House Advocacy Group](https://freedomhouse.org/issues/media-freedom) — This advocacy group is concerned with government accountability & transparency. They oppose Kirby’s attempt to remove evidence of their controversy from the past.
**PAIR #2:**
* **Data Subject: Sarsra Breisand** — Sarsra Breisand is an American singer, actress, and director. With a career spanning over six decades, she has achieved success in multiple fields of entertainment, and is among the few performers awarded an Emmy, Grammy, Oscar, and Tony (EGOT). Recently, a paparazzi reporter leaked information about where Sarsra Breisand lives. Despite her attempts to suppress this information, her efforts have only drawn more attention to it. As a desperate final attempt to protect her privacy, Sarsra has made a request to exercise her right to be forgotten and delete her account and all posts related to her. Sarsra understands that this will erase her social media profile, but hopes that it will put an end to the interest in her whereabouts.
* **Opposing stakeholder:** Beth Abraham — Beth Abraham is a historian who has begun studying what she has named the Breisand Effect. She has released several research papers on this psychological and sociological phenomenon and is in the process of writing another.
**PAIR #3:**
* **Data Subject: Angel Yoline** — Angel Yoline is an up-and-coming actress who is excitedly promoting her new film on Tweeter. However, a while ago she posted views on Tweeter that characterize working class people in a negative light. Her former partner, Brad Schmidt, has recently highlighted these problematic views as a means of getting revenge on Angel. She now wishes for her own post, Brad’s post, and the public discourse about the controversy to be deleted.
* **Opposing stakeholder:** Film Enthusiasts — Film enthusiasts have been excited about the new film that Angel Yoline is starring in. When an account exposes something that Angel Yoline said about working class people, there is a surge of new posts discussing her controversial opinions.
**PAIR #4:**
* **Data Subject: Frank Blimp** — Frank Blimp is a recent divorcee who doesn’t have custody of his child from the marriage. Falling into hard times, Frank missed a couple of child support payments, for which his ex-wife, Marge, has called him out on Tweeter. Even though both Frank and Marge have private accounts, these posts are visible to Frank’s friends and family. Since then, Frank has been able to repay the missed payments and is financially stable enough to continue making future payments. He’s asked Marge to take down her tweets shaming him, but she has refused. Frank now requests that his data be erased in the hope that this will also make posts mentioning him disappear.
* **Opposing stakeholder:** Marge Blimp — Marge is Frank’s ex-wife who has main custody of their child. After he had missed multiple child support payments, she resorted to using Tweeter to call him out. She wants others to be aware of Frank's past behavior.
**PAIR #5:**
* **Data Subject: Matt Bleat** — Matt Bleat is a senior in high school who has just committed to Brown University after being recruited to their track and field team. Recently, there have been posts from Matt’s high school classmates calling attention to former allegations against Matt for sexual harassment. The high school determined that there was insufficient evidence to support those allegations and declined to take further action. Matt invokes the right to erasure, expecting that it will remove his posts, but also remove posts that mention hashtags related to the controversy.
* **Opposing stakeholder:** The Center for Changing Our Campus Culture — They believe that posts highlighting the allegations against Matt should remain on Tweeter, and Matt should be prepared to deal with the possibility that Brown could discover these tweets and rescind their offer of admission.
### The `GDPRDelete()` Function
The signature of `GDPRDelete` is as follows:
```c++
bool GDPRDelete(std::string& user_id);
```
In other words, the function receives a single argument, which is the user ID of the data subject who is invoking the right to erasure. The function can use this argument to look up data related to the data subject in the KVStore, or to find the user's identifier (e.g., "user_01") in other data.
:::danger
**Note:** Although the `GDPRDelete()` function should be *specific to your stakeholder pair* (i.e. delete a user's data in the way that you determine makes the most sense for that situation), **it should be able to apply that deletion scheme to any `user_id` passed to it**. You should not be deleting based on hard-coded post numbers or user ids, but instead come up with a general implementation with that stakeholder pair in mind.
:::
In implementing `GDPRDelete()` you can consider a number of broad options for data store modification including, but not limited to:
* Complete removal of data.
* Modification of data (e.g. anonymization, modification, removal of parts of the content).
* Making data harder to find, i.e. delisting or downranking (what might this mean in the context of this KVStore?).
* Adding additional keys to the store.
Your `GDPRDelete()` is allowed to leave the KVStore in an inconsistent state (e.g., the key "user_01" is deleted from the store, but remains in the "all_users" list), but if this is the case, this should be clearly noted and justified.
Note that the function does not receive information about the context in which the deletion happens (e.g., what other users' data the data subject might want to delete, or what the other users' views on this are). Your design could extend the KVStore with auxiliary metadata that captures relevant context (e.g., special key-value pairs that indicate users who are of special interest, such as public figures), and `GDPRDelete()` may draw on this data. Using such metadata is not a requirement, however.
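As one illustration of the "complete removal" option, here is a toy cascade against a plain `std::map` standing in for the KVStore. Everything here (the `EraseUser` name, the helper functions, operating on the map directly rather than issuing client requests) is invented for this sketch; your real implementation will send requests through the client, and your stakeholder analysis may well call for a less drastic scheme.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

using Store = std::map<std::string, std::string>;

// Helper names below are invented for this sketch only.
static std::vector<std::string> SplitCSV(const std::string& s) {
  std::vector<std::string> out;
  std::stringstream ss(s);
  std::string item;
  while (std::getline(ss, item, ',')) out.push_back(item);
  return out;
}

static std::string JoinCSV(const std::vector<std::string>& v) {
  std::string out;
  for (size_t i = 0; i < v.size(); ++i) out += (i ? "," : "") + v[i];
  return out;
}

// One possible "complete removal" cascade for user_id (e.g. "user_14"):
// delete the username, every post the user wrote, the posts list, and the
// user's entry in all_users.
static void EraseUser(Store& db, const std::string& user_id) {
  if (auto it = db.find(user_id + "_posts"); it != db.end()) {
    for (const auto& post_id : SplitCSV(it->second)) {
      db.erase(post_id);               // the post content itself
      db.erase(post_id + "_replies");  // its reply list
    }
    db.erase(it);                      // the user's posts list
  }
  db.erase(user_id);                   // user_id -> username
  if (auto it = db.find("all_users"); it != db.end()) {
    std::vector<std::string> kept;
    for (const auto& u : SplitCSV(it->second))
      if (u != user_id) kept.push_back(u);
    it->second = JoinCSV(kept);
  }
}
```

Note that this sketch deliberately leaves other users' reply posts in place, and leaves the deleted user's post IDs inside other posts' `_replies` lists: a dangling reference of exactly the kind you must either clean up or explicitly justify.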
:::info
We will not grade you on the level of sophistication your `GDPRDelete()` function achieves, but rather on whether it works. As long as your written answers **justify your choices**, you will get full credit, even if the function itself is simple.
:::
### Before you write any code…
You might feel overwhelmed with the situation you’ve been presented with — we’ve engineered each situation such that the conflict is _intentionally difficult_ to deal with. Because of this, we want you to consider the following questions before you touch any of the code. You don’t have to submit your answers for this, but we highly recommend writing down some notes for yourself.
1. Consider why the data subject would want their data (and potentially the others’ data) to be deleted. What kind of ethically significant claims might lead to the acceptance or rejection of their request?
2. Consider why the opposing stakeholder wouldn’t want data to be deleted. What kind of ethically significant claims would lead to the acceptance or rejection of their request?
3. What strategies or mechanisms could you propose to mitigate potential harm to the opposing stakeholder while still respecting the data subject's right to be forgotten?
:::warning
_**Important note:** we don’t expect you to find a solution that satisfies everyone — rather, the point is to make the most reasonable tradeoffs between the opposing parties’ claims._
:::
:::success
**Implementation Tasks**
**Task 1:** Implement `GDPRDelete()` in `client/simple_client.cpp`.
**Note**: `GDPRDelete()` must be thread-safe. Recall, though, that `GDPRDelete()` runs on the client side: the server-side requests it triggers must be handled in a thread-safe way, but you do not need to use mutexes in your `GDPRDelete()` function itself.
<details>
<summary>
<strong>
Hint
</strong>
</summary>
Look at the <code>Split()</code> function in <code>common/utils.cpp</code>.
</details>
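For intuition, here is roughly what a Split-style helper does with Tweeter's comma-separated lists, together with a re-join step you will likely need when removing a single entry. This is a hypothetical stand-in: the real `Split()` in `common/utils.cpp` may have a different signature and behavior.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for Split() in common/utils.cpp; the real helper's
// signature may differ. Splits "post_59,post_1" into {"post_59", "post_1"}.
std::vector<std::string> SplitSketch(const std::string& s, char delim = ',') {
  std::vector<std::string> parts;
  std::stringstream ss(s);
  std::string item;
  while (std::getline(ss, item, delim)) {
    if (!item.empty()) parts.push_back(item);
  }
  return parts;
}

// Re-join after editing, e.g. to write an updated "user_<id>_posts" value
// back to the store once one post_id has been removed.
std::string JoinSketch(const std::vector<std::string>& parts,
                       char delim = ',') {
  std::string out;
  for (size_t i = 0; i < parts.size(); ++i) {
    if (i != 0) out += delim;
    out += parts[i];
  }
  return out;
}
```

A typical update then looks like: `get` the list, split it, erase the entry, re-join, and `put` the result back.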
**Task 2:** Leave detailed (header and/or inline) comments in your code that explain what kind of delete you are implementing.
`GDPRDelete()` should operate on any keys (such as `user_id`, `post_id`, etc.) you deem necessary to implement your chosen strategy of deletion.
To test your `GDPRDelete()` function, you can use the `gdprdelete <user>` command in the client as follows:
```
$ ./simple_client 127.0.0.1:1234
gdprdelete user_10
```
You can then use `print store` in the server to see how your store contents have changed.
(Go back to [Loading your KVStore with data](#Loading-your-KVStore-with-data) if you need a reminder on how to start up your KVStore.)
:::
### After you’ve finished implementing delete…
The right to know and the right to delete affect all stakeholders of a database: its users, its operators, and those who use the data for technology, studies, and historical records. Now that you have implemented your version of privacy-conscious access and deletion, you should consider the potential pitfalls of your design choices in practice.
::::success
**Task:** Answer the following questions in your README file:
:::danger
You may want to read the ["How will you be graded"](#How-will-you-be-graded) section below before beginning this task to understand what we are looking for!
:::
**Q1.** Which stakeholder pair(s) were you assigned? (1 sentence)
**Q2.** What kind of delete did you implement and why? Explain your decisions as if you were reporting to an ethical auditor who will determine whether your design is justified. Highlight how your decisions address stakeholder concerns, and explain why you addressed their concerns in this way. If you purposely ignored any stakeholder concerns in your implementation, explain why they were not addressed (1-2 short paragraphs)
**Q3.** In the real world, Twitter, Google, Airbnb, and other companies write one implementation of GDPR-compliant account removal to realize the right to erasure. They do not write per-user functions! In the next part of the assignment, you'll think about how to generalize your implementation.
Answer the applicable question based on whether we assigned you one or two stakeholder pairs. Please mark which version of the question you are answering in your README. (1-3 short paragraphs)
* **Q3A.** ***If we assigned you only one stakeholder pair:*** What are the shortcomings of your implementation? How might your technical implementation change if you were asked to create a general `GDPRDelete()` suitable for _Right to be Forgotten_ requests from all of Tweeter’s users? Consider specifically some other short and/or long term stakeholders within your scenario for whom your implementation might not be suitable.
* **Q3B.** ***If you were assigned two stakeholder pairs:*** What are the shortcomings of your implementation? How might your technical implementation change if you were asked to create a general `GDPRDelete()` suitable for _Right to be Forgotten_ requests from all of Tweeter’s users? Consider specifically the concerns of your second pair of stakeholders, and whether your implementation of `GDPRDelete()` would be suitable for these stakeholders as well.
Along with considering additional stakeholders, you can also consider other outside entities who might be affected by your implementation of `GDPRDelete()`.
::::
### How will you be graded?
1. Your implementation needs to work and do what you indicate it is doing in your comments.
2. You’re not being graded on whether you picked the “right” answer — there is no one correct version of delete for each stakeholder pair. Rather, the bulk of your grade is determined by how well you explain why you chose your specific implementation: the justifications you offer in support of your implementation should reflect a **comprehensive assessment of the context and competing claims** and weigh them against each other in a **nuanced way**.
A good response identifies the legitimate claims that each stakeholder may have, explains why those claims are important, and compares the importance of both claims. It provides concrete reasons for (fully or partially) prioritizing or rejecting individual stakeholders’ claims and how those trade-offs are reflected in the chosen implementation of the right to be forgotten. A good response, importantly, also touches upon the limitations of those choices.
Your response should show that you have thought about the interests of your stakeholders beyond the directly legal implications. That is, ***you must use some justification outside of the GDPR itself for the design of your implementation***. Because GDPR requirements are defined rather loosely, you will have naturally made decisions beyond those required legally; a good response will highlight some of these decisions. You may also think about some of the values that motivate the GDPR requirements that you have drawn upon.
# Debugging with Sanitizers and GDB
Debugging concurrent code can be trickier than debugging single-threaded code. As threads interact with the shared data structures, it can be difficult to track down data races and deadlocks. Luckily, GDB and thread sanitizers can be really helpful.
### Thread Sanitizers
Google’s [ThreadSanitizer](https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual) for C and C++ can help you detect data races, deadlocks, and thread leaks. To run the code with thread sanitizers, compile it with:
```
make TSAN=1
```
When compiling tests, you will need to specify `TSAN=1` too:
```
make check TSAN=1
```
or
```
./test.sh TSAN=1 <A1|A2|A3|A4|A5>
```
<details><summary><span>Example</span></summary>
Consider the following example:
```cpp!
#include <chrono>
#include <cstdio>
#include <thread>

int child();  // forward declaration

int count = 0;

int main() {
    // Start two child threads, which both call child()
    std::thread p1(child);
    std::thread p2(child);
    // Only the parent executes here
    p1.join();
    p2.join();
    printf("count %d\n", count);
}

int child() {
    // Children execute here
    while (count < 100) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        count += 1;
    }
    return 0;
}
```
When you run this code using a thread sanitizer, it will produce the following output:

Note: open the image in a new tab to view it at full size.
* The red message indicates what error is occurring (in this case, a data race)
* The blue messages indicate conflicting memory accesses with a stack trace. In our case, the conflict occurs on line 25 and line 27. It includes which thread is performing each access, and if the thread holds a mutex it will also indicate it.
* Below these messages, the sanitizer might also indicate:
  * specific mutexes, when they were acquired, which threads acquired them, and sometimes when they were created
  * threads, and when they were created
* Refer to the [ThreadSanitizer report format documentation](https://github.com/google/sanitizers/wiki/ThreadSanitizerReportFormat) for more information on the report format.
</details>
### GDB
Sometimes if your program hangs due to a deadlock, the thread sanitizer may not be able to return any reports. In these cases, gdb has some more helpful tools. Here are some useful gdb commands:
* **`info threads`**: prints out existing threads, the light-weight process (LWP) they’re running on, and the line of code they’re executing. The current thread is marked with an asterisk.
```
3 Thread 0x7ffff3bff700 (LWP 9167) "example" __lll_lock_wait ()
at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
4 Thread 0x7ffff33fe700 (LWP 9003) "example" __lll_lock_wait ()
at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
```
* **`thread <thread-id>`**: switches to thread `<thread-id>`
```
(gdb) thread 4
[Switching to thread 4 (Thread 0x7ffff33fe700 (LWP 9003))]
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135 ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
```
* **`p <mutex>`**: prints the fields of a mutex. Helpful for determining the owner of the lock, identified by its LWP.
* **`CTRL+C`**: interrupts the execution of the program and returns control to gdb. If your program is hanging, likely due to a deadlock, this lets you examine the current state of the program. For example, if one of your threads is blocked on a mutex, you can switch to that thread, see which thread owns the mutex, and then switch to that thread.
* **`bt`**: prints out a stack trace. Often, threads are blocked on a system call (locking a mutex, waiting on a condition variable, or reading/writing to a file descriptor). The backtrace is useful for discovering where in the codebase the blocking call was made.
* **`f <frame-number>`**: switches to the given frame number. This lets you move to frames within files you’re familiar with and determine which function made the blocking call.
# Handing In and Grading
## Handin Instructions
You will hand in your code using Git. In the `kvstore/` subdirectory of your projects repository, you MUST fill in the text file called `README-5A.md`.
<details><summary><span><b>Remind me again what the README should contain?</b></span></summary>
The README file should include the following:
* Any design decisions you made and comments for graders, under “Design Overview”. If there’s nothing interesting to say, just list “None”.
* Any collaborators and citations for help that you received, under "Collaborators". CS 300 encourages collaboration, but we expect you to acknowledge help you received, as is standard academic practice.
* Your answers to the questions regarding your `GDPRDelete()` implementation.
* A ranking of the difficulty and time consumption of the project, compared to other projects so far. We won't use this for grading; we collect the data to calibrate for future years.
</details>
:::danger
:warning: **Important extra step**: Because you'll continue working on Project 5B in the same directory of your repository, you'll need to take one extra step when submitting your work to let us know which commit you want graded for this project. To do this (example in the figure below):
1. Select your preferred commit in the dropdown on the grading server
2. Click the "**Grade this commit**" button

<br />
The "Grade this commit" button will record your latest commit for Project 5A. This way, as you work on project 5B, we'll know to return to this commit to grade Project 5A.
:::
**By 8:00pm on Friday, April 25th**, you must have a commit with all steps completed.
## Grading Breakdown
This project is graded out of 140 points, but will be weighted equally with the other projects in the course.
For the **KVStore** implementation parts:
* **18% (18 points)** for passing the A1 tests (2 pts per test).
* **7% (7 points)** for passing the A2 tests (1 pt per test).
* **40% (40 points)** for passing the A3 tests (4 pts per test, and 4 pts for passing them all).
* **21% (21 points)** for passing the A4 tests (3 pts per test).
* **14% (14 points)** for passing the A5 (performance) tests (7 pts per test).
For the **Tweeter** part of the project:
* **25% (10 points)** for your implementation of `GDPRDelete`.
* **25% (10 points)** for comments and explanation of how your deletion function works.
* **50% (20 points)** for answers to the three README questions.
Now head to the grading server, make sure that you have the “KVStore” page configured correctly with your project repository, and check that your KVStore runs on the grading server as expected.
Congratulations, you’ve completed the first part of the fifth CS 300 project! :+1:
---
<small><i>Acknowledgements:</i> This project was developed for CS 300 by Richard Tang, Carolyn Zech, Cedric Sirianni, and Malte Schwarzkopf. The Tweeter portion of this project was developed for CS 300 by Eva Schiller, Eva Lau, Colton Rusch, and Malte Schwarzkopf.</small>