Final Quiz (Spring 2025) #
Rules #
Read these rules carefully! In this quiz, you CANNOT interact with others, use AI tools, or access shared documents or editors. If you violate these rules, we will treat this as a matter of academic integrity and follow the process for academic code of conduct violations.
The quiz is open book, open note, open computer. You may access the book and your own notes. You may also use computers or electronic devices to access your own class materials, public class materials, and online resources. Specifically:
- You may access a browser and a PDF reader.
- You may access your own notes and assignment code electronically.
- You may access the course site.
- You may access our lecture notes, exercises, and section notes.
- You may access our lecture, assignment, and section code.
- You may access online resources, including search engines, Wikipedia, and other websites.
- You may run a C or C++ compiler, including an assembler and linker.
- You may access manual pages and common utilities, including calculators and a Python interpreter.
However:
- You cannot use AI tools (like ChatGPT, GitHub Copilot, Google AI, and others) during the quiz, with the exception of "AI overview" text from a normal search engine query.
- You cannot contact other humans with questions about the quiz — whether in person, via IM, or by posting questions to forums — with the exception of course staff.
- You cannot interact with other humans via chat, text, phone calls, shared Google Docs, or anything like it.
- You cannot ask the quiz questions on class discussion forums or other question forums such as Stack Overflow.
- You cannot access solutions from any previous quiz, from this course or others, by paper or computer, except for those on the CS 300 course site.
Any violations of this policy are breaches of academic honesty and will be treated accordingly. Please appreciate our flexibility and behave honestly and honorably.
Additionally, students are taking the quiz at different times. Do not post publicly about the quiz until given permission to do so by course staff. This includes general comments about quiz difficulty.
Completing your quiz #
You have 180 minutes to complete the quiz starting from the time you press "Start".
Different students may have different quizzes. Enter your answers directly in this form; a green checkmark appears when an answer is saved.
Students with extra time accommodations: If your extra time is not reflected next to the start button, contact course staff and wait for confirmation before pressing it.
You should not need to attach a drawing or other longer notes, but you may do so by adding them to a final subdirectory in your cs300-s25-projects repository. Make sure you push your changes to GitHub before time is up, and explain in this form where we should look for your notes.
Notes #
Assume a Linux operating system running on the x86-64 architecture unless otherwise stated. If you get stuck, move on to the next question. If you’re confused, explain your thinking briefly for potential partial credit.
1. Virtual Memory and Page Tables (12 points) #
UTAs Ross, Anish, and Julia hopped in a Time Machine (😉) and traveled to the year 2125, where computers now use ARM128, a 128-bit processor architecture. In ARM128, addresses are 128 bits (16 bytes) and memory pages are 65536 bytes (i.e., 2^16 bytes, or 64 KiB).
QUESTION 1A. How many 16-byte addresses can fit into one page in ARM128? Explain your reasoning.
QUESTION 1B. In ARM128, how many bits are required for the offset at the end of the address? Explain your reasoning.
QUESTION 1C. Julia discovers that ARM128 only targets systems with up to 2^112 bytes of memory, making the effective address size only 112 bits. With an effective address size of 112 bits, how many levels of page tables are required? Explain your reasoning.
2. Race Conditions and Synchronization (15 points) #
Alex and Chloe are writing a program that calculates the overall performance of submissions sent to CS 300's as-yet-unreleased super-secret high-performance grading server. Their program processes a log file from the grading server, which contains a series of records about recent submissions sorted by submission time. The log file is 65536 bytes total, and each record is exactly 16 bytes.
Alex and Chloe have already written a function to load the data into a buffer, and now want to use two threads to process the data, starting with the earliest records. They start with the following program:
#define BUF_SIZE 65536
#define RECORD_SIZE 16

char buf[BUF_SIZE];
size_t head = 0; // Next index to start reading in the buffer

void load_data(char* file, char* buffer, size_t sz) { /* ... */ }
void process_record(char* record) { /* ... */ }

void get_next_record(char* dest) {
    memcpy(dest, &buf[head], RECORD_SIZE);
    head += RECORD_SIZE;
}

void worker_thread() {
    while (head < BUF_SIZE) {
        // Read a record from the buffer
        char this_record[RECORD_SIZE];
        get_next_record(this_record);
        // Do some super-expensive computation on this data
        process_record(this_record);
    }
}

int main() {
    // Load the log data into the buffer
    load_data("data.txt", buf, BUF_SIZE);
    // Create two threads to process the data
    std::thread threads[2];
    threads[0] = std::thread(worker_thread);
    threads[1] = std::thread(worker_thread);
    // Wait for the threads to finish
    threads[0].join();
    threads[1].join();
    return 0;
}
QUESTION 2A. Alex argues that since the threads only read from the buffer, the basic rule of synchronization is not violated. Why is Alex's reasoning incorrect? Explain your reasoning, and include an example of a problematic interaction between the two threads.
QUESTION 2B. Chloe proposes using a mutex to solve the problem. Where should Chloe lock and unlock the mutex in order to prevent the issue? The code below contains a copy of the worker thread function with several possible locations marked (Point A, Point B, Point C, ...). Indicate the locations where you would lock and unlock the mutex to solve the issue you identified in part A, and explain your reasoning. Your answer should have the form "lock at X, unlock at Y", followed by an explanation (a few words, at most one sentence) of how your design solves the problem.
Assume that all other code remains unchanged, and that no other changes can be made to this function besides locking/unlocking the mutex.
std::mutex mtx; // Mutex declared as global (all other variables unchanged)

void worker_thread() {
    // **** POINT A ****
    while (head < BUF_SIZE) {
        // **** POINT B ****
        // Read a record from the buffer
        char this_record[RECORD_SIZE];
        get_next_record(this_record);
        // **** POINT C ****
        // Do some super-expensive math on this_record
        process_record(this_record);
        // **** POINT D ****
    }
    // **** POINT E ****
}
QUESTION 2C. Chloe is concerned that this implementation will not be able to take advantage of parallelism, and thus will not have a significant speedup compared to not using threads. Is Chloe correct? Write your answer as "yes" or "no" and briefly explain your reasoning (around one sentence).
3. Fork & Pipes (12 points) #
Below is a sample program written by TA Alaina as she’s exploring the power of fork() and pipes! Unfortunately, she doesn’t really understand what the expected behavior is . . . Answer the following questions about the program to help her out!
#include <stdio.h>
#include <sys/types.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
int pipe1[2];
int pipe2[2];

int main() {
    pipe(pipe1);
    pid_t pid_first = fork();
    if (pid_first == 0) { // child process
        pipe(pipe2);
        pid_t pid_second = fork(); // child calls fork, creating grandchild
        // **** [PART A] ****
    }
    return 0;
}
QUESTION 3A. After creating three processes (parent, child, or grandchild) and two pipes (pipe1, pipe2), Alaina is curious about how the processes could communicate with each other.
Consider the point in the code labeled [PART A]. For each of the following options, write "yes" or "no" if the two processes mentioned could communicate (via read or write calls) using the specified pipe, and briefly explain your reasoning (a few words, at most one sentence).

For each option, assume that no data has been sent over any pipe yet, and that all options are independent (meaning that no two options would be used in the same program). Also, assume that all calls to fork and pipe succeed, and that Alaina would eventually add code to close the appropriate ends of the pipe that would not be used (depending on which option she picked).
- Parent could communicate with child using pipe2
- Child could communicate with parent using pipe1
- Grandchild could communicate with child using pipe2
- Grandchild could communicate with parent using pipe2
QUESTION 3B. Having learned about pipes, Alaina now wants to learn more about fork() and execv(). She keeps the same parent-child-grandchild process structure from the previous program and removes all the code with pipes, and adds the following code:
int main() {
    pid_t pid_first = fork();
    if (pid_first == 0) { // child process
        pid_t pid_second = fork(); // child calls fork, creating grandchild
        if (pid_second == 0) { // grandchild process // <-- *** NEW ***
            char* args[] = {"./other_program", NULL}; // <-- *** NEW ***
            execv("./other_program", args); // <-- *** NEW ***
        } // <-- *** NEW ***
    }
    printf("Hello from %d\n", getpid()); // <-- *** NEW ***
    return 0;
}
Which of the following options would be printed as a result of running Alaina's new program? Pick one option and explain your reasoning (a few words, at most one sentence).
When the program runs, assume that the program other_program exists, is being called with valid arguments, and makes no additional print statements. Also, you may assume that all calls to fork and execv succeed.
Option 1:
Hello from <parent PID>
Option 2:
Hello from <parent PID>
Hello from <child PID>
Option 3:
Hello from <parent PID>
Hello from <child PID>
Hello from <child PID>
Option 4:
Hello from <parent PID>
Hello from <child PID>
Hello from <grandchild PID>
4. Condition Variables (8 points) #
Bokai and Liz are exploring ways to send messages between threads and are building a program where 4 sender threads all want to send the message "hi bokai!" to a receiver thread.
Instead of using a bounded buffer, they devise a system where the sender and receiver threads share a single pointer (data). When a thread wants to send data, it allocates the actual message on the heap using malloc, and assigns data to point to this newly-allocated memory. Then, the sender sets a flag (ready) to indicate there is data waiting to be read, and signals a condition variable to wake up the receiver thread. After printing out the message, the receiver thread frees the data, since it's no longer needed, and sets the ready flag to false to wait for the next message.
Here's the code for their current version:
#define N_THREADS 4

bool ready = false;
std::mutex mtx;
std::condition_variable cv_ready;
char* data = NULL; // NULL when no data available

void send(int idx, char* message) {
    std::unique_lock<std::mutex> guard(mtx);
    assert(data == NULL); // <------------------ Assertion that fails (PART A)
    // Create message, store it at shared pointer
    char* buf = (char*)malloc(100);
    memset(buf, 0, 100);
    memcpy(buf, message, strlen(message));
    data = buf;
    ready = true;
    cv_ready.notify_all(); // Wake up receiver thread
}

void receive_all() {
    for (int i = 0; i < N_THREADS; i++) {
        std::unique_lock<std::mutex> guard(mtx);
        while (!ready) {
            cv_ready.wait(guard);
        }
        printf("Received: %s\n", data);
        // Done with this message, free it and reset for next sender
        free(data);
        data = NULL;
        ready = false;
    }
}

int main(int argc, char** argv) {
    std::thread senders[N_THREADS];
    for (int i = 0; i < N_THREADS; i++) {
        senders[i] = std::thread(send, i, (char*)"hi bokai!");
    }
    std::thread receiver = std::thread(receive_all);
    for (int i = 0; i < N_THREADS; i++) {
        senders[i].join();
    }
    receiver.join();
    return 0;
}
However, when running the program, Bokai and Liz discover that it has a bug! They start debugging and find that, after sending at least once, the program fails the assert statement marked above, which checks that there is no data "waiting" in the pointer before the sender starts sending.
QUESTION 4A. Assuming no spurious wakeups, give an example of the execution order of the threads (and any wait/notify_all calls involved) that will cause the program to crash with an assertion failure. Your answer does not need to include all threads, just enough to describe the problem.
Example answer format:
- receiver: calls wait()
- sender 1: checks assertion, sets data to point to message, calls notify_all
- receiver: prints and frees message
- sender 2: ...
QUESTION 4B. Bokai and Liz have a hunch that they can solve the problem by adding another condition variable, so they add a new one called cv_empty to wait for the data pointer to be empty. They know they need to add the following code segments, but they aren't sure where to add them:
Segment 1 (wait):
while(data != NULL) { cv_empty.wait(guard); }
Segment 2 (notify_all):
cv_empty.notify_all();
The code below contains a copy of the sender and receiver functions, with several possible lines marked (A, B, C, D, ...). Indicate the locations where Bokai and Liz would add wait and notify calls to fix the issue. You should select one location for the call to wait (where segment 1 would be added) and one location for the call to notify (where segment 2 would be added). (Example answer: "wait on X, notify on Y"):
void send(int idx, char* message) {
    std::unique_lock<std::mutex> guard(mtx);
    // ****POINT A****
    assert(data == NULL); // <------------------ Assertion that fails (PART A)
    // ****POINT B****
    // Create message, store it at shared pointer
    char* buf = (char*)malloc(100);
    memset(buf, 0, 100);
    memcpy(buf, message, strlen(message));
    data = buf;
    // ****POINT C****
    ready = true;
    cv_ready.notify_all(); // Wake up receiver thread
}

void receive_all() {
    for (int i = 0; i < N_THREADS; i++) {
        std::unique_lock<std::mutex> guard(mtx);
        // ****POINT D****
        while (!ready) {
            cv_ready.wait(guard);
        }
        printf("Received: %s\n", data);
        // ****POINT E****
        free(data);
        data = NULL;
        ready = false;
        // ****POINT F****
    }
}
5. Undefined Behavior (10 points) #
Alex K and Allison found the following code, and they think it's sus.
QUESTION 5A. Find 3 instances of undefined behavior (out of 4 total) in the code below. To give your answers, write the line number where you noticed the problem, and a brief description (e.g., "line 3, freeing pointer that was already freed").
1  #include <thread>
2  #include <vector>
3  #include <cstring>
4
5  // Increments a value passed by reference
6  void plus20(int* x) {
7      *x = *x + 20;
8  }
9
10
11 int* do_the_thing() {
12     int magic_number = 2;
13     return &magic_number;
14 }
15
16 int main() {
17     std::thread threads[3];
18     int counter = 10;
19
20     // Make some threads
21     for (int i = 0; i < 3; i++) {
22         threads[i] = std::thread(plus20, &counter);
23     }
24
25     int* a = do_the_thing();
26     long new_number = 0;
27     memcpy(&new_number, a, 8);
28     counter = new_number;
29
30     // Join threads
31     for (int i = 0; i < 3; i++) {
32         threads[i].join();
33     }
34
35     return 0;
36 }
6. Sharding and Scalability (10 points) #
Andrew and Cedric are building a distributed system to serve pictures of cats. They have accumulated a lot of cat pictures and (naturally) a lot of users, so they are considering some different sharding strategies in order to improve scalability.
QUESTION 6A. Cedric first proposes a design where a single, centralized shardcontroller keeps track of all the shards. When a client makes a request, it always contacts the shardcontroller, which then makes a request to the appropriate shard(s), as shown in the figure below. In a few words (max 1 sentence), why is this approach not scalable?
QUESTION 6B. Andrew next proposes an alternative where the client does not communicate with the shard controller on every request. Instead, the client periodically fetches a shard configuration from the shard controller and then consults the configuration when a request is made, as shown in the figure below. Compared to the strategy from part A, what is one advantage and one disadvantage of this new approach? Your explanation for each advantage/disadvantage should be a few words, at most one sentence.
QUESTION 6C. Cedric has a new idea to eliminate the shardcontroller entirely. Instead of storing the mapping of shards to servers on the client, he proposes using a fixed hash function to compute a hash value for each key, and then use this to select a server. For example, given N shards, the client would know which shard to contact by computing the hash value modulo N.
Compared to the strategy from part B, what is one advantage and one disadvantage of this new approach? You may assume that the hash function is deterministic and is designed to uniformly distribute keys across shards. Your explanation for each advantage/disadvantage should be a few words, at most one sentence. (Hint: what if Taylor Swift posts a picture of one of her cats?)
7. Signed Numbers (8 points) #
Nick writes some code to add some numbers, but is concerned that some of the add operations will invoke undefined behavior and cause his computer to explode!
QUESTION 7A. For each of the 4 examples below, determine if the add operation invokes undefined behavior. If so, provide a brief explanation why; if not, provide the result of the expression (in decimal or hex, you decide).
#include <stdlib.h>

int main(int argc, char** argv) {
    int int1 = 15;
    int int2 = 0x7FFFFFF5;
    int int3 = 0xFFFFFFF2;
    unsigned int uint4 = int3;
    // Examples (ex1 to ex4)
    int ex1 = int1 + int3;
    int ex2 = int2 + int1;
    unsigned int ex3 = uint4 + (unsigned int)int1;
    unsigned long ex4 = (unsigned long)uint4 + (unsigned long)int1;
    return 0;
}
Answers:
- ex1: 15 + (-14) = 1; No, this is fine
- ex2: 0x7ffffff5 + 15 => UB, signed integer overflow
- ex3: 0xfffffff2 + 0xf = 0x1; No, this is unsigned overflow (wraparound), which is not UB
- ex4: 0xfffffff2 + 0xf = 0x100000001 or 4294967297; Not UB, result fits within unsigned long
8. KVStore (10 points) #
After office hours one night, Akshay and Allen found a crumpled-up piece of paper with some code for an implementation of MultiGet for the concurrent key-value store in Project 5A. Akshay and Allen notice that the code uses bucket-based locking, which was intended to improve performance by allowing multiple threads to access the key-value store in parallel.
However, scrawled all over the code are the words, "Too slow!!! Not running in parallel at all??" Intrigued, Akshay and Allen type in the code and--sure enough--it passes all of the correctness tests, but does not show a significant performance improvement compared to a single-threaded implementation. The code is shown below.
(Note: Some parts of the code have been omitted for clarity; you can assume any removed code--and all other functions not shown--work properly and are not related to the issue.)
bool ConcurrentKVStore::MultiGet(MultiGetRequest* req, MultiGetResponse* res) {
    std::vector<std::string> keys_to_get; // Sorted list of keys to look up
    // [... populate keys_to_get with sorted list of keys we need ...]
    all_keys_mutex.lock();
    for (auto key : keys_to_get) {
        size_t b = store.bucket(key); // Get the bucket for this key
        store.mutexes[b].lock_shared();
        auto kvpair = store.getIfExists(b, key);
        if (!kvpair) { // Key not found: unlock all mutexes and return
            store.mutexes[b].unlock_shared();
            all_keys_mutex.unlock();
            return false;
        }
        res->values.push_back(kvpair->value); // Add value to results
        store.mutexes[b].unlock_shared();
    }
    all_keys_mutex.unlock();
    return true;
}
QUESTION 8A. Akshay and Allen suspect that the performance issue is caused by the use of the global all_keys_mutex, which is locked before and unlocked after the for loop that looks up each key. Why does locking all_keys_mutex cause this version of MultiGet to have poor performance? Specifically, how does using the global mutex affect the implementation of bucket-based locking, and cause it to not show a significant performance improvement over the single-threaded version? Your answer should be at most 1-2 sentences.
QUESTION 8B. Akshay and Allen comment out the lines involving all_keys_mutex and run the tests again. They find that performance is much improved, but the correctness tests for MultiGet are now failing. Since the code worked before the global mutex was removed, they suspect that there is now an issue with synchronization that causes the code to return incorrect results when there are many simultaneous requests. However, the thread sanitizer does not report any issues, so they're not sure what to do.

Why does this implementation of MultiGet perform incorrectly when all_keys_mutex is removed? Your answer should be at most 1-2 sentences.
9. WeensyOS (8 points) #
Kazuya just finished implementing WeensyOS steps 5 and 6, where he implemented syscall_fork with sharing of read-only pages. Kazuya's current implementation follows our recommended specification for syscall_fork: it iterates over all the pages in the parent process; if a parent process' page is read-only, the page is shared with the child; if a page is writable, fork creates a new physical page, copies the contents of the parent's page into it, and maps the new page into the child process' new pagetable.
Now, Kazuya wants to try out a new strategy to make his WeensyOS more efficient: since allocating new pages and copying data is expensive, Kazuya believes that fork should just skip this step and share all (i.e., writeable and read-only) pages between the parent and child. Thus, after fork, the child process would have a new page table, but it would contain the same mappings to physical pages as the parent.
QUESTION 9A. In a sentence or two, explain why Kazuya's strategy will not work. Specifically, your answer should explain why sharing pages in this manner is problematic (e.g., it is not sufficient to say "because fork is supposed to copy pages").
QUESTION 9B. Kazuya has an idea for fixing his strategy from part A by leveraging the page fault handler. Currently, whenever a process tries to read or write to a page it does not have permission to access, the hardware will run the kernel's exception handler (the exception function) so the kernel can decide what to do (e.g., print an error like "write protection problem: cannot write to address 0x1200a4", kill the process, ignore it, etc.).

Kazuya wants to change the exception handler to help with copying pages: he wants to set things up so that the first time a process modifies a shared writable page, it runs the exception handler. Then, the exception handler could make a new copy of the page for the process that is writing to it, effectively delaying the copy until the page is actually modified. After the exception handler returns, the process can continue running as expected.
In this setup, which two of the following operations would be needed to correctly realize Kazuya's implementation? For each option, write "correct" or "incorrect" and briefly explain your reasoning (a few words, at most one sentence).
- When creating the child process, syscall_fork should set all of the child process' pages to read-only
- When creating the child process, syscall_fork should leave the parent process' page permissions unmodified
- After the first write occurs, the new copy of the page should be mapped into the child's page table as writeable
- After the first write occurs, the new copy of the page should be mapped into the child's page table at a new virtual address that is different from the virtual address used in the parent
QUESTION 9C. Kazuya and Daniel are debating whether Kazuya's revised solution (i.e., part B, which copies pages as they are modified, from the exception handler) is actually more efficient than the original version of syscall_fork, since it still copies pages.

In what circumstances is Kazuya's implementation more efficient than the original syscall_fork? Explain your reasoning in roughly one sentence. In your answer, we're looking for you to talk about high-level tradeoffs or examples of how/when pages are copied, rather than a detailed runtime analysis of the code that would need to be written.
10. CS 300/1310 Feedback (6 points) #
QUESTION 10A. If you could make one change to CS 300/1310 to improve it (other than firing the instructor and dropping the quizzes), what would you change? Any answer but no answer will receive full credit.
QUESTION 10B. What topics do you wish we had spent more time on? Any answer but no answer will receive full credit.
11. Submit! (1 point) #
QUESTION 11A. Write "I AM DONE" in the box. (You may still continue editing other questions until the time limit expires.)