labRust.md - HedgeDoc

<style> details { margin-top: 0.5em; margin-bottom: 0.5em; } summary { //font-weight: bolder; } summary:hover { text-decoration: underline; } blockquote { font-size: 16px; } .todo { color: #ff00ff; border: 2px dashed #ff00ff; padding: 0em 1em; border-radius: 5px; margin-top: 1em; margin-bottom: 1em; //display: none; // UNCOMMENT TO HIDE TODOs } </style> **[« Back to the main CS 300 website](https://csci0300.github.io)** # Section 9: Safer Systems Programming (featuring Rust) --- :::danger **⚠️ This section is meant to be done in-person with other students. ⚠️** ::: # Introduction By this point in the semester, you've done a significant amount of programming in C/C++ and are likely very familiar with our good friend, GDB. We've debugged issues like unknown segfaults and buffer overwrites and will soon tackle race conditions and deadlocks! A common link between all these errors is that they are a consequence of giving the programmer unfettered access to the underlying memory representation of data, and putting the responsibility on the programmer to manage it properly. This can lead to serious vulnerabilities in widely used software. In this lab, you will explore some new ways in which programming language designers have tried to make low-level systems programming easier and less error-prone, while maintaining its main advantages: a high degree of control over the memory representation of data and the opportunity for low-level performance optimization. In particular, we will explore concepts in **[Rust](https://www.rust-lang.org/)**, a relatively new systems programming language. Rust 1.0 released in 2015, and its language designers had the benefit of 40 years of hindsight compared to C, no need to maintain backwards compatibility, and the opportunity to enshrine in the language some of the ideas that systems programmers had discovered to be useful in the past few decades. :::info **An important disclaimer**: We will use Rust to exemplify key concepts in enforcing memory-safety in a systems language, but this lab is not meant to be a Rust tutorial. Some of the ideas you'll encounter in this lab also exist in modern C++ and in other systems languages, such as Go. If you would like to learn more about Rust, we have linked some resources in the Appendix section at the bottom of this handout! ::: # Assignment ## Assignment Installation First, ensure that your repository has a `handout` remote. Type: ```bash $ git remote show handout ``` If this reports an error, run: ```bash $ git remote add handout https://github.com/csci0300/cs300-s25-section.git ``` Then run: ```bash $ git pull $ git pull handout main ``` :::warning **Note: If you see a warning about "divergent branches"...** <details> <summary><b>Expand for instructions</b> </summary> ![](https://i.imgur.com/BeYsMKC.png) This means you need to set your Git configuration to merge our code by default. To do so, run (within the same directory in your labs repository workspace): ```console $ git config pull.rebase false $ git pull handout main ``` </details> ::: :::danger **If your terminal window changes to something like this**: ![](https://cs300-beta.cs.brown.edu/uploads/5618b1a0-dcbc-4abd-8e17-1b9a48c9b951.png) <details> <summary><b>Expand for instructions</b> </summary> Your terminal has asked you to add a commit message to explain what code was merged in, and opened a text editor so you can write the message. The top line is a default message, which you can use as-is. To use the message and finish the commit, do the following: - **If your terminal looks like the image above**, press `Ctrl+s` to save, then `Ctrl+q` to quit. (These are shortcuts for the text editor `micro`, which is the program that is running now) - **If your terminal instead has a bar at the top that says `GNU nano`**, you are using a text editor called `nano`. To save your commit and exit, press `Ctrl+x`, then press `y`, and then press `Enter`. </details> ::: This will merge our stencil code with your previous work. If you have any conflicts from previous labs, resolve them before continuing further. To save your work back to your personal repository, commit your changes and run `git push origin main`. ## :eyes: Extra initial setup In case you decided to attend this optional section, we pre-installed Rust in all of your course containers--it's been there the whole time! However, before starting, you should make sure your container has Rust properly set up. To do this, run the following command in your container terminal: ```shell $ rustc --version # Run the rust compiler rustc 1.83.0 (90b35a623 2024-11-26) ``` After running `rustc --version`, you should see output like the example. Your rust should be properly set up, yay! Skip onto the next section to start working! **If you get `command not found` instead**, run the following commands (which tells your shell where to find rust): ```shell $ echo "source ~/.bashrc" >> ~/.bash_profile $ source ~/.bash_profile ``` ... and then run `rustc --version` again. You should now see some version info from the rust compiler, indicating that your rust is now set up properly, yay! :tada: <details><summary><b>If you still get <code>command not found...</code></b></summary> If you're still getting an error, run the following commands in your terminal: ```shell $ export CARGO_HOME=/opt/rust $ export RUSTUP_HOME=/opt/rust $ export PATH=$PATH:/opt/rust/bin ``` ... and then try running `rustc --version` again. These commands tell your shell where to find rust. If you open a new terminal, or close and reopen your terminal, you will need to run these commands again to reconfigure your shell. </details> :::info **Heads-up**: If you have any other container terminal tabs running, you will need to close and re-open them for them to load the changes you just made. (You shouldn't need more than one terminal for this section, but just something to keep in mind.) ::: ## Part 1: Basic Memory Safety In these exercises, you will walk through two programs that address classic errors in memory management, for which we have provided the code in C as well as the analogous code in Rust. You might find some of the Rust syntax new, so it will be helpful to compare the two programs side-by-side so that your knowledge of C code can help you make sense of the Rust code! **The Rust and C programs do the exact same thing.** ### Overview of Lifetimes As you know well by now, any memory that is allocated during a program must at some point be freed. Think back all the way to our discussions about the different segments of memory (data, stack, heap). One of the distinguishing factors between these segments is how long the memory allocated to those regions persists. | Segment | Duration | Method of Freeing | | ------- | -------- | ----------------- | | Data | Duration of program | Automatic when program ends | | Stack | $\leq$ Duration of function call | Automatic when data goes out of scope | | Heap | Until freed | Manual call to `free()` | Looking at the second column of that table lets us introduce the concept of **lifetimes**. The lifetime of a piece of memory is the span of time after it is initialized but before it is freed. For instance: * Many local variables have a lifetime that's equivalent to the length of a function execution. * A loop index variable has a lifetime equivalent to the duration of the loop. * Global data has the same lifetime as the entire program (this is sometimes called the "static" lifetime). :::success **Task 1.1:** Among these three types of memory, it is most difficult to determine the lifetime of heap data. Discuss with your group why this is. ::: Why do we care about the lifetimes of data? As it turns out, most of the memory bugs we worry about in C and C++ can be framed in terms of lifetime violations! You'll explore a couple of such examples in Part 1 of this section. Before doing so, however, we will introduce one more foundational concept that systems programmers use to reason about memory safety: Ownership. ### Ownership One of the foundational concepts of memory safety in Rust is the notion of **ownership** of memory. The owner of a piece of memory is responsible for making sure that it gets freed when the memory is no longer needed. Each value is owned by a variable, and can only have one such owner at a time. The variable that owns a value determines its lifetime. Typical memory errors in low-level languages like C/C++ are often caused by mistaken notions of ownership. Consider the situation where there is no owner, or multiple pieces of code that believe they are the owners: | Situation | Consequence | | -------- | -------- | | Two or more owners | Use-after free, double free | | No owners | Memory leak (missing free) | Over time, systems programmers realized that careful reasoning about ownership is a good way to avoid memory safety issues like the above. Indeed, Google's most recent C++ style guide [says](https://google.github.io/styleguide/cppguide.html#Ownership_and_Smart_Pointers): > It's virtually impossible to manage dynamically allocated memory without some sort of ownership logic. The C++ language consequently acquired *optional* notions of ownership, the most common of which are either careful documentation of ownership via comments on APIs (e.g., "the `consume` function takes ownership of the memory at the `data` pointer") or the use of [smart pointers](https://en.cppreference.com/w/cpp/memory), which are special classes (e.g., `unique_ptr<int>` and `shared_ptr<int>`) that replace "raw" pointers (such as `int*`) and encode notions of ownership. In the following exercise, you will explore Rust's notion of ownership, and specifically, how changes of ownership ("moving") work. ### Exercise: What is "Moving"? For the first set of tasks, you will look at `move.c`. Make sure to read the comments in the code so that you are familiar with what the program is supposed to do. :::success **Task 1.2:** Before running any of the code, answer the following questions and record your answers: 1. What is the memory management error in the C program? Why is it a problem? 2. What could be a possible way to prevent this type of error? (There's no incorrect answer here, so feel free to use your imagination.) Now, from within the `1-move` directory, compile and run the C program using the commands shown below. The first command will compile the executable `move_c`, and the second command will run it. ```shell $ make c $ ./move_c ``` Now, answer the following question: 3. What does the program output? Was this expected, or could you imagine another output? ::: Memory management errors that compile and run without visible errors lead to undefined behavior being introduced into programs. Our example here was small, but if this code was part of a much bigger program where some other functionality depended on a correct output, then much more significant problems could arise. Rust developers had the advantage of being able to learn from the mistakes made with C, and thus built the Rust compiler to prevent memory-management errors *at compile-time*. Now, pull up `move.rs` and compile the program using the command **`make rust`**. :::success **Task 1.3:** What happens when you compile the Rust program? Look at the error message and discussion with your group how you think Rust can detect this! We recommend looking at the "Helpful Background" dropdown for some additional context on this type of memory error. <details><summary><b>Helpful Background</b></summary> <br /> This is an example of <i>aliasing</i>. In the C program, we had a pointer to some allocated memory, and then manually set another pointer to point to the <i>same</i> memory. Having multiple references to the same data is dangerous precisely for the reasons you've just seen—if we free some memory while there's still another reference pointing to it, then we now have a seemingly valid reference that's actually pointing to invalid memory. In C, these types of errors can silently compile and run! <br /> </details> <br /> ::: :::warning **⚠️ Stop here until your section TAs have summarized the ownership part! ⚠️** ::: ### Exercise: Enforcing Lifetimes You have just explored an example of a use-after-free lifetime violation, where we tried to read from heap-allocated memory whose lifetime had ended upon a call to `free()`. Looking at the above table, we can spot another situation where lifetime violations can cause memory safety issues: returning references to values on the stack. Since values on the stack live for *at most* the duration of a function call, returning pointers to stack values (like local variables) can cause segfaults when trying to access that memory elsewhere in the program (it's possible you may have experienced this type of problem this semester!). In this exercise, you'll explore how Rust's ownership guarantees can make reasoning about lifetimes at compile time (i.e., "statically") easier and even allow the compiler to warn you about lifetime errors at compile time! Now, read through `lifetimes.c` until you feel like you have a good sense of what the program is doing. Then, read through `lifetimes.rs`. **Don't be scared by the syntax, `lifetimes.rs` implements the same functionality as `lifetimes.c`**. If you are struggling to understand what's happening in the Rust program, use the C program as reference. Below, we also explain some of the unfamiliar syntax that you might find in the Rust program. :::info There are two constructs in the Rust code that may be unfamiliar. They are both in the `List` struct definition. * `Option` is a type that specifies that a value could be null. Since references in Rust can't be `NULL` like they can in C, we use `Option` instead. The Rust `NULL` equivalent is `None`, and its counterpart is `Some(...)`. Therefore, if we have something of type `Option<List>`, it can either be `None` or `Some(List {...})`. * Syntax that looks like `'a` is Rust's way of manually annotating lifetimes. You don't need to worry about it for this example, since we'll focus on a case where the Rust compiler can give you lifetime warnings without any help from the developer. Anytime you see `'a`, feel free to ignore it. <details><summary><b>I want to know more about Rust's lifetime annotation syntax</b></summary> <br /> As we discussed before, sometimes reasoning about lifetimes can be complex, and Rust provides a method for the developer to manually specify lifetimes when they are too complicated for the compiler to infer. The lifetime annotations on the struct are quite simple: first we introduce a lifetime parameter called `'a` ("alpha"). This is what `struct List<'a>` does. Then, we specify that the reference inside the option needs to have a lifetime of at least `'a` (this is what the syntax `&'a List` means). Finally, we declare that the next element in the list is parameterized by the same lifetime `'a` (this is what `List<'a>` means). Now, the Rust compiler can make sure we don't have any lifetime violations with the `List` struct. This happens at *compile-time*, without running the program. What kind of lifetime violations are we worried about? Any time a struct contains a reference, we have to make sure that the struct doesn't outlive the reference, or else we'll have a dangling reference. To help us, when we create a new `List` node, the Rust compiler will examine the lifetime of the reference we use inside and ensure that the `List` doesn't outlive that reference! </details> ::: :::success **Task 2.1:** Before running any of the code, discuss the following questions and record you answer: 1. What is wrong with `lifetimes.c`? What do you expect the output to be? ::: Now, compile and run`lifetimes.c` and `lifetimes.rs`. ```bash cd 2-lifetimes # compile and run the c program make c ./lifetimes_c # compile the Rust program make rust ``` :::success **Task 2.2:** Discuss the following questions and record your answer: 1. Does the output of the C program match what you expect? If not, how can you explain the difference? 2. What is the output of the Rust compiler? How is the Rust compiler able to give the warning it does? ::: :::warning **⚠️ Stop here until your section TAs have summarized the lifetimes part! ⚠️** ::: ## Part 2: Safer Concurrency Having seen how systems languages can address basic memory bugs by enforcing ownership and lifetimes, we are now ready to look at how language developers try to make concurrency safer! ### Exercise: Locking Discipline In this exercise, you will look at `concurrency.cpp` and `concurrency.rs`. Similar to before, these examples do the exact same thing, but you might find that the Rust code has some unfamiliar terms. *We highly recommend you use your knowledge about C++ to make sense of the Rust code, as well as read all the comments that we've annotated the code with!* First, look at `concurrency.cpp`. This program is longer than the programs you've seen before, but it is not complex: it is a money transfer application! Here is a quick overview of the functions and structs you are given: * `account_t`: This struct contains two fields, a `balance` (meant to represent how much money the account contains), and a `balance_mtx` which is the mutex that protects the balance. * `main()`: Initializes a single account with a starting balance and a vector of integers that represent deposits/withdrawals that we would like to make to the balance. It initializes a pool of worker threads to handle these requests and then prints out the final balance. * `update_account()`: Function passed to threads; keeps updating the balance with the next request from the shared vector until there are no requests left. * `make_large_vector()`: Returns a long vector of requests. :::info **Tip:** We *highly* recommend also reading over the comments in the code to make sure you follow the structure of the program and the purpose of all the variables/functions. This will make the following tasks much easier. ::: :::success **Task 3.1:** Before running the program, identify what the error is in the code. Where does it originate, and what do you predict the output to be? Discuss these questions and record your answers. ::: Compile and run `concurrency.cpp` as many times as you need to confirm your hypothesis. ```bash cd 3-concurrency # compile make cpp # run ./concurrency_cpp ``` You might find the following command useful if you are having a hard time finding an example of the error: ``` while true; do ./concurrency_cpp; done | uniq ``` Now, take a look at the Rust program (we recommend having it up alongside the C++ version). The overall structure of the code is the same, however there are a couple of notable types and functions which are worth mentioning in order to make more sense of this program. :::info <details><summary><b>The <code>Arc</code> type</b></summary> <br /> Think back to the first program you analyzed (`move.c/move.rs`), where a seemingly-valid reference was left pointing to a block of memory that had already been freed. You discovered that Rust prevents this by only allowing one "owner" for a given block of memory. When the "owner" goes out of scope or is dropped, the compiler knows that it can free that memory. This gets complicated when we have multiple threads trying to access shared data. In these cases, it is unclear at compile-time when the data can be deallocated and who is responsible for doing so, because the data's lifetime must persist until every shared owner is done with it. In these cases, we wrap the data with an `Arc`, which stands for **A**tomic **R**eference **C**ounter. References to `Arc` types are given out using a special method, `Arc::clone()` and are dynamically counted at *runtime*. (Counting is done atomically to ensure thread-safety.) This guarantees that the memory is freed only when there are zero references—and thus no owner—left in scope. `Arc<Mutex<T>>` (where `T` is some arbitrary type) is a common pattern in Rust code, because most mutex-protected data needs to be shared across threads. </details> <br /> <details><summary><b><code>unwrap()</code></b></summary> <br /> You may have noticed that a method called `unwrap()` appears in multiple—seemingly unrelated—places in the program. Certain functions which have the possibility of failing will return *either* the computed value upon success, *or* a special error type upon failure. This has the advantage of letting the caller decide how to handle the return value in each case. Calling `.unwrap()` on the output of one of these functions handles success and failure in the following manner: 1. If the function ran successfully and returned the computed value, `.unwrap()` will just return that value. 2. If the function errored and returned an error type, `.unwrap()` will panic and halt the program. </details> <br /> <details><summary><b>Atomics and <code>Ordering</code></b></summary> <br/> At a high level, the <code>Ordering</code> parameter tells the compiler what order the CPU is allowed perform atomic operations with respect to each other. <b>For this example, the ordering parameter does not matter and you may ignore it. If you'd like to learn more, however, keep reading!</b> <br/> <br/> You may notice that when invoking the methods `load` and `store` on an `AtomicBool`, we also pass a parameter called `Ordering::Relaxed` into the method. Normally, a compiler can optimize away certain operations or execute certain instructions in different sequences from the code that the programmer wrote in order to improve performance. While atomic types ensure that any operation *on them* happens as an unbroken sequence, they do not offer any special guarantees on the order in which they are executed in relation to *other* atomic instructions. Sometimes however, we need guarantees on when threads can read and write atomic values. For example, we might want to make sure that a thread does not read an atomic value before another thread has written to it. While we are guaranteed that the individual read and write instructions happen atomically, we may not be guaranteed that one happens before another. The `Ordering` enum tells the compiler how strict it should be with following the sequence of instructions that are specified in the original source code. This practice was actually inherited from C, which established the various different kinds of ordering that are currently used in Rust's memory model. </details> ::: :::success **Task 3.2:** Compile the Rust program (`make rust`) and observe the compilation error. What is the error you get? How do mutexes differ between Rust and C++, and why is Rust's model safer? Discuss these questions and record your answers. ::: Now, you will fix both programs, starting with the C++ program. If you find yourself stuck on fixing the Rust example, we recommend looking at the code for hints ;). We have also included a description of some relevant methods and types. :::success **Task 3.3:** Fix the error in both programs! (Start with C++.) Once you have done so, check your program's correctness with: ``` make check ``` <details><summary><b>Hints for Rust</b></summary> <br /> [`Mutex<T>`](https://doc.rust-lang.org/std/sync/struct.Mutex.html) has methods `lock()` and `unlock()`, both of which can return either a `MutexGuard` on success or an error on failure. This means that you'll want to use `unwrap()` to access the mutex guard. Dereferencing a `MutexGuard` will give access to the inner data. </details> ::: :::warning **⚠️ Stop here until your section TAs have summarized the concurrency part! ⚠️** This is where the section materials end. You might discuss with your TA, e.g., whether you think Rust helps programmers avoid common errors, and what the trade-offs involved might be (i.e., why might some people still want to program in C/C++ instead?). ::: **Congratulations**, you have completed the Rust section! :tada: There is obviously much more to learn about Rust and how it changes systems programming. If you have questions, talk to the course staff and take upper-level courses that let you use Rust for projects: in our experience, nothing makes you learn Rust better than having to write a substantial program in it on your own. # Additional Resources on Rust If you want to learn more about safer systems programming and Rust, check out some of these resources! * [Rust by Example Book, Brown Edition](https://rust-book.cs.brown.edu/) * [Rust Textbook](https://doc.rust-lang.org/book/) --- <small>_Acknowledgements:_ This section (originally a lab) was developed for CS 300 by Sreshtaa Rajesh, Paul Biberstein, Livia Zhu, Shihang (Vic) Li, Casey Nelson, and Malte Schwarzkopf.</small>