Project 2: DMalloc - HedgeDoc

<style> details { margin-top: 0.5em; margin-bottom: 0.5em; } summary { //font-weight: bolder; } summary:hover { text-decoration: underline; } blockquote { font-size: 16px; } .todo { color: #ff00ff; border: 2px dashed #ff00ff; padding: 0em 1em; border-radius: 5px; margin-top: 1em; margin-bottom: 1em; //display: none; // UNCOMMENT TO HIDE TODOs } </style> **[« Back to the main CS 300 website](https://csci0300.github.io/)** # Project 2: DMalloc **Due: Friday, February 28 at 8:00 PM EST** # Introduction One of the unique challenges systems developers face is the explicit management of computer memory allocation. In other, higher-level programming languages, you work with variables/objects without really worrying about where they are in memory, how much space they are taking up, or when to give their memory back to the operating system. In systems, we take a different perspective: **dynamic memory** is allocated and deallocated manually by the programmer. This gives the programmer the ability to write faster and more efficient code, or low level programs that might not have super fancy languages or runtime environments to implicitly allocate (embedded systems, etc.). But, like most powerful systems assets, dynamic memory can lead to some tricky bugs! In this project, **you will create a *debugging memory allocator*** to help catch some of these bugs. To do this, you'll write some code that acts as a wrapper around `malloc` and `free` called `dmalloc` and `dfree`: your versions will store and check some extra *metadata* about how memory is used in order to detect problems like: - Common memory programming errors: e.g., freeing the same block twice (called a double-free), trying to allocate too much memory etc). - *Boundary write errors*, which are writes outside a dynamically allocated block (e.g., writing 65 bytes into a 64-byte piece of memory) - Other more general programming errors You can think of this as writing your own version of the AddressSanitizer that you saw in the first project! While implementing it, you'll learn more about how the sanitizer works, and what its limitations are. Specifically, this project will allow you to: - Learn about memory representations of information - Be aware of common memory allocation errors - Really understand how `malloc` and the heap work! - Start getting used to C++ programming ::::info <details> <summary> <span > More Context: how memory Allocation works</span> </summary> <br> Recall the memory layout a program sees in x86-64 <br> ![](https://i.imgur.com/YAhDw3C.png) <br> Remember from lab 1 that there are three places in memory where we can store program data and accordingly there are three types of memory allocation. 1. **Static** memory allocation stores data in the *data* segement. This is used for global variables and variables declared with the `static` keyword. The memory location of these variables does not change throughout the program. In other words, these allocations exist in the same "static" location determined by the compiler before runtime and are always available (e.g., recall how `static` variables work in functions like `strtok`). If static variables are initialized to a value, the compiler puts them into the initialized *data* segment; otherwise it puts them into a segment (confusingly) called *bss*. **Remember:** "static" refers to the *location* of a variable, not whether the program can modify it (that's what the `const` keyword is for). 2. **Automatic** memory allocation stores data on the Stack. This is the memory location for local variables and arguments within functions. The allocation size for the stack is also determined at compile time, and the compiler decides how much the stack grows on a given function call. It is not "static" because the allocations come and go as we move through stack frames (function calls), and it is "automatic" because the lifetime of these allocations is determined by the compiler. 3. **Dynamic** memory allocation stores data on the heap. The lifetime of this type of memory allocation cannot be determined at compile time and is managed explicitly by the programmer. Dynamic memory allocation in C uses functions like `malloc` and `free`; C++ uses the `new` and `delete` keywords. The lifetime and size of dynamic memory allocations is entirely up to you! This project focuses on dynamic memory allocation. We rarely have memory allocation problems when the compiler is doing deterministic work as with **static** and **automatic** allocation; however with **dynamic** allocation we can run into some issues. Because memory is the foundation on which a program runs, memory errors and misuse are a particularly nasty set of bugs that can be really hard to deal with. Quite often memory errors will result in [Undefined Behavior](https://en.wikipedia.org/wiki/Undefined_behavior) or completely fatal errors :mask:. The purpose of this project is to get familiar with dynamic memory allocation functions as well as understand how the heap is used and some of the dangers associated with it. </details> :::: <br/> --- ## Questions to think about :::info Note: You are not required to answer the following questions to complete DMalloc, and they will not affect your grade for the project. They exist to help you think about how you might want to design your implementation for DMalloc. If you're failing tests or stuck debugging, these questions can help you think about edge cases you might not be accounting for! ::: 1. What are some situations where it is much better to use dynamic memory allocation over static or automatic memory allocation? 2. How does the heap grow in memory? How is this different from how the stack grows? Why is this the case? 3. Consider some pointer, `ptr`, that points to memory address `0x12f99a64`. What address does `(char *) ptr + 1` point to? What address does `(int *) ptr + 1` point to? 4. Name and describe at least two potential memory errors that can arise from using `malloc` and `free` incorrectly. 5. How is checking for integer overflow in **addition** different than checking for integer overflow in **multiplication**? Think about both signed and unsigned integers. --- # Debugging Resources/GDB A common way students debug code is by adding print statements everywhere and hoping for the best. While this may have worked in the past, and may continue to work, it is the debugging equivalent of kicking down a revolving door. GDB can, and should, become your best friend. Like you may have seen on Snake or Lab 2, GDB might take a little practice to get used to, but we guarantee that it will make debugging faster and easier for you! ### GDB: How To **Run GDB** on the executable using the `gdb` command. In this project, each test has its own executable: to run gdb on a specific test, you can run it on the program for that test. For example: ```shell # Run gdb on test 34 $ gdb test034 ``` <details><summary>More</summary> > **2. Set a breakpoint** on the function of your choice using `b [ function | file:line ]` (often this will be `main`). ``` (gdb) b main (gdb) b my_file.c:23 ``` **3. Run the program** using `r`, along with any arguments you want. ``` (gdb) r [arg1] [arg2] ... ``` **4. Display the source code** so you can follow along the execution. ``` (gdb) layout src ``` **5. Step through your code** using `n` to execute one line and `s` to step inside a function. (If you accidentally step into a function you didn't intend to, use `finish`). **6. Print variables** and pointers. The most useful commands are: - `p [variable]` to print values - `x/[type] [variable]` to interpret the variable as a pointer, and print the value at that location as a specific type. For examples, `x/s` for strings, `x/d` for decimal, `x/x` for hexadecimal, etc. [more](http://visualgdb.com/gdbreference/commands/x) </details> ### Debugging a Segfault Run GDB on your executable (don't set any breakpoints) and then run your program with `r`. Once it segfaults, run `bt` to print a stack trace (the list of subsequent function calls that got you where you are now). You can now use `layout src` to view the code, then `up` and `down` to view different stack frames. **[Look here for more info on gdb commands](https://darkdust.net/files/GDB%20Cheat%20Sheet.pdf)** :::info Note: While debugging tests, you may find that some of your variables are `<optimized out>` when you try to print their contents. To make using GDB easier, go to your Makefile and turn off compiler optimizations (line 2). The default level we set for you is `OPT ?= -O2`. You can turn this off by changing it to `OPT ?= -O0`. Make sure to revert back to the previous value when you're done! ::: --- # Getting started ## Updating your stencil code To get started on the project, you will need to pull some additional stencil code into your project repo. You can do this as follows: 1. Open a terminal in your container environment and `cd` into the directory you set up as your project repo from Lab 0 (e.g., `cd projects`) 2. Run the following command to pull the latest code from the `handout` remote (see below if you get errors): ``` cs300-user@9899143429a2:~/projects$ git pull handout main ``` If this reports an error, run: ```bash git remote add handout git@github.com:csci0300/cs300-s25-projects.git ``` followed by: ```bash git pull git pull handout main ``` This will merge our Project 2 (DMalloc) stencil code with your repository. :::warning **Note: If you see a warning about "divergent branches"** like this: ![](https://i.imgur.com/BeYsMKC.png) This means you need to set your Git configuration to merge our code by default. To do so, run (within the same directory in your labs repository workspace): ```console $ git config pull.rebase false $ git pull handout main ``` ::: :::danger **If your terminal window changes to something like this**: ![](https://cs300-beta.cs.brown.edu/uploads/5618b1a0-dcbc-4abd-8e17-1b9a48c9b951.png) <details> <summary><b>Expand for instructions</b> </summary> Your terminal has asked you to add a commit message to explain what code was merged in, and opened a text editor so you can write the message. The top line is a default message, which you can use as-is. To use the message and finish the commit, do the following: - **If your terminal looks like the image above**, press `Ctrl+s` to save, then `Ctrl+q` to quit. (These are shortcuts for the text editor `micro`, which is the program that is running now) - **If your terminal instead has a bar at the top that says `GNU nano`**, you are using a text editor called `nano`. To save your commit and exit, press `Ctrl+x`, then press `y`, and then press `Enter`. </details> ::: Once you have a local working copy of the repository that is up to date with our stencils, you are good to proceed! You should now see a new directory called `dmalloc` inside your projects repo: you should do all your work for this project in this directory. ## Stencil Code ### Standard Memory allocation in C To do this project, it is important to understand the functions that are called when dynamically allocating memory in C. These are the functions you see in regular C programs: ::::info <details> <summary><b>malloc</b> </summary> > ```c void* malloc(size_t sz) ``` Allocates **`sz`** bytes of memory and returns a pointer **`ptr`** to the "new" memory block. This memory is not initialized (it can contain anything). Upon success, the **`sz`** bytes of storage starting at address **`ptr`** are guaranteed not to overlap with any other active objects, including the program’s code and global variables, its stack, or with any other active dynamically allocated memory. Upon failure, `malloc` returns a null pointer (`NULL` in C-speak, `nullptr` in C++ speak). Failure can occur if **`sz`** is too big, memory is exhausted, or for other reasons. </details> :::: ::::info <details> <summary><b>free</b> </summary> > ```c void free(void* ptr) ``` Free a single block of memory previously allocated by `malloc`. The pointer argument, **`ptr`**, must either be a null pointer or a pointer to active dynamically-allocated memory. More explicitly, it is **NOT OK** to call free if... - **`ptr`** points to the program’s code, or into its global variables, or into the stack. - **`ptr`** was not returned by a previous call to `malloc`. - **`ptr`** points to heap memory whose lifetime has ended (i.e., the memory block pointed to by **`ptr`** was already freed and has not been reallocated). </details> :::: ::::info <details> <summary><b>calloc</b> </summary> > ```c void* calloc(size_t nmemb, size_t sz) ``` Allocates memory for an **array** of objects and clears the allocated block to zero. This is a "secondary" memory allocation function that itself makes a call to `malloc`. Our stencil code includes an implementation of this method with a bug that you will fix when testing! </details> :::: #### Using these functions Dynamically-allocated memory remains active until it is explicitly freed with a call to free. The rules of malloc and free are simple: allocate once, then free exactly once. Some quick notes on boundary cases: * In C++, `malloc(0)` should return a non-null pointer. If `ptr == malloc(0)`, then `ptr` does not overlap with any other allocation and can be passed to `free`. * `free(nullptr)` is allowed. It does nothing. (`nullptr` is C++'s syntax for `NULL`.) * `malloc(sz)` returns memory whose alignment works for any object. On **x86-64** computers, this means that the address value returned by `malloc` must be a **multiple of 16** (so its last hexadecimal digit is `0`). ### Stencil versions: base_malloc/base_free In the stencil code, **`dmalloc`** and **`dfree`** don't use `malloc` and `free` directly. Instead, they use helper functions called **`base_malloc`** and **`base_free`**, which we have written for you. Why do we avoid directly calling the C functions **`malloc`** and **`free`**? Because debugging allocators interact with undefined behavior! :flushed: Undefined behavior is something to be avoided because any program that invokes undefined behavior has no meaning. As far as the C++ language standard is concerned, once undefined behavior occurs, a program may do absolutely anything, such as setting your computer on fire or (our favorite example), force demons to fly out of your nose. Undefined-behavior bugs with `malloc` and `free` are common, and debugging allocators must make some undefined behaviors well-defined. For example, when a debugging allocator is in force, a program with a simple double free bug will reliably print a specific error message and exit before any undefined behavior occurs. This brings us back to **`base_malloc`** and **`base_free`**, which are defined in `basealloc.cc`. This allocator behaves like `malloc` and `free`, but has the following additional properties: * **`base_free`** does not modify freed memory (the contents of freed storage remain unchanged until the storage is reused by a future **`base_malloc`**). * **`base_free`** never returns freed memory to the operating system. You don't really have to understand how the functions in `basealloc.cc` work -- just treat them like **`malloc`** and **`free`**. Yet, **`base_malloc`** and **`base_free`** are not standard library functions (like **`malloc`** and **`free`**), so we are able to change what is considered undefined behavior. However, we don't define everything! You will still need to take care of some undefined behavior (like double frees, invalid frees, and wild writes) in your debugger. # Testing We have included a comprehensive set of tests for you to help you write your debugging memory allocator in the files `testXXX.cc`. Each test is a small program which dynamically allocates some memory. Later tests intentionally contain memory errors, which your debugging allocator should detect. When working on the project, **you should not modify the test files**: our autograder will replace your testing files with our own when grading, so altering them will not help you. Running `make check` will run your code against all the tests in order. It’s a good idea to work on the tests in order: get the simple stuff working, then gradually add more complex functionality. In other words, you will have a better time with this assignment if you incrementally tackle tests one after another, rather than writing the full allocator in one go and then running the tests. **To see a test, open `test001.cc` and look at its code.** You will notice these comments at the end of the file: ``` //! alloc count: active 0 total 0 fail 0 //! alloc size: active 0 total 0 fail 0 ``` These lines define the expected output for the test. The `make check` command checks your actual output against the expected output and reports any discrepancies. Don't change these lines -- doing so may make tests pass for you locally, but the grading server will replace your changes with the original lines before it runs your code. ::::info <details> <summary> <b>Super Helpful Testing Tips! </b></summary> <ul> > <li>To print differences for a <b>single</b> test, try <code>make check-N</code>, as in <code>make check-1</code>. If you want to see the full output of a test, without worrying about comparing to expected output, run that test on the command line directly. For example: > ```shell $ make test001 $ ./test001 ``` runs test001 and prints the full output. This can be <b>very useful</b> if you're getting errors from the sanitizers. </li> <li> The tests are written such that your output is compared to an expected output. This means you should pay super close attention to make sure your output is <b>exactly</b> right. (One extra space will fail a test!) At the bottom of each test file, there are comments with the expected output. </li> <li> If you see "???" in the expected output comment, that means the test will accept any symbols. In the example below, the test will pass regardless of the number your program prints after fail > ```shell //! alloc count: active 5 total 10 fail ??? ``` </li> <li> We also use "??" in some tests to assign a "variable" to match in the output. Check out test20.cc for an example, you will need to print out the correct pointer. </li> </ul> </details> :::: ==**Note:**== One **really important** thing to do in this assignment is to **run your code with sanitizers enabled** for tests 1 to 26. To do this, add `SAN=1` to your `make` invocation: for example, you might run `make check SAN=1` to run all tests with the appropriate sanitizer flags set, or `make check-1-26 SAN=1` to run tests 1-26 with sanitizers enabled. The grading server will run your code for tests 1-26 with the sanitizers enabled, so you **must** pass them (later tests use undefined behavior to test your allocator; they won't pass with sanitizers enabled). They'll also help you find many bugs, guaranteed! :::danger **Make sure your tests pass on the grading server** as well as locally. If your solution has undefined behavior, the different architecture locally can sometimes make sanitizer tests pass when they won't on the grading server. ::: `make check SAN=1` takes care of running with the right sanitizers for each test, but here are the details: * Tests 1-26 must pass the AddressSanitizer, but tests 2-7, 9, 12-13, 16-23 can have memory leaks (i.e., LeakSanitizer errors are okay). * Tests 1, 8, 10, 11, 14, 15, 24, 29, 35, 36, 37, and 39 need to pass without memory leaks (i.e., no LeakSanitizer errors). # Assignment Roadmap This section describes the main functionality required for the debugging allocator, ordered from simpler to more complex and from low-numbered to high-numbered tests. ### Step 1: Statistics In the stencil code there is a function **`print_statistics()`** that uses the information stored in a statistics struct to print out statistics about all the memory allocations so far. The statistics struct is defined as follows (see `dmalloc.hh`): ```cpp= struct dmalloc_stats { unsigned long long nactive; // number of active allocations [#malloc - #free] unsigned long long active_size; // number of bytes in active allocations unsigned long long ntotal; // number of allocations, total unsigned long long total_size; // number of bytes in allocations, total unsigned long long nfail; // number of failed allocation attempts unsigned long long fail_size; // number of bytes in failed allocation attempts uintptr_t heap_min; // smallest address in any region ever allocated uintptr_t heap_max; // smallest address beyond any allocated region (exclusive upper bound) }; ``` It is up to you to fill in the function **`get_statistics(statistics *stats)`** so that accurate information is stored in an instance of **`dmalloc_stats`**. To do this, think about using a *global variable* that holds a struct and which any calls to **`dmalloc()`** and **`dfree(ptr)`** update. :::success **Task:** Implement the `get_statistics`, `dmalloc`, and `dfree` functions in `dmalloc.cc`. ::: ::::info <details> <summary><b>Hints!</b></summary> > Most of these statistics are easy to track, and you should tackle them first (start by passing tests 1-5 and 7-10). The trickier test is `active_size`. To track this information, your **`dfree(ptr)`** implementation must find the number of bytes allocated for the block pointed to by **`ptr`**. This is because after a block is freed, `active_size` should decrease by the size of that block. The easiest, and probably best, way to do this is for your **`dmalloc`** code to request more space than the user required when it calls **`base_malloc`**. This extra space will allow you to store metadata about the allocated block alongside it. This metadata can store a structure you define yourself, and it should keep track of the allocated block size. As you write your debugging allocator, you may realize you want to store other information as well. The metadata will be stored at the beginning of an allocated block, but **`dmalloc`** should still return a pointer to the "payload" of the block, i.e., to the space after the metadata. You will need to use pointer arithmetic to return a pointer that points somewhere just beyond the metadata. In **`dfree(ptr)`**, the **`ptr`** argument then must be a pointer to the payload of a block (because that's what **`dmalloc`** returned before), but you can use pointer arithmetic again to access the metadata stored just before **`ptr`** to access the block's size. Consider what bugs might emerge when adding metadata near your allocated block and how you might resolve them! ![](https://i.imgur.com/mNepOPb.png) Quick reminders on pointer arithmetic: * You can use the function **`sizeof()`** to get the size of a datatype. * **`base_malloc`** returns a `void*`, which is a pointer without a type. Before you can do pointer arithmetic, you will need to cast the `void*` pointer to have a type. However, **`dmalloc`** should still return a `void*`. * When you incremement a pointer, the pointer's address will jump ahead by the size of the pointer's datatype. </details> :::: Run `make check` to run the test programs. Specifically, <span style="color:orange"> test001 </span> through <span style="color:orange"> test011 </span> should pass when you are done with this section. ### Step 2: Integer overflow protection Your debugging malloc library also needs to be robust against integer overflow attacks. Integer overflow occurs when you try to allocate a number of bytes that is too large. For example, the `size_t` type of the first argument to **`dmalloc`** is a type alias for an unsigned long data type, which is comprised of 8 bytes. When you compute the number of bytes to request and it exceeds the maximum possible integer, the value of the `size_t` wraps around to 0. Consider, for example, the **`calloc`** function in the stencil code, think about how overflow could occur, and then fix it. :::success **Task:** Implement the overflow protection in `dmalloc()` and `dcalloc()`. ::: <span style="color:orange"> test012 </span> through <span style="color:orange"> test015 </span> should pass after this is done correctly. Make sure that <span style="color:orange"> test015 </span> succeeds on the grading server too; if you get a test timeout on the grading server, your overflow checks are not quite right. ### Step 3: Invalid free and double-free detection **`dfree(ptr)`** should print an error message and then call C’s **`abort()`** function when **`ptr`** does not point to active dynamically-allocated memory. There are several cases when this might occur. The test programs define the desired error message format. Here’s our error message for <span style="color:orange">test016</span>: ``` MEMORY BUG: test016.cc:8: invalid free of pointer 0xffffffffffffffe0, not in heap ``` Error messages should be printed to standard error (using C’s `fprintf(stderr, ...)`). Different error situations require different error messages; the other test programs define the required messages. :::success **Task:** Check for invalid and double frees in `dfree()` and print an appropriate error message. ::: ::::info <details> <summary> <b>Hints! </b></summary> - **`ptr`** needs to point somewhere on the heap. How are you tracking the allocated space on the heap? - You need to catch double frees (when free is called on the same pointer twice)! How can you determine whether or not a block is freed? - **`ptr`** must be a pointer to the **beginning** of an allocated memory block, not just any address on the heap. Knowing your metadata header should be stored in memory right before **`ptr`**, how can you guarantee you are actually accessing a header? - Don't be afraid to store more information in your metadata! </details> :::: <span style="color:orange"> test016 </span> through <span style="color:orange"> test024 </span> should pass after this is done correctly. ### Step 4: Boundary write error detection A *boundary error* is when a program reads or writes memory beyond the actual dimensions of an allocated memory block. A debugging memory allocator can’t detect boundary **read** errors, but it can detect boundary **write** errors within some constraints. For example, a common strategy used by sanitizers is to add data with a known "secret" value around the boundaries of your allocation--it's generally not feasible to detect all possible write errors, but this strategy is good enough to detect most common cases. To detect boundary write errors, your **`dfree(ptr)`** should print an error message and call **`abort()`** if it detects that the memory block associated with **`ptr`** suffered a boundary write error. :::success **Task:** Check for boundary write errors in `dfree()` and print an appropriate error message. Your `dfree` should be able to detect write errors that occur **within 8 bytes** of the allocation. ::: ::::info <details> <summary><b>Expand for details</b></summary> As a strategy for detecting boundary write errors, consider adding data with a known ("secret") value around the boundaries of your allocation, and then check that value in appropriate places. No debugging allocator can reliably detect all boundary write errors. For example, consider this: ```c int* array = (int*) malloc(10 * sizeof(int)); int secret = array[10]; // save boundary value array[10] = 1384139431; // boundary write error array[10] = secret; // restore old value! dmalloc can’t tell there was an error! ``` Or this: ```c int* array = (int*) malloc(10 * sizeof(int)); array[200000] = 0; // a boundary write error, but very far from the boundary! ``` For this project, we are only expecting your code to catch common simple cases where the user writes one or more bytes **within 8 bytes of the allocated block**. Take a look at tests 25--28 for examples--we expect you to catch errors like this. </details> :::: <span style="color:orange"> test025 </span> through <span style="color:orange"> test028 </span> should pass after this is done correctly. Note that test 27 and following invoke undefined behavior (and make assumptions about the compiler's choices for it) in order to test your implementation. **You don't need to pass these and the higher-numbered tests (27-40) with sanitizers enabled.** ### Step 5: Memory leak reporting A memory leak happens when code allocates a block of memory, but then forgets to free it. Memory leaks are not as serious as other memory errors, particularly in short-running programs. They don't cause a crash, and the operating system always reclaims all of a program's memory when the program exits. But in long-running programs, such as your browser or the operating system itself, memory leaks have serious effect and are important to avoid. Fill in the **`print_leak_report()`** function such that, when called, it prints a report about every allocated object in the system. This report should list every object that has been allocated via **`dmalloc`** but not freed using **`dfree`**. Print the report to standard output (**not** standard error). A report should look like this: ``` LEAK CHECK: test034.cc:23: allocated object 0x9b811e0 with size 19 LEAK CHECK: test034.cc:21: allocated object 0x9b81170 with size 17 LEAK CHECK: test034.cc:20: allocated object 0x9b81140 with size 16 LEAK CHECK: test034.cc:19: allocated object 0x9b81110 with size 15 LEAK CHECK: test034.cc:18: allocated object 0x9b810e0 with size 14 LEAK CHECK: test034.cc:16: allocated object 0x9b81080 with size 12 LEAK CHECK: test034.cc:15: allocated object 0x9b81050 with size 11 ``` A programmer will use this leak checker by calling **`print_leak_report()`** before exiting the program, after freeing all allocated dynamic memory they recall. Any missing frees will show up in the leak report! :::success **Task:** Implement `print_leak_report()`. ::: ::::info <details> <summary> <b>Hints!</b> </summary> > There are multiple ways to do this. Here are some things that **might** be helpful: - [**`std::pair`**](https://en.cppreference.com/w/cpp/utility/pair) -- a C++ object that contains two data elements (similar to a tuple with two members). - [**`std::map`**](https://en.cppreference.com/w/cpp/container/map) -- C++ key-value map. You will need to add an <code>#include <map></code> at the top of dmalloc.cc to use this. - [**`map::begin`**](http://www.cplusplus.com/reference/map/map/begin/) -- returns an iterator referring to the first element in the map container. - [**`std::unordered_map`**](https://en.cppreference.com/w/cpp/container/unordered_map) -- C++ key-value hash map. You will need to add <code>#include <unordered_map></code> at the top of `dmalloc.cc` to use it. If you get strange and scary compiler errors when using this data structure, it's likely because the compiler doesn't know how to hash your key type. Check out [**this guide**](https://cs61.seas.harvard.edu/site/2018/Section1/#unordered-map-hash-table) for more information. </details> :::: <span style="color:orange"> test029 </span> through <span style="color:orange"> test031 </span> should pass after this is done correctly. ### Step 6: Advanced reports and checking This is the home stretch! To make your debugger even more robust, you should print better information and defend against more complex invalid frees. You will need to read the test code and understand what is being tested to defend against it. **Update your invalid free message**. After determining that a pointer is invalid, your code should check whether the pointer is inside a different allocated block. This will use the same structures you created for the leak checker. If the invalid pointer is inside another block, print out that block, like so: ``` MEMORY BUG: test032.cc:10: invalid free of pointer 0x833306c, not allocated test032.cc:9: 0x833306c is 100 bytes inside a 2001 byte region allocated here ```` Also make sure your invalid free detector can **handle more diabolical situations**. What situations? Check the test code to find out! :::success **Task:** If an invalid free occurs within an already allocated block print an appropriate error message. ::: <span style="color:orange"> test032 </span> through <span style="color:orange"> test034 </span> should pass after this is done correctly. ### Step 7: Performance and C++ integration Finally, test programs 35 to 40 test other situations that allow you to check that your debugging allocator is efficient and compatible with C++ style memory alocation (not just C style memory allocation). > **What's different about C++ memory allocation?** C++ introduces a new paradigm for dynamic memory allocation with the **`new`** and **`delete`** keywords. These keywords are designed to work with C++ classes, but under the hood they work simply by using **`malloc`** and **`free`** respectively. <span style="color:orange">test035</span> calls **`dmalloc`** 500,000 times, supplying tens of thousands of different filename/line-number pairs. Ideally, your solution should run <span style="color:orange">test035</span> in **one second or less**. :::success **Task:** Meet the performance requirement! ::: **After you implement this part, all tests should pass!!** :tada: # Extra Credit We encourage you to learn more and flex your coding muscles by doing some extra credit work! Extra credit is in no sense required to get an A, except if you're a graduate student taking CSCI 1310, in which case we require you to solve **two** of the below ideas. ## Reallocation **Easy.** Implement `drealloc`. This function is based on the C standard library's `realloc`, and should behave like this: ```c++ /// drealloc(ptr, sz, file, line) /// Reallocate the dynamic memory pointed to by `ptr` to hold at least /// `sz` bytes, returning a pointer to the new block (and growing the /// existing allocation if possible). If `ptr` is `nullptr`, behaves /// like `dmalloc(sz, file, line)`. If `sz` is 0, behaves like /// `dfree(ptr, file, line)`. The allocation request was at location /// `file`:`line`. void* drealloc(void* ptr, size_t sz, const char* file, long line); ``` ## Sanitizer **Moderately difficult.** Modern debugging memory allocators are tightly integrated with C and C++ compilers: the compilers generate data structures that the debugging allocators use at runtime to detect mistakes. This has many advantages, including efficiency and precision. Research a debugging memory allocator integrated with a compiler and runtime of your choice, such as GCC/Clang's AddressSanitizer, and write a brief section in your README (under "Extra Credit"), describing how it works. You can do this by reading information online (with citation) and/or by running your own experiments. ## Line Number Lookup **Difficult.** Most of our test programs use macros that redefine `malloc` and `free` to supply filename and line number information. But C++ discourages the use of fancy macros, and C++ style allocation, such as the `dbg_allocator` class used in `test035` and up, has no place for macros to go. A C++ debugging memory allocator will use _return address information_ to identify call sites. Change your debugging allocator so that `dbg_allocator`'s allocate and deallocate functions pass information based on [`__builtin_return_address`](https://gcc.gnu.org/onlinedocs/gcc/Return-Address.html) to your allocator's functions. Then, make your allocator transform this information numbers into real filename–line number pairs just before printing. For instance, you can call out to an external program, such as Linux's `addr2line`. But do not perform that transformation until printing, since the transformation is expensive. Use `./test037` to check your work. Before your fixes, the `LEAK CHECK` lines will start with `?:1;` afterwards, they should start with something like `.../test037.cc:NN`. <details> <summary> <span>A note on inlining</span></summary> A compiler optimization called <a href="https://compileroptimizations.com/category/function_inlining.htm">"inlining"</a> may complicate your work, since inlining can cause <code>__builtin_return_address</code> to return an unexpected address. Put the magic attribute <code>[[gnu::noinline]]</code> before a function to prevent inlining for that function. For example: ```c++ int f() { // f might be inlined into its caller } [[gnu::noinline]] int f2() { // f2 will not be inlined into its caller } ``` </details> # Handing In Your Code As before, you will hand in your code using Git. **In the `dmalloc/` subdirectory of your project repository, you MUST fill in the text file called `README.md`**. <details> <summary> <span> Remind me again what the <code>README.md</code> should contain?</span> </summary> <br> The <code>README.md</code> file will include the following:<br /> <ol> <li>Any design decisions you made and comments for graders, under <i>"Design Overview"</i>. If there's nothing interesting to say, just list "None".</li> <li>Any collaborators and citations for help that you received, under <i>"Collaborators"</i>. CS 300 encourages collaboration, but we expect you to acknowledge help you received, as is standard academic practice.</li> <li>Ranking the difficulty and time consumption of the project, compared to other projects so far. We won't use this for grading, but we collect the data to calibrate for future years.</li> </ol> </details> <br /> ### Grading breakdown: * **100% (80 points)** tested functionality (2 points per test). If running `make check` reports that all tests pass, you've probably got all these points. * **Up to 10 points** for extra credit. Now head to the grading server, make sure that you have the "DMalloc" page configured correctly with your project repository, and check that all tests pass on the grading server. :::info **Note:** If the commit you want graded is not your most recent one, you should flag the one you want graded and the grading commit will be updated after the autograder finishes. ::: **Congratulations, you've completed the second CS 300 project!** :tada: --- <small>_Acknowledgements:_ DMalloc was originally developed for Harvard's CS 61 course. We are grateful to Eddie Kohler for allowing us to use the assignment for CS 300.</small>