« Back to the main CS 300 website
Project 3: Caching I/O
Design Discussion due by Friday, March 8th (sign up here)
Project due Friday, March 15th at 6:00 PM EST
Introduction
Caching is the act of putting a small/fast/expensive data store (often called a cache) in front of a large/slow/cheap one. The primary purpose of caching is to increase the performance of data retrieval by storing frequently or recently accessed data within the cache. When data is requested, the system first checks the cache; if the data is found there, it can be retrieved much more quickly than if it had to be fetched from the slower data store.
Indeed, caching is at the heart of many optimizations in computing. Examples include:
- The storage hierarchy (hard drive → SSD → DRAM → L3-L1 cache → registers), as seen in lectures.
- Memoization (i.e., saving results of expensive computations for future reuse).
- Your web browser's cache, which prevents redundant network requests when you access pages you've recently visited.
- Content Delivery Networks, a technology that prevents long-distance Internet traversals by instead keeping copies of web pages close to users (this is the idea behind Akamai and Cloudflare, two billion-dollar companies!).
In this assignment, you will be speeding up a performance-critical aspect of systems programming: the reading and writing of data to and from a filesystem. To do this, you will use ✨ caching ✨. The file I/O cache you will implement in this project is similar to the CPU cache we already touched upon when we discussed alignment, and which we will discuss again in lecture, with some differences. Notably, the file I/O cache is defined in software, rather than provided by hardware (as CPU caches are). This allows the I/O cache to have variable size. Additionally, your file I/O cache will have a single fixed-size slot (storing one contiguous segment of the file) rather than multiple slots.
Motivation
Due to the design of Hard Disk Drives (HDD, "disks"), interacting with data on disk involves waiting on several slow physical mechanisms. For example, magnetic hard disks require requests to wait for platters to spin and for the metal arm to move to the right place where it can read the data off the disk. Even with modern SSDs, I/O operations on them take much longer than operations on memory. Moreover, I/O requires making system calls to the operating system, which are much slower than normal function calls (for reasons we will discuss soon).
If our programs were required to read and write their data solely from disk, they would be unacceptably slow. We saw an example of this in the form of the diskio-slow program in lecture.
Fortunately, we can do better using caching! If we are willing to sacrifice a small amount of data integrity (i.e., in the event that your computer suddenly loses power, some data is lost), we can gain 100–1,000x in performance. To do this, the computer temporarily holds the data you are using inside of its main memory (DRAM) instead of working directly with the disk. In other words, the main memory (DRAM) acts as a cache for the disk.
Project Description
You will implement an Input/Output (I/O) library that supports operations such as reading data from files (io300_readc
and io300_read
), writing data to files (io300_writec
, io300_write
), or moving to a different offset within the file (io300_seek
). Your I/O library uses a cache to speed up access to data and reduce the number of disk operations required.
For example, when an application wants to read from a file, it can call your io300_read
instead of the system call read
. The arguments to io300_read
are:
io300_read(struct io300_file* const f, char* const buff, size_t const sz)
f
is the file to read from and sz
is the number of bytes to read. buff
is the application buffer – a region of memory (provided by the caller) where the read bytes should go. In other words, io300_read
should write bytes from the data stored in a file (or the cache) into buff
. That way, when the io300_read
call finishes, the caller can read the bytes returned from buff
.
The io300_write
function has the same structure:
io300_write(struct io300_file* const f, const char* buff, size_t const sz)
In this case, the caller puts the bytes they want to write into the application buffer buff
. They then call io300_write
on buff
, which writes the first sz
bytes of buff
to the file f
.
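To make the calling convention concrete, here is a sketch of how a test program might drive these two functions. The file name, description string, and buffer size are invented for illustration, and note that test programs, unlike your library implementation, are free to use stdio:

#include <stdio.h>
#include <sys/types.h>
#include "io300.h"

int main(void) {
    // "test_files/tiny.txt" and the description string are illustrative values.
    struct io300_file* in = io300_open("test_files/tiny.txt", "example input");
    if (in == NULL) {
        return 1;
    }
    char buff[8];                                   // the application buffer
    ssize_t n = io300_read(in, buff, sizeof buff);  // bytes land in buff
    if (n > 0) {
        fwrite(buff, 1, (size_t) n, stdout);        // the caller can now use them
    }
    io300_close(in);
    return 0;
}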
Your goal when writing io300_read
and io300_write
is to introduce an intermediate caching layer to minimize expensive trips to the file on disk.
Here is a visualization of how the application buffer, the cache, and the file interact:
Your approach to implementing this project should be:
- Read the entire handout as some critical information is located in later sections.
- Attend a design discussion, where you will discuss a design for your cache data structure with a TA and with your peers. This includes the mechanics of how your cache works and all metadata required to maintain it.
- Fill out the functions in
impl/student.c
to implement caching file I/O that conforms to the API specified in io300.h
. In addition, you must ensure that make check
reports that your implementation is correct, and make perf
reports that your implementation is at most 5–10x slower than stdio
(the standard library's caching I/O functions) in all cases (see grading rubric).
What’s an API?
- API (Application Programming Interface) – This is a formal contract that a piece of software uses to declare to the outside world how it can be used. An example of this would be the
List<E>
interface in Java which states that anything claiming to be a List
can be asked to add()
, remove()
, insert()
, …, and other "list-like" things. In this project, we have defined an API for an io300_file
struct
representing a file, providing methods for reading, writing, and seeking on the file/struct
that you will implement.
Project Details
Installation
Ensure that your project repository has a handout
remote. Type:
$ git remote show handout
If this reports an error, run:
$ git remote add handout https://github.com/csci0300/cs300-s24-projects.git
Then run:
$ git pull
$ git pull handout main
This will merge our Project 3 (fileio
) stencil code with your repository.
Once you have a local working copy of the repository that is up to date with our stencils, you are good to proceed. You'll be doing all your work for this project inside the fileio
directory in the working copy of your projects repository.
Stencil/Layout
We provide you with the following files:
File |
Purpose |
io300.h |
A list of all the methods your implementation must supply to be used with the test programs. |
impl/ |
Contains the C source for this project. impl/student.c is the only file you have to edit. |
test_scripts/ |
Test scripts provided by us that working implementations will pass. Run make check and make perf to use them. |
test_programs/ |
Contains the test programs that use your implementation to do I/O. |
test_files/ |
Some files to run your implementation on while you are developing. |
In this project, we provide you with a number of existing test programs that do basic file manipulation (see test_programs/*.c
). These programs are written in such a way that all interaction with the filesystem is done through the io300_file
interface (a series of functions whose signatures are declared in io300.h
). This means that if two libraries separately implement the functions in io300.h
(potentially quite differently), the test programs will work with either implementation.
We provide you with two implementations, and you will develop a third.
- The naive implementation (
impl/naive.c
) reads from and writes directly to the disk without any caching. The initial project stencil is identical to this implementation.
- The standard I/O implementation (
impl/stdio.c
) leverages the existing C Standard Library's "buffered" I/O, which does some clever caching.
The speed difference between these two solutions, as measured by running the test programs with each implementation, is astounding! Try it out for yourself:
$ make perf_testdata
$ make -B IMPL=naive
$ time ./byte_cat /tmp/perf_testdata /tmp/testout
real 0m18.515s
user 0m6.819s
sys 0m11.694s
$ make -B IMPL=stdio
$ time ./byte_cat /tmp/perf_testdata /tmp/testout
real 0m0.140s
user 0m0.112s
sys 0m0.028s
The numbers will differ on your computer, but the general relationship (a ~100x performance difference) will hold. (Note that real
is the actual time taken (this is called "wall clock time"), user
refers to the part of that time spent in userspace, and sys
is the time spent in the kernel.)
In the project, your task is to fill in the third implementation (impl/student.c
) and make it perform as close to the stdio
implementation as possible.
Note: This project deliberately leaves you a lot of design freedom. You should feel free to come up with clever caching designs and benchmark them.
Both simple and involved schemes will work, but cleverer schemes will get closer to the performance of stdio
, and may sometimes even beat it! We will award extra credit for impressive performance (see the grading rubric).
You're welcome (and encouraged) to work with other students in groups to come up with your design. As usual, the code you hand in must be your own.
Getting Started
This project, unlike DMalloc, is in C. This means you cannot use C++ language features and data structures in your solution.
The core idea behind this project is the following:
When a program asks the I/O library to write a byte to a file, instead of immediately going to the disk, put that byte in a cache, then write the entire cache to disk at a later point.
The first thing you should do is look at io300.h
to understand the public API that each implementation offers. Next, you should run make -B IMPL=stdio check
to run the tests on the implementation we provide (you can also run make -B IMPL=naive check
to see how ridiculously slow it is).
Next, you'll want to work out why the stencil code (and the naive
implementation) is so slow. To do so, use a handy tool called strace
(for syscall trace), which prints the system calls (calls from your program into the OS) that happen when your program runs.
System calls are expensive: they require changing the processor's privilege mode, lots of safety checks, and they invoke OS code. Moreover, the read
or write
syscalls often go directly to disk.
Start out by investigating how the naive implementation performs on a 755-byte test file:
$ make -B IMPL=naive
$ strace -e trace=read ./byte_cat test_files/words.rot13.txt /tmp/out
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\v\2\0\0\0\0\0"..., 832) = 832
[...]
read(3, "1", 1) = 1
read(3, ".", 1) = 1
read(3, "1", 1) = 1
read(3, " ", 1) = 1
read(3, "V", 1) = 1
read(3, "a", 1) = 1
read(3, " ", 1) = 1
read(3, "g", 1) = 1
[...]
The initial lines relate to read()
system calls that happen as the OS starts up your executable (which is encoded in a format called "ELF"). You can ignore those lines. At some point, you'll see a lot of lines like read(3, "V", 1) = 1
. A line like this means that the program (or, in this case, the I/O library used) invoked the system call read()
with arguments 3
(a number that refers to a file descriptor), "V"
, and 1
, and that the return value was 1
(as indicated by = 1
).
Think about what this output means. Can you see a reason why this is inefficient?
Give me more hints!
Let's consider a thought experiment, this time with the write()
system call.
Imagine that I am writing a program that will sequentially write 1000 bytes (all of value 'c'
) to a file, one byte at a time.
One way to do it would be to call write()
1000 times, once per byte.
char c = 'c';
for (int i = 0; i < 1000; i++) {
    write(fd, &c, 1);   // one syscall per byte; write() takes a pointer, not a char
}
Another option would be to create a local variable char buff[40]
and put bytes into that. Once we have written 40 bytes to buff
, we issue one single call to write(fd, buff, 40)
.
char buff[40];
for (int i = 0; i < 1000; i++) {
    buff[i % 40] = 'c';
    if (i % 40 == 39 || i == 999) {   // buffer full (or final byte): flush with one syscall
        write(fd, buff, 40);
    }
}
The critical thing to remember is that a call to write(fd, &c, 1)
and write(fd, buff, 40)
will take nearly the same amount of time, because the cost of executing a system call is much larger than the difference between moving 1 byte and moving 40 bytes.
So with the second solution, we can reduce the number of syscalls by 40x.
In this project, you will be generalizing the second solution to deal with things like writing bytes backwards (decreasing indices) and writing chunks of bytes all at once.
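The same trick generalizes to reads: fetch a block of bytes with one syscall, then serve subsequent requests from memory. Here is a minimal standalone sketch; the 40-byte buffer and all names are invented for illustration, and your project cache will be larger and per-file rather than global:

#include <sys/types.h>
#include <unistd.h>

static char rbuf[40];
static ssize_t rbuf_len = 0;  // how many bytes of rbuf are valid
static ssize_t rbuf_pos = 0;  // next byte of rbuf to hand out

// Return the next byte of fd (as 0..255), or -1 on EOF/error.
// Only one read() syscall happens per 40 bytes consumed.
static int buffered_getc(int fd) {
    if (rbuf_pos == rbuf_len) {  // buffer exhausted: refill it
        rbuf_len = read(fd, rbuf, sizeof rbuf);
        rbuf_pos = 0;
        if (rbuf_len <= 0) {
            return -1;
        }
    }
    return (unsigned char) rbuf[rbuf_pos++];
}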
Design Discussion
To succeed at this project, it is crucial that you think conceptually about your design before you write substantial amounts of code. One good way to check a design is to discuss it with someone else. Hence, we ask you to sign up for a design discussion with the course staff. During the design discussion, you will come up with a design together, and will receive feedback on your ideas.
Sign up for a design discussion slot (if you haven't yet signed up, do so here). Read through the entire handout before going to your design discussion, and think about some of the considerations you will discuss.
Consider the following:
- What metadata is needed to keep track of where to next read/write data (1) in the file and (2) in the cache?
- What metadata is needed to determine the parts of the file that are currently in the cache?
- When do you access the file versus the cache? Accessing the cache is preferable for speed, but not always possible.
- When do you transfer data in the cache to the file and vice versa? What data is then in the cache and in the file?
- How is your cache laid out in memory?
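As a starting point for these questions, here is one possible shape for a single-slot cache's metadata. This is a sketch, not the stencil's definition: every field name below is invented for illustration, CACHE_SIZE is an assumed constant, and your design may well differ.

#include <sys/types.h>   /* off_t, size_t */

// One possible single-slot cache layout (all names are illustrative).
struct io300_file {
    int fd;              // OS file descriptor returned by open()
    char* cache;         // pointer to CACHE_SIZE bytes of cache memory
    off_t cache_start;   // file offset corresponding to cache[0]
    size_t cache_valid;  // number of cache bytes that mirror file contents
    size_t pos;          // logical read/write position within the cache
    int dirty;           // nonzero if the cache holds unflushed writes
    char* description;   // human-readable name, as used by dbg()
};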
If your design discussion is in the second half of the first week of the project, we recommend that you work through the following material, so that you can get started on the project and check your design during the design discussion.
To help think through the design for your cache, consider the following example code. The file we are caching is test_files/tiny.txt
and the cache size is just 8 bytes in the example.
char buffer[5];
struct io300_file* testFile = io300_open("test_files/tiny.txt", "tiny!");
ssize_t r = io300_read(testFile, buffer, 5);
ssize_t w = io300_write(testFile, "aaa", 3);
r = io300_read(testFile, buffer, 2);
ssize_t s = io300_seek(testFile, 12);
w = io300_write(testFile, "aaa", 3);
r = io300_readc(testFile);
io300_close(testFile);
Think about how you would fill out the following template for every line in the sequence of operations:
You will end up with eight such pictures with notes on the state of the cache and the metadata. You can write these digitally (e.g., in a Google Doc) or on a piece of paper. Drawing pictures on paper is an excellent way to work on this part of the assignment!
Remember to think about how the OS will track the current read/write position in the file. We explain this in the handout below, we'll talk about it in lecture, and the design discussion will also cover this. Then also think about how you would keep track of the current read/write position in your cache, and how that position might differ from the file position managed by the OS. For instance (with invented numbers), if your cache holds bytes 100–163 of the file and the caller has consumed 10 of them, the OS file position sits at 164 after the read() that filled the cache, while the logical position your library must report is 110.
Things to Think About
Here are some questions to ask yourself before/while implementing:
- When should I call
read()
and fill my cache?
- What happens to the data in the cache when a program reads or writes the last byte in the cache? What about one byte past this?
- How do I know or keep track of the fact that the cache has been modified?
- What happens when
flush
is called, but nothing in the cache has been changed?
- What happens if a syscall fails?
- What happens when I read a byte from a file that is all
1
s (0xFF
, 0b11111111
)? How is this different than the integer -1
? How does this mess with return values from things like read()
and write()
? What is the deal with unsigned
vs signed char
? (some answers to these can be found in example/io_return_example.c
). (See the short demo after this list.)
- What happens if we seek to a location that is within the cache? How should this differ from seeking to a location outside of the cache?
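To see why the 0xFF question above matters, here is a small self-contained experiment, independent of the project code. Note that whether plain char is signed is platform-dependent, which is exactly the trap:

#include <stdio.h>

int main(void) {
    char c = 0xFF;                        // a byte that is all 1s
    int as_char = c;                      // sign-extends to -1 where char is signed
    int as_unsigned = (unsigned char) c;  // always 255

    // If a readc-style function returned the plain char, a legitimate 0xFF
    // byte would be indistinguishable from a -1 EOF/error return value.
    printf("%d %d\n", as_char, as_unsigned);  // typically prints: -1 255
    return 0;
}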
To help you out as you're designing your cache, we'll share what the contents of the file and cache, and the return value, are after two selected lines of code.
After Line 2
- File Contents: "this is a test\n"
- Cache Contents: "this is "
- Return: 5
After Line 6
- File Contents: "this aaaa test\n"
- Cache Contents: "a teaaa"
- Return: 3
Implementing Caching I/O
Now it's time to get coding! Be sure to read the rest of the handout first, however.
Task:
Complete the functions in impl/student.c
marked with TODO
comments such that you implement cached file I/O.
It will make sense to proceed step by step, and we provide some guidance on getting started below.
Your final implementation should:
- Conform to the API specified in
io300.h
.
- Pass all tests run in
make check
.
- Run at most 5–10x slower than stdio in all cases tested in
make perf
.
We highly recommend that you thoroughly read and engage with the following sections when working on developing your caching I/O strategy.
Note: you cannot use any C Standard Library I/O functions while implementing your library (this would defeat the purpose of the assignment!). These include functions like fread
, fwrite
, fgetc
, fputc
, fseek
. Remember to only create one cache per open file. You will not need to call malloc
anywhere, except in the place that we already provide for you.
Information about files (very helpful!)
Here are some statements about files that are true for most Unices (UNIX-derived operating systems, such as Linux or macOS), and in particular your course development environment.
-
A file is an ordered sequence of bytes - not text, not ASCII characters, not UTF-8 characters, but just bytes. The way we interpret the bytes present in a file leads to a file's colloquial "type" (like a text file, or a CSV, or a compiled binary). For this reason, file extensions (.txt
, .csv
, .html
, .c
, .exe
) are largely irrelevant. They merely serve as hints as to how their bytes should be interpreted. In fact, there is nothing special about the .
character in a filename. So rather than thinking about a file called malte.jpeg
as an object called malte
of type .jpeg
, it should be considered to be a sequence of bytes whose name is malte.jpeg
(additionally, you may take the .jpeg
suffix to informally mean "this probably contains data in the JPEG format and is suitable to be opened by photo viewers").
$ gcc helloworld.c -o helloworld
$ ./helloworld
hello world
$ gcc helloworld.c -o helloworld.cool.program.300
$ ./helloworld.cool.program.300
hello world
$ head -c 10 helloworld.cool.program.300 | xxd
00000000: cffa edfe 0700 0001 0300 ..........
It's just bytes!
-
The way we interact with files is through file descriptors. A file descriptor is an integer that the operating system uses to identify a file. Any time you want to do something to that file, you pass the operating system the file descriptor as a reference. The way we get file descriptors is with the open
syscall.
int fd = open("myfile.txt", O_RDONLY);
From now on, if we want to do something with myfile.txt
, we will pass the operating system fd
so that it knows which file to manipulate on our behalf.
-
Once a file has been opened, we can start manipulating it. When a file is open, the operating system keeps track of an index into that file that represents the "read/write head" of that file. This is like using your finger to scan across the lines while you read. Wherever the "head" points is the next byte to be read, or, if we want to write a byte, the location where the next byte will be written.
-
Once we have a file descriptor (which identifies a file), there are three basic operations we can do to modify the bytes within that file.
read(fd, buffer, size)
will read size
bytes from the file identified by fd
into the location in memory pointed to by buffer
. read
returns the number of bytes that were successfully read from the file, and then increments the file's read/write head by that number of bytes.
write(fd, buffer, size)
will write size
bytes starting at memory location buffer
into the file identified by fd
. write
returns the number of bytes that were successfully written to the file, and then increments the file's read/write head by that number of bytes.
lseek(fd, index, SEEK_SET)
will change the value of the read/write head of a file identified by fd
to index
. This allows us to "skip" around in a file and read/write from arbitrary locations (we don't always want to read/write sequentially from the start of a file).
Note: lseek
can be called on a position beyond the end of the file (EOF), though this will not change the file's size. If data is written at this position, the file's size grows to that point and the space between the old EOF and the newly written bytes is filled with null bytes. You'll need to consider this edge case in your caching design!
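If you want to see this hole-filling behavior for yourself, the following small standalone program demonstrates it (the /tmp/hole_demo path is just an example):

#include <fcntl.h>
#include <unistd.h>

int main(void) {
    // Create an empty file, seek past EOF, then write: the gap reads back as null bytes.
    int fd = open("/tmp/hole_demo", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        return 1;
    }
    lseek(fd, 10, SEEK_SET);   // position 10 in a 0-byte file; size is still 0
    write(fd, "X", 1);         // now the file is 11 bytes: ten '\0's, then 'X'

    char buf[11];
    lseek(fd, 0, SEEK_SET);
    read(fd, buf, sizeof buf); // buf[0..9] == '\0', buf[10] == 'X'
    close(fd);
    return 0;
}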
Making Progress
One of your early goals should be to get the byte_cat
program working for a small text file.
$ touch /tmp/out.txt
$ make clean all
$ ./byte_cat test_files/tiny.txt /tmp/out.txt
$ cat /tmp/out.txt
this is a test
The byte_cat
program performs no seeks and is the simplest test case you can create. Here is a list of the functions that byte_cat
uses: open
, close
, filesize
, readc
, writec
. We provide all of filesize
, and open
/close
are simple, so you just have to get your reading and writing logic down!
Debugging Tips
The easiest way to spend a lot of time on this project is to just write code, or to randomly edit your code, without thinking and debugging carefully. Make sure you use the below techniques to make your debugging productive!
Note that the stencil implementation passes all correctness tests, but it is very slow! Additionally, you might find that the performance tests pass locally but not on the grading server. Tests must pass on the grading server to receive credit.
As you work on the project, you will likely break the correctness tests in your quest to improve performance.
If you fail a correctness test, the first thing you will usually want to look into is whether it failed because of a crash in your code (which you can debug using familiar tools such as GDB), or whether it ran, but produced incorrect output. When you see incorrect output file contents, it makes sense to work out how they're incorrect by looking at the actual bytes in the input and output files! Check out the Debugging section below for some helpful tools, such as xxd
and diff
.
Make sure your program can handle non-ASCII characters! Not all files consist of text; for example, images or videos can contain arbitrary bytes, including the NUL
byte that has special meaning for text, but not in other data.
Additionally, it can also help to look at the output from strace
, which lists the system calls that your caching I/O library makes, to see if anything unexpected happens during the test.
If your performance is poor, strace
should be your first port of call. Poor performance in this project is almost always explained by your implementation making too many system calls. Running tests under strace
will help you understand exactly what sequence of system calls happens, and how many bytes they move into and out of the cache.
Finally, you will likely want to have debug output in your library that you can disable easily when running performance tests, as printf
-type functions slow down your program a lot. We provide the helper function dbg()
in the stencil for this purpose: you can change the value of the DEBUG_PRINT
constant to turn the debug printing code on or off in a single keystroke. dbg()
works just like printf()
, but if DEBUG_PRINT
is set to 0, the compiler will throw away the code in the function and turn it into a no-op.
Consider also writing your own tests if you want to test specific behavior. For more information, look at Testing and Appendix II.
Finishing Up
Once you have a working implementation, test it with make check
(correctness) and make perf
(performance). You probably won't match stdio
on all performance benchmarks yet, but you will meet the performance bar for this project if you achieve the following:
- Your implementation is within 10x of
stdio
on the byte_cat
benchmarks (byte_cat
and reverse_byte_cat
).
- Your implementation is within 5x of
stdio
on the block_cat
tests (block_cat
, reverse_block_cat
, and random_block_cat
).
If you get closer to stdio
(or even beat it on some tests), we will award extra credit.
Testing
Overview
make check
tests your implementation for correctness. It does this by running all of the test programs with a variety of inputs on random files, and is split into two categories: (1) basic functionality tests, which test the core functionality of your implemented functions and their interactions with one another, and (2) end-to-end tests, which test your implementation with standard file operations (such as copying between files).
Basic functionality tests
The basic functionality tests use the io300_test
program. The program randomly generates a single file, which it then opens and performs/tests the specified functions on. The io300_test
program verifies that each io300_readc
or io300_read
call reads the correct bytes from the file, and verifies at the end of the test that all writes are reflected in the final file.
The name of each test specifies the functions tested – for example, readc/writec
will perform a combination of io300_readc
and io300_writec
calls on the same generated file. You can also parse this information from the command that runs the test (io300_test
with some arguments), which specifies the test's configuration. (make check
prints this command with each test when running.) For instance, for seek_beyond_eof
, the testing script runs:
./io300_test readc writec seek -s 0 -m 4096 -n 1000
which specifies that the program will randomly generate a file 0 bytes long (-s 0
), which is allowed to grow to a size of 4096 bytes (-m 4096
), on which it will then perform 1000 (-n 1000
) calls to randomly either io300_readc
or io300_writec
, each preceded by a call to io300_seek
to some location between 0 and the maximum file size. Running just ./io300_test
prints a description of the values of these arguments as well.
Note: If you fail basic tests (e.g., readc/writec
) while passing end-to-end tests (e.g., byte_cat
), the reason is almost certainly that you are not correctly handling operations when both reads and writes happen on the same file.
Make sure that both read and write functions leave the cache metadata in a state that the other function can work with! (In other words, invariants need to be preserved, and – if appropriate – restored, by these functions.)
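One way to maintain such an invariant is to funnel every refill or reposition of the cache through a single helper that first writes back dirty bytes. This is a sketch using the same invented field names as the struct sketch earlier in this handout, not stencil code:

#include <sys/types.h>
#include <unistd.h>

// Sketch: restore the invariant "file matches cache" before moving the cache.
// Assumes the invented fields fd, cache, cache_start, cache_valid, and dirty.
static int flush_if_dirty(struct io300_file* f) {
    if (!f->dirty) {
        return 0;  // nothing changed: skip the syscalls entirely
    }
    if (lseek(f->fd, f->cache_start, SEEK_SET) < 0) {
        return -1;
    }
    ssize_t n = write(f->fd, f->cache, f->cache_valid);
    if (n < 0 || (size_t) n != f->cache_valid) {
        return -1;  // a failed or short write leaves the file inconsistent
    }
    f->dirty = 0;
    return 0;
}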
End-to-end tests
The end-to-end tests use a combination of your io300_*
functions within larger programs. Most of these programs operate on separate input and output files, so there will be two caches (one for the input file, one for the output file).
make perf
tests your implementation for speed and compares it to stdio
on the end-to-end tests.
You will find that the correctness tests already pass on the stencil code! This is because the stencil code contains a copy of a naive implementation that always makes system calls for every operation (i.e., there is no caching).
The stencil code will not pass the performance tests. You also cannot simply hand in the stencil code and receive the 60% credit for correctness ;-)
The performance tests will run very slowly in the unmodified stencil code (e.g., on an M2 Apple MacBook Air, many of the tests take over a minute to run). You will need to make them run faster by implementing caching!
⚠️ Important note on performance ⚠️
Performance can vary across computers. The Docker container adds additional overhead when you run locally, and there are differences in architecture (Apple Silicon vs. Intel Machines) and software (Windows vs. macOS vs. Linux) that affect the speed at which the performance tests run.
We have made efforts to make the tests as consistent as possible, but please always check your performance on the grading server once the performance tests pass locally. We will ultimately grade you based on the performance on the grading server!
Working with tests
The end-to-end correctness and performance tests will run your implementation on each specified test program using a randomly-generated file of a certain size. For your own personal debugging use, we've provided you with the tools to easily generate these testing files.
Generating test data
If you wish to generate random testing data, use make check_testdata
, which generates 80 KiB of data to test for correctness, and make perf_testdata
, which generates 1 MiB of data to test for speed. These files will be located at /tmp/check_testdata
and /tmp/perf_testdata
, respectively.
Running a single test
You can run a single test for performance as follows:
$ make -B IMPL=student
$ make perf_testdata
$ time ./byte_cat /tmp/perf_testdata /tmp/testout
...
The following is an example of running a simple test for correctness:
$ make check_testdata
$ ./byte_cat /tmp/check_testdata /tmp/testout
$ diff /tmp/check_testdata /tmp/testout
Advanced testing
For a more complex test, you may need to generate intermediate output files. See correctness_tests.py
for examples!
$ ./reverse_byte_cat /tmp/check_testdata /tmp/testout
$ ./reverse_byte_cat /tmp/testout /tmp/testout2
$ diff /tmp/check_testdata /tmp/testout2
You may also want to write additional tests to test specific features of your code individually, particularly if you plan to implement extra credit features. For more information, see Appendix II.
Debugging
You may find the following tools (in addition to GDB and sanitizers) helpful in debugging this project.
Helpful Commands
man
displays the authoritative documentation for all functions.
man 2 open
man 2 close
man 2 read
man 2 write
man 2 lseek
dd if=/dev/urandom ...
is used in our test scripts to transfer random binary data into a file.
wc -c <file>
displays the number of bytes in a file.
du -h <file>
also displays the size of a file and is better for large files (it gives human readable byte numbers).
head -c 40 <file> | xxd
or tail -c 40 <file> | xxd
will give you a peek at the beginning or end of a file. This is useful for inspecting large files where xxd
would normally flood your screen.
Hexdump
xxd <file>
allows you to view the contents of binary files. This will be helpful when you have a buggy implementation and you want to see what is going wrong, by printing out what's in your result file. The code below demonstrates an example.
$ echo 'this is ascii' > out.bytes
$ xxd out.bytes
00000000: 7468 6973 2069 7320 6173 6369 690a this is ascii.
$ dd if=/dev/urandom of=out.bytes bs=32 count=1
1+0 records in
1+0 records out
32 bytes copied, 0.000336055 s, 95.2 kB/s
$ xxd out.bytes
00000000: 65b7 6c53 69f3 f1ed e6d2 09eb ec66 9403 e.lSi........f..
00000010: f33c e929 d703 314f e7dd 5e6b 56a0 2d28 .<.)..1O..^kV.-(
Diff
diff <file1> <file2>
will tell you if the input files differ. This may again be helpful when you have a buggy implementation and you want to figure out where in the output file you're differing from the expected content.
strace
strace
, as mentioned above, is a tool that provides diagnostic information about the system calls a program is making. This may be especially helpful when you are trying to improve the performance of your implementation.
Expand for a more complete description of strace!
strace
(on Linux) allows you to view the system calls that a program makes. Here is an example of running strace on a plain hello world C program (fed in through stdin). Pay attention to the final two lines of the output, namely the call to write()
and exit_group()
. Everything else you see is just the shell starting the program and can be ignored.
$ printf 'int main(){printf("hello world\\n");}' \
> | gcc -w -x c - && strace ./a.out
execve("./a.out", ["./a.out"], 0x7ffe85175530 /* 26 vars */) = 0
brk(NULL) = 0x563c71960000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=26696, ...}) = 0
mmap(NULL, 26696, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f8ee6db4000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\35\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2030928, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8ee6db2000
mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f8ee67a1000
mprotect(0x7f8ee6988000, 2097152, PROT_NONE) = 0
mmap(0x7f8ee6b88000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f8ee6b88000
mmap(0x7f8ee6b8e000, 15072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f8ee6b8e000
close(3) = 0
arch_prctl(ARCH_SET_FS, 0x7f8ee6db34c0) = 0
mprotect(0x7f8ee6b88000, 16384, PROT_READ) = 0
mprotect(0x563c6fd55000, 4096, PROT_READ) = 0
mprotect(0x7f8ee6dbb000, 4096, PROT_READ) = 0
munmap(0x7f8ee6db4000, 26696) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
brk(NULL) = 0x563c71960000
brk(0x563c71981000) = 0x563c71981000
write(1, "hello world\n", 12hello world
) = 12
exit_group(0) = ?
+++ exited with 0 +++
We can use strace
to see how frequently our programs are making calls to read()
and write()
as an indirect way to measure performance.
The following is a demonstration of the number of these syscalls that take place when running byte_cat
on a 755 byte long file.
Notice how the naive implementation makes around 750 reads and writes (this makes sense because the naive implementation calls read()
and write()
once per character), while the stdio implementation makes fewer than 10% as many read and write calls.
$ wc -c test_files/words.rot13.txt
755 test_files/words.rot13.txt
$ make -B IMPL=naive byte_cat
gcc -ggdb3 -Wall -Wextra -Wshadow -std=gnu11 -fsanitize=address -fsanitize=undefined -fsanitize=leak test_programs/byte_cat.c impl/naive.c -o byte_cat
$ strace ./byte_cat test_files/words.rot13.txt out.txt 2>&1 | grep read | wc -l
804
$ strace ./byte_cat test_files/words.rot13.txt out.txt 2>&1 | grep write | wc -l
758
$ make -B IMPL=stdio byte_cat
gcc -ggdb3 -Wall -Wextra -Wshadow -std=gnu11 -fsanitize=address -fsanitize=undefined -fsanitize=leak test_programs/byte_cat.c impl/stdio.c -o byte_cat
$ strace ./byte_cat test_files/words.rot13.txt out.txt 2>&1 | grep read | wc -l
50
$ strace ./byte_cat test_files/words.rot13.txt out.txt 2>&1 | grep write | wc -l
4
Note from Your TAs: Although using strace
may seem intimidating, it is incredibly useful in improving performance! In particular, it can help you visualize the sequence of system calls.
dbg()
Check out the function:
static void dbg(struct io300_file *f, char *fmt, ...)
in impl/student.c
. Use it to debug while you are working, and then you can silence its output with one keystroke when you hand in. It acts just like printf()
, but it also logs your file's metadata so you can see what is happening as your program executes. When you add fields to your file structure, be sure to include them in the format string after the TODO
in dbg()
.
Here is an example of using it.
int io300_writec(struct io300_file *f, int ch) {
dbg(f, "writing char: %c\n", ch);
...
GDB
As mentioned in previous assignments and in the debugging tips section, GDB is a very useful tool for stepping through your code. To set up gdb for a specific test, you can do the following:
- Run the command
gdb [test executable]
(e.g., gdb stride_cat
)
- Set breakpoints using
b [location]
(e.g., b io300_read
)
- Run
r [arguments]
(e.g., r test_files/tiny.txt /tmp/out.txt
)
- Note that some tests take in more parameters (for example,
block_cat
takes in an extra parameter of block size). You can see what parameters each test takes in by running the test executable with no parameters (e.g., ./block_cat
).
You may find it helpful to print out the contents of the cache and verify that it matches the steps in your design. See the GDB guide for more useful commands.
Extra Credit: Performance Optimizations
To optimize your implementation and achieve more impressive performance, consider some of the following extensions! If you are a graduate student taking CSCI 1310, please implement and document at least one performance optimization.
Adaptive Caching
Easy. Detect the direction of access within a file (are reads occurring forwards or backwards?) and adapt your caching strategy to pre-fetch that memory. Measure how much of an impact adaptive caching has on performance over your basic implementation by creating additional test programs in test_scripts/correctness_tests.py
and test_scripts/performance_tests.py
. Document how you implemented adaptive caching and its performance impact in your README file.
Diabolical byte_cat
Moderately easy. Check out the diabolical_byte_cat
test program. This test program works similarly to byte_cat
, but tries to throw off your cache by doing some gratuitous seeks!
Edit test_scripts/correctness_test.py
and uncomment diabolical_byte_cat
as part of the end-to-end tests (at the bottom of the file), and do the same in test_scripts/performance_test.py
. Now optimize your code to handle the situations that this test program creates. Make sure that you still pass the correctness tests, and meet the performance bar for the diabolical_byte_cat
performance test.
Detecting Strides
Moderately difficult. Have your cache detect "strides", or access patterns that jump by fixed amounts within a file. For example, reading byte 1, then byte 1001, then byte 2001, and pre-fetching appropriately. Measure how much of an impact detecting strides has on performance over your basic implementation by creating additional test programs in test_scripts/correctness_tests.py
and test_scripts/performance_tests.py
. Document how you implemented stride detection and its performance impact in your README file.
Memory-Mapped I/O
Difficult. Research "memory-mapped I/O" and how it works, then use it in your implementation when possible. Measure how much of an impact memory-mapped I/O has on performance over your basic implementation by creating additional test programs in test_scripts/correctness_tests.py
and test_scripts/performance_tests.py
. Document how you implemented memory-mapped I/O and its performance impact in your README file.
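For orientation only, here is the basic mmap pattern on a POSIX system. This is a generic sketch (the file name is illustrative), not a drop-in piece of your library, and whether it actually helps is what you are asked to measure:

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("test_files/tiny.txt", O_RDONLY);
    if (fd < 0) {
        return 1;
    }
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return 1;
    }

    // Map the whole file; afterwards, bytes are read with plain memory
    // accesses instead of read() syscalls.
    char* data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) {
        close(fd);
        return 1;
    }
    char first = data[0];  // no syscall: the kernel pages data in on demand
    (void) first;

    munmap(data, st.st_size);
    close(fd);
    return 0;
}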
Handing In & Grading
Handin instructions
As before, you will hand in your code using Git. In the fileio/
subdirectory of your project repository, you MUST fill in the text file called README.md
.
Remind me again what the README.md
should contain?
The README.md
file will include the following:
- Any design decisions you made and comments for graders, under "Design Overview". If there's nothing interesting to say, just list "None".
- Any collaborators and citations for help that you received, under "Collaborators". CS 300 encourages collaboration, but we expect you to acknowledge help you received, as is standard academic practice.
- Ranking the difficulty and time consumption of the project, compared to other projects so far. We won't use this for grading, but we collect the data to calibrate for future years.
Grading breakdown
While the project is about speed, correctness is more important than performance!
We will not award credit for performance tests that pass if you fail the related correctness tests, but we will be generous with partial credit if you pass correctness tests and fail to meet performance goals.
To understand why, consider an implementation that always throws all data away: this would be infinitely fast and meet all the performance goals, but of course it isn't a useful caching I/O library!
- 60% (60 points) for passing the correctness tests. If your tests pass on the grading server with sanitizers, you've probably got all these points. Each basic functionality test is worth 1.5 points, each end-to-end test is worth 3 points, and you get 1.5 points for passing all tests.
- 40% (40 points) for meeting the performance goals: your
byte_cat
runs must be within 10x of the stdio
runtime, and your other runs (block_cat
, etc.) must be within 5x of the stdio
runtime. Each benchmark is worth 6.5 points, and you get 1 point for passing them all. We will award partial credit if you miss the target but improve over the naive implementation. You will not receive credit for good performance on benchmarks that fail the correctness tests; it is more important to be correct than to be fast.
- Up to 10 points of extra credit for impressive performance optimizations (e.g., matching or beating
stdio
).
Graduate students taking CSCI 1310 should implement and measure one performance optimization, and describe it in their README. Make sure to cover, in a few sentences, what the optimization is, why you expected it to improve performance, and whether it did or did not improve performance in your experiments.
Now head to the grading server, make sure that you have the "Caching I/O" page configured correctly with your project repository, and check that your tests pass on the grading server as expected.
Note: If the commit you want graded is not your most recent one, you should flag the one you want graded and the grading commit will be updated after the autograder finishes.
Congratulations, you've completed the third CS 300 project!
Appendix I: Implementation Swapping
This section explains how our test programs can use different implementations that match the io300.h
API.
Consider the following to be our test program:
#include <stdio.h>
extern int getRandomNumber();
int main() {
printf("%d\n", getRandomNumber());
}
Now we want an implementation of the getRandomNumber
function that we can link with the test program to produce a fully functional number printer.
// real-rand.c
#include <stdlib.h>
int getRandomNumber() { return rand(); }
// my-rand.c
int getRandomNumber() { return 4; }
Now when we compile the main test program, we have a choice. Since both implementations conform to the same public API (namely, they provide one function getRandomNumber :: void -> int
), either can be swapped in and the main program won't know the difference.
$ gcc test-prog.c real-rand.c && ./a.out
16807
$ gcc test-prog.c my-rand.c && ./a.out
4
This is the principle behind the different implementations for this project, but instead of the public API being a single function, it is a family of I/O-related functions defined in io300.h.
At a high level, to test your implementation, we will be doing something like this:
$ gcc our-test-program.c impl/stdio.c && time ./a.out
$ gcc our-test-program.c impl/naive.c && time ./a.out
$ gcc our-test-program.c impl/student.c && time ./a.out
to compare the speed of your implementation against the provided ones.
Appendix II: Writing Custom Tests
While you are certainly not required to design your own tests—passing the autograder tests we provide is more than enough—this assignment has a lot of moving parts! This appendix will help you write custom tests, which you may want to do for testing specific features (e.g., writing) without the requirement of everything else working.
Let's create a test program, student_test.c
which opens a file, checks its filesize, and writes out an equivalent number of Zs to an output file.
To create this custom test:
- Create a new test file,
student_test.c
, in the test_programs
folder:
#include <stdio.h>
#include "../io300.h"
int main(int argc, char *argv[]) {
if(argc != 3) {
fprintf(stderr, "usage: %s <in-file> <out-file>\n", argv[0]);
return 1;
}
struct io300_file *in = io300_open(argv[1], "student test infile");
if (in == NULL) {
return 1;
}
struct io300_file *out = io300_open(argv[2], "student test z-ified outfile");
if (out == NULL) {
io300_close(in);
return 1;
}
off_t const filesize = io300_filesize(in);
if (filesize == -1) {
fprintf(stderr, "error: could not compute filesize\n");
io300_close(in);
io300_close(out);
return 1;
}
io300_close(in);
for (int i = 0; i < filesize; i++) {
if(io300_writec(out, 'Z') == -1) {
fprintf(stderr, "error: write should not fail.\n");
io300_close(out);
return 1;
}
}
io300_close(out);
return 0;
}
- Add your test name to the Makefile
In order to create the binary for this test, add its name (for this example, student_test
) to TEST_PROGRAMS
in the project Makefile. This will make sure this new test gets built along with the existing autograder tests when make
is run.
- Test² – Test your test!
A bad test can be very misleading, leading you to think that your code is failing when in actuality it may be completely correct. Try running your test manually on sample files from test_files
, and triple check that everything seems to work as intended.
- Add your test to the autograder
This project has autograders for both correctness (test_scripts/correctness_test.py
) and performance (test_scripts/performance_test.py
). Let's add our test to the correctness autograder, as we're testing the results of our write function.
Our autograder runs each provided test, and asks two questions: did it pass (i.e. returned 0), and is the output actually correct?
Let's add a new Python function for our test:
def student_test(infile, outfile, outfile2):
assert shell_return(f"printf '%*s' $(cat {infile} | wc -c) | tr ' ' 'Z' > {outfile2}") == 0
return shell_return(f'./student_test {infile} {outfile}') == 0 \
and files_same(outfile, outfile2)
You can see the two parts of our test above. Did it pass? Well, shell_return(f'./student_test {infile} {outfile}') == 0
means an error wasn't raised. Is our output correct? If our manual and generated "Z-files" are the same, we can trust that it is: files_same(outfile, outfile2)
.
Finally, we can add our new test to our runtests
block at the bottom of our autograder file, which handles the execution of each test:
runtests({
'rot13': rot13,
'student_test': student_test
})
- Run the autograder
Huzzah: your test is complete! Running make check
should provide the results of your custom test, which hopefully passes :)
======= CORRECTNESS RESULTS =======
byte_cat: PASSED
[…]
rot13: PASSED
student_test: PASSED
This test was just one sample test you could write. All of our autograder tests are available in test_scripts/
, so feel free to reference them in creating your own. Further, the process for adding performance tests is very similar; reference test_scripts/performance_test.py
to see the specific syntax used for its tests!
Be careful when modifying tests or the autograder, as changing existing tests could lead to misleading autograder results.
Acknowledgements: The IO project was developed for CS 300.
« Back to the main CS 300 website
Project 3: Caching I/O
Design Discussion due by Friday, March 8th (sign up here)
Project due Friday, March 15th at 6:00 PM EST
Introduction
Caching is the act of putting a small/fast/expensive data store (often called a cache) in front of a large/slow/cheap one. The primary purpose of caching is to increase the performance of data retrieval by storing frequently or recently accessed data within the cache. When data is requested, the system first checks the cache; if the data is found there, it can be retrieved much more quickly than if it had to be fetched from the slower data store.
Indeed, caching is at the heart of many optimizations in computing. Examples include:
In this assignment, you will be speeding up a performance-critical aspect of systems programming: the reading and writing of data to and from a filesystem. To do this, you will use ✨ caching ✨. The file I/O cache you will implement in this project is similar to the CPU cache we already touched upon when we discussed alignment, and which we will discuss again in lecture, with some differences. Notably, the file I/O cache is defined in software, rather than provided by hardware (as CPU caches are). This allows the I/O cache to have variable size. Additionally, it your file I/O cache will have a single fixed-size slot (storing one contiguous segment of the file) rather than multiple slots.
Motivation
Due to the design of Hard Disk Drives (HDD, "disks"), interacting with data on disk involves waiting on several slow physical mechanisms. For example, magnetic hard disks require requests to wait for platters to spin and the metal arms to move to the right place where it can read the data off the disk. Even with modern SSDs, I/O operations on them take much longer than operations on memory. Moreover, I/O requires making system calls to the operating system, which are much slower than normal function calls (for reasons we will discuss soon).
If our programs were required to read and write their data solely from disk, they would be unacceptably slow. We saw an example of this in the form of the
diskio-slow
program in lecture.Fortunately, we can do better using caching! If we are willing to sacrifice a small amount of data integrity (i.e., in the event that your computer suddenly loses power, some data is lost), we can gain 100–1,000x in performance. To do this, the computer temporarily holds the data you are using inside of its main memory (DRAM) instead of working directly with the disk. In other words, the main memory (DRAM) acts as a cache for the disk.
Project Description
You will implement an Input/Output (I/O) library that supports operations such as reading data from files (
io300_readc
andio300_read
), writing data to files (io300_writec
,io300_write
), or moving to a different offset within the file (io300_seek
). Your I/O library uses a cache to speed up access to data and reduce the number of disk operations required.For example, when an application wants to read from a file, it can call your
io300_read
instead of the system callread
. The arguments toio300_read
are:f
is the file to read from andsz
is the number of bytes to read.buff
is the application buffer – a region of memory (provided by the caller) where the read bytes should go. In other words,io300_read
should write bytes from the data stored in a file (or the cache) intobuff
. That way, when theio300_read
call finishes, the caller can read the bytes returned frombuff
.The
io300_write
function has the same structure:In this case, the caller puts the bytes they want to write into the application buffer
buff
. They then callio300_write
onbuff
, which writes the firstsz
bytes ofbuff
to the filef
.Your goal when writing
io300_read
andio300_write
is to introduce an intermediate caching layer to minimize expensive trips to the file on disk.Here is a visualization of how the application buffer, the cache, and the file interact:
Your approach to implementing this project should be:
impl/student.c
to implement caching file I/O that conforms to the API specified inio300.h
. In addition, you must ensure thatmake check
reports that your implementation is correct, andmake perf
reports that your implementation is at most 5–10x slower thanstdio
(the standard library's caching I/O functions) in all cases (see grading rubric).What’s an API?
List<E>
interface in Java which states that anything claiming to be aList
can be asked toadd()
,remove()
,insert()
, …, and other "list-like" things. In this project, we have defined an API for anio300_file
struct
representing a file, providing methods for reading, writing, and seeking on the file/struct
that you will implement.Project Details
Installation
Ensure that your project repository has a
handout
remote. Type:If this reports an error, run:
Then run:
This will merge our Project 3 (
fileio
) stencil code with your repository.Once you have a local working copy of the repository that is up to date with our stencils, you are good to proceed. You'll be doing all your work for this project inside the
fileio
directory in the working copy of your projects repository.Stencil/Layout
We provide you with the following files:
io300.h
impl/
impl/student.c
is the only file you have to edit.test_scripts/
make check
andmake perf
to use them.test_programs/
test_files/
In this project, we provide you with a number of existing test programs that do basic file manipulation (see
test_programs/*.c
). These programs are written in such a way that all interaction with the filesystem is done through theio300_file
interface (a series of functions whose signatures are declared inio300.h
). This means that if two libraries separately implement the functions inio300.h
(potentially quite differently), the test programs will work with either implementation.We provided you with two implementations, and you will develop a third.
impl/naive.c
) reads from and writes directly to the disk without any caching. The initial project stencil is identical to this implementation.impl/stdio.c
) leverages the existing C Standard Library's "buffered" I/O, which does some clever caching.The speed difference between these two solutions, as measured by running the test programs with each implementation, is astounding! Try it out for yourself:
$ make perf_testdata # generates 1 MiB performance test file $ make -B IMPL=naive # compiles naive implementation $ time ./byte_cat /tmp/perf_testdata /tmp/testout real 0m18.515s user 0m6.819s sys 0m11.694s $ make -B IMPL=stdio $ time ./byte_cat /tmp/perf_testdata /tmp/testout real 0m0.140s user 0m0.112s sys 0m0.028s
The numbers will differ on your computer, but the general relationship (a ~100x performance difference) will hold. (Note that
real
is the actual time taken (this is called "wall clock time"), whileuser
refers to the part of that time spent in userspace, whilesys
is the time spent in the kernel.)In the project, your task is to fill in the third implementation (
impl/student.c
) and make it perform as close to thestdio
implementation as possible.Note: This project deliberately leaves you a lot of design freedom. You should feel free to come up with clever caching designs and benchmark them.
Both simple and involved schemes will work, but cleverer schemes will get closed to the performance of
stdio
, and may sometimes even beat it! We will award extra credit for impressive performance (see the grading rubric).You're welcome (and encouraged) to work with other students in groups to come up with your design. As usual, the code you hand in must be your own.
Getting Started
This project, unlike DMalloc, is in C. This means you cannot use C++ language features and data structures in your solution.
The core idea behind this project is the following: When a program asks the I/O library to write a byte to a file, instead of immediately going to the disk, put that byte in a cache, then write the entire cache to disk at a later point.
The first thing you should do is look at
io300.h
to understand the public API that each implementation offers. Next, you should runmake -B IMPL=stdio check
to run the tests on the implementation we provide (you can also runmake -B IMPL=naive check
to see how ridiculously slow it is).The first thing you'll want to do is to work out why the stencil code (and the
naive
implementation) is so slow. To do so, use a handy tool calledstrace
(for syscall trace), which prints the system calls (calls from your program into the OS) that happen when your program runs.System calls are expensive: they require changing the processor's privilege mode, lots of safety checks, and they invoke OS code. Moreover, the
read
orwrite
syscalls often go directly to disk.Start out by investigating how the naive implementation performs on a 755 byte test file:
The initial lines relate to
read()
system calls that happen as the OS starts up your executable (which is encoded in a format called "ELF"). You can ignore those lines. At some point, you'll see a lot of lines likeread(3, "V", 1) = 1
. A line like this means that the program (or, in this case, the I/O library used) invoked the system callread()
with arguments3
(a number that refers to file descriptor),"V"
, and1
, and that the return value was1
(as indicated by= 1
).Think about what this output means. Can you see a reason why this is inefficient?
Give me more hints!
Let's consider a thought experiment, this time with the write() system call.

Imagine that I am writing a program that will sequentially write 1000 bytes (all of value 'c') to a file, one byte at a time.

One way to do it would be to call write() 1000 times, once per byte.

Another option would be to create a local variable char buff[40] and put bytes into that. Once we have written 40 bytes to buff, we issue one single call to write(fd, buff, 40).

The critical thing to remember is that a one-byte write(fd, buff, 1) and a forty-byte write(fd, buff, 40) take nearly the same amount of time, because the cost of executing a system call is much larger than the difference between moving 1 byte and moving 40 bytes. So with the second solution, we reduce the number of syscalls by 40x.
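Here is what the two approaches look like side by side (an illustrative sketch; fd is assumed to be an already-open file descriptor):

#include <unistd.h>

/* Option 1: one syscall per byte -- 1000 calls to write(2). */
void write_one_at_a_time(int fd) {
    char c = 'c';
    for (int i = 0; i < 1000; i++)
        write(fd, &c, 1);
}

/* Option 2: batch bytes in a local buffer -- only 25 calls to write(2). */
void write_batched(int fd) {
    char buff[40];
    size_t used = 0;
    for (int i = 0; i < 1000; i++) {
        buff[used++] = 'c';
        if (used == sizeof buff) {  /* buffer full: flush all 40 bytes at once */
            write(fd, buff, used);
            used = 0;
        }
    }
    if (used > 0)                   /* flush any leftover bytes (none here) */
        write(fd, buff, used);
}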
In this project, you will be generalizing the second solution to deal with things like writing bytes backwards (decreasing indices) and writing chunks of bytes all at once.
Design Discussion
To succeed at this project, it is crucial that you think conceptually about your design before you write substantial amounts of code. One good way to check a design is to discuss it with someone else. Hence, we ask you to sign up for a design discussion with the course staff. During the design discussion, you will come up with a design together, and will receive feedback on your ideas.
Sign up for a design discussion slot (if you haven't yet signed up, do so here). Read through the entire handout before going to your design discussion, and think about some of the considerations you will discuss.
Consider the following: if your design discussion is in the second half of the first week of the project, we recommend that you work through the following material beforehand, so that you can get started on the project and check your design during the design discussion.
To help think through the design for your cache, consider the following example code (the file we are caching is testfiles/tiny.txt, and the cache size in the example is just 8 bytes). Think about how you would fill out the following template for every line in the sequence of operations:
You will end up with eight such pictures with notes on the state of the cache and the metadata. You can write these digitally (e.g., in a Google Doc) or on a piece of paper. Drawing pictures on paper is an excellent way to work on this part of the assignment!
Remember to think about how the OS will track the current read/write position in the file. We explain this in the handout below, we'll talk about it in lecture, and the design discussion will also cover this. Then also think about how you would keep track of the current read/write position in your cache, and how that position might differ from the file position managed by the OS.
Things to Think About
Here are some questions to ask yourself before/while implementing:
- What should happen when I read() and fill my cache?
- What should happen if flush is called, but nothing in the cache has been changed?
- What is the value of a char whose bits are all 1s (0xFF, 0b11111111)? How is this different than the integer -1? How does this mess with return values from things like read() and write()? What is the deal with unsigned vs signed char? (Some answers to these can be found in example/io_return_example.c, and in the sketch below.)
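To see why the signedness question matters, consider this sketch (illustrative; it assumes a platform where plain char is signed, as in the course environment):

#include <stdio.h>

int main(void) {
    char c = 0xFF;                   /* with a signed char, this stores -1 */
    int as_int = c;                  /* sign-extends: as_int == -1 */
    int as_byte = (unsigned char)c;  /* zero-extends: as_byte == 255 */

    /* If a readc-style function returned a plain (signed) char, a legitimate
       0xFF byte would be indistinguishable from a -1 error/EOF return value.
       This is why byte-returning functions return an int in 0..255, or -1. */
    printf("%d %d\n", as_int, as_byte);  /* prints: -1 255 */
    return 0;
}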
To help you out as you're designing your cache, we'll share what the contents of your file, your cache, and the return value are after two select lines of code.
After Line 2
After Line 6
Implementing Caching I/O
Now it's time to get coding! Be sure to read the rest of the handout first, however.
Task: Complete the functions in impl/student.c marked with TODO comments such that you implement cached file I/O. It will make sense to proceed step by step, and we provide some guidance on getting started below.
Your final implementation should:
- implement the public API defined in io300.h,
- be correct, such that make check reports that your implementation is correct, and
- be fast, meeting the performance targets measured by make perf.

We highly recommend that you thoroughly read and engage with the following sections when developing your caching I/O strategy.
Note: you cannot use any C standard library I/O functions while implementing your library (this would defeat the purpose of the assignment!). These include functions like fread, fwrite, fgetc, fputc, and fseek. Remember to create only one cache. You will not need to call malloc anywhere, except in the place that we already provide for you.

Information about files (very helpful!)
Here are some statements about files that are true for most Unices (UNIX-derived operating systems, such as Linux or macOS), and in particular for your course development environment.
A file is an ordered sequence of bytes: not text, not ASCII characters, not UTF-8 characters, just bytes. The way we interpret the bytes present in a file leads to a file's colloquial "type" (like a text file, or a CSV, or a compiled binary). To this end, file extensions (.txt, .csv, .html, .c, .exe) are largely irrelevant; they merely serve as hints as to how a file's bytes should be interpreted. In fact, there is nothing special about the . character in a filename. So rather than thinking about a file called malte.jpeg as an object called malte of type .jpeg, it should be considered a sequence of bytes whose name is malte.jpeg (additionally, you may take the .jpeg suffix to informally mean "this probably contains data in the JPEG format and is suitable to be opened by photo viewers").

It's just bytes!
The way we interact with files is through file descriptors. A file descriptor is an integer that the operating system uses to identify a file. Any time you want to do something to that file, you pass the operating system the file descriptor as a reference. The way we get file descriptors is with the open syscall: for example, int fd = open("myfile.txt", O_RDWR) asks the OS to open myfile.txt and returns a file descriptor. From now on, if we want to do something with myfile.txt, we will pass the operating system fd so that it knows which file to manipulate on our behalf.

Once a file has been opened, we can start manipulating it. When a file is open, the operating system keeps track of an index into that file that represents the "read/write head" of that file. This is like using your finger to scan across the lines while you read. Wherever the "head" points is the next byte to be read, or, if we want to write a byte, the location where the next byte will be written.

Once we have a file descriptor (which identifies a file), there are three basic operations we can use to read and modify the bytes within that file:
- read(fd, buffer, size) will read size bytes from the file identified by fd into the location in memory pointed to by buffer. read returns the number of bytes that were successfully read from the file, and then increments the file's read/write head by that number of bytes.
- write(fd, buffer, size) will write size bytes starting at memory location buffer into the file identified by fd. write returns the number of bytes that were successfully written to the file, and then increments the file's read/write head by that number of bytes.
- lseek(fd, index, SEEK_SET) will change the value of the read/write head of the file identified by fd to index. This allows us to "skip" around in a file and read/write from arbitrary locations (we don't always want to read/write sequentially from the start of a file). A short sketch using all three calls follows.
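Here is a small sketch of all three calls in action (hypothetical file name; error handling omitted for brevity):

#include <fcntl.h>    /* open, O_RDWR */
#include <unistd.h>   /* read, write, lseek, close */

int main(void) {
    int fd = open("myfile.txt", O_RDWR);    /* read/write head starts at byte 0 */

    char buf[8];
    ssize_t n = read(fd, buf, sizeof buf);  /* head advances by n (up to 8) bytes */

    lseek(fd, 0, SEEK_SET);                 /* move the head back to the start */
    write(fd, "AB", 2);                     /* overwrite bytes 0 and 1; head is now 2 */

    close(fd);
    return n < 0;                           /* nonzero exit if the read failed */
}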
Note: lseek can be called on a position beyond the end of the file (EOF), though this alone will not change the file's size. If data is then written at that position, the file's size grows to that point, and the gap between the old EOF and the newly written bytes is filled with null bytes. You'll need to consider this edge case in your caching design!

Making Progress
One of your early goals should be to get the byte_cat program working for a small text file:

$ touch /tmp/out.txt   # creates empty file
$ make clean all
$ ./byte_cat test_files/tiny.txt /tmp/out.txt
$ cat /tmp/out.txt
this is a test
The byte_cat program performs no seeks and is the simplest test case you can create. Here is a list of the functions that byte_cat uses: open, close, filesize, readc, and writec. We provide all of filesize, and open/close are simple, so you just have to get your reading and writing logic down!
The easiest way to spend a lot of time on this project is to just write code, or to randomly edit your code, without thinking and debugging carefully. Make sure you use the below techniques to make your debugging productive!
Note that the stencil implementation passes all correctness tests, but it is very slow! Additionally, you might find that the performance tests pass locally but not on the grading server. Tests must pass on the grading server to receive credit.
As you work on the project, you will likely break the correctness tests in your quest to improve performance.
If you fail a correctness test, the first thing to look into is whether it failed because of a crash in your code (which you can debug using familiar tools such as GDB), or whether it ran but produced incorrect output. When you see incorrect output file contents, it makes sense to work out how they're incorrect by looking at the actual bytes in the input and output files! Check out the Debugging section below for some helpful tools, such as xxd and diff.
Make sure your program can handle non-ASCII characters! Not all files consist of text; for example, images or videos can contain arbitrary bytes, including the NUL byte, which has special meaning in text but not in other data.

Additionally, it can help to look at the output from strace, which lists the system calls that your caching I/O library makes, to see if anything unexpected happens during the test.

If your performance is poor, strace should be your first port of call. Poor performance in this project is almost always explained by your implementation making too many system calls. Running tests under strace will help you understand exactly what sequence of system calls happens, and how many bytes they move into and out of the cache.

Finally, you will likely want to have debug output in your library that you can disable easily when running performance tests, as printf-type functions slow down your program a lot. We provide the helper function dbg() in the stencil for this purpose: you can change the value of the DEBUG_PRINT constant to turn the debug printing code on or off in a single keystroke. dbg() works just like printf(), but if DEBUG_PRINT is set to 0, the compiler will throw away the code in the function and turn it into a no-op.

Consider also writing your own tests if you want to test specific behavior. For more information, look at Testing and Appendix II.
Finishing Up
Once you have a working implementation, test it with make check (correctness) and make perf (performance). You probably won't match stdio on all performance benchmarks yet, but you will meet the performance bar for this project if you achieve the following:
- within 10x of stdio on the byte_cat benchmarks (byte_cat and reverse_byte_cat), and
- within 5x of stdio on the block_cat tests (block_cat, reverse_block_cat, and random_block_cat).

(These factors match the grading rubric below.) If you get closer to stdio (or even beat it on some tests), we will award extra credit.

Testing
Overview
make check tests your implementation for correctness. It does this by running all of the test programs with a variety of inputs on random files. The tests are split into two categories: (1) basic functionality tests, which test the core functionality of your implemented functions and their interactions with one another, and (2) end-to-end tests, which test your implementation with standard file operations (such as copying between files).

Basic functionality tests

The basic functionality tests use the io300_test program. The program randomly generates a single file, which it then opens and on which it performs and checks the specified functions. The io300_test program verifies that each io300_readc or io300_read call reads the correct bytes from the file, and verifies at the end of the test that all writes are reflected in the final file.

The name of each test specifies the functions tested. For example, readc/writec will perform a combination of io300_readc and io300_writec calls on the same generated file. You can also parse this information from the command that runs the test (io300_test with some arguments), which specifies the test's configuration. (make check prints this command with each test when running.) For instance, for seek_beyond_eof, the testing script runs an io300_test command specifying that the program will randomly generate a file 0 bytes long (-s 0), which is allowed to grow to a size of 4096 bytes (-m 4096), and on which it will then perform 1000 (-n 1000) calls to randomly either io300_readc or io300_writec, each preceded by a call to io300_seek to some location between 0 and the maximum file size. Running just ./io300_test prints a description of the values of these arguments as well.

Note: If you fail basic tests (e.g., readc/writec) while passing end-to-end tests (e.g., byte_cat), the reason is almost certainly that you are not correctly handling operations when both reads and writes happen on the same file. Make sure that both read and write functions leave the cache metadata in a state that the other function can work with! (In other words, invariants need to be preserved, and, if appropriate, restored, by these functions.)
End-to-end tests
The end-to-end tests use a combination of your io300_* functions within larger programs. Most of these programs operate on separate input and output files, so there will be two caches (one for the input file, one for the output file).

Performance tests
make perf tests your implementation for speed and compares it to stdio on the end-to-end tests.

You will find that the correctness tests already pass on the stencil code! This is because the stencil code contains a copy of the naive implementation, which always makes system calls for every operation (i.e., there is no caching).

The stencil code will not pass the performance tests, however. (You also cannot simply hand in the stencil code and receive the 60% credit for correctness ;-))

The performance tests run very slowly on the unmodified stencil code (e.g., on an M2 Apple MacBook Air, many of the tests take over a minute). You will need to make them run faster by implementing caching!
⚠️ Important note on performance ⚠️
Performance can vary across computers. The Docker container adds additional overhead when you run locally, and there are differences in architecture (Apple Silicon vs. Intel machines) and software (Windows vs. macOS vs. Linux) that affect the speed at which the performance tests run.
We have made efforts to make the tests as consistent as possible, but please always check your performance on the grading server once the performance tests pass locally. We will ultimately grade you based on the performance on the grading server!
Working with tests
The end-to-end correctness and performance tests will run your implementation on each specified test program using a randomly-generated file of a certain size. For your own personal debugging use, we've provided you with the tools to easily generate these testing files.
Generating test data
If you wish to generate random testing data, use make check_testdata, which generates 80 KiB of data to test for correctness, and make perf_testdata, which generates 1 MiB of data to test for speed. These files will be located at /tmp/check_testdata and /tmp/perf_testdata, respectively.

Running a single test
You can run a single test for performance as follows:
$ make -B IMPL=student
$ make perf_testdata   # generates 1 MiB performance test file
$ time ./byte_cat /tmp/perf_testdata /tmp/testout
...
The following is an example of running a simple test for correctness:
$ make check_testdata                           # generates 80 KiB correctness test file
$ ./byte_cat /tmp/check_testdata /tmp/testout   # run test
$ diff /tmp/check_testdata /tmp/testout         # no output expected, as the files should be the same!
Advanced testing
For a more complex test, you may need to generate intermediate output files. See correctness_tests.py for examples!

$ ./reverse_byte_cat /tmp/check_testdata /tmp/testout   # reverse file
$ ./reverse_byte_cat /tmp/testout /tmp/testout2         # reverse file again
$ diff /tmp/check_testdata /tmp/testout2                # no output expected, as the files should be the same!
You may also want to write additional tests to exercise specific features of your code individually, particularly if you plan to implement extra credit features. For more information, see Appendix II.
Debugging
You may find the following tools (in addition to GDB and sanitizers) helpful in debugging this project.
Helpful Commands
- man displays the authoritative documentation for all functions.
- dd if=/dev/urandom ... is used in our test scripts to transfer random binary data into a file.
- wc -c <file> displays the number of bytes in a file.
- du -h <file> also displays the size of a file and is better for large files (it gives human-readable byte numbers).
- head -c 40 | xxd or tail -c 40 | xxd will give you a peek at the beginning or end of a file. This is useful for inspecting large files, where xxd alone would flood your screen.

Hexdump
xxd <file> allows you to view the contents of binary files. This will be helpful when you have a buggy implementation and you want to see what is going wrong, by printing out what's in your result file. The code below demonstrates an example.
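(Illustrative; the output format may vary slightly between xxd versions.)

$ echo "this is a test" > /tmp/t.txt
$ xxd /tmp/t.txt
00000000: 7468 6973 2069 7320 6120 7465 7374 0a         this is a test.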
Diff

diff <file1> <file2> will tell you if the input files differ. This may again be helpful when you have a buggy implementation and you want to figure out where in the output file you're differing from the expected content.

strace
strace, as mentioned above, is a tool that provides diagnostic information about the system calls a program is making. This may be especially helpful when you are trying to improve the performance of your implementation.

Expand for a more complete description of strace!
strace (on Linux) allows you to view the system calls that a program makes. Here is an example of running strace on a plain hello-world C program (fed in through stdin). Pay attention to the final two lines of the output, namely the calls to write() and exit(). Everything else you see is just the shell starting the program and can be ignored.
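(The tail of that output might look roughly like this; exact arguments and counts vary by system.)

write(1, "Hello, world!\n", 14)         = 14
exit_group(0)                           = ?
+++ exited with 0 +++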
We can use strace to see how frequently our programs are making calls to read() and write(), as an indirect way to measure performance. The following is a demonstration of the number of these syscalls that take place when running byte_cat on a 755-byte file. Notice how the naive implementation makes around 750 reads and 750 writes (this makes sense because the naive implementation calls read() and write() once per character), while the stdio implementation makes only a handful of each (most of its remaining read lines come from program startup, not from copying the file).

$ wc -c test_files/words.rot13.txt
755 test_files/words.rot13.txt
$ make -B IMPL=naive byte_cat
gcc -ggdb3 -Wall -Wextra -Wshadow -std=gnu11 -fsanitize=address -fsanitize=undefined -fsanitize=leak test_programs/byte_cat.c impl/naive.c -o byte_cat
$ strace ./byte_cat test_files/words.rot13.txt out.txt 2>&1 | grep read | wc -l
804
$ strace ./byte_cat test_files/words.rot13.txt out.txt 2>&1 | grep write | wc -l
758
$ make -B IMPL=stdio byte_cat
gcc -ggdb3 -Wall -Wextra -Wshadow -std=gnu11 -fsanitize=address -fsanitize=undefined -fsanitize=leak test_programs/byte_cat.c impl/stdio.c -o byte_cat
$ strace ./byte_cat test_files/words.rot13.txt out.txt 2>&1 | grep read | wc -l
50
$ strace ./byte_cat test_files/words.rot13.txt out.txt 2>&1 | grep write | wc -l
4
Note from Your TAs: Although using strace may seem intimidating, it is incredibly useful in improving performance! In particular, it can help you visualize the sequence of system calls.

dbg()
Check out the dbg() function in impl/student.c. Use it to debug while you are working; you can then silence its output with one keystroke when you hand in. It acts just like printf(), but it also logs your file's metadata so you can see what is happening as your program executes. When you add fields to your file structure, be sure to include them in the format string after the TODO in dbg(). Here is an example of using it.
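(Hypothetical; it assumes your struct gained a head field, and that dbg() takes the file struct first, as described above.)

dbg(f, "readc: cache empty, refilling from file offset %zu\n", f->head);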
GDB

As mentioned in previous assignments and in the debugging tips section, GDB is a very useful tool for stepping through your code. To set up GDB for a specific test, you can do the following:
1. gdb [test executable] (e.g., gdb stride_cat)
2. b [location] (e.g., b io300_read)
3. r [arguments] (e.g., r test_files/tiny.txt /tmp/out.txt)

Note that some tests take additional arguments (block_cat takes in an extra parameter of block size). You can see what parameters each test takes by running the test executable with no parameters (e.g., ./block_cat).

You may find it helpful to print out the contents of the cache and verify that it matches the steps in your design. See the GDB guide for more useful commands.
Extra Credit
To optimize your implementation and achieve more impressive performance, you can consider some of the following extensions! If you are a graduate student taking CSCI 1310, please implement and document at least one performance optimization.
Adaptive Caching
Easy. Detect the direction of access within a file (are the reads occurring forward or in reverse?) and adapt your caching strategy to pre-fetch accordingly. Measure how much of an impact adaptive caching has on performance over your basic implementation by creating additional test programs in test_scripts/correctness_tests.py and test_scripts/performance_tests.py. Document how you implemented adaptive caching and its performance impact in your README file.

Diabolical byte_cat
Moderately easy. Check out the diabolical_byte_cat test program. This test program works similarly to byte_cat, but tries to throw off your cache by doing some gratuitous seeks!

Edit test_scripts/correctness_test.py and uncomment diabolical_byte_cat as part of the end-to-end tests (at the bottom of the file), and do the same in test_scripts/performance_test.py. Now optimize your code to handle the situations that this test program creates. Make sure that you still pass the correctness tests and meet the performance bar for the diabolical_byte_cat performance test.

Detecting Strides
Moderately difficult. Have your cache detect "strides": access patterns that jump by fixed amounts within a file (for example, reading byte 1, then byte 1001, then byte 2001), and pre-fetch appropriately. Measure how much of an impact detecting strides has on performance over your basic implementation by creating additional test programs in test_scripts/correctness_tests.py and test_scripts/performance_tests.py. Document how you implemented stride detection and its performance impact in your README file.

Memory-Mapped I/O
Difficult. Research "memory-mapped I/O" and how it works, then use it in your implementation where possible. Measure how much of an impact memory-mapped I/O has on performance over your basic implementation by creating additional test programs in test_scripts/correctness_tests.py and test_scripts/performance_tests.py. Document how you implemented memory-mapped I/O and its performance impact in your README file.

Handing In & Grading
Handin instructions
As before, you will hand in your code using Git. In the fileio/ subdirectory of your project repository, you MUST fill in the text file called README.md.

Remind me again what the README.md should contain?

The README.md file will include the following:
While the project is about speed, correctness is more important than performance!
We will not award credit for performance tests that pass if you fail the related correctness tests, but we will be generous with partial credit if you pass correctness tests and fail to meet performance goals.
To understand why, consider an implementation that always throws all data away: this would be infinitely fast and meet all the performance goals, but of course it isn't a useful caching I/O library!
Your byte_cat runs must be within 10x of the stdio runtime, and your other runs (block_cat, etc.) must be within 5x of the stdio runtime. Each benchmark is worth 6.5 points, and you get 1 point for passing them all. We will award partial credit if you miss the target but improve over the naive implementation. You will not receive credit for good performance on benchmarks that fail the correctness tests; it is more important to be correct than to be fast. (As noted above, we award extra credit for getting close to, or even beating, stdio.)

Graduate students taking CSCI 1310 should implement and measure one performance optimization, and describe it in their README. Make sure to cover, in a few sentences, what the optimization is, why you expected it to improve performance, and whether it did or did not improve performance in your experiments.
Now head to the grading server, make sure that you have the "Caching I/O" page configured correctly with your project repository, and check that your tests pass on the grading server as expected.
Note: If the commit you want graded is not your most recent one, you should flag the one you want graded and the grading commit will be updated after the autograder finishes.
Congratulations, you've completed the third CS 300 project!
Appendix I: Implementation Swapping
This section explains how our test programs can use different implementations that match the io300.h API.

Consider the following to be our test program:
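(A hypothetical stand-in for the listing: a tiny program that only knows the public API.)

#include <stdio.h>

int getRandomNumber(void);   /* provided by whichever implementation we link in */

int main(void) {
    printf("your number is: %d\n", getRandomNumber());
    return 0;
}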
Now we want an implementation of the getRandomNumber function that we can link with the test program to produce a fully functional number printer; two variants are sketched below.

When we compile the main test program, we have a choice: since both implementations conform to the same public API (namely, they provide one function getRandomNumber :: void -> int), either can be swapped in, and the main program won't know the difference.

This is the principle behind the different implementations for this project, but instead of the public API being a single function, it is a family of I/O-related functions defined in io300.h.
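For concreteness, two interchangeable implementations might look like this (hypothetical sketches; each would live in its own .c file and be compiled separately):

/* random_impl.c: returns a pseudorandom number */
#include <stdlib.h>
#include <time.h>

int getRandomNumber(void) {
    srand((unsigned)time(NULL));
    return rand();
}

/* fixed_impl.c: same API, very different behavior */
int getRandomNumber(void) {
    return 4;
}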
At a high level, to test your implementation, we will be doing something like this:
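(An illustrative transcript, using the same make targets and timing commands as earlier in this handout:)

$ make -B IMPL=naive && time ./byte_cat /tmp/perf_testdata /tmp/testout
$ make -B IMPL=student && time ./byte_cat /tmp/perf_testdata /tmp/testout
$ make -B IMPL=stdio && time ./byte_cat /tmp/perf_testdata /tmp/testout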
to compare the speed of your implementation against the provided ones.
Appendix II: Writing Custom Tests
While you are certainly not required to design your own tests—passing the autograder tests we provide is more than enough—this assignment has a lot of moving parts! This appendix will help you write custom tests, which you may want to do for testing specific features (e.g. writing) without the requirement of everything else working.
Let's create a test program, student_test.c, which opens a file, checks its filesize, and writes out an equivalent number of Zs to an output file. To create this custom test:
1. Create student_test.c in the test_programs folder (a minimal sketch follows this list).
2. To create the binary for this test, add its name (for this example, student_test) to TEST_PROGRAMS in the project Makefile. This will make sure this new test gets built along with the existing autograder tests when make is run.
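Here is a minimal sketch of such a program (it assumes the io300_open, io300_filesize, io300_writec, and io300_close functions; check io300.h for the exact signatures, e.g. whether io300_open takes a description argument):

/* test_programs/student_test.c: write one 'Z' per input byte to the output file. */
#include <stdio.h>
#include <sys/types.h>
#include "../io300.h"

int main(int argc, char *argv[]) {
    if (argc != 3) {
        fprintf(stderr, "usage: %s <infile> <outfile>\n", argv[0]);
        return 1;
    }
    struct io300_file *in = io300_open(argv[1], "input");   /* second arg assumed: debug description */
    struct io300_file *out = io300_open(argv[2], "output");
    if (in == NULL || out == NULL)
        return 1;

    off_t size = io300_filesize(in);
    for (off_t i = 0; i < size; i++)
        io300_writec(out, 'Z');

    io300_close(in);
    io300_close(out);
    return 0;
}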
A bad test can be very misleading, leading you to think that your code is failing when in actuality it may be completely correct. Try running your test manually on sample files from test_files, and triple-check that everything seems to work as intended.

This project has autograders for both correctness (test_scripts/correctness_test.py) and performance (test_scripts/performance_test.py). Let's add our test to the correctness autograder, as we're testing the results of our write function.

Our autograder runs each provided test and asks two questions: did it pass (i.e., did it return 0)? And is the output actually correct?
Let's add a new Python function for our test:
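(A sketch, assuming the shell_return and files_same helpers that the autograder's existing tests use; outfile2 is a scratch path for the expected output.)

def test_student(infile, outfile, outfile2):
    # Run the test program; a zero exit status means it didn't error out.
    passed = shell_return(f'./student_test {infile} {outfile}') == 0

    # Independently construct the expected output: one 'Z' per input byte.
    with open(infile, 'rb') as f, open(outfile2, 'wb') as g:
        g.write(b'Z' * len(f.read()))

    # The test passes if the program succeeded and the outputs match.
    return passed and files_same(outfile, outfile2)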
You can see the two parts of our test above. Did it pass? Well, shell_return(f'./student_test {infile} {outfile}') == 0 means an error wasn't raised. Is our output correct? If our manual and generated "Z-files" are the same, we can trust that it is: files_same(outfile, outfile2).

Finally, we can add our new test to the runtests block at the bottom of our autograder file, which handles the execution of each test:
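(A sketch; mirror the exact syntax of the existing entries in test_scripts/correctness_test.py.)

runtests(
    # ... existing tests ...
    ('student_test', test_student),   # hypothetical registration of our new test
)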
Huzzah: your test is complete! Running make check should provide the results of your custom test, which hopefully passes :)

This test was just one sample test you could write. All of our autograder tests are available in test_scripts/, so feel free to reference them in creating your own. Further, the process for adding performance tests is very similar; reference test_scripts/performance_test.py to see the specific syntax used for its tests!

Be careful when modifying tests or the autograder, as changing existing tests could lead to misleading autograder results.
Acknowledgements: The IO project was developed for CS 300.