<style>
details {
margin-top: 0.5em;
margin-bottom: 0.5em;
}
summary {
/* font-weight: bolder; */
}
summary:hover {
text-decoration: underline;
}
blockquote {
font-size: 16px;
}
.todo {
color: #ff00ff;
border: 2px dashed #ff00ff;
padding: 0em 1em;
border-radius: 5px;
margin-top: 1em;
margin-bottom: 1em;
/* display: none; */ /* UNCOMMENT TO HIDE TODOs */
}
</style>
**[« Back to the main CS 300 website](https://csci0300.github.io/)**
# Lab 5: Processes
**Due:** **Thursday, April 17th at 8:00 PM EDT**
---
# Assignment
:::warning
**Warning:** This lab--particularly Part 2, which involves pipes--has historically been one of the more challenging labs to debug.
We recommend starting on Part 2 shortly after we cover the relevant content on pipes in lecture so that you have time to debug issues that come up.
:::
## Assignment Installation
First, open a terminal in your container environment and `cd` into your `labs` folder, then ensure that your repository has a `handout` remote. To do this, type:
```bash
$ git remote show handout
```
If this reports an error, run:
```bash
$ git remote add handout https://github.com/csci0300/cs300-s25-labs.git
```
Then run:
```bash
$ git pull
$ git pull handout main
```
This will merge our Lab 5 stencil code with your previous work.
:::warning
**You may get a warning about "divergent branches."**
This means you need to set your Git configuration to merge our code by default. To do so, run (within the same directory in your labs repository workspace):
```console
$ git config pull.rebase false
$ git pull handout main
```
:::
If you have any "conflicts" from Lab 4 (although this is unlikely), resolve them before continuing further. Run `git push` to save your work back to your personal repository.
## C++ Tips
In both parts of this assignment, you will be using some C++ that might differ slightly from what has been previously used in C.
**Arrays:** To allocate memory for C++ arrays on the heap, you should use the `new` keyword. To free that memory, you would use the `delete` keyword. This is shown in the next section, which describes the `new` and `delete` keywords.
**New/Delete:** C++ lets you allocate heap memory via the `new` operator, which takes the name of a class (or struct) to instantiate an object in heap memory. For example, to make a heap-allocated array, you would write:
```cpp
dataType* heapArray = new dataType[size];
delete[] heapArray;
```
The `new` keyword is essentially the C++ equivalent of `malloc`ing memory and instantiating an object.
When you no longer need memory created with `new`, you use `delete` to destroy it. In this specific case, `delete[]` is used to delete dynamically allocated arrays.
**Vectors:** The `std::vector` data type from the standard C++ library stores elements of similar data types in an array-like format. However, unlike the array, the `vector` data type can grow dynamically, similar to an `ArrayList` in Java. You can declare an empty vector variable and add elements as shown:
```cpp
std::vector<int> heap_vec;
heap_vec.push_back(1);
heap_vec.push_back(2);
heap_vec.push_back(3);
```
## Part I: Simple Shell
In the first part of this lab, you will write the process launching code for your own shell, called `300sh`. `300sh` is similar to the shell you type your commands into (a program called `bash`) while running your Docker container, but it's a lot simpler.
For this part, you will work in the `shell` directory of the lab 5 code. When running commands for this part, be sure to `cd` into the `shell` directory first.
### Introduction: fork and execv
Your shell needs to be able to start new processes via system calls into the OS kernel. For this purpose, you will use the `fork` and `execv` system calls discussed in lectures.
**What's the deal with `execv`?** Thus far, you have primarily written programs that execute a series of instructions, often organized into functions. However, sometimes you'd like to write programs that have the ability to execute other programs. This can be done on Linux with the `exec` family of system calls. When one program executes another using `execv`, the current process image is completely replaced with that of the new program. The process image comprises the virtual address space of the current process, including the stack, the heap, the global variables (`.data`), and the code (`.text`). All of these are replaced by the corresponding sections of the new program. If `execv` is successful, control never returns to the original program, as that process has been completely overwritten!
<details><summary>Here's an example usage of <strong><code>execv</code></strong>.</summary>
Try running this program in your Docker container with the file name `example.cpp`.
```cpp=
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
    char program_name[10] = "/bin/echo";
    const char* arg1 = "hello";
    const char* arg2 = "world";
    const char* newargv[] = { program_name, arg1, arg2, NULL };
    // this will replace the process image with the image of
    // executable `/bin/echo` and arguments newargv
    execv("/bin/echo", (char* const*)newargv);
    // should only get past here if execv() returns an error
    perror("execv failed");
    exit(EXIT_FAILURE);
}
```
Running this program will produce this output:
```shell
$ gcc -o example example.cpp
$ ./example
hello world
```
Try changing the first argument of `execv` on line 12 to an invalid program name (ex: `/bin/hello`) and see what happens.
</details>
<br />
**What's the deal with `fork`?** What if you want to execute a new program while still being able to return to the original, rather than overwriting it with the new program? This is where the ability to create multiple processes becomes useful. The `fork` system call will create a new process (the "child process") whose address space is a copy of its parent process’s address space. However, while they map the same virtual addresses, these two processes run in separate address spaces, so changes to one do not affect the other!
<details><summary>Here's an example usage of <strong><code>fork</code></strong>.</summary>
Try running this program in your Docker container or `repl.it` with the file name `example.cpp`.
```cpp=
#include <stdio.h>
#include <unistd.h>

int main() {
    printf("I am in the parent process.\n");
    int x = 5;
    pid_t child;
    // fork returns 0 to the child process
    // and the child's process id (pid_t) to the parent process
    if ((child = fork()) == 0) {
        // only the child process can reach within this if statement
        x = x + 1;
        printf("I am in the child process: x is %d.\n", x);
    }
    printf("I am outside the if statement.\n");
    printf("Child is %d and x is %d.\n", child, x);
}
```
Running this program will produce an output similar to this:
```shell
$ gcc -o example example.cpp
$ ./example
I am in the parent process.
I am outside the if statement.
Child is 1753 and x is 5.
I am in the child process: x is 6.
I am outside the if statement.
Child is 0 and x is 6.
```
==Note:== Try running the program in your Docker container. (In online environments like repl.it, you may not see all the output.)
Since the child process is a copy of the parent's address space, it will perform the same instructions as its parent starting on line 10, and have a copy of its parent's stack, heap, and data (e.g., it has access to a copy of variable `x`). `fork` will return a different value depending on whether it's in the child or parent process -- allowing us to execute different instructions in the child and parent process (e.g., the parent process can't execute instructions within the `if` statement, but the child can). Currently, the child can also execute instructions after the `if` statement. Let's change that!
**Optional Task 1:** Immediately after line 13, add the line `exit(0);`, indicating to exit from the child process with return code 0. Also, include `#include <stdlib.h>` as a header.
**Optional Task 2:** Immediately after line 14, declare a local integer called `status` and then add the line `waitpid(child, &status, 0);`, telling the parent to wait on the child to terminate before it continues executing instructions. Also, include `#include <sys/wait.h>` as a header.
</details>
### Assignment Specifications
For this part of the lab, you will be completing the provided implementation of a simple shell by adding functionality for executing programs. You are provided with the file `sh.cpp` and its corresponding makefile. Currently, `sh.cpp` contains a read-evaluate-print loop (REPL) which reads user input, and parses it into individual tokens. If you run `./300sh` after `make`, you'll see that the shell currently doesn't do anything, except for the builtin commands `exit` and `cd`.
```shell
$ ./300sh
300sh> /bin/ls
300sh> yo
300sh> hello
300sh> /bin/ls
300sh> cd /bin
Current directory is now /bin
300sh> exit
```
Your task is to extend this functionality to be able to execute any arbitrary program (or fail if the program does not exist). This should all be done within the `do_fork` function in the stencil.
When a process (in this case, the shell) executes another program, the current process is completely replaced by the program it executes. For your shell to continue running after launching another program, it must create a new child process in which the new program can run, so as not to replace the shell's own execution. If you create a new process (using `fork`) in which to execute a different program (using `execv`), the original program is preserved in the parent process and its execution can continue rather than be replaced.
::::success
**Task:** Implement the `do_fork(char *args[])` function in `sh.cpp`. This function should:
1. Fork into a new child process.
2. The child process should execute the desired command (the first element in `args`), with the desired arguments (the following elements in `args`).
3. The parent process should wait on the child process to terminate before executing the next command.
The `args` argument to `do_fork` is structured such that:
* the first element is the full path to the command to be executed
* the following elements are any arguments to that command
* the last element is `NULL` (or `nullptr`), meaning that `args` is null-terminated
For example, if `args` were the array: `{"/bin/echo", "hello", NULL}`, `do_fork` would create a child process to execute the program `/bin/echo` with argument `hello`.
As you work on your implementation, here are some important things to keep in mind:
1. The way arguments work for `execv` is a bit quirky. Read the following for more info (and see the examples [here](#Introduction-fork-and-execv)):
<details><summary> <b>Background on how the arguments of <code>execv</code> work</b></summary>
The `execv` system call takes the following arguments:
1. The full path to the desired command
2. An array of strings in which the first element is the program name (this could just be the full path) and the following elements are arguments to the command.
Where can you find these components in the arguments passed to `do_fork`?
</details>
2. Check out the example usages of `fork` and `execv` in [this section](#Introduction-fork-and-execv) for more info
3. Make sure to handle the case where execv fails!
4. When testing, you will need to enter commands using the full path to each program (e.g., `/bin/ls` instead of `ls`); see the [Running and Testing](#Running-and-Testing) section below.
::::
When implementing this part of the lab you may find the [fork](#Fork), [execv](#Execv), and [wait or waitpid](#Wait-and-Waitpid) system calls useful. Please read the [Relevant System Calls](#Relevant-System-Calls) section for more details.
### Running and Testing
To build and run your code for part 1, do the following:
- If you have not done so already, `cd` into the `shell` directory
- Run `make`
- Start your shell with `./300sh`
:::warning
**What to expect when testing (and important limitations!)**: When your shell is working, it should have the same basic functionality as a regular shell. However, **our shell is missing an important feature, which you need to know about for testing**: normally, when you type a command like `ls`, your shell searches some directories on your system to find the program called `ls`. Our shell does not have this feature, so you need to type the full path to the program instead.
**For example, instead of typing `ls`, you would type `/bin/ls`**, which is the path to the `ls` program. In most cases, you can guess the full path to a program by adding `/bin` or `/usr/bin` to the front of the path (e.g., `/bin/echo`, `/usr/bin/make`, etc.)
<br />
If you do not know the full pathname of a program you want to execute, you can use the command `which` in a standard terminal. For example:
```
$ which cat
/bin/cat
$ ./300sh
300sh> /bin/ls
300sh Makefile builtin.h builtin.o sh.cpp sh.o
300sh> /bin/cat builtin.h
... prints the contents of builtin.h ...
```
:::
To get a sense of whether or not your shell is working properly, you can compare the output of your shell to a standard bash terminal for some simple commands such as the following:
* `cat <file>` (run as `/bin/cat <file>`)
* `echo <args>` (run as `/bin/echo <args>`)
* `ls` (run as `/bin/ls`)
* `/usr/bin/apt-get moo`
Note that `300sh` supports passing arguments to programs, but does not support output redirection (e.g., `/bin/ls > /tmp/outfile`).
:::success
**This is the end of part I!** Now, head to the grading server and run the checkoff for Lab 5.
You should see the "shell" test pass. Don't worry about others failing or partial credit; if you pass the "shell" test, you've done what you need to do for Part I of the lab.
:::
## Part II: Reverse Index (using multiprocessing and pipes)
:::warning
**Note**: we recommend waiting to start this part of the lab until we have covered pipes in lecture (usually Lecture 20), which will provide important conceptual background! Diving into the code before you have an understanding of the concepts is likely to cost you more time in the long run!
In lecture, we will let you know once we have covered a sufficient amount of content to begin this part.
:::
### Introduction
One common paradigm of programming, perhaps the one with which you are most familiar, is sequential programming. Sequential programs execute instructions one at a time in a specific order. With the aid of multiple processes, however, we can also have **concurrent** programs. Concurrent programs execute in parallel, either on multiple processors in your computer or in a time-sliced way on a single processor.
You have experience writing concurrent programs using `fork` already! `fork` creates a new process in which a distinct task can be executed in parallel with existing processes.
One benefit to concurrent programming is that it can improve program performance by splitting one big task into smaller subtasks which can be executed at the same time rather than sequentially (assuming you're running on a multiprocessor computer). However, since multiple processes exist in distinct virtual address spaces, the data in one process’s address space is not visible to any other process. This becomes a problem when using multiple processes to speed up a task, as you will eventually want to pool all child process results together.
In this part of the lab, you will learn how to share data between processes via pipes. A **pipe** is a one-way channel in which data can be sent and received between processes. On Linux, it is represented as an array of two *file descriptors* with the first (element 0) representing the file descriptor for the *read end* of the pipe, and the second (element 1) representing the file descriptor for the *write end* of the pipe. Data can be read from and written to pipes using the standard `read` and `write` system calls.
### Specifications
For this part, you will use multiple processes to improve the performance of a *reverse index* program. The basic idea of a reverse index is to take in a word and produce a list of the occurrences of that word across some known corpus of documents.
Reverse indexing is the essence of how a search engine like Google works! When you type a keyword into Google, the Google servers look up your keyword in a giant reverse index, finding the web pages that mention this keyword. It then ranks the results according to a highly secret algorithm.
Google's reverse index is split over many servers, and speeds up lookups using concurrency. For this assignment, you will build a more modest, single-machine reverse index, but concurrency will still help speed it up!
For this part of the lab, we provide you with an already-working version of a reverse index program that runs sequentially called `revindex_sequential` (defined in `revindex_sequential.cpp`). Your job is to implement a faster version (`revindex_parallel`, defined in `revindex_procs.cpp`) that takes advantage of concurrency by running multiple processes.
#### Example: using the reverse index (sequential version)
All of the code for this part is located in the `multiProcs` directory. To give you a sense of what the output should look like, here's an example of using the (slow) sequential version to search for occurrences of the word "peace" in our corpus of classic novels (provided in the stencil in the `books` directory):
```console
# Start from lab directory
$ cd multiProcs
$ make
[ . . . ]
$ ./revindex_sequential books peace # Search for "peace" in all files in books directory
Found 305 instances of peace.
peace found in books/A Christmas Carol in Prose; Being a Ghost Story of Christmas by Charles Dickens.txt at locations:
Index 5608: rest, no peace. Incessant torture
peace found in books/The Adventures of Tom Sawyer by Mark Twain.txt at locations:
Index 5743: was at peace, now that
Index 7544: held his peace, for there
Index 11064: of the peace; the widow
Index 26252: soul at peace again; for
Index 29805: repose and peace in the
Index 36050: grumblings, and peace resumed her
Index 36497: first making peace, and this
Index 36509: pipe of peace. There was
Index 45431: of the peace, who was
Index 57224: of the peace that jugged
Index 62303: somewhat of peace and healing
peace found in books/Frankenstein; Or, The Modern Prometheus by Mary Wollstonecraft Shelley.txt at locations:
Index 5505: repose in peace. I understand
Index 18298: now at peace for ever.
[ . . .]
```
Your goal is to use multiple processes and pipes to implement a version of the same program (`./revindex_parallel`, defined in `revindex_procs.cpp`).
#### Stencil organization
In `multiProcs` directory of the lab stencil, we give you the following files:
* `wordindex.cpp`: This file contains all shared functions and data structures. This includes the `wordindex` class for storing the results of a search, all functions involving reading files and directories, and functions to print out the results. You should not modify anything in this file, but you will definitely want to look at the functions and data structures it contains since you will have to use them.
* `revindex_sequential.cpp`: This file contains a single-threaded, sequential implementation of a reverse index search. You should use this as a baseline for the performance of your reverse index, and you may want to look at the code to get an idea of how the reverse index should work. However, you should not modify anything in this file.
* `revindex_procs.cpp`: This file contains a multiple process implementation of a reverse index search. This program should take advantage of parallel execution by creating multiple processes such that the reverse index of a term can be calculated for multiple files at once. Each file should be searched by its own process. **You only need to modify this file to complete the lab.**
In addition, we also provide some input files that are helpful for testing.
* The directories `books` and `tiny` contain input files to use for testing.
* The `expected_output` folder contains the expected output for searching for certain words on both the `books` and `tiny` corpuses, to check your output without waiting for runs of the sequential version.
See the [Running and Testing](#Running-and-Testing) section for more details on how to use the test files.
::::success
**Task:** Implement the following functions in `revindex_procs.cpp`. Below is a brief overview of what each function is responsible for. The comments in the stencil provide a more detailed explanation of each function.
<details><summary> <b>process_input</b></summary>
This function is responsible for setting up the pipes for each child process, creating the child processes, pooling the data from each child process, and waiting for them to complete.
</details>
<details>
<summary> <b>process_file</b></summary>
This function should be run by the child processes to search for occurrences of a given term in the child process's delegated file (see `wordindex.h` for functions to use for this). This function is also responsible for writing that file's search results back to the parent process through a pipe.
</details>
<details>
<summary> <b>read_process_results</b></summary>
This function should be run by the parent process to read the results from a child process through a pipe.
</details>
When implementing this part of the lab, you may find details in the [pipe](#Pipe) section of the handout useful.<br />
<details>
<summary> <b>Hints</b></summary>
<ul>
<li> If you're getting a timeout error, it's not because multiprocessing is slow! Think about where in your code reading from a pipe could block. How might the amount of data written affect this? Consider the case of trying to write a huge amount of data.
<li> Be careful of how you manage your processes. You'll want to know that all of your processes execute successfully, but keep in mind the purpose of this assignment: concurrency!
<li>Each process you fork will be responsible for processing one file, but each iteration of the loop in `process_input` starts several (up to <code>workers</code>) processes. You will need to ensure that you're passing the right filename from the <code>filenames</code> vector to each process.
<li>If you are confused about how the pipes between the parent and child processes work, it may help to sketch a picture of which process runs what code and what data flows from one process to another.
<li> Recall that different processes live in different address spaces. When a worker process sends back results through a pipe, it cannot send the <code>wordindex</code> data structure directly (Why not? Think about where the memory backing the vectors it contains is allocated). You will need to <em>serialize</em> the data before writing it to the pipe; check out the helper functions we provide!
<li> The serialize/deserialize helpers we provide only serialize (encode) some members of the <code>wordindex</code> object. You will need to appropriately set the others after deserializing!
<li> Since the parent process does not know how many results a child worker process found, you'll face the question of how many bytes to read from the pipe using the <code>read</code> syscall. Check out the examples we provide with the explanation of <code>pipe</code> for some ideas on how to determine this quantity.
<li> If you're debugging an issue, it may help to add logging to your code that prints the process ID (available via the <code>getpid()</code> system call) and what that process is trying to read or write to/from a pipe.
</ul>
</details>
::::
### Running and Testing
You can test your implementation using examples like the following. We recommend testing your work by starting small: first use one worker process using the `--workers` argument and use the "tiny" corpus, then go up to the full "books" corpus and use the full 8 worker processes:
```
$ make
# Example: search for "world" in the "tiny" corpus (1 worker process)
$ ./revindex_parallel --workers 1 tiny world
# Example: search for "world" in the "tiny" corpus (8 worker processes)
$ ./revindex_parallel tiny world
# Example: search for "peace" in the "books" corpus
$ ./revindex_parallel books peace
```
**Testing correctness**: To test your output, you can compare the output against a similar run of the sequential program, or compare against any of the output files in the `expected_output` directory. Your output should match the expected output exactly when searching for the same word with the same corpus.
Here's an example of how you can check your output for differences using the `diff` command:
```shell
# Run a search and write to out.txt
$ ./revindex_parallel tiny world | tee out.txt
$ diff expected_output/tiny/world.txt out.txt # Compare to expected result for this search
# No output means that there were no differences!
```
If you check your results with the `diff` command, `diff` should return no output when the files are equal--this is what you want!
If you'd like to check the expected results for any other search terms not present in `expected_output`, you can run the sequential version on the same term and corpus and compare the output, like this (here, searching for the word "potato"):
```shell
$ ./revindex_sequential books potato > potato-expected.txt # Run expected version
$ ./revindex_parallel books potato > potato-actual.txt # Run your version
$ diff potato-expected.txt potato-actual.txt # Should show no differences
```
*Fun fact: The `expected_output` directory is just a cache of some test runs from the sequential version that we selected--so you don't need to run the sequential program a bunch of times and wait for the output!*
**Testing runtime**: Additionally, once you have a working reverse index using parallel processes, you should find a performance improvement over the sequential implementation. To verify this, you can run the `time` command on both implementations as follows:
```console
$ time ./revindex_parallel books world
$ time ./revindex_sequential books world
# [the `real` time should be slower for sequential than parallel]
```
## Relevant System Calls
Below you will find a list of functions that you may find useful for this lab. Feel free to also view the man pages if you would like more details:
### Fork
::::info
<details><summary>fork()</summary>
* The fork system call is used to create a new process. It takes no arguments and returns a `pid_t` (an integer value)
* From the perspective of the child process, fork() returns 0
* From the perspective of the parent process, fork() returns the process id of the newly created child
* Recall that the address space of the new child process is an exact copy of the parent's until `execv` is called. Among other things, this means that the child process has its own copy of all variables set by the parent prior to the call to fork()!
* An example of fork’s usage is as follows:
```c
int pid;
if ((pid = fork()) == 0) {
    /* this is the child */
} else {
    /* this is the parent */
}
```
</details>
::::
---
### Execv
::::info
<details><summary><code>execv(const char* path, char* const args[])</code></summary>
* The execv system call is used to execute a program. In doing so, it completely replaces the image of the current process with that of the new program
* Note: this means that, when successful, execv does not return since the program that called it has been replaced and is no longer running
* Arguments:
* The first argument to execv should be the path to the desired program
* The second argument to execv is an array beginning with the name of the program, followed by any arguments to that program, and terminated by `NULL`. An example arguments array is as follows:
```c
char* args[] = {"cat", "file.txt", NULL};
```
* An example of execv’s usage is as follows:
```c
char* path = "/bin/cat";
char* args[] = {"cat", "file.txt", NULL};
execv(path, args);
/* this point will only be reached if execv
 * failed, so you know there must be some kind
 * of error to report */
perror("execv");
exit(1);
```
</details>
::::
---
### Wait and Waitpid
*It is often necessary for a process to wait for the child process to terminate before continuing execution. To accomplish this, you can use either wait or waitpid.*
::::info
<details><summary>wait(int* status)</summary>
* The wait system call will return once **any** child of the calling process terminates
* wait has one argument: a pointer to an integer, which will contain the status of the terminated child once wait returns (this argument can be NULL if you do not care about the child's status)
* wait returns the process id of the terminated child process on success and -1 on failure
</details>
::::
::::info
<details><summary>waitpid(pid_t pid, int* status, int options)</summary>
* The waitpid system call returns once the child process specified by the pid argument has changed state as dictated by the options argument.
* By default (when options=0) waitpid will return when the child process terminates, but other options can be specified.
* The status argument will hold the exit status of the terminated child; it can be `NULL` if this value is not needed
* waitpid returns the process id of the terminated child process on success and -1 on failure
</details>
::::
---
### Pipe
::::info
<details><summary>pipe(int fd[2])</summary>
* The pipe system call creates a one-way channel through which processes can share data.
* Upon return, the fd argument will contain 2 file descriptors (integers).
* fd[0] refers to the side of the pipe from which data can be read.
* fd[1] refers to the side of the pipe to which data can be written.
* Data that is written to fd[1] is buffered by the kernel until a read from fd[0]
* Below are two examples of using the pipe() system call:
```cpp
/*** Example 1 ***/
int fd[2];
/* set up the pipe */
if (pipe(fd) == -1) {
    perror("pipe");
    exit(1);
}
if (!fork()) {
    /* child process */
    const char* message = "Hello World!";
    /* the message is written to the
     * write end of the pipe */
    write(fd[1], message, strlen(message));
    /* close both ends of the pipe in
     * the child process */
    close(fd[0]);
    close(fd[1]);
    exit(0);
} else {
    /* parent process */
    char buf[256];
    /* close the write end in the parent before
     * reading: otherwise read() would never return 0,
     * since the parent itself still holds an open
     * write end of the pipe */
    close(fd[1]);
    /* read the message written to
     * the pipe into the buffer */
    int loc = 0;
    while (read(fd[0], &buf[loc], 1) > 0) {
        loc++;
    }
    buf[loc] = '\0'; /* NUL-terminate before printing */
    fprintf(stdout, "%s\n", buf);
    /* wait for the child to terminate */
    wait(NULL);
    /* close the read end of the pipe in
     * the parent process */
    close(fd[0]);
}

/*** Example 2 ***/
int fd[2];
/* set up the pipe */
if (pipe(fd) == -1) {
    perror("pipe");
    exit(1);
}
if (!fork()) {
    /* child process */
    const char* message = "Hello World!";
    int len = strlen(message);
    /* write the size of the message
     * that will be written to the pipe */
    write(fd[1], &len, sizeof(int));
    /* the message itself is written to
     * the write end of the pipe */
    write(fd[1], message, len);
    /* close both ends of the pipe in
     * the child process */
    close(fd[0]);
    close(fd[1]);
    exit(0);
} else {
    /* parent process */
    int size;
    /* read from the pipe; the first item
     * read will be the size of the message,
     * since data is read in the same order
     * as it is written */
    read(fd[0], &size, sizeof(int));
    char* buf = new char[size + 1];
    /* read the message written to the
     * pipe into the buffer */
    read(fd[0], buf, size);
    buf[size] = '\0'; /* NUL-terminate before printing */
    fprintf(stdout, "%s\n", buf);
    /* wait for the child to terminate */
    wait(NULL);
    /* close both ends of the pipe in
     * the parent process */
    close(fd[0]);
    close(fd[1]);
    delete[] buf;
}
```
==**Note:**== The second example first writes the size of the data and then the data itself. Since you know the order in which data is written to the pipe, you know that if you first read sizeof(int) bytes, you will read the length of the remaining portion of the data (the message). You then know by how much to read so that you can read the entirety of the message in just one more call to read.
In the first example, data is read from the pipe one byte at a time until read returns 0 (there is no more data left to read). In this example, passing data through the pipe results in 14 system calls: 1 write to the pipe, 12 reads to get the whole message, and 1 more read to hit EOF. In contrast, the second example passes data through the pipe using only 4 system calls: 2 writes and 2 reads. System calls are costly since they involve handing over control to the OS. As such, you will generally find your programs to be more efficient if you cut down on system calls.
</details>
::::
## Debugging Resources/ GDB
### Debugging Multi-process Programs:
If you use GDB to help you debug this lab (as is always recommended!), it is important to note that the default behavior of GDB is to not follow the execution of child processes after they are started with fork. As such, if you set a breakpoint on a line that executes within a child process, GDB won’t break on that line.
However, you can change this behavior by running the GDB command `set follow-fork-mode child`. This tells GDB to follow the execution of the child process after a fork, and leave the parent process to run freely (but not followed). After setting this, GDB will hit breakpoints you set within a forked process but not any breakpoints set in the parent process beyond the call to fork. If needed, run `help set follow-fork-mode` within GDB for more details.
---
# Handin instructions
Turn in your code by pushing your git repository to `github.com/csci0300/cs300-s25-labs-YOURUSERNAME.git`.
Then, head to the [grading server](https://cs300.cs.brown.edu/grade/2025). On the "Labs" page, use the **"Lab 5 checkoff"** button to check off your lab.