5. What you should know#

From having taught operating systems for multiple years, we have learned two things: you have learned this material in other courses, and you probably got away without really mastering it. For example, many of you have never really written a proper makefile to automate compilation, and most of you have debugged programs without mastering the debugger.

Before taking this course, you should have good familiarity with:

  1. Unix shells

    • In this section we will introduce several shell commands that you will see and use very frequently. We encourage you to experiment with all of the commands and examples covered in this section! It is by far the best way to become comfortable with the shell.

  2. editors

    • In this section we will introduce two terminal text editors that are frequently used on Unix machines. Learning to write code in these editors is essential when you do not have access to a modern GUI-based IDE like Visual Studio Code.

  3. compiling

    • As the size of your projects increases, so does the need for automation via Makefiles. You don’t want to be typing the compile commands for all of your files or repeatedly pressing the up arrow to find previous compile commands after every change.

  4. git

    • Version control is important to keep records of long-standing projects and to clearly communicate changes to other developers working on the same project.

  5. GDB debugger

    • Print statements won’t show everything going on in a process. In addition to printing variable values, GDB provides more powerful debugging tools such as changing the value of variables at run time, switching between threads, and examining CPU registers.

  6. C programming language

    • It is the most used programming language for operating systems. By mastering the language, you will reduce the number of syntax errors, and can focus more on debugging the logical errors.

  7. unit tests

    • They are not only used to determine whether or not a program runs correctly, but also to test whether or not you understand how the system is supposed to work. They are really helpful for edge cases. Get rid of the “guess and check” mindset and write some tests.

We can guarantee that you won’t be successful if you do not master your tools, because the programming assignments for operating systems courses tend to be more demanding. We recommend this book by Jonathan Appavoo as a more detailed reference on most of what we cover here.

5.1. Shell#

The shell is a basic interface that allows users to communicate with the kernel and any installed programs. Whenever you open a terminal, the kernel starts an instance of a shell program, or shell “session”. When most people interact with computers, they do so through graphical user interfaces that they navigate using a mouse or their finger. The shell is completely text-based and designed for programmers. It has its own programming language of special commands to access kernel functionalities. For more information about terminals and shells, please refer to the relevant sections of the following textbook.

5.1.1. Some basic shell commands#

5.1.1.1. Help and information#

5.1.1.1.1. man <command>#

This is probably the most important command for students new to Linux and interacting with computers via the shell. The man command (short for “manual”) allows you to access documentation about all programs installed on a system. For example, to view documentation for the man command itself, you simply type man man to open its man page. You can scroll through man pages using the arrow keys or ctrl+f (move forward one page) and ctrl+b (move backward one page) and exit back to the command line by typing q. For more information about any of the other commands discussed in this chapter (or any chapter for that matter), consult these man pages!

5.1.1.1.2. apropos <keyword>#

This is probably the second most important command for students new to Linux and interacting with computers via the shell. If you type man apropos, its description is

apropos searches a set of database files containing short descriptions of system commands for keywords and displays the result on the standard output.

In other words, it searches the names and short descriptions of all man pages for the keyword and displays the matching entries on the screen. This is useful when you know a keyword that describes what a program does but do not know the program’s name.

5.1.1.3. Creating, viewing, and manipulating files and directories#

  • touch <desiredfilename>: create a new file.

  • cat <filename>: print contents of a file to the terminal.

  • mv /path/to/<filename> /desired/path/to/file/: move a file to a different directory. mv moves directories the same way, with no extra flag. (It is cp that needs the flag -r to copy a directory and its contents.)

  • cp <filename> <filecopyname>: make a copy of an existing file.

  • mkdir <desireddirectoryname>: create a new directory.

  • emacs <filename>: open the file “filename” in the EMACS editor. More on this in the following section.

  • wc <filename>: print newline, word, and byte counts for a file. To see just the word count, add the flag -w.
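
The commands in this list can be tried safely in a scratch directory. The following is a minimal sketch rather than a prescribed workflow: the file and directory names (notes.txt, notes_copy.txt, backup) are made up for illustration, and mktemp -d is used only so that the experiment does not touch your real files.

```shell
# Work in a throwaway directory so nothing important is touched.
scratch=$(mktemp -d)
cd "$scratch"

touch notes.txt                             # create an empty file
echo "hello operating systems" > notes.txt  # put a line in it (> is covered under Symbols)
cat notes.txt                               # print its contents

cp notes.txt notes_copy.txt                 # copy the file
mkdir backup                                # create a directory
mv notes_copy.txt backup/                   # move the copy into it

wc -w notes.txt                             # word count followed by the file name
ls backup                                   # the copy now lives here
```

Running ls in the scratch directory afterwards should show only notes.txt and backup, since the copy was moved rather than duplicated.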

Commands as files within the path list: interesting resource to help with understanding of shell commands with an exercise you can follow!

5.1.1.4. Miscellaneous#

5.1.1.4.1. echo#

Display a line of text to standard out.

$ echo "Hello world"
Hello world

5.1.1.4.2. ctrl+c#

This will terminate whatever process/command is currently running in the shell. Great if you think your code may be stuck in an infinite loop or is just taking much longer than you thought it would to finish running.

5.1.1.4.3. ctrl+l#

This will clear your terminal and move the prompt up to the top of the screen.

5.1.1.5. Symbols#

5.1.1.5.1. |#

This is known as the “pipe” symbol and is used to redirect the output of one command into the input of a second command. For example, if I type ls .. | wc -w, the number of words in the listing of the .. directory will be printed to the command line. This equals the number of files/directories in .. as long as none of their names contain whitespace.
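
As a small sketch of a pipe in action, the block below counts the entries of a directory in two ways. The directory and file names are hypothetical and chosen so that one name contains a space, which shows the difference between counting words (wc -w) and counting lines (wc -l).

```shell
demo=$(mktemp -d)
touch "$demo/a.txt" "$demo/b.txt" "$demo/my notes.txt"

# ls writes one entry per line when its output goes into a pipe.
words=$(ls "$demo" | wc -w)   # 4 words: "my notes.txt" is counted as two
lines=$(ls "$demo" | wc -l)   # 3 lines: one per entry

echo "words: $words, entries: $lines"
```

Counting lines is the safer choice when names may contain spaces.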

5.1.1.5.2. >#

This symbol is used to redirect the output of a command. For example, if I type: echo "Hello world" > hw.txt, instead of printing “Hello world” to the terminal, the > symbol directs that output into the file “hw.txt”.

5.1.1.5.3. <#

This symbol is used to assign a redirect-in to a command. For example, we can type cat < file.txt. In this situation, this is equivalent to cat file.txt.
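
The two redirection symbols can be compared side by side. This is an illustrative sketch that writes hw.txt into a temporary directory; the wc lines are an added illustration of one case where reading from a redirect and reading from a named file look different.

```shell
demo=$(mktemp -d)

echo "Hello world" > "$demo/hw.txt"   # output goes to the file; nothing is printed
cat < "$demo/hw.txt"                  # the shell opens hw.txt and feeds it to cat's input

wc -w "$demo/hw.txt"                  # wc opens the file itself, so it prints the name too
wc -w < "$demo/hw.txt"                # wc only sees its input, so it prints just the count

echo "second line" > "$demo/hw.txt"   # > replaces the previous contents of the file
cat "$demo/hw.txt"
```

Note that > overwrites an existing file without asking, so be careful which file name you give it.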

5.1.1.5.4. &#

Typing the ampersand symbol & at the end of a command will result in that command being run in the background. This means that you will immediately see another prompt even if the command you run has not finished running.
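
Here is a minimal sketch of running a command in the background. It uses sleep as a stand-in for a long-running program, plus two shell features not covered above: $!, which holds the process ID of the most recent background command, and wait, which blocks until that command finishes.

```shell
sleep 1 &                  # runs in the background; the shell continues immediately
bg_pid=$!                  # remember the background command's process ID
echo "sleep is running in the background as PID $bg_pid"

wait "$bg_pid"             # block here until the background command finishes
status=$?                  # wait reports the exit status of that command
echo "background command finished with status $status"
```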

5.2. Editors#

Terminal-based text editors allow us to write and edit files when we don’t have access to a GUI (e.g., when we are SSHing into a remote computer). VIM and EMACS are the most common. We are requiring that students use EMACS for this course, and those who use VIM will be penalized.

5.2.1. EMACS Basics#

EMACS is a slightly more familiar-looking text editor that is valued for its extensibility. It is extremely configurable with a lot of tools and packages available.

To open a file in EMACS, type emacs <filename>. Unlike in VIM, you will immediately be able to write in/edit the file. Once you are finished editing, type ctrl+x followed by ctrl+c to save and exit. It will prompt you to confirm that you want to save and quit. Typing y and then enter will return you to the terminal.

EMACS has a built-in tutorial you can follow. Simply type emacs in the terminal and then scroll down to the tutorial using the arrow keys and press enter. This will take you through EMACS basics.

Here are some navigation commands you might find useful (where C- denotes ctrl+):

  • C-v: go forward one page

  • C-f: move forward one character

  • C-b: move backward one character

  • C-n: move to next line

  • C-p: move to previous line

  • C-a: move to the beginning of a line

  • C-e: move to the end of a line

  • C-g: stop a command that is taking too long to execute

5.2.2. Vim Basics#

Vim is a “simple to use, hard to master” text editor that is valued by vim enthusiasts for its coding efficiency.

To open a file in vim, type vim <filename>. When you enter vim, you will automatically be in normal mode, where you can navigate with your cursor. To insert text, type i to enter insert mode. To save a file, first press the esc key to return to normal mode, then press : to enter command mode, and finally type w and press enter. To quit, enter command mode again, type q, and press enter.

5.3. Makefiles#

Hopefully, you are familiar with the process of compiling C files into executables to be run. Usually, this looks something like:

gcc filename.c -o filename

This would compile the C file “filename.c” into an executable called “filename”, which we could then run by typing ./filename.

This seems easy enough, but having to type out commands like this can quickly become quite cumbersome when we have many different commands we want to run on different combinations of files or a lot of flags we want to use. In these more complex situations, we can use make. Make automates running commands on files when they have changed and is commonly used in “building” programs.

To use make, you must write a file called a makefile that describes the relationships between the files in your program and provides commands for updating each file. Usually for C programs, the executable file is updated from object files (.o files), which are in turn made by compiling source files (.c files). Once you have written your makefile, you can just run the shell command make and it will perform all necessary recompilations. make knows which files need to be updated based on the last-modification times of the files. You can also provide command line arguments to make to specify which files should be recompiled and how.

5.3.1. Makefile Basics#

A simple makefile consists of rules with the following syntax:

target ...: prerequisites ...
    recipe
    ...
    ...

The target is typically the name of a file that is generated by a program (e.g. an executable or object file). It can also be the name of an action to carry out, like “clean”.

The prerequisite is a file that is used as input to create the target. It will often be multiple files.

The recipe is an action that make carries out. A recipe may have more than one command. There must be a tab character at the beginning of each recipe line.

Here is a guide for writing makefile rules.
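
The rule syntax can be tried without a compiler. The sketch below assumes GNU make is installed and uses made-up file names: the target greeting.txt is built from the prerequisite name.txt by a one-line recipe. The makefile is written with printf so that the recipe line is guaranteed to start with a tab character.

```shell
demo=$(mktemp -d)
cd "$demo"
echo "world" > name.txt

# One rule: target "greeting.txt", prerequisite "name.txt", one recipe line.
printf 'greeting.txt: name.txt\n\tsed "s/^/Hello, /" name.txt > greeting.txt\n' > Makefile

make                 # greeting.txt does not exist yet, so the recipe runs
cat greeting.txt     # Hello, world
make                 # nothing to do: greeting.txt is not older than name.txt
```

Editing name.txt and running make again should rebuild greeting.txt, because the prerequisite is then newer than the target.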

5.3.2. An Example#

Here is the incomplete makefile we provide for HW0:

override CFLAGS := -Wall -Werror -std=gnu99 -O0 -g  $(CFLAGS) -I.

# I generally make the first rule run all the tests
check: checkprogs
	/bin/sh run_tests.sh $(test_files)

# rule for making the parser.o  that is needed by all the test programs
myshell_parser.o: myshell_parser.c myshell_parser.h

# each of the test files depend on their own .c and myshell_parser.h
#  add another line for each test, e.g., test_simple_pipe.o line below
test_simple_input.o: test_simple_input.c myshell_parser.h
test_simple_pipe.o: test_simple_pipe.c myshell_parser.h

# each of the test programs executables are generated by combining the generated .o with the parser.o
test_simple_input : test_simple_input.o myshell_parser.o
test_simple_pipe : test_simple_pipe.o myshell_parser.o

# Add any additional tests here, e.g., the commented out test_simple_pipe
test_files=./test_simple_input # ./test_simple_pipe

.PHONY: clean check checkprogs all

# Build all of the test programs
checkprogs: $(test_files)

clean:
	rm -f *~ *.o $(test_files) $(test_o_files)

Let’s break down each line and rule.

In the first line, we specify the flags we want to use to compile C files by assigning a value to CFLAGS (more on implicit variables like CFLAGS here). The override directive just makes sure you use the assignments in the makefile even if the variable has previously been set with a command-line argument.

  • -Wall: turns on many compiler warning flags

  • -Werror: turns warnings into compilation errors

  • -std=gnu99: sets the C language standard (C99 with GNU extensions)

  • -O0: sets optimization level to 0 (faster compilation, better for debugging)

  • -g: adds debugging symbols to executable

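  • $(CFLAGS): appends whatever CFLAGS was already set to (e.g., on the make command line), so extra flags can be added without editing the makefile

  • -I.: specifies a directory where header files can be found (in this example, the working directory .)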
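
The effect of override can be observed with a tiny experiment, assuming GNU make is installed. The makefile below is not the HW0 makefile: it is a hypothetical three-line file that reuses the same override pattern with fewer flags and adds a made-up show target that prints the final value of CFLAGS. The flag -DDEBUG is just an example of something you might pass on the command line.

```shell
demo=$(mktemp -d)
cd "$demo"
unset CFLAGS    # start from a clean environment for this experiment

# Same pattern as the first line of the makefile above, plus a "show" target.
printf 'override CFLAGS := -Wall -g $(CFLAGS) -I.\nshow:\n\t@echo $(CFLAGS)\n' > Makefile

make show                   # -Wall -g -I.
make show CFLAGS=-DDEBUG    # -Wall -g -DDEBUG -I.
```

Without override, the value given on the command line would replace the makefile's assignment entirely and the makefile's own flags would be lost.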

Following the CFLAGS definition, we see the check rule. It depends on checkprogs, so it first builds the test programs and then runs them through run_tests.sh.

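Next, we have a rule for building the parser’s object file, followed by rules for the object files and executables of two test programs. These rules list only prerequisites and have no recipe, so make falls back on its built-in rules to compile each .c file into a .o file (using CFLAGS) and to link the .o files into an executable.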

Below that, the test_files variable lists the test programs to build and run. You can edit this line to add more tests.

The rest of the rules in the makefile have phony targets. They are not names for files, rather just names for a recipe to be executed when an explicit request is made. Using a phony target helps avoid conflicts with files of the same name and improves performance.

The checkprogs rule builds all of the test programs.

The last rule is the clean rule, which removes editor backup files (*~), all object files, and the test executables from the directory.
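
To see why targets like clean are declared phony, consider what happens when a file named clean exists in the directory. The sketch below assumes GNU make is installed and uses a made-up one-rule makefile whose recipe only prints a message.

```shell
demo=$(mktemp -d)
cd "$demo"
touch clean          # a stray file that happens to share the target's name

# Without .PHONY, make treats "clean" as a file that is already up to date.
printf 'clean:\n\t@echo cleaning\n' > Makefile
make clean           # reports that clean is up to date; the recipe is skipped

# With .PHONY, the recipe runs whether or not such a file exists.
printf '.PHONY: clean\nclean:\n\t@echo cleaning\n' > Makefile
make clean           # cleaning
```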

For more info on make, see this chapter of Jonathan Appavoo’s book. You can also reference the GNU make manual.

5.4. Git Basics#

Version control is a good idea for any software project. For your work in this class, it will be required. The development environment will periodically wipe your directory, so if you haven’t pushed your changes to a remote repository, you will lose all progress. We recommend committing and pushing your code often to avoid losing your work.

If you are completely new to GitHub, we recommend this guide to help get you started. This documentation site will also be helpful.

Since you will not be collaborating on code, much of git’s functionality will be unnecessary. The following five commands should cover everything that you need for making sure you don’t lose your work.

5.4.1. git clone <repourl>#

Clone a remote GitHub repository to your machine.

5.4.2. git status#

See the status of the working directory and the staging area. It will show you which files are and aren’t being tracked by git and which changes have been staged.

5.4.3. git add <files>#

Stages a given change to go into your next commit.

5.4.4. git commit -m "what you changed"#

Commit any added changes. Once a change is committed it is safely stored in your local database.

5.4.5. git push#

Push your local commits to the remote repository.
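
The five commands can be rehearsed entirely on your own machine, assuming git is installed. In this sketch a bare repository in a temporary directory stands in for the remote (in the course, the URL would point at your hosted repository instead). The identity, file name, and commit message are placeholders, and the push names the remote and branch explicitly because the stand-in remote starts out empty.

```shell
demo=$(mktemp -d)
cd "$demo"

# Stand-in for a remote repository: a bare repository in a local directory.
git init --quiet --bare remote.git

git clone --quiet remote.git work          # git clone <repourl>
cd work
git config user.name "Example Student"     # placeholder identity for the commit
git config user.email "student@example.com"

echo "int main(void) { return 0; }" > main.c
git status --short                         # ?? main.c  (untracked)
git add main.c                             # stage the new file
git status --short                         # A  main.c  (staged)
git commit --quiet -m "Add empty main"     # record it in the local repository
git push --quiet origin HEAD               # send the commit to the remote
```

Only after the push does the commit exist outside your working copy, which is what protects you when the development environment wipes your directory.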

5.5. GDB#

gdb is your best friend when it comes to debugging. It allows you to step through a program line by line, examine variable values and the contents of the stack, and much more. For a quick reference, we recommend this guide.

5.5.1. Getting started#

In order to use gdb, you will have to use the -g command line flag when compiling. This flag makes sure that the necessary debugging information is produced.

gcc -g -o main main.c

This will compile the program main.c into an executable named “main” with debugging information.

To open the program in the debugger, you can type gdb main. To exit gdb, type q or quit.

gdb provides documentation, which you can access by typing the command help. This will provide a list of topics, and you can then get information about a specific topic or command by typing help <topic> or help <command>.

5.5.2. Using gdb to debug#

One of the more useful things you can do in gdb is set breakpoints and observe the state of the process mid-execution. There are a couple of ways to set a breakpoint:

  • break <function>: sets a breakpoint at the beginning of function.

  • break <linenumber>: sets the breakpoint to the given line number in the source file. Execution will stop before that line has been executed.

If your code is in multiple files, you may have to specify a file name before the function name or line number, e.g. break <filename>:<function> or break <filename>:<linenumber>.

To delete the breakpoints you have set, you can type delete. You can also delete a specific breakpoint by typing delete <number>. To find out what number each breakpoint is, type info breakpoints.

Once you have set all your desired breakpoints, you need to run the program. To do this, simply type the command run into gdb. If the program takes command line arguments, you can provide them the same way as you would in the command line, except you say “run” instead of the program name. Your code will execute up to where you specified your first breakpoint.

Once you reach your breakpoint, you will probably want to look at the contents of the stack or some variables to make sure that your program is executing as expected. To see the current value of a variable, you can simply type print <variablename>.

To continue running the program after stopping at a breakpoint, you have several options. Typing continue will set the program running again until you hit another breakpoint or the process finishes. Typing step (or just s) will execute the current source line and stop execution before the next source line. If the line that is about to be executed is a function call, then step will step into that function. In contrast, next (or just n) will not “step in” to a called function. It will continue until the next source line in the current function.

5.5.2.1. Debugging threads#

Once you learn about pthreads, a very powerful thing you can do is switch between threads and step through in the order you choose.

  • info threads: lists all existing threads.

  • thread <thread_id>: selects which thread to switch to.

After you have selected which thread you would like to step through, you can use the basic stepping commands to execute the next instructions. Note that breakpoints you have already set apply to all threads.

5.5.2.2. Debugging signals#

Once you learn about signals, you can tell gdb how it should handle them.

  • info signals or info signals <signal>: lists all signals (or one signal) and shows how gdb responds to it.

  • handle <signal> <keyword>: changes how gdb responds to that signal based on the keyword.

Below is a list of keywords:

  • nostop: gdb should not stop your program when this signal happens.

  • stop: gdb should stop your program when this signal happens.

  • print: gdb should print a message when this signal happens.

  • noprint: gdb should not mention the occurrence of the signal at all.

  • ignore: gdb should not allow your program to see this signal.

  • noignore: gdb should allow your program to see this signal.

5.5.3. Review#

  • break: use to set breakpoints

  • run: run the program in gdb

  • delete: delete breakpoints

  • continue: set the program running again after being stopped at a breakpoint

  • step: execute the current source line and stop again before the next source line

  • next: continue until the next source line in the current function

  • list <linenumber>: print out some lines from the source code around linenumber.

This is just the beginning of what you can do with gdb, and we will add to this section of the book as the course continues!

5.6. The C Programming Language#

The C programming language is a high-level, statically typed, procedural, compiled programming language valued for the low-level access to memory that it provides. You have probably already learned the language in a previous course, but this course will utilize some of its advanced features that have probably not been taught in those introductory courses. To make the learning process easier, we recommend reviewing its syntax and concepts. Here are a few:

  • data types and their sizes

  • operator precedence

  • control flow (if, for, while, etc)

  • arrays

  • strings

  • pointers

  • dynamic allocation

  • functions

  • preprocessing and compiling

  • glibc (GNU C Library)

  • algorithms and data structures (hash maps, linked lists, etc)

In addition to reviewing the language, we also encourage you to follow some sort of coding style. This website contains the preferred coding style for the Linux kernel. Although not required, adhering to a style will increase productivity so you and whoever else looks at your code will spend less time trying to understand what you wrote.

5.7. Unit Tests#

Unit tests are a way to check if your program runs as expected. Combine unit tests with makefiles and you can automate both the compiling and testing phases of development. We call these tests unit tests because each one is supposed to test a single piece of functionality of a component in a system rather than the whole system. You will spend less time debugging with unit tests than with tests of the entire system, because a failing unit test tells you that the problem is in the specific functionality it tests.

Creating your own tests is essential to completing the programming assignments.

As an example, we have 3 functions: foo, bar, and baz. foo is a helper function to bar and baz, and bar is a helper function to baz.

#include <assert.h>

int foo()
{
    ...
}

int bar()
{
    ...
    return foo();
}

int baz()
{
    ...
    return bar() * foo();
}

int main()
{
    assert(foo() > 0);  // this is a first level unit test
    assert(bar() > 0);  // this is a second level unit test
    assert(baz() > 0);  // this is a final level unit test aka system test
    return 0;
}

It would be very easy to create a test only for baz and check the expected outputs. However, there is a possibility that the test passes for the wrong reasons (two wrongs might make a right). If the test for baz fails, it is much harder to debug baz without first checking that both foo and bar pass their tests. Although this is a simple example, this idea of creating unit tests from the first level to the last will greatly improve development efficiency, and will train you to think about a complex system in bite-sized chunks.
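
A failed assert aborts the program, so the test program exits with a non-zero status, and a test runner only needs to look at exit statuses. The function below is a hypothetical sketch of such a runner, not the run_tests.sh script provided with the assignments. The commands true and false stand in for compiled test programs (such as a made-up ./test_foo) that pass and fail.

```shell
# Hypothetical runner: run each test command and count the failures.
run_tests() {
    failed=0
    for t in "$@"; do
        if "$t"; then
            echo "PASS $t"
        else
            echo "FAIL $t"
            failed=$((failed + 1))
        fi
    done
    return "$failed"    # zero only if every test passed
}

# true and false stand in for passing and failing test programs.
if run_tests true false true; then
    echo "all tests passed"
else
    echo "some tests failed"
fi
```

Listing the tests from the lowest level up (foo before bar before baz) means the first FAIL line points at the smallest piece of code that is broken.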