In [None]:
%run -i ../python/common.py
publish=False

if not publish:
    # cleanup any old state
    bashCmds('''[[ -d mydir ]] && rm -rf mydir
    [[ -a /tmp/foo ]] && rm -rf /tmp/foo
    [[ -a errors ]] && rm errors 
    [[ -a mydate ]] && rm mydate
    [[ -a mynewdir ]] && rm -rf mynewdir
    [[ -a anotherfile ]] && rm anotherfile
    [[ -a mybin ]] && rm -rf  mybin
    [[ -a myinfo ]] && rm myinfo''')
else:
    bashCmds('''rm -rf ~/*''')
    
closeAllOpenTtySessions()

generated="~/myfile ~/errors ~/mydate ~/mydir ~/mynewdir ~/out"


In [None]:
appdir=os.getenv('HOME')
appdir=appdir + "/fslec1"
TermShellCmd("ls ")
output=runTermCmd("[[ -d " + appdir + " ]] &&  rm -rf "+ appdir + 
             ";cp -r ../src/fslec1 " + appdir)
bash = BashSession(cwd=appdir)

(cont:fs:interface)= 
# File System Abstraction

From a user perspective, file systems support:

- a *name space*, the set of names that identify objects;
- *objects* such as the files themselves as well as directories and other supporting objects;
- *operations* on these objects.

We first describe how naming works in a Unix file system, then some of the core objects and how they are identified, and then the operations a process can perform on those objects. 



## Naming

[^hier]: Very early file systems sometimes had a single flat directory per user or, like MS-DOS 1.0, a single directory per floppy disk.

Most file systems today support a
tree-structured namespace[^hier], as shown in {numref}`fs:tree-logical`. 
This tree is constructed via the use of
*directories*, or objects in the namespace which map strings to further
file system objects. A full filename thus specifies a *path* from the
root, through the tree, to the object (a file or directory) itself.
(Hence the use of the term "path" to mean "filename" in Unix
documentation.)

```{figure} ../images/pb-figures/fs/filesys-tree.png
---
width: 70%
name: fs:tree-logical
---
Logical view: hierarchical file system name space
```



Each process has an associated *current directory*, which may be changed via the `chdir` system call. File names beginning with '`/`' are termed *absolute* names, and are interpreted relative to the root of the naming tree, while *relative* names are
interpreted beginning at the current directory. Thus in the file system in
{numref}`fs:tree-logical`, if the current directory were `/home`,
then the paths `pjd/.profile` and `/home/pjd/.profile` refer to the same
file, and `../bin/cat` and `/bin/cat` refer to the same file.

Each directory also contains two special files ```.``` and ```..```, where `d/..`  identifies the parent directory of `d`, and `d/.` identifies `d`
itself.  
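
These equivalences can be checked from a program. The following is a minimal sketch, assuming a POSIX system and Python's `os` module, whose `chdir` and `stat` functions wrap the system calls of the same name. The directory layout is a hypothetical stand-in for the tree in the figure, built inside a temporary directory, and the helper names `same_object` and `path_demo` are illustrative.

```python
import os
import tempfile


def same_object(path_a, path_b):
    """Return True if both paths lead to the same inode on the same device."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def path_demo():
    """Build a small tree, change directory into it, and compare names."""
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical stand-ins for /home/pjd/.profile and /bin/cat.
        os.makedirs(os.path.join(root, "home", "pjd"))
        os.makedirs(os.path.join(root, "bin"))
        profile = os.path.join(root, "home", "pjd", ".profile")
        cat = os.path.join(root, "bin", "cat")
        for path in (profile, cat):
            with open(path, "w") as f:
                f.write("placeholder\n")
        try:
            os.chdir(os.path.join(root, "home"))  # set the current directory
            return {
                "relative vs absolute": same_object("pjd/.profile", profile),
                "via ..": same_object("../bin/cat", cat),
                "d/. is d": same_object("pjd/.", "pjd"),
                "d/.. is parent": same_object("pjd/..", "."),
            }
        finally:
            os.chdir(old_cwd)  # restore before the temporary tree is removed


if __name__ == "__main__":
    for check, ok in path_demo().items():
        print(check, ok)
```

Comparing the device and inode numbers, rather than the strings, is what makes this a test of "same file" and not merely "same spelling".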

A typical system may provide access to
several file systems at once, e.g., a local disk and an external USB
drive or network volume. To specify a file unambiguously we
thus need to identify both the file system itself and the file within
the possibly nested directories of that file system.
Unix enables a file system to be *mounted* onto a directory in
another file system, giving a single uniform namespace.  For example, on the systems you are using, there is an *ext4* file system mounted in the root file system at ```/opt/app-root/src```, which you can see if you use the mount command to list all the file systems mounted on this computer.

In [None]:
bash.run("mount | grep ext4")

Mounting in Linux and other Unix-like
systems is implemented via a *mount table*, a small table in the kernel
mapping directories to directories on other file systems. As the
kernel translates a pathname it checks each directory against this table; if
the directory is found, the kernel substitutes the mapped file system and directory before
searching for an entry. Thus before searching `/opt/app-root/src` for the entry `foo`, the kernel will substitute the
top-level directory of the mounted ext4 file system and then search it for `foo`.
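
The mount table itself lives in the kernel, but a user program can observe where a path crosses from one file system into another. The following is a rough sketch, assuming a POSIX system; it relies on Python's `os.path.ismount`, which compares the device ID of a directory with that of its parent, so it may miss cases such as a bind mount within a single file system. The function name `mount_points_on_path` is illustrative.

```python
import os


def mount_points_on_path(path):
    """Return the prefixes of `path` that are mount points, root first."""
    current = os.path.realpath(path)  # remove symbolic links first
    found = []
    while True:
        if os.path.ismount(current):
            found.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # dirname("/") is "/", so we have reached the root
            break
        current = parent
    return list(reversed(found))


if __name__ == "__main__":
    print(mount_points_on_path(os.getcwd()))
```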

For a more thorough explanation of path translation in Linux and other
Unix systems see the `path_resolution(7)` man page, i.e. type `man path_resolution`.

## Objects

```{sidebar} So, why is it called an inode? Dennis Ritchie, who was one of the authors of UNIX, gave this enlightened answer to the Linux kernel mailing list in 2002:
> In truth, I don't know either. It was just a term that we started to use. "Index" is my best guess, because of the slightly unusual file system structure that stored the access information of files as a flat array on the disk, with all the hierarchical directory information living aside from this. Thus the i-number is an index in this array, the i-node is the selected element of the array.
<p style='text-align: right;'> ---Dennis Ritchie, <a href='https://lkml.indiana.edu/hypermail/linux/kernel/0207.2/1182.html'>[LKML 2002]</a></p>
```

Once you use a pathname to find an object in the file system, you need to find out what kind of an object you have found.  Each file is identified in the file system by a unique **inode number** that references an **inode** data structure that maintains all kinds of information, or *meta-data* about the file (try ```man inode``` for more information).   While the inode itself is internal to the file system, and contains additional information, generic information can be obtained for any file as described [below](cont:fs:calls:naming).  One of the fields in an inode identifies the type of object that the inode refers to. The types of objects that can be referenced by inode are shown in {numref}`file_types`. 

```{list-table} Types of objects in a file system. 
:header-rows: 1
:name: file_types
:widths: 6 5 10
:width: 4in

* - Name 
  - Value 
  - Purpose
* - regular file
  - S_IFREG
  - A regular file normally used to store data
* - directory
  - S_IFDIR
  - A special file used to contain files or other directories
* - symbolic link
  - S_IFLNK
  - A kind of “file” that is essentially a pointer to another file name
* - block device
  - S_IFBLK
  - A device, such as a disk, that is accessed by reading and writing blocks
* - character device
  - S_IFCHR
  - A character device, such as a `tty`
* - FIFO
  - S_IFIFO
  - A pipe
* - socket
  - S_IFSOCK
  - A socket used for networking
```
 
[^eff]: It is probably not a
coincidence that Unix arrived at the same time as computers which dealt
only with multiples of 8-bit bytes (e.g. 16 and 32-bit words), replacing
older systems which frequently used odd word sizes such as 36 bits.
(Note that a machine with 36-bit instructions already needs two
incompatible types of files, one for text and one for executable code.)

[^simple]: This is the case for almost all operating systems today, but of course there are exceptions. Apple's OS X uses resource forks to store information associated with a file (HFS and HFS+ file systems only); Windows NTFS allows multiple data streams in a single file, although these see little use in practice; and several file systems support file attributes, which are small tags associated with a file.

The last four are special files that you can connect into a file system.  The first three are core objects for all file systems. 
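
As a small illustration of how a program can ask what kind of object a name refers to, here is a sketch that assumes a POSIX system and uses Python's `os.lstat` together with the type-testing helpers in the standard `stat` module. The function name `object_type` is illustrative. Note that `lstat` is used rather than `stat` so that a symbolic link is reported as a link instead of being followed.

```python
import os
import stat


def object_type(path):
    """Name the type of the file system object at `path`."""
    mode = os.lstat(path).st_mode  # lstat: do not follow a symbolic link
    tests = [
        (stat.S_ISREG, "regular file"),
        (stat.S_ISDIR, "directory"),
        (stat.S_ISLNK, "symbolic link"),
        (stat.S_ISBLK, "block device"),
        (stat.S_ISCHR, "character device"),
        (stat.S_ISFIFO, "FIFO"),
        (stat.S_ISSOCK, "socket"),
    ]
    for test, name in tests:
        if test(mode):
            return name
    return "unknown"


if __name__ == "__main__":
    for path in ("/", os.devnull):
        print(path, object_type(path))
```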

### Files

In keeping with the idea that everything is a file, Unix made all files just a sequence of 8-bit bytes[^eff][^simple]. Any
structure to the file (such as a JPEG image, an executable program, or a
database) is the responsibility of applications which read and write the
file. The file format is commonly indicated by a file extension like
.jpg or .xml, but this is just a convention followed by applications and
users. You can do things like rename file.pdf to file.jpg, which will
confuse some applications and users, but it will have no effect on the file
contents.

Data in a byte-sequence file is identified by the combination of the
file and its offset (in bytes) within the file. Unlike in-memory objects
in an application, where a reference (pointer) to a component of an
object may be passed around independently, a portion of a file cannot be
named without identifying the file it is contained in. Data in a file
can be created by a `write` which appends more data to the end of a
shorter file, and modified by over-writing in the middle of a file.
However, it can't be "moved" from one offset to another: if you use a
text editor to add or delete text in the middle of a file, the editor
must re-write the entire file (or at least from the modified part to the
end).
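
To make the last point concrete, here is a minimal sketch of what an editor has to do to insert bytes in the middle of a file: read everything after the insertion point, then write the new data followed by the old tail. It assumes the tail fits in memory, and the function name `insert_bytes` is illustrative.

```python
def insert_bytes(path, offset, data):
    """Insert `data` at byte `offset` by rewriting the rest of the file."""
    with open(path, "r+b") as f:
        f.seek(offset)
        tail = f.read()  # everything from the insertion point to the end
        f.seek(offset)
        f.write(data)    # overwrite starting at the insertion point
        f.write(tail)    # then put the old tail back, now at a later offset


if __name__ == "__main__":
    import os
    import tempfile

    fd, name = tempfile.mkstemp()
    os.close(fd)
    with open(name, "wb") as f:
        f.write(b"Hello world")
    insert_bytes(name, 5, b", big")
    with open(name, "rb") as f:
        print(f.read())
    os.unlink(name)
```

The cost is proportional to the amount of data after the insertion point, not to the amount of data inserted.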

```{figure} ../images/pb-figures/fs/filesys-tree2.png
---
width: 70% 
name: fs:tree-imp
---
Implementation view: hierarchical file system name space. Gray blocks are directories that contain entries with strings and corresponding inode numbers that identify the files.
```

(cont:fs:interface:dir)= 
### Directory

As shown in {numref}`fs:tree-imp`, a directory contains entries with strings that identify the objects contained in the directory, together with the inode number of each object, which can then be used to find out more information about it.   The same inode can be referenced by multiple directories, with potentially different names. Each directory entry that maps a name to an inode number is called a *hard link* to the inode, and another field in the inode structure records the number of hard links to that inode.  For example, the entry named ```..``` in any directory is a hard link to the parent directory. To illustrate this point, let's create a directory named `foo` in the `/tmp` directory and see how the link count for `foo` changes when we create a subdirectory `bar` inside `foo`.



In [None]:
bash.run("mkdir /tmp/foo; ls -al /tmp/foo; mkdir /tmp/foo/bar ; ls -al /tmp/foo")

As we can see, after creating the directory `/tmp/foo` the link count (the second column in the output above) for the ```.``` entry in `foo` is 2 before we create the subdirectory `bar` in it, and 3 after. The initial two links come from the parent directory's entry named `foo`, and the `.` entry in `foo` itself. After the creation of `bar`, there is a third link to the inode for `foo` due to the `..` entry in `bar`, which refers to its parent directory. 

The link count in an inode ensures that the object the inode refers to (file or directory) will not be deleted as long as there is at least one hard link to the inode. When you issue a delete command for a filename (e.g., `rm somefile`), the file system removes the entry for the filename from its directory and decrements the link count on the inode that entry referred to. Only when the last hard link is removed will the file object really be freed. 
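
The same bookkeeping can be seen for an ordinary file. The following sketch assumes a POSIX file system that supports hard links; it uses Python's `os.link`, `os.unlink`, and `os.stat`, which wrap the corresponding system calls. The file names and the function name `link_count_demo` are illustrative.

```python
import os
import tempfile


def link_count_demo(directory):
    """Create a file, add and remove a hard link, and record st_nlink."""
    original = os.path.join(directory, "somefile")
    alias = os.path.join(directory, "othername")
    with open(original, "w") as f:
        f.write("data\n")

    counts = [os.stat(original).st_nlink]  # one directory entry so far
    os.link(original, alias)               # second name for the same inode
    same_inode = os.stat(original).st_ino == os.stat(alias).st_ino
    counts.append(os.stat(original).st_nlink)

    os.unlink(original)                    # remove the first name only
    counts.append(os.stat(alias).st_nlink)
    with open(alias) as f:
        contents = f.read()                # data is still reachable via the alias
    os.unlink(alias)                       # last link gone: the file is freed
    return counts, same_inode, contents


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(link_count_demo(d))
```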

### Symbolic links

The third file system object is a
*symbolic link*. This holds a text string which is interpreted as a
"pointer" to another location in the file system. When the kernel is
searching for a file and encounters a symbolic link, it substitutes this
text into the current portion of the path, and continues the translation
process.

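Symbolic links can be very useful, for example to give a fixed name to whichever version of a program is current. Suppose the file system contains:

<pre>
directory: /usr/program-1.0.1
  file:      /usr/program-1.0.1/file.txt
  sym link:  /usr/program-current -> "program-1.0.1"
</pre>

If the OS is looking up the file `/usr/program-current/file.txt`, it
will: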

1. look up `usr` in the root directory, finding a pointer to the `/usr`
directory
2. look up `program-current` in `/usr`, finding the link with contents
`program-1.0.1`

3. look up `program-1.0.1` and use this result instead of the result from
looking up `program-current`, getting a pointer to the
`/usr/program-1.0.1` directory.

4. look up `file.txt` in this directory, and find it.
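
The steps above can be sketched as a user-level function that walks a path one component at a time and substitutes the text of any symbolic link it meets. This is only an illustration of the idea, not the kernel's algorithm: it handles absolute paths only, ignores permissions and mount points, and the limit `MAX_SYMLINKS` is an illustrative value (see `path_resolution(7)` for the real rules). The function name `resolve` is illustrative.

```python
import os

MAX_SYMLINKS = 40  # illustrative limit on links followed in one translation


def resolve(path):
    """Translate an absolute path, substituting symbolic link text as found."""
    if not os.path.isabs(path):
        raise ValueError("this sketch handles absolute paths only")
    pending = [c for c in path.split("/") if c]  # components still to look up
    current = "/"                                # directory reached so far
    followed = 0
    while pending:
        name = pending.pop(0)
        if name == ".":
            continue
        if name == "..":
            current = os.path.dirname(current)   # parent of a link-free path
            continue
        candidate = os.path.join(current, name)
        if os.path.islink(candidate):
            followed += 1
            if followed > MAX_SYMLINKS:
                raise OSError("too many levels of symbolic links")
            target = os.readlink(candidate)      # the text stored in the link
            if target.startswith("/"):
                current = "/"                    # absolute target: restart at root
            # Substitute the link text for the component just looked up.
            pending = [c for c in target.split("/") if c] + pending
        else:
            os.lstat(candidate)                  # raises if the entry is missing
            current = candidate
    return current


if __name__ == "__main__":
    print(resolve(os.path.realpath(os.getcwd())))
```

For existing paths the result should agree with `os.path.realpath`, which performs the same kind of translation.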


Note that unlike hard links, a symbolic link does not increase the link count in the inode that it refers to. As a result, a symbolic link may be "broken", i.e., the file it points to may not exist. This can happen if the link was
created in error, or if the file or directory it points to is deleted
later. In that case path translation will fail with an error:

In [None]:
bash.run('''ln -s /bad/file/name bad-link
ls -l bad-link
cat bad-link''')

<pre>
pjd-1:tmp pjd$ ln -s /bad/file/name bad-link
pjd-1:tmp pjd$ ls -l bad-link 
lrwxr-xr-x  1 pjd  wheel  22 Aug  2 00:07 bad-link -> /bad/file/name
pjd-1:tmp pjd$ cat bad-link
cat: bad-link: No such file or directory
</pre>

Finally, to prevent loops there is a limit on how many levels of
symbolic link may be traversed in a single path translation:

<pre>
pjd@pjd-fx:/tmp$ ln -s loopy loopy
pjd@pjd-fx:/tmp$ ls -l loopy
lrwxrwxrwx 1 pjd pjd 5 Aug 24 04:25 loopy -> loopy
pjd@pjd-fx:/tmp$ cat loopy
cat: loopy: Too many levels of symbolic links
pjd@pjd-fx:/tmp$ 
</pre>
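
Both failures can also be observed from a program as error codes. The sketch below assumes a POSIX system and that `/bad/file/name` does not exist; it uses Python's `os.symlink`, `os.readlink`, and `os.stat`, and reports the symbolic name of the `errno` value that path translation fails with. The function name `symlink_errors` is illustrative.

```python
import errno
import os
import tempfile


def symlink_errors(directory):
    """Create a broken link and a self-referencing link; report the errors."""
    bad = os.path.join(directory, "bad-link")
    loopy = os.path.join(directory, "loopy")
    os.symlink("/bad/file/name", bad)  # the target does not need to exist
    os.symlink("loopy", loopy)         # a link whose text names itself

    results = {
        "link text": os.readlink(bad),     # reading the link itself works
        "is a link": os.path.islink(bad),  # so does examining it with lstat
    }
    for label, path in (("broken", bad), ("loop", loopy)):
        try:
            os.stat(path)                  # stat follows the link and fails
        except OSError as e:
            results[label] = errno.errorcode[e.errno]
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(symlink_errors(d))
```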

In [None]:
bash.run('''ln -s loopy loopy
ls -l loopy
cat loopy''')

## File System Operations

There are several common types of file operations supported by Linux
(and with slight differences, Windows). They can be classified into
three main categories: open/close, read/write, and naming and
directories.

### Open/close

In order to access a file in Linux (or most operating
systems) you first need to open the file, passing the file name and
other parameters and receiving a *handle* (called a *file descriptor* in
Unix) which may be used for further operations. The corresponding system
calls are:

- `int desc = open(name, O_RDONLY)`: Verify that file `name` exists and may
be read, and then return a *descriptor* which may be used to refer to
that file when reading it.

- `int desc = open(name, O_WRONLY | flags, mode)`: Verify permissions and
open `name` for writing, creating it (or erasing existing contents) if
necessary as specified in `flags`. Returns a descriptor which may be
used for writing to that file.

- `close(desc)`: stop using this descriptor, and free any resources
allocated for it.
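
As a rough illustration of these calls, the sketch below uses Python's `os.open` and `os.close`, which are thin wrappers over the system calls and return plain integer descriptors. It assumes a POSIX system; `O_CREAT` and `O_TRUNC` are the flags that request creation and erasing of existing contents, and the function name `write_then_read` and the mode `0o644` are illustrative choices.

```python
import os


def write_then_read(path, data):
    """Write `data` to `path` through one descriptor, read it back via another."""
    # Open for writing; create the file if needed and erase any old contents.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)  # free the descriptor even if the write fails

    # Open for reading; this fails if the file does not exist.
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, len(data) + 1)
    finally:
        os.close(fd)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        print(write_then_read(os.path.join(d, "myfile"), b"hello\n"))
```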


Note that application programs rarely use the system calls themselves to
access files, but instead use higher-level frameworks, ranging from Unix
Standard I/O to high-level application frameworks.

### Read/Write operations

To get a file with data in it, you need to write it; to use that data you need to read it. To enable files to be accessed as a *stream* just like from a terminal or pipe, UNIX uses the concept of a *current position*
associated with a file descriptor. When you read 100 bytes (i.e. bytes 0
to 99) from a file, this pointer advances by 100 bytes, so that the next
read will start at byte 100, and similarly for write. When a file is
opened for reading the pointer starts at 0; when open for writing the
application writer can choose to start at the beginning (default) and
overwrite old data, or start at the end (`O_APPEND` flag) to append new
data to the file.

The read and write routines are the same ones we described before, but, for ease of reference, they are:

- `n = read(desc, buffer, max)`: Read `max` bytes (or fewer if the end of
the file is reached) into `buffer`, starting at the current position,
and returning the actual number of bytes `n` read; the current position
is then incremented by `n`.

- `n = write(desc, buffer, len)`: Write up to `len` bytes from `buffer` into
the file, starting at the current position, and returning the number of
bytes `n` actually written; the current position is then incremented by `n`.

- `lseek(desc, offset, flag)`: Set an open file's current position to that
specified by `offset` and `flag`, which specifies whether `offset` is
relative to the beginning, end, or current position in the file.
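
The way the current position moves can be seen in a short experiment. This is a minimal sketch assuming a POSIX system, using Python's `os.read`, `os.write`, and `os.lseek`; the function name `position_demo` is illustrative. Calling `lseek` with an offset of 0 relative to the current position is a common way to query the position without changing it.

```python
import os


def position_demo(path):
    """Show how read, write, and lseek move a descriptor's current position."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, b"0123456789")                 # position moves from 0 to 10
        after_write = os.lseek(fd, 0, os.SEEK_CUR)  # query without moving
        os.lseek(fd, 0, os.SEEK_SET)                # back to the beginning
        first = os.read(fd, 4)                      # bytes 0..3, position now 4
        second = os.read(fd, 4)                     # bytes 4..7, position now 8
        last = os.read(fd, 100)                     # fewer bytes than requested
        at_end = os.read(fd, 100)                   # empty result: end of file
        os.lseek(fd, -3, os.SEEK_END)               # three bytes before the end
        tail = os.read(fd, 3)
        return after_write, first, second, last, at_end, tail
    finally:
        os.close(fd)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        print(position_demo(os.path.join(d, "digits")))
```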

[^pread]: The `pread` and `pwrite` system calls, which are part of POSIX and
    available on Linux and most other UNIX-derived operating systems, allow
    specifying an offset for the read or write without changing the current position.


Note that in the basic Unix interface (unlike e.g. Windows) there is no
way to specify a particular location in a file to read or write
from[^pread]. Programs like databases (e.g. SQLite, MySQL) which need to
write to and read from arbitrary file locations must instead move the
current position by using `lseek` before a read or write. However, most
programs either read or write a file from the beginning to the end
(especially when written for an OS that makes it easier to do things
that way), and thus don't really need to perform seeks. Because most
Unix programs use simple "stream" input and output, these may be
re-directed so that the same program can---without any special
programming---read from or write to a terminal, a network connection, a
file, or a pipe from or to another program.
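
For programs that do need random access, the two approaches can be compared side by side. The sketch below assumes a POSIX system and a file made of fixed-size records; `RECORD_SIZE` and the function names are hypothetical. It reads a record once by seeking and then reading, and once with `os.pread`, which takes the offset as an argument and leaves the current position alone.

```python
import os

RECORD_SIZE = 16  # hypothetical fixed record size, in bytes


def read_record_lseek(fd, index):
    """Move the current position to the record, then read it."""
    os.lseek(fd, index * RECORD_SIZE, os.SEEK_SET)
    return os.read(fd, RECORD_SIZE)


def read_record_pread(fd, index):
    """Read the record at an explicit offset; the position is not changed."""
    return os.pread(fd, RECORD_SIZE, index * RECORD_SIZE)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        fd = os.open(os.path.join(d, "records"), os.O_RDWR | os.O_CREAT, 0o600)
        for i in range(4):
            os.write(fd, (b"record %d" % i).ljust(RECORD_SIZE))
        print(read_record_lseek(fd, 2))
        print(read_record_pread(fd, 3))
        os.close(fd)
```

Because the seek and the read are two separate calls, the first version is awkward when several threads share one descriptor; passing the offset explicitly avoids that.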

[^hardlink]: A hard link is an additional directory entry pointing to the same
    file, giving the file two (or more) names. Hard links are peculiar
    to Unix, and in modern systems have mostly been replaced with
    symbolic links (covered above); however Apple's Time Machine makes
    very good use of them: multiple backups can point to the same single
    copy of an un-modified file using hard links.

[^unlink]: Even when the link count goes to zero, the file might not be removed yet: on Unix, if you
    delete an open file it won't actually be removed until all open file
    handles are closed. In general, deleting open files is a problem:
    while Unix solves the problem by deferring the actual delete,
    Windows solves it by protecting open files so that they cannot be
    deleted.

(cont:fs:calls:naming)= 
### Naming and Directories

In Unix there is a difference between a name
(a directory entry) and the object (file or directory) that the name
points to. The naming and directories operations are:

- `rename(path1, path2)`: Rename an object (i.e., a file or directory) by
either changing the name in its directory entry (if the destination is
in the same directory) or creating a new entry and deleting the old one
(if moving into a new directory).

- `link(path1, path2)`: Add a *hard link* to a file[^hardlink].

- `unlink(path)`: Remove a directory entry and decrement the link count of the file it refers to; if the count goes to zero, delete the file[^unlink].

- `desc = opendir(path)`, `dirent = readdir(desc)`, where `dirent` = (name, type, length): This interface allows a program to enumerate names in a directory, and determine their type (i.e., file, directory, symbolic link, or special-purpose file).

- `stat(file, statbuf)`, `fstat(desc, statbuf)`:  returns information about the file such as  size, owner, permissions, modification time, etc. These are attributes of the file itself, residing in the inode and returned in the following structure.

```c
struct stat {
  dev_t     st_dev;         /* ID of device containing file */
  ino_t     st_ino;         /* Inode number */
  mode_t    st_mode;        /* File type and mode */
  nlink_t   st_nlink;       /* Number of hard links */
  uid_t     st_uid;         /* User ID of owner */
  gid_t     st_gid;         /* Group ID of owner */
  dev_t     st_rdev;        /* Device ID (if special file) */
  off_t     st_size;        /* Total size, in bytes */
  blksize_t st_blksize;     /* Block size for filesystem I/O */
  blkcnt_t  st_blocks;      /* Number of 512B blocks allocated */
  struct timespec st_atim;  /* Time of last access */
  struct timespec st_mtim;  /* Time of last modification */
  struct timespec st_ctim;  /* Time of last status change */
};
```

- `mkdir(path, mode)`, `rmdir(path)`: Create a new, empty directory, or delete an empty directory.
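
Several of these operations can be tried together. The following is a minimal sketch assuming a POSIX system; it uses Python's `os.mkdir`, `os.rename`, `os.scandir` (which is built on `opendir`/`readdir`), `os.stat`, `os.unlink`, and `os.rmdir`. The file and directory names, and the function name `naming_demo`, are illustrative.

```python
import os
import tempfile


def naming_demo(root):
    """Create, rename, enumerate, inspect, and remove names under `root`."""
    new_dir = os.path.join(root, "mynewdir")
    os.mkdir(new_dir, 0o755)                  # a new, empty directory

    old_name = os.path.join(root, "draft.txt")
    new_name = os.path.join(new_dir, "final.txt")
    with open(old_name, "w") as f:
        f.write("hello\n")
    inode_before = os.stat(old_name).st_ino

    os.rename(old_name, new_name)             # new entry created, old one removed
    info = os.stat(new_name)                  # attributes come from the inode

    with os.scandir(root) as entries:         # enumerate names and their types
        listing = sorted((e.name, e.is_dir()) for e in entries)

    os.unlink(new_name)                       # the directory must be empty
    os.rmdir(new_dir)                         # before it can be removed
    return {
        "same inode": info.st_ino == inode_before,
        "size": info.st_size,
        "links": info.st_nlink,
        "listing": listing,
        "removed": not os.path.exists(new_dir),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(naming_demo(d))
```

The inode number is unchanged by the rename, which reflects the distinction made above between a name and the object the name points to.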

## Some examples

Consider the following program in {numref}`filecopy_listing`, which copies one file to another.  After opening the input file (line 26) we stat the input file to get the permissions (i.e., the mode), create a file with that mode (line 37), and then go into a loop reading data from the input file into a buffer, and then writing the buffer to the output file. 

```{literalinclude} /src/fslec1/fcopy2.c
---
linenos:
emphasize-lines: 26, 37, 43, 44, 51
caption: fcopy2.c - An example program that copies one file to another using the file system interface.
name: filecopy_listing
---
```
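
For comparison, the same sequence of steps can be sketched with Python's `os` wrappers. This is an illustration of the steps described above, not a transcription of `fcopy2.c`: the buffer size `BUF_SIZE` and the function name `copy_file` are hypothetical, and the mode actually given to the new file may be reduced by the process's umask.

```python
import os
import stat

BUF_SIZE = 4096  # hypothetical buffer size


def copy_file(src, dst):
    """Copy `src` to `dst` using open, fstat, read, write, and close."""
    in_fd = os.open(src, os.O_RDONLY)
    try:
        # Take the permission bits from the input file's inode.
        mode = stat.S_IMODE(os.fstat(in_fd).st_mode)
        out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, mode)
        try:
            while True:
                buf = os.read(in_fd, BUF_SIZE)
                if not buf:                  # empty read: end of input
                    break
                while buf:                   # write may accept only part of buf
                    written = os.write(out_fd, buf)
                    buf = buf[written:]
        finally:
            os.close(out_fd)
    finally:
        os.close(in_fd)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        src, dst = os.path.join(d, "in"), os.path.join(d, "out")
        with open(src, "wb") as f:
            f.write(b"example data\n" * 1000)
        copy_file(src, dst)
        print(os.stat(src).st_size, os.stat(dst).st_size)
```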

In [None]:
display(Markdown('<font size="1.2rem">' + FileCodeBox(
    file=appdir + "/fcopy2.c", 
    lang="", 
    number=True,
    title="<b> An example program that copies one file to another: fcopy2.c </b>",
    h="100%", 
    w="100%"
) + '</font>'))
#TermShellCmd("[[ -a fcopy2 ]] && rm fcopy2; make fcopy2", cwd=appdir, prompt='', noposttext=True)
#TermShellCmd("./fcopy2 README.md rm2", cwd=appdir)

In [None]:
bash.runNoOutput("[[ -a fcopy2 ]] && rm fcopy2; make fcopy2")
bash.run("./fcopy2 README.md rm2")

To check that the copy is identical to the original, let's first use the `diff` program to compare them: 

In [None]:
bash.run("echo \"diff 1\" ;  diff README.md rm2")

We can see that there is no difference between the files, since `diff` produces no output. But just to be sure, let's append a string to the end of the copy (i.e., `echo "Hello class" >> rm2`) and use the `diff` program to compare them again. 


In [None]:
bash.run("echo \"Hello class\" >> rm2 ; echo \"diff 2\"; diff README.md rm2")

And here we see that the only difference is the line we just appended to the copy of the original `README.md` file. 