5. The UNIX Filesystem

Contents:
Files
Using File Permissions
The umask
Using Directory Permissions
SUID
Device Files
chown: Changing a File's Owner
chgrp: Changing a File's Group
Oddities and Dubious Ideas
Summary

The UNIX filesystem controls the way that information in files and directories is stored on disk and other forms of secondary storage. It controls which users can access what items and how. The filesystem is therefore one of the most basic tools for enforcing UNIX security on your system.

Information stored in the UNIX filesystem is arranged as a tree structure of directories and files. The tree is constructed from directories and subdirectories within a single directory, which is called the root. [1] Each directory, in turn, can contain other directories or entries such as files, pointers (symbolic links) to other parts of the filesystem, logical names that represent devices (such as /dev/tty), and many other types.[2]

[1] This is where the "root" user (superuser) name originates: the owner of the root of the filesystem.
[2] For example, the UNIX "process" filesystem in System V contains entries that represent processes that are currently executing.

This chapter explains, from the user's point of view, how the filesystem represents and protects information.

5.1 Files

From the simplest perspective, everything visible to the user in a UNIX system can be represented as a "file" in the filesystem - including processes and network connections. Almost all of these items are represented as "files" each having at least one name, an owner, access rights, and other attributes. This information is actually stored in the filesystem in an inode(index node), the basic filesystem entry. An inode stores everything about a filesystem entry except its name; the names are stored in directories and are associated with pointers to inodes.

5.1.1 Directories

One special kind of entry in the filesystem is the directory. A directory is nothing more than a simple list of names and inode numbers. A name can consist of any string of any characters with the exception of a "/" character and the "null" character (usually a zero byte).[3] There is a limit to the length of these strings, but it is usually quite long: 1024 or longer on many modern versions of UNIX; older AT&T versions limit names to 14 characters or less.

[3] Some versions of UNIX may further restrict the characters that can be used in filenames and directory names.

These strings are the names of files, directories, and the other objects stored in the filesystem. Each name can contain control characters, line feeds, and other characters. This can have some interesting implications for security, and we'll discuss those later in this and other chapters.

Associated with each name is a numeric pointer that is actually an index on disk for an inode. An inode contains information about an individual entry in the filesystem; these contents are described in the next section.

Nothing else is contained in the directory other than names and inode numbers. No protection information is stored there, nor owner names, nor data. The directory is a very simple relational database that maps names to inode numbers. No restriction on how many names can point to the same inode exists, either. A directory may have 2, 5, or 50 names that each have the same inode number. In like manner, several directories may have names that associate to the same inode. These are known as links [4] to the file. There is no way of telling which link was the first created, nor is there any reason to know: all the names are equal in what they access. This is often a confusing idea for beginning users as they try to understand the "real name" for a file.

[4] These are hard links or direct links. Some systems support a different form of pointer, known as a symbolic link, that behaves in a different way.

This also means that you don't actually delete a file with commands such as rm. Instead, you unlink the name - you sever the connection between the filename in a directory and the inode number. If another link still exists, the file will continue to exist on disk. After the last link is removed, and the file is closed, the kernel will reclaim the storage because there is no longer a method for a user to access it.

Every normal directory has two names always present. One entry is for "." (dot), and this is associated with the inode for the directory itself; it is self-referential. The second entry is for ".." (dot-dot), which points to the "parent" of this directory - the directory next closest to the root in the tree-structured filesystem. The exception is the root directory itself, named "/". In the root directory, ".." is also a link to "/".

5.1.2 Inodes

For each object in the filesystem, UNIX stores administrative information in a structure known as an inode. Inodes reside on disk and do not have names. Instead, they have indices (numbers) indicating their positions in the array of inodes.

Each inode generally contains:

The location of the item's contents on the disk, if any
The item's type (e.g., file, directory, symbolic link)
The item's size, in bytes, if applicable
The time the file's inode was last modified (the ctime)
The time the file's contents were last modified (the mtime)
The time the file was last accessed (the atime) for read ( ), exec ( ), etc
A reference count: the number of names the file has
The file's owner (a UID)
The file's group (a GID)
The file's mode bits (also called file permissions or permission bits)

The last three pieces of information, stored for each item, and coupled with UID/GID information about executing processes, are the fundamental data that UNIX uses for practically all operating system security.

Other information can also be stored in the inode, depending on the particular version of UNIX involved. Some systems may also have other nodes such as vnodes, cnodes, and so on. These are simply extensions to the inode concept to support foreign files, RAID[5] disks, or other special kinds of filesystems. We'll confine our discussion to inodes, as that abstraction contains most of the information we need.

[5] RAID means Redundant Array of Inexpensive Disks. It is a technique for combining many low-cost hard disks into a single unit that offers improved performance and reliability.

Figure 5.1 shows how information is stored in an inode.

Figure 5.1: Files and inodes

5.1.3 Current Directory and Paths

Every item in the filesystem with a name can be specified with a pathname. The word pathname is appropriate because a pathname represents the path to the entry from the root of the filesystem. By following this path, the system can find the inode of the referenced entry.

Pathnames can be absolute or relative. Absolute pathnames always start at the root, and thus always begin with a "/", representing the root directory. Thus, a pathname such as /homes/mortimer/bin/crashmerepresents a pathname to an item starting at the root directory.

A relative pathname always starts interpretation from the current directory of the process referencing the item. This concept implies that every process has associated with it a current directory. Each process inherits its current directory from a parent process after a fork (see Appendix C, UNIX Processes). The current directory is initialized at login from the sixth field of the user record in the /etc/passwd file: the home directory. The current directory is then updated every time the process performs a change-directory operation (chdir or cd). Relative pathnames also imply that the current directory is at the front of the given pathname. Thus, after executing the command cd /usr, the relative pathname lib/makekey would actually be referencing the pathname /usr/lib/makekey. Note that any pathname that doesn't start with a "/" must be relative.

5.1.4 Using the ls Command

You can use the ls command to list all of the files in a directory. For instance, to list all the files in your current directory, type:

% ls -a   
instructions  letter       notes 
invoice       more-stuff   stats 
%

You can get a more detailed listing by using the ls -lF command:

% ls -lF
total 161 -rw-r--r-- 1 sian    user          505 Feb  9 13:19 instructions 
-rw-r--r-- 1 sian    user         3159 Feb  9 13:14 invoice 
-rw-r--r-- 1 sian    user         6318 Feb  9 13:14 letter 
-rw------- 1 sian    user        15897 Feb  9 13:20 more-stuff 
-rw-r----- 1 sian    biochem      4320 Feb  9 13:20 notes 
-rwxr-xr-x 1 sian    user       122880 Feb  9 13:26 stats* 
%

The first line of output generated by the ls command ("total 161" in the example above) indicates the number of kilobytes taken up by the files in the directory.[6] Each of the other lines of output contains the fields, from left to right, as described in Table 5.1.

[6] Some older versions of UNIX reported this in 512-byte blocks rather than in kilobytes.

Table 5.1: ls Output
Field Contents	Meaning
-	The file's type; for regular files, this field is always a dash
rw-r--r--	The file's permissions
1	The number of "hard" links to the file; the number of "names" for the file
sian	The name of the file's owner
user	The name of the file's group
505	The file's size, in bytes
Feb 9 13:19	The file's modification time
instructions	The file's name

The ls -F option makes it easier for you to understand the listing by printing a special character after the filename to indicate what it is, as shown in Table 5.2.

Table 5.2: ls -F Tag Meanings
Symbol	Meaning
(blank)	Regular file or named pipe (FIFO[7])
*	Executable program or command file
/	Directory
=	Socket
@	Symbolic link

[7] A FIFO is a First-In, First-Out buffer, which is a special kind of named pipe.

Thus, in the directory shown earlier, the file stats is an executable program file; the rest of the files are regular text files.

The -g option to the ls command alters the output, depending on the version of UNIX being used.

If you are using the Berkeley-derived version of ls,[8] you must use thels -g option to display the file's group in addition to the file's owner:

[8] On Solaris systems, this program is named /usr/ucb/ls.

% ls -lFg 
total 161 
-rw-r--r-- 1 sian     user       505 Feb  9 13:19 instructions 
-rw-r--r-- 1 sian     user      3159 Feb  9 13:14 invoice 
-rw-r--r-- 1 sian     user      6318 Feb  9 13:14 letter 
-rw------- 1 sian     user     15897 Feb  9 13:20 more-stuff 
-rw-r----- 1 sian     biochem   4320 Feb  9 13:20 notes 
-rwxr-xr-x 1 sian     user    122880 Feb  9 13:26 stats* 
%

If you are using an AT&T-derived version of ls,[9] using the -g option causes the ls command to only display the file's group:

[9] On Solaris systems, this program is named /bin/ls.

% ls -lFg
total 161 
-rw-r--r-- 1 user        505 Feb  9 13:19 instructions 
-rw-r--r-- 1 user       3159 Feb  9 13:14 invoice 
-rw-r--r-- 1 user       6318 Feb  9 13:14 letter 
-rw------- 1 user      15897 Feb  9 13:20 more-stuff 
-rw-r----- 1 biochem    4320 Feb  9 13:20 notes 
-rwxr-xr-x 1 user     122880 Feb  9 13:26 stats* 
%

5.1.5 File Times

The times shown with the ls -l command are the modification times of the files ( mtime). You can obtain the time of last access (the atime) by providing the -u option (for example, by typing ls -lu). Both of these time values can be changed with a call to a system library routine.[10] Therefore, as the system administrator, you should be in the habit of checking the inode change time ( ctime) by providing the -c option; for example, ls -lc. You can't reset the ctime of a file under normal circumstances. It is updated by the operating system whenever any change is made to the inode for the file.

[10] utimes ( )

Because the inode changes when the file is modified, ctime reflects the time of last writing, protection change, or change of owner. An attacker may change the mtime or atime of a file, but the ctime will usually be correct.

Note that we said "usually." A clever attacker who gains superuser status can change the system clock and then touch the inode to force a misleading ctime on a file. Furthermore, an attacker can change the ctime by writing to the raw disk device and bypassing the operating system checks altogether. And if you are using Linux with the ext2 filesystem, an attacker can modify the inode contents directly using the debugfs command.

For this reason, if the superuser account on your system has been compromised, you should not assume that any of the three times stored with any file or directory are correct.

NOTE: Some programs will change the ctime on a file without actually changing the file itself. This can be misleading when you are looking for suspicious activity. The file command is one such offender. The discrepancy occurs because file opens the file for reading to determine its type, thus changing the atime on the file. By default, most versions of file then reset the atime to its original value, but in so doing change the ctime. Some security scanning programs use the file program within them (or employ similar functionality), and this may result in wide-scale changes in ctime unless they are run on a read-only version of the filesystem.

5.1.6 Understanding File Permissions

The file permissions on each line of the ls listing tell you what the file is and what kind of file access (that is, ability to read, write, or execute) is granted to various users on your system.

Here are two examples of file permissions:

-rw------- 
drwxr-xr-x

The first character of the file's mode field indicates the type of file described in Table 5.3.

Table 5.3: File Types
Contents	Meaning
-	Plain file
d	Directory
c	Character device (tty or printer)
b	Block device (usually disk or CD-ROM)
l	Symbolic link (BSD or V.4)
s	Socket (BSD or V.4)
= or p	FIFO (System V, Linux)

The next nine characters taken in groups of three indicate who on your computer can do what with the file. There are three kinds of permissions:

r: Permission to read
w: Permission to write
x: Permission to execute

Similarly, there are three classes of permissions:

owner: The file's owner
group: Users who are in the file's group
other: Everybody else on the system (except the superuser)

In the ls -l command, privileges are illustrated graphically (see Figure 5.2).

Figure 5.2: Basic permissions

5.1.7 File Permissions in Detail

The terms read, write, and execute have very specific meanings for files, as shown in Table 5.4.

Table 5.4: File Permissions
Character	Permission	Meaning
r	READ	Read access means exactly that: you can open a file with the `open()` system call and you can read its contents with read.
w	WRITE	Write access means that you can overwrite the file with a new one or modify its contents. It also means that you can use `write()` to make the file longer or `truncate()` or `ftruncate()` to make the file shorter.
x	EXECUTE	Execute access makes sense only for programs. If a file has its execute bits set, you can run it by typing its pathname (or by running it with one of the family of `exec()` system calls). How the program gets executed depends on the first two bytes of the file.
		The first two bytes of an executable file are assumed to be a magic number indicating the nature of the file. Some numbers mean that the file is a certain kind of machine code file. The special two-byte sequence "#!" means that it is an executable script of some kind. Anything with an unknown value is assumed to be a shell script and is executed accordingly.

File permissions apply to devices, named sockets, and FIFOS exactly as they do for regular files. If you have write access, you can write information to the file or other object; if you have read access, you can read from it; and if you don't have either access, you're out of luck.

File permissions do not apply to symbolic links. Whether or not you can read the file pointed to by a symbolic link depends on the file's permissions, not the link's. In fact, symbolic links are almost always created with a file permission of "rwxrwxrwx" (or mode 0777, as explained later in this chapter) and are ignored by the operating system.[11]

[11] Apparently, some vendors have found a use for the mode bits inside a symbolic link's inode. HP-UX 10.0 uses the sticky bit of symbolic links to indicate "transition links" - portability links to ease the transition from previous releases to the new SVR4 filesystem layout.

Note the following facts about file permissions:

You can have execute access without having read access. In such a case, you can run a program without reading it. This ability is useful in case you wish to hide the function of a program. Another use is to allow people to execute a program without letting them make a copy of the program (see the note later in this section).
If you have read access but not execute access, you can then make a copy of the file and run it for yourself. The copy, however, will be different in two important ways: it will have a different absolute pathname; and it will be owned by you, rather than by the original program's owner.
On some versions of UNIX (including Linux), an executable command script must have both its read and execute bits set to allow people to run it.

Most people think that file permissions are pretty basic stuff. Nevertheless, many UNIX systems have had security breaches because their file permissions are not properly set.

NOTE: Sun's Network Filesystem (NFS) servers allow a client to read any file that has either the read or the execute permission set. They do so because there is no difference, from the NFS server's point of view, between a request to read the contents of a file by a user who is using the read() system call and a request to execute the file by a user who is using the exec() system call. In both cases, the contents of the file need to be transferred from the NFS server to the NFS client. (For a detailed description, see Chapter 20, NFS.)