This post is the first in a series that will let you discover the inner workings of Unix (Linux, most of the time) by experimentation. You shouldn't take what a book or blog says as gospel; instead, you should verify that the system behaves the way you think it does.
Today, we will study a tricky part of Linux filesystems: file deletion. When you delete a file (for example with rm -f file.txt), its contents are not necessarily made inaccessible immediately. There are two reasons for that: hard links and open files.
Linux filesystems use the notion of hard links. When there are multiple hard links to a file, it means that there are multiple file names all pointing to the same data (an inode). This contrasts with a symbolic link, which is a file name pointing to another file name (which could be a real file, or another symlink).
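All files are actually hard links, although most of the time they are the only pointer to their data. In that case, we say that the hard link count is 1. In the ls -li output below, the first column is the inode number and the link count is the number right after the permissions.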
    $ echo "Hello, World!" > file.txt
    $ ls -li
    total 4
    320779 -rw-r--r-- 1 user group 14 2014-05-29 20:21 file.txt
If we now create a hard link to that file, we see that the link count increments to 2 for both files, and that they both point to the same inode:
    $ ln file.txt hardlink.txt
    $ ls -li
    total 8
    320779 -rw-r--r-- 2 user group 14 2014-05-29 20:21 file.txt
    320779 -rw-r--r-- 2 user group 14 2014-05-29 20:21 hardlink.txt
You'll notice that nothing tells you where the other hard links of a given file are, apart from the fact that they share the same inode number. So if you see a file with a link count greater than 1, you'll need to scan the whole filesystem to find its other hard links.
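Such a scan could look like the sketch below. It assumes GNU find, and the scratch directory and file names are made up for the illustration. The -samefile test matches every name that shares an inode with the given file.

```shell
# Work in a throwaway directory so the experiment touches nothing else.
workdir=$(mktemp -d)
echo "Hello, World!" > "$workdir/file.txt"
mkdir "$workdir/sub"
ln "$workdir/file.txt" "$workdir/sub/hardlink.txt"

# -samefile matches every name that refers to the same inode as file.txt.
# -xdev keeps find on one filesystem; hard links cannot cross filesystems.
names=$(find "$workdir" -xdev -samefile "$workdir/file.txt" | sort)
echo "$names"

rm -rf "$workdir"
```

On a real system you would start the search at the mount point of the filesystem rather than at a scratch directory, which is why the scan can be slow.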
Creating a symlink, on the other hand, does not change the link count. The symlink is recognizable because its mode starts with l (lower-case L), and because ls shows the link's target.
    $ ln -s file.txt symlink.txt
    $ ls -li
    total 8
    320779 -rw-r--r-- 2 user group 14 2014-05-29 20:21 file.txt
    320779 -rw-r--r-- 2 user group 14 2014-05-29 20:21 hardlink.txt
    320780 lrwxrwxrwx 1 user group 8 2014-05-29 20:32 symlink.txt -> file.txt
You'll notice that the symlink and other files have their own inode numbers, while our two hard links share the same one:

    $ touch test.txt
    $ ls -li
    total 8
    320779 -rw-r--r-- 2 user group 14 2014-05-29 20:21 file.txt
    320779 -rw-r--r-- 2 user group 14 2014-05-29 20:21 hardlink.txt
    320780 lrwxrwxrwx 1 user group 8 2014-05-29 20:32 symlink.txt -> file.txt
    320781 -rw-r--r-- 1 user group 0 2014-05-29 20:40 test.txt
If we now delete the original file.txt, we can see that the hard link is still a valid file, because the inode still exists. Note that its link count was decremented by 1:
    $ rm -f file.txt
    $ ls -li
    total 4
    320779 -rw-r--r-- 1 user group 14 2014-05-29 20:21 hardlink.txt
    320780 lrwxrwxrwx 1 user group 8 2014-05-29 20:32 symlink.txt -> file.txt
    320781 -rw-r--r-- 1 user group 0 2014-05-29 20:40 test.txt
    $ cat hardlink.txt
    Hello, World!
On the other hand, the symlink is dangling:
    $ cat symlink.txt
    cat: symlink.txt: No such file or directory
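If you would rather check these link counts in a script than read them off the ls output, the sketch below repeats the steps so far in a scratch directory. It assumes GNU stat, whose %h format prints the hard link count; the variable names are only illustrative.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "Hello, World!" > file.txt
links_initial=$(stat -c %h file.txt)      # a fresh file has a single link

ln file.txt hardlink.txt
ln -s file.txt symlink.txt
links_shared=$(stat -c %h hardlink.txt)   # two names; the symlink adds nothing

rm -f file.txt
links_after_rm=$(stat -c %h hardlink.txt) # one name left, data still there

# -L is true for a symlink; -e follows it and fails since the target is gone.
if [ -L symlink.txt ] && [ ! -e symlink.txt ]; then
  dangling=yes
else
  dangling=no
fi

echo "$links_initial $links_shared $links_after_rm dangling=$dangling"
```

This prints "1 2 1 dangling=yes", which matches the link counts in the listings above.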
If we now delete the last remaining hard link to inode 320779, its link count goes down to 0, and its contents are permanently deleted:
    $ rm hardlink.txt
    $ sudo find . -inum 320779
    $ # No results.
You can see this by creating a large file. Disk usage does not increase when a hard link is created, nor does it decrease when one is deleted, as long as another remains:

    $ df -h .
    Filesystem Size Used Avail Use% Mounted on
    /dev/sda1 4.0G 2.0G 2.0G 50% /
    $ dd if=/dev/zero bs=1M count=1024 > bigfile
    $ ls -lh
    total 1,1G
    -rw-r--r-- 1 user group 1,0G 2014-05-29 21:08 bigfile
    $ df -h .
    Filesystem Size Used Avail Use% Mounted on
    /dev/sda1 4.0G 3.0G 1.0G 75% /
    $ ln bigfile bighardlink
    $ ls -lh
    total 2,1G
    -rw-r--r-- 2 user group 1,0G 2014-05-29 21:08 bigfile
    -rw-r--r-- 2 user group 1,0G 2014-05-29 21:08 bighardlink
    $ df -h .
    Filesystem Size Used Avail Use% Mounted on
    /dev/sda1 4.0G 3.0G 1.0G 75% /
    $ rm bigfile
    $ ls -lh
    total 1,1G
    -rw-r--r-- 1 user group 1,0G 2014-05-29 21:08 bighardlink
    $ df -h .
    Filesystem Size Used Avail Use% Mounted on
    /dev/sda1 4.0G 3.0G 1.0G 75% /
    $ rm bighardlink
    $ ls -lh
    total 0
    $ df -h .
    Filesystem Size Used Avail Use% Mounted on
    /dev/sda1 4.0G 2.0G 2.0G 50% /
The other reason why the contents of a file may not be discarded immediately is that a program may have opened the file in order to read from it or write to it.
Let's try to see what happens then. First, we create a process that will write to a file, one line per second, for an hour.
    # First terminal window
    $ for i in $(seq 3600); do echo $i; sleep 1; done > counter.txt
Here's what happens: first, the shell opens the file counter.txt on file descriptor 1 (stdout, the standard output). Then, keeping the file open until the loop ends, it runs the loop. Each time the echo command is executed, it writes a number to the standard output, which is mapped to the already-opened file counter.txt.
If the shell re-opened the file at each iteration, it would overwrite its contents with a single line each time, and that is obviously not the case, as we can see if we open a second terminal and type:
    # Second terminal window
    $ cat counter.txt
    1
    2
    3
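To see the difference between opening the file once and reopening it at each iteration, here is a small sketch. The file names once.txt and each.txt are made up for the example, and the loop is shortened to three iterations without sleep.

```shell
workdir=$(mktemp -d)

# Redirection on the loop: the file is opened (and truncated) once,
# and every echo writes through the same descriptor.
for i in 1 2 3; do echo "$i"; done > "$workdir/once.txt"

# Redirection on each echo: the file is reopened and truncated
# at every iteration, so only the last line survives.
for i in 1 2 3; do echo "$i" > "$workdir/each.txt"; done

lines_once=$(wc -l < "$workdir/once.txt")
lines_each=$(wc -l < "$workdir/each.txt")
last_each=$(cat "$workdir/each.txt")
echo "once.txt has $lines_once lines, each.txt has $lines_each (last value: $last_each)"

rm -rf "$workdir"
```

The first file ends up with three lines, the second with only the final "3".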
Let's leave that command running, and from another terminal open the file for reading.
    # Second terminal window
    $ tail -f counter.txt
    1
    2
    3
    4
    5
    …
    # tail -f keeps on showing the end of the file as more lines are appended.
Now, what happens if we delete the file from a third terminal window?
    # Third terminal window
    $ ls -l counter.txt
    -rw-r--r-- 1 user group 288 2014-05-30 14:50 counter.txt
    $ rm counter.txt
    $ ls -l counter.txt
    ls: cannot access counter.txt: No such file or directory
We seem to have successfully deleted the file (under Microsoft Windows, we would likely be told that the file is locked by a running process), but the tail -f command happily continues to monitor the file for changes:
    # Second terminal window
    …
    10
    11
    12
    …
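The same behaviour can be reproduced in a single shell, without tail. This is a minimal sketch in which the current shell plays the role of the reader by holding descriptor 3 open; the scratch directory and the file contents are arbitrary.

```shell
workdir=$(mktemp -d)
echo "still here" > "$workdir/counter.txt"

# Open the file for reading on descriptor 3 of the current shell.
exec 3< "$workdir/counter.txt"

rm "$workdir/counter.txt"
if [ -e "$workdir/counter.txt" ]; then name_exists=yes; else name_exists=no; fi

# The name is gone, but the descriptor still reaches the data.
data=$(cat <&3)
exec 3<&-   # closing the last descriptor lets the kernel free the space

echo "name exists: $name_exists, data read: $data"
rm -rf "$workdir"
```

It prints "name exists: no, data read: still here".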
Could we run another command and make it use that deleted file? The answer is yes, through the proc filesystem. The proc filesystem is a virtual filesystem that exposes information about the running processes and the current system state (if you want to learn more, see also the sys filesystem, which shows other information). In that filesystem, we can see each process's file descriptors.
    # Third terminal window
    $ pidof tail
    3457
    $ ls -l /proc/$(pidof tail)/fd
    total 0
    lrwx------ 1 user group 64 2014-05-30 14:53 0 -> /dev/pts/3
    lrwx------ 1 user group 64 2014-05-30 14:53 1 -> /dev/pts/3
    lrwx------ 1 user group 64 2014-05-30 14:53 2 -> /dev/pts/3
    lr-x------ 1 user group 64 2014-05-30 14:53 3 -> /tmp/w/counter.txt (deleted)
    $ cat /proc/$(pidof tail)/fd/3
    1
    2
    3
    …
    19
    20
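Since the whole file is readable through /proc, it can also be copied back out under a new name. The sketch below is Linux-specific and, as an assumption of this example, uses the current shell (PID $$) as the process holding the file open instead of tail; recovered.txt is an arbitrary name.

```shell
workdir=$(mktemp -d)
printf '1\n2\n3\n' > "$workdir/counter.txt"
exec 3< "$workdir/counter.txt"
rm "$workdir/counter.txt"

# $$ is the PID of this shell; its descriptor 3 shows up under /proc.
target=$(readlink "/proc/$$/fd/3")

# Opening the /proc entry gives a fresh descriptor on the same inode,
# starting at offset 0, so the whole file can be copied back out.
cat "/proc/$$/fd/3" > "$workdir/recovered.txt"
recovered=$(cat "$workdir/recovered.txt")
exec 3<&-

echo "$target"
echo "$recovered"
```

The first line printed is the original path followed by " (deleted)", and the rest is the full content of the file.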
As we saw here, we can read the whole file (not just the new data constantly appended to it). We can even write to it. The shell running the for loop writes through a file descriptor at a given position in the file, and that position is only advanced when data is written through that same file descriptor. So if we append some text at the end of the file, the loop will overwrite it as it writes more numbers. We therefore have to write something long enough, and look at the result quickly.
    $ echo "very long string very long string" >> /proc/$(pidof tail)/fd/3; sleep 1; cat /proc/$(pidof tail)/fd/3
    1
    2
    …
    23
    24
    y long string very long string
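The overwriting can be reproduced deterministically, without racing against a loop. In this sketch (the file name offsets.txt and the strings are arbitrary), descriptor 3 stands in for the shell running the loop, and the >> redirection stands in for our append through /proc.

```shell
workdir=$(mktemp -d)
f="$workdir/offsets.txt"

exec 3> "$f"                 # descriptor 3 starts at offset 0
echo "abc" >&3               # writes 4 bytes; descriptor 3 is now at offset 4
echo "0123456789" >> "$f"    # a separate open in append mode adds 11 bytes
echo "de" >&3                # descriptor 3 is still at offset 4: overwrites "012"
exec 3>&-

result=$(cat "$f")
echo "$result"
rm -rf "$workdir"
```

The output is three lines, "abc", "de" and "3456789": the second write through descriptor 3 landed on top of the beginning of the appended text, just like the numbers ate the start of our very long string.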
The ls command tells us that the file /proc/3457/fd/3 is a symbolic link (its mode starts with l), but it is actually more than that: if we copy that symbolic link elsewhere, it loses its "link to a deleted file" super-powers:
    $ cp -a /proc/$(pidof tail)/fd/3 .
    $ ls -l 3
    lrwxrwxrwx 1 user group 28 2014-05-30 15:23 3 -> /tmp/w/counter.txt (deleted)
    $ cat 3
    cat: 3: No such file or directory
If we try to write through the copy, we can see that it is an ordinary symbolic link to a file named "/tmp/w/counter.txt (deleted)":
    $ ls -l "/tmp/w/counter.txt (deleted)"
    ls: cannot access /tmp/w/counter.txt (deleted): No such file or directory
    $ echo "very long string very long string" >> 3; cat /proc/$(pidof tail)/fd/3
    1
    2
    …
    50
    51
    # We didn't see the end of the very long string
    $ ls -l "/tmp/w/counter.txt (deleted)"
    $ cat "/tmp/w/counter.txt (deleted)"
    very long string very long string
    $ cat 3
    very long string very long string
As we saw, the very long string wasn't appended to the deleted file. Instead, a file named "/tmp/w/counter.txt (deleted)" was created, our string was written to it, and the symbolic link 3 now points to that file.
However, the symbolic link in /proc has retained its magical properties: it does not point to the file "/tmp/w/counter.txt (deleted)", but to the deleted /tmp/w/counter.txt:
    $ echo "very long string very long string" >> /proc/$(pidof tail)/fd/3; sleep 1; cat /proc/$(pidof tail)/fd/3
    1
    2
    …
    59
    60
    y long string very long string
Disk usage keeps growing as long as new data is appended to the deleted file, and the space is freed only once the last process holding the file open closes it (here, when dd is killed):
    $ df -h .
    Filesystem Size Used Avail Use% Mounted on
    /dev/sda1 4.0G 2.0G 2.0G 50% /
    $ dd if=/dev/zero bs=1M > bigfile &
    $ df -h .
    Filesystem Size Used Avail Use% Mounted on
    /dev/sda1 4.0G 2.3G 1.7G 58% /
    $ rm bigfile
    $ df -h .
    Filesystem Size Used Avail Use% Mounted on
    /dev/sda1 4.0G 2.6G 1.4G 65% /
    $ df -h .
    Filesystem Size Used Avail Use% Mounted on
    /dev/sda1 4.0G 2.9G 1.1G 73% /
    $ killall dd
    $ df -h .
    Filesystem Size Used Avail Use% Mounted on
    /dev/sda1 4.0G 2.0G 2.0G 50% /
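When df reports space that no visible file accounts for, deleted-but-open files are a likely suspect. One way to look for them, sketched here under the assumption of a Linux /proc and with a made-up file held.txt as bait, is to walk every descriptor you are allowed to inspect and keep those whose target ends in " (deleted)".

```shell
workdir=$(mktemp -d)
echo "data" > "$workdir/held.txt"
exec 3< "$workdir/held.txt"   # this shell now holds the file open
rm "$workdir/held.txt"

# Walk the descriptors of every process we may inspect and keep the
# ones whose link target ends in " (deleted)".
found=$(
  for fd in /proc/[0-9]*/fd/*; do
    target=$(readlink "$fd" 2>/dev/null) || continue
    case "$target" in
      *" (deleted)") echo "$fd -> $target" ;;
    esac
  done
)
exec 3<&-

echo "$found"
rm -rf "$workdir"
```

Without root privileges, only your own processes are visible, so the list may be incomplete. Entries that never had a name on disk, such as anonymous memory files, can show up as well.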
You could try to experiment with the interaction between open files and hard links. You will discover that once the name a file was opened under is deleted, /proc reports it as deleted for good, even if a second hard link to the same data still exists and you use it to recreate a hard link identical to the one you deleted:
    # In first terminal
    $ for i in $(seq 3600); do echo $i; sleep 1; done > counter.txt

    # In second terminal
    $ tail -f counter.txt

    # In third terminal
    $ ln counter.txt counter-hardlink.txt
    $ readlink /proc/$(pidof tail)/fd/3
    /tmp/w/counter.txt
    $ rm counter.txt
    $ readlink /proc/$(pidof tail)/fd/3
    /tmp/w/counter.txt (deleted)
    $ touch counter.txt
    $ readlink /proc/$(pidof tail)/fd/3
    /tmp/w/counter.txt (deleted)
    $ rm counter.txt
    $ ln counter-hardlink.txt counter.txt
    $ readlink /proc/$(pidof tail)/fd/3
    /tmp/w/counter.txt (deleted)
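The "(deleted)" label describes the name, not the data. As a sketch (assuming GNU stat, with the current shell holding the file open instead of tail), following the /proc entry with stat -L shows that the inode behind it is the one the surviving hard link points to, and that its link count is still 1.

```shell
workdir=$(mktemp -d)
echo "data" > "$workdir/counter.txt"
exec 3< "$workdir/counter.txt"
ln "$workdir/counter.txt" "$workdir/counter-hardlink.txt"
rm "$workdir/counter.txt"

target=$(readlink "/proc/$$/fd/3")       # path shown with " (deleted)"

# stat -L follows the /proc entry to the inode itself.
inode_fd=$(stat -L -c %i "/proc/$$/fd/3")
inode_link=$(stat -c %i "$workdir/counter-hardlink.txt")
links=$(stat -L -c %h "/proc/$$/fd/3")   # names still pointing at the inode
exec 3<&-

echo "$target"
echo "inode via fd: $inode_fd, inode via hard link: $inode_link, links: $links"
rm -rf "$workdir"
```

The two inode numbers are equal, which confirms that nothing has been lost: only the name the file was opened under is gone.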