From Here to There   

File Systems

A file system is simply a program to organize and access data on a storage medium: hard disk, DVD-ROM, or whatever. Examples are ext4 and NFS for Linux, VFAT and NTFS for Windows, ISO 9660 for CD and DVD optical disks, and so on.

In general, media can be partitioned for convenience, with each partition having a separate file system. On the medium, a typical organization is a hierarchical series of directories which organize data into discrete chunks called files. None of these methods are standardized; each file system type can only be read and written by its own program, usually installed as a device driver on a particular operating system.

Application programs do not access media; instead they send read and write requests to the appropriate file system driver.

Hard Links

In a Unix/Linux filesystem, we must distinguish between a file name, an inode, and the data.

File Name
File names are kept in tables of the form
filename1   inodeX
filename2   inodeY
.
.
.
Each such table is called a directory. When a file name is referenced, it is immediately converted by Linux to the corresponding inode number, and the file name is discarded.
Inode
An inode (index node) is referred to by a unique (within the file system) integer. Each inode contains the "metadata" about a file — the owner, permissions, dates, link count (see below), and most importantly, pointers to the physical location(s) of the data on the disk. It does not contain the file name. Each set of data on disk (a file or directory) corresponds to exactly one inode, and vice versa.
Data
The data resides in sectors or blocks of the medium. A large file will use many blocks, which do not have to be located consecutively. (This scattering is called disk fragmentation.) The data’s inode has a pointer to each and every data block, in logical order.

It is perfectly possible for multiple file names (perhaps in multiple directories) to point to the same inode; each name then refers to the same area on disk. These are usually called hard links. Since inodes are unique to a particular file system (i.e. partition), hard links can only exist within a single partition. Although it is theoretically possible, Linux file systems do not permit directories to be hard-linked to other directories.

Every time a filename is created, the link count in the inode is increased by one. Deleting a file by name is more properly called "unlinking"— the link count is decremented by one and the filename is removed from its directory. If and only if the link count becomes zero, the data area on disk is returned to the "available space pool" where it can be reused (rewritten) in the future.

There is no concept of the "first" or "second" linked file names; they are treated identically.
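
The link-count behavior described above can be watched in a scratch directory. This is only a sketch: the names original and alias are made up for the illustration, and mktemp -d is assumed to be available (it is on any ordinary Linux system).

```shell
# Work in a throwaway directory so nothing real is touched.
work=$(mktemp -d)

echo "some data" > "$work/original"
ln "$work/original" "$work/alias"        # second name for the same inode

# ls -i prints the inode number first; both names should show the same one.
inode_original=$(ls -i "$work/original" | awk '{print $1}')
inode_alias=$(ls -i "$work/alias" | awk '{print $1}')

# The second field of ls -l is the link count.
links_before=$(ls -l "$work/original" | awk '{print $2}')

rm "$work/original"                      # unlink one name; the data survives
content_after=$(cat "$work/alias")
links_after=$(ls -l "$work/alias" | awk '{print $2}')

echo "inodes: $inode_original $inode_alias"
echo "link count: $links_before -> $links_after"
echo "data still readable: $content_after"
```

Both inode numbers should match, and the data stays readable through the remaining name until that name is removed too.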



File Commands

rm

Delete a directory entry to a file. If the inode’s link count goes to zero, delete the data.

rm file1 [...]


rm -r

Recursively delete a directory, its subdirectories, and all files inside. This is the only way to delete non-empty directories.

rm -r dir1 [...]


rmdir

Delete an empty directory. (rmdir cannot remove a non-empty directory even using sudo; use rm -r instead.)

rmdir dir1 [...]
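
To see the difference between rmdir and rm -r, here is a small sketch in a scratch directory. The directory names full and empty are invented for the example.

```shell
work=$(mktemp -d)
mkdir -p "$work/full/sub"
echo x > "$work/full/sub/file"
mkdir "$work/empty"

rmdir "$work/empty"                      # succeeds: the directory is empty

# rmdir refuses a directory that still has something in it.
if rmdir "$work/full" 2>/dev/null; then
    rmdir_result="removed"
else
    rmdir_result="refused"
fi

rm -r "$work/full"                       # removes the whole tree
echo "rmdir on non-empty directory: $rmdir_result"
```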


touch

#create a new zero-length file
touch newfile [...]

#change the date of an existing file; the default is now
touch oldfile [...]
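
A quick sketch of both uses, run in a scratch directory. The -t option (not mentioned above) sets an explicit timestamp instead of "now"; the file names are invented for the example.

```shell
work=$(mktemp -d)

touch "$work/newfile"                    # creates an empty file
size=$(wc -c < "$work/newfile" | tr -d ' ')

# -t takes a timestamp in [[CC]YY]MMDDhhmm form; here 1 Jan 2013, 12:00.
touch -t 201301011200 "$work/newfile"
touch "$work/reference"                  # stamped with the current time

# -nt means "newer than" (supported by bash, dash and ksh).
if [ "$work/reference" -nt "$work/newfile" ]; then
    order="newfile is older than reference"
else
    order="newfile is not older than reference"
fi
echo "size: $size bytes; $order"
```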


cp

Copies and possibly renames files — or with the -a (archive) option, directories — from one location to another. Always physically writes data, creating a new directory entry and inode.

#copy and rename a file
cp file1 /somewhere/file2    

#copy files to a directory, which must exist
cp file1 file2 /somewhere/

#copy entire directory.  Create target if it does not exist.
cp -a dir1 /somewhere/dir2   


mv

Moves and possibly renames files or directories from one location to another.

#move to another directory
mv file1 /somewhere/file1   

#move and rename
mv file1 /somewhere/file2

#rename without moving
mv file1 file2   

#move into another file system
#(physically, a copy and delete)
mv file1 /anothersys/file2
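
The difference between cp and mv shows up in the inode numbers. A minimal sketch, assuming everything lives in one scratch directory (and therefore one file system); the file names are made up.

```shell
work=$(mktemp -d)
echo "payload" > "$work/file1"
inode_start=$(ls -i "$work/file1" | awk '{print $1}')

cp "$work/file1" "$work/copy"            # new inode, data written again
inode_copy=$(ls -i "$work/copy" | awk '{print $1}')

mv "$work/file1" "$work/file2"           # same file system: only the name changes
inode_moved=$(ls -i "$work/file2" | awk '{print $1}')

echo "original $inode_start, copy $inode_copy, after mv $inode_moved"
```

The copy gets a fresh inode, while the moved file keeps the one it started with.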


ln

Creates a hard link — another file name pointing to an existing inode; the data now has an extra name. Since it uses an existing inode, it only works within a file system. As mentioned above, when deleting a file with more than one link, the data doesn’t “really” go away until you have deleted all the directory entries (file names) for that inode.

#linkname must be in same file system
ln file1 linkname  


ln -s

Creates a symbolic (or soft) link to another file or directory. Symbolic links have no relation whatsoever to hard links! A symlink is actually a small special file containing a text string (a path), and when the link is referenced in a command the system substitutes that string for the link’s name (like a text editor’s “replace” function) while looking up the file. Therefore, the string does not have to be the name of something that actually exists. (If it isn’t, the command presumably will fail.) A soft link can be the name of a directory or file, in the same or a different file system, or even on a different computer, provided that computer’s file system is mounted locally (for example via NFS, described later).

#linkname1 points to file1 in directory someloc
ln -s /someloc/file1 linkname1 

#linkname2 points to a directory on another computer,
#mounted locally at /sony
ln -s /sony/dir2/ linkname2
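
A short sketch of what a symlink actually stores, including a link whose target does not exist. The names are invented, and readlink is assumed to be installed (it is part of the standard tools on Linux).

```shell
work=$(mktemp -d)
echo "real data" > "$work/file1"
ln -s "$work/file1" "$work/linkname1"

stored=$(readlink "$work/linkname1")     # the text string kept in the link
through=$(cat "$work/linkname1")         # following the link reads file1

# Creating a link to a nonexistent target is allowed.
ln -s "$work/no-such-file" "$work/dangling"
if [ -e "$work/dangling" ]; then         # -e follows the link
    dangling="resolves"
else
    dangling="broken"
fi
[ -L "$work/dangling" ] && is_link="yes" # -L tests the link itself

echo "link text: $stored"
echo "read through link: $through"
echo "dangling link is $dangling (is a link: $is_link)"
```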


rsync

Efficiently copies files and directories — a file is copied only if it does not exist on the target or if it is newer than the target version, in which case only the new sections of the file are physically copied. The target can be anywhere. Rsync has many options; the most common are -a -u -v, standing for archive, update, verbose.

#copy contents of dir1 into dir2
#(dir1/ is roughly the equivalent of dir1/*)
rsync -auv dir1/ dir2   

#copy dir1 itself;
#create new directory dir2/dir1 if necessary
rsync -auv dir1  dir2   

#copy to remote site; don't copy backup files.
rsync -auv ~/words/ --exclude="*.bak" dierdorf@prismnet.com:public-web

Rsync is a very fast way of creating an “additive” backup; files you subsequently delete are still there in the backup. If you want “old” files to be deleted in the destination, use the --del option.
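
The trailing-slash rule and the additive behavior can both be tried locally. A sketch only, with invented directory names (flat, nested); it skips itself if rsync is not installed.

```shell
if command -v rsync >/dev/null 2>&1; then
    work=$(mktemp -d)
    mkdir -p "$work/dir1" "$work/flat" "$work/nested"
    echo "draft" > "$work/dir1/note.txt"

    rsync -a "$work/dir1/" "$work/flat"      # trailing slash: copy the contents
    rsync -a "$work/dir1"  "$work/nested"    # no slash: creates nested/dir1

    # Delete the source file, then sync again without and with --del.
    rm "$work/dir1/note.txt"
    rsync -a "$work/dir1/" "$work/flat"
    [ -f "$work/flat/note.txt" ] && additive="kept"

    rsync -a --del "$work/dir1/" "$work/flat"
    [ -f "$work/flat/note.txt" ] || mirrored="deleted"

    echo "without --del: $additive; with --del: $mirrored"
else
    echo "rsync is not installed; skipping"
fi
```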


unison

Synchronizes two directories. Copies files in both directions if necessary to make the directories identical (older versions of files are replaced by newer ones, wherever they are). Keeps a database of file names and dates, so it can recognize where both versions of a file have changed since the last sync or where a file has been deleted in one place but not the other. If both versions of a file have changed, you can compare them to decide which one you want.

# command line
unison     firstdir seconddir

#graphical -- use if possible
unison-gtk firstdir seconddir 

For example, I keep the Linux SIG directory synchronized between my laptop and desktop computer, using the command

ssh gw -X "unison-gtk -times /sony/lbsig ~/lbsig"

In this command, ssh gw -X says to run the command on the “GW” machine in graphics mode, -times says to preserve file modification times, and /sony/lbsig and ~/lbsig are the directories as seen from GW!

Unison is not installed by default — use

sudo apt-get install unison unison-gtk 
to retrieve it from the repository.

BTW, the Unison program is available for Windows and OSX, so it is possible to sync between a Win and Linux machine, for example.


dd

“Low-level” (bit-for-bit) copy. Dangerous, because it can wipe out a file system, but very useful, because it can back up file systems, duplicate hard drives, etc. Be very sure not to confuse if= (input file) and of= (output file)!!! Copies all bytes of if into of unless count= is used.

# simple copy
dd if=source       of=dest

#copies zero bytes into dest
#Note -- since /dev/zero is infinitely long, this will create a
#dest file using all remaining space in the partition!
dd if=/dev/zero    of=dest
                  
# writes 1024 random bytes (2 blocks of 512)
dd if=/dev/urandom of=dest   count=2
         
#sets block size to 1 byte, writes 400 zero bytes
#(dest is truncated first unless conv=notrunc is added)
dd if=/dev/zero    of=dest   count=400 bs=1
  
#wipes a partition
sudo dd if=/dev/zero    of=/dev/sdb2
             
#wipes an entire hard drive
sudo dd if=/dev/zero    of=/dev/sdb
              
#duplicates one drive onto another (same size)
sudo dd if=/dev/sdb     of=/dev/sdc   
           
#creates an ISO image of a CD or DVD
dd if=/dev/cdrom   of=mystuff.iso
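Note that sudo is necessary if you don’t have write permission on the destination. Other than that, there’s no warning whatsoever if you are about to destroy your own system. Consider the command
sudo dd if=/dev/zero of=/dev/sda1
which, if /dev/sda1 happens to hold your current Linux partition, will silently wipe it out. (The target has to be the device name; of=/ simply fails because / is a directory.)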
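
The count= and bs= arithmetic can be tried safely on ordinary files. This is a sketch in a scratch directory with invented file names; conv=notrunc is an extra dd option, not used above, that keeps the existing file from being truncated.

```shell
work=$(mktemp -d)

# 2 blocks of the default 512 bytes = 1024 bytes of zeros.
dd if=/dev/zero of="$work/dest" count=2 2>/dev/null
size_full=$(wc -c < "$work/dest" | tr -d ' ')

# bs=1 count=400 writes 400 bytes; the old contents are truncated away first.
dd if=/dev/zero of="$work/dest" bs=1 count=400 2>/dev/null
size_truncated=$(wc -c < "$work/dest" | tr -d ' ')

# With conv=notrunc the file keeps its length; only the first 4 bytes change.
printf 'ABCDEFGHIJ' > "$work/keep"
dd if=/dev/zero of="$work/keep" bs=1 count=4 conv=notrunc 2>/dev/null
size_keep=$(wc -c < "$work/keep" | tr -d ' ')
tail_keep=$(tail -c 6 "$work/keep")

echo "sizes: $size_full, $size_truncated, $size_keep; untouched tail: $tail_keep"
```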

scp

Secure copy to or from a remote site — the data is encrypted during the transfer. Part of the ssh (Secure Shell) suite.

#transfer a file
scp  myfile dierdorf@prismnet.com:privatedir

#transfer and rename
scp  dierdorf@prismnet.com:public-web/index.html  index.html.bak
Note that rsync and unison both use ssh to transfer files between computers.








Sharing Data Across a Network    

Sharing Linux-to-Linux — NFS

The most common way to mount a “foreign” Linux file system on your computer is using NFS (Network File System).

  1. Make sure nfs-kernel-server and nfs-common packages are installed. (Ubuntu doesn’t install them by default.)
    sudo apt-get install nfs-common nfs-kernel-server
  2. Edit /etc/hosts.allow to have a line such as:
    ALL: othercomp 192.168.0.111 dell sony ...
    listing computers that you are willing to allow access to THIS computer. You can use either an IP address or a DNS hostname.
  3. Edit /etc/exports to have lines like these for each file system you’re willing to export and the computer it can be exported to:
    /       dell(rw,sync)
    /extra  dell(rw,sync)
    /images dell(rw,sync)
    #
    /       sony(rw,sync)
    /extra  sony(rw,sync)
    /images sony(rw,sync)
    
    In this case, I’m allowing three different file systems (root, extra, and images) to be exported to computers sony and dell.

    Note that if you are on mycomp and want to access othercomp, you have to change othercomp’s /etc/hosts.allow and /etc/exports files to contain “mycomp”.

  4. Reboot all computers to have everything take effect.
  5. At this point you should be able to mount the other computer’s directories:
    sudo mount -t nfs othercomp:/home/dierdorf mymountpoint
    

    It’s easier to put entries in your /etc/fstab file:

    dell:/home/dierdorf    /dell   nfs  relatime,users,noauto  0 0
    sony:/home/dierdorf    /sony   nfs  relatime,users,noauto  0 0
    

    where /dell and /sony are the mountpoints. The noauto option means they will not be auto-mounted at boot, since they might not be available on the network at the time. When you decide you need one, just say:

    mount /sony

    or whatever. (You don’t need to use sudo because of that users option.)

Sharing Linux-to-Linux—SSHFS

Although I usually use NFS, there is a second way of mounting one Linux system’s file systems on another Linux machine: the Secure Shell File System, or sshfs. From the user’s point of view it works pretty much the same way as NFS, except the mount and umount commands are different. Assuming you have sshfs installed (it isn’t by default), then the commands are:


Mount
sshfs yourfilesys mymountpoint

Unmount
fusermount -u mymountpoint

Here yourfilesys takes the form user@othercomp:/some/dir. SSHFS does not read /etc/exports; the other computer only needs a running ssh server and an account you can log in to.

By the way, since Linux and OSX are both “Unix-like” operating systems, either NFS or SSHFS can be used to link file systems between Linux and Apple systems.







Linux and Windows   

Using Samba

I’m going to be working with a home network with five machines. Their host names are:

gw    my desktop  Linux 12.10
dell  old laptop  Linux 12.10
sony  new laptop  Linux 12.10
becky desktop     Linux 12.10
photo desktop     Windows 7
I use NFS connections for Linux-to-Linux, but interconnecting with Windows requires interfacing with the Windows Workgroup facility formerly called SMB and now called (at least by Microsoft) CIFS. The Linux (or OS-X) tool for this is called Samba.

Step by Step