Handling Files   

File Systems (from Feb 2012)

A file system is simply a program to organize and access data on a storage medium — hard disk, DVD-ROM, or whatever. Examples are ext4 and nfs for Linux, VFAT and NTFS for Windows, ISO 9660 for CD and DVD optical disks, and so on.

In general, media can be partitioned for convenience, with each partition having a separate file system. On the medium, a typical organization is a hierarchical series of directories which organize data into discrete chunks called files. None of these methods are standardized; each file system type can only be read and written by its own program, usually installed as a device driver on a particular operating system.

Application programs do not access media; instead they send read and write requests to the appropriate file system driver.

Hard Links

In a Unix/Linux filesystem, we must distinguish between a file name, an inode, and the data.

File Name
File names are kept in tables of the form
filename1   inodeX
filename2   inodeY
.
.
.
Each such table is called a directory. When a file name is referenced, it is immediately converted by Linux to the corresponding inode number, and the file name is discarded.
Inode
An inode (index node) is referred to by a unique (within the file system) integer. Each inode contains the "metadata" about a file — the owner, permissions, dates, link count (see below), and most importantly, pointers to the physical location(s) of the data on the disk. It does not contain the file name. Each set of data on disk (a file or directory) corresponds to exactly one inode, and vice versa.
Data
The data resides in sectors or blocks of the medium. A large file will use many blocks, which do not have to be located consecutively. (This scattering is called disk fragmentation.) The data’s inode has a pointer to each and every data block, in logical order.

It is perfectly possible for multiple file names (perhaps in multiple directories) to point to the same inode; each name then refers to the same area on disk. These are usually called hard links. Since inodes are unique to a particular file system (i.e. partition), hard links can only exist within a single partition. Although it is theoretically possible, Linux file systems do not permit directories to be hard-linked to other directories.

Every time a filename is created, the link count in the inode is increased by one. Deleting a file by name is more properly called "unlinking": the filename is removed from its directory and the link count is decremented by one. Once the link count becomes zero (and no running program still has the file open), the data area on disk is returned to the "available space pool" where it can be reused (rewritten) in the future.

There is no concept of the "first" or "second" linked file names; they are treated identically.
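
The bookkeeping described above can be watched from the shell. The following is only a minimal sketch, assuming GNU stat (its -c option) and a scratch directory made with mktemp; the file names are placeholders.

```shell
#!/bin/sh
# Scratch directory so no real files are touched (names are placeholders).
demo=$(mktemp -d)

echo "hello" > "$demo/file1"
ln "$demo/file1" "$demo/file2"         # a second name for the same inode

inode1=$(stat -c %i "$demo/file1")     # inode number behind each name
inode2=$(stat -c %i "$demo/file2")
links=$(stat -c %h "$demo/file1")      # link count stored in the inode
echo "inodes: $inode1 $inode2   link count: $links"

rm "$demo/file1"                       # unlink one of the two names
links_after=$(stat -c %h "$demo/file2")
content=$(cat "$demo/file2")           # the data is still reachable via file2
echo "link count after rm: $links_after   content: $content"

rm -rf "$demo"                         # clean up the scratch directory
```

Both names should report the same inode number, and the link count should drop from 2 to 1 after the rm while the data stays readable through the remaining name.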



File Commands

rm

Delete a directory entry to a file. If the inode’s link count goes to zero, delete the data.

rm file1 [...]


rm -r

Recursively delete a directory, its subdirectories, and all files inside.

rm -r dir1 [...]


rmdir

Delete an empty directory. (rmdir cannot remove a non-empty directory even using sudo; use rm -r instead.)

rmdir dir1 [...]
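
The difference between rmdir and rm -r can be tried safely in a scratch directory. This is a small sketch, assuming mktemp is available; the directory and file names are made up for the demonstration.

```shell
#!/bin/sh
# Scratch area; dir1, sub and file1 are placeholder names.
demo=$(mktemp -d)
mkdir -p "$demo/dir1/sub"
touch "$demo/dir1/sub/file1"

# rmdir refuses because dir1 is not empty.
if rmdir "$demo/dir1" 2>/dev/null; then
    rmdir_result=removed
else
    rmdir_result=refused
fi

# rm -r removes dir1 together with everything inside it.
rm -r "$demo/dir1"
if [ -e "$demo/dir1" ]; then
    after_rm=present
else
    after_rm=gone
fi
echo "rmdir: $rmdir_result   after rm -r: $after_rm"

rm -rf "$demo"                         # clean up the scratch directory
```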


touch

Create a new zero-length file:
touch newfile [...]
Change the timestamp of an existing file (the default is the current time):
touch oldfile [...]
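
Both uses of touch can be checked with a short sketch like the one below. It assumes the GNU versions of touch, stat and date; the -d option (set an explicit date instead of "now") and the chosen date are additions for illustration, not something the text above prescribes.

```shell
#!/bin/sh
# Scratch directory; newfile is a placeholder name.
demo=$(mktemp -d)

touch "$demo/newfile"                      # file does not exist yet: create it
size=$(stat -c %s "$demo/newfile")         # size in bytes of the new file

# Set the modification time explicitly instead of using the current time.
touch -d '2012-02-01 12:00:00' "$demo/newfile"
mtime=$(date -r "$demo/newfile" +%Y-%m-%d) # read the modification date back
echo "size: $size   modified: $mtime"

rm -rf "$demo"                             # clean up the scratch directory
```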


cp

Copies and possibly renames files — or with the -a (archive) option, directories — from one location to another. Always physically writes data, creating a new directory entry and inode.

#copy and rename a file
cp file1 /somewhere/file2    

#copy files to a directory, which must exist
cp file1 file2 /somewhere/

#copy entire directory.  Create target if it does not exist.
cp -a dir1 /somewhere/dir2   


mv

Moves and possibly renames files or directories from one location to another.

#move to another directory
mv file1 /somewhere/file1   

#move and rename
mv file1 /somewhere/file2

#rename without moving
mv file1 file2   

#move into another file system (physically, a copy followed by a delete)
mv file1 /anothersys/file2           
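
One way to see the difference between cp and mv is to compare inode numbers. The sketch below assumes GNU stat and keeps everything inside one scratch directory, so the mv is a pure rename within a single file system; the file names are placeholders.

```shell
#!/bin/sh
# Scratch directory; orig, copy and renamed are placeholder names.
demo=$(mktemp -d)
echo "data" > "$demo/orig"
orig_inode=$(stat -c %i "$demo/orig")

cp "$demo/orig" "$demo/copy"               # cp writes the data again
copy_inode=$(stat -c %i "$demo/copy")      # the copy gets its own inode

mv "$demo/orig" "$demo/renamed"            # mv inside one file system
moved_inode=$(stat -c %i "$demo/renamed")  # only the directory entry changed
echo "orig: $orig_inode   copy: $copy_inode   moved: $moved_inode"

rm -rf "$demo"                             # clean up the scratch directory
```

The copy should show a different inode number from the original, while the moved file keeps the original's inode. A move into another file system cannot keep the inode, which is why it becomes a copy followed by a delete.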


ln

Creates a hard link — another file name pointing to an existing inode; the data now has an extra name. Since it uses an existing inode, it only works within a file system. As mentioned above, when deleting a file with more than one link, the data doesn’t “really” go away until you have deleted all the directory entries (file names) for that inode.

#linkname must be in same file system
ln file1 linkname  


ln -s

Creates a symbolic (or soft) link to another file or directory. Symbolic links have no relation whatsoever to hard links! A symlink is a small special file containing a text string (a path), and when the link is referenced in a command, the system substitutes that path for the link’s name (like a text editor’s “replace” function). Therefore, the string does not have to be the name of something that actually exists. (If it isn’t, the command presumably will fail.) A soft link can point to a directory or file, in the same or a different file system; it can even reach another computer, provided that computer’s files are mounted locally (for example over NFS).

#linkname1 points to file1 in directory someloc
ln -s /someloc/file1 linkname1 

#linkname2 points to a directory on another computer
#(/mnt/somewhere is a hypothetical mount point for the remote files, e.g. via NFS)
ln -s /mnt/somewhere/someloc/dir2/ linkname2
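
That a symlink merely stores a path, which may or may not lead anywhere, can be shown with a short sketch. It assumes readlink is available and uses a scratch directory; good, dangling and missing are names invented for the demonstration.

```shell
#!/bin/sh
# Scratch directory; all names below are placeholders.
demo=$(mktemp -d)
echo "real data" > "$demo/file1"

ln -s "$demo/file1"   "$demo/good"         # points at an existing file
ln -s "$demo/missing" "$demo/dangling"     # allowed: the target need not exist

target=$(readlink "$demo/good")            # the text string stored in the link
good_content=$(cat "$demo/good")           # following the link reaches the data

# -L asks "is this a symlink?"; -e follows the link and asks "does it lead anywhere?"
if [ -L "$demo/dangling" ]; then is_link=yes; else is_link=no; fi
if [ -e "$demo/dangling" ]; then resolves=yes; else resolves=no; fi
echo "stored path: $target"
echo "dangling is a link: $is_link   resolves: $resolves"

rm -rf "$demo"                             # clean up the scratch directory
```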


rsync

Efficiently copies files and directories. With the -u option used below, a file is copied only if it does not exist on the target or if it is newer than the target version; when copying over a network, only the changed sections of the file are actually transferred. The target can be anywhere. Rsync has many options; the most common are -auv, standing for archive, update, verbose.

#copy contents of dir1 into dir2
rsync -auv dir1/ dir2   

#create new directory dir2/dir1 if necessary
rsync -auv dir1  dir2   

#copy to remote site; don't copy backup files.
rsync -auv ~/words/ --exclude="*.bak" dierdorf@prismnet.com:public-web

Rsync is a very fast way of creating an “additive” backup; files you subsequently delete are still there in the backup.



unison

Synchronizes two directories. Copies files in both directions if necessary to make the directories identical (older versions of files are replaced by newer ones, wherever they are). Keeps a database of file names and dates, so it can recognize where both versions of a file have changed since the last sync or where a file has been deleted in one place but not the other.

# command line
unison     firstdir seconddir

#graphical -- use if possible
unison-gtk firstdir seconddir 

For example, I keep the Linux SIG directory synchronized between my laptop and desktop computer.

Unison is not installed by default — use

sudo apt-get install unison unison-gtk
to retrieve it from the repository.

dd

“Low-level” (bit-for-bit) copy. Dangerous, because it can wipe out a file system, but very useful, because it can back up file systems, duplicate hard drives, etc. Be very sure not to confuse if= (input file) and of= (output file)!!! Copies all bytes of if into of unless count= is used.

# simple copy
dd if=source       of=dest

#fills dest with zero-valued (null) bytes
#Note -- since /dev/zero is infinitely long, this will create a
#dest file using all remaining space in the partition!
dd if=/dev/zero    of=dest
                  
# writes 1024 random bytes (2 blocks of 512)
dd if=/dev/urandom of=dest   count=2
         
#sets block size to 1 byte, writes 400 zero bytes
#(a regular file dest is truncated to 400 bytes; add conv=notrunc to zero only its first 400 bytes)
dd if=/dev/zero    of=dest   count=400 bs=1
  
#wipes a partition
dd if=/dev/zero    of=/dev/sdb2
             
#wipes an entire hard drive
dd if=/dev/zero    of=/dev/sdb
              
#duplicates one drive onto another (same size)
dd if=/dev/sdb     of=/dev/sdc   
           
#creates an ISO image of a CD or DVD
dd if=/dev/cdrom   of=mystuff.iso
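
Because mistakes with dd on real devices are costly, it is worth trying the options on ordinary files first. The following sketch assumes GNU dd and stat and works only inside a scratch directory; the file names are placeholders, and conv=notrunc is a standard dd option that does not appear in the examples above.

```shell
#!/bin/sh
# Scratch directory; zeros, keep and trunc are placeholder names.
demo=$(mktemp -d)

# bs=512 count=2 writes exactly 2 blocks of 512 bytes.
dd if=/dev/zero of="$demo/zeros" bs=512 count=2 2>/dev/null
zeros_size=$(stat -c %s "$demo/zeros")

# Two identical 8-byte files to overwrite in different ways.
printf 'ABCDEFGH' > "$demo/keep"
printf 'ABCDEFGH' > "$demo/trunc"

# With conv=notrunc only the first 4 bytes are replaced; the rest is kept.
dd if=/dev/zero of="$demo/keep"  bs=1 count=4 conv=notrunc 2>/dev/null
# Without it, dd truncates the output file before writing.
dd if=/dev/zero of="$demo/trunc" bs=1 count=4 2>/dev/null

keep_size=$(stat -c %s "$demo/keep")
trunc_size=$(stat -c %s "$demo/trunc")
echo "zeros: $zeros_size   keep: $keep_size   trunc: $trunc_size"

rm -rf "$demo"                             # clean up the scratch directory
```

The sizes should come out as 1024, 8 and 4 bytes respectively, which shows that dd on a regular output file replaces the whole file unless conv=notrunc is given.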


scp

Secure copy to or from a remote site — the data is encrypted during the transfer. Part of the ssh suite.

#transfer a file
scp  myfile dierdorf@prismnet.com:privatedir

#transfer and rename
scp  dierdorf@prismnet.com:public-web/index.html  index.html.bak

Last modified: Tue Jul 10 17:56:41 CDT 2012