CS 355 - Systems Programming:
Unix/Linux with C

How to Copy a File

Reference: Molay, Understanding Unix/Linux Programming, Chapter 2.6-2.8

The cp command

Our standard approach to studying a Unix system command:

  1. What does cp do?
  2. How does cp work?
  3. Can I write cp?

What does cp do?

Typical usage of cp to copy a file:

cp source_file target_file

Read the manual:

$ man cp
CP(1)           BSD General Commands Manual          CP(1)

NAME
     cp -- copy files

SYNOPSIS
     cp [-R [-H | -L | -P]] [-fi | -n] [-apvX] source_file target_file
     cp [-R [-H | -L | -P]] [-fi | -n] [-apvX] source_file ...
        target_directory

DESCRIPTION
     In the first synopsis form, the cp utility copies the contents of the
     source_file to the target_file. In the second synopsis form, the con-
     tents of each named source_file is copied to the destination
     target_directory. The names of the files themselves are not changed. If
     cp detects an attempt to copy a file to itself, the copy will fail.

. . .
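
The man page says cp refuses to copy a file onto itself but does not say how it detects this. One common technique, sketched here as an assumption rather than as cp's actual implementation, is to compare the device and inode numbers that stat reports for the two names. The function name same_file is hypothetical.

```c
#include <sys/stat.h>

/* Hypothetical helper, not part of cp or any library.
 * Returns 1 if both paths name the same file, 0 if they name
 * different files, and -1 if either path cannot be examined. */
int same_file(const char *path_a, const char *path_b)
{
    struct stat a, b;

    if (stat(path_a, &a) == -1 || stat(path_b, &b) == -1)
        return -1;

    /* Two names refer to the same file when they share both the
     * device number and the inode number. */
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

A copy program could call same_file(av[1], av[2]) before opening the destination and stop if the result is 1. A result of -1 often just means the destination does not exist yet, in which case the copy can proceed.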

How does cp do it?

cp uses two system calls in addition to open, read, and close:

int fd = creat(const char *filename, mode_t mode);
Opens a file named filename for writing. If filename doesn't exist, the kernel creates it with permission bits taken from mode (as modified by the process's umask). If it does exist, the kernel discards its contents by truncating the file to length 0. Returns a file descriptor, or -1 in case of failure.

ssize_t result = write(int fd, const void *buf, size_t amt);
Copies amt bytes of data from the memory location at pointer buf to the file with descriptor fd. Returns the number of bytes actually written, which can be less than amt, or -1 in case of failure.
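
As a minimal sketch of the two calls working together, the code below creates (or truncates) a file and stores a block of bytes in it. Because write may report fewer bytes than requested, the loop keeps calling it until everything is written. The names write_all and save_bytes are hypothetical helpers made up for this illustration, and the mode 0644 is just an example value.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: calls write() repeatedly until all len bytes
 * have been written. Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n == -1)
            return -1;
        buf += n;               /* skip the bytes already written */
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: creates path (or truncates it to length 0 if
 * it already exists) and stores len bytes from data in it.
 * Returns 0 on success, -1 on error. */
int save_bytes(const char *path, const char *data, size_t len)
{
    int fd = creat(path, 0644);

    if (fd == -1)
        return -1;
    if (write_all(fd, data, len) == -1) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Calling save_bytes("notes.txt", "hello\n", 6) twice in a row leaves a 6-byte file, not a 12-byte one, because the second creat discards the old contents before anything is written.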

Can I write cp?

The basic operation of cp:

open sourcefile for reading
open copyfile for writing
while not EOF
    read from source to buffer
    write from buffer to copy
close sourcefile
close copyfile

C source code for cp:

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

#define BUFFERSIZE 4096
#define COPYMODE 0644

int main(int ac, char *av[])
{
    int     in_fd, out_fd;
    ssize_t n_chars;          /* read() and write() return ssize_t */
    char buf[BUFFERSIZE];

    /* check arguments */
    if (ac != 3){
        fprintf(stderr, "usage: %s source destination\n", *av);
        return 1;
    }

    /* open files */
    if ((in_fd=open(av[1], O_RDONLY)) == -1) {
        perror("Cannot open source file");
        return 1;
    }

    if ((out_fd=creat( av[2], COPYMODE)) == -1) {
        perror("Cannot creat destination file");
        return 1;
    }

    /* copy files */
    while ((n_chars = read(in_fd, buf, BUFFERSIZE)) > 0) {
        if (write(out_fd, buf, n_chars) != n_chars) {
            perror("Write error");
            return 1;
        }
    }

    if (n_chars == -1) {
        perror("Read error");
        return 1;
    }

    /* close files */
    if (close(in_fd) == -1 || close(out_fd) == -1) {
        perror("Error closing file(s)");
        return 1;
    }

    return 0;
}

Buffering

Consider the effect of the buffer size (BUFFERSIZE) on the number of read and write system calls: the smaller the buffer, the more calls are needed to copy the same file. System calls are time-consuming because switching into and out of the kernel takes time. The CPU runs in supervisor mode, with a special stack and memory environment, when executing kernel code, and in user mode when executing user code.
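
To see the relationship between buffer size and call count, the copy loop can be instrumented to count its read calls. The sketch below is an illustration only; copy_count_reads is a hypothetical name, and the buffer is allocated with malloc so that its size can be varied at run time. For a regular file of S bytes, a loop like this typically makes about S/bufsize read calls that return data, plus one final call that returns 0 at end of file.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: copies src to dst using a buffer of bufsize
 * bytes. Returns the number of read() calls made (including the last
 * one, which returns 0 at end of file), or -1 on any error. */
long copy_count_reads(const char *src, const char *dst, size_t bufsize)
{
    long    calls = -1;
    ssize_t n = -1;
    char   *buf = malloc(bufsize);
    int     in_fd = open(src, O_RDONLY);
    int     out_fd = creat(dst, 0644);

    if (buf != NULL && in_fd != -1 && out_fd != -1) {
        calls = 0;
        do {
            n = read(in_fd, buf, bufsize);
            calls++;
        } while (n > 0 && write(out_fd, buf, (size_t)n) == n);

        if (n != 0)             /* read failed, or write failed or was short */
            calls = -1;
    }

    if (in_fd != -1)
        close(in_fd);
    if (out_fd != -1)
        close(out_fd);
    free(buf);
    return calls;
}
```

Running this on the same source file with bufsize values such as 1, 512, and 4096 and comparing the returned counts makes the cost of a small buffer concrete. Each read that returns data is paired with one write, so the total number of system calls is roughly twice the count.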

Kernel buffering keeps copies of disk blocks in memory to reduce the number of accesses to the physical disk. It makes disk I/O faster, but it does not eliminate the need for buffering at the user level, because every read or write call still pays the cost of entering and leaving the kernel.