UNIX tutorial
Lecture Slides (Introduction to UNIX) 
Lecture Slides (Advanced UNIX) 
The objective of this tutorial is to promote understanding of basic UNIX commands and their use through a Command-Line Interface. MacOS and various flavors of Linux are UNIX-based operating systems.
The command-line interface is the single most powerful tool in a bioinformatician’s toolbox. A basic understanding of how to use the command line to move, modify and view files containing molecular and descriptive data is an absolute requirement for modern biological analysis, and mastery of the command line and its powerful tool set is highly recommended.
- Command Line trickery
- List a directory
- Navigating the directory structure
- Create a directory
- Change a directory
- Locate file in filesystem
- Viewing files
- Copying files
- Concatenate files
- Moving files
- Deleting files
- Deleting directories
- Searching files for a pattern
- Working with tabular files
- Combining commands into a pipeline
- Basic shell scripting
- Modifying text files using the command line
- Getting help
- Connecting to a remote computer
- Copying files between machines
- Text file conversion
- Change file permissions
- Uncompressing files and directories
- Emacs text editor
Command Line trickery
Working from the command line can be quite powerful, and several key combinations let you navigate within a command or modify portions of it quickly. Becoming familiar with these key combinations will greatly enhance your ability to become a command line ninja.
- Up/Down Arrows: Pressing up or down will navigate you through your command line history. Try this out by typing the following commands in this exact order. It doesn’t matter if you know what these commands do yet; we will get to that in subsequent sessions. For now we are just trying to populate your command history:
- Navigating the command line: As you get more comfortable with the command line, you will begin to write commands that are longer and unwieldy. Commands also grow when you want to modify multiple files, files with long names or files in complex directory structures. Taken together, a long and detailed command can become difficult to manage. Fortunately, the text of a command can be navigated easily with simple keyboard shortcuts. Of note, there are some differences between the standard Linux shortcuts and the Mac OS Terminal shortcuts. Some of these are detailed below, but you may hear about a different shortcut elsewhere, and it may or may not work the same way depending on your OS.
- Home and End: Most of you who have used a Windows machine will be familiar with the Home and End keys. They do the same thing on the command line as in Word and other Windows and Linux programs: they move you to the beginning or the end of the command, respectively. However, if you are on a Mac with a standard Mac keyboard that has no Home or End key, you will need keyboard shortcuts to get the same effect. Pressing Ctrl-A takes you to the beginning of the command line (same as Home) and Ctrl-E takes you to the end (same as End).
This may sound relatively painful, but without dedicated Home and End keys it is a reasonable second option. The keys are easy to press with your left hand, and they are also used by other Mac and Linux programs, so becoming comfortable with them will help you in a variety of tasks. Of note, for those of you who type on an iPad or iPhone with a Bluetooth keyboard, this key combination also works, making text navigation much easier on those devices.
- Ctrl+Left and Ctrl+Right: Like Home (Ctrl-A) and End (Ctrl-E), these shortcuts help you navigate text entered on the command line. While Home and End take you to the beginning and the end of a line, Ctrl+Left and Ctrl+Right move the cursor one word at a time (in the Mac Terminal, Option+Left and Option+Right usually do the same). Arguments on the command line are separated by spaces, so this lets you move rapidly through a lengthy command with multiple arguments.
- Try it out by first typing the following (completely made up and nonsensical) command on the command line and then moving back and forth through it.
- Ctrl+u: This clears the entire line to the left of the cursor. This is the fastest way to clear an entire command if your cursor is at the end.
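- Ctrl+k: This deletes everything from the position of the cursor to the end of the line (everything to the right of the cursor).
- Ctrl+w: This deletes the word before (to the left of) the cursor.
- Ctrl+r: This searches backwards through your command history. Start typing part of an earlier command to bring up the most recent match, press Ctrl+r again to step to older matches, and press Enter to run the one you want.
- Tab: This completes command, file and directory names. Type the first few letters and press Tab; if more than one completion is possible, pressing Tab twice lists them.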
List a directory
To list the files and directories within the current directory (names starting with “.” are hidden unless you add the -a option), type:
ls
To see a detailed (long) list of the files and directories sorted by modification time, with the most recently modified entries at the bottom, type:
ls -lrt
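The ordering is easiest to see on files with known timestamps. This sketch works in a throwaway directory and assumes a touch that supports the POSIX -t option; the file names old.txt and new.txt are made up for illustration.

```shell
# Work in a temporary directory so none of your own files are touched
workdir=$(mktemp -d)
cd "$workdir"

# Give two files explicit modification times (format: CCYYMMDDhhmm)
touch -t 202001010000 old.txt
touch -t 202401010000 new.txt

# -l: long format, -t: sort by modification time (newest first),
# -r: reverse that order, so the newest file ends up at the bottom
ls -lrt

# -1 prints one name per line, which is handy when piping the output
ls -1rt
```

Because of -r, the most recently changed file is the last line printed, right above your prompt, which is usually where you want to look.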
Navigating the directory structure
To work with a file or directory, you need to tell the system where it is. There are two ways to do that: with an absolute or a relative path. The absolute path gives the whole location of the file, starting from the root directory, for example:
/home/darwin/Desktop/file
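A relative path describes a location starting from the directory you are currently in, using the shorthands “.” and “..” together with “/”:
./ = current directory
../ = parent directory of ./ (one directory up)
/ = root directory (top directory); any path that starts with / is an absolute path
When no location is specified, UNIX assumes we mean “./”.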
~ is your home directory. In Ubuntu Linux, this will be /home/username (for example /home/john if your username is john); on MacOS it is /Users/username.
To find out which directory you currently are in, type: pwd
This will show an output such as: /home/darwin/
Create a directory
To create a new directory, type:
mkdir directory1
This will create a directory called directory1
Change a directory
To go “down” one directory (in this example to go from darwin to directory1), type:
cd directory1
To go “up” one directory (in this example, to go from directory1 to darwin), type:
cd ..
Now cd to directory1, create directory2 in it and cd to directory2. To go “straight back” to your home directory from directory2, type:
cd
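The exercise above can be replayed as a short sketch. To avoid cluttering your real home directory, it assumes you are happy to point HOME at a temporary directory for the current shell session only; plain cd then jumps there.

```shell
# For this sketch only: pretend a fresh temporary directory is your home
HOME=$(mktemp -d)
cd "$HOME"

mkdir directory1          # create directory1 inside the current directory
cd directory1             # go "down" into it (relative path)
mkdir directory2
cd directory2
pwd                       # prints .../directory1/directory2

cd ..                     # go "up" one level, back to directory1
pwd

cd                        # no argument: jump straight back to $HOME
pwd

# The same place as before, this time reached with an absolute path
cd "$HOME/directory1/directory2"
pwd
```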
Viewing files
There are many ways to view a text file. One way for simple viewing is to type:
less filename
The less command displays as much of the file as can fit onto the screen. To scroll up and down within the document, use the arrow keys. Hitting the space bar will bring a new screen-full of information. To search forward in the file for a given pattern, enter:
/pattern
where “pattern” represents the character string you wish to find. To search for the next occurrence of the “pattern”, press:
n
To exit the less program and return to the prompt, press:
q
Viewing part of a file
The head command displays the first few lines at the top of a file. By default head shows the first ten lines, but you can tell it how many lines to display. Example:
head -10 file
(shows the first 10 lines of that file)
The tail command displays the last few lines of a file. By default tail will show the last ten lines of a file, but you can tell it how many lines to display. Example:
tail -10 file
(shows the last ten lines of that file)
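A quick way to check head and tail is on a file whose contents you know in advance. This sketch assumes seq is available (it is on Linux and MacOS) and uses head -n/tail -n, the portable spelling of the head -10/tail -10 form shown above.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A 20-line example file: the numbers 1 to 20, one per line
seq 1 20 > file

head -n 3 file    # first three lines: 1, 2, 3
tail -n 3 file    # last three lines: 18, 19, 20
head file         # without -n, head (like tail) shows 10 lines
```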
Copying files
To copy a file using a terminal, use:
cp original-file new-file
This makes a new file that contains the same information as the original file with the name indicated.
cp can also be used to copy a file from one directory to another. For example:
cp ~/directory1/file ./newfile
To copy an entire directory, use the -r option:
cp -r dir-to-copy/ new-dir/
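Note: if you copy with a wildcard, such as cp -r dir-to-copy/* new-dir/, dot-files (files whose names start with ‘.’, for example .bashrc) are skipped, because * does not match them; copying the directory itself, as above, includes them. Also, you may run into trouble copying to or from directories for which you do not have the appropriate permissions.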
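The copy commands above can be tried on a small, made-up directory tree. The names dir-to-copy, newfile and new-dir follow the examples in the text; .hiddenrc is a hypothetical dot-file added to show that it comes along with cp -r.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Set up a small directory to copy
mkdir dir-to-copy
echo "some data" > dir-to-copy/file
echo "hidden setting" > dir-to-copy/.hiddenrc

cp dir-to-copy/file newfile   # copy a single file under a new name
cp -r dir-to-copy new-dir     # copy the whole directory recursively

ls -a new-dir                 # -a also lists dot-files such as .hiddenrc
```

If new-dir already exists, cp -r places the copy inside it (as new-dir/dir-to-copy) instead of creating new-dir.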
Concatenate files
The cat (short for concatenate) command is frequently used to print the entire content of a file to your screen (press Ctrl-C to interrupt it if you run cat on a file that is too large). It can also be used to concatenate text files together:
cat file1 file2 >file3
The contents of file1 and file2 are now in file3.
Check your directory and pick two gff files to concatenate into a file called concatenate.gff. Check the result using less and make sure that the content of both files is found in the newly generated concatenate.gff.
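If you do not have gff files at hand, this sketch builds two tiny, made-up GFF-style files (tab-separated, one feature each) and concatenates them the same way.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Two made-up one-line GFF-style files; printf turns \t into a tab
printf 'chr_1\tsrc\tgene\t100\t200\t.\t+\t.\tID=gene1\n' > a.gff
printf 'chr_2\tsrc\tgene\t300\t400\t.\t-\t.\tID=gene2\n' > b.gff

# > creates (or silently overwrites!) concatenate.gff; >> would append
cat a.gff b.gff > concatenate.gff

wc -l concatenate.gff   # 2 lines: one from each input file
```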
Moving files
On a UNIX file system, moving a file is the same as renaming it. The command mv works like:
mv file1 file2
Files can be moved from one location to another:
mv location1/file1 location2/file1
To move a file and also rename it, specify the name after the second location. For example:
mv ../new.fastq ./movedfile.fastq
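The three uses of mv described above (move, rename, and move into a directory) are shown below on a made-up file; location1, location2 and renamed are illustrative names.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir location1 location2
echo "reads" > location1/file1

mv location1/file1 location2/file1     # move: same name, new directory
mv location2/file1 location2/renamed   # rename: same directory, new name
mv location2/renamed .                 # target is a directory: name is kept

ls location1 location2 .
```

Note that mv overwrites an existing file at the destination without asking; mv -i prompts first.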
Deleting files
rm is the command to remove a file. There are flags (options) that can be used with rm. To delete a file, type:
rm filename
Note that there is no trash bin or undo: once a file has been removed, it is gone.
Deleting directories
To delete a directory, type:
rm -r directory1
This will remove the directory and all the files and directories in it, so use this command carefully.
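The sketch below deletes a made-up file and directory tree inside a temporary directory. It also shows rmdir, which only removes empty directories and is therefore a safer choice when you expect a directory to be empty. Adding -i to rm makes it ask for confirmation before each deletion.

```shell
workdir=$(mktemp -d)
cd "$workdir"
echo "temp" > filename
mkdir -p directory1/sub          # -p creates the parent directories too
echo "x" > directory1/sub/data
mkdir emptydir

rm filename          # delete a single file (no undo)
rm -r directory1     # delete a directory and everything below it
rmdir emptydir       # refuses to run unless the directory is empty

ls -A                # prints nothing: the directory is now empty
```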
Count lines in a file
To count the number of lines in a file, type:
wc -l filename
The -l option restricts the output to only giving the number of lines.
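In sequencing work, wc -l is often used to count reads. This sketch assumes a standard FASTQ layout of exactly four lines per record; the file reads.fastq and its contents are made up.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Two made-up FASTQ records, four lines each
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' > reads.fastq

wc -l reads.fastq                # prints the count followed by the file name
lines=$(wc -l < reads.fastq)     # reading from stdin prints the count only
echo "reads: $((lines / 4))"     # four lines per FASTQ record
```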
Locate file in the filesystem
The locate command is used to find the location of files and directories by name.
Note that locate searches a prebuilt database of file names, so that database must exist for the command to work (on Linux it is usually built with updatedb). If it does not, follow the instructions given by the locate program or skip this part of the tutorial.
Type locate followed by the name of the file you are looking for:
locate filename
Searching files for a pattern
The name grep comes from the ed editor command g/re/p (globally search for a regular expression and print).
In other words, the grep command is excellent for searching for a particular pattern in a file and printing the matching lines to the screen. This pattern can be literal (exact characters), such as chr_1, but it can also be a regular expression (often called regexp). Regular expressions are a powerful way to find and replace strings of a particular format. For example, a regular expression can search for “chr_” followed by any number, so that chr_1 as well as chr_999 match. Note: a regexp provides more flexibility in a search, but it can be difficult to get right in the beginning, so check that your results are what you expect them to be. It is highly recommended to put the pattern you are searching for within single quotes, which stop the shell from interpreting special characters in it:
grep 'pattern' filename
For example, to retrieve all lines containing “@”, which includes the fastq header lines (beware that “@” can also occur in quality lines):
grep '@' fastqfile
Adding the option --color highlights the match on the screen. Try the grep again:
grep --color '@' fastqfile
Match a number in the fastq file:
grep -E '[0-9]' fastqfile
The -E option extends the regular expression capabilities of grep and [0-9] matches any digit.
Add the color option (--color as described above) to see what you matched. Like the other options, --color goes before the pattern you are searching for (in this case [0-9]), for example grep -E --color '[0-9]' fastqfile.
To count how many matches grep got, you can use the -c option:
grep -E -c '[0-9]' fastqfile
To search specifically for headers with four digits immediately before the “#” in their ID, combine {4} (exactly four repeats of the preceding [0-9]) with a literal “#” in your regexp:
grep -E -c '[0-9]{4}#' fastqfile
grep -E '[0-9]{4}#' fastqfile
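The grep commands above can be checked against a tiny, made-up fastqfile. The headers imitate the older Illumina style with a “#” before the index; the second quality line deliberately contains an “@” to show why a bare '@' pattern over-counts headers.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Two made-up reads; printf '%s\n' prints each argument on its own line
printf '%s\n' \
  '@HWI:1:1:941:1973#0/1' 'ACGTACGT' '+' 'IIII@III' \
  '@HWI:1:1:94:17#0/1'    'TTGACCAA' '+' 'IIIIIIII' > fastqfile

grep -c '@' fastqfile               # 3: the quality line 'IIII@III' matches too
grep -c '^@' fastqfile              # 2: ^ anchors the match to the line start
grep -E -c '[0-9]' fastqfile        # 2: only the header lines contain digits
grep -E -c '[0-9]{4}#' fastqfile    # 1: only 1973# has four digits before #
grep --color '^@' fastqfile         # highlighting shows up on a terminal
```

Even '^@' is not fully safe on real data, because a quality string may itself start with “@”; counting lines and dividing by four is the more robust way to count reads.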
Split large files
With the large data sizes typically generated in genomics, it can sometimes be an advantage to split a file into several smaller ones:
split -l 10000 fastqfile
splits fastqfile into separate files with 10000 lines each (a 40000-line file, for example, gives four files). The file names start with x (xaa, xab and so on). Check with ls that you have these files and count the lines using
wc -l x*
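Make sure that you split files so that the header, sequence, separator and quality lines of each record are kept together in the same file; for fastq files this means choosing a number of lines that is divisible by four, as 10000 is. You can check the split outputs using tail: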
tail x*
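To see the effect of the line count on record boundaries, this sketch splits a made-up 10-read FASTQ file (40 lines) into chunks of 12 lines, a multiple of four chosen only to keep the example small.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A made-up FASTQ file with 10 reads of 4 lines each (40 lines)
for i in 1 2 3 4 5 6 7 8 9 10; do
  printf '@read%s\nACGT\n+\nIIII\n' "$i"
done > fastqfile

# 12 is divisible by 4, so no FASTQ record is cut in half
split -l 12 fastqfile

wc -l x*        # xaa, xab, xac: 12 lines each; xad: the remaining 4
head -n 1 xab   # every piece starts with a header line (@read4 here)
```

An optional last argument sets a different prefix, for example split -l 12 fastqfile part_ produces part_aa, part_ab and so on.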
Working with tabular files
To generate a file containing a subset of columns from a tabular (tab-delimited) file, you can use:
cut
The “-f” flag allows you to specify which field (i.e. column) to extract. To get the first column from the file:
cut -f 1 filename
to get the first and the third columns from the file:
cut -f 1,3 filename
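Here is cut on a small, made-up tab-separated file. cut prints to the screen, so redirect with > to save the columns to a file. Since cut splits on tabs by default, the -d option is needed for other separators, as the comma-separated example (filename.csv, also made up) shows.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up tab-separated data: gene name, chromosome, length
printf 'geneA\tchr_1\t1500\ngeneB\tchr_2\t900\ngeneC\tchr_1\t2100\n' > filename

cut -f 1 filename                  # geneA, geneB, geneC
cut -f 1,3 filename > subset.tsv   # name and length, still tab-separated

# For other separators, name the delimiter with -d
printf 'geneA,chr_1,1500\n' > filename.csv
cut -d ',' -f 2 filename.csv       # chr_1
```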
Sorting columns of a tabular file can be useful for digesting large data outputs. For example running:
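sort -k 3,3 file > file_sorted
will sort the lines of file alphabetically by the third column and write the result to file_sorted (-k 3,3 restricts the sort key to column 3; -k 3 alone would use everything from column 3 to the end of the line). If you would instead like to sort numerically, you need to add the -n option. The sort command is often used with the -u option to get the unique set. Alternatively, uniq can be used on a sorted list; because uniq only collapses adjacent identical lines, the input must first be sorted on the field being compared:
sort -k 9,9 file | uniq -f 8
The -f 8 option makes uniq skip the first eight fields and compare only the rest of each line, which in a gff file is column 9 (the attributes column, which usually carries the feature names). Append | wc -l to count how many unique entries there are.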
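The difference between alphabetical and numeric sorting, and between uniq -c and sort -u, is easiest to see on a few made-up rows. This sketch relies on sort's default whitespace field splitting, which works here because no field contains spaces.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up tab-separated data: feature, chromosome, length
printf 'geneA\tchr_2\t900\ngeneB\tchr_1\t1500\ngeneC\tchr_2\t2100\ngeneD\tchr_1\t80\n' > file

sort -k 3,3 file      # alphabetical on column 3: 1500, 2100, 80, 900
sort -k 3,3n file     # numeric on column 3: 80, 900, 1500, 2100

cut -f 2 file | sort | uniq -c   # how often each chromosome occurs
cut -f 2 file | sort -u          # just the unique chromosome names
```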
Combining commands into a pipeline
UNIX lets you combine virtually any commands by “gluing” them together with the pipe symbol “|”. The output of the first command is used as the input of the next command. Combining grep and wc will give you the number of lines containing a particular pattern:
grep pattern filename | wc -l
or generating a unique list from a file:
sort filename | uniq
(Note that you can also count matching lines with grep’s -c option, i.e. grep -c, but here we are illustrating how to combine commands.)
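Both pipelines can be run on a made-up, four-line table. The last line chains four commands, which is typical of how pipelines grow in practice.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'chr_1\tgene\nchr_2\tgene\nchr_1\texon\nchr_1\tgene\n' > filename

grep 'gene' filename | wc -l                 # 3 lines contain "gene"
grep -c 'gene' filename                      # the same count from grep alone
cut -f 1 filename | sort | uniq              # chr_1 and chr_2
cut -f 1 filename | sort | uniq | wc -l      # 2 distinct chromosomes
```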
Basic shell scripting
Another way to combine commands is to use a shell script. This way you can save the commands and settings for reuse on future datasets. One of the most common shell scripting languages is bash. Open emacs (or another suitable text editor). The first line of a bash script should be the shebang line, which tells the operating system to run the script with the bash interpreter:
#!/bin/bash
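As a minimal sketch of a complete script, the commands below write a hypothetical count_reads.sh (normally you would type its contents in your editor), make it executable and run it on a made-up FASTQ file. It assumes bash is installed at /bin/bash and that each FASTQ record has exactly four lines.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Write the script to disk; the quoted 'EOF' keeps $1 from being expanded now
cat > count_reads.sh <<'EOF'
#!/bin/bash
# Usage: ./count_reads.sh file.fastq
# Prints the number of reads, assuming 4 lines per FASTQ record.
lines=$(wc -l < "$1")
echo $((lines / 4))
EOF

chmod +x count_reads.sh            # make the script executable
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n' > test.fastq
./count_reads.sh test.fastq        # prints 2
```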
Modifying text files using the command line
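You can use awk (or gawk) to search through a file and print a modified version of it. The input file itself is not changed, so redirect the output (>) to a new file to keep the result. The general form is: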
awk 'condition {action}' inputfile
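To fill in the condition {action} template, the sketch below uses a made-up GFF-like inputfile: the condition keeps rows whose third column is gene, and the action prints the chromosome and the feature length (end minus start plus one).

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'chr_1\tsrc\tgene\t100\t200\nchr_1\tsrc\texon\t120\t180\nchr_2\tsrc\tgene\t300\t450\n' > inputfile

# -F '\t': fields are tab-separated; $1, $3, $4, $5 are columns 1, 3, 4, 5
awk -F '\t' '$3 == "gene" {print $1, $5 - $4 + 1}' inputfile
# chr_1 101
# chr_2 151
```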
Another way to process files is to use a perl one liner. Perl is a programming language commonly used in bioinformatics. It is possible to run simple perl commands directly on the command line (perl one liners):
perl -ne 'condition{action}' inputfile
Getting help
While the workshop is in session, one of your best sources of help is the teaching assistants and support staff. This page will still be accessible after the course, but it contains help on only a few basic UNIX commands, and some material may not apply to your home systems. On most systems, the man command displays the manual page for a command. For example, to learn how cp works, type:
man cp
To exit the man page, hit:
q
Note: man pages are often terse, but they are worth reading nevertheless.
Connecting to a remote machine
ssh is a program to connect to another computer. For Linux and MacOS X, there are two ways of using ssh from a terminal:
ssh user@computer
or
ssh computer -l user
For example:
ssh darwin@workshop.org
or
ssh workshop.org -l darwin
You will then be asked for your password. If you are running MacOS X or Linux, an ssh client is already installed; simply open a terminal. If you are running Windows, try PuTTY, a free ssh client (recent versions of Windows 10 and 11 also include an ssh command).
Copying files between machines
Some of you may be familiar with FTP (File Transfer Protocol), which we do not recommend because it is a relatively insecure method of transferring files between computers. To transfer files between machines, you should use:
scp
This command works very similarly to cp, except that the files you are copying reside on different machines. To copy a file from a remote computer to the computer you are working at, type:
scp user@remote-computer:/remote/path/remote-file /local/path/file
You will then be prompted for the user’s password on the remote computer. After you enter it, the file will be copied. For example:
scp darwin@bigbox.university.edu:/home/darwin/Info/info /home/darwin/facts
This command will copy the file called ‘info’ that lives on bigbox.university.edu at /home/darwin/Info to /home/darwin on my current machine and will rename it ‘facts’. For those of you familiar with FTP, this is like ‘get remote-file local-file’. Note: if you do not specify a local name, scp will name the file the same name as the remote file. To copy a local file onto a remote computer, you type:
scp /local/path/local-file user@remote-computer:/remote/path/remote-file
For those of you familiar with FTP, this is like ‘put local-file remote-file’. Note: as with cp, you need read permission on the file you copy and write permission on the directory you copy it to. Also, you normally specify a computer name for only one of the two paths (either the source or the destination). As with the cp command, you can use the -r flag to copy directories recursively. Note: scp comes with UNIX-based operating systems; on Windows you need an ssh client that provides it, such as PuTTY’s pscp.
SSH File Transfer Protocol, or SFTP, is a network protocol that provides file transfer and manipulation functionality over any reliable data stream. To copy files between a remote computer and a local computer, first navigate to the local directory where you want to copy or retrieve files, then type:
sftp user@remote-computer
You will then be prompted for the user’s password on the remote computer. After you enter it, you will be logged into the user’s home directory on the remote computer. Use commands such as cd and ls to navigate to the remote directory where you want to copy or retrieve files. To copy a local file onto the remote computer, type:
put local-file remote-file
To copy a remote file onto the local computer, you type:
get remote-file local-file
To copy multiple remote files onto the local computer, give mget a wildcard pattern (here *.fastq, as an example); the files are saved in your current local directory:
mget *.fastq
Text file conversion
When text files are created in Windows, they are given DOS line endings (a carriage return followed by a line feed). Many programs run in UNIX treat the extra carriage return as part of the line, which can cause errors, so these files should be converted. There are many ways to accomplish this; however, the simplest is the following command, where you replace dosfile.txt with the name of your file, and unixfile.txt with the new file name:
tr -d '\15\32' < dosfile.txt > unixfile.txt
Similarly, to convert from classic Mac OS line endings (a carriage return only) to UNIX, type:
tr '\r' '\n' < macfile.txt > unixfile.txt
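To see what the two tr commands actually change, this sketch creates small, made-up files with DOS and classic Mac line endings and inspects the bytes with od -c. In the first command, \15 and \32 are octal escapes for the carriage return and the Ctrl-Z end-of-file marker that some DOS tools append.

```shell
workdir=$(mktemp -d)
cd "$workdir"

printf 'line1\r\nline2\r\n' > dosfile.txt     # DOS/Windows: CR LF endings
tr -d '\15\32' < dosfile.txt > unixfile.txt   # delete CR (\15) and Ctrl-Z (\32)

od -c dosfile.txt | head -n 1    # shows \r \n pairs
od -c unixfile.txt | head -n 1   # only \n remains

printf 'line1\rline2\r' > macfile.txt         # classic Mac OS: CR only
tr '\r' '\n' < macfile.txt > unixfile2.txt    # translate each CR into LF
```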
Change file permissions
chmod – change file access permissions
Change file permissions to make a file executable:
chmod +x filename
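Here is a sketch of the permission change on a made-up two-line script, also showing the numeric mode notation. The exact ls -l output depends on your umask, which is why the comments only give typical values.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf '#!/bin/sh\necho hello\n' > filename

chmod 644 filename   # numeric mode: owner rw-, group r--, others r--
ls -l filename       # typically -rw-r--r-- : not executable
chmod +x filename    # add execute permission (bits masked by umask are skipped)
ls -l filename       # typically -rwxr-xr-x
./filename           # prints hello
```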
Uncompressing files and directories
tar : create tape archives and add or extract files. Example (z: gzip, x: extract, v: verbose, f: archive file name):
tar -zxvf tarfile.tar.gz
unzip : extract all files of the archive into the current directory and subdirectories:
unzip zippedfile.zip
gunzip : uncompress a gzipped file:
gunzip gzippedfile.gz
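A round trip makes the flags easier to remember. This sketch packs a made-up directory with tar -czf (c: create), extracts it again with the command shown above, and compresses and uncompresses a single file with gzip/gunzip. unzip is left out because it is not installed on every system.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir project
echo "sample data" > project/notes.txt

tar -czf tarfile.tar.gz project   # c: create, z: gzip, f: archive name
rm -r project
tar -zxvf tarfile.tar.gz          # x: extract, v: list files as they come out

echo "ACGT" > reads.txt
gzip reads.txt                    # replaces reads.txt with reads.txt.gz
gunzip reads.txt.gz               # restores reads.txt
```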
Emacs text editor
Finally, open the text editor called emacs:
emacs
You should follow the tutorial by typing:
ctrl-h and then t
Go through the text and test the commands on the command line (remove the # when running the commands). See what happens at the prompt in the terminal window. If you like any of the results, you can add the various commands to your own ~/.bash_profile file on your system, but make sure to make a copy of the original .bash_profile before you start.

