Introduction to UNIX
Practicals and tape_archive.zip
The objective of this tutorial is to promote understanding of basic UNIX commands and their use in a command-line interface. macOS and the various flavours of Linux are UNIX-like operating systems.
The command-line interface is the single most powerful tool in a bioinformatician's toolbox. Some basic understanding of how to use the command line to move, modify and view files containing molecular and descriptive data is an absolute requirement for modern biological analysis, and mastery of the command line and its powerful tool set is highly recommended.
Directory structure
To work with a file or directory, you need to tell the system where it is. There are two ways to do that: the absolute and the relative path. The absolute path gives the full location of the file starting from the root directory, for example: /home/darwin/Desktop/file
The relative path is described with the signs “.” and “/”:
./ = current directory (When no location is specified, UNIX assumes “./”.)
../ = parent directory of ./ (one directory up)
/ = root directory (top directory)
~ is your home directory. In Ubuntu Linux, this will be /home/username (for example /home/john if your user id is john)
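The difference between absolute and relative paths can be tried out safely in a scratch directory. The following is a minimal sketch; the directory names project and data and the variable workdir are made up for illustration, and mktemp is assumed to be available (it is on most Linux and macOS systems).

```shell
# Work in a throwaway directory so nothing in your home directory is touched.
workdir=$(mktemp -d)              # hypothetical sandbox created by mktemp
mkdir -p "$workdir/project/data"  # -p also creates missing parent directories
cd "$workdir/project/data"

pwd                               # prints the absolute path of the current directory
cd ..                             # relative path: one directory up (now in project)
here=$(basename "$PWD")           # name of the current directory without its parents
echo "$here"                      # prints: project
cd ./data                         # relative path: back down into data
cd "$workdir/project"             # absolute path: works from any location
echo ~                            # prints the path of your home directory
```

The relative forms are shorter to type, while the absolute form does not depend on where you currently are.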
Some basic commands
Commands are composed this way:
command option filename
Command | Option | Description |
pwd | | print working directory |
ls | | list directory contents |
ls | -l | list contents in long format (with rights, size, modification…) |
ls | -a | list also hidden files/directories (e.g. .profile) |
history | | display command history |
cd | directory | change directory |
mkdir | new_directory | make directory |
clear | | clear the screen |
exit | | quit command line |
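A short session using the commands from the table might look like the sketch below. It runs inside a scratch directory; the variable sandbox and the file name .hidden_file are hypothetical, and touch (which creates an empty file) is an extra command not listed in the table.

```shell
sandbox=$(mktemp -d)      # hypothetical scratch directory
cd "$sandbox"
pwd                       # print where we are
mkdir new_directory       # make a directory
touch .hidden_file        # create an empty hidden file (its name starts with ".")
ls                        # lists new_directory only
ls -a                     # also lists ., .. and .hidden_file
ls -l                     # long format: permissions, owner, size, date
cd new_directory          # change into the new directory
pwd                       # the path now ends in /new_directory
cd ..                     # and back up again
```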
Rights and Permissions in UNIX
The permission system in UNIX and macOS is different from that in Windows. When you log in, you get rights to run programs and to create and delete files in your home directory. That is sufficient for most of your work. But sooner or later (new program installation, USB drive un-/mounting…), you will need more rights, i.e. to become a superuser (root). A safe way is to get “temporary” rights for just one action: type sudo (superuser do) followed by the command you were not allowed to execute:
sudo command
If you know that you will need the superpower for more commands, type:
sudo su
but be aware that you can destroy your system. When you finish, become a “regular” user by typing:
exit
File permissions are another example of special rights. When you work with files created by someone else, you may not be allowed to change them. To display file permissions and sizes, use:
ls -l
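What does drwxrwxrwx mean?
The first character gives the file type (d for a directory, - for a regular file). The remaining nine characters are three groups of rwx (read, write, execute): the first group for the user who owns the file, the second for the group and the third for all other users. A - in place of a letter means that the permission is not granted.
The permission string for a file that only its owner can read looks really simple:
-r--------
A simple command changes the permissions of a file (or directory) to give read and write permission to all users: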
chmod a+rw file
Command | Option | Description |
chmod | +rwx | add permissions to read, write and execute |
chmod | -rwx | remove permissions to read, write and execute |
chmod | ugoa | apply the change to user (u), group (g), others (o) or all (a) |
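The effect of chmod is easiest to see by checking ls -l after each change. The sketch below assumes a scratch directory and a made-up file name, notes.txt; cut is used only to keep the first ten characters of the ls -l output, which hold the permission string.

```shell
sandbox=$(mktemp -d)           # hypothetical scratch directory
cd "$sandbox"
touch notes.txt                # hypothetical empty file
chmod a-rwx notes.txt          # remove every permission for everyone: ----------
chmod u+rw notes.txt           # the owner may read and write:         -rw-------
chmod go+r notes.txt           # group and others may read:            -rw-r--r--
ls -l notes.txt                # the line starts with the permission string
perms=$(ls -l notes.txt | cut -c1-10)   # keep only the permission string
echo "$perms"                  # prints: -rw-r--r--
```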
Copying, Moving, Renaming and Deleting files
On a UNIX file system, moving a file is the same as renaming it. The commands work like this:
Command | Option | Description |
cp | -i | prompt before overwrite |
cp | -n | do not overwrite an existing file |
cp | -r | copy directory with all files |
cp | -u | copy only when the SOURCE file is newer than the destination file or the destination file is missing |
cp | -v | explain what is being done (verbose mode) |
mv | -i | prompt before overwrite |
mv | -n | do not overwrite an existing file |
mv | -u | move only when the SOURCE file is newer than the destination file or the destination file is missing |
mv | -v | explain what is being done (verbose mode) |
rm | -i | prompt before every removal |
rm | -r | remove directory with all files |
rm | -v | explain what is being done (verbose mode) |
Examples
To copy a file using a terminal, use:
cp original-file new-file
This makes a new file that contains the same information as the original file with the name indicated. cp can also be used to copy a file from one directory to another. For example:
cp ~/directory1/file ./newfile
To copy an entire directory, use the -r option:
cp -rv dir-to-copy/ new-dir/
Note: on some computers cp -r does not copy dot-files (files that start with ‘.’, for example .bashrc). Also, you may run into trouble copying to or from directories on which you do not have appropriate permissions.
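The copy, move and delete commands can be combined in one short session. This is a sketch under the assumption that it runs in a scratch directory; all file and directory names are placeholders, and the file content ATGC is made up.

```shell
sandbox=$(mktemp -d)                  # hypothetical scratch directory
cd "$sandbox"
echo "ATGC" > original-file           # small example file
cp -v original-file new-file          # copy: both files now exist
mv -v new-file renamed-file           # rename (moving within the same directory)
mkdir dir-to-copy
mv renamed-file dir-to-copy/          # move the file into a directory
cp -r dir-to-copy new-dir             # copy the directory with its contents
rm -v original-file                   # delete a single file
rm -r dir-to-copy                     # delete a directory and everything in it
ls -R                                 # only new-dir/renamed-file remains
```

Note that rm deletes files permanently; there is no recycle bin on the command line, so the -i option is worth using until you are confident.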
Exploring files
Command | Option | Description |
head | (-10) | view first 10 lines of the document |
tail | (-10) | view last 10 lines of the document |
less | | view the whole document, page by page |
cat | file | print the whole content of the file on the screen |
cat | file1 file2 > file3 | merge (concatenate) two files into a third |
wc | (-l,-w,-c) | print line, word and byte counts for each file |
There are many ways to view a text file. One way for simple viewing is to type:
less filename
The less command displays as much of the file as can fit onto the screen. To scroll up and down within the document, use the arrow keys. Hitting the space bar will bring a new screen-full of information. To search forward in the file for a given pattern, enter:
/pattern
where “pattern” represents the character string you wish to find. To search for the next occurrence of the “pattern”, press n. To exit the less program and return to the prompt, press q.
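The non-interactive viewing commands from the table can be tried on a small generated file. In this sketch the file names are made up, and seq (which prints a sequence of numbers) is assumed to be installed, as it is on most Linux and macOS systems. The portable spelling -n 3 is used in place of the shorter -3.

```shell
sandbox=$(mktemp -d)                        # hypothetical scratch directory
cd "$sandbox"
seq 1 25 > numbers.txt                      # 25 lines containing the numbers 1 to 25
head -n 3 numbers.txt                       # prints lines 1, 2 and 3
tail -n 2 numbers.txt                       # prints lines 24 and 25
wc -l numbers.txt                           # reports 25 lines
cat numbers.txt numbers.txt > doubled.txt   # concatenate two files into a third
wc -l doubled.txt                           # reports 50 lines
```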
Locate file in the filesystem
The locate command is used to find the location of files and directories.
Note that the locate command only works if its database has been generated. If it has not, follow the instructions given by the locate program or skip this part of the tutorial.
Type locate followed by the name of the file you are looking for:
locate filename
Search files for a pattern
The name grep comes from the ed editor command g/re/p: globally search for a regular expression and print the matching lines.
In other words, the grep command is excellent for searching for a particular pattern in a file and outputting the results to the screen. This pattern can be literal (exact characters), such as chr_1, but could also be a regular expression (often called regexp). Regular expressions are a powerful way to find and replace strings of a particular format. For example, regular expressions can be used to search for a pattern such as “chr_” followed by any number (e.g. chr_1 as well as chr_999 etc should match). Note: A regexp provides more flexibility in a search, but can be difficult to get right in the beginning so make sure to check that your results are what you expect them to be. It is highly recommended to put the pattern you are searching for within single quotes:
grep option 'pattern' filename
Command | Option/Pattern | Description |
grep | 'pattern' | find a 'pattern' in the document |
grep | --color | highlight the match on the screen |
grep | -E | use extended regular expressions |
grep | -E '[0-9]' | [0-9] matches any digit in range (-E could be omitted) |
grep | -E '[a-z]' | [a-z] matches any lowercase letter in range (-E could be omitted) |
grep | -E '.+' | print all non-empty lines |
grep | -c | count lines with matches |
grep | -n | print the line number where the 'pattern' was found |
grep | -r | search all files in the directory recursively |
grep | -w | find only 'pattern' standing as a whole word |
For example, to retrieve all lines matching fasta headers:
grep -E '>' filename
To count how many matches grep got, you can use the -c option:
grep -E -c '[0-9]' filename
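To see how literal patterns, regular expressions and the -w option differ, it helps to run them on a file whose contents are known. The sketch below builds a tiny, made-up FASTA file (example.fa and its three records are invented for illustration) and searches it.

```shell
sandbox=$(mktemp -d)                   # hypothetical scratch directory
cd "$sandbox"
# A tiny, made-up FASTA file with three records.
cat > example.fa <<'EOF'
>chr_1 first record
ATGCATGC
>chr_12 second record
GGGTTTAA
>scaffold_x third record
CCCC
EOF

grep -c '>' example.fa                 # number of header lines: 3
grep -n '>' example.fa                 # headers with their line numbers (1, 3, 5)
grep -E 'chr_[0-9]+' example.fa        # headers with chr_ followed by digits
grep -c -E 'chr_[0-9]+' example.fa     # 2
grep -c 'chr_1' example.fa             # 2, because chr_12 also contains chr_1
grep -c -w 'chr_1' example.fa          # 1, because -w demands a whole word
```

The single quotes matter here: without them the shell would treat > as a redirection and overwrite the file instead of searching it.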
Split large files
With the large data sizes typically generated in genomics, it can sometimes be an advantage to split a file into several smaller ones:
split -l 10000 filename
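A quick way to check what split produces is to run it on a generated file and join the pieces again. In this sketch big.txt is a made-up stand-in for a large data file; seq and cmp (which compares two files byte by byte) are assumed to be available. By default split names its output files xaa, xab, xac and so on.

```shell
sandbox=$(mktemp -d)             # hypothetical scratch directory
cd "$sandbox"
seq 1 25000 > big.txt            # stand-in for a large data file (25000 lines)
split -l 10000 big.txt           # writes xaa, xab and xac in the current directory
wc -l xaa xab xac                # 10000, 10000 and 5000 lines
cat xaa xab xac > rejoined.txt   # the pieces concatenate back to the original
cmp big.txt rejoined.txt && echo "identical"
```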
Uncompressing files and directories
tar: create tape archives and add or extract files. Example:
tar -zvxf tarfile.tar.gz
unzip: extract all files of the archive into the current directory and subdirectories:
unzip zippedfile.zip
gunzip: uncompress a gzip’d file:
gunzip gzippedfile.gz
Command | Option/Pattern | Description |
tar | -A | append tar files to the archive |
tar | -c | create a new archive |
tar | -r | append files to the end of the archive |
tar | -x | extract files from an existing archive |
unzip | | extract all files of the archive into the current directory and subdirectories |
gunzip | | uncompress a gzip’d file |
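A round trip through tar and gzip shows how the options fit together. The sketch below uses made-up names (results, a.txt, b.txt, unpacked) in a scratch directory. Besides -c and -x from the table it uses -z (compress with gzip), -v (verbose), -f (name of the archive file), -t (list contents) and -C (extract into another directory), which are available in both GNU tar and the tar shipped with macOS.

```shell
sandbox=$(mktemp -d)                   # hypothetical scratch directory
cd "$sandbox"
mkdir results
echo "sample_1" > results/a.txt        # made-up example files
echo "sample_2" > results/b.txt

tar -czvf results.tar.gz results       # create a gzip-compressed archive
tar -tzf results.tar.gz                # list the contents without extracting
mkdir unpacked
tar -xzvf results.tar.gz -C unpacked   # extract into the directory unpacked

gzip results/b.txt                     # replaces b.txt with b.txt.gz
gunzip results/b.txt.gz                # restores b.txt
```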
Combining commands into a pipeline
UNIX lets you combine virtually any commands by “glueing” them together with the pipe symbol “|”. The output of the first command will be used as input of the next command. Combining grep and wc will give you the number of lines having a particular pattern:
grep 'pattern' filename | wc -l
To save the output in a file instead of printing it on the screen, redirect it with “>”:
grep 'pattern' filename > headers_filename
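Pipes and redirection can be tried on a small file with known contents. This is a minimal sketch; example.fa, its three made-up records and headers.txt are placeholder names.

```shell
sandbox=$(mktemp -d)                     # hypothetical scratch directory
cd "$sandbox"
# A made-up FASTA file with three records.
printf '>seq1\nATGC\n>seq2\nGGCC\n>seq3\nTTAA\n' > example.fa

grep '>' example.fa | wc -l              # count the header lines: 3
grep '>' example.fa | head -n 2          # show only the first two headers
grep '>' example.fa > headers.txt        # save the headers in a file
wc -l headers.txt                        # the new file has 3 lines
```

Each command in a pipeline does one small job, and the pipe passes the result on without any intermediate file.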
Navigating the command line
As you get more comfortable with the command line, your commands will become longer and more unwieldy. They also grow when you want to modify multiple files, files with long names or files in complex directory structures. Taken together, a long and detailed command can become difficult to manage. Fortunately, the text of a command can be easily navigated with simple keyboard shortcuts. Note that there are some differences between the standard Linux shortcuts and the macOS terminal shortcuts. Some of these are detailed below, but a shortcut you learn about elsewhere may or may not work the same way, depending on your OS.
Hot-keys in Linux (and elsewhere) | Description |
Alt+Tab | jump to the next window (i.e. from Command Line to Firefox) |
Ctrl+Tab | jump to the next tab (e.g. in Firefox) |
Ctrl+N | creates a new window |
Ctrl+T | creates a new tab (different on the Command Line!!!) |
Ctrl+C, Ctrl+V | copy and paste (different on the Command Line!!!) |
Hot-keys on a Command Line | Description |
Arrow up/down | browse through previous commands |
Ctrl+Shift+T | creates new tab |
Ctrl+Shift+C | copy |
Ctrl+Shift+V | paste |
Ctrl+A or Home | takes you to the beginning of the command line |
Ctrl+E or End | takes you to the end of the command line |
Ctrl+Left Arrow | jumps leftwards to the beginning of the previous word |
Ctrl+Right Arrow | jumps rightwards to the gap after the current word |
Ctrl+U | clears the entire line to the left of the cursor (this is the fastest way to clear an entire command if your cursor is at the end) |
Ctrl+W | delete the word before (to the left) of the cursor |
Ctrl+K | delete everything from the position of the cursor to the end of the line (everything to the right of the cursor) |
Ctrl+C | interrupts (terminates) the running program (!!! note the difference from its most common use, copy !!!) |
Ctrl+Z | suspends the program; you can later resume it by typing the command 'fg' (short for foreground) or 'bg' (short for background, which lets it continue in the background) |