Introduction to UNIX

Practicals and tape_archive.zip

The objective of this tutorial is to promote understanding of basic UNIX commands and how to use them through a command-line interface. macOS and the various flavours of Linux are UNIX or UNIX-like operating systems, so the same commands work on both, with minor differences between systems.

The command-line interface is the single most powerful tool in a bioinformatician's toolbox. A basic understanding of how to use the command line to move, modify and view files containing molecular and descriptive data is an absolute requirement for modern biological analysis, and mastery of the command line and its powerful tool set is highly recommended.

 

Directory structure

To work with a file or directory, you need to tell the system where it is. There are two ways to do that: with an absolute path or with a relative path. The absolute path gives the whole location of the file, starting from the root directory, for example: /home/darwin/Desktop/file

A relative path gives the location starting from the directory you are currently in, using these shorthand signs:

./ = current directory (when no location is specified, UNIX assumes “./”)

../ = parent directory of ./ (one directory up)

/ = root directory (top directory); a path that starts with / is always absolute

~ = your home directory. In Ubuntu Linux, this will be /home/username (for example /home/john if your user id is john); on macOS it is /Users/username
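The difference between absolute and relative paths can be tried out with a short session like the one below. It is a sketch rather than part of the original practical: it builds a throwaway directory tree with mktemp -d, and the directory names project, data and results are hypothetical.

```shell
# Build a throwaway practice tree; project, data and results are hypothetical names
base=$(mktemp -d)
mkdir -p "$base/project/data" "$base/project/results"   # -p also creates missing parent directories

cd "$base/project/data"    # absolute path: works from anywhere
pwd                        # ends in /project/data
cd ../results              # relative path: up one level (..), then into results
pwd                        # ends in /project/results
here=$(pwd)                # remember where we are
cd ./                      # ./ is the current directory, so nothing changes
cd ~                       # ~ takes you to your home directory
pwd
```

The absolute form ("$base/project/data") works no matter where you start, while ../results only makes sense relative to the directory you are in.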

 

Some basic commands

Commands are composed this way (for many commands the option and the file name are optional):

command option filename

 

Command              Option  Description
pwd                          print working directory
ls                           list directory contents
                     -l      list contents in long format (with permissions, owner, size, modification time…)
                     -a      also list hidden files/directories (e.g. .profile)
history                      display command history
cd directory                 change directory
mkdir new_directory          make a new directory
clear                        clear the screen
exit                         quit the command line
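A first session with these commands might look like the sketch below. The directory name practice is only an example, and mkdir -p (which does not complain if the directory already exists) is used instead of plain mkdir so that the snippet can be rerun safely.

```shell
cd ~                  # start in your home directory
pwd                   # e.g. /home/john
mkdir -p practice     # make a new directory (-p: no error if it already exists)
cd practice           # move into it
pwd                   # e.g. /home/john/practice
ls -la                # long listing, including hidden entries such as . and ..
cd ..                 # back up one level, to the home directory
```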

 

Rights and Permissions in UNIX

The permission system in UNIX and in macOS is different from the one in Windows. When you log in, you get the rights to run programs and to create and delete files in your home directory. That is sufficient for most of your work. But sooner or later (installing a new program, mounting or unmounting a USB drive…), you will need more rights, i.e. to act as the superuser (root). A safe way is to get “temporary” rights for just one action: type sudo (superuser do) followed by the command you were not allowed to execute:

sudo command

If you know that you will need the superpower for more commands, type:

sudo su

but be aware that you can destroy your system. When you finish, become a “regular” user by typing:

exit

File permissions are another example of special rights. When you work with files created by someone else, you may find that you are not allowed to change them. To display file permissions and sizes, use:

ls -l

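What does drwxrwxrwx mean? The first character gives the type (d for a directory, - for a regular file). The remaining nine characters are three groups of three: the read (r), write (w) and execute (x) permissions for the user who owns the file, for the group, and for all other users. A dash means that the permission is not granted.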

The permissions of a file that you (the owner) can only read, and that nobody else can access, look really simple:

-r--------

A single command changes the permissions of the file (or directory) to give read and write permission to everyone (user, group and others):

chmod a+rw file

 

Command  Option  Description
chmod    +rwx    add read, write and execute permissions
         -rwx    remove read, write and execute permissions
         ugoa    whose permissions change: u = user (owner), g = group, o = others, a = all (e.g. chmod u+x file, chmod go-w file)
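The sketch below shows how the permission string reported by ls -l changes as chmod is applied. It works on a hypothetical empty file, notes.txt, inside a temporary directory, and combines two changes with a comma (u+x,go-w), which chmod accepts in its symbolic mode.

```shell
cd "$(mktemp -d)"                  # scratch directory
touch notes.txt                    # hypothetical empty file
chmod a-rwx notes.txt              # remove every permission from everyone
chmod u+r notes.txt                # give read permission back to the owner only
ls -l notes.txt                    # -r-------- ... notes.txt
chmod a+rw notes.txt               # read and write for user, group and others
ls -l notes.txt                    # -rw-rw-rw- ... notes.txt
chmod u+x,go-w notes.txt           # owner may execute; group and others lose write
ls -l notes.txt                    # -rwxr--r-- ... notes.txt
```

Spelling out a (all) instead of leaving the "who" part empty makes the result predictable: without u, g, o or a, some chmod implementations skip the bits blocked by your umask.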

 

Copying, Moving, Renaming and Deleting files

On a UNIX file system, moving a file is the same as renaming it, so the same command, mv, does both. The commands accept these options:

 

Command  Option  Description
cp       -i      prompt before overwriting
         -n      do not overwrite an existing file
         -r      copy a directory with all its files
         -u      copy only when the SOURCE file is newer than the destination file or the destination file is missing
         -v      explain what is being done (verbose mode)
mv       -i      prompt before overwriting
         -n      do not overwrite an existing file
         -u      move only when the SOURCE file is newer than the destination file or the destination file is missing
         -v      explain what is being done (verbose mode)
rm       -i      prompt before every removal
         -r      remove a directory with all its files
         -v      explain what is being done (verbose mode)

Examples

To copy a file using a terminal, use:

cp original-file new-file

This creates a new file, new-file, that contains the same information as original-file.

cp can also be used to copy a file from one directory to another. For example:

cp ~/directory1/file ./newfile

To copy an entire directory, use the -r option (-v makes cp report each file it copies):

cp -rv dir-to-copy new-dir

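Note: cp -r copies hidden dot-files (files whose names start with ‘.’, for example .bashrc) along with the rest of the directory, but a wildcard copy such as cp -r dir-to-copy/* new-dir does not, because the shell's * does not match names that start with a dot. Also, you may run into trouble copying to or from directories on which you do not have appropriate permissions.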
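The sketch below runs through copying, renaming, moving and deleting in a temporary directory. All file and directory names are hypothetical. rm -i is left out so that the snippet runs without prompting, but using it while you practise is a good habit.

```shell
cd "$(mktemp -d)"                 # scratch directory
echo "ACGT" > original-file       # hypothetical file with one line of content
cp original-file new-file         # copy
mv new-file renamed-file          # rename: mv within the same directory
mkdir backup
mv -v renamed-file backup/        # move into another directory, reporting what was done
cp -r backup backup-copy          # copy a whole directory
rm -v backup-copy/renamed-file    # delete a single file
rm -r backup-copy                 # delete a directory and everything in it
ls                                # lists backup and original-file
```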

 

Exploring files

Command  Option               Description
head     (-n 10)              view the first 10 lines of the document (10 is the default)
tail     (-n 10)              view the last 10 lines of the document (10 is the default)
less                          view the whole document, page by page
cat      file                 print the whole content of the file on the screen
         file1 file2 > file3  merge (concatenate) two files into a third
wc       (-l, -w, -c)         print line, word and byte counts for each file

 

There are many ways to view a text file. One way for simple viewing is to type:

less filename

The less command displays as much of the file as fits on the screen. To scroll up and down within the document, use the arrow keys. Pressing the space bar brings up the next screenful of information. To search forward in the file for a given pattern, enter:

/pattern

where “pattern” represents the character string you wish to find. To search for the next occurrence of the “pattern”, press n. To exit the less program and return to the prompt, press q.
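less is interactive, but the other commands in the table above can be tried in a script. The sketch below assumes two small hypothetical FASTA files, part1.fa and part2.fa, created with printf.

```shell
cd "$(mktemp -d)"
printf '>seq1\nACGTACGT\n>seq2\nGGCCTTAA\n' > part1.fa   # hypothetical 4-line file
printf '>seq3\nTTTTAAAA\n' > part2.fa                   # hypothetical 2-line file
head -n 2 part1.fa                # first two lines: >seq1 and ACGTACGT
tail -n 1 part1.fa                # last line: GGCCTTAA
cat part1.fa part2.fa > all.fa    # concatenate both files into a third one
wc -l all.fa                      # 6 lines
wc -l -w -c all.fa                # line, word and byte counts
```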

 

Locate file in the filesystem

locate is used to find the location of files and directories by name.

Note that locate searches a database of file names, which has to be generated first (on most Linux systems with the updatedb command). If the database is missing, follow the instructions given by the locate program or skip this part of the tutorial.

Type locate followed by the name of the file you are looking for:

locate filename

Search files for a pattern

The name grep comes from the command g/re/p in the ed text editor: globally search for a regular expression and print the matching lines.

The grep command is excellent for searching for a particular pattern in a file and printing the matching lines to the screen. This pattern can be literal (exact characters), such as chr_1, but it can also be a regular expression (often called regexp). Regular expressions are a powerful way to find (and, with other tools, replace) strings of a particular format. For example, a regular expression can search for “chr_” followed by any number, so that chr_1 as well as chr_999 match. Note: a regexp provides more flexibility in a search, but it can be difficult to get right in the beginning, so make sure to check that your results are what you expect them to be. It is highly recommended to put the pattern you are searching for within single quotes, so that the shell does not interpret characters such as >, * or $ before grep sees them:

grep option 'pattern' filename

Command  Option/Pattern  Description
grep     'pattern'       find lines matching 'pattern' in the document
         --color         highlight the matches on the screen
         -E              use extended regular expressions (e.g. + and | work without backslashes)
         -E '[0-9]'      [0-9] matches any digit in the range (-E could be omitted)
         -E '[a-z]'      [a-z] matches any character in the range a to z (-E could be omitted)
         -E '.+'         + means "one or more of the previous item", so .+ matches all non-empty lines
         -c              count lines with matches
         -n              print the line number where 'pattern' was found
         -r              search all files in a directory (recursively)
         -w              find 'pattern' only when it stands as a whole word

 

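For example, to retrieve all lines that are FASTA headers (lines starting with >):

grep '^>' filename

The quotes matter here: without them the shell reads > as a redirection, so a command such as grep -E > filename would overwrite filename instead of searching it.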

To count how many lines match, use the -c option:

grep -E -c '[0-9]' filename
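The sketch below tries several of these options on a small hypothetical FASTA file, seqs.fa, created with printf; the sequence names and descriptions are made up for illustration. The pattern 'chr_[0-9]+' is one way to write “chr_ followed by any number”.

```shell
cd "$(mktemp -d)"
printf '>chr_1 sample A\nACGTNNACGT\n>chr_22 sample B\nGGGCCC\n\n>chrX sample C\nTTTAAA\n' > seqs.fa
grep '^>' seqs.fa                 # all header lines; the quotes stop the shell reading > as a redirection
grep -c '^>' seqs.fa              # number of header lines: 3
grep -E 'chr_[0-9]+' seqs.fa      # "chr_" followed by one or more digits: chr_1 and chr_22, not chrX
grep -n -w 'sample' seqs.fa       # line numbers of lines containing the whole word "sample"
grep -c -E '.+' seqs.fa           # non-empty lines: 6 (the blank line is not counted)
grep --color 'NN' seqs.fa         # highlight the matches (visible in a terminal)
```

Remember that -c counts matching lines, not individual matches: a line containing a pattern twice is counted once.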

 

Split large files

With the large data sizes typically generated in genomics, it can sometimes be an advantage to split a file into several smaller ones:

split -l 10000 filename
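By default split names the pieces xaa, xab, xac and so on; a different prefix can be given as a second argument. The sketch below splits a hypothetical 25,000-line file made with seq, using the hypothetical prefix chunk_, and checks that the pieces join back into the original.

```shell
cd "$(mktemp -d)"
seq 1 25000 > big.txt                # hypothetical 25,000-line file
split -l 10000 big.txt chunk_        # pieces of at most 10,000 lines: chunk_aa, chunk_ab, chunk_ac
wc -l chunk_*                        # 10000, 10000 and 5000 lines (plus a total)
cat chunk_* > rejoined.txt           # the shell expands chunk_* in alphabetical, i.e. original, order
cmp big.txt rejoined.txt && echo "identical"
```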

 

Uncompressing files and directories

tar : create tape archives and add or extract files. Example (z: uncompress with gzip, v: verbose, x: extract, f: the archive file name follows):

tar -zvxf tarfile.tar.gz

unzip : extract all files of the archive into the current directory and subdirectories:

unzip zippedfile.zip

gunzip : uncompress a gzip’d file:

gunzip gzippedfile.gz

 

Command  Option  Description
tar      -A      append tar files to an archive
         -c      create a new archive
         -r      append files to the end of an archive
         -x      extract files from an existing archive
         -z      filter the archive through gzip (for .tar.gz files)
         -v      list the files processed (verbose mode)
         -f      use the archive file whose name follows
unzip            extract all files of the archive into the current directory and subdirectories
gunzip           uncompress a gzip’d file
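A round trip with tar and gzip can be sketched as below. The directory results and its file seq.txt are hypothetical, and tar -t (list the contents of an archive without extracting it) is an extra option not listed in the table.

```shell
cd "$(mktemp -d)"
mkdir results
echo "ACGT" > results/seq.txt         # hypothetical content to archive
tar -czvf results.tar.gz results      # c: create, z: gzip, v: verbose, f: archive name
tar -tzf results.tar.gz               # t: list the contents without extracting
rm -r results                         # remove the original directory
tar -xzvf results.tar.gz              # x: extract, recreating results/seq.txt
gzip results/seq.txt                  # compress a single file to results/seq.txt.gz
gunzip results/seq.txt.gz             # and uncompress it again
```

Note that gzip and gunzip replace the input file: after gzip only the .gz file exists, and after gunzip only the uncompressed one.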

 

 

Combining commands into a pipeline

UNIX lets you combine virtually any commands by “gluing” them together with the pipe symbol “|”: the output of the first command is used as the input of the next command. Combining grep and wc gives you the number of lines containing a particular pattern:

grep 'pattern' filename | wc -l

Instead of passing the output to another command, you can also redirect it into a file with “>”, for example to save the matching lines:

grep 'pattern' filename > headers_filename
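The sketch below uses a hypothetical three-sequence FASTA file, seqs.fa, to show a pipeline and a redirection side by side. grep -c would give the same count as grep | wc -l; the pipeline form is shown because it works for any pair of commands.

```shell
cd "$(mktemp -d)"
printf '>seq1\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n' > seqs.fa   # hypothetical FASTA file
grep '^>' seqs.fa | wc -l                # number of header lines: 3
grep '^>' seqs.fa > headers_seqs.txt     # save the header lines into a new file
grep '^>' seqs.fa | head -n 2            # pipe into another command: the first two headers
```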

 

Navigating the command line

As you get more comfortable with the command line, you will begin to write commands that are longer and more unwieldy. Commands also get longer when you want to modify multiple files, files with long names or files in complex directory structures. Taken together, a long and detailed command can become difficult to manage. Fortunately, the text of a command can be navigated easily with simple keyboard shortcuts. Note that there are some differences between the standard Linux shortcuts and the macOS Terminal shortcuts. Some of these are detailed below, but a shortcut you hear about elsewhere may or may not work the same way on your OS.

 

Hot-keys in Linux (and elsewhere)  Description
Alt+Tab                            jump to the next window (e.g. from the command line to Firefox)
Ctrl+Tab                           jump to the next tab (e.g. in Firefox)
Ctrl+N                             create a new window
Ctrl+T                             create a new tab (different on the command line!!!)
Ctrl+C, Ctrl+V                     copy and paste (different on the command line!!!)

 

Hot-keys on a command line  Description
Arrow up/down               scroll through previous commands
Ctrl+Shift+T                create a new tab
Ctrl+Shift+C                copy
Ctrl+Shift+V                paste
Ctrl+A or Home              move to the beginning of the command line
Ctrl+E or End               move to the end of the command line
Ctrl+Left Arrow             jump left to the beginning of the previous word
Ctrl+Right Arrow            jump right to the end of the next word
Ctrl+U                      delete everything to the left of the cursor (the fastest way to clear an entire command if your cursor is at the end)
Ctrl+W                      delete the word before (to the left of) the cursor
Ctrl+K                      delete everything from the cursor to the end of the line (everything to the right of the cursor)
Ctrl+C                      interrupt (stop) the running program (!!! note the difference from the usual "copy" !!!)
Ctrl+Z                      suspend the running program; resume it later with fg (foreground) or let it continue in the background with bg