Genome data analysis in Python

A brief tutorial on using Jupyter notebooks and the Python data analysis library pandas for genomic data analysis.

Workshop on Population and Speciation Genomics, Český Krumlov, January 2020.
By Hannes Svardal (hannes.svardal@uantwerpen.be)

Jupyter notebooks can run locally or on a server; either way, you access them through your web browser. In this tutorial you will run a Jupyter notebook on your Amazon cloud instance (AMI).

To start the jupyter server

  • First, you need to open a terminal on your Amazon cloud instance (AMI). You can do that in two ways.
    • Either use Guacamole. In your web browser, go to the address
    • Or use ssh from your terminal
  • Navigate into the tutorial directory: cd ~/workshop_materials/20_python_jupyter_pandas/
  • Start a screen session by typing: screen
  • Confirm with Return
  • Activate the conda environment: conda activate conda (we created a conda environment that contains the required Python packages)
  • Start the notebook server: jupyter notebook --port=8888
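  • The command blocks the terminal. That is normal, so keep it running. To get back to a usable terminal, detach from the screen session by pressing Ctrl + a, then d. The notebook server keeps running inside the detached session, and you can reattach to it later with: screen -r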
  • In your local browser, navigate to the web address: http://ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com:8888 (replace XXX-XXX-XXX-XXX with your Amazon instance's IP address, see above)
  • You will see the folder contents. Click on 202001_jupyter_pandas_tutorial.ipynb to open the notebook and start a Python kernel.
  • Follow the exercises in the Jupyter notebook. If you have questions, please ask.
  • You can download the whole material from this tutorial as a zip file here
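
The browser address used in the steps above is derived from the instance's IP address by swapping the dots for dashes. The following is a minimal sketch of that substitution, not part of the workshop material: the function name notebook_url is hypothetical, and it assumes the usual EC2 public hostname pattern (ec2-<IP with dashes>.compute-1.amazonaws.com) together with the port 8888 passed to jupyter notebook. The IP address in the example is a documentation placeholder, not a real instance.

```python
import ipaddress


def notebook_url(public_ip: str, port: int = 8888) -> str:
    """Build the notebook address for an instance's public IPv4 address.

    Hypothetical helper: assumes the hostname pattern
    ec2-<IP with dashes>.compute-1.amazonaws.com.
    """
    # Validate the input; raises ValueError if it is not an IPv4 address.
    ip = ipaddress.IPv4Address(public_ip)
    # The hostname uses the IP address with dots replaced by dashes.
    dashed = str(ip).replace(".", "-")
    return f"http://ec2-{dashed}.compute-1.amazonaws.com:{port}"


if __name__ == "__main__":
    # 203.0.113.25 is a placeholder; use your own instance's address.
    print(notebook_url("203.0.113.25"))
```

If you started the server on a different port, pass it explicitly, for example notebook_url("203.0.113.25", port=8889).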

Running the tutorial after the workshop, on your local machine

If you have Python and Jupyter installed, you can simply run the notebook in the following way: