Modern biological research projects regularly employ techniques capable of generating extremely large data sets. Specifically, microbiome investigations use amplicon surveys (16S rRNA, ITS, or 18S rRNA gene sequencing) or metagenomic approaches to assess microbial ecology, and gene expression studies take advantage of RNA-seq technology to identify differential gene regulation. Analysis of data resulting from any of these techniques requires proficiency in computational (UNIX, R) and statistical (exploratory data analysis, hypothesis testing, uni- and multivariate analysis) techniques.

The analysis of both microbiome and gene expression data begins with an appropriate understanding of the study design and metadata, proceeds through data quality control and filtering, quantifies the filtered data, and ultimately produces tabular count data. For microbiome studies, this is a table of counts for each microbial taxon per sample; for gene expression studies, it is a table of transcript counts per sample. Additional data types (e.g. taxonomic assignments, gene functional information) may also be created during this process and ultimately associated or merged with the count table. These data can then be explored using various plots and interrogated using statistical techniques. This workshop provides instruction on how to proceed through each of these stages, providing a strong foundation for working with count data and for subsequent statistical analysis, plotting, and interpretation.
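
As a rough illustration of the count table described above, the following Python sketch builds a feature-by-sample table from filtered observations and then merges in taxonomic assignments. It is not the workshop's own material: the sample IDs, feature IDs, genus labels, and the function names `build_count_table` and `merge_annotations` are all hypothetical, chosen only to show the shape of the data.

```python
from collections import Counter

# Hypothetical filtered observations: one (sample_id, feature_id) pair per read.
reads = [
    ("S1", "OTU1"), ("S1", "OTU1"), ("S1", "OTU2"),
    ("S2", "OTU2"), ("S2", "OTU3"), ("S2", "OTU3"), ("S2", "OTU3"),
]

# Hypothetical taxonomic assignments for each feature (made-up labels).
taxonomy = {"OTU1": "Genus_A", "OTU2": "Genus_B", "OTU3": "Genus_C"}


def build_count_table(reads):
    """Return (sorted sample IDs, {feature: {sample: count}})."""
    counts = Counter(reads)  # missing (sample, feature) pairs count as 0
    samples = sorted({sample for sample, _ in reads})
    features = sorted({feature for _, feature in reads})
    table = {
        feature: {sample: counts[(sample, feature)] for sample in samples}
        for feature in features
    }
    return samples, table


def merge_annotations(table, annotations):
    """Attach an annotation to each row; unknown features are 'unassigned'."""
    return {
        feature: {"taxon": annotations.get(feature, "unassigned"), **row}
        for feature, row in table.items()
    }


samples, count_table = build_count_table(reads)
annotated = merge_annotations(count_table, taxonomy)

# Print one row per feature: ID, annotation, then counts in sample order.
for feature, row in annotated.items():
    print(feature, row["taxon"], [row[sample] for sample in samples])
```

The same structure applies to gene expression data, with transcripts as the rows and functional annotations in place of taxonomy.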

Requirements: Students are required to bring their own laptops to participate in the course. All software and data will be provided and managed by the course organizers. No previous experience in bioinformatics is needed.

Collaborating Institutions:

Harvard University Center for AIDS Research (CFAR)

Ragon Institute

Sub-Saharan African Network for TB/HIV Research Excellence (SANTHE)

Centre for the AIDS Program of Research in South Africa (CAPRISA)

Organizing Team:

Scott Handley, Washington University

Doug Kwon, Ragon Institute

Matt Hayward, Harvard University

Barry L. Hykes, Washington University

Chandni Desai, Washington University

KRISP Teaching Assistants:

Eduan Wilkinson

Dennis Maletich Junqueira

San Emmanuel James


| Date | Day | Time | Presenter | Topic | Location |
| 8 Oct | Monday | 9a – 12p | Sophie Shaw | Introduction to Unix | AHRI Seminar Rooms 1&2 |
| | Monday | 2p – 5p | Sophie Shaw | Introduction to Sequencing Data and Quality Control; Quality Control Exercises | AHRI Seminar Rooms 1&2 |
| 9 Oct | Tuesday | 9a – 12p | Lindsay Droit | Building Successful Sequencing Libraries | AHRI Seminar Rooms 1&2 |
| | Tuesday | 2p – 5p | Scott Handley | Introduction to Data Science with R (slides, exercise) | AHRI Seminar Rooms 1&2 |
| 10 Oct | Wednesday | 9a – 12p | Scott Handley | Preprocessing Microbiome Data for Quantitative Microbiome Analysis (slides) | AHRI Seminar Rooms 1&2 |
| | Wednesday | 2p – 5p | Scott Handley | Processing Microbiome Data for Quantitative Microbiome Analysis (exercise) | AHRI Seminar Rooms 1&2 |
| 11 Oct | Thursday | 9a – 12p | Chandni Desai | Host Differential Gene Expression Analysis (slides) | Onomo Hotel |
| | Thursday | 2p – 5p | Chandni Desai & Barry Hykes | Host Differential Gene Expression Analysis Exercise | Onomo Hotel |
| 12 Oct | Friday | 9a – 12p | Matt Hayward | Metagenomic Analysis | Onomo Hotel |
| | Friday | 2p – 5p | Everyone | Open Lab; Transferring Data/Using AWS; Feedback Survey | Onomo Hotel |
| | Friday | 5p – 6p | Everyone | Reception | Onomo Hotel |