Modern biological research projects regularly employ techniques capable of generating extremely large data sets. Specifically, microbiome investigations use amplicon surveys (16S rRNA, ITS, or 18S rRNA gene sequencing) or metagenomic approaches to assess microbial ecology, and gene expression studies take advantage of RNA-seq technology to identify differential gene regulation. Analysis of data resulting from any of these techniques requires proficiency in computational (UNIX, R) and statistical (exploratory data analysis, hypothesis testing, uni- and multivariate analysis) techniques.

The analysis of both microbiome and gene expression data begins with an appropriate understanding of the study design and metadata, proceeds through data quality control and filtering, quantifies the filtered data, and ultimately produces tabular count data. For microbiome studies, this is a table of counts of each microbial taxon per sample; for gene expression studies, it is a table of transcript counts per sample. Additional data types (e.g. taxonomic assignments, gene functional information) may also be created during this process and ultimately associated or merged with the count table. These data can then be explored using various plots and interrogated using statistical techniques. This workshop provides instruction on how to proceed through each of these stages, providing a strong foundation for working with count data and for subsequent statistical analysis, plotting, and interpretation.
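As a rough illustration of the count table described above, the following minimal Python sketch joins a per-sample count table with taxonomic assignments. The feature IDs, sample names, counts, taxon labels, and the helper `merge_counts_with_taxonomy` are all hypothetical toy values invented for this example; they are not workshop data, and the workshop itself teaches these steps in UNIX and R rather than Python.

```python
# Toy count table (hypothetical values): feature ID -> counts per sample.
counts = {
    "ASV1": {"sampleA": 120, "sampleB": 0},
    "ASV2": {"sampleA": 15, "sampleB": 98},
    "ASV3": {"sampleA": 3, "sampleB": 7},
}

# Toy taxonomic assignments (hypothetical); ASV3 is deliberately missing.
taxonomy = {
    "ASV1": "Lactobacillus",
    "ASV2": "Prevotella",
}


def merge_counts_with_taxonomy(counts, taxonomy):
    """Return one row per feature, combining its taxon label and sample counts."""
    rows = []
    for feature_id, per_sample in counts.items():
        # Features without an assignment are labelled "unassigned".
        row = {
            "feature": feature_id,
            "taxon": taxonomy.get(feature_id, "unassigned"),
        }
        row.update(per_sample)
        rows.append(row)
    return rows


merged = merge_counts_with_taxonomy(counts, taxonomy)
for row in merged:
    print(row)
```

Each resulting row holds one feature with its taxon label and its count in every sample, which is the shape that later plotting and statistical steps typically expect.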


Collaborating Institutions:

Harvard University Center for AIDS Research (CFAR)

Ragon Institute

Sub-Saharan African Network for TB/HIV Research Excellence (SANTHE)

Centre for the AIDS Program of Research in South Africa (CAPRISA)


Organizing Team:

Scott Handley, Washington University

Doug Kwon, Ragon Institute

Teaching Assistants:

Matt Hayward, Harvard University

Barry L. Hykes, Washington University

Chandni Desai, Washington University

 

SCHEDULE

Week 1 : 6-12 January, 2019

| Date | Day | Time | Presenter | Topic | Location |
|---|---|---|---|---|---|
| 14 Oct | Sunday | 6p – 10p | Everyone | Reception | TBD |
| 15 Oct | Monday | 9a – 12p | Sophie Shaw | Introduction to UNIX | TBD |
| | Monday | 2p – 5p | Scott Handley | Introduction to Data Science with R Part 1 | TBD |
| 16 Oct | Tuesday | 9a – 12p | Lindsay Droit | Building Successful Sequencing Libraries | TBD |
| | Tuesday | 2p – 5p | Scott Handley | Introduction to Data Science with R Part 2 | TBD |
| 17 Oct | Wednesday | 9a – 12p | Scott Handley | Preprocessing Microbiome Data for Quantitative Microbiome Analysis (lecture) | TBD |
| | Wednesday | 2p – 5p | Scott Handley | Preprocessing Microbiome Data for Quantitative Microbiome Analysis (lab) | TBD |
| 18 Oct | Thursday | 9a – 12p | TBD | Host Differential Gene Expression Analysis (lecture) | TBD |
| | Thursday | 2p – 5p | TBD | Host Differential Gene Expression Analysis (lab) | TBD |
| 19 Oct | Friday | 9a – 12p | TBD | Topical Presentation or Open Lab | TBD |
| | Friday | 2p – 5p | TBD | Topical Presentation or Open Lab | TBD |
| 20 Oct | Saturday | 9a – 12p | TBD | | |