So this morning started off with a lecture from Dr. Konrad Paszkiewicz on the ‘state of the union’ with respect to Sequencing Technology right now…
Dr. Konrad Paszkiewicz
University of Exeter
Director of Wellcome Trust Biomedical Bioinformatics Hub
First and foremost, if you are familiar with what molecular biology is and what sequencing is and don’t know who Fred Sanger is…then you’ve probably been living in a hole…
Konrad gave a lovely homage to Fred Sanger in his talk because if it weren’t for Dr. Sanger we really wouldn’t have come as far as we’ve come in sequencing technology so quickly.
I’m not going to get into the painstaking history of sequencing technology but rather give a brief listing and you can click to learn more as you see fit:
- Maxam-Gilbert Sequencing: Based on chain breakage of DNA, nasty chemicals, radiolabels and sequencing gels.
- Cycle Sequencing (Sanger): Chain termination, uses a thermal cycler, heat-stable polymerase, fluorescent dyes. (Applied Biosystems developed the first software to ‘call’ peaks)
- 1972: First gene sequenced from RNA
- 1976: First bacteriophage genome
- 1995: First whole genome shotgun (Sanger sequencing) of H. influenzae
- 2004: Birth of 454 pyrosequencing (side note: I entered grad school in 2003.)
Human genome project:
- Nature pub group (see Mammalian Genomes post), publicly funded, shotgun BAC approach.
- Science group: A Venter venture (private but data publicly available). Wanted to do the genome faster and for less than the publicly funded group. Different method: shear DNA, sequence the small fragments and rely on bioinformatics for assembly and scaffolding into a genome. Raised ethical concerns about ownership of genomes and the idea of patenting genes.
What has the Human Genome Project given us?
Whether you agree with its overall utility or not, it really has given us a lot…
Second Generation Sequencing:
- Common features of all sequencers nowadays: they use adapters to fix DNA, some form of PCR amplification/library creation and fluorescent probes; all can do paired-end reading; most can sequence a human genome in a day; all require post-processing of data for quality control; read lengths are on average shorter than Sanger; and all are capable of high volume.
Illumina HiSeq 2000 and 2500
- Pros: Large volume (300 Gb/run), short runs.
- Cons: To achieve low cost you have to run LOTS of samples, and read lengths are short (36-150 bp).
- With the upgrade to 2500 you can produce 1 billion reads in 2-9 days using a flowcell, depending on how you use it. (ave length 18-150 bp).
- The 2500 is meant for rapid sequencing of limited samples
- The 2000 is meant for research and high throughput
- It’s the difference between obtaining a human genome from 1 sample in 27 hrs (rapid) or from 5 samples in 12 days (high throughput).
- For us bacteriologists: You could either do 48 genomes spread across all the ‘lanes’ available in a flowcell (rapid) or do 48 genomes/lane (high throughput but longer).
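To put those throughput numbers in perspective, here is a rough back-of-the-envelope sketch in Python of how run output translates into average depth of coverage per sample. The 300 Gb/run figure comes from the list above; the genome sizes (roughly 3.2 Gb for human, roughly 5 Mb for a "typical" bacterium) and the function name are assumptions of mine for illustration only. Real runs lose reads to QC, adapters and uneven multiplexing, so treat the results as upper bounds.

```python
def mean_coverage(run_output_bp, genome_size_bp, n_samples=1):
    """Average depth per sample if the run output is split evenly."""
    if genome_size_bp <= 0 or n_samples <= 0:
        raise ValueError("genome size and sample count must be positive")
    return run_output_bp / (genome_size_bp * n_samples)


RUN_OUTPUT = 300e9       # 300 Gb/run, from the HiSeq figures above
HUMAN_GENOME = 3.2e9     # assumption: ~3.2 Gb human genome
BACTERIAL_GENOME = 5e6   # assumption: ~5 Mb bacterial genome

human_5 = mean_coverage(RUN_OUTPUT, HUMAN_GENOME, n_samples=5)
bact_48 = mean_coverage(RUN_OUTPUT, BACTERIAL_GENOME, n_samples=48)

print(f"5 human genomes per run: ~{human_5:.0f}x each")
print(f"48 bacterial genomes per run: ~{bact_48:.0f}x each")
```

The point of the exercise: bacterial genomes are so small that even heavy multiplexing leaves you with far more depth than you need, which is why packing many genomes into a lane makes sense.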
454 Roche System (We have this at WRAIR)
- Pros: long reads (200-1000 bp), multiple samples at once (multiplexing), short runs.
Ion Torrent
- Meant for shorter reads. 454-like chemistry (relies on the emPCR protocol), no optics (uses pH instead), up to 400 bp reads, 2 hour run-time (+5 hours on One Touch). Output depends on chip type (314, 316, 318); a 318 chip will give >1 Gb of data in 3 hrs. ~$700/run, $50K for the instrument and $75K for the additional One Touch station and server. Unfortunately, libraries are not compatible with Ion Proton.
Ion Proton
- Meant for longer reads (i.e. genome sequencing or assemblies of Mb-sized genomes). No optics either, average length 200 bp, 2 hr/run or 8+ hrs with the library prep system. 60-80 million reads with the P1 chip. $1500/run, $150K/instrument + $75K/library prep system. Not compatible with the Ion Torrent system, and it also uses 454-like chemistry (emPCR protocol).
Want to learn more about sequencers?
Problems associated with NGS:
- Sequencing is only going to be as good as your sample prep, so if there’s contamination or degradation, that’s what you’ll get out of your sequencer.
- When your organism has a high bias toward GC or AT then it becomes more difficult to sequence.
- 454 and Ion Torrent have problems with homopolymers. Illumina has this problem too, but to a lesser extent because of their specific PCR protocol that incorporates ‘blocking’ via a terminator at the end of each cycle.
- Need to be reminded what a homopolymer is? A long stretch of a single nucleotide in a DNA sequence (i.e. AAAAAA). The longer the stretch, the less ‘confident’ the machine becomes when reading the nucleotides: the signal is ‘maxed’ out and you end up with varying numbers of that nucleotide in the output that will need to be resolved.
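If you want to know up front whether a sequence is likely to give you GC-bias or homopolymer grief, a few lines of Python will flag the trouble spots. This is just a minimal sketch: the function names, the `min_len=4` threshold and the example read are all made up for illustration, and nothing here reflects how any vendor's base caller actually handles these regions.

```python
import re


def gc_content(seq):
    """Fraction of G/C bases in a DNA string (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def homopolymer_runs(seq, min_len=4):
    """Return (start, base, length) for every single-base run >= min_len.

    min_len is an arbitrary illustrative threshold and must be >= 2.
    """
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # One base, then the same base repeated at least (min_len - 1) more times.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(seq.upper())]


read = "ACGTAAAAAAGGCTTTTTCGCG"  # made-up example read
print(f"GC content: {gc_content(read):.2f}")
for start, base, length in homopolymer_runs(read):
    print(f"{base} x {length} starting at position {start}")
```

Positions are 0-based. On a real data set you would run this over a reference or an assembly and pay extra attention to indel calls that fall inside the flagged runs.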
Cost Breakdown for Illumina sequencing (HiSeq)…
Third Generation Sequencers:
Single molecule sequencing:
- PacBio has a machine available. Basically, the machine is designed to collect absolutely ALL the light (every photon) given off when a base is read. This system requires library prep (so some bias may still be inevitable, as with current systems). The nifty thing about this system has to do with its potential applications to epigenetics. Because they slow the reaction with the polymerase, and methylated bases take longer to dissociate than non-methylated bases, they can measure the ‘time’ and determine which DNA is methylated while sequencing. Also, you can circularize DNA and sequence the same molecule over and over. Theoretically you can get fragments up to 10 kb, and the process takes 40 min (minus prep). Currently, though, it has about a 15% error rate, is uber-expensive ($750K) and you only get 10-100 Mb/run.
- PacBio training courses: https://github.com/PacificBiosciences/Bioinformatics-Training/wiki
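The "circularize and sequence the same molecule over and over" trick deserves a quick bit of arithmetic, because it is what makes a 15% raw error rate tolerable. Here is a rough Python sketch under a deliberately simplified model. It is my own assumption, not PacBio's actual consensus algorithm: each pass is treated as independently wrong at a given base with probability 0.15, and the consensus is counted as wrong whenever at least half of the passes are wrong at that base (pessimistic, since it assumes the wrong passes all agree with each other). The function name is hypothetical.

```python
from math import comb


def consensus_error(per_pass_error, n_passes):
    """P(at least half of n independent passes are wrong at one base)."""
    if not 0.0 <= per_pass_error <= 1.0 or n_passes < 1:
        raise ValueError("need 0 <= error <= 1 and at least one pass")
    p = per_pass_error
    k_min = (n_passes + 1) // 2  # smallest count that is >= half the passes
    return sum(comb(n_passes, k) * p**k * (1 - p)**(n_passes - k)
               for k in range(k_min, n_passes + 1))


RAW_ERROR = 0.15  # ~15% per-pass error rate quoted above

for passes in (1, 3, 5, 9):
    err = consensus_error(RAW_ERROR, passes)
    print(f"{passes} pass(es): consensus error ~{err:.4f}")
```

The catch is that every extra pass eats into read length: going around a molecule several times means the insert has to be several times shorter than the longest read the machine can produce.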
- The nanopore method developed by Oxford Nanopore uses (you guessed it) ‘a very small pore’ and electrical current to detect DNA. Different bases will elicit a different signal in the electrical current. No library prep is the goal of this technology, as well as the possibilities for parallelization. However, DNA moves really quickly and they haven’t yet found a way to either slow the DNA down enough or make the pore thin enough to force one base through at a time. Currently, they are at 4-5 bases at a time. There is also a lot of electrical noise generated, so teasing out your signal is challenging. They came up with two methods:
- Strand Sequencing: A pore that slows DNA down via its design to 4-5 bases at a time.
- Exonuclease sequencing: An exonuclease chops the DNA and slows it down so it ‘falls’ into the pore, hopefully 1 to a few bases at a time. This may pose an indel problem similar to what’s afflicting current methods today BUT has prospects for sequencing proteins, polymers and small molecules, and possibly replacing mass spectrometry (though that’s a ways off).
- GridION (possibly around $30K) is said to be able to sequence a human genome in 2 hrs for $1000.
- MinION (the smaller/USB version) is designed to be a ‘throw away’ sequencer that works for about 6 hrs. Costs approx. $900 (for a 2000 pore chip) and, assuming it delivers 10 kb reads, could produce 20 Mb. Error rate is stated to be about 4%.
- And a useful link for learning more about Nanopore:
Want to access a MinION? Get on the list:
To view the evolution of sequencing costs and putting PacBio and Oxford Nanopore on the list:
And what’s in store for the future???
Nanowires! Nanopore sequencing in parallel…be hopeful but don’t hold your breath
And other options can be read about here: http://www.allseq.com/knowledgebank/sequencing-platforms
A great paper to read that puts all the mass sequencing in perhaps a more realistic light.
It’s a study and a reality check on whether personalized sequencing really will be the ‘cure-all’ for us in the future, and it’s important to recognize that…
We can sequence and sequence and sequence, but without biological and environmental context…it means nothing.
So why don’t we have our MinIONs yet?
Patience, my petulant child…the technology is coming, but we have so much data coming out of our ears right now, and are still playing catch-up with the output from platforms in use today, that another platform potentially MORE convenient but with its own set of biases, confounding factors, prep protocols etc. would drive the sanest bioinformatician just a bit batty…
A bit battier, that is, than we obviously already are.