
Sequence Analysis Tools for Molecular Biologists


de novo NGS Assembly Using Velvet

You can use Assembler to assemble millions of short Next Generation Sequencing (NGS) reads using the Velvet de novo assembler. There is also a blog post describing this functionality. Note that these assemblies can become extremely computationally intensive, and the amount of RAM installed on the machine matters most. In our hands, 16 GB of RAM is enough to assemble a maximum of about 10 million 100 nt reads within a couple of hours on a laptop. Above this, Velvet requires more memory and starts using your hard drive as extra "swap" space, which slows the computation significantly. A very fast SSD alleviates this a little, but repeating the same assembly with 20 million reads may still lead to 24-hour assembly times.
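Before starting a long run, it can help to check how many reads you are about to assemble. The following is a minimal Python sketch, not part of MacVector: it assumes uncompressed Fastq files with exactly four lines per read, and the function names and the 10 million read threshold (taken from the 16 GB guideline above) are illustrative only.

```python
# Rough guideline from the text: about 10 million 100 nt reads with 16 GB of RAM.
# This constant is an assumption for illustration, not a hard limit.
READ_GUIDELINE_16GB = 10_000_000


def count_fastq_reads(path):
    """Count reads in an uncompressed Fastq file (assumes 4 lines per read)."""
    with open(path, "r") as handle:
        line_count = sum(1 for _ in handle)
    return line_count // 4


def total_reads(paths):
    """Total number of reads across several Fastq files."""
    return sum(count_fastq_reads(p) for p in paths)


def exceeds_guideline(paths, limit=READ_GUIDELINE_16GB):
    """True if the combined read count is above the rough guideline."""
    return total_reads(paths) > limit
```

If the total is well above the guideline, expect the assembly to spill into swap space and take much longer.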

Adding Reads to a Project

Assembly projects have two toolbar buttons for adding sequence data to the project. The Add Seqs button is used to add read data to the project - these are typically Fastq or Fasta formatted files containing sequence data from Illumina Solexa, SOLiD, 454 or MiSeq sequencing runs. To save disk space, the files are not copied - MacVector just notes their location, so it's important not to move them after you have created a project. The image below shows a project populated with a pair of Fastq read files representing paired-end reads from a bacterial sequencing project. Each file has approximately 2.4 million 90 nt reads.

[Image: assembly project before the Velvet assembly]

Assembling Using Velvet

Next, select the files you want to assemble (if nothing is selected, all of the files in the project will be used), then click on the Velvet button:

[Image: Velvet dialog]

The most important setting in the dialog is the Hash ("K-MER") Length. Sometimes you will need to experiment with this a little to get optimal results. This value must be shorter than the length of the reads, so if you are using older data where read lengths were only in the 33 nt range, you will need to reduce it accordingly. Values between 31 and 51 appear to work best for typical bacterial sequencing projects. The value should be an odd number - if you enter an even number, Velvet will round it down to the next lower odd number.
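The two rules above (odd values only, and shorter than the reads) can be expressed as a small check. This is a hedged Python sketch of the rules as described in the text, not Velvet's or MacVector's actual code; the function name is hypothetical.

```python
def effective_hash_length(requested, read_length):
    """Return the hash (k-mer) length that would be used for a requested value.

    Follows the rules described in the text: the value must be shorter than
    the read length, and an even value is rounded down to the next lower odd
    number.
    """
    if requested < 1:
        raise ValueError("hash length must be a positive integer")
    if requested >= read_length:
        raise ValueError("hash length must be shorter than the read length")
    # Even values are reduced by one so the result is always odd.
    return requested if requested % 2 == 1 else requested - 1
```

For example, a requested value of 32 with 100 nt reads gives 31, while a value of 33 with 33 nt reads is rejected because it is not shorter than the reads.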

The second most critical setting is the checkbox that indicates the source files contain paired reads. You can have multiple sets of paired reads - MacVector is clever enough to work out which files pair with each other. However, with MacVector 13 the two reads of each pair must reside in separate files.

The other values give you fine-grained control over the assembly - typically you can just select the Auto settings to have Velvet work out suitable parameters by itself.

Results

When completed, the project window refreshes to display all of the contigs created by the assembly. Note that the original source files remain untouched - the reads are copied into results. You can sort the contigs by length or number of reads. There is also an "UnusedReads.fa" file created that contains all of the reads that did not assemble.

[Image: assembly project after the Velvet assembly]

The Properties tab gives you an overview of the assembly - the number of contigs, total length, etc. One of the most useful parameters is the N50 statistic, a measure that attempts to estimate the likely quality of the assembly. It is defined as the largest contig length such that the contigs of that length or longer together account for at least half of the sum of the lengths of all contigs.
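As an illustration of the N50 definition, here is a minimal Python sketch that computes it from a list of contig lengths. It uses the common "at least half" convention; tools can differ on exact ties, so this is not guaranteed to match the value MacVector reports in every case.

```python
def n50(contig_lengths):
    """Compute N50 from a list of contig lengths.

    Walk the contigs from longest to shortest, adding up their lengths,
    and return the length of the contig at which the running total first
    reaches at least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(contig_lengths) / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half_total:
            return length
```

For contigs of lengths 80, 70, 50, 40, 30 and 20 (total 290), the two longest contigs sum to 150, which is at least 145, so the N50 is 70.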

[Image: Properties tab]

Viewing Contigs

To view a contig, double-click it to open it in the Contig Editor:

[Image: Contig Editor]

With MacVector 13, the contigs cannot be edited - behind the scenes the data is stored in the popular BAM format to save space, and this is essentially a read-only format. However, the BAM file can be extracted from the project and used as input to other applications that accept that format.

There is also a Summary tab that lists useful summary information about the contig.

[Image: contig Summary tab]

Exporting Contig Consensus Sequences

One additional nice feature of the Assembly Project is that you can select one or more (or all) contigs and export them in Fastq or Fasta format. This is a useful way of building up large assemblies from a series of smaller assemblies. You can run assemblies on multiple subsets of files (create a separate project for each one so you can run assemblies in parallel), export all of the contigs as Fastq files, then create a new master project where you can "assemble the assemblies" using phrap.


Copyright © 2023 MacVector, Inc. All rights reserved. Terms of Use.

