From Base Calling to Assembly

This section of the quick tour shows you how to process samples before an assembly, and how to assemble contigs. These steps are:

  1. Base calling with PHRED
  2. End clipping
  3. Vector trimming
  4. Assembling contigs

Note that the first three steps are optional - you can assemble without doing any of them. Depending on the nature of your data, however, you may get better and "cleaner" assemblies if you do all the preprocessing steps in the order shown above.

Base Calling with PHRED

Why base call?

CodonCode Aligner works best if your data have base-specific quality scores, like those assigned by the base calling program PHRED. PHRED was developed and copyrighted by Phil Green and co-workers at the University of Washington, and is often viewed as the "Gold Standard" for base calling. Aligner allows you to run PHRED on Aligner projects in a single step.
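As background on what these quality scores mean: PHRED-style scores are logarithmic, so a score Q corresponds to an estimated error probability of 10^(-Q/10). The short Python sketch below shows that conversion. The function names are illustrative only and are not part of Aligner or PHRED.

```python
import math

def phred_to_error_probability(q):
    """Estimated probability that a base call with quality score q is wrong."""
    return 10 ** (-q / 10)

def error_probability_to_phred(p):
    """Quality score for an estimated error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

# A score of 20 means roughly 1 error in 100 calls; 30 means roughly 1 in 1000.
for q in (10, 20, 30, 40):
    print(q, phred_to_error_probability(q))
```

In other words, every 10 points of quality corresponds to a tenfold drop in the estimated error rate.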

You should do this step if you have sequence traces without quality scores. If your sequences already have quality scores (for example, because they were processed by the "KB" base caller from ABI), you can skip this step and go on to end clipping.

Tip: If you are not sure whether your sequences have quality scores, look at the "Quality" column in the project view for your samples. For samples that do not have quality scores, the number in the "Quality" column is 0. If the numbers are not 0, the samples have quality scores (make sure that you are looking at the actual samples, not just the folder they are in - the quality scores for folders are always 0).


To do the "Base Calling" step in Aligner, you will need a trial license or a purchased full license. If you are using Aligner in demo mode, and your sequences do not have quality scores or you want to try the base calling, please request a trial license before proceeding.

How to base call

Typically, base calling takes only one or a few seconds per sample. When the base calling step is complete, the progress dialog disappears, and the project window changes - you now see 7 samples in the "Unassembled Samples" folder, and 7 samples in the "Trash" folder. Click on the triangle before the trash to expand it, which shows you this view:

Note the status area at the bottom with the text "Finished calling bases". To view recent progress and warning messages from Aligner, you can click on it, which opens the following window:

The newest messages are at the top. You can see where the samples in the trash came from - they are the original samples that have been renamed and moved to the trash. Click the "Close" button to dismiss this dialog.

Emptying the Trash

We won't be using the samples in the trash anymore, so let us get rid of them. Go to the "File" menu, and select "Empty Trash...". Read the warning dialog that appears (you always read warning dialogs carefully, right?), and then click "Yes". Next, you need to save the project.

Save often!

So far, none of the changes have actually been saved - if you closed the project without saving it, you would lose all the changes. It's a good idea to save your project often, especially before and after doing anything major.

If you have received and installed a license key for a time-limited trial or for a full version, save your project as described below. If you are using the demo version, you cannot save, so go on to the "End Clipping" section.

To save your project, you can use the "Save Project" button in the project view, or go to the "File" menu and select "Save Project", or use the keyboard shortcut (Control-S on Windows, Command-S on OS X). Each of these will save the project, and the status area will then show "Saving project complete". If you opened a project, but did not make any changes to it, the "Save Project" button and menu items will be dimmed and inactive until you actually make a change that needs to be saved.

To save a copy of your project under a new name, go to the "File" menu and select "Save Project As...". That's a good idea if you make any changes to your project, but think that you might want to go back to an older saved version later.

End Clipping

Why Clip Ends?

Sequence traces often have a short stretch of low-quality sequence at the beginning, and longer low-quality stretches at the end. There are several good reasons to remove these low-quality parts - for example, to get better and cleaner assemblies, or to get better results in database comparisons with BLAST or similar programs.

CodonCode Aligner can use quality scores to automatically find and remove these low-quality sequences. After end clipping, Aligner can also identify sequences that are very short, for example due to failed sequencing reactions, and automatically move them to the trash. You do not have to use the end clipping step in Aligner, but you may get better assemblies if you do.
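To make the idea of quality-based clipping concrete, here is a minimal Python sketch of one common approach: each base contributes a score of (allowed error rate minus its estimated error probability), and the stretch with the highest total score is kept. This is an illustration under stated assumptions, not a description of Aligner's actual clipping algorithm; the function name, the max_error parameter, and the example quality values are all hypothetical.

```python
def clip_ends(qualities, max_error=0.05):
    """Return (start, end) of the region to keep; end is exclusive.

    Each base scores max_error - P(error), where P(error) = 10 ** (-Q / 10).
    The contiguous region with the largest total score is kept
    (a maximum-sum subarray search). Returns (0, 0) if no base qualifies.
    """
    best_score = 0.0
    best_start = best_end = 0
    run_score = 0.0
    run_start = 0
    for i, q in enumerate(qualities):
        run_score += max_error - 10 ** (-q / 10)
        if run_score <= 0:
            # The current run is not worth keeping; start over after this base.
            run_score = 0.0
            run_start = i + 1
        elif run_score > best_score:
            best_score = run_score
            best_start, best_end = run_start, i + 1
    return best_start, best_end

# Hypothetical read: poor quality at both ends, good quality in the middle.
quals = [4, 6, 8, 30, 35, 40, 40, 38, 12, 9, 6, 5]
start, end = clip_ends(quals)
print(start, end)  # bases quals[start:end] would be kept
```

A read for which the function returns an empty or very short region is the kind of sample that would be a candidate for the trash.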

How to Clip Ends

Clipping ends requires that your sequences have base-specific quality scores. If you did the previous step (base calling), PHRED will have assigned quality scores to all bases.

In this example, Aligner would clip on average almost 200 bases per read, most from the end of reads (the reads in this example are short reads from ABI 373 sequencers). You can click on the "Change Parameters" button to change your criteria for end clipping. After changing your settings, the same clipping preview dialog will appear again with the updated numbers.

To apply the clipping to your samples, press the "Clip" button. Now, save your project again (unless you are using the demo version).

Vector Trimming

Why Trim Vector?

Sequences often have short stretches of cloning vector sequence at the beginning. If the cloned insert is short, the sequence read may also extend through it, and have vector sequence at the end of the read. Finally, some clones may not have any inserts, and contain only vector sequence.

During sequence assembly, these vector sequences may cause false joins where they overlap, or they may break up contigs if vector sequence at the end "blocks" further extension of the contig. Therefore, it is a good idea to remove ("trim") vector sequences before assembly.

How to Trim Vector

Before you can use vector trimming, you need to define which vector sequences you want to screen against, and the criteria that any match to a vector sequence must meet. To set your vector screening preferences, follow these steps:

Typically, you should use a "custom" vector file - a simple text file that contains the vector sequences you are using in FASTA format. For illustration purposes, we will use a custom vector file that is included with Aligner and contains just one "dummy" vector sequence. To make sure this sequence is used in vector screening, we must set the preferences as follows:

So far, we have told Aligner which file contains our vector sequences, but we have not yet told it which sequences to use for vector screening. Your vector files can contain many sequences, but you should always use only the vectors that were actually used in cloning for the vector trimming step - if you choose too many vectors, the vector trimming will take much longer, and you may get false hits.
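If you have not worked with FASTA files before: each sequence starts with a header line beginning with ">" followed by its name, and the sequence itself follows on one or more lines. The Python sketch below parses such a file and picks out only the vectors of interest. The vector names and sequences are made up for illustration and are not the contents of the file shipped with Aligner.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The name is the first word after ">"; the rest is a description.
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}

# Hypothetical vector file contents (names and sequences are invented).
example_file = """\
>DummyVector1 hypothetical example entry
GATCCTCTAGAGTCGACCTGCAGGCATGC
AAGCTTGGCGTAATCATGGTCATAGCTGT
>DummyVector2
TTGTAAAACGACGGCCAGTGAATTCGAGC
"""

vectors = parse_fasta(example_file)
# Screen only against the vectors that were actually used in cloning.
selected = {name: vectors[name] for name in ["DummyVector1"]}
print(list(vectors), list(selected))
```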

Our "QuickTourVector.txt" file contains two dummy vector sequences, which we want to select:

You can select multiple vector sequences by shift-clicking to make contiguous selections, as we just did, or by control-clicking (OS X: command-clicking) to make discontiguous selections.

Your preference dialog now should look like this:

In this example, Aligner found a match to our dummy vector at the start of one clone (A819.r). When you click "Apply Trims", 30 bases will be removed from the start of this sequence. Go ahead and do this, then save the project (unless you are using the demo version).

We suggest that you play around with the vector trimming preferences a bit, and use a text editor to look at the vector file we used. For example, try setting the "Max. distance to end" value to 300, and repeat the vector trimming - what happens? Also, why not check the online help for explanations about the parameters and the vector trimming algorithm?
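To build some intuition for how a distance-to-end criterion can affect trimming, here is a deliberately simplified Python sketch. It finds the longest exact match between a read and a vector, and trims the start of the read only if the match is long enough and begins close enough to the start. This is not Aligner's vector trimming algorithm (see the online help for that); the function, its parameters min_match and max_distance_to_end, their default values, and the sequences are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

def trim_vector_at_start(read, vector, min_match=12, max_distance_to_end=20):
    """Remove vector sequence from the start of a read (simplified sketch).

    Trims everything up to the end of the longest exact read/vector match,
    but only if the match has at least min_match bases and starts within
    max_distance_to_end bases of the start of the read.
    """
    matcher = SequenceMatcher(None, read, vector, autojunk=False)
    match = matcher.find_longest_match(0, len(read), 0, len(vector))
    if match.size >= min_match and match.a <= max_distance_to_end:
        return read[match.a + match.size:]
    return read

# Hypothetical vector and read: the read starts with two extra bases,
# then the last 20 bases of the vector, then the insert.
vector = "GGATCCTCTAGAGTCGACCTGCAGG"
read = "TT" + vector[5:] + "ATATATATATAT"

print(trim_vector_at_start(read, vector))
# With a stricter distance limit, the same match is ignored.
print(trim_vector_at_start(read, vector, max_distance_to_end=1))
```

Real vector screening typically tolerates mismatches and gaps and also checks the end of the read, which this exact-match sketch does not attempt.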

Assembling Contigs

In the project view for our "Example1" project, click on the "Unassembled Samples" folder to select it. Then, either go to the "Contig" menu and select "Assemble", or click on the "Assemble" button in the project view.

CodonCode Aligner will look for overlaps between all the sequences in your "Unassembled Samples" container, and try to build contigs for overlapping sequences. You will see a progress window as the assembly goes on; this should take only a few seconds. After the assembly is done, the project window will show the newly formed contig:
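The basic idea of looking for overlaps can be sketched in a few lines of Python: check whether the end of one read matches the start of another, and if so, merge them. This toy version only handles exact matches in one orientation. Real assemblers, Aligner included, have to cope with sequencing errors, quality scores, and reverse-complemented reads, so treat the function names, the min_length parameter, and the sequences as illustrative assumptions.

```python
def longest_overlap(left, right, min_length=10):
    """Length of the longest suffix of left that equals a prefix of right.

    Returns 0 if no overlap of at least min_length bases exists.
    """
    max_len = min(len(left), len(right))
    for length in range(max_len, min_length - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0

def merge_reads(left, right, min_length=10):
    """Join two overlapping reads into one sequence, or return None."""
    n = longest_overlap(left, right, min_length)
    return left + right[n:] if n else None

# Hypothetical reads sharing the 9-base stretch GGCTTAACC.
read_a = "ACGTACGGATTACAGGCTTAACC"
read_b = "GGCTTAACCGTTAGCATGCA"

print(longest_overlap(read_a, read_b, min_length=8))
print(merge_reads(read_a, read_b, min_length=8))
```

Requiring a minimum overlap length is what keeps short chance matches from producing false joins.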

The project view just shows a tabular overview of the contigs. The next sections describe how to view the aligned bases and a graphical overview of the contig, and how to navigate in contigs.
