Searching for Coding Regions
Using Coding Preference Plots to find coding regions
After you determine the sequence of a piece of DNA, one of the first things you would like to know is whether it codes for a protein. If you are lucky, your sequence will contain at least one long open reading frame—a reading frame of at least 50 to 100 codons that contains no stop codons. You can translate the open reading frame and search the NBRF Protein Identification Resource database or the GenBank nucleic acid database to see if there is a match with a known protein. If you find a match, you will have answered your question.
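The definition of an open reading frame given above (a stretch of codons, in one reading frame, that contains no stop codons) can be illustrated with a short Python sketch. This is not MacVector's own implementation. The function name `find_open_reading_frames` and its `min_codons` parameter are hypothetical, the standard stop codons TAA, TAG, and TGA are assumed, and only the three forward frames are scanned; a fuller search would also scan the reverse complement.

```python
# Illustrative sketch only; the names here are hypothetical, not MacVector's API.
# Assumes the standard genetic code stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_open_reading_frames(seq, min_codons=50):
    """Return (frame, start, end, n_codons) for each stop-free run of codons.

    start and end are 0-based offsets into seq; end is exclusive and points
    at the stop codon that terminates the run (or at the end of the last
    complete codon if the run reaches the end of the sequence).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        run_start = frame
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos - run_start) // 3
                if n_codons >= min_codons:
                    orfs.append((frame, run_start, pos, n_codons))
                run_start = pos + 3  # next run begins after the stop codon
            pos += 3
        # A run that reaches the end of the sequence without a stop codon.
        n_codons = (pos - run_start) // 3
        if n_codons >= min_codons:
            orfs.append((frame, run_start, pos, n_codons))
    return orfs


if __name__ == "__main__":
    example = "ATG" + "GCT" * 5 + "TAA" + "GG"
    for orf in find_open_reading_frames(example, min_codons=4):
        print(orf)
```

The threshold defaults to 50 codons to match the lower bound mentioned above; raising it toward 100 reduces the number of spurious open reading frames at the cost of missing short genes.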
If there is no match, you need some other method of determining the biological significance of the open reading frame. It may code for a previously unsequenced protein, or it may have no biological significance whatsoever - after all, not all open reading frames are protein coding regions.
MacVector provides a range of analyses to help you make this decision. In addition to open reading frame analysis, the program provides various methods that use base or codon composition to help you determine whether your open reading frame has the characteristics of a protein coding region. In conjunction with these methods, you can use nucleic acid subsequence analysis to look for motifs in your sequence, such as ribosome binding sites or intron-exon splice sites, that may help define the exact boundaries of coding regions.
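As a rough illustration of what a composition-based statistic looks like, the following Python sketch tabulates base frequencies separately for the first, second, and third codon positions of a chosen reading frame. It is a generic example and is not claimed to be one of the methods the program implements; the function name `positional_base_composition` and its `frame` parameter are hypothetical.

```python
from collections import Counter


def positional_base_composition(seq, frame=0):
    """Return three dicts of base frequencies, one per codon position.

    Hypothetical helper for illustration. Only complete codons in the
    given frame (0, 1, or 2) are counted. Frequencies are relative to all
    symbols seen at that position, so ambiguous bases such as N lower the
    A/C/G/T fractions.
    """
    seq = seq.upper()
    counts = [Counter(), Counter(), Counter()]
    # Drop any trailing bases that do not form a complete codon.
    usable_end = len(seq) - (len(seq) - frame) % 3
    for i in range(frame, usable_end):
        counts[(i - frame) % 3][seq[i]] += 1
    composition = []
    for position_counts in counts:
        total = sum(position_counts.values())
        composition.append(
            {base: (position_counts[base] / total if total else 0.0)
             for base in "ACGT"}
        )
    return composition


if __name__ == "__main__":
    for position, freqs in enumerate(positional_base_composition("ATGGCCGCG"), 1):
        print(position, freqs)
```

Comparing such per-position frequencies between a candidate open reading frame and the other two frames is one simple way to see whether the composition is uneven across codon positions, as is often the case in coding sequence.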