Sequence preprocessing help
This implementation will automatically search the input sequence
(amino acid or nucleotide) for the V3 loop.
Search options
- Force V3 at position 1: This option forces the program to
score the sequence from position 1. Gaps are left as-is. If the
sequence is given in nucleotides, the translation is performed in
frame 1.
- Fast-find V3: A quick attempt to find the V3 loop (after
nucleotide translation, if necessary) is made using a regular
expression. All forward and reverse frames are searched. Gaps are
left as-is. This method is very fast, but not perfect.
- Align to matrix: The most rigorous method, and the
slowest. The program will find and score the portion of the sequence
best aligning to a consensus V3 loop. The program will make
insertions and deletions in order to put input V3 sites in register
with homologous sites represented in the matrix. The output will
indicate where insertions were removed and deletions identified to
give the resulting score.
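The Fast-find idea (translate every forward and reverse frame, then
scan each with a regular expression) can be sketched as follows. This
is only an illustration: the pattern V3_PATTERN is a hypothetical
placeholder (a cysteine, 30 to 38 residues, then another cysteine) and
is not the expression the program itself uses, and the function names
are invented for this example. As a simplification, codons containing
gaps or ambiguity symbols are translated to X.

```python
import re
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical pattern (an assumption, not the program's own expression):
# cysteine, 30-38 residues of any kind, cysteine.
V3_PATTERN = re.compile(r"C[A-Z]{30,38}C")


def translate(nt):
    """Translate from the first base; unknown codons (gaps, ambiguity) become X."""
    return "".join(CODON.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def reverse_complement(nt):
    """Reverse-complement a sequence of unambiguous bases."""
    return nt.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frames(nt):
    """Yield (frame label, protein) for frames +1..+3 and -1..-3."""
    nt = nt.upper()
    for sign, strand in (("+", nt), ("-", reverse_complement(nt))):
        for offset in range(3):
            yield f"{sign}{offset + 1}", translate(strand[offset:])


def fast_find_v3(nt):
    """Return (frame, start index in protein, matched peptide) for every hit."""
    hits = []
    for frame, protein in six_frames(nt):
        for match in V3_PATTERN.finditer(protein):
            hits.append((frame, match.start(), match.group()))
    return hits
```

A loose pattern like this one can match in more than one frame, or
not at all, which is one way a regular-expression search can be fast
but imperfect.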
If the resulting alignment or PSSM score is out of the
usual range (the middle 95% of a general sample of subtype B or
subtype C sequences), this will be noted in the returned results. An
unusual score indicates that the alignment (whether yours or the
program's) is probably unreliable.
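The "usual range" check can be illustrated with a short sketch. It
assumes the range is taken as the interval between the 2.5th and
97.5th percentiles of a reference sample of scores; the function
names and the reference list are hypothetical, and the program's own
reference data and cutoffs are not reproduced here.

```python
from statistics import quantiles


def central_95_range(reference_scores):
    """Return the (2.5th, 97.5th) percentile bounds of the reference sample."""
    # n=40 yields 39 cut points spaced 2.5% apart.
    cuts = quantiles(reference_scores, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def is_unusual(score, reference_scores):
    """True if the score lies outside the middle 95% of the reference sample."""
    low, high = central_95_range(reference_scores)
    return not (low <= score <= high)
```

In practice a separate reference sample would be needed for each
quantity checked (alignment score and PSSM score) and for each
subtype.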
Scoring degenerate sequences
Checking Expand degenerate sequences will instruct the program
to score all possible combinations of amino acid sequences, given an
input nucleotide sequence containing IUPAC ambiguity symbols.
There are two options: Average score will deliver only the simple
average of scores over all combinations; Full expansion will
enumerate and score each sequence combination separately, as well as
report the average. Note that it does not take many ambiguities in
the sequence for the number of possible sequences to become very
large. For example, a sequence with 9 codons, each containing an
ambiguity that allows two different amino acids, would yield
2^9 = 512 different sequences upon expansion. The upper limit for the
number of combinations the program is willing to analyze is 16,384
when computing the average score only, and 512 when requesting
enumeration of all combinations.
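The arithmetic behind these limits is a simple product: the number of
combinations is the product of the number of amino acid alternatives
at each ambiguous codon. The sketch below uses the two limits stated
above; the names are hypothetical, and treating the limits as
inclusive is an assumption.

```python
from math import prod

# Limits as stated in this help text; inclusive comparison is an assumption.
MAX_COMBINATIONS = {"average": 16_384, "full": 512}


def combination_count(alternatives_per_codon):
    """Number of distinct amino acid sequences implied by the ambiguities."""
    return prod(alternatives_per_codon)


def within_limit(alternatives_per_codon, mode):
    """True if the expansion is small enough for the given mode."""
    return combination_count(alternatives_per_codon) <= MAX_COMBINATIONS[mode]
```

Under this reading, nine two-way ambiguities (2^9 = 512) are the most
that Full expansion would accept, while fourteen (2^14 = 16,384) are
the most that Average score would accept.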
Some efficiencies are built in. Ambiguities that do not change the
amino acid (say, in third codon positions) are not expanded. Only
ambiguities within the sequence spanned by the matrix are expanded.
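One way to get this behaviour is to expand each degenerate codon to
its set of distinct amino acids before combining codons, so that
synonymous ambiguities collapse to a single choice. The sketch below
does this with the standard IUPAC nucleotide codes and the standard
genetic code. The function names and the score argument are
hypothetical, and restricting the input to the region spanned by the
matrix is left to the caller.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_amino_acids(codon):
    """Distinct amino acids encoded by a possibly degenerate codon."""
    choices = (IUPAC[base] for base in codon.upper())
    return sorted({CODON["".join(c)] for c in product(*choices)})


def expand(nt):
    """All amino acid sequences encoded by a degenerate, in-frame sequence."""
    per_codon = [codon_amino_acids(nt[i:i + 3]) for i in range(0, len(nt) - 2, 3)]
    return ["".join(combo) for combo in product(*per_codon)]


def average_score(nt, score):
    """Simple average of score(sequence) over all expanded sequences."""
    sequences = expand(nt)
    return sum(score(s) for s in sequences) / len(sequences)
```

For example, the codon GGN encodes glycine whichever base N stands
for, so it contributes one choice, whereas RAT (AAT or GAT)
contributes two: asparagine and aspartate.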
The user is advised that the average score over combinations is
an extremely rough guide to the "X4-ness" of the population.
26 Feb 2009