Sequence preprocessing help


This implementation will automatically search the input sequence (amino acid or nucleotide) for the V3 loop.

Search options

If the resulting alignment or PSSM score falls outside the usual range (the middle 95% of a general sample of subtype B or subtype C sequences), this will be noted in the returned results. An unusual score indicates that the alignment (whether yours or the program's) is probably unreliable.
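
As a rough illustration of the "middle 95%" check, the sketch below flags a score that falls below the 2.5th or above the 97.5th percentile of a reference sample. It is only a sketch under assumptions: the function names are hypothetical, the program's actual reference sample and percentile method are not described here, and the reference values in the usage lines are synthetic placeholders, not real subtype B or C scores.

```python
from statistics import quantiles


def usual_range(reference_scores):
    """Return (low, high) bounds of the middle 95% of a reference sample.

    quantiles(..., n=40) returns 39 cut points in steps of 2.5%,
    so the first is the 2.5th percentile and the last the 97.5th.
    """
    cuts = quantiles(reference_scores, n=40)
    return cuts[0], cuts[-1]


def is_unusual(score, low, high):
    """True if the score lies outside the usual range."""
    return score < low or score > high


# Synthetic placeholder sample, for illustration only.
reference = list(range(1, 101))
low, high = usual_range(reference)
print(is_unusual(50, low, high))   # a mid-range score is not flagged
print(is_unusual(100, low, high))  # an extreme score is flagged
```

A score flagged this way says nothing about why it is unusual; as noted above, it mainly suggests that the alignment deserves a second look.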

Scoring degenerate sequences

Checking "Expand degenerate sequences" instructs the program to score every amino acid sequence that can be translated from an input nucleotide sequence containing IUPAC ambiguity symbols.

There are two options. Average score delivers only the simple average of the scores over all combinations. Full expansion enumerates and scores each sequence combination separately, and also reports the average. Note that it does not take many ambiguities in the sequence for the number of possible sequences to become very large. For example, a sequence with 9 codons containing amino-acid-changing ambiguities, each allowing two amino acids, would yield 2^9 = 512 different sequences upon expansion. The upper limit on the number of combinations the program is willing to analyze is 16,384 when computing the average score only, and 512 when requesting enumeration of all combinations. Some efficiencies are built in: ambiguities that do not change the amino acid (for example, many of those in third codon positions) are not expanded, and only ambiguities within the sequence spanned by the matrix are expanded.
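
The expansion described above can be sketched in Python as follows. This is a minimal illustration, not the program's actual code: all function names are hypothetical, the input is assumed to be an in-frame DNA sequence with no gaps, the standard genetic code is assumed, and the `score` argument stands in for the PSSM scoring function, which is not shown here. The sketch also expands the whole input rather than only the region spanned by the matrix. The two limits are the ones quoted above.

```python
from itertools import product
from math import prod

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide ambiguity symbols.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

MAX_AVERAGE = 16384   # limit when only the average score is requested
MAX_ENUMERATE = 512   # limit when every combination is listed


def codon_choices(codon):
    """Distinct amino acids encoded by a possibly ambiguous codon.

    Ambiguities that do not change the amino acid collapse to one choice,
    so they add nothing to the number of combinations.
    """
    concrete = product(*(IUPAC[base] for base in codon.upper()))
    return sorted({CODON_TABLE["".join(c)] for c in concrete})


def choices_per_codon(nt_seq):
    """Amino acid choices for each complete codon in the sequence."""
    usable = len(nt_seq) - len(nt_seq) % 3
    return [codon_choices(nt_seq[i:i + 3]) for i in range(0, usable, 3)]


def count_combinations(nt_seq):
    """Number of distinct amino acid sequences the input expands to."""
    return prod(len(c) for c in choices_per_codon(nt_seq))


def expand(nt_seq, limit=MAX_ENUMERATE):
    """List every amino acid sequence, refusing inputs over the limit."""
    choices = choices_per_codon(nt_seq)
    n = prod(len(c) for c in choices)
    if n > limit:
        raise ValueError(f"{n} combinations exceeds the limit of {limit}")
    return ["".join(p) for p in product(*choices)]


def average_score(nt_seq, score, limit=MAX_AVERAGE):
    """Simple (unweighted) average of score() over all combinations."""
    seqs = expand(nt_seq, limit)
    return sum(score(s) for s in seqs) / len(seqs)
```

For example, `expand("GGRRATARA")` lists four sequences: the first codon, GGR, encodes glycine either way and is not expanded, while RAT (D or N) and ARA (K or R) each contribute two choices. Nine such two-way codons give `count_combinations("RAT" * 9) == 512`, which is exactly the enumeration limit; a tenth would be rejected by `expand` under the default limit but still accepted by `average_score`.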

The user is advised that the average score over combinations is an extremely rough guide to the "X4-ness" of the population.
26 Feb 2009