Filtering and clipping sequences by gene region


Each sequence in the Los Alamos database is assigned standard start and stop coordinates relative to the HXB2 HIV-1 complete reference genome. Filtering works by retrieving these coordinates as part of the initial query and comparing them to a reference table of gene regions. Clipping involves performing a short alignment at the 5' and 3' ends against the reference sequence.

Filtering by gene

After running the query, the "Downloads" control gives options for selecting those returned sequences that contain a desired gene region. In the pulldown menu, select the gene region desired, or Any to retrieve all returned sequences.
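The coordinate comparison behind this filter can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the function names, the 'start'/'stop' record keys, and the rule that a sequence "contains" a region only when it covers it completely are all assumptions. The region table here is a small subset of the full table of regions.

```python
# Illustrative subset of the region table (HXB2 coordinates, 1-based, inclusive).
REGIONS = {
    "Gag": (790, 2292),
    "Pol CDS": (2085, 5096),
    "Env CDS": (6225, 8795),
    "Nef CDS": (8797, 9417),
}


def contains_region(seq_start, seq_stop, region):
    """Return True if the HXB2 span seq_start..seq_stop fully covers the region.

    Assumption: "contains" means complete coverage; the real tool may be
    more lenient about partial overlap.
    """
    if region == "Any":
        return True
    region_start, region_stop = REGIONS[region]
    return seq_start <= region_start and seq_stop >= region_stop


def filter_by_region(records, region):
    """Keep records whose coordinates cover the region.

    Each record is a dict with hypothetical 'start' and 'stop' keys holding
    the HXB2 coordinates returned by the initial query.
    """
    return [r for r in records if contains_region(r["start"], r["stop"], region)]
```

For example, a record spanning 6000 to 9000 would pass the "Env CDS" filter but not the "Gag" filter, and every record passes "Any".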

Clipping sequences to gene region

Check the box labelled "Clip sequences to region" to retrieve only the part of each sequence corresponding to the desired gene.

Precision clipping is not currently guaranteed. Contact the developer with any issues you may have with this feature.
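To give a feel for why clipping is approximate, here is a minimal Python sketch of end-anchored clipping. It is an assumption about how such a step could work, not the tool's implementation: it uses an ungapped, mismatch-counting search for short anchors taken from the two ends of the reference region, and the function names, anchor length, and mismatch threshold are hypothetical.

```python
def best_ungapped_match(query, anchor):
    """Return (offset, mismatches) for the window of query closest to anchor.

    Windows are compared base by base without gaps; ties go to the
    leftmost window.
    """
    k = len(anchor)
    if len(query) < k:
        raise ValueError("query is shorter than the anchor")
    best_offset, best_mismatches = 0, k + 1
    for offset in range(len(query) - k + 1):
        window = query[offset:offset + k]
        mismatches = sum(a != b for a, b in zip(window, anchor))
        if mismatches < best_mismatches:
            best_offset, best_mismatches = offset, mismatches
    return best_offset, best_mismatches


def clip_to_region(query, ref_region, anchor_len=12, max_mismatches=2):
    """Clip query to the span matching the ends of ref_region.

    Returns the clipped subsequence, or None if either end anchor cannot
    be placed within max_mismatches (all parameters are illustrative).
    """
    head = ref_region[:anchor_len]
    tail = ref_region[-anchor_len:]
    if len(query) < len(head):
        return None
    start, head_mm = best_ungapped_match(query, head)
    if head_mm > max_mismatches or len(query) - start < len(tail):
        return None
    # Search for the 3' anchor only downstream of the 5' anchor position.
    tail_offset, tail_mm = best_ungapped_match(query[start:], tail)
    if tail_mm > max_mismatches:
        return None
    stop = start + tail_offset + len(tail)
    return query[start:stop]
```

Because this sketch ignores insertions and deletions inside the anchors, divergent sequences can be clipped a few bases off or not at all, which is consistent with precision not being guaranteed.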

Table of regions

The table is based on one found on the Los Alamos site, hand-corrected using http://www.hiv.lanl.gov/content/sequence/HIV/MAP/hxb2.xls as a guide. Coordinates are HXB2 positions.
region                   start   stop
5' LTR                       1    633
5' LTR R                   456    551
5' LTR U3                    1    455
5' LTR U5                  552    633
TAR                        453    513
Gag-Pol                    790   5096
Gag                        790   2292
p17 (matrix)               790   1185
p24 (capsid)              1186   1878
p2                        1879   1920
p7 (nucleocapsid)         1921   2085
p1                        2086   2133
p6                        2134   2292
Pol CDS                   2085   5096
p51 (RT)                  2550   3869
p15 (RNAse H)             3870   4229
p31 (integrase)           4230   5096
protease                  2293   2549
Vif CDS                   5041   5619
Vpr CDS                   5560   5850
Tat CDS (plus intron)     5831   8469
Tat exon 1                5831   6045
Tat exon 2                8379   8470
Rev CDS (plus intron)     5970   8653
Rev exon 1                5970   6045
Rev exon 2                8379   8653
Vpu CDS                   6062   6310
Env CDS                   6225   8795
V1                        6615   6692
V2                        6693   6812
V3                        7110   7217
V4                        7377   7475
V5                        7602   7631
RRE                       7710   8061
gp41                      7758   8795
gp120                     6315   7757
gp160                     6315   8795
Nef CDS                   8797   9417
3' LTR                    9086   9719
3' LTR R                  9541   9636
3' LTR U3                 9086   9540
3' LTR U5                 9637   9719