Setup and Customize¶
Use environment variables¶
TRANSVAR_CFG¶
Stores the path to transvar.cfg:
export TRANSVAR_CFG=path_to_transvar.cfg
If not specified, TransVar will use [installdir]/lib/transvar/transvar.cfg, or ~/.transvar.cfg in your home directory if the installation directory is inaccessible.
TRANSVAR_DOWNLOAD_DIR¶
Stores the path to the directory where automatically downloaded annotations and references are saved:
export TRANSVAR_DOWNLOAD_DIR=path_to_transvar_download_directory
If not specified, TransVar will use the [installdir]/lib/transvar/transvar.download directory, or ~/.transvar.download in your home directory if the installation directory is inaccessible.
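The lookup order described for both variables can be sketched as a small shell function. This is only an illustration of the documented fallback, not TransVar's actual code: the function name resolve_transvar_path and the install_dir argument are hypothetical, and "inaccessible" is assumed here to mean "not writable".

```shell
# Sketch of the documented fallback order; not TransVar's own implementation.
# resolve_transvar_path and install_dir are hypothetical names.
resolve_transvar_path() {
    env_value=$1      # value of TRANSVAR_CFG or TRANSVAR_DOWNLOAD_DIR (may be empty)
    install_dir=$2    # stands in for [installdir]
    name=$3           # "transvar.cfg" or "transvar.download"

    if [ -n "$env_value" ]; then
        # 1. The environment variable wins when it is set.
        printf '%s\n' "$env_value"
    elif [ -w "$install_dir/lib/transvar" ]; then
        # 2. Otherwise use the installation directory (assumed: must be writable).
        printf '%s\n' "$install_dir/lib/transvar/$name"
    else
        # 3. Otherwise fall back to the dot-file in the home directory.
        printf '%s\n' "$HOME/.$name"
    fi
}

# /opt/transvar is a made-up installation directory for illustration.
resolve_transvar_path "${TRANSVAR_CFG:-}" /opt/transvar transvar.cfg
```

Setting the environment variable explicitly is the simplest way to avoid depending on which of the two fallback locations happens to be used.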
Install and specify reference genome assembly¶
Download from TransVar database¶
For some genome assemblies (currently hg18, hg19, hg38, mm9 and mm10), we provide a download via
transvar config --download_ref --refversion [reference name]
See transvar config -h for all choices of [reference name].
Manual download and index¶
For other genome assemblies, one can download the genome manually as a single FASTA file and index it with
samtools faidx [fasta]
Once downloaded and indexed, the genome can be used through the --reference option followed by the path to the genome:
transvar ganno -i "chr1:g.30000000_30000001" --gencode --reference path_to_hg19.fa
or through the --refversion option followed by the short version id:
transvar ganno -i "chr1:g.30000000_30000001" --gencode --refversion hg19
One can store the location in the transvar.cfg file. For example, to set the default location of the genome file for reference version hg19 to path_to_hg19.fa,
transvar config -k reference -v path_to_hg19.fa --refversion hg19
will create in transvar.cfg an entry
[hg19]
reference = path_to_hg19.fa
so that there is no need to specify the location of the reference in subsequent runs.
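To check what a given section of transvar.cfg contains without opening the file, a short awk-based reader can help. The sketch below assumes the file follows the simple INI layout shown in the [hg19] entry above; cfg_get is a hypothetical helper and not part of TransVar.

```shell
# Read one key from one section of an INI-style file such as transvar.cfg.
# cfg_get is a hypothetical helper, not a TransVar command.
# Usage: cfg_get FILE SECTION KEY
cfg_get() {
    awk -v section="$2" -v key="$3" '
        # A "[name]" line starts a new section.
        /^\[.*\]$/ { current = substr($0, 2, length($0) - 2); next }
        # Inside the requested section, split "key = value" at the first "=".
        current == section {
            eq = index($0, "=")
            if (eq == 0) next
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)   # trim whitespace around the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim whitespace around the value
            if (k == key) { print v; exit }
        }
    ' "$1"
}
```

For instance, cfg_get "$TRANSVAR_CFG" hg19 reference would print the stored genome path if TRANSVAR_CFG points to an existing configuration file; it prints nothing when the key is absent.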
Install and specify transcript annotations¶
Download from TransVar database¶
Transcript annotations can be downloaded automatically, e.g.,
transvar config --download_anno --refversion hg19
which downloads the annotation from the TransVar database to the [installdir]/lib/transvar/transvar.download directory, or to ~/.transvar.download in your home directory if the installation directory is inaccessible. See transvar config -h for all version names. The download also creates default mappings under the corresponding reference version section of transvar.cfg, such as
[hg19]
ucsc = /home/wzhou1/download/hg19.ucsc.txt.gz
Index from GTF files¶
TransVar databases can be built by indexing a GTF/GFF file. For example,
transvar index --refseq hg38.refseq.gff.gz
The above creates a set of TransVar database files whose names start with hg38.refseq.gff.gz.transvardb.
Download from Ensembl ftp¶
One also has the option of downloading annotations from the Ensembl collection:
transvar config --download_ensembl --refversion mus_musculus
If --refversion is not specified, the user is prompted with a collection of options to choose from.
Know Current configuration¶
To show the location and content of the currently used transvar.cfg, one may run
transvar config
which returns information about the setup for the current reference selection, including the locations of the reference file and the database files.
Current reference version: mm10
reference: /home/wzhou/genomes_link/mm10/mm10.fa
Available databases:
refseq: /home/wzhou/tools/transvar/transvar/transvar.download/mm10.refseq.gff.gz
ccds: /home/wzhou/tools/transvar/transvar/transvar.download/mm10.ccds.txt
ensembl: /home/wzhou/tools/transvar/transvar/transvar.download/mm10.ensembl.gtf.gz
Specifying --refversion displays the information under that reference version (without changing the default reference version setup).
Set default reference builds¶
To switch the default reference build, use --switch_build. For example,
transvar config --switch_build mm10
switches the default reference build to mm10. This is equivalent to
transvar config -k refversion -v mm10
which sets the refversion slot explicitly.
Use Additional Resources¶
TransVar uses optional additional resources for annotation.
dbSNP¶
For example, one could annotate SNPs with dbSNP ids by downloading the dbSNP files. This can be done by
transvar config --download_dbsnp
TransVar automatically downloads the dbSNP file that corresponds to the current default reference version (as set in transvar.cfg). This also sets the entry in transvar.cfg. With the dbSNP file downloaded, TransVar automatically looks for dbSNP ids when performing annotation.
transvar panno -i 'A1CF:p.A309A' --ccds
A1CF:p.A309A CCDS7243 (protein_coding) A1CF -
chr10:g.52576004T>G/c.927A>C/p.A309A inside_[cds_in_exon_7]
CSQN=Synonymous;reference_codon=GCA;candidate_codons=GCC,GCG,GCT;candidate_sn
v_variants=chr10:g.52576004T>C,chr10:g.52576004T>A;dbsnp=rs201831949(chr10:52
576004T>G);source=CCDS
Note that in order to use dbSNP, one must download the dbSNP database through
transvar config --download_dbsnp
or by configuring the dbsnp slot in the configuration file via
transvar config -k dbsnp -v [path to dbSNP VCF]
A manually specified dbSNP file must be tabix-indexed.
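Before pointing the dbsnp slot at a manually obtained VCF, it can be worth checking that the index is in place. The sketch below assumes the file is bgzip-compressed and that the index was written by tabix under its default name ([file].tbi); check_dbsnp_index is a hypothetical helper, not a TransVar command.

```shell
# Report whether a bgzip-compressed dbSNP VCF has a tabix index beside it.
# check_dbsnp_index is a hypothetical helper, not part of TransVar.
check_dbsnp_index() {
    vcf=$1
    if [ ! -f "$vcf" ]; then
        echo "missing: $vcf"
        return 1
    elif [ -f "$vcf.tbi" ]; then
        # tabix writes its index as <file>.tbi by default.
        echo "indexed: $vcf"
    else
        echo "not indexed: run 'tabix -p vcf $vcf'"
        return 2
    fi
}
```

When the helper reports a missing index, tabix -p vcf [path to dbSNP VCF] creates it, after which the path can be stored with transvar config -k dbsnp -v [path to dbSNP VCF].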
Control the length of reference sequence¶
TransVar reduces the reference sequence in a deletion to its length when the deleted reference sequence is too long. For example
$ transvar ganno -i 'chr14:g.101347000_101347023del' --ensembl
outputs
chr14:g.101347000_101347023del ENST00000534062 (protein_coding) RTL1 -
chr14:g.101347000_101347023del24/c.4074+29_4074+52del24/. inside_[3-UTR;noncoding_exon_1]
CSQN=3-UTRDeletion;left_align_gDNA=g.101347000_101347023del24;unaligned_gDNA=
g.101347000_101347023del24;left_align_cDNA=c.4074+29_4074+52del24;unalign_cDN
A=c.4074+29_4074+52del24;aliases=ENSP00000435342;source=Ensembl
where the deleted sequence is reduced to its length (del24). The --seqmax option changes the length threshold (default: 10) at which this behavior occurs. When --seqmax is negative, the threshold is lifted so that the reference sequence is always reported regardless of its length, i.e.,
$ transvar ganno -i 'chr14:g.101347000_101347023del' --ensembl --seqmax -1
outputs the full reference sequence:
chr14:g.101347000_101347023del ENST00000534062 (protein_coding) RTL1 -
chr14:g.101347000_101347023delTTGGGGTGAGAAATAGAGGGGACT/c.4074+29_4074+52delAGTCCCCTCTATTTCTCACCCCAA/. inside_[3-UTR;noncoding_exon_1]
CSQN=3-UTRDeletion;left_align_gDNA=g.101347000_101347023delTTGGGGTGAGAAATAGAG
GGGACT;unaligned_gDNA=g.101347000_101347023delTTGGGGTGAGAAATAGAGGGGACT;left_a
lign_cDNA=c.4074+29_4074+52delAGTCCCCTCTATTTCTCACCCCAA;unalign_cDNA=c.4074+29
_4074+52delAGTCCCCTCTATTTCTCACCCCAA;aliases=ENSP00000435342;source=Ensembl
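The effect of --seqmax on how a deletion is written can be illustrated with a few lines of shell. This is a sketch of the rule as described above, not TransVar's implementation; format_deleted_seq is a hypothetical name, and the exact boundary (abbreviating only sequences strictly longer than the threshold) is an assumption.

```shell
# Sketch of the --seqmax rule; not TransVar's own code.
# Assumption: a sequence is abbreviated when it is strictly longer than seqmax.
format_deleted_seq() {
    seq=$1
    seqmax=${2:-10}   # documented default threshold
    if [ "$seqmax" -ge 0 ] && [ "${#seq}" -gt "$seqmax" ]; then
        # Too long: report only the length of the deleted sequence.
        printf 'del%d\n' "${#seq}"
    else
        # Short enough, or threshold lifted by a negative seqmax: report the bases.
        printf 'del%s\n' "$seq"
    fi
}

format_deleted_seq TTGGGGTGAGAAATAGAGGGGACT       # prints del24
format_deleted_seq TTGGGGTGAGAAATAGAGGGGACT -1    # prints delTTGGGGTGAGAAATAGAGGGGACT
```

The two calls mirror the two ganno runs above: the 24-base deletion is shortened under the default threshold and written out in full when the threshold is negative.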