Description
This track shows the genomic positions of variants in the
AVADA database.
AVADA is a database of variants built by machine-learning software
that analyzes full-text research articles. The software finds the gene mentions
that appear most relevant for monogenic (non-cancer) genetic diagnosis, finds variant
descriptions, and uses the genes to map the variants to the genome. For details, see the
AVADA paper.
As the data is automatically extracted from full-text publications, it includes
some false positives. In the original study, out of 200 randomly selected articles,
only 99 were considered relevant after manual curation. However, this share is very high
compared to the Genomenom track. Ideally, the track is used
in combination with variants found in human patients, to find relevant literature,
or with Genome Browser tracks of variant databases that curated a single study
for each variant, like our tracks for HGMD or LOVD.
Display Conventions and Configuration
Genomic locations of variants are labeled with the variant description
in the original text. This is not a normalized HGVS string, but the original
text as the authors of the study described it.
The PubMed ID, gene, and transcript for each variant are shown on the
variant's details page, as well as the PubMed title, authors, and abstract.
Mouse over the variants to show the gene, variant, first author, year, and title.
The data has been lifted from hg19 to hg38.
Data access
The raw data can be explored interactively with the Table Browser,
for download, intersection or correlations with other tracks. To join this track with others
based on the chromosome positions, use the Data Integrator.
For automated download and analysis, the genome annotation is stored in a bigBed file that
can be downloaded from
our download server.
The file for this track is called avada.bb. Individual
regions or the whole genome annotation can be obtained using our tool bigBedToBed
which can be compiled from the source code or downloaded as a precompiled
binary for your system. Instructions for downloading source code and binaries can be found
here.
The tool
can also be used to obtain only features within a given range, e.g.
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg19/bbi/avada.bb -chrom=chr21 -start=0 -end=100000000 stdout
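The same range query can be scripted. The following is a minimal Python sketch, assuming that bigBedToBed is installed and on the PATH, and that the fourth BED column (name) carries the variant description used as the item label. The helper names build_command, parse_bed_line and fetch_region are hypothetical and not part of any UCSC tool.

```python
import subprocess

# URL of the bigBed file, as given in the example command above.
AVADA_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg19/bbi/avada.bb"


def build_command(chrom, start, end, url=AVADA_URL):
    """Return the bigBedToBed argument list for one genomic range."""
    return [
        "bigBedToBed",
        url,
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        "stdout",  # write BED lines to standard output instead of a file
    ]


def parse_bed_line(line):
    """Split one tab-separated BED line into a small dictionary.

    Assumption: column 4 (name) holds the variant description; all
    remaining columns are kept unparsed in 'extra'.
    """
    fields = line.rstrip("\n").split("\t")
    return {
        "chrom": fields[0],
        "start": int(fields[1]),
        "end": int(fields[2]),
        "name": fields[3] if len(fields) > 3 else "",
        "extra": fields[4:],
    }


def fetch_region(chrom, start, end):
    """Run bigBedToBed (needs network access) and parse its output."""
    result = subprocess.run(
        build_command(chrom, start, end),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if bigBedToBed fails
    )
    return [parse_bed_line(l) for l in result.stdout.splitlines() if l]
```

Calling fetch_region("chr21", 0, 100000000) would run the same query as the command above and return one dictionary per feature.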
For automated access, this track, like all others, is also available via our
API. However, for bulk processing in
pipelines, downloading the data and/or using bigBed files as described above is
usually faster.
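For small, occasional queries, an API request can be sketched in Python as follows. This assumes the /getData/track endpoint of the UCSC REST API at api.genome.ucsc.edu and assumes that the track is named avada (inferred from the file name avada.bb, not confirmed here). The helper names build_api_url and fetch_track are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_api_url(genome, track, chrom, start, end):
    """Build a getData/track URL for one genomic range.

    Parameters are joined with ';', the separator used in the UCSC API
    examples. The track name passed in is an assumption by the caller.
    """
    params = [
        ("genome", genome),
        ("track", track),
        ("chrom", chrom),
        ("start", start),
        ("end", end),
    ]
    query = ";".join(f"{key}={quote(str(value))}" for key, value in params)
    return f"{API_BASE}?{query}"


def fetch_track(genome, track, chrom, start, end):
    """Fetch and decode the JSON response (needs network access)."""
    with urlopen(build_api_url(genome, track, chrom, start, end)) as resp:
        return json.load(resp)
```

For example, build_api_url("hg19", "avada", "chr21", 0, 100000000) gives the URL for the same range as the bigBedToBed example.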
Methods
The AVADA VCF file was reformatted at UCSC to the bigBed format.
The program that performs the conversion is available on
GitHub. The paper reference information was added from
MEDLINE and is used Courtesy of the U.S. National Library of Medicine, according
to its
Terms and Conditions.
Credits
Thanks to Gill Bejerano and Johannes Birgmeier for making the data available.
References
Johannes Birgmeier, Cole A. Deisseroth, Laura E. Hayward, Luisa M. T. Galhardo, Andrew P. Tierno, Karthik A. Jagadeesh, Peter D. Stenson, David N. Cooper, Jonathan A. Bernstein, Maximilian Haeussler, and Gill Bejerano.
AVADA: Towards Automated Pathogenic Variant Evidence Retrieval Directly from the Full Text Literature.
Genetics in Medicine. 2019.
PMID: 31467448