Description
This track displays single nucleotide polymorphisms (SNPs) identified by published
Genome-Wide Association Studies (GWAS), collected in the
NHGRI-EBI GWAS Catalog
published jointly by the National
Human Genome Research Institute (NHGRI) and the European Bioinformatics Institute (EMBL-EBI).
Some abbreviations
are used above.
From http://www.ebi.ac.uk/gwas/docs/about:
The Catalog is a quality controlled, manually curated, literature-derived
collection of all published genome-wide association studies assaying at least
100,000 SNPs and all SNP-trait associations with p-values < 1.0 x
10^-5 (Hindorff et al., 2009). For more details about the Catalog
curation process and data extraction procedures, please refer to the
Methods page.
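The two inclusion thresholds quoted above can be written as a simple check. This is only an illustrative sketch: the function and constant names are hypothetical and are not part of the Catalog or of this track's tooling.

```python
# Hypothetical helper illustrating the inclusion thresholds quoted above.
# The names below are assumptions made for this sketch only.
MIN_SNPS_ASSAYED = 100_000  # a study must assay at least this many SNPs
MAX_P_VALUE = 1.0e-5        # an association must have a p-value strictly below this


def meets_catalog_criteria(snps_assayed: int, p_value: float) -> bool:
    """Return True if a study/association pair passes both thresholds."""
    return snps_assayed >= MIN_SNPS_ASSAYED and p_value < MAX_P_VALUE
```

For example, an association with a p-value of 3e-8 from a study assaying 500,000 SNPs passes, while the same p-value from a study assaying 50,000 SNPs does not.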
Methods
From http://www.ebi.ac.uk/gwas/docs/methods:
The GWAS Catalog data is extracted from the literature. Extracted information
includes publication information, study cohort information such as cohort size,
country of recruitment and subject ethnicity, and SNP-disease association
information including SNP identifier (i.e. RSID), p-value, gene and risk
allele. Each study is also assigned a trait that best represents the phenotype
under investigation. When multiple traits are analysed in the same study either
multiple entries are created, or individual SNPs are annotated with their
specific traits. Traits are used both to query and visualise the data in the
Catalog's web form and diagram-based query interfaces.
Data extraction and curation for the GWAS Catalog is an expert activity; each
step is performed by scientists supported by a web-based tracking and data
entry system which allows multiple curators to search, annotate, verify and
publish the Catalog data. Papers that qualify for inclusion in the Catalog are
identified through weekly PubMed searches. They then undergo two levels of
curation. First all data, including association information for SNPs, traits
and general information about the study, are extracted by one curator. A second
curator then performs an additional round of curation to double-check the
accuracy and consistency of all the information. Finally, an automated pipeline
performs validation of the extracted data; see the
Quality control and SNP mapping section below for more
details. This information is then used for queries and in the production of the
diagram.
Data Access
The raw data can be explored interactively with the Table Browser or Data Integrator.
For automated analysis, the genome annotation can be downloaded from the downloads server
(gwasCatalog*.txt.gz) or the public MySQL server. Please refer to our
mailing list archives
for questions, or our Data Access FAQ for more information.
Previous versions of this track can be found on our archive download server.
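For automated analysis of a downloaded gwasCatalog*.txt.gz file, a minimal Python sketch might look like the following. It assumes the file is a tab-separated table dump without a header row, and that the columns appear in the order listed in ASSUMED_COLUMNS; that order is an assumption and should be checked against the table schema (for example in the Table Browser) before use. The function names are hypothetical.

```python
import gzip

# Assumed column order of the table dump; verify against the table schema
# before relying on it, since it may differ between assemblies or releases.
ASSUMED_COLUMNS = [
    "bin", "chrom", "chromStart", "chromEnd", "name", "pubMedID", "author",
    "pubDate", "journal", "title", "trait", "initSample", "replSample",
    "region", "genes", "riskAllele", "riskAlFreq", "pValue", "pValueDesc",
    "orOrBeta", "ci95", "platform", "cnv",
]


def read_gwas_catalog(path, columns=ASSUMED_COLUMNS):
    """Yield one dict per row of a gzip-compressed, tab-separated dump."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) != len(columns):
                continue  # skip rows that do not match the assumed layout
            yield dict(zip(columns, fields))


def snps_for_trait(path, trait_substring):
    """Return (SNP id, chrom, chromStart, pValue) for rows whose trait matches."""
    needle = trait_substring.lower()
    return [
        (row["name"], row["chrom"], int(row["chromStart"]), row["pValue"])
        for row in read_gwas_catalog(path)
        if needle in row["trait"].lower()
    ]
```

Calling snps_for_trait("gwasCatalog.txt.gz", "height") on a local copy of the file would then list the SNPs whose trait description contains "height". Rows are skipped rather than rejected with an error when the field count differs, so an unexpectedly empty result is a hint that the assumed column list needs updating.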
References
Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, Manolio TA.
Potential etiologic and functional implications of genome-wide association loci for human diseases
and traits.
Proc Natl Acad Sci U S A. 2009 Jun 9;106(23):9362-7.
PMID: 19474294; PMC: PMC2687147