One of the most challenging problems in biomedical research is understanding the underlying mechanisms of complex diseases. Great effort has been spent on finding the genes associated with diseases (Botstein and Risch, 2003; Kann, 2009). However, increasing evidence indicates that most human diseases cannot be attributed to a single gene but arise from complex interactions among multiple genetic variants and environmental risk factors (Hirschhorn and Daly, 2005). Several databases have been developed to store associations between genes and diseases, such as CTD™ (Davis et al., 2014), OMIM® (Hamosh et al., 2005) and the NHGRI-EBI GWAS Catalog (Welter et al., 2014). Each of these databases focuses on different aspects of the phenotype-genotype relationship and, due to the nature of the curation process, none of them is complete. Hence, integrating these databases with information extracted from the literature is needed to provide a comprehensive view of the state-of-the-art knowledge in this research field. With this need in mind, we have created DisGeNET.
DisGeNET is a discovery platform integrating information on gene-disease associations (GDAs) from several public data sources and the literature (Piñero et al., 2015). The current version (DisGeNET v4.0) contains 429,036 associations between 17,381 genes and 15,093 diseases, disorders and clinical or abnormal human phenotypes, and 72,870 variant-disease associations (VDAs) between 46,589 SNPs and 6,356 diseases and phenotypes. Given the large number of GDAs compiled in DisGeNET, we have also developed a score to rank the associations based on the supporting evidence. Several tools are available to explore and analyze the data contained in DisGeNET. DisGeNET can be queried through the Search and Browse functionalities available from this web interface, or through a plugin for Cytoscape that queries and analyzes a network representation of the data. DisGeNET data can also be queried locally by downloading the SQLite database. Furthermore, an RDF (Resource Description Framework) representation of the DisGeNET database is available. It can be queried using our SPARQL endpoint and a Faceted Browser. Follow the link for more information.
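As a rough illustration of querying a local SQLite copy and ranking associations by score, the following Python sketch may help. The table name `gene_disease_association` and its columns `gene_symbol`, `disease_name` and `score` are hypothetical and are not taken from the actual DisGeNET schema; the demo rows are placeholder values rather than DisGeNET data. Inspect the downloaded database to find the real table and column names before adapting it.

```python
import sqlite3


def top_associations(conn, disease_name, limit=5):
    """Return (gene_symbol, score) rows for one disease, highest score first.

    Assumes a hypothetical table gene_disease_association(gene_symbol,
    disease_name, score); the real DisGeNET schema may differ.
    """
    query = """
        SELECT gene_symbol, score
        FROM gene_disease_association
        WHERE disease_name = ?
        ORDER BY score DESC, gene_symbol ASC
        LIMIT ?
    """
    return conn.execute(query, (disease_name, limit)).fetchall()


def build_demo_db():
    """Create an in-memory database with placeholder rows (not real data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene_disease_association ("
        "gene_symbol TEXT, disease_name TEXT, score REAL)"
    )
    conn.executemany(
        "INSERT INTO gene_disease_association VALUES (?, ?, ?)",
        [
            ("GENE_A", "Disease X", 0.25),
            ("GENE_B", "Disease X", 0.75),
            ("GENE_C", "Disease Y", 0.5),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    # With a downloaded file, replace build_demo_db() with
    # sqlite3.connect("path/to/downloaded.db") and adjust the query.
    demo = build_demo_db()
    for gene, score in top_associations(demo, "Disease X"):
        print(gene, score)
```

The query uses bound parameters rather than string formatting, so disease names containing quotes are handled safely.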
The DisGeNET database has been cited by several papers. Some of them can be reviewed here.
The disease-associated variants in DisGeNET have been annotated with the following attributes:
- The position in the chromosome
- The reference and alternative alleles
- The class of the variant: SNP, deletion, insertion, indel, somatic SNV, substitution, sequence alteration, and tandem repeat
- The allelic frequency according to the 1000 Genomes Project
- The allelic frequency according to the Exome Aggregation Consortium
- The most severe consequence type according to the VEP
- 429,036 GDAs comprising 17,381 genes and more than 15,000 diseases and phenotypes.
- New data sources: Orphanet and the NHGRI-EBI GWAS Catalog
- New annotations of genes to phenotypes
- We have updated our DisGeNET Gene-Disease Ontology with new association types
- We have updated our DisGeNET Score to include the new sources
- New disease annotations: HDO and HPO
- A Disease Specificity Index (DSI) and a Disease Pleiotropy Index (DPI) have been computed for the genes
- Information on more than 45,000 SNPs associated with diseases
- The complete set of supporting publications for each GDA available
- Publications now can be sorted or filtered by publication year
- A large number of negative associations have been removed from the text mining sources (BeFree, GAD, and LHGDN) using regular expression approaches
October, 2015: We are happy to announce that our group has become a new node, and first in Spain, of the Nanopublication Network: please see the current nodes at nanopub-network monitor. Thanks to Dr. Tobias Kuhn for all his support and enthusiasm.
August, 2015: New DisGeNET Nanopublications release: The release of the Nanopublication distribution of DisGeNET v3.0 is here! More than 1 000 000 GDA statements comprising 17 000 genes and more than 14 000 diseases as nanopublications in the Semantic Web
- More GDAs comprising 17 000 genes and more than 14 000 diseases as linked data in the Semantic Web
- New disease-phenotype annotation data from the Human Phenotype Ontology
- New linksets
- A new full metadata description of the dataset compliant with the W3C HCLS and the Open PHACTS specifications
- New types of searches
May, 2015: New DisGeNET release: over 17 000 genes and more than 14 000 diseases. A new data source has been added (ClinVar) and we have improved GDAs from BeFree. More information on human clinically relevant variations is also available.
April, 2015: DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes, published in Database. See the paper here.
November 26-27, 2014: Our PostDoc, Núria Queralt Rosinach, was invited as a Linked Data expert to the RDConnect/Elixir/BioMedBridges BYOD workshop named 'Rare Disease Registries (and biobanks)'. DisGeNET was brought to the event as a linkable dataset to show the value of Linked Data. Our involvement in the Open PHACTS project was highlighted during the workshop. See more at BYOD Workshop.
October 23, 2014: We have published DisGeNET as nanopublications, which is a new Linked Dataset using the nanopublication approach and the Trusty URI technique. To see more information, go to DisGeNET Nanopublications section.
October 15, 2014: DisGeNET appears for the first time in the LOD cloud diagram (2014-08-30 update). This diagram shows datasets published in Linked Data format and it is built based on their metadata description on the DataHub as well as on metadata extracted from a crawl of the Linked Data web (DisGeNET DataHub site here)
October 2, 2014: Sequence variant information available in DisGeNET! More than 8,000 SNPs associated with disease have been annotated using text mining. Check some examples here. More information coming soon.
September 16, 2014: We have updated the metadata description of the DisGeNET Linked dataset, see it at DisGeNET VoID.
August 27, 2014: An update of the DisGeNET Cytoscape tutorial has been made available.
August 8, 2014: A tutorial to illustrate the functionalities of the web interface has been made available. Follow the link for more information.
July 1, 2014: The update of DisGeNET RDF has been released (version 2.1.0). It contains 13,172 diseases and 16,666 genes linked by 381,056 gene-disease associations in the Semantic Web, represented by a new data model with more annotations and linkouts. Please find all the new data and related information at http://rdf.disgenet.org/
June 16, 2014: A bug in the DisGeNET plugin concerning the coloring of nodes by disease class has been fixed. We recommend that users download the plugin and the database again. For more information, contact support(at)disgenet(dot)org
May 5, 2014: A new version of DisGeNET has been released (version 2.1). We have new data from text mining using our BeFree system.
April 26-27, 2014: Janet Piñero and Núria Queralt participated in the Network of Biothings Hackathon.
April 24, 2014: DisGeNET RDF, integrated in the Web of Linked Data, can now be navigated via a newly implemented Faceted Browser.
April 16, 2014: The paper describing the Biomedical Named Entity Recognition (BioNER) used to extract and identify genes/proteins and diseases in DisGeNET BeFree dataset is out! Check it here.
April 11, 2014: The paper introducing the Semanticscience Integrated Ontology (SIO), into which the DisGeNET association type ontology has been integrated, is out! Check it here.
February 5, 2014: New release of DisGeNET available, with updated info and data from two new resources: Text mining (TEXTM) and Rat Genome Database (RGD).
September, 2013: An RDF (Resource Description Framework) representation of the DisGeNET database has been created that can be queried using our SPARQL endpoint. Making DisGeNET data available as RDF Linked Open Data promotes integration with other RDF representations of resources in the Semantic Web. Follow the link for more information.
February, 2013: The DisGeNET association type ontology developed in our group has been integrated in the Semanticscience Integrated Ontology (SIO), which is an integrated ontology of types and relations for rich description of objects, processes and their attributes. Thanks to Dr. Michel Dumontier for accepting this collaboration and helping us in the integration.
January, 2013: DisGeNET registered with the Neuroscience Lexicon NeuroLex.
November 30, 2012: “DisGeNET: from MySQL to Nanopublication, modelling gene-disease associations for the Semantic Web” will be presented at the SWAT4LS in Paris.
October 8-9, 2012: DisGeNET presented at the SME Bioinformatics Forum, in Barcelona.
July 20, 2012: New release of DisGeNET available, with updated info and data from two new resources: Genetic Association Database (GAD) and Mouse Genome Database (MGD).