Tab separated files
The file contains gene-disease associations from UNIPROT, CGI, ClinGen, Genomics England, CTD (human subset), PsyGeNET, and Orphanet.
The file contains gene-disease associations obtained by text mining MEDLINE abstracts using the BeFree system.
The file contains all gene-disease associations in DisGeNET.
The file contains all gene-disease-pmid associations in DisGeNET.
The file contains variant-disease associations from UNIPROT, ClinVar, GWASdb, and the GWAS catalog.
The file contains variant-disease associations obtained by text mining MEDLINE abstracts using the BeFree system.
The file contains all variant-disease associations in DisGeNET.
The file contains all variant-disease-pmid associations in DisGeNET.
The file contains the disease-disease associations (DDAs) computed for curated sources (UNIPROT, CGI, ClinGen, Genomics England, CTD (human subset), PsyGeNET, and Orphanet). For each DDA we provide the number of shared genes, shared variants, and a Jaccard Index (JI). For more information about the DDAs, go here.
The file contains the disease-disease associations (DDAs) computed for all sources in DisGeNET. For each DDA we provide the number of shared genes, shared variants, and a Jaccard Index (JI). For more information about the DDAs, go here.
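As an illustration of the quantities reported for each DDA, the following minimal sketch computes the number of shared genes and a Jaccard Index for pairs of diseases. It assumes the JI is taken over gene sets (the same formula applies to variant sets) and that only pairs sharing at least one gene are reported; both are assumptions, not a description of the DisGeNET pipeline. The function names and the identifiers in the example are hypothetical placeholders.

```python
from itertools import combinations


def jaccard_index(set_a, set_b):
    """Return |A ∩ B| / |A ∪ B|, or 0.0 when both sets are empty."""
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def disease_disease_associations(disease_to_genes):
    """Yield (disease_1, disease_2, n_shared_genes, jaccard) per disease pair.

    Assumption: pairs with no shared gene are skipped.
    """
    for d1, d2 in combinations(sorted(disease_to_genes), 2):
        genes_1 = set(disease_to_genes[d1])
        genes_2 = set(disease_to_genes[d2])
        shared = genes_1 & genes_2
        if shared:
            yield d1, d2, len(shared), jaccard_index(genes_1, genes_2)


# Placeholder identifiers, not real DisGeNET records.
example = {
    "disease_A": {"gene_1", "gene_2", "gene_3"},
    "disease_B": {"gene_2", "gene_3", "gene_4"},
    "disease_C": {"gene_9"},
}

for row in disease_disease_associations(example):
    print(row)
```

Computing all pairs this way is quadratic in the number of diseases, so for large inputs it may be preferable to index diseases by gene first and compare only diseases that co-occur.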
RDF linked dataset
The DisGeNET-RDF data dump and the VoID description file are accessible to download.
Variant to Gene mappings
The file contains the mappings of DisGeNET variants (dbSNP Identifiers) to NCBI Entrez identifiers according to the dbSNP database.
UMLS CUI to several disease vocabularies
The file contains the mappings of DisGeNET genes (Entrez Gene Identifiers) to UniProt entries.
The file contains the mappings of DisGeNET UMLS CUIs to the following disease vocabularies: DO, EFO, HPO, ICD9CM, ICD10, ICD10CM, MSH, NCI, OMIM, and ORDO. Note that not every CUI in DisGeNET has an equivalent code in another vocabulary (see table below), and the correspondence is not always 1:1. The mappings were generated with the UMLS Metathesaurus (version 2019AA).
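Because a CUI may map to zero, one, or several codes in a given vocabulary, it can help to load the mappings into a structure that keeps every code rather than a single value. The sketch below assumes a tab-separated layout with a header row and columns named `cui`, `vocabulary`, and `code`; these column names and the sample rows are hypothetical and should be adapted to the header of the downloaded file.

```python
import csv
import io
from collections import defaultdict


def load_mappings(handle):
    """Read tab-separated rows into {cui: {vocabulary: [code, ...]}}."""
    mappings = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(handle, delimiter="\t"):
        # Column names are assumptions; adjust to the real file header.
        mappings[row["cui"]][row["vocabulary"]].append(row["code"])
    return mappings


def codes_for(mappings, cui, vocabulary):
    """Return all codes for a CUI in a vocabulary (empty list if unmapped)."""
    return list(mappings.get(cui, {}).get(vocabulary, []))


# Placeholder rows, not real DisGeNET mappings.
sample = io.StringIO(
    "cui\tvocabulary\tcode\n"
    "CUI_1\tVOCAB_A\tA:1\n"
    "CUI_1\tVOCAB_A\tA:2\n"
    "CUI_2\tVOCAB_B\tB:9\n"
)
mappings = load_mappings(sample)
print(codes_for(mappings, "CUI_1", "VOCAB_A"))  # one-to-many mapping
print(codes_for(mappings, "CUI_2", "VOCAB_A"))  # no equivalent code
```

Returning a list in every case forces the calling code to handle both the unmapped and the one-to-many situations explicitly.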
UMLS CUI to top disease classes
The file contains the mappings of DisGeNET concepts to the top disease classifications from the following vocabularies and ontologies:
- The DisGeNET disease type.
- MeSH disease class from the C and F branches.
- The Disease Ontology.
- The Human Phenotype Ontology.
- The UMLS Semantic Type.
DisGeNET Gene Sets
The files contain DisGeNET gene-disease association data in GMT format (Gene Matrix Transposed file format). In the GMT format, each row represents a gene set and contains: ID (tab) Description (tab) Gene (tab) Gene (tab).
To create the gene sets, we considered only diseases with at least 10 and fewer than 1000 genes. In our format:
- ID = Disease Concept Unique Identifier
- Description = Disease Name
- Gene = identified by one of two possible identifiers (Entrez Gene ID or gene symbol)
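The layout described above can be sketched as follows. This is a minimal illustration, not the script used to produce the DisGeNET files: the function names, the input dictionaries, and the sorted ordering of rows and genes are assumptions. The size filter keeps diseases with at least 10 and fewer than 1000 genes, as stated above.

```python
def to_gmt_lines(disease_genes, disease_names, min_genes=10, max_genes=1000):
    """Build GMT rows: ID <tab> Description <tab> Gene <tab> Gene ...

    disease_genes: {disease_id: iterable of gene identifiers}
    disease_names: {disease_id: disease name}
    Keeps only diseases with min_genes <= number of genes < max_genes.
    """
    lines = []
    for disease_id in sorted(disease_genes):
        # De-duplicate and sort genes so the output is deterministic.
        genes = sorted({str(g) for g in disease_genes[disease_id]})
        if min_genes <= len(genes) < max_genes:
            fields = [disease_id, disease_names.get(disease_id, "")] + genes
            lines.append("\t".join(fields))
    return lines


def parse_gmt_line(line):
    """Split one GMT row into (set_id, description, [genes])."""
    set_id, description, *genes = line.rstrip("\n").split("\t")
    return set_id, description, genes


# Placeholder data, not real DisGeNET records.
genes_by_disease = {
    "disease_small": [f"gene_{i}" for i in range(9)],   # dropped: fewer than 10 genes
    "disease_ok": [f"gene_{i}" for i in range(10)],     # kept
}
names = {"disease_small": "Small example", "disease_ok": "Kept example"}

for gmt_row in to_gmt_lines(genes_by_disease, names):
    set_id, description, genes = parse_gmt_line(gmt_row)
    print(set_id, description, len(genes))
```

Whether the gene fields hold Entrez Gene IDs or gene symbols depends on which of the two files is being written or read; the code treats both as plain strings.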
DisGeNET COVID-19 Data
The COVID-19 DisGeNET data collection is the result of applying state-of-the-art text mining tools developed by MedBioinformatics solutions to the LitCovid dataset (Chen, Allot, and Lu, 2020) to identify mentions of diseases, signs, and symptoms. The LitCovid dataset contains a selection of papers on coronavirus disease 2019 (COVID-19). The COVID-19 DisGeNET data collection is made available under the Attribution-NonCommercial-ShareAlike 4.0 International License. For more information, see the Legal Notices.
DisGeNET annotations for the IntAct Coronavirus dataset
We have annotated the IntAct Coronavirus dataset (downloaded on May 18, 2020) using DisGeNET data. This dataset contains molecular interactions extracted from publications involving viral proteins from the Coronaviridae family and human proteins, along with a certain proportion of proteins from other model organisms. For the annotation of this dataset with DisGeNET data, only human proteins were taken into account.
We provide the following files:
DisGeNET nanopublications in TriG format