Important notice
The DisGeNET database is made available under the Attribution-NonCommercial-ShareAlike 4.0 International License whose text can be found here. For more information, see the Legal Notices page.
Tab separated files
Gene-Disease Associations
Curated gene-disease associations
The file contains gene-disease associations from UNIPROT, CGI, ClinGen, Genomics England, CTD (human subset), PsyGeNET, and Orphanet.
BeFree gene-disease associations
The file contains gene-disease associations obtained by text mining MEDLINE abstracts using the BeFree system.
ALL gene-disease associations
The file contains all gene-disease associations in DisGeNET.
ALL gene-disease-pmid associations
The file contains all gene-disease-pmid associations in DisGeNET.
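Tab-separated files such as these can be read with Python's standard `csv` module. The following is a minimal sketch, not an official loader: the column names `geneId`, `diseaseId`, and `score` and the sample rows are hypothetical placeholders, so check the header of the downloaded file before relying on them. For a compressed download, `gzip.open(path, "rt")` can supply the file handle.

```python
import csv
import io


def read_tsv(handle):
    """Read a tab-separated file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Hypothetical sample standing in for a downloaded file; the column
# names and values below are illustrative assumptions only.
sample = io.StringIO(
    "geneId\tdiseaseId\tscore\n"
    "1\tC0000001\t0.7\n"
    "2\tC0000001\t0.3\n"
    "2\tC0000002\t0.5\n"
)

rows = read_tsv(sample)

# Group gene identifiers by disease identifier.
genes_by_disease = {}
for row in rows:
    genes_by_disease.setdefault(row["diseaseId"], set()).add(row["geneId"])
```

All values are read as strings; convert numeric columns explicitly where needed.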
Variant-Disease Associations
Curated variant-disease associations
The file contains variant-disease associations from UNIPROT, ClinVar, GWASdb, and the GWAS catalog.
BeFree variant-disease associations
The file contains variant-disease associations obtained by text mining MEDLINE abstracts using the BeFree system.
ALL variant-disease associations
The file contains all variant-disease associations in DisGeNET.
ALL variant-disease-pmid associations
The file contains all variant-disease-pmid associations in DisGeNET.
Disease-Disease Associations
Curated disease-disease associations
The file contains the disease-disease associations (DDAs) computed for curated sources (UNIPROT, CGI, ClinGen, Genomics England, CTD (human subset), PsyGeNET, and Orphanet). For each DDA we provide the number of shared genes, shared variants, and a Jaccard Index (JI). For more information about the DDAs, go here.
ALL disease-disease associations
The file contains the disease-disease associations (DDAs) computed for all sources in DisGeNET. For each DDA we provide the number of shared genes, shared variants, and a Jaccard Index (JI). For more information about the DDAs, go here.
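As an illustration of the Jaccard Index reported for each DDA, the sketch below uses the standard definition: the size of the intersection of two sets divided by the size of their union. It is applied here to gene sets as an assumption; this page does not state how the JI in the files is computed. The gene names are hypothetical placeholders.

```python
def jaccard_index(set_a: set, set_b: set) -> float:
    """Standard Jaccard Index: |A intersect B| / |A union B|."""
    union = set_a | set_b
    if not union:
        # Two empty sets: define the index as 0.0 to avoid dividing by zero.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical gene sets for two diseases (placeholder names).
disease_1_genes = {"GENE_A", "GENE_B", "GENE_C"}
disease_2_genes = {"GENE_B", "GENE_C", "GENE_D", "GENE_E"}

shared_genes = len(disease_1_genes & disease_2_genes)  # 2 genes in common
ji = jaccard_index(disease_1_genes, disease_2_genes)   # 2 / 5 = 0.4
```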
RDF linked dataset
The DisGeNET-RDF data dump and the VoID description file are available for download.
SQLite Files
Current Release
DisGeNET SQLite 2020 - v7.0
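The SQLite release can be inspected with Python's built-in `sqlite3` module. Because this page does not describe the schema, the sketch below only discovers table names from `sqlite_master` and counts rows. The in-memory database and its `demo_associations` table are hypothetical stand-ins for the downloaded file.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the database, sorted by name."""
    cursor = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in cursor]


def count_rows(conn, table):
    """Count rows in a table after checking that the table exists."""
    if table not in list_tables(conn):
        raise ValueError(f"unknown table: {table}")
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


# Hypothetical in-memory database; replace ":memory:" with the path to
# the downloaded SQLite file to inspect the real release.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_associations (gene_id INTEGER, disease_id TEXT)")
conn.executemany(
    "INSERT INTO demo_associations VALUES (?, ?)",
    [(1, "C0000001"), (2, "C0000002")],
)

tables = list_tables(conn)
row_count = count_rows(conn, "demo_associations")
```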
Mappings
Variant to Gene mappings
Variant-Gene Mappings File
The file contains the mappings of DisGeNET variants (dbSNP identifiers) to NCBI Entrez Gene identifiers, according to the dbSNP database.
Gene to UniProt mappings
The file contains the mappings of DisGeNET genes (Entrez Gene Identifiers) to UniProt entries
UMLS CUI to several disease vocabularies
The file contains the mappings of DisGeNET UMLS CUIs to the following disease vocabularies: DO, EFO, HPO, ICD9CM, ICD10, ICD10CM, MONDO, MSH, NCI, OMIM, and ORDO. Note that not every CUI in DisGeNET has an equivalent code in another vocabulary (see table below), and the correspondence is not always 1:1. The mappings were generated with the UMLS Metathesaurus (version 2019AA).
vocabulary | DO | EFO | HPO | ICD10 | ICD10CM | ICD9CM | MONDO | MSH | NCI | OMIM | ORDO |
percent | 30.2 | 14.7 | 28.3 | 7.5 | 14.3 | 7.7 | 43.9 | 35.7 | 27.5 | 23.9 | 19.8 |
UMLS CUI to top disease classes
The file contains the mappings of DisGeNET concepts to the top disease classifications from the following vocabularies and ontologies:
- The DisGeNET disease type.
- MeSH disease class from the C and F branches.
- The Disease Ontology.
- The Human Phenotype Ontology.
- The UMLS Semantic Type.
DisGeNET Gene Sets
The files contain DisGeNET gene-disease association data in GMT format (Gene Matrix Transposed file format). In the GMT format, each line represents one gene set and contains: ID (tab) Description (tab) Gene (tab) Gene (tab) and so on, one gene per field.
To create the gene sets, we have only taken into account diseases having 10 or more genes and fewer than 1000 genes.
In our format:
- ID = Disease Concept Unique Identifier
- Description = Disease Name
- Gene = identified by one of the two possible identifiers (Entrez gene id, or gene symbols)
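The GMT layout and the size filter described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code used to produce the files: the function names, disease identifiers, disease name, and gene names are all hypothetical.

```python
def build_gene_sets(associations, min_genes=10, max_genes=1000):
    """Group (disease_id, gene) pairs into gene sets, keeping only
    diseases with at least min_genes and fewer than max_genes genes."""
    gene_sets = {}
    for disease_id, gene in associations:
        gene_sets.setdefault(disease_id, set()).add(gene)
    return {
        disease_id: sorted(genes)
        for disease_id, genes in gene_sets.items()
        if min_genes <= len(genes) < max_genes
    }


def to_gmt_line(disease_id, description, genes):
    """Format one gene set as: ID <tab> Description <tab> Gene <tab> Gene ..."""
    return "\t".join([disease_id, description, *genes])


def parse_gmt_line(line):
    """Split one GMT line into (ID, description, list of genes)."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], fields[1], fields[2:]


# Hypothetical input: one disease with 12 genes, one with only 3.
associations = [("C0000001", f"GENE{i}") for i in range(12)]
associations += [("C0000002", f"GENE{i}") for i in range(3)]

gene_sets = build_gene_sets(associations)  # the 3-gene disease is dropped
line = to_gmt_line("C0000001", "Example disease", gene_sets["C0000001"])
parsed = parse_gmt_line(line)
```

The Gene fields can hold either Entrez gene ids or gene symbols, matching the two identifier options listed above.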
DisGeNET COVID-19 Data
The COVID-19 DisGeNET data collection is the result of applying state-of-the-art text mining tools developed by MedBioinformatics Solutions to the LitCovid dataset (Chen, Allot, and Lu, 2020) to identify mentions of diseases, signs, and symptoms. The LitCovid dataset contains a selection of papers on coronavirus disease 2019 (COVID-19). The COVID-19 DisGeNET data collection is made available under the Attribution-NonCommercial-ShareAlike 4.0 International License. For more information, see the Legal Notices.
DisGeNET annotations for the IntAct Coronavirus dataset
We have annotated the IntAct Coronavirus dataset (downloaded on May 18, 2020) using DisGeNET data. This dataset contains molecular interactions, extracted from publications, that involve viral proteins from the Coronaviridae family and human proteins, along with a certain proportion of proteins from other model organisms. For the annotation of this dataset with DisGeNET data, only human proteins were taken into account.
We provide the following files:
gene-disease associations for all proteins in the dataset
The file contains gene-disease associations from all sources in DisGeNET for the proteins in the IntAct dataset.
gene-disease-pmid associations for all proteins in the dataset
The file contains gene-disease associations from all sources in DisGeNET for the proteins in the IntAct dataset, plus the supporting publication and an exemplary sentence from the abstract of the publication.
variant-disease associations for all proteins in the dataset
The file contains the variant-disease associations from all sources in DisGeNET for the proteins in the IntAct dataset.
variant-disease-pmid associations for all proteins in the dataset
The file contains the variant-disease associations from all sources in DisGeNET for the proteins in the IntAct dataset, plus the supporting publication and an exemplary sentence from the abstract of the publication.
DisGeNET nanopublications
DisGeNET nanopublications in TriG format