Important notice

The DisGeNET database is made available under the Attribution-NonCommercial-ShareAlike 4.0 International License whose text can be found here. For more information, see the Legal Notices page.

Tab separated files

Gene-Disease Associations

Curated gene-disease associations

The file contains gene-disease associations from UNIPROT, CGI, ClinGen, Genomics England, CTD (human subset), PsyGeNET, and Orphanet.

BeFree gene-disease associations

The file contains gene-disease associations obtained by text mining MEDLINE abstracts using the BeFree system.

ALL gene-disease associations

The file contains all gene-disease associations in DisGeNET.

ALL gene-disease-pmid associations

The file contains all gene-disease-pmid associations in DisGeNET.
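As a minimal sketch of how one of these tab-separated files could be read, the example below uses Python's standard `csv` module on a small in-memory sample. The column names (`geneId`, `diseaseId`, `score`) and the sample values are assumptions made for illustration; check the header line of the downloaded file before relying on them.

```python
import csv
import io

# Hypothetical sample data. The column names are assumptions, not taken from
# this page; inspect the header line of the real file first.
SAMPLE_TSV = (
    "geneId\tdiseaseId\tscore\n"
    "1\tC0000001\t0.30\n"
    "2\tC0000002\t0.70\n"
)


def read_associations(handle):
    """Yield one dict per data row of a tab-separated file with a header line."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        yield row


rows = list(read_associations(io.StringIO(SAMPLE_TSV)))

# Keep only rows whose (hypothetical) score column is at least 0.5.
high_score = [row for row in rows if float(row["score"]) >= 0.5]
print(high_score)
```

For a file on disk, the same function can be given an open text handle, for example `open(path, newline="")`, or `gzip.open(path, "rt", newline="")` if the download is gzip-compressed.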

Variant-Disease Associations

Curated variant-disease associations

The file contains variant-disease associations from UNIPROT, ClinVar, GWASdb, and the GWAS Catalog.

BeFree variant-disease associations

The file contains variant-disease associations obtained by text mining MEDLINE abstracts using the BeFree system.

ALL variant-disease associations

The file contains all variant-disease associations in DisGeNET.

ALL variant-disease-pmid associations

The file contains all variant-disease-pmid associations in DisGeNET.

Disease-Disease Associations

Curated disease-disease associations

The file contains the disease-disease associations (DDAs) computed for curated sources (UNIPROT, CGI, ClinGen, Genomics England, CTD (human subset), PsyGeNET, and Orphanet). For each DDA we provide the number of shared genes, shared variants, and a Jaccard Index (JI). For more information about the DDAs, go here.

ALL disease-disease associations

The file contains the disease-disease associations (DDAs) computed for all sources in DisGeNET. For each DDA we provide the number of shared genes, shared variants, and a Jaccard Index (JI). For more information about the DDAs, go here.
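To illustrate what the shared-gene count and the Jaccard Index express, here is a minimal sketch using the standard set-based definition, JI = |A ∩ B| / |A ∪ B|. It assumes the index is computed over the gene sets of the two diseases; the exact procedure used by DisGeNET is not described on this page, and the gene names are placeholders.

```python
def jaccard_index(set_a, set_b):
    """Return |A intersect B| / |A union B|, or 0.0 when both sets are empty."""
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Placeholder gene sets for two hypothetical diseases.
genes_disease_1 = {"GENE_A", "GENE_B", "GENE_C"}
genes_disease_2 = {"GENE_B", "GENE_C", "GENE_D", "GENE_E"}

shared_genes = genes_disease_1 & genes_disease_2
ji = jaccard_index(genes_disease_1, genes_disease_2)

print(len(shared_genes), ji)
```

The same function applies unchanged to sets of variant identifiers.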

README

README file

RDF linked dataset

RDF Downloads

The directory contains the DisGeNET-RDF data dump and the VoID description files corresponding to DisGeNET version 6.0.

Mappings

UniProt Downloads

UniProt Downloads

The file contains the mappings of DisGeNET genes (Entrez Gene Identifiers) to UniProt entries.

Variant to Gene mappings

Variant-Gene Mappings File

The file contains the mappings of DisGeNET variants (dbSNP identifiers) to NCBI Entrez Gene identifiers according to the dbSNP database.

UMLS CUI to several disease vocabularies

UMLS CUI to several disease vocabularies

The file contains the mappings of DisGeNET UMLS CUIs to the following disease vocabularies: DO, EFO, HPO, ICD9CM, ICD10, ICD10CM, MONDO, MSH, NCI, OMIM, and ORDO. Notice that not every CUI in DisGeNET has an equivalent code in another vocabulary (see table below). Also, the correspondence is not always 1:1. The mappings were generated with the UMLS Metathesaurus (version 2019AA).

vocabulary  DO    EFO   HPO   ICD10  ICD10CM  ICD9CM  MONDO  MSH   NCI   OMIM  ORDO
percent     30.2  14.7  28.3  7.5    14.3     7.7     43.9   35.7  27.5  23.9  19.8
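Because a CUI may have no code, one code, or several codes in a given vocabulary, lookups are safer when they return a list rather than a single value. The sketch below shows one way to group mapping rows for that purpose. The row layout (CUI, vocabulary, code) and all identifiers are hypothetical and only serve the illustration.

```python
from collections import defaultdict

# Hypothetical mapping rows: (CUI, vocabulary, code). The real file layout may
# differ; check its header before adapting this.
MAPPING_ROWS = [
    ("C0000001", "DO", "DOID:0000001"),
    ("C0000001", "MSH", "D000001"),
    ("C0000001", "MSH", "D000002"),
    ("C0000002", "OMIM", "000001"),
]


def group_mappings(rows):
    """Group codes as {cui: {vocabulary: [code, ...]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for cui, vocabulary, code in rows:
        grouped[cui][vocabulary].append(code)
    return grouped


def codes_for(grouped, cui, vocabulary):
    """Return every code for a CUI in a vocabulary; an empty list if unmapped."""
    return list(grouped.get(cui, {}).get(vocabulary, []))


grouped = group_mappings(MAPPING_ROWS)
print(codes_for(grouped, "C0000001", "MSH"))  # one CUI, two codes
print(codes_for(grouped, "C0000002", "DO"))   # no equivalent code
```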

UMLS CUI to top disease classes

UMLS CUI to top disease classes

The file contains the mappings of DisGeNET concepts to the top-level disease classes of the following vocabularies and ontologies:

  • The DisGeNET disease type.
  • The MeSH disease class, from the C and F branches.
  • The Disease Ontology.
  • The Human Phenotype Ontology.
  • The UMLS Semantic Type.
NOTICE: Only the DisGeNET disease type and the UMLS Semantic Type are available for every concept; the other categories are not mapped for all concepts.

DisGeNET Gene Sets

The files contain DisGeNET gene-disease association data in GMT (Gene Matrix Transposed) format. In the GMT format, each line represents one gene set and contains: ID (tab) Description (tab) Gene (tab) Gene (tab), and so on, with one Gene field per gene in the set.

To create the gene sets, we included only diseases with at least 10 and fewer than 1000 genes.

In our format:
  • ID = Disease Concept Unique Identifier
  • Description = Disease Name
  • Gene = identified by one of the two possible identifiers (Entrez gene id, or gene symbols)
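The line layout described above can be parsed with a few lines of Python. This is a minimal sketch, not the code used to build the files: the function names and the sample line are made up for illustration, and the size filter simply restates the "10 or more, fewer than 1000 genes" rule.

```python
def parse_gmt_line(line):
    """Split one GMT line into (set ID, description, list of genes)."""
    fields = line.rstrip("\n").split("\t")
    set_id, description = fields[0], fields[1]
    # Drop empty fields, which appear if a line ends with a trailing tab.
    genes = [gene for gene in fields[2:] if gene]
    return set_id, description, genes


def keep_gene_set(genes, minimum=10, maximum=1000):
    """Apply the size rule: at least `minimum` and fewer than `maximum` genes."""
    return minimum <= len(genes) < maximum


# Hypothetical GMT line: disease CUI, disease name, then gene symbols.
sample_line = "C0000001\tExample disease\tGENE1\tGENE2\tGENE3\n"
set_id, description, genes = parse_gmt_line(sample_line)

print(set_id, description, genes, keep_gene_set(genes))
```

Whether the Gene fields hold Entrez gene ids or gene symbols depends on which of the two files is used; the parser treats them as plain strings either way.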

DisGeNET annotations for the IntAct Coronavirus dataset

We have annotated the IntAct Coronavirus dataset (downloaded on May 18, 2020) using DisGeNET data. This dataset contains molecular interactions extracted from publications involving viral proteins from the Coronaviridae family and human proteins, along with some proteins from other model organisms. To annotate this dataset with DisGeNET data, only human proteins were taken into account.

We provide the following files:

gene-disease associations for all proteins in the dataset

The file contains gene-disease associations from all sources in DisGeNET for the proteins in the IntAct dataset.

gene-disease-pmid associations for all proteins in the dataset

The file contains gene-disease associations from all sources in DisGeNET for the proteins in the IntAct dataset, plus the supporting publication and a representative sentence from the abstract of the publication.

variant-disease associations for all proteins in the dataset

The file contains the variant-disease associations from all sources in DisGeNET for the proteins in the IntAct dataset.

variant-disease-pmid associations for all proteins in the dataset

The file contains the variant-disease associations from all sources in DisGeNET for the proteins in the IntAct dataset, plus the supporting publication and a representative sentence from the abstract of the publication.