DisGeNET RDF

Note to users

The description on this page refers to RDF release 5.0, corresponding to DisGeNET version 5.0. RDF release 6.0 is not yet available.

The DisGeNET-RDF Linked Dataset is an alternative way to access DisGeNET data. It provides new opportunities for data integration and querying, and for linking DisGeNET data with external RDF datasets.

The DisGeNET RDF API has been selected as one of the 10 ELIXIR Recommended Interoperability Resources. In December 2018, ELIXIR announced its first portfolio of Recommended Interoperability Resources (RIRs) to facilitate interoperability and reusability of life science data and to support the principles of FAIR data management. The full list of ELIXIR Recommended Interoperability Resources is available here.

The RDF version of DisGeNET has been developed in the context of the Open PHACTS project to provide disease-relevant information to the knowledge base on pharmacological data. DisGeNET-RDF has been integrated in the Open PHACTS Discovery Platform among other resources such as ChEMBL, WikiPathways and neXtProt. To explore and query DisGeNET data across the linked data in the platform, APIs are available in the Open PHACTS API v1.5 (see the OPS API Web site for up-to-date information).

To perform faceted and precise searches, the DisGeNET-RDF linked data is accessible via a Faceted browser.

In addition, DisGeNET-RDF linked data can be accessed for question answering via a SPARQL endpoint. An alternative way to interact with the DisGeNET-RDF data through SPARQL is the LODEStar interface here, a SPARQL endpoint and linked data browser for querying and browsing RDF datasets developed at the EBI. Furthermore, some DisGeNET queries are available at Bioqueries. See the SPARQL Endpoint Example Queries section for more details and query examples.

The RDF Linked Dataset is accompanied by a full dataset description, which is compliant with the W3C HCLS specification. For more information on the dataset description of the RDF Dataset, see the Metadata Description section.

To download the dump files, please go to the Data Downloads section.

Release Information

DisGeNET-RDF v5.0

The RDF distribution of DisGeNET includes new annotation and new linksets:

  • All linksets updated, i.e. all ontologies updated.
  • Disease-phenotype annotation data have been integrated from 3 different sources:
    • Annotations from the Human Phenotype Ontology
    • Text-mined annotations from The Human Phenotype Ontology: Semantic Unification of Common and Rare Disease. Groza et al., 2015
    • Text-mined annotations from Analysis of the human diseasome using phenotype similarity between common, genetic, and infectious diseases. Hoehndorf et al., 2015.
  • RDF enhancement and data model changes

DisGeNET Nanopublication v5.0

The nanopublication distribution of DisGeNET includes all DisGeNET v5.0 gene-disease association statements along with their provenance, evidence, and attribution, structured as nanopublications. Please refer to the release notes and the RDF section on this page for more details.

Linked Dataset Description

There are three main components in the RDF dataset: GDA content, metadata description of the RDF dataset (VoID description), and linkouts to other Linked Datasets. The current RDF representation of DisGeNET (v4.0.0) has 30,506,021 triples serialized in Turtle syntax that annotate 429,036 gene-disease associations (GDAs), 17,381 genes, and 15,093 diseases involved in these associations. The RDF graph model is centered on the GDA concept and represents the information around each GDA, such as the gene and disease involved and the type of association. In addition, the gene (identified by the Entrez or NCBI GeneID) and the disease (identified by the UMLS CUI) each have their own annotated attributes (see the Schema below). Entities and properties are semantically defined using standard ontologies such as the National Cancer Institute thesaurus (NCIt), and resources are identified by de-referenceable IRIs. GDAs are integrated using the DisGeNET Association Type Ontology and are semantically harmonized using SIO classes (see the DisGeNET ontology section below).

A full dataset description of the RDF Linked Dataset is provided using, among others, the Vocabulary of Interlinked Datasets (VoID), an RDF Schema W3C recommended vocabulary for expressing metadata about RDF datasets. This dataset description, which is compliant with the W3C HCLS specification and the Open PHACTS specification, includes the provenance of the DisGeNET relational database, the primary databases, and the BeFree text mining tool (see the DisGeNET VoID file description). The type of curation and level of evidence of each original database are also tracked and annotated. Each data instance in DisGeNET explicitly references this dataset description, so that provenance can be traced back at the level of individual instances.

In addition, linkouts to the Linked Open Data (LOD) cloud are set both to enrich DisGeNET GDA annotations with external Semantic Web resources and to extend the GDA content available in the Web of knowledge. Specifically, the current version contains a total of 4,962,315 linksets to the LOD through Bio2RDF and the Linked Life Data network, among other projects. All linked entities are related using the same SKOS predicate, skos:exactMatch. Other linkset statistics between entities can be found at the DisGeNET DataHub site in the DataHub registry. Consequently, DisGeNET appears in the last update of the LOD cloud diagram (2014-08-30 update). This diagram shows datasets published in Linked Data format; it is built from their metadata descriptions on the DataHub as well as from metadata extracted from a crawl of the Linked Data Web.

Metadata Description

The RDF Linked Dataset is accompanied by a full dataset description, which is compliant with the W3C HCLS specification. The full VoID description is available at DisGeNET_VoID.ttl.

DisGeNET-RDF Schema

The data model of the RDF representation of DisGeNET is shown below.

In this new release, GDAs are identified by "303 URIs", following the W3C recommendation for building URIs for the Semantic Web. Each GDA is defined by a unique combination of a gene (NCBI GeneID), a disease (UMLS CUI), an association type defined by our ontology (see section below), a data source of provenance, and a PubMed article (PMID) giving evidence for the gene-disease association. A unique identifier, based on a Universally Unique Identifier (UUID) generated by a cryptographic hash function, is established for each GDA. The DisGeNET GDA ID is composed of 'DGN' + UUID, e.g. DGN7ab3d8cae0c9f1150cb65a985aa8c0a1. The new namespace is 'http://rdf.disgenet.org/resource/gda/'. The new GDA IRI pattern is namespace + DisGeNET ID, e.g. 'http://rdf.disgenet.org/resource/gda/DGN7ab3d8cae0c9f1150cb65a985aa8c0a1'.
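The IRI pattern described above can be sketched in a few lines of Python. This is only an illustration: the function name gda_iri is hypothetical, and the check that the UUID part is exactly 32 lowercase hexadecimal characters is an assumption inferred from the single example ID shown above, not a documented rule.

```python
import re

# Namespace quoted in the text above.
GDA_NAMESPACE = "http://rdf.disgenet.org/resource/gda/"

# Assumption: 'DGN' followed by 32 lowercase hex characters, as in the example ID.
_GDA_ID = re.compile(r"DGN[0-9a-f]{32}")


def gda_iri(disgenet_id: str) -> str:
    """Return the GDA IRI (namespace + DisGeNET ID) for a DisGeNET GDA ID."""
    if not _GDA_ID.fullmatch(disgenet_id):
        raise ValueError(f"not a DisGeNET GDA ID: {disgenet_id!r}")
    return GDA_NAMESPACE + disgenet_id


print(gda_iri("DGN7ab3d8cae0c9f1150cb65a985aa8c0a1"))
```

Validating the ID before concatenation keeps malformed input from producing an IRI that looks valid but cannot be resolved.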

For an example of triples related to a single gene-disease association in DisGeNET, see here.

The DisGeNET Association Type Ontology

The DisGeNET Association Type Ontology was developed in our group to fill the gap in formal semantics for defining the types of association described between a gene and a disease in biological databases. This ontology was generated using all terms provided by the original GDA databases. It is an OWL ontology that can be accessed at GeneDiseaseAssociation.owl. The DisGeNET ontology is integrated into the Semanticscience Integrated Ontology (SIO), which is an OWL ontology that provides essential types and relations for the rich description of objects, processes and their attributes [PDF]. You can check SIO gene-disease association classes from this URL or download the entire SIO OWL-DL ontology file. The SIO ontology can also be accessed at the NCBO Bioportal. DisGeNET GDAs in RDF are semantically harmonized using SIO classes.

Gene-Disease Ontology

Access to the RDF Linked Dataset

Data Downloads

The DisGeNET-RDF data dump and the VoID description file are accessible to download.

Faceted Browser

DisGeNET-RDF linked data can be navigated via a Faceted browser.

SPARQL Endpoint

DisGeNET-RDF data are accessible using the query language SPARQL via our public SPARQL endpoint. The dataset is stored in a Virtuoso quad store, in which the name of the graph is 'http://rdf.disgenet.org'. The endpoint is powered by Virtuoso open-source v7.1.0.

An alternative way to interact with the DisGeNET-RDF data through SPARQL is the LODEStar interface at the DisGeNET LODEStar Endpoint, a SPARQL endpoint and linked data browser for querying and browsing RDF datasets developed at the EBI.

DisGeNET GRAPH

The DisGeNET-RDF dataset is deployed in the graph: 'http://rdf.disgenet.org'.
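As a rough illustration of how the graph name is used, the sketch below builds (but does not send) an HTTP request following the standard SPARQL protocol, passing the graph as the default-graph-uri parameter. The endpoint URL is an assumption, since this page only links to the endpoint without spelling out its address; the function name build_request is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Assumption: the address of the public endpoint is not given in the text.
ENDPOINT = "http://rdf.disgenet.org/sparql/"
# Graph name as stated above.
GRAPH = "http://rdf.disgenet.org"


def build_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build a SPARQL protocol GET request restricted to the DisGeNET graph."""
    params = urlencode({"query": query, "default-graph-uri": GRAPH})
    # Ask for the standard SPARQL JSON results format via content negotiation.
    headers = {"Accept": "application/sparql-results+json"}
    return Request(f"{endpoint}?{params}", headers=headers)


request = build_request("SELECT ?gda WHERE { ?gda a sio:SIO_001120 } LIMIT 5")
# Show that the graph name travels as a URL parameter.
print(parse_qs(urlsplit(request.full_url).query)["default-graph-uri"])
```

Passing the request to urllib.request.urlopen would execute the query; that step is left out here so the sketch runs without network access.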

DisGeNET NAMESPACES*

The namespaces required to query DisGeNET are:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX ncit: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dctypes: <http://purl.org/dc/dcmitype/>
PREFIX wi: <http://purl.org/ontology/wi/core#>
PREFIX eco: <http://purl.obolibrary.org/obo/eco.owl#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX pav: <http://purl.org/pav/>
PREFIX obo: <http://purl.obolibrary.org/obo/>

*Our SPARQL endpoint is preconfigured with these prefixes, so they do not need to be declared when executing queries from our endpoint.
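When a query is run through a client or endpoint that does not have these prefixes preconfigured, the declarations have to be added explicitly. A minimal sketch of one way to do that in Python follows; the helper name with_prefixes is hypothetical, the dictionary holds only a subset of the namespaces listed above, and the prefix detection is a simple pattern match rather than a real SPARQL parser.

```python
import re

# Subset of the namespaces listed above; extend as needed.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dcterms": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "void": "http://rdfs.org/ns/void#",
    "sio": "http://semanticscience.org/resource/",
    "ncit": "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#",
}


def with_prefixes(query: str) -> str:
    """Prepend a PREFIX line for every known prefix that the query uses."""
    used = [
        prefix
        for prefix in PREFIXES
        # The lookbehind avoids matching the tail of a longer name.
        if re.search(rf"(?<![\w:]){re.escape(prefix)}:", query)
    ]
    header = "".join(f"PREFIX {p}: <{PREFIXES[p]}>\n" for p in used)
    return header + query


print(with_prefixes("SELECT ?gda WHERE { ?gda rdf:type sio:SIO_001120 } LIMIT 5"))
```

Only the prefixes that actually occur in the query are declared, which keeps the generated header short.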

RDF Entity Examples

To help users query DisGeNET RDF data, we provide, for each type of entity represented in DisGeNET, an example of its RDF annotation serialized in Turtle syntax (see here).

Access DisGeNET via ontology

To facilitate data retrieval, several ontologies are deployed in the quad store so that questions can be answered by traversing the ontology hierarchies. The deployed ontologies are:

  • The Semanticscience Integrated Ontology (SIO),
  • the Human Disease Ontology (DO),
  • the Orphanet Rare Disease Ontology (ORDO),
  • the NCI thesaurus (NCIt),
  • the Human Phenotype Ontology (HPO),
  • the Experimental Factor Ontology (EFO),
  • the Evidence Code Ontology (ECO).

Please note that the coverage of DisGeNET with other disease terminologies is summarized in the disease table in downloads.

SPARQL Endpoint Example Queries

The purpose of the DisGeNET linked dataset is to enable richer queries over the data. Below we provide examples of how to explore DisGeNET data.
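SPARQL endpoints can typically return SELECT results in the W3C SPARQL 1.1 Query Results JSON Format. The sketch below shows one way to flatten such a response into plain Python dictionaries. The helper name bindings_to_rows is hypothetical, and the sample document is made up for illustration (only the gene IRI is taken from the examples on this page); it is not real DisGeNET output.

```python
import json


def bindings_to_rows(results_json: str) -> list:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    return [
        # Variables left unbound (e.g. inside OPTIONAL) are absent from a binding.
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in doc["results"]["bindings"]
    ]


# Made-up sample in the standard layout; not real DisGeNET output.
SAMPLE = """
{
  "head": {"vars": ["gene", "score", "pmid"]},
  "results": {"bindings": [
    {"gene": {"type": "uri", "value": "http://identifiers.org/ncbigene/4204"},
     "score": {"type": "literal", "value": "0.5"}}
  ]}
}
"""

for row in bindings_to_rows(SAMPLE):
    print(row)
```

Values arrive as strings regardless of their datatype, so numeric columns such as a score need an explicit conversion before they are compared or sorted.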

Examples

Query 1.1: Retrieve all the gene-disease associations (GDAs) and their general description

# Get all the GDAs of type 'Therapeutic' (sio:SIO_001120) and their general description annotations.
SELECT ?gda ?label ?comment ?title ?id ?voidSubset
FROM <http://rdf.disgenet.org>
WHERE {
  ?gda rdf:type sio:SIO_001120 ;
       rdfs:label ?label ;
       rdfs:comment ?comment ;
       dcterms:title ?title ;
       dcterms:identifier ?id ;
       void:inDataset ?voidSubset
}
LIMIT 20

-----------------

Query 1.2: Retrieve all the GDAs and their related gene and disease entities

# Get all the GDAs and the associated gene and disease URIs, based on the DisGeNET ID, NCBI GeneID, and UMLS CUI, respectively.
SELECT ?gda ?gene ?disease
FROM <http://rdf.disgenet.org>
WHERE {
  ?gda sio:SIO_000628 ?gene, ?disease .
  ?gene rdf:type ncit:C16612 .
  ?disease rdf:type ncit:C7057
}
LIMIT 20

-----------------

Query 1.3: Retrieve all the supporting evidences for the association between Rett Syndrome and the MECP2 gene

# Give me all the supporting evidence in DisGeNET for the association between the "Rett Syndrome" disease (umls:C0035372) and the MECP2 gene (ncbigene:4204).
SELECT DISTINCT ?gda
       <http://linkedlifedata.com/resource/umls/id/C0035372> as ?disease
       <http://identifiers.org/ncbigene/4204> as ?gene
       ?score ?source ?associationType ?pmid ?sentence
WHERE {
  ?gda sio:SIO_000628 <http://linkedlifedata.com/resource/umls/id/C0035372>,
                      <http://identifiers.org/ncbigene/4204> ;
       rdf:type ?associationType ;
       sio:SIO_000216 ?scoreIRI ;
       sio:SIO_000253 ?source .
  ?scoreIRI sio:SIO_000300 ?score .
  OPTIONAL {
    ?gda sio:SIO_000772 ?pmid .
    ?gda dcterms:description ?sentence .
  }
}

-----------------

Query 1.4: Retrieve all the GDAs from CURATED sources and with a score greater than or equal to 0.4

# Give me all the GDAs from CURATED sources (uniprot, ctd_human, clinvar) with a score greater than or equal to 0.4.
SELECT DISTINCT ?gda ?disease ?source ?score
WHERE {
  ?gda sio:SIO_000628 ?gene, ?disease ;
       sio:SIO_000253 ?source ;
       sio:SIO_000216 ?scoreIRI .
  ?disease a ncit:C7057 .
  ?scoreIRI sio:SIO_000300 ?score .
  # CLINVAR added to the pattern so that it matches the three sources named above
  FILTER regex(?source, "UNIPROT|CTD_human|CLINVAR")
  FILTER (?score >= 0.4)
}
ORDER BY DESC(?score)
LIMIT 100

-----------------

Query 1.5: Retrieve all the diseases associated with transporters

# Give me all the diseases associated with proteins classified as 'transporter' according to the PANTHER classification.
SELECT DISTINCT ?disease ?diseaselabel ?diseasename
WHERE {
  ?panther rdfs:subClassOf sio:SIO_000275 ;
           dcterms:title ?panthername .
  FILTER regex(?panthername, "transporter")
  ?protein sio:SIO_000095 ?panther .
  ?gene sio:SIO_010078 ?protein .
  ?gda sio:SIO_000628 ?gene, ?disease .
  FILTER regex(STR(?disease), "umls/id")
  ?disease dcterms:title ?diseasename .
  ?disease rdfs:label ?diseaselabel
}
LIMIT 100

-----------------

Query 1.6: Retrieve the genes associated with Alzheimer disease

# For Alzheimer's Disease, give me all the genes associated with the disease with a score greater than 0.29.
SELECT DISTINCT ?gene str(?geneName) as ?name ?score
WHERE {
  ?gda sio:SIO_000628 ?gene, ?disease ;
       sio:SIO_000216 ?scoreIRI .
  ?gene rdf:type ncit:C16612 ;
        dcterms:title ?geneName .
  ?disease rdf:type ncit:C7057 ;
           dcterms:title "Alzheimer's Disease"@en .
  ?scoreIRI sio:SIO_000300 ?score .
  FILTER (?score > 0.29)
}
ORDER BY DESC(?score)

-----------------

Query 1.7: Retrieve all the GDAs classified according to the association types of the DisGeNET ontology

# Give me all GDAs in DisGeNET and the type of relationship between genes and diseases.
SELECT DISTINCT ?gda ?type ?label
WHERE {
  ?gda rdf:type ?type .
  ?type rdfs:subClassOf+ sio:SIO_000983 .
  ?type rdfs:label ?label
}
LIMIT 50

-----------------

Query 1.8: Retrieve the diseasome

# Give me all the associations between diseases (diseasome) based on shared genes.
SELECT DISTINCT ?disease ?diseaseName ?gene ?disease2 ?diseaseName2
WHERE {
  ?gda sio:SIO_000628 ?disease, ?gene .
  ?gda2 sio:SIO_000628 ?disease2, ?gene .
  ?disease dcterms:title ?diseaseName .
  ?disease2 dcterms:title ?diseaseName2 .
  FILTER regex(?gene, "ncbigene")
  FILTER regex(?disease, "umls/id")
  FILTER regex(?disease2, "umls/id")
  FILTER (?disease != ?disease2)
  FILTER (?gda != ?gda2)
}
LIMIT 50

-----------------

Query 1.9: Retrieve the gene-gene network

# Give me all the associations between genes based on shared diseases.
SELECT DISTINCT ?gene ?geneName ?disease ?gene2 ?geneName2
WHERE {
  ?gda sio:SIO_000628 ?disease, ?gene .
  ?gda2 sio:SIO_000628 ?disease, ?gene2 .
  ?gene dcterms:title ?geneName .
  ?gene2 dcterms:title ?geneName2 .
  FILTER regex(?disease, "umls/id")
  FILTER regex(?gene, "ncbigene")
  FILTER regex(?gene2, "ncbigene")
  FILTER (?gene != ?gene2)
  FILTER (?gda != ?gda2)
}
LIMIT 50

-----------------

Query 1.10: Retrieve all diseases in DisGeNET classified as 'Ovarian cancer'

# Give me all diseases in DisGeNET that belong to the 'Ovarian cancer' class in the Human Disease Ontology (DOID:2394).
SELECT DISTINCT ?umls ?umlsTerm ?doid ?doTerm
WHERE {
  ?gda sio:SIO_000628 ?umls .
  ?umls dcterms:title ?umlsTerm ;
        skos:exactMatch ?doid .
  ?doid rdfs:label ?doTerm ;
        rdfs:subClassOf+ <http://purl.obolibrary.org/obo/DOID_2394> .
  FILTER regex(?umls, "umls/id")
}
LIMIT 20

-----------------

Query 1.11: Retrieve all variants, their associated diseases, and their associated genes

# Give me all variants, their associated genes (if any), and the diseases associated with these variants.
SELECT DISTINCT ?disease str(?diseaseTitle) as ?diseaseName
       ?variant str(?variantTitle) as ?variantName
       ?gene str(?geneTitle) as ?geneName
       str(?chrValue) as ?chromosome
       str(?posValue) as ?position
       str(?refValue) as ?refAllele
       str(?altValue) as ?altAllele
       ?type as ?mostSevereConsequence
       str(?speValue) as ?specificity
       str(?pleioValue) as ?pleiotropy
FROM <http://rdf.disgenet.org>
WHERE {
  ?vda sio:SIO_000628 ?variant, ?disease .
  ?disease a ncit:C7057 .
  ?disease dcterms:title ?diseaseTitle .
  ?variant a ?type .
  ?variant dcterms:title ?variantTitle .
  ?variant sio:SIO_000216 ?spe, ?pleio .
  ?spe a sio:SIO_001351 ;
       sio:SIO_000300 ?speValue .
  ?pleio a sio:SIO_001352 ;
         sio:SIO_000300 ?pleioValue .
  ?variant sio:SIO_000061 ?chr .
  ?chr sio:SIO_000300 ?chrValue .
  ?pos sio:SIO_000300 ?posValue .
  ?variant sio:SIO_000791 ?pos .
  ?variant sio:SIO_000223 ?ref, ?alt .
  ?ref a geno:0000152 ;
       sio:SIO_000300 ?refValue .
  ?alt a geno:0000476 ;
       sio:SIO_000300 ?altValue .
  FILTER (?type != so:0001060)
  OPTIONAL {
    ?variant so:associated_with ?gene .
    ?gene a ncit:C16612 .
    ?gene dcterms:title ?geneTitle
  }
}
ORDER BY ASC(?disease) ASC(?variant)
LIMIT 20

-----------------

Query 1.12: Retrieve all phenotypic annotations provided by HPO

# Give me all phenotypic annotations reported by HPO for "Alzheimer's Disease".
SELECT DISTINCT ?disease str(?diseaseTitle) as ?diseaseName
       ?phenotype str(?phenotypeTitle) as ?phenotypeName
       ?source ?sourceTitle
FROM <http://rdf.disgenet.org>
WHERE {
  ?phenotype a sio:SIO_010056 .
  ?disease a ncit:C7057 ;
           dcterms:title "Alzheimer's Disease"@en .
  ?pda a sio:SIO_000897 .
  ?pda sio:SIO_000628 ?phenotype .
  ?pda sio:SIO_000628 ?disease .
  ?disease dcterms:title ?diseaseTitle .
  ?phenotype dcterms:title ?phenotypeTitle .
  ?pda sio:SIO_000253 ?source .
  ?source dcterms:title ?sourceTitle .
  # FILTER REGEX(?sourceTitle,"Groza")
  # FILTER REGEX(?sourceTitle,"Hoehndorf")
  FILTER REGEX(?sourceTitle,"HPO")
  FILTER (?disease != ?phenotype)
}
ORDER BY ASC(?disease) ASC(?phenotype)

-----------------

SPARQL Endpoint Example Federated Queries

The purpose of the federated queries is to integrate DisGeNET data with other Linked Datasets in the LOD cloud. The DisGeNET SPARQL endpoint supports federated queries over other linked datasets such as UniProt, Gene Expression Atlas (GXA), and WikiPathways. It supports the syntax and semantics of SPARQL 1.1 for executing queries distributed over different SPARQL endpoints. Below we provide examples of how to perform federated queries.
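The WikiPathways examples in this section share one shape: a DisGeNET pattern selects the genes for a disease identified by its MeSH term, and a SERVICE clause looks those genes up in WikiPathways. A minimal Python sketch of that shape as a reusable template follows. The names FEDERATED_TEMPLATE and federated_query are hypothetical, the query is a simplified variant of the examples rather than one of them verbatim, and the check that a MeSH identifier is 'D' followed by digits is an assumption based on the identifiers used on this page.

```python
import re
from string import Template

# Simplified federated query: DisGeNET genes for a disease, then their pathways.
FEDERATED_TEMPLATE = Template("""\
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT ?gene ?PathwayID str(?PathwayName) as ?PathwayName
WHERE {
  # DisGeNET: genes associated with the disease
  ?disease skos:exactMatch <http://id.nlm.nih.gov/mesh/$mesh_id> .
  ?gda sio:SIO_000628 ?gene, ?disease .
  ?gene rdf:type ncit:C16612 .
  # WikiPathways: pathways that contain those genes
  SERVICE <http://sparql.wikipathways.org/> {
    ?geneProduct a wp:GeneProduct ;
                 dc:identifier ?gene ;
                 dcterms:isPartOf ?pathway .
    ?pathway dc:identifier ?PathwayID ;
             dc:title ?PathwayName
  }
}
""")


def federated_query(mesh_id: str) -> str:
    """Fill the template for one MeSH descriptor, e.g. 'D008382'."""
    # Assumption: descriptor IDs look like 'D' + digits, as in the examples here.
    if not re.fullmatch(r"D\d+", mesh_id):
        raise ValueError(f"unexpected MeSH identifier: {mesh_id!r}")
    return FEDERATED_TEMPLATE.substitute(mesh_id=mesh_id)


print(federated_query("D008382"))
```

Checking the identifier before substitution prevents arbitrary text from being spliced into the query string.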

  • DisGeNET + WikiPathways
  • DisGeNET + other SPARQL endpoints (ChEMBL, Ensembl, UniProt, ...)
  • Examples

    FED1: DisGeNET + WikiPathways (queries made in collaboration with the WikiPathways RDF team. Thanks!!!)

    NAMESPACE

    PREFIX wp: <http://vocabularies.wikipathways.org/wp#>

    Query 2.1: Retrieve the genes and pathways associated with 'Marfan Syndrome'

    # Give me all disease genes for 'Marfan Syndrome' (MeSH:D008382 or OMIM:601665) in DisGeNET and the pathways for these genes from WikiPathways.
    # Output the disease name, NCBI Gene ID, HGNC gene name, gene label, WikiPathways pathway ID and name.
    PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
    SELECT DISTINCT str(?DiseaseName) as ?DiseaseName ?gene str(?GeneName) as ?GeneTitle
           ?PathwayID str(?PathwayName) as ?PathwayName
    WHERE {
      # Query DisGeNET for disease-genes
      ?disease skos:exactMatch <http://id.nlm.nih.gov/mesh/D008382> .
      # alternatively, searching by MIM term:
      # ?disease skos:exactMatch <http://bio2rdf.org/omim:601665>
      ?gda sio:SIO_000628 ?gene, ?disease .
      ?gene rdf:type ncit:C16612 ;
            dcterms:title ?GeneName .
      ?disease rdf:type ncit:C7057 ;
               dcterms:title ?DiseaseName .
      # Query WikiPathways for gene-pathways
      SERVICE <http://sparql.wikipathways.org/> {
        ?geneProduct a wp:GeneProduct ;
                     dc:identifier ?gene ;
                     rdfs:label ?GeneLabel ;
                     dcterms:isPartOf ?pathway .
        ?pathway dc:identifier ?PathwayID ;
                 dc:title ?PathwayName
      }
    }
    ORDER BY DESC(?GeneName)

    -----------------

    Query 2.2: Retrieve the pathways associated with 'Pulmonary Emphysema'

    # Give me all pathways in WikiPathways for CURATED disease genes associated with 'Pulmonary Emphysema' (MeSH:D011656) in DisGeNET.
    # Output the disease name, WikiPathways pathway ID and name.
    PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
    SELECT DISTINCT str(?DiseaseName) as ?DiseaseName ?PathwayID str(?PathwayName) as ?PathwayName
    WHERE {
      # Query DisGeNET for disease-genes
      ?gda sio:SIO_000628 ?gene, ?disease ;
           sio:SIO_000253 ?source .
      ?disease rdf:type ncit:C7057 ;
               skos:exactMatch <http://id.nlm.nih.gov/mesh/D011656> ;
               dcterms:title ?DiseaseName .
      ?gene rdf:type ncit:C16612 ;
            dcterms:title ?GeneName .
      FILTER regex(?source, "uniprot|ctd_human|clinvar")
      # Query WikiPathways for gene-pathways
      SERVICE <http://sparql.wikipathways.org/> {
        ?geneProduct a wp:GeneProduct ;
                     dc:identifier ?gene ;
                     rdfs:label ?GeneLabel ;
                     dcterms:isPartOf ?pathway .
        ?pathway dc:identifier ?PathwayID ;
                 dc:title ?PathwayName .
      }
    }
    ORDER BY DESC(?PathwayName)

    -----------------

    Query 2.3: Retrieve the pathways associated with 'Schizophrenia', and show the number of Schizophrenia genes in each pathway

    # Give me all pathways in WikiPathways and the total number of disease genes in each pathway for 'Schizophrenia' (MeSH:D012559).
    # We will consider associations from CURATED sources with DisGeNET score greater than 0.35.
    # Output the disease name, WikiPathways pathway ID, pathway name, and the number of disease genes.
    PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
    SELECT DISTINCT str(?DiseaseName) as ?DiseaseName ?PathwayID str(?PathwayName) as ?PathwayName
           count(DISTINCT ?gene) AS ?genes
    WHERE {
      # DisGeNET: get disease-genes
      ?gda sio:SIO_000628 ?gene, ?disease ;
           sio:SIO_000216 ?scoreIRI .
      ?gene rdf:type ncit:C16612 .
      ?disease rdf:type ncit:C7057 ;
               skos:exactMatch <http://id.nlm.nih.gov/mesh/D012559> ;
               dcterms:title ?DiseaseName .
      ?scoreIRI sio:SIO_000300 ?score .
      FILTER (?score > "0.35"^^xsd:decimal)
      # WikiPathways: get gene-pathways
      SERVICE <http://sparql.wikipathways.org/> {
        ?geneProduct a wp:GeneProduct ;
                     dc:identifier ?gene ;
                     dcterms:isPartOf ?pathway .
        ?pathway dc:identifier ?PathwayID ;
                 dc:title ?PathwayName .
      } # end of service
    } # end of query
    ORDER BY DESC(?genes) DESC(?PathwayName)
    LIMIT 100

    -----------------

    Query 2.4: Retrieve the genes and pathways associated with 'Diabetes Mellitus, Type 2'

    # Give me all disease genes for 'Diabetes Mellitus, Type 2' (MeSH:D003924) with DisGeNET score greater than 0.35 that are involved in pathways, and the number of pathways in WikiPathways in which each gene is involved.
    # Output the disease name, gene URI, gene name, and the number of pathways.
    # Please be aware that this query takes some time because of the amount of data crossed.
    PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
    SELECT DISTINCT ?diseaseName ?gene ?geneName ?nPathways
    WHERE {
      ?disease skos:exactMatch <http://id.nlm.nih.gov/mesh/D003924> ;
               rdf:type ncit:C7057 ;
               dcterms:title ?diseaseName .
      ?gda sio:SIO_000628 ?gene, ?disease ;
           sio:SIO_000216 ?scoreIRI .
      ?gene rdf:type ncit:C16612 ;
            dcterms:title ?geneName .
      ?scoreIRI sio:SIO_000300 ?score .
      {
        SELECT ?gene ?nPathways
        WHERE {
          # WikiPathways: get gene-pathways
          SERVICE <http://sparql.wikipathways.org/> {
            SELECT ?gene COUNT(DISTINCT ?pathway) as ?nPathways
            WHERE {
              ?geneProduct a wp:GeneProduct ;
                           dc:identifier ?gene ;
                           dcterms:isPartOf ?pathway .
              ?pathway dc:identifier ?pathwayid
            }
            GROUP BY ?gene
          } # end of service
        } # end of where
      } # end of subquery
      FILTER (?score > "0.35"^^xsd:decimal)
    } # end of query
    LIMIT 100

    -----------------

    Query 2.5: Retrieve the number of genes for 'Bardet-Biedl Syndrome' disease and indicate the number of genes present in pathways

    # For 'Bardet-Biedl Syndrome' disease (MeSH:D020788), give me the total number of associated genes in DisGeNET and the total number of these genes in WikiPathways.
    # Output the disease name, the total number of disease genes in WikiPathways and the total number of disease genes in DisGeNET.
    PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
    SELECT DISTINCT ?DiseaseName count(distinct ?gene) as ?GeneInPathway ?TotalGene
    WHERE {
      SELECT *
      WHERE {
        # Total # of DisGeNET genes in WikiPathways
        ?gda sio:SIO_000628 ?gene, ?disease .
        ?disease rdf:type ncit:C7057 ;
                 skos:exactMatch <http://id.nlm.nih.gov/mesh/D020788> ;
                 dcterms:title ?DiseaseName .
        ?gene rdf:type ncit:C16612 .
        # WikiPathways: get gene-pathways
        SERVICE <http://sparql.wikipathways.org/> {
          SELECT *
          WHERE {
            ?geneProduct a wp:GeneProduct ;
                         dc:identifier ?gene ;
                         dcterms:isPartOf ?pathway
          }
        }
        # Total # genes in DisGeNET
        {
          SELECT DISTINCT ?DiseaseName count(distinct ?gene2) as ?TotalGene
          WHERE {
            ?gda sio:SIO_000628 ?gene2, ?disease .
            ?disease rdf:type ncit:C7057 ;
                     skos:exactMatch <http://id.nlm.nih.gov/mesh/D020788> ;
                     dcterms:title ?DiseaseName .
            ?gene2 rdf:type ncit:C16612
          }
        }
      }
    }

    -----------------

    Query 2.6: Retrieve the genes and the pathways associated with 'Bardet-Biedl Syndrome' disease. In addition, list all the genes involved in each of the pathways found

    # For 'Bardet-Biedl Syndrome' disease (MeSH:D020788), retrieve from DisGeNET the genes, the pathway(s) in which each gene is involved from WikiPathways, and all the genes present in each of these pathways.
    # Output the disease name, gene in DisGeNET, pathway ID, and gene in WikiPathways.
    PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
    SELECT DISTINCT ?DiseaseName ?gene ?PathwayID ?allGeneInPw
    WHERE {
      # DisGeNET: get disease-genes
      ?disease rdf:type ncit:C7057 ;
               skos:exactMatch <http://id.nlm.nih.gov/mesh/D020788> ;
               dcterms:title ?DiseaseName .
      ?gda sio:SIO_000628 ?gene, ?disease .
      ?gene rdf:type ncit:C16612 .
      # WikiPathways: get gene-pathways
      SERVICE <http://sparql.wikipathways.org/> {
        ?geneProduct a wp:GeneProduct ;
                     dc:identifier ?gene ;
                     dcterms:isPartOf ?pathway .
        ?pathway dc:identifier ?PathwayID ;
                 dc:title ?PathwayName .
        {
          SELECT ?PathwayID ?allGeneInPw
          WHERE {
            ?geneProduct a wp:GeneProduct ;
                         dc:identifier ?allGeneInPw ;
                         dcterms:isPartOf ?pathway .
            ?pathway dc:identifier ?PathwayID ;
                     dc:title ?PathwayName .
            FILTER regex(str(?allGeneInPw), "ncbigene")
          } # end of select
        } # end of subquery
      } # end of service
    } # end of query
    ORDER BY DESC(?gene) DESC(?PathwayID) DESC(?allGeneInPw)

    -----------------

    Query 2.7: Retrieve the total number of both disease genes and all genes involved in each pathway for the 'Bardet-Biedl Syndrome'

    # For 'Bardet-Biedl Syndrome' disease (MeSH:D020788), retrieve from DisGeNET the number of disease genes in a pathway in WikiPathways, the pathway ID, and the number of all genes in each of these pathways.
    # Output the disease name, DisGeNET genes in the pathway, pathway ID, and all genes in the WikiPathways pathway.
    PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
    SELECT DISTINCT ?DiseaseName count(distinct ?gene) as ?diseasegenesinthepathway
           ?PathwayID count(distinct ?allGeneInPw) as ?allgenesinthepathway
    WHERE {
      # DisGeNET: get disease-genes
      ?disease rdf:type ncit:C7057 ;
               skos:exactMatch <http://id.nlm.nih.gov/mesh/D020788> ;
               dcterms:title ?DiseaseName .
      ?gda sio:SIO_000628 ?gene, ?disease .
      ?gene rdf:type ncit:C16612 .
      # WikiPathways: get gene-pathways
      SERVICE <http://sparql.wikipathways.org/> {
        ?geneProduct a wp:GeneProduct ;
                     dc:identifier ?gene ;
                     dcterms:isPartOf ?pathway .
        ?pathway dc:identifier ?PathwayID ;
                 dc:title ?PathwayName .
        {
          SELECT ?PathwayID ?allGeneInPw
          WHERE {
            ?geneProduct a wp:GeneProduct ;
                         dc:identifier ?allGeneInPw ;
                         dcterms:isPartOf ?pathway .
            ?pathway dc:identifier ?PathwayID ;
                     dc:title ?PathwayName .
            FILTER regex(str(?allGeneInPw), "ncbigene")
          } # end of select
        } # end of subquery
      } # end of service
    } # end of query
    ORDER BY DESC(?diseasegenesinthepathway) DESC(?allgenesinthepathway)

    -----------------

    Query 2.8: Retrieve the total number of both disease genes and all genes involved in each disease pathway and secondary pathways for 'Bardet-Biedl Syndrome'

    # For 'Bardet-Biedl Syndrome' disease (MeSH:D020788), retrieve from DisGeNET the number of disease genes in a pathway in WikiPathways, the ID of that pathway (the "disease pathway"), the number of all genes shared between the disease pathway and a secondary pathway, and the secondary pathway ID.
    # Output the disease name, DisGeNET genes in the pathway, disease pathway ID, all genes in the disease pathway shared with another pathway, and the secondary pathway ID.
    PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
    SELECT DISTINCT ?DiseaseName count(distinct ?gene) as ?diseasegenesinthepathway
           ?PathwayID count(distinct ?allGeneInPw) as ?allgenesinthepathway ?PathwayID2
    WHERE {
      # DisGeNET: get disease-genes
      ?disease rdf:type ncit:C7057 ;
               skos:exactMatch <http://id.nlm.nih.gov/mesh/D020788> ;
               dcterms:title ?DiseaseName .
      ?gda sio:SIO_000628 ?gene, ?disease .
      ?gene rdf:type ncit:C16612 .
      # WikiPathways: get gene-pathways
      SERVICE <http://sparql.wikipathways.org/> {
        ?geneProduct a wp:GeneProduct ;
                     dc:identifier ?gene ;
                     dcterms:isPartOf ?pathway .
        ?pathway dc:identifier ?PathwayID ;
                 dc:title ?PathwayName .
        {
          SELECT ?PathwayID ?allGeneInPw
          WHERE {
            ?geneProduct a wp:GeneProduct ;
                         dc:identifier ?allGeneInPw ;
                         dcterms:isPartOf ?pathway .
            ?pathway dc:identifier ?PathwayID ;
                     dc:title ?PathwayName .
            FILTER regex(str(?allGeneInPw), "ncbigene")
          }
        }
        {
          SELECT ?allGeneInPw ?PathwayID2
          WHERE {
            ?geneProduct a wp:GeneProduct ;
                         dc:identifier ?allGeneInPw ;
                         dcterms:isPartOf ?pathway2 .
            ?pathway2 dc:identifier ?PathwayID2 ;
                      dc:title ?PathwayName2 .
            FILTER regex(str(?allGeneInPw), "ncbigene")
          } # end of select
        } # end of subquery
      } # end of service
    } # end of query
    ORDER BY DESC(?diseasegenesinthepathway) DESC(?allgenesinthepathway)

    -----------------

    Query 2.9: Retrieve the number of disease genes, the number of all genes, and the number of secondary pathways in each disease pathway for 'Bardet-Biedl Syndrome'

    # For 'Bardet-Biedl Syndrome' disease (MeSH:D020788), retrieve from DisGeNET the number of disease genes in a pathway in WikiPathways, the pathway ID or let's call it disease pathway, the number of all genes in the disease pathway, and the number of secondary pathways annotated to genes in each disease pathway. Output the disease name, DisGeNET genes in the pathway, disease pathway ID, the number of all genes in the disease pathway, and the number of secondary pathways. PREFIX wp: <http://vocabularies.wikipathways.org/wp#> SELECT DISTINCT ?DiseaseName count(distinct ?gene) as ?diseasegenesinthepathway ?PathwayID count(distinct ?allGeneInPw) as ?allgenesinthepathway count(distinct ?PathwayID2) as ?totalPathways WHERE { # DisGeNET: get disease-genes ?disease rdf:type ncit:C7057 ; skos:exactMatch <http://id.nlm.nih.gov/mesh/D020788> ; dcterms:title ?DiseaseName . ?gda sio:SIO_000628 ?gene,?disease . ?gene rdf:type ncit:C16612 . # WikiPathways: get gene-pathways SERVICE <http://sparql.wikipathways.org/> { ?geneProduct a wp:GeneProduct ; dc:identifier ?gene ; dcterms:isPartOf ?pathway . ?pathway dc:identifier ?PathwayID ; dc:title ?PathwayName . { SELECT ?PathwayID ?allGeneInPw WHERE { ?geneProduct a wp:GeneProduct ; dc:identifier ?allGeneInPw ; dcterms:isPartOf ?pathway . ?pathway dc:identifier ?PathwayID ; dc:title ?PathwayName . FILTER regex(str(?allGeneInPw), "ncbigene") } } { SELECT ?allGeneInPw ?PathwayID2 WHERE { ?geneProduct a wp:GeneProduct ; dc:identifier ?allGeneInPw ; dcterms:isPartOf ?pathway2 . ?pathway2 dc:identifier ?PathwayID2 ; dc:title ?PathwayName2 . FILTER regex(str(?allGeneInPw), "ncbigene") } # end of select } # end of subquery } # end of service } # end of query ORDER BY DESC(?diseasegenesinthepathway) DESC(?allgenesinthepathway)

    Query 2.10: Retrieve the pathways associated with 'Lafora Disease'

    # For 'Lafora Disease' (MeSH:D020192), give me the associated genes from LITERATURE sources
    # in DisGeNET with a score less than or equal to 0.2, and the pathways annotated to these
    # disease genes in WikiPathways.
    # Output the disease name, the gene URI, the score, the number of publications, the pathway
    # URI, and the pathway name.
    PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
    SELECT DISTINCT str(?DiseaseName) as ?DiseaseName
           ?gene
           ?score
           count(distinct ?publication) as ?numberOfPublications
           ?PathwayID
           str(?PathwayName) as ?PathwayName
    WHERE {
      # Query DisGeNET for disease-genes
      ?gda sio:SIO_000628 ?gene,?disease ;
           sio:SIO_000772 ?publication ;
           sio:SIO_000253 ?source ;
           sio:SIO_000216 ?scoreIRI .
      ?disease rdf:type ncit:C7057 ;
               dcterms:title ?DiseaseName ;
               skos:exactMatch <http://id.nlm.nih.gov/mesh/D020192> .
      ?gene rdf:type ncit:C16612 ;
            dcterms:title ?GeneName .
      ?source wi:evidence ?evidence .
      ?evidence rdfs:label ?label .
      ?scoreIRI sio:SIO_000300 ?score .
      FILTER (?score < "0.2"^^xsd:decimal || ?score = "0.2"^^xsd:decimal)
      FILTER regex(?label, "literature", "i")
      # Query WikiPathways for gene-pathways
      SERVICE <http://sparql.wikipathways.org/> {
        ?geneProduct a wp:GeneProduct ;
                     dc:identifier ?gene ;
                     rdfs:label ?GeneLabel ;
                     dcterms:isPartOf ?pathway .
        ?pathway dc:identifier ?PathwayID ;
                 dc:title ?PathwayName
      }
    }
    ORDER BY DESC(?score) DESC(?numberOfPublications) DESC(?PathwayName)

    -----------------

    EBI RDF Source

    DisGeNET + other SPARQL endpoints (ChEMBL, Ensembl, UniProt, ...)

    Query 3.1: Retrieve the potential drug targets for 'Aarskog Syndrome'

    # For 'Aarskog Syndrome' disease (UMLS_CUI:C0175701), give me the associated proteins from
    # CURATED sources that are targets for molecules in ChEMBL.
    # Output the disease name, the source, the number of supporting evidences for each GDA, the
    # gene, the target, the molecule and the activity.
    PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
    SELECT DISTINCT str(?diseaseName) as ?diseasename
           ?source
           count(distinct ?gda) as ?evidences
           ?gene ?target ?molecule ?activity
    WHERE {
      ?gda sio:SIO_000628 ?gene,?disease ;
           sio:SIO_000253 ?source .
      ?disease rdf:type ncit:C7057 ;
               dcterms:title ?diseaseName .
      ?gene rdf:type ncit:C16612 ;
            sio:SIO_010078 ?protein .
      ?protein skos:exactMatch ?uniprot .
      FILTER regex(?source, "UNIPROT|CTD_human|CLINVAR")
      FILTER (?disease = <http://linkedlifedata.com/resource/umls/id/C0175701>)
      FILTER regex(?uniprot, "http://purl.uniprot.org/uniprot/")
      # Query ChEMBL for activity data
      {
        SELECT DISTINCT * WHERE {
          SERVICE <http://www.ebi.ac.uk/rdf/services/chembl/sparql> {
            ?uniprot a cco:UniprotRef .
            ?targetcmpt cco:targetCmptXref ?uniprot .
            ?target cco:hasTargetComponent ?targetcmpt .
            ?assay cco:hasTarget ?target .
            ?activity a cco:Activity ;
                      cco:hasMolecule ?molecule ;
                      cco:hasAssay ?assay .
            FILTER (?target = <http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2862>)
          } # end of service
        } # end of select
      } # end of subquery
    } # end of query

    -----------------

    Query 3.3.2: Retrieve diseases associated with genes in Ensembl

    # Give me all diseases in DisGeNET associated with a gene in Ensembl with NCBI Gene ID 675.
    # Output the Ensembl gene ID, the disease, and the disease name ordered alphabetically.
    PREFIX ensemblterms: <http://rdf.ebi.ac.uk/terms/ensembl/>
    SELECT DISTINCT ?ensemblg ?disease str(?diseasename) as ?diseaseName
    WHERE {
      # Query Ensembl for genes
      SERVICE <https://www.ebi.ac.uk/rdf/services/ensembl/sparql> {
        ?ensemblg ensemblterms:DEPENDENT ?gene ;
                  obo:RO_0002162 <http://identifiers.org/taxonomy/9606> .
        FILTER regex(str(?ensemblg), 'ensg', 'i')
        FILTER (?gene = <http://identifiers.org/ncbigene/675>)
      } # end of service
      # Query DisGeNET for associated diseases
      ?gda sio:SIO_000628 ?gene, ?disease .
      ?disease a <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C7057> ;
               dcterms:title ?diseasename .
    } # end of query
    ORDER BY ASC(UCASE(str(?diseaseName)))

    -----------------

    Query 3.3.3: Retrieve disease coding genes in DisGeNET with disease annotation in UniProt

    # Give me all proteins in DisGeNET encoded by disease genes that have disease annotation in
    # UniProt.
    # Output the protein and the disease annotation.
    SELECT ?protein ?comment
    WHERE {
      ?protein a ncit:C17021;
               skos:exactMatch ?uniprot .
      FILTER(strstarts(str(?uniprot), "http://purl.uniprot.org/uniprot"))
      # Query UniProt for proteins with disease annotation
      SERVICE <http://sparql.uniprot.org/sparql> {
        ?uniprot up:annotation ?annotation .
        ?annotation a up:Disease_Annotation ;
                    rdfs:comment ?comment .
      }
    }
    LIMIT 10

    -----------------
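The example queries in this section can also be sent to a SPARQL endpoint from a script. The sketch below uses only the Python standard library and the standard SPARQL protocol (an HTTP GET with a `query` parameter and a JSON results media type). The endpoint URL is an assumption that does not appear in this section, so replace it with the address you actually use. The prefix block covers only the common namespaces; extend it for queries that use other prefixes such as `wi:`, `up:` or `obo:`.

```python
import json
import urllib.parse
import urllib.request

# Assumption: address of the DisGeNET SPARQL endpoint; adjust as needed.
ENDPOINT = "http://rdf.disgenet.org/sparql/"

# Explicit declarations for prefixes the example queries use without declaring them.
PREFIXES = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX ncit: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
"""


def build_request(query, endpoint=ENDPOINT):
    """Build an HTTP GET request carrying the query in the 'query' parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": PREFIXES + query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(payload):
    """Flatten a SPARQL JSON result document into a list of {variable: value} dicts."""
    data = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT, timeout=60):
    """Send the query over the network and return the parsed rows."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=timeout) as resp:
        return parse_bindings(resp.read().decode("utf-8"))
```

A call such as `run_query("SELECT ?protein WHERE { ?protein a ncit:C17021 } LIMIT 5")` would then return one dictionary per result row, provided the endpoint is reachable. Federated queries depend on the remote services as well, so a generous timeout is advisable.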

    FED6: DisGeNET + Biomodels

    NAMESPACE

    PREFIX sbmlrdf: <http://identifiers.org/biomodels.vocabulary#>


    Query 2.6.1: Retrieve disease proteins that are involved in computational models

    # Give me all proteins in DisGeNET encoded by disease genes that are participants in
    # Biomodels.
    # Output the protein, the model element URI, the type of model element URI, and the
    # qualifier.
    PREFIX sbmlrdf: <http://identifiers.org/biomodels.vocabulary#>
    SELECT *
    WHERE {
      ?protein a ncit:C17021;
               skos:exactMatch ?uniprot .
      FILTER(strstarts(str(?uniprot), "http://purl.uniprot.org/uniprot"))
      # Query biomodels
      {
        SELECT * WHERE {
          SERVICE <https://www.ebi.ac.uk/rdf/services/biomodels/sparql> {
            ?modelElement rdf:type ?elementType ;
                          ?qualifier ?protein .
            ?qualifier rdfs:subPropertyOf sbmlrdf:sbmlAnnotation .
            FILTER (strstarts(str(?protein), "http://identifiers.org/uniprot/"))
          }
        }
      }
    }
    LIMIT 20

    -----------------

    Documentation

    About RDF, Linked Data, Semantic Web technologies

    Good introductions to the field are:

    • Wikipedia, always a good place to start!
    • W3C, look at who develops the standards. The World Wide Web Consortium (W3C) is an international community where Member organizations, a full-time staff, and the public work together to develop Web standards.
    • EBI RDF Platform documentation, the EBI is one of the major Linked Data providers in the Life Sciences. Its website has comprehensive documentation that is worth reading.

    DisGeNET-RDF Getting Started

    Don't get scared, get started: check the DisGeNET-RDF v3.0.0 - tutorial and the DisGeNET-RDF v4.0.0 - tutorial.

    Guidelines for Linked Data

    Do you want to provide Linked Data? The best place to start is reading the guidelines!

    In this subsection we provide links to documents that have been very useful in developing DisGeNET-RDF.

    HOW-TO:

    Nanopublications

    The Integrative Biomedical Informatics Group is pleased to announce the second publication of the DisGeNET Nanopublications, a Linked Dataset that combines the nanopublication approach [nanopub.org] with the Trusty URIs technique [PDF]. It is an alternative way to mine statements about gene-disease associations (GDAs) contained in DisGeNET. Nanopublications are a new way of publishing structured data that tracks provenance along with the scientific statement. Trusty URIs are a novel technique to make resources on the Web immutable and verifiable, and to ensure that data links in the (semantic) Web are unambiguous. This new Linked Dataset provides nanopublications about scientific statements of human GDAs. Published as Trusty URI nanopublications, these GDAs are machine-interpretable, immutable, permanent, and verifiable. Each GDA statement has a provenance description providing evidence, attribution, creation time, and further context of its creation. Each GDA is classified as “CURATED”, “PREDICTED”, or “LITERATURE” in the DisGeNET context, which categorizes the evidence of the statement according to the type of assertion and curation made in the original databases. DisGeNET nanopublications also include metadata annotations about their general topic, i.e. ‘Gene-Disease Association’, semantically described with SIO to make them easier to discover in the Semantic Web (see PDF).
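To give an intuition for why a hash-bearing URI makes a resource "immutable and verifiable", here is a deliberately simplified Python sketch. It is not the Trusty URI algorithm itself: the actual technique defines its own content normalization and encoding, whereas this sketch simply hashes raw bytes with SHA-256 and appends the result to a hypothetical base URI. The function names and the `http://example.org/` URIs are illustrative assumptions.

```python
import base64
import hashlib


def content_code(content: bytes) -> str:
    """Return a URL-safe, unpadded SHA-256 digest of the content (illustrative only)."""
    digest = hashlib.sha256(content).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def make_hashed_uri(base_uri: str, content: bytes) -> str:
    """Append the content digest to the base URI, so the URI depends on the content."""
    return base_uri + content_code(content)


def verify(uri: str, base_uri: str, content: bytes) -> bool:
    """Check that the content retrieved for a URI still matches the digest in the URI."""
    return uri == make_hashed_uri(base_uri, content)


# Hypothetical nanopublication content and base URI
original = b"ex:gda1 sio:SIO_000628 ex:gene1, ex:disease1 ."
uri = make_hashed_uri("http://example.org/np/", original)

print(verify(uri, "http://example.org/np/", original))                # content unchanged
print(verify(uri, "http://example.org/np/", original + b" # edited")) # content modified
```

Because any change to the content changes the digest, a client that holds the URI can detect modified content without trusting the server that delivered it.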

    Linked Dataset Description

    The third release of DisGeNET published as nanopublications is a distribution of DisGeNET v4.0 (Nanopublications version v4.0.0.0). The dataset consists of 1,414,902 nanopublications, representing the same number of scientific statements for 429,036 different GDAs with their detailed provenance, levels of evidence and publication information descriptions, all annotated as RDF statements and encapsulated into the nanopublication RDF graphs (5,659,608 graphs in total). Specifically, the dataset is composed of 48,106,668 N-Quads, i.e. RDF triples with their graph (or “context”) added as the fourth member in the tuple (Subject, Predicate, Object, Context), everything being serialized in TriG syntax.
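The figures quoted above are internally consistent, which the following Python sketch checks: the number of graphs divides evenly into four per nanopublication, and the number of quads into 34 per nanopublication. The sketch also shows what "the graph as the fourth member of the tuple" means by splitting a single N-Quads-style line into (Subject, Predicate, Object, Context). The parser is a minimal illustration that handles only simple IRIs, blank nodes and plain literals, and the sample line uses hypothetical `example.org` URIs.

```python
import re

NANOPUBS = 1_414_902
GRAPHS = 5_659_608
QUADS = 48_106_668

# divmod returns (quotient, remainder); a remainder of 0 means an exact division
print(divmod(GRAPHS, NANOPUBS))  # graphs per nanopublication
print(divmod(QUADS, NANOPUBS))   # quads per nanopublication

# One RDF term: <IRI>, _:blankNode, or "literal" with optional datatype or language tag
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z-]+)?)'
NQUAD = re.compile(r"\s*" + r"\s+".join([TERM] * 4) + r"\s*\.\s*")


def parse_nquad(line: str):
    """Split one simple N-Quads line into (subject, predicate, object, context)."""
    match = NQUAD.fullmatch(line)
    if match is None:
        raise ValueError(f"not a simple N-Quads line: {line!r}")
    return match.groups()


# Hypothetical statement placed in a hypothetical assertion graph
sample = (
    '<http://example.org/gda1> <http://example.org/label> '
    '"gene-disease association"@en <http://example.org/np1#assertion> .'
)
subject, predicate, obj, context = parse_nquad(sample)
print(context)
```

For real data, a full RDF library is preferable to a regular expression, since literals may contain escapes and other constructs that this sketch does not cover.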

    DisGeNET Nanopublication Schema

    The official guidelines to create nanopublications were used. A DisGeNET nanopublication is modeled by 4 named graphs: head, assertion, provenance and publication information. The head graph defines the structure of the nanopublication by linking to the other graph URIs. The assertion graph contains the description of a single, specific GDA assertion. The provenance graph includes provenance, evidence and attribution statements that were directly mapped from the VoID description of the RDF dataset. Finally, the publication information graph includes all the metadata regarding the nanopublication itself (see the figure below). The source of data for the DisGeNET nanopublications set is the RDF Linked Dataset version of DisGeNET. To implement Trusty URIs, the GitHub Java implementation was used.
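The four-graph layout described above can be pictured with a small Python data structure. This is a sketch under stated assumptions rather than the DisGeNET implementation: terms are written as prefixed-name strings, the `#head`, `#assertion`, `#provenance` and `#pubinfo` graph naming and the `ex:` and `example.org` identifiers are hypothetical, and the example statements reuse predicates that appear in the queries on this page (`sio:SIO_000628`, `wi:evidence`).

```python
from dataclasses import dataclass, field


@dataclass
class Nanopublication:
    uri: str
    assertion: list = field(default_factory=list)   # the single GDA statement
    provenance: list = field(default_factory=list)  # evidence and attribution
    pubinfo: list = field(default_factory=list)     # metadata about the nanopublication

    def graph(self, name: str) -> str:
        """Hypothetical naming scheme for the named graphs of this nanopublication."""
        return f"{self.uri}#{name}"

    def head(self) -> list:
        """The head graph links the nanopublication to its three other graphs."""
        return [
            (self.uri, "rdf:type", "np:Nanopublication"),
            (self.uri, "np:hasAssertion", self.graph("assertion")),
            (self.uri, "np:hasProvenance", self.graph("provenance")),
            (self.uri, "np:hasPublicationInfo", self.graph("pubinfo")),
        ]

    def quads(self) -> list:
        """Return every statement as (subject, predicate, object, graph)."""
        graphs = {
            "head": self.head(),
            "assertion": self.assertion,
            "provenance": self.provenance,
            "pubinfo": self.pubinfo,
        }
        return [
            (s, p, o, self.graph(name))
            for name, triples in graphs.items()
            for (s, p, o) in triples
        ]


pub = Nanopublication(uri="http://example.org/np1")
pub.assertion += [
    ("ex:gda1", "sio:SIO_000628", "ex:gene1"),
    ("ex:gda1", "sio:SIO_000628", "ex:disease1"),
]
# The provenance graph makes statements about the assertion graph itself
pub.provenance.append((pub.graph("assertion"), "wi:evidence", "ex:curated"))
pub.pubinfo.append((pub.uri, "dcterms:created", '"hypothetical timestamp"'))

for quad in pub.quads():
    print(quad)
```

Note how the provenance statement takes the assertion graph URI as its subject; this is what lets a query filter assertions by their evidence level.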

    Nanopublication Example

    Access to the Nanopublications Linked Dataset

    DisGeNET nanopublications can be accessed in two ways: they can be downloaded as a file in TriG format from the download section, and they are deployed in a decentralized nanopublication server network, a distributed server network with a REST API that provides and propagates nanopublications identified by Trusty URIs [ref]. DisGeNET nanopublications are registered in datahub with other datasets formatted as nanopublications.

    New: For performance reasons, DisGeNET nanopublications are no longer accessible via our SPARQL endpoint.

    To download the current dataset, which is the nanopublication distribution of DisGeNET v4.0: nanopubs-v4.0.0.0.

    SPARQL Example queries

    DisGeNET nanopublications can be explored using the query language SPARQL via a SPARQL endpoint. The illustrative queries below show how to explore GDAs with DisGeNET nanopublications and how to integrate them with relationships published in other LOD sources. As an example, we can query DisGeNET nanopubs to answer the following question:

    What are the proteins (and their protein interactions) associated to Alzheimer Disease with curated evidence?


    Query 1.1: Retrieving Gene-Disease Associations

    # First, we query DisGeNET for all the genes associated to Alzheimer Disease (umls:C0002395).
    # This query only involves the assertion graph.
    SELECT DISTINCT ?gene
    FROM <http://rdf.disgenet.org/nanopubs>
    WHERE {
      GRAPH ?head {
        ?assertion a np:Assertion .
      }
      GRAPH ?assertion {
        ?gda sio:SIO_000628 ?gene, ?disease .
        ?gene rdf:type ncit:C16612 .
        ?disease rdf:type ncit:C7057 .
        FILTER regex(?disease, "umls/id/C0002395")
      }
    }
    LIMIT 10

    Query 1.2: Filtering By Evidence

    # Second, we filter the prior results with those assertions annotated as CURATED DisGeNET
    # evidence. This query involves the provenance graph.
    SELECT DISTINCT ?gene ?evidence
    FROM <http://rdf.disgenet.org/nanopubs>
    WHERE {
      GRAPH ?head {
        ?assertion a np:Assertion .
        ?provenance a np:Provenance .
      }
      GRAPH ?assertion {
        ?gda sio:SIO_000628 ?gene, ?disease .
        ?gene rdf:type ncit:C16612 .
        ?disease rdf:type ncit:C7057 .
        FILTER regex(?disease, "umls/id/C0002395")
      }
      GRAPH ?provenance {
        ?assertion wi:evidence ?evidence .
        FILTER regex(?evidence, "curated")
      }
    }
    LIMIT 10

    Query 1.3: Linking with Other LOD Resources

    # Finally, we federate the query to cross the prior DisGeNET results with data from the
    # Interaction Reference Index database, which contains protein-protein interaction (PPI)
    # annotations, through the Bio2RDF::irefindex SPARQL endpoint. Since DisGeNET-RDF also
    # represents the relation between a gene and the protein(s) it encodes, we can cross
    # DisGeNET with Bio2RDF::irefindex by Protein resources through the corresponding linkset
    # to 'http://bio2rdf.org/uniprot:UniProtID'.
    PREFIX bio2rdf-ifx: <http://bio2rdf.org/irefindex_vocabulary:>
    SELECT DISTINCT ?gene ?protein ?protein_dgn ?evidence ?ppi ?protein_irx
    WHERE {
      ?gene sio:SIO_010078 ?protein .
      ?protein skos:exactMatch ?protein_dgn .
      FILTER regex(?protein_dgn, "bio2rdf.org/uniprot:")
      GRAPH ?head {
        ?assertion a np:Assertion .
        ?provenance a np:Provenance .
      }
      GRAPH ?assertion {
        ?gda sio:SIO_000628 ?gene, ?disease .
        ?gene rdf:type ncit:C16612 .
        ?disease rdf:type ncit:C7057 .
        FILTER regex(?disease, "umls/id/C0002395")
      }
      GRAPH ?provenance {
        ?assertion wi:evidence ?evidence .
        FILTER regex(?evidence, "curated")
      }
      # Get the interactome data from Bio2RDF::irefindex
      SERVICE <http://irefindex.bio2rdf.org/sparql> {
        OPTIONAL {
          ?ppi a bio2rdf-ifx:Pairwise-Interaction ;
               bio2rdf-ifx:interactor_a ?protein_dgn ;
               bio2rdf-ifx:interactor_b ?protein_irx .
        }
      }
    }
    LIMIT 100

    Version History

    DisGeNET v4.0 RDF Release Information

    The RDF distribution of DisGeNET includes all new DisGeNET v4.0 content, as well as new annotations and new linksets:

    • More GDAs comprising more than 17,000 genes and 15,000 diseases as linked data in the Semantic Web.
    • All linksets updated, i.e. all ontologies updated.
    • Disease-phenotype annotation data from the Human Phenotype Ontology.
    • New linksets to the Experimental Factor Ontology (EFO).
    • New annotation: all diseases annotated to the original term(s) of provenance.
    • EFO is also deployed in our SPARQL endpoint, alongside the Human Disease Ontology, the Human Phenotype Ontology and ORDO, to support queries that walk the ontology hierarchy. See examples in the SPARQL section.
    • RDF enhancement, data model changes, and fixed bugs:
      • Updated RDF Schema to encompass new annotations.
      • Changed formal description of the property linking diseases to their phenotypic profile: sio:'is manifested as' (sio:SIO_000341) replaced by sio:'has phenotype' (sio:SIO_001279).
      • Fixed language tagging: language tags are now correctly added to RDF literals.

    DisGeNET v3.0 RDF Release Information

    The RDF distribution of DisGeNET includes new annotation and new linksets:
    • More GDAs comprising 17,000 genes and more than 14,000 diseases as linked data in the Semantic Web.
    • All linksets updated, i.e. all ontologies updated.
    • New disease-phenotype annotation data from the Human Phenotype Ontology.
    • New linksets to NCI Thesaurus, Orphanet Rare Disease Ontologies (ORDO), and DECIPHER.
    • New taxonomic annotation: all GDAs annotated to the Homo sapiens (Human) taxon.
    • New full metadata description of the dataset compliant with the W3C HCLS and the Open PHACTS specifications (the Open PHACTS specifications are used especially for linkset descriptions).
    • More mappings to the Linked Open Data cloud.
    • New and alternative LODEStar SPARQL access.
    • New types of searches: six ontologies are deployed in our SPARQL endpoint, including the Human Disease Ontology, the Human Phenotype Ontology and ORDO, to support queries that walk the ontology hierarchy. See an example in the SPARQL section.
    • RDF enhancement, data model changes, and fixed bugs:
      • New "303 URIs" for DisGeNET GDAs and PANTHER class entities.
      • New labels.
      • Primary source evidence better described with the Evidence Code Ontology and new properties.
      • New name descriptions: foaf:name predicate replaced by dcterms:title.
      • Fixed formal description of the DisGeNET Score: Score described as an object property, and not as a datatype property.
      • Fixed formal description of gene-disease association type 'label' from original source attribute: now described as a datatype property by a new predicate: sio:SIO_000255 replaced by sio:SIO_000300.
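Queries written against an earlier release may still use the predicates that the list above reports as replaced. The Python sketch below shows one hedged way to update such query text with a rename table. The table contains only the two replacements named in these release notes, and the helper name is hypothetical. A blanket textual rename can over-apply (for example, `sio:SIO_000255` was replaced specifically for the association type label), and structural changes such as the score becoming an object property cannot be handled by renaming, so review each rewritten query by hand.

```python
import re

# Predicate replacements named in the v3.0 release notes (old -> new)
V3_RENAMES = {
    "foaf:name": "dcterms:title",
    "sio:SIO_000255": "sio:SIO_000300",
}


def migrate_query(query: str, renames=V3_RENAMES) -> str:
    """Replace whole prefixed names only, leaving longer identifiers untouched."""
    for old, new in renames.items():
        # No word character or colon directly before, no word character directly after
        pattern = r"(?<![\w:])" + re.escape(old) + r"(?!\w)"
        query = re.sub(pattern, new, query)
    return query


old_query = "SELECT ?name WHERE { ?gene foaf:name ?name . }"
print(migrate_query(old_query))
```

The lookarounds keep the helper from rewriting identifiers that merely start with an old name, such as a hypothetical `sio:SIO_0002551`.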

    DisGeNET v3.0 Nanopublication release:

    The nanopublication distribution of DisGeNET includes all DisGeNET v3.0 gene-disease association statements along with their provenance, evidence, and attribution structured as nanopublications. Please refer to the release notes and the RDF section on this page for more details.

    DisGeNET v4.0 Nanopublication release:

    The nanopublication distribution of DisGeNET includes all DisGeNET v4.0 gene-disease association statements along with their provenance, evidence, and attribution structured as nanopublications. Please refer to the release notes and the RDF section on this page for more details.