Datapunk Circuits: A Multi-Dimensional, Open-Source, Genomic Database for Clinical Investigation

FROM THE COVER: Pathfinder Bioinformatics Scholars at the Center of Excellence in Generative Medicine, Univ. of Bridgeport

By Peter D’Adamo, ND

The Center of Excellence in Generative Medicine (COEGM) at the University of Bridgeport College of Naturopathic Medicine is rapidly becoming the leading research and industry center for naturopathic clinical bioinformatics. Much of the utility and relevance of the software produced by the COEGM lies in its recognition that today’s physicians cannot easily parse the increasingly available large datasets (such as those produced by genome or microbiome reporting services) efficiently or in real time. This is a real dilemma, as ‘big data’ approaches (in particular those employing machine learning algorithms) increasingly point the way toward more precise treatment based on high-value considerations.

One possible solution is a more 'generative' approach, as advanced by William Wimsatt, which begins from the premise that we are limited beings and that the world we try to understand is complex.(1) For Wimsatt, robustness (believing that a particular apple exists because we can see it, feel it, smell it, taste it, and hear it crunch when we eat it) is measured by the multidimensionality of the data models: the more ways we can detect something, the more inclined we are to believe it exists. Closely connected to robustness are heuristics, the rules of thumb we use to think about the world, which are foundational to his epistemology.

Heuristics can be wrong or biased, but they tend to work well when applied to what is robust in the world. For example, a basic generative heuristic derived from cybernetics is the 'law of requisite variety,' formulated by W. Ross Ashby. In essence, it holds that the number of states available to a control mechanism must be greater than or equal to the number of states of the system being controlled. This heuristic, combined with personalized clinical data and robust molecular network data, permits the design of computationally generated, personalized, multi-axis polypharmacy. The approach is well suited to natural products, where the therapeutic index of the agent combination rises significantly while the overall safety profile remains essentially unaltered.
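To make Ashby's heuristic concrete, here is a toy model in Python. It is an illustration of the law itself, not code from Circuits or Opus23. It assumes a hypothetical system in which disturbance d combined with regulator response r yields outcome (d + r) mod N, and it brute-forces every regulator policy to find the fewest distinct outcomes the regulator can hold the system to. The function name and outcome table are assumptions chosen for clarity.

```python
from itertools import product
from math import ceil


def min_outcome_variety(n_disturbances: int, n_responses: int) -> int:
    """Fewest distinct outcomes any regulator policy can achieve.

    Toy outcome table (an assumption for illustration): disturbance d
    combined with response r yields outcome (d + r) mod n_disturbances.
    """
    best = n_disturbances
    # A policy assigns exactly one response to each possible disturbance.
    for policy in product(range(n_responses), repeat=n_disturbances):
        outcomes = {(d + r) % n_disturbances for d, r in enumerate(policy)}
        best = min(best, len(outcomes))
    return best


if __name__ == "__main__":
    n = 4
    for k in range(1, n + 1):
        achieved = min_outcome_variety(n, k)
        # Ashby's bound: outcome variety >= disturbance variety / response variety
        print(f"{k} responses -> {achieved} outcome(s); bound {ceil(n / k)}")
```

With four possible disturbances, the regulator can pin the system to a single outcome only when it has four responses; with fewer, some of the disturbance's variety necessarily passes through to the outcome.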

The generative paradigm is well suited to the challenge posed by the onslaught of 'big data' into the clinical workspace. Person-specific genomic, metabolomic, and microbiome data files can easily run to hundreds of megabytes. Clearly, we need analytic tools that automate the basic handling, analysis, and integration of this wealth of information. The challenge is daunting, but the potential rewards are almost unfathomable: a more precise clinical impression, where (paradoxically) 'more data is better than better data,' and where broadly applied evidence-based conclusions are only one part of the evaluation framework.

Over the past five years, in partnership with Datapunk Bioinformatics LLC, the COEGM has developed a variety of computational tools for precision medicine based on generative algorithms. Some are proprietary, such as the well-known and well-regarded Opus23 genomic development platform and its two add-on analytic modules, Utopia (microbiome) and Icarus (metabolome). Other apps on the servers are open source and free to use.

On August 25, 2018, the COEGM announced the release of ‘Circuits,’ a gene-based, open-source platform that combines genomic data across a variety of robust dimensions. Circuits is web-based, has an imaginative and intuitive user interface, and is free to use. I’d like to use the rest of this article to introduce Circuits, describe its capabilities, and invite readers of the Townsend Letter to explore its possibilities.

The Interface

Circuits resides at https://www.datapunk.net/circuits and can be accessed with any modern browser. Because of the density of its data displays, it is not optimized for small handheld smartphones; a desktop or laptop computer is recommended.

Circuits’ initial presentation appears as shown below. In its default state, Circuits displays the data for the gene MTOR (mammalian target of rapamycin) as a placeholder. Immediately below the title slug, Circuits displays the known PPIs (protein-protein interactions) of the target gene (in this case MTOR) as a Cytoscape network. Afferent nodes are shaded red, while efferent nodes are colored green. The network is also an effective way to navigate Circuits: clicking any node loads its gene into the main window. Users can also search for any gene or protein with the search box at the top right of the screen. To help users refine their queries, the search feature autocompletes against more than 30,000 currently recognized gene symbols.

Circuits Main Window Graphic
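The two interface behaviors described above, red/green coloring of a gene's network neighbors and autocompletion over a large symbol list, can be sketched in a few lines of Python. This is a hypothetical illustration, not the Circuits source (which the article describes as Perl plus JavaScript). It assumes the PPI data arrive as directed (source, target) pairs, reads "afferent" as an edge pointing into the focal gene and "efferent" as one pointing out of it, and gives two-way neighbors a gray color of our own choosing. The node and edge dictionaries follow the `{"data": {...}}` shape that Cytoscape.js accepts; all function names are made up for this sketch, and the MTOR edges in the test are toy data.

```python
import bisect

AFFERENT_COLOR = "red"     # neighbor -> focal gene
EFFERENT_COLOR = "green"   # focal gene -> neighbor
BOTH_COLOR = "gray"        # assumption: neighbor linked in both directions


def ppi_elements(focal, edges):
    """Build Cytoscape.js-style elements for the focal gene's neighborhood.

    `edges` is a list of directed (source, target) gene-symbol pairs.
    """
    incoming = {s for s, t in edges if t == focal and s != focal}
    outgoing = {t for s, t in edges if s == focal and t != focal}
    nodes = [{"data": {"id": focal}}]
    for gene in sorted(incoming | outgoing):
        if gene in incoming and gene in outgoing:
            color = BOTH_COLOR
        elif gene in incoming:
            color = AFFERENT_COLOR
        else:
            color = EFFERENT_COLOR
        nodes.append({"data": {"id": gene, "color": color}})
    links = [
        {"data": {"id": f"{s}->{t}", "source": s, "target": t}}
        for s, t in edges
        if focal in (s, t) and s != t
    ]
    return nodes + links


def build_index(symbols):
    """Sort (uppercase key, original symbol) pairs once for prefix lookups."""
    return sorted((s.upper(), s) for s in symbols)


def autocomplete(index, prefix, limit=10):
    """Return up to `limit` symbols that start with `prefix`, ignoring case."""
    key = prefix.upper()
    # (key,) sorts before every (key..., symbol) pair sharing that prefix.
    i = bisect.bisect_left(index, (key,))
    matches = []
    while i < len(index) and index[i][0].startswith(key) and len(matches) < limit:
        matches.append(index[i][1])
        i += 1
    return matches
```

A Cytoscape.js stylesheet can then color nodes with the data mapper `'background-color': 'data(color)'`. Sorting the symbol list once and using binary search keeps each keystroke's lookup logarithmic in the size of the list, which matters when it holds tens of thousands of entries.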

Scrolling down, the user will see a series of scrollable panes containing the data mash-ups. At the top left is a detailed description of the gene; across from it, a pane depicts the available data on agents associated with the expression of the gene. These agent data come from a unique, human-curated database originally developed for the Opus23 platform.

Circuits employs a variety of modal popups to provide additional contextual data. Clicking on any agent triggers a popup window with a unique radar plot that we call the agent’s ‘genomic logo.’ The logo depicts the strength, action, and targets of the agent, computed by a complex algorithm that weighs study design, scope, and subject type.

Modal Popup Showing Data for Alpha-Lipoic Acid Graphic
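The article does not publish the genomic logo algorithm, so the following is only a hypothetical sketch of the general idea: evidence for an agent's effect on each target gene is weighted by study design, scope, and subject type, then summed into one signed score per target, where the sign stands for the net action (up or down) and the magnitude for strength. Each target would become one axis of a radar chart. The weight tables, the log scaling of study size, the `Finding` record, and the placeholder gene names are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical weights chosen only to illustrate the idea; they are not
# the values used by Circuits or Opus23.
DESIGN_WEIGHT = {"meta-analysis": 1.0, "rct": 0.8, "observational": 0.5, "case report": 0.2}
SUBJECT_WEIGHT = {"human": 1.0, "animal": 0.6, "cell culture": 0.3}


@dataclass
class Finding:
    target: str    # gene symbol affected by the agent
    effect: int    # +1 = up-regulates expression, -1 = down-regulates
    design: str    # key into DESIGN_WEIGHT
    size: int      # study scope, e.g. number of subjects or samples
    subject: str   # key into SUBJECT_WEIGHT


def radar_axes(findings):
    """Collapse an agent's findings into one signed score per target gene."""
    scores = defaultdict(float)
    for f in findings:
        # log10 damps the influence of very large studies
        weight = DESIGN_WEIGHT[f.design] * SUBJECT_WEIGHT[f.subject] * math.log10(1 + f.size)
        scores[f.target] += f.effect * weight
    return dict(scores)


if __name__ == "__main__":
    # Made-up findings for a hypothetical agent acting on placeholder genes.
    demo = [
        Finding("GENE_A", +1, "rct", 99, "human"),
        Finding("GENE_A", -1, "observational", 9, "animal"),
        Finding("GENE_B", -1, "case report", 9, "cell culture"),
    ]
    print(radar_axes(demo))
```

Conflicting findings partly cancel, so a target supported by one strong human trial but contradicted by a small animal study still scores as net up-regulated, only less strongly.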

The next row of two panes shows disease associations and clinically relevant SNPs (single nucleotide polymorphisms) associated with the target gene. Pathology data are derived from ClinVar, OMIM, and GWAS, while SNP associations come from GWAS and the exclusive human-curated SNP database developed for the Opus23 program. Clicking on a hyperlinked disease or SNP also launches an informational popup.

The next row of panes shows, on the left, any adverse drug reactions linked to specific polymorphisms of the target gene and, on the right, the known tissue and organ distribution of the target gene. As with all internal hyperlinks, clicking on any link in these panes triggers a popup containing additional data.
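Linking adverse drug reactions to polymorphisms amounts to matching a person's genotype against a table of risk alleles. The sketch below is a hypothetical illustration of that lookup, not Circuits' data model: it assumes biallelic SNPs whose genotype is stored as a two-letter string per rsID, and the record fields, rsIDs, drugs, and reactions in the test are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdrRecord:
    rsid: str          # variant identifier
    risk_allele: str   # single-letter allele associated with the reaction
    drug: str
    reaction: str


def flag_adrs(genotype, records):
    """Return (record, allele copies) for every risk allele the person carries.

    `genotype` maps rsID -> two-letter genotype string, e.g. {"rs123": "AG"}.
    Assumes biallelic SNPs with single-letter allele calls.
    """
    hits = []
    for rec in records:
        calls = genotype.get(rec.rsid)
        if calls and rec.risk_allele in calls:
            # 1 = heterozygous carrier, 2 = homozygous for the risk allele
            hits.append((rec, calls.count(rec.risk_allele)))
    return hits
```

Returning the copy number alongside each match keeps heterozygous and homozygous carriers distinguishable, which matters for reactions with a dose-dependent genetic effect.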

The next row of informational panes shows, on the left, the pathway regulations associated with the target gene and the direction of each effect (up-regulation or down-regulation). The right-hand pane shows etiological links to the target gene, inferred through the gene’s disease associations.
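Inferring etiological links "via the target gene’s disease associations" is essentially a composition of two relations: gene to disease, then disease to etiological factor. A minimal Python sketch of that composition follows, assuming both relations are available as simple dictionaries; the function name and the placeholder genes, diseases, and factors are hypothetical. It keeps the connecting diseases so each inferred link can be explained.

```python
from collections import defaultdict


def infer_etiological_links(gene_diseases, disease_factors):
    """Infer gene -> etiological factor links through shared diseases.

    Returns {gene: {factor: sorted list of the diseases that connect them}}.
    Genes whose diseases have no known factors are omitted.
    """
    links = defaultdict(lambda: defaultdict(set))
    for gene, diseases in gene_diseases.items():
        for disease in diseases:
            for factor in disease_factors.get(disease, ()):
                links[gene][factor].add(disease)
    return {g: {f: sorted(ds) for f, ds in fs.items()} for g, fs in links.items()}
```

Because the result records which diseases produced each link, a factor reached through several diseases can be distinguished from one supported by a single association.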

The final single pane displays HMDB (Human Metabolome Database) linkages to the target gene. Clicking on the metabolite common name will trigger a popup display detailing the metabolite.

The Data

Most of the data used by Circuits were originally compiled from publicly available repositories for use by the Opus23 application. Exceptions include the SNP and agent expression datasets, which were developed entirely by Datapunk human curators. The PPI, etiome, and diseaseome datasets were enriched by combining data from multiple sources, in some cases programmatically through structured machine learning. A few of the larger sources are listed in the references.(2-8) Several of these references are a few years old; however, such articles typically announce and describe a dataset, while the databases themselves are almost all continuously updated. Through its network of application programming interfaces (APIs), so is Circuits.

Circuits: Dataset Linkages Graphic
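One simple way to combine interaction data from multiple sources, in the spirit of Wimsatt's robustness, is to record which sources report each interaction and treat those confirmed independently more than once as more robust. The sketch below is a hypothetical illustration of that idea, not the enrichment pipeline Circuits actually uses (which the article says partly relies on machine learning). It assumes each source supplies a list of gene pairs, ignores direction by default, and skips self-interactions for simplicity; the function names and source labels are made up.

```python
from collections import defaultdict


def merge_ppi(sources, directed=False):
    """Merge interaction lists from several sources, recording who reports each edge.

    `sources` maps a source name to a list of (gene_a, gene_b) pairs.
    With directed=False, (A, B) and (B, A) count as the same interaction.
    Self-interactions are skipped here for simplicity.
    """
    support = defaultdict(set)
    for name, pairs in sources.items():
        for a, b in pairs:
            if a == b:
                continue
            edge = (a, b) if directed else tuple(sorted((a, b)))
            support[edge].add(name)
    return dict(support)


def robust_edges(support, min_sources=2):
    """Return interactions reported by at least `min_sources` independent sources."""
    return sorted(edge for edge, names in support.items() if len(names) >= min_sources)
```

Keeping the set of reporting sources, rather than a bare count, preserves provenance, so a displayed interaction can always be traced back to the databases that support it.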

Test-Driving Circuits

Readers are encouraged to ‘surf’ Circuits and explore whichever target genes interest them most. Click away! To help you get started, here are a few direct links.

·       The ABO ‘secretor’ gene (FUT2)

https://www.datapunk.net/circuits/index.pl?FUT2

·       Mitochondrial enzymes that catalyze the oxidative deamination of amines, such as dopamine, norepinephrine, and serotonin (MAOA and MAOB)

https://www.datapunk.net/circuits/index.pl?MAOA

https://www.datapunk.net/circuits/index.pl?MAOB

·       Catechol-O-methyltransferase (COMT) catalyzes the transfer of a methyl group from S-adenosylmethionine to catecholamines, including the neurotransmitters dopamine, epinephrine, and norepinephrine.

https://www.datapunk.net/circuits/index.pl?COMT

·       PPAR-gamma is a regulator of adipocyte differentiation. Additionally, PPAR-gamma has been implicated in the pathology of numerous diseases including obesity, diabetes, atherosclerosis, and cancer.

https://www.datapunk.net/circuits/index.pl?PPARG

·       MTHFR catalyzes the conversion of 5,10-methylenetetrahydrofolate to 5-methyltetrahydrofolate, a co-substrate for homocysteine (HCy) remethylation to methionine.

https://www.datapunk.net/circuits/index.pl?MTHFR

The Code

The server-side portion of Circuits was written in Perl, the ‘Swiss Army Chainsaw’ of bioinformatics. Client-side elements, such as network depictions and graphic displays of information, were coded in JavaScript using the Cytoscape.js and Highcharts frameworks. The PPI network was normalized using the Graphviz graphing package and the CPAN Graph module.

The Easter Egg

Users can store particularly interesting or important genes in a non-tracking cookie ‘wallet’ for long-term use, allowing them to retrace prior investigations.
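A cookie can be "non-tracking" if it stores nothing but the saved gene symbols: no user identifier, no history beyond the list itself. The article does not say whether Circuits sets its wallet from Perl on the server or from JavaScript in the browser, so the Python sketch below, built on the standard library's `http.cookies.SimpleCookie`, is only an illustration under assumptions: the cookie name `wallet`, the `/circuits` path, the one-year lifetime, the `|` separator, and the symbol-validation pattern are all hypothetical choices.

```python
import re
from http.cookies import SimpleCookie

SYMBOL = re.compile(r"^[A-Za-z0-9-]+$")   # accept only plausible gene symbols
SEPARATOR = "|"                            # legal in cookie values, absent from gene symbols
ONE_YEAR = 365 * 24 * 60 * 60              # seconds


def wallet_set_cookie(symbols):
    """Build a Set-Cookie value that stores only the saved gene symbols."""
    clean = list(dict.fromkeys(s for s in symbols if SYMBOL.match(s)))  # dedupe, keep order
    cookie = SimpleCookie()
    cookie["wallet"] = SEPARATOR.join(clean)
    cookie["wallet"]["max-age"] = ONE_YEAR
    cookie["wallet"]["path"] = "/circuits"
    return cookie["wallet"].OutputString()


def read_wallet(cookie_header):
    """Parse the browser's Cookie header back into a list of gene symbols."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("wallet")
    if morsel is None or not morsel.value:
        return []
    return [s for s in morsel.value.split(SEPARATOR) if SYMBOL.match(s)]
```

Validating symbols on both write and read keeps malformed or injected values out of the wallet, and because the cookie holds only gene symbols, it reveals nothing about the user beyond the genes they chose to save.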

I hope Townsend Letter readers have half as much fun exploring Circuits as I had envisioning and coding it. We at the Pathfinder Scholar Program at the COEGM plan to expand our open-source offerings to include a microbiota explorer that uses taxon interaction networks and Markov chains to produce a multigenerational approach to eubiosis, and a small metabolite (metabolome) explorer that employs machine learning classifiers to generate metabolic patterning characteristics. I’ll be sure to alert readers when these tools become available.

References

1.     Wimsatt WC. Re-Engineering Philosophy for Limited Beings: Piecewise Approximations to Reality. Cambridge, MA: Harvard University Press; 2007.

2.     Landrum MJ, et al.  ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 2018 Jan 4. PubMed PMID: 29165669.

3.     Prasad TSK, et al. Human Protein Reference Database - 2009 update. Nucleic Acids Res. 2009;37:D767-72.

4.     Liu YI, Wise PH, Butte AJ. The "etiome": identification and clustering of human disease etiological factors. BMC Bioinformatics. 2009 Feb 5;10 Suppl 2:S14.

5.     Kaplun A, et al. PGMD: a comprehensive manually curated pharmacogenomic database. Pharmacogenomics J. 2016;16:124-8.

6.     Thul PJ, Lindskog C. The human protein atlas: A spatial map of the human proteome. Protein Sci. 2018 Jan;27(1):233-244.

7.     Barabási A-L, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nat Rev Genet. 2011 Jan;12(1):56-68.

8.     Wishart DS, et al. HMDB 4.0: the Human Metabolome Database for 2018. Nucleic Acids Res. 2018 Jan 4;46(D1):D608-17. PubMed PMID: 29140435.

Peter J. D’Adamo, ND

Distinguished Professor of Clinical Medicine and Bioinformatics

Center of Excellence in Generative Medicine

University of Bridgeport, College of Naturopathic Medicine

https://www.coegm.com/