Attention data scientists and genetic and computational toxicologists! We are now accepting applications for the EMGS Bioinformatics Challenge 2020. It will be held at the annual meeting from September 12th to 16th, 2020 in Palm Springs, CA. The goal of this competition is to generate interaction between EMGS members and bioinformatics/data science experts. We encourage members to work together to develop novel tools and approaches that harness publicly available “big data” to identify signatures of genotoxic hazards and provide insight into their mechanisms of action.

Please submit a 250-word abstract with a data visualization, and/or any inquiry, by April 15th, 2020. You will receive a confirmation email upon submission. Selected participants will receive (a) a $300 monetary award, for academic participants, and (b) a membership waiver, for non-member participants. The abstracts will be evaluated on four criteria:

  • Model performance in classifying/predicting carcinogens
  • Identification of novel genotoxic mode(s) of action
  • Effectiveness of the data visualization
  • Originality and innovation

Participating groups will then be invited to send representatives to present their work at the EMGS 2020 annual meeting and compete for the grand prize of $1,200.

We welcome any data-intensive study that involves genomics, proteomics, metabolomics, and high-throughput data, including (but not limited to):

1. Tox21 is an initiative carried out by multiple scientific agencies within the U.S. government, including the Environmental Protection Agency, the National Toxicology Program, and the Food and Drug Administration. The purpose of this effort is to use high-throughput in vitro tests to examine the biological/toxicological effects of a 10,000-chemical library, primarily made up of environmental chemicals. The latest invitroDB v3.1 dataset is now publicly available for download and review. Example assays and endpoints include p53 status, γH2AX, multiple assays reporting on other markers of DNA damage, micronucleus formation, apoptosis, cell cycle, and numerous cytotoxicity indicators.

2. Health Canada has committed to providing toxicogenomic data from TK6 cells exposed to over 100 genotoxic and non-genotoxic compounds. Litron Laboratories has also made a 67+ compound dataset from the MultiFlow DNA Damage assay publicly available for this bioinformatics challenge.

Health Canada document (downloadable pdf)

MultiFlow DNA Damage assay data (downloadable pdf)

3. DrugMatrix is one of the world’s largest databases of toxicogenomic profiles from carefully designed/standardized rat experiments. The database houses toxicogenomic profiles of over 600 compounds from rodent liver, kidney, muscle, heart, bone marrow, spleen, intestine, brain, and cultured rodent hepatocytes in dose-response design, with multiple time points, and with matching apical data.

4. TG-GATEs is an openly available database produced by the Japanese Toxicogenomics Project Consortium. It consists of toxicogenomic data for 170 compounds (mostly drugs). Carefully designed experiments were conducted in rats (liver and kidney) and primary human hepatocytes at multiple doses and time points.

5. NCI-60 Human Tumor Cell Lines Screen is the anti-cancer screening program of the U.S. National Cancer Institute (NCI), which has screened over 100,000 chemical compounds and natural products. It uses 60 human tumor cell lines representing leukemia, melanoma, and cancers of the lung, colon, brain, ovary, breast, prostate, and kidney. These cell lines have been characterized at the molecular level, and the data are available through the Genomics and Pharmacology Facility, Developmental Therapeutics Program of the NCI. These omics data include gene expression at the mRNA, miRNA, and protein levels; sequence variants; methylation status of DNA methylation sites; metabolomics data; CNVs; and spectral karyotyping data. Another extensive source of drug response and molecular profiling data (whole exome sequencing, gene expression, copy number alteration, and DNA methylation) for ~1000 cell lines is the Sanger Institute's Genomics of Drug Sensitivity in Cancer project.

6. Broad Institute’s Connectivity Map provides gene expression data for ~1,000 genes measured on the L1000 high-throughput platform. These genes were found to be a representative subset of the whole transcriptome and can be used to infer ~80% of the remaining unmeasured transcriptome. The expression data are available at multiple levels of processing for several cell lines chemically perturbed by nearly 20,000 small molecules or genetically perturbed by overexpression or knockdown of ~5,000 genes.

Please find a downloadable flyer here: Bioinformatics Flyer (downloadable)

Last update: July 29th, 2019