Who Am I?

Hi, my name is Duo Peng. I am a scientist with training in Genetics (B.S.), Bioinformatics (M.S.), and Cellular Biology (Ph.D.).

Currently, I am a senior computational biologist at the Chan Zuckerberg Biohub San Francisco.

My research interests:

1. Data-driven modeling of host response to pathogens.

a. Mining public data species-wide and in toto for paired host and viral gene expression, building machine learning models to resolve host gene signatures at different resolutions, and predicting host responses under perturbation.

b. Developing a comprehensive understanding of how host genes affect virus infection outcomes by mining virus-centric pooled CRISPR screen datasets.

2. Developing scalable computational pipelines to automate the biology-informed design of CRISPR experiments and the validation of genome editing outcomes.

3. Data-driven understanding of the subcellular architecture of the human proteome.

4. Data-driven understanding of the landscape of cellular responses from multimodal assays.

Education

University of Georgia, Athens, Georgia, U.S.A.

2012-2017

Dissertation: Developing CRISPR/Cas9 for Genome-Wide Gene Editing in the Human Pathogen Trypanosoma cruzi

University of Georgia, Athens, Georgia, U.S.A.

2012-2016

Dissertation: Frequent Intra-Family Recombination in the Largest Repository of Antigen Variants in the Protozoan Pathogen Trypanosoma cruzi

Wuhan University, Wuhan, Hubei, P.R.China

2006-2010

Thesis: Predicting Trans-splicing by Analysis of RNA-seq Sequencing Data

Selected Publications

See Google Scholar for a complete list.

1.  D. Peng*, M. Vangipuram, J. Wong, M.D. Leonetti* (2024) protoSpaceJAM: an open-source, customizable and web-accessible design platform for CRISPR/Cas insertional knock-in. Nucleic Acids Research (* corresponding authors) [link]

2.  D. Peng, E.G. Kakani, E. Mameli, C. Vidoudez, S.N. Mitchell, G.E. Merrihew, M.J. MacCoss, K. Adams, T.A. Rinvee, W.R. Shaw, F. Catteruccia. (2022) A male steroid controls female sexual behaviour in the malaria mosquito. Nature [link]

3.  D. Peng, R. Tarleton. (2015) EuPaGDT: A Web Tool Tailored to Design CRISPR Guide RNAs for Eukaryotic Pathogens. Microbial Genomics [link]

4.  D. Peng, S.P. Kurup, P.Y. Yao, T.A. Minning, R.L. Tarleton. (2014) CRISPR-Cas9-mediated Single-gene and Gene Family Disruption in Trypanosoma cruzi. mBio [link]

5.  D. Peng, X. Gu, L.J. Xue, J.H. Leebens-Mack, C.J. Tsai. (2014) Bayesian phylogeny of sucrose transporters: Ancient Origins, Differential Expansion and Convergent Evolution in Monocots and Dicots. Frontiers in Plant Science [link]

6.  D.B. Weatherly*, D. Peng*, R.L. Tarleton. (2016) Recombination-driven Generation of the Largest Pathogen Repository of Antigen Variants in the Protozoan Trypanosoma cruzi. BMC Genomics (* equal contribution) [link]

7.  Z. Zuo*, D. Peng*, X. Yin, X. Zhou, H. Cheng, R. Zhou. (2013) Genome-wide Analysis Reveals Origin of Transfer RNA Genes From tRNA Halves. Molecular Biology and Evolution (* equal contribution) [link]

8.  K. Werling, R. Shaw, M. Itoe, K. Westervelt, P. Marcenac, D. Paton, D. Peng, N. Singh, A. Smidler, A. South, A. Deik, L. Mancio-Silva, A. Demas, E. Calvo, S. Bhatia, C. Clish, F. Catteruccia (2018) Steroid Hormone Function Controls Non-competitive Plasmodium Development in Anopheles. Cell [link]

9.  W. Wang, D. Peng, R.P. Baptista, Y. Li, J.C. Kissinger, R.L. Tarleton. (2021) Strain-specific genome evolution in Trypanosoma cruzi, the agent of Chagas disease. PLOS Pathogens [link]

Work Experience

Senior computational biologist
2024.07-present

1. Building machine learning models to resolve host gene signatures at different resolutions and to predict host responses under perturbation.
2. Data-driven understanding of the landscape of cellular responses from multimodal assays.

Bioinformatics data scientist II
2023.01-2024.06

1. Data-driven understanding of subcellular architecture (preprint).
2. Species-wide data mining for paired host and viral gene expression, and building machine learning models to resolve host gene signatures at different resolutions.

Bioinformatics data scientist I
2021.11-2022.12

1. protoSpaceJAM: Genome-wide CRISPR knock-in design at scale using biologically informed algorithms (paper, webapp).
2. DeepGenotype: Calculates the frequencies of protein-level mutations from deep-sequencing reads of CRISPR-edited cells (codebase).

Software developed

1.  Webserver: Eukaryotic Pathogen gRNA design tool (This webserver had 24,907 users, 49,267 visits, and 17,972 job requests from 91 countries [Google Analytics, 2021])
    online access (hosted at the University of Georgia)

2.  Webserver: protoSpaceJAM - CRISPR HDR design at scale
    online access (hosted at the Chan Zuckerberg Biohub)

3.  Automated Image Preprocessing and Malaria-oocyst Recognition Tool
    online access (hosted on the AWS cloud)
    code base:
        Preprocessing
        Recognition