Machine Learning Guides Peptide Nucleic Acid Flow Synthesis and Sequence Design


Advanced Science, Volume 9, Issue 34, December 8, 2022, 2201988

Chengxi Li, Genwei Zhang, Somesh Mohapatra, Alex J. Callahan, Andrei Loas, Rafael Gómez-Bombarelli, Bradley L. Pentelute

Abstract

Peptide nucleic acids (PNAs) are potential antisense therapies for genetic, acquired, and viral diseases. Efficiently selecting candidate PNA sequences for synthesis and evaluation from a genome containing hundreds to thousands of options can be challenging. To facilitate this process, this work leverages machine learning (ML) algorithms and automated synthesis technology to predict PNA synthesis efficiency and guide rational PNA sequence design. The training data were collected from individual fluorenylmethyloxycarbonyl (Fmoc) deprotection reactions performed on a fully automated PNA synthesizer. The optimized ML model achieves 93% prediction accuracy and a Pearson's r of 0.97. The predicted synthesis scores correlate with the experimental high-performance liquid chromatography (HPLC) crude purities (correlation coefficient R² = 0.95). Furthermore, the general applicability of ML is demonstrated by designing synthetically accessible antisense PNA sequences from 102,315 predicted candidates targeting exon 44 of the human dystrophin gene, SARS-CoV-2, HIV, and selected genes associated with cardiovascular diseases, type II diabetes, and various cancers. Collectively, ML provides an accurate prediction of PNA synthesis quality and serves as a useful computational tool for informing PNA sequence design.
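The validation statistics quoted in the abstract (Pearson's r and R²) are closely related: for an ordinary least-squares line fitted with an intercept, R² equals r². The following minimal sketch computes both in plain Python to make that relationship concrete. It is not the authors' analysis code; the function names are hypothetical, and the score and purity values in the usage lines are illustrative placeholders, not data from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variance terms (the common 1/n factors cancel).
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def linear_fit_r2(x, y):
    """R^2 of an ordinary least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical illustrative values (not from the paper):
# predicted synthesis scores vs. HPLC crude purities in percent.
predicted_scores = [0.21, 0.38, 0.52, 0.66, 0.90]
hplc_purities = [28.0, 45.0, 49.0, 63.0, 82.0]
r = pearson_r(predicted_scores, hplc_purities)
print(f"Pearson's r = {r:.3f}, R^2 = {linear_fit_r2(predicted_scores, hplc_purities):.3f}")
```

Because R² of a simple linear fit is just r², a reported R² near 0.95 corresponds to an r of roughly 0.97 for the same pair of variables; whether the paper's two figures refer to the same comparison is not stated in the abstract.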
