Translating the structure of plastics to the language of computers

Polymers (the long, repetitive molecules that make up materials like plastic, silicone, nylon, and rubber) are notoriously difficult to model on computers. A single polymer molecule can contain thousands of atoms that take on a variety of structures even within one cohesive material. Being able to model these structures, however, could give researchers a better way to develop new polymeric materials and predict their properties.

Now, researchers at the Pritzker School of Molecular Engineering at the University of Chicago have created a tool that lets them encode collections of long, complex polymers in a form that can be easily processed by computers and artificial intelligence programs.

“This is an exciting step toward being able to streamline the process of new polymer development,” said Juan de Pablo, Liew Family Professor of Molecular Engineering and senior author of the new work, published in Digital Discovery. “If we want to solve some of the biggest engineering challenges in the world today, we need to be able to design new polymers more quickly.”

The tool, called Generative Big Simplified Molecular Input Line Entry System, or G-BigSMILES, is openly available to the research community.

The need for speed

Despite the plethora of polymers that make up the vast majority of consumer products today, researchers see a pressing need for new types of polymers. Most current polymers are not recyclable, leading to plastic pollution and a constant need to extract materials from the earth for new polymer production.

Postdoctoral researcher Ludwig Schneider

“Polymers pose a huge environmental challenge right now because so many polymers are made out of petrochemicals,” said Ludwig Schneider, a postdoctoral researcher and first author of the new work. “New polymers can both be more environmentally friendly and offer new capabilities as the building blocks of things like batteries, micro-electronics, medical devices, and water filtration membranes.”

Today, researchers striving to design new polymers often go through many rounds of trial and error to pinpoint the right chemical formula for a polymer to have the properties they want. However, the advent of high-powered artificial intelligence programs offers a way to better predict these properties before producing the polymers.

The only problem: computers aren’t very good at interpreting the line drawings that scientists use to draw the structures of polymers. Moreover, creating line drawings for large polymers — which can contain thousands of atoms arranged in many ways — is a time-consuming task for researchers and can fail to represent these polymers with sufficient detail.

While most polymers consist only of carbon, hydrogen, oxygen, and nitrogen atoms, the variety of ways these atoms can be connected is enormous. In most cases, the carbon atoms make up the backbone of the polymers, and other chemical groups can be attached in many different ways, each giving the chains unique chemical properties. Describing this assortment of molecules is a challenge.

“Scientists can sit down and create line drawings of all these polymer structures, but it’s incredibly tedious,” said Schneider. “Our goal was to have a better way for computers to represent polymers in a compact and understandable way.”

Capturing diversity

A system known as SMILES already existed that could be used to represent small molecules with strings of computer code. However, the complexity of polymers, and the fact that a single polymer can take on different conformations, made the standard SMILES fall short of representing most polymers.
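To make the idea of a molecule written as a string concrete, here is a minimal Python sketch. The three SMILES strings are standard textbook examples; the helper function and its simplified pattern are assumptions made for illustration only, handle just unbracketed atoms, and are not part of SMILES tooling or of the researchers' software.

```python
import re
from collections import Counter

# Well-known SMILES strings for three small molecules.
SMILES_EXAMPLES = {
    "ethanol": "CCO",
    "acetic acid": "CC(=O)O",
    "benzene": "c1ccccc1",
}

# Simplified, illustrative pattern: matches unbracketed atoms only.
# Two-letter halogens come first so "Cl" is not read as "C" plus "l".
# Lowercase letters denote aromatic atoms in SMILES.
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def count_heavy_atoms(smiles: str) -> Counter:
    """Count non-hydrogen atoms in a simple SMILES string (hydrogens are implicit)."""
    return Counter(token.capitalize() for token in ATOM_PATTERN.findall(smiles))


for name, smiles in SMILES_EXAMPLES.items():
    print(name, smiles, dict(count_heavy_atoms(smiles)))
```

Each small molecule fits in a few characters. A polymer chain with thousands of atoms, and many possible variants of that chain, does not compress so easily, which is the gap the article describes.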

Schneider created G-BigSMILES to overcome these challenges, extending the line-notation approach so that it better represents the repetitive nature of polymers and captures the variation in polymer structure.

With G-BigSMILES, a scientist can write a single line that represents the entire ensemble of structures a polymer can take on to form a material.

“It simplifies things a lot,” said Schneider. “Instead of writing these repetitive sequences fifteen-thousand times, you can write them once and just notate that it repeats fifteen-thousand times.”
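The saving Schneider describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not G-BigSMILES itself: the `RepeatSpec` class, its `(unit)xN` shorthand, and the Gaussian chain-length sampler are all hypothetical names and choices invented here, and the real notation has its own syntax defined in the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RepeatSpec:
    """Hypothetical compact description: one repeat unit plus a repeat count."""

    repeat_unit: str  # SMILES fragment for one unit, e.g. "CC" for polyethylene
    count: int        # how many times the unit repeats along the chain

    def compact(self) -> str:
        # Illustrative shorthand only; this is NOT G-BigSMILES syntax.
        return f"({self.repeat_unit})x{self.count}"

    def expand(self) -> str:
        # The fully written-out linear chain as a plain SMILES string.
        return self.repeat_unit * self.count


def sample_chain_lengths(mean: int, spread: int, n_chains: int, seed: int = 0) -> list[int]:
    """Draw chain lengths from a Gaussian, a stand-in for a real length distribution."""
    rng = random.Random(seed)
    return [max(1, round(rng.gauss(mean, spread))) for _ in range(n_chains)]


spec = RepeatSpec("CC", 15_000)
print(spec.compact())      # (CC)x15000
print(len(spec.expand()))  # 30000

# One short description plus a length distribution stands in for a whole
# ensemble of chains, rather than a single fixed molecule.
lengths = sample_chain_lengths(mean=15_000, spread=500, n_chains=5)
ensemble = [RepeatSpec("CC", n) for n in lengths]
print([chain.compact() for chain in ensemble])
```

The compact form stays about ten characters long no matter how long the chain gets, while the expanded string grows with every repeat. Sampling the chain length hints at how one line can describe a material made of many slightly different molecules.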

Schneider, de Pablo, and their colleagues are now working on how to pass G-BigSMILES data along to machine learning platforms that can take the data on polymers and learn to predict their properties. They hope that, in the long run, this will speed up polymer discovery and lead to new, innovative materials on which scientists can build future technologies.

Citation: “Generative BigSMILES: an extension for polymer informatics, computer simulations & ML/AI,” Schneider et al., Digital Discovery, November 17, 2023. DOI: 10.1039/D3DD00147D

Funding: This work was supported by an Eric and Wendy Schmidt AI in Science Postdoctoral Fellowship, and by the National Science Foundation Convergence Accelerator.