6 min read

Why We Should Drop (Graph-based) Bond Order and No Longer Use Aromaticity

Written by: David Mobley, PhD


OpenFF force fields assign parameters based on queries operating on the chemical graph for molecules, via the SMARTS language. This language is rather flexible, leaving a wide variety of ways in which parameters could be assigned. In OpenFF, we call the specific choice of substructure searches used to assign parameters “chemical perception”.

Here, I will argue that our force fields and chemical perception rely too heavily on explicit bond order and aromaticity and that these should largely be dropped, with the force field being re-fit from scratch based on new typing that ignores these graph properties.

Background/history

Current OpenFF force fields assign parameters based on chemical perception that was largely inherited- or ported from AMBER-family force fields rooted in “parm99”-style typing used in AMBER’s parm96/parm99 and then picked up by Christopher Bayly’s “parm@Frosst” force field and GAFF. This is partly due to the origins of OpenFF – Bayly spent sabbatical at UC Irvine in the Mobley Lab in 2016 porting parm@Frosst typing into OpenFF’s SMARTS-based format by hand.

Bayly’s work creating initial typing for OpenFF leaned heavily on two models: a specific aromaticity model that he felt captured the requisite features he needed, the MDL aromaticity model as implemented by OpenEye (and, later, by RDKit as we worked to ensure we could use the requisite model in both packages); and on explicit use of bond orders to assign key valence parameters – most critically, to recognize the difference between single, double and triple bonds for assigning bond stretching parameters and to a lesser extent, even angle and torsional parameters.

Bayly’s work led to an initial force field which worked well, especially after a consistent refit, and the basic chemical perception employed has provided the foundation for all of our force fields to date (as of early 2025), with modest extensions and improvements in a variety of places but without any major changes in overall approach.

Problems with the use of explicit bond order and aromaticity

Core problems with the use of explicit bond order and aromaticity center around conjugation, choice of aromaticity model, and resonance forms.

In substructure searches, explicit bond orders may be used to tag a particular bond as single, double, or triple. Superficially, this makes sense, especially for bond stretching parameters, as single, double, and triple bonds would be expected to have dramatically different length and stiffness. However, the underlying electron distribution of course is affected by surrounding chemistry, and thus cannot be neatly binned into exact bond orders on the basis of the chemical graph. If a bond order for a particular bond is closer to 1.5, should its parameters be those of a single bond, a double bond, or somewhere in between? What if the bond order is 1.3, or 1.6? Who decides? Operations based on a simple graph-based bond order neatly divide chemistry into integer bins even if the underlying reality ought not to be so simple.

Choice of aromaticity model (or the use of aromaticity at all) further complicates the situation; certain bonds are tagged as aromatic bonds (“:”), for example. This occurs in our current force fields both for bond and torsional parameters (though not for angles; at present these more frequently separate single bond (“-”) and any bond (“~”) categories, with some use of double or ring bonds, but never aromatic). However, there are various different aromaticity models which can be used to decide which chemistries are or are not aromatic, making a force field that uses explicit bond orders be subject to the whims of the particular choice of aromaticity model. Concretely, does an aromatic ring get parameters for alternating single and double bonds (which seems incorrect!) or a single set of parameters all around the ring? And which rings ARE even aromatic? If explicit bond orders are used in parameter assignment, the choice of aromaticity model then substantially impacts parameter assignment within rings (if they might be aromatic), but also assignment of parameters for groups leaving rings (e.g. bridgehead bonds, angle bending for groups leaving rings, etc.). This makes force field quality and behavior dependent on the choice of and quality of a particular aromaticity model.

Resonance structures also pose challenges; carboxylate (COO-) is a particularly common example. Typical molecular mechanics force fields usually treat negatively charged carboxylate as having two equivalent oxygens, each bearing a fraction of the overall negative charge. However, a typical chemical graph representation may have one of the oxygens as being single bonded with an overall negative formal charge, and the other being a neutral oxygen with a double bond. Thus, to properly parameterize the two as equivalent, the underlying chemical perception must recognize that this is a carboxylate group and so in this context a single bonded oxygen and a double bonded oxygen should both get the same parameters. Similar issues arise for nitro groups, but also for a wide variety of other functional groups. While many of these could be enumerated and explicitly treated by a force field (recognizing such functional groups and adding explicit bond-order-based typing to correctly parameterize them) this adds considerable complexity to typing.

A final challenge in this space is the need for standardization to handle variations in user input, such as different choices of resonance structure given as input, or different choices of kekulization, or representations of aromatic rings in their kekule form or as aromatics. A production level force field ought to give final parameterizations which aren’t sensitive to the particular details of input as long as they provide a reasonable representation of the underlying chemistry. How, then, to standardize? Should we sanitize all input to follow a particular aromaticity model?

A radical approach - stop typing based on aromaticity and bond order

All of these problems could be avoided if we redesign our force fields to avoid using explicit bond orders and assign all parameters based on element, connectivity, and perhaps minimal other descriptors such as formal charge. For example, a carboxylate oxygen is an oxygen with a single connected carbon atom; there is no need to know the bond order of the bond in order to correctly parameterize it, or similarly for a nitro group.

Aromaticity is somewhat more complex, but a molecule is aromatic (or not) on the basis of the underlying chemistry and and there is no need to use explicit bond order (“:”) to recognize aromaticity, since this requires the substructure search language to apply a particular aromaticity model. Instead, the parameterization engine can simply assign appropriate parameters by whatever means necessary – either via an aromaticity model if necessary, or via some other means.

Dropping aromaticity and bond order from chemical perception also resolves the user input problem; if the substructure searches used to assign parameters are invariant to the choices the user makes in terms of the specific chemical graph provided as input, there is no need to standardize the graph by imposing a particular aromaticity model or set of bond orders.

Partial charge assignment provides an example of this approach

While most of our latest force field uses bond order at present, our approaches to assigning partial charges already give us a successful example of what happens without explicit bond order. Our initial force fields have used AM1-BCC charges, which use a semiempirical QM approach to assign charges. This operates on the underlying molecule, without any awareness of bond order and with accurate representation of the underlying resonance, so that equivalent atoms (such as carboxylate oxygens) end up being treated consistently regardless of user input.

More recently, we developed NAGL, a graph-network-based approach for assigning charges that works similarly. This doesn’t use explicit bond orders, but rather uses elements, connectivity, average formal charge, and ring membership, and yields charges that correctly handle these cases regardless of user input and aromaticity model.

These success highlight that a more general approach is possible.

Parameter interpolation based on bond order

Previously, we have experimented with – and implemented into our force field spec – the use of partial (Wiberg) bond order for the interpolation of parameters. The idea is that instead of assigning bond or torsional parameters (especially) based on graph-based bond order, when we assign partial charges with AM1-BCC or a similar approach, we would calculate a partial bond order for each bond and then the parameterization engine would use this to assign appropriate parameters. Ideally, in this approach, substructure queries could be written to use the any bond (“~”) primitive rather than an explicit bond order, and parameters would be assigned based on the underlying partial bond order (accounting for conjugation, resonance, and other effects) without any need for a particular aromaticity model or other such choices.

While science on this bond-order-based approach at OpenFF is temporarily on hold because our early work on partial bond order-based interpolation failed to demonstrate improved accuracy, the overall approach remains appealing because it fixes so many problems which are otherwise extremely difficult to tackle.

Other approaches in this vein have shown signs of promise; for example, the work of Moitessier, Labute and collaborators (DOI 10.1021/acs.jcim.7b00645 and related) shows considerable promise for being able to infer torsional parameters based on relatively simple calculations about the local chemical environment.

What should we explore?

So far, OpenFF has not had time and resources to explore this area substantially. An obvious approach would be to “start over” with a force field which completely avoids using explicit bond order in assigning parameters and instead assign all parameters based on element, connectivity, and perhaps minimal other descriptors such as formal charge.

This approach would likely not be enough on its own; to succeed, the local chemical environment in the absence of bond order should be used to broadly look up a particular category of typing, and then a model with access to the underlying chemistry and chemical graph would be used to assign the proper parameters. Such a model could be designed to produce parameters similar to or identical to those assigned at present (e.g. binning chemistry neatly into single, double, triple or aromatic bonds), or it could be made more general so that it gives a continuum of parameters based on the specific local chemistry, fully capturing conjugation in an interpolated manner.

Overall, it’s exciting to think that such an approach might be more easily generalizable than our current typing scheme, work better to capture more diverse chemistries, and more easily capture and/or avoid issues with resonance and aromaticity models while avoiding problems caused by varying choices made in user input.