Cite as: Benner, S. A. (2023) “Assembly Theory and Agnostic Life Finding”. Primordial Scoop, e20230324. https://doi.org/10.52400/APJX5069
To meet its covenant with the taxpaying public, NASA must today be equipping missions with instruments to detect life on relevant alien worlds. In this context, alien worlds are “relevant” if they have liquid water and reside in our Solar System.
Despite protestations to the contrary, NASA is not actually doing this. Not even on Mars, the locale most likely to hold alien life and, conveniently, also the least expensive relevant destination for a vacation.
NASA is collecting stones on Mars. NASA is flying drones on Mars. NASA is visiting zones on Mars, but only those selected to be unlikely to hold extant life.
And NASA has not looked for extant life on Mars in any place where it might find it since the Viking missions in 1976.
Part of the reason for this is programmatic. NASA administrators frequently confide a fear that public support for NASA might erode if a mission explicitly sent to seek life on Mars fails to find life there. This is a popular interpretation of the outcome of the Viking 1976 missions.
But “NASA” does not decide what missions to fly. Instead, what flies is driven by the “communities” that NASA calls upon to write white papers, do peer reviewing, and select missions.
Unfortunately, these communities are largely inexpert in organic chemistry. And, as the American Chemical Society will remind you, chemistry is the "central science". Attempting to find alien life without expert knowledge of organic chemistry is like attempting to do physics without a working knowledge of calculus.
Exemplifying this, a recent NASA White Paper purporting to define "standards" for evaluating claims of alien life detection showed that these communities do not understand even the limited organic chemistry performed by the Viking 1976 landers. The same White Paper also shows how the community that might select missions for NASA to search for alien life lacks the organic chemical expertise needed to answer crisply the question: "How would we detect alien life if we were to encounter it on Mars?"
Several groups have attempted to improve NASA in this respect, with imperfect success.
For example, recognizing that Darwinian evolution is universal to all biology, synthetic biologists have built alternative molecular systems that support molecular evolution in water. The "Polyelectrolyte Theory of the Gene" (PETG) emerged from this work. PETG teaches that information created by Darwinian evolution must be encoded on linear biopolymers having two structural features:
(a) The linear biopolymers must have a repeating backbone charge.
(b) They must be composed of a limited number of size-shape interchangeable building blocks.
We can easily design instruments to concentrate such molecules from highly dilute solutions. Such instruments can support a truly agnostic life finding instrument, an ALF.
PETG and its value in constructing ALFs have been known for 20 years. Nevertheless, NASA has still not implemented it. Indeed, the aforementioned "standards" White Paper, written just last year, does not even mention the PETG, or synthetic biology, or available agnostic life detection methods that might exploit it.
Indeed, that White Paper does not even recognize the distinction between strategies to find life that seek molecules necessary to support Darwinian evolution, versus strategies that seek molecules that are products of Darwinian evolution.
Which brings me to Assembly Theory (AT).
AT was recently proposed as an approach to better seek the products of Darwinian systems. The White Paper does mention AT, but without comment.
Assembly Theory will next week be discussed at a non-NASA meeting focused on Enceladus at UCLA. My cardiac surgeon has other plans for me on Monday, so I will not be there.
Therefore, the organizer, Carolyn Porco, asked me to blog a good faith critique of AT to assist in the discussion at UCLA. Here it is. Unfortunately, the comments that follow are long; I did not have time to write something short.
One encounters many challenges when attempting to evaluate AT, if one uses a recent paper (Marshall et al. 2021) as its scriptural text. Much of the challenge arises because that text, like Scripture, is confused.
To be pedantic, let us start with the first sentence relevant to the exposition of AT in that text:
“[W]e postulate that living systems can be distinguished from non-living systems as they produce complex molecules in abundance which cannot form randomly so what is needed is a way to quantify this complexity.”
ChatGPT struggles with this sentence. Note the ambiguity in the "as" clause due to a missing comma, as well as in the "which" clause, perhaps for the same reason. We re-write the sentence to capture what the authors presumably mean:
“We propose that living systems differ from non-living systems, in that the first contain many copies of many molecules that have a certain level of complexity (to be defined separately), while the second do not.”
Now, it is transparently false that non-living systems cannot contain molecules that are complex, no matter how complexity is defined.
For example, most of the molecules within a lump of coal are complex, no matter how complexity is defined. Their covalent assembly is not repetitive. They were made by many steps, primarily from non-complex systems. And the same is true for the organic molecules in carbonaceous chondrites.
Thus, the qualifier "in abundance" is not a throwaway in Marshall. To be a biosignature, the same complex molecule must be present in abundance.
Thus, we do not need “a way to quantify this complexity”, at least not exactly. Rather, we need a metric to quantify a combination of complexity and abundance.
“Abundance” appears later in Marshall as an operational concept (“detectable” at >10,000 identical copies). But this concept must be coupled to a statement about sample size. I suspect that in a kilogram of coal, one can find 10,000 exemplars of many molecules that meet the AT definition of complexity. After all, a kilogram of coal contains many molecules.
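To see how small 10,000 copies is against a kilogram-scale sample, here is a back-of-envelope calculation. The mean molecular mass of 400 Da is my own assumption (chosen to sit inside the 300-500 Da window discussed later), not a figure from Marshall:

```python
# Back-of-envelope: what mole fraction is 10,000 identical copies in 1 kg of coal?
# Assumption (mine, not Marshall's): a mean molecular mass of ~400 Da.
AVOGADRO = 6.022e23          # molecules per mole
mean_mass_g_per_mol = 400.0  # assumed mean molecular mass (400 Da)
sample_g = 1000.0            # one kilogram of sample

molecules_in_sample = sample_g / mean_mass_g_per_mol * AVOGADRO
fraction = 10_000 / molecules_in_sample

print(f"molecules in sample: {molecules_in_sample:.2e}")   # ~1.5e24
print(f"10,000 copies is a mole fraction of {fraction:.1e}")  # ~7e-21
```

On this rough accounting, a detection criterion of 10,000 copies with no stated sample size admits fractional abundances some twenty orders of magnitude below unity, which is why the coupling to sample size matters.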
Separately, Marshall argues that non-life “cannot provide the specific set of biases needed to produce complex molecules such as Taxol in abundance.” This statement would also be false, if Taxol were to be a “privileged” molecule.
Taxol might be the privileged thermodynamic energy minimum of an ensemble of atoms, and therefore be formed “in abundance” for thermodynamic reasons. Alternatively, Taxol may arise from a sequence of kinetically favored processes. Oligomeric RNA might fall into this category on a Hadean surface dominated by basaltic glass containing borate and condensed phosphate.
We assume that Marshall fully understands this, and finds this discussion infuriatingly pedantic. But it is worth mentioning to make a point.
Lee Cronin, a principal behind AT, once asked me what the largest known molecule was. Lee thought that the correct answer was something made of molybdenum. I thought that the correct answer was the Cullinan Diamond, in the crown jewels of King Charles III (now in several pieces; it had children).
I thought that my answer was obviously correct. Lee thought that my answer was obviously uninteresting.
But if you go to a vein of kimberlite in South Africa, you will discover >10,000 exemplars of the same kind of (large) molecule, of differing sizes, in a cubic meter of sample. They are there (most likely) because they represent thermodynamic minima on an energy landscape under the conditions where they formed.
Now, AT would dismiss diamonds because their structures are “repetitive”. But not so fast. The diamonds contain defects. Those defects hold information. Perhaps biological information. Differing from one diamond to the next, but perhaps not. You figure out whether these diamonds are exact replicas, or differ in the same sense as a dachshund differs from a poodle.
This, of course, raises an operational issue: how would one actually, in South Africa, distinguish a collection of diamonds that is evidence of life deep in the Earth from one that is merely a set of thermodynamically stable assemblies of carbon atoms? Which brings us to another challenge in evaluating AT: its advocates and critics often move back and forth between theory and operation in ways that make it difficult to keep the discussions separate.
Let us address the theoretical discussion first. Numerous authors, writing over a half-century, have attempted to look at molecular structures as theoretical constructs (represented by letters and lines in a graph) to measure their "complexity". Uthamacumaran et al. recently criticized AT as being a "suboptimal restricted version" of some of these older complexity theories.
Cole Mathis, another AT principal, engaged this criticism here. Much of the comparison is “inside baseball”, and will not concern those seeking to build ALF instruments. So let us pass it over.
Of concern to us, however: Cole re-defines "amounts" and "abundance" not as 10,000 copies, but rather (and the locution is unclear, so I cite it exactly) as "100s-1 millions of identical copies". Again, this number is not associated with a sample size.
Also not discussed is the “density of life” and its “background”. Detecting life in a brick of baker’s yeast is much easier than detecting a single yeast cell in a liter of water. This is easier than detecting a single yeast cell in a sample of carbonaceous chondrite. These thoughts are for later, when we discuss AT as an operational concept.
Theoretically, Cole argues in favor of AT over these older complexity metrics because, he says, AT does more than describe the complexity of the molecule. It is more than a measurement of the ability of its information to be compressed. Instead, Cole argues, the complexity metric provided by AT has something to do with “the physical production of that molecule!” (Exclamation point in original).
Cole writes: “Easily describing a graph that represents a molecule is not the same as easily synthesizing the real molecule in chemistry or biology.”
Indeed, it is not.
Now, I am a card-carrying member of the American Chemical Society. So, I would be very interested (with an emphasis on “very”) in a complexity metric that represents how easy it is to really synthesize a “real molecule”.
However, I also know that we organic chemists burn our entire careers trying to understand atom-by-atom reactivity in real organic molecules. The reactivity of each atom in an organic molecule is influenced by the atoms attached to it, by the atoms attached to those, and by the atoms attached to those. There are no algorithms. Even the heuristics have exceptions, and exceptions to the exceptions.
Thus, organic chemists understand that a metric of the type that prompts Cole's exclamation point is wickedly difficult to construct. From their hard-fought careers, organic chemists are very skeptical (with an emphasis on "very") of any claim by anyone to have found an algorithm or heuristic that does so.
Not being experts in organic chemistry, most members of NASA communities do not share this skepticism. Thus, they are likely to start reading Marshall, have their eyes glaze over as they enter the Sargasso Sea of its prose, and then say: “Well, it was published in Nature Communications after peer review, so it must be right.”
Which brings me to the problem that Carolyn Porco tasked me with. Does AT provide a solution to this wickedly difficult problem? After all, it is called Assembly Theory, implying that it incorporates some model for how organic molecules are assembled.
Unfortunately, Marshall is scripturally inchoate. At its lowest level, and curiously, its “basic building blocks” seem to be “bonds”, not atoms. It speaks of “assembly paths” as “joining operations”, presumably computational (pace Cole), not of “real molecules”. The paths are “formalized mathematically using directed multigraphs”. We are told that “the formal details are unnecessary”; we are directed to the Supplementary material.
An attempt to download the Supplementary gets me only an invitation from Nature Briefing to be spammed once a day.
But Marshall (the main text) does appear to claim that the assembly path has something to do with the assembly of real molecules. Witness: “The molecular assembly number (MA) therefore captures the specificity of the molecule in the face of this combinatorial explosion and provides an agnostic measure of the likelihood for any molecular structure to be produced more than once.”
Produced by what? Perhaps by real chemical reactions.
Now, the schematic in Figure 1 does not involve real reactions, even those called “joining reactions”. The organic chemist in me has difficulty inferring the real process adumbrated in the Figure, leading to adenosine triphosphate (ATP). And elsewhere, we are told that the only “limits from chemistry” are “valence rules”.
The Figure legend suggests that Marshall has in mind a real route to a real molecule, Taxol, that involves 30 steps. But I know the chemistry of Taxol; it was synthesized just down the road from me in Tallahassee, and I do not know what that route could possibly be.
Later, Marshall suggests that they have a “computational model” to “help determine how the probability of the spontaneous formation of detectable amounts of any given molecule changes with MA”.
No. Not if we are talking about real reactions with real molecules. And certainly not if the only “limits from chemistry” are “valence rules”.
But Marshall persists. Try this sentence pair:
“The probabilities we calculated represent the likelihood of an unconstrained or random assembly process generating that specific compound, given that the abiotic precursors are so abundant that they do not limit the formation. These probabilities do not represent the absolute probability of a molecule ever forming, rather they represent the chance of the molecule forming in the unconstrained, undirected process described herein any detectable abundance.”
ChatGPT again has problems. A run-on sentence. Perhaps "herein" should be "here in"? And so on.
But with its consideration of the abundance of abiotic precursors, this surely looks like a claim that the authors have solved the wicked problem, and can calculate the probability that a real molecule might emerge by a “random assembly process”.
Without any input from chemistry other than “valence rules”? I doubt it. And not even with all known chemical input. Call me skeptical. But if AT can do that, it is far more important as a tool than it is to guide a mission to look for life on Mars.
Perhaps I am wrong, but I do not think that AT has solved the wickedly complex problem central to organic chemistry: The problem of reactivity. The AT number is theory, and theory only.
But do we care about theory if we nevertheless get an operational tool, something that can fly to Mars to detect life there?
After all, no matter what the complexity metric, including metrics from Marshall and from Uthamacumaran, if we find Taxol in kilogram amounts in a sample of Mars, we would take it as a sign of life. No matter how we score its complexity. After we make sure that Taxol is not a privileged molecule, either thermodynamically or kinetically. And after we make sure that we did not bring it from Earth ourselves.
Most readers from the NASA community, again not knowing much organic chemistry, might look at Figure 4 in Marshall and say: "See, they have distinguished peptides from living yeast, from toasted yeast, from coal, from a chondrite, from a mid-Miocene paleomat, simply by looking at the fragments they get from an orbitrap mass spectrometer. They even tried home-brewed beer and Scotch whisky. It works!!! What more do you pedantic organic chemists want?"
An orbitrap mass spectrometer examines a sample and reports the most intense ions that it finds there. This is the primary mass spectrum, MS1. The instrument then puts energy into each of the trapped ions to create fragments from them. It then records, for each primary ion, a secondary mass spectrum, MS2, that comes from the fragmentation.
Here, the fragmentation is real chemistry. Some bonds fragment easily. Some bonds fragment with greater difficulty. So, this is not AT in any theoretical sense. It is real “disassembly theory” (DT?). Unfortunately, and again, modern organic chemical theory does not allow fragmentation to be predicted more than heuristically. Further, disassembly by ionization need not re-trace in reverse a path that assembled the molecule, real or theoretical.
Marshall reports that biological samples have a number of MS1 peaks in the mass/charge range of 300-500 daltons (the mass of ATP is in this range). They then fragment these, counting the number of fragments observed in the MS2 spectra. They note a correlation between the number of fragments observed and their molecular assembly number for the ion being fragmented.
They then note that few non-living samples produce MS1 ions in the 300-500 dalton range. Those that do, do not give a large number of peaks in their derived MS2 spectra.
Of course, not all molecules from living systems that gave important ions in MS1 produce a large number of MS2 fragments. Living systems also contain simple molecules, judging simplicity by the number of fragments in their MS2 spectra.
So far so good.
But note, this approach could be operationally applied without molecular assembly theory at all. And certainly, without needing to resolve any argument between Cole and Uthamacumaran over which metric of complexity is better.
We could simply define a disassembly number equal to the number of different fragments found for a single molecule in MS2. This would avoid confusing big molecules that have repetitive, low-information structures with those that have non-repetitive, high-information structures.
Of course, it would not identify as non-life the oligomeric RNA made in my lab last year abiologically on borate glass. A single MS1 ion would comprise many different sequences, and this compositional complexity around a single mass would give the appearance of a complex single molecule presented in "abundance". But never mind.
So let me propose an operational life detection strategy based on Disassembly Theory (DT). It involves these steps:
(a) Ship an orbitrap MS to Mars.
(b) Scoop in a sample.
(c) Record the MS1 of the sample, identifying major ions.
(d) Fragment any major ion that has a mass between 300 and 1000 Daltons.
(e) Count the number of peaks in each of the secondary MS2 mass spectra.
(f) If one, or perhaps more, of the MS2’s has (let us say) a dozen or more fragments (you decide), say that we have found in abundance a molecule whose complexity is such that the molecule requires Darwinian evolution to create.
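Steps (a) through (f) can be sketched as a scoring function. This is a minimal illustration under my own assumptions, not a flight-instrument specification; the data structure (parent m/z mapped to fragment m/z lists) and the threshold of a dozen fragments are hypothetical placeholders matching the "you decide" in step (f):

```python
from typing import Dict, List

def dt_life_signal(ms2_spectra: Dict[float, List[float]],
                   mass_window=(300.0, 1000.0),
                   min_fragments: int = 12) -> bool:
    """Return True if any major MS1 ion inside the mass window yields at
    least `min_fragments` distinct MS2 fragment peaks (steps d-f above)."""
    for parent_mz, fragments in ms2_spectra.items():
        if mass_window[0] <= parent_mz <= mass_window[1]:
            # Count distinct fragment masses only: the "disassembly number".
            if len(set(fragments)) >= min_fragments:
                return True
    return False

# Hypothetical spectra: a parent ion at 428.1 Da with 14 distinct fragments,
# and a small ion at 180.2 Da (outside the window) with 3 fragments.
spectra = {428.1: [round(57.0 + 27.3 * i, 1) for i in range(14)],
           180.2: [59.0, 71.0, 89.0]}
print(dt_life_signal(spectra))  # → True
```

Note that the function counts distinct fragment masses, so a repetitive molecule that sheds the same fragment many times scores low, in keeping with the disassembly-number idea above.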
Now, this definition of complexity, the number of fragments a compound generates after its ion is trapped and fragmented, is not theoretical. It is real. It does capture chemistry. And it can be calibrated by examining the mass spectra of an arbitrarily large number of molecules or samples. Marshall has started the calibration.
Whether this is good or bad is a matter of opinion. If one insists on defining complexity by a hypothetical, non-real path to synthesis, fine, so long as the number of fragments correlates with the molecular assembly number calculated by that theory. Which Marshall says it does. But it is not operationally necessary.
Let us now approach Marshall from a purely operational perspective.
Why might a carbonaceous chondrite, a nonbiological source of carbon, fail to give a positive signal in this operational assay? Or, for that matter, coal, an erstwhile biological but currently non-biological source of carbon?
One reason of course is that they do not present any molecule “in abundance” in the 300 to 500 dalton mass range. Indeed, both the chondrite and the coal have carbon on the way to becoming graphite, with much higher molecular weights. Or, as in the Cullinan diamond, on the way to becoming part of the crown jewels of Charles III.
Likewise, these compounds are hydrophobic. Marshall extracts materials from a sample with a mixture of water and methanol, which certainly disfavors molecules from coal or chondrites that would otherwise be acceptably complex and in the relevant mass range.
But that is perhaps okay as well. After all, life in water might very well need complex, water-dissolved molecules of modest molecular weight. And DT is not looking at the macromolecules in yeast either. These do not appear in the 300 to 500 dalton selected mass range, and they are not abundant enough as exact molecular replicas. So DT still works, but by a mechanism that has nothing at all to do with AT.
But we do not care why it works. An operational exploration does not need a mechanism. It simply needs to work.
But let us bring yet another level of reality to the operation. If we go to Mars, it is conceivable that the biological organics are admixed with nonbiological meteoritic organics. We are looking at the products of Darwinian evolution here, not the molecules that enable Darwinian evolution. Can we see the molecules that are the products of Darwinian evolution in a background of molecules that are not?
Here, the selection of the 300 to 500 Dalton mass range might actually be a useful separation principle to see these. This might concentrate biological organics that give many fragments in their MS2 spectra to the exclusion of non-biological organics.
But then we must worry about sample size and concentration.
The amount of biological organics that can be sustained in a sample is a function of the energy available to the biology in that sample. At ~1.5 times Earth's distance from the Sun, Mars receives ~44% of Earth's solar flux, and its thin atmosphere lets through radiation that makes some of that energy toxic. Geothermal energy can be considerable at the correct location, but one must sample the correct location.
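The solar-flux figure follows from the inverse-square law; a quick check (the 1.52 AU mean distance for Mars is standard, and gives ~43%, consistent with the text's round figure at 1.5 AU):

```python
# Inverse-square check of Mars's solar energy budget relative to Earth.
mars_distance_au = 1.52             # mean Mars-Sun distance, in AU (Earth = 1)
flux_ratio = 1.0 / mars_distance_au ** 2   # inverse-square law

print(f"Mars receives {flux_ratio:.0%} of Earth's solar flux")  # ~43%
```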
Therefore, a NASA community trying to decide, in peer review, whether to spend scarce resources to fly to detect life on an alien world must worry about the concentration of life in the intended locale. And if one attempts to solve the problem by simply taking a large sample of water from that locale and evaporating it, a hostile NASA peer review would likely argue that the process would also concentrate the non-biological organics, creating a background that obscures the biological organics.
Now, we are partisans, and we are prone to special pleading.
However, the reason why we advanced agnostic life finding (ALF) instruments that focus on molecules necessary to implement Darwinian evolution, rather than on molecules that are the products of Darwinian evolution, is not just that we see the first as more fundamental than the second.
Rather, the Polyelectrolyte Theory of the Gene requires that Darwinian systems universally have informational macromolecules with repeating backbone charges. This repeating charge is a “handle” that can be used to concentrate them selectively from other organics, even if they are present only sparsely. Electrodialysis can separate the universal informational biopolymers from other molecules.
Once concentrated, one can examine them at leisure for other properties that indicate their role in biological information transfer and Darwinian evolution, including whether they are built from a limited vocabulary of size- and shape-regular building blocks, which is the only other universal requirement for an informational polymer to support Darwinian evolution.
The ‘dis-assembler’ approach has been implemented by Bob Hazen, Jim Henderson Cleaves and others, who find that pyrolysis GC-MS gives a distinctive ‘signature’ of biological vs unbiological samples. https://www.pnas.org/doi/abs/10.1073/pnas.2307149120 . It is not clear to me on a quick read whether they mixed biological material (yeast, say) with unbiological organics (Murchison material, for example), but it shows that this approach, which uses real-world chemistry, has promise.
I am clearly missing a point here: how is coal a good sample of a non-living object? Isn't it a product/artifact of biochemistry?
Coal was used as an example of abiologically processed biological material, and was shown to be distinct from both rock and biological material (see their Figure 2).