Cite as: Benner, S. A. (2023) “Making Operational the Polyelectrolyte Theory of the Gene in the Agnostic Search for Martian Life”. Primordial Scoop, e20230915.
The Polyelectrolyte Theory of the Gene holds that Darwinian evolution, to have evolvable information, must use a semi-linear informational biopolymer that has a repeating common charge. As Jan Špaček details elsewhere, such molecules are readily concentrated, even from dilute solution, by electrodialysis. Once enough material has been concentrated for analysis, determining its structure becomes a task long standard in the chemistry of natural products. However, modern instrumental methods can simplify this task and perform it with smaller amounts of material.
By axiom, Darwinian evolution is the property that distinguishes animate from inanimate chemistry. This distinction breaks down only at the point where life becomes “intelligent”.
Intelligent life, should we encounter it in our exploration of the galaxy, will be easy to recognize. Indeed, it is more likely that intelligent life will encounter us, since we ourselves are only now barely becoming intelligent.
This axiom leads to the key question that drives life detection exercises: “When handed a passel of molecules in a sample from an alien world, how does one determine whether or not it reflects Darwinian chemistry?”
Nearly all “biosignature tests” proposed by mainline astrobiology look within this sample for molecules that are the products of Darwinian processes. Here, some feature of those molecules, or of their collection, must be one that non-Darwinian processes cannot produce.
These classical proposals face problems that have been discussed extensively.
- Single molecular species, such as specific amino acids or lipids, have long been abandoned as biosignatures. True, they are made on Earth by processes that emerged by Darwinian evolution. However, they are also encountered in extra-Terran samples, most notably carbonaceous chondrite meteorites. These samples almost certainly have not been exposed to a biosphere.
- Chiral molecules, if found in samples where their left-handed and right-handed forms are present in unequal amounts, are classically interpreted as biosignatures. This too is seen to be ambiguous, following the discovery of such unequal amounts for chiral molecules recovered from meteorites.
- Organic molecules with isotopically “light” carbon (that is, with high ¹²C/¹³C ratios) have also been proposed as biosignatures. However, what is considered “light” depends on the reservoir of carbon atoms from which those molecules were assembled. Further, non-biological reactions can also produce products enriched in the light isotope.
- Darwinian systems generally use only a small subset of the molecules made possible by the huge combinatorial diversity available to organic species. Thus, if the passel contains only a few of the compounds that it might contain, this may be interpreted as a biosignature. However, non-biological processes are also known to yield only a few compounds. Further, after a living system dies, the diversity of molecular structures increases as its molecules decompose. Thus, coal has a more complex set of molecular structures than the plants from which it arose. Coal can still be recognized as the product of biology. However, doing so requires a molecular analysis with a level of completeness that is hard to achieve remotely in a space mission.
Responding to the ambiguity in the interpretation of each of the classical biosignatures, much current thinking simply seeks to combine these. For example, the biosignature may be strengthened if we find chiral organic compounds in low diversity mixtures that have light isotopes.
Homochirality is a concept driven by similar thoughts. Here, a set of different chiral molecular species, related in some way functionally, is inspected. If all species have analogous chirality, then this similarity is taken as a biosignature, even though each individual chiral molecule is not by itself a reliable biosignature.
Similarly, recent efforts to identify agnostic and universal biosignatures have relied on assessing the structural complexity not of mixtures alone, but of individual molecules that are abundant in those mixtures. Thus, Lee Cronin and Sarah Walker suggest that if a sample contains a molecule whose Molecular Assembly (MA) index exceeds a certain threshold, that molecule must have been made by processes that required Darwinian evolution to emerge.
Some time ago, we suggested that a more reliable class of biosignature would not look for molecules that were the products of Darwinian evolution, but rather for molecules that are needed to implement Darwinian evolution. Here, the focus was on the biopolymers that carry the evolvable information essential for life. In you and me, this is deoxyribonucleic acid, or DNA.
The Polyelectrolyte Theory of the Gene[i] constrains the structure of such biopolymers. It requires that the general structure of the genetic biopolymer be invariant with respect to changing information content. It also requires that the general physical behavior of the genetic biopolymer be invariant with respect to changing information content.
These invariance requirements arise because the biopolymer must continue to function as its information content changes, even if it changes dramatically. For example, the biopolymer must remain soluble in water (a physical behavior) even as its sequence changes. The biopolymer must also continue to be recognized by the enzymes that replicate it, an interaction that requires its general structure to remain constant.
The Polyelectrolyte Theory of the Gene is a powerful tool when searching for life in the cosmos because relatively few molecular systems show such invariance. In a protein, for example, changing a single amino acid can dramatically change its physical properties. An example is hemoglobin: changing just one amino acid in its β chain (out of more than 500 amino acids in the whole protein) causes the protein to start to precipitate, producing sickle cell anemia.
Thus, proteins cannot be the molecules that carry the information needed for Darwinian evolution. They must instead be the products of an informational molecule with different structural features.
As its name implies, the Polyelectrolyte Theory of the Gene requires those informational biopolymers to be polyelectrolytes. They must have a repeating charge. On DNA, that repeating charge is carried by the negatively charged phosphate groups that link the nucleosides. The nucleosides themselves come in four different varieties, G, A, C, and T. The sequence of those nucleosides in the linear polymer with a repeating negative charge determines the information of the gene.
The four nucleosides, by themselves, have quite different properties. For example, guanine, the nucleobase of the nucleoside guanosine (G), is very insoluble. Conversely, thymine, the nucleobase of the nucleoside thymidine (T), is far more soluble. So how can DNA, as an informational molecule, change its sequence from TTTTT to GTGTG, dramatically changing its information content, without a dramatic change in solubility?
The answer to this question comes from the repeating backbone charge. That repeating negative charge, carried by the linking phosphate groups, so dominates the overall biophysical behavior and structure of the molecule that it makes no difference whether the molecule is TTTTT or, for that matter, CCCCC or AAAAA. The charge is so dominant that the molecule remains soluble even as its information content ranges across all possible sequences. Notably, no other structural feature in chemistry is so dominant. Behavioral invariance cannot be achieved in any other way.
Structural invariance arises from the base pairing that holds the two strands together. The A:T, T:A, G:C, and C:G base pairs all have essentially the same size and shape. This allows the double helix to keep an overall structure that does not change as its information content changes. Proteins that must interact with the double helix, including the DNA polymerases that replicate these base pairs, can therefore handle every pair in much the same way. This is why we look for molecular systems that are roughly structurally invariant. Further, under the notion of the gene as an “aperiodic crystal”, this regularity allows replication fidelity to be high.
Interestingly, for the building blocks of an informational biopolymer to have more or less the same sizes and shapes, they must also be homochiral if they are chiral. This need not be true for proteins that are not obtained by direct translation of an informational biopolymer. The often-cited counterexample is gramicidin, built from amino acids of which about half are the left-handed variety and about half the right-handed variety.
Conveniently, the polyelectrolyte structure of genetic molecules universally allows them to be easily concentrated from the water where the organisms that depend on them are evolving. As described in detail elsewhere, polyanions move toward an anode, an electrode carrying a positive charge. Polycations move toward a cathode, an electrode carrying a negative charge. An agnostic life finding (ALF) device can therefore use electrophoresis or electrodialysis to concentrate the genetic molecules of an alien life form from a sample of alien water.
This is true even if Martian life is very sparsely distributed in that water. The density of that distribution depends on the amount of the limiting resource(s), which may be free energy, carbon, or inorganic species. The rate of metabolism, and the resulting biomass, are set by these. Therefore, the concentration of informational biopolymers from Martian life may be low.
Fortunately, as SpaceX and other entities plan to send humans to Mars, they plan first to mine large amounts of Martian water ice as an in situ resource, to make propellant for those humans to return to Earth. ALF can stand astride this large flow of Martian water and concentrate from it Martian informational biopolymers, even if they are present in very small amounts.
Previous descriptions of ALF have focused on its ability to concentrate polyelectrolytes from water samples. But then what? After the material is concentrated, we must determine its molecular structure. In particular, we must:
- determine what its building blocks are, in particular, whether they are few in number, representing a “limited vocabulary” drawn from many possible building blocks
- assess whether or not those building blocks conform to the “size and shape” regularity required by the Polyelectrolyte Theory of the Gene.
Now, Martian “DNA” will not represent the first “structure proof” challenge ever faced by chemists. Thus, from ~1880 to ~1930, chemists managed to determine the structures of the building blocks of DNA, RNA, and proteins, the principal biopolymers on Earth. The details of how they did it have been lost in our education system, but should not be. For example, Robert Olby has written a captivating story of how this was done for DNA.[ii]
A general principle in chemistry holds that if one has a large amount of material, any approach to analyzing structure is conceivable, whereas if the amount of material is too small, no structural analysis is possible.
Of course, the amounts of material that these chemists had a century ago were large: the amounts of protein were larger than the amounts of RNA, and the amounts of DNA available were quite small by comparison. Balancing this, we can increase the amount of material to whatever an analytical method requires, simply by mining more Martian water ice. Since SpaceX is planning to mine tons of it, more material can be obtained.
However, what really balances this are the instruments available to us today, which were unavailable (indeed, unimaginable) to chemists a century ago.
Let us lay out this problem in three levels, using Terran DNA to start the discussion.
- if we had a sample of homopolymeric DNA, let us say CCCCC… with length heterogeneity, how would we determine that we had that homopolymer? And then, how would we determine that it was built from repeating units of cytidine?
- generalizing, if we had a sample of polymeric DNA built from the four building blocks, again with length heterogeneity and sequence heterogeneity, what analytical method might show that we had such a polymer? And could we then determine what units it was built from, showing in particular that those units were selected from a limited vocabulary and that they all had similar sizes and shapes?
The concepts of “length heterogeneity” and “sequence heterogeneity” are important. If a sample contains many copies of a DNA molecule with a single sequence, its analysis is trivial. Mass spectrometry would identify that molecule by its mass. Fragmentation of the molecule during mass spectrometry would identify the masses of the units that are lost. Four such masses would be distinguished in our DNA. In addition, the sequence of the molecule would be determined.
However, we cannot expect ALF to deliver, even in concentrated form, Martian polyelectrolyte genetic molecules of a single sequence. ALF will likely concentrate polyelectrolyte genetic biopolymers from more than one Martian organism. The process of concentration will also fragment them, giving molecules of different lengths. The sequences of those fragments are likely to all be different, even if they are built from a limited vocabulary of building blocks.
Returning to the classical analyses of the structures of Terran biopolymers, these all depended on degradation to constituent building blocks. For RNA, the degradation was trivial; upon treatment with base, RNA gave four kinds of nucleotides that were easily isolated and structurally characterized.
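For the trivial single-sequence case, reading a sequence from a fragment ladder can be sketched as below. This is a minimal illustration under stated assumptions, not any instrument's software: it assumes a clean ladder in which each successive peak adds exactly one nucleotide, ignores end groups and charge states, and uses approximate monoisotopic residue masses for the four DNA nucleotides. The helper names `make_ladder` and `read_ladder` are hypothetical.

```python
# Approximate monoisotopic residue masses (Da) of the four DNA nucleotides.
# Illustrative only: end groups and charge states are ignored.
UNIT_MASSES = {"A": 313.058, "C": 289.046, "G": 329.053, "T": 304.046}


def make_ladder(seq, start=0.0, unit_masses=UNIT_MASSES):
    """Hypothetical helper: the cumulative mass ladder a sequence would give."""
    masses = [start]
    for unit in seq:
        masses.append(masses[-1] + unit_masses[unit])
    return masses


def read_ladder(ladder, unit_masses=UNIT_MASSES, tol=0.01):
    """Hypothetical helper: read a sequence from a ladder whose steps add one unit.

    Each consecutive mass difference is matched to the closest unit mass;
    a step that matches no unit within `tol` raises an error.
    """
    ladder = sorted(ladder)
    sequence = []
    for lighter, heavier in zip(ladder, ladder[1:]):
        step = heavier - lighter
        unit, mass = min(unit_masses.items(), key=lambda kv: abs(kv[1] - step))
        if abs(mass - step) > tol:
            raise ValueError(f"step of {step:.3f} Da matches no known unit")
        sequence.append(unit)
    return "".join(sequence)


print(read_ladder(make_ladder("GATTACA")))  # GATTACA
```

The closest-match rule works here only because the four residue masses differ by several daltons; a real spectrum would also need end-group corrections and charge-state deconvolution.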
For proteins, the degradation was likewise trivial. Acid degradation broke down the backbone and gave 17 of the 20 encoded amino acids. Missing was tryptophan, whose side chain was destroyed in acid. Asparagine was converted to aspartic acid, one of the 20. Glutamine was converted to glutamic acid, another of the 20.
With a sample of Martian polyelectrolyte, the first step will involve treating it with strong acid and strong base. Unlike the classical chemists, however, we will apply modern instrumentation to determine the structures of the products. For example, matrix-assisted laser desorption ionization (MALDI) coupled to an Orbitrap high-resolution mass spectrometer found almost complete mass ladders without interfering side products after RNA was treated with acid.[iii]
The extent of degradation determines how difficult the Orbitrap analysis is. Thus, if a biopolymer is built from six kinds of units and degradation is complete, the Orbitrap will observe six ions with masses N, O, P, Q, R, and S. This will be the outcome no matter how many Martian species have donated their genetic material to the analysis, and whatever their length heterogeneity.
Those masses will strongly constrain the structures of the building blocks of the Martian biopolymer. Chemists will then synthesize candidate molecules for N … S to test these structural hypotheses. With that, the structure of the informational biopolymer of Martian life will be solved.
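The point that complete degradation yields only one ion per kind of building block, however heterogeneous the mixture, can be sketched as follows. The six masses assigned to N … S are invented placeholders, the "species" are random strings standing in for genetic material, and end groups are ignored; the function names are hypothetical.

```python
import random

# Hypothetical masses (Da) for six building blocks N ... S; placeholders only.
UNITS = {"N": 301.1, "O": 317.2, "P": 288.4, "Q": 342.7, "R": 275.9, "S": 329.3}


def random_mixture(n_species=5, min_len=5, max_len=40, seed=1):
    """Fabricate strands from several 'species' with varied lengths and sequences."""
    rng = random.Random(seed)
    names = list(UNITS)
    return [
        "".join(rng.choice(names) for _ in range(rng.randint(min_len, max_len)))
        for _ in range(n_species)
    ]


def complete_degradation_masses(strands):
    """Distinct monomer masses once every linkage is cleaved (end groups ignored)."""
    return sorted({UNITS[unit] for strand in strands for unit in strand})


mixture = random_mixture()
print(len(mixture), "strands ->", complete_degradation_masses(mixture))
```

However many strands or species go in, the output can never contain more than six masses, one per kind of building block present.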
If degradation is complete only to the di-unit stage, the degradation mixture will contain up to 36 possible dimers: NN, NO, ON, NP … SS. These will give 21 masses, as dimers with the same composition in a different order have identical masses (for example, NO and ON). Again, the Orbitrap output will strongly constrain the structures, not only of the di-units, but also of their constituent units.
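The dimer arithmetic can be checked with a short sketch. The six masses are hypothetical placeholders, chosen so that no two different compositions happen to coincide in mass; with real building blocks, accidental coincidences within instrument resolution could reduce the count below 21.

```python
from itertools import combinations_with_replacement, product

# Hypothetical masses (Da) for six building blocks N ... S; placeholders only.
UNITS = {"N": 301.1, "O": 317.2, "P": 288.4, "Q": 342.7, "R": 275.9, "S": 329.3}

# Every ordered dimer: NN, NO, ON, NP, ..., SS.
ordered_dimers = ["".join(pair) for pair in product(UNITS, repeat=2)]

# Order does not change mass, so only the composition matters.
compositions = list(combinations_with_replacement(UNITS, 2))
dimer_masses = {round(UNITS[a] + UNITS[b], 3) for a, b in product(UNITS, repeat=2)}

print(len(ordered_dimers))  # 36
print(len(compositions))    # 21
print(len(dimer_masses))    # 21 (no accidental coincidences among these values)
```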
Even if degradation is only partial, Orbitrap results will be useful. The generic species will have masses provided by the following equation: M = nN + oO + pP + qQ + rR + sS. Here, the upper-case letters denote the masses (in daltons) of each of the six building blocks. The lower-case italicized letters are integers that denote the number of each of those blocks in the molecule forming the ion.
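Going the other way, from an observed mass M to the integers n … s, is a small bounded search. The sketch below assumes hypothetical unit masses, ignores end groups and charge, and uses an assumed tolerance `tol` to stand in for instrument accuracy; the function name `compositions` is illustrative. More than one composition may fit a single peak, which is one reason many peaks are analyzed together.

```python
# Hypothetical masses (Da) for six building blocks N ... S; placeholders only.
UNITS = {"N": 301.1, "O": 317.2, "P": 288.4, "Q": 342.7, "R": 275.9, "S": 329.3}


def compositions(target, units=UNITS, tol=0.05):
    """Every non-negative integer vector (n, o, p, q, r, s) whose mass
    n*N + o*O + ... + s*S lies within `tol` of the target mass."""
    names = list(units)
    hits = []

    def search(i, remaining, counts):
        if i == len(names):
            if abs(remaining) <= tol:
                hits.append(dict(zip(names, counts)))
            return
        mass = units[names[i]]
        # Try every count of block i that does not overshoot the target.
        for k in range(int((remaining + tol) // mass) + 1):
            search(i + 1, remaining - k * mass, counts + [k])

    search(0, target, [])
    return hits


peak = 2 * UNITS["N"] + UNITS["Q"] + UNITS["S"]  # e.g. a fragment such as NQNS
for hit in compositions(peak):
    print(hit)
```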
In a procedure that is foundational to structure analysis in chemistry (this is how methane was shown to have the structure CH₄, for example), the masses of N … S can be inferred from the multiplicity of peaks observed in the mass spectrometer. Not all of the ions need to be assigned. Indeed, only a few of them can suffice to deliver the number of building blocks and their respective masses: more are needed if the number of building blocks is larger, fewer if it is smaller. The key to this analysis is that n … s are integers.
This analysis requires, of course, that the fragmentation be uniform. The Polyelectrolyte Theory of the Gene has something to say about this. Because the building blocks linked together to give an evolvable informational biopolymer must have structural regularity, their linkages also have structural regularity. Those linkages will therefore fragment with regularity, giving a limited number of regular fragments, with the chance of fragmentation at a given covalent bond inversely related to that bond's strength. Thus, we do not need to assume a particular linkage structure, or find an enzyme specific for it, to regularly fragment a Martian biopolymer whose linkages are unknown.
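One way to make this extrapolation concrete, offered as an illustrative heuristic rather than the author's procedure, is a histogram of peak-to-peak mass differences. Because every peak is an integer combination of unit masses, pairs of fragments that differ by exactly one block reproduce that block's mass again and again. The sketch assumes, in line with the "similar size and shape" requirement, that all unit masses fall in a narrow range, so that a window (here 150 to 500 Da, an assumed value) excludes differences between two blocks and sums of two blocks. The peak list is fabricated from hypothetical masses, and `min_count` is an assumed threshold.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement

# Hypothetical masses (Da) used only to fabricate a test spectrum.
TRUE_UNITS = {"N": 301.1, "O": 317.2, "P": 288.4, "Q": 342.7, "R": 275.9, "S": 329.3}


def synthetic_peaks(unit_masses, max_len=3):
    """Masses of every fragment containing 1..max_len blocks (end groups ignored)."""
    peaks = []
    for length in range(1, max_len + 1):
        for combo in combinations_with_replacement(unit_masses, length):
            peaks.append(sum(combo))
    return peaks


def candidate_unit_masses(peaks, window=(150.0, 500.0), min_count=20, decimals=1):
    """Propose block masses from peak-to-peak differences that recur often.

    Assumes blocks of similar size: differences between two blocks fall
    below the window, and sums of two blocks fall above it.
    """
    lo, hi = window
    counts = Counter()
    for lighter, heavier in combinations(sorted(peaks), 2):
        diff = heavier - lighter
        if lo <= diff <= hi:
            counts[round(diff, decimals)] += 1
    return sorted(mass for mass, n in counts.items() if n >= min_count)


peaks = synthetic_peaks(TRUE_UNITS.values())
print(len(peaks), "peaks ->", candidate_unit_masses(peaks))
```

With fragments of one to three blocks, each true unit mass recurs 27 times in this sketch (once for each one- or two-block fragment it can extend), so it clears the threshold. Any spurious candidate that also passes could then be rejected by checking whether every peak decomposes into non-negative integer combinations of the proposed masses.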
In many cases of “alien DNA” provided by synthetic biology, this approach will also work. For example, Piet Herdewijn, Nicholas Hud, and others have assembled alternative genetic biopolymers with a negative charge not held by the linking group per se, but rather by a functional group that is attached to the linking atom.[iv] These too can be fragmented chemically.
But what if the Martian “DNA” does not fragment in acid or base? While such linkages are not well represented experimentally in what synthetic biology has delivered, they are conceivable. For example, the building blocks may be joined by secondary amines or by a C-O-NH-C linkage. When chemists get access to Martian material, they will likely also try reducing or oxidizing degradation conditions if acidic and basic conditions fail. Reducing conditions, for example, will fragment C-O-NH-C linkages.
Alternatively, the ionizing energy of the mass spectrometer can itself do the fragmentation. Essentially all macromolecules fragment upon “harsh” ionization; the fragmentation mechanisms depend on the detailed chemistry of the macromolecule and its units.
For example, RNA and DNA both fragment upon ionization, but they fragment differently. RNA produces large amounts of c- and y-ions, which arise from cleavage of the 5′ P-O bond.[v] DNA, in contrast, generates predominantly (a-B)- and w-ions, which arise from cleavage of the 3′ C-O bond.
Andersen et al. (2006) pursued this as a structural analysis. They found that fragmentation of the tetranucleotide UCUA generates the CUA species, with a single peak at m/z = 892.25. If both the parent and the cleaved product are observed, the loss of mass corresponds to the mass of a single building block. Again, even in mixtures, and even if only some of the fragment masses are identified, the combinatorial puzzle can be worked through, because the observed species are integer combinations of the building-block masses.
There is little doubt that even with advanced instrumentation, some “playing around” will be necessary. Classical MS simply involves injecting the sample, a process that leads to ready fragmentation. Matrix-assisted laser desorption ionization (MALDI) is gentler. However, different matrices are optimal for different biopolymers,[vi] and we do not know a priori which matrix is best for Martian “DNA”.
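The parent-minus-product logic can be sketched as a scan over all peak pairs in a mixture, flagging any mass difference that matches a single building block. The unit masses and the tolerance are hypothetical placeholders, not values from Andersen et al., and the function name `single_unit_losses` is illustrative.

```python
from itertools import combinations

# Hypothetical masses (Da) for six building blocks N ... S; placeholders only.
UNITS = {"N": 301.1, "O": 317.2, "P": 288.4, "Q": 342.7, "R": 275.9, "S": 329.3}


def single_unit_losses(peaks, units=UNITS, tol=0.02):
    """Return (heavier, lighter, block) for every pair of peaks whose mass
    difference matches exactly one building block within `tol`."""
    losses = []
    for lighter, heavier in combinations(sorted(peaks), 2):
        diff = heavier - lighter
        for name, mass in units.items():
            if abs(diff - mass) <= tol:
                losses.append((heavier, lighter, name))
    return losses


# A parent ion NOQ, its product OQ, and an unrelated peak.
peaks = [UNITS["N"] + UNITS["O"] + UNITS["Q"], UNITS["O"] + UNITS["Q"], 1000.0]
for heavy, light, block in single_unit_losses(peaks):
    print(f"{heavy:.1f} -> {light:.1f}: lost {block}")
```

Because every pair is checked, the same scan works whether or not the parent and product came from the same strand, which is the property that matters for heterogeneous Martian samples.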
But for all of these, concentration is the first step, even though modern instruments can perform the analysis on micromoles of Martian “DNA”. The Polyelectrolyte Theory of the Gene is a powerful concept because it guides not only the design of instruments to collect these molecules, but also the strategies to analyze them.
[i] https://www.liebertpub.com/doi/full/10.1089/ast.2016.1611
[ii] Olby, R. (1974). The Path to the Double Helix: The Discovery of DNA. New York: Dover.
[iii] Bahr, U., Aygün, H., & Karas, M. (2009). Sequencing of single and double stranded RNA oligonucleotides by acid hydrolysis and MALDI mass spectrometry. Analytical Chemistry, 81(8), 3173-3179.
[iv] Bean, H. D., Anet, F. A., Gould, I. R., & Hud, N. V. (2006). Glyoxylate as a backbone linkage for a prebiotic ancestor of RNA. Origins of Life and Evolution of Biospheres, 36, 39-63.
[v] Andersen, T. E., Kirpekar, F., & Haselmann, K. F. (2006). RNA fragmentation in MALDI mass spectrometry studied by H/D-exchange: mechanisms of general applicability to nucleic acids. Journal of the American Society for Mass Spectrometry, 17(10), 1353-1368.
[vi] (a) Schulz, E., Karas, M., Rosu, F., & Gabelica, V. (2006). Influence of the matrix on analyte fragmentation in atmospheric pressure MALDI. Journal of the American Society for Mass Spectrometry, 17, 1005-1013. (b) Cohen, S. L., & Chait, B. T. (1996). Influence of matrix solution conditions on the MALDI-MS analysis of peptides and proteins. Analytical Chemistry, 68(1), 31-37.