A simplified biochemical model
The salient feature of natural selection that makes it an interesting challenge for modelling (versus artificial selection) is that, in place of human-mediated (or user-mediated) selective processes, you need some mechanism that makes the situation competitive between genes, so that their ability to survive and to propagate themselves can vary based on the differences between them.
The Black Smoker’s model is kept as simple as I could make it, given these requirements, and consists of the following elements:
- Environmental chemical substrates
- Genes allowing the extraction of energy from the substrates
- Organisms containing the genes
Environmental substrates
The chemical substrates are represented by 64-bit bitstrings. Each type of chemical substrate has an associated per-unit available energy, which the system keeps consistent, since it has to generate new substrates according to a simplified thermodynamics, described below (see note 1).
Genes
Genes are also represented by 64-bit bitstrings. A gene allows the containing organism to metabolize an available substrate if the first eight bits of the gene and of the substrate are complementary (a one where the other has a zero, or vice versa). Beyond that, the efficiency of the metabolism depends on how many of the remaining bits of the two strings are complementary.
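The matching rule above can be sketched in a few lines of Python. This is only an illustration, not the simulation's own code: it assumes that the ‘first eight bits’ are the most significant bits of a 64-bit integer, and the names `can_metabolize` and `efficiency` are hypothetical.

```python
MASK64 = (1 << 64) - 1
PREFIX_BITS = 8                  # the "first eight bits" that must match
BODY_BITS = 64 - PREFIX_BITS     # the remaining 56 bits set the efficiency


def complementary_bits(gene: int, substrate: int) -> int:
    """Return a 64-bit word with a 1 wherever gene and substrate differ."""
    return (gene ^ substrate) & MASK64


def can_metabolize(gene: int, substrate: int) -> bool:
    """True if the first eight bits (assumed: the top byte) are all complementary."""
    return complementary_bits(gene, substrate) >> BODY_BITS == 0xFF


def efficiency(gene: int, substrate: int) -> float:
    """Fraction of the remaining 56 bits that are complementary.

    Returns 0.0 when the eight-bit prefix does not match at all.
    """
    if not can_metabolize(gene, substrate):
        return 0.0
    body = complementary_bits(gene, substrate) & ((1 << BODY_BITS) - 1)
    return bin(body).count("1") / BODY_BITS
```

Using XOR means a set bit marks a position where one string has a one and the other a zero, which is exactly the complementarity the text describes.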
Metabolic rules
The biochemical rules are as follows. The percentage of bits in the two strings (gene and substrate) that are complementary determines the energy of the product. Each equivalent of substrate metabolized by the organism containing the gene produces one equivalent of product and one of byproduct. The per-unit energies of byproduct and product always add up to 80 per cent of what was available in the initial substrate, so enthalpy steadily leaks out of the system as reactions proceed (see note 2).
The system randomly generates a new byproduct when it encounters certain new combinations of gene and substrate (see note 3). After this, the reaction is remembered, and the same byproduct is always produced from this gene and substrate.
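As a rough worked example of the energy bookkeeping, the sketch below splits the retained 80 per cent between product and byproduct. The text only says that complementarity ‘determines’ the product energy; the linear split used here is an assumption, and `split_energy` is a hypothetical name.

```python
from typing import Tuple

CONSERVED_FRACTION = 0.8  # from the text: product + byproduct = 80% of the substrate


def split_energy(substrate_energy: float, complementarity: float) -> Tuple[float, float]:
    """Return (product_energy, byproduct_energy) per unit of substrate.

    Assumption: the product's share of the retained energy scales linearly
    with the fraction of complementary bits (0.0 to 1.0).
    """
    if not 0.0 <= complementarity <= 1.0:
        raise ValueError("complementarity must lie between 0 and 1")
    retained = CONSERVED_FRACTION * substrate_energy
    product = retained * complementarity
    return product, retained - product
```

Under this assumption, a substrate worth 100 units metabolized at 75 per cent complementarity yields about 60 units to the organism and about 20 units in the byproduct, with the remaining 20 lost from the system.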
The byproduct produced is dumped to the environment, where it becomes available for other genes to metabolize. The product energy is added to the organism’s energy.
Reproduction
Once the organism has acquired enough energy through metabolism, it ‘fissions’, producing one offspring, which takes half of the organism’s available energy with it into its new life.
Death
In each metabolic ‘round’ a fixed ‘cost of living’ energy is removed from each organism. If the energy falls below a threshold, the organism dies, and its remaining energy is returned to the environment, in the form of its ‘coat protein’—a compound with its signature derived from the content of the organism’s fourth or ‘regulatory’ gene.
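The reproduction and death rules can be sketched as a single end-of-round step. The numeric thresholds, the order of the checks, and the names `Organism` and `end_of_round` are all assumptions; the text gives no concrete values, and in the simulation itself the fission threshold comes from the regulatory gene.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

COST_OF_LIVING = 1.0      # hypothetical value, removed every round
DEATH_THRESHOLD = 1.0     # hypothetical value
FISSION_THRESHOLD = 10.0  # hypothetical value


@dataclass
class Organism:
    energy: float


def end_of_round(org: Organism) -> Tuple[bool, Optional[Organism], float]:
    """Apply the cost of living, then death or fission.

    Returns (alive, offspring, energy_released_to_environment).
    """
    org.energy -= COST_OF_LIVING
    if org.energy < DEATH_THRESHOLD:
        # The remaining energy goes back to the environment
        # (as the 'coat protein' compound, not modelled here).
        return False, None, max(org.energy, 0.0)
    if org.energy >= FISSION_THRESHOLD:
        offspring = Organism(energy=org.energy / 2)  # offspring takes half
        org.energy -= offspring.energy
        return True, offspring, 0.0
    return True, None, 0.0
```

For instance, under these made-up constants an organism holding 21 units pays 1, then splits into a parent and an offspring with 10 units each.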
Variation
Each organism has four genes. During each fission event, there is a low probability of mutations in each gene—either a simple ‘point’ mutation in which a bit is reversed (1 to 0, or 0 to 1), or a more severe ‘translation’ mutation, in which the bits are rotated an arbitrary distance.
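Both mutation types are simple bit operations. The sketch below assumes 64-bit genes stored as Python integers; the probabilities are placeholders, since the text says only that mutations are rare, and the function names are hypothetical.

```python
import random

MASK64 = (1 << 64) - 1


def point_mutation(gene: int, rng: random.Random) -> int:
    """Flip one randomly chosen bit (1 to 0, or 0 to 1)."""
    return gene ^ (1 << rng.randrange(64))


def rotation_mutation(gene: int, rng: random.Random) -> int:
    """Rotate the 64 bits left by an arbitrary non-zero distance."""
    shift = rng.randrange(1, 64)
    return ((gene << shift) | (gene >> (64 - shift))) & MASK64


def maybe_mutate(gene: int, rng: random.Random,
                 p_point: float = 0.01, p_rotate: float = 0.001) -> int:
    """Apply at most one mutation to a gene at fission (placeholder rates)."""
    roll = rng.random()
    if roll < p_point:
        return point_mutation(gene, rng)
    if roll < p_point + p_rotate:
        return rotation_mutation(gene, rng)
    return gene
```

A rotation keeps the number of set bits but moves all of them, which is why it is the more severe of the two: it can change the eight-bit prefix and most of the remaining bits at once.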
Regulatory gene
While three of the four ‘genes’ in the organism are treated as genes controlling metabolic pathways extracting energy from available compounds, the fourth gene is treated as a regulatory gene. The amount of energy the organism will cache before reproducing, the proportion of its energy the organism passes to its progeny at reproduction, the buoyancy of the organism, and the organism’s coat signature, all critical factors in the simulation, are derived from the sequence of this gene.
Input flux
The primary power source for the ecosystem is a wavering ‘input chemical flux’ (see note 4): a set of five substrates generated randomly at the beginning of the simulation, and replenished in varying amounts each round. The quantity of input flux varies on a number of cycles, some comparatively long, some comparatively short. Note that these compounds are treated biochemically like all other compounds with respect to metabolism by organisms and to product/byproduct generation. Unlike all other compounds in the environment, however, they (1) are produced from outside, not as a byproduct or product of metabolism, and (2) do not accumulate as the product of metabolic reactions (as the biogenic compounds do) but instead are regularly replenished from without.
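One way to picture a flux that ‘varies on a number of cycles’ is as a base level modulated by a few sine waves of different periods. The text does not give the actual function, so everything in this sketch (the sinusoidal form, the periods, the amplitudes, and the name `input_flux`) is an assumption for illustration only.

```python
import math
from typing import Sequence, Tuple


def input_flux(round_number: int, base: float = 100.0,
               cycles: Sequence[Tuple[float, float]] = ((500.0, 0.5), (37.0, 0.2))) -> float:
    """Quantity of one input substrate replenished in a given round.

    `cycles` holds hypothetical (period_in_rounds, relative_amplitude) pairs:
    one comparatively long cycle and one comparatively short one.
    """
    level = 1.0
    for period, amplitude in cycles:
        level += amplitude * math.sin(2 * math.pi * round_number / period)
    return max(0.0, base * level)  # never replenish a negative amount
```

With these placeholder values the replenishment swings between roughly 30 and 170 units per round, so organisms face both lean and rich periods.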
Input flux programmability
In the applet version of the simulation, users may now also set up multiple smokers, specify which inputs are available at which of the smokers, and specify independently for each input at each smoker how its productivity varies over time and whether it eventually extinguishes during the course of the simulation.
Seeding
At the beginning of the simulation, after generating the input flux compound types, the system autogenerates random genotypes until it hits upon one that can metabolize one of the input flux compounds (see note 5). It then generates an organism with this genotype and lets it loose under the simulation rules described above.
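The seeding step amounts to rejection sampling: keep drawing random genotypes until one qualifies. The sketch below assumes the eight-bit prefix rule from the Genes section, treats the first three genes as metabolic and the fourth as regulatory, and uses hypothetical function names.

```python
import random
from typing import List, Sequence

MASK64 = (1 << 64) - 1


def can_metabolize(gene: int, substrate: int) -> bool:
    """True if the first eight bits (assumed: the top byte) are complementary."""
    return ((gene ^ substrate) & MASK64) >> 56 == 0xFF


def seed_genotype(inputs: Sequence[int], rng: random.Random,
                  genes_per_organism: int = 4, metabolic_genes: int = 3) -> List[int]:
    """Draw random genotypes until a metabolic gene matches an input compound."""
    while True:
        genotype = [rng.getrandbits(64) for _ in range(genes_per_organism)]
        if any(can_metabolize(gene, substrate)
               for gene in genotype[:metabolic_genes]
               for substrate in inputs):
            return genotype
```

With five input compounds and three metabolic genes, each random genotype has roughly a 6 per cent chance of qualifying under these assumptions, so the loop normally ends after a few dozen draws at most.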
Notes
1 I bothered with the simplified thermodynamics only as far as I felt was really necessary. It is not so much meant to be an approximation of the real-world situation as a consistent set of rules of engagement by which the genes can compete. The system does bother with thermodynamics far enough to make sure that enthalpy poured into the system steadily ‘leaks’ out of reactions as organisms make use of it, because this adds another interesting element to the simulation: the generation of a (sort of) ordered ecosystem from previous chaos, even though overall entropy definitely increases as the enthalpy-rich compounds pour through the system and as the organisms degrade the available enthalpy in the process of their metabolism. See also the detailed description of the biochemical rules.
2 This, of course, is an approximation. In the catabolic biochemical reactions that provide living organisms with energy in the real world, there are usually several byproducts, and the energy absorbed from the reaction (the change in enthalpy from the sum of enthalpy in the precursors to the sum of enthalpy in the products) is usually stored as relative potential chemical energy in an ‘energy currency compound’ such as ATP. However, storing the energy useful to the organism, embodying a percentage of the input enthalpy, as abstract ‘energy’ in the organism, and dumping a fraction of the rest into a single byproduct unique to the reaction, is simpler and is still a meaningful approximation from the point of view of creating a competitive environment. It also approximates two ecologically significant principles: (1) that after death, the accumulated biochemicals in the organism are potentially energetically useful to other organisms (saprobes or predators; in this simulation, the stored energy does become available to other organisms when the containing organism dies, in the form of the organism’s ‘coat protein’), and (2) that simple biochemical reactions do not usually reap all the energy available in a compound in a single step, thus opening the door to more elaborate systems that make use of the byproduct, within the organism or outside it (scavenging and multistep catabolic pathways). Doing it this way allows the simulation to run without a supercomputer, and still lets us get to relatively elaborate multilevel ecologies and multistep metabolic pathways.
3 In detail: byproduct ‘signatures’ (the 64-bit string representing the byproduct) are produced using a pseudo-random function seeded with the first 16 bits of the gene and the middle 32 bits of the substrate. The result is that about three-quarters of mutations to the gene do not produce a new byproduct, but rather (usually very slightly) affect the rate at which the same byproduct is produced. Another one-eighth of mutations disable the ability of the gene to operate on the substrate entirely, while the final one-eighth do result in a new byproduct being produced. The intent is to balance, to some degree, the ecological stability of the simulation. This way, not every advantageous mutation to the gene (those likely to be selected for, since they increase the metabolic efficiency of the organism) must radically affect the ecology for genes operating on the byproduct the first gene is producing, though some will.
4 This is mathematically identical to the ‘solar flux’ that drove the Clockmaker simulation; it’s renamed here because in the environment of a black smoker, the driving energy really does come from chemical compounds, not from solar flux. Note that if you catch them in a pedantic mood, ecologists will tell you they actually consider the ecology of modern black smokers to be based at least indirectly on the solar flux, since the oxidizing potential used by most organisms in the chemolithoautotrophic reactions ultimately comes from oxygen provided by photosynthetic plants. I’m not going to split hairs over this. But no, should this have occurred to anyone, the fact that the modern smokers’ ecologies depend on oxygen does not mean black smokers can’t be the seat of life. Various theoreticians, notably Russell, have pointed out that there are other ways to get oxidizers, and Russell postulates photo-oxidized FeIII as the vital source in the Hadean oceans. See http://www.gla.ac.uk/projects/originoflife and the Black smoker page’s recommended reading list for more information.
5 In the applet version, you can control which input flux this seed organism depends on, and at which smoker it begins its history, or leave it random.
