niman wrote:
Since the recombination in Beijing is so obvious and clear cut, it is worth going through the data and associated concepts in detail, so even the casual reader can understand. Beijing released 5 closely related sequences, and even though 2 of the 5 were only partials, they had the expected markers in the published portion, so it is clear that all five isolates represent an evolved version of the S188T-subclade, which has 11 HA changes.
Four of those changes define the sub-clade. They were in sequences first seen in the late spring and throughout the summer (in India, Thailand, Ghana, Australia, New Zealand, South Africa). Three of these four changes led to changes in the protein (S188T, E377K, and S454N) and all four changes are in virtually all sequences with S188T. Thus, these were "early" changes. The other 7 changes seen in Beijing followed, and most can be found in other sequences with S188T, with the best matches in recent sequences from Asia or Australia.
However, one of the Beijing sequences had four clustered changes, which included the loss of 2 of the 4 "early" markers. The two "new" markers were close to the "missing" old markers, creating a strong signal for recombination involving a relatively short segment of the gene (the gene has 1701 positions, and the changes ran from position 897 to 1215). Recombination was supported by an earlier Beijing sequence (from late 2009), which had 3 of the 4 changes (at positions 897, 1056, and 1171). The 4th change (at position 1215) was rare, but was found in two earlier sequences which also had the wild type sequence at positions 1056 and 1171, so they also had 3 of the 4 changes (at positions 1056, 1171, and 1215). It is therefore likely that these earlier isolates recombined to produce one sequence with all four changes, which then recombined with the November, 2010 isolate in Beijing to produce the recombinant. Thus, explaining four clustered changes involving sequences already present in the H1N1 database is very straightforward, since 3 of the 4 were already circulating in a known Beijing H1N1 sequence.
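The bookkeeping in the paragraph above can be laid out as a rough sketch. Only the positions and the gene length come from the post; the variable names are made up for illustration, and each set simply lists the positions where a sequence matches the recombinant's state within the cluster.

```python
# Positions in the 897-1215 cluster, as described in the post.
# Names are illustrative; no actual sequence data is used here.
GENE_LENGTH = 1701

recombinant = {897, 1056, 1171, 1215}   # all four clustered changes
beijing_2009 = {897, 1056, 1171}        # earlier Beijing sequence (late 2009)
earlier_pair = {1056, 1171, 1215}       # two earlier sequences with the rare 1215 change


def shared(a, b):
    """Return the sorted positions where two sequences agree with each other."""
    return sorted(a & b)


# Length of the clustered region and its share of the gene.
span = max(recombinant) - min(recombinant) + 1
span_fraction = span / GENE_LENGTH

# Together, the two earlier sources account for every change in the cluster.
covered = beijing_2009 | earlier_pair

print("cluster span:", span, "positions")
print("fraction of gene:", round(span_fraction, 3))
print("shared with Beijing 2009:", shared(recombinant, beijing_2009))
print("shared with earlier pair:", shared(recombinant, earlier_pair))
print("all four covered:", covered == recombinant)
```

Each earlier source supplies 3 of the 4 changes, and between them they supply all four, which is the point being made about sequences already in the database.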
In contrast, trying to explain these four clustered changes by "random mutation" is very difficult and lacks credibility.
There is a long list of problems with the "random mutation" explanation. Apart from this discussion, the influenza changes are far from random. There are MANY random changes that would not change the protein, yet are never found in the pH1N1 database. Similarly, there are multiple ways to make a given protein change, which are also never found in the database. This was easily seen when someone took the protein sequence of multiple Mill Hill submissions at GISAID and generated a nucleotide sequence using the genetic code. The nucleotide sequence would code for the proper protein sequence, but the sequence was not "natural". It didn't match any known sequence, and even when reduced to single nucleotide changes, many of the changes were not found in any H1N1 sequence. Thus, only certain changes are "allowed", which clearly demonstrates that "random mutations" are a fantasy.
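To make the "multiple ways to code for the same protein" point concrete, here is a small sketch using the standard genetic code. It counts how many different nucleotide sequences would encode a given stretch of protein. The four-residue peptide "STKN" is a made-up example (it just reuses amino acids named in the changes above) and is not taken from any real sequence.

```python
from math import prod

# Standard genetic code, built from the usual TCAG-ordered 64-letter string.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons_for(amino_acid):
    """Return every codon that encodes the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def count_backtranslations(peptide):
    """Count the distinct nucleotide sequences that encode the peptide."""
    return prod(len(codons_for(aa)) for aa in peptide)


# Hypothetical peptide, for illustration only.
print(codons_for("T"))
print(count_backtranslations("STKN"))
```

Even a four-residue stretch has dozens of possible encodings, so a nucleotide sequence generated straight from a protein sequence can be valid code for that protein and still match nothing in the database.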
However, over and above the fact that only certain changes are found, use of "random mutations" to explain away the four consecutive changes in the Beijing recombinant fails at multiple levels.
One general problem relates to synonymous changes. These changes are "silent" in that they do not produce a change in the protein sequence. Since changes are rare (1 in 1000 or 1 in 10,000), the vast majority of sequences would not have a given change, so its emergence as the dominant change requires heavy selection (since the change is competing with the wild type sequence), but synonymous changes don't offer an obvious selection advantage.
Thus, changing C1056T back to 1056C is a serious challenge, made more difficult by the fact that it would happen after the emerging sequence was formed. Thus, a random mutation would have to target the precise position (1056) and change the T back into a C.
However, the sequences would then require a second independent event at position 1171 to change the G back to an A. Once again the "random mutation" would have to be precise (only involve position 1171) and coincidentally target the adjacent polymorphisms (so it would have to be precise AND clustered).
The two random mutations needed to generate the reversions (a return to the wild type sequence) would then have to be followed by two more "mistakes" at adjacent positions (897 which is adjacent to 1056, and 1215 which is adjacent to 1171). Thus, four consecutive changes would be required.
Generating these four consecutive changes over a very short time frame is POSSIBLE, but EXTREMELY improbable, and far more difficult than recombination to acquire changes that have been described previously, with 3 of 4 located in an earlier Beijing sequence.
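For a sense of scale, here is a back-of-the-envelope sketch. It treats the "1 in 1000 or 1 in 10,000" figures quoted earlier in the post as the chance of each specific change and assumes the four changes are independent. Both are simplifying assumptions for illustration; the sketch ignores selection and the number of replication events, so it is not an estimate of the real probability.

```python
from fractions import Fraction


def prob_all_specific(per_change_rate, n_changes):
    """Chance that n independent, specific changes all occur (toy model)."""
    return per_change_rate ** n_changes


# Rates quoted in the post; 4 is the number of clustered changes.
for rate in (Fraction(1, 1000), Fraction(1, 10000)):
    p = prob_all_specific(rate, 4)
    print(f"per-change rate {rate}: all four = {p}")
```

Under these assumptions the combined chance falls to 1 in 10^12 or 1 in 10^16, which is the kind of gap being pointed to when comparing four independent events with acquiring changes that are already circulating.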