Does SARS-CoV-2 reverse transcribe and integrate into our genome?
The short version: A preprint has emerged claiming that there is evidence that SARS-CoV-2 is reverse transcribed and integrated into the human genome. None of the evidence it provides justifies such a conclusion, and it demonstrates a failure to understand fundamental aspects of coronavirus biology and, frankly, the limitations of the methods used to reach that conclusion. Furthermore, there even appears to be an attempt by the preprint authors to make their data more difficult to scrutinize: it is available only upon request and is not included in the paper. Its findings, even if true (something I have significant doubts about), have no relevance for mRNA vaccines.
A preprint has recently surfaced and been seized upon as proof that SARS-CoV-2 is being reverse transcribed into our genome, and somehow the argument has been extended to claim that vaccines against SARS-CoV-2, mRNA vaccines in particular, are unsafe. If you want a short read on the problems, Marius Walter, a postdoc in the Verdin lab, summarized the implausibility of the paper’s claims fairly comprehensively here.
The crux of the argument per the preprint is based mainly on a few observations:
Some patients have persistently positive PCRs, suggesting that viral genome sequences are being retained.
The presence of chimeric sequences (sequences with both human and viral components) on RNA-seq (more on what this means shortly, I promise).
The claim that SARS-CoV-2 is being reverse transcribed into our genome is an extraordinary claim, and in science we have a saying: extraordinary claims require extraordinary evidence. Let me be very explicit here: by no stretch of the imagination does this paper provide any convincing evidence to support this idea, let alone something close to the Sagan standard.
First, though, let’s discuss why this claim is extraordinary:
SARS-CoV-2 is not a retrovirus. It has no means of producing a DNA transcript nor integrating it into a host’s genome. The argument made in this preprint is not that SARS-CoV-2 itself is doing the reverse transcription and integration, however.
Endogenous human reverse transcription is extremely rare, is limited to a few genetic entities, and occurs in a sequence-specific manner. It is extremely implausible that host reverse transcriptases could pick up random cytosolic RNA and simply place it into the genome, as discussed in more detail here.
Coronaviruses kill the cells they infect; it would be very unusual for a virus that is lethal to the cells it infects to integrate, and furthermore, any cell with an ongoing coronavirus infection would eventually die.
Firstly, the leap from “persistently positive PCR” to “reverse transcription and integration” is absolutely not justified. The idea that the only way RNA viruses can cause persistent infection is by integrating into the host genome is false; in fact, integration is not even the only strategy that HIV, the virus probably best known for reverse transcription, uses to establish persistent infection. The persistence of a virus inside a host requires that the host’s immune system be unable to clear it. Some examples of how that may be accomplished other than by reverse transcription:
Immunological suppression via multiple mechanisms:
Direct or indirect killing of cells of the immune system, e.g. HIV can induce expression of the Fas ligand protein on infected cells, which can kill activated T cells that come into contact with it by signaling through the Fas receptor on the T cell surface.
Suppressing function of antibodies e.g. herpes simplex virus type 1 (HSV-1) contains proteins that bind antibodies and prevent them from activating the complement system.
Suppression of production of antiviral cytokines e.g. Epstein-Barr virus (EBV) interferes with TLR signaling to suppress the activation of the NFκB pathway that would go on to induce multiple antiviral cytokines.
Maintenance of a viral reservoir within an immunologically privileged site e.g. the central nervous system, the testicles, etc. as seems to be done by Ebola.
Latency wherein the viral genome exists as a separate DNA structure within the nucleus where it remains quiescent with reactivation under certain cues. This is done by herpesviruses.
I should point out that reverse transcription into the genome of any errant cell would never, on its own, be sufficient to result in a persistent infection, because as long as the cell kept producing viral transcripts, the immune system would destroy it (unless the host had certain immunological defects).
Secondly, although deep profiling of the genome does appear to identify RNA viral sequences that are not from retroviruses, this is still an extraordinarily rare event. That rarity is subject to the limitation that the virus in question would have to be able to infect germ cells; notably, though, it has never been observed for coronaviruses, despite the many other RNA viral genomes that have been found. So in short, the leap to “SARS-CoV-2 is routinely being reverse transcribed and integrating into our genome” on this foundation is a truly extraordinary claim, and the persistently positive PCRs are explainable by simpler mechanisms. To be clear, the assumption that someone with persistently positive PCRs has actual infectious virus in them is not necessarily correct, which, to their credit, the authors do acknowledge. The nature of the replication of SARS-CoV-2 and other coronaviruses (something I will return to shortly, as it explains the next issue) means that viral RNA can persist for a prolonged time within double-membraned vesicles that may not be readily accessible to the immune system or to nucleases within the infected cell. Alternatively, replication could be occurring within the cell at such a low level that it is not even lytic, so these individuals are not infectious; this seems to account for at least some of the persistent positives. (Persistent positives in someone with significant immunocompromise should be treated with caution, however, as they would be expected to have difficulty clearing virus and thus the pre-test probability of a true positive PCR is high.) However, if YOU (the person reading this) have persistently positive PCR results, you should not assume that they are artifacts of the method.
The other point has to do with the presence of chimeric sequences in RNA-seq analysis. RNA-seq is a method to analyze which genes are “on” within a particular cell by profiling the RNA within it. There are many variations, but in general it starts by taking primers that bind the RNA transcripts and then running a reverse transcription reaction to copy them into DNA. Here’s the key point: the reverse transcription reaction often undergoes template switching. That means the reverse transcriptase starts copying one RNA, pauses, and then wanders onto another RNA. The result is a single DNA molecule that carries a piece of the sequence of one RNA followed by a piece of the sequence of the other. After this is done, bioinformatics analysis is used to align the sequences to the genome and see which genes were “on.” In other words: THIS IS LITERALLY A PROCEDURE WHICH GENERATES CHIMERIC TRANSCRIPTS. If a cell is infected with SARS-CoV-2, some of those transcripts will have pieces of SARS-CoV-2’s genome on them, which will result in… SARS-CoV-2/human (or whatever type of cell it is) chimeric sequences. If I decided to infect the cell with any other virus, I would expect to get chimeric transcripts of human/my favorite virus. To state it more bluntly, the findings of this paper are explainable entirely as artifacts of the method.
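To make the template-switching point concrete, here is a minimal toy simulation in Python. It is a sketch under stated assumptions, not a model of any real RNA-seq protocol: the sequences are made up, the switching probability is an arbitrary number, and the function names (`copy_with_template_switching`, `chimeric_fraction`) are hypothetical. The only thing it shows is that a copying enzyme that occasionally hops between templates produces human/viral chimeras without anything ever being integrated into a genome.

```python
import random

# Made-up toy sequences; these are NOT real human or SARS-CoV-2 sequences.
TEMPLATES = {
    "human_mRNA": "ACGT" * 15,
    "viral_RNA": "TTGCA" * 12,
}


def copy_with_template_switching(templates, start, switch_prob, rng, max_switches=3):
    """Copy one template base by base, occasionally jumping to another RNA.

    Returns the copied sequence and the ordered list of templates used.
    For readability the copy is not complemented, unlike a real cDNA.
    """
    current, pos = start, 0
    product, sources = [], [start]
    while pos < len(templates[current]):
        product.append(templates[current][pos])
        pos += 1
        can_switch = len(sources) - 1 < max_switches
        if can_switch and rng.random() < switch_prob:
            # The enzyme pauses and resumes at a random spot on a different RNA.
            current = rng.choice([name for name in templates if name != current])
            pos = rng.randrange(len(templates[current]))
            sources.append(current)
    return "".join(product), sources


def chimeric_fraction(templates, n_reads, switch_prob, seed=0):
    """Fraction of simulated copies that contain sequence from more than one RNA."""
    rng = random.Random(seed)
    names = list(templates)
    chimeric = 0
    for _ in range(n_reads):
        _, sources = copy_with_template_switching(
            templates, rng.choice(names), switch_prob, rng
        )
        if len(set(sources)) > 1:
            chimeric += 1
    return chimeric / n_reads


if __name__ == "__main__":
    # switch_prob=0.02 per base is an arbitrary illustrative value.
    print(chimeric_fraction(TEMPLATES, n_reads=10_000, switch_prob=0.02))
```

Nothing in this toy ever touches a genome, yet whenever the switching probability is above zero some of the products come out as chimeras. Setting `switch_prob=0.0` makes them disappear entirely, which is the sense in which the chimeras are a property of the copying step rather than of the cell.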
There is also a secondary explanation having to do with the biology of the coronaviruses themselves. Coronaviruses recombine and mutate very well because their RNA-dependent RNA polymerase (the machinery which replicates their genome) is also prone to template switching. In other words, a coronavirus RNA polymerase could start with copying a coronavirus gene, pause, then wander and pick up a host RNA and then copy over that to make a chimeric transcript.
Whether the chimeras are artifacts would actually be very easy to check. Our genes undergo splicing to remove sequences called introns so that only exons remain in the mature RNA. If this were an artifact, we would expect that essentially all of the chimeric sequences would contain only exons (i.e. no introns) from the cell in question. So let’s do that. Except…
So basically, if you want to scrutinize the data, you have to ask the author for the findings. Why this would not be included in the supplementary data is a complete mystery to me.
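For what it’s worth, the exon/intron check described above is simple enough to sketch. The Python below is a hedged illustration only: the coordinates are invented, the gene model is a toy, each read’s human portion is simplified to one contiguous aligned block, and the function names (`exonic_fraction`, `classify_human_part`, `summarize`) are hypothetical. A real analysis would work from actual alignments and gene annotations, which is exactly the data that is not available here.

```python
from collections import Counter

# Invented toy gene model: half-open (start, end) exon intervals on one chromosome.
EXONS = [(100, 200), (300, 400)]

# Invented aligned blocks for the human part of three hypothetical chimeric reads.
READ_BLOCKS = [(120, 180), (310, 390), (180, 220)]


def exonic_fraction(block, exons):
    """Fraction of a half-open (start, end) block covered by non-overlapping exons."""
    start, end = block
    covered = sum(
        max(0, min(end, exon_end) - max(start, exon_start))
        for exon_start, exon_end in exons
    )
    return covered / (end - start)


def classify_human_part(block, exons):
    """Label the human part of a chimeric read by whether it is purely exonic."""
    if exonic_fraction(block, exons) == 1.0:
        return "exon_only"
    return "has_non_exonic"


def summarize(blocks, exons):
    """Count how many reads fall into each class."""
    return Counter(classify_human_part(block, exons) for block in blocks)


if __name__ == "__main__":
    print(summarize(READ_BLOCKS, EXONS))
```

If essentially every chimeric read landed in the `exon_only` bin, that would be what you expect from template switching onto mature mRNA. A substantial share of reads with intronic or intergenic human sequence would be the more interesting result, and it is the one the preprint would need to show.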
I would also add that the paper never examines the cells for evidence of a complete SARS-CoV-2 genome, and thus even if we are to take its findings as being truthful and rigorous (which there is strong reason to suspect they are not), the fragments of SARS-CoV-2’s genome are not sufficient for pathogenesis or persistent infection.
This paper does not substantiate the claim that SARS-CoV-2 is being reverse transcribed and integrating into the genome, and seems to be totally unaware of what an extraordinary claim that is. The experiments it does are not a good representation of what may be going on inside a real human. People aren’t cell lines; cell lines have complex, multifaceted genomic differences from our cells (that’s how they get to be immortal). Overexpressing LINE-1 and then observing more reverse transcription does not tell us what happens at normal expression levels. LINE-1 RT levels inside the cell are low, and despite the abundance of LINE elements in the genome, only about 60 of them are active (consider that LINEs account for ~21% of the human genome and there are an estimated 860,000 such elements within it). I could conceive of a very rare event within the cell in which the LINE-1 machinery grabs the wrong RNA, traffics back into the nucleus, and integrates it; that’s sort of how LINE-1 elements work. But again, they are sequence-specific, and given the short lifetime of RNAs within the cell, the probability that any specific non-LINE RNA could be picked up by mistake is infinitesimal. On top of that, coronavirus replication occurs in replication transcription complexes (RTCs) that are segregated from the rest of the cytoplasm. I find it very hard to believe that a LINE-1 RT could access these RNA sequences and reverse transcribe them.
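To put the LINE numbers above in proportion, here is the arithmetic as a few lines of Python. It uses only the two figures quoted in the text (about 60 active elements, an estimated 860,000 LINEs); the variable names are mine, and the estimates themselves are approximate.

```python
# Figures quoted in the text above; both are rough estimates.
active_lines = 60
total_lines = 860_000

fraction_active = active_lines / total_lines
one_in_n = total_lines / active_lines

if __name__ == "__main__":
    print(f"Active fraction: {fraction_active:.5%}")
    print(f"Roughly 1 in {one_in_n:,.0f} LINEs is active")
```

That works out to roughly 0.007% of LINEs, or about 1 in 14,000, which is why the sheer abundance of LINE sequence in the genome says very little about how much reverse transcriptase activity a normal cell actually has.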
On the point of what this means for an mRNA vaccine: literally nothing. This paper has absolutely nothing to do with them. If you’re wondering what would happen if the RNA from a vaccine were accidentally picked up by this proposed mechanism and integrated into the host cell’s genome, one of the following scenarios would play out:
The sequence would behave like a processed pseudogene, lacking any ability to recruit host transcription machinery and would sit in the genome, quiescent.
If the sequence somehow inserted downstream of a promoter sequence that could recruit transcription machinery, the cell would express spike protein, be recognized by the immune system, and then be killed.
If the sequence inserted itself into the middle of a gene (specifically in the middle of an exon), you would get a mutant protein containing sequences from SARS-CoV-2, which would be processed by antigen-presenting machinery and trigger a T cell response that killed the cell.
I hope that gives you some appreciation for how incredibly hard successful gene therapy is.
This preprint draws conclusions that are not supported by its data. Its findings are most readily explained by artifacts of the methods used, and it doesn’t consider key aspects of coronavirus biology that would also explain the results. I am unconvinced, and even if the findings were true, I would have no concerns about what they mean for an mRNA vaccine.
References
Alexandersen S, Chamings A, Bhatta TR. 2020. SARS-CoV-2 genomic and subgenomic RNAs in diagnostic samples are not an indicator of active replication. Nat Commun. 11(1):6059.
Boldogh I, Albrecht T, Porter DD. 2011. Persistent Viral Infections. In: Baron S, editor. Medical Microbiology. Galveston (TX): University of Texas Medical Branch at Galveston.
Cohen JI. 2020. Herpesvirus latency. J Clin Invest. 130(7):3361–3369.
Feschotte C, Gilbert C. 2012. Endogenous viruses: insights into viral evolution and impact on host biology. Nat Rev Genet. 13(4):283–296.
Flint SJ, Enquist LW, Racaniello VR, Rall GF, Skalka AM. 2015. Principles of Virology: 2 Vol set - Bundle. 4th ed. Washington (DC): American Society for Microbiology.
Hilleman MR. 2004. Strategies and mechanisms for host and pathogen survival in acute and persistent viral infections. Proc Natl Acad Sci U S A. 101 Suppl 2(Supplement 2):14560–14566.
Hwang B, Lee JH, Bang D. 2018. Single-cell RNA sequencing technologies and bioinformatics pipelines. Exp Mol Med. 50(8):96.
Lodish H, Berk A, Kaiser C, Krieger M, Bretscher A, Ploegh H, Amon A, Martin K. 2016. Molecular Cell Biology. 8th ed. New York: W.H. Freeman.
Lubinski JM, Jiang M, Hook L, Chang Y, Sarver C, Mastellos D, Lambris JD, Cohen GH, Eisenberg RJ, Friedman HM. 2002. Herpes simplex virus type 1 evades the effects of antibody and complement in vivo. J Virol. 76(18):9232–9241.
Perlman S, Dandekar AA. 2005. Immunopathogenesis of coronavirus infections: implications for SARS. Nat Rev Immunol. 5(12):917–927.
The Sagan standard: Extraordinary claims require extraordinary evidence. Effectiviology.com. [accessed 2020 Dec 16]. https://effectiviology.com/sagan-standard-extraordinary-claims-require-extraordinary-evidence/.
V’kovski P, Kratzel A, Steiner S, Stalder H, Thiel V. 2020. Coronavirus biology and replication: implications for SARS-CoV-2. Nat Rev Microbiol. doi:10.1038/s41579-020-00468-6. http://dx.doi.org/10.1038/s41579-020-00468-6.
Wiedemann A, Foucat E, Hocini H, Lefebvre C, Hejblum BP, Durand M, Krüger M, Keita AK, Ayouba A, Mély S, et al. 2020. Long-lasting severe immune dysfunction in Ebola virus disease survivors. Nat Commun. 11(1):3730.
Younesi V, Nikzamir H, Yousefi M, Khoshnoodi J, Arjmand M, Rabbani H, Shokri F. 2010. Epstein Barr virus inhibits the stimulatory effect of TLR7/8 and TLR9 agonists but not CD40 ligand in human B lymphocytes: Inhibition of TLR stimulation by EBV. Microbiol Immunol. 54(9):534–541.