Exploring Roles of Human APOBEC-mediated RNA Editing Activity
By
Kyu Min Kim
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree of
DOCTOR OF PHILOSOPHY
(MOLECULAR BIOLOGY)
August 2023
ACKNOWLEDGEMENTS
First and foremost, I would like to extend my deepest gratitude to my esteemed PhD advisor,
Dr. Xiaojiang S. Chen, for his exceptional mentorship and unwavering support throughout my
PhD research journey. His profound knowledge and guidance have been invaluable, enabling me
to conduct thorough and methodical research. Dr. Chen has consistently demonstrated trust in my
abilities, fostering an environment that encourages independent thinking and exploration of novel
ideas. Whenever I encountered obstacles, he readily offered his assistance, consistently providing
the guidance I needed. His mentorship has played an instrumental role in my growth as an
independent scientist. I cannot overstate the significance of his encouragement and support in my
achievements.
I would also like to express my heartfelt appreciation to the members of my dissertation
committee, Dr. Matt Michael, Dr. Carolyn Phillips, and Dr. Peter Calabrese, for their valuable time
and insightful suggestions, which have greatly contributed to the quality of my research.
I am immensely grateful to all the past and present members of our lab who have supported
and accompanied me on this scientific journey. Special thanks to Dr. Aaron Wolfe and Dr. Fumiaki
Ito for their mentorship during my early days in the lab. I am deeply indebted to Dr. Hanjing Yang
for her boundless assistance and intellectual contributions to my research. I would also like to
express my gratitude to our lab members, Shanshan Wang, Josue Pacheco, Benjamin Fixman,
William Fried, and Ziyuan Li, for their camaraderie and collaboration. It is a joy to have all of you
as members of the Chen lab, and I treasure the countless memorable moments we have shared
together.
Lastly, I would like to extend my heartfelt thanks to my family and my wife, Dr. Yejin
Kim, for their unconditional love and unwavering support. Their constant encouragement and
belief in me have been a driving force, motivating me to overcome every obstacle and challenge I
have encountered.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
Chapter I: Specific Editing of the RNA Genome of SARS-CoV-2 by Host APOBEC Enzymes
Chapter I: Introduction
Chapter I: Results
Chapter I: Discussion
Chapter I: Materials and Methods
Chapter II: Decoding APOBEC3’s RNA Editing Enzyme-Substrate Dynamics
Chapter II: Introduction
Chapter II: Results
Chapter II: Discussion
Chapter II: Materials and Methods
REFERENCES
LIST OF TABLES
Table 1. Total C-to-U RNA editing data in the 7 SARS-CoV-2 regions
Table 2. Validation of the predicted A3A-mediated sites
LIST OF FIGURES
Figure 1.1. Schematic representation of the structure of 11 members of the APOBEC family.
Figure 1.2. Experimental design of APOBEC-mediated editing of SARS-CoV-2 RNA.
Figure 1.3. The C to U RNA editing rates by APOBECs detected on the selected SARS-CoV-2 segments in our cell-based assay system.
Figure 1.4. Local sequence context at the APOBEC-edited C sites on SARS-CoV-2 RNA.
Figure 1.5. Single nucleotide variations (SNPs) of the SARS-CoV-2 genome sequences database derived from patients.
Figure 1.6. Overall features of the RNA around the most preferred APOBEC-edited sites on SARS-CoV-2.
Figure 1.7. Verification of C-to-U mutation as a result of direct RNA editing on the transcript of a SARS-CoV-2 reporter segment.
Figure 1.8. The potential effect of APOBEC-mediated editing on SARS-CoV-2 mutations and fitness.
Figure 1.9. Single nucleotide variations of SARS-CoV-2 in representative clades and characterization of C-to-U mutations in the Omicron variant (21M).
Figure 1.10. SARS-CoV-2 replication and virion production in cells expressing APOBECs.
Figure 1.11. SARS-CoV-2 replication and progeny production in different Caco-2 cell lines.
Figure 1.12. Verification of C-to-U mutation caused by WT A3A-induced editing of SARS-CoV-2 virus in the 5’UTR region in Caco-2 cell culture infection assay.
Figure 1.13. Relations between SARS-CoV-2 infection and APOBEC expression.
Figure 1.14. A predicted secondary structure for 5’UTR region of SARS-CoV-2 and its functional motifs.
Figure 2.1. A sensitive cell-based fluorescence analysis and sequencing method confirms A3A-mediated specific RNA editing.
Figure 2.2. Analysis of A3A-mediated RNA editing substrates.
Figure 2.3. Comparison of A3A editing events on the endogenous SDHB mRNA transcript and its genomic DNA (gDNA).
Figure 2.4. C-to-U RNA Editing on EVI2B RNA substrate by all APOBEC family members.
Figure 2.5. A3A-mediated RNA editing primarily occurs in the cytoplasm.
Figure 2.6. Specific RNA editing activity of A3B chimeras containing short A3A N-terminal peptide sequences.
Figure 2.7. Specific RNA editing activity by A3B chimera constructs with mutations to convert A3B-CD2 domain to A3A-like sequences.
Figure 2.8. Lack of RNA editing activity by catalytically inactive A3A-E72A and A3A chimera replacing its loop1 region from that of A3B-CD2
Figure 2.9. Structural and sequence features for optimal A3A-mediated RNA editing events.
Figure 2.10. Prediction and validation of novel RNA editing sites on cellular mRNA.
Figure 2.11. Characteristic analysis of potential editing sites predicted by A3A-mediated RNA editing site.
ABSTRACT
The APOBEC3 family of human cytidine deaminases functions in a range of cellular
activities, including the innate and acquired immune system, by inducing C-to-U mutations in
single-stranded DNA and/or RNA. This dissertation investigates the roles of the APOBEC RNA
editors, specifically APOBEC1, APOBEC3A, and APOBEC3G, as catalytic enzymes involved in
RNA editing. Although their activity as RNA editors has recently been established, the
precise mechanisms and specific targets of their editing activity remain poorly understood. My
research aims to elucidate how these APOBEC RNA editors selectively target and edit human
endogenous RNAs and/or viral RNAs within human cells. The dissertation is divided into
two parts, each addressing a distinct aspect of APOBEC RNA editing:
- Chapter I: Specific Editing of the RNA Genome of SARS-CoV-2 by Host APOBEC
Enzymes
- Chapter II: Unraveling the Enzyme-Substrate Properties of a Specific APOBEC3A-mediated
RNA Editing
The findings from this dissertation will contribute to a better understanding of the functions and
regulatory mechanisms of APOBEC RNA editors, paving the way for potential therapeutic
applications and targeted interventions in RNA editing processes.
Chapter I: Specific Editing of the RNA Genome of SARS-CoV-2 by Host APOBEC
Enzymes
Adapted from a publication: The Roles of APOBEC-mediated RNA Editing in SARS-CoV-2
Mutations, Replication and Fitness. Kyumin Kim, Peter Calabrese, Shanshan Wang, Chao Qin,
Youliang Rao, Pinghui Feng & Xiaojiang S. Chen (2022) Scientific Reports
Chapter I: Introduction
Since severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes
COVID-19, emerged at the end of 2019, it has caused a pandemic threatening public safety
and human health around the world [1]. Among the numerous scientific efforts to tackle this highly
infectious disease, determining the genome sequence of SARS-CoV-2 was treated as
a top priority [2]. SARS-CoV-2 is an enveloped RNA virus (order Nidovirales, family
Coronaviridae) with a single-stranded positive-sense RNA genome [2, 3]. In general,
RNA viruses exhibit high mutation rates [4, 5], but coronaviruses, including SARS-CoV-2, have
moderate genetic variability compared to other RNA viruses because errors made by the
RNA-dependent RNA polymerase (RdRP) can be corrected by proofreading machinery, such as the RNA
exonuclease ExoN [6-8]. Nevertheless, the enormous body of sequence data collected to date reveals a
persistent accumulation of new mutations [9], highlighting the importance of understanding the evolution of
the SARS-CoV-2 genome. There are three main sources of SARS-CoV-2 mutations [10, 11]: (1)
spontaneous random errors during replication, (2) viral replication proofreading errors, defined
as a defective repair mechanism, and (3) host-driven viral genome editing [12]. Recent
SARS-CoV-2 genome data show mutational patterns with specific directionality rather than random
genetic variation, so evolutionary pressure from the host is understood to be a key driver of
SARS-CoV-2 genomic mutations [12-17].
It has been suggested that host-mediated direct mutation of SARS-CoV-2 occurs in the course
of the immune response [11]. Reactive oxygen species (ROS) [18] and two classes of human RNA
deaminases, the adenosine deaminases acting on RNA (ADAR) [19] and the apolipoprotein B (ApoB)
mRNA editing enzyme, catalytic polypeptide-like (APOBEC) proteins [20], have the potential to
cause viral mutations as part of their antiviral activity and/or as a by-product of this process.
ROS can oxidize nucleic acids and cause viral mutation and inactivation [18, 21, 22]. A
recent hypothesis proposes that the G-to-U and C-to-A variations observed
in the SARS-CoV-2 genome are related to the mutagenic activity of ROS [17]. The ADAR enzymes
convert adenosine to inosine (A-to-I, read as A-to-G) in double-stranded RNA and can have
antiviral or pro-viral effects [23-25]. It has been argued that ADAR could generate A-to-G
mutations (and U-to-C on the negative strand) in SARS-CoV-2, because some COVID-19-related
RNA-seq data showed A-to-G variations on both viral and known human target transcripts, although
the editing levels were low (< 1%) [14, 26]. The APOBEC proteins deaminate cytosine to uracil
(C-to-U) in single-stranded DNA and/or RNA [27] and function in innate and adaptive immune
responses by generating C-to-U edits in pathogens [28-30]. From the initial analyses of
SARS-CoV-2 genomic variation to recent reports on its evolutionary traits, the C-to-U transition
has been the most prominent mutation type (about 40% of all single nucleotide variations),
so this pattern has been interpreted as a result of RNA editing by APOBECs
rather than random mutation [14, 15, 17, 31-36]. However, despite the importance of
APOBEC-mediated SARS-CoV-2 genome editing, direct evidence for it remains scarce.
Figure 1.1. Schematic representation of the structure of 11 members of the APOBEC family.
APOBEC proteins contain 1 or 2 conserved cytidine deaminase motifs. The amino acid sizes of each
APOBEC are indicated on the right.
The human APOBEC family comprises 11 members: APOBEC1,
APOBEC2, seven APOBEC3s (A3A, A3B, A3C, A3D, A3F, A3G, and A3H), APOBEC4, and
AID. Each has a different biological role, and together they serve a variety of physiological
functions, including immune response, through C-to-U editing of their target substrates
(Figure 1.1) [37, 38]. The first C-to-U RNA editing enzyme was discovered when researchers
found a truncated version of the APOB protein that results from catalytic deamination (C-to-U editing)
of ApoB mRNA by a novel enzyme, which thereby regulates its role in lipid metabolism [39-41]. Based on
this function, the enzyme was termed APOBEC1 (A1), or “ApoB
mRNA editing enzyme, catalytic polypeptide 1”, and the rest of the APOBEC family members
were named after this founding member [42]. Unlike A1, the other APOBEC members were
long considered to have only single-stranded DNA editing activity, with no role in RNA
editing [43]. However, with the help of breakthroughs in next-generation RNA-seq technology, it
has recently been found that two APOBEC3 proteins (A3A [44, 45] and A3G [46]) can also
edit RNA. To understand the prominent C-to-U mutational signature, it is therefore
important to study whether the host APOBEC RNA editors (A1, A3A, and A3G) are capable of
generating mutations in the SARS-CoV-2 genome.
This part of the dissertation investigates whether APOBEC proteins directly
generate C-to-U mutations in the SARS-CoV-2 genome sequence, and how such mutations may
affect viral replication, infectivity, future therapeutics, and vaccine development. Among the three
human APOBEC RNA editors, we identified the RNA motifs favored by A1 and by A3A,
whereas A3G contributed relatively little to SARS-CoV-2 mutation. The continued
mutation and evolution of SARS-CoV-2 threatens to negate the gains made so far,
because emerging strains may become more virulent [47] or evade current vaccines [48]. The
mutational sequence features of the APOBEC RNA editors will be an important reference for
predicting the evolutionary direction of SARS-CoV-2 as well as for understanding the genetic
variability of other RNA viruses.
Chapter I: Results
Design of APOBEC-mediated SARS-CoV-2 mutations
Figure 1.2. Experimental design of APOBEC-mediated editing of SARS-CoV-2 RNA.
(A) Diagram of the SARS-CoV-2 genomic RNA, showing the positions (boxes) of the seven RNA
segments (1-7) selected for studying RNA editing by APOBECs. (B) Reporter vector (top)
containing each of the seven selected viral RNA segments. (C) Three APOBEC editor vectors
(top; A1-2A-A1CF, A3A, and A3G) and a Western blot showing their expression in 293T cells
(bottom). (D) Strategy of the Safe-Sequencing-System (SSS) to minimize errors from PCR
amplification and sequencing.
Based on the analysis that C-to-U variation accounts for the largest share (about 40%)
of SARS-CoV-2 RNA genome mutations, many recent reports have proposed
that this pattern is generated by host C-to-U editing enzymes, the human APOBEC
proteins [14, 15, 17, 31-36]. However, direct evidence for APOBEC-mediated SARS-CoV-2
mutation is scarce, because it is difficult to measure the RNA editing rate of
each APOBEC protein accurately, possibly owing to the high technical error rate of sequencing
($10^{-2}$ to $10^{-3}$ per nucleotide sequenced with Illumina technology).
Our lab recently developed a sensitive cell-based RNA editing assay to measure C-to-U RNA
editing by A1 and its cofactors, and successfully validated the RNA editing activity [49]. In this
study, we used the cell-based RNA editing system to test RNA editing in specific regions of
SARS-CoV-2 by three APOBEC proteins, A1+A1CF, A3A, and A3G, because these three are
reported to possess RNA editing activity [39, 45, 46]. Because of the read-length limits
of Illumina sequencing, we selected seven 200-nt regions of SARS-CoV-2,
chosen because (1) they have a relatively high cytosine content and (2) they are
distributed across various viral genes (Figure 1.2A). The selected fragments (200 nt per fragment)
of the virus genome were cloned into a reporter cassette, and the corresponding mRNA was
co-expressed with APOBEC proteins in HEK293T cells (Figure 1.2B-C). The reporter constructs
carry an AAV intron in the middle of eGFP, so that a primer spanning the exon-exon junction
amplifies only mature mRNA sequences, which rules out DNA
contamination during amplification of the reverse-transcribed cDNA (Figure 1.2C). In
addition, to minimize the high technical sequencing errors, we analyzed sequence variations
with the Safe-Sequencing-System (SSS), a targeted next-generation deep-sequencing method,
using slightly adapted protocols (Figure 1.2D) [50, 51]. Briefly, (1) AccuScript high-fidelity
reverse transcriptase (reported error rate of ~$10^{-4}$ to $10^{-5}$) was used to minimize errors that
may occur during cDNA synthesis. (2) The first-round PCR was run for only 2 cycles with a
primer containing a Unique IDentifier (UID), a string of randomized sequence, so that each
original target molecule is tagged with one of a large family of UID barcodes (~$4^{15}$) [50, 51]. (3)
These 2 cycles were followed by the SSS library preparation step, which amplifies each target
SARS-CoV-2 region with Illumina adaptors (PCR error rate of ~$10^{-7}$). (4-5) Errors from paired-end
Illumina sequencing (PE150, error rate of ~$10^{-2}$ to $10^{-3}$) were minimized by discarding
mutations that are rare within the same UID family.
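The UID-based error suppression outlined in steps (2) to (5) can be sketched in Python as follows. This is only a minimal illustration and not the analysis pipeline used in this study: the function name `consensus_by_uid`, the minimum family size, the 95% agreement threshold, and the toy reads are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Toy (UID, read) pairs invented for illustration; real reads are ~150 nt.
TOY_READS = [
    ("AAAC", "ACGT"),
    ("AAAC", "ACGT"),
    ("AAAC", "ATGT"),  # one read disagrees at the second base
    ("GGTA", "ACGT"),  # singleton family
]

def consensus_by_uid(reads, min_family_size=2, min_agreement=0.95):
    """Collapse reads that share a UID into one consensus sequence.

    A base is kept only if at least `min_agreement` of the family agrees;
    otherwise the position is masked with 'N'. Families smaller than
    `min_family_size` are dropped. Both thresholds are assumed values.
    """
    families = defaultdict(list)
    for uid, seq in reads:
        families[uid].append(seq)

    consensus = {}
    for uid, seqs in families.items():
        if len(seqs) < min_family_size:
            continue
        bases = []
        for column in zip(*seqs):  # walk the family position by position
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(column) >= min_agreement else "N")
        consensus[uid] = "".join(bases)
    return consensus

if __name__ == "__main__":
    print(consensus_by_uid(TOY_READS))
```

Because a true edit in the original RNA molecule is inherited by every read in its UID family, whereas a sequencing error typically appears in only a few reads, masking minority bases removes most technical errors. For reference, $4^{15}$ is 1,073,741,824, so the UID space is far larger than the number of families analyzed per sample.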
The sequence contexts near target cytosine by APOBECs in SARS-CoV-2 RNA sequences
All types of single nucleotide editing by APOBEC proteins in the 7 selected SARS-CoV-2
regions were analyzed with the SSS system. The average number of UID families was about
130,000 (minimum 85,000, maximum 187,000) among ~484 million (paired) reads. C-to-U
editing by APOBEC proteins is clearly observed, especially by A1+A1CF and A3A, whereas the other
variation types do not differ significantly from those of the control group, so this chapter
focuses on C-to-U editing activity (Figure 1.3).
Figure 1.3. The C to U RNA editing rates by APOBECs detected on the selected SARS-CoV-2
segments in our cell-based assay system.
Figure 1.4. Local sequence context at the APOBEC-edited C sites on SARS-CoV-2 RNA.
(A) Local sequences around the significantly edited target C sites (± 5 nucleotides from the target C at
position 0) for A1+A1CF, A3A, or A3G. The editing level of each C site was normalized to the Ctrl,
and only sites with editing levels at least 3x the normalized value were defined as significant
editing sites. (B) Analysis of local sequences around the top 30% edited C sites (hotspot editing
sites), showing a predominant AC motif for A1+A1CF, UC for A3A, and CC for A3G (the target C is
the second base of each motif). (C-D) Comparison of the C-to-U editing rates (%) of different
dinucleotide motifs by a particular APOBEC (panel C) and the C-to-U editing rates (%) of a
particular dinucleotide motif by the three APOBECs (panel D). Each dot represents the C-to-U
editing level obtained from the SSS results. In panel D, statistical significance was calculated
by unpaired two-tailed Student’s t-test, with P-values represented as: not significant
(P > 0.05), not indicated; * = P < 0.05; *** = P < 0.001; **** = P < 0.0001.
The total number of cytosine sites in the 7 SARS-CoV-2 regions is 307 (~22%; 307 of
1,400 nt), and the C-to-U editing levels for each APOBEC were normalized to the control group (Table 1).
We define a cytosine as a significant target site when its editing efficiency is at least
3 times higher than that of the control; the number of significant sites is 135 for A1+A1CF, 67
for A3A, and 11 for A3G (Table 1 and Figure 1.4A). We identified the sequence contexts of the
significant sites (± 5 nucleotides from the target cytosine). Consistent with previous
reports, A1+A1CF prefers A or U and A3A prefers U at the -1 position [45, 52]; however,
the sequence context preferred by A3G could not be evaluated reliably because the sample size is too
small (n=11) (Figure 1.4A). We also examined the sequence features of the top 30% of sites with the
highest editing rates to identify the hotspot RNA editing sites of each APOBEC. Interestingly,
A1+A1CF clearly favors the AC motif (the target C is the second base), whereas A3A
recognizes the UC motif in the hotspot editing sites, suggesting that, among the variation patterns
of the SARS-CoV-2 genome, AC-to-AU mutations are probably
generated by A1+A1CF (or other cofactors, such as RBM47 [53]) and UC-to-UU mutations may be due to
A3A (Figure 1.4B). Given the importance
of the -1 position relative to the target cytosine, we examined how the RNA editing efficiency for each
condition differed according to the base at the -1 position (NC motifs, where C is the target and N is A, U, G, or C)
(Figure 1.4C-D). The control group shows low C-to-U RNA editing levels, with an average of
0.049%, but unexpectedly the GC-to-GU pattern shows a slightly higher value of about
0.13%, which may be caused by other unknown sources, such as errors from library preparation
and/or certain intracellular factors (Figure 1.4C). The AC-to-AU mutation pattern is observed mainly
with A1+A1CF, not with A3A or A3G, and the UC-to-UU pattern is significantly enriched
with A3A (Figure 1.4D).
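The significance criterion and the -1 position analysis described above can be illustrated with a short Python sketch. It assumes that "at least 3 times higher than that of the control" means a simple ratio of the editing level to the Ctrl level at the same site; the function names are hypothetical, the three rows are copied from Table 1, and the toy RNA string is invented for illustration.

```python
# C-to-U editing levels for three sites copied from Table 1
# (values assumed to be percentages, as in the text).
TABLE_ROWS = {
    186: {"A1+A1CF": 0.2813, "A3A": 0.2558, "A3G": 0.3361, "Ctrl": 0.2708},
    203: {"A1+A1CF": 0.0318, "A3A": 2.0735, "A3G": 0.0428, "Ctrl": 0.0156},
    16054: {"A1+A1CF": 22.1537, "A3A": 0.0091, "A3G": 0.0065, "Ctrl": 0.0085},
}

def significant_editors(row, fold=3.0):
    """Return the APOBEC conditions whose level is >= fold x the control level."""
    ctrl = row["Ctrl"]
    return [
        name
        for name, level in row.items()
        if name != "Ctrl" and level >= fold * ctrl
    ]

def dinucleotide_at(rna, index):
    """Return the NC motif ending at a 0-based target C, or None if not applicable."""
    if index < 1 or rna[index] != "C":
        return None
    return rna[index - 1] + "C"

if __name__ == "__main__":
    for position, row in TABLE_ROWS.items():
        print(position, significant_editors(row))
    toy_rna = "GAUCACC"  # invented sequence, not taken from SARS-CoV-2
    print([dinucleotide_at(toy_rna, i) for i in range(len(toy_rna))])
```

Under these assumptions, position 203 is called significant only for A3A and position 16054 only for A1+A1CF, while position 186 is not significant for any editor because all conditions, including the control, show similar levels.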
Table 1. Total C-to-U RNA editing data in the 7 SARS-CoV-2 regions
Position A1+A1CF A3A A3G Ctrl Gene Codon Events Entropy
147 0.2027 0.0171 0.0069 0.0090 5’UTR NA 7 0.015
151 0.0574 0.0968 0.0555 0.0549 5’UTR NA 0 0
157 0.1878 0.1249 0.1801 0.1432 5’UTR NA 0 0
162 0.0291 0.0363 0.0196 0.0229 5’UTR NA 2 0.007
164 0.1843 0.1561 0.1929 0.1440 5’UTR NA 0 0
171 0.0159 0.0131 0.0139 0.0147 5’UTR NA 1 0.003
173 0.0185 0.0081 0.0162 0.0123 5’UTR NA 1 0.003
176 0.0062 0.0141 0.0104 0.0139 5’UTR NA 1 0.003
180 0.0115 0.0101 0.0058 0.0033 5’UTR NA 3 0.007
183 0.0071 0.0050 0.0104 0.0074 5’UTR NA 0 0
186 0.2813 0.2558 0.3361 0.2708 5’UTR NA 5 0.015
190 0.0247 0.0201 0.0185 0.0147 5’UTR NA 0 0
193 0.0617 0.0675 0.0878 0.0499 5’UTR NA 3 0.007
197 0.1695 0.1502 0.2093 0.1737 5’UTR NA 0 0
203 0.0318 2.0735 0.0428 0.0156 5’UTR NA 16 0.046
206 0.1423 0.1372 0.1377 0.1369 5’UTR NA 1 0.003
207 0.1032 0.0907 0.1295 0.1056 5’UTR NA 1 0.003
214 0.1518 0.1391 0.1746 0.1401 5’UTR NA 3 0.007
217 0.3993 0.3724 0.4919 0.4131 5’UTR NA 3 0.007
218 0.0256 0.0273 0.0278 0.0304 5’UTR NA 1 0.003
222 0.0159 0.0888 0.0058 0.0115 5’UTR NA 9 0.035
225 0.0565 0.1078 0.0474 0.0221 5’UTR NA 1 0.003
228 0.0962 0.0847 0.1272 0.0745 5’UTR NA 0 0
230 0.0053 0.0181 0.0069 0.0115 5’UTR NA 2 0.005
233 0.0062 0.0302 0.0092 0.0098 5’UTR NA 2 0.007
241 0.0344 0.9392 0.0785 0.0474 5’UTR NA 23 0.167
244 0.1207 0.1047 0.1397 0.1169 5’UTR NA 0 0
245 0.1357 0.1198 0.1477 0.1243 5’UTR NA 2 0.005
254 0.1098 0.1344 0.1495 0.1149 5’UTR NA 1 0.003
255 0.0318 0.0343 0.0532 0.0352 5’UTR NA 0 0
274 0.2699 0.2408 0.3189 0.2431 Orf1a/Nsp1 S3S 1 0.003
275 0.0133 0.0131 0.0104 0.0098 Orf1a/Nsp1 L4F 2 0.015
280 0.0578 0.0670 0.0699 0.0569 Orf1a/Nsp1 V5V 1 0.007
281 0.0327 0.0454 0.0382 0.0328 Orf1a/Nsp1 P6S 1 0.003
282 0.0318 0.0212 0.0173 0.0205 Orf1a/Nsp1 P6L 2 0.005
289 0.0379 0.0847 0.0428 0.0270 Orf1a/Nsp1 F8F 2 0.005
292 0.2593 0.0725 0.1063 0.0712 Orf1a/Nsp1 N9K 1 0.003
300 0.2505 0.0171 0.0231 0.0123 Orf1a/Nsp1 T12I 3 0.007
302 2.4161 0.0947 0.1052 0.0917 Orf1a/Nsp1 H13Y 0 0
304 0.5229 0.2452 0.3674 0.2371 Orf1a/Nsp1 H13H 0 0
307 0.0265 0.0131 0.0162 0.0180 Orf1a/Nsp1 V14V 1 0.003
308 0.0247 0.0111 0.0312 0.0090 Orf1a/Nsp1 Q15X 0 0
311 0.1614 0.0141 0.0127 0.0074 Orf1a/Nsp1 L16F 1 0.003
313 0.0476 0.0594 0.0116 0.0147 Orf1a/Nsp1 L16L 7 0.044
320 0.2549 0.2318 0.2750 0.2284 Orf1a/Nsp1 P19S 0 0
321 0.0952 0.0846 0.1028 0.0720 Orf1a/Nsp1 P19L 0 0
329 0.1377 0.0868 0.1273 0.0950 Orf1a/Nsp1 Q22X 0 0
335 0.1121 0.1140 0.0555 0.0459 Orf1a/Nsp1 R24C 10 0.029
337 0.3095 0.1995 0.1975 0.1645 Orf1a/Nsp1 R24R 11 0.022
340 0.7341 0.0295 0.0396 0.0280 Orf1a/Nsp1 D25D 0 0
15981 0.1236 0.4844 0.0315 0.0249 Orf1b/Nsp12 I738I 3 0.009
15989 3.6475 0.0281 0.0113 0.0136 Orf1b/Nsp12 T741I 0 0
15998 3.9858 0.0215 0.0097 0.0142 Orf1b/Nsp12 T744I 0 0
16000 4.0590 0.0104 0.0210 0.0136 Orf1b/Nsp12 L745K 1 0.003
16012 3.6960 0.2768 0.3189 0.2838 Orf1b/Nsp12 R749W 0 0
16017 0.1156 0.2704 0.0428 0.0289 Orf1b/Nsp12 F750F 0 0
16022 0.1780 0.0092 0.0129 0.0062 Orf1b/Nsp12 S752F 0 0
16028 0.2114 0.0731 0.0404 0.0295 Orf1b/Nsp12 A754V 0 0
16037 0.2768 0.0346 0.0339 0.0357 Orf1b/Nsp12 A757V 0 0
16041 0.5021 0.0594 0.0557 0.0601 Orf1b/Nsp12 Y758Y 0 0
16042 0.0451 0.0399 0.0583 0.0494 Orf1b/Nsp12 P759S 0 0
16043 0.0129 0.0144 0.0170 0.0250 Orf1b/Nsp12 P759L 0 0
16045 1.3719 0.0157 0.0048 0.0085 Orf1b/Nsp12 L760F 0 0
16049 4.8525 0.0065 0.0081 0.0096 Orf1b/Nsp12 T761I 1 0.005
16054 22.1537 0.0091 0.0065 0.0085 Orf1b/Nsp12 H763Y 1 0.003
16057 1.1555 0.1087 0.0591 0.0432 Orf1b/Nsp12 P764S 2 0.005
16058 0.0156 0.0059 0.0081 0.0108 Orf1b/Nsp12 P764L 0 0
16063 0.0508 4.4625 0.0298 0.0277 Orf1b/Nsp12 Q766X 0 0
16073 0.4858 0.2361 0.2670 0.2399 Orf1b/Nsp12 A769V 0 0
16080 0.5388 0.0563 0.0292 0.0108 Orf1b/Nsp12 V771V 0 0
16084 0.1353 0.1089 0.0291 0.0085 Orf1b/Nsp12 H773Y 1 0.003
16092 6.5739 0.0163 0.0186 0.0136 Orf1b/Nsp12 Y775Y 3 0.007
16096 1.5882 0.0255 0.0194 0.0045 Orf1b/Nsp12 Q777X 0 0
16101 1.6996 0.0059 0.0065 0.0108 Orf1b/Nsp12 Y778Y 0 0
16111 1.7044 0.0268 0.0234 0.0193 Orf1b/Nsp12 L782L 11 0.022
16114 3.8418 0.0316 0.0122 0.0177 Orf1b/Nsp12 H783Y 0 0
16127 0.2108 0.1341 0.1449 0.1408 Orf1b/Nsp12 T787I 0 0
16132 0.1303 0.0190 0.0194 0.0187 Orf1b/Nsp12 H789Y 0 0
16134 0.0568 0.0105 0.0210 0.0125 Orf1b/Nsp12 H789H 0 0
16143 0.1109 0.0085 0.0057 0.0091 Orf1b/Nsp12 D792D 0 0
16151 0.0342 0.0313 0.0226 0.0300 Orf1b/Nsp12 S795F 0 0
16159 0.2514 0.0581 0.0710 0.0617 Orf1b/Nsp12 L798F 0 0
16163 1.2682 0.0111 0.0048 0.0045 Orf1b/Nsp12 T799I 0 0
22982 0.0537 0.2140 0.0533 0.0266 Spike Q474X 0 0
22986 0.1657 0.1812 0.2161 0.1807 Spike A475V 0 0
22987 0.0421 0.0474 0.0247 0.0482 Spike A475A 3 0.013
22993 0.1541 0.0960 0.1250 0.1008 Spike S477S 6 0.051
22995 0.0699 0.0192 0.0216 0.0152 Spike T478I 17 0.623
22997 0.1623 0.0179 0.0255 0.0254 Spike P479S 3 0.007
22998 0.0122 0.0147 0.0317 0.0101 Spike P479L 1 0.003
23029 1.5898 0.0250 0.0046 0.0070 Spike Y489Y 5 0.027
23033 0.1759 0.0275 0.0262 0.0146 Spike P491S 0 0
23034 0.0122 0.0096 0.0147 0.0095 Spike P491L 0 0
23039 0.8731 0.0141 0.0116 0.0108 Spike Q493X 0 0
23043 0.0333 0.1334 0.0039 0.0070 Spike S494L 1 0.003
23053 0.0347 0.0391 0.0294 0.0324 Spike F497F 0 0
23054 0.0416 0.0527 0.0372 0.0426 Spike Q498X 0 0
23057 0.0727 0.0686 0.0935 0.0743 Spike P499S 0 0
23058 0.0088 0.0109 0.0062 0.0070 Spike P499L 1 0.005
23059 0.0068 0.0103 0.0077 0.0184 Spike P499P 4 0.01
23061 0.2527 0.0083 0.0039 0.0120 Spike T500I 0 0
23077 0.1004 0.0915 0.1142 0.0798 Spike Y505Y 1 0.003
23078 0.0299 0.0224 0.0355 0.0190 Spike Q506X 0 0
23081 0.9562 0.0160 0.0224 0.0196 Spike P507S 0 0
23082 0.0041 0.0160 0.0116 0.0139 Spike P507L 0 0
23086 0.5402 0.0179 0.0162 0.0152 Spike Y508Y 4 0.009
23099 1.9079 0.0109 0.0225 0.0057 Spike L513F 0 0
23103 0.3555 0.1113 0.0078 0.0070 Spike S514F 2 0.005
23111 4.2084 0.0231 0.0217 0.0134 Spike L517F 0 0
23114 0.1403 0.0456 0.0093 0.0051 Spike L518L 5 0.011
23117 1.4603 0.0340 0.0216 0.0247 Spike H519Y 0 0
23121 0.6013 0.0419 0.0590 0.0459 Spike A520V 0 0
23123 1.2683 0.0289 0.0464 0.0330 Spike P521S 1 0.003
23124 0.0224 0.0480 0.0378 0.0279 Spike P521L 0 0
23127 0.2528 0.0469 0.0349 0.0159 Spike A522V 4 0.009
23130 1.5428 0.0167 0.0255 0.0158 Spike T523I 0 0
23141 0.8901 0.0077 0.0139 0.0140 Spike P527S 0 0
23142 0.0088 0.0212 0.0062 0.0108 Spike P527L 0 0
23151 0.5557 0.0122 0.0054 0.0057 Spike S530F 0 0
23154 0.1760 0.0141 0.0209 0.0140 Spike T531I 0 0
23170 13.5581 0.0499 0.0131 0.0152 Spike N536N 2 0.005
23179 0.2490 0.1036 0.0132 0.0166 Spike V539V 0 0
23304 0.0180 0.0226 0.0201 0.0122 Spike T581I 0 0
23306 0.1296 0.0124 0.0158 0.0129 Spike L582F 2 0.005
23315 0.0277 0.0233 0.0147 0.0243 Spike L585F 1 0.003
23320 0.0822 0.0287 0.0168 0.0202 Spike D586D 1 0.003
23325 0.0483 0.0249 0.0126 0.0154 Spike T588I 0 0
23327 0.0608 0.0443 0.0695 0.0550 Spike P589S 0 0
23328 0.0081 0.0101 0.0084 0.0178 Spike P589L 0 0
23334 0.0413 0.4614 0.0539 0.0089 Spike S591F 0 0
23347 0.0546 0.0917 0.0400 0.0291 Spike V595V 0 0
23358 0.2362 0.0187 0.0137 0.0073 Spike T599I 0 0
23360 0.2331 0.0195 0.0179 0.0130 Spike P600S 0 0
23361 0.0698 0.1042 0.1065 0.0696 Spike P600L 0 0
23367 0.2343 0.0124 0.0105 0.0081 Spike T602I 0 0
23373 0.4349 0.0156 0.0053 0.0146 Spike T604I 2 0.005
23376 0.0215 0.0638 0.0011 0.0146 Spike S605F 0 0
23380 0.1147 0.0241 0.0338 0.0251 Spike N606N 0 0
23381 0.0153 0.0211 0.0995 0.0154 Spike Q607X 0 0
23388 0.2104 0.2062 0.2056 0.1740 Spike A609V 0 0
23393 0.0450 0.0469 0.0074 0.0431 Spike L611F 0 0
23399 0.0339 0.3703 0.0484 0.0291 Spike Q613X 0 0
23410 0.5734 0.0404 0.0242 0.0307 Spike N616N 1 0.003
23413 0.0866 0.0598 0.0557 0.0719 Spike C617C 0 0
23415 0.4066 0.0413 0.0348 0.0356 Spike T618I 1 0.003
23422 0.0878 0.1145 0.0285 0.0421 Spike V620V 5 0.011
23423 0.0216 0.0227 0.0191 0.0293 Spike P621S 2 0.005
23424 0.0270 0.0399 0.0191 0.0301 Spike P621L 0 0
23430 0.1838 0.1138 0.1120 0.1183 Spike A623V 4 0.011
23435 0.1525 1.6419 0.0258 0.0124 Spike H625Y 0 0
23439 0.3578 0.1765 0.1980 0.1642 Spike A626V 2 0.005
23444 0.0584 0.3290 0.0095 0.0049 Spike Q628X 0 0
23447 1.1507 0.0225 0.0147 0.0186 Spike L629F 0 0
23451 1.1631 0.0233 0.0158 0.0154 Spike T630I 0 0
23453 0.0733 0.0947 0.0337 0.0291 Spike P631S 2 0.005
23454 0.0080 0.0179 0.0084 0.0170 Spike P631L 1 0.003
23457 1.0737 0.0085 0.0126 0.0073 Spike T632I 3 0.008
23462 0.2915 0.1889 0.1801 0.1658 Spike R634C 0 0
23472 0.2213 0.0880 0.0063 0.0081 Spike S637F 0 0
23475 0.8028 0.0218 0.0201 0.0259 Spike T638I 0 0
23481 0.1190 0.1913 0.0190 0.0040 Spike S640F 3 0.009
23492 0.2261 1.4572 0.0421 0.0186 Spike Q644X 0 0
23496 1.4519 0.0311 0.0179 0.0227 Spike T645I 0 0
23498 20.7256 0.0311 0.0337 0.0267 Spike R646C 0 0
23502 0.2592 0.0976 0.0854 0.0738 Spike A647V 0 0
26252 0.0138 0.0459 0.0081 0.0240 Envelope S3L 0 0
26256 0.0398 0.1428 0.0260 0.0252 Envelope F4F 6 0.013
26261 0.0614 0.1332 0.0431 0.0492 Envelope S6L 0 0
26270 0.1999 0.0422 0.0326 0.0363 Envelope T9I 3 0.368
26276 0.0363 0.0252 0.0155 0.0154 Envelope T11M 0 0
26292 0.0130 0.0104 0.0163 0.0117 Envelope S16S 0 0
26296 0.0260 0.0185 0.0285 0.0049 Envelope L18F 0 0
26299 0.0173 0.0141 0.0033 0.0049 Envelope L19F 0 0
26305 0.0295 0.0252 0.0122 0.0154 Envelope L21F 3 0.027
26309 0.0476 0.0222 0.0155 0.0308 Envelope A22V 0 0
26313 0.0364 0.1350 0.0155 0.0105 Envelope F23F 3 0.011
26322 0.0406 0.0237 0.0130 0.0135 Envelope F26F 0 0
26326 0.1984 0.1891 0.2004 0.1888 Envelope L28L 1 0.003
26333 0.0927 0.1432 0.0359 0.0216 Envelope T30I 0 0
26335 0.1427 0.0281 0.0244 0.0308 Envelope L31L 1 0.003
26339 0.1662 0.1371 0.1563 0.1170 Envelope A32V 0 0
26340 0.0174 0.0134 0.0123 0.0105 Envelope A32A 1 0.003
26343 0.0311 0.0872 0.0244 0.0295 Envelope I33I 0 0
26344 0.0069 0.0059 0.0073 0.0086 Envelope L34F 0 0
26348 0.1951 0.0384 0.0284 0.0123 Envelope T35I 0 0
26351 0.2970 0.2484 0.2469 0.2113 Envelope A36V 0 0
26353 0.0872 0.0259 0.0187 0.0184 Envelope L37F 1 0.003
26356 0.0397 0.1013 0.0447 0.0399 Envelope R38X 0 0
26366 0.4847 0.4489 0.4346 0.4531 Envelope A41V 1 0.003
26370 0.4108 0.0200 0.0114 0.0092 Envelope Y42Y 0 0
26373 0.1710 0.0805 0.0834 0.0519 Envelope C43C 0 0
26376 0.1120 0.0490 0.0506 0.0389 Envelope C44C 0 0
26388 0.6381 0.0514 0.0508 0.0433 Envelope N48N 1 0.003
26395 0.3404 0.0170 0.0065 0.0117 Envelope L51F 2 0.005
26404 2.1195 0.0096 0.0089 0.0068 Envelope P54S 0 0
26405 0.0355 0.0208 0.0090 0.0055 Envelope P54L 0 0
26408 0.0346 0.3371 0.0203 0.0092 Envelope P55F 2 0.005
26415 0.7391 0.0587 0.0612 0.0401 Envelope Y57Y 1 0.003
26421 1.1091 0.0096 0.0122 0.0129 Envelope Y59Y 1 0.003
26423 0.2330 0.0490 0.0236 0.0129 Envelope S60F 1 0.003
26425 0.0528 0.3362 0.0268 0.0234 Envelope R61C 0 0
26437 0.0727 0.0511 0.0090 0.0129 Envelope L65L 0 0
26444 0.0969 0.0940 0.0073 0.0043 Envelope S67F 1 0.003
26447 0.1072 0.0178 0.0057 0.0080 Envelope S68F 2 0.005
26799 0.0164 0.0216 0.0336 0.0256 Membrane L93F 0 0
26801 0.0082 0.0498 0.0103 0.0067 Membrane L93L 9 0.049
26804 0.1242 0.0838 0.0980 0.1011 Membrane S94S 0 0
26807 0.0867 0.0166 0.0096 0.0110 Membrane Y95Y 0 0
26810 0.0082 0.0678 0.0055 0.0061 Membrane F96F 0 0
26815 0.1085 0.0505 0.0493 0.0402 Membrane A98V 0 0
26818 0.0839 0.0274 0.0075 0.0073 Membrane S99F 0 0
26822 0.0096 0.2332 0.0151 0.0116 Membrane F100F 3 0.009
26826 0.5497 0.0123 0.0082 0.0140 Membrane L102L 0 0
26833 0.2409 0.2599 0.2939 0.2733 Membrane A104V 1 0.003
26835 0.1269 0.0960 0.1042 0.1077 Membrane R105C 0 0
26839 0.2580 0.2015 0.1877 0.1010 Membrane T106M 0 0
26841 0.1057 0.1047 0.1096 0.0955 Membrane R107C 0 0
26845 0.0266 0.0426 0.0308 0.0341 Membrane S108F 0 0
26846 0.0177 0.0152 0.0178 0.0122 Membrane S108S 1 0.003
26854 0.0471 0.0368 0.0151 0.0104 Membrane S111L 0 0
26858 0.0471 0.2296 0.0164 0.0134 Membrane F112F 4 0.026
26862 0.0642 0.1056 0.0446 0.0354 Membrane P114S 0 0
26863 0.0123 0.0325 0.0185 0.0122 Membrane P114L 0 0
26869 3.3517 0.0065 0.0082 0.0067 Membrane T116I 0 0
26873 0.5824 0.0173 0.0137 0.0116 Membrane N117N 12 0.038
26877 0.1272 0.0557 0.0048 0.0073 Membrane L119F 0 0
26880 0.0717 0.0643 0.0130 0.0073 Membrane L120F 0 0
26882 0.1822 0.1206 0.0226 0.0256 Membrane L120L 6 0.013
26885 1.0290 0.0412 0.0302 0.0237 Membrane N121N 14 0.037
26889 0.2508 0.3050 0.3394 0.2985 Membrane P123S 0 0
26890 0.0082 0.0180 0.0082 0.0152 Membrane P123L 0 0
26892 0.3585 0.0346 0.0185 0.0207 Membrane L124F 1 0.003
26894 0.0198 0.4290 0.0267 0.0225 Membrane L124L 9 0.038
26895 0.0157 0.1543 0.1800 0.0085 Membrane H125Y 1 0.003
26900 0.0470 0.0577 0.0609 0.0687 Membrane G126G 0 0
26902 0.0109 0.0115 0.0096 0.0164 Membrane T127I 2 0.005
26907 0.1631 0.1040 0.0514 0.0518 Membrane L129L 1 0.005
26911 1.4145 0.0557 0.0679 0.0823 Membrane T130I 0 0
26912 0.0061 0.0166 0.0158 0.0195 Membrane T130T 2 0.009
26916 0.9252 0.0651 0.0748 0.0719 Membrane P132S 0 0
26917 0.0348 0.0434 0.0507 0.0414 Membrane P132L 0 0
26919 0.2586 0.0267 0.0130 0.0195 Membrane L133F 0 0
26922 0.1665 0.0383 0.0082 0.0067 Membrane L134L 2 0.005
26934 2.3362 0.0405 0.0439 0.0518 Membrane L138F 0 0
26936 0.1721 0.0780 0.0391 0.0347 Membrane L138L 7 0.023
26942 0.0459 0.1936 0.0736 0.0501 Membrane I140I 1 0.003
26947 0.1424 0.0673 0.0955 0.0885 Membrane A142V 0 0
26954 0.0807 0.0629 0.0556 0.0536 Membrane I144I 1 0.003
26955 0.0341 0.0094 0.0103 0.0103 Membrane L145F 0 0
26958 0.3098 1.0474 0.0370 0.0219 Membrane R146C 0 0
26964 1.6774 0.0224 0.0137 0.0116 Membrane H148Y 1 0.003
26967 0.2355 0.0621 0.0192 0.0195 Membrane L149F 0 0
26970 0.1777 0.1654 0.0475 0.0452 Membrane R150C 0 0
26977 0.2609 0.1963 0.2288 0.2223 Membrane A152V 0 0
26982 5.0585 0.0195 0.0240 0.0280 Membrane H154Y 0 0
26984 1.9016 0.0657 0.0761 0.0548 Membrane H154H 0 0
26985 0.0574 0.0188 0.0103 0.0104 Membrane H155Y 1 0.003
26988 0.1461 0.0679 0.0062 0.0097 Membrane L156L 2 0.005
26994 2.6298 0.0123 0.0110 0.0128 Membrane R158C 0 0
28786 0.0108 0.0689 0.0484 0.0067 Nucleocapsid F171F 1 0.003
28789 0.0623 0.0510 0.0459 0.0653 Nucleocapsid Y172Y 3 0.007
28791 0.0217 0.4128 0.0326 0.0310 Nucleocapsid A173V 1 0.003
28801 0.2321 0.2728 0.2406 0.2596 Nucleocapsid S176S 0 0
28807 0.0975 0.0819 0.1002 0.0988 Nucleocapsid G178G 1 0.003
28810 0.0841 0.1151 0.0778 0.0755 Nucleocapsid G179G 0 0
28814 0.0416 0.6488 0.0267 0.0151 Nucleocapsid Q181X 0 0
28818 0.1058 0.1301 0.1263 0.1106 Nucleocapsid A182V 0 0
28819 0.0100 0.0040 0.0142 0.0109 Nucleocapsid A182A 0 0
28821 0.0253 0.0040 0.0042 0.0092 Nucleocapsid S183F 2 0.013
28824 0.0145 0.0480 0.0184 0.0201 Nucleocapsid S184F 0 0
28826 0.0515 0.0670 0.0393 0.0595 Nucleocapsid R185C 2 0.007
28830 0.0208 0.0230 0.0184 0.0293 Nucleocapsid S186F 1 0.003
28831 0.0127 0.0090 0.0084 0.0142 Nucleocapsid S186S 1 0.005
28833 0.0244 0.0820 0.0234 0.0243 Nucleocapsid S187L 6 0.029
28836 0.0244 0.1860 0.0543 0.0193 Nucleocapsid S188L 0 0
28838 0.0904 0.0850 0.0819 0.0930 Nucleocapsid R189C 0 0
28844 0.0759 0.1110 0.0652 0.0897 Nucleocapsid R191C 0 0
28846 0.0380 0.0220 0.0209 0.0302 Nucleocapsid R191R 5 0.013
28849 0.0912 0.0280 0.0125 0.0243 Nucleocapsid N192N 9 0.019
28854 0.0145 0.1213 0.0050 0.0050 Nucleocapsid S194L 12 0.095
28863 0.0623 0.0850 0.0384 0.0310 Nucleocapsid S197L 3 0.037
28866 0.1635 0.0260 0.0100 0.0184 Nucleocapsid T198I 0 0
28868 0.0235 0.0150 0.0117 0.0142 Nucleocapsid P199S 4 0.009
28869 0.0226 0.0240 0.0192 0.0352 Nucleocapsid P199L 5 0.049
28873 0.0208 0.0260 0.0192 0.0302 Nucleocapsid G200G 0 0
28876 0.0785 0.0989 0.0868 0.0954 Nucleocapsid S201S 0 0
28887 0.0713 0.0449 0.0083 0.0151 Nucleocapsid T205I 18 0.142
28890 0.0162 0.0519 0.0184 0.0234 Nucleocapsid S206F 3 0.016
28892 0.0487 0.0739 0.0417 0.0519 Nucleocapsid P207S 5 0.016
28893 0.0289 0.0469 0.0342 0.0443 Nucleocapsid P207L 0 0
28896 0.0190 0.0170 0.0293 0.0252 Nucleocapsid A208V 1 0.003
28905 0.1175 0.1629 0.1128 0.1223 Nucleocapsid A211V 2 0.013
28909 0.1312 0.1462 0.1297 0.1334 Nucleocapsid G212G 3 0.008
28915 0.2657 0.2962 0.2625 0.2539 Nucleocapsid G214G 1 0.003
28923 0.2636 0.3006 0.2765 0.2973 Nucleocapsid A217V 0 0
28926 0.0244 0.0230 0.0159 0.0285 Nucleocapsid A218V 0 0
28928 0.0065 0.0562 0.0043 0.0120 Nucleocapsid L219F 1 0.003
28932 0.1229 0.1340 0.1337 0.1366 Nucleocapsid A220V 4 0.037
28937 0.1573 0.1660 0.1614 0.1517 Nucleocapsid L222L 1 0.003
28940 0.1301 0.1260 0.1162 0.1114 Nucleocapsid L223L 1 0.007
28943 0.0632 0.0500 0.0518 0.0603 Nucleocapsid L224F 0 0
28948 0.2163 0.0601 0.0485 0.0688 Nucleocapsid D225D 3 0.009
28957 0.3226 0.1680 0.1705 0.1742 Nucleocapsid N228N 10 0.022
28958 0.0127 0.0170 0.0276 0.0218 Nucleocapsid Q229X 0 0
28961 0.0569 0.0849 0.0610 0.0536 Nucleocapsid L230F 3 0.029
28969 0.1200 0.0494 0.0404 0.0430 Nucleocapsid S232S 0 0
28977 0.0587 0.1069 0.0359 0.0327 Nucleocapsid S235F 3 0.184
Taken together, we conclude that A1+A1CF preferentially edits the AC motif and A3A
preferentially targets the UC motif on the SARS-CoV-2 RNA sequence. Notably, among the C-to-U
variations of SARS-CoV-2, AC-to-AU and UC-to-UU account for 38.23% and 31.83%,
respectively, markedly higher proportions than CC-to-CU (14.50%) or
GC-to-GU (15.44%) (Figure 1.5).
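
The dinucleotide-context breakdown reported above can be tallied with a few lines of code. The following Python sketch is illustrative only: the function names, the toy sequence, and the site list are assumptions made for this example and are not the analysis pipeline or data used in this study.

```python
from collections import Counter


def c_to_u_context(reference: str, position: int) -> str:
    """Return the dinucleotide motif (-1 base followed by C) at a 1-based C position."""
    seq = reference.upper().replace("T", "U")
    if position < 2 or seq[position - 1] != "C":
        raise ValueError("position must point at a C that has an upstream base")
    return seq[position - 2] + "C"


def motif_proportions(reference: str, positions: list) -> dict:
    """Percentage of C-to-U sites falling in each dinucleotide context."""
    counts = Counter(c_to_u_context(reference, p) for p in positions)
    total = sum(counts.values())
    return {motif: 100.0 * n / total for motif, n in counts.items()}


# Hypothetical toy input, not SARS-CoV-2 data.
toy_reference = "AUACGUCCAGCAC"
toy_sites = [4, 7, 8, 11, 13]  # 1-based positions of C-to-U variants
print(motif_proportions(toy_reference, toy_sites))
```

Applied to a real variant list, the same tally would give the AC, UC, CC, and GC shares summarized in Figure 1.5, provided that the variant positions and the reference sequence use the same coordinate system.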
Figure 1.5. Single nucleotide variations (SNPs) of the SARS-CoV-2 genome sequences database
derived from patients.
A total of 987 SNPs with minor allele frequencies > 0.1% were counted from a total of 227,167
SARS-CoV-2 sequences on the UCSC genome browser (https://genome.ucsc.edu/covid19.html).
Characteristics of target site showing the highest RNA editing efficiency by APOBECs
While each of the three APOBEC proteins exhibits a strong preference for specific
dinucleotide sequence motifs (AC, UC, or CC), the editing efficiency of these motifs varies
significantly. For instance, the editing efficiency ranges from 0.0041% to 22.15% for A1+A1CF
and from 0.0040% to 4.46% for A3A (Table 1). Interestingly, certain motif sites bearing AC, UC,
and CC sequences show no detectable editing by A1+A1CF, A3A, and A3G, respectively. This
suggests that factors beyond dinucleotide sequence motifs, such as secondary and tertiary RNA
structures, likely influence the editing efficiency at specific motif sites.
Regarding RNA editing by A1+A1CF, it has been reported that the target C is
surrounded by AU-rich content [54], and that the region downstream of the target C has relatively
high U/G/A content, called a mooring sequence [55, 56]. Likewise, the top 3 editing sites by
A1+A1CF all have the AC motif and also show U/G/A-rich content downstream of the target C
(Figure 1.6A). Most AC motifs are favored by A1+A1CF, but among them, C16054 stands out
with the highest editing, suggesting that the editing efficiency can depend on the sequence
context upstream and downstream of the target C (Figure 1.6A).
Figure 1.6. Overall features of the RNA around the most preferred APOBEC-edited sites on
SARS-CoV-2.
The predicted RNA secondary structures of the sequences near the top 3 highest editing C sites by
A1+A1CF (A), A3A (B), and A3G (C) (see related Table 1). The editing efficiency of each site is listed at
the top of each panel. In the secondary structure, the target C sites are highlighted in red, and the -1 positions
of the target C sites are highlighted in green for A, pink for U, and blue for C, respectively. In panel A, the
proposed canonical mooring sequences for A1+A1CF (highlighted in sky blue) contain relatively high
U/A/G contents downstream of the target C.
Recently, Sharma and colleagues discovered RNA editing activities by A3A and A3G in
human transcripts, where more than half of the target substrates have stem-loops in their RNA
secondary structure and the target C is located in the loop region [44-46, 57]. Interestingly, our
top 3 A3A-mediated editing sites also have UC motifs in the loop of predicted stem-loop secondary
structures (Figure 1.6B) [58]. The highest site among the selected SARS-CoV-2 regions is C16063,
which showed significantly higher editing efficiency than other UC motifs (Figure 1.6B).
In the selected SARS-CoV-2 regions, A3G-induced RNA editing was either not detected or showed
only marginal efficiency compared to the control group (Figure 1.6C). However, when only the
top 3 sites edited by A3G are analyzed, they have CC (or UC) motifs in the loop of a hairpin
secondary structure, consistent with previous reports [46, 57]. The highest site, C26895, has a CC
motif and shows distinct editing compared to the control (value normalized to control: 21.142),
although the editing rate is low (Figure 1.6C). Very recently, loop locations were reported to be
preferred over stem regions for C-to-U mutations in RNA viruses, including rubella virus and
SARS-CoV-2 [35]. This is consistent with the fact that APOBEC proteins target
single-stranded DNA/RNA, but the RNA secondary structures preferred by each APOBEC
remain to be investigated in further studies.
Since the highest RNA editing sites of both A1+A1CF and A3A lie in the same SARS-CoV-2
reporter 2 region (part of Orf1b, 15,968 - 16,167), we compared the DNA and RNA editing activity
of each APOBEC protein in this 200 nt region. The reporter DNA and mRNA were extracted
separately and sent out for Sanger sequencing, and we analyzed the C-to-U editing levels (% of U
at each C site) of all 33 cytosine sites (Figure 1.7). No DNA editing was detectable in the reporter
region under any APOBEC condition, but specific RNA editing by A1+A1CF was observed,
consistent with our SSS results (e.g., C16049, C16054, and C16092). Similarly, the A3A-mediated
RNA editing at C16063 was also confirmed by Sanger sequencing, suggesting that A1+A1CF and
A3A target RNA more readily and/or effectively than DNA at specific sites in the cells (Figure
1.7).
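
The editing level used in this comparison (% of U at a C site) is a simple ratio of signals. The Python sketch below shows one way it could be computed from chromatogram peak heights; the function name and the peak-height values are hypothetical, and estimating editing from the C and T (U) peak heights at a single position is an assumption made for illustration, not a description of the exact quantification used here.

```python
def editing_level(c_signal: float, t_signal: float) -> float:
    """Percent of T (U in RNA) signal at a reference C site."""
    total = c_signal + t_signal
    if total <= 0:
        raise ValueError("no C or T signal at this site")
    return 100.0 * t_signal / total


# Hypothetical peak heights at one C site, from reporter DNA and from cDNA (RNA).
dna_level = editing_level(c_signal=1000.0, t_signal=0.0)
rna_level = editing_level(c_signal=900.0, t_signal=100.0)
print(f"DNA: {dna_level:.1f}% T, RNA: {rna_level:.1f}% U")
```

A site where the cDNA shows a clear U signal while the matching DNA shows none is the pattern expected for editing that occurs on the transcript rather than on the reporter DNA.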
Figure 1.7. Verification of C-to-U mutation as a result of direct RNA editing on the transcript of a
SARS-CoV-2 reporter segment.
The temperature-bar chart (top panel) shows the DNA and RNA C-to-T/U editing levels (%), which are
based on the Sanger sequencing results of the DNA (middle panel) and the cDNA (RNA) (bottom panel).
All C sites in this SARS-CoV-2 segment are marked with the virus nt sequence numbers on the top bar
chart. Three representative RNA editing sites (C16049, C16054, and C16063) are indicated by red
arrows.
Comparison between SARS-CoV-2 genomic mutation data and APOBEC-induced RNA
editing
Figure 1.8. The potential effect of APOBEC-mediated editing on SARS-CoV-2 mutations and fitness.
(A) The number of mutational events (all single nucleotide variants) on the SARS-CoV-2 RNA segment
5’UTR-Orf1a from the SARS-CoV-2 genome sequence data (the Nextstrain datasets from Dec. 2019 to
Jan. 22nd, 2022, downloaded from the GISAID database, https://www.gisaid.org/hcov19-variants/
and https://nextstrain.org/ncov/global). (B) The A3A-mediated C-to-U editing rate on UC
motifs in the same 5’UTR-Orf1a region obtained from our cell-based editing system and the SSS
analysis. C203, C222, and C241 (asterisks) all showed significant editing by A3A. (C) The
C-to-U mutation prevalence over time at C203, C222, and C241. The sequencing frequency is
represented by C in blue and U in yellow (based on the Nextstrain datasets:
https://nextstrain.org/ncov/global).
Based on SARS-CoV-2 genome mutations [11, 12, 17, 32, 33, 35] and COVID-19 patients’
transcriptome analysis [14, 59], we next examined how the reported C-to-U genome
mutations relate to our APOBEC-mediated RNA editing results. We checked the publicly
available SARS-CoV-2 genome mutational events on reporter 1 (part of 5’UTR-Orf1a, 142-341),
one of our selected SARS-CoV-2 regions, using the Nextstrain datasets from Dec. 2019 to
Jan. 22nd, 2022 ([60] and https://nextstrain.org/ncov/global) (Figure 1.8). The most
frequent mutational type in this region is C > U variation (~53% of all SNV types), with the
prominent mutations occurring at C203, C222, and C241 (Figure 1.8A). Interestingly, these 3 sites
all feature UC motifs that showed significant C-to-U editing by A3A in our sequencing results,
suggesting that A3A can generate these mutations (Figure 1.8B). Looking at the prevalence of
these 3 mutations, the two C-to-U variations at C203 and C222 have been reported since late 2020,
possibly due to a surge of diagnosed patients, whereas C241 rapidly changed to
U early in the COVID-19 pandemic and became dominant (Figure 1.8C). This implies
that some C-to-U mutations at specific sites may confer better fitness from the evolutionary point of
view of SARS-CoV-2. Since the region of SARS-CoV-2 in which we checked the sequencing results
is very limited, it is hard to interpret this in close comparison with the reported genome-wide
mutations and evolution of SARS-CoV-2; however, the overall C-to-U mutational trends suggest that
some C-to-U editing can be driven by host APOBEC proteins.
In all representative clades of SARS-CoV-2 that have emerged since the initial outbreak
two years ago, C-to-U mutations have occurred significantly more often than
other types of single nucleotide variations (Figure 1.9A). Even the recent Omicron variants, which
spread rapidly from November 2021 onwards, continue to display a noticeable pattern of C-to-U
editing (Figure 1.9B). Notably, the AC-to-AU mutation at position C23525 resulted in an H655Y
mutation in the spike protein (Figure 1.9B). Previous studies have shown that the H655Y mutation
alters the pathways through which the virus enters cells, favoring the endosomal pathway over cell
surface entry pathways [61]. Therefore, additional investigations are warranted to
comprehensively understand the potential impact of APOBEC-mediated C-to-U RNA editing on
SARS-CoV-2 mutations and evolution.
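
The link between a genome coordinate such as C23525 and a protein change such as H655Y follows from simple codon arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the spike open reading frame begins at nucleotide 21563 of the reference genome (NC_045512.2), a coordinate that is not stated in this chapter.

```python
# Assumption: first nucleotide of the spike start codon in NC_045512.2.
SPIKE_ORF_START = 21563


def residue_and_codon_position(genome_pos: int, orf_start: int) -> tuple:
    """Map a 1-based genome coordinate to (residue number, position within codon)."""
    offset = genome_pos - orf_start
    if offset < 0:
        raise ValueError("position lies upstream of the open reading frame")
    return offset // 3 + 1, offset % 3 + 1


residue, codon_pos = residue_and_codon_position(23525, SPIKE_ORF_START)
print(residue, codon_pos)  # residue 655, first base of the codon
```

Under this assumption, C23525 is the first base of spike codon 655, so a C-to-U change there replaces a histidine codon (CAU/CAC) with a tyrosine codon (UAU/UAC), in agreement with the H655Y substitution described above.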
Figure 1.9. Single nucleotide variations of SARS-CoV-2 in representative clades and characterization
of C-to-U mutations in the Omicron variant (21M).
(A) Number of different single nucleotide variations (SNVs) in representative SARS-CoV-2 clades from Alpha
(20I) to Omicron (21M). (B) Table listing the characterization of C-to-U mutations from the preferred
editing motifs (UC, AC, and CC) by A3A, A1 (+A1CF), and A3G, respectively, in the representative
Omicron variant (21M).
SARS-CoV-2 replication and infectivity assay in cells overexpressing APOBECs
Figure 1.10. SARS-CoV-2 replication and virion production in cells expressing APOBECs.
(A) Overview of experiments for SARS-CoV-2 replication and viral production in the presence of
APOBECs. The Caco-2 stable cell lines were constructed to express A1+A1CF, A3A, or A3G
under a tetracycline-controlled promoter. The Caco-2-APOBEC stable cell lines were then
infected with SARS-CoV-2 (MOI = 0.05), and the viral RNA replication and progeny production
were measured at different time points. (B) Effect of each APOBEC expression on SARS-CoV-2
viral RNA replication. Measurement of relative viral RNA abundance at different time points
after viral infection of the Caco-2-APOBEC stable cell lines expressing A1+A1CF, A3A, or
A3G. The viral RNA abundance was measured using real-time quantitative PCR (qPCR) to detect
RNA levels by using specific primers to amplify three separate viral regions, the Nsp12, S, or N
coding regions. (C) Effect of each APOBEC expression on SARS-CoV-2 progeny production.
Infectious viral progeny yield harvested in the medium at 48 hrs and 72 hrs post-infection was
determined by plaque assay. In panels (B) and (C), statistical significance was calculated by
unpaired two-tailed Student’s t-test with P-values represented as: P > 0.05 = not significant, *** =
P < 0.001.
To examine how the three APOBEC proteins affect SARS-CoV-2 replication and
infectivity, we constructed stable Caco-2 cell lines in which overexpression of each APOBEC
was induced by doxycycline treatment (Figure 1.10A). Caco-2 is a human colon epithelial cell line
that expresses the ACE2 receptor and is thus suitable for measuring SARS-CoV-2 infectivity [62].
After SARS-CoV-2 infection of each cell line, we measured replication using real-time quantitative
PCR analysis and infectivity using the plaque assay at different time points (Figure 1.10A).
The abundance of viral RNAs, including Nsp12, S, and N, was dramatically increased by
A3A overexpression at 72 and 96 hours after SARS-CoV-2 infection (Figure 1.10B). The increased
viral RNA levels also correlated with higher viral yield. In particular, there was no difference in
viral titer among the APOBEC conditions up to 48 hours, but at 72 hours, the virus yield with A3A
overexpression showed an approximately 100-fold increase (Figure 1.10C). These results
collectively suggest that SARS-CoV-2 may take advantage of A3A-mediated mutational forces for
its own fitness and/or evolution.
Based on the intriguing findings regarding the enhancement of SARS-CoV-2 replication
and progeny virus yield associated with A3A, we conducted further investigations to determine
whether these effects rely on the deaminase activity of A3A. Quantification of viral
RNA levels by qPCR analysis targeting three distinct viral regions (Nsp12, S, or N coding
regions) revealed substantial increases in cell lines expressing A3A-WT at 72 and 96 hours
following SARS-CoV-2 infection. In contrast, no significant differences were observed in viral
RNA abundance between the control Caco-2 cells and the ∆A3A cells (Figure 1.11A).
Moreover, the viral progeny yield harvested from the A3A-WT expressing cell line
was significantly higher than that from the control Caco-2 cells and the ∆A3A
cells (Figure 1.11B). Interestingly, A3A-E72A also demonstrated a modest enhancement in viral
RNA replication (Figure 1.11A) and viral progeny yield (Figure 1.11B). Although the pro-viral
effect of A3A-E72A is not as pronounced as that of A3A-WT, further investigations are required
to validate the deamination-independent role of A3A.
Figure 1.11. SARS-CoV-2 replication and progeny production in different Caco-2 cell lines.
Ctrl: randomized gRNA control Caco-2 cell line; ∆A3A: stable Caco-2 cell line with A3A knockout by
CRISPR; A3A WT: stable Caco-2 cell line expressing A3A wild-type protein; A3A E72A: stable Caco-2
cell line expressing catalytically inactive A3A mutant. (A) SARS-CoV-2 viral RNA replication in four
different Caco-2 cell lines (Ctrl, ∆A3A, A3A WT, and A3A E72A). The viral RNA abundance was measured
using real-time quantitative PCR (qPCR) to detect RNA levels by using specific primers to amplify three
separate viral regions, the Nsp12, S, or N coding regions (see Methods). (B) SARS-CoV-2 progeny
production in the four different Caco-2 cell lines (Ctrl, ∆A3A, A3A WT, and A3A E72A). Infectious viral
progeny yield harvested in the medium at 48 hrs and 72 hrs post-infection was determined by plaque assay
in Vero E6 cells (see Methods). Statistical significance was calculated by unpaired two-tailed Student’s
t-test with P-values represented as: P > 0.05 = not significant, * = 0.01 < P < 0.05, and *** = P < 0.001.
To confirm the presence of A3A-induced mutations during SARS-CoV-2 infection in Caco-2
cells, we conducted Sanger sequencing of the viral sequences surrounding C241 in the 5’UTR
of the viral genome before and after 72 hours of infection. We hypothesized that if the C241U
mutation contributes to the observed increase in viral replication and progeny, it should be
detectable after 72 hours post-infection without utilizing the SSS sequencing method. The results
revealed clear and statistically significant C-to-U mutations at C241 at 72 and 96 hours after infection
in Caco-2 cells expressing A3A-WT (Figure 1.12A-12B). Importantly, this significant C241U
mutation coincided with the time points (72 and 96 hours) exhibiting detectable increases in viral
RNA replication and progeny production. In contrast to the editing observed at C241, our
sequencing results indicated that the C203U and C222U mutations at the nearby sites C203 and
C222 had a neutral effect on viral replication: only a slight, statistically
non-significant increase in C-to-U mutations was observed at C203 and C222 at 72 hours
after viral infection in the presence of A3A-WT overexpression (Figure 1.12B). Overall, while
APOBEC-mediated C-to-U mutations were detected at all five sites on the viral RNA using Sanger
sequencing in infected Caco-2 cells, a significant C-to-U mutation was identified specifically at
C241 at 72 and 96 hours post-infection. This finding suggests that the C241U mutation may
confer a beneficial effect on viral replication and progeny production, even in cell culture infection
assays. To enable more accurate detection and quantification of APOBEC-mediated mutations
throughout the course of infection, the SSS sequencing method will be necessary. Furthermore,
conducting multiple passages of the virus in cell culture or animal models will be essential for a
comprehensive assessment of the effects of different mutations on viral fitness.
We have demonstrated the editing of SARS-CoV-2 RNA by A3A and A1+A1CF in
our cell culture system. In addition, analysis of expression profiles reveals that A3A and A1+A1CF,
but not A3G, are expressed in human organs and cell types susceptible to SARS-CoV-2 infection
(Figure 1.13A-13B). These expression profiles suggest that A3A and A1+A1CF have the potential to edit
the viral RNA genome in vivo. Numerous human cell types expressing ACE2 in various organs,
including the lungs, heart, small intestine, and liver, can be targeted by SARS-CoV-2 [63, 64]. A3A
is expressed in lung epithelial cells, and notably, its expression level is significantly upregulated
in response to SARS-CoV-2 infection in patients [59, 65, 66] (Figure 1.13A). Conversely, A1 and
its known cofactors, A1CF and RBM47, are not expressed in the lungs but are present in the small
intestine and liver, which are also susceptible to SARS-CoV-2 infection [67] (Figure 1.13B).
Figure 1.12. Verification of C-to-U mutation caused by WT A3A-induced editing of SARS-CoV-2
virus in the 5’UTR region in Caco-2 cell culture infection assay.
(A) Sanger sequencing raw chromatogram traces of the SARS-CoV-2 viral RNA around C241, C222, and C203
at different time points (24, 48, 72, and 96 hr) post viral infection in parental Caco-2 cells (Ctrl),
inactive A3A mutant (A3A-E72A), and WT A3A (A3A-WT) overexpressing Caco-2 cells. The sequencing
results at C241 are boxed and magnified on the right to show the appearance of T (U in RNA, red line) at
the C241 position only in A3A-WT starting from 72 hours post viral infection, but not in inactive A3A
(A3A-E72A) and control cells without A3A. (B) Quantification of C-to-U editing levels (%) based on the Sanger
sequencing results at the C sites C203, C222, and C241. The result reveals that the C241 site shows
significant C-to-U editing in A3A-WT at 72 and 96 hours post infection, while the C203 and C222 sites show no
significant C-to-U editing. Statistical significance was calculated by unpaired two-tailed Student’s t-test
with P-values represented as: ns = not significant (P > 0.05), * = 0.001 < P < 0.05.
Figure 1.13. Relations between SARS-CoV-2 infection and APOBEC expression.
(A) Data analysis of the expression level of six APOBECs in healthy people and COVID-19 infected patients
in bronchoalveolar lavage fluid (BALF) samples (based on the RNAseq data from reference [59]). (B)
Overall gene expression of the three APOBECs (A1, A3A, A3G) and A1CF in the tissues that can be
infected by SARS-CoV-2. The commonness of viral detection (COD, relative SARS-CoV-2 infectivity)
score for each tissue is indicated by yellow shaded boxes (COD scores are based on reference
[16]). Each gene expression value (NX) was retrieved from the Human Protein Atlas
(http://www.proteinatlas.org).
In summary, our findings strongly indicate that the deaminase activity of A3A plays a
significant role in promoting viral replication and enhancing viral progeny production. This
observation is in line with our identification of the UC241 to UU241 mutation, a site highly
targeted by A3A-mediated editing in our study. Notably, this mutation is located within the viral
packaging signal and in close proximity to the viral replication regulation region within the 5’UTR
(Figure 1.14). This positional association may provide a
rationale for the widespread prevalence of this mutation in circulating SARS-CoV-2 strains since
January 2020. More broadly, the SARS-CoV-2 RNA genome can be edited by three APOBEC proteins,
but viral replication and infection are not affected by overexpression of A1+A1CF and A3G, and are
even increased by A3A overexpression, which allows many C-to-U mutations introduced by
host APOBEC proteins to accumulate in the SARS-CoV-2 genome.
Figure 1.14. A predicted secondary structure for 5’UTR region of SARS-CoV-2 and its functional
motifs.
The secondary structure model and functional motifs of the SARS-CoV-2 5’UTR were redrawn based on
Miao et al. [68]. The packaging signals are highlighted in brown, replication-related motifs are
highlighted in yellow, the leader transcription regulatory sequence (TRS-L) is shown in blue, and ORF1a
(from AUG) is marked in green. The A3A editing target sites UC241 and UC203 (shown in red) are located
on two separate loops within the packaging signal sequences that are spatially close to the
replication-related motifs and TRS-L.
Chapter I: Discussion
During the unprecedented worldwide COVID-19 pandemic, the continued emergence
of new viral variants has kept us on our toes, overshadowing our myriad efforts to tackle the
disease [69, 70]. Many recent studies have suggested that host factors, such as ADARs and
APOBECs, drive SARS-CoV-2 mutations with specific mutational tendencies [11, 14,
15, 26, 33, 35]; however, direct evidence for this has been lacking. In this study, we present
the initial experimental evidence showcasing the ability of APOBEC enzymes, specifically
A1+A1CF and A3A, to selectively target distinct viral sequences within SARS-CoV-2 for RNA
editing. Importantly, the observed mutations resulting from this editing process are likely to play
a significant role in viral replication and overall fitness.
Our experiment aimed to better understand how SARS-CoV-2 RNA sequences can be
edited by human APOBEC proteins (Figure 1.2). By overexpressing three APOBECs, namely
A1+A1CF, A3A, and A3G, in HEK293T cells that also transcribe the selected fragments of
SARS-CoV-2 genomic RNA, we observed sites edited by the three APOBECs (Figure 1.4). We examined
the C-to-U editing on selected SARS-CoV-2 RNA segments in HEK293T cells by A1+A1CF,
A3A, and A3G in a cell-based APOBEC RNA editing assay [49] using the SSS safe
sequencing approach [50, 51]. This approach is crucial because analyzing the currently available
SARS-CoV-2 viral sequences, which are derived as “one-consensus-sequence from one-patient”,
only provides information about the final selected consensus viral sequences that have survived
fitness selection. Therefore, our cell-based system in combination with the SSS approach
provides valuable insights into the APOBEC-mediated editing of specific viral RNA sequences
that cannot be obtained solely through analysis of deposited viral sequences.
A1 was not considered a potential candidate for editing the genome of SARS-CoV-2 due
to its target specificity and limited expression in the lungs, which are the primary target tissues for
SARS-CoV-2 infection [32]. Within our experimental system, we observed that the editing
efficiency of A1+A1CF on the AC motif surpasses that of A3A on the UC motif (Figure 1.3).
Notably, an analysis of SARS-CoV-2 variants retrieved from the database similarly revealed a
higher prevalence of AC motif mutations (38.23%) compared to UC motif mutations (31.83%)
(Figure 1.5). These results indicate that many of the AC-to-AU mutations in the SARS-CoV-2
genomes from patients can be driven by A1+A1CF-mediated RNA editing in the small intestine
and liver during SARS-CoV-2 infection. Because AC-to-AU mutations were not detected with A1
alone (Figure 1.7), AC-to-AU editing also requires the cofactor A1CF.
Considering the close similarity in RNA target and editing efficiency observed with another A1
cofactor, RBM47 [49, 53, 67, 71], it is plausible that RBM47 also facilitates AC-to-AU editing of
SARS-CoV-2 RNA by A1. Intriguingly, both A1 cofactors, A1CF and RBM47, were found to
physically interact with SARS-CoV-2 RNA in an interactome study [72], providing indirect
evidence that these RNA-binding A1 cofactors may recruit A1 to target SARS-CoV-2 RNA for
editing within infected tissues.
The majority of APOBEC-mediated C-to-U mutations identified in our assay
system are likely to have negative or neutral effects on the virus and consequently to be eliminated
during the viral infection cycle, so they are not reflected in the consensus viral sequences stored in
the databank. However, certain mutations that confer advantages to the virus's fitness are expected
to be selectively favored in emerging viral strains. In this manner, SARS-CoV-2 has the potential to
exploit the APOBEC mutational defense system for its own evolutionary benefit, such as
enhancing viral RNA replication, protein expression, and evasion of host immune responses, and
improving receptor binding and cell entry mechanisms. These possibilities illustrate the intricate
interplay between the virus and the APOBEC system, with implications for viral evolution and
adaptation beyond the aforementioned aspects.
Although the precise mechanisms through which A3A-mediated editing promotes
SARS-CoV-2 replication and progeny production are intricate and necessitate further investigation,
one particular A3A-mediated mutation observed in currently circulating SARS-CoV-2 strains provides
valuable insights into how the virus exploits A3A-mediated mutations. Among the numerous
A3A-edited sites on SARS-CoV-2 RNA identified in our study, three C-to-U mutations were located
within the non-coding 5′UTR region: UC203, UC222, and UC241 (Figure 1.8A-8B). Notably, all
three mutations were detected in SARS-CoV-2 samples from patients at various time points since
early 2020. However, only the mutation at UC241 became prevalent in major
circulating viral strains from early 2020 onwards (Figure 1.8C), suggesting a selective advantage
conferred by the C-to-U mutation at UC241 for improved viral fitness. This observation is
intriguing because UC241 is situated within the non-coding 5′UTR region and does not directly
alter the coding sequence of any viral protein, in contrast to the previously reported D614G
mutation in the spike protein, which is associated with enhanced viral fitness [47, 73]. Therefore,
the C241-to-U mutation is unlikely to be linked to alterations in viral protein functions such as
cell surface receptor binding or polyprotein processing.
Considering the crucial role of APOBEC proteins in immune responses against DNA and
RNA viral pathogens [20, 30, 37, 74], we conducted investigations to determine the potential
impact of three APOBECs on SARS-CoV-2 replication and progeny production within our
experimental system (Figure 1.10A). Surprisingly, the expression of wild-type (WT) A3A in the
tested cells resulted in a significant increase in viral RNA replication and viral progeny production
at 72 hours post-infection (Figure 1.10B-10C and Figure 1.11), although a similar increase in
replication and progeny production was also observed in the cell line expressing an inactive mutant
form of A3A (Figure 1.11). These pro-viral effects of A3A contrast with the well-established
antiviral effects of other APOBEC proteins [20, 30, 37, 74]. These findings indicate that the
deaminase activity of A3A, which leads to mutations in the viral genome, plays a critical role in
its pro-viral effects, although a minor deamination-independent enhancement effect cannot be
completely ruled out.
Previously, mutations occurring outside the protein-coding open reading frames (ORFs) of
SARS-CoV-2 were considered non-functional changes [75]. However, it is now recognized that
the 5′ untranslated region (5′UTR) of SARS-CoV-2 plays a crucial role in regulating protein
expression, viral RNA replication, and virion packaging [68, 76, 77]. Notably, the UC241 mutation
resides within the viral packaging signal sequence, which adopts a stem-loop secondary structure
and is in close proximity to stem-loop structures involved in replication and the leader transcription
regulatory sequence (TRS-L) [68] (Figure 1.4). Consequently, the UC241 to UU241 mutation
may have implications for viral RNA packaging, virion production, as well as potentially
impacting RNA replication, sub-genomic RNA production, and the translation efficiency of
downstream proteins, thereby enhancing the overall fitness of the virus.
Another possibility to consider is the potential impact of host RNA or DNA editing
mediated by WT A3A, which could contribute to the observed increase in viral RNA replication
and viral progeny production. A3A is known to possess activity in inducing mutations in both
cellular genomic DNA and RNA transcripts [37, 45]. Thus, it is plausible that A3A
editing/mutation activity may trigger specific cellular events that create a favorable environment
for SARS-CoV-2 replication. This intriguing possibility highlights the need for further
investigation into the potential interactions between A3A-mediated editing and cellular processes
that could facilitate viral replication.
In this study, we provide experimental evidence demonstrating the ability of A3A and
A1+A1CF to directly edit specific sites within the genomic RNA of SARS-CoV-2, resulting in
C-to-U mutations. Our findings reveal key factors that influence the RNA-editing efficiency of
these two APOBEC enzymes, including the presence of a UC dinucleotide motif for A3A or an
AC motif for A1, as well as certain structural features surrounding the target C site. Despite
APOBECs generally being recognized as host antiviral factors, our results demonstrate that
A3A-mediated RNA editing can actually promote viral replication and propagation of
SARS-CoV-2. These findings suggest that SARS-CoV-2 is capable of exploiting APOBEC-mediated
mutations to enhance its fitness and evolutionary potential. Unlike random mutations arising
from RNA replication or oxidative stress, the limited number of UC/AC motifs present in the
SARS-CoV-2 genomic RNA, along with the potential predictability of viral RNA structures,
enable the identification of possible target C sites in both coding and non-coding regions that
may undergo editing by these APOBEC enzymes. Considering the new selection pressures
imposed by the use of vaccines and antiviral drugs, coupled with the ongoing circulation of
SARS-CoV-2 variants among both vaccinated and unvaccinated individuals, the ability to predict
potential viral mutations and anticipate the emergence of immune escape and drug-resistant
strains becomes crucial. The knowledge gained from our study may contribute to such predictive
efforts and aid in the identification of novel mutations associated with viral evolution in response
to immune and therapeutic interventions.
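As a rough illustration of the first step of such a prediction, the sketch below lists every cytosine that sits in a UC or AC dinucleotide context. It is only a minimal sketch: the function name `candidate_sites` and the example sequence are hypothetical, and the structural features around the target C, which the text notes also influence editing, are not modeled.

```python
def candidate_sites(rna, motifs=("UC", "AC")):
    """Return (1-based position of the C, motif) for each UC/AC dinucleotide.

    Illustrative only: sequence context alone does not predict editing;
    RNA secondary structure around the target C is ignored here.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    hits = []
    for i in range(len(rna) - 1):
        dinucleotide = rna[i:i + 2]
        if dinucleotide in motifs:
            # The target C is the second base of the dinucleotide.
            hits.append((i + 2, dinucleotide))
    return hits


if __name__ == "__main__":
    example = "AUUCGACUUAC"  # made-up sequence, not from SARS-CoV-2
    for position, motif in candidate_sites(example):
        print(position, motif)
```

Restricting `motifs` to `("UC",)` or `("AC",)` would give the A3A-type or A1-type candidates separately.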
Chapter I: Materials and methods
The cell-based RNA editing system
The cell-based RNA editing system was adapted from the system previously reported in [49].
Briefly, reporter vectors containing DNA corresponding to the different RNA segments of
SARS-CoV-2 (NC_045512.2) and the APOBEC (A1+A1CF, A3A, and A3G) editor vectors were
constructed. A1+A1CF was constructed as one open reading frame (ORF) with a self-cleavage
peptide T2A inserted between A1 and A1CF (A1-T2A-A1CF), which produces individual A1
and A1CF proteins in a 1:1 ratio [49, 78]. HEK293T cells were cultured in DMEM medium
supplemented with 10% FBS, streptomycin (100 μg/mL), and penicillin (100 U/mL) and
maintained at 37 ℃, 5% CO2. One day before transfection, the cells (250 μL) were seeded at an
approximate concentration of 250,000 cells/mL in an 8-well glass chamber (CellVis). The cells
were then transfected with a mixture (25 μL) of an APOBEC editor vector (500 ng), a
SARS-CoV-2 reporter vector (50 ng), and 1.5 μL of X-tremeGENE 9 transfection reagent (Sigma) and
incubated for 48 hrs. After harvesting the cells, RNA was extracted with Trizol (Thermo Fisher) and
DNA was extracted with QuickExtract (EpiCentre), each according to the
manufacturer’s recommended instructions.
Sequencing library preparation
The extracted RNA was reverse transcribed with Accuscript High-Fidelity Reverse
Transcriptase (Agilent) to produce single-stranded cDNA using a specific primer annealing to
the downstream sequence of the SARS-CoV-2 reporter segments. The reaction was performed in a
volume of 20 µl containing 1 µg of total RNA, 100 µM of reverse primer, 1X Accuscript buffer, 10
mM dNTP, 0.1 M DTT, 8 U RNase inhibitor, and 1 µl of Accuscript High-Fidelity Reverse
Transcriptase (Agilent) for 1 hr at 42 ℃. The cDNA was then amplified for 2 cycles by adding a
forward primer annealing to the junction region where the AAV intron is spliced out. In this first
2-cycle PCR amplification, the forward and reverse primers carried barcodes consisting of
15 randomized nucleotides as the Unique Identifier (UID), plus one of four trinucleotides designating
the four different experimental conditions: TGA for A1+A1CF; CAT for A3A; GTC for A3G; and
ACG for Ctrl. Phusion® High-Fidelity DNA Polymerase (NEB) was used for this PCR reaction:
98 ℃ 5 min - (98 ℃ 30 sec, 71.4 ℃ 30 sec, 72 ℃ 1 min) x2 - 72 ℃ 5 min. This PCR product
(330 bp) was then cleaned up using a spin column PCR cleanup kit (Thermo) to remove the free
first-round barcode primers. The second-round PCR was performed for 30 cycles with Illumina
flowcell adaptor primers using Phusion® High-Fidelity DNA Polymerase (NEB): 98 ℃ 5 min -
(98 ℃ 30 sec, 72 ℃ 1 min) x30 - 72 ℃ 5 min. All 28 (4 editors x 7 different SARS-CoV-2
substrates) of the different pooled PCR products (399 bp) were combined in equal amounts for the
final libraries. The final libraries were sequenced on a full HiSeq lane (PE150, 370M paired reads,
Novogene).
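The barcode design above can be illustrated with a short parsing sketch. The four condition trinucleotides and the 15-nt UID length are taken from the text; everything else is an assumption made for illustration, in particular that the UID comes first in the read and is immediately followed by the trinucleotide. The actual primer layout is not specified here, and `split_barcode` is a hypothetical helper name.

```python
# Condition trinucleotides as listed in the text.
CONDITION_TAGS = {"TGA": "A1+A1CF", "CAT": "A3A", "GTC": "A3G", "ACG": "Ctrl"}
UID_LEN = 15  # randomized nucleotides forming the Unique Identifier


def split_barcode(read):
    """Split a read into (uid, condition, insert).

    ASSUMPTION: the read begins with the 15-nt UID followed directly by
    the 3-nt condition tag. Returns None when the read is too short or
    the tag is not one of the four known trinucleotides.
    """
    if len(read) < UID_LEN + 3:
        return None
    uid = read[:UID_LEN]
    tag = read[UID_LEN:UID_LEN + 3]
    condition = CONDITION_TAGS.get(tag)
    if condition is None:
        return None
    return uid, condition, read[UID_LEN + 3:]


if __name__ == "__main__":
    demo_read = "ACGTACGTACGTACG" + "CAT" + "GGGTTT"  # made-up read
    print(split_barcode(demo_read))
```

Reads for which the function returns `None` would simply be set aside rather than assigned to a condition.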
Analysis of Safe-Sequencing-System
To distinguish true mutations from random PCR and sequencing errors,
we followed the approach reported in [50]. The details of our implementation of the method
are described in [51]. We wrote Python scripts to analyze the sequencing data. We only considered
those sequencing reads such that (1) at least 85% of the bases matched the reference sequence, and
(2) the quality scores for all the UID bases were 30 or greater (probability of a sequencing error <
0.001). We clustered reads with the same UID and barcode into UID families. We only considered
those families with at least three reads with the same UID and barcode. At each nucleotide site,
the mutation frequency is calculated by dividing a numerator by a denominator. The denominator
is the number of UID families that, at this particular nucleotide site, have at least three reads with
quality scores of at least 20 (probability of a sequencing error < 0.01; because of this quality
restriction, the denominator may be different at different sites). The numerator is the number of
UID families that, at this particular site, (1) have at least three reads with quality scores of at least
20, and (2) 95% of these reads have the same base, which differs from the reference. The
probability that three out of three reads will all have the same sequencing error at a site is then
approximately $10^{-7}$ ($= 0.01^{3}/3^{2}$).
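The per-site calculation described above can be sketched as follows. This is not the authors' script: the input format (tuples of UID, barcode, sequence, and per-base quality scores) and the function name are assumptions, the read-level filters (85% reference match, UID quality of 30 or greater) are presumed to have been applied already, and the 95% rule is read as "at least 95%".

```python
from collections import defaultdict

MIN_READS = 3         # reads required per UID family at a site
MIN_QUAL = 20         # per-base quality threshold at the site
MIN_AGREEMENT = 0.95  # ASSUMPTION: interpreted as "at least 95%"


def site_mutation_counts(reads, reference, site):
    """Return (numerator, denominator) for one 0-based site.

    reads: iterable of (uid, barcode, seq, quals) tuples, a hypothetical
    pre-filtered input format. The mutation frequency is
    numerator / denominator when the denominator is non-zero.
    """
    families = defaultdict(list)
    for uid, barcode, seq, quals in reads:
        # Only bases passing the quality threshold at this site count.
        if site < len(seq) and quals[site] >= MIN_QUAL:
            families[(uid, barcode)].append(seq[site])

    numerator = 0
    denominator = 0
    for bases in families.values():
        if len(bases) < MIN_READS:
            continue
        denominator += 1
        top = max(set(bases), key=bases.count)  # most common base
        agreement = bases.count(top) / len(bases)
        if top != reference[site] and agreement >= MIN_AGREEMENT:
            numerator += 1
    return numerator, denominator
```

Because the quality filter is applied per site, the denominator can differ from site to site, as noted in the text.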
Caco-2 Stable cell line expressing APOBEC proteins
We used lentiviral transduction to construct stable Caco-2 cell lines expressing A3A, A3G,
and A1+A1CF to study the effect of APOBEC on SARS-CoV-2 replication because Caco-2
expresses the virus receptor ACE2 [79]. Lentivirus was produced with the lentiviral vector system
pLVX-TetOne-Puro (Clontech) in HEK293T cells. The cells (about 2 x 10^6 cells) were seeded in
a 100 mm plate one day before transfection. The cells were then co-transfected with lentiviral
packaging vectors, 1.0 μg of pdR8.91 (Gag-Pol-Tat-Rev, Addgene), 0.5 μg of pMD2.G (VSV-G,
Addgene), and 1.7 μg of the pLVX-TetOne-Puro vector encoding the APOBEC proteins, using 20
μL of X-tremeGENE 9 transfection reagent (Sigma). Lentivirus-containing supernatant from the
transfected HEK293T cells was collected after 70 hrs and filtered through a 0.45 μm PVDF filter
(Millipore). Virions were precipitated with NaCl (0.3 M final) and PEG-6000 (8.5% final) at 4°C
for 6 hrs and centrifuged at 4000 rpm at 4°C for 30 min. The pelleted virions were resuspended
in 100 μL of MEM medium. Caco-2 cells (human colon epithelial cell line, ATCC) were cultured
in MEM medium supplemented with 10% FBS, streptomycin (100 μg/mL), and penicillin
(100 U/mL), and maintained at 37 ℃, 5% CO2. The Caco-2 stable cell lines were generated by
transduction with the lentivirus for 24 hrs and selection with 5 µg/mL of puromycin. The expression
of A1+A1CF, A3A, or A3G was induced by adding 1 μg/mL doxycycline for 24 - 96 hrs.
Expression of these APOBEC proteins was verified by Western blot.
The ∆A3A Caco-2 cell line was created by puromycin selection after targeting the N-terminus
of the genomic A3A exon-2 region with CRISPR-Cas9 (guide RNA sequence:
UGGAAGCCAGCCCAGCAUCC) and inserting the SV40-promoter-puromycin resistance gene
(938 bp) through homology-directed repair (HDR) (left homology arm: 703 bp and right
homology arm: 561 bp). A randomized guide RNA was used to generate a Caco-2 cell line as a
negative control.
SARS-CoV-2 virus replication and progeny production
SARS-CoV-2 propagation, infection, and viral titration were performed as previously
described [80]. All SARS-CoV-2-related experiments were performed in the biosafety level 3
(BSL-3) facility (USC). For SARS-CoV-2 propagation, Vero E6-hACE2 cells were used. The cells
were plated at 1.5 x 10^6 cells in a T25 flask for 12 hr and infected with SARS-CoV-2 (isolate
USA-WA1/2020) at an MOI of 0.005 in FBS-free DMEM medium. Virus-containing supernatant was
collected when the virus-induced cytopathic effect (CPE) reached approximately 80%. To assess the
effect of APOBEC (A1+A1CF, A3A, and A3G) on SARS-CoV-2 RNA replication, the
Caco-2-APOBEC stable cells (about 2 x 10^5 cells) were plated in 12-well plates. After 15 hours,
cells were treated or left untreated with doxycycline for 24 hours before infection. Before viral
infection, the cells were washed once with FBS-free medium. The infection was carried out on a
rocker for 45 min at 37 °C. The cells were then washed and incubated in a medium containing 10% FBS
with or without doxycycline. Total cellular RNA was extracted from the infected cells at 24, 48, 72,
and 96 hrs. Real-time quantitative PCR (qPCR) was used to quantify the viral RNA abundance at
the four different time points using viral RNA-specific primers to detect the Nsp12, S, and N
regions. qPCR of the internal actin RNA, using actin-specific primers, was used as a
control. To assess the effect of APOBEC (A1+A1CF, A3A, and A3G) on SARS-CoV-2
viral progeny production, a plaque assay was performed on Vero E6-hACE2 cells, which have defective
innate immunity and are highly sensitive to viral infection, allowing sensitive quantification of viral
progeny produced from the Caco-2 cell lines. Vero E6-hACE2 cells were seeded in 12-well plates.
Once the cells reached confluence, they were infected with serially diluted SARS-CoV-2 virions
collected from the infected Caco-2-APOBEC stable cells that express A1+A1CF, A3A, or A3G at
48 hrs and 72 hrs after viral infection. The medium was removed after infection, and overlay
medium containing FBS-free 1 x DMEM and 1% low-melting-point agarose was added. At 48 and
72 h post-infection, cells were fixed with 4% paraformaldehyde (PFA) overnight and stained with
0.2% crystal violet. Plaques were counted on a lightbox.
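Two routine calculations sit behind this protocol: the amount of infectious virus needed for a given MOI, and the conversion of plaque counts into a titer. The sketch below uses the standard textbook relationships. The function names are hypothetical, and the dilution and inoculum volume used in any example call are illustrative parameters, since the text does not state them.

```python
def inoculum_pfu(moi, n_cells):
    """Infectious units (PFU) required to infect n_cells at a given MOI."""
    return moi * n_cells


def titer_pfu_per_ml(plaques, dilution, inoculum_ml):
    """Titer (PFU/mL) from a plaque count.

    dilution is the dilution fraction of the plated sample (e.g. 1e-3
    for a 1:1000 dilution) and inoculum_ml is the volume plated per well.
    Both are experiment-specific values not given in the text.
    """
    if dilution <= 0 or inoculum_ml <= 0:
        raise ValueError("dilution and inoculum volume must be positive")
    return plaques / (dilution * inoculum_ml)


if __name__ == "__main__":
    # MOI 0.005 on 1.5 x 10^6 cells, as in the propagation step above.
    print(inoculum_pfu(0.005, 1.5e6))
```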
Quantitative real-time PCR
Total RNA was extracted from the SARS-CoV-2-infected Caco-2 cells using Trizol
(Thermo Fisher). The extracted RNA was then reverse transcribed with reverse primers specific
to the Nsp12, S, and N coding regions of SARS-CoV-2, and to β-actin as an internal control,
using the high-fidelity reverse transcriptase ProtoScript II (NEB). The reaction was performed in
a volume of 20 µl containing 1 µg of total RNA, 100 µM reverse primer, 1X ProtoScript II buffer,
10 mM dNTP, 0.1 M DTT, 8 U RNase inhibitor (40 U/µl), and 200 U ProtoScript RT for 1 hr at 42 ℃.
Quantitative real-time PCR was then performed with SYBR Green (PowerUp™ SYBR™ Green
Master Mix, Thermo Fisher Scientific) in a volume of 10 µl/well containing 1 µl of the reverse
transcribed cDNA product from above, 0.25 µl of forward and reverse primers (10 µM), and 5 µl
of PowerUp™ SYBR™ Green Master Mix (2X) using a CFX Connected Real-Time PCR machine
(Bio-Rad). The expression levels of the indicated genes (Nsp12, S, N) were calculated by the
2^(-ΔΔCt) method and normalized to the β-actin expression level.
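For readers unfamiliar with the 2^(-ΔΔCt) calculation, a minimal sketch follows. The function and argument names are hypothetical; the reference gene corresponds to β-actin in this protocol, and which sample serves as the control (calibrator) is a choice of the experimenter that the sketch leaves open.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_control, ct_reference_control):
    """Fold change of a target gene by the 2^(-ddCt) method.

    ct_target / ct_reference: Ct values in the sample of interest
    (e.g. a viral gene and beta-actin).
    ct_target_control / ct_reference_control: the same two Ct values
    in the control (calibrator) sample.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** (-delta_delta_ct)
```

A target that crosses threshold two cycles earlier than in the control, with an unchanged reference gene, gives a fold change of 4.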
Western blot and antibodies
For Western blot analysis, cells were lysed in 1x RIPA buffer (Sigma). Western blot analyses
were performed on three independent transfections using FLAG-tagged APOBECs and
HA-tagged A1CF, with α-tubulin as the internal loading control. The lysates were subjected to Western
blot with anti-FLAG M2 mAb (F3165, Sigma, 1:3,000), anti-HA mAb (HA.C5, Abcam, 1:3,000), and
anti-α-tubulin mAb from mouse (GT114, GeneTex, 1:5,000) as primary antibodies. Cy3-labelled
goat-anti-mouse mAb (PA43009, GE Healthcare, 1:3,000) was subsequently used as a secondary
antibody. Cy3 signals were detected and visualized using a Typhoon RGB Biomolecular Imager (GE
Healthcare).
Analysis tools
The sequence logos were created with the WebLogo 3 online tool using probability units
(http://weblogo.threeplusone.com, [81]). The predicted RNA secondary structures were calculated
with RNAstructure [58, 82] for the local RNA region. The number of mutational events on the
SARS-CoV-2 RNA segment 5’UTR-Orf1a was counted from SARS-CoV-2 genome sequence data
(the Nextstrain datasets from Dec. 2019 to Jan. 22nd, 2022, downloaded from the GISAID database,
https://www.gisaid.org/hcov19-variants/ and https://nextstrain.org/ncov/global) using parameters
set with nucleotide (x-axis) and events (y-axis). The sequencing frequency chart was generated by
coloring the corresponding nucleotide by Genotype (see the Nextstrain
datasets: https://nextstrain.org/ncov/global, [60]).
Chapter II: Decoding APOBEC3’s RNA Editing Enzyme-Substrate Dynamics
Adapted from a publication: Unraveling the Enzyme-Substrate Properties of a Specific
APOBEC3A-mediated RNA Editing. Kyumin Kim, Alan B. Shi, Kori Kelley, Xiaojiang S. Chen
(2023) Journal of Molecular Biology
Chapter II: Introduction
RNA editing through deamination is a process that introduces single-nucleotide changes in
transcripts, playing a crucial role in various biological functions by diversifying and regulating the
transcriptome [83-85]. In humans, RNA deaminases can be classified into two main categories:
ADARs (adenosine deaminases acting on RNA), which modify adenosine to inosine (A-to-I) within
double-stranded RNA, and the APOBEC protein family (apolipoprotein-B (ApoB) mRNA editing
enzyme, catalytic polypeptide-like), which converts cytosine to uracil (C-to-U) in single-stranded
RNA [42, 86]. The A-to-I (detected as A-to-G in sequencing) RNA editing mechanism mediated by
ADARs has been extensively investigated, and its physiological functions, such as modifying non-coding
repetitive regions and glutamate receptor GluR-B mRNA, have been relatively well characterized
[86-88]. The discovery of a truncated form of the APOB protein resulting from catalytic
deamination (C-to-U editing) of ApoB mRNA by a novel enzyme shed light on the existence of
C-to-U RNA editing [39-41]. The first identified C-to-U RNA editing enzyme was subsequently
named APOBEC1 (A1), for “ApoB mRNA editing enzyme, catalytic polypeptide 1”, and
serves a crucial role in lipid metabolism [42]. Subsequently, the remaining members of the
APOBEC protein family were named based on this founding member. The functional significance
of APOBEC-mediated C-to-U editing has been extensively investigated, and its involvement in
various biological processes is increasingly recognized.
The human APOBEC protein family consists of a total of 11 members, each serving distinct
biological roles and participating in various physiological processes by catalyzing the deamination
of cytosine to uracil in single-stranded DNA and/or RNA [27, 37, 38]. Initially, apart from A1,
other APOBEC proteins were primarily considered to possess single-stranded DNA editing
activity without involvement in RNA editing [43]. However, in 2015, two separate studies reported
distinct RNA editing activities mediated by APOBEC3A (A3A). In the study by Niavarani et al.,
mutations were detected at positions G1303A and G1586A in the cDNA of the WT1 transcript
(Wilms’ Tumour 1) in non-progenitor blood mononuclear cells, where A3A expression levels were
elevated [89]. The guanine-to-adenine modifications on WT1 mRNA were confirmed by
demonstrating that knockdown or overexpression of A3A led to inhibition or enhancement of the
WT1 G1303A mutation, respectively [89]. Another group examined hundreds of C-to-U RNA
editing candidates in monocytes and macrophages under conditions of high A3A expression using
next-generation RNA-seq technology and validated the editing on SDHB mRNA (C136U),
resulting in an early stop codon (R46X) [45]. Consistent results were obtained when A3A was
transiently overexpressed in HEK293T cells, where thousands of C-to-U RNA editing events were
observed [44]. Subsequently, the same research group reported novel RNA editing functions
mediated by APOBEC3G (A3G) in HEK293T cells [46] and demonstrated widespread
A3G-induced RNA editing in response to mitochondrial hypoxic stress in natural killer cells [90],
thereby indicating the potential for RNA editing by other APOBEC enzymes.
A3A has been documented to restrict foreign DNA from pathogens by catalyzing C-to-U
deamination [91-94]. Aberrant overexpression or dysfunction of A3A has also been implicated in
genomic DNA mutations and cancer [95, 96]. Furthermore, analysis of mesoscale genomic features
has revealed that A3A-induced mutations tend to occur in specific “ssDNA hairpin” structures,
which can serve as hotspots for the development of oncogenic mutations [97]. With the advent of
next-generation sequencing technology, identification of A3A-mediated RNA editing sites has
become possible, and these sites display a preference for stem-loop structures similar to those
observed in A3A-mediated ssDNA deamination [44, 45, 57, 97]. However, despite the general
understanding of A3A’s affinity for stem-loop RNA substrates, the specific RNA sequences that
constitute preferred substrates for A3A have not been fully characterized.
Given the well-established link between elevated A3A expression and IFN-γ-related
immune responses [20, 98], as well as the potential involvement of A3A in cancer development
[95, 96, 99, 100], and its impact on the evolution and infection of RNA viruses [101-103], it is
crucial to investigate the relationship between RNA substrate characteristics and A3A editing
efficiency. In this study, our objective was to examine the enzyme-substrate features underlying
A3A-induced RNA editing using a recently developed sensitive cell-based fluorescence assay. By
analyzing the implications of A3A-mediated RNA editing in human cells, we aimed to shed light
on the functional aspects of A3A’s role in RNA modification.
Chapter II: Results
Confirmation of A3A RNA editing with a sensitive cell-based fluorescence assay
Figure 2.1. A sensitive cell-based fluorescence analysis and sequencing method confirms
A3A-mediated specific RNA editing.
(A) Design of the sensitive cell-based fluorescence assay for specific RNA editing by APOBEC proteins.
When the target RNA acquires an early stop codon due to C-to-U deamination by the editor
(A3A), the reporter eGFP can be localized throughout the cell, including the nucleus (right panel). (B)
Confocal microscopy images show eGFP and mCherry signals after co-transfection of the editor vector
(A3A in this case) and each reporter vector (scale bar: 10 µm). (C) Quantification of the change in the eGFP
nuclear/cytoplasmic fluorescence ratio indicates A3A-mediated target C-to-U editing levels.
Statistical significance was calculated by unpaired two-tailed Student’s t-test with P-values represented as:
ns, not significant (P > 0.05); ** P < 0.01; *** P < 0.001; and **** P < 0.0001. (D) Sanger sequencing
data showing the percentage of each nucleotide at the target C site, analyzed from DNA and RNA, respectively.
The nucleotide percentage was quantified from the area of each nucleotide peak in the sequencing chromatogram.
We utilized a cell-based fluorescence assay [49] to investigate A3A RNA editing. This
assay involved the expression of an editor construct containing A3A and mCherry, as well as a
reporter construct containing the target RNA, in HEK293T cells (Figure 2.1A). The reporter
construct consisted of eGFP RNA followed by a 48-nucleotide target RNA substrate and an RNA
encoding a strong nuclear export signal (NES, MAPKK), all within a single open reading frame.
As a result of the C-terminal NES, the reporter GFP fluorescence primarily localized to the
cytoplasm (Figure 2.1A). However, when APOBEC deaminated the target RNA from C to U, the
reporter mRNA acquired an early stop codon before the NES, leading to a shift of the reporter
GFP signal to the nucleus [49]. To specifically identify spliced mature mRNA sequences, an AAV
intron was inserted within eGFP, and an exon-exon junction primer was employed for cDNA
preparation [102] (Figure 2.1A). This allowed the amplification of RNA sequences only, avoiding
amplification of DNA sequences during analysis. For the initial editing test, we selected six target
RNA candidates previously reported by Sharma et al. [44, 45], namely ASCC2, EVI2B, ICAM3,
PPA2, SDHB, and SETX. These candidates exhibited relatively high levels of C-to-U editing in
RNA-seq data, and their sequences were capable of generating a stop codon following C-to-U
RNA editing.
The findings revealed significant A3A editing in all but one (SETX) of the RNA targets, as
evidenced by the ratio of averaged cytosolic and nuclear fluorescence signals (Figure 2.1B-1C).
Notably, EVI2B and SDHB exhibited substantially higher levels of A3A editing compared to the
other three edited targets (ASCC2, ICAM3, and PPA2). To further confirm editing of the six target RNA
substrates, reporter DNA and mRNA were extracted separately from the cell lysates for Sanger
sequencing. The sequencing results confirmed that EVI2B and SDHB had the highest levels of
A3A editing, with RNA editing rates of 41.4% and 37.6%, respectively (Figure 2.1D and Figure
2.2A). ASCC2, ICAM3, and PPA2 displayed moderate RNA editing activity, with RNA editing
rates of 16.3%, 12.8%, and 11.5%, respectively. In contrast, SETX exhibited a marginal editing
rate of 3.0%, which was not statistically significant compared to the control (1.7%) (Figure 2.1D
and Figure 2.2A). The A3A editing levels estimated using the fluorescent reporter assay correlated
well with the results obtained from Sanger sequencing of the reporter RNA, with a coefficient of
determination of $R^{2} = 0.9000$, indicating that this fluorescence reporter assay is suitable for evaluating
A3A-mediated RNA editing activity (Figure 2.2B).
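The two quantities used in this comparison can be sketched in a few lines. The first function converts chromatogram peak areas at the target site into a percentage of edited reads, following the peak-area approach stated in the Figure 2.1D legend; the second computes a squared Pearson correlation such as the R² reported for Figure 2.2B. Function names and input formats are hypothetical, and no measured values from the study are reproduced here.

```python
def percent_edited(peak_areas):
    """Percent C-to-U editing from Sanger peak areas at the target site.

    peak_areas: dict of base -> peak area, e.g. {"C": 60.0, "T": 40.0}.
    In cDNA an edited U is read as T, so the T fraction is reported.
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return 100.0 * peak_areas.get("T", 0.0) / total


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)
```

In use, `x` would hold the per-target reporter fluorescence ratios and `y` the corresponding Sanger editing percentages.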
Figure 2.2. Analysis of A3A-mediated RNA editing substrates.
(A) Representative RNA (cDNA) Sanger sequencing chromatogram showing the extent of C-to-U editing
at a specific target C site (indicated by red arrow) under control (Ctrl) and A3A conditions. (B) Correlation
between the subcellular localization of eGFP reporter fluorescence (x-axis) and the levels of C-to-U RNA
editing measured by Sanger sequencing (y-axis). Positive correlation demonstrates the effectiveness of the
eGFP reporter assay in evaluating RNA editing levels.
To assess the extent of A3A editing on endogenous DNA and RNA within cells, we focused
on the SDHB gene, which is known to be highly expressed in HEK293T cells (Figure 2.3A).
Genomic SDHB DNA and mRNA were extracted from HEK293T cells for Sanger sequencing.
Surprisingly, no detectable editing was observed at the SDHB genomic DNA target site, even when
using a uracil glycosylase inhibitor (UGI) to inhibit the repair of DNA mismatches caused by
APOBEC-induced C-to-U DNA mutations [104] (Figure 2.3B). However, we observed substantial
levels of editing at the SDHB mRNA target site, both in the absence and presence of UGI treatment
(% of U at C site: 34.0% and 32.9%, respectively) (Figure 2.3B). These findings suggest that A3A
efficiently edits the SDHB mRNA rather than its DNA at this specific site within cells. This
discrepancy may be attributed to the predominantly double-stranded state of genomic DNA,
making it less susceptible to A3A-mediated editing compared to mRNA.
Figure 2.3. Comparison of A3A editing events on the endogenous SDHB mRNA transcript and its
genomic DNA (gDNA).
(A) A cartoon representation of endogenous SDHB gDNA and mature mRNA is shown, highlighting
features of exons and introns, and depicting the primer sets (blue arrows) flanking the target sites (red
arrow). Below is the sequence containing a total of 20 C sites surrounding the target C. (B) Heatmap
representing the editing levels by A3A on gDNA and mRNA. The target C is the 10th C and editing levels
of surrounding Cs were also analyzed. RNA editing is observed only at the specific target C site by A3A.
A3A targeted RNA substrates are not sufficiently edited by other APOBECs
The human APOBEC protein family comprises 11 members sharing a conserved cytidine
deaminase motif, exhibiting significant sequence homology, and featuring a core structure
consisting of a five-beta-stranded sheet surrounded by six helices [38, 105]. These shared
characteristics raised the possibility that other APOBEC members might possess the ability to
edit the specific RNA target. To investigate this, we examined the RNA editing capacities of all 11
APOBEC proteins using a reporter system based on EVI2B in our cell-based assay, considering
that EVI2B demonstrated the highest levels of editing by A3A (Figure 2.1).
Figure 2.4. C-to-U RNA Editing on EVI2B RNA substrate by all APOBEC family members.
(A) Quantification of EVI2B RNA editing by all other APOBEC proteins using the eGFP
nuclear/cytosolic ratio change assay. Data are represented as mean values ± SD, with individual cells
indicated by dots (n=30 cells). Statistical significance was calculated between control (Ctrl) and each
APOBEC protein by unpaired two-tailed Student’s t-test with P-values represented as: ns, not significant
(P > 0.05); and **** P < 0.0001. (B) Quantification of the nucleotide distribution at the EVI2B target C
site analyzed by RNA (cDNA) sequencing.
The results demonstrated that only A3A displayed a significant shift in fluorescence
intensity from the cytoplasm to the nucleus, with an average nuclear-to-cytosolic fluorescence ratio
of 0.661 (0: cytosol; 1: whole cell), whereas all other APOBEC proteins showed no discernible
differences compared to the control group (Figure 2.4A). This observation was further supported
by Sanger sequencing analysis of the EVI2B reporter transcripts, which revealed that, apart from
A3A (exhibiting a C-to-U editing rate of 43.4%), none of the other APOBEC members exhibited
editing activity at the EVI2B site (Figure 2.4B). Western blot analysis confirmed the expression
of 10 APOBEC proteins, with the exception of APOBEC4 [106].
A3A performs most of its RNA editing in the cytoplasm
It has been reported that overexpressed A3A localizes to both the nucleus and cytoplasm
[107, 108], while endogenous A3A is predominantly found in the cytoplasm of monocytic cells
[109]. In contrast, A3B is mainly localized in the nucleus, which is primarily attributed to its unique
N-terminal domain, specifically the first 30 amino acids of A3B or loop 5/α-helix 3 of A3B [110].
Based on these distinct subcellular localizations, we postulated that localization might
affect the RNA editing activity mediated by A3A and that A3B, despite sharing approximately
92% amino acid identity with A3A in its C-terminal A3B-CD2 domain [27, 38], might not exhibit
RNA editing activity when localized in the nucleus. To investigate this hypothesis, we evaluated
RNA editing activity with constructs in which a strong nuclear export signal (NES) or nuclear
localization signal (NLS) was fused to the C-terminus of either A3A or A3B.
We observed that A3A-NES, primarily localized in the cytoplasm, exhibited RNA editing
activity at the EVI2B site comparable to that of A3A-WT, as evidenced by both the reporter assay
(average ratios, WT : NES = 0.622 : 0.619) and cDNA sequencing results (C-to-U editing levels,
WT : NES = 42.2% : 40.1%) (Figure 2.5A-5C). In contrast, A3A-NLS, which was predominantly
localized in the nucleus, showed significantly diminished RNA editing activity, with an average
ratio of 0.332 (P value = 0.011 vs. Ctrl) in the reporter assay (Figure 2.5B) and a C-to-U RNA
editing rate of 6.6% in the sequencing analysis (Figure 2.5C). These findings suggest that A3A
demonstrates more pronounced RNA editing activity within the cytoplasmic compartment (Figure
2.5A-5C). This observation could explain why the engineered A3A-based programmable
cytidine-specific RNA editor, which incorporates a strong 4x NLS, exhibits lower editing levels and a
reduced number of off-target editing sites [111].
Figure 2.5. A3A-mediated RNA editing primarily occurs in the cytoplasm.
(A) Representative fluorescence images of HEK293T cells co-transfected with the editor vectors
(A3A-WT, A3A-NES, A3A-NLS, A3B-WT, A3B-NES, and A3B-NLS) labeled with mCherry and the EVI2B
reporter vector labeled with eGFP (scale bar: 10 µm). (B) Quantification of the subcellular localization of
the eGFP reporter (nuclear/cytosolic ratio). Statistical significance was indicated without a scale for
comparison with Ctrl, and comparison with each wild type was indicated with scales. (C) Nucleotide
distribution at the target EVI2B C site from sequencing, used to analyze C-to-U editing levels. (D) Quantification of
subcellular localization of each mCherry-labeled editor. (E) Western blot showing ectopic transient
expression of A3A/B in HEK293T cells. Relative expression intensities were quantified as mean values ±
SD from duplicate independent experiments (n = 2) and normalized to the endogenous tubulin loading control.
Contrary to our hypothesis regarding A3B, we did not observe any discernible RNA
editing activity when A3B was directed either to the cytoplasm or to a more pronounced nuclear
localization (Figure 2.5A-5D). This suggests that the approximately 8% difference in amino acid
residues between A3A and the catalytic A3B-CD2 domain may play a crucial role in conferring
specific RNA editing activity. Alternatively, the lower expression level of A3B (approximately
three times lower) compared to A3A in HEK293T cells may result in undetectable RNA editing
activity (Figure 2.5E).
A3B chimeras containing an N-terminal A3A region gain RNA editing activity
We next asked whether specific sequences or regions unique to A3A are important
in determining A3A-mediated RNA editing. To investigate this,
we created A3B chimeras in which the C-terminal A3B-CD2 domain contains different A3A
fragments to see whether any RNA editing activity can be transferred from A3A to the A3B
chimeras. Because A3B-CD2 is highly homologous to the A3A sequence, only the two N-terminal
regions (Region-1 and Region-2) with major sequence differences between the two proteins were
swapped (Figure 2.6A). Three A3B chimeras were generated: Region-1, Region-2, and Region-1&2.
In chimera Region-1 (R1), the α1/loop-1 residues of A3B-CD2 are replaced with those of A3A
(A3B residues 192-214 to A3A residues 16-30); in chimera Region-2 (R2), the β-2 residues of
A3B-CD2 are changed to those of A3A (A3B residues 227-242 to A3A residues 45-58); and
chimera Region-1&2 (R1&2) combines both the R1 and R2 replacements (Figure 2.6A and Figure 2.7A).
We then tested the RNA editing activity of these A3B chimeras on two efficient RNA substrates
for A3A, EVI2B and SDHB.
Interestingly, we found that R1 showed a gain of function for RNA editing on both
substrates, whereas R2 showed no significant RNA editing (Figure 2.6B-6D). However, R1&2 showed
markedly higher RNA editing levels than R1 in both the reporter assay (Figure
2.6C) and cDNA (RNA) sequencing results (Figure 2.6D), suggesting that R1 exerts a critical
influence on RNA deamination activity and that R2 acts synergistically to enhance R1’s RNA
editing. Unlike the results for A3A-WT shown in Figure 2.1, the R1 and R1&2 chimeras deaminated
SDHB mRNA more strongly than EVI2B (Figure 2.6B-6D and Figure 2.7B), indicating that the
double-domain A3B R1&2 chimera may have a slightly different mode of targeting specific RNA
substrates compared to the single-domain A3A.
Figure 2.6. Specific RNA editing activity of A3B chimeras containing short A3A N-terminal peptide
sequences.
(A) Schematic descriptions of the A3B chimera editor vectors (right panel, A3B-A3A-R1/-R2/-R1&2) and
RNA secondary structures of the reporter substrates (left panel, EVI2B and SDHB, with the target C
site marked in red). (B) Cell fluorescence images showing the subcellular localization of eGFP, which indicates RNA editing
activity on the EVI2B and SDHB reporters, and the localization of each mCherry-labeled editor. Nuclear
periphery is stained with Hoechst 33342 to distinguish nuclear and cytoplasmic regions (scale bar: 10
µm). (C) Quantification of the subcellular localization of the eGFP reporter (nuclear/cytosolic ratio).
Data are represented as mean values ± SD indicated by dots (n=30 cells). (D) Nucleotide
distributions at the target C site of EVI2B and SDHB from sequencing, used to measure C-to-U editing levels. (E)
Quantification of subcellular localization of each mCherry-labeled editor (A3B-WT, A3B-A3A-R1/-R2/-R1&2).
Data are represented as mean values ± SD indicated by dots (n=30 cells). Statistical significance
in (C) and (E) was calculated by unpaired two-tailed Student’s t-test, with P-values represented as: ns (not
significant), P > 0.05; *, 0.01 < P < 0.05; **, P < 0.01; ***, P < 0.001; and ****, P < 0.0001.
It is worth noting that our previous studies have shown that residues around the α1/loop-1
of A3A and A3B-CD2 are important in regulating overall DNA deaminase activity and mC
selectivity [112, 113]. In this study, we found that the α1/loop-1 residues of A3A also play a
crucial role in the specific A3A-mediated RNA editing activity. When the R1 region
was further narrowed down, the GIGRHK residues of A3A (residues 24-29), which form the loop 1 region,
were identified as the most critical residues (Figure 2.6C).
When an NES peptide was fused to the C-terminus of these A3B chimera constructs, we
found that their RNA editing levels increased by about 5-20%, possibly as a result of the shift in
localization from the nucleus to the cytoplasm (Figure 2.7C). These results indicate
that the gain of RNA editing activity by these A3B chimeras is not due to subcellular localization,
nor to protein expression levels, as all constructs showed comparable expression levels in the cells
(Figure 2.7D). Interestingly, when the loop 1 region of A3A was replaced with that of A3B-CD2
(replacing A3A-GIGRHK with A3B-DPLVLRRRQ), this A3A chimera behaved similarly to the
inactive A3A-E72A mutant, showing no detectable specific RNA editing activity (Figure 2.8),
further confirming the critical role of the loop 1 region of A3A in RNA editing activity.
Unexpectedly, the subcellular localization of R1&2 (mCherry signal) was significantly
shifted from the nucleus to the cytoplasm, a more pronounced change in subcellular
localization than that of R1 or R2 alone (Figure 2.6B and 2.6E). This finding suggests that, in addition
to the nuclear localization signals present in A3B-CD1 [110], A3B subcellular localization can be
regulated by the CD2 R1 and R2 regions.
Figure 2.7. Specific RNA editing activity of A3B chimera constructs with mutations that convert the
A3B-CD2 domain to A3A-like sequences.
(A) Sequence alignment of the N-terminal region of A3A and A3B-CD2. Regions 1 and 2 represent
the most divergent areas, with differing amino acid residues shaded in yellow. Structural features are
indicated above the sequence. (B) Evaluation of RNA editing for a reporter construct containing both
EVI2B and SDHB. The left diagram shows the features of the reporter vector, and the heat map on the right
represents the RNA editing levels of EVI2B and SDHB for each editor condition. (C) Quantification of
RNA editing activities on EVI2B (left panel) and SDHB (right panel) with A3B and each A3B
chimera. (D) Western blot analysis measuring the expression of A3B and each A3B chimera. FLAG
indicates editor expression, and tubulin represents the housekeeping control. Protein size (kDa) is
indicated on the right side.
Figure 2.8. Lack of RNA editing activity by catalytically inactive A3A-E72A and by an A3A chimera
in which the loop 1 region is replaced with that of A3B-CD2.
(A) Representative fluorescence images of cells co-expressing the EVI2B reporter (eGFP) and either
A3A-E72A (inactive) or the A3A-A3B-loop1 mutant (mCherry). (B) Quantification of the eGFP reporter signal
representing RNA editing levels. (C) RNA (cDNA) sequencing results show no RNA editing activity at the
EVI2B target C site for either A3A-E72A (inactive) or the chimeric construct in which A3A loop 1 is replaced
with the A3B loop 1 sequence.
Structural and sequence context of RNA substrates for optimal A3A editing
Several previous studies reported that A3A targets specific RNA substrates with certain
stem-loop secondary structures and sequence specificities [44, 45, 57, 102, 114]. However, due to
limitations in the design of RNA/transcript substrates, it has been difficult to understand in detail
which characteristics of these targets account for the varying editing efficiency by A3A. Even for
ssDNA deamination by A3A, a DNA stem-loop secondary structure was shown to be the preferred
substrate, which was hypothesized to be the reason for hotspot mutations in cancers driven
by APOBEC mutational signatures [97]. To better understand the properties of RNA substrates for
A3A editing, we examined the effect of different parts of an RNA stem-loop structure on A3A
editing using the cell-based fluorescence reporter assay.
We compared the editing activity on the EVI2B RNA after adjusting the lengths of its loop
(LP) and stem (ST). The original EVI2B RNA target has a predicted loop length of 5 nt (LP5) with
the sequence CAUCA, in which the target C is the second C. We found that changing LP5 to LP4
(CAUC) produced a dramatic increase in RNA editing activity by A3A (Figure 2.9A).
Interestingly, the editing activity of the original EVI2B LP5 was intermediate between that of
LP4 and LP3, possibly due to C-A+ wobble base pairing in the RNA stem region (between the
first C and the final A of CAUCA) [115] (Figure 2.9A).
To investigate the effect of stem length on A3A-mediated RNA editing, we modified the
EVI2B target RNA stem length from the original 4 bp (ST4) to a range between 3 and 7 bp (ST3-ST7).
Our results showed that when the stem was shortened from ST4 to ST3, RNA editing activity
decreased sharply to only about 30% of the ST4 level, whereas RNA editing levels increased as
the stem lengthened to ST7. These findings suggest that longer stems may form a more stable
hairpin structure, and that a more stable hairpin may give A3A more opportunity for editing
(Figure 2.9A). Therefore, we conclude that an RNA hairpin structure with a longer stem (up to ST7),
a loop size of 4 nt (LP4), and the target C at the 3’ end of the loop shows higher editing efficiency
by A3A. These results are consistent with the previous report [57].
Figure 2.9. Structural and sequence features for optimal A3A-mediated RNA editing events.
(A) Relative A3A-mediated RNA editing activity as a function of loop length and stem length, with the
C-to-U editing level of the original EVI2B set to 100%. The predicted Gibbs free energy of
secondary structure formation was calculated for the 48 nt substrates and is indicated in kcal/mol.
(B) Relative editing levels shown as heatmaps according to sequence modifications around the target C
(-4, -3, -2, -1, and +1) of the EVI2B substrate. Predicted RNA secondary structures are indicated on the left; solid lines
indicate RNA double helix and dotted lines indicate possible C-A+ wobble base pairing. (C)
Quantification of RNA editing levels by A3A for sequence variations surrounding the target C of
EVI2B. Editing levels are represented by the proportion of U at the target C site, quantified from
Sanger sequencing results. (D) Measurement of RNA editing in response to sequence variations around the
target C in the modified tetraloop and stem 5 of EVI2B and (E) by combining the sequences of the top
two preferred targets at each position. The relative editing efficiencies are represented in bar graphs, and
the quantified editing levels of the target C are indicated by heat maps. Sequence changes are
shaded in blue, and the target C is shaded in red. (F) Groups and cartoon depictions
of preferred sequences at each position surrounding the target C.
We further examined the sequence context around the target C site in the loop for A3A
editing. The loop sequence of the original EVI2B RNA target was modified with different
nucleotides. Specifically, with the target C position set to 0 in the LP5 sequence, we investigated
the effect of the nucleotides at positions -3, -2, -1, and +1 (Figure 2.9B-9C). Our results
showed that the presence of U at the -1 position is the most crucial sequence context for A3A
targeting (Figure 2.9B-9C), which is not surprising, as A3A is known to target the C of the TC motif
in ssDNA [93, 116]. A recent study reported that A3A prefers the DNA substrate motif YTCA,
where Y at the -2 position is a pyrimidine, and thus the -2 position favors C/T [95]. In our RNA
editing system, however, the -2 position shows a strong preference for A, with the
order of preference A > U > G > C (Figure 2.9B-9C). Although the -3 position appeared to prefer
a pyrimidine (Y) and the +1 position a purine (R) (Figure 2.9B), certain nucleotides at the
-3 and +1 positions at both ends of LP5 allow base-pair formation, which increases the RNA stem
length by one bp (i.e., ST4 to ST5) and reduces the loop size by 2 nt (i.e., LP5 to LP3), making it
difficult to evaluate the effect of the sequence context at these two positions in the original EVI2B
RNA substrate (Figure 2.9B-9C).
To identify the most preferred RNA substrate for A3A editing, the sequences of the
modified EVI2B RNA hairpin stem (ST5) and loop (LP4) were systematically altered and tested
for A3A editing efficiency. For the stem, we found that the identity of the first base
pair adjacent to the loop matters. A CG pair showed higher A3A editing efficiency than the other
base pairs, with the order of preference CG > UA > GC > AU (Figure 2.9D). For the RNA loop,
the -3 position can no longer form a base pair with +1 in a 4 nt loop (LP4), and our findings indicate
that the nucleotide preference at -3 for A3A RNA editing was in the order C > U > G > A. The
nucleotide preference at the -2 position was A > U > G > C (Figure 2.9D).
Based on the above findings on preferred substrate features for A3A editing, we made a
few hairpin RNAs with LP4 and ST5 containing various combinations of the 4 nt loop
sequences and compared the editing efficiencies of A3A on these RNAs. The results showed a clear
ranking of editing efficiency among these RNA substrates (Figure 2.9D-9E). The order of preference for
loop sequences was CAUC > CUUC > UAUC > UUUC (the target C is the 3’ terminal C of each loop).
Additionally, the first stem base pair CG gave approximately 30% higher editing than the UA pair
(Figure 2.9E). The ranking of the six stem-loop RNA sequences as preferred substrates for A3A
editing is shown in Figure 2.9F.
Prediction and validation of potential RNA targets for A3A editing
We attempted to predict potential RNA targets for A3A editing based on the RNA sequence
and structural features identified as favored for A3A editing (Figure 2.9E-9F). By
searching the RNA database with the 12 nt RNA sequences of the six RNA groups (groups
1-6) depicted in Figure 2.9F, we identified a total of 1162 sites in the coding regions of human RefSeq
RNAs using the NCBI BLAST tool (Figure 2.10A). Further analysis of these 1162 RNA target sites
revealed that 175 of them overlap with previously reported sites [44, 45, 114], whereas 987
had not been identified before (Figure 2.10B).
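The search described above was run with the NCBI BLAST tool. As a rough illustration of the same idea on a single sequence, the following sketch scans an RNA for the tested tetraloops (target C at the 3' end) and counts how many flanking bases can pair into a stem. The function name, the 4 bp minimum stem, and the decision to allow G-U wobble pairs are assumptions of this sketch, not parameters reported in this study.

```python
# Watson-Crick pairs plus G-U wobble (allowing wobble is an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
# Tetraloops validated in the text; the target C is the last base of each loop.
LOOPS = ("CAUC", "CUUC", "UAUC")


def find_hairpin_sites(rna, min_stem=4):
    """Return (target_c_index, loop, stem_length) for candidate tetraloop hairpins."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    hits = []
    for i in range(len(rna) - 3):
        loop = rna[i:i + 4]
        if loop not in LOOPS:
            continue
        # Extend the stem outward from the loop while the flanking bases pair.
        left, right, stem = i - 1, i + 4, 0
        while left >= 0 and right < len(rna) and (rna[left], rna[right]) in PAIRS:
            stem += 1
            left -= 1
            right += 1
        if stem >= min_stem:
            hits.append((i + 3, loop, stem))  # i + 3 is the 0-based target C
    return hits


# SDHB hairpin from Table 2, written without the parentheses.
print(find_hairpin_sites("CCAUCUAUCGAUGG"))
```

A purely sequence-based scan of this kind ignores competing folds and longer-range interactions, so it should be read as a first-pass filter rather than a structure prediction.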
To validate A3A editing at the predicted sites of cellular RNAs, we
selected a total of 30 cellular RNA transcripts with predicted stop-gain or non-synonymous codon
changes, including 22 novel RNAs and 8 previously reported RNAs, and examined their
endogenous RNA editing levels by sequencing analysis (Figure 2.10C and Table 2). Our results
confirmed that all 30 selected sites displayed A3A RNA editing (Figure 2.10C). Consistent with
our prediction, the average level of A3A-induced RNA
editing at the tetraloop (LP4) sites followed the order CAUC > CUUC > UAUC (the target C is the
3’ terminal C of each loop) (Table 2). Furthermore, the overall RNA editing efficiency was higher with
CG pairing at the first stem base pair than with UA pairing (Table 2). Additionally, the enhanced
stability of the hairpin structure resulting from longer stems was associated with
higher RNA editing activity (Table S2 and Figure 2.11A). Therefore, multiple factors,
including sequence context and RNA secondary structure, are involved in determining
A3A-mediated RNA editing activity.
Figure 2.10. Prediction and validation of novel RNA editing sites on cellular mRNA.
(A) Identification of A3A-mediated RNA editing candidates in human transcripts corresponding to six optimized
sequence groups. Query sequences corresponding to each sequence group were submitted to the NCBI BLAST tool to
predict potential editing sites by A3A in coding regions. (B) Venn diagram comparing the
candidate group of this study with previous studies [44, 45, 114]. The regions corresponding to each study are
indicated in different colors. (C) Quantification of A3A-mediated RNA editing efficiency at 30 predicted sites in
endogenous transcripts from HEK 293T cells. Sites overlapping with previous studies are marked on the gene names:
† Sharma et al. (2015), Nature Communications [45]; * Sharma et al. (2017), RNA Biology [44]. (D) The rate of codon
changes caused by C-to-U editing at sites corresponding to each sequence group. The number of sites retrieved in each
group is indicated above the bar graph, with synonymous in blue, nonsynonymous in orange, and stop-gain in red.
(E) Correlation of the ratio of synonymous codon variants in each sequence group to the relative RNA editing level.
Based on the known target RNA sequences for A3A as shown in the 6 RNA groups (Figure
2.9F and Figure 2.10A), we evaluated the synonymous, nonsynonymous, or stop codon changes
following C-to-U editing by A3A in the human mRNAs that contain the sequences of the RNA
groups 1-6. Interestingly, we found that the highest proportion of codon mutations were
synonymous (approximately 80% of the codon mutations) for the sites containing the sequence
group exhibiting the highest A3A editing activity (Group-1). The percentage of synonymous codon
mutations of these mRNAs generally decreased for less efficiently edited sites (from Group-2 to
Group-6) after A3A editing (Figure 2.10D). On the other hand, the proportion of nonsynonymous
plus stop codon mutations of the mRNA increased for the mRNA sequences from Group-1 to
Group-6 with lower A3A editing efficiency (Figure 2.10D). There was an obvious positive
correlation between the ratio of synonymous codon changes and the observed relative editing
efficiency among these sequence contexts (Figure 2.10E).
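The classification of codon changes described above can be sketched in a few lines. The example below builds the standard genetic code and labels a single C-to-U edit as synonymous, nonsynonymous, or stop-gain. The function name and the 0-based `pos` argument are illustrative choices, and this is not the pipeline used in the study.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_c_to_u(codon, pos):
    """Classify the effect of a C-to-U edit at 0-based position `pos` of `codon`."""
    codon = codon.upper().replace("T", "U")
    if codon[pos] != "C":
        raise ValueError("edited position must be a C")
    edited = codon[:pos] + "U" + codon[pos + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[edited]
    if before == after:
        return "synonymous"
    if after == "*":
        return "stop-gain"
    return "nonsynonymous"


print(classify_c_to_u("CGA", 0))  # Arg codon edited at its first base
print(classify_c_to_u("UCA", 1))  # Ser codon edited at its second base
print(classify_c_to_u("AUC", 2))  # Ile codon edited at its third base
```

The three calls correspond to the kinds of changes listed in Table 2 (R to X, S to L) and to an edit at a third codon position that leaves the amino acid unchanged.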
Table 2. Validation of the predicted A3A-mediated sites
#   Gene      Position (hg38)  Sequence, Stem(Loop)Stem  Avg. editing level (%)  Codon variation
1   ARMC4     10:27961636      CCUGAC(CAUC)GUCAGG        52.9                    R440C
2   APC       5:112838104      UCCUCUU(CAUC)AAGAGGA      51.3                    S837L
3   DNMBP     10:99969118      CCUC(CAUC)GAGG            39.0                    R89X
4   GRIK4     11:120815423     GGAC(CAUC)GUCC            35.7                    S98L
5   ATN1      12:6941430       GCAGCU(CAUC)AGCUGC        32.4                    Q1139X
6   SMARCA2   9:2110415        AUCCU(CAUC)AGGAU          24.1                    Q1152X
7   PARPBP    12:102178742     AUCCAU(CAUC)AUGGA         19.8                    H386Y
8   ACY1      3:51984099       CACC(CAUC)GGUG            16.7                    S12L
9   MCM10     10:13195091      UCUU(CAUC)AAGA            15.4                    S600L
10  MLLT10    10:21733805      AGCU(CAUC)AGCU            14.1                    S845L
11  BARD1     2:214728731      GGCUC(CUUC)GAGCU          37.2                    S760L
12  CDKN2AIP  4:183446517      AGGCU(CUUC)AGCCU          32.0                    S278L
13  ULK2      17:19785955      UGUUC(CUUC)GAACA          29.6                    R745X
14  BAZ2B     2:159349710      UCACCU(CUUC)AGGUGA        28.9                    Q1621X
15  COG3      13:45480198      GUGCU(CUUC)AGCAU          23.7                    Q153X
16  JMJD4     1:227735291      GGGUC(CUUC)GACCC          23.4                    R41X
17  STXBP5L   3:121257261      UUCCUU(CUUC)AAGGAG        23.3                    S587L
18  HLA-F     6:29727083       CAAAGC(CUUC)GCUUUG        23.1                    R413C
19  ATR       3:142465219      UUCU(CUUC)AGAA            11.0                    Q2307X
20  DIS3L2    2:232163554      UUCU(CUUC)AGAA            5.0                     S349L
21  ESCO2     8:27776420       CACUGUUUU(UAUC)AAAACAGUG  38.6                    Q38X
22  SDHB      1:17044825       CCAUC(UAUC)GAUGG          33.1                    R46X
23  ADAMTS9   3:64602018       GGAC(UAUC)GUCC            21.5                    R1315C
24  DCLK1     13:36125951      CGUUUC(UAUC)GAAACG        19.4                    R63X
25  BCLAF3    X:19965474       UGAC(UAUC)GUCA            17.5                    R282C
26  PARP1     1:226407902      GCUC(UAUC)GAGU            15.4                    R10X
27  MYO1B     2:191387275      UCUC(UAUC)GAGA            12.0                    R536X
28  POLR3B    12:106444563     ACACU(UAUC)AGUGU          8.6                     Q686X
29  SEC31B    10:100490796     GCCCU(UAUC)AGGGU          5.9                     Q854X
30  QSER1     11:32931829      UGCU(UAUC)AGCA            5.1                     Q91X
Figure 2.11. Characteristics of the predicted A3A-mediated RNA editing sites.
(A) Analysis of A3A-mediated RNA editing rates according to stem length. CAUC is represented in blue,
CUUC in green, and UAUC in red for the different groups of tetraloops, and the corresponding dashed lines
indicate linear regression analysis. (B) Gene ontology analysis (http://geneontology.org/) exploring the
potential biological functions of sites that undergo codon changes among the predicted sites. Each
annotation dataset is displayed on the right side of the chart. The x-axis shows the negative
logarithm of the false discovery rate (FDR), a measure of the statistical confidence of each enrichment,
with a cutoff of FDR < 0.05.
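To make the stem-length analysis concrete, the sketch below parses the Stem(Loop)Stem notation used in Table 2, counts the consecutive base pairs next to the loop, and fits an ordinary least-squares slope of editing level against stem length for the ten CAUC-loop rows of Table 2. It is only an illustration under stated assumptions: G-U wobble pairs are counted as pairs, the helper names are hypothetical, and the regression in panel (A) may have been computed on a different data set (the text cites Table S2), so this is not a reproduction of that panel.

```python
# Watson-Crick pairs plus G-U wobble (counting wobble pairs is an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_length(entry):
    """Count consecutive base pairs adjacent to the loop in 'Stem(Loop)Stem' notation."""
    five_prime, rest = entry.split("(")
    _loop, three_prime = rest.split(")")
    n = 0
    # Walk outward from the loop: 5' arm read backward, 3' arm read forward.
    for a, b in zip(reversed(five_prime), three_prime):
        if (a, b) not in PAIRS:
            break
        n += 1
    return n


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# CAUC-loop rows copied from Table 2: (sequence, average editing level in %).
CAUC_ROWS = [
    ("CCUGAC(CAUC)GUCAGG", 52.9), ("UCCUCUU(CAUC)AAGAGGA", 51.3),
    ("CCUC(CAUC)GAGG", 39.0), ("GGAC(CAUC)GUCC", 35.7),
    ("GCAGCU(CAUC)AGCUGC", 32.4), ("AUCCU(CAUC)AGGAU", 24.1),
    ("AUCCAU(CAUC)AUGGA", 19.8), ("CACC(CAUC)GGUG", 16.7),
    ("UCUU(CAUC)AAGA", 15.4), ("AGCU(CAUC)AGCU", 14.1),
]
lengths = [stem_length(seq) for seq, _ in CAUC_ROWS]
levels = [level for _, level in CAUC_ROWS]
print("stem lengths:", lengths)
print("slope (% editing per bp):", round(slope(lengths, levels), 2))
```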
Chapter II: Discussion
With the advancement of next-generation RNA-seq technology, there has been significant
progress in the study of RNA mutational signatures. Although RNA sequence changes were initially
considered random errors introduced by RNA polymerases during transcription, it is now recognized
that understanding the
mechanisms by which RNA editors, such as ADARs and APOBECs, generate specific RNA edits
is crucial for unraveling the physiological role of RNA regulation within cells. Our experimental
investigation aimed to enhance our understanding of the RNA editing activities of A3A.
Dysfunctional A3A activity can result in uncontrolled DNA mutations and detrimental RNA
editing, which have been implicated in various types of cancer, including breast cancer and
hepatocellular carcinoma [100, 117].
Unrepaired DNA mutations resulting from APOBEC deamination are permanent
alterations that affect all corresponding RNA transcripts, potentially leading to various
genetic disorders. In contrast, RNA editing mediated by APOBECs is a transient event
that affects only a fraction of the RNA transcripts. Gaining a comprehensive understanding of the
RNA editing process executed by A3A and other APOBECs is essential for elucidating their
biological functions and for devising strategies to achieve programmable targeted RNA editing,
which holds promise for potential therapeutic applications [111, 118]. In this study, we aimed to
characterize the specific features of RNA substrates that undergo A3A-mediated editing and to use
this knowledge to predict and validate novel cellular RNA targets susceptible to A3A-mediated
editing.
Previous studies have indicated that the RNA editing activity of A1, along with its cofactors,
is generally repressed in the cytoplasm and takes place in the nucleus within a defined temporal and
spatial window, after pre-mRNA splicing and before mRNA nuclear export [119-121]. In
our investigation, we observed a significant reduction in RNA editing activity when the
subcellular localization of A3A was altered by fusion with a nuclear localization signal (NLS) (Figure
2.5). This finding suggests that the primary site of A3A-mediated RNA editing is the cytoplasm,
in contrast to A1 and its cofactors. However, despite the high
sequence similarity between A3A and A3B-CD2 (92% identity) [27, 38], relocalizing A3B from
the nucleus to the cytoplasm did not result in any detectable RNA editing activity on the known
A3A substrates, EVI2B and SDHB RNAs (Figure 2.5).
We observed that an A3B chimera, in which the DPLVLRRRQ peptide of its CD2 domain
was replaced with the GIGRHK peptide from A3A, exhibited RNA editing activity on both A3A
substrates, regardless of whether the A3B chimeras were localized in the cytoplasm or nucleus
(Figure 2.6A and Figure 2.7). These findings indicate that the slight sequence variation between
A3A and A3B-CD2 is critical for determining RNA editing on specific RNA targets. Additionally,
other characteristics of A3B, such as A3B-CD1, may influence its capacity to edit the same RNA
targets in both the cytosol and the nucleus, in contrast to A3A and A1, for which RNA editing
predominantly occurs in the cytoplasm and nucleus, respectively. These results also suggest that
A3B may possess inherent RNA editing activity on additional substrates, which aligns with a
previous study linking breast cancer to RNA editing by A3B [122], as well as a recent bioRxiv
report describing RNA editing events resulting from the overexpression of human A3B in a mouse
model [123].
Previous studies have demonstrated that A3A-mediated RNA editing exhibits a preference
for stem-loop structures, with the target C located at the 3'-end of the loop [44, 45, 57, 102, 114].
To gain a comprehensive understanding of the impact of stem-loop structures and sequence context
on A3A-induced RNA editing, we conducted a thorough analysis using our cell-based fluorescent
assay system. In addition to confirming previous findings, our results revealed that the type of the
first stem base pair and the specific sequence of the loop also significantly influenced the efficiency
of A3A editing (Figure 2.9). Recent investigations into the optimal sequence for single-stranded
DNA interaction with A3A [95] and the most recent NMR structures of the A3A-DNA complex
[124] have emphasized the importance of T at position -1 and A at position +1 in positioning the
target C at position 0. Considering the specific characteristics of our RNA sequences, which
include U at position -1 and a preference for purine (G/A) at position +1, it is reasonable to propose
that the interaction between A3A and RNA may resemble the A3A-ssDNA interaction within the
catalytic active region. However, for a more precise analysis, it is essential to elucidate the
structure of the A3A-RNA complex using appropriate RNA sequences.
Previous studies demonstrated that A3A-mediated RNA editing is more likely to be observed
in stem-loop secondary structures containing the target C at the end of
the loop [44, 45, 57, 102, 114]. Building on these results, we assessed the extent of A3A-induced RNA
editing by modifying the RNA stem-loop substrates in diverse ways within our cell-based system (Figure 2.9). We
confirmed that RNA editing efficiency increased with longer (more stable) stems and with a
tetraloop, and by changing the loop and first stem base-pair sequences and measuring editing efficiency,
we were able to rank the preferred sequence groups (Figure 2.9). Using the differences in
A3A-mediated RNA editing efficiency according to the sequence surrounding the target C,
we were able to formulate predictions for potential RNA substrates (Figure 2.9F).
Based on our observations regarding the influence of RNA stem-loop sequence context on
A3A editing, we predicted potential cellular RNA substrates (Figure
2.9F). To accomplish this, we entered query sequences exhibiting optimal RNA editing activity
into the NCBI BLAST tool to search for corresponding mRNAs. The validity of our sequence-based
predictions was assessed from two perspectives. First, a considerable number of the predicted
RNA targets for A3A (175 sites) overlapped with previously reported targets [44, 45, 114], while 987
targets had not been previously identified. Second, Sanger sequencing of 30 randomly
selected predicted mRNA candidates obtained from cells overexpressing A3A confirmed A3A
editing at the expected sites (Figure 2.10C and Table 2). These results demonstrate the utility of
our prediction system in identifying A3A RNA targets that may have remained undiscovered due
to technical limitations of RNA sequencing or limited expression of transcripts in specific cell
types. However, because actual RNA folding may deviate from the predicted RNA secondary structure,
particularly where RNA dynamically interacts and folds with other RNAs in vivo
[125], it is essential to develop a supplementary system that considers the overall dynamics of
RNA folding to achieve greater prediction accuracy. Further investigations are required to
understand the biological consequences of RNA editing by A3A.
The analysis of potential mRNA target sites for A3A editing revealed that sequences
belonging to higher A3A-editing groups were associated with an increased frequency of
synonymous codon changes resulting from C-to-U mutations (Figure 2.10D-E). Although a
substantial percentage of sites in these high A3A-editing groups underwent synonymous codon
changes following C-to-U mutation (702 sites, 60.4%), numerous sites exhibited non-synonymous codon
changes (294 sites, 25.3%) or stop-gain codon changes (166 sites, 14.3%) due to A3A editing
(Figure 2.10D). These findings suggest that while A3A-specific RNA editing serves specific
biological functions, cells have evolved mechanisms to minimize the potential negative impact of
excessive mutation on mRNAs carrying sequences susceptible to efficient A3A editing. To gain
insight into their potential biological functions or cellular pathways, we performed gene ontology
analysis on a total of 460 predicted sites exhibiting codon changes (Figure 2.11B). This analysis
opens avenues for investigating potential indirect functions of A3A through A3A-mediated RNA
editing, such as protein de-ubiquitination or vesicular cargo loading (Figure 2.11B). These
findings highlight the potential of our prediction system to enhance the understanding of RNA
editing sites associated with A3A-related diseases.
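The codon-level classification described above can be illustrated with a short sketch. This is not the analysis code used in the study; it is a minimal example that assumes the standard genetic code and uses a hypothetical helper, classify_c_to_u, to label a single C-to-U edit as synonymous, non-synonymous, or stop-gain. It also recomputes the reported percentages from the three site counts given in the text; the sum of the non-synonymous and stop-gain counts appears to correspond to the 460 sites submitted to gene ontology analysis.

```python
# Standard genetic code in RNA letters, ordered U, C, A, G at each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_c_to_u(codon, position):
    """Classify a C-to-U edit at `position` (0, 1, or 2) of `codon`.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    codon = codon.upper().replace("T", "U")
    if len(codon) != 3 or codon[position] != "C":
        raise ValueError("expected a 3-nt codon with C at the edited position")
    edited = codon[:position] + "U" + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[edited]
    if after == before:
        return "synonymous"
    if after == "*":
        return "stop-gain"
    return "non-synonymous"


# Site counts as reported in the text for the high A3A-editing groups.
counts = {"synonymous": 702, "non-synonymous": 294, "stop-gain": 166}
total = sum(counts.values())
percentages = {kind: round(100 * n / total, 1) for kind, n in counts.items()}

# Sites whose edit changes the encoded protein (non-synonymous or stop-gain).
codon_changing_sites = counts["non-synonymous"] + counts["stop-gain"]
```

For example, classify_c_to_u("CAA", 0) models the CAA-to-UAA change and is labeled stop-gain, whereas classify_c_to_u("UAC", 2) leaves tyrosine unchanged and is labeled synonymous.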
DNA editing is an irreversible process that, when dysregulated, can lead to various genetic
diseases. In contrast, RNA editing is a transient and partially reversible process. Gaining a deeper
understanding of RNA editing and harnessing its potential can yield significant clinical benefits,
particularly in the realm of programmable targeted RNA editing [111, 118]. Investigating the role
of A3A-induced RNA editing and exploring potential differences in RNA editing among other
APOBEC proteins is therefore of great interest. By improving our understanding of the
capabilities and preferred targets of APOBEC-mediated RNA editing, the findings from this study
may support the development of targeted therapies for genetic disorders, including cancer.
Chapter II: Materials and Methods
Transfection
HEK 293T cells were cultured in DMEM supplemented with 10% FBS, streptomycin (100 µg/mL),
and penicillin (100 U/mL) and
incubated at 37°C with 5% CO2. For transfection, 8-well glass slides (CellVis) were first
coated with 0.1 mg/mL poly-D-lysine (Sigma). The cells were then diluted to a concentration of
250,000 cells/mL to ensure a clean monolayer for visualization during experimentation.
Subsequently, 250 µL of the diluted cell suspension was added to each well. After 24 hours of
initial adherence and growth, the cells were transfected using Lipofectamine 3000 Transfection
Reagent (Thermo Fisher) with a mixture of APOBEC editor (500 ng) and substrate reporter vector
(50 ng) in a 25 µL volume, following the manufacturer's instructions. The mixture
was allowed to sit at room temperature for 30 minutes before 15 µL was added dropwise to each
well. Expression was then allowed to proceed for 48 hours.
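The seeding and dosing quantities implied by this protocol can be worked out with a few lines of arithmetic. The sketch below is illustrative only; it assumes that the 25 µL transfection mixture is homogeneous and that each mixture serves a single well, neither of which is stated explicitly in the text. All variable names are introduced here for illustration.

```python
# Values taken from the Transfection paragraph.
CELLS_PER_ML = 250_000     # diluted cell suspension
SEED_VOLUME_UL = 250       # volume added to each well
MIX_VOLUME_UL = 25         # total transfection mixture volume
ADDED_VOLUME_UL = 15       # volume of mixture added dropwise per well
EDITOR_NG = 500            # APOBEC editor plasmid in the mixture
REPORTER_NG = 50           # substrate reporter vector in the mixture

# Cells seeded per well (1 mL = 1000 µL).
cells_per_well = CELLS_PER_ML * SEED_VOLUME_UL // 1000

# Plasmid delivered per well, assuming the mixture is homogeneous (an assumption).
editor_ng_per_well = EDITOR_NG * ADDED_VOLUME_UL / MIX_VOLUME_UL
reporter_ng_per_well = REPORTER_NG * ADDED_VOLUME_UL / MIX_VOLUME_UL

# Mass ratio of editor to reporter, independent of the volume added.
editor_to_reporter_ratio = EDITOR_NG / REPORTER_NG
```

Under these assumptions, each well is seeded with 62,500 cells and receives roughly 300 ng of editor and 30 ng of reporter, preserving the 10:1 editor-to-reporter mass ratio.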
Confocal fluorescence microscopy
For live-cell microscopy, an inverted Zeiss LSM-700 confocal microscope
with a 40x water-immersion objective was used to provide optimal visualization for analysis
and maximum cell countability per image. Prior to imaging, cells were washed with
phosphate-buffered saline (PBS) and stained with a 5 µg/mL solution of Hoechst 33342 nuclear stain in PBS
for 15 minutes. The cells were then rinsed twice with PBS and stored in imaging buffer (140 mM
NaCl, 2.5 mM KCl, 1.8 mM CaCl2, 1.0 mM MgCl2, 20 mM HEPES pH 7.4, 5 mM glucose). Imaging
was conducted at a higher laser intensity (15-20%) and lower gain (500-600 units) to enhance
signal-to-noise ratio. Excitation wavelengths for Hoechst 33342, eGFP, and mCherry were 405,
488, and 555 nm, respectively, and the emission band-pass filters were set to 400-480 nm,
490-555 nm, and 555-700 nm, respectively. Approximately 3-5 images were captured for each well,
and a total of 30 cells were quantified. All image analysis was performed using the LSM Toolbox
plugin incorporated within the FIJI distribution of ImageJ2 software [126].
RNA/DNA extraction for Sanger Sequencing
To extract RNA, the cells were harvested and TRIzol reagent (Thermo Fisher) was added directly
to the wells to lyse the cells. The manufacturer's protocol for TRIzol was followed, with the addition
of 50 µL of chloroform and the extraction of approximately 40 µL of the aqueous phase. Total
RNA was then reverse transcribed into single-stranded cDNA using ProtoScript II reverse
transcriptase (NEB) and a specific primer designed to anneal downstream of the substrate reporter
segments. The reaction was conducted in a 20 µL volume, which contained 1 µg of total RNA,
100 µM of reverse primer, 10 mM dNTP, 0.1 M DTT, 8 U RNase inhibitor, and 0.2 µL of
ProtoScript II reverse transcriptase (NEB), at 42°C for 1 hour. The cDNA was amplified with
Phusion® High-Fidelity DNA Polymerase (NEB) (98°C for 2.5 min; 30 cycles of 98°C for 20 s,
71.7°C for 20 s, and 72°C for 30 s; 72°C for 5 min) using a forward primer that anneals to the junction
region where the AAV intron was spliced out. To extract DNA, harvested cells were treated with
approximately 100 µL of QuickExtract™ DNA Extraction Solution (Lucigen), vortexed, and
incubated at 65°C for 15 minutes, followed by 2 minutes at 98°C, and then cooled to 4°C. Next, 1 µL of the
extracted DNA was used in a PCR reaction with Phusion® High-Fidelity DNA Polymerase (NEB)
and a forward primer that targets a region containing the AAV intron. The PCR reaction was
performed using the same cycle as the cDNA amplification. The final PCR products were purified
using a spin column PCR cleanup kit (Thermo Fisher) and submitted to Genewiz for Sanger
sequencing.
Western blot and antibodies
To assess the expression levels of FLAG-tagged APOBECs in cells, Western blot analysis was
performed. Cells were lysed using 1x RIPA buffer (Sigma) to prepare lysates. The Western blot
analysis was carried out on two independent transfections, and α-tubulin was used as an internal
loading control. Primary antibodies used in this analysis included anti-FLAG M2 mAb (F3165,
Sigma) diluted 1:3,000 and mouse anti-α-tubulin mAb (GT114, GeneTex) diluted 1:5,000. A
secondary antibody, Cy3-labelled goat anti-mouse mAb (PA43009, GE Healthcare), was
subsequently used. The Cy3 signals were detected and visualized using the Typhoon RGB
Biomolecular Imager (GE Healthcare).
Prediction of potential target RNA candidates for A3A editing
A sequence corresponding to N x N x N was determined for each group, and the complementary
sequence was automatically assigned to obtain the opposite stem (e.g., A-U, C-G, G-C, and U-A),
resulting in 64 queries per group (4 x 4 x 4 = 64). A total of 384 queries (64 x 6 groups = 384)
were generated for the six groups. These queries were used to search for RefSeq RNA sites with
a 100% match on NCBI BLASTN (https://blast.ncbi.nlm.nih.gov/Blast.cgi). The parameters used
were as follows: "blastn -db refseq_rna_db/refseq_rna -query stem_size_4.fa -strand 'plus' -evalue
1000 -word_size 7 -taxids 9606 -perc_identity 100 -qcov_hsp_perc 100 -outfmt '6 delim = qaccver
stitle saccver pident length sstart send' -out stem_size_4 -num_threads 4". To obtain the data, we
extracted the coding sequence (CDS) features of mRNA from the search results and subsequently
removed any overlapping sites.
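The query construction described above (three variable stem positions, each paired with its complement on the opposite strand) can be sketched as follows. This is an illustrative reconstruction, not the script used in the study: the function names are hypothetical, and the loop sequence "AUCA" is an arbitrary placeholder, since the group-specific loop sequences are not given in this section.

```python
from itertools import product

RNA_BASES = "ACGU"
# Watson-Crick pairs used to build the opposite stem strand.
PAIR = {"A": "U", "C": "G", "G": "C", "U": "A"}


def reverse_complement(seq):
    """Return the strand that base-pairs with `seq`, written 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(seq))


def stem_queries(loop):
    """Build all N x N x N stem variants around a fixed loop (hypothetical helper).

    Each query is 5'-stem + loop + 3'-stem, where the 3' stem is the
    reverse complement of the 5' stem so the two strands can pair.
    """
    queries = []
    for stem in product(RNA_BASES, repeat=3):
        five_prime = "".join(stem)
        queries.append(five_prime + loop + reverse_complement(five_prime))
    return queries


# Arbitrary placeholder loop; not a sequence taken from the thesis.
example_loop = "AUCA"
queries = stem_queries(example_loop)

# Six groups, each contributing 4 x 4 x 4 = 64 queries.
NUM_GROUPS = 6
total_queries = NUM_GROUPS * len(queries)
```

With this construction, each group yields 64 distinct queries and six groups yield 384 in total, matching the counts stated in the text. Writing the queries to a FASTA file for BLAST would be a separate step.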
Analysis tools
All graphs were generated using GraphPad Prism 9, and statistical significance
was evaluated using a two-tailed Student's t-test implemented in GraphPad Prism 9. To predict the
RNA secondary structures for the local RNA region, we employed the RNAstructure software [58,
82]. Gene ontology analysis was conducted on http://geneontology.org/.
REFERENCES
1. Hu, B., et al., Characteristics of SARS-CoV-2 and COVID-19. Nat Rev Microbiol, 2020.
2. Wu, F., et al., A new coronavirus associated with human respiratory disease in China.
Nature, 2020. 579(7798): p. 265-269.
3. Zhu, N., et al., A Novel Coronavirus from Patients with Pneumonia in China, 2019. N Engl
J Med, 2020. 382(8): p. 727-733.
4. Sanjuan, R., et al., Viral mutation rates. J Virol, 2010. 84(19): p. 9733-48.
5. Elena, S.F. and R. Sanjuan, Adaptive value of high mutation rates of RNA viruses:
separating causes from consequences. J Virol, 2005. 79(18): p. 11555-8.
6. Robson, F., et al., Coronavirus RNA Proofreading: Molecular Basis and Therapeutic
Targeting. Mol Cell, 2020. 80(6): p. 1136-1138.
7. Denison, M.R., et al., Coronaviruses: an RNA proofreading machine regulates replication
fidelity and diversity. RNA Biol, 2011. 8(2): p. 270-9.
8. Posthuma, C.C., A.J.W. Te Velthuis, and E.J. Snijder, Nidovirus RNA polymerases:
Complex enzymes handling exceptional RNA genomes. Virus Res, 2017. 234: p. 58-73.
9. Zhao, W.M., et al., The 2019 novel coronavirus resource. Yi Chuan, 2020. 42(2): p. 212-221.
10. Sanjuan, R. and P. Domingo-Calap, Mechanisms of viral mutation. Cell Mol Life Sci, 2016.
73(23): p. 4433-4448.
11. Wang, R., et al., Host Immune Response Driving SARS-CoV-2 Evolution. Viruses, 2020.
12(10).
12. Mourier, T., et al., Host-directed editing of the SARS-CoV-2 genome. Biochem Biophys Res
Commun, 2021. 538: p. 35-39.
13. Abdel-Moneim, A.S. and E.M. Abdelwhab, Evidence for SARS-CoV-2 Infection of Animal
Hosts. Pathogens, 2020. 9(7).
14. Di Giorgio, S., et al., Evidence for host-dependent RNA editing in the transcriptome of
SARS-CoV-2. Sci Adv, 2020. 6(25): p. eabb5813.
15. Matyasek, R. and A. Kovarik, Mutation Patterns of Human SARS-CoV-2 and Bat RaTG13
Coronavirus Genomes Are Strongly Biased Towards C>U Transitions, Indicating Rapid
Evolution in Their Hosts. Genes (Basel), 2020. 11(7).
16. Wei, Y., et al., Coronavirus genomes carry the signatures of their habitats. PLoS One, 2020.
15(12): p. e0244025.
17. Graudenzi, A., et al., Mutational signatures and heterogeneous host response revealed via
large-scale characterization of SARS-CoV-2 genomic diversity. iScience, 2021. 24(2): p.
102116.
18. Valyi-Nagy, T. and T.S. Dermody, Role of oxidative damage in the pathogenesis of viral
infections of the nervous system. Histol Histopathol, 2005. 20(3): p. 957-67.
19. Tomaselli, S., et al., ADARs and the Balance Game between Virus Infection and Innate
Immune Cell Response. Curr Issues Mol Biol, 2015. 17: p. 37-51.
20. Olson, M.E., R.S. Harris, and D.A. Harki, APOBEC Enzymes as Targets for Virus and
Cancer Therapy. Cell Chem Biol, 2018. 25(1): p. 36-49.
21. Smith, E.C., The not-so-infinite malleability of RNA viruses: Viral and cellular
determinants of RNA virus mutation rates. PLoS Pathog, 2017. 13(4): p. e1006254.
22. Reshi, M.L., Y.C. Su, and J.R. Hong, RNA Viruses: ROS-Mediated Cell Death. Int J Cell
Biol, 2014. 2014: p. 467452.
23. Samuel, C.E., Adenosine deaminases acting on RNA (ADARs) are both antiviral and
proviral. Virology, 2011. 411(2): p. 180-93.
24. George, C.X., et al., Adenosine deaminases acting on RNA, RNA editing, and interferon
action. J Interferon Cytokine Res, 2011. 31(1): p. 99-117.
25. Gonzales-van Horn, S.R. and P. Sarnow, Making the Mark: The Role of Adenosine
Modifications in the Life Cycle of RNA Viruses. Cell Host Microbe, 2017. 21(6): p. 661-669.
26. Picardi, E., L. Mansi, and G. Pesole, A-to-I RNA editing in SARS-COV-2: real or artifact?
2020, Cold Spring Harbor Laboratory Press: Cold Spring Harbor.
27. Yang, B., et al., APOBEC: From mutator to editor. J Genet Genomics, 2017. 44(9): p. 423-437.
28. Smith, H.C., et al., Functions and regulation of the APOBEC family of proteins. Semin Cell
Dev Biol, 2012. 23(3): p. 258-68.
29. Liu, M.C., et al., AID/APOBEC-like cytidine deaminases are ancient innate immune
mediators in invertebrates. Nat Commun, 2018. 9(1): p. 1948.
30. Harris, R.S. and J.P. Dudley, APOBECs and virus restriction. Virology, 2015. 479-480: p.
131-45.
31. Sadykov, M., et al., Short sequence motif dynamics in the SARS-CoV-2 genome suggest a
role for cytosine deamination in CpG reduction. J Mol Cell Biol, 2021.
32. Ratcliff, J. and P. Simmonds, Potential APOBEC-mediated RNA editing of the genomes of
SARS-CoV-2 and other coronaviruses and its impact on their longer term evolution.
Virology, 2021. 556: p. 62-72.
33. Simmonds, P., Rampant C-->U Hypermutation in the Genomes of SARS-CoV-2 and Other
Coronaviruses: Causes and Consequences for Their Short- and Long-Term Evolutionary
Trajectories. mSphere, 2020. 5(3).
34. Kosuge, M., et al., Point mutation bias in SARS-CoV-2 variants results in increased ability
to stimulate inflammatory responses. Sci Rep, 2020. 10(1): p. 17766.
35. Klimczak, L.J., et al., Similarity between mutation spectra in hypermutated genomes of
rubella virus and in SARS-CoV-2 genomes accumulated during the COVID-19 pandemic.
PLoS One, 2020. 15(10): p. e0237689.
36. Danchin, A. and P. Marliere, Cytosine drives evolution of SARS-CoV-2. Environ Microbiol,
2020. 22(6): p. 1977-1985.
37. Salter, J.D., R.P. Bennett, and H.C. Smith, The APOBEC Protein Family: United by
Structure, Divergent in Function. Trends Biochem Sci, 2016. 41(7): p. 578-594.
38. Conticello, S.G., The AID/APOBEC family of nucleic acid mutators. Genome Biol, 2008.
9(6): p. 229.
39. Chen, S.H., et al., Apolipoprotein B-48 is the product of a messenger RNA with an organ-
specific in-frame stop codon. Science, 1987. 238(4825): p. 363-6.
40. Driscoll, D.M., et al., An in vitro system for the editing of apolipoprotein B mRNA. Cell,
1989. 58(3): p. 519-25.
41. Teng, B., C.F. Burant, and N.O. Davidson, Molecular cloning of an apolipoprotein B
messenger RNA editing protein. Science, 1993. 260(5115): p. 1816-9.
42. Lerner, T., F.N. Papavasiliou, and R. Pecori, RNA Editors, Cofactors, and mRNA Targets:
An Overview of the C-to-U RNA Editing Machinery and Its Implication in Human Disease.
Genes (Basel), 2018. 10(1).
43. Prohaska, K.M., et al., The multifaceted roles of RNA binding in APOBEC cytidine
deaminase functions. Wiley Interdiscip Rev RNA, 2014. 5(4): p. 493-508.
44. Sharma, S., et al., Transient overexpression of exogenous APOBEC3A causes C-to-U RNA
editing of thousands of genes. RNA Biol, 2017. 14(5): p. 603-610.
45. Sharma, S., et al., APOBEC3A cytidine deaminase induces RNA editing in monocytes and
macrophages. Nat Commun, 2015. 6: p. 6881.
46. Sharma, S., et al., The double-domain cytidine deaminase APOBEC3G is a cellular site-
specific RNA editing enzyme. Sci Rep, 2016. 6: p. 39100.
47. Korber, B., et al., Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases
Infectivity of the COVID-19 Virus. Cell, 2020. 182(4): p. 812-827 e19.
48. Garcia-Beltran, W.F., et al., Multiple SARS-CoV-2 variants escape neutralization by
vaccine-induced humoral immunity. 2021.
49. Wolfe, A.D., D.B. Arnold, and X.S. Chen, Comparison of RNA Editing Activity of
APOBEC1-A1CF and APOBEC1-RBM47 Complexes Reconstituted in HEK293T Cells. J
Mol Biol, 2019. 431(7): p. 1506-1517.
50. Kinde, I., et al., Detection and quantification of rare mutations with massively parallel
sequencing. Proc Natl Acad Sci U S A, 2011. 108(23): p. 9530-5.
51. Eboreime, J., et al., Estimating Exceptionally Rare Germline and Somatic Mutation
Frequencies via Next Generation Sequencing. PLoS One, 2016. 11(6): p. e0158340.
52. Rosenberg, B.R., et al., Transcriptome-wide sequencing reveals numerous APOBEC1
mRNA-editing targets in transcript 3' UTRs. Nat Struct Mol Biol, 2011. 18(2): p. 230-6.
53. Fossat, N., et al., C to U RNA editing mediated by APOBEC1 requires RNA-binding protein
RBM47. EMBO Rep, 2014. 15(8): p. 903-10.
54. Soleymanjahi, S., V. Blanc, and N. Davidson, APOBEC1 mediated C-to-U RNA editing:
target sequence and trans-acting factor contribution to 177 RNA editing events in 119
murine transcripts in-vivo. RNA, 2021.
55. Sowden, M., J.K. Hamm, and H.C. Smith, Overexpression of APOBEC-1 results in
mooring sequence-dependent promiscuous RNA editing. J Biol Chem, 1996. 271(6): p.
3011-7.
56. Maris, C., et al., NMR structure of the apoB mRNA stem-loop and its interaction with the
C to U editing APOBEC1 complementary factor. RNA, 2005. 11(2): p. 173-86.
57. Sharma, S. and B.E. Baysal, Stem-loop structure preference for site-specific RNA editing
by APOBEC3A and APOBEC3G. PeerJ, 2017. 5: p. e4136.
58. Reuter, J.S. and D.H. Mathews, RNAstructure: software for RNA secondary structure
prediction and analysis. BMC Bioinformatics, 2010. 11: p. 129.
59. Xiong, Y., et al., Transcriptomic characteristics of bronchoalveolar lavage fluid and
peripheral blood mononuclear cells in COVID-19 patients. Emerg Microbes Infect, 2020.
9(1): p. 761-770.
60. Hadfield, J., et al., Nextstrain: real-time tracking of pathogen evolution. Bioinformatics,
2018. 34(23): p. 4121-4123.
61. Yamamoto, M., et al., SARS-CoV-2 Omicron spike H655Y mutation is responsible for
enhancement of the endosomal entry pathway and reduction of cell surface entry pathways.
BioRxiv, 2022.
62. Cagno, V., SARS-CoV-2 cellular tropism. Lancet Microbe, 2020. 1(1): p. e2-e3.
63. Zheng, Y.Y., et al., COVID-19 and the cardiovascular system. Nat Rev Cardiol, 2020. 17(5):
p. 259-260.
64. Trypsteen, W., et al., On the whereabouts of SARS-CoV-2 in the human body: A systematic
review. PLoS Pathog, 2020. 16(10): p. e1009037.
65. Heng, T.S., M.W. Painter, and C. Immunological Genome Project, The Immunological
Genome Project: networks of gene expression in immune cells. Nat Immunol, 2008. 9(10):
p. 1091-4.
66. Blanco-Melo, D., et al., Imbalanced Host Response to SARS-CoV-2 Drives Development
of COVID-19. Cell, 2020. 181(5): p. 1036-1045 e9.
67. Blanc, V., et al., Apobec1 complementation factor (A1CF) and RBM47 interact in tissue-
specific regulation of C to U RNA editing in mouse intestine and liver. RNA, 2019. 25(1):
p. 70-81.
68. Miao, Z., et al., Secondary structure of the SARS-CoV-2 5'-UTR. RNA Biol, 2021. 18(4):
p. 447-456.
69. Harvey, W.T., et al., SARS-CoV-2 variants, spike mutations and immune escape. Nat Rev
Microbiol, 2021. 19(7): p. 409-424.
70. Garcia-Beltran, W.F., et al., Multiple SARS-CoV-2 variants escape neutralization by
vaccine-induced humoral immunity. Cell, 2021. 184(9): p. 2523.
71. Wolfe, A.D., et al., The structure of APOBEC1 and insights into its RNA and DNA substrate
selectivity. NAR Cancer, 2020. 2(4): p. zcaa027.
72. Schmidt, N., et al., The SARS-CoV-2 RNA-protein interactome in infected human cells. Nat
Microbiol, 2021. 6(3): p. 339-353.
73. Plante, J.A., et al., Spike mutation D614G alters SARS-CoV-2 fitness. Nature, 2021.
592(7852): p. 116-121.
74. Chen, X.S., Insights into the Structures and Multimeric Status of APOBEC Proteins
Involved in Viral Restriction and Other Cellular Functions. Viruses, 2021. 13(3).
75. Kockler, Z.W. and D.A. Gordenin, From RNA World to SARS-CoV-2: The Edited Story of
RNA Viral Evolution. Cells, 2021. 10(6).
76. Nomburg, J., M. Meyerson, and J.A. DeCaprio, Pervasive generation of non-canonical
subgenomic RNAs by SARS-CoV-2. Genome Med, 2020. 12(1): p. 108.
77. Guan, B.J., et al., Genetic evidence of a long-range RNA-RNA interaction between the
genomic 5' untranslated region and the nonstructural protein 1 coding region in murine
and bovine coronaviruses. J Virol, 2012. 86(8): p. 4631-43.
78. Liu, Z., et al., Systematic comparison of 2A peptides for cloning multi-genes in a
polycistronic vector. Sci Rep, 2017. 7(1): p. 2193.
79. Chu, H., et al., Comparative tropism, replication kinetics, and cell damage profiling of
SARS-CoV-2 and SARS-CoV with implications for clinical manifestations, transmissibility,
and laboratory studies of COVID-19: an observational study. Lancet Microbe, 2020. 1(1):
p. e14-e23.
80. Rao, Y., et al., Targeting CTP Synthetase 1 to Restore Interferon Induction and Impede
Nucleotide Synthesis in SARS-CoV-2 Infection. bioRxiv, 2021.
81. Crooks, G.E., et al., WebLogo: a sequence logo generator. Genome Res, 2004. 14(6): p.
1188-90.
82. Mathews, D.H., et al., Incorporating chemical modification constraints into a dynamic
programming algorithm for prediction of RNA secondary structure. Proc Natl Acad Sci U
S A, 2004. 101(19): p. 7287-92.
83. Baysal, B.E., et al., RNA Editing in Pathogenesis of Cancer. Cancer Res, 2017. 77(14): p.
3733-3739.
84. Christofi, T. and A. Zaravinos, RNA editing in the forefront of epitranscriptomics and
human health. J Transl Med, 2019. 17(1): p. 319.
85. Gagnidze, K., et al., A New Chapter in Genetic Medicine: RNA Editing and its Role in
Disease Pathogenesis. Trends Mol Med, 2018. 24(3): p. 294-303.
86. Nishikura, K., Functions and regulation of RNA editing by ADAR deaminases. Annu Rev
Biochem, 2010. 79: p. 321-49.
87. Ramaswami, G., et al., Accurate identification of human Alu and non-Alu RNA editing sites.
Nat Methods, 2012. 9(6): p. 579-81.
88. Pecori, R. and N.F. Papavasiliou, It takes two (and some distance) to tango: how ADARs
join to edit RNA. Nat Struct Mol Biol, 2020. 27(4): p. 308-310.
89. Niavarani, A., et al., APOBEC3A is implicated in a novel class of G-to-A mRNA editing in
WT1 transcripts. PLoS One, 2015. 10(3): p. e0120089.
90. Sharma, S., et al., Mitochondrial hypoxic stress induces widespread RNA editing by
APOBEC3G in natural killer cells. Genome Biol, 2019. 20(1): p. 37.
91. Bogerd, H.P., et al., APOBEC3A and APOBEC3B are potent inhibitors of LTR-
retrotransposon function in human cells. Nucleic Acids Res, 2006. 34(1): p. 89-95.
92. Chen, H., et al., APOBEC3A is a potent inhibitor of adeno-associated virus and
retrotransposons. Curr Biol, 2006. 16(5): p. 480-5.
93. Thielen, B.K., et al., Innate immune signaling induces high levels of TC-specific deaminase
activity in primary monocyte-derived cells through expression of APOBEC3A isoforms. J
Biol Chem, 2010. 285(36): p. 27753-66.
94. Schutsky, E.K., et al., APOBEC3A efficiently deaminates methylated, but not TET-oxidized,
cytosine bases in DNA. Nucleic Acids Res, 2017. 45(13): p. 7655-7665.
95. Chan, K., et al., An APOBEC3A hypermutation signature is distinguishable from the
signature of background mutagenesis by APOBEC3B in human cancers. Nat Genet, 2015.
47(9): p. 1067-72.
96. Cortez, L.M., et al., APOBEC3A is a prominent cytidine deaminase in breast cancer. PLoS
Genet, 2019. 15(12): p. e1008545.
97. Buisson, R., et al., Passenger hotspot mutations in cancer driven by APOBEC3A and
mesoscale genomic features. Science, 2019. 364(6447).
98. Wang, Y., et al., Role of the single deaminase domain APOBEC3A in virus restriction,
retrotransposition, DNA damage and cancer. J Gen Virol, 2016. 97(1): p. 1-17.
99. Swanton, C., et al., APOBEC Enzymes: Mutagenic Fuel for Cancer Evolution and
Heterogeneity. Cancer Discov, 2015. 5(7): p. 704-12.
100. Gohler, S., et al., Impact of functional germline variants and a deletion polymorphism in
APOBEC3A and APOBEC3B on breast cancer risk and survival in a Swedish study
population. J Cancer Res Clin Oncol, 2016. 142(1): p. 273-6.
101. Schmitt, K., et al., Differential virus restriction patterns of rhesus macaque and human
APOBEC3A: implications for lentivirus evolution. Virology, 2011. 419(1): p. 24-42.
102. Kim, K., et al., The roles of APOBEC-mediated RNA editing in SARS-CoV-2 mutations,
replication and fitness. Sci Rep, 2022. 12(1): p. 14972.
103. Nakata, Y., et al., Cellular APOBEC3A deaminase drives mutations in the SARS-CoV-2
genome. Nucleic Acids Res, 2023. 51(2): p. 783-795.
104. Acharya, N., P. Kumar, and U. Varshney, Complexes of the uracil-DNA glycosylase
inhibitor protein, Ugi, with Mycobacterium smegmatis and Mycobacterium tuberculosis
uracil-DNA glycosylases. Microbiology, 2003. 149(Pt 7): p. 1647-58.
105. Prochnow, C., et al., The APOBEC-2 crystal structure and functional implications for the
deaminase AID. Nature, 2007. 445(7126): p. 447-51.
106. Marino, D., et al., APOBEC4 Enhances the Replication of HIV-1. PLoS One, 2016. 11(6):
p. e0155422.
107. Muckenfuss, H., et al., APOBEC3 proteins inhibit human LINE-1 retrotransposition. J Biol
Chem, 2006. 281(31): p. 22161-22172.
108. Lackey, L., et al., Subcellular localization of the APOBEC3 proteins during mitosis and
implications for genomic DNA deamination. Cell Cycle, 2013. 12(5): p. 762-72.
109. Land, A.M., et al., Endogenous APOBEC3A DNA cytosine deaminase is cytoplasmic and
nongenotoxic. J Biol Chem, 2013. 288(24): p. 17253-60.
110. Salamango, D.J., et al., APOBEC3B Nuclear Localization Requires Two Distinct N-
Terminal Domain Surfaces. J Mol Biol, 2018. 430(17): p. 2695-2708.
111. Huang, X., et al., Programmable C-to-U RNA editing using the human APOBEC3A
deaminase. EMBO J, 2020. 39(22): p. e104741.
112. Fu, Y., et al., DNA cytosine and methylcytosine deamination by APOBEC3B: enhancing
methylcytosine deamination by engineering APOBEC3B. Biochem J, 2015. 471(1): p. 25-35.
113. Ito, F., et al., Family-Wide Comparative Analysis of Cytidine and Methylcytidine
Deamination by Eleven Human APOBEC Proteins. J Mol Biol, 2017. 429(12): p. 1787-1799.
114. Jalili, P., et al., Quantification of ongoing APOBEC3A activity in tumor cells by monitoring
RNA editing at hotspots. Nat Commun, 2020. 11(1): p. 2971.
115. Garg, A. and U. Heinemann, A novel form of RNA double helix based on G.U and C.A(+)
wobble base pairing. RNA, 2018. 24(2): p. 209-218.
116. Love, R.P., H. Xu, and L. Chelico, Biochemical analysis of hypermutation by the
deoxycytidine deaminase APOBEC3A. J Biol Chem, 2012. 287(36): p. 30812-22.
117. Nik-Zainal, S., et al., Association of a germline copy number polymorphism of APOBEC3A
and APOBEC3B with burden of putative APOBEC-dependent mutations in breast cancer.
Nat Genet, 2014. 46(5): p. 487-91.
118. Abudayyeh, O.O., et al., A cytosine deaminase for programmable single-base RNA editing.
Science, 2019. 365(6451): p. 382-386.
119. Lau, P.P., et al., Apolipoprotein B mRNA editing is an intranuclear event that occurs
posttranscriptionally coincident with splicing and polyadenylation. J Biol Chem, 1991.
266(30): p. 20550-4.
120. Sowden, M., et al., Determinants involved in regulating the proportion of edited
apolipoprotein B RNAs. RNA, 1996. 2(3): p. 274-88.
121. Sowden, M.P. and H.C. Smith, Commitment of apolipoprotein B RNA to the splicing
pathway regulates cytidine-to-uridine editing-site utilization. Biochem J, 2001. 359(Pt 3):
p. 697-705.
122. Asaoka, M., et al., APOBEC3-Mediated RNA Editing in Breast Cancer is Associated with
Heightened Immune Activity and Improved Survival. Int J Mol Sci, 2019. 20(22).
123. Alonso de la Vega, A., et al., Acute expression of human APOBEC3B in mice causes
lethality associated with RNA editing. 2022: p. 2022.06.01.494353.
124. Liu, Y., et al., Two different kinds of interaction modes of deaminase APOBEC3A with
single-stranded DNA in solution detected by nuclear magnetic resonance. Protein Sci,
2022. 31(2): p. 443-453.
125. Schroeder, S.J., Challenges and approaches to predicting RNA with multiple functional
structures. RNA, 2018. 24(12): p. 1615-1624.
126. Schneider, C.A., W.S. Rasband, and K.W. Eliceiri, NIH Image to ImageJ: 25 years of image
analysis. Nat Methods, 2012. 9(7): p. 671-5.
Abstract
The APOBEC3 family of human cytidine deaminases functions in a range of cellular activities, including the innate and acquired immune system, by inducing C-to-U mutations in single-stranded DNA and/or RNA. This dissertation focuses on investigating the roles of APOBEC RNA editors, specifically APOBEC1, APOBEC3A, and APOBEC3G, as catalytic enzymes involved in RNA editing activity. While their existence as RNA editors has been recently established, the precise mechanisms and specific targets of their editing activity remain poorly understood. My research aims to elucidate how these APOBEC RNA editors selectively target and edit human endogenous RNAs and/or viral RNAs within human cells. My dissertation is divided into two main parts, each addressing distinct aspects of APOBEC RNA editing:
- Chapter I: Specific Editing of the RNA Genome of SARS-CoV-2 by Host APOBEC Enzymes
- Chapter II: Unraveling the Enzyme-Substrate Properties of a Specific APOBEC3A-mediated RNA Editing
The findings from this dissertation will contribute to a better understanding of the functions and regulatory mechanisms of APOBEC RNA editors, paving the way for potential therapeutic applications and targeted interventions in RNA editing processes.
Asset Metadata
Creator
Kim, Kyu Min (author)
Core Title
Exploring roles of human APOBEC-mediated RNA editing activity
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Molecular Biology
Degree Conferral Date
2023-08
Publication Date
07/21/2023
Defense Date
08/08/2023
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
APOBEC,C-to-U RNA editing,deamination,OAI-PMH Harvest,SARS-CoV-2 editing
Format
theses (aat)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Chen, Xiaojiang S. (committee chair), Calabrese, Peter (committee member), Michael, Matthew (committee member), Phillips, Carolyn (committee member)
Creator Email
celcel0320@gmail.com,kyumink@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC113282110
Unique identifier
UC113282110
Identifier
etd-KimKyuMin-12121.pdf (filename)
Legacy Identifier
etd-KimKyuMin-12121
Document Type
Dissertation
Rights
Kim, Kyu Min
Internet Media Type
application/pdf
Type
texts
Source
20230721-usctheses-batch-1071 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu