Rational selection of CRISPR/Cas9 guide RNAs for homology directed genome editing and its utility in the development of gene therapies

by

Kristina Jasmine Tatiossian

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Medical Biology)

May 2020

Copyright 2020 Kristina Jasmine Tatiossian

Table of Contents

Figures
Supplementary Figures
Tables
Rational selection of CRISPR/Cas9 guide RNAs for homology directed genome editing
  Abstract
  Introduction
  Results
    DSBs at Cas9 target sites exhibit preferences for NHEJ or MMEJ repair
    Indel signatures are conserved between hematopoietic cell lines and primary cells
    Indel signatures predict HDR
    Comparison of S. aureus and S. pyogenes Cas9 targeted to the same site
    Predicting indel signatures and HDR in silico
  Discussion
Anti-HIV gene therapy: editing restriction factor tetherin in CD4+ T cells and HSPCs
  Abstract
  Introduction
  Results
    Screening tetherin variants for tethering activity and resistance to HIV Vpu
    Editing ΔGI/T45I tetherin in HSCs
    Anti-HIV capacity of ΔGI/T45I gene edited tetherin in cell lines and primary T cells
    Mitigating indel outcomes in coding regions by gene editing at-a-distance
    Engraftment of NSG mice with tetherin-modified HSCs and HIV challenge
  Discussion
Methods and Materials
References

Figures

Figure 1. Indel signatures reflect dominance of MMEJ or NHEJ repair of DSBs. a) DNA DSB repair by MMEJ or NHEJ in mammalian cells can result in insertions or deletions of nucleotides (indels) at the lesion site. MMEJ resects DNA from the DSB ends, leading to larger deletions. NU7441 is a highly selective inhibitor of DNA-PK, a component of NHEJ repair.
b) Visualization of the eight most frequent indels, detected by ICE analysis, for gRNA #34 (Supplementary Table 1) in K562 cells, with and without 5μM NU7441. Individual indels were defined by their length (+ for insertion, - for deletion) and location relative to the predicted Cas9 cut site. Total indel frequency was also provided by the ICE analysis, which considers all observed indels in a sample. c) The absolute frequency of each indel observed in the panel of 51 gRNAs tested in K562 cells was converted to a relative indel frequency by normalization to the total indel frequency of its respective sample. The change in relative frequency upon addition of 5μM NU7441 was calculated for each individual indel (n=3-5), and all changes were plotted in groups with other indels of the same size. Indels that appeared only in the presence of NU7441 were excluded from the analysis. Boxplots show the median, quartiles, and interquartile range (IQR) for each indel size group. A two-sample Mann-Whitney U test was used to determine whether the change in frequency upon NU7441 treatment significantly deviated from 0 (no change) for each indel size group (1.2×10^-38 < ***p-value < 5.8×10^-4; **p-value < 4.6×10^-3; *p-value < 4.2×10^-2). Based on responses to NU7441, deletions ≥3bp were designated as MMEJ repair events (gold), while indels from +3bp to -2bp were designated as NHEJ repair events (cyan). d) For each of the 51 gRNAs analyzed in K562 cells, relative indel frequencies were derived by normalization to the total indel frequency within each sample, and each indel was assigned as NHEJ or MMEJ based on the size criteria described above. Total relative NHEJ (cyan) and MMEJ (gold) frequencies were then calculated, and the gRNAs were ranked by the dominance of MMEJ indels (Supplementary Table 1). Error bars represent the standard error of the mean (SEM) (n=3-5).

Figure 2.
Comparison of indel signatures between hematopoietic cell lines and primary cells. a) Visualization of the eight most frequent outcomes, detected by ICE analysis, for gRNA #5 (Supplementary Table 1) in K562 cells, CD4+ T cells and HSPCs. Indels are defined by length and start/end site relative to the Cas9 cut site. Total indels is the sum of all indels observed for the gRNA, including those not in the top eight. b) Relative indel frequencies were derived by normalization to the total indel frequency of each sample. Mean indel frequencies, either relative or absolute, were then documented for 44 gRNAs tested in K562 and CD4+ T cells, 14 of which were also tested in HSPCs (K562 n=3-5, CD4+ T n=2-3, HSPC n=1-8). Pearson coefficients were calculated comparing the indel outcomes, using either relative or absolute mean frequencies, for matched gRNAs between cell types. Boxplots show the median, quartiles, and IQR, and the median Pearson value for all gRNAs in each comparison is indicated. c) Relative indel frequencies were derived by normalization to the total indel frequency of each sample. Total NHEJ and MMEJ frequencies were calculated as the sums of the individual indel frequencies, which were defined as NHEJ or MMEJ based on size (insertions ≥+1bp and -1bp or -2bp deletions defined as NHEJ; deletions ≥3bp defined as MMEJ). gRNAs were grouped to highlight those with discrepancies in total indels >25% between K562 and primary cells. Numbers along the x-axis refer to gRNAs (Supplementary Table 1). Error bars represent the SEM (K562 n=3-5, CD4+ T n=2-3, HSPC n=1-8). d) MMEJ and NHEJ frequencies as shown in c) were normalized to the total indel frequency to derive relative MMEJ and NHEJ frequencies for each gRNA. MMEJ and NHEJ outcomes were conserved between cell types for the panel of gRNAs tested (ANOVA, ns = not significant).

Figure 3. Indel signatures predict HDR. a) DNA double-strand break (DSB) repair in mammalian cells can also occur by homology directed repair (HDR).
Unlike NHEJ, HDR and MMEJ are both initiated by resection of the DSB ends, but they eventually diverge in that HDR, but not MMEJ, uses a homology template to guide repair of the DSB. b) MMEJ frequency, but not total indel frequency, predicts the efficiency of HDR editing. 44 gRNAs were matched with ssODN homology templates and co-delivered to K562 or CD4+ T cells, with a subset of 13 gRNA + matched ssODN pairs tested in HSPCs (Supplementary Table 1). The mean relative HDR frequency of each gRNA + matched ssODN was quantified and correlated to measurements in the matched gRNA-only control, either the mean total indel frequency (left) or the mean relative MMEJ frequency (right) (K562 n=3-4, CD4+ T n=2-3, HSPCs n=2-7). The Pearson coefficient was derived for each comparison and its significance calculated; p-values <0.05 indicate a statistically significant correlation. c) Indels detected for each gRNA were categorized as MMEJ or NHEJ based on size as previously described, and data from the different cell types were pooled. Violin plots display the density distribution of the fold-change in MMEJ or NHEJ frequency for all gRNAs, calculated between +/- ssODN samples. Embedded boxplots show the median, quartiles and IQR. MMEJ is depleted more than NHEJ in the presence of an ssODN (two-sample Mann-Whitney U test, ***p-value < 2.2×10^-16). d) Matched ssODNs were titrated for 3 gRNAs in CD4+ T cells and HSPCs using the Amaxa 4D nucleofector (Lonza). Relative MMEJ, NHEJ and HDR frequencies were calculated as previously described. Numbers refer to gRNAs (Supplementary Table 1). Error bars, where applicable, represent the SEM (n=1-2). e) A subpanel of 27 gRNAs +/- ssODN template was tested with and without 5μM NU7441 treatment in K562 cells. MMEJ, NHEJ, and HDR frequencies were derived as previously described. gRNAs were selected to retain similar total indel frequencies in the presence and absence of NU7441. Numbers along the x-axis refer to gRNAs (Supplementary Table 1). Error bars represent the SEM (n=2-3).
Figure 4. Comparison of SaCas9 and SpCas9 indel signatures. a) The SaCas9 PAM (5'-NNGRRT) overlaps with the SpCas9 PAM (5'-NGG) at five target sites (#7, 26, 29, 36, 42; Supplementary Table 1). The canonical Cas9 cut site is indicated by an arrow. b) Indel patterns, detected by ICE analysis, for gRNA #7 in K562 cells, induced by SpCas9 or SaCas9. Indels are defined by length and start/end site relative to the Cas9 cut site and are ordered by increasing size. c) All indels detected (replicate data pooled) with either Cas9 ortholog for the 5 gRNAs were used to generate a binary list of calls for each gRNA (1 = indel present, 0 = absent). Jaccard/Tanimoto similarity coefficients were calculated between the lists of all pairwise samples and plotted as a heatmap, where a value of 1 (purple) indicates high similarity and a value of 0 (yellow) indicates low similarity. Cas9 orthologs targeting the same site cluster together.

Figure 5. Predicting indel signatures and HDR editing potential in silico. a) Lindel, FORECasT, and inDelphi were used to predict indel signatures for the panel of 44 gRNAs previously characterized in K562 cells and CD4+ T cells, a subset of which was also characterized in HSPCs (Supplementary Table 1). Only the 10 most frequent indels predicted by each tool were considered, and frequencies were normalized within this group. For indels observed experimentally, mean relative indel frequencies were calculated for each gRNA as previously described (n=3-5). For all predicted and measured values, indels were binned by size (>+1bp insertion, +1bp, -1bp, -2bp, or ≥3bp deletion). Boxplots show the median, quartiles and IQR of the indel bin frequencies. The observed indel frequencies, with cell type data pooled, were compared against each prediction tool (Kruskal-Wallis test, *p-value = 0.035, **p-value < 8.5×10^-3, ***p-value < 6.9×10^-6, ns = not significant). Only Lindel predicted insertions >1bp.
b) Pearson coefficients were calculated for the predicted versus measured relative indel frequencies for each gRNA in each cell type. Boxplots show the median, quartiles, and IQR. Lindel and FORECasT predictions correlated better with observed outcomes than inDelphi predictions (Kruskal-Wallis test performed with cell type data pooled, **p-value = 7.9×10^-3, ***p-value = 3.3×10^-4). c) Predicted MMEJ frequencies for all gRNAs, derived from each of the three programs, are compared to observed MMEJ frequencies in K562 cells (dark grey), CD4+ T cells (medium grey) and HSPCs (light grey). The root-mean-square error (RMSE), a measure of deviation between predicted and measured values, is shown within each panel. Embedded pie charts show the percentage of gRNAs for which MMEJ or NHEJ dominance (>50% of indels) was correctly predicted by each tool. d) The mean frequencies of MMEJ, NHEJ, and HDR editing outcomes in the presence of a matched homology template were calculated for the indicated 44 gRNAs in K562 cells. Error bars represent the SEM (n=3-4). gRNAs are grouped by target locus and ordered by HDR efficiency. Numbers at the base of each column indicate the ranking of MMEJ indel frequencies, either observed in K562 cells in the absence of a homology template (black numbers) or predicted by Lindel (red numbers). The identity of the nucleotide at the -4 position relative to the PAM is shown at the top of each column.

Figure 6. Model for CRISPR/Cas9 DSB repair and indel outcomes. Cas9 is directed to a target sequence by a gRNA (blue). The nucleotide at the -4 position (red) of the target sequence relative to the PAM sequence (orange) is shown. The HNH domain of Cas9 cleaves the gRNA-bound (target) strand between nucleotides -3 and -4 (upper arrow), while the RuvC domain cleaves with more positional flexibility (lower arrows), resulting in blunt or overhanging DSB ends. Repair of these different DSBs by MMEJ or NHEJ can produce perfect repairs or the indicated indels.
The gRNA can direct Cas9 to re-cleave a perfectly repaired sequence, while indels typically produce a terminal outcome because they disrupt the target sequence. Our model predicts that blunt DSBs will be more likely to produce terminal events that are MMEJ repair outcomes, while DSBs with 1bp overhangs will favor terminal events that are NHEJ repair outcomes.

Figure 7. Cellular, ex vivo gene therapy for HIV/AIDS. Gene therapy has the potential to control HIV replication and prevent lymphocyte depletion by providing a pool of HIV-resistant immune cells. Current approaches primarily target hematopoietic stem cells (HSCs), enriched from either human bone marrow or umbilical cord blood. Genetic modification of HSCs followed by engraftment can potentially reconstitute the entire immune system with cells resistant to HIV.

[Figure 8 panels: % HIV-VLP release (-/+ Vpu) and relative tetherin expression for WT, I34A, L37A, P40A, L41A, ΔGI(27,28), T45I, and ΔGI/T45I tetherin]

Figure 8. ΔGI/T45I tetherin restricts HIV-VLP release in the presence of HIV Vpu. a) Tetherin (red) can physically tether budding viral particles at the surface of infected cells, preventing virion egress. The HIV-1 accessory protein Vpu (purple) counteracts tetherin activity by downregulating and/or degrading tetherin (bottom half). A tetherin variant that retains tethering activity but is resistant to Vpu would be desirable (top half). b) A representative blot of 293T cells transfected with plasmids encoding HIV gag-pol (HIV-VLP), the indicated tetherin variant, and 'filler' control plasmid, as well as human codon-optimized Vpu where indicated. Cells were lysed 48hrs post-transfection, and lysates were blotted and probed for tetherin, Vpu, HIV capsid (p24), and actin. Supernatants were harvested, concentrated by ultracentrifugation, and probed for p24.
c) Relative tetherin expression for each variant compared to WT tetherin in the absence of Vpu (n=4). d) Relative p24 in the supernatant for each tetherin variant in the presence/absence of Vpu, compared to the HIV gag-pol-only control. All tetherin isoforms tested restricted HIV-VLP release in the absence of Vpu, but only ΔGI/T45I tetherin potently restricted HIV-VLP release in the presence of HIV Vpu (***p-value = 3.5×10^-6, two-sample Student's t test).

Figure 9. Gene editing ΔGI/T45I tetherin in CD34+ HSPCs. a) Strategy and tools to modify exon 1 of the tetherin locus to encode ΔGI, a 6bp (2 amino acid) deletion, and T45I, a single C-to-T base-pair substitution. The approximate cut sites for the Cas9 and zinc finger nucleases are shown. The AAV6-based template has a 500bp left homology arm and a ~1.5kb right homology arm and encodes both designer modifications, as well as two additional mutations between ΔGI and T45I that disrupt the ZFN binding domain (orange, Supplementary Fig. 2). b) Total indel frequency in CD34+ HSCs in ZFN-only and Cas9-only samples (Cas9 delivered either as mRNA or RNP), quantified by targeted deep sequencing (n=4) (see Supplementary Fig. 1 for ZFN and Cas9 target site selection).
c) Frequency of indels and various HDR outcomes in HSCs treated with the AAV6 homology template and the indicated targeted nuclease, quantified by targeted deep sequencing. d) The effect of editing reagents on cellular viability 1 day post-editing, as determined by viability staining and flow cytometry. e) Representative colony-forming units (CFUs) phenotypically called as CFU-GEMM (most primitive), CFU-GM, or BFU-E. f-g) Total CFUs per plate are plotted for control as well as AAV6-treated samples for the indicated targeted nuclease. Immediately following AAV transduction, HSCs were plated in methylcellulose semi-solid media with recombinant cytokines to support hematopoietic cell expansion and differentiation. Two to three weeks later, colonies were imaged by bright-field microscopy and phenotypically called.

Figure 10. Production of clonal ΔGI/T45I CD4+ CEM-SS cells and HIV challenge. a) Total indel frequency for ZFN 1189 (delivered as mRNA) and Cas9 #4 (delivered as RNP) in CD4+ CEM-SS T cell lines. b) ΔGI HDR efficiency, quantified by ddPCR, induced by ZFN 1189 and Cas9 #4 when paired with the AAV6 homology template in CEM-SS cells. c) ΔGI/T45I-edited CEM-SS clonal populations were derived by single-cell sorting of bulk-edited cells. The table indicates the number and frequency of clonal colonies with each editing outcome for each nuclease. Clones with unintelligible Sanger trace files were categorized as 'decomposed.' d) A representative Sanger trace file from a clonal ΔGI/T45I-edited CEM-SS population is shown.
e-f) Challenge of WT and ΔGI/T45I CEM-SS cells with WT, ΔVpu, and A18H Vpu HIV NL4.3 (n=2, for 2 different ΔGI/T45I clonal CEM-SS populations). e) Viral loads, measured by qRT-PCR of viral RNA in the supernatant, of WT (black) and ΔGI/T45I-edited (grey) CEM-SS cell populations infected with WT, ΔVpu, and A18H Vpu HIV. f) Tetherin expression, displayed as the mean fluorescence intensity (MFI), of ΔGI/T45I and WT CEM-SS cell populations infected with HIV compared to control, uninfected cells. Tetherin expression in ΔGI/T45I CEM-SS populations was not significantly changed in the presence of WT HIV, as compared to WT CEM-SS cells (**p-value = 8.3×10^-3, Student's t test).

Figure 11. Gene editing tetherin in primary CD4+ T cells and HIV challenge. a) Total indel frequency induced by Cas9 #4 RNP in the presence/absence of the NHEJ inhibitor NU7441 in primary, pre-stimulated CD4+ T cells. b) ΔGI HDR efficiency, quantified by ddPCR, induced by Cas9 #4 RNP when paired with the AAV6 homology template in the presence/absence of NU7441 in primary CD4+ T cells (***p-value = 9.2×10^-4, n=3). c-f) Challenge of ΔGI/T45I-edited primary CD4+ T cells with WT HIV NL4.3 (n=3). c) Viral loads, measured by qRT-PCR of viral RNA in the supernatant, of RNP-only, RNP+AAV6 ΔGI/T45I-edited, or control primary CD4+ T cell populations 4 days post-HIV infection (*p-value = 0.035, **p-value = 2.4×10^-3, ***p-value = 4.6×10^-5, n=3, ANOVA).
d) Frequency of CD4+ cells (of live cells) in control and edited populations in the presence/absence of HIV infection. e-f) Tetherin expression, displayed as either the mean fluorescence intensity (MFI) or the frequency of cells with cell-surface tetherin, of edited CD4+ cell populations infected with HIV compared to control, uninfected cells. Tetherin expression in RNP-edited and ΔGI/T45I-edited populations was not significantly changed in the presence of WT HIV, as compared to WT CD4+ cells (**p-value = 6.2×10^-3, n=3, Student's t test).

Figure 12. At-a-distance gene editing in CD34+ HSPCs. a) Strategy and tools to modify exon 1 of the tetherin locus to encode ΔGI/T45I using nucleases targeted to intron 1, ~225bp downstream of ΔGI. The approximate cut sites for the Cas9 and zinc finger nucleases are shown. The AAV6-based template (blue) has a 500bp left homology arm and a ~1.5kb right homology arm. b) Total indel frequency induced by intron-targeted ZFN 1383 and Cas9 #6 (delivered as mRNA or RNP) in HSCs, quantified by targeted deep sequencing (n=2-4). c) Frequency of indels and various HDR outcomes in HSCs treated with the AAV6 homology template and the indicated targeted nuclease, quantified by targeted deep sequencing. d) Allelic linkage of ΔGI/T45I edits with intronic indels in HSCs using either the ZFN or Cas9. The frequency of reads with and without co-occurring edits and indels was determined by targeted deep sequencing (long MiSeq reads) and normalized to the total reads with ΔGI/T45I edits. Indels were linked to ΔGI/T45I edits for both nucleases.
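The allelic-linkage measurement in Fig. 12d (and Supplementary Fig. 7d) reduces to a simple ratio: among reads that carry the ΔGI/T45I edits, what fraction also carry an indel at the nuclease cut site. A minimal Python sketch follows, assuming each sequencing read has already been called for those two features; the `ReadCall` record and the `linkage_percentages` function are hypothetical stand-ins for whatever read-level genotyping was actually used. The ratio is only meaningful when a single read spans both the edit sites and the cut site, presumably the reason long MiSeq reads were used for the intron-targeted nucleases.

```python
"""Hypothetical sketch of the allelic-linkage calculation (Fig. 12d).

Assumes each read was already genotyped for two features; the record type
and function names are illustrative, not from the dissertation.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadCall:
    has_designer_edit: bool  # read carries the ΔGI/T45I edits
    has_indel: bool          # read carries an indel at the nuclease cut site


def linkage_percentages(reads: list[ReadCall]) -> dict[str, float]:
    """Split edited reads into edit-only vs. edit + indel, as % of edited reads."""
    edited = [read for read in reads if read.has_designer_edit]
    if not edited:
        raise ValueError("no reads carry the designer edits")
    linked = sum(read.has_indel for read in edited)  # bools sum as 0/1
    return {
        "edit_only": 100.0 * (len(edited) - linked) / len(edited),
        "edit_plus_indel": 100.0 * linked / len(edited),
    }


# Made-up read counts; unedited reads drop out of the normalization.
reads = (
    [ReadCall(True, False)] * 6
    + [ReadCall(True, True)] * 2
    + [ReadCall(False, True)] * 5
    + [ReadCall(False, False)] * 7
)
result = linkage_percentages(reads)
print(result)  # → {'edit_only': 75.0, 'edit_plus_indel': 25.0}
```

Normalizing to edited reads rather than to all reads mirrors the caption's description, so the two percentages always sum to 100 regardless of the overall HDR rate.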
Figure 13. Engraftment of mice with tetherin-modified CD34+ HSPCs. Adult NSG mice were engrafted with fetal liver HSCs, either untreated or treated with the tetherin AAV6 homology template plus the indicated targeted nuclease, using 3 different donor tissues for Exon Cas9 RNP-edited samples and 1 donor tissue for Intron Cas9, Exon Cas9 (mRNA), and ZFN (mRNA). ΔGI editing levels in these input HSCs were 82%, 67%, and 75% for Exon Cas9 RNP, and 25%, 20%, and 16% for the Exon Cas9 (mRNA), Exon ZFN, and Intron Cas9 samples, respectively. Peripheral blood was analyzed at 16 weeks for the percentage of human CD45+, CD19+, CD3+ and CD4+ cells. a) Levels of human CD45+ cells for all conditions, with independent cohorts pooled. b) Levels of human CD19+, CD3+, and CD4+ cells within the CD45+ gate for all conditions. c) Rates of gene editing in human cells were measured in blood from individual mice by ddPCR. d) Frequency of human CD45+ cells for all conditions 4, 8, and 12 weeks post-HIV infection. e) Levels of HIV RNA/mL for mice engrafted with untouched or Exon Cas9 + AAV-edited HSCs at 4, 8, and 12 weeks post-infection.
f) Fold-change of CD4+ expression before and after HIV infection for mice engrafted with untouched or Exon Cas9 + AAV-edited HSCs at 12 weeks post-infection.

Supplementary Figures

Supplementary Fig. 1, Related to Fig. 1. The effect of NU7441 on indel frequencies. a) Total indel frequency was determined by ICE analysis for the panel of 51 gRNAs in K562 cells, either untreated or treated with 1μM or 5μM NU7441. Boxplots show the median, quartiles, and IQR. Treatment with either dose of NU7441 had no effect on the total indel frequency (Kruskal-Wallis test). b) The absolute frequencies of all indels in the absence or presence of NU7441 were binned by size (+1bp insertion and -1bp, -2bp or ≥3bp deletions) for the panel of 51 gRNAs. Boxplots show the median, quartiles, and IQR. c) Same data as in (b), but shown as fold-change in the presence of 5μM NU7441. Boxplots show the median, quartiles, and IQR. d) All indels detected for the panel of 51 gRNAs were defined, based on size, as NHEJ (+1bp insertion, -1 or -2bp deletions, cyan) or MMEJ (≥3bp deletions, gold). Relative indel frequencies were derived by normalization to the total indel frequency within each sample, and the NHEJ or MMEJ indels were summed for each gRNA. The gRNAs were ranked by MMEJ prevalence under control conditions. The effects of adding 1μM or 5μM NU7441 on the total NHEJ or MMEJ indels are shown. Error bars represent the SEM (n=1-5), and overlaid dots in NU7441-treated samples show the MMEJ indel frequency for the control (no NU7441) samples. e) Same data as in (d), but plotted to show the distribution of total NHEJ or total MMEJ indels for each gRNA under the indicated conditions. Boxplots show the median, quartiles, and IQR. The Mann-Whitney U test was used to determine significance in (b), (c), and (e).

Supplementary Fig. 2, Related to Fig. 2. Cell-type comparisons of indel signatures.
a) Pairwise comparison of indel patterns for gRNAs between K562 and CD4+ T cells (44 gRNAs, Supplementary Table 1) or HSPCs (14 gRNAs, Supplementary Table 1). All indels observed in samples treated with each gRNA were coded by length and position relative to the Cas9 cut site, and replicates were pooled (K562 n=3-5, CD4+ T n=2-3, HSPC n=1-8). A binary list of calls was generated for each gRNA in each cell type (1 = indel present, 0 = absent), and Jaccard/Tanimoto similarity coefficients were calculated between all lists. The coefficients are plotted as heatmaps with matched gRNAs from each cell type clustered side-by-side. A value of 1 (purple) indicates high similarity and a value of 0 (yellow) indicates low similarity. b) Jaccard/Tanimoto similarity coefficients calculated as in (a) were plotted for different gRNAs compared within K562 cells (left), or for matched gRNAs compared between the three different cell types (right). Boxplots show the median, quartiles and IQR. Matched gRNAs across different cell types are more similar than different gRNAs within the same cell type (two-sample Mann-Whitney U test, ***p-value < 2.2×10^-16).

Supplementary Fig. 3, Related to Fig. 2. Improving total indel frequency does not alter indel signatures. a) Comparison of mean total indel frequency in K562 cells (n=3-5) versus primary cells (n=1-8). Numbers identify 12 gRNAs (Supplementary Table 1) producing <75% total indels in K562 cells. b) The 12 gRNAs producing <75% total indels in K562 cells in panel (a) were re-tested in the presence of a non-target ssODN. Boxplots show the mean, standard deviation, and min/max values (n=2). Total indel frequencies were improved by the non-target ssODN (Welch's two-sample paired t test, ***p-value = 8.9×10^-7). c) For each of the 12 gRNAs, the relative frequencies of MMEJ (gold) or NHEJ (cyan) indels were calculated based on size, as described in Figure 1, in the presence and absence of the non-target ssODN.
Relative NHEJ and MMEJ frequencies were not affected by the addition of a non-target ssODN, despite the increases in total indel frequency.

Supplementary Fig. 4, Related to Fig. 3. Indel signatures predict HDR. a) Single-stranded oligonucleotide (ssODN) homology templates, each containing 80bp of homology to the non-PAM DNA strand, are designed to insert a 6bp (XhoI) cassette between nucleotides -3 and -4 (red) relative to the PAM motif (orange). b) Validation of ICE for the quantification of HDR frequency. gRNA #18 was delivered to K562 cells alongside a matched ssODN homology template that inserts an XhoI site at the Cas9 cut site (Supplementary Table 1). K562 cells were cultured in increasing amounts of NU7441 (0, 1, and 5μM) post-electroporation. HDR was quantified by either ICE or RFLP. No significant difference was detected between HDR quantification methods (Welch's two-sample t test, n=2). c) HDR editing outcomes for 21 gRNAs that produced at least 80% total indel frequency in CD4+ T cells, where total indel frequency was quantified in the absence of a template. Error bars represent the SEM (n=2-3). The mean frequency of MMEJ, defined as the ≥3bp deletion indel subset, observed in the absence of a template, is overlaid as a gold diamond (n=2-3). d) Comparison of editing outcomes for the indicated gRNAs in K562 cells versus CD4+ T cells, or K562 cells versus HSPCs, in the presence and absence of matched ssODN homology templates. Outcomes were assigned as MMEJ, NHEJ or HDR as previously described, and relative frequencies were calculated. Error bars represent the SEM (K562 cells n=3-5, CD4+ T cells n=2-3, HSPC n=1-8). Numbers in c) and d) identify individual gRNAs (Supplementary Table 1).

Supplementary Fig. 5, Related to Fig. 4. Indel signatures of S. aureus Cas9.
a) Total indel frequencies were calculated for the panel of 5 SaCas9-gRNAs in Figure 4a, in K562 cells untreated or treated with 5μM NU7441. Boxplots show the median, quartiles, and IQR for all gRNAs, with replicates pooled (n=4). Unlike with SpCas9, NU7441 treatment reduced the total indel frequency of SaCas9-gRNAs (Kruskal-Wallis test, ***p-value = 2.8×10^-4), which may relate to the fact that we delivered SaCas9 as mRNA rather than as RNPs. b) All indels for the panel of 5 SaCas9-gRNAs tested in K562 cells were categorized by size, with a positive value reflecting an insertion and a negative value a deletion. Relative indel frequencies were derived by normalization to the total indel frequency of each sample. The mean change in relative indel frequency in the presence of 5μM NU7441 was calculated (n=4) and plotted for all gRNAs. Boxplots show the median, quartiles, and IQR. A two-sample Mann-Whitney U test was used to determine whether the change in frequency upon NU7441 treatment significantly deviated from 0 (no change) for each indel size (***p-value < 4.8×10^-4, **p-value < 7.8×10^-3, *p-value < 3.1×10^-2). Samples with <50% total indels were removed from the analysis. c) The frequencies of NHEJ, MMEJ, and HDR outcomes are shown for the panel of 5 SaCas9-gRNAs, with and without an ssODN homology template (+/-). gRNA #36 consistently had undetectable levels of indels (not detected, n.d.) in the presence of an ssODN, for unknown reasons. Error bars represent the SEM (n=4).

Supplementary Figure 6, related to Figure 9. Selection of ZFN and CRISPR/Cas9 target sites. a) Approximate cut sites for the panel of Cas9s (blue) and ZFNs (black) tested are shown relative to the intended tetherin edit sites (red). b) Representative agarose gel for the T7 endonuclease assay showing the parent band and cleavage products for ZFN 1189 and 1208 tested in K562 cells.
c) The total indel frequency of ZFN-treated K562 samples was determined by quantifying the relative band intensities of the parent and cleavage products, as shown in b. ZFN 1189 induced significantly more indels than ZFN 1208 (*p-value = 0.023, n = 5, Student's t test). ZFN 1189 was selected for HDR studies in HSCs. d) The total indel frequency of Cas9-gRNA-treated K562 samples was determined by Sanger sequencing and ICE analysis. NHEJ and MMEJ indel subsets were defined based on indel size (deletions ≥3bp as MMEJ and all others as NHEJ). Based on its high total indel efficiency and favorable MMEJ:NHEJ profile, Cas9-gRNA #4 was selected for HDR studies in HSCs. e) Representative indel signature induced by gRNA #4 in HSCs, displayed from most to least frequent.

Supplementary Figure 7, related to Figure 9. Optimization of ZFN activity in HSCs and improving compatibility with the AAV6 homology template. a) Schematic of the in vitro transcribed mRNA used to express ZFN 1189 in HSCs. Both the pVAX and pVAX+UTR constructs use the T7 promoter to express the ZFN open reading frame (ORF). pVAX+UTR mRNA additionally encodes a Xenopus 5’UTR and two beta-globin 3’UTRs. b) Total indel frequency induced in HSCs by ZFN 1189 delivered in either mRNA format. ZFN 1189 expressed from mRNA with UTRs induced significantly more total indels in HSCs (***p-value = 1.3 × 10^-4, n = 4, Student's t test). c) In vitro cleavage of gene blocks (double-stranded DNA) by ZFN 1189 protein. Gene blocks were designed to encode various mutations in the ZFN 1189 binding site in addition to the desired ΔGI/T45I modifications. ZFN 1189 cleavage was prevented in vitro when gene blocks encoded >2 additional mutations in the ZFN target sequence.
d) Allelic linkage of ΔGI/T45I edits with indels in HSCs using AAV homology templates with and without ZFN binding site mutations. The frequency of reads with and without co-occurring edits and indels was determined by targeted deep sequencing and normalized to the total reads with ΔGI/T45I edits. Indels are no longer linked to ΔGI/T45I edits when the AAV homology template encodes +2 mutations in the ZFN binding domain. e) Location of the +2 mutations (orange) relative to ΔGI/T45I and the ZFN binding site.

Supplementary Figure 8, related to Figure 11. Concentrated AAV6 transduction improves HDR efficiency in CD4+ T cells. a) Timelines tested for the electroporation and transduction of primary, pre-stimulated CD4+ T cells. In timeline A, cells are transduced for 1hr in 10μL of AAV-containing serum, followed by electroporation and immediate culture in FBS-containing media. In timeline B, cells are first electroporated, then immediately transduced for 2hrs in 1mL of AAV-containing serum, followed by the addition of FBS to the culture. b) ΔGI HDR efficiency, quantified by ddPCR, induced by Cas9 #4 RNP paired with the AAV6 homology template at the indicated MOIs in primary CD4+ T cells, compared between timelines A and B. At all AAV MOIs tested, timeline A outperformed timeline B (***p-value < 3.6 × 10^-10, n=2, Student's t test).

Supplementary Figure 9, related to Figure 12. Intron-targeted nucleases preserve tetherin protein expression.
a) Approximate cut sites for the intron-targeted Cas9s (blue) and the panel of ZFNs (black) tested are shown relative to the intended tetherin edit sites (red) and the tetherin locus structure. b) Representative agarose gel for the T7 Endonuclease assay showing the parent band and cleavage products for the ZFN panel tested in K562 cells. c) The total indel frequency of ZFN-treated K562 samples was determined by quantifying the relative band intensities of the parent and cleavage products, as shown in b. ZFN 1383 was selected for HDR studies in HSCs. d) The indel outcomes following ZFN 1383 cleavage in HSCs, as determined by targeted deep sequencing. The exon-intron junction and the splice acceptor site are indicated. No indels with frequency >1% disrupt the splice site of the exon-intron junction; all are contained within intron 1. e) The total indel frequency of Cas9-gRNA-treated K562 samples was determined by Sanger sequencing and ICE analysis. NHEJ and MMEJ indel subsets were defined based on indel size (deletions ≥3bp as MMEJ and all others as NHEJ). f) The indel outcomes following Cas9 #6 RNP cleavage in HSCs, as determined by targeted deep sequencing. The exon-intron junction and the splice acceptor site are indicated. No indels with frequency >1% disrupt the splice site of the exon-intron junction; all are contained within the intron. g) Tetherin protein expression, displayed as the mean fluorescence intensity (MFI), is shown for primary CD4+ T cells treated with the indicated targeted nuclease, compared to untouched and mock-electroporated control samples. Nucleases targeting the exon (Cas9 #4, Exon Cas9; ZFN 1189, Exon ZFN) reduced tetherin cell surface expression significantly more than intron-targeted nucleases (Cas9 #6, Intron Cas9; ZFN 1383, Intron ZFN) (***p-value = 9.3 × 10^-8, n=2, Student's t test).
There was no difference in tetherin MFI between control samples and samples treated with intron-targeted nucleases.

Supplementary Figure 10, related to Methods. Droplet digital PCR assay design and validation for the quantification of ΔGI HDR frequency. a) ddPCR assay design. The black line indicates the in-out PCR strategy around the modification site of interest, chosen so that the homology template is not amplified. A reference probe with a 'FAM' fluorophore and quencher is targeted to the upstream region of tetherin and is used as a baseline to quantify the proportion of amplicons containing tetherin. A WT probe lacking a fluorophore is targeted to the WT allele at the position of GI. A ΔGI probe with a 'HEX' fluorophore and quencher is targeted to the ΔGI site. During PCR amplification, probes bound to the DNA are digested, separating the quencher from the fluorophore and providing a quantifiable signal of ΔGI frequency in the population. b) Representative ddPCR output. Green populations indicate droplets positive for the reference probe. Orange populations indicate droplets double positive for the reference and ΔGI probes. Black populations indicate droplets negative for the reference probe. c) Validation of the ddPCR assay design. Gene blocks encoding ΔGI and/or WT tetherin were mixed to give the indicated ΔGI input frequencies. The ddPCR output strongly correlated with the known ΔGI input frequency.

Supplementary Figure 11, related to Methods. Validation of a non-traditional, >200bp "long" MiSeq assay. a) Approximate cut sites for ZFN 1189 and Cas9 #4 are shown relative to the intended tetherin edit sites (red).
Arrows and lines indicate the approximate primer and PCR amplicon positions for either the traditional 'short' MiSeq or the 'long' MiSeq. The long MiSeq strategy was developed to quantify linkage between indels induced by intron-targeted nucleases and ΔGI/T45I edits within the exon. b) Total indel frequency quantified by targeted deep sequencing of HSCs treated with AAV only, ZFN 1189 only, or Cas9 #4 RNP only, as determined by the two MiSeq amplification strategies (short vs long). c) ΔGI HDR efficiency quantified in HSCs treated with AAV only, ZFN 1189 + AAV, or Cas9 #4 RNP + AAV, as determined by the two MiSeq amplification strategies and by ddPCR.

Supplementary Table 1. Panel of gRNAs.

# | Gene | Site | gRNA sequence | ssODN template sequence | CD4+ T | HSPC
1 | TRAC | E1 | GCTGGTACACGGCAGGGTCA | AACCCTGATCCTCTTGTCCCACAGATATCCAGAACCCTGActcgagCCCTGCCGTGTACCAGCTGAGAGACTCTAAATCCAGTGAC | X | -
2 | IL2RG | E1 | TTCAGCCCCACTCCCAGCAG | ATTACCATTCACATCCCTCTTATTCCTGCAGCTGCCCCTGctcgagCTGGGAGTGGGGCTGAACACGACAATTCTGACGCCCAATG | X | X
3 | HBB | E1 | CTTGCCCCACAGGGCAGTAA | ACCATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTActcgagCTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGG | X | -
4 | IL2RG | E1 | AGGGATGTGAATGGTAATGA | ACCCAGGGAATGAAGAGCAAGCGCCATGTTGAAGCCATCActcgagTTACCATTCACATCCCTCTTATTCCTGCAGCTGCCCCTGC | X | X
5 | BST2 | E1 | TCTGCTGGGGATAGGAATTC | AAGGGCACCCCCAGAATCACGATGATCAGGAGCACCAGAActcgagTTCCTATCCCCAGCAGAAGCTTACAGCGCTTATCCCCGTC | X | X
6 | IL2RG | E1 | ATTCCTGCAGCTGCCCCTGC | NA | - | -
7 | HBB | E1 | GTAACGGCAGACTTCTCCTC | GCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGAGgagctcGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACG | X | X
8 | TRAC | SA | GTCAGGGTTCTGGATATCTG | AATGAGATCATGTCCTAACCCTGATCCTCTTGTCCCACAGctcgagATATCCAGAACCCTGACCCTGCCGTGTACCAGCTGAGAGA | X | -
9 | HBB | 5'UTR | AGGGCTGGGCATAAAAGTCA | TGTCAGAAGCAAATGTAAGCAATAGATGGCTCTGCCCTGActcgagCTTTTATGCCCAGCCCTGGCTCCTGCCCTCCCTGCTCCTG | X | -
10 | TRAC | E1 | CTCTCAGCTGGTACACGGCA | GATCCTCTTGTCCCACAGATATCCAGAACCCTGACCCTGCctcgagCGTGTACCAGCTGAGAGACTCTAAATCCAGTGACAAGTCT | X | -
11 | TRAC | E1 | TCTCTCAGCTGGTACACGGC | ATCCTCTTGTCCCACAGATATCCAGAACCCTGACCCTGCCctcgagGTGTACCAGCTGAGAGACTCTAAATCCAGTGACAAGTCTG | X | -
12 | HBB | I1 | TGGTATCAAGGTTACAAGAC | NA | - | -
13 | PPP1R12C | I1 | ATTCCCAGGGCCGGTTAATG | TGGAGGGGACAGATAAAAGTACCCAGAACCAGAGCCACATctcgagTAACCGGCCCTGGGAATATAAGGTGGTCCCAGCTCGGGGA | X | -
14 | IL2RG | 5'UTR | GGCGCTTGCTCTTCATTCCC | AGGGACCCAGGTTCCTGACACAGACAGACTACACCCAGGGctcgagAATGAAGAGCAAGCGCCATGTTGAAGCCATCATTACCATT | X | -
15 | HBB | I1 | ACCAATAGAAACTGGGCATG | NA | - | -
16 | HBB | E1 | GTCTGCCGTTACTGCCCTGT | NA | - | -
17 | TRAC | SA | TCAGGGTTCTGGATATCTGT | AAATGAGATCATGTCCTAACCCTGATCCTCTTGTCCCACActcgagGATATCCAGAACCCTGACCCTGCCGTGTACCAGCTGAGAG | X | -
18 | BST2 | I1 | CCTTTGGATGGCCTAGTACT | GATTTCTGGGGAGAGATTCTTGTCCCTCCCACCGCCTAGTctcgagACTAGGCCATCCAAAGGAGTTGAGGAGCTTACCACAGTGT | X | X
19 | PPP1R12C | I1 | GGGGCCACTAGGGACAGGAT | AAGGAGGAGGCCTAAGGATGGGGCTTTTCTGTCACCAATCctcgagCTGTCCCTAGTGGCCCCACTGTGGGGTGGAGGGGACAGAT | X | -
20 | PPP1R12C | I1 | ACAGTGGGGCCACTAGGGAC | GGAGGCCTAAGGATGGGGCTTTTCTGTCACCAATCCTGTCctcgagCCTAGTGGCCCCACTGTGGGGTGGAGGGGACAGATAAAAG | X | -
21 | HBB | 5'UTR | CAGGGCTGGGCATAAAAGTC | NA | - | -
22 | IL2RG | E1 | GGGCAGCTGCAGGAATAAGA | AGCGCCATGTTGAAGCCATCATTACCATTCACATCCCTCTctcgagTATTCCTGCAGCTGCCCCTGCTGGGAGTGGGGCTGAACAC | X | -
23 | TRIM5α | I7 | AAAATTAACAGGTCAGCTAA | TTTCTCTCATATCACAAATAAAGTTACATTATATCCCTTActcgagGCTGACCTGTTAATTTTTCTACAGTTGATGTGACAGTGGC | X | X
24 | TRAC | SA | CTGGATATCTGTGGGACAAG | AAAGAGGGAAATGAGATCATGTCCTAACCCTGATCCTCTTctcgagGTCCCACAGATATCCAGAACCCTGACCCTGCCGTGTACCA | X | -
25 | HBB | E1 | AGTCTGCCGTTACTGCCCTG | CCTCACCACCAACTTCATCCACGTTCACCTTGCCCCACAGctcgagGGCAGTAACGGCAGACTTCTCCTCAGGAGTCAGATGCACC | X | -
26 | TRIM5α | I7 | CTTGTTTAGAAGTTATAAGG | TATGATTCTCCTTCCAAGACACACATAACTTACCCCTCCTctcgagTATAACTTCTAAACAAGGTTCCTCCCAGTTTTCTCTCAAG | X | -
27 | PPP1R12C | I1 | TATAAGGTGGTCCCAGCTCG | GACAGCATGTTTGCTGCCTCCAGGGATCCTGTGTCCCCGActcgagGCTGGGACCACCTTATATTCCCAGGGCCGGTTAATGTGGC | X | -
28 | BST2 | E1 | GGCCTCGCTGTTGGCCTTGA | ATCGTGATTCTGGGGGTGCCCTTGATTATCTTCACCATCActcgagAGGCCAACAGCGAGGCCTGCCGGGACGGCCTTCGGGCAGT | X | X
29 | PPP1R12C | I1 | TGTCCCTAGTGGCCCCACTG | CTGGTTCTGGGTACTTTTATCTGTCCCCTCCACCCCACAGctcgagTGGGGCCACTAGGGACAGGATTGGTGACAGAAAAGCCCCA | X | -
30 | PPP1R12C | I1 | GTCACCAATCCTGTCCCTAG | TACTTTTATCTGTCCCCTCCACCCCACAGTGGGGCCACTActcgagGGGACAGGATTGGTGACAGAAAAGCCCCATCCTTAGGCCT | X | X
31 | TRIM5α | I7 | AGACTTGAGAGAAAACTGGG | AACTTACCCCTCCTTATAACTTCTAAACAAGGTTCCTCCCctcgagAGTTTTCTCTCAAGTCTTTATCAAGATTTCTCTCATATCA | X | -
32 | IL2RG | E1 | TGTTCAGCCCCACTCCCAGC | TACCATTCACATCCCTCTTATTCCTGCAGCTGCCCCTGCTctcgagGGGAGTGGGGCTGAACACGACAATTCTGACGCCCAATGGG | X | -
33 | BST2 | E1 | TGTAAGCTTCTGCTGGGGAT | CCCCAGAATCACGATGATCAGGAGCACCAGAATTCCTATCctcgagCCCAGCAGAAGCTTACAGCGCTTATCCCCGTCTTCCATGG | X | X
34 | IL2RG | E1 | TGGTAATGATGGCTTCAACA | AGACAGACTACACCCAGGGAATGAAGAGCAAGCGCCATGTctcgagTGAAGCCATCATTACCATTCACATCCCTCTTATTCCTGCA | X | X
35 | IL2RG | 5'UTR | GCGCTTGCTCTTCATTCCCT | GAGGGACCCAGGTTCCTGACACAGACAGACTACACCCAGGctcgagGAATGAAGAGCAAGCGCCATGTTGAAGCCATCATTACCAT | X | -
36 | IL2RG | E1 | CGACAATTCTGACGCCCAAT | TCCCAGATTTCCCACCAGCTGTGGTGTCTTCATTCCCATTctcgagGGGCGTCAGAATTGTCGTGTTCAGCCCCACTCCCAGCAGG | X | -
37 | PPP1R12C | I1 | ACCCCACAGTGGGGCCACTA | CCTAAGGATGGGGCTTTTCTGTCACCAATCCTGTCCCTAGctcgagTGGCCCCACTGTGGGGTGGAGGGGACAGATAAAAGTACCC | X | -
38 | PPP1R12C | I1 | TGGGGGTTAGACCCAATATC | AAGCCCCATCCTTAGGCCTCCTCCTTCCTAGTCTCCTGATctcgagATTGGGTCTAACCCCCACCTCCTGTTAGGCAGATTCCTTA | X | -
39 | TRIM5α | E8 | CAGATAATATATGGGGCACG | ATAATTGAAATTCACAAATGTCTGGTATCTTGTCCCTCGTctcgagGCCCCATATATTATCTGTGGTTTCGGAGAGCTCACTTGTC | X | -
40 | TRIM5α | E8 | AGATAATATATGGGGCACGA | AATAATTGAAATTCACAAATGTCTGGTATCTTGTCCCTCGctcgagTGCCCCATATATTATCTGTGGTTTCGGAGAGCTCACTTGT | X | X
41 | HBB | I1 | AAGGTTACAAGACAGGTTTA | NA | - | -
42 | IL2RG | 5'UTR | ACACAGACAGACTACACCCA | AATGATGGCTTCAACATGGCGCTTGCTCTTCATTCCCTGGctcgagGTGTAGTCTGTCTGTGTCAGGAACCTGGGTCCCTCACCCA | X | -
43 | IL2RG | 5'UTR | GGGTGTAGTCTGTCTGTGTC | GGTGGGGAGGGGTAGTGGGTGAGGGACCCAGGTTCCTGACctcgagACAGACAGACTACACCCAGGGAATGAAGAGCAAGCGCCAT | X | -
44 | TRIM5α | E8 | GAAACCACAGATAATATATG | AAATTCACAAATGTCTGGTATCTTGTCCCTCGTGCCCCATctcgagATATTATCTGTGGTTTCGGAGAGCTCACTTGTCTCTTATC | X | -
45 | HBB | I1 | TAAGGAGACCAATAGAAACT | NA | - | -
46 | PPP1R12C | I1 | TAACCGGCCCTGGGAATATA | CCTCCAGGGATCCTGTGTCCCCGAGCTGGGACCACCTTATctcgagATTCCCAGGGCCGGTTAATGTGGCTCTGGTTCTGGGTACT | X | -
47 | TRAC | E1 | AGAGTCTCTCAGCTGGTACA | TCTTGTCCCACAGATATCCAGAACCCTGACCCTGCCGTGTctcgagACCAGCTGAGAGACTCTAAATCCAGTGACAAGTCTGTCTG | X | -
48 | PPP1R12C | I1 | GTCCCCTCCACCCCACAGTG | GGGGCTTTTCTGTCACCAATCCTGTCCCTAGTGGCCCCACctcgagTGTGGGGTGGAGGGGACAGATAAAAGTACCCAGAACCAGA | X | X
49 | IL2RG | 5'UTR | GACACAGACAGACTACACCC | ATGATGGCTTCAACATGGCGCTTGCTCTTCATTCCCTGGGctcgagTGTAGTCTGTCTGTGTCAGGAACCTGGGTCCCTCACCCAC | X | X
50 | IL2RG | E1 | AGGAATAAGAGGGATGTGAA | ATGAAGAGCAAGCGCCATGTTGAAGCCATCATTACCATTCctcgagACATCCCTCTTATTCCTGCAGCTGCCCCTGCTGGGAGTGG | X | -
51 | HBB | E1 | TCTGCCGTTACTGCCCTGTG | GGCCTCACCACCAACTTCATCCACGTTCACCTTGCCCCACctcgagAGGGCAGTAACGGCAGACTTCTCCTCAGGAGTCAGATGCA | X | X
NA | HBB | E1 | CGTTACTGCCCTGTGGGGCA | NA | - | -
NA | HBB | E1 | TGAAGTTGGTGGTGAGGCCC | NA | - | -
NA | HBB | I1 | TCCACATGCCCAGTTTCTAT | NA | - | -
NA | HBB | I1 | TTAAGGAGACCAATAGAAAC | NA | - | -
NA | IL2RG | E1 | AGCTGCCCCTGCTGGGAGTG | NA | - | -
NA | TRAC | E1 | ACACGGCAGGGTCAGGGTTC | NA | - | -
NA | TRAC | E1 | AGCTGGTACACGGCAGGGTC | NA | - | -

gRNAs used in this study, ranked 1-51 based on relative MMEJ frequency in K562 cells (Fig. 1d). gRNAs that failed to induce at least 20% total indels are unranked (NA). Sites in genes are identified as exon (E), intron (I), splice acceptor (SA), or 5’ untranslated region (5’UTR). ssODN homology templates were designed with 40bp left and right homology arms (80bp total) to insert the 6bp XhoI sequence (ctcgag, indicated in lower case) between nucleotides -3/-4 relative to the PAM motif. The subsets of gRNAs also tested in CD4+ T cells or HSPCs are indicated (X). The CD4+ T cell subset of 44 gRNAs was also used to query the prediction programs (Fig. 5).
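The template layout described in the legend above can be expressed as a short sequence operation. The Python sketch below is illustrative only and is not the design pipeline used in this work. It assumes SpCas9 (NGG PAM, cut 3 nt upstream of the PAM), a genomic sequence supplied 5'->3' on the PAM-containing strand, and hypothetical helper names (`revcomp`, `design_ssodn`). The oligo is returned written as the non-PAM strand, which reproduces the orientation of, for example, the gRNA #1 and #2 entries in the table.

```python
# Hypothetical helpers sketching the Supplementary Table 1 ssODN layout.
# Assumes SpCas9: NGG PAM, blunt cut between nucleotides -3/-4 from the PAM.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement a DNA sequence, preserving upper/lower case."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_ssodn(pam_strand: str, protospacer: str,
                 insert: str = "ctcgag", arm: int = 40) -> str:
    """Build an ssODN that places `insert` at the Cas9 cut site.

    `pam_strand` is genomic sequence (5'->3') on the strand carrying the
    protospacer and its PAM. The edited sequence keeps `arm` nt of homology
    on each side of the cut and is returned as its reverse complement
    (the non-PAM strand), with the insert in lower case.
    """
    seq = pam_strand.upper()
    start = seq.find(protospacer.upper())
    if start < 0:
        raise ValueError("protospacer not found on the given strand")
    end = start + len(protospacer)
    pam = seq[end:end + 3]
    if len(pam) < 3 or not pam.endswith("GG"):
        raise ValueError(f"expected an NGG PAM, found {pam!r}")
    cut = end - 3  # between nucleotides -3 and -4 relative to the PAM
    if cut - arm < 0 or cut + arm > len(seq):
        raise ValueError("not enough flanking sequence for the homology arms")
    edited = seq[cut - arm:cut] + insert.lower() + seq[cut:cut + arm]
    return revcomp(edited)
```

As a usage check, the PAM strand for gRNA #1 can be rebuilt from its tabulated oligo (drop the lower-case insert, then reverse complement); calling `design_ssodn` on that sequence with `GCTGGTACACGGCAGGGTCA` returns the same 86-nt template. Because CTCGAG is palindromic, the insert reads the same on either strand.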
Supplementary Table 2. Cell type-specific gene editing protocols.

Cell type | # cells | Electroporator | Code | Volume (μL) | Voltage, time | Cuvette | Cas9 | gRNA | Cas9 (pmol) | gRNA (pmol) | ssODN (pmol) | Non-target ssODN used
K562 | 2×10^5 | Amaxa | FF120 | 20 | - | strip | mRNA | sg | 1.9 | 31 | 100 | no
K562 | 2×10^5 | Amaxa | FF120 | 20 | - | strip | Macro | cr:tr | 60 | 120 | 100 | no
CD4+ T | 2×10^5 | Amaxa | EO115 | 20 | - | strip | Macro | cr:tr | 50 | 100 | 100 | yes
HSPC | 2×10^5 | BTX | - | 50 | 125V, 5ms | 1mm | Thermo | sg | 47 | 77.2 | 375 | yes
HSPC | 2×10^5 | Amaxa | DZ100 | 20 | - | strip | Thermo | sg | 18.8 | 30.9 | variable | yes

Supplementary Table 3. Primers used for amplification and sequencing.

Gene | Forward PCR primer | Reverse PCR primer | Sequencing primer
PPP1R12C | TCAGTGAAACGCACCAGACG | GACAACCCCAAAGTACCCCG | ATCCTCTCTGGCTCCATCGTA
TRAC | AGGTTTCCTTGAGTGGCAGG | GACTGCCAGAACAAGGCTCA | CTGAAATCATGGCCTCTTGGC
HBB | AACCGAGGTAGAGTTTTCATCCA | GAAAGAAAACATCAAGCGTCCCA | ACGCAGGAAGAGATCCATCTACA
IL2RG | AGGATCTAGGTGGGCTGAGG | GCCCTTCCCACTCCACTTTT | TACCACCTTACAGCAGCACC
TRIM5α | AATTTCAGTGCTGACTCCTTTGTTTGTATT | ACCCAGTAGCCGTATTTAGGTTGATAATTT | TTATCATAAGCCACCCTGCGG
BST2 | TATTGGTCAGGACGTTTCCTATGCTAATAA | AGAGTTTTAACACCCTGCCCCTACAC | GGTGGCCCGTAGAAGATTCC

Supplementary Table 4. Primers used for genetic analysis by ICE, ddPCR, and NGS.

ddPCR oligo sequences (5’-3’)
Forward primer | GAGACCGGACCAACAGTG
Reverse primer | TTGCAGGAGATGGGTGACA
WT probe | CTTCTGCTGGGGATAGGAATTCTGG-Phosphate
Ref probe | HEX-CGGCCTAGGCACTCAGTAACAC-IowaBlack
ΔGI probe | FAM-CTTCTGCTGGGGATACTGGTG-IowaBlack

NGS oligo sequences (5’-3’)
“short” For. | GCGCTGTAAGCTTCTGCTGG
“short” Rev. | GTCAGCTCTTGTTGCAGGAGATG
“long” For. | CAGAGTGCCCATGGAAGACG
“long” Rev. | CCCACCCTGGGTCAGATTTC

ICE/Cel1 oligo sequences (5’-3’)
Forward primer | TATTGGTCAGGACGTTTCCTATGCTAATAA
Reverse primer | AGAGTTTTAACACCCTGCCCCTACAC
Sequencing primer (ICE only) | GGTGGCCCGTAGAAGATTCC

Part I. Rational selection of CRISPR/Cas9 guide RNAs for homology directed genome editing

Kristina J. Tatiossian, Robert D.E. Clark, Chun Huang, and Paula M. Cannon

Abstract

Homology-directed repair (HDR) of a DNA break allows copying of genetic material from an exogenous DNA template and is frequently exploited in CRISPR/Cas9 genome editing. However, HDR is in competition with other DNA repair pathways, including NHEJ and MMEJ, and the efficiency of HDR outcomes is not predictable. Consequently, to optimize HDR editing, panels of CRISPR/Cas9 guide RNAs (gRNAs) and matched homology templates must be evaluated. We report here that CRISPR/Cas9 indel signatures can instead be used to identify gRNAs that maximize HDR outcomes. Specifically, we show that the frequency of indels derived by MMEJ repair, characterized as deletions greater than or equal to 3bp, predicts HDR frequency better than total indel frequency does. We further demonstrate that tools that predict gRNA indel signatures can be repurposed to identify gRNAs that promote HDR. Finally, by comparing indels generated by S. aureus and S. pyogenes Cas9 targeted to the same site, we add to the growing body of data showing that the targeted DNA sequence is a major factor governing genome editing outcomes.

Introduction

Targeted nucleases such as CRISPR/Cas, ZFNs and TALENs create sequence-specific DNA double-strand breaks (DSBs), whose repair by homology-directed repair (HDR) pathways can be exploited to introduce genetic modifications 1 . In eukaryotic cells, the sister chromatid normally provides the template for HDR repair of a DSB. However, this process can be usurped by the delivery of exogenous DNA templates that encode a desired edit within sequences that are homologous and adjacent to the DSB site. In this way, HDR can be exploited to make small genetic edits close to the DSB 2,3 , or to promote the site-specific insertion of larger gene cassettes 4,5 .

Achieving efficient HDR-mediated genome editing in mammalian cells is often challenging. HDR is restricted mainly to the S/G2 phases of the cell cycle 6,7 , and is in competition with two other major DNA repair pathways: canonical non-homologous end joining (NHEJ) and microhomology-mediated end joining (MMEJ) 8–10 . Unlike HDR, NHEJ and MMEJ do not use a homology template and can result in heterogeneous outcomes, with insertions and deletions (indels) of nucleotides at the site of the DSB. Consequently, these template-independent repair pathways can be effectively exploited for gene disruption 11–13 . However, the precise changes possible with HDR-mediated editing enable many more applications. As a result, several methods have been described to enhance rates of HDR editing while reducing the frequency of NHEJ/MMEJ outcomes, including strategies to overcome the cell cycle restrictions of HDR, or to perturb key components of the NHEJ, MMEJ or HDR pathways 14–21 .

Selection of the most suitable genome editing tools for a specific HDR application follows a series of well-established considerations. Due to limitations imposed by the processivity of HDR in mammalian cells, the nuclease target site is typically chosen to be within 10-50bp of the desired editing site 22–26 . Depending on the nature of the nuclease being used, there may also be specific sequence considerations, for example the requirement for an adjacent PAM sequence when using CRISPR/Cas reagents 27 . More generally, the likelihood of off-target DSB creation due to homology with other sequences in the genome is also considered, and less specific nucleases are rejected. Finally, for the homology template component, the most appropriate DNA format, such as oligonucleotides, plasmids or viral vectors, is selected based on the target cell and the desired edit. Even when these criteria are applied, there will often be more than one possible nuclease target site available for a given HDR application.
For CRISPR/Cas9, the ease of synthesizing different guide RNAs (gRNAs) means that multiple target sites can be readily evaluated. Most typically, the gRNAs are ranked based on their 'cutting efficiency', which is a measure of the total indels produced by the nuclease in the absence of a homology template 22,28 . This common selection criterion rests on the straightforward idea that generating more DSBs will create more opportunities for HDR. However, observed indel frequencies do not accurately reflect the true extent of DSB generation, since DSBs can also be perfectly repaired by NHEJ 29,30 , and recutting can continue until a terminal 'scar' is created that disrupts the target sequence. Additionally, even when a nuclease is selected that is highly effective at producing indels, there is no guarantee that it will also support efficient HDR 26,31 . Therefore, empirical evaluation of combinations of nucleases and matched homology templates is currently necessary to identify the most effective reagents to support HDR editing.

Recently, several studies have sought to better understand the indel outcomes resulting from the repair of CRISPR/Cas9 DSBs. They have reported that individual gRNAs produce characteristic and nonrandom indel outcomes that are conserved across different cell lines and reagent delivery methods, and even when the target sequence is present in different genomic locations 32–37 . Inhibition or knockdown of essential NHEJ/MMEJ components has revealed that these nonrandom indel patterns at a given target site can be the result of repair by both NHEJ and MMEJ 30,36–39 . These repair pathways are mutually exclusive and mechanistically distinct, and produce indels of various sizes; in general, small indels result from NHEJ repair, while larger deletions are caused by MMEJ 8,9,19 .
This predictability of CRISPR/Cas9 indel patterns has made possible strategies to correct a limited set of disease-associated mutations based on the corrective outcome of a dominant indel 36,40–43 . In part to assist in such approaches, machine learning models have been developed as tools to predict Cas9 indel outcomes based solely on the target sequence 32,34–36 .

Of the three major DSB repair pathways, MMEJ and HDR share a number of features that are distinct from NHEJ. Both pathways are active at similar stages of the cell cycle 6,44 , are kinetically slower processes than NHEJ 9,30,45–47 , and require Mre11-dependent DSB end-resection 44 . As a result, we hypothesized that the frequency of MMEJ repair at a given CRISPR/Cas9 target site may predict HDR efficiency better than the total indel frequency. Here we report an analysis of 51 CRISPR/Cas9 target sites, which were characterized both for their indel patterns and for their ability to promote HDR editing in the presence of an oligonucleotide homology template. The results reveal that indel patterns, and specifically the MMEJ indel subset, can be used to select gRNAs for optimal HDR editing. We also show that indel prediction programs can be adapted to identify such MMEJ-dominant, HDR-efficient target sites. In this way, we have identified guidelines that can be used to optimize HDR editing.

Results

DSBs at Cas9 target sites exhibit preferences for NHEJ or MMEJ repair.

Treatment of cells with a specific Cas9-gRNA complex results in a characteristic and nonrandom set of indels 30,32–38 . Inhibiting or knocking out components of the NHEJ and/or MMEJ repair pathways has revealed that both can contribute to the repair of DSBs and that, in general, small indels result from NHEJ repair, while larger deletions result from MMEJ repair (Fig. 1a) 36–39 . Using a similar approach, we characterized the indels formed by a panel of 51 S. pyogenes (Sp) Cas9 gRNAs targeting 6 therapeutically relevant genes, which included a subset from previously published studies (Supplementary Table 1). In brief, K562 cells were transfected with preassembled complexes of SpCas9 protein and gRNA, and the resulting indels were analyzed 4-5 days later by Sanger sequencing and ICE analysis. ICE analysis gives results similar to those obtained by deep sequencing, albeit with the cut-off that a sequence must be present in at least 1% of the total population 48 . The pattern of indels observed for each gRNA was characterized based on indel length and position relative to the canonical Cas9 cut site. We also included the frequency of each indel for any given gRNA, to produce a comprehensive indel signature (Fig. 1b).

The indel signatures for all 51 gRNAs were next analyzed in the absence and presence of NU7441, a specific inhibitor of the essential NHEJ component DNA-PK (Fig. 1b, Supplementary Fig. 1). Indels whose frequency significantly decreased in the presence of NU7441 were defined as NHEJ indels, while indels that did not change, or that increased in frequency, were defined as MMEJ indels. This analysis revealed that all insertions plus the smallest deletions of -1bp and -2bp were NHEJ products, while deletions greater than or equal to 3bp were MMEJ products, which is consistent with prior mechanistic studies (Fig. 1c) 9,10 . Using these criteria, we were able to assign every indel observed for all 51 gRNAs in K562 cells as resulting from either NHEJ or MMEJ repair. Next, by adding together the frequencies of all assigned NHEJ or MMEJ indels for a given gRNA, we were able to estimate the contribution of each pathway to DSB repair for that specific gRNA. This analysis revealed that individual gRNAs can produce DSBs with markedly different intrinsic preferences for NHEJ or MMEJ repair (Fig. 1d).

Indel signatures are conserved between hematopoietic cell lines and primary cells.
Previous studies have demonstrated that the indel patterns produced by specific gRNAs are conserved across different cell lines 32,36–38 . In more limited analyses, such relationships were also suggested to be maintained between cell lines and primary cells 35,38 . We therefore next investigated whether the indel patterns observed in the K562 cell line were also conserved in primary human hematopoietic cells of therapeutic relevance, by analyzing subsets of 44 and 14 gRNAs in primary human CD4+ T cells and HSPCs, respectively (Supplementary Table 1). An example of such a comparison in all three cell types is shown for gRNA #5 (Fig. 2a). Pairwise comparisons of the indel patterns for all gRNAs analyzed revealed that matched gRNAs produced significantly more similar indel patterns than different gRNAs, regardless of the cell type (Supplementary Fig. 2). We also took into account the frequencies of distinct indels and compared the indel signatures of matched gRNAs between different cell types. To normalize for differences between cell types that may result from different overall efficiencies of CRISPR/Cas9 editing, we performed these analyses using relative indel frequencies. We confirmed that changing the total indel frequency, for example by optimizing Cas9-gRNA delivery in a cell type, does not alter the relative frequency of the different indels (Supplementary Fig. 3). Indel signatures were strongly correlated across the cell types (Fig. 2b).

Finally, the indels for each gRNA tested in CD4+ T cells and HSPCs were analyzed as described in Figure 1d, to visualize the total contribution of NHEJ or MMEJ repair at a given DSB. This confirmed that the preferences we observed for repair pathway choice were maintained across the different cell types (Fig. 2c, 2d).
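The size-based pathway assignment defined for Fig. 1c, and summed per gRNA for Fig. 1d and Fig. 2c, 2d, can be sketched in a few lines of Python. This is an illustrative sketch, not the analysis code used here. It assumes each indel signature is available as (size in bp, frequency as % of reads) pairs, with positive sizes for insertions and negative sizes for deletions; the function names are hypothetical.

```python
from typing import Iterable, Tuple


def assign_pathway(size_bp: int) -> str:
    """Insertions and 1-2bp deletions -> NHEJ; deletions of >=3bp -> MMEJ."""
    if size_bp == 0:
        raise ValueError("size 0 is an unedited read, not an indel")
    return "MMEJ" if size_bp <= -3 else "NHEJ"


def pathway_contributions(indels: Iterable[Tuple[int, float]]) -> dict:
    """Sum indel frequencies per pathway and report each pathway's share.

    `indels` holds (size_bp, frequency) pairs for one gRNA in one cell type.
    """
    absolute = {"NHEJ": 0.0, "MMEJ": 0.0}
    for size_bp, freq in indels:
        absolute[assign_pathway(size_bp)] += freq
    total = absolute["NHEJ"] + absolute["MMEJ"]
    # Relative shares normalize away differences in overall editing efficiency.
    relative = {k: (v / total if total else 0.0) for k, v in absolute.items()}
    return {"total": total, "absolute": absolute, "relative": relative}
```

A list of pairs is used rather than a dictionary keyed on size because distinct indels can share a size (for example, -1bp deletions at different positions). Using relative shares mirrors the normalization described above for comparing cell types with different total editing efficiencies.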
Together, these analyses further support the conclusions that the indel signature produced by a gRNA is the result of intrinsic properties of the targeted DNA sequence, and that preferences for MMEJ or NHEJ are conserved between hematopoietic cell lines and primary cells.

Indel signatures predict HDR.

HDR is distinct from MMEJ and NHEJ because it uses a homology template to repair a DSB (Fig. 3a). However, HDR also shares features with MMEJ, such as the requirement for resection of nucleotides at the DSB, and it is similarly cell-cycle dependent and kinetically slower than NHEJ 6,9,30,44–47 . We therefore asked whether the indel signature produced in the absence of a homology template, and specifically the indels that resulted from MMEJ repair, could be used to predict the likelihood that a DSB would support HDR editing in the presence of a homology template.

To test this, we synthesized a matched single-stranded oligonucleotide (ssODN) homology template for each gRNA, containing 80bp of sequence with homology to the non-PAM DNA strand and designed to insert a 6bp (XhoI) cassette between nucleotides -3/-4 relative to the PAM motif (Supplementary Fig. 4, Supplementary Table 1). This design allows us to quantify HDR editing rates upon insertion of the cassette and also ensures that edited sequences will not be re-cut. HDR editing frequencies in treated samples were measured by Sanger sequencing and ICE analysis, and validated by comparison to RFLP analyses (Supplementary Fig. 4). Rates of HDR editing for each gRNA plus template combination were then compared to both total indel and MMEJ-only indel frequencies obtained in the absence of homology templates (Fig. 3b, Supplementary Fig. 4).
This revealed that HDR editing rates did not positively correlate with total indels in any of the three cell types tested (K562 r = -0.188, CD4+ T r = -0.011, HSPC r = -0.390), but strongly correlated with MMEJ indels (K562 r = 0.774, CD4+ T r = 0.709, HSPC r = 0.815).

Genome editing in the presence of a template typically produces a mixture of HDR edits and indels. We reasoned that HDR editing should deplete the pool of MMEJ indels more than the pool of NHEJ indels. Calculating the fold-change of each indel type in the presence versus absence of an ssODN template confirmed this expectation (Fig. 3c, Supplementary Fig. 4). Additionally, titration experiments with ssODN templates at three different target sites in CD4+ T cells and HSPCs revealed that increasing the dose of the ssODN template led to both an increase in HDR rates and a corresponding decrease in MMEJ indels (Fig. 3d). From this, we conclude that the MMEJ indel frequency can be used to predict the maximum HDR editing potential for a given gRNA. Moreover, if this expected rate of HDR is not observed, and MMEJ indels persist as an outcome, our model implies an inefficiency in the editing system that could be further optimized, for example by increasing the concentration of the homology template.

Finally, a subset of 27 gRNAs was tested with NU7441 treatment, in the presence or absence of an ssODN template. As previously noted (Fig. 1c), in the absence of the ssODN, a clear increase in the frequency of MMEJ indels and a decrease in the frequency of NHEJ indels was observed (Fig. 3e). In the presence of a matched template, this increase in MMEJ indels also translated into improved rates of HDR (Fig. 3e). Practically, this suggests that when a gRNA is NHEJ-dominant, better rates of HDR editing can be achieved through the use of such DNA repair pathway modulators 19 , and that modulators can be tested to maximize MMEJ before a homology template is developed.
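Both comparisons in this subsection reduce to simple arithmetic: a Pearson correlation across gRNAs between the HDR rate (with template) and either the total or the MMEJ-only indel frequency (without template), and the fold-change of an indel class with versus without template. The Python sketch below assumes the per-gRNA values are already paired in equal-length lists. The function names are hypothetical, and the sketch makes no claim about the statistical software actually used for Fig. 3.

```python
import math
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def compare_predictors(hdr: Sequence[float], total_indels: Sequence[float],
                       mmej_indels: Sequence[float]) -> Dict[str, float]:
    """Correlate HDR rates with each candidate predictor, gRNA by gRNA."""
    return {
        "total_indels": pearson_r(total_indels, hdr),
        "mmej_indels": pearson_r(mmej_indels, hdr),
    }


def fold_change(freq_with_template: float, freq_without_template: float) -> float:
    """Ratio of an indel class frequency with vs. without ssODN.

    Values below 1 mean the class was depleted when a template was present.
    """
    if freq_without_template <= 0:
        raise ValueError("baseline frequency must be positive")
    return freq_with_template / freq_without_template
```

Under the model described above, a useful gRNA-selection signal shows up as a clearly positive `mmej_indels` coefficient, while MMEJ fold-changes below 1 (larger depletion than for NHEJ) indicate that template-directed repair is drawing on the MMEJ-competent pool of breaks.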
In summary, we propose that the frequency of MMEJ indels produced by a gRNA is a simple and effective predictor of HDR editing outcomes. It is also a better basis for selecting gRNAs for HDR editing applications than total indel frequency.

Comparison of S. aureus and S. pyogenes Cas9 targeted to the same site.

Previous work has concluded that indel patterns are an intrinsic feature of the gRNA and targeted DNA sequence [32–38]. However, these studies were mostly performed using SpCas9. We therefore next asked whether these observations also held true when DSBs were created at the same site by a different Cas9 protein. S. aureus (Sa) Cas9 recognizes a different PAM sequence than SpCas9 but has the same purported cut site relative to the PAM [49]. In our panel of 51 target sites, we found 5 with overlapping Sp and Sa Cas9 PAM sequences (Fig. 4a). Appropriate SaCas9 gRNAs were synthesized for these 5 sites and co-delivered with SaCas9 mRNA to K562 cells. The resulting indel patterns were analyzed as previously described and compared to those obtained with the equivalent SpCas9 complexes, as shown (Fig. 2b). We calculated the similarity of the indel patterns produced by every combination of Sp and Sa gRNAs. The result is shown as a heatmap of the Jaccard coefficients from all pairwise comparisons (Fig. 4c). Indel patterns from Sp and Sa Cas9 gRNAs targeting the same site were significantly more similar than indel patterns between different gRNAs. This provides further evidence, albeit from a limited set, that indel signatures are an intrinsic feature of the targeted DNA sequence.

We also asked whether the resulting SaCas9 indels could be classified as MMEJ or NHEJ indels, and whether these classifications were informative for predicting HDR gene editing outcomes. Using NU7441 treatment and the same approach as with SpCas9 (Fig. 1), we found that different SaCas9 gRNAs produced distinct preferences for MMEJ or NHEJ indels (Supplementary Fig. 5). Finally, we paired each SaCas9 gRNA with an ssODN homology template and analyzed editing outcomes. MMEJ indels were depleted in the presence of the ssODN, and the frequency of MMEJ indels roughly predicted HDR efficiency (Supplementary Fig. 5). Thus, the rules we established for SpCas9 gRNAs also appear to hold true for SaCas9 reagents.

Predicting indel signatures and HDR in silico.

Several algorithms have been developed to predict Cas9 indel signatures based only on the gRNA target sequence [32,34,36]. These studies profiled thousands of Cas9 target sites. In agreement with other work [33,37,38], they determined that the major determinants of the indel signature were the gRNA target sequence and regions of microhomology near the Cas9 cut site. We reasoned that these tools might be adapted to also predict a gRNA's relative preference for NHEJ or MMEJ repair. They could then be used to identify MMEJ-dominant gRNAs for efficient HDR editing.

Three tools developed from these studies were investigated: inDelphi [36], FORECasT [32], and Lindel [34]. Each tool was used to predict the indel signatures of the 44 gRNAs we had characterized in both K562 cells and CD4+ T cells. Each program provided hundreds of predicted indels. However, to match the sensitivity of our ICE analyses (~1%), we considered only the top 10 predicted indels for each gRNA. These top 10 frequencies were combined and normalized to produce a predicted indel signature for each gRNA from each tool.

To compare experimental observations against the three predictors, we first binned all indels by size (>+1bp insertions, +1, -1, -2, and ≥3bp deletions). We then plotted predicted versus measured relative indel frequencies for all gRNAs (Fig. 5a).
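The top-10 normalization and size binning described above can be sketched as follows. This is an illustration under stated assumptions. The input format (a list of `(size_bp, frequency)` pairs, positive for insertions and negative for deletions) is hypothetical and is not the actual output format of inDelphi, FORECasT, or Lindel. The function names and the example predictions are made up.

```python
BINS = (">+1", "+1", "-1", "-2", "del>=3")


def size_bin(size: int) -> str:
    """Map an indel size in bp (positive = insertion, negative = deletion)
    to the bins used in the text: >+1bp, +1, -1, -2 and >=3bp deletions."""
    if size == 0:
        raise ValueError("0 is not an indel size")
    if size > 1:
        return ">+1"
    if size == 1:
        return "+1"
    if size >= -2:
        return str(size)  # "-1" or "-2"
    return "del>=3"


def predicted_signature(predictions: list[tuple[int, float]],
                        top_n: int = 10) -> dict[str, float]:
    """Keep the top_n most frequent predicted indels, renormalize them so
    they sum to 1, and aggregate the normalized frequencies by size bin."""
    top = sorted(predictions, key=lambda p: p[1], reverse=True)[:top_n]
    total = sum(freq for _, freq in top)
    binned = dict.fromkeys(BINS, 0.0)
    for size, freq in top:
        binned[size_bin(size)] += freq / total
    return binned


# Hypothetical predictions for one gRNA; several indels can share a size.
preds = [(1, 30.0), (-1, 20.0), (-3, 10.0), (-5, 10.0), (-2, 8.0), (2, 7.0),
         (-7, 5.0), (-1, 4.0), (-4, 3.0), (1, 3.0), (-9, 0.5), (3, 0.2)]
sig = predicted_signature(preds)
print({k: round(v, 2) for k, v in sig.items()})
```

The two rarest predictions fall outside the top 10 and are dropped. This mirrors the ~1% detection limit of the measured data.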
Pearson correlations between predicted and measured values were determined for each tool, separately for each cell type (Fig. 5b). Lindel predicted gRNA indel signatures marginally better than FORECasT and significantly better than inDelphi.

Next, because MMEJ indel frequencies best predicted the efficiency of HDR editing, we used each tool to predict relative MMEJ indel frequencies for each gRNA. As before, MMEJ repair outcomes were defined as deletions ≥3bp. We compared predicted versus measured relative MMEJ frequencies for each gRNA in all three cell types examined (Fig. 5c). We defined a gRNA as MMEJ-dominant if >50% of its indels were ≥3bp deletions. By this definition, Lindel correctly predicted the repair preference of 34 of the 44 gRNAs (77%), FORECasT 29/44 (66%), and inDelphi only 23/44 (52%) (Fig. 5c). The predictions of inDelphi and, to a lesser extent, FORECasT were less accurate because they overestimated the frequency of MMEJ-derived deletions.

Because of its success in predicting MMEJ-dominant gRNAs, we next asked whether Lindel could be used to identify the best gRNA for HDR editing. Our panel of gRNAs is targeted to 6 different loci. For each locus, we asked whether Lindel accurately predicted the gRNA with the highest measured HDR editing activity. Figure 5d shows the observed NHEJ, MMEJ, and HDR editing frequencies for each gRNA in K562 cells, grouped by locus and ordered by HDR editing efficiency. Overlaid numbers indicate the rank of MMEJ indel dominance, either as predicted by Lindel or as measured in K562 cells. A rank of 1 indicates the gRNA with the highest relative MMEJ frequency within that locus group.
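The classification and ranking rules above can be expressed as a short Python sketch. The >50% cutoff and the ≥3bp-deletion definition of MMEJ come from the text. The function names, the example indel profile, and the per-locus MMEJ values are hypothetical. The final tally only re-derives the reported percentages from the counts given in the text.

```python
def mmej_fraction(indels: dict[int, float]) -> float:
    """Fraction of indels that are deletions of >=3bp (the MMEJ definition
    used in the text). Keys are indel sizes in bp (negative = deletion)."""
    total = sum(indels.values())
    return sum(f for size, f in indels.items() if size <= -3) / total


def is_mmej_dominant(indels: dict[int, float], cutoff: float = 0.5) -> bool:
    """MMEJ-dominant if more than `cutoff` of the indels are >=3bp deletions."""
    return mmej_fraction(indels) > cutoff


def percent_agreement(predicted: list[bool], measured: list[bool]) -> float:
    """Percentage of gRNAs whose predicted class matches the measured class."""
    if len(predicted) != len(measured):
        raise ValueError("lists must be the same length")
    return 100 * sum(p == m for p, m in zip(predicted, measured)) / len(measured)


def mmej_ranks(mmej_by_grna: dict[str, float]) -> dict[str, int]:
    """Rank gRNAs within one locus; rank 1 = highest relative MMEJ frequency."""
    ordered = sorted(mmej_by_grna, key=mmej_by_grna.get, reverse=True)
    return {grna: rank for rank, grna in enumerate(ordered, start=1)}


# Hypothetical measured indel profile (% of reads) for one gRNA.
example = {-1: 20.0, 1: 10.0, -5: 25.0, -9: 15.0}
print(round(mmej_fraction(example), 3), is_mmej_dominant(example))

# The agreement figures reported above, recomputed from their counts.
print([round(100 * hits / 44) for hits in (34, 29, 23)])

# Hypothetical locus: is the best-HDR gRNA ranked first or second by MMEJ?
ranks = mmej_ranks({"g1": 0.62, "g2": 0.35, "g3": 0.48})
best_hdr_grna = "g3"
print(ranks, ranks[best_hdr_grna] <= 2)
```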
Using the measured MMEJ indel rankings, the best gRNAs for HDR editing were correctly predicted for 5 of the 6 loci. The failure at the TRAC locus likely resulted from similar rates of MMEJ for several of its gRNAs, so that ranks were based on only small differences. Using the Lindel-predicted indel rankings, the best or second-best gRNAs for HDR editing were also correctly predicted at 5 of the 6 loci, again failing at the TRAC locus. In comparison, FORECasT and inDelphi identified the best or second-best gRNAs at only 3 and 2 of the 6 loci, respectively (data not shown).

Of note, it has also been reported that target sites with a guanine at the -4 position relative to the PAM are more likely to produce deletions [33,36,37]. We therefore asked whether this simple metric could be used to predict MMEJ-dominant target sites, and thereby identify gRNAs supporting higher rates of HDR editing. At least one gRNA had a G at the -4 position at 5 loci (Figure 5d, indicated above the columns). However, at these loci the metric predicted the best or second-best HDR gRNAs for only 2 of the 5.

In sum, Lindel outperformed both FORECasT and inDelphi in predicting gRNA indel signatures and relative MMEJ repair frequencies. This superior performance also made Lindel capable of predicting the relative potential of gRNAs to support HDR editing. Therefore, Lindel can assist in identifying gRNAs with predicted MMEJ-dominant repair profiles that better support HDR editing.

Discussion

Homology-directed repair of a DSB is the desired repair pathway in many applications of genome editing because it allows precise modifications to be introduced into DNA. However, HDR competes with the non-templated repair pathways NHEJ and MMEJ, which reduces the efficiency of HDR editing.
Significant effort has been made to enhance HDR in mammalian cells, for example by synchronizing cell cultures [14,15], perturbing DNA repair pathways [19], or modifying homology template features [21,25,26,50–55]. However, few rules exist to govern the selection of the DSB target site. CRISPR gRNAs that produce high total indel frequencies (high “cutting efficiency”) are generally assumed to be the best choice for also maximizing HDR [22,28]. In this study, we demonstrate instead that characteristics of the gRNA and the DSB it produces affect HDR efficiency. Specifically, the pattern of indels produced at a gRNA target site can be used to predict the potential for HDR editing. The frequency of indels resulting from MMEJ repair is the most informative feature.

Our study analyzed a panel of SpCas9 gRNAs and characterized the pattern of indels they produced in a cell line and in primary CD4+ T cells and HSPCs. Chemical inhibitor studies allowed us to assign individual indels as MMEJ or NHEJ repair outcomes. These studies identified deletions of 3bp or more as resulting from MMEJ repair. Specific gRNA target sites gave characteristic patterns, reflecting different inherent preferences for either MMEJ or NHEJ repair. Previous observations [32,36–38] were made mostly in cell lines. We extended them to show that the characteristic indel pattern produced by a specific SpCas9 gRNA is conserved between immortalized and primary hematopoietic cells. Finally, gRNAs with MMEJ-dominant indel signatures resulted in more efficient HDR editing, which we attribute to the similar initial steps of MMEJ and HDR. These patterns also held true for the smaller set of SaCas9 gRNAs we tested. Moreover, for a small panel of coincident gRNAs, we were able to target both Sp and SaCas9 proteins to the same genomic site. In so doing, we demonstrated that the indel patterns produced were conserved across the two Cas9 variants.
Why are the indel patterns at some target sites MMEJ-dominant, while others are NHEJ-dominant? We hypothesize that this is a consequence of the nature of the DSB created, in particular whether it is a blunt or a staggered DSB. SpCas9 was initially thought to produce only blunt cuts [56]. It is now known to also induce staggered cuts with 1, 2, or even 3bp 5’ overhangs, likely due to flexible cleavage of the non-gRNA-binding DNA strand by the RuvC domain [57–61]. We hypothesize that different gRNAs result in different types of DSBs. Blunt and staggered DSBs are then repaired differently by the available pathways, which ultimately determines indel patterns and the preference for HDR.

Repair of a DSB by NHEJ can be scarless (perfect), which preserves the original target sequence and so allows additional rounds of cutting. Alternatively, NHEJ can create indels that destroy the gRNA target sequence and therefore become a terminal event. The evidence supports a model in which NHEJ-mediated repair of a Cas9-induced 1bp overhang can involve a fill-in ligation event, creating a 1bp duplication [32–35,37,38,59,60]. NHEJ can also trim back overhangs by overhang/flap cleavage [62], which could produce a 1bp deletion. Terminal NHEJ repair outcomes for a staggered DSB therefore include the 1bp insertions and deletions that are commonly observed with Sp or SaCas9. Other nucleases generate longer staggered cuts. For these, the observed indels also support a model in which NHEJ uses the overhanging bases as a template for fill-in ligation, forming duplicated nucleotides. For example, a common ZFN indel is a duplication of the 4-5bp staggered overhangs generated by FokI cleavage [63]. In contrast, NHEJ repair of SpCas9-induced blunt cuts can result either in perfect repair or in the insertion of random nucleotides.
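The fill-in and trimming outcomes described above can be made concrete with a toy string model in Python. It assumes the following, none of which is a measured mechanism for any particular gRNA. The top strand is the PAM-containing (non-target) strand. HNH nicks the target strand between -4 and -3. In the staggered case, RuvC nicks the top strand one nucleotide further 5', between -5 and -4. Both ends are then either fully filled in or fully trimmed before blunt ligation. The function names and the target sequence are hypothetical.

```python
def fill_in_product(top: str, top_nick: int, bottom_nick: int) -> str:
    """Top-strand sequence after both 5' overhangs are filled in and the blunt
    ends ligated. The top strand is broken before index `top_nick`; the bottom
    strand is broken opposite the junction before index `bottom_nick`.
    top_nick < bottom_nick gives a 5' overhang on each end spanning
    top[top_nick:bottom_nick], so filling in both ends duplicates it."""
    if top_nick > bottom_nick:
        raise ValueError("toy model covers blunt ends or 5' overhangs only")
    return top[:bottom_nick] + top[top_nick:]


def trimmed_product(top: str, top_nick: int, bottom_nick: int) -> str:
    """Top-strand sequence if both overhangs are trimmed away before ligation,
    which deletes the overhanging nucleotides."""
    if top_nick > bottom_nick:
        raise ValueError("toy model covers blunt ends or 5' overhangs only")
    return top[:top_nick] + top[bottom_nick:]


# Hypothetical 20nt protospacer + TGG PAM; index 17 is position -3, 16 is -4.
target = "GACTTCAGTCAGTACCTGAC" + "TGG"
blunt = fill_in_product(target, 17, 17)      # blunt cut: perfect rejoining
insertion = fill_in_product(target, 16, 17)  # 1nt 5' overhang filled in: +1
deletion = trimmed_product(target, 16, 17)   # 1nt 5' overhang trimmed: -1
print(blunt == target, len(insertion) - len(target), len(deletion) - len(target))
```

In this model the +1 insertion is a copy of the -4 nucleotide, and the -1 deletion removes that same nucleotide. A blunt cut rejoins perfectly and leaves the site available for re-cutting.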
However, insertion of random nucleotides has mostly been observed at frequencies below 1% [32–37]. Therefore, NHEJ repair of a blunt cut is likely scarless most of the time and will not lead to a terminal event (an indel). Consequently, only MMEJ outcomes will lead to terminal events. MMEJ outcomes will therefore accumulate over many iterative rounds of cutting and perfect repair, producing the MMEJ dominance observed at endpoint. In short, we propose that MMEJ or NHEJ dominance is a consequence of the specific DSB induced, which is itself likely a consequence of the targeted DNA sequence.

In light of our observation that MMEJ can predict HDR, we asked whether indel-prediction programs could be adapted to predict HDR. These programs were built by analyzing the indel outcomes at thousands of CRISPR/Cas9 target sites. For our panel of gRNAs in K562 cells, Lindel and FORECasT predicted the relative indel frequencies better than inDelphi, and by extension the relative MMEJ frequencies. This translated into a superior ability to predict HDR. In particular, Lindel performed best: at 5 of the 6 loci tested, its first- or second-ranked gRNA was the top gRNA for HDR.

Lindel, and to a lesser extent FORECasT, are likely successful at predicting indel signatures because they predict the ratio of insertions to deletions from the sequence directly proximal to the cut site. We believe this sequence is pivotal in determining whether blunt or staggered breaks dominate. Lindel and FORECasT therefore rest on a mechanism similar to the one described above. In contrast, inDelphi uses the strength of microhomology features near the cut site to determine the ratio of insertions to deletions. This implies that sites with strong microhomology will favor MMEJ.
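The accumulation argument made earlier in this discussion can be checked with a deterministic toy model in Python. The per-round probabilities below are hypothetical and chosen only for illustration. The model also assumes every intact allele is cut once per round, which is a simplification. Perfect NHEJ repair restores the target, so only terminal outcomes (MMEJ deletions and NHEJ indels) accumulate. The MMEJ share of observed indels is therefore p_mmej / (p_mmej + p_nhej_indel), however often NHEJ acts on the break.

```python
def iterate_cutting(p_mmej: float, p_nhej_indel: float,
                    rounds: int) -> dict[str, float]:
    """Toy model: each round, every still-intact allele is cut. The break is
    repaired by MMEJ (terminal indel) with probability p_mmej, by NHEJ with a
    terminal indel with probability p_nhej_indel, and otherwise perfectly by
    NHEJ, which restores the target so it can be cut again next round."""
    if p_mmej < 0 or p_nhej_indel < 0 or p_mmej + p_nhej_indel > 1:
        raise ValueError("invalid probabilities")
    intact, mmej, nhej = 1.0, 0.0, 0.0
    for _ in range(rounds):
        mmej += intact * p_mmej
        nhej += intact * p_nhej_indel
        intact *= 1.0 - p_mmej - p_nhej_indel
    return {"intact": intact, "MMEJ": mmej, "NHEJ": nhej}


def mmej_share_of_indels(result: dict[str, float]) -> float:
    """Fraction of accumulated indels that are MMEJ-derived."""
    return result["MMEJ"] / (result["MMEJ"] + result["NHEJ"])


# Hypothetical blunt-cut site: NHEJ handles 90% of breaks each round, but only
# 1% of all breaks end as an NHEJ indel; MMEJ handles the remaining 10%.
blunt = iterate_cutting(p_mmej=0.10, p_nhej_indel=0.01, rounds=200)
# Hypothetical staggered-cut site: fill-in or trimming makes most NHEJ
# repairs terminal.
staggered = iterate_cutting(p_mmej=0.10, p_nhej_indel=0.60, rounds=200)
print(round(mmej_share_of_indels(blunt), 3),
      round(mmej_share_of_indels(staggered), 3))
```

In this model the blunt-cut site still ends up MMEJ-dominant, even though MMEJ acts on only 10% of breaks per round. The staggered-cut site ends up NHEJ-dominant.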
However, we believe that if the DSB end structure consists mostly of staggered cuts, and therefore favors terminal NHEJ outcomes, there will be no opportunity for end resection to expose the ‘strong’ microhomology sites. In our dataset, inDelphi significantly overestimated the frequency of deletions ≥3bp. Although some sites had strong microhomology, the repair products showed that it was not used. Thus, the inDelphi model may fail to capture the underlying biology and, as a result, overestimates the number of MMEJ-dominant gRNAs. Other key differences between the prediction programs are well summarized in Chen et al. (2019), the publication describing Lindel.

In line with our work, Chen et al. (2019) compared the predictive capabilities of Lindel to those of FORECasT and inDelphi. Unlike our work, however, they found little difference between the tools in their capacity to predict indels. This is likely because, to compare the tools' performance side by side, the authors limited the predictive capacity of Lindel and FORECasT to match that of inDelphi. Lindel and FORECasT normally consider >200 potential indel classes, whereas inDelphi includes only ~60. Thus, our study is the first not only to compare these tools at their full capacity against one another and against an independent dataset, but also to propose a new function for them: predicting HDR.

In conclusion, we propose that the MMEJ tendency of a gRNA target site is an important feature to consider when developing and optimizing an HDR strategy for maximum efficiency. Knowledge of a gRNA's indel signature can guide strategic improvements to HDR during optimization, including decisions about additional interventions. Furthermore, Lindel is pre-integrated into CRISPOR (http://crispor.tefor.net/) and can be used to predict the relative propensity of a gRNA target site for HDR.
CRISPOR has additional modules that predict gRNA stability, specificity, and activity. This makes it an ideal one-stop shop for developing gRNA panels in the pursuit of HDR [64,65].

Part II. Anti-HIV gene therapy: editing restriction factor tetherin in CD4+ T cells and HSPCs

Kristina J. Tatiossian, Cathy X. Wang, Andrea L. Gadon, and Paula M. Cannon

Abstract

Cellular restriction factors potently inhibit cross-species viral transmission but fail to protect against viruses that have evolved with the host. In primate lentiviruses, viral accessory proteins often antagonize restriction factors by targeting them for degradation. For example, the host factor tetherin is normally countered by HIV-1 Vpu. In the absence of the viral protein, tetherin physically restricts budding HIV-1 virions at the cell surface, activates NFkB, and possibly improves ADCC-mediated viral clearance. We are developing a genome editing strategy to create a Vpu-resistant form of human tetherin as a potential anti-HIV gene therapy. We screened a panel of mutations in human tetherin, informed by Vpu-resistant primate orthologs. Our goal was to identify mutants with the desired phenotype of (a) a normal ability to restrict HIV-1 budding and (b) Vpu resistance. The most effective mutant, ΔGI/T45I, was previously described by McNatt et al. (2009). It contains a bipartite signal comprising a 6bp deletion removing amino acids 27 and 28 (ΔGI) and a single base pair substitution altering amino acid 45 (T45I). Genome editing typically delivers two components to a cell. A targeted nuclease introduces a site-specific DNA break, and a matched DNA homology template containing the desired mutations serves as the repair template. We developed and optimized the following genome editing reagents: tetherin-specific ZFN and CRISPR/Cas9 reagents paired with AAV6-based homology templates.
Using these tools, we can achieve highly efficient ΔGI/T45I gene editing at tetherin in both CD34+ hematopoietic stem and progenitor cells and CD4+ T cells. These primary cells will then be engrafted in immunodeficient mice and challenged with HIV-1. This will test whether ΔGI/T45I tetherin can form the basis of an anti-HIV gene therapy in vivo.

Introduction

The availability of HAART has transformed HIV infection into a chronic condition. However, lifelong drug regimens carry costs, side effects, and practical challenges of adherence, so alternative drug-free strategies to control HIV are being considered. These include approaches based on genetically modifying cells to be HIV-resistant. Either the CD4 T cells that HIV infects can be engineered directly, or the precursor hematopoietic stem and progenitor cells (HSC) that give rise to these cells in vivo can be targeted (Figure 7). Several recent technical advances are improving both the safety and the efficiency with which HSC can be engineered. In particular, there is intense interest in engineered nucleases such as zinc finger nucleases (ZFNs), TALENs, and CRISPR/Cas9, which make it possible to precisely engineer specific genes [1]. The nucleases create double-stranded breaks (DSB) at a targeted DNA sequence. The subsequent repair of each break is exploited to achieve one of three outcomes: gene knockout, gene mutation, or the site-specific addition of new genetic material at the locus [1–5]. Our lab has previously used ZFNs in this way to disrupt the CCR5 gene in human HSC. The levels achieved proved sufficient to fully suppress HIV-1 replication in a humanized mouse model [66]. We propose to expand the use of nuclease engineering to develop alternative and complementary approaches.
For these applications, the DNA break will be repaired by homology-directed repair (HDR) pathways, in which information is copied from a homologous template that is also introduced into the cell [2]. We have recently bypassed a significant bottleneck that had limited the use of this technology in HSC. We can now achieve HDR-mediated gene editing with Cas9 at levels exceeding 50% in human HSC, simply by selecting gRNA target sites with proficient MMEJ repair signatures (Part I of this thesis). This unprecedented capability allows us to propose expanding the use of targeted nucleases beyond CCR5 gene knockout. It also allows us to develop new approaches to HIV gene therapy based on in situ editing of endogenous human restriction factors.

Restriction factors are part of the cell-autonomous innate immune system, whereby cells detect the presence of pathogens and respond by deploying both local and systemic defense measures [67]. During viral infections, these antiviral effectors tend to be induced by interferon and contribute to the so-called ‘antiviral state’ in neighboring cells. The prototype human restriction factors uncovered by studies of HIV-1 are TRIM5, APOBEC3G, and tetherin/BST-2, and more recently SAMHD1 and MX2 [67]. These genes are under intense evolutionary pressure, exhibiting high rates of non-synonymous mutations, species adaptations, and target virus specificity [68]. Today's successful human pathogens such as HIV-1 have, by definition, evolved ways to circumvent the current slate of human factors. However, HIV-1 remains sensitive to the orthologs present in many non-human primates [67,68]. This discrepancy has allowed the identification of point mutations that can be introduced into the human proteins to create forms that now inhibit HIV-1 [69,70]. In this study, we aimed to exploit the anti-HIV capabilities of tetherin and to evaluate gene therapy strategies based on engineering the endogenous locus.
Tetherin has a variety of antiviral functions, including its namesake ability to prevent the release of virions from the surface of an infected cell by tethering them. It also regulates IFN production by pDCs and plays a role in facilitating ADCC. It has also recently been shown to trigger NFkB-dependent pro-inflammatory responses as a consequence of sensing tethered virions [71–73]. Interestingly, our lab has previously shown that non-pandemic group O HIV-1 is deficient in tetherin antagonism, which may account for its lower success as a human pathogen [74]. In the case of group M HIV-1, tetherin is degraded by the Vpu protein (Figure 2a). The interactions between these two proteins have been mapped, and substitutions at the interacting residues confer Vpu resistance [75–77], which suggested our approach. Specifically, it has been demonstrated that tetherin with two changes confers Vpu resistance in 293T ectopic expression assays [69] (Figure 2a). These are a substitution at amino acid 45 (T45I) and a deletion of two amino acids (ΔGI, residues 27-28). We hypothesize that genetically engineering tetherin in situ to encode the ΔGI/T45I designer modifications can allow us to exploit these anti-HIV capabilities and form the basis of a new gene therapy.

Results

Screening tetherin variants for tethering activity and resistance to HIV Vpu

Initially, we screened a panel of seven tetherin variants inspired by Vpu-resistant primate orthologs. Each was modified at residues within tetherin's transmembrane domain that interact with Vpu. In agreement with the work of McNatt et al. (2009), a dual tetherin modification, ΔGI/T45I, outperformed the other variants tested (Figure 8b-d). Specifically, in 293T cells, ΔGI/T45I significantly restricted release of HIV virus-like particles (VLPs) in the presence of Vpu compared to control WT tetherin. Relative tetherin expression did not change significantly (Figure 8b-d).
Previous crystallographic studies have shown that the ΔGI/T45I modification kinks the angle of the tetherin transmembrane domain. This kink prevents Vpu's degradation/downregulation activity but does not completely abolish the interaction between the two partners [76]. Based on these studies, we selected ΔGI/T45I tetherin for further development.

Editing ΔGI/T45I tetherin in HSCs

Next, we designed and validated a gene editing strategy to modify the tetherin locus in situ to encode ΔGI/T45I in HSCs. We tested a panel of ZFN and CRISPR/Cas9 gRNA target sites (Supplementary Fig. 6). ZFN 1189 was selected based on total indel efficiency. CRISPR/Cas9 gRNA #4 was selected for its MMEJ-dominant indel signature (as described in Part I of this thesis; Supplementary Fig. 6). Editing with ZFNs was optimized in two ways: by testing mRNA expression backbones in HSCs, and by improving the compatibility between the AAV6 donor and the ZFN. Specifically, mutations informed by in vitro ZFN cleavage assays were added to the AAV6 homology sequence to inhibit ZFN cutting (Figure 9a, Supplementary Fig. 7c-e). AAV6 templates with two additional mutations in the ZFN binding site prevented the linkage of ΔGI/T45I edits with indels. Those indels likely formed as a result of repeated cutting after successful HDR (Supplementary Fig. 7d). Editing with CRISPR/Cas9 was optimized by comparing delivery of Cas9 to HSCs as mRNA versus RNP. Cas9 RNP edited clearly more efficiently than Cas9 mRNA or ZFN mRNA (Figure 9b-c). Nevertheless, for all three nucleases, the efficiency of dual ΔGI/T45I editing was reduced compared to ΔGI or T45I alone. For Cas9 gRNA #4, T45I editing was more challenging than ΔGI editing because the gRNA cuts 50bp away from T45I. The ZFN 1189 cut site is more centrally located (Figure 9a).

Next, we assessed the health and differentiation capacity of HSC populations treated with the gene editing tools.
For all three nuclease strategies, viability was measured 1 day post-editing by flow cytometry using the Guava ViaCount reagent. Unsurprisingly, HSC viability was highest for untouched and mock-electroporated cells. Samples treated with targeted nucleases, with or without the AAV6 template, showed a slight to moderate decrease (Figure 9d). To assess the differentiation capacity of edited HSCs, we used the colony-forming unit (CFU) assay. Newly edited and control HSCs were plated in semi-solid methylcellulose medium enriched with cytokines that support HSC proliferation and differentiation. This allowed us to catalogue the differentiation capacity of stem cells within the myeloid compartment (Figure 9e). The total number of CFUs decreased in the ZFN and Cas9 mRNA conditions, but the relative ratios of the distinguishable cell populations remained unchanged (Figure 9f-g). CFUs derived from Cas9 RNP-edited HSCs did not deviate from those of control HSCs in total number or relative frequency (Figure 9f-g).

Anti-HIV capacity of ΔGI/T45I gene edited tetherin in cell lines and primary T cells.

We next tested whether viral restriction by ΔGI/T45I tetherin extended to T cell lines and primary cells, where tetherin is endogenously controlled and expressed. Accordingly, we gene edited CD4+ CEM-SS T cells using our AAV6 homology template construct with ZFN mRNA or Cas9 RNP (Figure 10a-b), as previously developed in HSCs. ΔGI/T45I CEM-SS clones were derived by FACS single-cell sorting. The resulting outgrown cell populations were genetically analyzed for gene editing by Sanger sequencing (Figure 10c). A representative clonal ΔGI/T45I population is shown in Figure 10d.
ΔGI/T45I or control WT CEM-SS clones were infected with HIV by spinoculation and cultured for 4 days to allow viral replication and production of progeny virions. Tetherin cell surface expression and HIV RNA in the supernatant were then quantified by flow cytometry and qRT-PCR, respectively. Two viruses were used as controls: Vpu-deficient HIV, and HIV with A18H Vpu, a modification that impairs Vpu's interaction with tetherin but maintains other known Vpu functions. Infection of ΔGI/T45I CEM-SS cells with WT HIV resulted in significantly less supernatant HIV than infection of control WT CEM-SS cells (Figure 10e). Moreover, ΔGI/T45I CEM-SS cells infected with WT HIV were protected to a similar degree as WT cells infected with HIV lacking Vpu altogether. Thus, when expressed and controlled from the endogenous locus in T cell lines, ΔGI/T45I tetherin restricts the release of WT HIV in the presence of Vpu.

The obvious next step was to confirm this phenomenon in primary CD4+ T cells. The major challenge was that clones cannot be derived from these cells, so we relied on achieving highly efficient HDR. To do so, we took advantage of NU7441, a chemical inhibitor of DNA-PK, which is required for NHEJ repair [78]. Upon inhibition of NHEJ, we observed an increase in HDR frequency (Figure 11a-b). To minimize any effect NU7441 might have on HIV infection, we washed NU7441 away 24hrs post-editing and then waited 10 days before viral spinoculation. CD4+ T cells were restimulated for 2 days just prior to HIV challenge. As previously described, HIV RNA in the supernatant was quantified by qRT-PCR 4 days post-infection. At the same time, expression of relevant cell surface markers, such as tetherin and CD4, was assayed by flow cytometry. As expected, tetherin expression was reduced most in the RNP-only condition, whether measured as MFI or as frequency of expression. Adding an AAV6 homology template partially rescued tetherin expression (Figure 11d-e).
HIV RNA was significantly reduced in conditions treated with Cas9 RNP + AAV6 but, surprisingly, also in conditions treated with Cas9 RNP only (Figure 11c). The response in Cas9 RNP-only samples likely derived from the indel signature of Cas9 gRNA #4, which consists mainly of 15bp deletions (Supplementary Fig. 6e). These indels may form tetherin variants that can also restrict HIV release in the presence of Vpu, though this remains unconfirmed.

Mitigating indel outcomes in coding regions by gene editing at-a-distance.

Undesirable indel outcomes are nearly always a by-product of any HDR editing strategy, because there is rarely 100% conversion of the DSB to HDR. Error-prone pathways such as NHEJ and MMEJ compete for repair of the DSB and form indels, which are often unavoidable. Therefore, strategies that eliminate indels or skew repair toward precise HDR outcomes are currently being developed [14–21]. We hypothesized that nucleases targeted to intronic, non-coding regions could allow genes to be modified while mitigating unintended gene knockout outcomes. Accordingly, we tested a panel of ZFN and CRISPR/Cas9 gRNAs targeted to intron 1 of tetherin, positioned roughly 225-300bp away from the ΔGI edit site (Supplementary Fig. 9a). No particular ZFN had enhanced activity. ZFN 1383 was selected from the panel for further evaluation because it was the closest to the edit site while still situated within the intron (Supplementary Fig. 9). HSCs were treated with ZFN 1383 mRNA, and the resulting indel profile was queried by targeted deep sequencing (Supplementary Fig. 9). The most frequent ZFN 1383 indel outcomes did not disrupt the splice acceptor site or the ORF of exon 1.
The same procedure was conducted for Cas9 gRNA #6, targeted to tetherin intron 1. Again, the top indel outcomes were fully contained within intronic sequence (Supplementary Fig. 9). Furthermore, we compared the effect of exon- versus intron-targeted nucleases on tetherin protein expression in primary CD4+ T cells. An exon-targeted ZFN (1189) or Cas9 (#4) significantly decreased tetherin expression compared to mock-electroporated cells on days 2 and 4 post-editing (Supplementary Fig. 9). In contrast, the intron-targeted ZFN (1383) and Cas9 (#6) had no effect on tetherin protein expression.

Nevertheless, the main challenge with an at-a-distance editing strategy is efficiency. Conversion tracts in mammalian cells are limited, and editing efficiency drops by roughly 50% at 100bp from the cut site. ΔGI is roughly 225bp away from both ZFN 1383 and Cas9 gRNA #6 (Figure 12a). We treated HSCs with intron-targeted ZFN 1383 and Cas9 gRNA #6, with and without the AAV6 template. Editing outcomes were quantified by targeted deep sequencing. Total indel frequency was highest for Cas9 delivered as RNP, and this translated to ~20% ΔGI editing from 225bp away (Figure 12b-c). Co-conversion of ΔGI and T45I, however, was more challenging. Cas9 gRNA #6 generated indels efficiently and had a relatively strong MMEJ:NHEJ ratio. Even so, it induced only ~10% ΔGI/T45I editing, only marginally better than the ZFN or Cas9 mRNA. Additionally, both ZFN and Cas9 outcomes included ΔGI/T45I edits linked with indels (Figure 12d). This is likely because the AAV6 donor lacked mutations in the binding sites of these nucleases, which allowed recutting after HDR editing. Because these indels are positioned within the intron, they are likely not problematic. We also tested whether at-a-distance editing could be improved.
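The conversion-tract estimate given above (roughly a 50% drop at 100bp) can be turned into a worked example. The sketch assumes efficiency decays exponentially, halving every 100bp. Only the 100bp anchor comes from the text; the exponential shape and the function name are our assumptions. It estimates how much ΔGI conversion would fall relative to an edit at the cut site.

```python
def relative_conversion(distance_bp: float,
                        halving_distance_bp: float = 100.0) -> float:
    """Assumed exponential model: conversion efficiency, relative to an edit
    placed at the cut site, halves every `halving_distance_bp` bp."""
    if distance_bp < 0 or halving_distance_bp <= 0:
        raise ValueError("distance must be >= 0 and halving distance > 0")
    return 0.5 ** (distance_bp / halving_distance_bp)


# 50bp: Cas9 gRNA #4 to T45I; 225bp: intron-targeted nucleases to ΔGI.
for d in (0, 50, 100, 225):
    print(d, round(relative_conversion(d), 2))
```

Under this assumption, an edit 225bp from the cut would convert at about 21% of the efficiency of an edit at the cut site. An edit 50bp away would convert at about 71%.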
First, we tested whether mismatched AAV6 homology templates, which carry silent mutations every 3 bp between the cut site and the edit site, improved HDR, specifically of the GI deletion. Mismatches between the AAV6 homology template and genomic DNA have previously been shown to improve the conversion of deletion edits 79. However, adding mismatches appeared to abolish editing activity in K562 cells rather than improve it (data not shown). We also tested whether supplying a factor involved in the DNA end resection required for HDR, specifically BLM, could improve the conversion of edits at-a-distance, but this too showed no improvement in K562 cells (data not shown). Ultimately, intron-targeted editing remains an interesting avenue for further exploration and may be best suited to applications where eliminating exon disruption outweighs absolute efficiency.

Engraftment of NSG mice with tetherin-modified HSCs and HIV challenge.
The immunodeficient NSG mouse model was used to query the ability of ex vivo edited stem cells to survive and differentiate in vivo. We engrafted mice with HSCs treated with our tetherin AAV6 homology template and the Exon Cas9 (#4), alongside control WT HSCs. In addition, a few mice were engrafted with tetherin-modified HSCs treated with the Exon ZFN (1189) or the Intron Cas9 (#6). In preliminary assays, humanization of mice with gene-edited HSCs was inconsistent, with only about half of the cohort humanized at 16 weeks post-injection (data not shown). We therefore tested whether increasing the number of gene-edited HSCs used for engraftment could improve the efficiency with which mice are humanized. In cohorts engrafted with two million gene-edited HSCs, nearly all mice were successfully humanized (Figure 13b). Mice were humanized with an average efficiency of 25% when Cas9 RNP or ZFN 1189 was used.
The efficiency for mice engrafted with HSCs edited with Cas9 mRNA was low, averaging 2%, albeit determined with a smaller number of mice. Although control mice were engrafted with only one million HSCs, humanization efficiency remained highest in this group, averaging 37%. Evidently, the gene editing procedures affect the ability of HSCs to humanize mice, perhaps partly due to the toxicity of these tools, which can be observed even in vitro (Figure 9d). Nevertheless, humanization can be improved by increasing the number of cells engrafted.

Once humanization was established, we examined various human hematopoietic lineages to query whether the gene editing procedures impaired the ability of HSCs to differentiate in vivo. Specifically, we measured the frequencies of CD19+ B cells, CD3+ T cells and CD4+ T cells among human CD45+ cells in mice engrafted with tetherin-modified or control WT HSCs at 16 weeks post-engraftment. We observed no significant skewing of subpopulations in samples treated with Exon Cas9, Exon ZFN, or Intron Cas9 compared with control mice. Next, we quantified the efficiency of HDR editing in the blood of mice engrafted with tetherin-modified HSCs. Of the three editing conditions, only mice engrafted with HSCs treated with Cas9 RNP + AAV6 homology template had consistently detectable ΔGI editing, with an average of 15% and a range of 4–27%. Finally, we challenged Exon Cas9 + AAV6 humanized mice with WT HIV-1 NL4.3. We observed a decline in humanization (CD45+ cells) over time, mostly reflecting a reduction in the CD4+ T cell subset (Figure 13e-f). However, under the parameters used in this experiment, we observed no difference in the ratio of CD4+ T cell loss after HIV infection, or in viral loads, between mice engrafted with tetherin-modified or WT HSCs (Figure 13f-g).
Discussion
Currently, there are a handful of genetic engineering-based clinical trials in the USA and China for the treatment of HIV/AIDS (clinicaltrials.gov). All rely on the ablation of CCR5 using ZFNs or CRISPR/Cas9, and only one targets CD34+ HSCs (the others target CD4+ T cells). CCR5 knockout, however, protects only against R5-tropic HIV, offering little to no protection against HIV strains that instead use the CXCR4 cell surface receptor for entry. Furthermore, all of these studies rely on error-prone DNA repair pathways, which result in a mosaic of indel outcomes. More controlled outcomes, such as those achievable by HDR, are desirable, but efficient HDR in HSCs remains a major hurdle. In this study, we used a novel approach (Part I of this thesis) to rationally select a CRISPR/Cas9 gRNA target site with a dominant MMEJ indel signature (Supplementary Fig. 6). Owing to the correlation between MMEJ and HDR frequency, we achieved 60–80% ΔGI HDR at tetherin using an AAV6-based homology template. The co-conversion of ΔGI and T45I was less efficient, however, likely due to the 50 bp gap between the two desired edits (Figure 9c). In addition, although HDR was highly efficient, inadvertent indel outcomes were still present at roughly 5–20% frequency in HSCs, depending on the type of nuclease used (Figure 9c). Inadvertent indels when targeting the tetherin exon had a noticeable effect on tetherin protein expression in CD4+ T cells (Figure 11e). Specifically, tetherin expression was lowered after treatment with Cas9 RNP and was only partially rescued by the addition of a homology template. For these reasons, we sought to develop a novel strategy whereby the inadvertent gene knockout caused by indels could be completely eliminated. We proposed that by targeting the nuclease to nearby introns, one could gene edit at-a-distance while confining indels to non-coding regions.
We developed ZFNs and CRISPR/Cas9 gRNAs that target tetherin intron 1 roughly 250 bp from the ΔGI edit site. Genetic and protein analyses revealed that the indels generated by the intron-targeted ZFN and Cas9 were mostly contained within the intron (i.e., they did not disrupt the open reading frame or the regulatory elements required for mRNA splicing) and, unlike exon-targeted strategies, had no effect on tetherin protein expression in CD4+ T cells (Supplementary Fig. 9). Nevertheless, the efficiency of at-a-distance editing was roughly 10–20% for ΔGI and roughly half that for dual ΔGI and T45I conversion (Figure 12). As a result, at-a-distance editing strategies should carefully consider the distance between the cut site and the edit site, and may be best suited to applications where eliminating protein knockout matters more than absolute efficiency. Treatment of HSCs with Cas9 RNP and AAV6 templates did not skew their differentiation program in vitro (Figure 9) or in vivo (Figure 13) for either the exon-targeting or the intron-targeting strategy. ΔGI-edited cells were detectable in blood 16 weeks post-engraftment at frequencies of 5–27% for samples treated with the Exon Cas9. Editing in vivo was not detected for the intron-targeted Cas9 or for ZFN-treated samples. Additionally, in this study we functionally confirmed that ΔGI/T45I tetherin, when expressed from and controlled by the endogenous locus, can restrict budding virus. For this, we gene edited CD4+ CEM-SS cells and derived clones with perfect HDR editing. After infecting these clonal populations with HIV, we observed a 1-log drop in HIV RNA detected in the culture (Figure 10). Importantly, the level of inhibition conferred by ΔGI/T45I tetherin matched the HIV levels obtained with a virus variant lacking Vpu altogether (Figure 10).
Additionally, we repeated this assay with gene-edited CD4+ T cells, with the one difference that clonal populations were not derived; instead, the bulk edited populations were challenged with HIV. In CD4+ T cells, we observed roughly a half-log drop in viral loads when cells were edited with ΔGI/T45I or, unexpectedly, with Cas9 RNP only. The anti-HIV response in samples treated only with Cas9 RNP could have two explanations: (1) gene editing harms cells such that HIV infection is more challenging, or (2) the indel signature created at this target site, which includes dominant 15 bp deletions in the residues important for Vpu interaction, also forms Vpu-resistant tetherin variant(s) (see Supplementary Fig. 6 for the indel signature). Despite the anti-HIV responses observed in vitro in both CEM-SS clonal cell lines and CD4+ T cells, gene-edited tetherin failed to show a response in mice engrafted with tetherin-modified HSCs and challenged with HIV. Most likely, the dual ΔGI and T45I editing frequencies were below the threshold for a functional response. Major challenges must be overcome to enhance the engraftment and/or survival of gene-edited cells in vivo. Despite 80% ΔGI editing in vitro, this translated to only 5–27% ΔGI in vivo (blood). Furthermore, twice as many edited HSCs had to be engrafted to achieve humanization levels corresponding to those in control mice engrafted with untouched HSCs. Finally, improvements in HDR conversion tract length in mammalian cells are needed to enhance the co-conversion of ΔGI and T45I and the efficiency of at-a-distance editing strategies.

Methods and Materials

Isolation of primary human cells
Peripheral blood mononuclear cells (PBMCs) were isolated from the whole blood of healthy donors (Gulf Coast Regional Blood Center, Houston, TX) by Ficoll centrifugation using Leucosep tubes (GE Healthcare, Chicago, IL) per the manufacturer’s instructions.
CD4+ T cells were isolated from PBMCs by magnetic positive selection using anti-CD4 microbeads and a magnetic column (Miltenyi Biotec, Bergisch Gladbach, Germany). Human CD34+ cells were purified from fetal liver (Advanced Bioscience Resources, Alameda, CA) one day before experimentation, as previously described 66,80. Briefly, connective tissue was removed, both physically and enzymatically, to yield a single-cell suspension from which CD34+ cells were isolated using the EasySep™ human cord blood CD34 positive selection kit II (StemCell, Vancouver, BC, Canada).

Cell culture
The K562 cell line (ATCC, Manassas, VA, USA) was cultured in RPMI-1640 (VWR, Radnor, PA) with 10% fetal bovine serum (FBS; Gemini Bioproducts, West Sacramento, CA) and 1% penicillin/streptomycin (P/S; VWR), and maintained at a density of 1 × 10⁶ cells/mL. Primary CD4+ T cells were cultured in X-VIVO™ 15 serum-free hematopoietic cell medium (Lonza, Basel, Switzerland) supplemented with 10% FBS (Gemini Bioproducts), 2 mM L-glutamine (Sigma-Aldrich, St. Louis, MO), and 1% P/S/amphotericin (P/S/A; Lonza). Immediately following isolation, CD4+ T cells were stimulated for 72 h with anti-human CD3/CD28 magnetic Dynabeads (Thermo Fisher, Waltham, MA) at a bead-to-cell ratio of 1:1, in the presence of IL-2 at 30 IU/mL (Peprotech, Rocky Hill, NJ, USA). Following electroporation, CD4+ T cells were cultured in the presence of IL-2 and IL-7 (at 20 ng/mL; Peprotech) and maintained at a density of 1 × 10⁶ cells/mL. CD34+ HSPCs were plated at 1 × 10⁶ cells/mL in SFEM-II (StemCell) supplemented with 1% P/S/A and 100 ng/mL each of stem cell factor (SCF; R&D Systems, Minneapolis), thrombopoietin (TPO; R&D Systems), and fms-like tyrosine kinase 3 ligand (Flt3; Miltenyi Biotec). Following electroporation, HSPCs were cultured in the presence of 10% FBS and maintained at a density of 1 × 10⁶ cells/mL. Cells were cultured at 37 °C and 5% CO2.
Cas9 RNP and gRNA complexing
Recombinant Streptococcus pyogenes (Sp) Cas9 protein was obtained as either TrueCut Cas9 (Thermo Fisher) or Macro Cas9 (QB3 MacroLab, UC Berkeley, Berkeley, CA) and stored at -20 °C. Lyophilized guide RNAs with 2’-O-methyl 3’ phosphorothioate modifications were purchased from Synthego (Silicon Valley, CA) in either a single-guide (sgRNA) format or a crRNA:tracrRNA (cr:tr gRNA) format. sgRNAs were resuspended to a final concentration of 5 ng/µL (77 pmol/µL) in nuclease-free 1× TE buffer (Synthego), incubated at room temperature for 15 min, aliquoted, and stored at -80 °C. crRNA and tracrRNA were each resuspended in nuclease-free duplex buffer (IDT, Coralville, IA), incubated at room temperature for 15 min, mixed together to yield a final concentration of 2.98 ng/µL (80 pmol/µL), and then placed in a thermal cycler to complex the cr:tr RNA (95 °C for 2 min; ramp down 2 °C/s to 85 °C; ramp down 0.1 °C/s to 25 °C). Complexed cr:tr RNA was aliquoted and stored at -80 °C. mRNA for Sp or Staphylococcus aureus (Sa) Cas9 was synthesized using the T7 mScript™ Standard mRNA Production System (Cellscript, Madison, WI) per the manufacturer’s instructions and stored at -80 °C.

Cas9 and ZFN mRNA production
Cas9 and ZFN coding sequences were cloned into a modified version of the pVAX-2UX vector with 3’UTRs and a poly(A) tail derived from beta globin. ZFN constructs had an additional 5’UTR derived from Xenopus. Plasmids were linearized by XbaI digestion to generate templates for mRNA synthesis. mRNA was prepared using the T7 mScript Standard mRNA Production System (CELLSCRIPT, Madison, WI). Transcribed mRNA was column purified, aliquoted, and stored at -80 °C.

Cas9-gRNA reagents and electroporation
Cas9-gRNA reagents were delivered to all cell types by electroporation.
Immediately prior to each experiment, SpCas9 protein and gRNAs were thawed on ice and complexed by gentle mixing followed by incubation at 37 °C for 15 min. When Cas9 was delivered as mRNA, it was simply mixed with sgRNA prior to electroporation. Details of the specific Cas9 proteins and gRNAs used, and their doses, are given in Supplementary Table 2. For CD4+ T cells, CD3/CD28 beads were removed just prior to electroporation by placing cells on a DynaMag2 cell separation magnet (Thermo Fisher) for 2 min. For all cell types, 2 × 10⁵ cells were centrifuged, resuspended in electroporation buffer, and mixed with Cas9-gRNA and, where indicated, with single-stranded oligonucleotide homology templates (ssODNs) (Supplementary Table 2). Mixtures were transferred to electroporation cuvettes and electroporated. Cell-type-specific dosages, resuspension volumes, and electroporation protocols are given in Supplementary Table 2. All HSPC samples were edited with the BTX electroporator (BTX, Holliston, MA) unless otherwise stated. Immediately following electroporation, fresh, pre-warmed medium was added to the cuvettes and the cells were allowed to rest for 10–15 min. Cells were then transferred to 96-well plates and cultured for 4–5 days, followed by extraction of genomic lysate as described below.

ssODN homology template
ssODNs (IDT) were designed to introduce a 6 bp XhoI site at the canonical Cas9 cut site for 44 gRNAs (Supplementary Table 2). The ssODNs were identical to the non-PAM strand and contained 40 bp left and right homology arms (80 bp total homology). A control non-target ssODN was designed containing 80 bp of the bacterial kanamycin resistance gene and was confirmed by BLAST to have no significant homology to the human genome. ssODNs were modified at the 5’ and 3’ terminal ends with two phosphorothioate modifications 50.
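The ssODN layout described above (a 6 bp XhoI site inserted at the Cas9 cut site, flanked by 40 bp arms, written to match the non-PAM strand) can be sketched as follows. This is an illustrative reconstruction, not the author's design script: it ASSUMES the input locus is written 5'→3' on the PAM-containing strand and that SpCas9 cuts 3 bp upstream of the NGG PAM; the helper names are hypothetical.

```python
XHOI_SITE = "CTCGAG"  # XhoI recognition site; palindromic, so identical on both strands
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def spcas9_cut_index(protospacer_start: int) -> int:
    """Index of the first base 3' of the SpCas9 cut on the PAM-containing strand:
    between protospacer positions 17 and 18, i.e. 3 bp upstream of the PAM."""
    return protospacer_start + 17


def design_xhoi_ssodn(pam_strand: str, cut_index: int, arm_len: int = 40) -> str:
    """Hypothetical helper: insert XHOI_SITE at `cut_index` with `arm_len` arms.

    ASSUMPTION: `pam_strand` is written 5'->3' on the strand carrying the PAM,
    so the donor is returned as its reverse complement (the non-PAM strand).
    """
    if cut_index - arm_len < 0 or cut_index + arm_len > len(pam_strand):
        raise ValueError("not enough flanking sequence for the requested homology arms")
    left_arm = pam_strand[cut_index - arm_len:cut_index]
    right_arm = pam_strand[cut_index:cut_index + arm_len]
    return reverse_complement(left_arm + XHOI_SITE + right_arm)


# Toy locus (not a real target): 40 A's, a 20-nt protospacer, an AGG PAM, 40 C's.
protospacer = "GATTACAGATTACAGATTAC"
locus = "A" * 40 + protospacer + "AGG" + "C" * 40
donor = design_xhoi_ssodn(locus, spcas9_cut_index(protospacer_start=40))
print(len(donor), donor)
```

The resulting donor is 86 nt long (40 + 6 + 40). Because XhoI's site is palindromic, the inserted sequence reads CTCGAG on either strand, which keeps a restriction-based readout strand-agnostic.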
AAV6 homology template
AAV vectors were produced in-house as described below or, alternatively, high-titer preps were purchased from Vigene (Rockville, MD). Tetherin homologous donor templates were cloned into a customized plasmid derived from pAAV-MCS (Agilent Technologies, Santa Clara, CA), containing AAV2 inverted terminal repeats, to enable packaging as AAV vectors using the triple-transfection method 81. Briefly, HEK293 cells were plated, grown for 2–3 days to 80% confluence, then transfected by the calcium phosphate method with an AAV helper plasmid expressing AAV2 Rep and serotype-specific Cap genes, an adenovirus helper plasmid, and an AAV vector genome plasmid containing inverted terminal repeats. After 3 days, cells were lysed and cell debris was removed by centrifugation. AAV vectors were precipitated from lysates using polyethylene glycol and purified by ultracentrifugation on a cesium chloride gradient. Vectors were formulated by dialysis and filter sterilized.

NU7441 treatment
For NU7441 (StemCell) treatments, Cas9-gRNA formation and nucleofection of K562 cells were performed as described above, and cells were plated in 180 µL of culture medium in a 96-well plate. Immediately after plating, 50 µL of cell suspension was transferred to an empty well, and 1 µL of medium containing NU7441 at 50× the final concentration was added to the cells. NU7441 was delivered at two doses: 1 µM or 5 µM. At 24 h post-treatment, 150 µL of fresh, pre-warmed medium was added to the cells. At 4–5 days post-editing, genomic lysate was generated as described below.

HIV-1 NL4.3 virus production
HIV NL4.3 was produced by transfecting 293T producer cells with plasmids encoding HIV proviral DNA. Three days post-transfection, cellular supernatants were harvested, filtered, aliquoted and stored at -80 °C. HIV NL4.3 titers in infectious units per mL (IU/mL) were determined by serially diluting HIV preps and infecting Ghost cells, which express GFP in response to HIV replication.
GFP expression was quantified by flow cytometry 2 days post-infection, and viral titers were calculated.

In vitro HIV-1 infections
For in vitro HIV infections, 2 × 10⁵ cells (either CD4+ CEM-SS or primary CD4+ T cells) were resuspended in 100 µL of medium containing HIV at an MOI of 1, then spinoculated at 2500 rpm for 2 hours. Following spinoculation, cells were transferred to a 96-well culture plate and expanded for up to 5 days, with samples of cells and supernatant taken on days 2, 3, and 4. HIV RNA in the supernatant was quantified by qRT-PCR with probes directed to the HIV LTRs.

Genome editing quantification by Sanger sequencing
Cells were transferred to a thermal-cycler-compatible plate and pelleted by centrifugation, and the medium was gently aspirated. 25 µL of Cell Lysis Buffer/Protein Degrader mix (GeneArt Genomic Cleavage Detection Kit; Thermo Fisher) was added to each well, and the plate was incubated at 68 °C for 15 min and 95 °C for 10 min, then stored at -20 °C. Amplicons were generated using the indicated PCR primers (Supplementary Table 3), designed with the Primer3 webtool (http://primer3.ut.ee, v. 0.4.0) to uniquely amplify a 500–1000 bp region of genomic DNA surrounding the target site. 1–2 µL of genomic lysate was used in a 25 µL PCR reaction with AmpliTaq Gold Master Mix (Thermo Fisher) and 0.4 nM final concentration of each forward and reverse primer. The thermal cycling program for all PCRs was: 95 °C for 10 min; 38 cycles of 95 °C for 30 s, 60 °C for 30 s, and 72 °C for 2 min; then 72 °C for 7 min. 10 µL of unpurified PCR product was transferred to a new 96-well plate, sealed, and shipped to Genewiz (South Plainfield, NJ) for Sanger sequencing. Trace files were uploaded to ICE (Inference of CRISPR Edits, ice.synthego.com, Synthego) to quantify DNA repair outcomes 48.

Genome editing quantification by ddPCR
Cells were lysed and genomic DNA was column purified.
Purified genomic DNA was mixed with ddPCR primers and probes (Supplementary Figure) along with ddPCR polymerase and buffer in a ddPCR-compatible 96-well plate (Bio-Rad, Hercules, CA). Droplets were generated using the QX200™ Droplet Generator (Bio-Rad) and then placed in the thermocycler. Droplets were read and quantified using the QX200™ Droplet Reader (Bio-Rad). Gene editing was quantified as the number of droplets double positive for the reference and ΔGI probes divided by the total number of positive droplets. ddPCR assay validation is shown in Supplementary Figure 10.

Genome editing quantification by next generation sequencing
Cells were lysed and genomic DNA was column purified. Purified genomic DNA was sent to Sangamo Biosciences (Brisbane, CA) for processing. In brief, samples were amplified using a two-step procedure. First, an in-out PCR was conducted to ensure amplification occurred only from the endogenous locus and not from any residual homology template in the sample. Second, either a “short” (<200 bp) or “long” (~350 bp) amplicon was generated around the edit/nuclease cleavage sites and subjected to next generation sequencing (Supplementary Figure 11).

Genome editing analysis and visualization for Sanger trace files
All ICE datasets were run through a custom quality-control filter to remove samples with poor suitability for ICE analysis (R² < 0.75) or with low-quality sequencing reads (mean discordance before the cut site > 0.25). In total, 56 of 1,335 samples were excluded, and the exclusions were not skewed toward any one gRNA.

Indel prediction software
Lindel 34, FORECasT 32, and inDelphi 36 were downloaded from GitHub and run locally. Batch mode was used for Lindel and FORECasT to predict the indel outcomes of all 44 gRNAs (Supplementary Table 1). A custom Python module was written for inDelphi to allow batch analysis.
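The two quantification rules described above, the ddPCR editing fraction and the ICE quality-control filter, can be sketched as simple functions. This is a minimal illustration, not the author's pipeline: it ASSUMES "total positive droplets" means all reference-positive droplets (every ΔGI-positive droplet should also be reference-positive), applies no Poisson correction, and uses hypothetical field names (`r_squared`, `mean_discord_before`) for the ICE outputs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class IceResult:
    """One ICE-analysed Sanger trace (hypothetical field names)."""
    sample_id: str
    r_squared: float            # ICE model fit, R^2
    mean_discord_before: float  # mean discordance upstream of the cut site


def ddpcr_editing_fraction(double_positive: int, reference_positive: int) -> float:
    """Droplets positive for both the reference and edit probes divided by all
    reference-positive droplets (simple ratio, no Poisson correction)."""
    if reference_positive <= 0:
        raise ValueError("need at least one reference-positive droplet")
    if not 0 <= double_positive <= reference_positive:
        raise ValueError("double-positive count must lie between 0 and the reference count")
    return double_positive / reference_positive


def passes_ice_qc(result: IceResult, min_r2: float = 0.75, max_discord: float = 0.25) -> bool:
    """Keep a trace unless R^2 < 0.75 or mean discordance before the cut > 0.25."""
    return result.r_squared >= min_r2 and result.mean_discord_before <= max_discord


def filter_ice(results: List[IceResult]) -> Tuple[List[IceResult], int]:
    """Return the traces that pass QC and the number excluded."""
    kept = [r for r in results if passes_ice_qc(r)]
    return kept, len(results) - len(kept)


# Toy values for illustration only, not data from this study.
print(f"editing: {ddpcr_editing_fraction(150, 1000):.1%}")
traces = [IceResult("s1", 0.92, 0.05), IceResult("s2", 0.60, 0.04), IceResult("s3", 0.88, 0.31)]
kept, excluded = filter_ice(traces)
print([r.sample_id for r in kept], excluded)
```

Applied to the full dataset, a filter of this shape corresponds to the reported exclusion of 56 of 1,335 traces (about 4.2%). If absolute copy numbers were needed, the droplet counts would instead be Poisson-corrected before taking the ratio.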
Mouse engraftment, sampling, and HIV infection
NOD-SCID-gamma (NSG) mice aged 6–12 weeks were irradiated and, roughly 6–8 h later, implanted retro-orbitally with control or gene-edited HSCs. Control mice received 1 × 10⁶ untouched HSCs, while experimental mice received 2 × 10⁶ gene-edited HSCs. HSCs were allowed to differentiate and proliferate in the mice for 16 weeks, at which point peripheral blood was harvested retro-orbitally and subjected to cellular and genetic analysis. Humanization frequency was determined by flow cytometry after staining for the human markers CD45, CD19, CD3, and CD4. Maintenance of gene-edited cells in vivo was quantified by ddPCR after genomic DNA extraction. Mice were then infected with 1 × 10⁵ IU of HIV by intraperitoneal injection. HIV viral loads and human cells were monitored by sampling peripheral blood at 4, 8 and 12 weeks post-infection, followed by qRT-PCR or flow cytometry, respectively.

Statistical analyses
No statistical methods were used to predetermine sample size. All calculations were performed using R in RStudio (version 1.1) or GraphPad Prism. All bar graphs show the mean ± S.E.M. All box plots show the median (segment within the rectangle), the first to third quartiles (rectangle), and outliers (points outside the whiskers), unless otherwise noted. The Shapiro-Wilk test was used to determine whether each dataset was normally distributed. If normal, Welch’s t test or Student’s t test was performed to calculate statistical significance. If not normally distributed, the Mann-Whitney U test was used to compare two groups, or the Kruskal-Wallis test with Bonferroni correction was used to compare multiple groups. P-values shown in figures as asterisks are defined as follows: *** p < .001, ** p < .01, * p < .05.

References
1. Carroll, D. Genome engineering with targetable nucleases. Annual review of biochemistry 83, 409–39 (2014).
2. Rouet, P., Smih, F. & Jasin, M. Expression of a site-specific endonuclease stimulates homologous recombination in mammalian cells. Proc National Acad Sci 91, 6064–6068 (1994).
3. Chen, F. et al. High-frequency genome editing using ssDNA oligonucleotides with zinc-finger nucleases. Nature methods 8, 753–5 (2011).
4. Lombardo, A. et al. Site-specific integration and tailoring of cassette design for sustainable gene transfer. Nat Methods 8, 861–9 (2011).
5. Maeder, M. L. & Gersbach, C. A. Genome-editing Technologies for Gene and Cell Therapy. Mol Ther J Am Soc Gene Ther 24, 430–46 (2016).
6. Ira, G. et al. DNA end resection, homologous recombination and DNA damage checkpoint activation require CDK1. Nature 431, 1011 (2004).
7. Saleh-Gohari, N. & Helleday, T. Conservative homologous recombination preferentially repairs DNA double-strand breaks in the S phase of the cell cycle in human cells. Nucleic Acids Res 32, 3683–3688 (2004).
8. Ranjha, L., Howard, S. M. & Cejka, P. Main steps in DNA double-strand break repair: an introduction to homologous recombination and related processes. Chromosoma 127, 187–214 (2018).
9. Chang, H. H. Y. H., Pannunzio, N. R., Adachi, N. & Lieber, M. R. Non-homologous DNA end joining and alternative pathways to double-strand break repair. Nature reviews. Molecular cell biology (2017) doi:10.1038/nrm.2017.48.
10. Ceccaldi, R., Rondinelli, B. & D’Andrea, A. D. Repair Pathway Choices and Consequences at the Double-Strand Break. Trends in Cell Biology 26, 52–64 (2016).
11. Perez, E. E. et al. Establishment of HIV-1 resistance in CD4+ T cells by genome editing using zinc-finger nucleases. Nature biotechnology 26, 808 (2008).
12. Santiago, Y. et al. Targeted gene knockout in mammalian cells by using engineered zinc-finger nucleases. P Natl Acad Sci Usa 105, 5809–14 (2008).
13. Holt, N. et al. Human hematopoietic stem/progenitor cells modified by zinc-finger nucleases targeted to CCR5 control HIV-1 in vivo. Nature biotechnology 28, 839–47 (2010).
14. Lin, S., Staahl, B. T., Alla, R. K. & Doudna, J. A. Enhanced homology-directed human genome engineering by controlled timing of CRISPR/Cas9 delivery. Elife 3, e04766 (2014).
15. Lomova, A. et al. Improving Gene Editing Outcomes in Human Hematopoietic Stem and Progenitor Cells by Temporal Control of DNA Repair. Stem Cells Dayt Ohio 37, 284–294 (2018).
16. Charpentier, M. et al. CtIP fusion to Cas9 enhances transgene integration by homology-dependent repair. Nature communications 9, 1133 (2018).
17. Tran, N.-T. et al. Enhancement of Precise Gene Editing by the Association of Cas9 With Homologous Recombination Factors. Frontiers Genetics 10, 365 (2019).
18. Gutschner, T., Haemmerle, M., Genovese, G., Draetta, G. F. & Chin, L. Post-translational Regulation of Cas9 during G1 Enhances Homology-Directed Repair. Cell Reports 14, 1555–66 (2016).
19. Yeh, C. D., Richardson, C. D. & Corn, J. E. Advances in genome editing through control of DNA repair pathways. Nature cell biology 21, 1468–1478 (2019).
20. Shao, S. et al. Enhancing CRISPR/Cas9-mediated homology-directed repair in mammalian cells by expressing Saccharomyces cerevisiae Rad52. Int J Biochem Cell Biology 92, 43–52 (2017).
21. Liu, M. et al. Methodologies for Improving HDR Efficiency. Frontiers Genetics 9, 691 (2019).
22. Bak, R. O., Dever, D. P. & Porteus, M. H. CRISPR/Cas9 genome editing in human hematopoietic stem cells. Nature protocols 13, 358–376 (2018).
23. Elliott, B., Richardson, C., Winderbaum, J., Nickoloff, J. A. & Jasin, M. Gene conversion tracts from double-strand break repair in mammalian cells. Molecular and cellular biology 18, 93–101 (1998).
24. Paquet, D. et al. Efficient introduction of specific homozygous and heterozygous mutations using CRISPR/Cas9. Nature 533, 125 (2016).
25. Liang, X., Potter, J., Kumar, S. & of …, R. N. Enhanced CRISPR/Cas9-mediated precise genome editing by improved design and delivery of gRNA, Cas9 nuclease, and donor DNA. (2017).
26. DeWitt, M. A. et al. Selection-free genome editing of the sickle mutation in human adult hematopoietic stem/progenitor cells. Science translational medicine 8, 360ra134 (2016).
27. Mojica, F. J. M., Díez-Villaseñor, C., García-Martínez, J. & Almendros, C. Short motif sequences determine the targets of the prokaryotic CRISPR defence system. Microbiology 155, 733–740 (2009).
28. Wang, B. Tips for designing CRISPR-Cas9 mediated HDR experiments. at <https://www.idtdna.com/pages/education/decoded/article/crispr-cas9-mediated-hdr-tips-for-successful-experimental-design>
29. Chang, H. H. et al. Different DNA End Configurations Dictate Which NHEJ Components Are Most Important for Joining Efficiency. J Biol Chem 291, 24377–24389 (2016).
30. Brinkman, E. K. et al. Kinetics and Fidelity of the Repair of Cas9-Induced Double-Strand DNA Breaks. Molecular cell 70, 801-813.e6 (2018).
31. Pavel-Dinu, M. et al. Gene correction for SCID-X1 in long-term hematopoietic stem cells. Nature communications 10, 1634 (2019).
32. Allen, F. et al. Predicting the mutations generated by repair of Cas9-induced double-strand breaks. Nature biotechnology 37, 64–72 (2018).
33. Chakrabarti, A. M. et al. Target-Specific Precision of CRISPR-Mediated Genome Editing. Molecular cell 73, 699-713.e6 (2018).
34. Chen, W. et al. Massively parallel profiling and predictive modeling of the outcomes of CRISPR/Cas9-mediated double-strand break repair. Nucleic Acids Research (2019) doi:10.1093/nar/gkz487.
35. Leenay, R. T. et al. Large dataset enables prediction of repair after CRISPR-Cas9 editing in primary T cells. Nature biotechnology 37, 1034–1037 (2019).
36. Shen, M. W. et al. Predictable and precise template-free CRISPR editing of pathogenic variants. Nature 563, 646–651 (2018).
37. Taheri-Ghahfarokhi, A. et al. Decoding non-random mutational signatures at Cas9 targeted sites. Nucleic acids research 46, 8417–8434 (2018).
38. Overbeek, M. van et al. DNA Repair Profiling Reveals Nonrandom Outcomes at Cas9-Mediated Breaks. Molecular cell 63, 633–646 (2016).
39. Silva, J. da et al. Genome-scale CRISPR screens are efficient in non-homologous end-joining deficient cells. Scientific Reports 9, 15751 (2019).
40. Guo, T. et al. Harnessing accurate non-homologous end joining for efficient precise deletion in CRISPR/Cas9-mediated genome editing. Genome biology 19, 170 (2018).
41. Iyer, S. et al. Precise therapeutic gene correction by a simple nuclease-induced double-stranded break. Nature 568, 561–565 (2019).
42. Grajcarek, J. et al. Genome-wide microhomologies enable precise template-free editing of biologically relevant deletion mutations. Nat Commun 10, 4856 (2019).
43. Ata, H. et al. Robust Activation of Microhomology-mediated End Joining for Precision Gene Editing Applications. Plos Genet 14, e1007652 (2018).
44. Truong, L. N. et al. Microhomology-mediated End Joining and Homologous Recombination share the initial end resection step to repair DNA double-strand breaks in mammalian cells. Proceedings of the National Academy of Sciences 110, 7720–7725 (2013).
45. Han, L. & Yu, K. Altered kinetics of nonhomologous end joining and class switch recombination in ligase IV-deficient B cells. J Exp Medicine 205, 2745–53 (2008).
46. McVey, M. & Lee, S. MMEJ repair of double-strand breaks (director’s cut): deleted sequences and alternative endings. Trends in Genetics 24, 529–538 (2008).
47. Kochan, J. A. et al. Meta-analysis of DNA double-strand break response kinetics. Nucleic Acids Res 45, 12625–12637 (2017).
48. Hsiau, T. et al. Inference of crispr edits from sanger trace data. BioRxiv 251082 (2019) doi:10.1101/251082.
49. Ran, A. F. et al. In vivo genome editing using Staphylococcus aureus Cas9. Nature 520, 186–91 (2015).
50. Renaud, J.-B. B. et al. Improved Genome Editing Efficiency and Flexibility Using Modified Oligonucleotides with TALEN and CRISPR-Cas9 Nucleases. Cell reports 14, 2263–2272 (2016).
51. Paix, A. et al. Precision genome editing using synthesis-dependent repair of Cas9-induced DNA breaks. Proceedings of the National Academy of Sciences of the United States of America 114, E10745–E10754 (2017).
52. Richardson, C. D., Ray, G. J., DeWitt, M. A., Curie, G. L. & Corn, J. E. Enhancing homology-directed genome editing by catalytically active and inactive CRISPR-Cas9 using asymmetric donor DNA. Nature biotechnology 34, 339–44 (2016).
53. Song, F. & Stieger, K. Optimizing the DNA Donor Template for Homology-Directed Repair of Double-Strand Breaks. Mol Ther Nucleic Acids 7, 53–60 (2017).
54. Savic, N. et al. Covalent linkage of the DNA repair template to the CRISPR-Cas9 nuclease enhances homology-directed repair. Elife 7, e33761 (2018).
55. Romero, Z. et al. Editing the Sickle Cell Disease Mutation in Human Hematopoietic Stem Cells: Comparison of Endonucleases and Homologous Donor Templates. Molecular therapy : the journal of the American Society of Gene Therapy (2019) doi:10.1016/j.ymthe.2019.05.014.
56. Gasiunas, G., Barrangou, R., Horvath, P. & Siksnys, V. Cas9–crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria. Proceedings of the National Academy of Sciences 109, E2579–E2586 (2012).
57. Zuo, Z. & Liu, J. Cas9-catalyzed DNA Cleavage Generates Staggered Ends: Evidence from Molecular Dynamics Simulations. Scientific reports 5, 37584 (2016).
58. Gisler, S. et al. Multiplexed Cas9 targeting reveals genomic location effects and gRNA-based staggered breaks influencing mutation efficiency. Nature Communications 10, 1598 (2019).
59. Shou, J., Li, J., Liu, Y. & Wu, Q. Precise and Predictable CRISPR Chromosomal Rearrangements Reveal Principles of Cas9-Mediated Nucleotide Insertion. Molecular cell 71, 498-509.e4 (2018).
60. Lemos, B. R. et al. CRISPR/Cas9 cleavages in budding yeast reveal templated insertions and strand-specific insertion/deletion profiles. Proceedings of the National Academy of Sciences of the United States of America 115, E2040–E2047 (2018).
61. Shi, X. et al. Cas9 has no exonuclease activity resulting in staggered cleavage with overhangs and predictable di- and tri-nucleotide CRISPR insertions without template donor. Cell discovery 5, 53 (2019).
62. Wu, X., Wilson, T. E. & Lieber, M. R. A role for FEN-1 in nonhomologous DNA end joining: the order of strand annealing and nucleolytic processing events. Proceedings of the National Academy of Sciences of the United States of America 96, 1303–8 (1999).
63. Wang, J. et al. Targeted gene addition to a predetermined site in the human genome using a ZFN-based nicking enzyme. Genome research 22, 1316–26 (2012).
64. Haeussler, M. et al. Evaluation of off-target and on-target scoring algorithms and integration into the guide RNA selection tool CRISPOR. Genome biology 17, 148 (2016).
65. Liu, G., Zhang, Y. & Zhang, T. Computational approaches for effective CRISPR guide RNA design and evaluation. Computational and structural biotechnology journal 18, 35–44 (2020).
66. Wang, J. et al. Homology-driven genome editing in hematopoietic stem and progenitor cells using ZFN mRNA and AAV6 donors. Nature biotechnology 33, 1256–1263 (2015).
67. Towers, G. J. & Noursadeghi, M. Interactions between HIV-1 and the cell-autonomous innate immune system. Cell host & microbe 16, 10–8 (2014).
68. McLaren, P. J. et al. Identification of potential HIV restriction factors by combining evolutionary genomic signatures with functional analyses. Retrovirology 12, 41 (2015).
69. McNatt, M. W. et al. Species-specific activity of HIV-1 Vpu and positive selection of tetherin transmembrane domain variants. PLoS pathogens 5, e1000300 (2009).
70. Gupta, R. K. et al. Mutation of a single residue renders human tetherin resistant to HIV-1 Vpu-mediated depletion. PLoS pathogens 5, e1000443 (2009).
71. Sauter, D. Counteraction of the multifunctional restriction factor tetherin. Frontiers in microbiology 5, 163 (2014).
72. Evans, D. T., Serra-Moreno, R., Singh, R. K. & Guatelli, J. C. BST-2/tetherin: a new component of the innate immune response to enveloped viruses. Trends in microbiology 18, 388–96 (2010).
73. Kuhl, B. D. D., Cheng, V., Wainberg, M. A. & Liang, C. Tetherin and its viral antagonists. Journal of neuroimmune pharmacology : the official journal of the Society on NeuroImmune Pharmacology 6, 188–201 (2011).
74. Yang, S. J., Lopez, L. A., Exline, C. M., Haworth, K. G. & Cannon, P. M. Lack of adaptation to human tetherin in HIV-1 group O and P. Retrovirology 8, 78 (2011).
75. Rong, L. et al. The transmembrane domain of BST-2 determines its sensitivity to down-modulation by human immunodeficiency virus type 1 Vpu. Journal of virology 83, 7536–46 (2009).
76. Kobayashi, T. et al. Identification of Amino Acids in the Human Tetherin Transmembrane Domain Responsible for HIV-1 Vpu Interaction and Susceptibility. Journal of Virology 85, 932–945 (2011).
77. Skasko, M. et al. HIV-1 Vpu protein antagonizes innate restriction factor BST-2 via lipid-embedded helix-helix interactions. The Journal of biological chemistry 287, 58–67 (2012).
78. Leahy, J. J. J. et al. Identification of a highly potent and selective DNA-dependent protein kinase (DNA-PK) inhibitor (NU7441) by screening of chromenone libraries. Bioorganic & medicinal chemistry letters 14, 6083–7 (2004).
79. Deyle, D. R., Li, L. B., Ren, G. & Russell, D. W. The effects of polymorphisms on human gene targeting. Nucleic acids research 42, 3119–24 (2014).
80. Rogers, G. L., Chen, H.-Y. Y., Morales, H. & Cannon, P. M. Homologous Recombination-Based Genome Editing by Clade F AAVs Is Inefficient in the Absence of a Targeted DNA Break. Molecular therapy : the journal of the American Society of Gene Therapy 27, 1726–1736 (2019).
81. Xiao, X., Li, J. & Samulski, R. J. Production of high-titer recombinant adeno-associated virus vectors in the absence of helper adenovirus.
Journal of virology 72, 2224–32 (1998).
Abstract
Part I. Homology-directed repair (HDR) of a DNA break allows genetic material to be copied from an exogenous DNA template and is frequently exploited in CRISPR/Cas9 genome editing. However, HDR competes with other DNA repair pathways, including non-homologous end joining (NHEJ) and microhomology-mediated end joining (MMEJ), and HDR efficiency is not predictable. Consequently, optimizing HDR editing requires evaluating panels of CRISPR/Cas9 guide RNAs (gRNAs) and matched homology templates. We report here that CRISPR/Cas9 indel signatures can instead be used to identify gRNAs that maximize HDR outcomes. Specifically, we show that the frequency of indels derived from MMEJ repair, defined as deletions of 3 bp or more, predicts HDR frequency better than total indel frequency does. We further demonstrate that tools that predict gRNA indel signatures can be repurposed to identify gRNAs that promote HDR. Finally, by comparing indels generated by S. aureus and S. pyogenes Cas9 targeted to the same site, we add to the growing body of evidence that the targeted DNA sequence is a major factor governing genome editing outcomes.

Part II. Cellular restriction factors potently inhibit cross-species viral transmission but fail to protect against viruses that have evolved with the host. In primate lentiviruses, restriction factors are often antagonized by viral accessory proteins that target them for degradation. For example, the host factor tetherin is normally countered by HIV-1 Vpu; in the absence of Vpu, tetherin physically retains budding HIV-1 virions at the cell surface, activates NF-κB, and may enhance viral clearance by antibody-dependent cellular cytotoxicity (ADCC).

We are developing a genome editing strategy to create a Vpu-resistant form of human tetherin as a potential anti-HIV gene therapy.
We screened a panel of human tetherin mutants, informed by Vpu-resistant primate orthologs, to identify those with the desired phenotype: (a) a normal ability to restrict HIV-1 budding and (b) resistance to Vpu. The most effective mutant, ΔGI/T45I, was previously described by McNatt et al. (2009) and carries a bipartite change: a 6 bp deletion removing amino acids 27 and 28 (ΔGI) and a single-base substitution altering amino acid 45 (T45I).

Genome editing typically involves co-delivering to a cell a targeted nuclease that introduces a site-specific DNA break and a matched DNA homology template carrying the desired mutations, which serves as the repair template. We developed and optimized tetherin-specific ZFN and CRISPR/Cas9 reagents paired with AAV6-based homology templates. Using these tools, we can achieve highly efficient ΔGI/T45I gene editing at the tetherin locus in both CD34+ hematopoietic stem and progenitor cells and CD4+ T cells. Subsequent engraftment of these edited primary cells in immunodeficient mice, followed by HIV-1 challenge, will be used to test whether ΔGI/T45I tetherin can form the basis of an anti-HIV gene therapy in vivo.
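The Part I selection rule can be illustrated with a minimal Python sketch. It assumes each candidate gRNA has an indel spectrum expressed as indel size mapped to the fraction of alleles (for example, as estimated by Sanger trace decomposition, ref. 48). The names `IndelSignature` and `rank_guides_for_hdr`, and the toy frequencies, are hypothetical illustrations, not code or data from the dissertation. The sketch scores deletions of 3 bp or more as MMEJ-type, as stated in the abstract, and ranks gRNAs by that frequency instead of by total indel frequency.

```python
from __future__ import annotations

from dataclasses import dataclass

# Cut-off taken from the abstract: deletions of 3 bp or more are scored as
# MMEJ-derived. All other edited alleles are lumped together as "other indels".
MMEJ_MIN_DELETION_BP = 3


@dataclass
class IndelSignature:
    """Hypothetical per-gRNA indel spectrum.

    Keys are indel sizes in bp (negative = deletion, positive = insertion,
    0 = unedited); values are the fraction of alleles with that outcome.
    """

    guide: str
    size_to_fraction: dict[int, float]

    def total_indel_frequency(self) -> float:
        # Every non-zero indel size counts as an edited allele.
        return sum(f for size, f in self.size_to_fraction.items() if size != 0)

    def mmej_frequency(self) -> float:
        # Only deletions at or above the cut-off count as MMEJ-type.
        return sum(
            f
            for size, f in self.size_to_fraction.items()
            if size <= -MMEJ_MIN_DELETION_BP
        )


def rank_guides_for_hdr(signatures: list[IndelSignature]) -> list[str]:
    """Order gRNAs by MMEJ-type indel frequency, highest first."""
    ranked = sorted(signatures, key=lambda s: s.mmej_frequency(), reverse=True)
    return [s.guide for s in ranked]


if __name__ == "__main__":
    # Invented toy spectra, not data from the dissertation.
    guides = [
        IndelSignature("gRNA-A", {0: 0.30, 1: 0.45, -1: 0.15, -4: 0.10}),
        IndelSignature("gRNA-B", {0: 0.40, 1: 0.05, -2: 0.05, -6: 0.30, -9: 0.20}),
    ]
    # gRNA-A has more total indels (0.70 vs 0.60), but gRNA-B has more
    # MMEJ-type deletions (0.50 vs 0.10), so it is ranked first.
    print(rank_guides_for_hdr(guides))  # -> ['gRNA-B', 'gRNA-A']
```

In this toy case, ranking by total indel frequency would favor gRNA-A, whereas the MMEJ-type criterion favors gRNA-B. Any such ranking would still need to be checked against measured HDR frequencies for the locus in question.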
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Characterizing and manipulating homology-directed gene editing in human cells
Molecular characterization of the HIV-1 Vpu protein and its role in antagonizing the cellular restriction factor BST-2/tetherin both in vitro and in vivo
Pseudotyped viral vectors: HIV gene therapy applications and basic studies of SARS-COV-2
Improving adeno-associated viral vector for hematopoietic stem cells gene therapy
Design and characterization of multiplex anti-HIV single domain antibodies for genome editing of the immunoglobulin locus
Mechanisms of nucleases in non-homologous DNA end joining
Role of the bone marrow niche components in B cell malignancies
Discovery of mature microRNA sequences within the protein-coding regions of global HIV-1 genomes: Predictions of novel mechanisms for viral infection and pathogenicity
Mechanistic basis for chromosomal translocations at the E2A gene
Asset Metadata
Creator
Tatiossian, Kristina Jasmine (author)
Core Title
Rational selection of CRISPR/Cas9 guide RNAs for homology directed genome editing and its utility in the development of gene therapies
School
Keck School of Medicine
Degree
Doctor of Philosophy
Degree Program
Medical Biology
Publication Date
04/20/2020
Defense Date
03/16/2020
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
CD4 T cells,CRISPR/Cas9,DNA repair pathways,gene editing,gene therapy,HDR,hematopoietic stem and progenitor cells,HSCs,humanized mice,MMEJ,NHEJ,OAI-PMH Harvest,restriction factors,tetherin,ZFN
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Feng, Pinghui (committee chair), Cannon, Paula (committee member), Lu, Rong (committee member)
Creator Email
kristatios@gmail.com,tatiossi@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c89-282044
Unique identifier
UC11674747
Identifier
etd-Tatiossian-8265.pdf (filename),usctheses-c89-282044 (legacy record id)
Legacy Identifier
etd-Tatiossian-8265.pdf
Dmrecord
282044
Document Type
Dissertation
Rights
Tatiossian, Kristina Jasmine
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA