Ja2c10656 Si 001
Madden2,
David Tannahill1, Shankar Balasubramanian1,2,3*
1Cancer Research UK Cambridge Institute, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK.
2Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK
3School of Clinical Medicine, University of Cambridge, Cambridge, CB2 0SP, UK
*Correspondence: sb10031@cam.ac.uk (S.B.)
Materials and Methods
Phage display antibody selection
A naive synthetic library, NaLi-H1 (Nanobody Library-Humanized 1), was screened against the G4 structure MycG4 for the in vitro selection
of a G4-specific nanobody by Hybrigenics Services SAS (3-5 Impasse Reille, 75014 Paris, France,
www.hybrigenics-services.com). The biotinylated oligonucleotides were folded in 10 mM Tris pH 7.4, 100 mM KCl, and 0.1%
Tween-20, and annealed by heating to 95 °C for 10 minutes followed by slow cooling to 23 °C (sequences reported in the table below).
The screening was performed in three rounds of phage display against MycG4 (50 nM in the first round, 10 nM in the second and third
rounds). Before each round of selection, nanobodies were counterselected by incubating phages displaying the nanobodies with 50 nM
dsDNA and ssDNA bound to streptavidin magnetic beads, to deplete the library of non-specific binders. During the second and third
rounds, the nanobodies were presented to 10 nM MycG4 in the presence of 5 nM and 10 nM of a mix of competitors (yeast tRNA, salmon
sperm DNA, random primer single-stranded DNA). The nanobody is cloned in the pHEN2 phagemid, which also carries the sequences for a
6xHis and a triple MYC tag. We changed the tag of the novel G4-binding nanobody to 3xFLAG to have a probe comparable to the scFv
BG4, which is also FLAG-tagged. The plasmid was digested with NotI and BglII (NEB) and the 3xFLAG tag was inserted by Gibson Assembly
according to the manufacturer's protocol (Gibson Assembly Master Mix, E2611S, New England Biolabs). The dead mutant mSG4 was
generated by mutation of a single amino acid to alanine through site-directed mutagenesis (Q5 site-directed mutagenesis kit, E0554S,
New England Biolabs).
CD spectroscopy
CD spectroscopy was performed using a Chirascan CD spectropolarimeter (Applied Photophysics). Scans of the DNA oligonucleotides
reported in Table 1 were performed in triplicate at a concentration of 10 Tris HCl, 100 mM KCl, pH 7.4), in the range of
wavelengths from 220 to 340 nm in a 1 mm cuvette, with readings taken every 1 nm with 1 second per point. In CD melting experiments,
DNA oligonucleotides were analyzed at the concentration of 10 mM lithium cacodylate, pH 7.4), in three
replicates. The temperature was increased from 20 to 92 °C in smooth ramp mode at a ramp rate of 1 °C per min with 1 second per point.
The nanobody SG4 WT and the mSG4 mutants (R105A, R107A, R56A) were scanned in triplicate at 10 concentration (PBS pH 7.4)
at 30 °C in a 1 mm cuvette between 200 and 280 nm, with readings taken every 0.5 nm with 1 second per point. Data were smoothed using
Chirascan software and plotted with GraphPad Prism 9.
SG4 eukaryotic expression
For cellular expression, an SG4 nanobody fusion with GFP carrying an SV40 nuclear localization sequence (NLS) and a FLAG tag (see
map) was synthesized as a gBlock Gene Fragment (Integrated DNA Technologies) and inserted into the PiggyBac transposon in the
pCLIIPI-BP eukaryotic expression vector (a gift from M. Narita, CRUK CI, UK) (3) using NEBuilder HiFi DNA assembly (NEB
E5520S). pCLIIPI-BP carries a doxycycline-inducible promoter and a puromycin cassette. Plasmid DNA was prepared using a Plasmid
Plus Midi Kit (Qiagen 12943). Plasmid DNA and PiggyBac transposase (mPB) were transfected into HEK293T cells using TransIT-
293 (Mirus Bio MIR 27
puromycin for 14 days. Cells were maintained in DMEM supplemented with 10% tetracycline-free FBS (biosera FB-1001T) and
expression of the SG4-GFP fusion was achieved by addition of 1 72 hours. Live cells were then imaged for GFP
on a Leica SP5 confocal microscope at 40x magnification. Binding of the SG4-GFP fusion protein to the MYC or KRAS promoter was
assayed by CUT&Tag, with three technical replicates (different cells from the same flask) for each of two biological replicates (different
passages). Bulk CUT&Tag was performed as previously described (14), with the exception that anti-FLAG (Cell Signaling 2368;
1:100) was used as a primary antibody. Peaks were visualized on the IGV genome browser.
SG4 nanobody - SV40 nuclear localization sequence - Linker - GFP - FLAG epitope tag
Structure prediction
The predicted structures of the nanobodies were generated using ColabFold (AlphaFold2 using MMseqs2) (7), with Amber used to relax the
primary output structures. The 5 relaxed structures produced then underwent 1 s long molecular dynamics simulations.
Molecular dynamics
Molecular dynamics simulations of both the nanobodies and the complexes were performed using GROMACS 2021 (8) with the amber14sb bsc1
DNA force field. The solute was placed in a cubic box with periodic boundary conditions, at least 1 nm away from the boundary. TIP3P water
was added along with NaCl to simulate a concentration of 50 nM. Initial minimisation was carried out to at least 1000 kJ mol⁻¹ nm⁻¹ or
50 000 steps, followed by heating and NVT equilibration for 1000 ps using the V-rescale (modified Berendsen) thermostat at 310 K. All
simulations used a 2 fs time step, Parrinello-Rahman pressure coupling, and PME electrostatics with a 1.0 nm cut-off. PCA and TICA plots
were produced with PyEMMA 2.5.7 (9).
Docking
Docking was performed using High Ambiguity Driven protein-protein DOCKing (HADDOCK 2.4) (10) on the WeNMR-EOSC
Ecosystem (11), between the first 10 NMR structures of MycG4 and 10 structures for each nanobody, specifying the whole length of
the DNA as active residues and the CDRs for the nanobodies. All other parameters were set to the defaults for nucleic acid-protein interaction.
The docking structures proposed by HADDOCK underwent a further 500 ns of MD with the same parameters as before (KCl
instead of NaCl). gmx_MMPBSA (molecular mechanics Poisson-Boltzmann surface area) (12) was used to analyze the resulting
trajectories and the contribution of each residue during the above simulations (MM/GBSA calculations).
[Figure S1 image: SDS-PAGE gel. Lanes: MWM (kDa: 250, 150, 100, 75, 50, 37, 25, 20, 15), SG4, mSG4.]
Figure S1. SDS-PAGE of the purified nanobodies SG4 and mutant mSG4 R105A (lanes 2 and 3) and the molecular weight marker (lane 1), stained
with InstantBlue Coomassie stain. The nanobodies ran at around 18 kDa, as predicted. The molecular weights of the protein
ladder in lane 1 are shown on the left-hand side (kDa).
[Figure S2B plot: folded fraction (%) versus temperature (°C, 20 to 100) for MycG4, Kit1G4, VegfG4, hTeloG4, and TbaG4.]
Figure S2. (A) Structures of the oligonucleotides employed in ELISAs analyzed in the presence of 100 mM KCl, at a concentration of
-HCl buffer using CD spectroscopy (220-340 nm). The spectra of MycG4, Kit1G4, and VegfG4 are typical of a parallel
G-quadruplex with a positive peak at ~260 nm and a negative peak at ~240 nm. The spectrum of TbaG4 indicates the formation of an
antiparallel G-quadruplex with positive peaks at ~250 nm and ~290 nm and a negative peak at ~260 nm. The spectrum of hTeloG4
is typical of a hybrid parallel/antiparallel G-quadruplex with a positive peak at ~290 nm and a negative peak at ~240 nm. The spectra
of ssDNA and 8-aza-7-deazaguanine, used as negative controls, show a positive peak at ~280 nm and a negative peak at ~250 nm, not
corresponding to any G-quadruplex topology. Units are measured in molar ellipticity. (B) Normalized CD melting temperature curves
of oligonucleotides folded into G4s at the concentration of in 10 mM KCl 10 mM lithium cacodylate. MycG4, Kit1G4, and
VegfG4 (parallel G4s) were analyzed at 264 nm, while hTeloG4 and TbaG4 at 295 nm, according to the maximum peak observed in
the CD spectrum analysis.
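
The normalization behind curves such as those in panel B is not described in this document. The following is a minimal sketch of one common approach, assuming a simple min-max scaling of the CD signal at the monitored wavelength and a linear interpolation of the 50% crossing as the melting temperature. The function names and the example readings are hypothetical and are not taken from the study.

```python
def folded_fraction(signal):
    """Min-max normalize a CD melting signal to a folded fraction in percent.

    Assumption: the highest reading is fully folded, the lowest fully unfolded.
    """
    lo, hi = min(signal), max(signal)
    if hi == lo:
        raise ValueError("signal is constant; cannot normalize")
    return [100.0 * (s - lo) / (hi - lo) for s in signal]


def melting_temperature(temps, fractions, level=50.0):
    """Linearly interpolate the temperature where the curve first crosses `level`."""
    points = list(zip(temps, fractions))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        # A sign change (or an exact hit) between two readings marks the crossing.
        if (f0 - level) * (f1 - level) <= 0 and f0 != f1:
            return t0 + (level - f0) * (t1 - t0) / (f1 - f0)
    return None


# Hypothetical ellipticity readings (arbitrary units) for a parallel G4 at 264 nm.
temps = [20, 40, 60, 70, 80, 90]
cd_264 = [10.0, 10.0, 9.0, 6.0, 3.0, 2.0]
fractions = folded_fraction(cd_264)
tm = melting_temperature(temps, fractions)
print(fractions, tm)
```

A two-state fit with sloping baselines would be more rigorous than min-max scaling; the sketch only illustrates how a raw signal maps onto a 0 to 100% axis.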
[Figure S3 plot: absorbance (450 nm) versus log[nM] (0.1 to 1000). Legend, Kd (nM): B11 R101A, 4.0 ± 0.5; B11 R105A, ND; B11 R107A, ND.]
Figure S3. ELISA binding curves of four different SG4 mutants to MycG4: R101A (red), R105A (pink), R107A (blue), and H109A
(green). Dissociation constants (Kd) are indicated in nanomolar; where they could not be determined, ND is shown. Error bars
represent the standard error of the mean (s.e.m.) calculated from two replicates.
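
The fitting model used to obtain the Kd values is not stated in this document. As an illustration only, the sketch below assumes a one-site binding model, A = Bmax · x / (Kd + x), and fits it by scanning Kd on a logarithmic grid while solving for Bmax in closed form. The function names, grid bounds, and synthetic data are hypothetical; the s.e.m. helper uses the sample standard deviation.

```python
def one_site(x, bmax, kd):
    """One-site binding model: signal as a function of ligand concentration x."""
    return bmax * x / (kd + x)


def fit_one_site(conc, signal, kd_min=0.01, kd_max=10000.0, steps=6000):
    """Least-squares fit by log-spaced grid search over Kd.

    For each candidate Kd the best Bmax has a closed form, so only Kd is scanned.
    Returns (bmax, kd).
    """
    best = None
    for i in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / steps)
        f = [x / (kd + x) for x in conc]
        bmax = sum(y * fi for y, fi in zip(signal, f)) / sum(fi * fi for fi in f)
        sse = sum((y - bmax * fi) ** 2 for y, fi in zip(signal, f))
        if best is None or sse < best[0]:
            best = (sse, bmax, kd)
    return best[1], best[2]


def sem(values):
    """Standard error of the mean, using the sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return (variance / n) ** 0.5


# Synthetic, noise-free data generated from the model itself (not measured values).
conc_nm = [0.1, 1.0, 10.0, 100.0, 1000.0]
absorbance = [one_site(x, 1.2, 4.0) for x in conc_nm]
bmax_fit, kd_fit = fit_one_site(conc_nm, absorbance)
print(bmax_fit, kd_fit)
```

When the signal never approaches saturation, as for the mutants marked ND, a fit of this kind does not constrain Kd, which is consistent with reporting it as not determined.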
Figure S4. (A) Overlap of all relaxed predicted structures of SG4 (blue) and mSG4-R105A (green). (B) AlphaFold2 similarity score
and sequence coverage (PAE) of the SG4 nanobody produced during the AlphaFold2 prediction. (C) Overlap of SG4 nanobody structures
every 10 ns of molecular dynamics for 300 ns, with CDR3 in red sticks demonstrating the flexibility of the region. (D) Representative complexes
of the two clusters resulting from the docking experiments and used in further molecular dynamics. (E) R105A (left) and H109A (right):
binding energy delta and per-component decomposition of the energy contribution at residues 105 and 109.
[Figure S5 plots: (A) mSG4 R56A, molar ellipticity (deg*cm2*dmol-1) versus wavelength (nm, 210 to 280); (B) mSG4 R56A binding curve versus log[nM] (0.1 to 1000), Kd: ND.]
Figure S5. (A) mSG4 R56A secondary structure determined by CD spectroscopy (200-280 nm), showing a characteristic β-sheet with
a negative peak at 218 nm and a positive peak at 200 nm. Units are measured in molar ellipticity. (B) ELISA binding curve of SG4
mutant R56A to MycG4. The dissociation constant (Kd) could not be determined (ND). Error bars represent the standard error of the
mean (s.e.m.) calculated from two replicates.
Figure S6. Bar plots of the qPCR enrichment and of the percentage of input DNA recovered
(% Input) by either SG4 ChIP or mSG4 R105A ChIP at the same genomic regions: four different G4-positive controls
(RBBP4, SIRT4, RPA3, MAZ) compared with a G4-negative region (TMCC1), indicated by individual bars. The left group of bars corresponds to
SG4 ChIP-seq and the right group to mSG4 R105A ChIP in K562.
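
The exact calculation behind the % Input and enrichment values is not given in this document. The sketch below shows the widely used percent-input method for ChIP-qPCR as an assumption: the input Ct is adjusted for the fraction of chromatin kept as input, and 100% amplification efficiency is assumed. The function names, Ct values, and the 1% input fraction are hypothetical.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction):
    """Percent of input recovered in a ChIP sample.

    input_fraction is the fraction of chromatin saved as input (e.g. 0.01 for 1%).
    Assumes perfect doubling per cycle (100% primer efficiency).
    """
    # Shift the input Ct to what it would be if 100% of the chromatin were measured.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


def fold_over_negative(pct_target, pct_negative):
    """Enrichment of a G4-positive region relative to a G4-negative region."""
    return pct_target / pct_negative


# Hypothetical Ct values for one positive region and one negative region.
positive = percent_input(ct_ip=24.0, ct_input=25.0, input_fraction=0.01)
negative = percent_input(ct_ip=27.0, ct_input=25.0, input_fraction=0.01)
print(positive, negative, fold_over_negative(positive, negative))
```

With these made-up numbers, a three-cycle difference between the two ChIP samples corresponds to an eight-fold enrichment of the positive region over the negative one.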
[Figure S7 image: Tapestation trace, sizes in [bp]. Lanes: Ladder, Input, SG4 (Rep1, Rep2, Rep3, Rep4), mSG4 R105A (Rep1, Rep2, Rep3).]
Figure S7. Trace of the fragment sizes in bp of DNA fragments pulled down by either SG4 or mSG4 R105A ChIP (Tapestation).
[Figure S8 Venn diagrams, values as extracted:
K562 (A, B): OQs (749,299); SG4 ChIP peaks (10,197); overlap with OQs 5,531 (54.24%); overlap with ATAC-seq 9,970 (97.78%).
U2OS (C, D): OQs (749,299); ATAC-seq (41,062); SG4 ChIP peaks (19,453); overlap with OQs 10,621 (54.60%); overlap with ATAC-seq 10,736 (98.78%).]
Figure S8. Venn diagrams of the overlap of SG4 regions identified through ChIP-seq (SG4 ChIP-seq peaks) with sequences previously
identified as capable of folding into a G4 structure in vitro (so-called observed G4 sequences, referred to as OQs) (4) in K562 (A)
and in U2OS (C). Venn diagrams of the overlap of SG4 ChIP-seq peaks with accessible chromatin sites identified through Assay for
Transposase-Accessible Chromatin with high-throughput sequencing (ATAC-seq peaks) in K562 (B) and in U2OS (D) (1). The total
numbers of regions identified by SG4, OQs, and ATAC-seq, and the number and percentage of SG4 peaks overlapping with
OQs or ATAC-seq peaks, are reported for both cell lines.
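
The overlap percentages reported for the OQ comparisons correspond to the number of overlapping SG4 peaks divided by the total number of SG4 peaks. The sketch below illustrates that arithmetic together with a naive interval-overlap count. The tool actually used for the intersection is not named in this document, so the functions, the half-open interval convention, and the toy coordinates are assumptions.

```python
def count_overlapping(peaks, features):
    """Count peaks that overlap at least one feature on the same chromosome.

    Intervals are (chrom, start, end) and treated as half-open: [start, end).
    """
    by_chrom = {}
    for chrom, start, end in features:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = 0
    for chrom, start, end in peaks:
        candidates = by_chrom.get(chrom, [])
        if any(start < f_end and f_start < end for f_start, f_end in candidates):
            hits += 1
    return hits


def overlap_percentage(n_overlapping, n_total):
    """Percentage of peaks that overlap the feature set."""
    return 100.0 * n_overlapping / n_total


# Toy coordinates (hypothetical): only the first peak overlaps a feature.
toy_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
toy_features = [("chr1", 150, 160), ("chr2", 200, 300)]
toy_hits = count_overlapping(toy_peaks, toy_features)

# Counts taken from the Figure S8 OQ panels.
k562_oq = round(overlap_percentage(5531, 10197), 2)
u2os_oq = round(overlap_percentage(10621, 19453), 2)
print(toy_hits, k562_oq, u2os_oq)
```

The pairwise scan is quadratic and only suitable for small examples; genome-scale peak sets would normally be intersected with a sorted sweep or a dedicated interval tool.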
[Figure S9A plots: fold enrichment and proportion (%) for K562 and U2OS across the genomic features First Exon, All Exons, Intergenic, TSS±1kb, Introns, 5' UTR, 3' UTR, and GeneBody.]
Figure S9. (A) Fold enrichment over random (black bars) and proportion (grey) of SG4 ChIP-seq consensus regions across different
genomic features in K562 and U2OS cell lines. (B) Genome browser screenshots of one of the three technical replicates per biological
replicate of SG4 ChIP-seq (red) and input (grey) tracks at the promoters of MYC and KRAS genes in K562 and U2OS. OQs and
consensus regions across two of the three SG4 ChIP-seq replicates are shown. (C) Total number of regions showing G4 structural
motifs, percentage in brackets of G4 structural motifs detected by SG4 in K562 and U2OS. Loop sizes 1-3 (dark blue), 4-5 (red), 6-7
(green) are G4s that have one or more loops of this length in their motif; a G4 with at least one loop longer than 7 bases is referred to as a
long loop (purple); a G4 is called a simple bulge (light blue) when it contains a bulge of 1-7 bases; 2-tetrads/complex bulge (orange) are
G4s composed of only two tetrads or with many 1-
described by using the categories previously mentioned.
Figure S10. Fold enrichment (obtained as actual observed occurrences over average of 10 randomizations) of G4 structural motifs as
defined in (4) described in Figure 3 detected by SG4 ChIP-seq in K562 and U2OS. Enrichment values above 1 (red dotted line) indicate
enrichment of the motif over random expectation.
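
The fold enrichment defined in this caption (observed occurrences divided by the average over 10 randomizations) can be sketched as follows. The randomization procedure itself is not described here, so the function takes the randomized counts as given; the names and the example counts are hypothetical.

```python
def fold_enrichment(observed, randomized_counts):
    """Observed occurrences divided by the mean count across randomizations."""
    if not randomized_counts:
        raise ValueError("at least one randomization is required")
    mean_random = sum(randomized_counts) / len(randomized_counts)
    if mean_random == 0:
        raise ValueError("mean of randomizations is zero; enrichment undefined")
    return observed / mean_random


def is_enriched(observed, randomized_counts, threshold=1.0):
    """True when the motif occurs more often than expected at random."""
    return fold_enrichment(observed, randomized_counts) > threshold


# Hypothetical counts for one motif class: 10 randomizations and one observation.
random_hits = [98, 102, 100, 95, 105, 101, 99, 100, 97, 103]
enrichment = fold_enrichment(300, random_hits)
print(enrichment, is_enriched(300, random_hits))
```

A value above 1 places the motif above the red dotted line described in the caption.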
1. Shen J, Varshney D, Simeone A, Zhang X, Adhikari S, Tannahill D, et al. Promoter G-quadruplex folding precedes transcription and is
controlled by chromatin. Genome Biol. 2021;22(143):1-14.
2. Hänsel-Hertsch R, Spiegel J, Marsico G, Tannahill D, Balasubramanian S. Genome-wide mapping of endogenous G-quadruplex DNA
structures by chromatin immunoprecipitation and high-throughput sequencing. Nat Protoc. 2018;13(3):551-64.
3. Kirschner K, Samarajiwa SA, Cairns JM, Menon S, Pérez-Mancera PA, Tomimatsu K, et al. Phenotype Specific Analyses Reveal Distinct
Regulatory Mechanism for Chronically Activated p53. PLoS Genet. 2015;11(3):1-28.
4. Chambers VS, Marsico G, Boutell JM, Di Antonio M, Smith GP, Balasubramanian S. High-throughput sequencing of DNA G-quadruplex
structures in the human genome. Nat Biotechnol. 2015;33(8):877-81.
5. Hui WWI, Simeone A, Zyner KG, Tannahill D, Balasubramanian S. Single-cell mapping of DNA G-quadruplex structures in human cancer
cells. Sci Rep. 2021;11(23641):1-7.
6. Meers MP, Tenenbaum D, Henikoff S. Peak calling by Sparse Enrichment Analysis for CUT&RUN chromatin profiling. Epigenetics
Chromatin. 2019;12(42):1-11.
7. Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. ColabFold: making protein folding accessible to all. Nat Methods.
2022;19:679-82.
8. Páll S, Zhmurov A, Bauer P, Abraham M, Lundborg M, Gray A, et al. Heterogeneous parallelization and acceleration of molecular dynamics
simulations in GROMACS. J Chem Phys. 2020;153(134110):1-15.
9. Scherer MK, Trendelkamp-Schroer B, Paul F, Pérez-Hernández G, Hoffmann M, Plattner N, et al. PyEMMA 2: A Software Package for
Estimation, Validation, and Analysis of Markov Models. J Chem Theory Comput. 2015;11(11):5525-42.
10. Van Zundert GCP, Rodrigues JPGLM, Trellet M, Schmitz C, Kastritis PL, Karaca E, et al. The HADDOCK2.2 Web Server: User-Friendly
Integrative Modeling of Biomolecular Complexes. J Mol Biol. 2016;428(4):720-5.
11. Honorato RV, Koukos PI, Jiménez-García B, Tsaregorodtsev A, Verlato M, Giachetti A, et al. Structural Biology in the Clouds: The
WeNMR-EOSC Ecosystem. Front Mol Biosci. 2021;8(729513):1-7.
12. Valdés-Tresanco MS, Valdés-Tresanco ME, Valiente PA, Moreno E. gmx_MMPBSA: A New Tool to Perform End-State Free Energy
Calculations with GROMACS. J Chem Theory Comput. 2021;17(10):6281-91.