Computing With DNA
Nov 28th 2009
Questions:
◦ Can any algorithm be simulated by means of DNA
computing?
◦ Is it possible to design a programmable molecular
computer?
Motivation
Unique features
◦ DNA used as data and code structures!
◦ Many ways of creating DNA computers
Usefulness
◦ Massive parallelism
◦ Smaller “hardware” size
◦ High energy efficiency
◦ More compact information storage
◦ May solve some combinatorial problems that are impractical for standard computers
Why do we need DNA computers?
Moore’s Law states that the number of transistors
on a silicon microprocessor doubles roughly every
two years.
One day this will no longer hold true, once
miniaturisation limits are reached. Intel scientists say
this will happen around the year 2018.
A successor to silicon is required.
The Beginning
Francis Crick and James D. Watson:
co-discoverers of the structure of
the DNA molecule in 1953
Molecular Biology of the Cell
Cellular structures and processes result from a complex
interaction network of biological molecules
Carbohydrates
Lipids
Proteins
DNA - Data and Code
DNA doesn't just encode proteins; it also carries
instructions on how the system should behave
Human and chimpanzee DNA is 98.5 percent identical
DNA of humans and mice is only around 60 percent
similar
Dense Information Storage
[Figure: size comparison of storage features (scale labels: 10 μm, 0.84 μm, 2 nm, 11 nm)]
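Each nucleotide takes one of four values, so it can carry 2 bits. A minimal sketch of packing bytes into bases, assuming a hypothetical mapping 00→A, 01→C, 10→G, 11→T; real DNA storage schemes also add error correction and avoid long runs of a single base, none of which is modelled here:

```python
# Hypothetical 2-bit mapping: 00->A, 01->C, 10->G, 11->T (any bijection would do).
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def decode(strand: str) -> bytes:
    """Invert encode(): every group of four bases becomes one byte."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    values = [BASES.index(base) for base in strand]
    return bytes(
        (values[i] << 6) | (values[i + 1] << 4) | (values[i + 2] << 2) | values[i + 3]
        for i in range(0, len(values), 4)
    )


print(encode(b"A"))            # CAAC
print(decode(encode(b"DNA")))  # b'DNA'
```

Four bases per byte means one byte occupies only four nucleotides of a single strand.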
DNA Molecules
Important features
◦ 3 basic parts for each nucleotide: phosphate group, deoxyribose sugar, nitrogenous base
◦ Sugar carbons are numbered 1’ to 5’
◦ 5’: unattached phosphate group, 3’: unattached
hydroxyl group
◦ Base pairs: A pairs with T, G pairs with C
DNA Bases
Important features
◦ Complementarity
◦ Purines vs. pyrimidines
◦ Hydrogen bonds
◦ Phosphodiester bonds
◦ Antiparallelism
◦ Natural direction
DNA Molecules
Simplest representation
◦ 5’ – CGTGTTCGAAGCCC – 3’
◦ 3’ – GCACAAGCTTCGGG – 5’
Important features
◦ Representation
◦ Complementarity
◦ Directionality
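Complementarity and antiparallelism can be checked mechanically. A minimal sketch, assuming strands are plain strings and, as on the slide, the top strand is written 5’→3’ and the bottom strand 3’→5’ directly underneath it:

```python
# Watson-Crick pairing rules: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Pair each base in place, so a 5'->3' input gives the partner read 3'->5'."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """The partner strand rewritten in the conventional 5'->3' direction."""
    return complement(strand)[::-1]


def is_perfect_duplex(top: str, bottom: str) -> bool:
    """top is written 5'->3', bottom 3'->5', both left-aligned."""
    return len(top) == len(bottom) and complement(top) == bottom


top = "CGTGTTCGAAGCCC"
print(complement(top))          # GCACAAGCTTCGGG (3'->5', as on the slide)
print(reverse_complement(top))  # GGGCTTCGAACACG (same strand, read 5'->3')
print(is_perfect_duplex(top, "GCACAAGCTTCGGG"))  # True
```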
Sticky ends
◦ 5’ – CGTGTTCGA – 3’
◦ 3’ – GCACA – 5’
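The sticky end is simply the stretch of the longer strand left without a partner. A small sketch, assuming (as in the example) that both strands start at the same left end and the bottom strand is the shorter one:

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sticky_end(top: str, bottom: str) -> str:
    """Return the unpaired single-stranded tail of the top strand.

    Assumes top is written 5'->3', bottom 3'->5', both starting at the same
    (left) end, and bottom is the shorter strand.
    """
    paired = len(bottom)
    if "".join(PAIR[b] for b in top[:paired]) != bottom:
        raise ValueError("strands are not complementary over the paired region")
    return top[paired:]


print(sticky_end("CGTGTTCGA", "GCACA"))  # TCGA, a 3' overhang on the top strand
```

Another fragment carrying the complementary single-stranded tail could anneal to this overhang.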
Manipulating DNA
1) Denaturation (melting)
2) Annealing (renaturation)
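Denaturation happens when a duplex is heated above its melting temperature (Tm), and annealing happens as it cools below it. For short oligonucleotides, a common rule of thumb is the Wallace rule, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C). The sketch applies it; it ignores salt and strand concentration, so the result is only a rough estimate:

```python
def wallace_tm(oligo: str) -> int:
    """Wallace rule: about 2 C per A/T base plus 4 C per G/C base (short oligos only)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


print(wallace_tm("TCGATT"))  # 16 (4 A/T bases, 2 G/C bases)
```

G–C pairs count double because they are held by three hydrogen bonds instead of two.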
Manipulating DNA
3) Polymerase extension
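◦ Primer annealed to the template:
5’ – TCGATT – 3’ (primer)
3’ – AGCTAACTT – 5’ (template)
◦ Polymerase adds G, complementary to the template’s C:
5’ – TCGATTG – 3’
3’ – AGCTAACTT – 5’
◦ Then A, complementary to T:
5’ – TCGATTGA – 3’
3’ – AGCTAACTT – 5’
◦ Then A again; the new strand now pairs with the whole template:
5’ – TCGATTGAA – 3’
3’ – AGCTAACTT – 5’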
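The extension steps above can be replayed in a few lines. A sketch, assuming the primer (5’→3’) and template (3’→5’) are left-aligned strings as on the slide; the function name is hypothetical:

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def polymerase_extend(primer: str, template: str) -> list[str]:
    """Grow the primer one base at a time along the template.

    primer is written 5'->3'; template is written 3'->5' and left-aligned with
    the primer, as on the slide. Returns the primer after each added base.
    """
    if any(PAIR[p] != t for p, t in zip(primer, template)):
        raise ValueError("primer is not annealed to the template")
    strand, steps = primer, []
    for base in template[len(primer):]:
        strand += PAIR[base]  # new base complements the template base opposite it
        steps.append(strand)
    return steps


print(polymerase_extend("TCGATT", "AGCTAACTT"))
# ['TCGATTG', 'TCGATTGA', 'TCGATTGAA']
```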
Manipulating DNA
4) Nuclease degradation
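◦ Exonuclease: removes nucleotides one at a time from the 3’ ends
5’ – TCGATTGAA – 3’
3’ – AGCTAACTT – 5’
5’ – TCGATTGA – 3’
3’ – GCTAACTT – 5’
5’ – TCGATTG – 3’
3’ – CTAACTT – 5’
5’ – TCGATT – 3’
3’ – TAACTT – 5’
◦ Endonuclease: cuts inside GAATTC, leaving sticky ends
5’ – TGAATTCCG – 3’
3’ – ACTTAAGGC – 5’
cut into
5’ – TG – 3’         5’ – AATTCCG – 3’
3’ – ACTTAA – 5’     3’ – GGC – 5’
◦ Endonuclease: cuts inside CCCGGG, leaving blunt ends
5’ – TGCCCGGGA – 3’
3’ – ACGGGCCCT – 5’
cut into
5’ – TGCCC – 3’      5’ – GGGA – 3’
3’ – ACGGG – 5’      3’ – CCCT – 5’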
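Both kinds of degradation in these examples reduce to string slicing. A sketch under the slide’s layout (top 5’→3’, bottom 3’→5’ underneath); the cut offsets are counted from the start of the recognition site. The two endonuclease examples match the cut patterns of EcoRI (G^AATTC, top offset 1, bottom offset 5) and SmaI (CCC^GGG, offset 3 on both strands); function names are hypothetical:

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    return "".join(PAIR[b] for b in strand)


def exonuclease_step(top: str, bottom: str) -> tuple[str, str]:
    """Remove one nucleotide from each 3' end.

    top is written 5'->3' (3' end on the right); bottom is written 3'->5'
    (3' end on the left).
    """
    return top[:-1], bottom[1:]


def restriction_cut(top: str, site: str, top_cut: int, bottom_cut: int):
    """Cut a fully paired duplex at the first occurrence of site.

    top_cut and bottom_cut are offsets from the start of the site, measured
    left to right on the slide layout (top 5'->3', bottom 3'->5').
    Returns the left and right fragments as (top, bottom) pairs.
    """
    bottom = complement(top)
    start = top.index(site)  # raises ValueError if the site is absent
    t, b = start + top_cut, start + bottom_cut
    return (top[:t], bottom[:b]), (top[t:], bottom[b:])


print(exonuclease_step("TCGATTGAA", "AGCTAACTT"))      # ('TCGATTGA', 'GCTAACTT')
print(restriction_cut("TGAATTCCG", "GAATTC", 1, 5))    # sticky ends
print(restriction_cut("TGCCCGGGA", "CCCGGG", 3, 3))    # blunt ends
```

Unequal offsets leave single-stranded tails (sticky ends); equal offsets give blunt ends.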
Manipulating DNA
5) Ligation
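◦ Before: a nick in each strand, between a 3’-OH end and a 5’-phosphate (P) end
5’ – TC GATTGAA – 3’
3’ – AGCTAA CTT – 5’
◦ After: DNA ligase joins each 3’-OH to the neighbouring 5’-P, sealing both nicks
5’ – TCGATTGAA – 3’
3’ – AGCTAACTT – 5’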
Manipulating DNA
6) Amplification
1. Denaturation
2. Add primers
3. Annealing
Manipulating DNA
6) Amplification (cont’d)
4. Polymerase extension
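Each cycle (denature, anneal primers, extend) at best doubles the number of copies, so after n cycles N = N0 × 2^n. Real reactions run below 100% efficiency; the sketch models that, as an assumption, with a per-cycle efficiency E, giving N = N0 × (1 + E)^n:

```python
def pcr_copies(initial: float, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after `cycles` rounds; efficiency 1.0 means every strand doubles."""
    return initial * (1 + efficiency) ** cycles


print(pcr_copies(1, 20))       # 1048576.0, about a million copies from one molecule
print(pcr_copies(10, 3, 0.5))  # 33.75
```

Twenty ideal cycles already turn a single molecule into about a million copies.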
Manipulating DNA
7) Gel electrophoresis
Manipulating DNA
8) Modify nucleotides – insert, delete, substitute
9) Filtering – magnetic bead separation
10) Synthesis of a single strand
11) Sequencing
DNA manipulations:
If we want to use DNA as an
information carrier, we must be able to
manipulate it.
However, we are talking about handling
molecules…
Enzymes are natural catalysts.
So instead of relying only on physical processes,
we can use natural ones, which are
more effective:
◦ for lengthening: polymerases…
◦ for cutting: nucleases (exo/endo-
nucleases)…
◦ for linking: ligases…
Mass copying: in 1985 Kary Mullis developed PCR (polymerase chain reaction)
Thanks to this reaction we get millions of identical copies
DNA Machine
Introduction to DNA Computing
What is DNA computing?
◦ First idea in the 1950s (precursor: Richard
Feynman)
Molecular level (just above
10^-9 meter)
Massive parallelism.
◦ In a liter of water, with only 5
grams of DNA we get around 10^21
bases!
◦ Each DNA strand represents a
processor!
DNA Computing Begins...
1970s
◦ Much speculation
1994
◦ Leonard Max Adleman
◦ “Molecular Computation of Solutions to
Combinatorial Problems”
◦ Used DNA computing to solve an NP-complete
problem: Hamiltonian Path Problem
Hamiltonian Path Problem
Instance of the HPP solved by Adleman
[Figure: directed graph instance with numbered vertices]
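The HPP asks whether a directed graph has a path from a start vertex v_in to an end vertex v_out that visits every vertex exactly once. Adleman’s procedure generates a huge pool of random paths and then filters it: keep paths that begin at v_in and end at v_out, keep those with exactly n vertices, keep those containing every vertex; anything left is a Hamiltonian path. A software sketch of that generate-and-filter idea on a hypothetical 5-vertex graph (not the instance in the figure); the pool is enumerated exhaustively here so the result is deterministic, whereas in the tube it forms randomly and in parallel:

```python
# Hypothetical 5-vertex directed graph (not the instance on the slide).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2), (2, 1), (1, 3)]


def all_walks(edges, max_len):
    """Every walk of 1..max_len vertices, standing in for the pool of ligated strands.

    Vertices may repeat, just as random ligation can produce repeated vertices.
    """
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    vertices = {x for edge in edges for x in edge}
    frontier = [(v,) for v in vertices]
    pool = list(frontier)
    for _ in range(max_len - 1):
        frontier = [w + (nxt,) for w in frontier for nxt in succ.get(w[-1], [])]
        pool.extend(frontier)
    return pool


def hamiltonian_paths(pool, v_in, v_out, n):
    """Adleman-style filtering of the pool, one condition at a time."""
    kept = [p for p in pool if p[0] == v_in and p[-1] == v_out]  # right endpoints
    kept = [p for p in kept if len(p) == n]                      # exactly n vertices
    kept = [p for p in kept if set(p) == set(range(n))]          # every vertex once
    return kept


print(sorted(hamiltonian_paths(all_walks(EDGES, 5), 0, 4, 5)))
# [(0, 1, 2, 3, 4), (0, 2, 1, 3, 4)]
```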
Adleman’s HPP Solution
Adleman translated this solution step by step
into molecular biology:
Encoded each vertex as a single-stranded
oligonucleotide of 20 bases (random sequence)
Synthesized a strand for each possible edge
Connected edges by enzymatic ligation
v1: TGAATCCGACGTCCAGTGA
v2: ATGAACTATGGCACGCTATC
e12: GCAGGTCACT TACTTGATAC (complementary to the last 10 bases of v1 followed by the first 10 bases of v2)
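The edge strand in the example is the base-by-base complement of the last 10 bases of v1 followed by the first 10 bases of v2, so it can hold v1 and v2 side by side for ligation. A sketch of that encoding, with hypothetical function names; vertex codes are drawn at random as in Adleman’s scheme:

```python
import random

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def random_vertex_code(rng: random.Random, length: int = 20) -> str:
    """A random single-stranded code for one vertex."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def edge_strand(u_code: str, v_code: str, half: int = 10) -> str:
    """Strand for edge u->v: complement of u's last `half` bases, then v's first `half`."""
    return "".join(PAIR[b] for b in u_code[-half:] + v_code[:half])


v1 = "TGAATCCGACGTCCAGTGA"
v2 = "ATGAACTATGGCACGCTATC"
print(edge_strand(v1, v2))  # GCAGGTCACTTACTTGATAC, matching e12 above
print(random_vertex_code(random.Random(0)))  # some random 20-base code
```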
Adleman’s HPP Solution
                                            DNA               Classical
Operations per second                       10^14 to 10^20    10^6 to 10^12
Energy efficiency (operations per joule)    2 × 10^19         10^9
Storage size of one bit (cubic nanometers)  1                 10^12
Theoretical maximum energy efficiency: 34 × 10^19 operations per joule
SAT Problem
Satisfiability problem for propositional
formulas
Logical variables E = {e1, e2, …, en}
Clauses Cj = {e1j, e2j, …, enj}: literals (variables,
possibly negated with NOT) joined by OR
Problem:
◦ Given C1 ∧ C2 ∧ … ∧ Cm, assign a Boolean value to
each variable such that the entire statement is
TRUE
◦ NP-complete!
Lipton’s SAT Solution
The problem can be represented as a graph search
problem
Two phases:
◦ 1) Generate all paths in the graph
◦ 2) Search (filter) for truth assignment set that
satisfies formula
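◦ Essentially the same principles as Adleman’s
[Figure: Lipton’s graph, a chain of vertices v0, v1, v2, …, vn-1, vn; each path from v0 to vn encodes one truth assignment]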
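The two phases can be mimicked in software: phase 1 produces every assignment at once (every path through the chain), phase 2 discards, clause by clause, the strands that fail it. A sketch, assuming a hypothetical clause encoding with signed integers (3 means x3, -3 means NOT x3):

```python
from itertools import product


def lipton_sat(n, clauses):
    """Generate-and-filter SAT search in the spirit of Lipton's method."""
    # Phase 1: every path v0 -> vn fixes one bit per variable, so the tube
    # holds all 2**n assignments at once.
    tube = list(product((0, 1), repeat=n))
    # Phase 2: for each clause, keep only strands that satisfy one of its literals.
    for clause in clauses:
        tube = [a for a in tube
                if any(a[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in clause)]
    return tube


# (x1 OR x2) AND (NOT x1 OR x2)
print(lipton_sat(2, [[1, 2], [-1, 2]]))  # [(0, 1), (1, 1)]
```

An empty tube at the end means the formula is unsatisfiable; the number of filtering steps grows with the number of clauses, not with 2^n.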
Gene regulation gives the cell control over structure and function
Metaprogramming
Gene Expression
Transcription
Translation
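Transcription and translation can be sketched as string operations. The standard genetic code fits in a 64-character string when codons are listed in U, C, A, G order for each of the three positions. This sketch transcribes a coding strand (T becomes U) and translates from the first base until a stop codon; it does not search for a start codon, and the example sequence is made up:

```python
# Standard genetic code, codons indexed in the order U, C, A, G for each position.
RNA_BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)  # '*' marks a stop codon


def transcribe(coding_strand: str) -> str:
    """mRNA has the coding-strand sequence with U in place of T."""
    return coding_strand.replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the start until a stop codon (or the end)."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        b1, b2, b3 = (RNA_BASES.index(b) for b in mrna[i:i + 3])
        aa = AMINO_ACIDS[16 * b1 + 4 * b2 + b3]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(transcribe("ATGGCCTAA"))             # AUGGCCUAA
print(translate(transcribe("ATGGCCTAA")))  # MA (Met, Ala, then stop)
```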
[Figure: Internet map colored by region (legend: North America, blue; Latin America and Caribbean, yellow; RFC1918 IP addresses, cyan; unknown, white)]
DNA Computer
Given enough DNA strands and
certain biological operations,
DNA can model a one-tape
nondeterministic Turing machine
DNA compare to formulas
DNA can work like a state
machine
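To make the state-machine idea concrete, here is a software-only sketch of a hypothetical two-state machine over the symbols a and b that ends in S0 exactly when the input has an even number of b’s. In a DNA implementation the current state and remaining input would be encoded in molecules, with each transition carried out by a biochemical step; none of that chemistry is modelled here:

```python
# Hypothetical two-state machine over the alphabet {a, b}:
# it ends in S0 exactly when the input contains an even number of b's.
TRANSITIONS = {
    ("S0", "a"): "S0",
    ("S0", "b"): "S1",
    ("S1", "a"): "S1",
    ("S1", "b"): "S0",
}


def run(word: str, state: str = "S0") -> str:
    """Apply one transition per input symbol and return the final state."""
    for symbol in word:
        state = TRANSITIONS[(state, symbol)]
    return state


def accepts(word: str) -> bool:
    return run(word) == "S0"


print(accepts("abba"), accepts("ab"))  # True False
```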
DNA Logic Gates
DNA can work like a state
machine
Catalytic DNA or DNAzyme
DNAzymes are used to build logic
gates
DNAzymes are limited to 1-, 2-,
and 3-input gates
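At the logic level, a DNAzyme gate can be abstracted as: the enzyme cleaves its substrate (output 1) only when every activating input strand is present and no inhibiting input strand is present. A sketch of that abstraction, with the slide’s 1- to 3-input limit enforced; the class and gate names are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeoxyribozymeGate:
    """Abstract model: output is 1 only when all activators are present
    and no inhibitor is present."""

    activators: frozenset = frozenset()
    inhibitors: frozenset = frozenset()

    def __post_init__(self):
        n_inputs = len(self.activators) + len(self.inhibitors)
        if not 1 <= n_inputs <= 3:  # the slide's limit: 1-, 2- or 3-input gates
            raise ValueError("a gate takes 1, 2 or 3 inputs")

    def output(self, present: set) -> bool:
        return self.activators <= present and not (self.inhibitors & present)


YES_A = DeoxyribozymeGate(activators=frozenset({"A"}))
NOT_A = DeoxyribozymeGate(inhibitors=frozenset({"A"}))
A_AND_B = DeoxyribozymeGate(activators=frozenset({"A", "B"}))
A_AND_B_AND_NOT_C = DeoxyribozymeGate(frozenset({"A", "B"}), frozenset({"C"}))

print(A_AND_B.output({"A", "B"}), A_AND_B_AND_NOT_C.output({"A", "B", "C"}))  # True False
```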
DNA Multiplication
[Figure: sticker-model memory complex, a partially double-stranded strand encoding the bits 0 0 0 1 0 1 0 0 1 0, with a soup of sticker strands (zoomed sequence: A G C A T G A T)]
DNA Associative Memory
What about a sticker machine?
Simple operations: merge, select,
detect, clean.
Tubes are considered (cylinders
with two entries)
However, even for a single computation
(DES):
◦ A great number of tubes is needed
(1000).
◦ A huge amount of DNA is needed as
well.
Practically no such machine has
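The sticker operations can be illustrated on memory complexes modelled as bit tuples (1 = sticker annealed at that position). In this sketch, merge, select and detect follow their names; clean is our assumed reading (remove sticker i from every complex), and all function names are hypothetical:

```python
from itertools import product

# A memory complex is modelled as a tuple of bits: 1 = sticker annealed, 0 = bare.


def merge(*tubes):
    """Pour several tubes into one."""
    return [complex_ for tube in tubes for complex_ in tube]


def select(tube, i):
    """Split a tube on bit i: (complexes with sticker i, complexes without)."""
    on = [c for c in tube if c[i] == 1]
    off = [c for c in tube if c[i] == 0]
    return on, off


def detect(tube):
    """Is there at least one complex left in the tube?"""
    return len(tube) > 0


def clean(tube, i):
    """Assumed meaning: strip sticker i from every complex (set bit i to 0)."""
    return [c[:i] + (0,) + c[i + 1:] for c in tube]


library = list(product((0, 1), repeat=3))  # all 3-bit memory complexes
with_bit0, _ = select(library, 0)
_, answer = select(with_bit0, 2)
print(detect(answer), answer)  # True [(1, 0, 0), (1, 1, 0)]
```

Every select moves strands into a new tube, which is why real computations such as the DES case quickly need so many tubes.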
Technological Developments
Molecular Computing
Letting proteins fold so that the folding
performs a computation
Protein Folding
Mainly guided by:
Hydrophobic interactions
Intramolecular hydrogen bonds
Van der Waals forces
Protein Folding
Folding is a free energy minimization process
that depends on the interactions among
amino acids
Proteins can change conformation in as little as femtoseconds (10^-15
s)
Folding Proteins
All proteins begin to fold into three-dimensional
structures after synthesis
These structures give proteins their
functionality (lock-and-key receptors)
Folding is a free energy minimization process
Protein Folding Problem
Considered to be an NP-complete problem
Attempts to derive solutions by brute force on
massively parallel computers have failed
The molecular pathway is too complex
Genetic algorithms (GAs) do better but cannot
guarantee polynomial time; their fitness function
relies on the structure, and since the
structure is not known, a GA faces the
“termination problem”
Protein Folding Problem
A protein of a mere 25 amino acids would require 94,502
years to solve
No way of knowing if a GA terminated with an optimal
solution
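Why brute force fails can be seen in the 2D HP lattice model, a standard toy simplification that is not from these slides: each residue is H (hydrophobic) or P (polar), the chain is laid out as a self-avoiding walk on a square grid, and the energy is -1 for every pair of H residues that touch on the grid without being bonded in the chain. Exhaustive search must try every walk, and their number grows by a factor of roughly 2.6 per added residue:

```python
from itertools import product

# Unit steps on a 2D square lattice.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def place(moves):
    """Lattice coordinates for a chain, or None if it runs into itself."""
    coords = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = coords[-1]
        nxt = (x + dx, y + dy)
        if nxt in coords:
            return None
        coords.append(nxt)
    return coords


def hp_energy(seq, coords):
    """-1 for each H-H pair that touches on the lattice but is not bonded in the chain."""
    index = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in MOVES.values():
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1 and seq[j] == "H":
                energy -= 1
    return energy


def brute_force_fold(seq):
    """Try every move sequence; return (best energy, best moves, conformations tried)."""
    best_energy, best_moves, tried = None, None, 0
    for moves in product(MOVES, repeat=len(seq) - 1):
        coords = place(moves)
        if coords is None:
            continue
        tried += 1
        energy = hp_energy(seq, coords)
        if best_energy is None or energy < best_energy:
            best_energy, best_moves = energy, "".join(moves)
    return best_energy, best_moves, tried


energy, moves, tried = brute_force_fold("HPPH")  # energy == -1, tried == 36
```

Even this toy version enumerates 4 × 3^(n-2) move sequences for n residues, which is the combinatorial wall the slide’s 94,502-year figure refers to.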
Protein based computing
Different architectures and computing
dimensions
Non-von Neumann, non-serial and non-silicon
Context dependent
Input processed as dynamical physical
structures, patterns, or analog symbols
Multidimensional conditions
Temperature, pH, ionic concentrations, voltage,
dipole moment, electroacoustical vibration,
phosphorylation or hydrolysis state,
conformational state of bound neighbor
proteins, etc.
Proteins integrate all these inputs
Protein Folding Matrix Computer
Use Rose scale matrix
Let protein folding solve large matrix
problems
Folding Proteins
Applications
Possible to generate a vast combinatorial variety of
different protein shapes just by changing
the DNA base sequence
Encrypting data (lock and key)
Decrypting data
Encryption breaking
Pattern Recognition
DNA COMPUTER vs SILICON COMPUTER