On the Application of DNA Based Computation


Joel C. Adams
jadams@julian.uwo.ca

Candidate for HBA 2000, BESc (CE) 2000
Richard Ivey School of Business / Faculty of Engineering Science 
University of Western Ontario
London, Ontario
Canada



Abstract

This paper presents an overview and categorization of existing research in DNA based computation, the possible advantages that different models hold over conventional computational methods, and potential applications that might emerge from, or serve to motivate, the creation of a working biomolecular computer. The emerging paradigm of DNA based (or biomolecular) computation has yet to find a practical use, despite many possible advantages over existing computational methods. Possible applications are discussed for different models of DNA computation, along with their relative viability. The paper assumes that the reader has a basic familiarity with DNA based computing and Adleman's original work, but does not assume a strong background in either computer science or biochemistry.

Introduction

The practical possibility of using molecules of DNA as a medium for computation was first demonstrated by Adleman in 1994 [1]. Adleman's primary intention was to prove the feasibility of biomolecular computation but his work also gave an indication that the emergence of this new computational paradigm could provide an advantage over conventional electronic computing techniques. Specifically, DNA was shown to have massively parallel processing capabilities that might allow a DNA based computer to solve hard computational problems in a reasonable amount of time. However, Adleman's brute force search algorithm is not, and was never meant to be, a practical means of solving such problems; the volume of material required was found to increase exponentially as the complexity of the problem was increased.

After Adleman successfully solved a directed Hamiltonian path problem using the tools of biomolecular engineering, others followed, applying similar algorithms to other hard computation problems, as well as devising more efficient computational schemes. A number of theoretical models for creating a universal DNA computer have been developed, and mathematical proofs have shown that some models of DNA computing are at least equivalent to a classical Turing machine. Other work has concentrated on improving error rates and identifying specific types of bio-operations and molecular engineering techniques that could be harnessed for computational and storage-related purposes. A limited amount of work has been directed at real-life applications and the practical feasibility of DNA computers.
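Adleman's generate-and-filter strategy can be caricatured in conventional code: enumerate every candidate path (the step that DNA performs in parallel through strand self-assembly) and discard the candidates that violate the graph. The following Python sketch uses a small illustrative graph, not Adleman's actual seven-vertex instance:

```python
from itertools import permutations

def hamiltonian_paths(graph, start, end):
    """Enumerate every candidate path, then filter out the invalid ones,
    mirroring the generate-and-filter structure of Adleman's experiment."""
    paths = []
    for perm in permutations(graph):
        if perm[0] != start or perm[-1] != end:
            continue  # filter: must begin and end at the designated vertices
        if all(b in graph[a] for a, b in zip(perm, perm[1:])):
            paths.append(perm)  # filter: every consecutive hop must be an edge
    return paths

# A small illustrative directed graph (not Adleman's seven-vertex instance).
graph = {0: {1, 2}, 1: {2, 3}, 2: {3}, 3: set()}
print(hamiltonian_paths(graph, 0, 3))  # → [(0, 1, 2, 3)]
```

The exponential cost is visible here: the loop inspects every permutation of the vertices, which is exactly the blow-up that manifests in a DNA computer as an exponentially growing volume of material.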

While the practical benefits of DNA based computational schemes are still questionable and the vast majority of work to date has been theoretical, there have been many allusions to potential uses of this emerging computational paradigm. The usefulness of developing techniques of DNA computing and ultimately developing working DNA computers can be described as falling into one of the following three general categories:

  1. Applications making use of "classic" DNA computing schemes where the use of massive parallelism holds an advantage over traditional computing schemes, including potential polynomial time solutions to hard computational problems;
  2. Applications making use of the "natural" capabilities of DNA, including those that make use of informational storage abilities and those that interact with existing and emerging biotechnology;
  3. Contributions to fundamental research within both computer science and the physical sciences, especially in exploring the limitations of computability and in understanding and manipulating biomolecular chemistry.
Each of these categorizations can be further subdivided into different areas and specific models, and there is a great deal of overlap between the categories. Viewing DNA computation as a number of interrelated new paradigms, rather than a single technology, gives the clearest picture of the field's overall qualities and potential.

Massively Parallel Processing

The primary advantage offered by most proposed models of DNA based computation is the ability to handle millions of operations in parallel. This massively parallel processing capability may give DNA computers the potential to find tractable solutions to otherwise intractable problems, and to speed up large, but otherwise solvable, polynomial time problems that require relatively few sequential operations. The use of DNA to perform massive searching and related algorithms will be referred to as "classic" DNA computation for the purposes of this discussion.

Proposed "classic" models of DNA computers derive their potential advantage over conventional computers from their ability to perform millions of bio-operations in parallel and to hold vast amounts of data in memory at once.

These models also share some of the following drawbacks: individual biochemical operations are slow compared to electronic ones, laboratory techniques introduce significant error rates, and for many algorithms the volume of DNA required grows exponentially with problem size.

With these qualities in mind, the comparison between conventional computing and "classic" DNA computation comes down to one of depth versus breadth. A working DNA based computer might hold an advantage over conventional computers when applied to decomposable problems, those that can be divided into separate, non-sequential tasks, because it can hold so much data in memory and conduct so many operations at once. However, due to the length of time required to conduct the biochemical operations, non-decomposable problems, those requiring many sequential operations, are likely to remain much more efficient on a conventional computer [2].

Within the paradigm of "classic" DNA computation there exist many different models, each with different advantages and degrees of applicability to different classes of problems. One model that differs from Adleman's original search algorithm is the surface-based DNA algorithm used to solve the minimal set cover problem in [8]. This technique proposes to decrease the potential for error from lost DNA strands by affixing them to a surface of glass or silicon. Because the efficiency of the algorithm depends on encoding quantity by strand length, the authors speculate that such a model might be inappropriate for problems where the validity of a solution is not proportional to its size. A surface-based approach to DNA computation was also considered in [13], which suggests that the products of bit operations could be identified by optical readers scanning for the relative surface position of hybrid double strands, each consisting of a previously unknown bitstring and the value held by that bit. The ability to read output in this manner may drastically increase the feasibility of implementing a DNA computer outside the conventional laboratory setting. It might also encourage the development of DNA/electronic computer hybrids, with different problem types handled by different components and the electronic portion conveying the output to the user. Another model [10] makes use of 3-dimensional DNA structures to reduce errors and the number of necessary steps. This use of the structural properties of DNA may also lead to more efficient storage mechanisms.

Classical DNA computing techniques have already been theoretically applied to a real life problem: breaking the Data Encryption Standard (DES). Although this problem has already been solved using conventional techniques in a much shorter time than proposed by the DNA methods, the DNA models are much more flexible, potent, and cost effective.

DES is a method of encrypting 64-bit messages with a 56-bit key, used extensively in the United States. Electronic keys are normally a string of data used to encode and/or decode sensitive messages. By finding the key to a set of encrypted messages, one can either read encoded messages or pose as the sender of such messages. Using a special purpose electronic computer and differential cryptanalysis, it has been shown that the key to DES can be found in several days. However, doing so would require 2^43 examples of corresponding encrypted and unencrypted messages (known as plain-text/cipher-text pairs) and would slow down by a factor of 256 if the strength of the encrypting key were increased to 64 bits. In [6] it is proposed that DES could be broken using a DNA based computer and a search algorithm similar to Adleman's original technique. This procedure would be expected to take 4 months, but would need only a single plain-text/cipher-text pair, or an example of cipher text with several plain text candidates, to be successful. The feasibility of applying DNA computation to this problem was also addressed in [3] using a more refined algorithm (the sticker model approach), which enabled the researchers to suggest that the problem could be solved with less than a gram of DNA, an amount that could presumably be handled by a desktop-sized machine. Both models would likely be more cost and energy effective than the expensive electronic processors required by conventional means, but both are entirely theoretical. The first model ignores the error rates incurred through laboratory techniques and the inherent properties of the DNA being used. The second model requires an error rate approaching 0.0001, with higher rates substantially increasing the volume of DNA required.
Despite these assumptions, these models show that existing methods of DNA computation could be used to solve a real life problem in a way that is both practical and superior to methods used by conventional computers. In [3] it is also demonstrated that such benefits can be obtained despite error rates that would be unacceptable in an electronic computer and that may be unavoidable in a molecular one.
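The figures quoted above are straightforward powers of two, and a few lines of arithmetic make them concrete:

```python
# Back-of-envelope arithmetic behind the DES figures quoted above.
des_keys = 2 ** 56               # size of the 56-bit DES key space
pairs_needed = 2 ** 43           # plain-text/cipher-text pairs required by
                                 # the differential cryptanalysis attack cited
slowdown = 2 ** 64 // des_keys   # a 64-bit key multiplies the space by 2^8

print(des_keys)       # → 72057594037927936
print(pairs_needed)   # → 8796093022208
print(slowdown)       # → 256
```

The factor-of-256 slowdown for a 64-bit key is simply the ratio of the two key spaces, which is what makes a brute-force searcher, electronic or molecular, so sensitive to key length.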

Solving NP-Complete and Hard Computational Problems

After Adleman's and Lipton's initial results, much of the work on DNA computing has continued to focus on solving NP-complete and other hard computational problems. For NP-complete problems, no polynomial time solution is known to exist using conventional computer algorithms; as the size of these problems increases, the time required to solve them grows at an exponential rate, and they are therefore said to be intractable. However, if a tractable solution can be found for any one NP-complete problem, it can be adapted to solve every other problem in the class. Adleman's original and subsequent works demonstrated a tractable solution to the directed Hamiltonian path problem, similar to the more familiar traveling salesperson problem. It has since been shown that a direct polynomial time solution can be obtained for any NP-complete problem using similar techniques of DNA computation [14]. Unfortunately, many of the initial techniques following Adleman's brute force searching algorithm solve problems in polynomial time only at the expense of exponential increases in volume. Other papers have established relationships between volume and accuracy that may serve as a basis for determining how the costs and benefits of DNA computing compare to those of conventional computers for different classes of problems [9].
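The exponential increase in volume can be illustrated with a rough calculation: a brute-force library holds one strand per candidate answer, so n binary variables require 2^n distinct strands. The per-strand mass used below (a 40-base strand at roughly 660 g/mol per base) is an illustrative assumption, but the doubling with every added variable is not:

```python
# Rough illustration of the exponential volume blow-up in brute-force
# DNA search: one strand per candidate answer means 2**n strands for
# n binary variables. Strand size and molar mass are assumed figures.
AVOGADRO = 6.022e23
STRAND_GRAMS = 40 * 660 / AVOGADRO  # assumed 40-base strand, ~660 g/mol per base

def library_grams(n_vars):
    """Mass of a complete candidate library for n binary variables."""
    return (2 ** n_vars) * STRAND_GRAMS

for n in (20, 40, 60, 80):
    print(f"{n} variables: {library_grams(n):.2e} g")
```

Under these assumptions, 60 variables still fit in a fraction of a gram, but 80 variables already demand tens of kilograms of DNA, which is why volume, not time, is the binding constraint on these algorithms.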

The ability to obtain tractable solutions to NP-complete and other hard computational problems has many implications for real life problems, particularly in business planning and management science. Many of the cost optimization problems faced by managers are NP-complete and are currently solved using heuristic methods and other approximations. These problems include scheduling, routing, and the optimal use of raw materials, and correspond to problems already solved, either theoretically or experimentally, using DNA computation.

Current and near-term advances in laboratory technology preclude the use of DNA computation as a method of solving problems in real time. This restricts the possible application of classical DNA computing to areas where the calculation of optimal solutions can be performed over a number of days, weeks, or months. Many otherwise obvious uses in expert systems are therefore ruled out, but DNA computation could be applied to long term production planning where initial cost commitments are very high, such as chip design and manufacturing. It could also be applied extensively to optimizing airline and bus routes for planning purposes.

To address these practical instances of hard computational problems, DNA computers could be implemented in a variety of different capacities. In terms of hardware, the most effective solution might come from concentrating efforts on problem specific processors rather than universal DNA computers. Such problem specific processors could focus on problems whose solution will have the greatest cost benefit while also allowing for the time necessary to conduct the bio-operations.

Different models might also be used that more closely mirror the iterative approximation techniques currently used to solve these problems. An efficient model of DNA computing has been developed to make use of dynamic programming, solving successively harder instances of a problem until the required problem has been solved [5]. In addition to massive parallelism, this method exploits the large memory available to DNA computers. In this way, a solution was proposed to an instance of the knapsack problem, an NP-complete problem with implications for the optimal use of materials and packaging. In [9] a general technique was proposed for constructing approximations to NP-optimization problems using DNA computing. Such methods of approximation may be relevant where an exact solution is not absolutely necessary but the problem is sufficiently large to preclude conventional electronic techniques or other, more exact models of DNA computing.
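The dynamic programming idea in [5], solving successively larger subproblems while keeping a large table of intermediate results, is the same strategy a conventional knapsack solver uses. The sketch below is the standard electronic version of that strategy, not the DNA encoding itself:

```python
def knapsack(values, weights, capacity):
    """Classic dynamic-programming knapsack: build up the best achievable
    value for every capacity, one item at a time. The large table of
    subproblem results is the part [5] maps onto DNA's vast memory."""
    best = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        # Iterate capacities downward so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]

# Three items (value, weight) and a capacity of 50: optimum picks the
# second and third items for a total value of 220.
print(knapsack([60, 100, 120], [10, 20, 30], 50))  # → 220
```

The table `best` grows only with the capacity, which is why dynamic programming tames instances that defeat naive enumeration; the DNA version in [5] trades that table for a pool of strands encoding partial solutions.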

If the use of DNA is found to be a clearly superior method of solving large instances of hard computational problems, but biomolecular techniques continue to dictate solution times much longer than electronic approximation methods, DNA computers could be used for off-line benchmarking of existing heuristic algorithms, which would continue to be used on a more frequent basis. In such a scheme, a DNA computer could initially optimize the production schedule for a proposed product. This step may take several weeks or months to complete before production is ready to begin. The results of this initial optimization could then be compared to various approximation methods, and the method found to most closely approximate the optimal solution could be used on a more active basis to account for changes in supply and demand parameters, so that production continues to run at near optimal levels.

Storage and Associative Memory

DNA might also be used to mirror, and even improve upon, the associative capabilities of the human brain. In [4] Baum proposed a method for making a large content addressable memory using DNA. In a truly content addressable memory, a data entry can be retrieved directly from storage by presenting an input that resembles it more closely than any other entry in memory. This input may be very incomplete, with a number of wildcards, and in an associative memory might even contain bits that do not actually occur within the closest match. This contrasts with a conventional computer memory, where the specific address of a word must be known to retrieve it. Content addressability is thought by many to be a key factor in human intelligence, and this technique would replicate it directly.

Baum's models for a DNA associative memory are quite simple, and build on the techniques used in other areas of DNA computation. A word could be stored by assigning a specific DNA subsequence to each component-value pair and building a fixed length word from these subsequences. To retrieve the word closest to an input, one would introduce marked complementary subsequences into the storage medium and choose the molecule with the most matches to that input. This technique could be refined to more closely approximate the brain by building words that store only the attributes an object has, rather than wasting space using '0's to represent attributes it does not have.
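The retrieval step Baum describes, choosing the stored word that agrees with an incomplete probe in the most positions, behaves like a nearest-match search. A conventional sketch of that behavior follows; the '?' wildcard convention is an illustrative assumption, not Baum's notation:

```python
def best_match(memory, probe):
    """Content-addressable retrieval in the spirit of Baum's scheme [4]:
    return the stored word agreeing with the probe in the most positions.
    '?' marks an unknown bit in the probe (illustrative convention) and
    contributes nothing to the score either way."""
    def score(word):
        return sum(p == w for p, w in zip(probe, word) if p != '?')
    return max(memory, key=score)

memory = ["101100", "111000", "001101"]
print(best_match(memory, "1?11?0"))  # → '101100'
```

In Baum's proposal the scoring is done chemically, by hybridization strength between probe subsequences and stored strands, rather than by an explicit loop; the sketch only captures the input/output behavior of the retrieval.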

Baum has further speculated that a memory could be constructed where only portions of the data are content-addressable and associative, with other information on an object compactly stored in addresses relative to the associative portion of the entry. To save on operating costs and reduce error frequency, this portion of the memory could be kept in double-stranded form.

Considering the brain's limit of about 10^15 synapses and Feynman's low-end estimate that the brain can distinguish about 10^6 concepts, such a DNA based associative memory could have certain advantages over the brain. Without accounting for redundant molecules, Baum estimates that a large bath tub of DNA, about 50 g in 1000 L, could hold over 10^20 words. In weighing practical uses for this memory scheme, the massive capacity and associative retrieval must be balanced against the slow retrieval times imposed by current biomolecular engineering techniques. And, although Baum's simple approach accounts for some error rates, his initial paper remains quite speculative.

DNA2DNA Applications

Another area of DNA computation exists where conventional computers clearly have no current capacity to compete: the concept of DNA2DNA computation, suggested in [12] and identified as a potential killer app. DNA2DNA computations involve the use of DNA computers to perform operations on unknown pieces of DNA without having to sequence them first. This is achieved by re-coding and amplifying unknown strands into a redundant form so that they can be operated on using techniques similar to those of the sticker model of DNA computation. Many of the errors inherent in other models of DNA computing can likely be ignored in DNA2DNA computing, because so many copies of the original strands will be available for operations.

The potential applications of re-coding natural DNA into a computable form are many and include DNA sequencing, DNA fingerprinting, and DNA mutation detection [12].

In the case of DNA mutation detection, the strand being operated on would already be partially known, so fewer steps would be needed to re-code the DNA into a redundant form suitable for computation.

There are other models of DNA computation that suggest that DNA might be used to detect and evaluate other chemical and bio-chemical substances. In [16] it is suggested that nucleic acid structures, in addition to nucleic acid sequences, could play an important role in molecular computation. Various shapes of folded nucleic acids can be used to detect the presence of drugs, proteins, or other molecules. It is also suggested that selected or engineered ribozymes could be used as operators to effect re-write rules and to detect the presence of such non-nucleic acid molecules. Using these structures and operators to sense levels of substances, it would then be possible to compute an output readable using proposed biosensors that detect fluorescence or polarization. These biosensors could potentially allow communication between molecular sensory computers and conventional electronic computers.

Implications to Biology, Chemistry, and Medicine

While the development of DNA computational methods may have many direct applications, the biggest contribution of research in this area may be much more fundamental and will likely fuel many indirect benefits. In many papers, including [2] and [16], it is stressed that a high level of collaboration between academic disciplines will be essential to effect progress in DNA computing. Such collaboration may very well lead to the development of a DNA computer with practical advantages over a conventional computer, but it has an even greater likelihood of contributing to an increased understanding of DNA and other biological mechanisms. The need for additional precision could drive progress in biomolecular techniques by placing demands on biochemists and their tools that might not otherwise be considered.

A particular area within the natural and applied sciences that may benefit from advances in DNA computation is combinatorial chemistry. Combinatorial chemistry involves the construction of enzymes, sequences of RNA, and other molecules, particularly for use in biomolecular engineering or medicine. Adleman [2] describes this process as similar to "classic" models of DNA computation, since combinatorial chemistry involves generating large sets of random RNA sequences and searching them for molecules with the desired properties. Advances in either area could easily benefit the other, or even pave the way to combining the two fields, producing both products and related computational results in parallel.

Several papers also extend the use of biomolecular computing into applications in the emerging science of nanotechnology, specifically nano-fabrication, making use of both the small scale computational abilities of DNA and the manufacturing abilities of RNA [15, 17]. Since both fields are still very embryonic, the practical or even experimental implementation of this use is still highly speculative but promising.

Applying the techniques of DNA2DNA computing could result in improved laboratory interfaces capable of performing computations prior to input into conventional computers. It may also lead to improved methods of sequencing DNA by motivating the use of magnetic beads, optical scanners, and other emerging techniques that may allow DNA to be read directly into an electronic interface.

DNA's Role in Computer Science

The study of DNA, as both a natural storage medium and a tool of computation, could shed new light on many areas of computer science. Despite the high error rates encountered in DNA computing, in nature DNA has little-understood but resilient mechanisms for maintaining data integrity. By studying the genetic code for these properties, new principles of error correction may be discovered, with uses within conventional electronic computation as well as the new biomolecular paradigms [7].

With the advent of DNA computational paradigms, especially those directed towards improved methods of solving NP-hard problems, will come an increased understanding of the limitations of computation. DNA based computers may postpone certain expected thermodynamic obstacles to computation, test the limitations of Turing machines, and call into question theories of computation based on electronic and mechanical models.

Developments in parallel processing algorithms to take advantage of the massive parallelism of "classic" DNA based computation may serve as a foundation for future developments in other parallel processing architectures using more conventional electronic computers.

DNA based computing may also be able to contribute to other unconventional areas of computation. Through the use of its massive parallelism and potentially non-deterministic mechanisms, DNA based models of computation might be useful for simulating or modeling other emerging computational paradigms, such as quantum computing, which may not be feasible until much later. As well, the trials and tribulations experienced by DNA computing researchers may very well be related to the future obstacles expected to face other revolutionary ideas within computer science and information systems.

Conclusions and Future Outlook

With so many different methods and models emerging from current research, DNA computing is more accurately described as a collection of new computing paradigms than as a single focus. Each of these paradigms within biomolecular computing can be associated with different potential applications that may place it at an advantage over conventional methods, and many of these models share features that lend them to categorization by these potential advantages. At the same time, there are enough similarities and congruencies that hybrid models will be possible, and advances made in the "classic" and "natural" areas of DNA computing will be mutually beneficial. Advancements in DNA computing may also serve to enhance understanding of both the natural and the computer sciences. For these reasons, and because the field depends on each of computer science, mathematics, natural science, and engineering, continued interdisciplinary collaboration is very important to future progress in all areas of this new field.

A "killer app" has yet to be found for DNA computation, but it might exist outside the bulk of current research, in the domain of DNA2DNA applications and other more natural models and applications of manipulated DNA. This direction is particularly interesting because it is an area in which DNA based solutions are not only an improvement over existing techniques, but may prove to be the only feasible way of directly solving problems that involve direct interaction with biological matter.

On the "classical" front, problem specific computers may prove to be the first practical use of DNA computation, for several reasons. First, a problem specific computer will be easier to design and implement, with less need for functional complexity and flexibility. Second, DNA computing may prove to be entirely inefficient for a wide range of problems, and directing efforts at universal models may be diverting energy away from its true calling. Third, the types of hard computational problems that DNA based computers may be able to solve effectively are of sufficient economic importance that a dedicated processor would be financially reasonable. As well, these problems are likely to require extensive time that would preclude the need for the more versatile and interactive system a universal computing machine might provide.

Even if the current difficulties in translating theoretical DNA computing models into real life are never sufficiently overcome, there is still potential for other areas of development. Future applications might make use of the error rates and instability of DNA based computation as a means of simulating and predicting the emergent behavior of complex systems. This could pertain to weather forecasting and economics, and lead to a more scientific analysis of the social sciences and humanities. Such a system might rely on inducing increased error rates and mutation through exposure to radiation and deliberately inefficient encoding schemes. Similarly, methods of DNA computing might serve as the most obvious medium for evolutionary programming, with applications in design or expert systems. DNA computing might also serve as a medium for implementing a true fuzzy logic system.

With so many possible advantages over conventional techniques, DNA computing has great potential for practical use. Future work in this field should begin to incorporate cost-benefit analysis so that comparisons can be more appropriately made with existing techniques and so that increased funding can be obtained for this research that has the potential to benefit many circles of science and industry.

Acknowledgments

Prof. Peter Bell, at The Richard Ivey School of Business and past president of the International Federation of Operation Research Scientists (IFORS), for his insights into the applicability of NP-Hard solutions to business problems and how improved methods of computation might be used to benchmark existing heuristic based solutions. Also to Prof. Lila Kari, at UWO's Department of Computer Science, for first introducing me to DNA computation at her symposium "From Micro-soft to Bio-soft" and furthering my knowledge through her course CS 881b Unconventional Computing I: DNA (Biomolecular) Computing.

References

  1. L. Adleman. Molecular Computation of Solutions to Combinatorial Problems. Science. v.266. Nov. 1994. 1021-1024.
  2. L. Adleman. On Constructing a Molecular Computer. 1st DIMACS workshop on DNA based computers. Princeton. 1995. In DIMACS series. vol.27 (1996). 1-21.
  3. L. Adleman. P. Rothemund. S. Roweis. E. Winfree. On Applying Molecular Computation To The Data Encryption Standard. 2nd DIMACS workshop on DNA based computers. Princeton. 1996. 28-48.
  4. E. Baum. A DNA Associative Memory Potentially Larger than the Brain. 1st DIMACS workshop on DNA based computers. Princeton. 1995. In DIMACS series. vol.27 (1996). 23-28.
  5. E. Baum. D. Boneh. Running dynamic programming algorithms on a DNA computer. 2nd DIMACS workshop on DNA based computers. Princeton. 1996. 141-147.
  6. D. Boneh. C. Dunworth. R. Lipton. Breaking DES Using a Molecular Computer. 1st DIMACS workshop on DNA based computers. Princeton. 1995. In DIMACS series. vol.27 (1996). 37-65.
  7. M. Deputat. G. Hajduczok. E. Schmitt. On Error-Correcting Structures Derived from DNA. 3rd DIMACS workshop on DNA based computers. June 1997. 223-229.
  8. T. Eng. B. Serridge. A Surface-Based DNA Algorithm for Minimal Set Cover. 3rd DIMACS workshop on DNA based computers. June 1997. 74-82.
  9. B. Fu. R. Beigel. On Molecular Approximation Algorithms for NP Optimization Problems. 3rd DIMACS workshop on DNA based computers. June 1997. 93-101.
  10. N. Jonoska. S. Karl. M. Saito. Creating 3-Dimensional Graph Structures with DNA. 3rd DIMACS workshop on DNA based computers. June 1997. 192-203.
  11. L. Kari. From Micro-soft to Bio-soft. 1997.
  12. L. Landweber. R. Lipton. DNA2DNA Computations: A Potential "Killer App"? 3rd DIMACS workshop on DNA based computers. June 1997. 59-68.
  13. T. Leete. J. Klein. H. Rubin. Bit Operations Using a DNA Template. 3rd DIMACS workshop on DNA based computers. June 1997. 159-165.
  14. R. Lipton. Speeding Up Computations via Molecular Biology. 1st DIMACS workshop on DNA based computers. Princeton. 1995. In DIMACS series. vol.27 (1996). 67-74.
  15. J. Reif. Local Parallel Biomolecular Computation. 3rd DIMACS workshop on DNA based computers. June 1997. 243-258.
  16. M. Robertson. A. Ellington. New directions in nucleic acid computing: Selected ribozymes that can implement re-write rules. 3rd DIMACS workshop on DNA based computers. June 1997. 69-73.
  17. E. Winfree. On the Computational Power of DNA Annealing and Ligation. 1st DIMACS workshop on DNA based computers. Princeton. 1995. In DIMACS series. vol.27 (1996). 199-215.
Draft 1.0 1998 by Joel Adams. Prepared for presentation to Prof. Lila Kari in CS 881b, Department of Computer Science, University of Western Ontario, Canada