【DNA数据存储或成未来新趋势】

【DNA数据存储或成未来新趋势】如何将整个互联网的信息存入一个鞋盒大小的空间?研究者表示,DNA——生物体内的存储遗传信息的有机物——有望成为未来数据存储的新载体。据称,拥有四种碱基的DNA有着超高的信息密度,仅1克DNA就可存储2.15PB的数据(1PB=1024TB)。目前,研究人员已成功合成了存储1000兆可读信息的DNA分子。未来,如何使DNA信息输入和输出的流程自动化,将是科研人员面临的挑战之一。

DNA Data Storage is About to Go Viral

John Cumbers

In 1862, Gregor Mendel bred pea plants to study inheritance. Fast forward 100 years to 1962, James Watson, Frances Crick, and Maurice Wilkins were awarded a Nobel Prize for discovering the structure of DNA. Today, advances in this field are spilling over into the most unlikely places.

As we enter the century of biotechnology, our ability to read, write, and edit DNA is disrupting everything from human health to manufacturing. The next disruption to take place could be in the world of data storage.

Tech giants including Facebook and Amazon and their millions of users generate petabytes of data on the Internet every second. Microsoft has been quietly working in the background to store this information in As, Ts, Cs, and Gs, and instead of 0s and 1s.

“Think of compressing all the information on the accessible Internet into a shoebox,” says Karin Strauss, a principal researcher at Microsoft. “With DNA data storage, that’s possible.”

Strauss is working with Luis Ceze, a professor of computer science and engineering at the University of Washington, to wield DNA for data storage and computing. Using synthetic DNA molecules, the team has successfully stored over one gigabyte of readable information, including various forms of media such as the top 100 books from Project Gutenberg, a high-definition OK Go music video, and the #MemoriesInDNA project.

The information density of DNA is remarkable — just one gram can store 215 petabytes, or 215 million gigabytes, of data. For context, the average hard drive in a laptop can house just one millionth of that amount.

“We encode all data at a molecular level, making it as small as possible, and store it in a medium that will last for quite a while and not become obsolete, like floppy disks, because of its eternal relevance for life,” says Strauss.

The Rise of DNA Data Storage

Improved techniques for reading and writing DNA, including an increase in the length of strands of DNA usable for these purposes, have facilitated the rapid increase in the amount of possible data storage in DNA. 

In addition to pioneering high-density data storage, Ceze and Strauss also conducted a similarity search between images using DNA, and recently created the first fully-automated, writing-to-reading DNA storage system. 

“We’re trying to make computers better with a systematic approach that finds great alternatives and solutions in nature,” says Strauss. “The computational approaches facilitated by working with DNA make it an even more attractive option for data storage,” adds Ceze. “We have the freedom to choose how to map bits to DNA sequences, creating redundancy and high tolerance to error when reading and writing DNA.”

How does this technology work? It’s surprisingly simple. Data is first translated from a code of 0s and 1s to As, Ts, Cs, and Gs. This genetic code is then synthesized into an actual molecule (with the help of Twist Bioscience for the Microsoft Research-UW team), and the “encoding” process is complete.

Retrieving data is a bit more complex. Two steps — “processing” and “decoding” — must occur. Simulating random-access memory (RAM), a polymerase chain reaction (PCR, a common laboratory protocol for copying DNA) hones in on a targeted section of the sequence, which is then replicated, sequenced, decoded, and adjusted for errors to retrieve the original data. This targeted approach is efficient because it involves only the desired sequence, rather than the entire dataset.

The rise of DNA data storage, previously the stuff of science fiction, is being made possible by advances in biotechnology, particularly improvements in high-throughput DNA sequencing and synthesis. Also, because these bio-programmers control what materials enter their experiments, and their sequences do not need to be meticulously engineered to function within a living organism, there are fewer overhead costs compared to typical life science experiments. The journey has not been without roadblocks, however. Despite dramatic improvement, working with DNA can be slow and expensive. Further streamlining is still needed.

“Automation was, and is, one of our biggest challenges,” says Strauss. “It was great to have our first proof of concept converting information from bits, to DNA, and back to bits to prove that it was possible and also show what are our other challenges in automation, but some of the biotechnology aspects are quite new to some of us, so we’ve also been learning a lot there. The other significant challenges are continuing to increase throughput and decrease the cost for DNA sequencing and synthesis. There’s quite a bit of engineering left to get [us] to where we need to be.”

The interdisciplinary Microsoft and UW team sees value in its diverse background. “It’s extremely exciting that this is at the intersection of biotech and computing,” says Ceze. “These areas have been feeding off each other.”

“If the technology continues to advance the way we see it right now,” he says, “I think it’s conceivable that we will see DNA storage as a form of archival for the general public within the decade.”

Thank you to Aishani Aatresh for additional research and reporting in this post. Aishani is a senior at Saint Francis High School in Mountain View, CA, a TEDx speaker and event director, co-founder of LancerHacks, and researcher at Distributed Bio developing computational immunoengineering methods to generate superior antibodies. 

Disclaimer: I am the founder of SynBioBeta, the innovation network for the synthetic biology industry. Some of the companies that I write about are sponsors of the SynBioBeta conference (click here for a full list of sponsors).

 

原文链接:https://www.forbes.com/sites/johncumbers/2019/08/03/dna-data-storage-is-about-to-go-viral/#6342667a7721

 


Comments are closed.



无觅相关文章插件