Researchers affiliated with Nvidia and Harvard today detailed AtacWorks, a machine learning toolkit designed to reduce the cost and time required for rare and single-cell experiments. In a study published in the journal Nature Communications, the co-authors showed that AtacWorks can run whole-genome analyses in just half an hour, compared with the hours that traditional methods take.
Most cells in the body carry a complete copy of a person’s DNA, with billions of base pairs crammed into the nucleus. But an individual cell uses only the subset of genes it needs to function, with cell types like liver, blood, or skin cells using different genes. The regions of DNA that determine a cell’s function are easily accessible, more or less, while the rest is wound around proteins.
AtacWorks, which is available from Nvidia’s NGC hub of GPU-optimized software, works with ATAC-seq, a method for finding open areas of the genome in cells that was pioneered by Harvard professor Jason Buenrostro, one of the paper’s co-authors. ATAC-seq measures the intensity of a signal at each location in the genome. Peaks in the signal correspond to regions of accessible DNA. The fewer cells available, the noisier the data appears, making it difficult to identify which areas of DNA are accessible.
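To make the idea of "peaks in the signal" concrete, here is a minimal sketch of a naive threshold-based peak caller over a per-position signal track. This is not how AtacWorks identifies peaks (the toolkit uses a trained model); the function name `call_peaks`, the threshold, and the toy signal are all assumptions chosen for illustration.

```python
def call_peaks(signal, threshold, min_width=1):
    """Return half-open (start, end) intervals where signal >= threshold.

    Hypothetical helper for illustration only: a simple threshold rule,
    not the model-based peak calling described in the paper.
    """
    peaks = []
    start = None
    for i, value in enumerate(signal):
        if value >= threshold and start is None:
            start = i  # a run above the threshold begins here
        elif value < threshold and start is not None:
            if i - start >= min_width:
                peaks.append((start, i))
            start = None  # the run has ended
    # Close a run that extends to the end of the track.
    if start is not None and len(signal) - start >= min_width:
        peaks.append((start, len(signal)))
    return peaks


# Toy signal: one value per genome position (made-up numbers).
signal = [0, 1, 0, 5, 7, 6, 1, 0, 4, 0, 0, 8, 9]
peaks = call_peaks(signal, threshold=4, min_width=2)
print(peaks)  # [(3, 6), (11, 13)]
```

With few cells, random fluctuations in the track can cross a fixed threshold or bury a real peak below it, which is the difficulty the article describes.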
ATAC-seq typically requires tens of thousands of cells to get a clean signal. AtacWorks produces results of the same quality with just dozens of cells, according to the co-authors.
AtacWorks was trained on labeled pairs of matching ATAC-seq datasets, one high quality and one noisy. Given an undersampled copy of the data, the model learned to predict an accurate high-quality version and identify peaks in the signal. Using AtacWorks, researchers found they could spot accessible chromatin (a complex of DNA and proteins whose main function is to package long molecules into more compact structures) in a noisy sequence of 1 million reads almost as well as traditional methods could with a clean dataset of 50 million reads.
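The noisy-and-clean pairing can be illustrated with a small sketch that builds one such pair by randomly subsampling reads from a toy dataset. It assumes reads are represented only by their start positions, and the helper names (`coverage_track`, `make_training_pair`), the read length, and the read counts are hypothetical; it shows the data preparation idea at a 50-to-1 ratio, not the toolkit’s actual pipeline or model.

```python
import random


def coverage_track(read_starts, genome_length, read_length=50):
    """Count how many reads cover each position (hypothetical helper)."""
    track = [0] * genome_length
    for start in read_starts:
        for pos in range(start, min(start + read_length, genome_length)):
            track[pos] += 1
    return track


def make_training_pair(read_starts, genome_length, n_keep, seed=0):
    """Return (noisy, clean) coverage tracks built from the same reads.

    The noisy track uses a random subsample of n_keep reads; the clean
    track uses all of them. A model would be trained to map noisy -> clean.
    """
    rng = random.Random(seed)
    subsample = rng.sample(read_starts, n_keep)
    clean = coverage_track(read_starts, genome_length)
    noisy = coverage_track(subsample, genome_length)
    return noisy, clean


# Toy data: 4,000 reads concentrated in an "open" region plus
# 1,000 background reads spread across a 1,000-position genome.
rng = random.Random(42)
genome_length = 1000
reads = [rng.randrange(400, 500) for _ in range(4000)]
reads += [rng.randrange(0, genome_length - 50) for _ in range(1000)]

# Keep 100 of 5,000 reads, mirroring the 1-in-50 ratio in the article.
noisy, clean = make_training_pair(reads, genome_length, n_keep=100)
print(sum(clean), sum(noisy))  # 250000 5000
```

Because the subsample is drawn from the same reads, the noisy track never exceeds the clean track at any position; it is a sparser, more jagged view of the same underlying signal.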
AtacWorks could allow scientists to conduct research with fewer cells, reducing the cost of sample collection and sequencing. Analysis could also become faster and cheaper. Running on Nvidia Tensor Core GPUs, AtacWorks took less than 30 minutes for inference on a whole genome, a process that would take 15 hours on a system with 32 CPU cores.
In the Nature Communications paper, Harvard researchers applied AtacWorks to a dataset of stem cells that produce red and white blood cells, rare subtypes that could not be studied with traditional methods. With a sample set of just 50 cells, the team was able to use AtacWorks to identify distinct regions of DNA associated with cells that develop into white blood cells and separate sequences that correlate with red blood cells.
“With very rare cell types, it is not possible to study the differences in their DNA using existing methods,” said Nvidia researcher Avantika Lal, first author of the paper. “AtacWorks can not only help reduce the cost of collecting data on chromatin accessibility, but also open up new possibilities in drug discovery and diagnostics.”
VentureBeat