Algorithms to Recover Original Nucleotide Sequence from Chaos Game Representation — 109p — Corey Zhao, Bichar Dip Shrestha Gurung
The Chaos Game Representation (CGR) is a tool for visualizing genetic information as a two-dimensional fractal image that exposes intrinsic patterns in nucleotide sequences, enabling alternative methods of analysis for comparison or classification. These applications do not require an exact CGR image, so the Frequency Chaos Game Representation (FCGR) is used instead: an approximation of the exact graph in which each pixel records the number of points that fall inside it. Our research explores the feasibility of reversing an FCGR image back into its original genetic sequence, which is difficult because each pixel collapses the information of multiple points into a single frequency count. We developed an algorithm that uses the information available in the FCGR image to first build a dictionary of k-mer counts and then assemble compatible subsequences into candidate sequences using a recursive backtracking function. By leveraging computing power, we can identify all sequences that could produce a given FCGR image, aiming for a high proportion of exact matches. We tested the algorithm on genetic sequences of varying lengths and recorded the success rate; for example, it achieved an 87.4% success rate over 1000 sequences of length 168. Exploring CGR can provide insight into the nature of genetic sequences and into patterns that are difficult to read from a long, one-dimensional nucleotide sequence. A reversible transformation can enable data validation, compression, and a guaranteed lossless representation of data.
Lexington High
Etienne Gnimpieba Ph.D.
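The two steps named in the abstract (a dictionary of k-mer counts, then recursive backtracking over compatible subsequences) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `kmer_counts` and `reconstruct` are hypothetical, and the sketch assumes the k-mer counts have already been read from the FCGR pixels, so it starts from a sequence's own counts instead of an image.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq (assumed stand-in for reading FCGR pixels)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def reconstruct(counts, k):
    """Return every sequence whose k-mer counts equal `counts`, found by backtracking."""
    remaining = Counter(counts)          # working copy, decremented as k-mers are used
    total = sum(remaining.values())      # number of k-mers a full candidate must use
    results = set()

    def extend(seq, used):
        if used == total:                # every k-mer consumed: seq is a candidate
            results.add(seq)
            return
        suffix = seq[-(k - 1):] if k > 1 else ""
        for base in "ACGT":
            kmer = suffix + base         # next k-mer must overlap the last k-1 bases
            if remaining[kmer] > 0:
                remaining[kmer] -= 1
                extend(seq + base, used + 1)
                remaining[kmer] += 1     # undo the choice (backtrack)

    for start in list(remaining):        # try each available k-mer as the first one
        if remaining[start] > 0:
            remaining[start] -= 1
            extend(start, 1)
            remaining[start] += 1
    return sorted(results)


if __name__ == "__main__":
    original = "ACGTAC"
    candidates = reconstruct(kmer_counts(original, 3), 3)
    print(original in candidates, len(candidates))
```

In this sketch the search can return more than one candidate, because k-mer counts alone do not always determine a unique sequence.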