Use of Data Integration and Machine Learning to Identify Cancer Biomarkers, Aiding in Diagnosis and Treatment — 75a — Grace Goeden, Bichar Shrestha Gurung
Analyzing data by hand is inefficient and often inaccurate. Machine learning algorithms and graphical components allow large datasets to be processed and represented visually. Patient data for colorectal cancer (CRC) and other intestinal and rectal cancers were obtained from The Cancer Genome Atlas (TCGA) database, focusing on the tumor microenvironment (TME) and its associated microbes. The data were integrated using machine learning and other computational techniques to extract the relevant features. The integrated data indicated that no single microbe is the cause of CRC; instead, multiple microbes are implicated. Further analysis identified the top 20 microbial biomarkers for each tumor stage. Between 40% and 45% of the microbial biomarkers at each tumor stage had not been identified in previous research. These findings pave the way for further machine learning analysis that could predict cancer stage from microbial biomarkers and predict the most effective treatment for each patient. With modifications, the same process can be applied to other cancers. In conclusion, data integration tools offer an efficient and accurate way to analyze data and identify trends and patterns. Data integration and machine learning show promise for future applications to genomic data.
University of South Dakota
Etienne Gnimpieba
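The abstract does not state how the top 20 microbial biomarkers per tumor stage were selected. As a minimal sketch of one possible approach, the Python below ranks microbes for each stage by the difference between their mean abundance in that stage and their mean abundance in all other stages. The function name, the input layout, the scoring rule, and the toy values are all assumptions made for illustration; they are not the authors' pipeline and not TCGA data.

```python
from collections import defaultdict
from statistics import mean


def top_microbes_per_stage(samples, k=20):
    """Return {stage: [microbe, ...]} with at most k microbes per stage.

    samples: list of (stage, {microbe: abundance}) pairs (hypothetical layout).
    Score (an assumption, not the authors' method): mean abundance within
    the stage minus mean abundance across all other stages.
    """
    by_stage = defaultdict(list)
    microbes = set()
    for stage, abundances in samples:
        by_stage[stage].append(abundances)
        microbes.update(abundances)

    ranking = {}
    for stage, in_stage in by_stage.items():
        # Pool every sample that does not belong to the current stage.
        others = [a for s, group in by_stage.items() if s != stage for a in group]
        scores = {}
        for m in microbes:
            inside = mean(a.get(m, 0.0) for a in in_stage)
            outside = mean(a.get(m, 0.0) for a in others) if others else 0.0
            scores[m] = inside - outside
        # Highest score first; ties broken alphabetically for determinism.
        ranking[stage] = sorted(microbes, key=lambda m: (-scores[m], m))[:k]
    return ranking


# Made-up toy values, for illustration only.
toy = [
    ("Stage I", {"microbe_A": 0.30, "microbe_B": 0.10}),
    ("Stage I", {"microbe_A": 0.20, "microbe_B": 0.10}),
    ("Stage II", {"microbe_A": 0.05, "microbe_B": 0.40}),
]
result = top_microbes_per_stage(toy, k=1)
```

A simple mean difference is used here only because it is easy to read; a real analysis would more likely rely on a statistical test or model-based feature importance and would need to handle normalization and multiple-testing correction.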