BICOB-2024: Papers with Abstracts

Papers
Abstract. Residue-residue contact prediction in a protein is one of the most widely used and informative intermediate steps toward predicting the complete 3D structure of a protein. While most previous studies rely on statistical analysis of sequence properties to infer these contacts, some recent methods based on natural language processing models have succeeded at the task. However, most of these methods and models are built for globular proteins and are not intended for specific types of proteins such as transmembrane proteins, which comprise about 30% of the proteome in most organisms and play important roles in cellular processes. In this study, we propose a Transmembrane Protein Helices Contacts predictor (TMHC-MSA) that utilizes features extracted by a protein language model called MSA Transformer and incorporates neighborhood information to enhance the quality of the produced contact map. Our proposed model outperforms the state-of-the-art method by an average of 7% in terms of L precision and surpasses the MSA Transformer by an average of 2.5% on the same metric. Furthermore, we demonstrate that the more accurate contact map produced by our model can be used to generate a more accurate 3D structure.
Abstract. The study of infectious diseases in humans has become increasingly important in public health. This paper extends the SEIR model to include unreported COVID-19 cases (U) and environmental white noise. Dynamic analysis is conducted based on the variation of the environment. The ergodicity and stationary distribution criteria are discussed. Using a Lyapunov function, we derive sufficient conditions for disease extinction. For different intensities of stochastic noise, we calculate the extinction threshold of the stochastic epidemic system. Stochastic noise plays an important role in controlling the spread of the disease. A numerical simulation and a fit to real data show that the model and theoretical results are valid.
Abstract. Single-cell RNA sequencing (scRNA-seq) provides expression profiles of individual cells but fails to preserve crucial spatial information. On the other hand, Spatial Transcriptomics technologies are able to analyze specific regions within tissue sections, but lack the capability to examine them at single-cell resolution. To overcome these issues, we present Single-cell and Spatial transcriptomics Alignment (SSA), a novel technique that employs an optimal transport algorithm to assign individual cells from a scRNA-seq atlas to their spatial locations in actual tissue based on their expression profiles. SSA has demonstrated superior performance compared to the existing methods SpaOTsc, Tangram, Seurat and DistMap on 10 semi-simulated datasets generated from a high-resolution spatial transcriptomics human breast cancer dataset with 100,064 cells. This advancement provides a refined tool for researchers to delve deeper into the relationship between cellular spatial organization and gene expression.
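To make the idea of assigning cells to locations by optimal transport concrete, the following is a minimal sketch of entropy-regularized optimal transport solved with generic Sinkhorn iterations. It is not the SSA algorithm itself, whose details the abstract does not give. The function names, the squared-Euclidean cost, the uniform marginals, the regularization strength `eps`, and the toy expression vectors are all assumptions made for illustration.

```python
import math


def squared_distance(x, y):
    """Squared Euclidean distance between two expression vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def sinkhorn_plan(cost, a, b, eps=0.1, n_iter=200):
    """Entropy-regularized transport plan between marginals a (rows) and b (columns)."""
    n, m = len(cost), len(cost[0])
    # Gibbs kernel: low cost -> large kernel entry.
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def assign_cells(cells, spots, eps=0.1):
    """Map each cell to the spot that receives most of its transported mass."""
    cost = [[squared_distance(c, s) for s in spots] for c in cells]
    a = [1.0 / len(cells)] * len(cells)   # assumption: uniform mass per cell
    b = [1.0 / len(spots)] * len(spots)   # assumption: uniform mass per spot
    plan = sinkhorn_plan(cost, a, b, eps)
    assignment = [max(range(len(spots)), key=lambda j: row[j]) for row in plan]
    return assignment, plan


# Toy data (hypothetical): 3 cells and 3 spots described by 2 genes each.
cells = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
spots = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
assignment, plan = assign_cells(cells, spots)
print(assignment)
```

In this toy case each cell has exactly one spot with an identical profile, so the plan concentrates almost all of each cell's mass on that spot. On real data the plan is soft, and taking the per-row maximum is only one of several ways to turn it into a hard assignment.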
Abstract. This paper presents a disease clustering approach that uses the biological process annotations from the Gene Ontology as the only data source for clustering diseases. As a result, a disease within a cluster should be more similar to all other diseases in the same cluster than to any disease in other clusters. Essentially, the clustering task is an unsupervised machine learning technique that attempts to discover and learn hidden patterns from the disease information to place similar diseases together in the same cluster. We used two independent validations to examine our results. We examined the path length between disease pairs in the same cluster versus pairs in two separate clusters by utilizing semantic relationships from the Disease Ontology. We also utilized recently published results on disease similarity from a comprehensive study. Our experimental results are encouraging and agree closely with both validation methods. Specifically, most diseases placed in one cluster by our method are more similar to one another than to any disease in other clusters, according to the validation results.
Abstract. This study addresses the pressing need for effective methods of detecting Attention-Deficit/Hyperactivity Disorder (ADHD), a neurodevelopmental condition that significantly impacts individuals' attention, impulse control, and activity regulation. Leveraging advancements in machine learning and wearable technology, the research explores the potential of Heart Rate Variability (HRV) data as a novel source for ADHD detection. Six machine learning algorithms (Logistic Regression, Random Forest, XGBoost, LightGBM, Neural Network, and Support Vector Machine) were rigorously investigated using an HRV dataset, marking a pioneering effort in utilizing HRV data for ADHD identification. The results demonstrate promising performance, with Logistic Regression exhibiting the highest F1 score (0.71) and Support Vector Machine achieving the highest Matthews Correlation Coefficient (0.44). This study showcases the capacity of machine learning with HRV data for identifying ADHD, contributing to the evolving landscape of machine learning applications in mental health diagnostics.
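The two metrics reported above can rank classifiers differently because the F1 score ignores true negatives while the Matthews Correlation Coefficient uses all four cells of the confusion matrix. The sketch below computes both from binary labels using their standard definitions. The helper names and the toy labels are hypothetical and are not taken from the study's data.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; true negatives play no role."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def matthews_corrcoef(tp, fp, fn, tn):
    """Correlation between predicted and true labels, in the range [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels (hypothetical): 1 = ADHD, 0 = control.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f1_score(tp, fp, fn))               # 0.75
print(matthews_corrcoef(tp, fp, fn, tn))  # 0.5
```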
Abstract. Deep learning has achieved great success in detecting COVID-19 from CT scan images. However, existing models lack generalization ability. For example, a model with high prediction accuracy developed on one dataset cannot be used to predict on another dataset. Thus, developing a robust deep learning model with strong generalization ability is a significant need. In this paper, we first apply three deep learning models, namely a convolutional neural network (CNN), a capsule neural network (CapsNet) and a vision transformer (ViT), and test their generalization abilities. Then, we develop and hypertune the models based on transfer learning to generalize the model performance to new datasets. However, the transfer learning technique suffers from the catastrophic forgetting issue, which leads to lower prediction accuracy on the original training dataset. Lastly, we apply continual learning based on a modified elastic weight consolidation (EWC) regularization technique to address the catastrophic forgetting issue and improve the models’ prediction accuracy on both new and original training datasets. Our cross-data validation results show that our proposed models not only achieve better prediction accuracy, up to 97.85%, than the existing state-of-the-art models, but that the proposed models with EWC also show great generalization ability and retain higher prediction accuracy on both the new dataset and the training dataset. Extensive experiments show that our proposed COVID-CNN model with EWC outperforms ViT and CapsNet with an 82.26% knowledge retention rate on the original training dataset. Our code is available at https://github.com/astonish24/-QinggeLab BICOB24.
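As a rough illustration of how elastic weight consolidation counters catastrophic forgetting, the sketch below implements the standard EWC quadratic penalty, (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2, which pulls parameters that were important for the original task (large Fisher information F_i) back toward their earlier values theta_star_i. This is the textbook form, not the modified variant the abstract refers to. The function names, the flat parameter lists, and the numbers are assumptions made for illustration.

```python
def ewc_penalty(params, anchor_params, fisher, lam):
    """Standard EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )


def ewc_penalty_gradient(params, anchor_params, fisher, lam):
    """Gradient of the penalty with respect to each parameter."""
    return [lam * f * (p - a) for p, a, f in zip(params, anchor_params, fisher)]


def total_loss(new_task_loss, params, anchor_params, fisher, lam):
    """Loss on the new dataset plus the penalty protecting the old task."""
    return new_task_loss + ewc_penalty(params, anchor_params, fisher, lam)


# Toy values (hypothetical): two parameters, the second one unchanged.
params = [1.0, 2.0]         # current weights while training on the new dataset
anchor_params = [0.0, 2.0]  # weights learned on the original dataset
fisher = [4.0, 10.0]        # diagonal Fisher information from the original dataset
lam = 0.5                   # assumed regularization strength

print(ewc_penalty(params, anchor_params, fisher, lam))           # 1.0
print(ewc_penalty_gradient(params, anchor_params, fisher, lam))  # [2.0, 0.0]
```

Only the first parameter has moved away from its anchor, so it alone contributes to the penalty and its gradient, however large the Fisher value of the second parameter is.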
Abstract. Diabetic Retinopathy (DR), a leading cause of vision impairment, necessitates early and precise detection. To address this, we developed a Convolutional Neural Network (CNN) model and tuned three popular pre-trained models, namely VGG16, Xception, and MobileNetV2, to suit the specific characteristics of our dataset. To better understand the functioning of these deep learning algorithms, Explainable AI (XAI) techniques, such as CAM and Grad-CAM++, were employed to highlight the crucial features influencing the model’s classifications. This study extends to the realm of imaging analysis, emphasizing the critical importance of carefully selecting and customizing models to ensure precise and dependable diagnosis of complex conditions such as DR. Notably, the VGG16 model exhibited strong performance in identifying cases categorized as ‘Moderate’ and ‘No DR’, achieving accuracies of 0.90 and 0.98, respectively. Similarly, both Xception and MobileNetV2 demonstrated promising results in the DR categories. Remarkably, our custom CNN model, tailored for our dataset, achieved an accuracy of 0.986 in identifying cases without DR (‘No DR’). These results underscore the effectiveness of the trained deep learning models in accurately diagnosing DR.
Abstract. This article proposes a visual analytic framework for monitoring and evaluating preventive health programs. The Centers for Disease Control and Prevention (CDC) developed an evaluation framework that focuses on a set of guidelines for public health professionals to evaluate public health programs. This article underlines a growing need for a visual analytic framework to support public health professionals with tasks related to managing programmatic activities and to help them monitor and evaluate ongoing efforts to plan for future programs. Visual analytic frameworks are conceptualized to address domain-specific tasks that equip domain experts with analytical reasoning to make better decisions. We present the Tobacco Reporting and Progress System (TRAPS), a visual analytic system used for managing and evaluating tobacco cessation programs in Mississippi. We assessed the TRAPS data portal based on user logs and the program evaluator’s observations while using the system to evaluate tobacco control programs. The TRAPS data portal could also be used to help monitor and report on other preventive public health programs with similar needs.
Abstract. This paper investigates the sustainability of an ecosystem that involves the consumption and reproduction of wildlife on a day-by-day basis in addition to the growth of plants. Unlike traditional approaches such as reinforcement learning algorithms or predator-prey dynamical system analysis, we applied simulation techniques and developed computer programs that manage the evolution of the system. The results provide a visualization of the system. Limitations and further improvements of the study are also discussed.
Abstract. Stochastic approaches to the reaction-diffusion master equation (RDME) are commonly employed in systems biology to model the intrinsic randomness of diffusing molecular species. For accurate modeling and numerical simulation of the reaction-diffusion process, parameter estimation from experimental or synthetic data is a topic of interest. Parameter estimation is a challenging task in the stochastic RDME since the reaction rate parameters are always coupled with the diffusion rate parameters, and the state of the system itself is random. We present a fitting scheme based on maximum likelihood estimation (MLE) to approximate both the reaction and diffusion rate parameters. The quality of the method is evaluated by applying it to two case studies from systems biology: the birth-death process and the annihilation system. The results obtained from our experiments demonstrate a reasonable approximation of the estimated parameters compared to the true parameter values.
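For intuition about maximum likelihood estimation on the birth-death case study, the sketch below treats a much simpler setting than the paper's: a well-mixed birth-death process with no diffusion, observed completely. Under those assumptions the MLEs have closed forms: the birth rate estimate is the number of births divided by the observation time, and the per-molecule death rate estimate is the number of deaths divided by the time integral of the copy number. The synthetic trajectory is generated with a Gillespie simulation. All function names, parameter values, and the seed are assumptions, and this is not the authors' fitting scheme.

```python
import random


def simulate_birth_death(k_birth, k_death, n0, t_end, seed=0):
    """Gillespie simulation; returns (births, deaths, integral of n(t) dt)."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    births = deaths = 0
    integral_n = 0.0
    while True:
        rate_birth = k_birth      # zeroth-order production
        rate_death = k_death * n  # first-order degradation
        total = rate_birth + rate_death
        dt = rng.expovariate(total)  # waiting time to the next event
        if t + dt > t_end:
            integral_n += n * (t_end - t)
            break
        integral_n += n * dt
        t += dt
        if rng.random() * total < rate_birth:
            n += 1
            births += 1
        else:
            n -= 1
            deaths += 1
    return births, deaths, integral_n


def mle_rates(births, deaths, integral_n, t_end):
    """Closed-form MLEs for a fully observed birth-death trajectory."""
    return births / t_end, deaths / integral_n


# Synthetic experiment with assumed true rates k_birth = 10.0 and k_death = 0.5.
t_end = 2000.0
births, deaths, integral_n = simulate_birth_death(10.0, 0.5, 20, t_end, seed=1)
k_birth_hat, k_death_hat = mle_rates(births, deaths, integral_n, t_end)
print(k_birth_hat, k_death_hat)
```

With about 20,000 events of each type, the estimates should land within a few percent of the assumed true rates. In the full RDME setting, reaction and diffusion events are coupled across compartments, so estimates of this simple kind no longer decouple.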
Abstract. Relationship inference from sparse data is an important task with applications ranging from product recommendation to drug discovery. A recently proposed linear model for sparse matrix completion has demonstrated a surprising advantage in speed and accuracy over more sophisticated recommender system algorithms. Here we extend the linear model to develop a shallow autoencoder for the dual neighborhood-regularized matrix completion problem. We demonstrate the speed and accuracy advantage of our approach over the existing state-of-the-art in predicting drug-target interactions and drug-disease associations.