CS373 - Final Presentation Schedule


CSCI 373: Spring 2025 Final Presentation Schedule

Location: Room 251, Main Building, CSB and Virtual via Zoom

Family and friends are welcome to attend in person or via Zoom.

For virtual participation, register via Zoom to receive the connection link via email. The same Zoom link works for both sessions.

Register in advance: Zoom Registration Link

Tuesday, May 13, 2025

Session 1: 10:00 AM - 12:30 PM

Time Speaker Title Abstract
10:00 AM Anna Byron From Distance to Discovery: K-Nearest Neighbors and the Science of Music Recommendation Music-streaming platforms rely on strong recommendation systems to keep users engaged, which means accuracy and fairness are both critical. This paper explores how the K-Nearest Neighbors (KNN) algorithm performs in music recommendation tasks using audio and metadata features. I walk through a content-based filtering system that uses KNN to find similar songs based on values like tempo, energy, acousticness, and genre. Each song is turned into a vector, and recommendations are made using Euclidean distance. I also look at how recent hybrid systems combine KNN with matrix factorization and other techniques to handle cold-start problems and data sparsity. Some of these models have been competitive in large-scale challenges, and they show how flexible KNN can be when paired with other tools. At the same time, I highlight how biases can sneak in through things like skewed training data or poorly chosen distance metrics. The findings suggest that KNN remains a practical and interpretable tool for music recommendation, particularly when optimized and paired with fairness-aware techniques.
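
A minimal NumPy sketch of the content-based KNN step this abstract describes: each song becomes a feature vector and neighbors are ranked by Euclidean distance. The feature columns, values, and titles below are invented placeholders, not the speaker's dataset.

    import numpy as np

    # Toy catalog: each row is one song's feature vector
    # (columns: tempo, energy, acousticness); all values are made up.
    catalog = np.array([
        [0.62, 0.80, 0.05],
        [0.60, 0.75, 0.10],
        [0.30, 0.20, 0.90],
        [0.58, 0.70, 0.15],
    ])
    song_titles = ["Track A", "Track B", "Track C", "Track D"]

    def recommend(query_vec, k=2):
        """Return the k catalog songs closest to query_vec by Euclidean distance."""
        dists = np.linalg.norm(catalog - query_vec, axis=1)
        nearest = np.argsort(dists)[:k]
        return [(song_titles[i], float(dists[i])) for i in nearest]

    # Recommend songs similar to an upbeat, low-acousticness query track.
    print(recommend(np.array([0.61, 0.78, 0.08]), k=2))
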
10:15 AM Dylan Cummings Data-Driven Scouting: Applying Machine Learning to Soccer Player Acquisition Soccer player acquisition, traditionally reliant on subjective scouting, is increasingly leveraging Machine Learning (ML) for objective, data-driven insights. This paper outlines an ML pipeline for player evaluation, starting with data collection from sources like Opta (event data) and Second Spectrum (tracking data), followed by essential preprocessing including feature weighting. Key supervised learning algorithms, notably Random Forest and XGBoost classifiers, are discussed along with their hyperparameters and comparative strengths. Applications explored include talent identification, injury risk prediction, and transfer fee estimation. While ML offers significant advantages, limitations such as data accessibility, inherent unpredictability in sports, and potential biases are acknowledged. The integration of ML promises to enhance decision-making, optimize spending, and provide a competitive edge in the dynamic soccer transfer market.
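
A hedged sketch of the kind of supervised pipeline the abstract outlines, using scikit-learn's RandomForestClassifier on synthetic features; the feature meanings and labels are invented for illustration and are not Opta or Second Spectrum data.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Synthetic per-player features (stand-ins for event/tracking metrics
    # such as passes per 90, tackle success rate, expected goals).
    X = rng.normal(size=(500, 3))
    # Toy label: "acquire" if a weighted mix of the features is high.
    y = (X @ np.array([0.8, 0.5, 1.2]) + rng.normal(scale=0.5, size=500) > 0.5).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Two hyperparameters typical for tree ensembles: tree count and depth.
    model = RandomForestClassifier(n_estimators=200, max_depth=5, random_state=0)
    model.fit(X_train, y_train)
    print("held-out accuracy:", model.score(X_test, y_test))
    print("feature importances:", model.feature_importances_)
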
10:30 AM Emily Hed Explainable AI in Machine Learning Models Using SHAP As artificial intelligence (AI) becomes increasingly embedded in decision-critical systems, the need for transparency and accountability in model predictions grows urgent. This paper addresses the problem of explainability in machine learning (ML) by examining SHAP (SHapley Additive exPlanations), a widely used framework for interpreting model outputs. SHAP, grounded in cooperative game theory, attributes a model’s predictions to input features using Shapley values, offering both global and local explanations. This work explores the mathematical underpinnings of SHAP, including its fairness axioms and computational optimizations via TreeSHAP, and evaluates its effectiveness across diverse data types and visual tools. However, recent research reveals limitations in SHAP’s ability to reflect logical feature relevance, despite its mathematical rigor, suggesting caution in its interpretive reliability. These findings imply that while SHAP is a powerful tool for model interpretation, it must be complemented with other methods to ensure trustworthy AI explanations.
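
A small, self-contained sketch of the Shapley attribution idea the abstract builds on: exact Shapley values for a toy two-feature model, where a feature "absent" from a coalition is replaced by a baseline value. This is a simplification of what the SHAP library computes far more efficiently (e.g. via TreeSHAP); the model and inputs are invented.

    from itertools import combinations
    from math import factorial

    def shapley_values(model, x, baseline):
        """Exact Shapley attribution of model(x) relative to model(baseline).
        Exponential in the number of features, so only for tiny examples."""
        n = len(x)
        def v(S):
            z = [x[i] if i in S else baseline[i] for i in range(n)]
            return model(z)
        phi = []
        for i in range(n):
            others = [j for j in range(n) if j != i]
            total = 0.0
            for r in range(len(others) + 1):
                for S in combinations(others, r):
                    w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                    total += w * (v(set(S) | {i}) - v(set(S)))
            phi.append(total)
        return phi

    # Toy model with an interaction term, explained against a zero baseline.
    model = lambda z: 2 * z[0] + z[1] + 0.5 * z[0] * z[1]
    phi = shapley_values(model, x=[1.0, 2.0], baseline=[0.0, 0.0])
    print(phi, "sum:", sum(phi))  # attributions sum to model(x) - model(baseline)
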
10:45 AM Sophia Maldonado Optimizing Photo Management with PHash and Hamming Distance This presentation explores the fundamentals of perceptual hashing, including its reliance on the Discrete Cosine Transform (DCT) to extract low-frequency image components. It also examines the role of Hamming distance in measuring image similarity and discusses potential challenges, such as false positives and negatives. By analyzing how these techniques process image data, this paper highlights their effectiveness in detecting near-duplicate images and optimizing storage management.
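
A compact sketch of one common DCT-based perceptual-hash recipe, using NumPy and SciPy; the block size and median thresholding follow a typical variant and may differ from the specific implementation discussed in the talk.

    import numpy as np
    from scipy.fft import dct

    def phash(gray_img_32x32):
        """Perceptual hash of a 32x32 grayscale image (NumPy array):
        2-D DCT, keep the 8x8 low-frequency block, threshold at its median."""
        d = dct(dct(gray_img_32x32.astype(float), axis=0, norm="ortho"),
                axis=1, norm="ortho")
        low = d[:8, :8]
        return (low > np.median(low)).flatten()  # 64-bit fingerprint

    def hamming(bits_a, bits_b):
        """Number of differing bits; small distances suggest near-duplicates."""
        return int(np.count_nonzero(bits_a != bits_b))

    # A structured test image (diagonal gradient) and a lightly noised copy.
    x = np.arange(32)
    img = (x[:, None] + x[None, :]).astype(float) * 4
    noisy = img + np.random.default_rng(0).normal(scale=3.0, size=img.shape)
    print(hamming(phash(img), phash(noisy)))  # typically small for a near-duplicate
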
11:00 AM Andrew Nerud Evaluating Hybrid Quantum Neural Networks: Efficiency, Viability, and the Path Forward Hybrid Quantum Neural Networks (HQNNs) aim to combine quantum circuits with classical deep learning to address key challenges in model efficiency, parameter reduction, and generalization in high-dimensional tasks. As quantum machine learning gains momentum, understanding HQNN viability is critical for resource-constrained and real-world AI applications. However, HQNN adoption faces serious challenges, including quantum noise, simulation bottlenecks, and hybrid optimization complexity. To evaluate HQNNs’ practical potential, this study investigates their performance through direct experimentation on handwritten digit classification and comparative analysis across medical imaging, chemistry, and NLP domains. Results show HQNNs can reduce parameter counts by up to 50%, lower floating-point operations by 30%, and maintain or surpass classical model accuracy — though they incur significantly longer training times. These findings suggest HQNNs offer meaningful computational and generalization advantages for small-data and embedded applications, even as they remain unsuitable for large-scale deployment under current hardware constraints. Overall, this research positions HQNNs as promising candidates for near-term hybrid quantum AI, while emphasizing the need for continued advances in quantum hardware and hybrid optimization methods to realize their full potential.
11:15 AM Break
11:30 AM Jason Smith AI Computer Vision for Wildlife Observation Monitoring wildlife populations is critical for conservation efforts, yet traditional observation methods are labor-intensive and limited in scale. Advances in computer vision and artificial intelligence offer new opportunities to automate wildlife tracking, improving both efficiency and accuracy. This work addresses the challenge of accurately detecting and tracking wildlife in complex natural environments by applying deep learning and Convolutional Neural Networks (CNNs), focusing on the YOLO object detection framework. Two YOLOv5 models, YOLOv5n and YOLOv5x, were trained on a 7,121-image animal dataset with data augmentation techniques. Results show that YOLOv5n offers approximately 5.2X faster inference speed (6.3 ms vs. 32.9 ms per image) at the cost of lower mean Average Precision (mAP), achieving 25.675%, compared to YOLOv5x’s 43.139% mAP — an improvement of about 17.5 percentage points. These results suggest that model choice should be guided by application constraints such as real-time needs versus accuracy demands, and indicate promising directions for lightweight, high-performance models in field-based wildlife conservation technologies.
11:45 AM Matthew Utsch Symmetric Crypt-Schemes Information is the most important commodity in the modern world, and many technologies have emerged to protect data while it is being transported across potentially insecure channels. Data is protected by scrambling it in an invertible way; the inversion is facilitated by a key, much as a combination lock is opened by three secret numbers within a known range. However, we do not want keys that are too large, so we create ways to use a key smaller than the data to properly secure it. Symmetric crypt-schemes have fallen out of favor due to the overhead required in managing a different key for each open communication channel, and asymmetric “public” schemes have taken center stage. The theory of public schemes is much more complex and is built on the primitives of symmetric schemes. Symmetric schemes were the first available standard crypt-schemes, so studying their behavior is essential to understanding modern schemes.
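
To make the "small key secures larger data" point concrete, here is a deliberately simplified stream-cipher-style sketch in pure Python: a hash-based keystream is XORed with the message. It illustrates invertible scrambling only and is a toy, not a vetted cipher.

    import hashlib
    from itertools import count

    def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
        """Expand a short key into a long pseudorandom keystream by hashing
        key || nonce || counter blocks (a toy counter-mode construction)."""
        out = bytearray()
        for block in count():
            out += hashlib.sha256(key + nonce + block.to_bytes(8, "big")).digest()
            if len(out) >= length:
                return bytes(out[:length])

    def xor_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
        """Encryption and decryption are the same XOR (symmetric, invertible)."""
        ks = keystream(key, nonce, len(data))
        return bytes(a ^ b for a, b in zip(data, ks))

    key, nonce = b"16-byte-demo-key", b"unique-nonce"
    msg = b"a message much longer than the key that still gets scrambled"
    ct = xor_crypt(key, nonce, msg)
    assert xor_crypt(key, nonce, ct) == msg  # the same key inverts the scrambling
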
12:00 PM Jacob Odenthal Modern Fully Homomorphic Encryption: Practical Applications and Performance With global data volumes projected to reach 394 zettabytes by 2028, privacy-preserving computation has become a critical challenge. Fully Homomorphic Encryption (FHE) offers a powerful solution by enabling data processing in encrypted form, but its high computational cost limits widespread adoption. This paper explores the practicality of modern FHE in cloud computing, focusing on the BFV and CKKS schemes—each based on the Ring Learning With Errors (RLWE) problem. BFV enables exact computation at the cost of performance, while CKKS provides efficient approximate arithmetic ideal for machine learning tasks. Libraries such as Microsoft SEAL, Google HEIR, and OpenFHE are working to develop viable implementations. A somewhat-homomorphic prototype is developed to showcase the important features of HE. While FHE remains resource-intensive, ongoing advances in algorithms and tooling are closing the gap between secure encryption and practical deployment in privacy-sensitive domains.
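
A tiny illustration of the "compute on encrypted data" idea, using textbook (unpadded) RSA's multiplicative homomorphism in pure Python. This is far weaker than the lattice-based BFV/CKKS schemes the abstract covers and is not the speaker's prototype; it only shows the core property.

    # Small demonstration values; requires Python 3.8+ for pow(e, -1, m).
    p, q = 61, 53
    n = p * q                          # 3233
    e = 17
    d = pow(e, -1, (p - 1) * (q - 1))  # private exponent

    enc = lambda m: pow(m, e, n)
    dec = lambda c: pow(c, d, n)

    c1, c2 = enc(7), enc(3)
    product_ct = (c1 * c2) % n         # multiply ciphertexts without decrypting
    print(dec(product_ct))             # 21 == 7 * 3 (valid while the product < n)
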
12:15 PM Ellie Wohnoutka Smart Recipes, Smarter Metrics: AI for Personalized Meals In this presentation, I explore how AI can personalize recipe recommendations through collaborative filtering. I compare two key similarity metrics—Pearson correlation and cosine similarity—and examine their influence on the accuracy and efficiency of user-based and item-based recommendation systems. Using structured dietary data and user interaction patterns, I demonstrate that Pearson correlation captures nuanced preferences in sparse datasets, while cosine similarity performs better in dense environments. I also address challenges such as data sparsity, rating bias, and ethical concerns. My findings aim to guide the design of fair and effective AI-powered nutrition tools.
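
The two similarity metrics compared in this talk fit in a few lines of NumPy. The ratings below are invented, and in a real recommender the correlation is computed only over items both users have rated.

    import numpy as np

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    def pearson(u, v):
        # Cosine similarity after mean-centering each vector, which is why it
        # captures preference *patterns* rather than raw rating magnitudes.
        return cosine(u - u.mean(), v - v.mean())

    # Toy ratings of the same five recipes by two users (1-5 scale).
    alice = np.array([5.0, 4.0, 1.0, 2.0, 5.0])
    bob   = np.array([4.0, 3.0, 1.0, 1.0, 4.0])
    print("cosine :", round(cosine(alice, bob), 3))
    print("pearson:", round(pearson(alice, bob), 3))
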

Session 2: 1:30 PM - 4:00 PM

Time Speaker Title Abstract
1:30 PM Dylan Bartness Using AI to Mix Music Convolutional neural networks (CNNs) offer a promising solution to the traditionally manual task of audio equalization by learning from frequency-domain representations of audio data. By analyzing spectrograms, CNNs can identify and replicate the nuanced decisions made by skilled audio engineers, applying gain adjustments across frequency bands in a context-aware manner. The network architecture—comprising convolutional, pooling, and fully connected layers—enables automated equalization that is both consistent and adaptable to different genres or instruments. Evaluation through objective metrics and perceptual listening tests confirms the model’s ability to approximate human judgment. Although computational demands and genre specificity pose limitations, this approach marks a significant step toward intelligent, AI-driven audio processing.
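
A hedged Keras sketch of the convolutional, pooling, and fully connected stack the abstract describes, mapping a spectrogram to per-band gain predictions; the input size, number of bands, and loss are assumptions for illustration, not the speaker's configuration.

    import tensorflow as tf

    NUM_BANDS = 10  # assumed number of EQ bands

    # Spectrogram input of shape (time, frequency, 1); output is one gain per band.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(128, 128, 1)),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(NUM_BANDS),  # predicted gain (dB) per frequency band
    ])
    model.compile(optimizer="adam", loss="mse")
    model.summary()
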
1:45 PM Nathan Courchane Developing AI Agents for Risk: An Analysis of Monte Carlo Tree Search and Reinforcement Learning Developing artificial intelligence for the game of Risk is particularly challenging due to its reliance on strategic thinking, a vast number of potential game states, inherent randomness, and the presence of multiple players. This study explores two significant AI methodologies, Monte Carlo Tree Search (MCTS) and Reinforcement Learning (RL), to assess their effectiveness in creating AI agents capable of playing Risk. MCTS is effective for planning and exploring various options but can demand significant computational resources, whereas RL is adaptable but needs substantial amounts of training data. The research analyzes the advantages and disadvantages of each method to better understand how they can be applied to build AI players for Risk and similar strategy games. Both algorithms can manage Risk’s complexity, but each may require considerable computing power in certain situations. A comparative understanding of MCTS and RL enables informed decisions about which algorithm, or combination thereof, is most suitable for crafting AI for Risk and other strategic games.
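
The selection step at the heart of MCTS can be shown in isolation. Below is a minimal UCT (UCB1) sketch with made-up Risk move statistics; it is not the speaker's agent, just the exploitation-versus-exploration rule.

    import math

    def uct_select(children, exploration=1.4):
        """UCB1 child selection used in MCTS: balance average reward against
        visit counts. `children` holds cumulative 'value' and 'visits'."""
        total_visits = sum(c["visits"] for c in children)
        def score(c):
            if c["visits"] == 0:
                return float("inf")  # always try unvisited moves first
            return c["value"] / c["visits"] + exploration * math.sqrt(
                math.log(total_visits) / c["visits"])
        return max(children, key=score)

    # Toy example: three candidate moves with simulated statistics.
    children = [
        {"move": "attack Alaska", "value": 12.0, "visits": 20},
        {"move": "fortify Ural",  "value":  5.0, "visits":  6},
        {"move": "attack Japan",  "value":  0.0, "visits":  0},
    ]
    print(uct_select(children)["move"])  # the unvisited move is selected here
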
2:00 PM Matt DeRosa Leveraging BERT for Enhanced Spam Detection The growing challenge of sophisticated spam necessitates more effective detection methods. BERT’s contextual language understanding offers a promising avenue for improved accuracy over traditional techniques. This paper explores BERT’s effectiveness in spam detection, aiming to understand its advantages over conventional machine learning and simpler deep learning models by leveraging its bidirectional context processing. This work reviews spam detection techniques, focusing on BERT’s application. It details data preprocessing, model configuration (architecture, embeddings, attention, pre-training), and the fine-tuning process for spam classification. This analysis suggests BERT’s deep contextual understanding and automated feature learning can lead to significantly enhanced spam detection accuracy by capturing subtle linguistic patterns missed by less sophisticated methods. BERT represents a significant advancement in spam detection, offering a path towards more accurate and adaptive filters. Its robust architecture and contextual awareness hold substantial potential for combating increasingly complex spam.
2:15 PM Colin Glynn Optimizing Network Performance with Q-Learning in Software Defined Networks This talk explores the application of Artificial Intelligence (AI), specifically Machine Learning (ML) techniques such as Q-learning (QL), to enhance the capabilities of Software Defined Networks (SDNs). SDNs, with their decoupled control and data planes, offer a flexible and programmable network architecture. By integrating AI/ML, particularly Reinforcement Learning (RL) algorithms like QL, SDN can automate network management, optimize performance, and improve Quality of Service (QoS). This research examines how QL can be leveraged to make intelligent routing decisions based on QoS requirements, addressing the limitations of traditional routing algorithms like Dijkstra’s. We also review existing literature on AI applications in SDNs, including intelligent routing and security enhancements, as well as specific implementations of QL and Deep Q-learning (DQL) for dynamic routing and QoS optimization.
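
For readers new to QL, the entire learning rule fits in one line. This tabular sketch uses an invented state/action encoding (switches and outgoing links) and made-up rewards rather than any particular SDN controller API.

    import numpy as np

    # Tabular Q-learning for QoS-aware routing: states are switches,
    # actions are outgoing links, rewards penalize observed latency.
    n_states, n_actions = 5, 3
    Q = np.zeros((n_states, n_actions))
    alpha, gamma = 0.1, 0.9      # learning rate, discount factor

    def q_update(s, a, reward, s_next):
        """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
        td_target = reward + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (td_target - Q[s, a])

    # One illustrative transition: from switch 0, link 2 reached switch 3
    # with reward -4.0 (e.g. a penalty proportional to measured delay).
    q_update(s=0, a=2, reward=-4.0, s_next=3)
    print(Q[0])
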
2:30 PM Benjamin Hennen Deep Learning Shakes Up Earthquake Early Warning Systems Effective Earthquake Early Warning (EEW) systems are vital for public safety, demanding rapid and reliable seismic detection. Traditional STA/LTA algorithms offer speed but suffer from high false alarm rates; Convolutional Neural Networks (CNNs) are investigated as a more robust alternative. This paper comparatively analyzes STA/LTA and CNN methods for EEW, evaluating their accuracy, speed, computational cost, and robustness using seismic data. STA/LTA provides computational efficiency but is sensitive to noise. CNNs demonstrate significantly higher accuracy (>94% reported) and robustness by learning complex features, albeit at greater computational expense. CNNs represent a more accurate and robust approach for EEW, particularly in noisy environments. Integrating deep learning, like CNNs, is crucial for advancing future EEW system reliability.
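
Because the abstract contrasts STA/LTA with CNNs, here is a small NumPy sketch of the STA/LTA side on a synthetic trace; the window lengths and trigger threshold are illustrative defaults, not tuned values from the paper.

    import numpy as np

    def sta_lta(trace, sta_len=50, lta_len=500):
        """Classic STA/LTA trigger: ratio of short-term average energy in the
        most recent window to the long-term average in the window before it."""
        energy = trace.astype(float) ** 2
        csum = np.concatenate(([0.0], np.cumsum(energy)))
        ratio = np.zeros_like(energy)
        for i in range(sta_len + lta_len, len(energy) + 1):
            sta = (csum[i] - csum[i - sta_len]) / sta_len
            lta = (csum[i - sta_len] - csum[i - sta_len - lta_len]) / lta_len
            ratio[i - 1] = sta / max(lta, 1e-12)
        return ratio

    rng = np.random.default_rng(0)
    trace = rng.normal(scale=0.1, size=5000)
    trace[3000:3200] += rng.normal(scale=1.0, size=200)   # synthetic arrival
    ratio = sta_lta(trace)
    print("first trigger index:", int(np.argmax(ratio > 5)))  # near sample 3000
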
2:45 PM Break
3:00 PM Logen Landowski Beyond the Box Score: Using Graph Theory to Decode Passing Networks in Basketball Traditional basketball metrics like assists and field goal percentages offer limited insight into the true structure and flow of team offenses. This paper presents a graph-theoretic framework for analyzing passing networks, where players are modeled as nodes and passes as directed, weighted edges. By applying key concepts such as degree centrality, clustering coefficients, shortest path length, graph partitioning, and minimum cut analysis, we uncover patterns in ball movement, offensive cohesion, and player roles that conventional statistics fail to capture. Drawing from NBA play-by-play data, we construct and analyze real-world passing networks to identify facilitators, offensive bottlenecks, and lineup dependencies. Our findings suggest that graph-based analysis not only enhances our understanding of team chemistry and tempo but also lays the groundwork for real-time strategy optimization and cross-sport applications.
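
A small networkx sketch of the node/weighted-edge model this abstract describes, with invented pass counts rather than NBA play-by-play data; it shows two of the measures mentioned (degree-style strength and a shortest-path centrality), not the full analysis.

    import networkx as nx

    # Toy passing network: nodes are players, directed edges carry pass counts.
    passes = [
        ("PG", "SG", 42), ("PG", "SF", 31), ("PG", "C", 18),
        ("SG", "PG", 25), ("SG", "PF", 12), ("SF", "C", 20),
        ("C", "PF", 9),   ("PF", "PG", 14),
    ]
    G = nx.DiGraph()
    G.add_weighted_edges_from(passes)

    # For shortest-path measures, convert pass counts into distances
    # (more passes between two players = "closer").
    for u, v, d in G.edges(data=True):
        d["distance"] = 1.0 / d["weight"]

    # Weighted out-degree highlights primary facilitators;
    # betweenness flags players the ball must flow through.
    print("out-strength:", dict(G.out_degree(weight="weight")))
    print("betweenness :", nx.betweenness_centrality(G, weight="distance"))
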
3:15 PM James Strabala Sequence Isn’t Enough: The Future of Structure-Aware Protein Language Models Designing new proteins could transform medicine, materials science, and biotechnology, but predicting functional sequences remains a challenge due to the complexity of protein folding. This work explores the capabilities and limitations of generative protein language models (PLMs), which apply transformer-based architectures to protein sequences, analogous to natural language. I developed a simple PLM trained on a small dataset, which failed to generate realistic protein sequences, and compared it to ProtGPT2 and ESM3, two state-of-the-art PLMs, the latter of which incorporates spatial information and structural tagging. When the models were prompted with partial sequences, ESM3’s products had more working motifs. These findings suggest that next-generation PLMs will rely increasingly on non-sequential data and spatial relationships to improve biological plausibility. As proprietary PLMs emerge and real-world synthesis becomes more common, the impact of generative PLMs is likely to grow.
3:30 PM Tristin Thilmany LSTM Networks for Economic Time Series: Forecasting Unemployment. Accurate unemployment forecasting is vital for economic stability but challenged by complex, non-linear data dynamics. This project aimed to enhance forecasting accuracy by developing a Long Short-Term Memory (LSTM) network model. Using historical US economic data from FRED, we preprocessed inputs by handling missing values and scaling features. An LSTM model, built with Python (TensorFlow/Keras), was trained on data up to 2018 and optimized using Bayesian methods. The model predicted the next three time steps based on the previous twelve. Testing on 2019-2025 data showed the model effectively learned patterns (decreasing MSE) and reasonably tracked actual unemployment rates, including the 2020 surge. Results confirm LSTMs are a valuable tool for unemployment forecasting, demonstrating their ability to capture complex dependencies, though performance relies on careful data preparation and hyperparameter tuning.
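
A condensed Keras sketch of the 12-steps-in, 3-steps-out setup the abstract describes, trained here on a synthetic series rather than FRED data; the layer size, epochs, and scaling are placeholders (the project used Bayesian hyperparameter optimization).

    import numpy as np
    import tensorflow as tf

    LOOKBACK, HORIZON = 12, 3   # predict 3 steps from the previous 12

    def make_windows(series, lookback=LOOKBACK, horizon=HORIZON):
        """Slice a 1-D series into (lookback -> horizon) supervised pairs."""
        X, y = [], []
        for i in range(len(series) - lookback - horizon + 1):
            X.append(series[i:i + lookback])
            y.append(series[i + lookback:i + lookback + horizon])
        return np.array(X)[..., None], np.array(y)

    # Synthetic stand-in for a scaled unemployment series.
    t = np.arange(300, dtype=float)
    series = 0.5 + 0.1 * np.sin(t / 12.0) + np.random.default_rng(0).normal(scale=0.01, size=t.size)

    X, y = make_windows(series)
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(LOOKBACK, 1)),
        tf.keras.layers.LSTM(32),
        tf.keras.layers.Dense(HORIZON),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=5, verbose=0)
    print(model.predict(X[-1:], verbose=0))  # forecast of the next 3 scaled values
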
3:45 PM Thomas VanDenEinde Overcoming Transpilation Failures with A* Converting code between different high-level languages often requires numerous individual transpilers, one for each target language or version. This inefficiency limits flexibility in software development and migration. This paper introduces a delta framework designed to automatically create unique transpilations for any combination of source and target languages, thus overcoming the limitations of traditional stand-alone transpilers. The proposed framework utilizes small, reusable transpilation units called “deltas” that transform code representations (Concrete Syntax Trees) while preserving semantics. It employs the A* search algorithm to efficiently find an optimal sequence of deltas to translate code from a source language to a specified target language grammar. The framework was evaluated using custom-designed languages (S, S++, S#). The delta framework successfully transpiled code between the test languages in under a second. The number of deltas needed varies based on the complexity and differences between the languages involved. The approach allows developers flexibility in defining translation specifics through manual delta creation. This delta-based approach offers a viable solution for automatically generating transpilers, providing enhanced developer control over the translation process. While effective, the framework’s capability is dependent on the set of available, manually created deltas, which may pose limitations for highly complex translation tasks.
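
Since the framework's search step is standard A*, here is a generic heapq-based A* sketch applied to a toy "delta" graph; the state encoding (language labels instead of concrete syntax trees), the unit costs, and the zero heuristic are placeholders, not the paper's actual design.

    import heapq

    def a_star(start, goal_test, neighbors, heuristic):
        """Generic A*: `neighbors(state)` yields (next_state, step_cost, action);
        returns the list of actions on a lowest-cost path, or None."""
        frontier = [(heuristic(start), 0, start, [])]
        best_cost = {start: 0}
        while frontier:
            _, cost, state, path = heapq.heappop(frontier)
            if goal_test(state):
                return path
            for nxt, step, action in neighbors(state):
                new_cost = cost + step
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    heapq.heappush(frontier,
                                   (new_cost + heuristic(nxt), new_cost, nxt, path + [action]))
        return None

    # Toy stand-in for the delta search: unit-cost "deltas" between languages.
    deltas = {"S": [("S++", 1, "add-classes")], "S++": [("S#", 1, "add-generics")]}
    plan = a_star("S", lambda s: s == "S#",
                  lambda s: deltas.get(s, []), lambda s: 0)
    print(plan)  # ['add-classes', 'add-generics']
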