Applied Machine Learning in Data-Driven Systems: Case Studies in Prediction, Clustering, Recommendation, and Simulation
DOI:
https://doi.org/10.64252/ndbycb68
Keywords:
Machine Learning, Decision Trees, Entity Resolution, Clustering, Monte Carlo, Recommender Systems, PySpark, Big Data, Interpretability, Scalability
Abstract
With the increasing availability of massive datasets and computational resources, machine learning (ML) has emerged as a key driver of innovation in numerous fields. This paper presents an in-depth exploration of five widely used machine learning techniques: Decision Trees, Entity Resolution, K-Means Clustering, Monte Carlo Simulation, and ALS-based Collaborative Filtering. Each method is contextualized through a case study that illustrates its practical implementation on scalable platforms such as Apache Spark and Python. We explore the theoretical foundations, implementation challenges, and domain-specific applications of each model. In addition to demonstrating the performance and interpretability of these algorithms, this paper highlights the trade-offs and synergies across models when they are applied to diverse real-world datasets. The intent is to guide practitioners and researchers in choosing the most suitable ML approach based on problem characteristics, interpretability requirements, and scalability constraints.




