Multi-Modal Deep Learning For Parkinson’s Disease Detection Using Voice And Gait Biomarkers
DOI:
https://doi.org/10.64252/rq3faf47

Keywords:
Parkinson's Disease, CNN, LSTM, Cross-Attention, Multi-modal Learning, MFCC, Gait Analysis

Abstract
Parkinson's Disease (PD) is a progressive neurodegenerative disorder that affects motor and vocal functions, making early and accurate diagnosis crucial for effective treatment. This paper presents a novel multi-modal deep learning framework that integrates voice and gait data to improve PD detection accuracy. Voice recordings are processed into Mel-Frequency Cepstral Coefficients (MFCCs) to extract relevant acoustic features, which are then fed into a Convolutional Neural Network (CNN) for high-level representation learning. In parallel, gait time-series data, captured from wearable sensors or pressure mats, are analyzed with a Long Short-Term Memory (LSTM) network to model temporal dependencies. A cross-attention fusion module is proposed to align and integrate these heterogeneous feature spaces by learning the inter-modality relationships between the voice and gait signals. The resulting fused representation is passed through a Multi-Layer Perceptron (MLP) for final binary classification of PD presence. Experimental evaluation on publicly available Parkinson's datasets demonstrates that the proposed model outperforms unimodal and early/late fusion baselines in accuracy, robustness, and generalization. The framework also offers a practical pathway toward remote, non-invasive, and cost-effective PD screening tools.
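
To make the described pipeline concrete, the following is a minimal PyTorch sketch of the architecture: a CNN branch over an MFCC spectrogram, an LSTM branch over the gait time series, a cross-attention fusion step in which the voice embedding attends over the gait sequence, and an MLP head for binary classification. All layer sizes, input shapes, and module names (VoiceCNN, GaitLSTM, CrossAttentionFusionPD) are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of the described multi-modal pipeline. All layer sizes,
# sequence lengths, and names are illustrative assumptions; the paper's
# exact architecture and hyperparameters may differ.
import torch
import torch.nn as nn


class VoiceCNN(nn.Module):
    """CNN branch: high-level features from an MFCC map (1 x n_mfcc x frames)."""
    def __init__(self, d_model=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.proj = nn.Linear(64 * 4 * 4, d_model)

    def forward(self, x):                 # x: (batch, 1, n_mfcc, frames)
        h = self.conv(x).flatten(1)       # (batch, 64*4*4)
        return self.proj(h).unsqueeze(1)  # (batch, 1, d_model): 1-token sequence


class GaitLSTM(nn.Module):
    """LSTM branch: temporal features from gait sensor time series."""
    def __init__(self, n_channels=16, d_model=128):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, d_model, batch_first=True)

    def forward(self, x):          # x: (batch, time_steps, n_channels)
        out, _ = self.lstm(x)      # (batch, time_steps, d_model)
        return out


class CrossAttentionFusionPD(nn.Module):
    """Voice embedding attends over the gait sequence; fused vector -> MLP."""
    def __init__(self, n_gait_channels=16, d_model=128, n_heads=4):
        super().__init__()
        self.voice = VoiceCNN(d_model)
        self.gait = GaitLSTM(n_gait_channels, d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(2 * d_model, 64), nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(64, 1),  # single logit for binary PD classification
        )

    def forward(self, mfcc, gait):
        v = self.voice(mfcc)               # (batch, 1, d_model) query
        g = self.gait(gait)                # (batch, T, d_model) keys/values
        fused, _ = self.cross_attn(query=v, key=g, value=g)
        z = torch.cat([v.squeeze(1), fused.squeeze(1)], dim=-1)
        return self.mlp(z)                 # raw logit; sigmoid gives probability


# Smoke test with random tensors shaped like one plausible data layout.
model = CrossAttentionFusionPD()
mfcc = torch.randn(8, 1, 13, 100)   # batch of 8, 13 MFCCs x 100 frames
gait = torch.randn(8, 200, 16)      # 200 time steps x 16 sensor channels
print(model(mfcc, gait).shape)      # torch.Size([8, 1])
```

In this sketch the voice embedding serves as the attention query and the gait sequence supplies keys and values, so the fused vector is a gait summary weighted by its relevance to the voice features; a symmetric gait-to-voice attention path, or a bidirectional variant, would be an equally plausible reading of the abstract.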