TY  - JOUR
AU  - Mahapatra, Satyajit
AU  - Mishra, Jibitesh
AU  - Patra, Kumar Janardan
AU  - Dash, Sanjit Kumar
AU  - Deferisha, Aliazar Deneke
PY  - 2026
TI  - MedFusion: A Unified Multimodal Framework for Visual Question Answering and Explainable Medical Recommendation
JF  - Journal of Computer Science
VL  - 22
IS  - 5
DO  - 10.3844/jcssp.2026.1539.1551
UR  - https://thescipub.com/abstract/jcssp.2026.1539.1551
AB  - In clinical decision-making, the ability to ask visual questions about medical images and receive accurate, personalized, and interpretable recommendations can significantly enhance practitioner support systems. This paper presents MedFusion, a unified multimodal framework that integrates Visual Question Answering (VQA), personalized medical recommendation, and explainability within a single architecture. The proposed model employs co-attention–based visual–textual fusion augmented with retrieval-enhanced reasoning to improve answer grounding, while personalized recommendations are generated using a shared multimodal representation supported by GAN-guided feature augmentation. To enhance transparency, the framework provides attention-based heatmaps and natural-language rationales for both answers and recommendations. Extensive experiments on VQA-RAD, EHRXQA, and Med-RecX demonstrate that MedFusion outperforms state-of-the-art medical VQA and recommendation baselines, achieving a 7.4% improvement in VQA accuracy, reducing RMSE to 0.91, and improving human-rated interpretability to 4.5/5. Ablation studies confirm the effectiveness of retrieval augmentation, GAN-guided enhancement, and joint multi-task learning. These results indicate that MedFusion offers a robust and explainable decision-support solution, advancing the deployment of trustworthy, user-adaptive AI systems in real-world healthcare environments.