Vol. 3 No. 03 (2026): Voice Assistant for Blind People: An AI-Driven Mobile System Using YOLOv8 and MiDaS for Real-Time Object Detection and Depth Estimation

Visual impairment presents profound barriers to independent navigation, creating a persistent demand for intelligent assistive technologies. This paper presents a mobile-based Voice Assistant system that empowers visually impaired individuals by transforming their physical surroundings into real-time auditory descriptions. The proposed architecture couples a Flutter cross-platform mobile frontend with a Python-based backend server, employing YOLOv8 for high-speed object detection and MiDaS for monocular depth estimation. A captured image is transmitted via RESTful API to a FastAPI server, where objects are identified and their approximate distances are derived by correlating bounding-box centroids with the corresponding depth map regions. The resulting structured data is converted to natural speech through on-device Text-to-Speech (TTS), delivering descriptive alerts such as "person detected at 1.5 metres." MongoDB Atlas handles user authentication and data persistence, while the client-server design keeps the mobile application lightweight. Experimental trials across indoor and outdoor environments demonstrate a mean object-detection precision of 87.4% and an average response latency of 1.3 seconds. This integrated, hardware-agnostic solution provides a cost-effective, portable, and scalable approach to assistive technology, significantly improving situational awareness and independence for blind users.
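The distance-derivation step described above — sampling the MiDaS depth map at each YOLOv8 bounding-box centroid and phrasing the result as an alert — can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes detections and a depth map are already available from the two models, and the `scale` calibration factor is hypothetical, since MiDaS predicts relative inverse depth that must be mapped to metres per device.

```python
import numpy as np

def describe_detections(detections, depth_map, scale=1.0):
    """Turn detections plus a MiDaS-style depth map into spoken alerts.

    detections: list of (label, (x1, y1, x2, y2)) pixel-coordinate boxes,
        as one might extract from a YOLOv8 inference result.
    depth_map:  2-D numpy array of relative inverse depth (MiDaS-style,
        larger value = closer to the camera).
    scale:      hypothetical calibration factor mapping relative inverse
        depth to metres; must be fitted empirically per camera.
    """
    alerts = []
    h, w = depth_map.shape
    for label, (x1, y1, x2, y2) in detections:
        # Sample depth at the bounding-box centroid, clamped to the map bounds.
        cx = min(max(int((x1 + x2) / 2), 0), w - 1)
        cy = min(max(int((y1 + y2) / 2), 0), h - 1)
        inv_depth = depth_map[cy, cx]
        # Invert (with a floor to avoid division by zero) to approximate metres.
        distance_m = scale / max(float(inv_depth), 1e-6)
        alerts.append(f"{label} detected at {distance_m:.1f} metres")
    return alerts
```

Each returned string can then be passed directly to the on-device TTS engine; averaging depth over the whole box region instead of a single centroid pixel would be a natural robustness refinement.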

Published: 2026-03-28
