Voice Assistant for Blind People: An AI-Driven Mobile System Using YOLOv8 and MiDaS for Real-Time Object Detection and Depth Estimation
DOI: https://doi.org/10.1234/4mcatt96

Keywords: voice assistant, visual impairment, depth estimation, assistive technology, object detection, YOLOv8, MiDaS

Abstract
Visual impairment presents profound barriers to independent navigation, creating a persistent demand for intelligent assistive technologies. This paper presents a mobile-based Voice Assistant system that empowers visually impaired individuals by transforming their physical surroundings into real-time auditory descriptions. The proposed architecture couples a Flutter cross-platform mobile frontend with a Python-based backend server, employing YOLOv8 for high-speed object detection and MiDaS for monocular depth estimation. A captured image is transmitted via RESTful API to a FastAPI server, where objects are identified and their approximate distances are derived by correlating bounding-box centroids with the corresponding depth map regions. The resulting structured data is converted to natural speech through on-device Text-to-Speech (TTS), delivering descriptive alerts such as "person detected at 1.5 metres." MongoDB Atlas handles user authentication and data persistence, while the client-server design keeps the mobile application lightweight. Experimental trials across indoor and outdoor environments demonstrate a mean object-detection precision of 87.4% and an average response latency of 1.3 seconds. This integrated, hardware-agnostic solution provides a cost-effective, portable, and scalable approach to assistive technology, significantly improving situational awareness and independence for blind users.
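The distance-estimation step described above — correlating each detection's bounding-box centroid with the corresponding region of the depth map — can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the function names `estimate_distance` and `describe` are hypothetical, the box format `(x1, y1, x2, y2)` mirrors what YOLOv8 typically reports, and the depth map is assumed to have already been calibrated to metres (raw MiDaS output is relative inverse depth and would need scaling first).

```python
import numpy as np

def estimate_distance(depth_map: np.ndarray, box: tuple) -> float:
    """Median depth (metres) in a small window around the box centroid.

    `depth_map` is assumed aligned with the captured image and scaled to
    metric units; `box` is (x1, y1, x2, y2) in pixel coordinates.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
    h, w = depth_map.shape
    # Sample a 5x5 window around the centroid, clipped to image bounds,
    # and take the median to suppress depth-map noise at object edges.
    window = depth_map[max(cy - 2, 0):min(cy + 3, h),
                       max(cx - 2, 0):min(cx + 3, w)]
    return float(np.median(window))

def describe(label: str, distance: float) -> str:
    # Phrase handed to the on-device TTS engine.
    return f"{label} detected at {distance:.1f} metres"

# Synthetic example: a 100x100 depth map, mostly 4 m away, with a
# rectangular region at 1.5 m standing in for a detected person.
depth = np.full((100, 100), 4.0)
depth[20:60, 30:70] = 1.5
print(describe("person", estimate_distance(depth, (30, 20, 70, 60))))
# → person detected at 1.5 metres
```

Using the median over a window, rather than the single centroid pixel, is one way to make the reading robust to local depth-estimation errors.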