VocalEyes

Team Name

Team Visionaries

Timeline

Spring 2021 – Summer 2021

Students

  • Jason Bernard Lim
  • Jason Michael Richardson
  • Aseem Thapa
  • Saugat Pandey
  • Sudarshan Tiwari
  • Alejandro Mendez

Sponsor

Mr. Phani Kaduri

Abstract

VocalEyes will be a wearable device that includes a camera and a speaker module. Its major goal is to help the visually impaired: the device recognizes an object that the user points it at (via the camera module) and outputs audio information about what the object is (via the speaker module). In the case of text, for example, it reads the text aloud.

Background

Our solution is to build a wearable device that is similar in base functionality, text detection, to existing blind-assistance devices. Unlike those devices, however, ours will improve on them by adding an adjustable audio-feedback speed as well as translation of the detected text into a language of the user's choosing. Braille detection will also be pursued as a long-term goal.

Our sponsor, Mr. Phani Kaduri, is having us develop a prototype of this device so that he can use it with visually impaired and blind children at a school in India. Our team has had no previous relationship with him; however, Mr. Kaduri has previously worked with the University of Texas at Arlington on the development of a text-detection app.

Project Requirements

  1. Wearable/Portable: The device is built on a Raspberry Pi 4 Model B with an attached camera. It attaches to virtually any pair of glasses and has a headphone jack to output audio. The device will be able to read text from a book, a phone screen, or any other surface with text, and its portability allows the user to use it conveniently in almost any situation.
  2. Read and Voice out Text: The device must be able to read text, whether in a book, on a screen, or on any other surface with text, and must then voice the text out to the user.
  3. Translation for Selected Language: The user can select an available language option and the device will output the audio in the selected language (a sketch of this pipeline follows this list).
  4. Fast Output for Audio: The algorithm responsible for classifying the text will run fast enough that the speaker output approximately matches the pace of normal human speech.
  5. Safety Requirements: The device will comply with the safety requirements listed in the SRS document so that it cannot injure the user.
  6. Pause and Change Pace of the Audio: The user interface will allow the user to adjust the speed at which the text is read aloud, and a pause option will also be available.
  7. Pre-Trained Text Classifier Algorithm/All Software Pre-Loaded in Memory: The algorithm used for text classification will be trained, on open-source data, before the product reaches the user. All software will likewise be ready for execution when the user receives the product, including the text classifier, image-data handling, translation, and audio output of text.
  8. Instruction Manual: An instruction manual on how to operate the device and application will be provided to the user, covering how to get the device started, how to change settings, and how to operate the device safely. The manual will also include instructions on how to replace certain parts in case of hardware failure.
  9. Adjustment of the Settings: The settings must be easy for a visually impaired user to adjust.
  10. Minimum Battery Life: The device should have a minimum battery life of 8 hours to give the user a full day of use.
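
To make requirements 2, 3, and 6 concrete, below is a minimal sketch of the text pipeline built from the open-source tools listed in the references: Tesseract (through the pytesseract wrapper) for text recognition, googletrans for translation, and gTTS for speech synthesis. The function name, file paths, and the use of the mpg123 player are illustrative assumptions, not the project's actual code.

    # Sketch: OCR -> translate -> speak, assuming pytesseract, googletrans,
    # gTTS, and the mpg123 command-line player are installed.
    import subprocess

    import pytesseract                  # wrapper around the Tesseract OCR engine
    from PIL import Image
    from googletrans import Translator
    from gtts import gTTS

    def read_image_aloud(image_path, dest_lang="en", slow=False):
        # 1. Recognize any text in the captured image (requirement 2).
        text = pytesseract.image_to_string(Image.open(image_path)).strip()
        if not text:
            text = "No text detected."

        # 2. Translate into the user's selected language (requirement 3).
        if dest_lang != "en":
            text = Translator().translate(text, dest=dest_lang).text

        # 3. Synthesize speech; gTTS's slow flag is a coarse stand-in for
        #    the adjustable reading pace of requirement 6.
        gTTS(text=text, lang=dest_lang, slow=slow).save("/tmp/vocaleyes.mp3")

        # 4. Play the audio through the headphone jack.
        subprocess.run(["mpg123", "-q", "/tmp/vocaleyes.mp3"])

    read_image_aloud("capture.jpg", dest_lang="hi")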

System Overview

The system design currently consists of a camera module, a Raspberry Pi, computer vision software, and speakers/headphones. The working principle can be described as a flow of information from one subsystem to the next. First, the user points the camera mounted on the device at the target text or object and turns the camera on. The camera is turned on only at the time of use; otherwise it would constantly take pictures of the surroundings and bombard the user with a flurry of gibberish output from the computer vision algorithm. When the camera is turned on, it takes a picture and transmits it to the Raspberry Pi. The Pi then calls a script that feeds the image into the computer vision algorithm; once the algorithm computes its output, the Pi plays it back in audio form via the speakers or headphones connected to the Pi. All of these components will be held within a compact and portable housing.

The prototype Mr. Kaduri has shown us does a similar thing, but with a phone: the software uses the phone's camera and speaker to take input and produce output for the user. Our approach does the same thing, but with a camera mounted on glasses and output through headphones. The new design should be much more user friendly for the visually impaired, more compact, and relatively cost-efficient.
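
A condensed sketch of that capture flow is below. The gpiozero button on GPIO pin 17, the use of OpenCV's VideoCapture for the camera, and the read_image_aloud helper (sketched under Project Requirements) are assumptions for illustration; the real device may use the Pi camera stack instead.

    # Sketch: wait for a button press, capture one frame, run the pipeline.
    # Assumes a camera on /dev/video0 and a shutter button wired to GPIO 17.
    import cv2
    from gpiozero import Button

    button = Button(17)          # hypothetical GPIO pin for the shutter button

    while True:
        button.wait_for_press()  # camera is used only at the moment of need
        cap = cv2.VideoCapture(0)
        ok, frame = cap.read()   # grab a single still, not a video stream
        cap.release()
        if ok:
            cv2.imwrite("/tmp/capture.jpg", frame)
            read_image_aloud("/tmp/capture.jpg", dest_lang="en")  # earlier sketch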

Results

The current status of the project is that all software works (picture-to-voice processing, translation, and running from boot); the only remaining problems are with peripherals (the camera is not as good as we wanted, and the CAD design is not yet optimal).
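
For the run-from-boot piece, one common approach on a Raspberry Pi is a systemd service along the following lines; the unit name, user, and paths are placeholders rather than our actual configuration.

    # /etc/systemd/system/vocaleyes.service (illustrative name and paths)
    [Unit]
    Description=VocalEyes text-to-speech service
    After=network.target sound.target

    [Service]
    Type=simple
    User=pi
    ExecStart=/usr/bin/python3 /home/pi/vocaleyes/main.py
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

Enabling the unit with sudo systemctl enable vocaleyes.service makes it start on every boot.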

Demo Video

Future Work

The main future work on the device is to move to a different Raspberry Pi model, since the autofocus camera is not compatible with the current one, and to optimize the overall wearable for additional processing power and accessibility.

Project Files

Project Charter (link)

System Requirements Specification (link)

Architectural Design Specification (link)

Detailed Design Specification (link)

Poster (link)

GitHub (link)

Code (link)

Device Case CAD file (link)

Glasses Frame CAD file (link)

Circuit Schematic (link)

User Manual (link)

References

  1. https://opencv.org/
  2. https://github.com/tesseract-ocr/
  3. https://pypi.org/project/gTTS/
  4. https://pypi.org/project/googletrans/
