Team Name
The Visionaries
Timeline
Fall 2019 – Spring 2020
Students
- Connor Morris
- Micah Hall
- Roshan Shrestha
- Sergio Guerrero
- Vivek Patel
Abstract
Bowdie is a smartglass application designed for the Vuzix Blade smartglasses. The primary design goal of Bowdie is to provide a better user experience for common smartphone tasks. Through the unique hardware components of the Vuzix Blade, including the microphone, lens camera, and AR display, Bowdie offers new forms of interaction for users. To eliminate the need for touch, Bowdie supports voice commands for fully navigating and interacting with the app. Using the camera, Bowdie provides computer vision capabilities, including object recognition, text recognition, and coin counting; these features let users learn more about their surroundings simply by taking pictures. Lastly, Bowdie uses the AR display of the glasses to provide navigation: users verbally give Bowdie a destination, and Bowdie then actively guides them along the entire route to that destination.
Background
Smartphones have existed for over two decades. People use them extensively, and tech companies work hard to reach smartphone customers. Even so, people are slowly growing bored of the same handheld devices. They want something different that further enriches their experiences, increases their efficiency and productivity, and delivers the most value for the least effort. Pulling a smartphone out of your pocket takes about two seconds, and scrolling to see a notification takes another second; with notifications cluttered together, it may take even longer. Over a user’s lifetime, these interactions waste a great deal of time. The smartphones we have today rely primarily on touch input rather than voice. Because they receive input only from users’ hand interactions, smartphones are completely blind to users’ fields of view. As a result, it is very difficult for smartphones to interpret users’ intentions and desires when they search for information, creating a disconnect between users and their devices. We believe that with smartglasses, users will be able to receive more accurate information through the addition of visual data, taken from the user’s field of view, and through voice interactions in place of touch. By combining these inputs with a display that projects directly to the user’s eye, users will be able to receive information more quickly while on the go.
Project Requirements
- The application should be able to access the outward-facing camera on the glasses to take pictures and retrieve the data.
- The application should be able to process speech data from the built-in microphone to perform speech recognition.
- The glasses and application must be able to connect with the user’s smartphone over Bluetooth.
- The application should be able to display navigation information on the built-in AR screen on the glasses.
- The application should be able to interact with the Google Maps API to retrieve map information.
- The application should be able to respond to “Bowdie” as a speech recognition wake word.
- The application should be able to perform text recognition on images taken from the outward-facing camera on the glasses.
- The application should be able to detect and recognize objects that the user is looking at using the outward-facing camera on the glasses.
- The application should be able to open and operate the navigation functionality using voice commands.
- The application should be able to visually count coins, returning the number of each coin type identified along with the total monetary amount.
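The coin counting requirement can be illustrated with a short sketch. This is a minimal example, not Bowdie's actual implementation: the coin names and the per-type count dictionary are assumed stand-ins for whatever format the recognition step actually returns.

```python
# Hedged sketch: given per-type coin counts (e.g., produced by a coin
# recognition step), compute the total monetary amount in dollars.
COIN_VALUES = {"penny": 0.01, "nickel": 0.05, "dime": 0.10, "quarter": 0.25}

def total_amount(counts: dict[str, int]) -> float:
    """Sum the value of each identified coin type, rounded to cents."""
    return round(sum(COIN_VALUES[coin] * n for coin, n in counts.items()), 2)

# Example: 3 quarters, 2 dimes, 1 nickel, 4 pennies
print(total_amount({"quarter": 3, "dime": 2, "nickel": 1, "penny": 4}))  # 1.04
```

The application would then display both the per-type counts and this total on the AR screen.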
System Overview
Bowdie’s architecture involves two main components: the smartglass application and a server. The smartglass application contains five subsystems that operate directly on the glasses: the Vuzix Blade system, the computer vision system, the AR display system, the map and routing system, and the gesture sensing system.

The Vuzix Blade system includes the core functionality provided by the Vuzix Blade glasses: the built-in microphone, the built-in speech processing capabilities, the Vuzix speech recognition library, and internet search. The computer vision system includes the built-in camera on the glasses, an API request sender that communicates with the server’s API, and handlers that consume the object recognition, text recognition, and coin counting results. The AR display system includes the physical display in the lens of the glasses and the logic for displaying information from other parts of the application. The map and routing system includes a speech handler that performs custom speech recognition to obtain a destination from the user, the Google Maps Directions API that provides routing information, parsing logic that extracts results from the Google Maps API, and a route processing algorithm that navigates the user through their route. Lastly, the gesture sensing system includes a hand-held sensor that gathers positional data and a gesture sensing algorithm that detects the gestures the user makes with the sensor.

The server exposes an API that the application calls to perform object recognition, text recognition, coin counting, and speech recognition. For object recognition and text recognition, the server sends the image captured by the application to Bing’s Visual Search API and extracts vision information to be displayed in the app. For coin counting, the server runs the image through YOLO, trained on images of coins. Finally, for speech recognition, the server passes audio recorded by the map and routing system’s speech handler to Google’s Cloud Speech-to-Text API.
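The route processing algorithm is described only at a high level above. As a minimal sketch of one way such tracking could work, assuming route steps have been reduced to (latitude, longitude) endpoints (a simplification of the Google Maps Directions API response), the application could compute the great-circle distance from the user's position to each step endpoint and treat the nearest one as the current step:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in meters."""
    r = 6_371_000  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def current_step(position, steps):
    """Return the index of the route step whose endpoint is nearest to the
    user's current position. `steps` is a list of (lat, lon) endpoints."""
    lat, lon = position
    return min(range(len(steps)),
               key=lambda i: haversine_m(lat, lon, steps[i][0], steps[i][1]))

# Example: a user just short of the second step endpoint
steps = [(32.730, -97.110), (32.740, -97.110), (32.750, -97.100)]
print(current_step((32.7401, -97.1102), steps))  # 1
```

A real implementation would interpolate along each step's polyline rather than snapping to endpoints, which is one reason roads that are not completely straight are harder to track.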
Results
Future Work
Future work on Bowdie includes completing the integration of our working gesture sensing code with the application so that gestures can trigger actions in the app. Additional work includes improving the accuracy and speed of the coin counting functionality: although it currently works fairly well, it is not always accurate, and it takes a relatively long time to load. Lastly, our offline map tracking algorithm can be further improved to be more responsive and to better handle roads that are not completely straight.
Project Files
Project Charter (link)
System Requirements Specification (link)
Architectural Design Specification (link)
Detailed Design Specification (link)
Poster (link)
References
N/A