Our goal is two-way translation between American Sign Language (ASL) and English that a user can take on the go. The system has three main components: a pair of motion capture gloves, a mobile app, and machine translation tools that run on a remote backend.
The Deaf user wears the motion capture gloves, which connect wirelessly to the app on their phone. The app sends the signing data to the backend, which runs the machine translation algorithms and sends the English translation back to the phone. The phone then plays the translation out loud using a text-to-speech tool.
Going the other way, the mobile app uses the phone’s microphone to capture the hearing user’s speech and relays it to the backend. There, a speech-to-text tool transcribes the audio, and a second machine translation tool takes the English text as input and returns a sequence of glosses. An ASL gloss is an English word used to represent a particular sign, which makes gloss a convenient way to encode a sign’s identity. In our mobile app, the sequence of glosses returned by the backend is used to stitch together an animation of an avatar signing the interpretation of what the hearing user said.
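To make the stitching step concrete, the following is a minimal sketch of how a gloss sequence might be laid out as an animation timeline. The clip library, the clip identifiers, the durations, and the fingerspelling fallback for glosses without a clip are all assumptions made for illustration; none of them come from our implementation.

```python
from dataclasses import dataclass

# Hypothetical clip library: gloss -> (clip id, duration in milliseconds).
CLIP_LIBRARY = {
    "HELLO": ("clip_hello", 1200),
    "MY": ("clip_my", 600),
    "NAME": ("clip_name", 900),
}


@dataclass
class ScheduledClip:
    clip_id: str
    start_ms: int      # offset from the start of the stitched animation
    duration_ms: int


def fingerspell(gloss, letter_ms=400):
    """Assumed fallback: spell a gloss with no clip one letter at a time."""
    return [(f"letter_{ch.lower()}", letter_ms) for ch in gloss if ch.isalpha()]


def stitch(glosses):
    """Place clips end to end, in gloss order, and return the timeline."""
    timeline = []
    cursor_ms = 0
    for gloss in glosses:
        key = gloss.upper()
        if key in CLIP_LIBRARY:
            clips = [CLIP_LIBRARY[key]]
        else:
            clips = fingerspell(key)
        for clip_id, duration_ms in clips:
            timeline.append(ScheduledClip(clip_id, cursor_ms, duration_ms))
            cursor_ms += duration_ms
    return timeline
```

Calling `stitch(["HELLO", "MY", "NAME", "AL"])` would schedule the three known clips back to back and then two letter clips for the unknown gloss. A real avatar would also need transitions between clips, which this sketch leaves out.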
Google’s new Neural Machine Translation system reduced translation errors by about 60% relative to their previous production machine translation tool, but still only achieved 95% of the accuracy of human translators. For formal occasions that demand high accuracy, such as interpreting a speech, a professional interpreter would still likely be the better option. However, for informal situations that are more forgiving of the occasional faults of machine translation, such as business meetings, shopping, and casual social situations, our project is well suited.
With current technology, a Deaf employee at a small company where no one else knows ASL would have difficulty engaging in staff meetings and other face-to-face conversations. We’ve spoken to business owners in our community who said they would be reluctant to hire Deaf individuals for this reason. With our finished product, a Deaf employee can use their phone to interpret what their colleagues say and to speak the interpretation of what they sign in reply. All the employees can then communicate with each other through the medium that each is most comfortable with. This has the potential to lower the barrier that employers may see in hiring Deaf individuals.
II. Technology Overview
The motion capture gloves have three types of sensors. Custom-made flexible bend sensors measure finger flex. Conductive pads placed on the fingertips register finger contacts. Paired 3-axis accelerometers and 3-axis gyroscopes mounted at the base of the wrists track motion.
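One way to picture the data these sensors produce is as a single record per glove per sampling instant. The sketch below is illustrative only: the field names, the assumption of one flex sensor and one contact pad per finger, and the units are ours for the example and are not a specification of the glove firmware.

```python
from dataclasses import dataclass
from typing import Tuple

NUM_FINGERS = 5  # assumption: one flex sensor and one contact pad per finger


@dataclass(frozen=True)
class GloveFrame:
    """One sample from a single glove (hypothetical layout)."""
    flex: Tuple[float, ...]            # bend per finger, 0.0 straight to 1.0 bent
    contacts: Tuple[bool, ...]         # True where a fingertip pad registers a touch
    accel: Tuple[float, float, float]  # 3-axis accelerometer reading
    gyro: Tuple[float, float, float]   # 3-axis gyroscope reading

    def __post_init__(self):
        if len(self.flex) != NUM_FINGERS or len(self.contacts) != NUM_FINGERS:
            raise ValueError("expected one flex value and one contact flag per finger")

    def to_vector(self):
        """Flatten the frame into a list of floats for a learning model."""
        touch = [1.0 if c else 0.0 for c in self.contacts]
        return [*self.flex, *touch, *self.accel, *self.gyro]
```

Under these assumptions each frame flattens to 16 numbers (5 flex, 5 contact, 3 acceleration, 3 rotation), and a signed phrase becomes a sequence of such vectors for each hand.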
The flex sensors are read by a custom-designed PCB with two TI FDC2114 capacitive reader chips. The touch sensors are read by a commercially available capacitive reader board. All devices communicate their data over an I2C bus to the controller, an Adafruit Feather nRF52.
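Raw capacitance counts are not directly comparable across fingers or users, so some calibration step is likely needed before the readings are useful. The sketch below shows one simple possibility, assuming 12-bit output counts from the reader chips and a two-point calibration (hand flat, hand in a fist). The function name and calibration scheme are assumptions for illustration, not our firmware.

```python
ADC_MAX = 4095  # largest value of a 12-bit reading (assumed resolution)


def normalize_flex(raw, flat_raw, bent_raw):
    """Map a raw count to 0.0 (finger straight) through 1.0 (finger fully bent).

    flat_raw and bent_raw are the counts recorded during calibration with the
    hand held flat and closed in a fist. Either may be the larger value, so the
    mapping works whether capacitance rises or falls as the finger bends.
    """
    if not 0 <= raw <= ADC_MAX:
        raise ValueError("raw count outside the 12-bit range")
    if flat_raw == bent_raw:
        raise ValueError("calibration points must differ")
    fraction = (raw - flat_raw) / (bent_raw - flat_raw)
    # Clamp so that readings slightly past a calibration point stay in range.
    return min(1.0, max(0.0, fraction))
```

For example, with calibration points of 1000 (flat) and 3000 (fist), a reading of 2000 maps to 0.5.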
The mobile app was built with the Ionic framework, which lets us develop the app like a website using HTML, CSS, and AngularJS and then automatically generate code for iOS and Android apps. There are two main workflows in the app.
In the first, a signal from the motion capture gloves initiates the ASL recording phase. Once the Deaf user ends the phase, the data is sent to the backend. When the interpretation is received, the transcript is displayed on screen. The user can then press a play button to play it out loud for the hearing user.
In the second workflow, the Deaf user hits a record button to initiate audio recording. When the recording is complete, the data is sent to the backend. When the interpretation is received, the animation is stitched together and played back to the Deaf user.
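Both workflows follow the same sequence of record, send, wait, and present, differing only in what is recorded and how the result is shown. A minimal sketch of that shared flow as a state machine follows. The state names, event names, and direction labels are hypothetical, and the actual app is written with Ionic rather than Python.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RECORDING = auto()
    AWAITING_BACKEND = auto()
    PRESENTING = auto()


# Allowed transitions, keyed by (current state, event name).
TRANSITIONS = {
    (State.IDLE, "start"): State.RECORDING,               # glove signal or record button
    (State.RECORDING, "stop"): State.AWAITING_BACKEND,    # data is sent to the backend
    (State.AWAITING_BACKEND, "result"): State.PRESENTING, # interpretation received
    (State.PRESENTING, "done"): State.IDLE,
}


class Session:
    """Tracks one pass through either workflow."""

    def __init__(self, direction):
        if direction not in ("asl_to_english", "english_to_asl"):
            raise ValueError(f"unknown direction: {direction}")
        self.direction = direction
        self.state = State.IDLE

    def handle(self, event):
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {self.state.name}")
        self.state = TRANSITIONS[key]
        return self.state

    def presentation(self):
        """What the app shows once the interpretation arrives."""
        if self.direction == "asl_to_english":
            return "transcript_with_play_button"
        return "signed_avatar_animation"
```

Rejecting events that are not valid in the current state keeps a stray glove signal or button press from starting a second recording while the app is still waiting on the backend.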
The backend is built on the MEAN stack. It manages the connection between the users’ mobile phones and the machine learning applications, which run on Azure Functions. At the current stage of development, Azure Functions lets us use limited server time for testing without paying for excess resources.
The machine learning tools that run on the backend are currently under development. The goal is to use an existing database of gesture data with gloss transcriptions to quickly build a tool capable of parsing gestures into gloss. We will then build our own data set with our motion capture gloves for final training. After that, we can move on to the tools that translate between ASL gloss and English.
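Since these tools are still under development, the following is not our method. It is a minimal baseline sketch of what "gesture to gloss" means computationally: a nearest-template classifier that compares a recorded sequence of sensor frames against labeled examples using dynamic time warping (DTW), a standard technique for matching sequences performed at different speeds. The function names and the toy data layout are assumptions.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two sensor frames of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Dynamic time warping cost between two sequences of frames."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i frames of seq_a
    # with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(sample, templates):
    """Return the gloss whose labeled example is closest to the sample.

    templates maps each gloss to a list of example frame sequences.
    """
    best_gloss, best_cost = None, float("inf")
    for gloss, examples in templates.items():
        for example in examples:
            c = dtw_distance(sample, example)
            if c < best_cost:
                best_gloss, best_cost = gloss, c
    return best_gloss
```

A baseline like this handles isolated signs only. Continuous signing also requires segmenting the stream into signs, which is one reason a learned sequence model is the more likely end point.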
M. E. Bonham, “English to ASL Gloss Machine Translation,” Master’s thesis, Dept. of Ling. and Eng. Lang., Brigham Young Univ., Provo, UT, 2015.
Y. Wu et al., “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation,” Tech. Rep., arXiv:1609.08144, 2016.
We are part of a larger project that seeks to work with the Deaf community to promote opportunities for Deaf and Hard of Hearing people to engage with the STEM fields. Our goal is to create a user-friendly, mobile product that will allow Deaf and hearing people to communicate seamlessly using the languages with which they are most comfortable. We hope that this product, and the efforts of other groups we work with, can push back against conditions that have resulted in 47% of the Deaf and Hard of Hearing community not actively participating in the U.S. labor force. We believe that our product will help bridge communication gaps that exist in workplaces, schools, and other collaborative environments.