Koel Labs
PUBLIC
United States, University of Washington
Project Overview
Our Website: https://koellabs.com | Demo Video: https://youtu.be/s7yPjSUjU9s
At Koel Labs, we build engaging, dialect-diverse pronunciation tools for the 50% of non-native speakers who struggle with their accents. As language learning enthusiasts and immigrants, we have seen firsthand how language affects opportunities in education and work.
Yet many non-native speakers do the flashcards and grind through the grammar exercises, only to find that their speech sounds broken, unnatural, and slow. This has been our experience, as well as that of the hundreds of speakers we have interviewed and surveyed.
In reality, committing new mouth movements to muscle memory and memorizing inconsistent rules about syllable stress is tedious, demoralizing, and disorienting to do alone, and expensive to do with a human tutor. The current language learning landscape makes pronunciation learning far more complicated than it should be.
Large language learning classrooms in schools, with 30-40+ students per teacher, provide little personal feedback on speaking and conversation practice. Many teachers have expressed the need for a digital pronunciation feedback tool to complement classroom exercises. However, existing language learning apps provide no pronunciation feedback at all, or only superficial feedback such as whether the app can recognize each word, leaving the learner with the difficult task of self-assessing. The apps that do attempt pronunciation feedback are unengaging and inaccessible to people who do not know linguistic phonemes, and no app holistically covers stress/pitch accent, intonation, tone, and cadence.
To address this need for more personalized and actionable pronunciation feedback, we train custom audio models and research accessible human interfaces with collaborators at top research institutions such as UBC, CMU, and UW. As we work on this cutting-edge research, we make it a priority to do so ethically. Ethical considerations when teaching language at scale include not pushing a single “standard” dialect (e.g., White-American pronunciation) and thereby perpetuating narrow expectations of how words should be pronounced. We fundamentally shape our product around this concern. Instead of defining hard ground-truth “correct” pronunciations for our machine learning models, we let users choose movie and TV clips featuring the actors whose accents they want to emulate. Using our state-of-the-art audio models, we transcribe speaker- and dialect-agnostic features of both the actor reference and the user's speech, and we synthesize actionable, personalized feedback from that comparison.
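To make that comparison concrete, here is a minimal sketch of a phoneme-level reference-vs-learner pipeline. It is illustrative only: it assumes a public phoneme-recognition checkpoint (facebook/wav2vec2-lv-60-espeak-cv-ft) rather than our own models, uses a simple sequence diff in place of our feedback synthesis, and the file names are hypothetical.

# Illustrative sketch: compare a learner recording against a reference clip at the
# phoneme level. Uses a public phoneme CTC checkpoint, not Koel Labs' production
# models; the "feedback" step here is just a sequence diff.
import difflib

import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

MODEL_ID = "facebook/wav2vec2-lv-60-espeak-cv-ft"  # public IPA-phoneme model
processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)


def transcribe_phonemes(path: str) -> list[str]:
    """Return the predicted phoneme sequence for an audio file."""
    waveform, sample_rate = torchaudio.load(path)
    waveform = waveform.mean(dim=0)  # mix down to mono
    if sample_rate != 16_000:
        waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)
    inputs = processor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0].split()


def phoneme_feedback(reference_path: str, learner_path: str) -> list[str]:
    """Diff the two phoneme sequences and report where they diverge."""
    ref = transcribe_phonemes(reference_path)
    usr = transcribe_phonemes(learner_path)
    notes = []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(a=ref, b=usr).get_opcodes():
        if op != "equal":
            notes.append(f"expected {' '.join(ref[i1:i2]) or '(nothing)'} "
                         f"but heard {' '.join(usr[j1:j2]) or '(nothing)'}")
    return notes


if __name__ == "__main__":
    # Hypothetical file names for illustration.
    for note in phoneme_feedback("actor_clip.wav", "learner_attempt.wav"):
        print(note)

In our product, this kind of segment-level comparison is only the starting point for feedback; the sketch simply shows why a phoneme-level representation lets us compare a learner against any reference accent without a single hard-coded "correct" pronunciation.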
This paradigm shift from traditional binary feedback systems to choose-your-own-accent is more scalable: it generalizes naturally to multiple languages and embraces the rich landscape of existing dialects. Using movies, TV shows, and audiobooks as media for language learning has the added bonus of being more fun and engaging! We take great care to train our models on diverse datasets so that all dialects, ethnicities, genders, and ages are equally represented.
So that's our vision: engaging, actionable, real-time pronunciation feedback, delivered through entertainment, that is inclusive of all accent and dialect backgrounds.
To tackle this, we've had the incredible opportunity to spend the past 12 weeks refining our MVP for non-native English speakers with Mozilla Builders (https://blog.mozilla.org/en/mozilla/14-ai-projects-to-watch-mozillas-first-builders-accelerator-cohort-kicks-off/). It was an inspiring program that let us work alongside 13 accomplished founders from across the Americas, Europe, and Asia who also value solving problems for social good. With the help of Mozilla's business mentors, we conducted extensive user validation interviews and surveys, allowing us to narrow down the essential features for our MVP. Mozilla funded our machine learning training runs, enabling us to train state-of-the-art phoneme transcription models that run locally in the browser on edge devices. We are currently working with our collaborators to add multilingual support to our application. We hope the Imagine Cup and the Microsoft AI Founders Hub resources can help us accomplish this and take our MVP to the next level as a commercial success. We are entering this competition for business mentorship and for guidance on using Azure effectively for our custom machine learning workloads.
About Team
Our founding team is entirely made up of students who are immigrants or children of immigrants, and our technical skills complement one another exceptionally well.
Our CEO, Alexander Metzger, has startup, speech research, MLOps, backend, and infrastructure expertise.
Our CTO, Aruna Srivastava, has education, multilingual NLP research, networking, and ML for linguistics expertise.
Our CPO, Ruslan Mukhamedvaleev, has UI/UX design, branding, Human Computer Interaction research, and front-end expertise.
Check out our LinkedIns for details: https://www.linkedin.com/in/alexander-le-metzger/, https://www.linkedin.com/in/arunasr, https://www.linkedin.com/in/ruslan-muk
We have extensive research experience in Audio and Multilingual Speech (https://tsvetshop.github.io/), Embedded ML (https://ubicomplab.cs.washington.edu/), and HCI (https://ictd.cs.washington.edu/), with multiple peer-reviewed co-first authorships. We also have 5 years of startup experience: one of the most significant projects we worked on was Farmer.CHAT, which was presented at the United Nations General Assembly and serves millions of smallholder farmers across Ethiopia, Rwanda, India, and more.
Our native language backgrounds are varied, spanning Nordic countries such as Denmark, Asian countries such as Japan, and continent-spanning countries such as Russia. Our experiences as immigrants and children of immigrants give us a unique perspective on building dialect-inclusive pronunciation tools.
Shipping with ❤️ from Seattle, Washington.
Technologies we are looking to use in our projects
App Services (Mobile & Web)
Azure
Cognitive Services or other AI
Machine Learning
Virtual Machines
Python