Cortico’s mission of amplifying under-heard voices lends urgency to the issue of language accessibility. We have a way to go to reach our goal of providing a seamless experience for speakers of diverse languages. Ultimately, we are building a tool that supports listening to and understanding the perspectives voiced by under-heard communities, including historically marginalized language groups.
There are many engineering and design challenges to tackle along the way to this goal, and in this post we will describe how this work has unfolded to date. It began soon after we launched the English-only pilot of Local Voices Network in 2019, with the goal of visualizing conversation data and making it searchable. Within a year, our first Spanish-language conversation was convened, and it became clear that we would need to make changes to our data pipeline in order to include more languages.
Our speech pipeline automatically sends every conversation to Google’s Speech API for initial automated transcription. Adding a language code to that request was simple: we had always sent “en-US” for English, so we added the ability to also send “es-US” for Spanish. We store language codes as an array of strings, to allow for the case of bilingual speakers switching between languages.
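For readers curious about what this looks like in practice, here is a minimal sketch of the request using Google’s Cloud Speech-to-Text client for Python. The client calls and config fields come from the public library, but the audio settings, bucket URI, and the way we pass our stored language-code array are simplified assumptions rather than our exact pipeline code (multi-language recognition via alternative codes may also require a newer API version than the one shown).

```python
from google.cloud import speech

def transcribe(gcs_uri: str, language_codes: list[str]):
    """Send one conversation's audio to Google Speech-to-Text.

    `language_codes` is the per-conversation array we store,
    e.g. ["en-US"] or ["es-US", "en-US"] for bilingual speakers.
    """
    client = speech.SpeechClient()

    primary, *alternatives = language_codes
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code=primary,                    # e.g. "es-US"
        alternative_language_codes=alternatives,  # lets the API switch between languages
        enable_word_time_offsets=True,            # per-word timings for transcript display
    )
    audio = speech.RecognitionAudio(uri=gcs_uri)

    # Conversations are long, so use the asynchronous (long-running) variant.
    operation = client.long_running_recognize(config=config, audio=audio)
    return operation.result(timeout=3600)
```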
After automatic transcription, our pipeline sends the same audio file through a human transcription process. Eventually, these conversations end up on our site with Spanish transcripts.
However, while the transcript text is in Spanish, the tab labels and other copy are still in English. So even though our data pipeline allowed for Spanish, our site as a whole was still prioritizing English.
This summer, we began the process of internationalizing our site. An overwhelming majority of internet users worldwide use a non-English language online. Even though we currently only operate in the United States, the communities we work with are often diverse in terms of language. If we truly want to surface under-heard voices, we need to support as many languages as possible.
Multi-language support is still in the works, and our goal is for the site to be able to switch between English and Spanish by the fall.
Though this effort will make much of the site more usable, it will not translate the transcripts themselves. That is an ongoing research effort for us. Our transcript display is powerful in that we can highlight words as they are spoken and also create highlights from the full transcript. We are able to do this because each word in a transcript is assigned the time at which it was spoken in the conversation. Translation makes this more difficult: translations capture meaning at the phrase or sentence level, not word for word, so the timings will need to be adjusted.
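To make the timing challenge concrete, here is a simplified sketch of the per-word data involved and one naive way to carry timings over to a sentence-level translation: spread each translated sentence across the time span of the source words it corresponds to. The data shapes and the proportional-split heuristic are illustrative assumptions, not our production approach.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # seconds into the conversation
    end: float

# Each transcript word carries the time at which it was spoken,
# which drives word-by-word highlighting and snippet creation.
source_words = [
    Word("estamos", 12.0, 12.4),
    Word("aprendiendo", 12.4, 13.1),
    Word("juntos", 13.1, 13.6),
]

def align_translation(source_words: list[Word], translated_sentence: str) -> list[Word]:
    """Distribute a source sentence's time span across its translated words.

    Translations capture meaning at the sentence level, so there is no
    one-to-one word mapping; this heuristic simply splits the source
    span proportionally among the translated words.
    """
    start, end = source_words[0].start, source_words[-1].end
    words = translated_sentence.split()
    step = (end - start) / len(words)
    return [
        Word(w, start + i * step, start + (i + 1) * step)
        for i, w in enumerate(words)
    ]

aligned = align_translation(source_words, "we are learning together")
```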
Ideally, we will one day be able to toggle the transcript between different languages so that users can read conversations in their native language while hearing the conversations in the participants’ voices.
Another area for growth is topic analysis, where our models are currently trained only on English transcripts.
Ideally, if we were to look at an English conversation in Spanish, we would see its keywords translated into Spanish as well. Furthermore, if a conversation takes place in Spanish, we would be able to send it to a model trained on Spanish transcripts to better surface keywords.
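As a rough illustration of that second idea, the sketch below routes a conversation to a keyword model based on its primary language code. The model names and lookup are hypothetical placeholders; our actual topic analysis is a separate, evolving research effort.

```python
# Hypothetical sketch: choose a keyword-extraction model based on the
# conversation's primary language code, falling back to English.
KEYWORD_MODELS = {
    "en": "keywords-en-v1",  # trained on English transcripts
    "es": "keywords-es-v1",  # would be trained on Spanish transcripts
}

def keyword_model_for(language_codes: list[str]) -> str:
    primary = language_codes[0].split("-")[0].lower()  # "es-US" -> "es"
    return KEYWORD_MODELS.get(primary, KEYWORD_MODELS["en"])

keyword_model_for(["es-US", "en-US"])  # -> "keywords-es-v1"
```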
We’re excited to ship a version of the product this fall that will enhance our ability to surface insights and themes from conversations hosted in Spanish. And we look forward to the day when our platform will more seamlessly amplify historically under-heard voices from diverse language communities.