Japanese Chatbots
Leah Goldberg | September 2022 - May 2023

Skills


Machine learning, teamwork, problem solving, creativity, Python, feedback implementation, chatbots

Project Overview


In this project, I used an open-source framework called RASA to develop machine-learning-powered chatbots for Japanese language learners. RASA allows developers to create customizable AI-powered chatbots. The Japanese chatbots I created simulate basic conversations with the user, such as a self-introduction, a visit to the doctor’s office, or talking with a friend. Each bot prompts users to use the vocabulary and grammar patterns of the targeted learning material. These Japanese materials are loosely based on the GENKI textbooks, a series designed for beginner Japanese learners.

The Chatbots


How they work

The chatbots are configured with YAML files. Like JSON files, YAML files store data as key-value pairs. Here, those pairs hold sample data for the chatbots. For example, one key would be “greetings”, and its values would be phrases such as “こんにちは/konnichiwa” (hello), “おはよう/ohayou” (good morning), and “こんばんは/konbanwa” (good evening). The chatbots used this sample data, input from users, and machine learning models to learn how to follow the flow of a conversation. One of the models used was a Japanese language model from spaCy, an open-source natural language processing library.
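To make the key-value idea concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not RASA's actual file format or pipeline: the intent names `greet` and `goodbye`, the `TRAINING_DATA` dictionary, and the bigram-overlap `classify` function are all hypothetical. RASA trains a machine learning model on its sample data rather than matching examples directly like this; the sketch only shows how examples stored under each key can be used to label a new message.

```python
# Hypothetical sample data mirroring the key-value idea above:
# each key is an intent name, each value is a list of example phrases.
TRAINING_DATA = {
    "greet": ["こんにちは", "おはよう", "こんばんは"],
    "goodbye": ["さようなら", "じゃあね", "またね"],
}


def char_bigrams(text):
    """Return the set of overlapping two-character substrings of text."""
    if len(text) < 2:
        return {text}
    return {text[i:i + 2] for i in range(len(text) - 1)}


def similarity(a, b):
    """Jaccard similarity between the character-bigram sets of a and b."""
    sa, sb = char_bigrams(a), char_bigrams(b)
    return len(sa & sb) / len(sa | sb)


def classify(message, data=TRAINING_DATA):
    """Return the intent whose closest example best matches message.

    Returns None if the message shares no bigrams with any example.
    This is a toy lookup, not the classifier RASA actually trains.
    """
    best_intent, best_score = None, 0.0
    for intent, examples in data.items():
        for example in examples:
            score = similarity(message, example)
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent
```

For instance, `classify("こんにちは！")` returns `"greet"`, because four of the message's five character bigrams also appear in the example "こんにちは". Characters are compared instead of words because Japanese is written without spaces, so splitting on whitespace would not produce useful tokens.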

Screenshot of the spaCy model

The RASA software also allows custom bot commands (which RASA calls custom actions) to be written in Python. Some of the custom commands I wrote recorded a log of the user’s conversation and sent that log to a specified email address.
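The sketch below shows the standard-library core of such a logging command: turning the conversation into a plain-text transcript and wrapping it in an email. It assumes the conversation arrives as a list of event dictionaries with `"event"` and `"text"` keys (the shape RASA tracker events commonly take). The function names and example addresses are hypothetical, and in a real custom action this code would be called from the action's `run` method rather than standing alone.

```python
from email.message import EmailMessage


def format_transcript(events):
    """Turn RASA-style tracker events into a readable transcript.

    Assumes each event is a dict with an "event" key ("user" or "bot")
    and a "text" key; every other event type is skipped.
    """
    lines = []
    for ev in events:
        kind = ev.get("event")
        text = ev.get("text")
        if kind in ("user", "bot") and text:
            speaker = "User" if kind == "user" else "Bot"
            lines.append(f"{speaker}: {text}")
    return "\n".join(lines)


def build_log_email(events, sender, recipient,
                    subject="Chatbot conversation log"):
    """Wrap the transcript in an email message (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(format_transcript(events))
    return msg
```

Sending the message could then be done with the standard library's `smtplib`, for example `smtplib.SMTP(host).send_message(msg)`, where `host` is whatever mail server the deployment uses. Keeping the formatting separate from the sending makes the transcript logic testable without a mail server.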


Problem Solving & Innovation

We ran into a lot of bugs with the RASA software, which made developing the chatbots challenging. The nature of the Japanese language itself also made beginner-level chatbots difficult to build. Japanese uses three writing systems: (1) hiragana/ひらがな, (2) katakana/カタカナ, and (3) kanji/漢字. Hiragana and katakana are both phonetic, while kanji are logographic characters derived from Chinese. There are thousands of kanji, and most beginners know only a small number of them. The Japanese language model we were using was trained on Japanese newspaper text, which uses a lot of kanji. Because the bots were aimed at beginners, we did not want them to use much kanji. However, Japanese has many homophones, and kanji is what distinguishes them in writing. For example, 洗濯 and 選択 are both pronounced "せんたく/sentaku," but the former means "laundry" and the latter means "choice." As a result, when we wrote words in hiragana, the bot would sometimes misinterpret which word we intended.
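A toy Python sketch of why hiragana-only input is ambiguous, and of one simple way context can narrow it down. Everything here is hypothetical: the `HOMOPHONES` and `CONTEXT_CUES` tables and the keyword-matching rule are illustrations, not how spaCy or RASA resolve word meaning.

```python
# Hypothetical lookup table: one hiragana reading, several kanji spellings.
HOMOPHONES = {
    "せんたく": {"洗濯": "laundry", "選択": "choice"},
}

# Hypothetical context cues that hint at each kanji spelling.
CONTEXT_CUES = {
    "洗濯": ["服", "洗う"],    # clothes, to wash
    "選択": ["選ぶ", "どれ"],  # to choose, which one
}


def disambiguate(sentence, reading):
    """Guess which kanji spelling a hiragana reading stands for.

    Returns the single spelling whose cues appear in the sentence, or
    None if no spelling matches or more than one does (still ambiguous).
    """
    candidates = HOMOPHONES.get(reading, {})
    matches = [
        kanji for kanji in candidates
        if any(cue in sentence for cue in CONTEXT_CUES.get(kanji, []))
    ]
    return matches[0] if len(matches) == 1 else None
```

With these tables, "服をせんたくする" resolves to 洗濯 (laundry) and "どれをせんたくしますか" resolves to 選択 (choice). When a sentence contains cues for neither spelling, or for both, the function returns `None`, which mirrors the situation in the project: without kanji or enough context, the intended word cannot be recovered from the reading alone.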

In general, when approaching a problem, I would discuss the issues with my supervisor, and we would go back and forth proposing ideas to fix the bugs. Most of these issues were fixed by retraining the bot models or by writing a custom Python command for the bot.

Screenshot of running the RASA bot in the shell

Adaptability & Learning

I learned the RASA software from scratch, which also gave me more insight into machine learning. When we ran into bugs, I was able to efficiently devise a plan to fix them. I also effectively adapted the bots to the feedback my supervisor gave me. The Japanese professor from Dickinson also gave me ongoing feedback on how to make the bots' conversations flow more naturally, which I successfully implemented. Additionally, I mentored a younger student who took over this position when I graduated.


Chatbot Source Code GitHub Links
# | Topic/Link | Conversation Simulated
1 | Self Introduction | Asks user basic questions regarding their name, age, hometown, etc.
2 | Keigo (honorific language) | Simulates having a conversation with a stranger on a plane using keigo.
3 | Doctor's visit | Asks user basic questions about their symptoms and diagnoses a condition.

Overall Thoughts


I believe that I met my objective of creating chatbots for Japanese language learners. The chatbots I built use grammar patterns and vocabulary that follow the GENKI textbook curriculum to help users practice their Japanese. Through developing these bots, I also gained a better understanding of how the RASA framework is used to create AI-powered chatbots, and of how to communicate effectively with my supervisor about the issues I run into.

This internship gave me some insight into what it would be like to do work that draws on both my Japanese and computer science majors. Because I enjoyed combining both of my majors, I aim to use these skills in my future career as well. That work may or may not involve developing chatbots, but I think I would enjoy it either way.