Chatbots and the Turing test

When I recently tried out the voice activation features on my phone, I was extremely impressed with how well it understood not only the actual words I was saying, but also the context. The last time I used voice control was years ago, when the technology was still in its infancy: the software only understood a specific set of commands, and most of them were hard-coded. Machine learning has advanced voice recognition and natural language processing to the point where I can tell my phone: “Hey Google, can you give me a list of the best BBQ restaurants near me?” and it will actually understand and comply. Given that, it is interesting that we still struggle with a language-based technology that has been around for ages: chatbots.

While reading up on the subject, I stumbled across this article, which might be old news for some of you. Apparently a chatbot called Eugene Goostman, which simulates a 13-year-old Ukrainian boy, “passed” the Turing test back in 2014. The test is considered passed if the chatbot is mistaken for a human in more than 30% of test cases. Eugene fooled the jury 33% of the time and therefore barely passed. However, there was major backlash from experts in the field, who disputed the results.

They were mainly concerned with the artificial constraints that the persona of Eugene itself placed on the experiment. If it simulates a 13-year-old Ukrainian boy, you can only expect it to be as articulate and knowledgeable as its real counterpart. Most abstract or open-ended questions that people think of in order to identify chatbots, such as “What is your opinion on capitalism?” or “What is love?”, are not going to work, because neither Eugene nor an actual 13-year-old will give you a good answer. As a result, the actual challenge of building a chatbot that passes the Turing test without those constraints remains unsolved.

So what’s new since 2014? The state of the art seems to be the chatbot Mitsuku. It won the Loebner Prize, one of the most recognized awards for Turing tests, three years running (2016-2018) as well as in 2013. On its website, you can try out the bot and judge for yourself how good it is. Although it readily admits that it is a bot, it still tries to mimic a human.

After some inquiries, I found out that Mitsuku seems to run exclusively on kebabs, a very human-like quality (but doesn’t seem to care much about which Oxford kebab van it acquires its kebabs from).

It also has some harsh opinions on scoring functions.

However, while it can answer normal questions relatively well, I found that its main issue stems from missing context. Questions or statements that are only coherent if you understand what was said before seem to be especially challenging for Mitsuku. The resulting answers either don’t make sense or, more often, are easy cop-outs such as “Oh, that’s interesting” or “Why do you think that?”.

In the end, it is still impressive how far conversational machine learning techniques have come, and after talking to Mitsuku for 30 minutes about kebabs and scoring functions, I should probably get back to work.
