Meet Moshi, a new AI chatbot with GPT-4o-like features

Moshi is a new AI chatbot that can understand the tone of your voice, can be interrupted and responds faster than ChatGPT’s upcoming Advanced Voice Mode functionality.

Kyutai, a French AI company, has developed a new AI-powered chatbot called “Moshi” that offers features similar to the now-delayed ‘Advanced Voice Mode’ of ChatGPT’s GPT-4o. Moshi can understand and interpret your tone of voice, and it can also be used offline. The AI chatbot, named after the Japanese way of answering a phone call, has a response time of just 200 milliseconds, making it faster than GPT-4o’s Advanced Voice Mode, which typically takes between 232 and 320 milliseconds.

Based on a 7B-parameter large language model (LLM) called Helium, the chatbot is currently available to all and can speak in various accents and 70 different emotional and speaking styles. Moshi can also handle two audio streams simultaneously, meaning it can listen and talk at the same time.

Kyutai says that it aimed to teach Moshi various nuances and tones of human conversations. To enhance the voice quality, the company even collaborated with a professional voice artist.

Kyutai says its goal is to make the chatbot an open-source project, that is, to make the model’s code and framework available to all, so that users can use the chatbot without having to worry about privacy. While Moshi is faster than GPT-4o, the company says it is a research prototype meant to showcase the bot’s response time and its ability to replicate not only sentences but also tones and voices.

Unlike GPT-4o, Moshi is a fairly small model and was developed from scratch in six months by a team of just eight researchers. It was reportedly trained on 100,000 synthetic dialogues using text-to-speech technology.