Microsoft currently relies on OpenAI’s artificial intelligence (AI) models to power its Copilot system, a result of its investment in the company a few years ago. But Microsoft is now starting to move more independently by developing AI models of its own. This morning, it launched three new AI models: MAI-Transcribe-1, MAI-Voice-1 and MAI-Image-2.
MAI-Transcribe-1 is a speech-to-text transcription model billed as the most accurate across 25 languages. MAI-Voice-1 is a text-to-speech model that can produce 60 seconds of audio in 1 second, or roughly 60 times faster than real time. Finally, MAI-Image-2 is a new, faster image generation model that will be used in Bing and PowerPoint.
Microsoft says Transcribe-1 and Voice-1 can be used for tasks such as fast subtitle generation, conversational AI systems, AI agents, accessibility tools for people with disabilities, education, training and customer feedback.
Meanwhile, MAI-Image-2 is aimed at graphic and media designers who want to brainstorm ideas, produce design concepts and generate UX concepts, as well as at brands creating material for internal use.
All three models are now accessible through the MAI playground and Microsoft Foundry.

