What's wrong with current language models? Their ever-growing parameter counts demand enormous amounts of memory, which in turn fuels a global shortage of memory chips and storage. As a result, devices are becoming more expensive.
Google has just announced TurboQuant, a new compression algorithm that reduces LLM memory usage by up to 6 times and promises computations up to 8 times faster, without compromising output quality.
In tests on the Gemma and Mistral models, TurboQuant showed no loss of quality. It should make model training cheaper and more efficient, and it opens the door to high-quality AI models running on mobile devices, which have traditionally been memory-constrained.
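The article reports the numbers but not the mechanism. Memory savings of this kind typically come from quantization: storing each weight in fewer bits. Below is a minimal NumPy sketch of generic round-to-nearest 4-bit quantization, not Google's TurboQuant algorithm (which the article does not detail); the layer size, function names, and bit width are illustrative assumptions.

```python
# Generic round-to-nearest 4-bit quantization sketch (NOT TurboQuant itself):
# it shows how storing weights as 4-bit integers plus per-row scales shrinks
# memory compared with 16-bit floats.
import numpy as np

def quantize_int4(weights: np.ndarray):
    """Symmetric per-row quantization of a float weight matrix to the 4-bit range."""
    # One scale per output row, chosen so the largest weight maps to +/-7.
    scales = np.abs(weights).max(axis=1, keepdims=True) / 7.0
    scales = np.maximum(scales, 1e-12)  # guard against all-zero rows
    q = np.clip(np.round(weights / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover approximate float weights for use at inference time."""
    return q.astype(np.float32) * scales

# Hypothetical 4096 x 4096 layer: about 32 MiB if stored in fp16.
w = np.random.randn(4096, 4096).astype(np.float32)
q, s = quantize_int4(w)

fp16_bytes = w.size * 2                 # 16 bits per weight
int4_bytes = w.size // 2 + s.size * 2   # 4 bits per weight (two packed per
                                        # byte in a real kernel) + fp16 scales
print(f"fp16: {fp16_bytes / 2**20:.1f} MiB, int4: {int4_bytes / 2**20:.1f} MiB "
      f"({fp16_bytes / int4_bytes:.1f}x smaller)")
print("max abs error:", np.abs(w - dequantize(q, s)).max())
```

This plain scheme yields roughly 4x savings over fp16; larger ratios such as the 6x the article cites generally require lower bit widths or compressing activations and the KV cache as well.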