The world is being quietly rearranged by people who write very long documents.


The title they went with: "SOMA: Efficient Multi-turn LLM Serving via Small Language Model". Noisy translates that to:

AI chatbots can now remember more of your conversation for less money


A new research paper describes a way to make large language models cheaper and faster to run in ongoing, multi-turn conversations. If the approach works as described, companies building AI assistants could keep conversations going longer without huge computing costs.
Running AI chatbots that remember your whole conversation is expensive. The paper proposes cutting those costs by swapping in a smaller, specialized AI model once the conversation gets going, which could make advanced conversational AI much more affordable for businesses and users.
Watch for this technique to appear in major open-source LLM frameworks or as a feature in commercial AI API offerings.

If you insist
Read the original →