At the end of April 2026, AI developer DeepSeek introduced its latest public language model, DeepSeek v4. The model offers a context window of up to 1 million tokens, which helps it stay consistent over long generations and makes it suitable for working on large codebases or documents.
Most recently, as an update to that large language model (LLM), DeepSeek has introduced DSpark, a speculative decoding framework that runs in DeepSeek's data centers. It accelerates inference so that users get answers faster, and it reduces the electricity consumed by questions and requests that would normally require a large number of tokens.
With the DSpark framework, DeepSeek can reduce the computation the large model spends on each user request, especially when the request and its output involve a large number of tokens. DeepSeek says that with DSpark, DeepSeek v4 completes inference and produces output 60-85% faster than before.
In brief, DSpark uses a lightweight draft model to propose the next stretch of a response, then validates those proposals in batches with the larger LLM. The result is faster output that remains as accurate as the large model's own.
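The draft-then-verify idea can be illustrated with a minimal Python sketch of greedy speculative decoding. This is not DSpark's actual implementation, which the article does not detail. The names `speculative_decode`, `target_model`, and `draft_model` are hypothetical, and the two "models" are toy functions over integer tokens. A real system would verify all drafted tokens in one batched forward pass of the large model and would often use a probabilistic acceptance rule rather than exact matching.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token prefix to its next token (greedy).
Model = Callable[[List[int]], int]


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding (illustrative sketch).

    Returns the generated sequence and the number of verification rounds,
    where one round stands in for one batched pass of the large model.
    """
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens, one after another.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The large model checks every proposed position. This loop
        #    simulates what would be a single batched pass in practice.
        rounds += 1
        accepted: List[int] = []
        for i in range(k):
            expected = target(out + proposal[:i])
            if expected == proposal[i]:
                accepted.append(proposal[i])
            else:
                # First mismatch: keep the large model's token, drop the rest.
                accepted.append(expected)
                break
        else:
            # Every draft token matched, so the same pass yields a bonus token.
            accepted.append(target(out + proposal))
        out.extend(accepted)
    return out[:len(prompt) + max_new], rounds


def target_model(prefix: List[int]) -> int:
    """Toy stand-in for the large model: counts upward modulo 10."""
    return (prefix[-1] + 1) % 10


def draft_model(prefix: List[int]) -> int:
    """Toy stand-in for the draft model: usually agrees, sometimes wrong."""
    guess = (prefix[-1] + 1) % 10
    return (guess + 1) % 10 if len(prefix) % 7 == 0 else guess


if __name__ == "__main__":
    tokens, rounds = speculative_decode(target_model, draft_model, [0], max_new=12)
    print(tokens, rounds)
```

Because every accepted token is either confirmed or supplied by the large model, the output matches what the large model would have produced on its own. The speedup comes from needing fewer sequential large-model rounds than generated tokens whenever the draft model guesses well.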
A more in-depth explanation of how DSpark works can be read in this article. It also explains how DeepSeek was recently able to lower the price of its DeepSeek v4 Pro subscription while other AI services, such as OpenAI's ChatGPT and Anthropic's Claude, raised their subscription prices.

