Alibaba AI Qwen latest model beats Google Gemini and ChatGPT in coding
Alibaba has launched its latest AI model, Qwen3.7-Max, claiming it outperformed rivals from OpenAI and Google in coding benchmarks. The model is designed for autonomous coding tasks and can reportedly work independently for up to 35 hours.

Chinese tech giants are making a serious push in the AI race to challenge their US rivals, and one of the biggest names driving that push is Alibaba. The company recently launched its latest AI model, Qwen3.7-Max, which it claims has outperformed several rivals from OpenAI and Google on coding benchmarks, putting it among the world’s top AI coding models.
According to Alibaba, the model has already reached fourth place globally on the Code Arena rankings with a score of 1,541. Code Arena is a benchmark that measures how well AI models can independently build and handle coding tasks. The ranking places Qwen3.7-Max ahead of some versions of ChatGPT and Gemini. The only models ranked above it were from Anthropic and its Claude family of coding-focused AI systems.
What is Qwen3.7-Max?
Unlike regular chatbots that mostly answer questions, Qwen3.7-Max has been built specifically for “agent-based” tasks. In simple terms, this means the AI is designed to independently handle long and complicated workflows instead of just responding to one prompt at a time.
Alibaba says the model can work as a coding agent capable of building front-end prototypes, handling large multi-file software projects, automating office tasks with external tools, and operating autonomously for extended periods without human input. The company claims the AI can continuously run for up to 35 hours and use software tools more than 1,000 times in a single session.
To show off those capabilities, Alibaba researchers reportedly tasked the AI with optimising code for one of the company’s own AI chips. According to the company, Qwen3.7-Max worked for around 35 hours straight, running 432 kernel tests and making more than 1,100 tool calls while repeatedly compiling, measuring and rewriting code on its own. The Chinese tech giant says the AI managed to achieve a 10x performance improvement over the original implementation, even though the model had reportedly never seen that chip architecture during training.
The benchmarks from Alibaba’s Qwen model come at a time when US companies such as OpenAI, Google and Anthropic have largely dominated advanced coding AI. But Chinese firms are now aggressively trying to compete in the space, especially around autonomous coding agents that can function more like software engineers rather than simple assistants. Alibaba says Qwen3.7-Max supports OpenAI- and Anthropic-compatible interfaces and can work with tools such as Claude Code, OpenClaw and Qwen Code.
For Alibaba itself, the launch of Qwen3.7-Max also marks a strategic shift. Earlier Qwen models were released as open source, but the latest Max version is proprietary and available only through Alibaba Cloud’s Model Studio API.
Beyond coding, Alibaba says the model can also monitor AI training systems, detect suspicious behaviour during software engineering tests, and even guide robots through physical spaces using paired navigation systems. The company also claims Qwen3.7-Max performed strongly across multiple reasoning and coding benchmarks, coming close to Anthropic’s Claude Opus 4.6 Max in several tests.
Do note that many of the benchmark results were self-reported by Alibaba and the company says a more detailed technical report will be released later.
Chinese tech giants are making a serious push in the AI race to challenge their US rivals, and one of the biggest names driving that push is Alibaba. The company recently launched its latest AI model, Qwen3.7-Max, which it claims has outperformed several rivals from OpenAI and Google on coding benchmarks, putting it among the world’s top AI coding models.
According to Alibaba, the model has already reached fourth place globally on the Code Arena rankings with a score of 1,541. Code Arena is a benchmark that measures how well AI models can independently build and handle coding tasks. The ranking places Qwen3.7-Max ahead of some versions of ChatGPT and Gemini. The only models ranked above it were from Anthropic and its Claude family of coding-focused AI systems.
What is Qwen3.7-Max?
Unlike regular chatbots that mostly answer questions, Qwen3.7-Max has been built specifically for “agent-based” tasks. In simple terms, this means the AI is designed to independently handle long and complicated workflows instead of just responding to one prompt at a time.
Alibaba says the model can work as a coding agent capable of building front-end prototypes, handling large multi-file software projects, automating office tasks with external tools, and operating autonomously for extended periods without human input. The company claims the AI can continuously run for up to 35 hours and use software tools more than 1,000 times in a single session.
To show off those capabilities, Alibaba researchers reportedly tasked the AI with optimising code for one of the company’s own AI chips. According to the company, Qwen3.7-Max worked for around 35 hours straight, running 432 kernel tests and making more than 1,100 tool calls while repeatedly compiling, measuring and rewriting code on its own. The Chinese tech giant says the AI managed to achieve a 10x performance improvement over the original implementation, even though the model had reportedly never seen that chip architecture during training.
The benchmarks from Alibaba’s Qwen model come at a time when US companies such as OpenAI, Google and Anthropic have largely dominated advanced coding AI. But Chinese firms are now aggressively trying to compete in the space, especially around autonomous coding agents that can function more like software engineers rather than simple assistants. Alibaba says Qwen3.7-Max supports OpenAI- and Anthropic-compatible interfaces and can work with tools such as Claude Code, OpenClaw and Qwen Code.
For Alibaba itself, the launch of Qwen3.7-Max also marks a strategic shift. Earlier Qwen models were released as open source, but the latest Max version is proprietary and available only through Alibaba Cloud’s Model Studio API.
Beyond coding, Alibaba says the model can also monitor AI training systems, detect suspicious behaviour during software engineering tests, and even guide robots through physical spaces using paired navigation systems. The company also claims Qwen3.7-Max performed strongly across multiple reasoning and coding benchmarks, coming close to Anthropic’s Claude Opus 4.6 Max in several tests.
Do note that many of the benchmark results were self-reported by Alibaba and the company says a more detailed technical report will be released later.