Alibaba Qwen QwQ-32B: Scaled reinforcement learning showcase
The Qwen team at Alibaba has unveiled QwQ-32B, a 32-billion-parameter AI model whose performance rivals that of the much larger DeepSeek-R1. This breakthrough highlights the potential of scaling Reinforcement Learning (RL) on robust foundation models. The team has also integrated agent capabilities into the reasoning model, enabling it to think critically, utilise tools, …