Upstage 13B LLM Beats MS ToRA and ChatGPT in Mathematical Reasoning

January 8, 2024 종원 황

Upstage's math-tailored LLM, developed with Qanda and KT, surpasses MS ToRA 13B and ChatGPT, setting a new standard for math problem-solving on global AI benchmarks.

Seoul, Jan. 8, 2024 – Upstage announced that its 13B-parameter Large Language Model (LLM) for mathematical problem-solving has surpassed both Microsoft (MS) ToRA 13B and ChatGPT, achieving State-of-the-Art (SoTA) results across key benchmarks.

For this task, Upstage collaborated with Qanda, an AI-powered learning platform, and KT, a major Korean carrier that supplied GPUs for model training, to build a language model specifically designed for mathematical reasoning and problem-solving. By leveraging Qanda's high-quality math dataset, Upstage seamlessly integrated natural language reasoning with program-based mathematical processing.

The result: the Upstage-Qanda 13B model has achieved SoTA performance, surpassing MS ToRA 13B, on both the GSM8K and MATH benchmark datasets. Notably, the model has surpassed ChatGPT's average performance across various benchmark tests and even outperformed GPT-4 on MATH with an accuracy of 48.8 percent, showcasing its competitiveness with industry-leading models.

* Based on the MS ToRA paper
** includes datasets other than GSM8K and MATH (as of Dec. 22)

Achieving SoTA performance on both MATH and GSM8K datasets is exceptionally rare, but Upstage has set itself apart by adopting unique data-centric methodologies to curate an optimal dataset cohort for training and fine-tuning the model. The resulting format integrates the advantages of natural language rationales in mathematical reasoning (Chain-of-Thought) with code-based algorithmic techniques in precise calculation (Program-of-Thought), significantly enhancing the model’s capability to solve complex mathematical problems.
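To illustrate the idea of pairing a natural language rationale with a program that performs the exact calculation, here is a minimal Python sketch. It is not Upstage's actual training format or pipeline, which have not been published; the `HybridSolution` class, the `run_program` helper, and the sample word problem are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class HybridSolution:
    """Hypothetical container: a natural-language rationale plus a program."""
    rationale: str  # Chain-of-Thought part: explains the plan in words
    program: str    # Program-of-Thought part: performs the exact calculation


def run_program(program: str):
    """Execute the program part and return the value it stores in `answer`."""
    namespace = {}
    # Illustration only: real systems should sandbox model-generated code.
    exec(program, namespace)
    return namespace["answer"]


# Made-up problem, not taken from any benchmark or from Qanda's data:
# "A notebook costs 3 dollars and a pen costs 2 dollars.
#  How much do 4 notebooks and 5 pens cost?"
solution = HybridSolution(
    rationale="Multiply each price by its quantity, then add the two subtotals.",
    program="notebooks = 4 * 3\npens = 5 * 2\nanswer = notebooks + pens",
)

result = run_program(solution.program)
print(solution.rationale)
print("Computed answer:", result)
```

In this sketch the rationale carries the reasoning in words while the arithmetic is delegated to executed code, which is the general motivation for combining the two styles: the language part plans, and the program part avoids calculation slips.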

“This achievement marks a significant milestone for Upstage, reaffirming our unrivaled promise in making the world’s best domain-specific language model,” said Sung Kim, CEO of Upstage. “Looking ahead, our flagship LLM 'Solar' will be the focal point of our expansion into versatile applications within the global AI landscape.”

Further details, including the research methodology and analysis, will be available soon through an in-depth research paper.

Upstage is developing a ‘mathematics domain-specialized private LLM’ with Mathpresso, operator of the AI-based learning platform Qanda.