AI Companion

How we’re preparing for the next era of AI

Zoom’s CTO, Xuedong Huang, discusses how small language models (SLMs) drive our vision for AI agents to work together in a federated approach to improve your day-to-day tasks. 

Updated on February 19, 2025

Published on February 19, 2025

Xuedong Huang
Chief Technology Officer

Xuedong Huang is Zoom’s Chief Technology Officer (CTO). Prior to Zoom, he was at Microsoft, where he served as Azure AI CTO and Technical Fellow. He has had an illustrious career in AI: he founded Microsoft’s speech technology group in 1993 and led Microsoft’s AI teams to achieve several of the industry’s first human parity milestones in speech recognition, machine translation, natural language understanding, and computer vision. He is an IEEE and ACM Fellow and an elected member of the National Academy of Engineering and the American Academy of Arts and Sciences.

Xuedong received his Ph.D. in EE from the University of Edinburgh in 1989 (sponsored by the British ORS and Edinburgh University Scholarship), his MS in CS from Tsinghua University in 1984, and his BS in CS from Hunan University in 1982.

At Zoom, we remain focused on innovation, which drives our continuous exploration of AI-first transformation through Zoom AI Companion. In the past year, I’ve shared how our federated approach delivers high-quality results and how our focus on speech recognition quality creates a better foundation for our other AI features. As artificial intelligence continues to improve, we are accelerating agentic AI adoption.

What is agentic AI?

So far, artificial intelligence has relied on large language models (LLMs) to respond to user prompts and deliver generated responses. However, there is much more opportunity available when we consider how SLMs can enable customized AI agents. We’re building AI Companion to support agentic AI that manages a series of multi-step actions on your behalf.

Treating AI as agents rather than standalone skills and responses means going beyond prompts that return simple results: agents should act as an extension of ourselves and our objectives. To do this, our AI agents have the following characteristics:

  • Reasoning and planning: Analyzes situations and devises strategies, autonomously pursuing goals with foresight and intelligence.
  • Memory and reflection: Learns from the past and adapts its strategies, fostering autonomous growth akin to human development.
  • Action execution: Uses the right tools to transform intent into real-world effects.
  • Multi-agent collaboration: Delegates to and manages skills or other agents to achieve its goals.
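
To make these characteristics more concrete, here is a minimal Python sketch of an agent loop. It is an illustration only and not how AI Companion is implemented: the `ToyAgent` class, its fixed plan, and the two stand-in tools are all hypothetical, and the reflection step is left out for brevity.

```python
from typing import Callable, Dict, List, Tuple

# A "tool" here is any function from text to text (hypothetical interface).
Tool = Callable[[str], str]


class ToyAgent:
    def __init__(self, tools: Dict[str, Tool]) -> None:
        self.tools = tools
        # Memory: a log of (tool name, result) pairs from earlier steps.
        self.memory: List[Tuple[str, str]] = []

    def plan(self, goal: str) -> List[str]:
        # Reasoning and planning: fixed here; a real agent would ask a model.
        return ["search", "summarize"]

    def run(self, goal: str) -> str:
        context = goal
        for tool_name in self.plan(goal):
            # Action execution: call the chosen tool on the current context.
            context = self.tools[tool_name](context)
            self.memory.append((tool_name, context))
        return context


# Stand-in tools; in a multi-agent setup a tool could be another agent's run().
agent = ToyAgent({
    "search": lambda query: f"notes about {query}",
    "summarize": lambda text: text.upper(),
})
result = agent.run("q3 budget")
print(result)
```

Each step's output becomes the next step's input, and the memory log records what was done so that a later step (or a reflection pass) could consult it.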

To help bring these AI agents to reality, we are thrilled to announce a significant milestone in this journey: our newly developed Small Language Model (SLM) has achieved state-of-the-art performance within the 2 billion parameter category on the public benchmark leaderboard. Through customization via Zoom’s forthcoming AI Studio, we are designing Zoom’s SLMs to approach the quality of the industry’s leading LLMs in specialized workloads. This will pave the way for AI Companion to perform complex agentic AI tasks, with multiple AI agents working together at unmatched cost-effectiveness.

In Zoom’s federated AI approach, rather than depending on a single, comprehensive large model, we advocate for orchestrating multiple customized models. Zoom’s SLMs are designed to enhance this approach by optimizing for specific tasks. By distributing workloads across customized SLMs with corresponding agents, while also leveraging leading LLMs, we aim to achieve several important benefits:

  • Task-specific excellence: Each agent can be precisely optimized using appropriate domain data and fine-tuning approaches to meet specific performance criteria.
  • Speed and scalability: More compact models facilitate easier customization, maintenance, and scaling, enabling faster inferences and updates.
  • Cost-effectiveness: Customized smaller models require fewer computational resources and reduced development costs.
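
The orchestration idea can be sketched as a simple router that sends each task to a specialized model when one exists and falls back to a general model otherwise. This is a hedged illustration, not Zoom’s implementation: `FederatedRouter`, `make_stub_model`, the task-type names, and the model names are all hypothetical, and the stub models only tag their input instead of calling a real model.

```python
from typing import Callable, Dict

# A "model" here is any function from text to text (hypothetical interface).
Model = Callable[[str], str]


def make_stub_model(name: str) -> Model:
    # Stand-in for a real model call; it only tags the input with its name.
    def respond(text: str) -> str:
        return f"[{name}] {text}"
    return respond


class FederatedRouter:
    def __init__(self, specialists: Dict[str, Model], fallback: Model) -> None:
        self.specialists = specialists  # task type -> customized small model
        self.fallback = fallback        # general-purpose large model

    def route(self, task_type: str, text: str) -> str:
        # Use the specialist for this task type if there is one,
        # otherwise send the request to the fallback model.
        model = self.specialists.get(task_type, self.fallback)
        return model(text)


router = FederatedRouter(
    specialists={
        "translation": make_stub_model("translation-slm"),
        "slot_decoding": make_stub_model("slot-slm"),
    },
    fallback=make_stub_model("general-llm"),
)
translated = router.route("translation", "Bonjour")
other = router.route("open_question", "Explain the agenda")
print(translated)
print(other)
```

A lookup table is the simplest possible routing policy; a production system would more plausibly decide based on the request content, quality targets, and cost.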

Let’s discuss what this breakthrough means and how exactly it stacks up against leading models.

How our new SLMs stack up against today’s LLMs

To create Zoom’s SLM, we used 6 trillion tokens of multilingual data and 256 NVIDIA H100 GPUs. From start to finish, the whole training cycle took about 30 days. The tables that follow compare Zoom’s SLM with other models on the following public benchmarks, based on our internal testing:

  • MMLU: Evaluates language models with multiple-choice questions spanning 57 distinct subjects—from mathematics and history to law and ethics—testing a broad range of factual and conceptual understanding.
  • MMLU-Pro: An extension of MMLU, this benchmark focuses on high-quality STEM problems and specialized reasoning challenges, pushing models to demonstrate deeper technical proficiency.
  • GPQA: A challenging dataset comprising 448 multiple-choice questions crafted by domain experts in biology, physics, and chemistry, designed to rigorously assess domain-specific expertise.
  • BBH: Focuses on particularly demanding cognitive and problem-solving tasks, evaluating advanced reasoning and comprehension capabilities in language models.
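
As a back-of-the-envelope aside, the training figures quoted at the start of this section (6 trillion tokens, 256 GPUs, about 30 days) imply a rough average throughput. The sketch below makes several assumptions that the figures themselves do not confirm: that the model has exactly 2 billion parameters, that the 6 trillion tokens were each processed once, that all 30 days were spent on training compute, and that the common rule of thumb of about 6 floating-point operations per parameter per token applies.

```python
TOKENS = 6e12   # training tokens (from the text)
GPUS = 256      # number of H100 GPUs (from the text)
DAYS = 30       # approximate training duration (from the text)
PARAMS = 2e9    # assumption: exactly 2 billion parameters

seconds = DAYS * 24 * 60 * 60
gpu_seconds = GPUS * seconds

# Average tokens processed per GPU per second over the whole run.
tokens_per_gpu_per_sec = TOKENS / gpu_seconds

# Rule-of-thumb training compute: about 6 FLOPs per parameter per token.
total_flops = 6 * PARAMS * TOKENS

print(f"{tokens_per_gpu_per_sec:,.0f} tokens per GPU per second")
print(f"{total_flops:.2e} total training FLOPs (estimate)")
```

Under these assumptions, the run averages roughly 9,000 tokens per GPU per second and about 7.2e22 training FLOPs in total. These are order-of-magnitude estimates, not reported measurements.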

Following the community’s common practice, we measured accuracy on these benchmarks using the Lighteval tool with few-shot prompting: 5 examples (shots) for MMLU and MMLU-Pro, 2 for GPQA, and 3 for BBH.
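
For readers unfamiliar with the term, a "shot" is a solved example placed in the prompt before the question being scored. The sketch below shows one way to assemble a k-shot multiple-choice prompt. It is an illustration of the general idea only: the template, the function names, and the sample questions are assumptions, and Lighteval’s actual prompt formats may differ.

```python
from typing import List, Tuple

# (question, answer choices, correct answer letter)
Example = Tuple[str, List[str], str]
LETTERS = "ABCDEFGHIJ"


def format_question(question: str, choices: List[str]) -> str:
    # Render one question with lettered choices, ending at "Answer:".
    lines = [f"Question: {question}"]
    lines += [f"{LETTERS[i]}. {choice}" for i, choice in enumerate(choices)]
    lines.append("Answer:")
    return "\n".join(lines)


def build_few_shot_prompt(
    examples: List[Example], question: str, choices: List[str], k: int
) -> str:
    # The first k solved examples come first, each with its answer filled in.
    shots = [format_question(q, c) + f" {a}" for q, c, a in examples[:k]]
    # The target question comes last, with the answer left for the model.
    return "\n\n".join(shots + [format_question(question, choices)])


examples: List[Example] = [
    ("What is 2 + 2?", ["3", "4"], "B"),
    ("Which planet is closest to the Sun?", ["Mercury", "Venus"], "A"),
    ("What is the chemical symbol for water?", ["H2O", "CO2"], "A"),
]
prompt = build_few_shot_prompt(
    examples, "What is 3 * 3?", ["6", "9"], k=2
)
print(prompt)
```

With `k=2`, the prompt contains two solved examples followed by the unanswered target question; accuracy is then the fraction of target questions for which the model picks the correct letter.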

Table 1. Zoom SLM in comparison to other SLMs in the 2B category (higher scores are better).

Without customization for a specific domain or task, SLMs generally remain less competitive on these quality measures than leading LLMs such as OpenAI’s GPT-4o-mini, as shown in Table 2.

Table 2. Zoom SLM, without customization, is less competitive than LLMs beyond the 2B category, such as OpenAI GPT-4o-mini.

However, the most interesting result is that these SLMs can offer exceptional capabilities when customized for a specialized task. Through customization with Zoom’s AI Studio, we expect to effectively narrow the quality gap against more costly LLMs. Customized SLMs can act as specialized agents to perform key tasks in orchestration with LLMs, prioritizing accuracy, speed, and cost-effectiveness for each AI agent.

Customized SLMs can excel in tasks such as machine translation. By adapting the SLM with 11.5 billion tokens (including synthetic data) designed for machine translation, we have significantly improved scores on the widely adopted COMET-22 quality metric across 14 language pairs, encompassing major languages such as Chinese, English, French, Japanese, Portuguese, and Spanish, as shown in Table 3.

Our SLMs can also be customized to support AI Companion’s agentic AI benchmark for slot decoding, which measures how well the model interprets user commands for action execution. With 2 billion synthetic tokens of agentic AI domain data, the customized SLM also outperforms GPT-4o-mini, as shown in Table 3.

This combination of efficiency and adaptability is designed to enable Zoom to bring our much-improved machine translation to our worldwide customers, as well as to support customization in Zoom AI Studio for specific agentic AI workloads.
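
To illustrate what a slot decoding task involves, the sketch below scores a predicted set of slots (the structured fields extracted from a user command) against a reference using slot-level F1. The details of Zoom’s benchmark and its metric are not described here, so the slot names, the example command, and the choice of F1 are all assumptions made for illustration.

```python
from typing import Dict


def slot_f1(predicted: Dict[str, str], reference: Dict[str, str]) -> float:
    # A slot counts as correct only if both its name and value match.
    pred_items = set(predicted.items())
    ref_items = set(reference.items())
    true_positives = len(pred_items & ref_items)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_items)
    recall = true_positives / len(ref_items)
    return 2 * precision * recall / (precision + recall)


# Hypothetical command: "Schedule a meeting with Alex tomorrow at 3 pm"
reference = {"action": "schedule_meeting", "attendee": "Alex", "time": "3 pm"}
predicted = {"action": "schedule_meeting", "attendee": "Alex", "time": "3 am"}

score = slot_f1(predicted, reference)
print(f"slot F1 = {score:.3f}")
```

Here two of the three predicted slots match the reference, so precision and recall are both 2/3 and the F1 score is also 2/3. A stricter alternative is exact match, which gives credit only when every slot is correct.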

Table 3. Customized Zoom SLM vs. OpenAI GPT-4o-mini in specialized workloads (higher scores are better).

 

Setting Zoom up for the agentic AI era

These customized SLMs will be the backbone of our AI agents, running more efficiently than, and with results comparable to, the more expensive LLMs that people currently use. Using our federated AI approach, these AI agents and skills will help drive unmatched efficiency, cost-effectiveness, and accuracy.

We take pride in our progress—and this is just the beginning. Our vision is to equip every organization with AI agents that deliver cost-effective, high-performing solutions. With the additional capabilities of AI agents and SLMs, AI Companion is here to help you create a workplace where you can get more done and do your best work.

Our customers love us

Okta
Nasdaq
Rakuten
Logitech
Western Union
Autodesk
Dropbox
