Balto’s New Sentiment Analysis Model: Moving Beyond Positive and Negative Labels

Infographic explaining Balto’s new sentiment model in three steps: seeding a large LLM with high-quality sentiment labels, distilling it into an ~8B model to scale auto-labeling of utterances, and deploying a compact production model that processes ~2,500 requests per second, delivers sentiment scores every ~800 ms, and remains LLM-agnostic.

Note: Our customers are increasingly interested in the details behind AI models, so this is a more technical article than usual.

From Extremes to Nuance

When we first launched Balto’s sentiment analysis over a year ago, we deliberately kept it simple. Calls were labeled as either positive or negative, surfacing conversations at the extremes: the most satisfied and the most frustrated.

This worked for several reasons: extremes are revealing, they have an outsized impact, and they’re easy for AI to detect. Customers also told us they didn’t want the “noise” of “average” conversations, which makes it harder to spot the conversations that really matter.

But customer feedback and model improvements made one thing clear: single labels don’t capture the reality of most conversations. True coaching value lies in the nuance, the subtle shifts in tone and emotion that shape everyday interactions.

Why One Label Isn’t Enough

Imagine a customer calling in furious and threatening to cancel their service. The agent resolves the issue, apologizes, and offers a discount. By the end, the customer’s tone has shifted to something far more positive, and the customer apologizes for being rude at the beginning of the conversation.

Labeling this conversation “positive” might seem obvious, but is it? Any single label oversimplifies. It could be positive, because the result is positive. It could be negative, because the negative start might be the most revealing and helpful part to learn from. Or it could be neutral, since the positive and negative moments roughly balance each other out.

And that’s a simple conversation. Longer, more complex calls make the problem worse. Ultimately, what matters isn’t just the outcome; it’s the entire journey.

That’s why we advanced our sentiment models: to track sentiment as it evolves within a call rather than assigning one catch-all label.

How Our New Sentiment Model Works

The short version: Balto now measures sentiment every ~800 milliseconds, assigns a calibrated 1–9 score, and generates a time-series graph showing how sentiment rises and falls throughout the conversation.
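To make the short version concrete, here is a minimal Python sketch of what a per-call sentiment time series could look like. The names (`SentimentPoint`, `to_time_series`, `expected_points`) are illustrative and not taken from Balto's production code, and the sketch assumes scores arrive at an even 800 ms spacing, which the "~" above suggests is only approximate.

```python
from dataclasses import dataclass

SAMPLE_INTERVAL_MS = 800  # roughly one score every ~800 ms (assumed exact here)


@dataclass(frozen=True)
class SentimentPoint:
    t_ms: int   # offset from the start of the call, in milliseconds
    score: int  # 1 (most negative) to 9 (most positive)


def to_time_series(scores, interval_ms=SAMPLE_INTERVAL_MS):
    """Attach a timestamp to each score, assuming evenly spaced samples."""
    series = []
    for i, score in enumerate(scores):
        if not 1 <= score <= 9:
            raise ValueError(f"score {score} is outside the 1-9 scale")
        series.append(SentimentPoint(t_ms=i * interval_ms, score=score))
    return series


def expected_points(call_ms, interval_ms=SAMPLE_INTERVAL_MS):
    """Approximate number of scores produced for a call of the given length."""
    return call_ms // interval_ms


series = to_time_series([2, 3, 5, 7])
print(series[-1])                # SentimentPoint(t_ms=2400, score=7)
print(expected_points(300_000))  # 375 points for a five-minute call
```

Under these assumptions, a five-minute call yields a few hundred points, which is what makes a per-call graph possible in place of a single label.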

The technical version: Our three-stage pipeline balances accuracy, scalability, and efficiency.

Step 1. Seed a Large LLM

  • Carefully designed instructions prompt a large model to generate thousands of high-quality sentiment labels.
  • These labels form a foundation of nuanced, “reasoned” examples that general-purpose models don’t deliver out of the box.

Step 2. Distill into an ~8B model

  • That seed is distilled into an ~8B-parameter model.
  • It auto-labels tens of millions of utterances, scaling sentiment judgment to the size of real-world call data.

Step 3. Deploy a compact production model

  • A smaller runtime model is trained on the distilled data.
  • It sustains ~2,500 requests per second, delivering new sentiment scores every ~800 ms.
  • It’s LLM-agnostic, so we can swap in newer backbones as they improve without starting over.
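
The two figures in Step 3 can be related with a little arithmetic. The sketch below assumes that each live call sends exactly one request per score; the article does not state this, so treat the result as a rough estimate rather than a published capacity figure. The function name is hypothetical.

```python
REQUESTS_PER_SECOND = 2500  # sustained throughput quoted in Step 3
SCORE_INTERVAL_MS = 800     # one new score per call roughly every 800 ms


def max_concurrent_calls(requests_per_second, interval_ms):
    """Estimate how many live calls the quoted throughput could serve.

    Assumption: one request per call per scoring interval, so each call
    generates 1000 / interval_ms requests per second.
    """
    return requests_per_second * interval_ms // 1000


print(max_concurrent_calls(REQUESTS_PER_SECOND, SCORE_INTERVAL_MS))  # 2000
```

In other words, under that one-request-per-score assumption, ~2,500 requests per second corresponds to roughly 2,000 calls being scored at the same time.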

This layered design gives us the reasoning strength of LLMs, the scalability of a mid-size annotator, and the efficiency of a production-ready runtime. In short, it balances speed and accuracy.
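
The data flow across the three stages can be sketched in a few lines of Python. This is a toy illustration only: the keyword-based `teacher_label` stands in for the seeded large LLM, and the tiny `WordAverageModel` stands in for both the ~8B annotator and the compact runtime model. None of these names or techniques come from Balto's actual implementation; the point is only the pattern of a teacher labeling a seed set, a mid-size model labeling a larger pool, and a small model learning from those labels.

```python
from collections import defaultdict

NEUTRAL = 5


def teacher_label(utterance):
    """Stage 1 stand-in for the large seeded LLM (hypothetical keyword rules)."""
    text = utterance.lower()
    if "cancel" in text or "terrible" in text:
        return 2
    if "thank" in text or "great" in text:
        return 8
    return NEUTRAL


class WordAverageModel:
    """Toy model: a score is the rounded mean of learned per-word scores."""

    def __init__(self, max_vocab):
        self.max_vocab = max_vocab  # smaller vocabulary mimics a smaller model
        self.word_scores = {}

    def fit(self, utterances, labels):
        sums, counts = defaultdict(float), defaultdict(int)
        for text, label in zip(utterances, labels):
            for word in text.lower().split():
                sums[word] += label
                counts[word] += 1
        # Keep only the most frequent words, breaking ties alphabetically.
        kept = sorted(counts, key=lambda w: (-counts[w], w))[: self.max_vocab]
        self.word_scores = {w: sums[w] / counts[w] for w in kept}
        return self

    def predict(self, utterance):
        known = [self.word_scores[w] for w in utterance.lower().split()
                 if w in self.word_scores]
        if not known:
            return NEUTRAL  # fall back to neutral for unseen vocabulary
        return max(1, min(9, round(sum(known) / len(known))))


# Stage 1: the teacher produces a small set of seed labels.
seed = ["i want to cancel this terrible service",
        "thank you that was great",
        "my account number is twelve"]
seed_labels = [teacher_label(u) for u in seed]

# Stage 2: a mid-size annotator learns from the seed and labels a larger pool.
annotator = WordAverageModel(max_vocab=1000).fit(seed, seed_labels)
pool = ["this is terrible", "great thank you",
        "the number is twelve", "please cancel"]
pool_labels = [annotator.predict(u) for u in pool]

# Stage 3: a compact runtime model is trained on the auto-labeled pool.
student = WordAverageModel(max_vocab=50).fit(pool, pool_labels)

print(pool_labels)                   # [3, 8, 5, 2]
print(student.predict("thank you"))  # 8
```

Because the runtime model only ever sees labeled utterances, the teacher behind stage 1 can be replaced without changing stages 2 and 3, which is one way to read the "LLM-agnostic" claim above.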

Our 1–9 Sentiment Scale

Behind the scenes, we score sentiment on a 1–9 scale:

  • 5 = Neutral — most conversations hover around a 5.
  • 1 = Extreme negativity — think profanity, hostility, escalation, yelling.
  • 9 = Really, really happy — think genuine enthusiasm or gratitude.

Visually, it looks like this:

Image of Balto’s 1–9 sentiment scale. The middle value, 5, represents neutral conversations. The lower end, 1, indicates extreme negativity such as hostility or yelling. The upper end, 9, indicates strong positivity such as genuine enthusiasm or gratitude.
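
For readers who want to work with the scale programmatically, here is a small hedged sketch. Only the anchor values 1, 5, and 9 are defined in this article; the rounding rule and the coarse "negative / neutral / positive" split around 5 are assumptions made for illustration, and the function names are hypothetical.

```python
SCALE_MIN, NEUTRAL, SCALE_MAX = 1, 5, 9  # anchor values described above


def clamp_score(raw):
    """Round a raw model output and clamp it onto the 1-9 scale."""
    return max(SCALE_MIN, min(SCALE_MAX, round(raw)))


def direction(score):
    """Coarse reading of a score relative to neutral (assumed split at 5)."""
    if score < NEUTRAL:
        return "negative"
    if score > NEUTRAL:
        return "positive"
    return "neutral"


print(clamp_score(10.2), direction(clamp_score(10.2)))  # 9 positive
print(clamp_score(4.6), direction(clamp_score(4.6)))    # 5 neutral
```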

Now, you can:

  • Monitor Sentiment Graphs that show how sentiment curves across each conversation. Instead of scanning random samples, you’ll see emotional trajectories that you can coach and QA against.
  • Use Click-to-Jump to go straight to the lowest and highest sentiment points. We break these moments out as snippets for your review.
  • Apply Filters, Overlays, and Fluctuations to highlight specific events within a conversation, based on your use case, such as:
    • Conversations with negative starts but with positive turnarounds later.
    • Conversations with outsized sentiment swings.
    • Conversations that end on a positive note.
    • Conversations that end on a negative note.
    • Agents with especially positive or negative sentiment.
    • Frequency of successful turnarounds.
    • Number of 9s per agent per month.
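
Several of the conversation-level items above can be expressed as simple checks over a call's list of scores. The sketch below is one possible reading, not Balto's implementation: the thresholds (4 and 6), the 25% opening and closing windows, and the minimum swing of 4 points are all assumptions chosen for illustration, as are the function names.

```python
SAMPLE_INTERVAL_MS = 800  # assumed even spacing between scores


def extremes(scores):
    """Timestamps (ms) of the lowest and highest points; first one wins ties."""
    low_i = min(range(len(scores)), key=scores.__getitem__)
    high_i = max(range(len(scores)), key=scores.__getitem__)
    return low_i * SAMPLE_INTERVAL_MS, high_i * SAMPLE_INTERVAL_MS


def window_means(scores, fraction=0.25):
    """Mean score of the opening and closing fraction of the call."""
    n = max(1, int(len(scores) * fraction))
    return sum(scores[:n]) / n, sum(scores[-n:]) / n


def is_turnaround(scores, low=4, high=6):
    """Negative start followed by a positive close (assumed thresholds)."""
    start, end = window_means(scores)
    return start <= low and end >= high


def has_large_swing(scores, min_range=4):
    """True when the call spans at least min_range points on the scale."""
    return max(scores) - min(scores) >= min_range


def ends_positive(scores, high=6):
    """True when the closing window averages at or above the threshold."""
    return window_means(scores)[1] >= high


call = [2, 2, 3, 3, 4, 5, 6, 7, 7, 8, 8, 7]
print(extremes(call))         # (0, 7200)
print(is_turnaround(call))    # True
print(has_large_swing(call))  # True
```

The `extremes` helper corresponds to the jump targets described under Click-to-Jump, while the boolean checks correspond to filters such as turnarounds and outsized swings.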

These are only a few examples. You can configure sentiment tracking to answer all kinds of questions.
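
As one example of such a question, "number of 9s per agent per month" could be computed roughly as follows. The input shape, a list of `(agent_id, month, scores)` tuples, is hypothetical, and the sketch assumes the metric counts calls that reach a 9 at least once rather than counting every individual 9.

```python
from collections import Counter


def nines_per_agent_month(calls):
    """Count calls that reach a score of 9, grouped by (agent, month).

    calls: iterable of (agent_id, "YYYY-MM", scores) tuples (assumed format).
    """
    counts = Counter()
    for agent_id, month, scores in calls:
        if 9 in scores:
            counts[(agent_id, month)] += 1
    return counts


calls = [
    ("agent-1", "2025-01", [5, 6, 9]),
    ("agent-1", "2025-01", [5, 5, 4]),
    ("agent-2", "2025-01", [9, 9, 8]),
    ("agent-1", "2025-02", [7, 9]),
]
counts = nines_per_agent_month(calls)
print(counts[("agent-1", "2025-01")])  # 1
print(counts[("agent-2", "2025-02")])  # 0
```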

If you have any specific use cases you would like to implement, please contact your customer success manager.

What’s Next

We’re continuing to evolve the model. Coming soon:

  • Updated score calibration for even more precise distributions.
  • Better categorical overlays to more accurately flag turnarounds or positive closes.
  • Real-time alerts that notify supervisors of sentiment dips mid-call.
  • Agent analytics to better track individual agent performance over time.
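
Real-time alerts are not available yet, so the following is purely a hypothetical sketch of how a mid-call dip could be detected from a stream of scores. The rolling-window approach, the window size, the threshold, and the `DipDetector` name are all assumptions for illustration and say nothing about how the shipped feature will work.

```python
from collections import deque


class DipDetector:
    """Flag the moment a rolling mean of recent scores drops to a threshold."""

    def __init__(self, window=5, threshold=3.0):
        self.recent = deque(maxlen=window)  # most recent scores only
        self.threshold = threshold
        self.alerting = False

    def update(self, score):
        """Feed one score; return True only when a new dip begins."""
        self.recent.append(score)
        full = len(self.recent) == self.recent.maxlen
        below = full and sum(self.recent) / len(self.recent) <= self.threshold
        fired = below and not self.alerting
        self.alerting = below  # stay quiet until sentiment recovers
        return fired


detector = DipDetector(window=3, threshold=3.0)
alerts = [detector.update(s) for s in [5, 4, 3, 2, 2, 2, 6, 6, 6]]
print(alerts.index(True))  # 3: the fourth score starts the dip
print(alerts.count(True))  # 1: one alert per dip, not one per score
```

Firing once per dip instead of on every low score is a deliberate choice in this sketch, since a supervisor would likely want a single notification per incident.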

Thank you for being a Balto customer. As always, please reach out if we can help with anything.

Chris Kontes

Chris Kontes is the Co-Founder of Balto. Over the past nine years, he’s helped grow the company by leading teams across enterprise sales, marketing, recruiting, operations, and partnerships. From Balto’s start as the first agent assist technology to its evolution into a full contact center AI platform, Chris has been part of every stage of the journey—and has seen firsthand how much the company and the industry have changed along the way.
