Over the next several weeks, Balto is giving readers a peek behind the curtain of the industry-leading AI technology that powers our real-time guidance platform.

Balto’s real-time guidance technology seamlessly guides agents through conversations in collections, sales, and customer service — all use cases involving sensitive data such as names, addresses, emails, and credit card information.

As you probably know, all of this data is personally identifiable information, or PII, and if it falls into the wrong hands, it can spell financial disaster for both the consumer and the business. That’s why businesses are required by law to protect it.

It’s Balto’s job to make sure that we are always compliant with standards such as PCI DSS, that sensitive PII doesn’t end up in the wrong hands, and that our customers (and their customers) are confident in our data security process.

Redaction is a Balance

To make sure PII is never revealed to the wrong party, Balto redacts sensitive terms as it listens to conversations and transcribes them. On the surface, it would seem simple: just redact any words that sound like PII, right? Well, it’s actually more complicated than that.

Consider the number two (2), which often appears in credit card numbers, phone numbers, and similar sequences. “Two” is also a homophone of “to” and “too”.

So if a customer said, “I would like to buy that too,” you can see that a strict redaction model would go scorched earth and redact any variation of the word, resulting in a transcription of:

“I would like [x] buy that [x].”

That’s not a very helpful — or smart — transcription!
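To make the problem concrete, here is a minimal sketch of context-free, word-list redaction. The word list, the `[x]` mask handling, and the function name `naive_redact` are illustrative assumptions, not Balto’s actual implementation.

```python
import re

# Hypothetical word list: number words plus their common homophones.
NUMBER_LIKE = {"one", "won", "two", "to", "too", "four", "for", "eight", "ate"}


def naive_redact(text: str, mask: str = "[x]") -> str:
    """Mask every word found in the word list, ignoring context entirely."""

    def replace_word(match: re.Match) -> str:
        word = match.group(0)
        return mask if word.lower() in NUMBER_LIKE else word

    # Match runs of letters (and apostrophes) so punctuation is left untouched.
    return re.sub(r"[A-Za-z']+", replace_word, text)


print(naive_redact("I would like to buy that too."))
# I would like [x] buy that [x].
```

Because the function only looks at each word in isolation, it cannot tell a preposition from a card digit. That is the gap a context-aware model is meant to close.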

But redacting this sensitive information is a delicate balance. Redacting too much will hamper readability of the corresponding transcripts, and redacting too little could leave sensitive personal information exposed. In the past, Balto has erred on the side of extreme caution and redacted words if there was even a remote possibility they could be sensitive information.

However, we’re very excited to announce that we’ve trained a highly sensitive neural network that redacts critical information with a high degree of accuracy, without overcorrecting and masking additional, non-critical information.

Redaction Gets An Upgrade

Balto’s next-gen selective redaction approach was developed by leveraging the more than 130 million calls Balto has helped guide, as well as Balto’s internal data labeling platform.

Finding data to train an AI model to redact sensitive information presents a sort of “chicken-and-egg” dilemma: the model needs to see realistic examples to be able to identify them in the future, but the whole purpose of PCI/PII redaction is to avoid capturing this information in the first place.

Balto solved this dilemma by first applying a layer of blanket redaction to real transcripts, using comprehensive word lists to mask terms commonly associated with numbers, addresses and names. These heavily redacted transcripts were then fed into our AI Hub labeling platform, where human labelers were asked to analyze the context and categorize these masked terms according to more than a dozen pre-determined labels associated with sensitive and non-sensitive information.

Once we had identified every word, we were able to replace the masked tokens with synthetic examples of the respective categories, thus producing realistic training data for our deep learning model.
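The fill-in step can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the label names, the synthetic value pools, and the helper `fill_masks` are hypothetical, and the real label set (more than a dozen categories) is not described in this post.

```python
import random

MASK = "[x]"

# Hypothetical labels and synthetic value pools (assumptions for illustration).
SYNTHETIC_VALUES = {
    "card_digit": ["zero", "one", "two", "three", "four",
                   "five", "six", "seven", "eight", "nine"],
    "first_name": ["Alex", "Jordan", "Sam"],
    "preposition": ["to"],
    "adverb": ["too"],
}
SENSITIVE_LABELS = {"card_digit", "first_name"}


def fill_masks(tokens, labels, rng):
    """Replace each masked token with a synthetic value for its human-assigned label.

    Returns a list of (token, is_sensitive) pairs usable as training targets.
    """
    if tokens.count(MASK) != len(labels):
        raise ValueError("need exactly one label per masked token")
    remaining = iter(labels)
    filled = []
    for token in tokens:
        if token == MASK:
            label = next(remaining)
            value = rng.choice(SYNTHETIC_VALUES[label])
            filled.append((value, label in SENSITIVE_LABELS))
        else:
            # Unmasked tokens were never flagged by the blanket redaction.
            filled.append((token, False))
    return filled


rng = random.Random(0)  # seeded so the sketch is repeatable
tokens = "my card starts with [x] [x] and I want [x] pay now".split()
labels = ["card_digit", "card_digit", "preposition"]  # from human labelers
example = fill_masks(tokens, labels, rng)
```

The resulting pairs contain no real customer data, yet they keep the surrounding context of a real call, which is what lets a model learn that “to” after “I want” is harmless while a digit after “card starts with” is not.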

After repeating this process for tens of thousands of calls, our neural network has greatly surpassed our traditional redaction methods in both reducing false positives and capturing more sensitive information. Because the data was sampled from real calls, this approach is consistently accurate, even when there are transcription errors in the text.

Using the Power of AI for Continual Improvement and Protection

Balto’s next-gen redaction model handles the tough job of both protecting PII for consumers and providing accurate transcriptions for agents and for AI learning and improvement. What’s more, it gives supervisors even more clarity and insight into the full context of what was said on calls, and it gives businesses further assurance that sensitive data is being masked.

