Introducing AUML: Create Thousands Of Alexa Utterances With Just A Few Lines Of Python

July 27, 2020

David Moore

How do you make sure you've covered every permutation and combination of words and phrases when creating Alexa utterances? Enter AUML, the Alexa Utterance Markup Language. (Jack Mitchell/WBUR)

The Amazon GUI for creating Alexa intents and utterances is incredibly useful. It allows non-coders to create intents, utterances and slots, while coders can focus on creating the APIs that respond to these intents.

At the same time, that GUI can be very limiting.

Suppose you have an intent for hearing the weather. Some possible utterances include "play the weather" and "tell me the weather." But then you start brainstorming all the synonyms for "play," and then all the synonyms for "weather." Then you realize that there are optional words in these utterances; "play weather" is a little curt, but you still want it to map to the hearWeather intent. Seemingly abrupt phrases like that could also be the result of Alexa simply not hearing words, or utterances from non-native English speakers (there are no articles — a, an, the — in most Slavic languages, for example).

So you start creating all the various utterance permutations and combinations. But then, you realize, there are even more synonyms. How do you add those in a systematic fashion? How do you make sure you've covered every permutation and combination of words and phrases?

Enter AUML, the Alexa Utterance Markup Language. Well, it's a little pretentious to call it a language. It's much more like a simplified regular expression syntax that, with the help of a tiny Python parser, allows you to turn a few simple regexy sentences into hundreds, if not thousands, of utterances. (Any developer who's looked at the Amazon intent GUI has undoubtedly thought: "Why doesn't it accept regex!?")

AUML also allows you to keep, store and maintain all your intents and utterances outside of the Alexa GUI. Finally, it's a workaround for a particularly nettlesome aspect of utterance creation: the fact that an utterance can accept only a single slot.

So, let's take a look! The syntax (AUML) can include the following:

Simple, literal words
Optional words, if followed by a ?
A list of OR'ed alternatives (single words or whole phrases), enclosed in () and separated by |
Alexa slots, passed through as-is and enclosed, of course, in {}
Variables, beginning with $ (see below)
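To make the rules above concrete, here is a minimal sketch of how a parser for the non-variable part of the syntax could work. This is not the code from the open-sourced repo. The names `TOKEN_RE`, `alternatives` and `expand` are hypothetical, and it assumes OR-groups are never nested. Each token becomes a list of choices, and `itertools.product` walks every combination.

```python
import itertools
import re

# Hypothetical tokenizer: an OR-group "(...)" or a slot "{...}", either one
# optionally followed by "?", or else any other run of non-space characters.
TOKEN_RE = re.compile(r"\([^)]*\)\??|\{[^}]*\}\??|\S+")


def alternatives(token: str) -> list[str]:
    """Return every string a single AUML token can stand for."""
    optional = token.endswith("?")
    if optional:
        token = token[:-1]
    if token.startswith("(") and token.endswith(")"):
        choices = token[1:-1].split("|")
    else:
        # Literal words and {slots} pass through unchanged.
        choices = [token]
    if optional:
        choices = choices + [""]  # "" means the token is left out
    return choices


def expand(pattern: str) -> list[str]:
    """Expand an AUML pattern (without $variables) into utterances."""
    options = [alternatives(tok) for tok in TOKEN_RE.findall(pattern)]
    utterances = []
    for combo in itertools.product(*options):
        # Join the chosen pieces, dropping the empty (omitted) ones.
        text = " ".join(part for part in combo if part)
        if text not in utterances:
            utterances.append(text)
    return utterances


print(expand("play weather"))                 # ['play weather']
print(expand("play {weather}"))               # ['play {weather}']
print(expand("(play|tell me) the? weather"))
# ['play the weather', 'play weather', 'tell me the weather', 'tell me weather']
```

Dropping empty pieces before joining is what keeps an omitted optional word from leaving a double space behind.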

For example:

play weather

will be parsed into just one utterance: "play weather"

and

play {weather}

will be parsed into a simple, single utterance, but with a slot: "play {weather}"

But it quickly gets much more interesting. First we define some variables:

variables = { 'play': "(play|play me|I want to listen to|I want to hear|I'd like to hear)", 'newscast': '(news|newscast|news brief|news briefing)' }
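Variable interpolation can be as simple as a regular-expression substitution that swaps each `$name` for its definition before any other parsing happens. The sketch below assumes that approach; `interpolate` is a hypothetical helper, not necessarily how the repo does it.

```python
import re

variables = {
    'play': "(play|play me|I want to listen to|I want to hear|I'd like to hear)",
    'newscast': '(news|newscast|news brief|news briefing)',
}


def interpolate(pattern: str, variables: dict[str, str]) -> str:
    """Replace each $name in an AUML pattern with its definition."""
    # \w+ stops at the first non-word character, so "$play?" keeps its "?".
    # An undefined $name raises KeyError, which surfaces typos early.
    return re.sub(r"\$(\w+)", lambda m: variables[m.group(1)], pattern)


print(interpolate("$play $newscast", variables))
```

Because the substitution is purely textual, the result is an ordinary AUML pattern made of OR-groups, which the rest of the parser can then expand.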

And so now, something as simple as

$play $newscast

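actually becomes (with variable interpolation)

(play|play me|I want to listen to|I want to hear|I'd like to hear) (news|newscast|news brief|news briefing)

and we will have 20 unique utterances (5 ways to say "play" times 4 ways to say "newscast"). And if we had an optional word, like so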

$play the? $newscast

we will get 40 unique utterances (5 × 2 × 4), because the optional "the" doubles every combination.
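The counts of 20 and 40 are easy to check mechanically. The self-contained sketch below folds variable interpolation and expansion into one hypothetical `expand` function (again, an illustration rather than the repo's actual parser, assuming non-nested OR-groups) and counts the results.

```python
import itertools
import re

variables = {
    'play': "(play|play me|I want to listen to|I want to hear|I'd like to hear)",
    'newscast': '(news|newscast|news brief|news briefing)',
}

# OR-group or slot (each optionally followed by "?"), or any other word.
TOKEN_RE = re.compile(r"\([^)]*\)\??|\{[^}]*\}\??|\S+")


def expand(pattern: str) -> list[str]:
    """Interpolate $variables, then expand the AUML pattern into utterances."""
    pattern = re.sub(r"\$(\w+)", lambda m: variables[m.group(1)], pattern)
    options = []
    for tok in TOKEN_RE.findall(pattern):
        optional = tok.endswith("?")
        body = tok[:-1] if optional else tok
        # "(a|b)" becomes ["a", "b"]; words and {slots} stay as they are.
        choices = body[1:-1].split("|") if body.startswith("(") else [body]
        options.append(choices + [""] if optional else choices)
    utterances = []
    for combo in itertools.product(*options):
        text = " ".join(p for p in combo if p)  # skip omitted optional words
        if text not in utterances:
            utterances.append(text)
    return utterances


print(len(expand("$play $newscast")))       # 20
print(len(expand("$play the? $newscast")))  # 40
```

The deduplication step matters in general: two different paths through the pattern can produce the same sentence, and the Alexa console expects each sample utterance only once.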

Finally, we can combine AUML variables and Alexa slots to do an end-run around the "one slot per utterance" rule: the variable expands into ordinary literal text, so each generated utterance still contains only a single slot.

$play the latest episode of {show}
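Here is a sketch of what that pattern yields and how the result might be stored outside the GUI. To stay self-contained, it inlines the five `$play` phrases by hand instead of calling a parser. It then builds a dict shaped like one entry in the `intents` array of an Alexa interaction model (`name`, `slots`, `samples`), which is our reading of the Alexa Skills Kit JSON format and worth checking against the current docs. The intent name `PlayLatestEpisodeIntent` and slot type `SHOW_NAME` are hypothetical.

```python
import json
import re

# The alternatives behind $play, inlined for this sketch.
PLAY = ["play", "play me", "I want to listen to", "I want to hear", "I'd like to hear"]

# "$play the latest episode of {show}" expanded: the verb phrase varies as
# literal text, so every sample carries exactly one slot, {show}.
samples = [f"{verb} the latest episode of {{show}}" for verb in PLAY]

# Slot names referenced by the samples, in first-seen order.
slot_names = list(dict.fromkeys(re.findall(r"\{(\w+)\}", " ".join(samples))))

# Hypothetical intent entry; intent name and slot type are made up.
intent = {
    "name": "PlayLatestEpisodeIntent",
    "slots": [{"name": name, "type": "SHOW_NAME"} for name in slot_names],
    "samples": samples,
}

print(json.dumps(intent, indent=2))
```

Keeping the AUML source under version control and generating JSON like this from it is one way to maintain intents and utterances outside the Alexa GUI.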

Like the idea? We've open-sourced the code. Clone the repo, try it out and tell us how we could improve it.