In my last article, I introduced the hardware and the differences between the Alexa Voice Service (speech recognition and language understanding) and the Alexa Skills Kit (the new and interesting abilities you can create for the Alexa ecosystem). I also covered four current skill categories. Today let's cover some key terms you'll want to understand before building your first skill. Then I'll share a list of resources that helped me build Alexa skills.
Amazon Web Services (AWS) — Amazon's cloud platform, comparable to Microsoft Azure. Home of the Alexa services and…
AWS Lambda — Amazon's version of Azure Functions. This service runs code you write in Node.js, Java, Python, or C# (.NET Core) without your needing to provision virtual machines. Your code runs in response to an event; in this case, the Alexa service makes a request to your Lambda program, and the response is the output of your Lambda code (also known as a Lambda function). You can set parameters for how much the service will scale up or out for your function.
As I mentioned in the previous article, Lambda didn't support .NET Core until recently. I'm also not a fan of the term "Lambda function," since that phrase also refers to the shorthand anonymous functions used in LINQ expressions. That said, a small program that responds to events is exactly the right tool for the job when it comes to implementing an Alexa skill. Most skills require little horsepower, and anything more than a Lambda or Azure Function would be overkill.
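To make the event-and-response idea concrete, here's a minimal sketch of a skill handler for Lambda's Node.js runtime. The intent name (AdditionIntent) and slot names (x and y) are hypothetical, borrowed from the calculator demo we'll discuss below, and a production handler would also check the request type (launch vs. intent) rather than assuming an intent is present:

```javascript
// Minimal sketch of an Alexa skill handler for AWS Lambda (Node.js runtime).
// The intent name (AdditionIntent) and slot names (x, y) are hypothetical;
// a real handler would also inspect event.request.type to distinguish
// LaunchRequest from IntentRequest.
const handler = async (event) => {
  const intent = event.request.intent;
  let speech;
  if (intent.name === 'AdditionIntent') {
    // Slot values arrive as strings, so convert before doing math.
    const x = Number(intent.slots.x.value);
    const y = Number(intent.slots.y.value);
    speech = `${x} plus ${y} is ${x + y}`;
  } else {
    speech = "Sorry, I didn't understand that request.";
  }
  // The response shape follows the Alexa Skills Kit JSON response format.
  return {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text: speech },
      shouldEndSession: true,
    },
  };
};

exports.handler = handler;
```

Notice how small this is: the Alexa service does all the heavy lifting of speech recognition and hands your code a structured event, which is why Lambda-sized compute is plenty.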
Interaction — a "conversation" between the user and Alexa. This can be as simple as a request from the user and a reply from Alexa, or a back-and-forth exchange lasting several minutes.
Interaction model — the words and phrases users can say to a specific skill. Interaction models are defined in JSON and are composed of intents, utterances, and slots.
Intent — the main idea behind the user's request. In my first demo skill, I wanted a verbal calculator that supported the four primary operations: add, subtract, multiply, and divide. Each request I made carried one of these four intents, and I wanted the skill to act on it. When defining intents, I use self-explanatory names like AdditionIntent, SubtractionIntent, and so on.
Utterance — a phrase a user can speak to trigger a specific intent. In my calculator example, some of the utterances for my AdditionIntent were: "Add <x> and <y>", "Please add <x> and <y>", "What is <x> plus <y>?".
Slot — in our utterances, we often need variables to produce interesting results. Imagine if I had to define every possible utterance for addition; that would take forever. Slots allow us to define variables within an utterance. You have to define a data type for each slot, and those types don't always line up with what you'd expect.
In C#, you would expect types like string, int, or DateTime. For utterances, Amazon has defined a number of built-in slot types, such as AMAZON.DATE, AMAZON.TIME, and AMAZON.NUMBER, and you can also define your own custom slot types. These can be tricky at first, so I suggest looking through the existing slot types before trying to define your own.
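Putting intents, utterances, and slots together, here's a sketch of what the calculator's interaction model might look like in the combined JSON schema the Alexa developer console uses. The invocation name is a made-up example, and the curly braces mark where slot values get plugged into each sample utterance:

```json
{
  "interactionModel": {
    "languageModel": {
      "invocationName": "my calculator",
      "intents": [
        {
          "name": "AdditionIntent",
          "slots": [
            { "name": "x", "type": "AMAZON.NUMBER" },
            { "name": "y", "type": "AMAZON.NUMBER" }
          ],
          "samples": [
            "add {x} and {y}",
            "please add {x} and {y}",
            "what is {x} plus {y}"
          ]
        }
      ]
    }
  }
}
```

When a user says "what is two plus three," the Alexa service matches the utterance, fills the x and y slots with 2 and 3, and sends your Lambda function a request naming AdditionIntent.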
Voice User Interface (VUI) — using your voice to interact with a computer, rather than a monitor, keyboard, and mouse. What started as a dream in Star Trek: The Next Generation is closer than ever. We still have a way to go in improving the accuracy of speech-to-text transcription, and plenty of work remains in building more natural interaction models, but we're closer than we've ever been!
Using C# in Lambda to write your custom skill code? Start here. There are also some GitHub repositories that helped me:
With terminology and these references in hand, we'll start talking through the infrastructure you'll interact with and build to bring your skill to life. In the meantime, if you have any questions, send them in. I'm here to help!