Learning Kinect Voice Commands
Now that the "early adopter" phase of Xbox One is starting to bleed into the secondary phase of sales, lots of new users are picking up the console, and I am seeing a few people out there frustrated with the Kinect voice commands. So I thought it would be useful to put together a post on how Kinect does and does not work, to help people learn to use it.
A common response to the notion that there should be any learning involved in using voice commands is: "If I press A on a controller, it works every time, so a voice command should work every time." But let me put it like this:
Every time you press down, down-forward, forward, or A on a controller, it works perfectly, agreed? Does that mean no one has ever failed to pull off a Hadoken in a Street Fighter game? No. Despite the controller working perfectly, this is still something the user has to learn.
Luckily, it is far simpler to learn Kinect voice commands than it is to master a Hadoken, and it's well worth the effort. Just as mastering special moves in Street Fighter will make you a better player, mastering voice commands unlocks one of the best features of the Xbox One: its functionality as a voice-controlled universal remote. Before long you will be using voice commands as second nature, and the constant, irritating switching between remote controls can finally be put aside.
So what do we need to know about Kinect voice commands?
The first and most important lesson is that (apart from the Bing search feature) Kinect does not function like Siri. This is great news for someone like me with an unclear, woolly voice, because services like Siri just do not work for me. I can try to be as clear as possible and they won't have a clue what I'm saying. Sure enough, if I try to use Kinect's "Bing Search", which works in the same way as those services, I have an equally low success rate.
It is important to understand that Kinect is not like Siri, because if you approach it as such, you might think that pronouncing each word clearly, with a deliberate gap between words, is the best way to go. That is the one thing that won't work with Kinect. Bing Search aside, for its standard voice commands Kinect is not trying to understand every word you say. Rather, it is listening for set rhythms and certain key sounds, which it recognises as its various pre-set voice commands.
And those commands are pre-set, and are not flexible. I regularly see people say Kinect "failed" and then go on to list a number of commands they tried, none of which are actual voice commands. Every system command on Xbox One is a set sentence, with a set rhythm. The least important part is the actual words, and we’ll get to why later.
First let's talk about rhythm. Let's start with a simple command: "Xbox turn off".
The rhythm it is looking for is the normal, conversational way you would say that sentence: "DaDa Da Da".
Now, if you approach Kinect as though it were Siri, you might focus on clearly pronouncing every word, and as a result you might unnaturally slow the sentence down: "Xbox. Turn. Off."
Now you're saying "DaDa. Da. Da" rather than "DaDa Da Da". Kinect will recognise that you are trying to use a voice command, because "Xbox" was there, and the "Listening" icon will appear. But it won't recognise the command itself, since you put unnatural pauses between the words.
Then you might go one step further and try to be as clear and slow as you possibly can: "X. Box. Turn. Off". Now you're saying "Da. Da. Da. Da", so not only will it fail to recognise "turn off", it won't recognise "Xbox" either. To your frustration, this time it won't even indicate that it has heard you.
You might scream, "I could not have pronounced 'Xbox' more clearly than that!", and you might start to get irrational and think that making the same mistake, only louder, might work:
"X. BOX. TURN. OFF!" - Nothing
"X! BOX! TURN! OFF!!!!!" - Nothing
But it doesn’t matter how clearly you said the syllables "X", "Box", "Turn" and "Off", and it certainly doesn’t matter how loudly you said them or how angry your voice was. You said Xbox as two separate words so Kinect won’t understand.
So the rhythm is the first and most important thing to understand, and the rule of thumb is that you should just say the commands naturally and conversationally. That is all you really need to know for the most part. *
While it's not strictly necessary to know this, it might help you speak more naturally, and worry less about your pronunciation, if you understand that Kinect does not even need to hear the exact words you are saying. It is only listening for the clearest, least mistakable key sounds in each word.
You can have some fun with this.
Once you have "Xbox turn off" down pat, try saying the following in exactly the same way:
"Sexfox Burn Off"
Did it work? It does for me, every time.
So we can see that for the "Xbox turn off" command, Kinect does not need to hear or understand the words "Xbox turn off" at all.
What Kinect actually needs to hear is the "DaDa Da Da" rhythm along with the right key sounds applied to it: "EcksOx Urn Off".
You can take this further by working out the key sounds for every voice command, and end up speaking a strange kind of gibberish to your Kinect with perfect success. I probably need to get out more.
This hugely forgiving recognition is what allows someone like me, with a very unclear voice, to use Kinect successfully. It is also what allows someone with the crispest, clearest, best-pronounced voice to fail utterly when they speak to Kinect as though it were an elderly foreign relative:
You: "X… BOX… TURN… OFF!" Kinect: "I’m sorry, what did you say?"
Me: "Sexfox burn off" Kinect: "Oh, you want me to turn off? No problem, sir."
So take some time with Kinect and work out what it wants to hear. Learn the precise voice commands and say them naturally and calmly. If it doesn't "hear you", try saying the sentence with a slightly different intonation rather than saying it slower, louder, or angrier. Before long you will be using voice commands as second nature, and you will find yourself stupidly telling your friends' and family's TVs to do things without thinking about it.
* The only exception to the rule of saying things in a natural way is the "Xbox on" command.
Because only a sliver of the processor is "awake" in sleep mode, and it is only listening for "Xbox", it helps to leave a pause between "Xbox" and "On" to let the processor wake up.
So think of the "Xbox on" command as the "Xbox. On" command and you'll find it works fine. The rest of the time, just say things normally.