8 days of Windows Phone 8 | Day 6: Speech API

Welcome again to the blog series called 8 days of Windows Phone 8, celebrating the release of Windows Phone 8 and the Windows Phone 8 SDK. Today, 5 days after the release, it’s time for the 6th post in this series: the Speech APIs. Speech on Windows Phone 8 consists of 3 key aspects: Text to Speech, Speech to Text and Voice Commands. In this post we’ll demo all 3.

  • Day 1: SDK Overview
  • Day 2: Live tiles and Lock screen
  • Day 3: Emulator & Simulation dashboard  
  • Day 4: New screen resolutions
  • Day 5: .NET 4.5 & C# 5.0
  • Day 6: Speech
  • Day 7: Proximity capabilities
  • Day 8: Wallet and In-App purchases


    Text to speech

    The first of the 3 speech APIs we’ll discuss is the Text to Speech API. In Windows Phone 7 you had to use the Bing speech API if you wanted to convert a written sentence into sound. In Windows Phone 8 this API is part of the Windows Phone API, although under the hood it will probably still use Bing servers.

    Let’s create a new project. The first thing we need to do is add the Speech Recognition capability (ID_CAP_SPEECH_RECOGNITION) on the Capabilities tab of the WMAppManifest.xml file in Visual Studio 2012.


    After that we’ll add a button to our MainPage and add the following 2 lines of code to the button’s click event handler:

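        // requires: using Windows.Phone.Speech.Synthesis;
        // async so the handler can await the speech instead of firing and forgetting it
        private async void SayHelloButton_Click(object sender, RoutedEventArgs e)
        {
            SpeechSynthesizer ss = new SpeechSynthesizer();
            await ss.SpeakTextAsync("8 days of windows phone 8, day 6");
        }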

    Run the app and indeed, you only need 2 lines of code to get text to speech working. Great! Note that the SpeakTextAsync method of the SpeechSynthesizer is asynchronous: it returns right away, and you can await it to know when the speech has finished. First API checked, on to the next one, you would think... well, no, there is more! You can also change the language and the voice. Let’s add some code to switch to a German male voice:

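        // requires: using System.Linq; using Windows.Phone.Speech.Synthesis;
        private async void SayGermanButton_Click(object sender, RoutedEventArgs e)
        {
            SpeechSynthesizer ss = new SpeechSynthesizer();
            VoiceInformation vi = InstalledVoices.All.Where(v => v.Language == "de-DE" && v.Gender == VoiceGender.Male).FirstOrDefault();
            // FirstOrDefault returns null if no German male voice is installed; keep the default voice then
            if (vi != null) ss.SetVoice(vi);
            await ss.SpeakTextAsync("8 Tage von Windows Phone 8, Tag 6");
        }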

    It takes only 2 extra lines to select the language and gender (yes, it could be 1, I know ;) ) and the rest stays the same. In the sample project I’ve added an extra button to show both scenarios. Run the project, press the button and you’ll hear a German voice. Now it’s time to go to the next topic, speech to text.
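    One caveat on the German example: the LINQ query returns null when no German male voice is installed on the device, so it is worth deciding what to fall back to. Below is a minimal sketch of such a fallback chain (exact match, then any voice in the language, then any voice). Voice, Gender and VoicePicker are hypothetical stand-ins for VoiceInformation and VoiceGender so the selection logic runs in a plain .NET console app; on the phone you would run the same query against InstalledVoices.All.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for VoiceGender and VoiceInformation (not the SDK types).
public enum Gender { Male, Female }

public sealed record Voice(string DisplayName, string Language, Gender Gender);

public static class VoicePicker
{
    // Prefer an exact language + gender match, then any voice in that language,
    // then the first installed voice. Returns null only if no voices exist at all.
    public static Voice? Pick(IEnumerable<Voice> installed, string language, Gender gender)
    {
        var voices = installed.ToList();
        return voices.FirstOrDefault(v => v.Language == language && v.Gender == gender)
            ?? voices.FirstOrDefault(v => v.Language == language)
            ?? voices.FirstOrDefault();
    }
}
```

    With only a female German voice installed, asking for a German male voice returns that female German voice instead of null.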

    Speech to text

    The text to speech we discussed in the previous section is pretty cool, but converting speech to text is even more amazing and about as easy to implement. Before we start, we have to add the microphone capability (ID_CAP_MICROPHONE) so we can capture sound from the microphone. Just check the box on the Capabilities tab of the WMAppManifest.xml file.

    Now that the capability is added, we can start adding the speech recognition code. First we add a button to MainPage that asks how you are doing:

        <Button x:Name="AskStatusButton" Content="how are you doing?" Click="AskStatusButton_Click"/>
        <TextBlock x:Name="StatusText"/>
        <TextBlock x:Name="ConfidenceText"/>

    In the code-behind we handle the button’s Click event and write the result of the speech recognition to the StatusText TextBlock. We also add an extra TextBlock, ConfidenceText, to display the confidence level of the recognition.

    In the click handler we start by creating a SpeechRecognizerUI object and setting some properties on its Settings. The ListenText property is the text shown as the title of the speech dialog. ExampleText lets you show a sample answer the user could give. Setting ReadoutEnabled to true makes the phone read the recognized text back to you. The last property, ShowConfirmation, shows the recognized text on the screen.

        // requires: using Windows.Phone.Speech.Recognition;
        private async void AskStatusButton_Click(object sender, RoutedEventArgs e)
        {
            SpeechRecognizerUI sr = new SpeechRecognizerUI();
            sr.Settings.ListenText = "How are you doing?";   // title of the speech dialog
            sr.Settings.ExampleText = "I'm doing fine";      // sample answer shown to the user
            sr.Settings.ReadoutEnabled = true;               // read the recognized text back
            sr.Settings.ShowConfirmation = true;             // show the recognized text on screen

            SpeechRecognitionUIResult result = await sr.RecognizeWithUIAsync();
            if (result.ResultStatus == SpeechRecognitionUIStatus.Succeeded)
            {
                StatusText.Text = result.RecognitionResult.Text;
                ConfidenceText.Text = result.RecognitionResult.TextConfidence.ToString();
            }
        }

    Once all these properties are set, we call the RecognizeWithUIAsync method to start the voice recognition. Note that the first time this is used, the user has to accept that the voice clips will be sent to Microsoft’s servers for processing. Once they accept, the speech recognition starts.

    When the speech recognition finishes you can check the result in code. The RecognitionResult returned by RecognizeWithUIAsync has a Text property and a TextConfidence property that tells you how confident the recognizer is, so you can decide whether the result is usable. In the sample project we write these values to the 2 TextBlocks we’ve added to MainPage.
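    A minimal sketch of turning the confidence level into an accept/reject decision is below. Confidence is a hypothetical local enum modeled on the names of SpeechRecognitionConfidence (High, Medium, Low, Rejected); it is ordered low-to-high so comparisons work, and the SDK enum should not be assumed to use the same numeric values, which is why FromName converts by name (for example from TextConfidence.ToString()). The Medium threshold is an assumption, not an SDK default.

```csharp
using System;

// Hypothetical local enum modeled on SpeechRecognitionConfidence's value names.
// Ordered low-to-high here so >= comparisons are meaningful.
public enum Confidence { Rejected = 0, Low = 1, Medium = 2, High = 3 }

public static class ConfidenceCheck
{
    // Hypothetical policy: accept a result only at or above a minimum level,
    // and never accept a rejected result.
    public static bool IsUsable(Confidence confidence, Confidence minimum = Confidence.Medium)
        => confidence != Confidence.Rejected && confidence >= minimum;

    // Convert a confidence name such as "Medium" into the local enum.
    // Unknown names and numeric strings outside the enum are treated as rejected.
    public static Confidence FromName(string name)
        => Enum.TryParse(name, true, out Confidence c) && Enum.IsDefined(typeof(Confidence), c)
            ? c
            : Confidence.Rejected;
}
```

    On the phone this could be used as ConfidenceCheck.IsUsable(ConfidenceCheck.FromName(result.RecognitionResult.TextConfidence.ToString())) before trusting the recognized text.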

    In some use cases you want the user to choose from a set of options instead of speaking free text. This is really easy to implement with the SpeechRecognizerUI class. We’ll add another button and 2 more TextBlocks to MainPage to ask the user which day of the week it is today.

        <Button x:Name="AskDayButton" Content="which day is it?" Click="AskDayButton_Click"/>
        <TextBlock x:Name="DayText"/>
        <TextBlock x:Name="DayConfidenceText"/>

    Then we add a click handler for the new button. The code is almost the same as before, with one addition:

        // requires: using Windows.Phone.Speech.Recognition;
        private async void AskDayButton_Click(object sender, RoutedEventArgs e)
        {
            SpeechRecognizerUI sr = new SpeechRecognizerUI();
            sr.Settings.ListenText = "Which day is it today?";
            sr.Settings.ExampleText = "Friday";
            sr.Settings.ReadoutEnabled = true;
            sr.Settings.ShowConfirmation = true;

            // Restrict recognition to the day names instead of free text
            sr.Recognizer.Grammars.AddGrammarFromList("answer", new string[] { "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday" });

            SpeechRecognitionUIResult result = await sr.RecognizeWithUIAsync();
            if (result.ResultStatus == SpeechRecognitionUIStatus.Succeeded)
            {
                DayText.Text = result.RecognitionResult.Text;
                DayConfidenceText.Text = result.RecognitionResult.TextConfidence.ToString();
            }
        }

    The only extra line adds a grammar built from a list of allowed answers, so the speech recognizer will only match the spoken answer to one of these entries. In our example I’ve added the names of the days of the week.

    When you run the app, it will only accept the names of the days of the week.
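    The recognizer hands back the matched grammar entry as plain text, so checking whether the answer is actually right is up to the app. A minimal sketch, assuming you compare the answer with the current date: the hypothetical DayAnswer class maps the text onto System.DayOfWeek by name (the enum names are English, matching the en-US grammar) and takes the date as a parameter so the check is testable.

```csharp
using System;

public static class DayAnswer
{
    // Map recognized text such as "Friday" onto DayOfWeek by comparing names.
    // Name comparison avoids Enum.TryParse quirks such as accepting "3" or "Monday, Friday".
    public static bool TryParseDay(string recognized, out DayOfWeek day)
    {
        foreach (DayOfWeek candidate in Enum.GetValues(typeof(DayOfWeek)))
        {
            if (string.Equals(candidate.ToString(), recognized?.Trim(), StringComparison.OrdinalIgnoreCase))
            {
                day = candidate;
                return true;
            }
        }
        day = default;
        return false;
    }

    // True if the spoken answer names the same weekday as 'today'.
    public static bool IsCorrect(string recognized, DateTime today)
        => TryParseDay(recognized, out var day) && day == today.DayOfWeek;
}
```

    In the click handler this could be called as DayAnswer.IsCorrect(result.RecognitionResult.Text, DateTime.Today).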

    Voice commands

    The last subject of today is voice commands. With voice commands you can start your app with a specific task, or execute a task when you are already in your app. Voice commands always consist of 3 parts: the app name, so the operating system knows which app to send the command to; the command name; and a phrase, which is a sort of parameter you can add to your command.

    To enable voice commands you’ll have to add an XML file, the Voice Command Definition (VCD) file, which contains the voice commands. To add it, right-click your project in Visual Studio, select Add New Item and choose Voice Command Definition from the list.

    By default you’ll get a prefilled VCD file with some example commands and phrases. We’ll change it to add a command that starts the app by asking what day it is today, tomorrow or yesterday.

    The first thing you’ll have to add to a VCD file is the CommandPrefix: this is the word the operating system uses to find your app. After that we add a Command. A command needs a ListenFor element that tells the operating system which text to listen for. You can also add a phrase to this command, as I did with {day}. This phrase and its options must also be defined in the VCD file, in a PhraseList whose Label matches the placeholder.

        <?xml version="1.0" encoding="utf-8"?>

        <VoiceCommands xmlns="http://schemas.microsoft.com/voicecommands/1.0">
          <CommandSet xml:lang="en-US">
            <CommandPrefix>8 days</CommandPrefix>
            <Example>what day is it today?</Example>
            <Command Name="DayToday">
              <Example>What day is it?</Example>
              <ListenFor>What day is it {day}</ListenFor>
              <Feedback>Checking the day</Feedback>
              <Navigate Target="/MainPage.xaml"/>
            </Command>

            <PhraseList Label="day">
              <Item>today</Item>
              <Item>tomorrow</Item>
              <Item>yesterday</Item>
            </PhraseList>
          </CommandSet>
        </VoiceCommands>
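    To make the relationship between the {day} placeholder and the PhraseList concrete, here is a simplified model that enumerates the full sentences a user can say for one ListenFor element. It is an illustration only, not how the operating system matches speech: the hypothetical VcdModel.Expand handles a single {label} placeholder, ignores any other VCD syntax, and throws if the label has no matching phrase list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VcdModel
{
    // Expands "What day is it {day}" with the items of the phrase list labeled "day",
    // prefixed by the CommandPrefix. Simplified: one placeholder only.
    public static IEnumerable<string> Expand(
        string prefix, string listenFor, IReadOnlyDictionary<string, string[]> phraseLists)
    {
        int open = listenFor.IndexOf('{');
        int close = listenFor.IndexOf('}', open + 1);
        if (open < 0 || close < 0)
            return new[] { $"{prefix} {listenFor}" }; // no placeholder: a single fixed phrase

        string label = listenFor.Substring(open + 1, close - open - 1);
        // phraseLists[label] throws KeyNotFoundException if the label is undefined
        return phraseLists[label].Select(item =>
            $"{prefix} {listenFor.Substring(0, open)}{item.Trim()}{listenFor.Substring(close + 1)}");
    }
}
```

    For the VCD file above this yields “8 days What day is it today”, “8 days What day is it tomorrow” and “8 days What day is it yesterday”.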


    Once the VCD file is added, we register it with the operating system. You only have to do this once, when your app runs for the first time, but for now I’ll just call it from the constructor of our MainPage.

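        // requires: using System; using System.Threading.Tasks; using Windows.Phone.Speech.VoiceCommands;
        public MainPage()
        {
            InitializeComponent();

            // Constructors cannot await, so the returned Task is started but not awaited here
            InitializeVoiceCommands();
        }

        private async Task InitializeVoiceCommands()
        {
            // Registers the command sets defined in the VCD file with the operating system
            await VoiceCommandService.InstallCommandSetsFromFileAsync(new Uri("ms-appx:///VoiceCommandDefinition1.xml"));
        }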
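    Since registration is only needed on the first run, one option is to remember in the app settings that it has been done. A minimal sketch under that assumption: EnsureInstalledAsync and the "VoiceCommandsInstalled" key are hypothetical names, the settings store is any IDictionary&lt;string, object&gt;, and the install step is passed in as a delegate so the logic runs outside the phone SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class VoiceCommandSetup
{
    // Hypothetical settings key; any stable string works.
    const string InstalledKey = "VoiceCommandsInstalled";

    // Runs 'install' only if the flag is not stored yet. Returns true if it ran.
    public static async Task<bool> EnsureInstalledAsync(
        IDictionary<string, object> settings, Func<Task> install)
    {
        if (settings.TryGetValue(InstalledKey, out var done) && done is true)
            return false; // already registered on an earlier run

        await install();
        settings[InstalledKey] = true; // only recorded after the install succeeded
        return true;
    }
}
```

    On the phone, settings could be IsolatedStorageSettings.ApplicationSettings (which implements IDictionary&lt;string, object&gt;) and install a lambda calling VoiceCommandService.InstallCommandSetsFromFileAsync; if the install throws, the flag is not stored and the next run tries again.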

    That’s all we need to do to start our app with our voice. You can trigger voice recognition by holding the Windows button for a few seconds. You’ll now see the small “Listening” popup. If you say “What can I say?” or tap the question mark, you’ll go to a page that explains which commands are available. The first page describes the commands built into the system. If you swipe to the right you’ll get an overview of all apps that support voice commands, and our app is listed there. When you tap the app you’ll get a list of the commands available for it.

    When we now hold the Windows button and say “8 days, What day is it tomorrow?”, our app starts and opens MainPage.xaml.


    The voice command is passed to MainPage.xaml in the query string. By overriding the OnNavigatedTo method we can check which voice command was used and also which phrase was used.

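        // requires: using System.Windows.Navigation;
        protected override void OnNavigatedTo(NavigationEventArgs e)
        {
            base.OnNavigatedTo(e);

            if (e.NavigationMode == NavigationMode.New)
            {
                if (NavigationContext.QueryString.ContainsKey("voiceCommandName"))
                {
                    // Name of the Command element in the VCD file, e.g. "DayToday"
                    string command = NavigationContext.QueryString["voiceCommandName"];

                    // Value of the {day} phrase, e.g. "tomorrow" (null if absent)
                    string day;
                    NavigationContext.QueryString.TryGetValue("day", out day);
                }
            }
        }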

    The voice command and the phrase are both added as separate URL parameters, so if we run the app again, trigger it by asking what day it is tomorrow and inspect NavigationContext.QueryString in the debugger, we can see both values.

    As you can see, there is a parameter “day” which you can use, for example, in a switch statement to call the appropriate method.
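    A minimal sketch of that switch, assuming the goal is to turn the phrase into an actual date: the hypothetical DayCommand.ResolveDay reads the "day" entry from a query-string dictionary and returns null for a missing or unknown phrase, so the page can fall back to its normal startup. The current date is passed in to keep the logic testable.

```csharp
using System;
using System.Collections.Generic;

public static class DayCommand
{
    // Maps the "day" phrase (today/tomorrow/yesterday) to a date relative to 'today'.
    public static DateTime? ResolveDay(IDictionary<string, string> query, DateTime today)
    {
        if (!query.TryGetValue("day", out var phrase) || phrase == null)
            return null;

        // Trim and lower-case in case the value arrives with extra whitespace or capitals
        switch (phrase.Trim().ToLowerInvariant())
        {
            case "today": return today.Date;
            case "tomorrow": return today.Date.AddDays(1);
            case "yesterday": return today.Date.AddDays(-1);
            default: return null; // unknown phrase: normal startup
        }
    }
}
```

    On the phone, NavigationContext.QueryString is an IDictionary&lt;string, string&gt;, so it could be passed straight in from OnNavigatedTo together with DateTime.Today.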

    You can download the sample project here: http://sdrv.ms/QSJ2yx

    That’s all for today on the Speech APIs. Hopefully you’ll be back tomorrow to read my post on the Proximity APIs.

    Geert van der Cruijsen
