View source for Google Voice Recognition

Convert an Audio-File into text file via voice recognition. 

{{Note|The shell script mentioned below can be used on any Linux-Operating System with some software requirements, because the speech recognition is not performed on the local machine.}}

Because the performance of your Freerunner is too poor for voice recognition, the Google Voice API can be used to convert an [http://wiki.openmoko.org/wiki/Applications#Audio recorded Audio file] into a text file. Be aware that the audio file will be transmitted to Google and the recognition is not performed on FR. This implies, that you need to have Internet access on your freerunner FR to submit the audio file. 

{{Note|You must be aware of the fact, that the follow script is running on your freerunner but it is not a standalone voice recognition software and so you might not want to use this tool for private audio files.}}

==Google Voice API==
For using the Google Voice API and the script you need to have the following package installed on your freerunner:
* SoX [http://sox.sourceforge.net http://sox.sourceforge.net]for converting WAV into FLAC files
* WGET [http://www.gnu.org/s/wget/ http://www.gnu.org/s/wget/] for submitting the FLAC file to the Google Voice API
* SED [http://sed.sourceforge.net/ http://sed.sourceforge.net/] for extracting the recognized text in the returned string of the Google Voice API
Install the packages from the repositories of the freerunner [[Distributions]].

==Script Usage==
* The script <tt>googlevoice.sh</tt> uses a the audio file <tt>message.wav</tt> in the directory of the script. All files are stored in the same directory, so you need write permissions for that directory.
* SoX converts <tt>message.wav</tt>  into <tt>message.flac</tt> 
* wget submits the file <tt>message.flac</tt> to the Google Voice API and writes the return message to <tt>message.ret</tt>. The language variable in the script is set to German by  <tt>lang=de-de</tt>. If you want to submit a recorded file in US-English use <tt>lang=en-us</tt> instead.
* SED extracts the recognized text <tt>message.ret</tt> by regular expressions and writes the text into <tt>message.txt</tt>.
* Temporary files <tt>message.flac</tt> and <tt>message.ret</tt> will be deleted after the process.

{{Note|The return code of German audio files needs capitalization of nouns, because all words are return in small caps. A <tt>ispell</tt> or <tt>aspell</tt> correction of the <tt>message.txt</tt> might improve the recognized text.}}

==Basic Script Code==
The script code <tt>googlevoice.sh</tt> can be tested on any Linux machine with SoX, SED, WGET installed. Modifiy the script according to your demands and storage of your audio files

  #!/bin/sh
  echo "1 SoX Sound Exchange - Convert WAV to FLAC with 16000" 
  sox message.wav message.flac rate 16k
  echo "2 Submit to Google Voice Recognition"
  wget -q -U "Mozilla/5.0" --post-file message.flac --header="Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=de-de&client=chromium" > message.ret
  echo "3 SED Extract recognized text" 
  cat message.ret | sed 's/.*utterance":"//' | sed 's/","confidence.*//' > message.txt
  echo "4 Remove Temporary Files"
  rm message.flac
  rm message.ret
  echo "5 Show Text "
  cat message.txt

The parameter <tt>lang=de-de</tt> is indicating, that the Google Voice API is expecting a German language audio file. Replace <tt>lang=de-de</tt> by <tt>lang=en-us</tt> to submit an audio file in US-English.

== Script with Language Setting and Command Line Parameter ==
The script <tt>googlevoicepar.sh</tt> with a command line parameter can be used if you want to use multiple input files for batch file recognition. You will call this script with the basename e.g. <tt>message0, message1,...</tt> by 
* <tt>googlevoicepar.sh message0</tt> converts <tt>message0.wav</tt> into <tt>message0.txt</tt>
* <tt>googlevoicepar.sh message1</tt> converts <tt>message1.wav</tt> into <tt>message1.txt</tt>
* <tt>googlevoicepar.sh message2</tt> converts <tt>message2.wav</tt> into <tt>message2.txt</tt>
* ....

  #!/bin/sh
  LANGUAGE="de-de"
  echo "1 SoX Sound Exchange - Convert $1.wav to $1.flacC with 16000" 
  sox $1.wav $1.flac rate 16k
  echo "2 Submit to Google Voice Recognition $1.flac"
  wget -q -U "Mozilla/5.0" --post-file $1.flac --header="Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=${LANGUAGE}&client=chromium" > $1.ret
  echo "3 SED Extract recognized text" 
  cat $1.ret | sed 's/.*utterance":"//' | sed 's/","confidence.*//' > $1.txt
  echo "4 Remove Temporary Files"
  rm $1.flac
  rm $1.ret
  echo "5 Show Text "
  cat $1.txt

==Future Development==
Google API will always need a internet access. For the development of [http://en.wikipedia.org/wiki/Speech_recognition_in_Linux OpenSource standalone Software on Linux] it might be good to have an OpenSource-Webinterface to collect Audio Samples for improving the user independent Speech Recognition Profiles HMM for Speech Recognition of large vocabulary and different languages. 
* Users will get the speech recognition result on the freerunner or any other linux box,
* The can correct the speech recognition result and submit the correction back to the server
* By the Audio-File-Submit and Text-File-Return of a server based speech recognition the Open Source Speech Recognition on Linux can be improved.



==Links==
* The WGET code was extended from the article on [http://www.commandlinefu.com/commands/view/8043/google-voice-recognition-api http://www.commandlinefu.com].
* With your [[Android]] phone 2.2 the Google Voice API can be used by installation with Android Market.
* with the [http://www.google.com/chrome Chrome Browser] on a Linux Box you can test the Google Voice API [http://slides.html5rocks.com/#speech-input http://slides.html5rocks.com/#speech-input] via a Web-Interface.
* For standalone speech recognition on Linux Systems see [http://en.wikipedia.org/wiki/Speech_recognition_in_Linux Wikipedia-Article].