Home Automation At Your Lips Powered By IBM Watson Speech to Text

This is a proof of concept and demo for home automation via voice control using the IBM Watson Speech to Text service. It also uses PubNub, a real-time data streaming network, to pass messages.

Voice control is increasingly being integrated into smart devices, and adoption continues to grow. Voice control provides a more natural way of interacting with connected apps and devices, ranging from news feeds and traffic information to personal assistants in the home. These intelligent devices respond immediately to commands spoken in our own voice.

This post was originally published on the IBM Cloud Blog.

In this tutorial, we will show you how to develop a home automation app that controls devices in the home using voice commands, with the help of the IBM Watson Speech to Text service. We will use a model smart home based on an SVG image of a home’s floor plan and a few simulated light bulbs that can be switched on and off with voice commands.


Overview of Watson Speech to Text service

The Watson Speech to Text service applies machine intelligence and knowledge of grammar and language structure, making it easy to add voice recognition capabilities to your apps. The service can convert streaming audio to text in real time, or convert speech to text in a single request. It can also recognize different speakers and label the transcript accordingly. It offers a few extra features as well, such as profanity filtering, formatting, and word confidence scores.

You can speak into a microphone and get the converted text back, or convert a recorded audio file to text. It is also possible to customize the service to improve accuracy for a specific language or content using your own set of keywords. Training it with your own custom language model helps it transcribe unique accents, specific words, or uncommon dialects more accurately.

You can try a demo of this service.

Voice activated home automation system

Who wouldn’t want the luxury of controlling their home appliances from the couch? The idea of commanding all the devices around you isn’t new. Typical home automation systems have intelligent connected devices that can be controlled from a mobile app, and a multitude of home automation products are available on the market, each with its own unique set of features.

With a mobile app, however, you lose all the intuitiveness and spontaneity. You need to unlock the phone, find and open the app, and then navigate again to the appropriate screen to control a device.

A voice assisted system for controlling the appliances is much more intuitive. It is like a personal assistant listening to commands in your own voice and controlling the devices in your house.

Let’s build the talking home automation system

The system uses two major components – Watson Speech to Text API and PubNub Data Streaming Network.


In this project, the Watson Speech to Text service is accessed via HTTP REST APIs to convert speech commands to text. A lightweight server running locally uses the service credentials to generate the authentication tokens needed to access the service. A client Web page served by this server listens to the microphone on your PC and sends the speech to the Watson Speech to Text service. The service returns the converted text, which is then parsed to extract control commands for the home appliances or devices. The messages between the Web page and the devices are orchestrated via the PubNub Data Stream Network.
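To make the "service returns converted text" step concrete, here is a minimal TypeScript sketch that pulls the final transcript out of a recognition result. It assumes the JSON shape the service returns from its recognize endpoint (a `results` array whose entries carry `alternatives` with `transcript` and `confidence`, plus a `final` flag). The interface names, the `extractTranscript` helper, and the `minConfidence` threshold are hypothetical choices for this example, not part of the original project.

```typescript
// Assumed shape of a Speech to Text recognition response (only the fields used here).
interface SpeechAlternative {
  transcript: string;
  confidence?: number; // typically present only on final results
}

interface SpeechResult {
  alternatives: SpeechAlternative[];
  final: boolean;
}

interface RecognizeResponse {
  result_index?: number;
  results: SpeechResult[];
}

// Hypothetical helper: joins the top alternative of every final result,
// skipping interim results and any whose confidence is below minConfidence.
function extractTranscript(response: RecognizeResponse, minConfidence = 0.5): string {
  const parts: string[] = [];
  for (const result of response.results) {
    if (!result.final || result.alternatives.length === 0) continue;
    const best = result.alternatives[0];
    // A missing confidence score is treated as acceptable.
    if (best.confidence !== undefined && best.confidence < minConfidence) continue;
    parts.push(best.transcript.trim());
  }
  return parts.join(" ");
}

// Usage with a hand-written sample response.
const sample: RecognizeResponse = {
  result_index: 0,
  results: [
    { alternatives: [{ transcript: "turn on kitchen light ", confidence: 0.93 }], final: true },
  ],
};
console.log(extractTranscript(sample)); // prints "turn on kitchen light"
```

Skipping interim results keeps a half-recognized phrase such as "turn" from being parsed as a command before the speaker has finished.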

Here are the commands supported by this voice activated home automation app:

  1. Turn on Living Room light. / Turn off Living Room light.
  2. Turn on Kitchen light. / Turn off Kitchen light.
  3. Turn on Bedroom light. / Turn off Bedroom light.
  4. Turn on Portico light. / Turn off Portico light.
  5. Turn on Children room light. / Turn off Children room light.

In addition to “turn on” and “turn off,” the app also accepts “switch on” and “switch off.”
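The command list above maps naturally onto a small parser. The TypeScript sketch below is one possible way to turn a transcript into a command and a device, assuming the transcript arrives as plain text; the `parseCommand` function, the `DeviceCommand` shape, and the matching rules are hypothetical and may differ from what the original project does.

```typescript
// Hypothetical command format; the original project's parsing rules may differ.
type Command = "on" | "off";

interface DeviceCommand {
  command: Command;
  device: string;
}

// Rooms from the list of supported commands, matched in lowercase.
const ROOMS = ["living room", "kitchen", "bedroom", "portico", "children room"];

// Matches "turn on", "turn off", "switch on" or "switch off" anywhere in the text.
const ACTION_PATTERN = /\b(?:turn|switch)\s+(on|off)\b/;

function parseCommand(text: string): DeviceCommand | null {
  // Lowercase, replace basic punctuation with spaces, and collapse repeated whitespace.
  const normalized = text.toLowerCase().replace(/[.,!?]/g, " ").replace(/\s+/g, " ");
  const action = ACTION_PATTERN.exec(normalized);
  if (action === null) return null; // no supported action phrase
  const room = ROOMS.find((r) => normalized.includes(`${r} light`));
  if (room === undefined) return null; // no supported device
  return { command: action[1] as Command, device: room };
}

console.log(parseCommand("Turn on Kitchen light."));         // command "on", device "kitchen"
console.log(parseCommand("Switch off Children room light")); // command "off", device "children room"
console.log(parseCommand("Turn on the garage light"));       // null
```

Returning `null` for anything outside the supported phrases means a misheard sentence is simply ignored rather than sent to the devices.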

Project source code, dependencies, and steps to build

Complete source code and instructions to build the app are available on GitHub. Refer to the README file to set up the services required to build and run this app.

The software components and cloud services used to build this app are listed below:

  • Node.js – A JavaScript runtime environment, used here to run a lightweight server that generates authentication tokens for accessing the Watson Speech to Text service.
  • Watson Speech to Text service – Available on the IBM Cloud platform and accessible via HTTP REST APIs; this is used to convert speech to text.
  • PubNub – A real-time data streaming network based on a publish-subscribe mechanism. In this project, the text commands are sent to devices subscribed to a particular channel on this network. PubNub makes it easier to control the devices remotely.
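The publish-subscribe idea behind PubNub can be illustrated without any network at all. The sketch below is a hypothetical in-memory stand-in, written in TypeScript, that shows only the channel semantics (every subscriber on a channel receives each message published to it). It deliberately does not use the PubNub SDK, and the `MiniPubSub` class is an illustration, not PubNub's API.

```typescript
// Hypothetical in-memory stand-in for a publish-subscribe network.
// It only illustrates channel semantics; it is not the PubNub SDK.
type Listener<T> = (message: T) => void;

class MiniPubSub<T> {
  private channels = new Map<string, Listener<T>[]>();

  // Register a listener on a channel.
  subscribe(channel: string, listener: Listener<T>): void {
    const listeners = this.channels.get(channel) ?? [];
    listeners.push(listener);
    this.channels.set(channel, listeners);
  }

  // Deliver the message to every listener on the channel; returns how many received it.
  publish(channel: string, message: T): number {
    const listeners = this.channels.get(channel) ?? [];
    for (const listener of listeners) listener(message);
    return listeners.length;
  }
}

// Usage: two simulated lamps subscribe to the same channel.
const bus = new MiniPubSub<string>();
const received: string[] = [];
bus.subscribe("home-control", (m) => received.push(`lamp A got ${m}`));
bus.subscribe("home-control", (m) => received.push(`lamp B got ${m}`));
bus.publish("home-control", "on");
bus.publish("other-channel", "off"); // no subscribers, so nothing is delivered
console.log(received); // [ 'lamp A got on', 'lamp B got on' ]
```

The publisher never needs to know which devices are listening, which is what lets the simulated home run on a separate computer.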

This project utilizes the following major SDKs and libraries:

To build and run this app, you will need to create accounts on IBM Cloud and PubNub. Both services offer a free tier.

Sit back, relax and watch the devices obey your commands

You can experience the virtual home automation system controlled by voice commands after creating the required services and building the app. Here is a quick video demo.

The video shows a local Web page that listens for speech commands and a second Web page that simulates the smart home and its devices. The page listening for speech commands shows the text returned by the Watson Speech to Text service before sending the control commands to the smart home. It also shows device status feedback received from the smart home. This page is served by a local server, which handles authentication to the Watson Speech to Text service.
Here is a brief description of how this voice controlled home automation system works.

  • A lightweight local server generates the authentication tokens needed to access the Speech to Text service using service credentials obtained from IBM Cloud.
  • A local Web page served by this server provides the interface that listens to the microphone and sends the speech to IBM Watson for conversion to text.
  • The Speech to Text service, accessed via REST APIs using the token from the local server, returns the converted text, which is displayed on the Web page for the user's information.
  • A script included in the local Web page parses the converted text to extract the supported home automation commands and devices. It then publishes a JSON message to a channel on the PubNub network; the payload contains the command (on / off) and the device to be controlled.
  • The PubNub network then delivers the message to all devices subscribed to this channel.
  • Another Web page offers the interface for the simulated home. It contains an SVG image and simulated bulb icons. This page does not need to be served by the local server and can even run on a separate computer.
  • A script included in this Web page subscribes to the same PubNub channel that the script in the local Speech to Text Web page publishes to.
  • When it receives a message with a JSON payload containing home automation commands, it switches the corresponding bulb icons in the SVG image on or off.
  • The script then sends a feedback message to PubNub, which the local Web page displays as a status message.
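The JSON payload described in the steps above can be sketched as a pair of TypeScript functions: one that serializes a control message before publishing, and one that validates a payload on arrival, since any client can publish to a shared channel. The field names `command` and `device` follow the description above, but the exact payload format of the original project is an assumption, and `encodeControl` / `decodeControl` are hypothetical helpers.

```typescript
// Hypothetical payload shape; field names follow the description but are assumptions.
interface ControlMessage {
  command: "on" | "off";
  device: string;
}

// Serialize a control message before publishing it to the channel.
function encodeControl(msg: ControlMessage): string {
  return JSON.stringify(msg);
}

// Validate an incoming payload, since anything can arrive on a shared channel.
function decodeControl(raw: string): ControlMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const obj = data as Record<string, unknown>;
  const device = obj.device;
  if (typeof device !== "string") return null; // missing or non-string device
  if (obj.command === "on") return { command: "on", device };
  if (obj.command === "off") return { command: "off", device };
  return null; // unsupported command
}

// Usage: a round trip through the wire format.
const wire = encodeControl({ command: "on", device: "portico" });
console.log(wire); // {"command":"on","device":"portico"}
console.log(decodeControl(wire)?.device); // portico
```

Validating on the receiving side keeps a malformed or unexpected message from putting the simulated home into an undefined state.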

The smart home is simulated as a Web page that uses an SVG image with bulb icons. This page listens on PubNub channels to receive the control commands and simulates switching the devices accordingly.
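The state behind the simulated home can be modeled independently of the SVG drawing. In this TypeScript sketch, a hypothetical `SimulatedHome` class keeps one bulb per room and calls a `render` callback whenever a bulb changes; in the browser page, that callback is where the SVG bulb icon would be updated. The class, its methods, and the status text format are assumptions for illustration only.

```typescript
// Hypothetical model of the simulated home: one bulb per room, all off initially.
// In the browser page, render() would update the SVG bulb icons; here it is a callback.
type BulbState = "on" | "off";

class SimulatedHome {
  private bulbs = new Map<string, BulbState>();

  constructor(rooms: string[], private render: (room: string, state: BulbState) => void = () => {}) {
    for (const room of rooms) this.bulbs.set(room, "off");
  }

  // Apply a command; returns a status line for feedback, or null for unknown rooms.
  apply(room: string, state: BulbState): string | null {
    if (!this.bulbs.has(room)) return null;
    this.bulbs.set(room, state);
    this.render(room, state);
    return `${room} light is ${state}`;
  }

  stateOf(room: string): BulbState | undefined {
    return this.bulbs.get(room);
  }
}

// Usage: record what would be redrawn instead of touching an SVG.
const drawn: string[] = [];
const home = new SimulatedHome(["kitchen", "bedroom"], (room, state) => drawn.push(`${room}:${state}`));
const status = home.apply("kitchen", "on");
console.log(status); // kitchen light is on
console.log(home.apply("garage", "on")); // null, there is no garage bulb
```

Keeping the state separate from the drawing makes the status text that is sent back as feedback always agree with what the page shows.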

By default, Watson Speech to Text recognizes certain speech accents, and it can also be trained to understand your native accent. To save time and effort, this demo uses recorded audio clips for the voice commands.

Extending the capabilities of the voice controlled smart home

There are a number of ways you can extend the capabilities of this system.

Talking to a virtual assistant gives the user a better sense of control. So, in addition to listening to commands, you can give the system the ability to talk back, making it a talking home automation system. To do this, you can combine this project with the IBM Watson Text to Speech service to add audible feedback for the user. Refer to this blog post about building a Text to Speech based app using IBM Watson and PubNub.

PubNub plays a unique role in orchestrating the messages across the application components. Apart from data streaming, PubNub provides historical data for the messages published through its network. This is an extremely useful feature that can be used to develop data analytics applications offering insights into usage patterns and other aspects of the voice controlled home automation system.
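As a taste of the analytics this enables, the TypeScript sketch below summarizes a list of past control messages: how often each device was switched on and off, and the UTC hour in which "on" commands are most frequent. It assumes the history has already been fetched and flattened into a hypothetical `HistoryEntry` shape; retrieving it from PubNub's history feature is not shown, and all names and fields here are illustrative assumptions.

```typescript
// Hypothetical record of one past control message, as an analytics job might hold it
// after fetching channel history. Field names are assumptions for this sketch.
interface HistoryEntry {
  timestamp: number; // milliseconds since the Unix epoch
  command: "on" | "off";
  device: string;
}

interface DeviceUsage {
  switchedOn: number;
  switchedOff: number;
}

// Count how often each device was switched on and off.
function usageByDevice(history: HistoryEntry[]): Map<string, DeviceUsage> {
  const usage = new Map<string, DeviceUsage>();
  for (const entry of history) {
    const stats = usage.get(entry.device) ?? { switchedOn: 0, switchedOff: 0 };
    if (entry.command === "on") stats.switchedOn += 1;
    else stats.switchedOff += 1;
    usage.set(entry.device, stats);
  }
  return usage;
}

// Hour of day (UTC, 0-23) with the most "on" commands, or null if there are none.
function busiestHour(history: HistoryEntry[]): number | null {
  const counts = new Array<number>(24).fill(0);
  for (const entry of history) {
    if (entry.command === "on") counts[new Date(entry.timestamp).getUTCHours()] += 1;
  }
  const max = Math.max(...counts);
  return max === 0 ? null : counts.indexOf(max);
}

// Usage with a small hand-written history.
const sampleHistory: HistoryEntry[] = [
  { timestamp: Date.UTC(2018, 0, 1, 19, 5), command: "on", device: "kitchen" },
  { timestamp: Date.UTC(2018, 0, 1, 19, 40), command: "on", device: "living room" },
  { timestamp: Date.UTC(2018, 0, 1, 23, 0), command: "off", device: "kitchen" },
];
console.log(usageByDevice(sampleHistory).get("kitchen")); // { switchedOn: 1, switchedOff: 1 }
console.log(busiestHour(sampleHistory)); // 19
```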


The Watson Speech to Text service from IBM provides a powerful API for adding speech recognition capabilities to your application. One area where it could have the greatest impact is applications for elderly or physically disabled people, who can use their voice for tasks that might otherwise be difficult for them. Many other applications can benefit from this service too: for example, transcribing voice calls in call centers, or letting a teacher have notes ready right after finishing a class.

So get inspired and start building awesome applications capable of speech recognition. The documentation here will help you get started with the Watson Speech to Text service. Please share your feedback with us!