Speaking Up for Conversation Design :: UXmatters

Implementing Sound Design

Have you ever listened to an old radio show? Before TV, radio relied on using sound effects and music to tell a compelling story. The 1938 broadcast of “The War of the Worlds” is infamous for the public hysteria it created among listeners. We can use what we’ve learned from such radio shows to take our voice experiences to the next level.

Both Google and Amazon have extended SSML to enable voice designers to play sounds from their native library. For example, Google provides tags for mixing dialogue, sound effects, and music. To produce stellar results, you can change sounds’ volume, fade sounds in or out, and control the duration of sounds.

As an example, let’s take a look at a horror game I’m working on as a side project. You’ll notice that I use ominous music and sound effects to build tension.

To group different sound bites, I simply apply the tag. Then I use the begin and end attributes to offset sound bites from one another. In Figure 3, you can see an abbreviated code sample that illustrates how to do this.

Figure 3—Using the tag
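For readers who cannot see the figure, here is a minimal sketch of how layered audio could be assembled in Python. It assumes the grouping tag is Google's `<par>` element and that each sound bite is wrapped in a `<media>` element carrying the `begin` and `end` offsets; the attribute names follow my reading of Google's SSML extensions and should be checked against the current documentation. The helper functions, the audio URLs, and the narration line are all hypothetical and are not taken from the game described above.

```python
from xml.sax.saxutils import escape, quoteattr


def media(content, media_id=None, begin=None, end=None,
          sound_level=None, fade_out=None):
    """Wrap SSML content in a <media> element with optional timing attributes."""
    attrs = [
        ("xml:id", media_id),
        ("begin", begin),
        ("end", end),
        ("soundLevel", sound_level),
        ("fadeOutDur", fade_out),
    ]
    # Only emit the attributes that were actually supplied.
    rendered = "".join(
        f" {name}={quoteattr(value)}" for name, value in attrs if value is not None
    )
    return f"<media{rendered}>{content}</media>"


def audio(src):
    """Return an <audio> element for a sound file (URL is a placeholder)."""
    return f"<audio src={quoteattr(src)}/>"


def speech(text):
    """Return a nested <speak> element with the text safely escaped."""
    return f"<speak>{escape(text)}</speak>"


def build_scene():
    """Group music, narration, and a sound effect so they play as one layered scene."""
    layers = [
        # Background music: quieter than the voice, fading out after the narration.
        media(audio("https://example.com/audio/ominous_music.ogg"),
              media_id="music", sound_level="-6dB",
              end="narration.end+2s", fade_out="2s"),
        # Narration starts 1.5 seconds into the music.
        media(speech("You hear something scratching at the door."),
              media_id="narration", begin="1.5s"),
        # Sound effect is offset relative to the end of the narration.
        media(audio("https://example.com/audio/zombie_groan.ogg"),
              begin="narration.end+0.5s"),
    ]
    return "<speak><par>" + "".join(layers) + "</par></speak>"


if __name__ == "__main__":
    print(build_scene())
```

Building the markup from small helpers keeps the offsets in one place, which makes it easier to retime a scene without hand-editing a long SSML string.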

While not all voice experiences require the immersion of a game, all can benefit from earcons, which are short sound effects that convey information. Earcons are especially useful when a voice response isn’t necessary. For example, smart-home applications use earcons to provide feedback when the user turns a light off. Because the earcon sounds at around the same time that the light goes off, the combined feedback of the earcon and the room going dark is a much more elegant way of providing feedback than the spoken response “Okay, I’m turning off the light.” As a best practice, look for clever ways to use earcons in your voice applications.
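As a rough illustration of the earcon idea, the sketch below returns an earcon on its own when no spoken confirmation is needed and adds a short phrase otherwise. The function name, the `needs_spoken_confirmation` flag, the fallback wording, and the audio URL are assumptions made for this example; only the standard SSML `<audio>` element is relied on.

```python
def light_off_response(needs_spoken_confirmation,
                       earcon_url="https://example.com/audio/light_off.ogg"):
    """Return SSML for a 'light off' action.

    The earcon always plays. Speech is added only when the caller decides
    the user needs a spoken confirmation (a hypothetical flag).
    """
    earcon = f'<audio src="{earcon_url}"/>'
    if needs_spoken_confirmation:
        return f"<speak>{earcon}Okay, the light is off.</speak>"
    return f"<speak>{earcon}</speak>"


if __name__ == "__main__":
    print(light_off_response(False))
    print(light_off_response(True))
```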

Diversifying Phrasings

Out of the box, both Alexa and Google Assistant provide simple ways to vary voice responses. For each case where a voice response is necessary, you can provide a few different phrases. The system automatically selects which one to use. Even though writing numerous voice responses is better than providing just one, your app still ends up feeling stale to repeat users. With custom logic, you can create a much wider variety of answers for a voice assistant to use.

Depending on your business needs, there are many different ways to build your voice-interface logic. I like to create many sets of responses, then combine them. For example, one type of response could be transactional—phrases that the user needs to hear. The second type of response could give flavor by adding extraneous commentary that makes the response seem more conversational and natural. Additional logic could leave these flavor phrases out once in a while. In my game, I also randomize the zombie sound effects to add even more variety. Figure 4 shows some of the possibilities these capabilities can deliver.

Figure 4—Response combinations
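The combination logic described above can be sketched in a few lines of Python. This is one possible approach rather than the game's actual code: the phrase lists, the sound file names, and the `flavor_chance` parameter are all invented for illustration.

```python
import random

# Hypothetical response sets; a real app would load these from content files.
TRANSACTIONAL = [
    "The door is locked.",
    "You can't open the door. It's locked.",
]
FLAVOR = [
    "Something shuffles on the other side.",
    "The handle is ice cold.",
    "You'd swear the hinges just whispered.",
]
ZOMBIE_SOUNDS = ["groan_1.ogg", "groan_2.ogg", "snarl.ogg"]


def compose_response(rng, flavor_chance=0.7):
    """Pick one required phrase, sometimes add flavor, and pick a random sound."""
    parts = [rng.choice(TRANSACTIONAL)]
    # Leave the flavor phrase out once in a while so it does not wear thin.
    if rng.random() < flavor_chance:
        parts.append(rng.choice(FLAVOR))
    return {"speech": " ".join(parts), "sound": rng.choice(ZOMBIE_SOUNDS)}


if __name__ == "__main__":
    rng = random.Random()  # pass random.Random(seed) for repeatable tests
    for _ in range(3):
        print(compose_response(rng))
```

Passing the random generator in as an argument makes the behavior easy to test, because a seeded generator always produces the same sequence of choices.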

Depending on your application’s functionality, you’ll need to craft responses in different ways. For example, let’s say you’re creating a weather app. You could create separate flavor response sets that correspond to different ranges of temperature or weather conditions. Because each independent set of phrases multiplies the total, a little extra logic greatly increases the number of possible responses. Then, every time users interact with your app, it will feel fresh.
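To make the weather example concrete, here is a small Python sketch that chooses a flavor set by temperature range and counts how many distinct responses result. The temperature thresholds (in degrees Celsius), the phrases, and the function names are assumptions chosen for illustration only.

```python
import random

# Hypothetical required phrases; {temp} is filled in at response time.
TRANSACTIONAL_TEMPLATES = [
    "It's {temp} degrees right now.",
    "Right now it's {temp} degrees outside.",
]

# Each entry is (upper bound in °C, flavor phrases for temperatures below it).
FLAVOR_BY_TEMPERATURE = [
    (0, ["Bundle up.", "Watch out for ice."]),
    (15, ["A jacket wouldn't hurt.", "It's a bit brisk."]),
    (25, ["It's lovely out.", "Perfect weather for a walk.", "Enjoy it."]),
    (float("inf"), ["Stay hydrated.", "Find some shade."]),
]


def flavor_set(temp_c):
    """Return the flavor phrases for the first range that contains temp_c."""
    for upper_bound, phrases in FLAVOR_BY_TEMPERATURE:
        if temp_c < upper_bound:
            return phrases
    return []  # unreachable for finite temperatures, kept as a safe default


def weather_response(temp_c, rng, flavor_chance=0.7):
    """Combine a required phrase with an optional, temperature-matched flavor phrase."""
    parts = [rng.choice(TRANSACTIONAL_TEMPLATES).format(temp=temp_c)]
    if rng.random() < flavor_chance:
        parts.append(rng.choice(flavor_set(temp_c)))
    return " ".join(parts)


def count_variants(temp_c):
    """Templates times (flavor options plus the 'no flavor' case)."""
    return len(TRANSACTIONAL_TEMPLATES) * (len(flavor_set(temp_c)) + 1)


if __name__ == "__main__":
    print(weather_response(20, random.Random()))
    print(count_variants(20))
```

With these sample lists, two templates and three mild-weather flavor phrases plus the no-flavor case give 2 × (3 + 1) = 8 distinct responses for a single temperature range, and each additional phrase set multiplies that count again.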


The voice user-interface industry is on the cusp of a boom. We should be focusing on sound and conversation design to make voice experiences more natural and engaging. Brands that offer polished voice experiences will stand out in an endless sea of voice apps. Will yours be one of them? 
