A React renderer for SSML

Voice user interfaces (UIs) have seen substantial growth in adoption and popularity over the past few years. According to QuoraCreative, voice-based searches across the internet will hit 50% this year, and 30% of those searches will be made on devices without screens. There are plenty of ways to implement voice-enabled or standalone voice applications. All major voice assistants (Alexa, Google Assistant, Siri) let third-party developers build apps for their platforms, and there are plenty of natural language understanding (NLU) platforms that let you develop custom voice-enabled UIs (Cerence, Dialogflow, Houndify, Speechly).

{
  intent: "SEARCH_SHOP",
  user: {
    id: 1,
    token: "ABCDEFG",
  },
  contexts: [
    {
      name: "SEARCH_QUERY",
      slot: "Best natural water"
    }
  ]
}

// fetchUser and fetchItems are app-specific helpers that are not shown here
const createPayload = () => ({
  reply: '',
  contexts: [],
  endConversation: false
});

const handleRequest = async (req, res) => {
  // based on the request example from above
  const {intent, user: {id, token}, contexts = []} = req.body;
  // fresh payload per request, so replies and contexts never leak between requests
  const payload = createPayload();
  const {error, user} = await fetchUser(id, token);

  if (error) {
    payload.reply = 'Something went wrong. Please relink your account.';
    payload.endConversation = true;
    return res.json(payload);
  }
  if (intent === 'START_CONVERSATION') {
    payload.contexts.push({name: 'SEARCH_QUERY'});
    payload.reply = `Hi ${user.name}, what are you looking for?`;
    return res.json(payload);
  }
  if (intent === 'SEARCH_SHOP') {
    const context = contexts.find(c => c.name === 'SEARCH_QUERY');
    if (!context || !context.slot) {
      payload.contexts.push({name: 'SEARCH_QUERY'});
      payload.reply = 'What are you looking for?';
      return res.json(payload);
    }
    const {slot: search} = context;
    const {error: searchError, items} = await fetchItems(search, token);
    if (searchError) {
      payload.reply = 'Something went wrong. Try again soon.';
      payload.endConversation = true;
      return res.json(payload);
    }
    if (!items.length) {
      payload.reply = `I couldn't find any results for ${search}.`;
      payload.endConversation = true;
      return res.json(payload);
    }
    let reply = 'I found the following items';
    items.forEach((item, i) => {
      if (i === items.length - 1) {
        reply += `, and ${item.name} for ${item.price}.`;
      } else {
        reply += `, ${item.name} for ${item.price}`;
      }
    });
    reply += ' Just say the name of an item to add it to your shopping list.';
    payload.reply = reply;
    payload.contexts.push({name: 'ITEM_NAME'});
    return res.json(payload);
  }
  if (intent === 'CHECKOUT') {
    // TODO write checkout code
  }
};
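
One way to keep a handler like this manageable is to give every intent its own small function and look it up by name. The following is only a rough sketch of that idea: `handlers`, `fallback` and `dispatch` are names I made up for illustration, the replies are placeholders, and the request is assumed to already carry the fetched `user`.

```javascript
// Hypothetical handlers: each takes the parsed request and returns a payload.
const handlers = {
  START_CONVERSATION: ({ user }) => ({
    reply: `Hi ${user.name}, what are you looking for?`,
    contexts: [{ name: 'SEARCH_QUERY' }],
    endConversation: false,
  }),
  // Placeholder reply; the real checkout logic is still a TODO.
  CHECKOUT: () => ({
    reply: 'Checkout is not available yet.',
    contexts: [],
    endConversation: true,
  }),
};

// Fallback for intents without a registered handler (assumed wording).
const fallback = () => ({
  reply: "Sorry, I didn't get that.",
  contexts: [],
  endConversation: false,
});

// Look up the handler by intent name instead of chaining if-statements.
// hasOwnProperty guards against inherited keys such as "toString".
const dispatch = (request) => {
  const handler = Object.prototype.hasOwnProperty.call(handlers, request.intent)
    ? handlers[request.intent]
    : fallback;
  return handler(request);
};
```

Because every handler returns a fresh payload, nothing is shared between requests, and a new intent only needs a new entry in the table.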

The pain

The code itself isn’t that complex, and splitting it into intent-specific handlers would already clean up the example above. In my experience, it starts to get nasty when the following things get added to the intent handlers:

  • User specific edge cases (e.g. if user has an item in Checkout)
  • Complicated responses based on fetched data (e.g. `item.hexColorCode` )
  • Utilising SSML features inside the responses
  • In case there are more than five items in the search results, provide the first five results and tell the user to say ‘more options’ if they want to hear further results.
const reply = `<speak>${ firstSession ? getHelloIntroReply() : getShortHelloReply() } <audio src='https://s3-bucket/niceSound.mp3'/>
${ i18n.t(keys.offerHelp) } ${ showHint ? newFeatureHint : '' } ${ i18n.t(keys.promptSearch) }</speak>`;
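
To make the "more than five items" requirement from the list above concrete, here is a minimal sketch of what it could add to the search handler. `buildResultsReply`, `PAGE_SIZE`, `page` and `hasMore` are hypothetical names, and `items` is assumed to be an array of `{ name, price }` objects as in the earlier handler.

```javascript
const PAGE_SIZE = 5;

// Build the spoken reply for one page of search results.
// `items` is assumed to be an array of { name, price } objects.
const buildResultsReply = (items, page = 0) => {
  const start = page * PAGE_SIZE;
  const visible = items.slice(start, start + PAGE_SIZE);
  const hasMore = items.length > start + PAGE_SIZE;

  if (!visible.length) {
    return { reply: 'There are no further results.', hasMore: false };
  }

  const listed = visible
    .map((item) => `${item.name} for ${item.price}`)
    .join(', ');
  // Only offer "more options" when another page actually exists.
  const hint = hasMore ? " Say 'more options' to hear further results." : '';

  return { reply: `I found the following items: ${listed}.${hint}`, hasMore };
};
```

Even this small rule brings a page counter that has to survive between requests (for example in a context) and one more branch in the reply, which is how handlers slowly become hard to read.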

Why I love React

Websites and web apps have come a long way. The newest frontend frameworks and libraries build on top of years and years of best practices and lessons learned. React provides component-based composition and declarative syntax to define user interfaces. In short, React provides all the tools to break down user interfaces into components.

react-reconciler

When I say React, I think of react and react-dom, but React itself is actually a library that provides functionality (components, hooks, context, error boundaries, lazy loading, concurrent mode, …) for any host environment. ReactDOM provides a React renderer for the web’s DOM, while React Native provides a renderer for mobile applications. Both ReactDOM and React Native use a shared package called react-reconciler to configure how React should treat the host environment. If you are interested in learning more about this, check out the conference talk by Sophie Alpert. In conclusion, we can utilise React for any host environment (heck, even Word, among other examples).

SSML

ReactDOM translates React’s lifecycle operations into calls against the DOM implementation. SSML, like HTML, is just another markup language, and it is XML-based. Unfortunately, as far as I know, no DOM implementation exists for SSML. So let’s just create a proof-of-concept!

class Document {
  body = null;
  resolve = undefined;
  /*
   * since we server-side render our response,
   * we need a side-effect to tell our server
   * that we finished rendering
   */
  isReady = new Promise((resolve) => {
    this.resolve = resolve;
  });

  setReady() {
    this.resolve();
  }

  toString() {
    return this.body.toString();
  }
  // we can add anything we want here: the req, resp, payload ...
}

class Node {
  type = '';
  textContent = '';
  attributes = {};
  children = [];

  constructor(type) {
    this.type = type;
  }

  setAttribute(key, value) {
    this.attributes[key] = value;
  }

  appendChild(child) {
    this.children.push(child);
  }

  removeChild(child) {
    this.children = this.children.filter(item => item !== child);
  }

  // this will output a basic SSML string!
  toString() {
    // attributes are separated by single spaces, e.g. <audio src="...">
    const attributes = Object.entries(this.attributes)
      .map(([key, value]) => ` ${key}="${value}"`)
      .join('');
    // no extra whitespace, so the output is exactly <p>Hello World!</p>
    const children = this.children.map(child => child.toString()).join('');
    return `<${this.type}${attributes}>${children}${this.textContent}</${this.type}>`;
  }
}

class TextNode {
  type = '';
  text = '';

  constructor(text) {
    this.type = 'text';
    this.text = text;
  }

  toString() {
    return this.text;
  }
}

export { Document, Node, TextNode };
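
One thing this proof-of-concept does not handle: `toString()` writes attribute values and text verbatim, and since SSML is XML, a stray `&` or `<` in a product name or URL would produce an invalid document. A minimal sketch of an escaping step, assuming we would call it wherever values are serialised (`escapeXml` and `serializeAttribute` are hypothetical helpers, not part of the code above):

```javascript
// Replace the five characters that XML treats specially.
// The ampersand must be handled first so existing output is not double-escaped.
const escapeXml = (value) =>
  String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');

// Hypothetical helper: serialise one attribute the way Node.toString() could.
const serializeAttribute = (key, value) => `${key}="${escapeXml(value)}"`;
```

The same `escapeXml` call could wrap `this.text` in `TextNode.toString()` and `this.textContent` in `Node.toString()`.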

import ReactReconciler from 'react-reconciler';
import { Node, TextNode } from './ssml-dom';

const rootHostContext = {};
const childHostContext = {};

const reconciler = ReactReconciler({
  /* host config for SSML, based on our SSML-DOM implementation */
  now: Date.now,
  supportsMutation: true,
  getRootHostContext: () => {
    return rootHostContext;
  },
  prepareForCommit: () => {},
  resetAfterCommit: () => {},
  getChildHostContext: () => {
    return childHostContext;
  },
  // string and number children are stored as textContent instead of TextNodes
  shouldSetTextContent: (type, props) => {
    return typeof props.children === 'string' || typeof props.children === 'number';
  },
  createInstance: (type, props, rootContainerInstance, hostContext, internalInstanceHandle) => {
    const domElement = new Node(type);
    Object.keys(props).forEach(propName => {
      const propValue = props[propName];
      if (propName === 'children') {
        if (typeof propValue === 'string' || typeof propValue === 'number') {
          domElement.textContent = propValue;
        }
      } else {
        // every other prop becomes an SSML attribute
        domElement.setAttribute(propName, propValue);
      }
    });
    return domElement;
  },
  createTextInstance: text => {
    return new TextNode(text);
  },
  appendChildToContainer: (container, child) => {
    container.appendChild(child);
  },
  appendInitialChild: (parent, child) => {
    parent.appendChild(child);
  },
  appendChild(parent, child) {
    parent.appendChild(child);
  },
  finalizeInitialChildren: (domElement, type, props) => {},
  // returning true means "always commit an update" (no prop diffing)
  prepareUpdate(domElement, oldProps, newProps) {
    return true;
  },
  commitUpdate(domElement, updatePayload, type, oldProps, newProps) {
    Object.keys(newProps).forEach(propName => {
      const propValue = newProps[propName];
      if (propName === 'children') {
        if (typeof propValue === 'string' || typeof propValue === 'number') {
          domElement.textContent = propValue;
        }
      } else {
        domElement.setAttribute(propName, propValue);
      }
    });
  },
  commitTextUpdate(textInstance, oldText, newText) {
    textInstance.text = newText;
  },
  removeChildFromContainer(container, child) {
    container.removeChild(child);
  },
  removeChild(parentInstance, child) {
    parentInstance.removeChild(child);
  },
});

const ReactSSML = {
  /**
   * @param {*} element what to render
   * @param {*} container where to render
   * @param {Function} [callback] called once the update has been committed
   */
  render(element, container, callback) {
    // note: the createContainer signature depends on the react-reconciler version
    const reactiveContainer = reconciler.createContainer(container, false, false);
    reconciler.updateContainer(element, reactiveContainer, null, callback);
  }
};

export default ReactSSML;

import React from 'react';
import ReactSSML from './ReactSSML';
import { Document, Node } from './ssml-dom';
// our React App component
import App from './App';

const handleRequest = async (req, res) => {
  // create the payload per request so replies never leak between requests
  const payload = {
    reply: '',
    contexts: [],
    endConversation: false
  };
  // similar to window or document,
  // let's have access to our environment inside React world
  const ssmlDocument = new Document();
  global.ssmlDocument = ssmlDocument;
  const root = new Node('speak');
  ssmlDocument.body = root;
  ReactSSML.render(<App />, root);

  // we only send one payload response
  let hasBeenSent = false;
  // after 5 seconds we return to the user no matter what
  const timeout = setTimeout(() => {
    // this calls our ssml-dom toString implementation
    // and serialises our rendered tree as a string
    payload.reply = root.toString();
    hasBeenSent = true;
    res.json(payload);
  }, 5000);

  // React re-renders as many times as needed and
  // we can decide within our App when the final response is ready
  // (by calling ssmlDocument.setReady())
  await ssmlDocument.isReady;
  if (hasBeenSent) {
    console.log('we timed out, response sent already');
  } else {
    clearTimeout(timeout);
    payload.reply = root.toString();
    hasBeenSent = true;
    res.json(payload);
  }
};
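
The "respond when ready, or after five seconds at the latest" logic above can also be written as a race between two promises, which removes the `hasBeenSent` flag. This is just a sketch of that alternative, not what the proof-of-concept uses; `withTimeout` is a hypothetical helper name.

```javascript
// Resolve with `fallback` if `promise` has not settled within `ms` milliseconds.
const withTimeout = (promise, ms, fallback) => {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  // Whichever settles first wins; the timer is cleared either way
  // so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
};

// Usage inside the handler (ssmlDocument, root, payload and res as above):
//   await withTimeout(ssmlDocument.isReady, 5000);
//   payload.reply = root.toString();
//   res.json(payload);
```

With this shape the response is sent from exactly one place, whether React signalled readiness or the timeout fired first.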

import React from 'react';

const Paragraph = ({children}) => (<p>{children}</p>);

const App = () => (
  <Paragraph>
    Hello World!
  </Paragraph>
);

export default App;

<speak><p>Hello World!</p></speak>

Hey there, thanks for stopping by! 👋 I am a fullstack developer and tech enthusiast, and I cannot get enough of React, JavaScript, web, voice UIs, and more!