Speech Markdown was created for several reasons:
Instead of longer SSML:
<speak>
A pause <break time="1s"/> then continue.
</speak>
Use more concise Speech Markdown:
A pause [1s] then continue.
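To make the mapping between the two notations concrete, here is a minimal Python sketch that turns a bracketed duration into an SSML break tag. It is not the Speech Markdown library's parser: the function name `render_breaks` and the regular expression are hypothetical, and the sketch assumes durations are written in `s` or `ms`.

```python
import re

# Hypothetical pattern for this sketch: a bracketed duration such as [1s] or [500ms].
BREAK_PATTERN = re.compile(r"\[(\d+(?:\.\d+)?(?:ms|s))\]")


def render_breaks(text: str) -> str:
    """Replace each [duration] marker with an SSML break tag and wrap in <speak>."""
    body = BREAK_PATTERN.sub(r'<break time="\1"/>', text)
    return f"<speak>\n{body}\n</speak>"


print(render_breaks("A pause [1s] then continue."))
```

Text without a bracketed duration passes through unchanged apart from the `<speak>` wrapper.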
Use a plain text formatter and convert Speech Markdown:
This is (an important)[emphasis] announcement
Are you ++listening?++
to plain text:
This is an important announcement
Are you listening?
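A plain text formatter of this kind can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it handles just the two constructs shown above (the `(text)[modifier]` form and `++text++`), and the name `to_plain_text` is hypothetical rather than part of any Speech Markdown library.

```python
import re

# Assumed shapes for this sketch only: (text)[modifiers] and ++text++.
MODIFIED_TEXT = re.compile(r"\(([^()]*)\)\[[^\]]*\]")
STRONG_EMPHASIS = re.compile(r"\+\+(.+?)\+\+")


def to_plain_text(markdown: str) -> str:
    """Strip the modifiers and keep only the words to be spoken."""
    text = MODIFIED_TEXT.sub(r"\1", markdown)
    return STRONG_EMPHASIS.sub(r"\1", text)


for line in ("This is (an important)[emphasis] announcement", "Are you ++listening?++"):
    print(to_plain_text(line))
```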
Use platform-specific formatters to convert Speech Markdown:
Are you ++listening?++
to SSML for a given platform:
<speak>
Are you <emphasis level="strong">listening?</emphasis>
</speak>
In addition to the existing platform-specific formatters, formatters can be created for other platforms.
A platform’s implementation of the W3C SSML standard may be incomplete and may differ from another platform’s.
One such example is the SSML <phoneme> tag, which Amazon Alexa supports and Google Assistant does not.
When you use the ipa modifier in Speech Markdown:
You say, (pecan)[ipa:"pɪˈkɑːn"].
I say, (pecan)[ipa:"ˈpi.kæn"].
The formatter will render it in Amazon Alexa SSML:
<speak>
You say, <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>.
I say, <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>.
</speak>
But the formatter will ignore it in Google Assistant SSML:
<speak>
You say, pecan.
I say, pecan.
</speak>
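The platform-dependent behavior described above can be sketched as a formatter that consults a per-platform support table. Everything here is illustrative: the platform keys, the `PHONEME_SUPPORT` table, and the function `render_ipa` are hypothetical names, and the support values simply restate what the text above says about the two platforms.

```python
import re

# Assumed shape for this sketch: (word)[ipa:"phonemes"].
IPA_PATTERN = re.compile(r'\(([^()]*)\)\[ipa:"([^"]*)"\]')

# Hypothetical support table, restating the text above.
PHONEME_SUPPORT = {"amazon-alexa": True, "google-assistant": False}


def render_ipa(markdown: str, platform: str) -> str:
    """Render the ipa modifier as <phoneme> where supported, else keep the bare word."""

    def replace(match: re.Match) -> str:
        word, phonemes = match.group(1), match.group(2)
        if PHONEME_SUPPORT[platform]:
            return f'<phoneme alphabet="ipa" ph="{phonemes}">{word}</phoneme>'
        return word

    return "<speak>\n" + IPA_PATTERN.sub(replace, markdown) + "\n</speak>"


source = 'You say, (pecan)[ipa:"pɪˈkɑːn"].'
print(render_ipa(source, "amazon-alexa"))
print(render_ipa(source, "google-assistant"))
```

Keeping the support decision in one table means the same source text can be reused across platforms without editing it.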
Some Speech Markdown:
(I am not a real human.)[whisper]
will be converted to a specific flavor of SSML using platform-specific formatters:
<speak>
<amazon:effect name="whispered">I am not a real human.</amazon:effect>
</speak>
or, where the platform does not support the tag, with the modifier dropped:
<speak>
I am not a real human.
</speak>
Formatters can be configured to choose a similar substitution for platform-specific elements.
Some Speech Markdown:
(I am not a real human.)[whisper]
will be converted to a specific flavor of SSML using platform-specific formatters:
<speak>
<amazon:effect name="whispered">I am not a real human.</amazon:effect>
</speak>
and pick a close substitution if a given tag is not supported:
<speak>
<prosody volume="x-soft" rate="slow">I am not a real human.</prosody>
</speak>
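One way to picture the configurable substitution is a formatter with a flag that selects the fallback. The sketch below is an assumption-laden illustration, not the library's actual configuration: the function `render_whisper` and its parameters `supports_amazon_effect` and `substitute` are hypothetical, and the fallback prosody values are copied from the example above.

```python
import re

# Assumed shape for this sketch: (text)[whisper].
WHISPER_PATTERN = re.compile(r"\(([^()]*)\)\[whisper\]")


def render_whisper(markdown: str, supports_amazon_effect: bool, substitute: bool = False) -> str:
    """Render the whisper modifier natively, as a close substitute, or not at all."""

    def replace(match: re.Match) -> str:
        text = match.group(1)
        if supports_amazon_effect:
            return f'<amazon:effect name="whispered">{text}</amazon:effect>'
        if substitute:
            # Fallback taken from the example above: soft and slow prosody.
            return f'<prosody volume="x-soft" rate="slow">{text}</prosody>'
        return text

    return "<speak>\n" + WHISPER_PATTERN.sub(replace, markdown) + "\n</speak>"


source = "(I am not a real human.)[whisper]"
print(render_whisper(source, supports_amazon_effect=True))
print(render_whisper(source, supports_amazon_effect=False, substitute=True))
```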
When using multiple SSML tags on a single text block, the modified text can get lost in the nesting:
<speak>
My favorite chemical element is <prosody volume="x-loud" rate="slow" pitch="low"><sub alias="aluminum">Al</sub></prosody>.
</speak>
Speech Markdown keeps the modified text together, with the modifiers following it:
My favorite chemical element is (Al)[sub:"aluminum";volume:"x-loud";rate:"slow";pitch:"low"].
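To show how a flat modifier list can expand into the nested tags, here is a rough Python sketch. It assumes modifiers are written as `key:"value"` pairs separated by semicolons, handles only `sub`, `volume`, `rate`, and `pitch`, and uses hypothetical names throughout (`render_modifiers`, `PROSODY_KEYS`); it is not the library's parser.

```python
import re

# Assumed shapes for this sketch: (text)[key:"value";key:"value";...].
MODIFIED_TEXT = re.compile(r"\(([^()]*)\)\[([^\]]*)\]")
MODIFIER = re.compile(r'(\w+):"([^"]*)"')
PROSODY_KEYS = ("volume", "rate", "pitch")


def render_modifiers(markdown: str) -> str:
    """Expand a flat modifier list into nested <prosody> and <sub> tags."""

    def replace(match: re.Match) -> str:
        ssml = match.group(1)
        modifiers = dict(MODIFIER.findall(match.group(2)))
        if "sub" in modifiers:
            ssml = f'<sub alias="{modifiers["sub"]}">{ssml}</sub>'
        prosody = [f'{key}="{modifiers[key]}"' for key in PROSODY_KEYS if key in modifiers]
        if prosody:
            # All prosody modifiers share one wrapping tag around the inner markup.
            ssml = f'<prosody {" ".join(prosody)}>{ssml}</prosody>'
        return ssml

    return MODIFIED_TEXT.sub(replace, markdown)


print(render_modifiers('(Al)[sub:"aluminum";volume:"x-loud";rate:"slow";pitch:"low"]'))
```

The author only ever writes the text once, next to its modifiers; the nesting is produced mechanically.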