If all voice platforms natively and consistently supported Speech Markdown, there would be no need for formatters. As things stand today, the platforms support only Speech Synthesis Markup Language (SSML). Because each platform has a different level of support for SSML (and some have introduced their own extensions), there are multiple flavors of SSML. For example, the SSML used by Amazon Alexa is mostly the same as, but not identical to, that of Google Assistant. (For a quick comparison, see SSML Guru.)

The solution is to parse Speech Markdown into an intermediate format and then use formatters to convert it into either plain text (with all Speech Markdown formatting removed) or a platform-specific flavor of SSML.

graph LR;
smd(Speech Markdown) --> parser(Parser)
parser --> ast("Intermediate Format (AST)")
ast --> format-pt(Formatter)
format-pt --> pt(Plain Text)
ast --> format-aa(Formatter)
format-aa --> aa(SSML - Amazon Alexa)
ast --> format-ga(Formatter)
format-ga --> ga(SSML - Google Assistant)
ast --> format-o(Formatter)
format-o --> o("SSML - Cortana, Mycroft, other ...")
classDef code fill:#E2F0D9,stroke:#70AD47,stroke-width:1px;
class parser,format-pt,format-aa,format-ga,format-o code;
classDef doc fill:#a1d2fd,stroke:#33a1ff,stroke-width:2px;
class pt,aa,ga,o doc;
style smd fill:#33a1ff,stroke:#1C90F3,stroke-width:3px;
style ast fill:#D0CECE,stroke:#595959,stroke-width:1px;
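To make the fan-out in the diagram concrete, here is a minimal TypeScript sketch in which one shared AST is handed to several formatters. The `AstNode` type, the `Formatter` interface, and the class names are hypothetical and much simpler than a real implementation; only text, a break, and a whisper are modeled. The whisper illustrates why SSML flavors differ: Alexa offers `<amazon:effect name="whispered">`, while the Google Assistant formatter here approximates it with `<prosody>`, a fallback chosen purely as an assumption for illustration.

```typescript
// Hypothetical, simplified AST: real Speech Markdown ASTs have many more node kinds.
type AstNode =
  | { kind: "text"; value: string }
  | { kind: "break"; time: string } // e.g. "500ms"
  | { kind: "whisper"; children: AstNode[] };

// Every formatter turns the same AST into one output flavor.
interface Formatter {
  format(nodes: AstNode[]): string;
}

// Escape the characters that are special in SSML/XML text content.
function escapeXml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Plain text: keep the words and drop every speech-specific instruction.
class PlainTextFormatter implements Formatter {
  format(nodes: AstNode[]): string {
    return nodes
      .map((n) => {
        switch (n.kind) {
          case "text":
            return n.value;
          case "break":
            return ""; // pauses have no plain-text equivalent
          case "whisper":
            return this.format(n.children);
        }
      })
      .join("");
  }
}

// Shared SSML logic; subclasses supply only the platform-specific pieces.
abstract class SsmlFormatter implements Formatter {
  format(nodes: AstNode[]): string {
    return `<speak>${this.formatNodes(nodes)}</speak>`;
  }

  protected formatNodes(nodes: AstNode[]): string {
    return nodes.map((n) => this.formatNode(n)).join("");
  }

  protected formatNode(n: AstNode): string {
    switch (n.kind) {
      case "text":
        return escapeXml(n.value);
      case "break":
        return `<break time="${n.time}"/>`;
      case "whisper":
        return this.whisper(this.formatNodes(n.children));
    }
  }

  // Each SSML flavor decides how it expresses a whisper.
  protected abstract whisper(inner: string): string;
}

class AlexaFormatter extends SsmlFormatter {
  protected whisper(inner: string): string {
    return `<amazon:effect name="whispered">${inner}</amazon:effect>`;
  }
}

class GoogleFormatter extends SsmlFormatter {
  // Assumption: approximate a whisper with soft, slow prosody.
  protected whisper(inner: string): string {
    return `<prosody volume="x-soft" rate="slow">${inner}</prosody>`;
  }
}

const sample: AstNode[] = [
  { kind: "text", value: "Hello" },
  { kind: "break", time: "500ms" },
  { kind: "whisper", children: [{ kind: "text", value: " it is a secret" }] },
];

const formatters: Record<string, Formatter> = {
  "plain-text": new PlainTextFormatter(),
  "amazon-alexa": new AlexaFormatter(),
  "google-assistant": new GoogleFormatter(),
};

for (const target of Object.keys(formatters)) {
  console.log(`${target}: ${formatters[target].format(sample)}`);
}
```

With this design, supporting another platform means writing one more formatter; the parser and the AST stay untouched.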

For developers, that intermediate format is an Abstract Syntax Tree (AST).
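As a rough illustration of what such a tree might contain, the TypeScript sketch below parses a tiny subset of Speech Markdown syntax (short breaks such as `[500ms]` and strong emphasis written as `++text++`) into AST nodes. The `SmdNode` type and the regex-based `parse` function are hypothetical; a real Speech Markdown parser covers a far larger grammar and is not a single regular expression. The result is a list of top-level nodes, where emphasis nodes hold their own `children`.

```typescript
// Hypothetical AST node types for a tiny subset of Speech Markdown.
type SmdNode =
  | { type: "text"; value: string }
  | { type: "break"; time: string } // from "[500ms]" or "[2s]"
  | { type: "emphasis"; level: "strong"; children: SmdNode[] }; // from "++text++"

// Parse the subset into top-level nodes; emphasis nodes nest their text as children.
function parse(input: string): SmdNode[] {
  // Group 1: break duration such as "500ms" or "2s"; group 2: strongly emphasized text.
  const token = /\[(\d+(?:ms|s))\]|\+\+(.+?)\+\+/g;
  const nodes: SmdNode[] = [];
  let last = 0;
  let m: RegExpExecArray | null;

  while ((m = token.exec(input)) !== null) {
    // Plain text between the previous token and this one becomes a text node.
    if (m.index > last) {
      nodes.push({ type: "text", value: input.slice(last, m.index) });
    }
    if (m[1] !== undefined) {
      nodes.push({ type: "break", time: m[1] });
    } else {
      nodes.push({
        type: "emphasis",
        level: "strong",
        children: [{ type: "text", value: m[2] ?? "" }],
      });
    }
    last = m.index + m[0].length;
  }
  // Trailing plain text after the last token.
  if (last < input.length) {
    nodes.push({ type: "text", value: input.slice(last) });
  }
  return nodes;
}

const tree = parse("Wait [500ms] that was ++very++ important");
console.log(JSON.stringify(tree, null, 2));
```

Because nested constructs such as emphasis carry `children`, a formatter can walk the tree recursively instead of re-parsing strings.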