However, sometimes I need to adapt existing HTML5 content to XHTML5, its stricter, XML-compatible serialisation, for example when I generate eBooks or want to use XML-based processing tools.
Is there any tool that automates this translation? I have looked around but haven't found anything yet, although BeautifulSoup seems to be able to do part of it.
The ideal tool needs to re-serialise the document following all the XML rules (lowercase tags, proper casing, etc.). It should also wrap inline scripts and styles in CDATA sections, and meet the other requirements of the spec.
Bonus points if it can convert to polyglot XHTML ( http://dev.w3.org/html5/html-polyglot/html-polyglot.html ), which also requires accommodating the oddities of HTML parsing (commenting out CDATA sections, making script tags not self-closing, etc.).
And finally I’d like it to be able to normalise whitespace.
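To make the requirements concrete, here is a rough Python sketch of the kind of re-serialisation I mean. It uses only the standard library's `html.parser`, which is not a spec-compliant HTML5 parser, so treat it as an illustration rather than a solution. The names `XhtmlSerializer` and `to_xhtml` are my own placeholders. It does not repair unclosed elements, does not normalise whitespace, and gives up if a script or style contains `]]>`.

```python
import html
from html.parser import HTMLParser

# HTML void elements: written as <br /> and never given an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
RAW_TEXT = {"script", "style"}
XHTML_NS = "http://www.w3.org/1999/xhtml"


class XhtmlSerializer(HTMLParser):
    """Hypothetical helper: re-emits parsed HTML as well-formed XHTML text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.raw = None  # buffered script/style text, or None outside them

    def handle_decl(self, decl):
        if decl.lower().startswith("doctype"):
            self.out.append("<!DOCTYPE html>")

    def handle_starttag(self, tag, attrs):
        # The parser already lowercases tag and attribute names.
        # Boolean attributes (value None) become name="name".
        attrs = [(k, k if v is None else v) for k, v in attrs]
        if tag == "html" and all(k != "xmlns" for k, _ in attrs):
            attrs.insert(0, ("xmlns", XHTML_NS))
        text = "".join(f' {k}="{html.escape(v, quote=True)}"' for k, v in attrs)
        if tag in VOID:
            self.out.append(f"<{tag}{text} />")
        else:
            self.out.append(f"<{tag}{text}>")
            if tag in RAW_TEXT:
                self.raw = []

    def handle_endtag(self, tag):
        if tag in VOID:
            return  # already emitted as a self-closing tag
        if tag in RAW_TEXT and self.raw is not None:
            self.out.append(self._wrap(tag, "".join(self.raw)))
            self.raw = None
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.raw is not None:
            self.raw.append(data)  # script/style text is kept verbatim
        else:
            self.out.append(html.escape(data, quote=False))

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    @staticmethod
    def _wrap(tag, text):
        if "<" not in text and "&" not in text:
            return text  # already safe in both HTML and XML
        if "]]>" in text:
            raise ValueError("']]>' inside script/style is not handled here")
        if tag == "script":
            return f"//<![CDATA[\n{text}\n//]]>"  # CDATA hidden in JS comments
        return f"/*<![CDATA[*/\n{text}\n/*]]>*/"  # CDATA hidden in CSS comments


def to_xhtml(source):
    parser = XhtmlSerializer()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)


print(to_xhtml("<P CLASS=a>Tom &amp; Jerry<BR></P>"))
```

A real tool would need a proper HTML5 parser in front of this, so that implied end tags and misnested markup are resolved before serialisation.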
I’d love for it to be in JS, but Python or Java would work for me as well.
Thanks for your suggestions!