In a recent email, I got these questions about CD records (the proprietary rich-text format used by Notes, CD stands for Composite Document):
Q: Do CD records map to HTML tags completely?
A: No. Like many of Notes' fundamental technologies, CD records were designed before their now-ubiquitous standardized counterpart (in this case, HTML). CD records cover much of the same ground as HTML (both represent rich text after all), and there are some clear mappings between them (how many different ways are there to say, "start a new paragraph"), but since CD records predate HTML, and HTML was not designed to map cleanly to CD (why should it have been?), there are many mismatches between the two.
The mismatches can be reduced if modern CSS is used to augment the HTML, but there's still a lot of work to be done to convert between the two. Luckily, there's already technology out there that can help. First, the Domino Web server translates CD to HTML as part of serving Notes content to the web. Second, Lotus's DXL provides an XML representation of CD content which may be easier to deal with. Third, there are third-party tools (notably the Midas Rich Text LSX) which can do some or all of the job for you.
Q: How do I determine the character set of a note? There seems to be no information stored about the character set.
A: All text (well, almost all text: native MIME data is the main exception) in Notes is stored in a proprietary character set called LMBCS (Lotus Multi-Byte Character Set). Again, Lotus needed a way to represent a wide range of texts in a unified way, and developed a solution well before the standards community came to the rescue with Unicode. When reading text from Notes (for example, with the C API), you must convert the text from LMBCS to whatever character set you want to use (using OSTranslate).