Authoring tools on MS Windows, in particular MS FrontPage (a "WYSIWYG" HTML editor), generate invalid Numeric Character References (NCRs) for characters commonly found in positions 128...159 (0x80...0x9f) in Windows fonts. Although these are valid codepoints in windows-1252 (and other windows-xxxx) charsets, a valid NCR always refers to the document character set in the SGML sense, not to the character encoding scheme (or charset). For HTML, the SGML document character set is fixed: it is always a subset of Unicode (or ISO 10646). In Unicode and its iso-8859-1 subset, values 128...159 are C1 control characters, which must not appear in HTML. Valid NCRs for the intended characters use Unicode values greater than 255, that is, outside the iso-8859-1 range.
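The difference between a charset byte and a document-character-set code point can be checked with a short Python sketch. It is illustrative only: it assumes Python's standard `cp1252` codec as a stand-in for windows-1252, and `valid_ncr` is a hypothetical helper name, not part of any tool mentioned here.

```python
import unicodedata


def valid_ncr(byte_value):
    """Return the decimal NCR for the character a windows-1252 byte stands for.

    Hypothetical helper for illustration; relies on Python's cp1252 codec.
    """
    # Decode the byte as windows-1252 to find the intended character,
    # then reference that character by its Unicode code point.
    char = bytes([byte_value]).decode("cp1252")
    return f"&#{ord(char)};"


# Read as a Unicode code point, 0x85 is a C1 control character (category "Cc").
print(unicodedata.category(chr(0x85)))  # Cc

# Read as a windows-1252 byte, 0x85 is HORIZONTAL ELLIPSIS (U+2026).
print(valid_ncr(0x85))  # &#8230;
```

So `&#133;` names a control character, while `&#8230;` names the ellipsis the author most likely intended.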
Lynx tries to interpret some of these invalid codes by assuming that they are windows-1252 codepoints.
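This kind of recovery can be sketched in Python. It is not Lynx's actual code, only an illustration under the assumption that Python's `cp1252` codec approximates the mapping; `repair_ncrs` is a hypothetical name. Note that this codec also assigns 0x8E and 0x9E, which the table in this test lists as not used, so results for those two positions may differ from what Lynx does.

```python
import re

# Matches decimal or hexadecimal NCRs, e.g. &#147; or &#x93;
_NCR = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def repair_ncrs(html):
    """Rewrite NCRs in 128..159 as NCRs for the windows-1252 character.

    Hypothetical helper; anything outside that range is returned unchanged.
    """
    def fix(match):
        hex_digits, dec_digits = match.groups()
        value = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if not 0x80 <= value <= 0x9F:
            return match.group(0)  # not in the C1 range, keep as written
        try:
            char = bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            return match.group(0)  # unassigned in the codec, keep as written
        return f"&#{ord(char)};"

    return _NCR.sub(fix, html)


print(repair_ncrs("&#147;quoted&#148; &#x99; &#129; &#233;"))
# &#8220;quoted&#8221; &#8482; &#129; &#233;
```

Positions the codec leaves undefined (such as 0x81) are left untouched rather than guessed at, which mirrors the "some of the invalid codes" wording above.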
You may want to press '\' to view the source of this test.

Code  invalid NCR      valid NCR, description
      normal  in ALT
0x80                   € #EURO SIGN
0x81                     #NOT USED
0x82                   ‚ #SINGLE LOW-9 QUOTATION MARK
0x83                   ƒ #LATIN SMALL LETTER F WITH HOOK
0x84                   „ #DOUBLE LOW-9 QUOTATION MARK
0x85                   … #HORIZONTAL ELLIPSIS
0x86                   † #DAGGER
0x87                   ‡ #DOUBLE DAGGER
0x88                   ˆ #MODIFIER LETTER CIRCUMFLEX ACCENT
0x89                   ‰ #PER MILLE SIGN
0x8a                   Š #LATIN CAPITAL LETTER S WITH CARON
0x8b                   ‹ #SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x8c                   Œ #LATIN CAPITAL LIGATURE OE
0x8d                     #NOT USED
0x8e                     #NOT USED
0x8f                     #NOT USED
0x90                     #NOT USED
0x91                   ‘ #LEFT SINGLE QUOTATION MARK
0x92                   ’ #RIGHT SINGLE QUOTATION MARK
0x93                   “ #LEFT DOUBLE QUOTATION MARK
0x94                   ” #RIGHT DOUBLE QUOTATION MARK
0x95                   • #BULLET
0x96                   – #EN DASH
0x97                   — #EM DASH
0x98                   ˜ #SMALL TILDE
0x99                   ™ #TRADE MARK SIGN
0x9a                   š #LATIN SMALL LETTER S WITH CARON
0x9b                   › #SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0x9c                   œ #LATIN SMALL LIGATURE OE
0x9d                     #NOT USED
0x9e                     #NOT USED
0x9f                   Ÿ #LATIN CAPITAL LETTER Y WITH DIAERESIS