Many Windows API functions have "A" (ANSI) and "W" (wide, Unicode) versions. The "A" version handles text based on Windows code pages, while the "W" version handles Unicode text. See Conventions for Function Prototypes and Windows Data Types for Strings.
Windows code pages are also sometimes referred to as "active code pages" or "system active code pages". A Windows operating system always has one currently active Windows code page. All ANSI versions of API functions use the currently active code page.
Original equipment manufacturer (OEM) code pages are code pages for which non-ASCII values represent line drawing and punctuation characters. These code pages were originally used for MS-DOS and are still used for console applications. They are also used for the non-extended file names in the FAT12, FAT16, and FAT32 file systems, as described in Character Sets Used in File Names. The usual OEM code page for English is code page 437.
For both Windows code pages and OEM code pages, the code values 0x00 through 0x7F correspond to the 7-bit ASCII character set. Code values 0x00 through 0x1F and 0x7F always represent standardized control characters, and 0x20 through 0x7E represent standardized displayable characters. Characters represented by the remaining codes, 0x80 through 0xFF, vary among character sets. Each character set includes different special characters, typically customized for a language or group of languages. Windows code page 1252 and OEM code page 437 are generally used in the United States.
In addition to Windows and OEM code pages, your applications can use non-native code pages. Examples are EBCDIC and Macintosh code pages. Two encodings of Unicode (UTF-7 and UTF-8) are implemented as code pages. Like other code pages, each page is known by a numeric identifier and can be handled with many of the same Unicode and character set API functions.
Code pages can be either single-byte character set (SBCS) pages or double-byte character set (DBCS) pages. In SBCS pages, each byte directly encodes a single character, so a page can represent at most 256 distinct characters (including control characters, letters, digits, punctuation, symbols, and the like). DBCS code pages are used for languages such as Japanese and Chinese. In such a code page, some characters have two-byte encodings, with certain byte values (always values greater than 127) serving as "lead bytes". Instead of encoding characters in their own right, lead bytes can be mapped to a character only in conjunction with a "trail byte".
Some legacy protocols require the use of SBCS and DBCS code pages. Each SBCS/DBCS code page supports a different subset of characters, differently encoded, and no code page supports the full breadth of characters provided by Unicode. Data converted from one SBCS or DBCS code page to another is subject to corruption, because the same data value on different code pages can encode a different character. Data converted from Unicode to SBCS or DBCS is subject to data loss, because a given code page might not be able to represent every character used in that particular Unicode data.
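The lead-byte scheme of DBCS pages and the two conversion hazards described in this section (corruption between code pages, and data loss from Unicode) can be illustrated with a short Python sketch. It uses Python's built-in codecs `cp1252`, `cp437`, and `cp932` as stand-ins for the corresponding Windows, OEM, and Japanese DBCS code pages. The helper names (`is_cp932_lead_byte`, `split_cp932`, `reinterpret`, `lossy_encode`) are hypothetical, and the lead-byte ranges for code page 932 are an assumption of the sketch rather than something stated here.

```python
# Lead-byte ranges for code page 932 (Shift-JIS). Assumed for this sketch;
# other DBCS code pages use different ranges.
CP932_LEAD_RANGES = ((0x81, 0x9F), (0xE0, 0xFC))


def is_cp932_lead_byte(value: int) -> bool:
    """Return True if the byte value starts a two-byte CP932 character."""
    return any(low <= value <= high for low, high in CP932_LEAD_RANGES)


def split_cp932(data: bytes) -> list:
    """Split CP932 bytes into per-character chunks of one or two bytes."""
    chunks = []
    i = 0
    while i < len(data):
        # A lead byte only has meaning together with the trail byte after it.
        width = 2 if is_cp932_lead_byte(data[i]) else 1
        chunks.append(data[i:i + width])
        i += width
    return chunks


def reinterpret(data: bytes, source_page: str, wrong_page: str) -> tuple:
    """Decode the same bytes with the right and the wrong code page."""
    return data.decode(source_page), data.decode(wrong_page)


def lossy_encode(text: str, page: str) -> bytes:
    """Encode Unicode text, replacing unrepresentable characters with '?'."""
    return text.encode(page, errors="replace")


if __name__ == "__main__":
    # One single-byte character followed by two double-byte characters.
    print(split_cp932("A日本".encode("cp932")))

    # Corruption: byte 0xE9 is a different character on each code page.
    print(reinterpret(b"caf\xe9", "cp1252", "cp437"))

    # Data loss: code page 1252 has no encoding for these characters.
    print(lossy_encode("日本", "cp1252"))
```

Note that a trail byte can fall in the ASCII range even though lead bytes are always greater than 127, which is why DBCS text has to be scanned from a known character boundary rather than byte by byte.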
Originally, Windows code page 1252, the code page commonly used for English and other Western European languages, was based on an American National Standards Institute (ANSI) draft. That draft eventually became ISO 8859-1, but Windows code page 1252 was implemented before the standard became final, and is not exactly the same as ISO 8859-1.
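To see where code page 1252 and ISO 8859-1 diverge, the following Python sketch compares the two byte by byte. It assumes Python's `cp1252` and `latin-1` codecs mirror the Windows and ISO mappings. The function name `differing_byte_values` is hypothetical.

```python
def differing_byte_values() -> list:
    """Return the byte values that decode differently under cp1252 and latin-1."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        # Python's cp1252 codec leaves a few byte values undefined;
        # errors="replace" maps those to U+FFFD instead of raising.
        as_cp1252 = raw.decode("cp1252", errors="replace")
        as_iso_8859_1 = raw.decode("latin-1")
        if as_cp1252 != as_iso_8859_1:
            diffs.append(value)
    return diffs


if __name__ == "__main__":
    diffs = differing_byte_values()
    print(f"{len(diffs)} differing values, "
          f"0x{diffs[0]:02X} through 0x{diffs[-1]:02X}")
    # Byte 0x80 is the euro sign in cp1252 but a control character in latin-1.
    print(repr(b"\x80".decode("cp1252")), repr(b"\x80".decode("latin-1")))
```

Under these codecs the differences are confined to the range 0x80 through 0x9F, where ISO 8859-1 has control characters and code page 1252 places printable characters such as the euro sign and curly quotation marks.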