A BOM can indicate that unlabeled text is encoded in UTF-16 or UTF-8. It can also indicate the byte order of UTF-16 text, whether or not that text is labeled.
In UTF-16, a BOM at the beginning of the encoded string is the byte sequence FE FF when the characters that follow it use big-endian byte order, or the byte sequence FF FE when they use little-endian order.
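As an illustration, the following minimal Python sketch shows that the BOM is the character U+FEFF, whose UTF-16 encoding comes out as FE FF or FF FE depending on byte order. It also shows how a reader might check for it. The function name `utf16_byte_order` is hypothetical. The sketch assumes the input is already known to be UTF-16 and does not try to tell it apart from other encodings.

```python
import codecs
from typing import Optional

# The BOM is the character U+FEFF; its UTF-16 bytes depend on byte order.
assert "\ufeff".encode("utf-16-be") == b"\xfe\xff"
assert "\ufeff".encode("utf-16-le") == b"\xff\xfe"


def utf16_byte_order(data: bytes) -> Optional[str]:
    """Hypothetical helper: report the byte order signalled by a leading UTF-16 BOM."""
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF -> big-endian
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE -> little-endian
        return "little"
    return None  # no BOM: byte order must come from a label or be assumed


sample = "\ufeffHi".encode("utf-16-le")
print(utf16_byte_order(sample))        # little
print(sample[2:].decode("utf-16-le"))  # Hi
```

Python's generic `"utf-16"` codec also reads the BOM itself when decoding and drops it from the result. The explicit check above just makes that step visible.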
UTF-8 text can also begin with a BOM, although this is rare. UTF-8 has a fixed byte order, so there is no byte order to signal. UTF-8 is also often assumed or implicit, so the text rarely needs a signature. The UTF-8 representation of the BOM is the byte sequence EF BB BF.
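The following minimal Python sketch confirms that encoding U+FEFF as UTF-8 yields EF BB BF. It also shows one way to discard such a signature before decoding. The helper name `strip_utf8_bom` is hypothetical. The `"utf-8-sig"` codec used for comparison is part of Python's standard library.

```python
import codecs

# The UTF-8 encoding of U+FEFF is the three bytes EF BB BF.
assert "\ufeff".encode("utf-8") == codecs.BOM_UTF8 == b"\xef\xbb\xbf"


def strip_utf8_bom(data: bytes) -> bytes:
    """Hypothetical helper: drop a leading UTF-8 BOM if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


raw = b"\xef\xbb\xbfhello"
print(strip_utf8_bom(raw).decode("utf-8"))  # hello
print(raw.decode("utf-8-sig"))              # hello (codec strips the BOM itself)
```

Plain `"utf-8"` decoding would keep the BOM as a leading U+FEFF character. This is why readers that accept BOM-prefixed UTF-8 typically strip it explicitly or use a BOM-aware decoder.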