Each Unicode code point can be encoded in several different formats, called Unicode transformation formats (UTFs). For example, the letter M is the Unicode code point U+004D. In UTF-8, this code point is represented as X'4D'. In UTF-16, it can be represented as X'004D'.
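As an illustration of the example above, the following minimal Python sketch encodes the letter M with several of Python's built-in codecs and prints the bytes in the same X'..' hexadecimal notation. The codec names (`utf-16-be` and so on) are Python's spellings, and the variable names are chosen for this example only.

```python
# Encode the letter M (U+004D) in several UTFs and show the bytes as hex.
ch = "M"
assert ord(ch) == 0x004D

# Explicit big-endian/little-endian codecs, so that no BOM is emitted.
encodings = ["utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"]
hex_forms = {name: ch.encode(name).hex().upper() for name in encodings}

for name, hex_form in hex_forms.items():
    print(f"{name:10} X'{hex_form}'")
# utf-8      X'4D'
# utf-16-be  X'004D'
# utf-16-le  X'4D00'
# utf-32-be  X'0000004D'
# utf-32-le  X'4D000000'
```

The X'004D' form in the text corresponds to the big-endian byte order; in little-endian order the same code unit is stored as X'4D00'.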

Name                        UTF-8    UTF-16   UTF-16BE    UTF-16LE       UTF-32   UTF-32BE    UTF-32LE
Smallest code point         0000     0000     0000        0000           0000     0000        0000
Largest code point          10FFFF   10FFFF   10FFFF      10FFFF         10FFFF   10FFFF      10FFFF
Code unit size              8 bits   16 bits  16 bits     16 bits        32 bits  32 bits     32 bits
Byte order                  N/A      BOM      big-endian  little-endian  BOM      big-endian  little-endian
Fewest bytes per character  1        2        2           2              4        4           4
Most bytes per character    4        4        4           4              4        4           4
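The byte counts in the table can be checked with a short Python sketch. The sample code points and the helper name `byte_lengths` are chosen for this example; the explicit big-endian codecs are used so that no BOM is counted.

```python
# Count the bytes needed to encode one character in each UTF family.
def byte_lengths(ch):
    """Return the encoded length in bytes for UTF-8, UTF-16 and UTF-32."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-be", "utf-32-be")}

samples = {
    "U+004D": "\u004D",        # LATIN CAPITAL LETTER M
    "U+00E9": "\u00E9",        # LATIN SMALL LETTER E WITH ACUTE
    "U+20AC": "\u20AC",        # EURO SIGN
    "U+10FFFF": "\U0010FFFF",  # largest code point
}

for label, ch in samples.items():
    print(label, byte_lengths(ch))
# U+004D {'utf-8': 1, 'utf-16-be': 2, 'utf-32-be': 4}
# U+00E9 {'utf-8': 2, 'utf-16-be': 2, 'utf-32-be': 4}
# U+20AC {'utf-8': 3, 'utf-16-be': 2, 'utf-32-be': 4}
# U+10FFFF {'utf-8': 4, 'utf-16-be': 4, 'utf-32-be': 4}
```

UTF-8 ranges from 1 to 4 bytes, UTF-16 uses 2 or 4 bytes (a surrogate pair for code points above U+FFFF), and UTF-32 always uses 4 bytes.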

BOM

  • Byte Order Mark
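For UTF-16 and UTF-32 without an explicit BE or LE suffix, the byte order is signalled by a BOM, which is the code point U+FEFF encoded at the start of the data. The following is a minimal sketch of how a reader might detect it, using the BOM constants from Python's standard `codecs` module. The function name `detect_byte_order` is hypothetical, and the sketch assumes the data is UTF-16 or UTF-32.

```python
import codecs

# Check the 4-byte UTF-32 BOMs first: the UTF-32LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]

def detect_byte_order(data: bytes):
    """Return (codec name, data without BOM), or (None, data) if no BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data

# Usage: a UTF-16 stream in big-endian order carrying the letter M.
stream = codecs.BOM_UTF16_BE + "M".encode("utf-16-be")
codec, payload = detect_byte_order(stream)
print(codec, payload.decode(codec))  # utf-16-be M
```

A BOM alone cannot settle every case: UTF-16LE data whose first character after the BOM is U+0000 starts with the same four bytes as the UTF-32LE BOM, so a real reader usually also relies on a declared encoding.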