Character encoding
Computers only understand 0s and 1s. Numbers can be represented in base 2, but what about characters? We need an agreed-upon mapping between characters and numeric values.
ASCII encoding works like this: each character is assigned a numeric code, stored as binary. One character -> 1 byte. ASCII itself uses 7 bits (128 symbols), and even the 8-bit extensions top out at 256 symbols, so it only covers English and a small set of extra symbols.
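A minimal Python sketch of that character-to-number mapping (the sample characters are just illustrative):

```python
# Each ASCII character maps to one numeric code that fits in a single byte.
text = "Hi"
print([ord(ch) for ch in text])      # [72, 105] -- the numeric codes
print(text.encode("ascii"))          # b'Hi' -- exactly 2 bytes
print(format(ord("H"), "08b"))       # 01001000 -- the byte as stored

# Characters outside ASCII's 128-symbol range cannot be encoded:
# "é".encode("ascii") raises UnicodeEncodeError
```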
UTF-32
One code point -> 4 bytes, always. That is wasteful: a character that would fit in a single byte (e.g. an English letter) still gets 4 bytes.
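A quick Python sketch of that fixed cost (using the BOM-less "utf-32-le" codec so the counts aren't inflated by a byte-order mark):

```python
# UTF-32 spends 4 bytes per code point, no matter the character.
print(len("A".encode("utf-32-le")))      # 4 bytes for one English letter
print(len("hello".encode("utf-32-le")))  # 20 bytes for 5 letters
print(len("語".encode("utf-32-le")))     # 4 bytes -- same cost for any character
```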
UTF-8 | The most widely adopted encoding scheme
UTF-8 solves the issues with UTF-32: it maps each code point to 1 to 4 bytes.
Code points with lower values are mapped to 1 byte while larger ones get more. English letters keep the exact same single-byte values they have in ASCII, so UTF-8 is backward compatible: any valid ASCII file is also valid UTF-8. English text therefore stays compact while other scripts use more bytes per character.
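A short Python sketch of the variable byte lengths and the ASCII compatibility (again, the sample characters are just illustrative):

```python
# UTF-8 byte lengths grow with the code point's value.
for ch in ["A", "é", "語", "🚀"]:
    encoded = ch.encode("utf-8")
    print(ch, len(encoded), "byte(s):", list(encoded))
# A  1 byte(s): [65]            -- identical to its ASCII value
# é  2 byte(s): [195, 169]
# 語 3 byte(s): [232, 170, 158]
# 🚀 4 byte(s): [240, 159, 154, 128]

# Backward compatibility: pure ASCII bytes decode as UTF-8 unchanged.
ascii_bytes = "hello".encode("ascii")
print(ascii_bytes.decode("utf-8"))   # hello
```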