Character encoding

Computers can only work with 1s and 0s, as we should already know. To represent text, we need a way of mapping each letter to a binary number. This is known as character encoding.

There are three main methods you need to be aware of: EBCDIC, ASCII and Unicode.

EBCDIC is now rarely used outside IBM mainframe systems. ASCII is the primary character set for most Western machines, while Unicode is becoming the code of choice because its extended character set covers languages such as Arabic, Chinese and Japanese, some of which use many thousands of characters.
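
To make the difference between the three concrete, the short Python sketch below looks up the number each one assigns to a character. It assumes Python's built-in `cp037` codec as a stand-in for EBCDIC (there are several EBCDIC variants), and the example characters are chosen purely for illustration.

```python
# Compare the numbers three character sets assign to the same letter.
# Assumption: "cp037" (EBCDIC US/Canada) stands in for EBCDIC generally.

letter = "A"

ascii_code = letter.encode("ascii")[0]    # ASCII value of 'A'
ebcdic_code = letter.encode("cp037")[0]   # EBCDIC (code page 037) value of 'A'
unicode_code = ord(letter)                # Unicode code point of 'A'

print(ascii_code, ebcdic_code, unicode_code)  # 65 193 65

# Unicode also covers characters that ASCII cannot represent,
# for example the Japanese hiragana letter "a" (U+3042).
hiragana_a = "\u3042"
print(ord(hiragana_a))  # 12354
```

ASCII and Unicode agree on the letter A, while EBCDIC uses a different number for it, so text has to be read with the same encoding it was written in.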

Before you look at these in detail, consider a simple example:

A  1     G  7     M 13     S 19     Y 25
B  2     H  8     N 14     T 20     Z 26
C  3     I  9     O 15     U 21
D  4     J 10     P 16     V 22
E  5     K 11     Q 17     W 23
F  6     L 12     R 18     X 24

We can then encode the text "Hello" as:

H   E   L   L   O
8   5   12  12  15
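
The lookup above can be sketched in a few lines of Python. The names `LETTER_TO_NUMBER` and `encode` are made up for this illustration, and the sketch assumes the text contains letters only, since the table has no codes for spaces or punctuation.

```python
import string

# Build the simple table: A -> 1, B -> 2, ..., Z -> 26.
LETTER_TO_NUMBER = {
    letter: number
    for number, letter in enumerate(string.ascii_uppercase, start=1)
}


def encode(text):
    """Return the list of code numbers for the letters in text.

    Letters are upper-cased first because the table has capitals only;
    any character that is not in the table raises KeyError.
    """
    return [LETTER_TO_NUMBER[ch] for ch in text.upper()]


print(encode("Hello"))  # [8, 5, 12, 12, 15]
```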

So 8, 5, 12, 12, 15 means hello. Now that we have numbers, it is a simple task to convert them into binary. As we have 26 possible values, the smallest number of bits we need is 5: 4 bits give only 16 different patterns, which is too few, while 5 bits give 32.

H      E      L      L      O
01000  00101  01100  01100  01111

So 01000 00101 01100 01100 01111 is hello.
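
As a rough sketch, the same conversion can be done in Python. The names `to_binary` and `from_binary` are hypothetical helpers written for this example; the decoder is an addition that simply reverses the steps to show that the bits carry enough information to recover the letters.

```python
NUM_VALUES = 26

# The largest code is 26, which is 11010 in binary, so 5 bits are enough.
BITS = NUM_VALUES.bit_length()


def to_binary(numbers, bits=BITS):
    """Convert each code number to a fixed-width binary string."""
    return [format(n, f"0{bits}b") for n in numbers]


def from_binary(groups):
    """Convert binary strings back to letters (1 -> A, ..., 26 -> Z)."""
    return "".join(chr(ord("A") + int(g, 2) - 1) for g in groups)


codes = [8, 5, 12, 12, 15]
groups = to_binary(codes)

print(BITS)                 # 5
print(" ".join(groups))     # 01000 00101 01100 01100 01111
print(from_binary(groups))  # HELLO
```

Because this simple code has capital letters only, decoding gives back HELLO rather than Hello; the original case is not stored.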