Algorithms Design | Clanguage | Index
01000010 01111001 01110100 01100101 01110011 00101100 0010000001101110 01110101 01101101 01100010 01100101 01110010 0111001100101100 00100000 01100011 01101000 01100001 01110010 01100001 01100011 01110100 01100101 01110011 01110011 00001010
The memory of any computer is a sequence of bytes.Each byte is a sequence of 8bits (binary digits)and therefore has 28 = 256 possible values:
000000000000000100000010⋮1111111011111111
A sequence of consecutive bytes in memory can be interpreted in three different ways:
- as a natural number (0, 1, 2, 3, … ),
- as an integer ( … , −2, −1, 0, +1, +2, … )or
- as a character (A, B, C, … ,a, b, c, … , &, +, -, *, … ) .
This chapter discusses these three interpretations.
Table of contents:
- Natural numbers and binary notation
- Integers and two's complement notation
- Characters and the ASCII table
- Questions and answers
Natural numbers and binary notation
Every sequence of bits can be seen as a natural numberin binary notation:the number is the sum of the powers of2 that correspond to the 1 bits.For example,the sequence 1101 represents the number23 + 22 +20,which is equal to13.The sequence 1111represents23 + 22 +21 + 20,which is equal to15.
Every sequence of s bytes—that is, 8s bits—represents a natural number in the closed interval
0 . . 28s−1.
If s = 1, for example,the interval goes from 0 to 28−1,i.e., from 0to255.If s = 2, the interval goes up to 216−1,i.e., 65535.If s = 4, the interval goes up to 232−1,i.e., 4294967295.
Example.In order to make the example fit on the page,we take s = 1and pretend that each byte has only4 bits.A sequence of 4 bits represents, in binary notation,a number in the interval 0 . . 24−1:
byte | number |
---|---|
0000 | 0 |
0001 | 1 |
0010 | 2 |
0011 | 3 |
0100 | 4 |
0101 | 5 |
0110 | 6 |
0111 | 7 |
1000 | 8 |
1001 | 9 |
1010 | 10 |
1011 | 11 |
1100 | 12 |
1101 | 13 |
1110 | 14 |
1111 | 15 |
Exercises 1
- Show that every natural number can be written in binary notation.
- Show that 2k + 2k−1 +… +21 + 20 =2k+1−1,for any natural numberk.
- Write the numbers 28,28−1,216,216−1,232, and232−1in hexadecimal notation.
Integers and two's complement notation
Let s be a nonnull natural number.Every sequence of s bytes—that is, 8s bits—can be interpreted as an integer in the closed interval
−28s−1. . 28s−1−1 .
Ifs = 1, for example,this interval goes from −27 to27−1,i.e., from −128 a127.Ifs = 2,the interval goes from−215 to215−1,i.e., from −32768 to 32767.If s = 4,the interval goes from−231 to231−1,i.e., from −2147483648 to 2147483647.
What integer does a given sequence of 8s bits represent?Begin by interpreting the sequence as a natural numberin binary notation.Let's say that this number isn.If the first bit of the sequence is0,the sequence represents the positive integern.If the first bit is1,the sequence represents the strictly negative integern − 28s.This way of representing integersis known astwo's complement notation.
Example.For the example to fit on the page,we take s = 1and pretend that each byte has only4 bits.Any such sequence of bits represents an integer in the interval −23 . . 23−1 :
byte | integer |
---|---|
0000 | +0 |
0001 | +1 |
0010 | +2 |
0011 | +3 |
0100 | +4 |
0101 | +5 |
0110 | +6 |
0111 | +7 |
1000 | −8 |
1001 | −7 |
1010 | −6 |
1011 | −5 |
1100 | −4 |
1101 | −3 |
1110 | −2 |
1111 | −1 |
Exercises 2
- Complement of n.We have shown above how the two's complement notation transforms any sequence of s byteswhose first bit is1into a negative integer.Now consider the converse operation.Given an integern in the interval −28s−1 . . −1,show that the sequence of s bytes that represents nin two's complement notationis equal to the sequence of bytes that represents the natural number n + 28sin binary notation.
- Two's complement.The two's complement notation transformsany sequence of s byteswhose first bit is1into a negative integer.Now consider the converse operation.Suppose that n is um integer in the interval −28s−1 . . −1.Take the sequence of bits that represents the absolute value ofn in binary notation;complement all the bits(that is, change 0s into1sand vice versa)and add1, in binary, to the result.Show that this operation produces the sequence of s bitsthat represents n in two's complement notation.
- An alternative for two's complement?Suppose, as we did in the example above, that we have only 4 bits to represent integers.Now consider the following interpretation of such a sequence of 4bits.Let n be the positive integer represented by the last three bits in binary notation.If the first bit is0,then the whole sequence represents the positive integern.If the first bit is1,then the whole sequence represents the negative integer−n.(For example, the sequence 1101 represents−5.)Discuss the disadvantages of representing integers in this way.
- Write the numbers 27,27−1,215,215−1,231, and231−1in hexadecimal notation.
Characters and the ASCII table
A character is any typographic symbol(letter, digit, punctuation mark, and soon).Examples of characters:@,A, B, C, a, b, c, +, -, *, /, =, £, À, ñ, ó, ≤,≠ .(Do not confuse the idea of character with the char typeof the C language.)
In this chapter, we consider only the small set of 128 characters known as the ASCII alphabet.This set includes the characters
! " # $ % & ' ( ) * + , - . /0 1 2 3 4 5 6 7 8 9 : ; < = > ? @A B C D E F G H I J K L M N O P Q R S T U V W X Y Z[ \ ] ^ _ `a b c d e f g h i j k l m n o p q r s t u v w x y z{ | } ~
(the first character is a blank space)and a few others.
Every bytewhose first bit is 0represents a character in the ASCII alphabet.The correspondence between bytes and characters is defined by the ASCII table.Here is a small sample of that table:
byte | character |
---|---|
00111111 | ? |
01000000 | @ |
01000001 | A |
01000010 | B |
01000011 | C |
01100001 | a |
01100010 | b |
01100011 | c |
01111110 | ~ |
We use verbal shortcuts to refer to ASCII characters.For example, rather than sayingthecharacterA
we can saythecharacter65
,since the byte that corresponds toAin the ASCII table is the representation of 65 in binary notation.
Control characters.Besides the ninety-five normal
characters, the ASCII alphabet containsthirty-three control characters.These characters are not typographic symbols like the others and therefore are indicated by a special notation:a backslash followed by a digit or a letter.Here are the most used control characters:
byte | character | name |
---|---|---|
00000000 | \0 | null character |
00001001 | \t | horizontal tabulation (tab) |
00001010 | \n | end of line (newline) |
00001011 | \v | vertical tabulation |
00001100 | \f | end of page (new page) |
00001101 | \r | carriage return |
The character \0is used to mark the end of a stringand takes no space when displayed;the character \nsignals the end of a line of textand produces a jump to a new line when displayed;the character \f signals the end of a page;and so on.Though the space (character32) is not a control character,it can be indicated by \(backslash followed by a space).
The characters\,\t,\n,\v, \f, and\rare collectively known aswhite-spaces.Many functions of the standard libraries treat all the white-space charactersas if they were spaces.
Non-ASCII characters.If you only use English,the ASCII alphabet is likely all you need.However,you should be aware that the ASCII alphabet lacksmany letters from other languages,for example letters with diacritics such asÀ,ñ,ó,etc.,and special symbols such as £,≤, ≠,etc.Each of these charactersis represented by two or more consecutive bytesin a coding scheme known as UTF-8.More about this will be said in chapterStrings and character chainsand in chapterUnicode and UTF-8.
Exercises 3
- What bytes represent the charactersO,o,0 and \0 ?
- Write the bytes01000001,01000010, and01000011in hexadecimal notation.
- Write, in decimal notation,the sequence of bytes that represents the text
A byte has 8 bits.
. - Consider the bytes that represent the natural numbers 3965 39 32 105 115 54 53 10 39 97 39 32 105 115 5755in binary notation.What is the sequence of characters represented by this sequence of bytes?
- The epigraph at the top of this page is a sequence of ASCII characters.Decode the epigraph.
- Byte inspection.Study the documentation of the programs od and hexdump(the names are shorthands of octal dump and hexadecimal dump).These utilities display, byte-by-byte, the contents of any file.
Questions and answers
- Question:The ASCII alphabet uses only the bytes whose first bit is0.Why not use the bytes whose first bit is1to represent letters with diacritics and some special symbols?
Answer:The ISO-LATIN-1 table(also known as ISO-8859-1)does exactly this.But that table is not much used nowadays.
Updated 2019-01-30
https://www.ime.usp.br/~pf/algorithms/
© Paulo Feofiloff
CS-IME-USP