Lecture #5 Floating Point Arithmetic
· Optimizing Integer Arithmetic
Ripple Adders use a series of 1-bit full adders to create a multi-bit adder. Ripple addition has a long propagation delay due to the carry being dependant on each previous stage.
· Floating Point Number
numbers with fractions, e.g., 3.1416
very small numbers, e.g., .000000001
very large numbers, e.g., 3.15576 x 109
·
Floating
Point Representation:
sign, exponent, significand:
(1)sign x significand
x 2exponent
more bits for significand gives more accuracy
more bits for exponent increases range
· Optimizing Integer Arithmetic
single precision: 8 bit exponent,
23 bit significand
double precision: 11 bit exponent,
52 bit significand
Leading 1 bit of significand is implicit
Exponent is biased to make sorting easier
all 0s
is smallest exponent all 1s is largest
bias of
127 for single precision and 1023 for double precision
summary: (1)sign ΄ (1+significand) ΄ 2exponent bias
· Example
decimal: -.75 = - ( ½ + Ό )
binary: -.11 = -1.1 x 2-1
floating point: exponent = 126 =
01111110
IEEE single precision: 1 01111110 10000000000000000000000
· Floating Point Addition

