Preface | p. xi |
Acknowledgments | p. xiii |
Notation | p. xv |
Integer Addition | p. 1 |
Background | p. 1 |
Ripple Adders; Manchester Carry Chain | p. 2 |
Carry Skip Adders; Multilevel Carry Skip | p. 2 |
Carry-Select and Conditional-Sum Adders | p. 3 |
Carry Lookahead Adders; Canonic Adders | p. 4 |
Ling Adders | p. 6 |
Adder Implementations | p. 7 |
An ECL Ling Adder | p. 10 |
Group Generates | p. 10 |
Lookahead Network | p. 10 |
Final Sum | p. 13 |
Critical Path | p. 14 |
Implementation | p. 14 |
A CMOS Ling Adder | p. 14 |
Group Generates | p. 16 |
Lookahead Network | p. 17 |
Final Sum | p. 18 |
Critical Path | p. 19 |
Implementation | p. 20 |
Conclusion | p. 21 |
Floating-Point Addition | p. 23 |
Improved Algorithms for High-Speed FP Addition | p. 23 |
A Brief Review of FP Addition Algorithms (A1 and A2) | p. 24 |
A New Algorithm: A3 (Two Path with Integrated Rounding) | p. 27 |
Summary | p. 33 |
Variable-Latency FP Addition | p. 33 |
Variable-Latency Algorithm | p. 34 |
Two-Cycle Algorithm | p. 34 |
One-Cycle Algorithm | p. 36 |
Performance Results | p. 38 |
Conclusion | p. 42 |
Multiplication with Partially Redundant Multiples | p. 43 |
Introduction | p. 43 |
Background | p. 43 |
Add and Shift | p. 43 |
Dot Diagrams | p. 45 |
Booth's Algorithm | p. 46 |
Booth 3 | p. 47 |
Booth 4 and Higher | p. 50 |
Redundant Booth | p. 50 |
Booth 3 with Fully Redundant Partial Products | p. 50 |
Booth 3 with Partially Redundant Partial Products | p. 52 |
Dealing with Negative Partial Products | p. 52 |
Booth with Bias | p. 53 |
Choosing the Right Constant | p. 55 |
Producing the Multiples | p. 57 |
Redundant Booth 3 | p. 57 |
Redundant Booth 4 | p. 58 |
Choosing the Adder Length | p. 63 |
Conclusion | p. 63 |
Multiplier Topologies | p. 65 |
Review of Issues in Partial-Product Summation | p. 66 |
Regular Topologies | p. 68 |
Array Topologies | p. 69 |
Tree Topologies | p. 75 |
Effects of the Number of Tracks per Channel | p. 85 |
Irregular Topologies | p. 89 |
Wallace Tree | p. 89 |
Algorithmic Generation | p. 89 |
Conclusion | p. 99 |
Technology Scaling Effects on Multipliers | p. 101 |
Effects of Smaller Feature Sizes | p. 101 |
Wire Effects | p. 102 |
Binary Trees vs Procedural Layouts | p. 106 |
Scaling Effects on Encoding Schemes | p. 109 |
Topology | p. 110 |
Area × Time Product | p. 112 |
Pipelining | p. 113 |
Power | p. 114 |
Encoding Schemes | p. 115 |
Topology | p. 115 |
Conclusion | p. 116 |
Design Issues in Division | p. 117 |
Introduction | p. 117 |
System Level Study | p. 118 |
Instrumentation | p. 118 |
Method of Analysis | p. 119 |
Results | p. 120 |
Instruction Mix | p. 120 |
Compiler Effects | p. 120 |
Performance and Area Tradeoffs | p. 122 |
Shared-Multiplier Effects | p. 125 |
Shared Square Root | p. 127 |
On-the-Fly Rounding and Conversion | p. 128 |
Consumers of Division Results | p. 129 |
Conclusion | p. 130 |
Minimizing the Complexity of SRT Tables | p. 133 |
Theory of SRT Division | p. 134 |
Recurrence | p. 134 |
Choice of Radix | p. 135 |
Choice of Quotient Digit-Set | p. 135 |
Implementing SRT Tables | p. 138 |
Divisor and Partial-Remainder Estimates | p. 138 |
Uncertainty Regions | p. 139 |
Reducing Table Complexity | p. 140 |
Experimental Methodology | p. 143 |
TableGen | p. 143 |
Table Synthesis | p. 145 |
Results | p. 145 |
Same-Radix Tradeoffs | p. 145 |
Higher Radices | p. 147 |
Conclusion | p. 150 |
Very High-Radix Division | p. 153 |
Taylor Series Expansion | p. 153 |
Algorithm A | p. 154 |
Number of Accurate Bits per Iteration | p. 157 |
Representing X Using Redundancy | p. 161 |
Algorithm B | p. 162 |
Number of Accurate Bits per Iteration | p. 164 |
Representing X Using Redundancy | p. 170 |
Algorithm C | p. 171 |
Theory | p. 171 |
Lookup-Table Construction | p. 173 |
Booth Recoding | p. 173 |
Error Analysis | p. 175 |
Optimization Techniques | p. 175 |
Discussion | p. 178 |
Related Algorithms | p. 179 |
Cyrix Short-Reciprocal Algorithm | p. 179 |
Comparison with the Newton-Raphson Method | p. 179 |
Comparison with the IBM RISC System/6000 | p. 181 |
Comparison with MacLaurin Series | p. 181 |
Conclusion | p. 182 |
Using a Multiplier for Function Approximation | p. 183 |
Proposed Method: Implementation | p. 183 |
Partial-Product Array | p. 183 |
Related Work | p. 186 |
Implementation | p. 187 |
Summary of Implementation | p. 192 |
Proposed Method: Derivation | p. 192 |
Algorithm 1: Describing an Operation as a Signed PPA | p. 192 |
Algorithm 2: Adapting the Signed PPA to the Multiplier's PPA | p. 200 |
Performance and Comparisons | p. 203 |
Summary of Derivation | p. 207 |
Reciprocal, Division, and Square Root | p. 207 |
Reciprocal Operation | p. 208 |
Division Operation | p. 213 |
Square-Root Operation | p. 220 |
Conclusion | p. 230 |
FUPA | p. 235 |
Introduction | p. 235 |
Background | p. 237 |
Components of FUPA | p. 237 |
Technology Scaling | p. 238 |
Latency Component of FUPA | p. 239 |
Derivation of Delay Scale Factor | p. 239 |
Area Component of FUPA | p. 240 |
Derivation of Area Scale Factor | p. 240 |
Relationship between Area, Operating Frequency, and Power | p. 241 |
Application Profiling | p. 242 |
Computation of FUPA | p. 243 |
Microprocessor FPU Comparisons | p. 244 |
Effective and Normalized Effective Latency of Microprocessor FPUs | p. 244 |
Die-Area and Normalized-Die-Area Usage of Microprocessor FPUs | p. 246 |
FUPA of Microprocessor FPUs | p. 247 |
Limitations of FUPA | p. 249 |
Conclusion | p. 249 |
High-Speed Clocking Using Wave Pipelining | p. 251 |
Background | p. 251 |
Pipelining and Wave Pipelining | p. 252 |
Wave Pipeline Research | p. 253 |
Theory | p. 254 |
Minimum Clock Period for Traditional Pipelines | p. 254 |
Minimum Clock Period for Wave Pipelines | p. 256 |
Device Technologies: Applicability and Performance | p. 259 |
Performance Limits of Wave Pipelining | p. 260 |
Path-Length Imbalance | p. 261 |
Data Dependences | p. 261 |
Fabrication Process | p. 262 |
Environmental Variation | p. 263 |
CMOS Process and Environmental Performance Limits | p. 264 |
Design Optimizations | p. 265 |
Rough Tuning | p. 266 |
Fine Tuning | p. 266 |
CMOS Delay Compensation | p. 270 |
SNAP Wave-Pipeline Demonstration VLSI | p. 273 |
Bipolar Population Counter | p. 273 |
CMOS Multiplier Circuit | p. 273 |
CMOS VLSI Vector Unit | p. 274 |
Conclusion | p. 283 |
Rational Arithmetic | p. 285 |
Introduction | p. 285 |
Continued Fractions | p. 287 |
The M-log-Fraction Transformation | p. 290 |
The Signed-Digit M-log Fraction | p. 292 |
A Rational-Arithmetic Unit | p. 293 |
Linear Fractional Transformation | p. 297 |
Quadratic Transformations | p. 298 |
A Shift-and-Add-Based Rational-Arithmetic Unit | p. 298 |
Implementing the Bilinear Function (ax + b)/(cx + d) | p. 299 |
VLSI Implementation of Rational Arithmetic Units | p. 301 |
Higher-Radix Rational Arithmetic | p. 302 |
Related Work | p. 304 |
Conclusions | p. 306 |
Historical Notes on Continued Fractions in Arithmetic | p. 306 |
Bibliography | p. 309 |
Index | p. 321 |