Full text of "intel :: pentium :: 1993 Intel Pentium Processor Users Manual Volume 3"


Pentium Processor User's Manual 

Volume 3: Architecture and Programming Manual 





Founded in 1968 to pursue the integration of large numbers of 
transistors onto tiny silicon chips, Intel's history has been marked by a 
remarkable number of scientific breakthroughs and innovations. In 1971, 
Intel introduced the 4004, the first microprocessor. Containing 2300 
transistors, this first commercially available computer-on-a-chip is 
primitive compared with today's million-plus transistor products. 

Innovations such as the microprocessor, the erasable programmable 
read-only memory (EPROM) and the dynamic random access memory 
(DRAM) revolutionized electronics by making integrated circuits the 
mainstay of both consumer and business computing products. 

Over the last two-and-a-half decades, Intel's business has evolved 
and today the company's focus is on delivering an extensive line of 
component, module and system-level building block products to the 
computer industry. The company's product line covers a broad spectrum, 
and includes microprocessors, flash memory, microcontrollers, a broad 
line of PC enhancement and local area network products, multimedia 
technology products, and massively parallel supercomputers. Intel's 32-bit 
X86 architecture, represented by the Intel386™ and Intel486™ 
microprocessor families, is the de facto standard of modern business 
computing in millions of PCs worldwide. 

Intel has over 26,000 employees located in offices and 
manufacturing facilities around the world. Today, Intel is the largest 
semiconductor company in the world. 



Intel 



LITERATURE 



To order Intel literature or obtain literature pricing information in the U.S. and Canada call or write Intel 
Literature Sales. In Europe and other international locations, please contact your local sales office or 
distributor. 



INTEL LITERATURE SALES 
P.O. Box 7641 

Mt. Prospect, IL 60056-7641 

CURRENT HANDBOOKS 



In the U.S. and Canada 
call toll free 
(800) 548-4725 

This 800 number is for external customers only.



Product line handbooks contain data sheets, application notes, article reprints, and other design 
information. All handbooks can be ordered individually, and most are available in a pre-packaged set in the 
U.S. and Canada. 



Title                                                 Intel Order Number  ISBN

SET OF TWELVE HANDBOOKS                               231003              N/A
(Available in U.S. and Canada)

CONTENTS LISTED BELOW FOR INDIVIDUAL ORDERING:

CONNECTIVITY                                          231658              1-55512-174-8
EMBEDDED APPLICATIONS (1993/94)                       270648              1-55512-179-9
EMBEDDED MICROCONTROLLERS & PROCESSORS                270645              1-55512-176-4
(2 volume set)
MEMORY PRODUCTS                                       210830              1-55512-172-1
MICROCOMPUTER PRODUCTS                                280407              1-55512-173-X
MICROPROCESSORS (2 volume set)                        230843              1-55512-169-1
MOBILE COMPUTER PRODUCTS                              241420              1-55512-186-1
i750®, i860™, i960® PROCESSORS AND RELATED PRODUCTS   272084              1-55512-185-3
PACKAGING                                             240800              1-55512-182-9
PERIPHERAL COMPONENTS                                 296467              1-55512-181-0
PRODUCT OVERVIEW                                      210846              N/A
(A guide to Intel Architectures and Applications)
PROGRAMMABLE LOGIC                                    296083              1-55512-180-2



ADDITIONAL LITERATURE:
(Not included in handbook set)

AUTOMOTIVE                                            231792              1-55512-125-X
COMPONENTS QUALITY/RELIABILITY                        210997              1-55512-132-2
CUSTOMER LITERATURE GUIDE                             210620              N/A
INTERNATIONAL LITERATURE GUIDE                        E00029              N/A
(Available in Europe only)
MILITARY AND SPECIAL PRODUCTS (2 volume set)          210461              1-55512-189-6
SYSTEMS QUALITY/RELIABILITY                           231762              1-55512-091-9
HANDBOOK DIRECTORY                                    241197              N/A
(Index of all data sheets contained in the handbooks)



LITINCOV-W/111193 



Pentium™ Processor 
User's Manual 

Volume 3: 

Architecture and Programming Manual 




Intel Corporation makes no warranty for the use of its products and assumes no responsibility for any errors which may 
appear in this document nor does it make a commitment to update the information contained herein. 

Intel retains the right to make changes to these specifications at any time, without notice. 

Contact your local sales office to obtain the latest specifications before placing your order. 

The following are trademarks of Intel Corporation and may only be used to identify Intel products: 



376™ 


i860™ 


MCS® 


Above™ 


i960® 


Media Mail™ 


ActionMedia® 


Intel287™ 


NetPort® 


BITBUS™ 


Intel386™ 


NetSentry™ 


Code Builder™ 


Intel387™ 


NetSight™ 


DeskWare™ 


Intel486™ 


OpenNET™ 


Digital Studio™ 


Intel487™ 


OverDrive™ 


DVI® 


Intel® 


Paragon™ 


EtherExpress™ 


intel inside® 


Pentium™ 


ETOX™ 


Intellec® 


ProSolver™ 


ExCA™ 


iPSC®


RapidCAD™ 


Exchange and Go™ 


iRMX® 


READY-LAN™ 


FaxBACK® 


iSBC® 


Reference Point® 


Grand Challenge™ 


iSBX™ 


RMX/80™ 


i®


iWARP™ 


RxServer™ 


ICE™ 


LANDesk™ 


SatisFAXtion® 


iCOMP™ 


LANPrint® 


SmartWire™ 


iLBX™ 


LANProtect™ 


SnapIn386™


Inboard™ 


LANSelect® 


Storage Broker™ 


Indeo™ 


LANShell® 


StorageExpress™ 


i287™ 


LANSight™ 


SugarCube™ 


i386™ 


LANSpace® 


The Computer Inside™ 


i387™ 


LANSpool® 


Token Express™ 


i486™ 


MAPNET™ 


Visual Edge™ 


i487™ 


Matched™ 


WYPIWYF® 



i750® 

MDS is an ordering code only and is not used as a product name or trademark. MDS is a registered trademark of Mohawk
Data Sciences Corporation. 

CHMOS and HMOS are patented processes of Intel Corp. 

Intel Corporation and Intel's FASTPATH are not affiliated with Kinetics, a division of Excelan, Inc. or its FASTPATH
trademark or products.

OS/2 is a registered trademark of IBM Corp. UNIX is a registered trademark of UNIX System Laboratories, Inc. Windows is a 
trademark of Microsoft Corp. 

Additional copies of this manual or other Intel literature may be obtained from: 

Intel Corporation 

Literature Sales 

P.O. Box 7641 

Mt. Prospect, IL 60056-7641 



© INTEL CORPORATION 1993



CG-010493 



TABLE OF CONTENTS 



CHAPTER 1 

GETTING STARTED Page 

1.1. HOW TO USE THIS MANUAL 1-1

1.1.1. Part I — Application and Numeric Programming 1-2

1.1.2. Part II — System Programming 1-2

1.1.3. Part III — Compatibility 1-4

1.1.4. Part IV — Optimization 1-4

1.1.5. Part V — Instruction Set 1-4

1.1.6. Appendices 1-4

1.2. RELATED LITERATURE 1-4

1.3. NOTATIONAL CONVENTIONS 1-5

1.3.1. Bit and Byte Order 1-5

1.3.2. Undefined Bits and Software Compatibility 1-6

1.3.3. Instruction Operands 1-6

1.3.4. Hexadecimal Numbers 1-7

1.3.5. Segmented Addressing 1-7

1.3.6. Exceptions 1-8 



CHAPTER 2 

INTRODUCTION TO THE INTEL PENTIUM PROCESSOR FAMILY 
PART I— APPLICATION AND NUMERIC PROGRAMMING 



CHAPTER 3 

BASIC PROGRAMMING MODEL 

3.1. MEMORY ORGANIZATION 3-1 

3.1.1. Unsegmented or "Flat" Model 3-2 

3.1.2. Segmented Model 3-2 

3.2. DATA TYPES 3-3

3.3. REGISTERS 3-8 

3.3.1. General Registers 3-8

3.3.2. Segment Registers 3-10 

3.3.3. Stack Implementation 3-12 

3.3.4. Flags Register 3-13 

3.3.4.1. STATUS FLAGS 3-13 

3.3.4.2. CONTROL FLAG 3-15 

3.3.5. Instruction Pointer 3-15 

3.4. INSTRUCTION FORMAT 3-15 

3.5. OPERAND SELECTION 3-17 

3.5.1. Immediate Operands 3-18 

3.5.2. Register Operands 3-18 

3.5.3. Memory Operands 3-19 

3.5.3.1. SEGMENT SELECTION 3-19

3.5.3.2. EFFECTIVE-ADDRESS COMPUTATION 3-20 

3.6. INTERRUPTS AND EXCEPTIONS 3-22 



CHAPTER 4 

APPLICATION PROGRAMMING

4.1. DATA MOVEMENT INSTRUCTIONS 4-1

4.1.1. General-Purpose Data Movement Instructions 4-1

4.1.2. Stack Manipulation Instructions 4-2

4.1.3. Type Conversion Instructions 4-5

4.2. BINARY ARITHMETIC INSTRUCTIONS 4-6

4.2.1. Addition and Subtraction Instructions 4-7

4.2.2. Comparison and Sign Change Instruction 4-7 

4.2.3. Multiplication Instructions 4-8 

4.2.4. Division Instructions 4-9 

4.3. DECIMAL ARITHMETIC INSTRUCTIONS 4-9 

4.3.1. Packed BCD Adjustment Instructions 4-10 

4.3.2. Unpacked BCD Adjustment Instructions 4-10 

4.4. LOGICAL INSTRUCTIONS 4-1 

4.4.1. Boolean Operation Instructions 4-11

4.4.2. Bit Test and Modify Instructions 4-11

4.4.3. Bit Scan Instructions 4-12 

4.4.4. Shift and Rotate Instructions 4-12 

4.4.4.1. SHIFT INSTRUCTIONS 4-12 

4.4.4.2. DOUBLE-SHIFT INSTRUCTIONS 4-15 

4.4.4.3. ROTATE INSTRUCTIONS 4-16 

4.4.4.4. FAST "bit blt" USING DOUBLE-SHIFT INSTRUCTIONS 4-17

4.4.4.5. FAST BIT STRING INSERT AND EXTRACT 4-18 

4.4.5. Byte-Set-On-Condition Instructions 4-21 

4.4.6. Test Instruction 4-21 

4.5. CONTROL TRANSFER INSTRUCTIONS 4-22 

4.5.1. Unconditional Transfer Instructions 4-22

4.5.1.1. JUMP INSTRUCTION 4-22

4.5.1.2. CALL INSTRUCTIONS 4-22

4.5.1.3. RETURN AND RETURN-FROM-INTERRUPT INSTRUCTIONS 4-23

4.5.2. Conditional Transfer Instructions 4-23 

4.5.2.1. CONDITIONAL JUMP INSTRUCTIONS 4-23 

4.5.2.2. LOOP INSTRUCTIONS 4-24 

4.5.2.3. EXECUTING A LOOP OR REPEAT ZERO TIMES 4-25 

4.5.3. Software Interrupts 4-25 

4.6. STRING OPERATIONS 4-26 

4.6.1. Repeat Prefixes 4-27 

4.6.2. Indexing and Direction Flag Control 4-28 

4.6.3. String Instructions 4-28 

4.7. INSTRUCTIONS FOR BLOCK-STRUCTURED LANGUAGES 4-29 

4.8. FLAG CONTROL INSTRUCTIONS 4-36 

4.8.1. Carry and Direction Flag Control Instructions 4-36

4.8.2. Flag Transfer Instructions 4-36 

4.9. NUMERIC INSTRUCTIONS 4-38 

4.10. SEGMENT REGISTER INSTRUCTIONS 4-38 

4.10.1. Segment-Register Transfer Instructions 4-39

4.10.2. Far Control Transfer Instructions 4-39 

4.10.3. Data Pointer Instructions 4-39 

4.11. MISCELLANEOUS INSTRUCTIONS 4-40 

4.11.1. Address Calculation Instruction 4-40

4.11.2. No-Operation Instruction 4-41

4.11.3. Translate Instruction 4-41

4.11.4. Byte Swap Instruction 4-41

4.11.5. Exchange-and-Add Instruction 4-43 

4.11.6. Compare-and-Exchange Instructions 4-43 

4.11.7. CPUID Instruction 4-44

CHAPTER 5 

FEATURE DETERMINATION 

5.1. CPU IDENTIFICATION 5-1

5.2. FPU DETECTION 5-1 

5.3. SAMPLE CPUID IDENTIFICATION/FPU DETECTION CODE 5-2 

CHAPTER 6 

NUMERIC APPLICATIONS 

6.1. INTRODUCTION TO NUMERIC APPLICATIONS 6-1

6.1.1. History 6-1 

6.1.2. Performance 6-2 

6.1.3. Ease of Use 6-2 

6.1.4. Applications 6-4 

6.1.5. Programming Interface 6-5 

6.2. ARCHITECTURE OF THE FLOATING-POINT UNIT 6-7 

6.2.1. Numerical Registers 6-7

6.2.1.1. THE FPU REGISTER STACK 6-7

6.2.1.2. THE FPU STATUS WORD 6-9

6.2.1.3. CONTROL WORD 6-10

6.2.1.4. THE FPU TAG WORD 6-12

6.2.1.5. OPCODE FIELD OF LAST INSTRUCTION 6-14 

6.2.1.6. THE NUMERIC INSTRUCTION AND DATA POINTERS 6-15 

6.2.2. Computation Fundamentals 6-17 

6.2.2.1. NUMBER SYSTEM 6-17 

6.2.2.2. DATA TYPES AND FORMATS 6-19

6.2.2.2.1. Binary Integers 6-19

6.2.2.2.2. Decimal Integers 6-19

6.2.2.2.3. Real Numbers 6-20 

6.2.2.3. ROUNDING CONTROL 6-23 

6.2.2.4. PRECISION CONTROL 6-24 

6.3. FLOATING-POINT INSTRUCTION SET 6-24 

6.3.1. Source and Destination Operands 6-25

6.3.2. Data Transfer Instructions 6-25

6.3.3. Nontranscendental Instructions 6-26 

6.3.4. Comparison Instructions 6-28 

6.3.5. Transcendental Instructions 6-29 

6.3.6. Constant Instructions 6-30 

6.3.7. Control Instructions 6-31 

6.4. NUMERIC APPLICATIONS 6-32 

6.4.1. High-Level Languages 6-33 

6.4.1.1. C PROGRAMS 6-33 

6.4.1.2. PL/M-386/486 6-33 

6.4.1.3. ASM386/486 6-35

6.4.1.3.1. Defining Data 6-35

6.4.1.3.2. Records and Structures 6-37

6.4.1.3.3. Addressing Methods 6-38

6.4.1.4. COMPARATIVE PROGRAMMING EXAMPLE 6-39

6.4.1.5. CONCURRENT PROCESSING 6-43

6.4.1.6. MANAGING CONCURRENCY 6-43

6.4.1.7. EXCEPTION SYNCHRONIZATION 6-44 

6.4.1.8. PROPER EXCEPTION SYNCHRONIZATION 6-45 

CHAPTER 7 

SPECIAL COMPUTATIONAL SITUATIONS 

7.1. SPECIAL NUMERIC VALUES 7-1

7.1.1. Denormal Real Numbers 7-7

7.1.2. Zeros 7-9 

7.1.3. Infinity 7-9 

7.1.4. NAN (Not-A-Number) 7-15

7.1.4.1. SIGNALING NANS 7-16 

7.1.4.2. QUIET NANS 7-16 

7.1.5. Indefinite 7-17 

7.1.6. Encoding of Data Types 7-18 

7.1.6.1. UNSUPPORTED FORMATS 7-18 

7.1.7. Numeric Exceptions 7-18 

7.1.8. Handling Numeric Exceptions 7-19

7.1.8.1. AUTOMATIC EXCEPTION HANDLING 7-19

7.1.8.2. SOFTWARE EXCEPTION HANDLING 7-20

7.1.9. Invalid Operation 7-21

7.1.9.1. STACK EXCEPTION 7-21

7.1.9.2. INVALID ARITHMETIC OPERATION 7-22

7.1.10. Division by Zero 7-22

7.1.11. Denormal Operand 7-23

7.1.12. Numeric Overflow and Underflow 7-24

7.1.12.1. OVERFLOW 7-24 

7.1.12.2. UNDERFLOW 7-25 

7.1.13. Inexact (Precision) 7-26 

7.1.14. Exception Priority 7-27 

7.1.15. Standard Underflow/Overflow Exception Handler 7-27

CHAPTER 8 

NUMERIC PROGRAMMING EXAMPLES 

8.1. CONDITIONAL BRANCHING EXAMPLE 8-1

8.2. EXCEPTION HANDLING EXAMPLES 8-4 

8.3. FLOATING-POINT TO ASCII CONVERSION EXAMPLES 8-7 

8.3.1. Function Partitioning 8-20

8.3.2. Exception Considerations 8-20 

8.3.3. Special Instructions 8-20 

8.3.4. Description of Operation 8-21 

8.3.5. Scaling the Value 8-21 

8.3.5.1. INACCURACY IN SCALING 8-22

8.3.5.2. AVOIDING UNDERFLOW AND OVERFLOW 8-22 

8.3.5.3. FINAL ADJUSTMENTS 8-22 

8.3.6. Output Format 8-22 

8.4. TRIGONOMETRIC CALCULATION EXAMPLES 8-23 



PART II— SYSTEM PROGRAMMING 



CHAPTER 9 

REAL-ADDRESS MODE SYSTEM ARCHITECTURE

9.1. ADDRESS TRANSLATION 9-1

9.2. REGISTERS AND INSTRUCTIONS 9-2 

9.3. INTERRUPT AND EXCEPTION HANDLING 9-2 

9.4. REAL-ADDRESS MODE EXCEPTIONS 9-3 

CHAPTER 10 

PROTECTED MODE SYSTEM ARCHITECTURE OVERVIEW 

10.1. SYSTEM REGISTERS 10-1 

10.1.1. System Flags 10-2 

10.1.2. Memory-Management Registers 10-4

10.1.3. Control Registers 10-5

10.1.4. Debug Registers 10-9

10.2. SYSTEM INSTRUCTIONS 10-1

CHAPTER 11 

PROTECTED MODE MEMORY MANAGEMENT 

11.1. SELECTING A SEGMENTATION MODEL 11-2

11.1.1. Flat Model 11-3

11.1.2. Protected Flat Model 11-4

11.1.3. Multisegment Model 11-5

11.2. SEGMENT TRANSLATION 11-6

11.2.1. Segment Registers 11-9

11.2.2. Segment Selectors 11-10

11.2.3. Segment Descriptors 11-11

11.2.4. Segment Descriptor Tables 11-15

11.2.5. Descriptor Table Base Registers 11-16

11.3. PAGE TRANSLATION 11-17

11.3.1. Paging Options 11-18

11.3.2. Linear Address 11-18

11.3.3. Page Tables 11-18

11.3.4. Page-Table Entries 11-19

11.3.4.1. PAGE FRAME ADDRESS 11-20

11.3.4.2. PRESENT BIT 11-20

11.3.4.3. ACCESSED AND DIRTY BITS 11-21

11.3.4.4. READ/WRITE AND USER/SUPERVISOR BITS 11-22

11.3.4.5. PAGE-LEVEL CACHE CONTROL BITS 11-22

11.3.5. Translation Lookaside Buffers 11-22

11.4. COMBINING SEGMENT AND PAGE TRANSLATION 11-23

11.4.1. Flat Model 11-23

11.4.2. Segments Spanning Several Pages 11-23

11.4.3. Pages Spanning Several Segments 11-23

11.4.4. Non-Aligned Page and Segment Boundaries 11-24

11.4.5. Aligned Page and Segment Boundaries 11-25

11.4.6. Page-Table Per Segment 11-25

CHAPTER 12 
PROTECTION 

12.1. SEGMENT-LEVEL PROTECTION 12-1 

12.2. SEGMENT DESCRIPTORS AND PROTECTION 12-2



12.2.1. Type Checking 12-2

12.2.2. Limit Checking 12-5

12.2.3. Privilege Levels 12-6

12.3. RESTRICTING ACCESS TO DATA 12-7

12.3.1. Accessing Data in Code Segments 12-9

12.4. RESTRICTING CONTROL TRANSFERS 12-9

12.5. GATE DESCRIPTORS 12-11

12.5.1. Stack Switching 12-14

12.5.2. Returning from a Procedure 12-17

12.6. INSTRUCTIONS RESERVED FOR THE OPERATING SYSTEM 12-18

12.6.1. Privileged Instructions 12-18

12.6.2. Sensitive Instructions 12-19

12.7. INSTRUCTIONS FOR POINTER VALIDATION 12-19

12.7.1. Descriptor Validation 12-21

12.7.2. Pointer Integrity and RPL 12-21

12.8. PAGE-LEVEL PROTECTION 12-22

12.8.1. Page-Table Entries Hold Protection Parameters 12-22

12.8.1.1. RESTRICTING ADDRESSABLE DOMAIN 12-22

12.8.1.2. TYPE CHECKING 12-23

12.8.2. Combining Protection of Both Levels of Page Tables 12-23

12.8.3. Overrides to Page Protection 12-24

12.9. COMBINING PAGE AND SEGMENT PROTECTION 12-24 

CHAPTER 13 

PROTECTED-MODE MULTITASKING 

13.1. TASK STATE SEGMENT 13-2

13.2. TSS DESCRIPTOR 13-4

13.3. TASK REGISTER 13-5

13.4. TASK GATE DESCRIPTOR 13-6

13.5. TASK SWITCHING 13-8

13.6. TASK LINKING 13-11

13.6.1. Busy Bit Prevents Loops 13-13

13.6.2. Modifying Task Linkages 13-13

13.7. TASK ADDRESS SPACE 13-13

13.7.1. Task Linear-to-Physical Space Mapping 13-14

13.7.2. Task Logical Address Space 13-14

CHAPTER 14 

PROTECTED-MODE EXCEPTIONS AND INTERRUPTS 

14.1. EXCEPTION AND INTERRUPT VECTORS 14-1 

14.2. INSTRUCTION RESTART 14-3 

14.3. ENABLING AND DISABLING INTERRUPTS 14-3

14.3.1. NMI Masks Further NMIs 14-3

14.3.2. IF Masks INTR 14-3

14.3.3. RF Masks Debug Faults 14-4

14.3.4. MOV or POP to SS Masks Some Exceptions and Interrupts 14-4

14.4. PRIORITY AMONG SIMULTANEOUS EXCEPTIONS AND INTERRUPTS 14-5

14.5. INTERRUPT DESCRIPTOR TABLE 14-5

14.6. IDT DESCRIPTORS 14-7

14.7. INTERRUPT TASKS AND INTERRUPT PROCEDURES 14-8

14.7.1. Interrupt Procedures 14-9

14.7.1.1. STACK OF INTERRUPT PROCEDURE 14-9



14.7.1.2. RETURNING FROM AN INTERRUPT PROCEDURE 14-10 

14.7.1.3. FLAG USAGE BY INTERRUPT PROCEDURE 14-10 

14.7.1.4. PROTECTION IN INTERRUPT PROCEDURES 14-11 

14.7.2. Interrupt Tasks 14-11 

14.8. ERROR CODE 14-13 

14.9. EXCEPTION CONDITIONS 14-13

14.9.1. Interrupt 0 — Divide Error 14-14

14.9.2. Interrupt 1 — Debug Exceptions 14-14

14.9.3. Interrupt 3 — Breakpoint 14-14

14.9.4. Interrupt 4 — Overflow 14-14

14.9.5. Interrupt 5 — Bounds Check 14-15

14.9.6. Interrupt 6 — Invalid Opcode 14-15

14.9.7. Interrupt 7 — Device Not Available 14-15

14.9.8. Interrupt 8 — Double Fault 14-16

14.9.9. Interrupt 9 — (Intel reserved. Do not use.) 14-17

14.9.10. Interrupt 10— Invalid TSS 14-17 

14.9.11. Interrupt 11— Segment Not Present 14-18 

14.9.12. Interrupt 12— Stack Exception 14-19 

14.9.13. Interrupt 13— General Protection 14-19 

14.9.14. Interrupt 14— Page Fault 14-20 

14.9.14.1. PAGE FAULT DURING TASK SWITCH 14-21 

14.9.14.2. PAGE FAULT WITH INCONSISTENT STACK POINTER 14-22 

14.9.15. Interrupt 16— Floating-Point Error 14-22 

14.9.15.1. NUMERICS EXCEPTION HANDLING 14-23 

14.9.15.2. SIMULTANEOUS EXCEPTION RESPONSE 14-24 

14.9.16. Interrupt 17— Alignment Check 14-24 

14.9.17. Interrupt 18— Machine Check 14-25 

14.10. EXCEPTION SUMMARY 14-26 

14.11. ERROR CODE SUMMARY 14-28 

CHAPTER 15 
INPUT/OUTPUT 

15.1. I/O ADDRESSING 15-1 

15.1.1. I/O Address Space 15-1 

15.1.2. Memory-Mapped I/O 15-2 

15.2. I/O INSTRUCTIONS 15-3

15.2.1. Register I/O Instructions 15-4

15.2.2. Block I/O Instructions 15-4

15.3. PROTECTED-MODE I/O 15-5

15.3.1. I/O Privilege Level 15-5

15.3.2. I/O Permission Bit Map 15-6

15.3.3. Paging and Caching 15-8

15.4. ORDERING OF I/O 15-8 

CHAPTER 16 

INITIALIZATION AND MODE SWITCHING 

16.1. PROCESSOR INITIALIZATION 16-1 

16.1.1. Processor State After Reset 16-1

16.1.2. First Instruction Executed 16-5

16.2. FPU INITIALIZATION 16-5

16.2.1. Configuring the Numerics Environment 16-6

16.2.2. FPU Software Emulation 16-8

16.3. CACHE ENABLING 16-9 

16.4. SOFTWARE INITIALIZATION IN REAL-ADDRESS MODE 16-9 

16.4.1. System Tables 16-10

16.4.2. NMI Interrupt 16-10 

16.5. SOFTWARE INITIALIZATION IN PROTECTED MODE 16-10 

16.5.1. System Tables 16-10

16.5.2. Interrupts 16-11 

16.5.3. Paging 16-11 

16.5.4. Tasks 16-12 

16.5.5. TLB, BTB and Cache Testing 16-12

16.6. MODE SWITCHING 16-12

16.6.1. Switching to Protected Mode 16-12

16.6.2. Switching Back to Real-Address Mode 16-13

16.7. INITIALIZATION AND MODE SWITCHING EXAMPLE 16-14

16.7.1. Goal of this Example 16-14

16.7.2. Memory Layout Following Reset 16-14

16.7.3. The Algorithm 16-15

16.7.4. Tool Usage 16-17

16.7.5. STARTUP.ASM Listing 16-18

16.7.6. MAIN.ASM Source Code 16-25

16.7.7. Supporting Files 16-28

CHAPTER 17 
DEBUGGING 

17.1. DEBUGGING SUPPORT 17-1

17.2. DEBUG REGISTERS 17-2

17.2.1. Debug Address Registers (DR0-DR3) 17-3

17.2.2. Debug Control Register (DR7) 17-3

17.2.3. Debug Status Register (DR6) 17-4

17.2.4. Debug Registers DR4 and DR5 17-5

17.2.5. Breakpoint Field Recognition 17-5

17.3. DEBUG EXCEPTIONS 17-6

17.3.1. Interrupt 1 — Debug Exceptions 17-6

17.3.1.1. INSTRUCTION-BREAKPOINT FAULT 17-7

17.3.1.2. DATA MEMORY AND I/O BREAKPOINTS 17-8

17.3.1.3. GENERAL-DETECT FAULT 17-8

17.3.1.4. SINGLE-STEP TRAP 17-8

17.3.1.5. TASK-SWITCH TRAP 17-9

17.3.2. Interrupt 3 — Breakpoint Instruction 17-9

CHAPTER 18 

CACHING, PIPELINING AND BUFFERING 

18.1. INTERNAL INSTRUCTION AND DATA CACHES 18-1

18.1.1. Data Cache 18-2

18.1.2. Data Cache Update Policies 18-2

18.1.3. Instruction Cache 18-3

18.2. OPERATION OF THE INTERNAL CACHES 18-3

18.2.1. Cache Control Bits 18-3

18.2.2. Cache Management Instructions 18-4

18.2.3. Self-Modifying Code 18-5

18.3. PAGE-LEVEL CACHE MANAGEMENT 18-5

18.3.1. PCD Bit 18-6 

18.3.2. PWT Bit 18-6

18.4. ADDRESS TRANSLATION CACHES 18-6

18.5. CACHE REPLACEMENT ALGORITHM 18-6

18.6. EXECUTION PIPELINING AND PAIRING 18-6

18.7. WRITE BUFFERS 18-7

18.8. SERIALIZING INSTRUCTIONS 18-7

CHAPTER 19 
MULTIPROCESSING 

19.1. LOCKED BUS CYCLES 19-1

19.1.1. LOCK Prefix and the LOCK# Signal 19-2 

19.1.2. Automatic Locking 19-2 

19.2. MEMORY ACCESS ORDERING 19-3

CHAPTER 20 

SYSTEM MANAGEMENT MODE 

20.1. THE SMI INTERRUPT 20-1 

20.2. SMM INITIAL STATE 20-3 

20.2.1. System Management Mode Execution 20-3

20.3. SMRAM PROCESSOR STATE DUMP FORMAT 20-4 

20.3.1. System Management Mode Revision Identifier (Offset FEFCH) 20-6

20.3.2. I/O Trap Restart (Offset FF00H) 20-7 

20.3.3. Halt Auto Restart (Offset FF02H) 20-7 

20.3.4. State Dump Base (Offset FEF8H) 20-7 

20.4. RELOCATING SMRAM 20-8 

20.5. RETURNING FROM SMM 20-8 

PART III— COMPATIBILITY 
CHAPTER 21 

MIXING 16-BIT AND 32-BIT CODE 

21.1. USING 16-BIT AND 32-BIT ENVIRONMENTS 21-1

21.2. MIXING 16-BIT AND 32-BIT OPERATIONS 21-2

21.3. SHARING DATA AMONG MIXED-SIZE CODE SEGMENTS 21-3

21.4. TRANSFERRING CONTROL AMONG MIXED-SIZE CODE SEGMENTS 21-3

21.4.1. Size of Code-Segment Pointer 21-4

21.4.2. Stack Management for Control Transfer 21-4

21.4.2.1. CONTROLLING THE OPERAND SIZE FOR A CALL 21-6

21.4.2.2. CHANGING SIZE OF A CALL 21-6

21.4.3. Interrupt Control Transfers 21-6

21.4.4. Parameter Translation 21-7

21.4.5. The Interface Procedure 21-7

CHAPTER 22 
VIRTUAL-8086 MODE 

22.1. EXECUTING 8086 CPU CODE 22-1

22.1.1. Registers and Instructions 22-1

22.1.2. Address Translation 22-2

22.2. STRUCTURE OF A VIRTUAL-8086 TASK 22-3

22.2.1. Paging for Virtual-8086 Tasks 22-4

22.2.2. Protection within a Virtual-8086 Task 22-5 

22.3. ENTERING AND LEAVING VIRTUAL-8086 MODE 22-5 

22.3.1. Transitions Through Task Switches 22-7 

22.3.2. Transitions Through Trap Gates and Interrupt Gates 22-8 

22.4. SENSITIVE INSTRUCTIONS 22-9 

22.5. VIRTUAL INTERRUPT SUPPORT 22-9 

22.6. EMULATING 8086 OPERATING SYSTEM CALLS 22-1 

22.7. VIRTUAL I/O 22-10

22.7.1. I/O-Mapped I/O 22-11

22.7.2. Memory-Mapped I/O 22-11

22.7.3. Special I/O Buffers 22-11

22.8. DIFFERENCES FROM 8086 CPU 22-11

22.9. DIFFERENCES FROM Intel 286 CPU 22-14

22.9.1. Privilege Level 22-15 

22.9.2. Bus Lock 22-15 

22.10. DIFFERENCES FROM Intel386 AND Intel486 CPU'S 22-16 

CHAPTER 23 
COMPATIBILITY 

23.1. RESERVED BITS 23-1 

23.2. INTEGER UNIT 23-1 

23.2.1. New Functions and Modes 23-2

23.2.2. Serializing Instructions 23-2 

23.2.3. Detecting the Presence of New Features 23-2 

23.2.4. Undefined Opcodes 23-3 

23.2.5. Clock Counts 23-3 

23.2.6. Initialization and Reset 23-3 

23.2.6.1. INTEGER UNIT INITIALIZATION AND RESET 23-3

23.2.6.2. FPU/NPX INITIALIZATION AND RESET 23-3 

23.2.6.3. Intel486 SX MICROPROCESSOR AND Intel487 SX MATH COPROCESSOR 
INITIALIZATION 23-5 

23.2.7. New Instructions 23-7 

23.2.7.1. NEW PENTIUM PROCESSOR INSTRUCTIONS 23-7

23.2.7.2. NEW Intel486 PROCESSOR INSTRUCTIONS 23-7 

23.2.7.3. NEW Intel386 PROCESSOR INSTRUCTIONS 23-8 

23.2.8. Obsolete Instructions 23-8 

23.2.9. Flags 23-8 

23.2.9.1. NEW PENTIUM PROCESSOR FLAGS 23-9

23.2.9.2. NEW Intel486 PROCESSOR FLAGS 23-9 

23.2.10. Control Registers 23-9 

23.2.10.1. PENTIUM PROCESSOR CONTROL REGISTERS 23-1

23.2.10.2. Intel486 PROCESSOR CONTROL REGISTERS 23-11 

23.2.11. Debug Registers 23-13 

23.2.11.1. DIFFERENCES IN DR6 23-13 

23.2.11.2. DIFFERENCES IN DR7 23-13

23.2.11.3. DEBUG REGISTERS 4 AND 5 23-13

23.2.12. Test Registers 23-13

23.2.13. Model Specific Registers 23-14

23.2.14. Exceptions 23-14 

23.2.14.1. NEW PENTIUM PROCESSOR EXCEPTIONS 23-14 

23.2.14.2. NEW Intel486 PROCESSOR EXCEPTIONS 23-15 

23.2.14.3. NEW Intel386 PROCESSOR EXCEPTIONS 23-15 

23.2.14.4. INTERRUPT PROPAGATION DELAY 23-15 

23.2.14.5. PRIORITY OF EXCEPTIONS 23-15

23.2.14.6. DIVIDE-ERROR EXCEPTIONS 23-16 

23.2.14.7. WRITES USING THE CS REGISTER PREFIX 23-16 

23.2.14.8. NMI INTERRUPTS 23-16 

23.2.14.9. INTERRUPT VECTOR TABLE LIMIT 23-16

23.2.15. Descriptor Types and Contents 23-16 

23.2.16. Changes in Segment Descriptor Loads 23-17 

23.2.17. Task Switching and Task State Segments 23-17 

23.2.17.1. PENTIUM PROCESSOR TASK STATE SEGMENTS 23-17 

23.2.17.2. TSS SELECTOR WRITES 23-17

23.2.17.3. ORDER OF READS/WRITES TO THE TSS 23-17 

23.2.17.4. USING A 16-BIT TSS WITH 32-BIT CONSTRUCTS 23-17 

23.2.17.4.1. Differences In I/O Map Base Addresses 23-18 

23.2.17.4.2. Caching, Pipelining, Prefetching 23-18 

23.2.17.5. SELF MODIFYING CODE WITH CACHE ENABLED 23-19

23.2.18. Paging 23-19 

23.2.18.1. PENTIUM PROCESSOR PAGING 23-20 

23.2.18.2. Intel486 PROCESSOR PAGING 23-20 

23.2.18.3. ENABLING AND DISABLING PAGING 23-20 

23.2.19. Stack Operations 23-20 

23.2.19.1. PUSH SP 23-20 

23.2.19.2. FLAGS PUSHED ON THE STACK 23-21

23.2.19.3. SELECTOR PUSHES/POPS 23-21 

23.2.19.4. ERROR CODE PUSHES 23-21 

23.2.19.5. FAULT HANDLING EFFECTS ON THE STACK 23-22 

23.2.19.6. INTERLEVEL RET/IRET FROM A 16-BIT INTERRUPT OR CALL GATE 23-22

23.2.20. Mixing 16- and 32-Bit Segments 23-22

23.2.21. Segment and Address Wraparound 23-23

23.2.21.1. SEGMENT WRAPAROUND 23-23

23.2.22. Write Buffers and Memory Ordering 23-23 

23.2.23. Bus Locking 23-24 

23.2.24. Bus Hold 23-25 

23.2.25. Two Ways to Run Intel 286 CPU Tasks 23-25 

23.3. FLOATING-POINT UNIT 23-25 

23.3.1. Control Register Bits 23-26 

23.3.1.1. EXTENSION TYPE (ET) BIT 23-26

23.3.1.2. NUMERIC EXCEPTION (NE) BIT 23-26

23.3.1.3. MONITOR COPROCESSOR (MP) BIT 23-26

23.3.1.4. FPU STATUS WORD 23-26

23.3.1.5. CONTROL WORD 23-27

23.3.1.6. TAG WORD 23-27

23.3.2. Data Types 23-28 

23.3.2.1. NaN'S 23-28 

23.3.2.2. PSEUDOZERO, PSEUDO-NaN, PSEUDOINFINITY, AND UNNORMAL FORMATS 23-28

23.3.3. Exceptions 23-29 

23.3.3.1. DENORMAL EXCEPTIONS 23-29 

23.3.3.2. OVERFLOW EXCEPTIONS 23-29 

23.3.3.3. UNDERFLOW EXCEPTIONS 23-30 

23.3.3.4. EXCEPTION PRECEDENCE 23-30 

23.3.3.5. CS AND IP FOR FPU EXCEPTIONS 23-30 

23.3.3.6. FPU ERROR SIGNALS 23-31 

23.3.3.7. INVALID OPERATION ON DENORMALS 23-31 



23.3.3.8. ALIGNMENT EXCEPTIONS 23-31 

23.3.3.9. SEGMENT FAULT DURING FLDENV 23-31 

23.3.3.10. INTERRUPT 7 — DEVICE NOT AVAILABLE 23-31

23.3.3.11. INTERRUPT 9 — COPROCESSOR SEGMENT OVERRUN 23-32

23.3.3.12. INTERRUPT 13 — GENERAL PROTECTION 23-32

23.3.3.13. INTERRUPT 16 — FLOATING-POINT ERROR 23-32

23.3.4. Instructions 23-32 

23.3.5. Transcendental Instructions 23-34 

23.3.6. Obsolete Instructions 23-35 

23.3.6.1. WAIT PREFIX DIFFERENCES 23-35 

23.3.6.2. OPERANDS SPLIT ACROSS SEGMENTS/PAGES 23-35 

23.3.6.3. FPU INSTRUCTION SYNCHRONIZATION 23-35 

PART IV— OPTIMIZATION 

CHAPTER 24 
OPTIMIZATION 

24.1. ADDRESSING MODES AND REGISTER USAGE 24-1 

24.2. ALIGNMENT 24-2 

24.2.1. Code Alignment 24-2

24.2.2. Data Alignment 24-2

24.3. PREFIXED OPCODES 24-3 

24.4. OPERAND AND REGISTER USAGE 24-3 

24.5. INTEGER INSTRUCTION SELECTION 24-3 

PART V— INSTRUCTION SET 

CHAPTER 25 
INSTRUCTION SET 

25.1. OPERAND-SIZE AND ADDRESS-SIZE ATTRIBUTES 25-1 

25.1.1. Default Segment Attribute 25-1

25.1.2. Operand-Size and Address-Size Instruction Prefixes 25-1

25.1.3. Address-Size Attribute for Stack 25-2

25.2. INSTRUCTION FORMAT 25-2

25.2.1. ModR/M and SIB Bytes 25-4

25.2.2. How to Read the Instruction Set Pages 25-9 

25.2.2.1. OPCODE COLUMN 25-9 

25.2.2.2. INSTRUCTION COLUMN 25-1 

25.2.2.3. CLOCKS COLUMN 25-12

25.2.2.4. DESCRIPTION COLUMN 25-13

25.2.2.5. OPERATION 25-13

25.2.2.6. DESCRIPTION 25-17

25.2.2.7. FLAGS AFFECTED 25-17

25.2.2.8. PROTECTED MODE EXCEPTIONS 25-17

25.2.2.9. REAL ADDRESS MODE EXCEPTIONS 25-18

25.2.2.10. VIRTUAL-8086 MODE EXCEPTIONS 25-18 

AAA— ASCII Adjust after Addition 25-19 

AAD— ASCII Adjust AX before Division 25-20 

AAM— ASCII Adjust AX after Multiply 25-21 

AAS— ASCII Adjust AL after Subtraction 25-22 

ADC— Add with Carry 25-23 

ADD— Add 25-24 

AND— Logical AND 25-25 

ARPL— Adjust RPL Field of Selector 25-26 

BOUND— Check Array Index Against Bounds 25-27 

BSF— Bit Scan Forward 25-29 

BSR— Bit Scan Reverse 25-30 

BSWAP— Byte Swap 25-32 

BT — Bit Test 25-33 

BTC— Bit Test and Complement 25-35 

BTR— Bit Test and Reset 25-37 

BTS— Bit Test and Set 25-39 

CALL— Call Procedure 25-41 

CBW/CWDE— Convert Byte to Word/Convert Word to Doubleword 25-47 

CDQ— Convert Double to Quad 25-48 

CLC— Clear Carry Flag 25-49 

CLD— Clear Direction Flag 25-50 

CLI— Clear Interrupt Flag 25-51 

CLTS— Clear Task-Switched Flag in CR0 25-53

CMC— Complement Carry Flag 25-54 

CMP— Compare Two Operands 25-55 

CMPS/CMPSB/CMPSW/CMPSD— Compare String Operands 25-56 

CMPXCHG— Compare and Exchange 25-58 

CMPXCHG8B— Compare and Exchange 8 Bytes 25-60 

CPUID— CPU Identification 25-62 

CWD/CDQ— Convert Word to Double/Convert Double to Quad 25-64 

CWDE— Convert Word to Doubleword 25-65 

DAA— Decimal Adjust AL after Addition 25-66 

DAS— Decimal Adjust AL after Subtraction 25-67 

DEC— Decrement by 1 25-68 

DIV— Unsigned Divide 25-69 

ENTER — Make Stack Frame for Procedure Parameters 25-71 

F2XM1— Compute 2^x-1 25-73

FABS— Absolute Value 25-74 

FADD/FADDP/FIADD — Add 25-75 

FBLD— Load Binary Coded Decimal 25-77 

FBSTP— Store Binary Coded Decimal and Pop 25-79 

FCHS— Change Sign 25-80 

FCLEX/FNCLEX— Clear Exceptions 25-81 

FCOM/FCOMP/FCOMPP— Compare Real 25-82 

FCOS— Cosine 25-84 

FDECSTP— Decrement Stack-Top Pointer 25-85 

FDIV/FDIVP/FIDIV— Divide 25-86 

FDIVR/FDIVRP/FIDIVR— Reverse Divide 25-88 

FFREE— Free Floating-Point Register 25-90 

FICOM/FICOMP— Compare Integer 25-91 

FILD— Load Integer 25-93 

FINCSTP— Increment Stack-Top Pointer 25-94 

FINIT/FNINIT— Initialize Floating-Point Unit 25-95 

FIST/FISTP— Store Integer 25-97 

FLD— Load Real 25-99 

FLD1/FLDL2T/FLDL2E/FLDPI/FLDLG2/FLDLN2/FLDZ — Load Constant 25-101 

FLDCW— Load Control Word 25-102 

FLDENV— Load FPU Environment 25-103 

FMUL/FMULP/FIMUL— Multiply 25-105 

FNOP— No Operation 25-106

FPATAN— Partial Arctangent 25-107

FPREM— Partial Remainder 25-108 

FPREM1— Partial Remainder 25-110 

FPTAN— Partial Tangent 25-112

FRNDINT— Round to Integer 25-114 

FRSTOR— Restore FPU State 25-115

FSAVE/FNSAVE— Store FPU State 25-117

FSCALE— Scale 25-119 

FSIN— Sine 25-120 

FSINCOS— Sine and Cosine 25-121 

FSQRT— Square Root 25-123 

FST/FSTP— Store Real 25-124 

FSTCW/FNSTCW— Store Control Word 25-126 

FSTENV/FNSTENV— Store FPU Environment 25-127 

FSTSW/FNSTSW— Store Status Word 25-129 

FSUB/FSUBP/FISUB— Subtract 25-131

FSUBR/FSUBRP/FISUBR— Reverse Subtract 25-132

FTST— TEST 25-133 

FUCOM/FUCOMP/FUCOMPP— Unordered Compare Real 25-135 

FWAIT— Wait 25-137 

FXAM— Examine 25-138 

FXCH— Exchange Register Contents 25-140 

FXTRACT— Extract Exponent and Significand 25-142 

FYL2X— Compute y × log2(x) 25-144 

FYL2XP1— Compute y × log2(x+1) 25-145 

HLT— Halt 25-146 

IDIV — Signed Divide 25-147 

IMUL— Signed Multiply 25-149 

IN— Input from Port 25-151 

INC— Increment by 1 25-153 

INS/INSB/INSW/INSD— Input from Port to String 25-154 

INT/INTO— Call to Interrupt Procedure 25-156 

INVD— Invalidate Cache 25-163 

INVLPG— Invalidate TLB Entry 25-165 

IRET/IRETD— Interrupt Return 25-166 

Jcc — Jump if Condition is Met 25-171 

JMP— Jump 25-174 

LAHF— Load Flags into AH Register 25-179 

LAR— Load Access Rights Byte 25-180 

LDS/LES/LFS/LGS/LSS— Load Full Pointer 25-182 

LEA— Load Effective Address 25-184 

LEAVE— High Level Procedure Exit 25-186 

LES— Load Full Pointer 25-187 

LFS— Load Full Pointer 25-188 

LGDT/LIDT— Load Global/Interrupt Descriptor Table Register 25-189 

LGS— Load Full Pointer 25-191 

LLDT— Load Local Descriptor Table Register 25-192 

LIDT — Load Interrupt Descriptor Table Register 25-193 

LMSW— Load Machine Status Word 25-194 

LOCK— Assert LOCK# Signal Prefix 25-195 

LODS/LODSB/LODSW/LODSD— Load String Operand 25-196 


LOOP/LOOPcond— Loop Control with CX Counter 25-198 

LSL— Load Segment Limit 25-200 

LSS— Load Full Pointer 25-202 

LTR— Load Task Register 25-203 

MOV— Move Data 25-204 

MOV— Move to/from Control Registers 25-206 

MOV— Move to/from Debug Registers 25-207 

MOVS/MOVSB/MOVSW/MOVSD— Move Data from String to String 25-209 

MOVSX— Move with Sign-Extend 25-211 

MOVZX— Move with Zero-Extend 25-212 

MUL— Unsigned Multiplication of AL, AX, or EAX 25-213 

NEG — Two's Complement Negation 25-215 

NOP— No Operation 25-216 

NOT— One's Complement Negation 25-217 

OR— Logical Inclusive OR 25-218 

OUT— Output to Port 25-219 

OUTS/OUTSB/OUTSW/OUTSD— Output String to Port 25-221 

POP— Pop a Word from the Stack 25-224 

POPA/POPAD— Pop all General Registers 25-227 

POPF/POPFD— Pop Stack into FLAGS or EFLAGS Register 25-229 

PUSH— Push Operand onto the Stack 25-231 

PUSHA/PUSHAD— Push all General Registers 25-233 

PUSHF/PUSHFD— Push Flags Register onto the Stack 25-235 

RCL/RCR/ROL/ROR— Rotate 25-237 

RDMSR— Read from Model Specific Register 25-240 

REP/REPE/REPZ/REPNE/REPNZ— Repeat Following String Operation 25-242 

RET— Return from Procedure 25-245 

ROL/ROR— Rotate 25-249 

RSM— Resume from System Management Mode 25-250 

SAHF— Store AH into Flags 25-251 

SAL/SAR/SHL/SHR— Shift Instructions 25-252 

SBB— Integer Subtraction with Borrow 25-255 

SCAS/SCASB/SCASW/SCASD— Compare String Data 25-257 

SETcc— Byte Set on Condition 25-259 

SGDT/SIDT— Store Global/Interrupt Descriptor Table Register 25-261 

SHL/SHR— Shift Instructions 25-263 

SHLD— Double Precision Shift Left 25-264 

SHRD— Double Precision Shift Right 25-266 

SIDT— Store Interrupt Descriptor Table Register 25-268 

SLDT— Store Local Descriptor Table Register 25-269 

SMSW— Store Machine Status Word 25-270 

STC— Set Carry Flag 25-271 

STD— Set Direction Flag 25-272 

STI— Set Interrupt Flag 25-273 

STOS/STOSB/STOSW/STOSD— Store String Data 25-275 

STR— Store Task Register 25-277 

SUB— Integer Subtraction 25-278 

TEST— Logical Compare 25-280 

VERR, VERW— Verify a Segment for Reading or Writing 25-281 

WAIT— Wait 25-283 

WBINVD— Write-Back and Invalidate Cache 25-284 

WRMSR— Write to Model Specific Register 25-286 

XADD— Exchange and Add 25-288 




XCHG — Exchange Register/Memory with Register 25-289 

XLAT/XLATB— Table Look-up Translation 25-290 

XOR— Logical Exclusive OR 25-291 



APPENDIX A 
OPCODE MAP 

APPENDIX B 
FLAG CROSS-REFERENCE 

APPENDIX C 
STATUS FLAG SUMMARY 

APPENDIX D 
CONDITION CODES 

APPENDIX E 
NUMERIC EXCEPTION SUMMARY 

APPENDIX F 
INSTRUCTION FORMAT AND TIMING 

APPENDIX G 
REPORT ON TRANSCENDENTAL FUNCTIONS 

APPENDIX H 
ADVANCED FEATURES 

GLOSSARY 



Figures 



Figure Title Page 

1-1. Bit and Byte Order 1-6 

3-1. Segmented Addressing 3-3 

3-2. Fundamental Data Types 3-4 

3-3. Bytes, Words, Doublewords and Quadwords in Memory 3-5 

3-4. Data Types 3-7 

3-5. Application Register Set 3-9 

3-6. An Unsegmented Memory 3-11 

3-7. A Segmented Memory 3-11 

3-8. Stacks 3-13 

3-9. EFLAGS Register 3-14 

3-10. Effective Address Computation 3-21 

4-1. PUSH Instruction 4-2 

4-2. PUSHA Instruction 4-3 

4-3. POP Instruction 4-4 

4-4. POPA Instruction 4-4 




4-5. Sign Extension 4-5 

4-6. SHL/SAL Instruction 4-13 

4-7. SHR Instruction 4-14 

4-8. SAR Instruction 4-14 

4-9. SHLD Instruction 4-15 

4-10. SHRD Instruction 4-16 

4-11. ROL Instruction 4-17 

4-12. ROR Instruction 4-17 

4-13. RCL Instruction 4-17 

4-14. RCR Instruction 4-17 

4-15. Nested Procedures 4-31 

4-16. Stack Frame After Entering MAIN 4-32 

4-17. Stack Frame After Entering PROCEDURE A 4-33 

4-18. Stack Frame After Entering PROCEDURE B 4-34 

4-19. Stack Frame After Entering PROCEDURE C 4-35 

4-20. Low Byte of EFLAGS Register 4-37 

4-21. Flags Used with PUSHF and POPF 4-37 

4-22. EAX Following the CPUID Instruction 4-44 

6-1. Floating-point Unit Register Set 6-8 

6-2. FPU Status Word 6-9 

6-3. FPU Control Word Format 6-12 

6-4. Tag Word Format 6-13 

6-5. Opcode Field 6-14 

6-6. Protected Mode Numeric Instruction and Data Pointer Image in Memory, 32-Bit Format 6-15 

6-7. Real Mode Numeric Instruction and Data Pointer Image in Memory, 32-Bit Format 6-16 

6-8. Protected Mode Numeric Instruction and Data Pointer Image in Memory, 16-Bit Format 6-16 

6-9. Real Mode Numeric Instruction and Data Pointer Image in Memory, 16-Bit Format 6-17 

6-10. Double-Precision Number System 6-18 

6-11. Numerical Data Formats 6-20 

6-12. Instructions and Register Stack 6-42 

7-1. Arithmetic Example Using Infinity 7-20 

8-1. Relationships Between Adjacent Joints 8-23 

9-1. 8086 Address Translation 9-1 

10-1. System Flags 10-2 

10-2. Memory Management Registers 10-4 

10-3. Control Registers 10-6 

10-4. Debug Registers 10-9 

11-1. Flat Model 11-3 

11-2. Protected Flat Model 11-4 

11-3. Multisegment Model 11-6 

11-4. TI Bit Selects Descriptor Table 11-8 

11-5. Segment Translation 11-9 

11-6. Segment Registers 11-9 

11-7. Segment Selector 11-10 

11-8. Segment Descriptors 11-12 

11-9. Segment Descriptor (Segment Not Present) 11-15 

11-10. Descriptor Tables 11-16 

11-11. Pseudo-Descriptor Format 11-17 

11-12. Format of a Linear Address 11-18 




11-13. Page Translation 11-19 

11-14. Format of Page Directory and Page Table Entries for 4K Pages 11-20 

11-15. Format of a Page Table Entry for a Not-Present Page 11-21 

11-16. Combined Segment and Page Address Translation 11-24 

11-17. Each Segment Can Have Its Own Page Table 11-25 

12-1. Descriptor Fields Used for Protection 12-3 

12-2. Protection Rings 12-7 

12-3. Privilege Check for Data Access 12-8 

12-4. Privilege Check for Control Transfer Without Gate 12-10 

12-5. Call Gate 12-11 

12-6. Call Gate Mechanism 12-12 

12-7. Privilege Check for Control Transfer with Call Gate 12-13 

12-8. Initial Stack Pointers in a TSS 12-15 

12-9. Stack Frame During Interlevel Call 12-16 

12-10. Protection Fields of a Page Table Entry 12-22 

13-1. 32-Bit Task State Segment 13-3 

13-2. TSS Descriptor 13-4 

13-3. Task Register 13-6 

13-4. Task Gate Descriptor 13-7 

13-5. Task Gates Reference Tasks 13-8 

13-6. Nested Tasks 13-12 

13-7. Overlapping Linear-to-Physical Mappings 13-15 

14-1. IDTR Locates IDT in Memory 14-6 

14-2. IDT Gate Descriptors 14-8 

14-3. Interrupt Procedure Call 14-9 

14-4. Stack Frame after Exception or Interrupt 14-10 

14-5. Interrupt Task Switch 14-12 

14-6. Error Code 14-13 

14-7. Page Fault Error Code 14-21 

15-1. Memory-Mapped I/O 15-3 

15-2. I/O Permission Bit Map 15-7 

16-1. Contents of the EDX Register After Reset 16-2 

16-2. Contents of CR0 Register After Reset 16-3 

16-3. Processor State After Reset 16-15 

16-4. Constructing Temp_GDT and Switching to Protected Mode (Lines 162-172 of List File) 16-26 

16-5. Moving The GDT, IDT, and TSS from ROM to RAM (Lines 196-261 of List File) 16-27 

16-6. Task Switching (Lines 282-296 of List File) 16-28 

17-1. Debug Registers 17-3 

20-1. Mode State Transitions 20-2 

20-2. SMM Revision Identifier 20-6 

21-1. Stack After Far 16- and 32-Bit Calls 21-5 

22-1. 8086 Address Translation 22-3 

22-2. Entering and Leaving Virtual-8086 Mode 22-6 

22-3. Privilege Level Stack After Interrupt in Virtual-8086 Mode 22-7 

23-1. Pentium™ Processor EFLAGS Register 23-9 

23-2. Control Register Extensions 23-1 

23-3. I/O Map Base Address Differences 23-18 

25-1. Instruction Format 25-3 

25-2. ModR/M and SIB Byte Formats 25-5 

25-3. Bit Offset for BIT[EAX,21] 25-16 

25-4. Memory Bit Indexing 25-16 


F-1. General Instruction Format F-1 

G-1. Scatterplot for FSIN (3FBB-403E) G-5 

G-2. Scatterplot for FSIN (3FFC-3FFD) G-5 

G-3. Scatterplot for FSIN (3FFE-3FFF) G-6 

G-4. Scatterplot for FSIN (4000-4002) G-6 

G-5. Scatterplot for FCOS (3FBB-403E) G-7 

G-6. Scatterplot for FCOS (3FFC-3FFD) G-7 

G-7. Scatterplot for FCOS (3FFE-3FFF) G-8 

G-8. Scatterplot for FCOS (4000-4002) G-8 

G-9. Scatterplot for FSINCOS (SIN, 3FBB-403E) G-9 

G-10. Scatterplot for FSINCOS (COS, 3FBB-403E) G-9 

G-11. Scatterplot for FPTAN (3FDD-403E) G-1 

G-12. Scatterplot for FPTAN (3FE4-3FFA) G-1 

G-13. Scatterplot for FPTAN (3FFB-4008) G-11 

G-14. Scatterplot for FYL2X (0001-7FFD) G-11 

G-15. Scatterplot for FYL2X (3FFF-3FFF) G-12 

G-16. Scatterplot for FYL2XP1 (0001-3FFE) G-12 

G-17. Scatterplot for FYL2XP1 (3FBE-3FC5) G-13 

G-18. Scatterplot for FYL2XP1 (3FEB-3FFE) G-13 

G-19. Scatterplot for F2XM1 (0001-3FFE) G-14 

G-20. Scatterplot for F2XM1 (3FBA-3FFE) G-14 

G-21. Scatterplot for F2XM1 (3FFD-3FFE) G-15 

G-22. Scatterplot for FPATAN (0001-7FFD) G-15 

Tables 

Table Title Page 

3-1. Register Names 3-10 

3-2. Status Flags 3-14 

3-3. Default Segment Selection Rules 3-20 

3-4. Exceptions and Interrupts 3-24 

4-1. Operands for Division 4-9 

4-2. Bit Test and Modify Instructions 4-11 

4-3. Conditional Jump Instructions 4-24 

4-4. Repeat Instructions 4-27 

4-5. Flag Control Instructions 4-36 

6-1. Numeric Processing Speed Comparisons 6-2 

6-2. Numeric Data Types 6-6 

6-3. Principal Numeric Instructions 6-6 

6-4. Condition Code Interpretation 6-11 

6-5. Correspondence Between FPU and IU Flag Bits 6-12 

6-6. Summary of Format Parameters 6-21 

6-7. Real Number Notation 6-22 

6-8. Rounding Modes 6-24 

6-9. Data Transfer Instructions 6-25 

6-10. Nontranscendental Instructions (Besides Arithmetic) 6-26 

6-11. Basic Arithmetic Instructions and Operands 6-27 

6-12. Comparison Instructions 6-28 

6-13. TEST Constants for Conditional Branching 6-29 

6-14. Transcendental Instructions 6-29 

6-15. Constant Instructions 6-31 

6-16. Control Instructions 6-32 


6-17. PL/M-386/486 Built-in Procedures 6-35 

6-18. ASM386/486 Storage Allocation Directives 6-36 

6-19. Addressing Method Examples 6-38 

7-1. Arithmetic and Nonarithmetic Instructions 7-2 

7-2. Binary Integer Encodings 7-3 

7-3. Packed Decimal Encodings 7-4 

7-4. Single and Double Real Encodings 7-5 

7-5. Extended Real Encodings 7-6 

7-6. Unsupported Formats 7-7 

7-7. Denormalized Values 7-8 

7-8. Zero Operands and Results 7-10 

7-9. Infinity Operands and Results 7-13 

7-10. Rules for Generating QNaNs 7-17 

7-11. Masked Responses to Invalid Operations 7-22 

7-12. Masked Overflow Results 7-25 

7-13. Transcendental Core Ranges 7-26 

9-1. Exceptions and Interrupts 9-4 

11-1. Application Segment Types 11-14 

12-1. System Segment and Gate Types 12-4 

12-2. Interlevel Return Checks 12-18 

12-3. Valid Descriptor Types for LSL Instruction 12-20 

12-4. Combined Page Directory and Page Table Protection 12-24 

13-1. Checks Made during a Task Switch 13-11 

13-2. Effect of a Task Switch on Busy, NT, and Link Fields 13-12 

14-1. Exception and Interrupt Vectors 14-2 

14-2. Priority Among Simultaneous Exceptions and Interrupts 14-5 

14-3. Interrupt and Exception Classes 14-16 

14-4. Double Fault Conditions 14-16 

14-5. Invalid TSS Conditions 14-17 

14-6. Alignment Requirements by Data Type 14-25 

14-7. Exception Summary 14-26 

14-8. Error Code Summary 14-28 

15-1. I/O Serialization 15-9 

16-1. Processor State Following Reset 16-4 

16-2. FPU State Following FINIT or FNINIT 16-6 

16-3. EM and MP Bits Interpretations 16-7 

16-4. Recommended Values by Processor 16-7 

16-5. Action Taken for Different Combinations of EM, MP, and TS 16-8 

16-6. Software Emulation Settings 16-9 

16-7. The Algorithm and Related Listing Line Numbers 16-16 

16-8. Relationship Between BLD Item and ASM Source File 16-18 

17-1. Breakpointing Examples 17-6 

17-2. Debug Exception Conditions 17-7 

18-1. MESI Cache Line States 18-2 

18-2. Cache Operating Modes 18-4 

20-1. SMM Initial State 20-3 

20-2. State Dump Format 20-5 

20-3. State Disposition 20-6 

20-4. SMM Revision Identifier 20-7 

20-5. Halt Auto Restart 20-7 

22-1. Software Interrupt Operation 22-1 

23-1. Processor State Following Power-Up 23-4 

23-2. FPU and NPX State Following Power-Up 23-5 


23-3. Recommended Values of the FP Related Bits for Intel486™ SX Microprocessor/Intel487™ SX Math Coprocessor System 23-6 

23-4. EM and MP Bits Interpretations 23-7 

23-5. Cache Mode Differences Between the Pentium™ and Intel486™ Processors 23-12 

25-1. Effective Size Attributes 25-2 

25-2. 16-Bit Addressing Forms with the ModR/M Byte 25-6 

25-3. 32-Bit Addressing Forms with the ModR/M Byte 25-7 

25-4. 32-Bit Addressing Forms with the SIB Byte 25-8 

25-5. Task Switch Times for Exceptions 25-13 

25-6. Exceptions 25-18 

F-1. Fields within Instructions F-2 

F-2. Integer Clock Count Summary F-9 

F-3. I/O Instructions Clock Count Summary F-23 

F-4. General Floating-Point Instruction Format F-24 

F-5. Floating-Point Clock Count Summary F-26 

G-1. Summary of Accuracy G-2 

G-2. Speed of Functions at Typical Arguments G-3 

G-3. Number of Arguments Used in Accuracy Tests G-4 

G-4. Number of Arguments Used in Monotonicity Tests G-4 

Examples 

Example Title Page 

4-1. ENTER Definition 4-30 

4-2. ASCII Arithmetic Using BSWAP 4-41 

5-1. CPU Identification and FPU Detection 5-2 

6-1. Modifying the Tag Word 6-13 

6-2. Sample C Program 6-34 

6-3. Sample Numeric Constants 6-36 

6-4. Status Word Record Definition 6-37 

6-5. Structure Definition 6-38 

6-6. Sample PL/M-386/486 Program 6-39 

6-7. Sample ASM386/486 Program 6-39 

8-1. Conditional Branching for Compares 8-1 

8-2. Conditional Branching for FXAM 8-2 

8-3. Full-State Exception Handler 8-4 

8-4. Reduced-Latency Exception Handler 8-5 

8-5. Reentrant Exception Handler 8-6 

8-6. Floating-Point to ASCII Conversion Routine 8-7 

8-7. Robot Arm Kinematics Example 8-24 

16-1. STARTUP.ASM 16-18 

16-2. MAIN.ASM 16-25 

16-3. Batch File to Assemble, Compile and Build the Application 16-28 

16-4. Build File 16-29 






CHAPTER 1 
GETTING STARTED 



1.1. HOW TO USE THIS MANUAL 

Chapter 1 provides an overview of this manual and the related Pentium™ processor 
documentation. Also included are some notational conventions regarding reserved bits, 
instruction operands, number formats, addressing and exceptions found throughout the 
manual. 

Chapter 2 provides an introduction to Intel's Pentium processor family. The remainder of this 
book presents the architecture of the Pentium processor in five parts: 

• Part I — Application and Numeric Programming 

• Part II — System Programming 

• Part III— Compatibility 

• Part IV — Optimization 

• Part V — Instruction Set 

• Appendices 

The first three parts are explanatory, showing the purpose of architectural features, developing 
terminology and concepts, and describing instructions as they relate to specific purposes or to 
specific architectural features. The remaining parts are reference material for programmers 
developing software for the Pentium processor. 

The first two parts cover the operating modes and protection mechanism of the Pentium 
processor. The distinction between application programming and system programming is 
related to the protection mechanism of the Pentium processor. One purpose of protection is to 
prevent applications from interfering with the operating system. For this reason, certain 
registers and instructions are inaccessible to application programs. The features discussed in 
Part I are those which are accessible to applications; the features in Part II are available only to 
programs running with special privileges or programs running on systems where the protection 
mechanism is not used. 

The features available to application programs in protected mode and to programs in real-address 
and virtual-8086 mode are the same. These features are described in Part I of this 
book. The additional features available to system programs in protected mode are described in 
Part II. Part III describes virtual-8086 mode, how to mix 16-bit and 32-bit code, and 
compatibility considerations. 

Part IV provides general optimization techniques for programming on Intel x86 architectures. 
For information on obtaining optimization techniques for the Pentium processor, see 
Appendix H. 






1.1.1. Part I — Application and Numeric Programming 

This section presents the features used by most application programmers. It includes features 
used in numeric applications which are object-code compatible with features provided by the 
Intel486™ DX processor, and the Intel487™ SX, the Intel387™ DX, and the Intel387 SX math 
coprocessors used with the Intel486 SX, Intel386™ DX and Intel386 SX processors, 
respectively. 

Chapter 3 — Basic Programming Model: This chapter introduces the models of memory 
organization, defines the data types, presents the register set used by applications, introduces 
the stack, explains string operations, defines the parts of an instruction, explains address 
calculations, and introduces interrupts and exceptions as they apply to application 
programming. 

Chapter 4 — Application Programming: Chapter 4 surveys the integer instructions 
commonly used for application programming. Instructions are considered in functionally 
related groups; for example, string instructions are considered in one section, while 
control-transfer instructions are considered in another. The concepts behind the instructions are 
explained. Details of individual instructions are deferred until Part V, the instruction-set 
reference. 

Chapter 5 — Feature Determination: This chapter discusses how to determine the CPU type 
and the presence of a math coprocessor in order to determine what features are available to an 
application. A program example is provided. 

Chapter 6 — Numeric Applications: This chapter gives an overview of the floating-point unit 
and reviews the concepts of numerical computation. The "Architecture of the Floating-Point 
Unit" section presents the floating-point registers and data types available to both applications 
and systems programmers. The "Floating-Point Instructions" section of this chapter surveys the 
instructions commonly used for numeric processing. Details of individual instructions are 
deferred until Part V, the instruction-set reference. The "Numerics Applications" section 
describes the Pentium processor's floating-point arithmetic facilities and gives short 
programming examples in both assembly language and high-level languages. 

Chapter 7 — Special Computational Situations: This chapter discusses the special values that 
can be represented in the real formats of the Pentium processor — denormal numbers, zeros, 
infinities, NaNs (Not a Number) — as well as the numerical exceptions. 

Chapter 8 — Numeric Programming Examples: Chapter 8 provides detailed examples of 
assembly-language numeric programming with the Pentium processor, including conditional 
branching, conversion between floating-point values and their ASCII representations, and use 
of trigonometric functions. 



1.1.2. Part II — System Programming 

This section presents the features used by operating systems, device drivers, debuggers, and 
other software which support application programs. 

Chapter 9 — Real-Address Mode System Architecture: This chapter explains the 
real-address mode of the Pentium processor as it relates to the system programmer. In this mode, 
the Pentium processor appears as a fast real-mode Intel 286 or Intel386 processor or a fast 
8086 processor enhanced with additional instructions. 

Chapter 10 — Protected-Mode System Architecture Overview: Chapter 10 describes the 
features of the Pentium processor used by system programmers. System-oriented registers and 
data structures of the Pentium processor which are mentioned briefly in Part I are discussed in 
detail. The system-oriented instructions are introduced in the context of the registers and data 
structures they support. References are given to the chapters in which each register, data 
structure, and instruction is discussed in more detail. 

Chapter 11 — Protected Mode Memory Management: This chapter presents details of the 
data structures, registers, and instructions which support segmentation and paging and explains 
how system designers can choose between an unsegmented ("flat") model of memory 
organization and a model with segmentation. 

Chapter 12 — Protection: This chapter discusses protection as it applies to segments and 
pages. It explains the implementation of privilege rules, stack switching, pointer validation, 
and user and supervisor modes. The protection aspects of multitasking are deferred until the 
following chapter. 

Chapter 13 — Protected-Mode Multitasking: Chapter 13 explains how the hardware of the 
Pentium processor supports multitasking with context-switching operations and intertask 
protection. 

Chapter 14 — Protected-Mode Exceptions and Interrupts: This chapter explains the basic 
interrupt mechanisms of the Pentium processor, shows how interrupts and exceptions relate to 
protection, and discusses all possible exceptions, including floating-point exceptions, listing 
their causes and the information needed to handle and recover from each exception. 

Chapter 15 — Input/Output: Chapter 15 describes the I/O features of the Pentium processor, 
including I/O instructions, protection as it relates to I/O, and the I/O permission bit map. 

Chapter 16 — Initialization and Mode Switching: Chapter 16 defines the condition of the 
processor and floating-point unit after reset initialization. It explains how to set up registers, 
flags, and data structures. The steps necessary for switching between real-address and 
protected modes are also identified. 

Chapter 17 — Debugging: Chapter 17 discusses how to use the debugging registers and other 
debug features of the Pentium processor. 

Chapter 18 — Caching, Pipelining and Buffering: Chapter 18 explains the general concept of 
caching and the specific mechanisms used by the internal cache on the Pentium processor. It 
explains how the superscalar pipeline architecture of the Pentium processor and the Translation 
Lookaside Buffer (TLB) relate to the system programmer. 

Chapter 19 — Multiprocessing: Chapter 19 explains the instructions and flags which support 
multiple processors with shared memory. 

Chapter 20 — System Management Mode: This chapter explains the operation of SMM used 
to implement power management functions. Some possible customer differentiation features 
are mentioned. 






1.1.3. Part III — Compatibility 

This section explains the features of the architecture which support programs written for earlier 
Intel processors. Three execution modes have support for 16-bit programming: 16-bit 
operations can be performed in protected mode with or without using the operand-size prefix, 
programs written for the 8086 processor or the real mode of the Intel 286 processor can run in 
real mode on one of the 32-bit microprocessors, and a virtual machine monitor can be used to 
emulate real mode using virtual-8086 mode, even while multitasking with 32-bit programs. 

Chapter 21 — Mixing 16-Bit and 32-Bit Code: This chapter explains how to mix 16-bit and 
32-bit modules within the same program or task. Any particular module can use both 16-bit 
and 32-bit operands and addresses. 

Chapter 22 — Virtual-8086 Mode: Chapter 22 describes how to execute one or more 8086, 
8088, 80186 or 80188 programs in a Pentium processor protected-mode environment. 

Chapter 23 — Compatibility: This chapter explains the programming differences between the 
Intel 286, Intel386, and Intel486 processors. This chapter compares the floating-point unit of 
the Intel486 and Pentium processors with the arithmetic of the numerics coprocessors used 
with earlier Intel processors. 



1.1.4. Part IV — Optimization 

Chapter 24 discusses general optimization techniques for programming in the Intel x86 
architecture environment. For information on obtaining Pentium processor-specific 
optimization techniques, see Appendix H. 



1.1.5. Part V — Instruction Set 

Parts I, II and III present the general features of the instruction set as they relate to specific 
aspects of the architecture. Part V, Chapter 25, presents the instructions in alphabetical order, 
with detail needed by assembly language programmers and programmers of debuggers, 
compilers, operating systems, etc. Instruction descriptions include an algorithmic description 
of operations, effect on flag settings, effect of operand- and address-size attributes, and 
exceptions which may be generated. 

1.1.6. Appendices 

The appendices present tables of encodings and other details in a format designed for quick 
reference by programmers. 

1.2. RELATED LITERATURE 

The following books contain additional material related to Intel processors: 






• Pentium™ Processor Data Book, Order No. 241428 

• 82496 Cache Controller and 82491 Cache SRAM Data Book For Use With the Pentium™ 
Processor, Order No. 241429 

• Intel486™ Microprocessor Data Book, Order Number 240440 

• Intel486™ Processor Hardware Reference Manual, Order Number 240552 

• Intel486™ DX Processor Programmer's Reference Manual, Order Number 240486 

• Intel486™ SX CPU/Intel487™ SX Math CoProcessor Data Book, Order Number 240950 

• Intel486™ DX2 Microprocessor Data Book, Order Number 241245 

• Intel486™ Microprocessor Product Brief Book, Order Number 240459 

• Intel386™ Processor Hardware Reference Manual, Order Number 231732 

• Intel386™ DX Processor Programmer's Reference Manual, Order Number 230985 

• Intel386™ SX Processor Programmer's Reference Manual, Order Number 240331 

• Intel386™ Processor System Software Writer's Guide, Order Number 231499 

• Intel386™ High-Performance 32-Bit CHMOS Microprocessor with Integrated Memory 
Management, Order Number 231630 

• 376™ Embedded Processor Programmer's Reference Manual, Order Number 240314 

• 80387 DX User's Manual Programmer's Reference, Order Number 231917 

• 376™ High-Performance 32-Bit Embedded Processor, Order Number 240182 

• Intel386™ SX Microprocessor, Order Number 240187 

• Microprocessor and Peripheral Handbook (vol. 1), Order Number 230843 



1.3. NOTATIONAL CONVENTIONS 

This manual uses special notation for data-structure formats, for symbolic representation of 
instructions, and for hexadecimal numbers. A review of this notation makes the manual easier 
to read. 



1.3.1. Bit and Byte Order 

In illustrations of data structures in memory, smaller addresses appear toward the bottom of the 
figure; addresses increase toward the top. Bit positions are numbered from right to left. The 
numerical value of a set bit is equal to two raised to the power of the bit position. The Pentium 
processor is a "little endian" machine; this means the bytes of a word are numbered starting 
from the least significant byte. Figure 1-1 illustrates these conventions. 
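
As a rough illustration of these conventions (a sketch in Python, not taken from the processor documentation), the following packs a 32-bit value in little-endian order and checks that byte 0 holds the least significant byte and that a set bit at position n contributes two raised to the power n. The sample value 12345678H and all names are arbitrary assumptions chosen for the example.

```python
import struct

value = 0x12345678  # arbitrary 32-bit sample value (assumption)

# "<I" packs an unsigned 32-bit integer in little-endian byte order,
# so the least significant byte lands at the smallest address.
memory = struct.pack("<I", value)

# Byte 0 is at the smallest address, byte 3 at the greatest.
byte0, byte1, byte2, byte3 = memory

def bit_value(position):
    """Numerical value of a set bit at the given bit position."""
    return 1 << position  # two raised to the power of the bit position

# Rebuild the value from its set bits, numbered from bit 0 on the right.
rebuilt = sum(bit_value(n) for n in range(32) if (value >> n) & 1)
```

Here `byte0` is the low-order byte of the value and `rebuilt` equals the original value, which is the sense in which bytes and bits are numbered from the least significant end.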



[Figure: a data structure drawn with bit offsets 31, 23, 15 and 7 marked across the top, bytes labeled BYTE 3 to BYTE 0 from left to right, and byte offsets rising from the smallest address at the bottom to 28 at the greatest address at the top.] 

Figure 1-1. Bit and Byte Order 



1.3.2. Undefined Bits and Software Compatibility 

In many register and memory layout descriptions, certain bits are marked as reserved. When 
bits are marked as undefined or reserved, it is essential for compatibility with future processors 
that software treat these bits as having a future, though unknown, effect. The behavior of 
reserved bits should be regarded as not only undefined, but unpredictable. Software should 
follow these guidelines in dealing with reserved bits: 

• Do not depend on the states of any reserved bits when testing the values of registers which 
contain such bits. Mask out the reserved bits before testing. 

• Do not depend on the states of any reserved bits when storing to memory or to a register. 

• Do not depend on the ability to retain information written into any reserved bits. 

• When loading a register, always load the reserved bits with the values indicated in the 
documentation, if any, or reload them with values previously read from the same register. 

NOTE 

Depending upon the values of reserved register bits will make software 
dependent upon the unspecified manner in which the processor handles these 
bits. Depending upon reserved values risks incompatibility with future 
processors. AVOID ANY SOFTWARE DEPENDENCE UPON THE 
STATE OF RESERVED Pentium PROCESSOR REGISTER BITS. 
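
The guidelines above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the register layout (bits 0 to 7 defined, all other bits reserved) and the flag position are hypothetical and do not describe any actual Pentium processor register.

```python
# Hypothetical 32-bit register: only bits 0-7 are defined, the rest are reserved.
DEFINED_MASK = 0x000000FF
RESERVED_MASK = 0xFFFFFFFF & ~DEFINED_MASK
FLAG_BIT = 1 << 3  # hypothetical defined flag at bit 3

def flag_is_set(register_value):
    """Test a defined bit after masking out the reserved bits."""
    return (register_value & DEFINED_MASK & FLAG_BIT) != 0

def updated_value(previous_value, new_defined_bits):
    """Build the value to load: new defined bits, reserved bits as previously read."""
    return (previous_value & RESERVED_MASK) | (new_defined_bits & DEFINED_MASK)
```

The test never looks at reserved bits, and the update writes the reserved bits back exactly as they were read, so the code does not depend on how the processor treats them.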

1.3.3. Instruction Operands 

When instructions are represented symbolically, a subset of the assembly language for the 
Pentium processor is used. In this subset, an instruction has the following format: 

label: mnemonic argument1, argument2, argument3 



where: 

• A label is an identifier which is followed by a colon. 

• A mnemonic is a reserved name for a class of instruction opcodes which have the same 
function. 

• The operands argument1, argument2, and argument3 are optional. There may be from 
zero to three operands, depending on the opcode. When present, they take the form of 
either literals or identifiers for data items. Operand identifiers are either reserved names of 
registers or are assumed to be assigned to data items declared in another part of the 
program (which may not be shown in the example). 

When two operands are present in an arithmetic or logical instruction, the right operand is the 
source and the left operand is the destination. 

For example: 

LOADREG: MOV EAX, SUBTOTAL 

In this example LOADREG is a label, MOV is the mnemonic identifier of an opcode, EAX is 
the destination operand, and SUBTOTAL is the source operand. Some assembly languages put 
the source and destination in reverse order. 
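
To make the format concrete, here is a small Python sketch that splits a symbolic instruction into its label, mnemonic, and operands. The function name `parse_instruction` is an assumption made for this illustration; it handles only the simple form described above and is not an assembler.

```python
def parse_instruction(line):
    """Split 'label: mnemonic arg1, arg2, arg3' into (label, mnemonic, operands)."""
    label = None
    tokens = line.split(None, 1)
    # A label is an identifier followed by a colon at the start of the line.
    if tokens and tokens[0].endswith(":"):
        label = tokens[0][:-1]
        line = tokens[1] if len(tokens) > 1 else ""
    fields = line.split(None, 1)  # mnemonic first, then the operand list
    mnemonic = fields[0] if fields else None
    operands = []
    if len(fields) > 1:
        operands = [op.strip() for op in fields[1].split(",")]
    return label, mnemonic, operands

label, mnemonic, operands = parse_instruction("LOADREG: MOV EAX, SUBTOTAL")
destination, source = operands  # left operand is the destination, right is the source
```

For the example line, `destination` is EAX and `source` is SUBTOTAL, matching the operand order used throughout this manual.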



1.3.4. Hexadecimal Numbers 

Base 16 numbers are represented by a string of hexadecimal digits followed by the character 
H. A hexadecimal digit is a character from the set (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F). 
A leading zero is added if the number would otherwise begin with one of the digits A-F. For 
example, OFH is equivalent to the decimal number 15. 

Numbers are usually expressed in decimal notation (base 10). When hexadecimal (base 16) 
numbers are used, they are indicated by an 'H' suffix. For example 16 = 10H. 
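The notation can be made concrete with a short Python sketch. It is illustrative only: the helper names `parse_intel_hex` and `format_intel_hex` are hypothetical and are not part of any Intel tool.

```python
def parse_intel_hex(text):
    """Convert a string such as '0FH' or '10H' to an integer."""
    if not text.upper().endswith("H"):
        raise ValueError("expected a trailing 'H'")
    return int(text[:-1], 16)


def format_intel_hex(value):
    """Format a non-negative integer in the H-suffix notation."""
    digits = format(value, "X")
    # A leading zero keeps the number from beginning with A-F.
    if digits[0] in "ABCDEF":
        digits = "0" + digits
    return digits + "H"
```

Under these assumptions, `parse_intel_hex("10H")` yields 16 and `format_intel_hex(15)` yields the string `0FH`.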



1.3.5. Segmented Addressing 

The processor uses byte addressing. This means memory is organized and accessed as a 
sequence of bytes. Whether one or more bytes are being accessed, a byte number is used to 
address memory. The memory which can be addressed with this number is called an address 
space. 

The processor also supports segmented addressing. This is a form of addressing where a 
program may have many independent address spaces, called segments. For example, a program 
can keep its code (instructions) and stack in separate segments. Code addresses would always 
refer to the code space, and stack addresses would always refer to the stack space. An example 
of the notation used to show segmented addresses is shown below. 

CS:EIP 

This example refers to a byte within the code segment. The byte number is held in the EIP 
register. CS identifies the code segment. 






1.3.6. Exceptions 

An exception is an event which typically occurs when an instruction causes an error. For 
example, an attempt to divide by zero generates an exception. However, some exceptions, such 
as breakpoints, occur under other conditions. Some types of exceptions may provide error 
codes. An error code reports additional information about the error. Error codes are produced 
only for some exceptions. An example of the notation used to show an exception and error 
code is shown below. 

#PF(fault code) 

This example refers to a page-fault exception under conditions where an error code naming a 
type of fault is reported. Under some conditions, exceptions which produce error codes may 
not be able to report an accurate code. In this case, the error code is zero, as shown below. 

#GP(0) 

See Chapter 14, Protected-Mode Exceptions and Interrupts, for a list of exception mnemonics 
and their description. 




CHAPTER 2 
INTRODUCTION TO THE INTEL PENTIUM PROCESSOR FAMILY 



In 1985, Intel introduced the first in a line of 32-bit microprocessors compatible with the 
already broad base of existing x86 software. That was the Intel386 microprocessor. The Intel 
32-bit architecture has since grown to become the standard for cost-effective, high 
performance computing with an installed base of over 40 million units. Intel has continued to 
evolve and improve the basic implementation by incorporating the most advanced computer 
design and silicon technology. The Intel Pentium family is the most recent product of that 
effort. 

The Intel Pentium processor, like its predecessor the Intel486 microprocessor, is 100% binary 
software compatible with the installed base of over 100 million compatible Intel x86 systems. 
In addition, the Intel Pentium processor provides new levels of performance to new and 
existing software through a reimplementation of the Intel 32-bit instruction set architecture 
using the latest, most advanced, design techniques. Optimized, dual execution units provide 
one-clock execution for "core" instructions, while advanced technology, such as superscalar 
architecture, branch prediction, and execution pipelining, enables multiple instructions to 
execute in parallel with high efficiency. Separate code and data caches combined with wide 
128-bit and 256-bit internal data paths and a 64-bit, burstable, external bus allow these 
performance levels to be sustained in cost-effective systems. The application of this advanced 
technology in the Intel Pentium processor brings "state of the art" performance and capability 
to existing Intel x86 software as well as new and advanced applications. 

The Pentium processor has two primary operating modes and a "system management mode". 
The operating mode determines which instructions and architectural features are accessible. 
These modes are: 

• Protected Mode 

This is the native state of the microprocessor. In this mode all instructions and 
architectural features are available, providing the highest performance and capability. This 
is the recommended mode that all new applications and operating systems should target. 

Among the capabilities of protected mode is the ability to directly execute "real-address 
mode" 8086 software in a protected, multi-tasking environment. This feature is known as 
Virtual-8086 "mode" (or "V86 mode"). Virtual-8086 "mode", however, is not actually a 
processor "mode"; it is in fact an attribute which can be enabled for any task (with 
appropriate software) while in protected mode. 

• Real-Address Mode (also called "real mode") 

This mode provides the programming environment of the Intel 8086 processor, with a few 
extensions (such as the ability to break out of this mode). Reset initialization places the 
processor in real mode where, with a single instruction, it can switch to protected mode. 






• System Management Mode 

The Pentium microprocessor also provides support for System Management Mode 
(SMM). SMM is a standard architectural feature unique to all new Intel microprocessors, 
beginning with the Intel386 SL processor, which provides an operating-system and 
application independent and transparent mechanism to implement system power 
management and OEM differentiation features. SMM is entered through activation of an 
external interrupt pin (SMI#), which switches the CPU to a separate address space while 
saving the entire context of the CPU. SMM-specific code may then be executed 
transparently. The operation is reversed upon returning. 






Part I 

Application and Numeric Processing 






CHAPTER 3 
BASIC PROGRAMMING MODEL 



This chapter describes the application programming environment (except for the floating-point 
features) as seen by assembly-language programmers. The chapter introduces the architectural 
features which directly affect the design and implementation of application programs. 
Floating-point applications are described separately in Chapter 6. 

The basic programming model consists of these parts: 

• Memory organization 

• Data types 

• Registers 

• Instruction format 

• Operand selection 

• Interrupts and exceptions 

Note that input/output is not included as part of the basic programming model. System 
designers can choose to make I/O instructions available to applications or can choose to 
reserve these functions for the operating system. For this reason, the I/O features are discussed 
in Chapter 9 and Chapter 15. 

This chapter contains a section for each feature of the architecture normally visible to 
applications. 



3.1. MEMORY ORGANIZATION 

The memory on the bus of a Pentium processor is called physical memory. It is organized as a 
sequence of 8-bit bytes. Each byte is assigned a unique address, called a physical address, 
which ranges from zero to a maximum of 2^32 - 1 (4 gigabytes). 

Memory management is a hardware mechanism for making reliable and efficient use of 
memory. When memory management is used, programs do not directly address physical 
memory. Programs address a memory model, called virtual memory. 

Memory management consists of segmentation and paging. Segmentation is a mechanism for 
providing multiple, independent address spaces. Paging is a mechanism to support a model of a 
large address space in RAM using a small amount of RAM and some disk storage. Either or 
both of these mechanisms can be used. An address issued by a program is a logical address. 
Segmentation hardware translates a logical address into an address for a continuous, 
unsegmented address space, called a linear address. Paging hardware translates a linear 
address into a physical address. 

Memory can appear as a single, "flat" address space like physical memory. Or, it can appear as 
one or more independent memory spaces, called segments. Segments can be assigned 
specifically for holding a program's code (instructions), data, or stack. In fact, a single program 






can have up to 16,383 segments of different sizes and kinds. Segments can be used to increase 
the reliability of programs and systems. For example, a program's stack can be put into a 
different segment than its code to prevent the stack from growing into the code space and 
overwriting instructions with data. Each segment defines a module. 

Both the flat and segmented models can provide memory protection. Models intermediate 
between these extremes also can be chosen. The reasons for choosing a particular memory 
model and the manner in which system programmers implement a model are discussed in 
Chapter 11. 

Whether or not multiple segments are used, logical addresses are translated into linear 
addresses by treating the address as an offset into a segment. Each segment has a segment 
descriptor, which holds its base address and size limit. If the offset does not exceed the limit, 
and no other condition exists which would prevent reading the segment, the offset and base 
address are added together to form the linear address. 
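The translation just described can be sketched in Python. This is a minimal model under stated assumptions, not the processor's actual logic: `logical_to_linear` and `SegmentFault` are hypothetical names, the limit is taken to be the highest valid offset, and the other access checks mentioned above are omitted.

```python
class SegmentFault(Exception):
    """Stands in for the exception raised on a limit violation."""


def logical_to_linear(base, limit, offset):
    """Translate an offset within a segment into a linear address.

    base and limit model the two descriptor fields named in the text;
    limit is assumed here to be the highest valid offset.
    """
    if offset > limit:
        raise SegmentFault("offset %#x exceeds limit %#x" % (offset, limit))
    # Linear addresses are 32 bits wide, so the sum is kept to 32 bits.
    return (base + offset) & 0xFFFFFFFF
```

In a flat model every segment would use a base of zero, so the linear address would equal the offset.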

The linear address produced by segmentation is used directly as the physical address if bit 31 
of the CR0 register is clear (the CR0 register is discussed in Chapter 10). This register bit 
controls whether paging is used or not used. If the bit is set, the paging hardware is used to 
translate the linear address into the physical address. 

The paging hardware gives another level of organization to memory. It breaks the linear 
address space into fixed blocks called pages. The logical address space is mapped into the 
linear address space, which is mapped into some number of pages. A page can be in memory 
or on disk. When a logical address is issued, it is translated into an address for a page in 
memory, or an exception is issued. An exception gives the operating system a chance to read 
the page from disk and update the page mapping. The program which generated the exception 
then can be restarted without generating an exception. 

If multiple segments are used, they are part of the programming environment seen by 
application programmers. Paging, however, is invisible to the application programmer and is 
not discussed in this chapter. See Chapter 11 for details on this subject. 



3.1.1. Unsegmented or "Flat" Model 

The simplest memory model is the flat model. Although there isn't a mode bit or control 
register which turns off the segmentation mechanism, the same effect can be achieved by 
mapping all segments to the same linear addresses. This will cause all memory operations to 
refer to the same memory space. 

In a flat model, segments can cover the entire range of physical addresses, or they can cover 
only those addresses which are mapped to physical memory. The advantage of the smaller 
address space is it provides a minimum level of hardware protection against software bugs; an 
exception will occur if any logical address refers to an address for which no memory exists. 



3.1.2. Segmented Model 

In a segmented model of memory organization, the logical address space consists of as many 
as 16,383 segments of up to 4 gigabytes each, or a total as large as 2^46 bytes (64 terabytes). 
The processor maps this 64 terabyte logical address space onto the physical address space by 
the address translation mechanism described in Chapter 11. Application programmers can 






ignore the details of this mapping. The advantage of the segmented model is that offsets within 
each address space are separately checked and access to each segment can be individually 
controlled. 

A pointer into a segmented address space consists of two parts (see Figure 3-1). 

1. A segment selector, which is a 16-bit field which identifies a segment. 

2. An offset, which is a 32-bit byte address within a segment. 
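The two-part pointer can be modeled in Python as a single 48-bit value. This is a sketch for illustration: the function names are hypothetical, and placing the selector in the high-order 16 bits is a choice made here rather than something stated in this section.

```python
def make_far_pointer(selector, offset):
    """Pack a 16-bit selector and a 32-bit offset into one 48-bit integer."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("selector must fit in 16 bits")
    if not 0 <= offset <= 0xFFFFFFFF:
        raise ValueError("offset must fit in 32 bits")
    # Assumed layout: selector in bits 47-32, offset in bits 31-0.
    return (selector << 32) | offset


def split_far_pointer(pointer):
    """Return (selector, offset) from a 48-bit pointer value."""
    return (pointer >> 32) & 0xFFFF, pointer & 0xFFFFFFFF
```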



Figure 3-1. Segmented Addressing (a 16-bit segment selector and a 32-bit offset within the 
segment together locate an operand) 

The processor uses the segment selector to find the linear address of the beginning of the 
segment, called the base address. Programs access memory using fixed offsets from this base 
address, so an object-code module can be loaded into memory and run without changing the 
addresses it uses (dynamic linking). The size of a segment is defined by the programmer, so a 
segment can be exactly the size of the module it contains. 



3.2. DATA TYPES 

Bytes, words, doublewords, and quadwords are the principal data types (see Figure 3-2). A 
byte is eight bits. The bits are numbered 0 through 7, bit 0 being the least significant bit (LSB). 



Figure 3-2. Fundamental Data Types (a byte at address N; a word with its low byte at address N 
and its high byte at N+1; a doubleword with its low word at N and its high word at N+2, 
occupying addresses N through N+3; a quadword with its low doubleword at N and its high 
doubleword at N+4, occupying addresses N through N+7) 



A word is two bytes occupying any two consecutive addresses. A word contains 16 bits. The 
bits of a word are numbered from 0 through 15, bit 0 again being the least significant bit. The 
byte containing bits 0-7 of the word is called the low byte; the byte containing bits 8-15 is 
called the high byte. The low byte is stored in the byte with the lower address. The address of 
the low byte also is the address of the word. The address of the high byte is used only when the 
upper half of the word is being accessed separately from the lower half. 

A doubleword is four bytes occupying any four consecutive addresses. A doubleword contains 
32 bits. The bits of a doubleword are numbered from 0 through 31, bit 0 again being the least 
significant bit. The word containing bits 0-15 of the doubleword is called the low word; the 
word containing bits 16-31 is called the high word. The low word is stored in the two bytes 
with the lower addresses. The address of the lowest byte is the address of the doubleword. The 
higher addresses are used only when the upper word is being accessed separately from the 
lower word, or when individual bytes are being accessed. 

A quadword is eight bytes occupying any eight consecutive addresses. A quadword contains 
64 bits. The bits of a quadword are numbered from 0 to 63 with bit 0 being the least significant 
bit. The doubleword containing bits 0-31 is called the low doubleword and the doubleword 
containing bits 32-63 is called the high doubleword. The low doubleword is stored in the four 
bytes with the lower addresses. The higher addresses are used only when the upper 
doubleword is being accessed separately from the lower doubleword, or when individual bytes 
are being accessed. Figure 3-3 illustrates the arrangement of bytes within words, doublewords 
and quadwords. 



Address   Contents 
0EH       (not shown) 
0DH       7A 
0CH       FE 
0BH       06 
0AH       36 
9H        1F 
8H        A4 
7H        23 
6H        0B 
5H        (not shown) 
4H        (not shown) 
3H        74 
2H        CB 
1H        31 
0H        (not shown) 

Doubleword at address 0AH contains 7AFE0636H 
Word at address 0BH contains FE06H 
Byte at address 9 contains 1FH 
Quadword at address 6 contains 7AFE06361FA4230BH 
Word at address 6 contains 230BH 
Word at address 2 contains 74CBH 
Word at address 1 contains CB31H 



Figure 3-3. Bytes, Words, Doublewords and Quadwords in Memory 
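The byte ordering shown in Figure 3-3 can be checked with a small Python model. It is a sketch only: `memory` and `read` are hypothetical names, and bytes whose contents the figure does not show are assumed to be zero.

```python
# Byte contents copied from Figure 3-3; addresses the figure leaves
# blank are assumed to hold zero in this model.
memory = bytearray(16)
memory[0x1:0x4] = bytes([0x31, 0xCB, 0x74])
memory[0x6:0xE] = bytes([0x0B, 0x23, 0xA4, 0x1F, 0x36, 0x06, 0xFE, 0x7A])


def read(address, size):
    """Read a size-byte operand whose low byte is stored at address."""
    return int.from_bytes(memory[address:address + size], "little")
```

Because the low byte is stored at the lower address, `read(0x1, 2)` gives CB31H and `read(0xA, 4)` gives 7AFE0636H, matching the figure. The overlapping words at addresses 1 and 2 share the byte at address 2.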



Note that words do not need to be aligned at even-numbered addresses, doublewords do not 
need to be aligned at addresses evenly divisible by four, and quadwords do not need to be 
aligned at addresses evenly divisible by eight. This allows maximum flexibility in data 
structures (e.g., records containing mixed byte, word, and doubleword items) and efficiency in 
memory utilization. Because the Pentium processor has a 64-bit data bus, communication 
between processor and memory takes place as byte, word, doubleword and quadword transfers. 
Data can be accessed at any byte boundary, but multiple cycles can be required for unaligned 
transfers. The Pentium processor considers a 2-byte or 4-byte operand that crosses a 4-byte 
boundary and an 8-byte operand that crosses an 8-byte boundary to be misaligned. For 
maximum performance, data structures (especially stacks) should be designed so, whenever 
possible, word operands are aligned to even addresses, doubleword operands are aligned to 
addresses evenly divisible by four, and quadwords are aligned to addresses evenly divisible by 
eight. 
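The misalignment rule stated above can be expressed as a short Python check. This is a sketch of that one rule only; the function name `is_misaligned` is hypothetical, and no claim is made about how many bus cycles a given access takes.

```python
def is_misaligned(address, size):
    """Apply the rule above: a 2- or 4-byte operand is misaligned if it
    crosses a 4-byte boundary, an 8-byte operand if it crosses an
    8-byte boundary."""
    if size in (2, 4):
        boundary = 4
    elif size == 8:
        boundary = 8
    else:
        raise ValueError("rule is stated only for 2-, 4- and 8-byte operands")
    # The operand crosses a boundary when its first and last bytes
    # fall in different boundary-sized blocks.
    first_block = address // boundary
    last_block = (address + size - 1) // boundary
    return first_block != last_block
```

Note that under this rule a word at address 1 is not counted as misaligned, because both of its bytes lie within one 4-byte block, whereas a word at address 3 is.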

Although bytes, words, and doublewords are the fundamental types of operands, the processor 
also supports additional interpretations of these operands. Specialized instructions recognize 
the following data types (shown in Figure 3-4): 

• Integer: A signed binary number held in a 32-bit doubleword, 16-bit word, or 8-bit byte. 






All operations assume a two's complement representation. The sign bit is located in bit 7 
in a byte, bit 15 in a word, and bit 31 in a doubleword. The sign bit is set for negative 
integers, clear for positive integers and zero. The value of an 8-bit integer is from -128 to 
+127; a 16-bit integer from -32,768 to +32,767; a 32-bit integer from -2^31 to +2^31 - 1. 

• Ordinal: An unsigned binary number contained in a 32-bit doubleword, 16-bit word, or 
8-bit byte. The value of an 8-bit ordinal is from 0 to 255; a 16-bit ordinal from 0 to 65,535; a 
32-bit ordinal from 0 to 2^32 - 1. This is sometimes referred to as an unsigned integer. 

• BCD Integer: A representation of a binary-coded decimal (BCD) digit in the range 0 
through 9. Unpacked decimal numbers are stored as unsigned byte quantities. One digit is 
stored in each byte. The magnitude of the number is the binary value of the low-order 
half-byte; values 0 to 9 are valid and are interpreted as the value of a digit. The high-order 
half-byte must be zero during multiplication and division; it can contain any value during 
addition and subtraction. 

• Packed BCD Integer: A representation of binary-coded decimal digits, each in the range 
0 to 9. One digit is stored in each half-byte, two digits in each byte. The digit in bits 4 to 7 is 
more significant than the digit in bits 0 to 3. Values 0 to 9 are valid for a digit. 

• Near Pointer: A 32-bit effective address. A near pointer is an offset within a segment. 
Near pointers are used for all pointers in a flat memory model, or for references within a 
segment in a segmented model. 

• Far Pointer: A 48-bit logical address consisting of a 16-bit segment selector and a 32-bit 
offset. Far pointers are used in a segmented memory model to access other segments. 

• Bit field: A contiguous sequence of bits. A bit field can begin at any bit position of any 
byte and can contain up to 32 bits. 

• Bit string: A contiguous sequence of bits. A bit string can begin at any bit position of any 
byte and can contain up to 2^32 - 1 bits. 

• Byte String: A contiguous sequence of bytes, words, or doublewords. A string can contain 
from zero to 2^32 - 1 bytes (4 gigabytes). 

• Floating-Point Types: For a discussion of the data types used by floating-point 
instructions, see Chapter 6. 
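The difference between the integer and ordinal interpretations of the same bit pattern can be illustrated in Python. The helper names `as_ordinal` and `as_integer` are hypothetical and exist only for this sketch.

```python
def as_ordinal(value, bits):
    """Interpret the low `bits` bits of value as an unsigned number."""
    return value & ((1 << bits) - 1)


def as_integer(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    raw = as_ordinal(value, bits)
    sign_bit = 1 << (bits - 1)
    # When the sign bit is set, the value represented is raw - 2**bits.
    return raw - (1 << bits) if raw & sign_bit else raw
```

For example, the byte FFH is 255 as an ordinal and -1 as an integer, while 80H is 128 as an ordinal and -128 as an integer.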



BYTE INTEGER          7-bit magnitude, 1-bit sign 
WORD INTEGER          15-bit magnitude, 1-bit sign 
DOUBLEWORD INTEGER    31-bit magnitude, 1-bit sign 
BYTE ORDINAL          8-bit magnitude 
WORD ORDINAL          16-bit magnitude 
DOUBLEWORD ORDINAL    32-bit magnitude 
BCD INTEGER           4-bit digit per byte 
PACKED BCD INTEGER    4-bit digit per half-byte 
NEAR POINTER          32-bit offset 
FAR POINTER           32-bit offset, 16-bit selector 
BIT FIELD             up to 32 bits 
BIT STRING            up to 4 gigabits 
BYTE STRING           up to 4 gigabytes 



Figure 3-4. Data Types 
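The packed BCD layout described earlier (two decimal digits per byte, the more significant digit in bits 4 to 7) can be sketched in Python. The function names are hypothetical, and returning the least significant byte first is an illustrative choice, not something this section specifies.

```python
def pack_bcd(number):
    """Encode a non-negative integer as packed BCD, two digits per byte.

    Bytes are returned least significant first (a choice made for
    this sketch).
    """
    if number < 0:
        raise ValueError("only non-negative numbers are handled here")
    digits = [int(d) for d in str(number)]
    if len(digits) % 2:
        digits.insert(0, 0)  # pad to an even number of digits
    out = bytearray()
    # Walk the digit pairs from the least significant end.
    for i in range(len(digits) - 2, -1, -2):
        high, low = digits[i], digits[i + 1]
        out.append((high << 4) | low)
    return bytes(out)


def unpack_bcd(data):
    """Decode packed BCD bytes produced by pack_bcd."""
    value = 0
    for byte in reversed(data):
        high, low = byte >> 4, byte & 0x0F
        if high > 9 or low > 9:
            raise ValueError("half-byte is not a valid BCD digit")
        value = value * 100 + high * 10 + low
    return value
```

With these definitions the decimal number 1234 is held in the two bytes 34H and 12H.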






3.3. REGISTERS 

The processor contains sixteen registers which can be used by an application programmer. As 
Figure 3-5 shows, these registers can be grouped as: 

1. General registers. These eight 32-bit registers are free for use by the programmer. 

2. Segment registers. These registers hold segment selectors associated with different forms 
of memory access. For example, there are separate segment registers for access to code 
and stack space. These six registers determine, at any given time, which segments of 
memory are currently available. 

3. Status and control registers. These registers report and allow modification of the state of 
the processor. 

3.3.1. General Registers 

The general registers are the 32-bit registers EAX, EBX, ECX, EDX, EBP, ESP, ESI, and EDI. 
These registers hold operands for logical and arithmetic operations. They also can hold 
operands for address calculations (except the ESP register cannot be used as an index 
operand). The names of these registers are derived from the names of the general registers on 
the 8086 processor, the AX, BX, CX, DX, BP, SP, SI, and DI registers. As Table 3-1 shows, 
the low 16 bits of the general registers can be referenced using these names. 

Each byte of the 16-bit registers AX, BX, CX, and DX also has another name. The byte 
registers are named AH, BH, CH, and DH (high bytes) and AL, BL, CL, and DL (low bytes). 
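The relationship between the 32-bit, 16-bit, and 8-bit register names can be modeled with bit masks. The following Python sketch treats EAX as a plain integer; `sub_registers` and `write_al` are hypothetical helper names.

```python
def sub_registers(eax):
    """Return the AX, AH and AL views of a 32-bit EAX value."""
    ax = eax & 0xFFFF            # low 16 bits of EAX
    return {
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,  # high byte of AX
        "AL": ax & 0xFF,         # low byte of AX
    }


def write_al(eax, value):
    """Replace AL only, leaving the other 24 bits of EAX unchanged."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)
```

If EAX holds 12345678H, then AX is 5678H, AH is 56H, and AL is 78H.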

All of the general-purpose registers are available for address calculations and for the results of 
most arithmetic and logical operations; however, a few instructions assign specific registers to 
hold operands. For example, string instructions use the contents of the ECX, ESI, and EDI 
registers as operands. By assigning specific registers for these functions, the instruction set can 
be encoded more compactly. The instructions that use specific registers include: double- 
precision multiply and divide, I/O, strings, translate, loop, variable shift and rotate, and stack 
operations. 



Figure 3-5. Application Register Set (general registers: the 32-bit EAX, EDX, ECX, EBX, EBP, 
ESI, EDI, and ESP, whose low 16 bits are AX, DX, CX, BX, BP, SI, DI, and SP, with AX, DX, CX, 
and BX further divided into the byte registers AH/AL, DH/DL, CH/CL, and BH/BL; segment 
registers: the 16-bit CS, SS, DS, ES, FS, and GS; status and control registers: the 32-bit 
EFLAGS and EIP) 



Table 3-1. Register Names 

8-Bit    16-Bit    32-Bit 
AL       AX        EAX 
AH 
BL       BX        EBX 
BH 
CL       CX        ECX 
CH 
DL       DX        EDX 
DH 
         SI        ESI 
         DI        EDI 
         BP        EBP 
         SP        ESP 



3.3.2. Segment Registers 

Segmentation gives system designers the flexibility to choose among various models of 
memory organization. Implementation of memory models is the subject of Chapter 11. 

The segment registers contain 16-bit segment selectors, which index into tables in memory. 
The tables hold the base address for each segment, as well as other information regarding 
memory access. An unsegmented model is created by mapping each segment to the same place 
in physical memory, as shown in Figure 3-6. 

At any instant, up to six segments of memory are immediately available. The segment registers 
CS, DS, SS, ES, FS, and GS hold the segment selectors for these six segments. Each register is 
associated with a particular kind of memory access (code, data, or stack). Each register 
specifies a segment, from among the segments used by the program (see Figure 3-7). Other 
segments can be used by loading their segment selectors into the segment registers. 

The segment containing the instructions being executed is called the code segment. Its segment 
selector is held in the CS register. The processor fetches instructions from the code segment, 
using the contents of the EIP register as an offset into the segment. The CS register is loaded as 
the result of interrupts, exceptions, and instructions which transfer control between segments 
(e.g., the CALL, RET and JMP instructions). 

Before a procedure is called, a region of memory needs to be allocated for a stack. The stack 
holds the return address, parameters passed by the calling routine, and temporary variables 
allocated by the procedure. All stack operations use the SS register to find the stack segment. 
Unlike the CS register, the SS register can be loaded explicitly, which permits application 
programs to set up stacks. 



Figure 3-6. An Unsegmented Memory (different logical segments all map to one physical 
address space) 

Figure 3-7. A Segmented Memory (the CS, SS, DS, ES, FS, and GS registers select a code 
segment, a stack segment, and four data segments, each occupying a different address space 
in physical memory) 






The DS, ES, FS, and GS registers allow as many as four data segments to be available 
simultaneously. Four data segments give efficient and secure access to different types of data 
structures. For example, separate data segments can be created for the data structures of the 
current module, data exported from a higher-level module, a dynamically-created data 
structure, and data shared with another program. If a bug causes a program to run wild, the 
segmentation mechanism can limit the damage to only those segments allocated to the 
program. 

Depending on the structure of data (i.e., the way data is partitioned into segments), a program 
can require access to more than four data segments. To access additional segments, the DS, ES, 
FS, and GS registers can be loaded by an application program during execution. The only 
requirement is to load the appropriate segment register before accessing data in its segment. 

A base address is kept for each segment. To address data within a segment, a 32-bit offset is 
added to the segment's base address. Once a segment is selected (by loading the segment 
selector into a segment register), an instruction only needs to specify the offset. An operand 
within a data segment is addressed by specifying its offset either in an instruction or a general 
register. Simple rules define which segment register is used to form an address when only an 
offset is specified. 



3.3.3. Stack Implementation 

Stack operations are supported by three registers: 

1. Stack Segment (SS) Register. Stacks reside in memory. The number of stacks in a system 
is limited only by the maximum number of segments. A stack can be up to 4 gigabytes 
long, the maximum size of a segment. One stack is available at a time — the stack whose 
segment selector is held in the SS register. This is the current stack, often referred to 
simply as "the" stack. The SS register is used automatically by the processor for all stack 
operations. 

2. Stack Pointer (ESP) Register. The ESP register holds an offset to the top-of-stack (TOS) 
in the current stack segment. It is used by PUSH and POP operations, subroutine calls and 
returns, exceptions, and interrupts. When an item is pushed onto the stack (see Figure 3-8), 
the processor decrements the ESP register, then writes the item at the new TOS. When an 
item is popped off the stack, the processor copies it from the TOS, then increments the 
ESP register. In other words, the stack grows down in memory toward lesser addresses. 

3. Stack-Frame Base Pointer (EBP) Register. The EBP register typically is used to access 
data structures passed on the stack. For example, on entering a subroutine the stack 
contains the return address and some number of data structures passed to the subroutine. 
The subroutine adds to the stack whenever it needs to create space for temporary local 
variables. As a result, the stack pointer gets incremented and decremented as temporary 
variables are pushed and popped. If the stack pointer is copied into the base pointer before 
anything is pushed on the stack, the base pointer can be used to reference data structures 
with fixed offsets. If this is not done, the offset to access a particular data structure would 
change whenever a temporary variable is allocated or de-allocated. 

When the EBP register is used to address memory, the current stack segment is referenced 
(i.e., the SS segment). Because the stack segment does not have to be specified, instruction 




encoding is more compact. The EBP register also can be used to address other segments. 

Instructions, such as the ENTER and LEAVE instructions, are provided which 
automatically set up the EBP register for convenient access to variables. 
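The push and pop behavior described for the ESP register can be modeled in a few lines of Python. This is a toy model under stated assumptions: the `Stack` class is hypothetical, only doubleword (4-byte) items are handled, and the stack segment is represented by a small byte array.

```python
class Stack:
    """Toy model of a stack segment addressed by ESP."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.esp = size  # bottom of stack: the initial ESP value

    def push(self, value):
        if self.esp < 4:
            raise OverflowError("no room left in this model's stack segment")
        # Decrement ESP first, then write the item at the new top of stack.
        self.esp -= 4
        self.memory[self.esp:self.esp + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

    def pop(self):
        # Copy the item from the top of stack, then increment ESP.
        value = int.from_bytes(self.memory[self.esp:self.esp + 4], "little")
        self.esp += 4
        return value
```

Copying `esp` into a separate variable before pushing temporaries plays the role of the EBP register here: offsets from the saved value stay fixed while `esp` moves.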



Figure 3-8. Stacks (within the stack segment, EBP marks the variables passed to the 
subroutine and ESP marks the top of stack; the bottom of stack is the initial ESP value; 
pushes put the top of stack at lower addresses and pops put the top of stack at higher 
addresses) 



3.3.4. Flags Register 

Condition codes (e.g., carry, sign, overflow) and mode bits are kept in a 32-bit register named 
EFLAGS. Figure 3-9 defines the bits within this register. 

The flags control certain operations and indicate the status of the Pentium processor. Besides 
status and control flag bits, the flag register also contains system flags. See Chapter 10 for a 
description of the system and control flags. 



3.3.4.1. STATUS FLAGS 

The status flags of the EFLAGS register report the kind of result produced from the execution 
of arithmetic instructions, such as ADD, SUB, MUL, and DIV. The MOV instruction does not 
affect these flags. Conditional jumps and subroutine calls allow a program to sense the state of 
the status flags and respond to them. For example, when the counter controlling a loop is 
decremented to zero, the state of the ZF flag changes, and this change can be used to suppress 
the conditional jump to the start of the loop. The status flags are shown in Table 3-2. 
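How the status flags summarize a result can be illustrated for an 8-bit addition. The Python sketch below is a model written for this purpose, not a description of the processor's circuitry; `add8` is a hypothetical name, and the conditions follow the definitions in Table 3-2.

```python
def add8(a, b):
    """Add two 8-bit values and return (result, flags) for the sum."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        # Carry out of the most significant bit of the result.
        "CF": total > 0xFF,
        # Even number of set bits in the low byte of the result.
        "PF": bin(result).count("1") % 2 == 0,
        # Carry out of bit position 3.
        "AF": (a & 0x0F) + (b & 0x0F) > 0x0F,
        "ZF": result == 0,
        "SF": bool(result & 0x80),
        # Signed overflow: operands share a sign that the result lacks.
        "OF": bool(~(a ^ b) & (a ^ result) & 0x80),
    }
    return result, flags
```

Adding 7FH and 01H gives 80H with OF and SF set, because the signed result exceeds +127. Adding FFH and 01H gives 00H with CF and ZF set.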



Bit(s)    Flag                               Type 
21        ID Flag (ID)                       X 
20        Virtual Interrupt Pending (VIP)    X 
19        Virtual Interrupt Flag (VIF)       X 
18        Alignment Check (AC)               X 
17        Virtual 8086 Mode (VM)             X 
16        Resume Flag (RF)                   X 
14        Nested Task (NT)                   X 
13-12     I/O Privilege Level (IOPL)         X 
11        Overflow Flag (OF)                 S 
10        Direction Flag (DF)                C 
9         Interrupt Enable Flag (IF)         X 
8         Trap Flag (TF)                     X 
7         Sign Flag (SF)                     S 
6         Zero Flag (ZF)                     S 
4         Auxiliary Carry Flag (AF)          S 
2         Parity Flag (PF)                   S 
0         Carry Flag (CF)                    S 

S indicates a status flag 
C indicates a control flag 
X indicates a system flag 

Bit positions shown as 0 or 1 are Intel reserved. Do not use. Always set them to the value 
previously read. 

Figure 3-9. EFLAGS Register 



Table 3-2. Status Flags

Name   Purpose           Condition Reported
OF     overflow          Result exceeds positive or negative limit of number range
SF     sign              Result is negative (less than zero)
ZF     zero              Result is zero
AF     auxiliary carry   Carry out of bit position 3 (used for BCD)
PF     parity            Low byte of result has even parity (even number of set bits)
CF     carry             Carry out of most significant bit of result






3.3.4.2. CONTROL FLAG

The control flag DF of the EFLAGS register controls string instructions. 
DF (Direction Flag, bit 10) 

Setting the DF flag causes string instructions to auto-decrement, that is, to process strings from 
high addresses to low addresses. Clearing the DF flag causes string instructions to auto- 
increment, or to process strings from low addresses to high addresses. 
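The effect of the DF flag on a repeated byte move can be modeled as follows. This is a sketch on a flat Python `bytearray`, assuming no segmentation and no 32-bit wraparound; the function name `movsb_rep` is made up for this example. One practical consequence, not spelled out in the text, is that the direction matters when source and destination overlap.

```python
def movsb_rep(memory: bytearray, esi: int, edi: int, ecx: int, df: int):
    """Model REP MOVSB on a flat byte array; return the final (ESI, EDI)."""
    step = -1 if df else 1    # DF set: auto-decrement; DF clear: auto-increment
    while ecx:
        memory[edi] = memory[esi]   # copy one byte
        esi += step
        edi += step
        ecx -= 1
    return esi, edi


# Move "ABCDE" two bytes to the right within the same buffer.
forward = bytearray(b"-ABCDE---")
movsb_rep(forward, 1, 3, 5, df=0)    # low to high: source is overwritten early

backward = bytearray(b"-ABCDE---")
movsb_rep(backward, 5, 7, 5, df=1)   # high to low: the whole string survives
```

With DF set, the pointers must start at the last element of each string rather than the first, as the second call shows.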

3.3.5. Instruction Pointer 

The instruction pointer (EIP) register contains the offset in the current code segment for the 
next instruction to execute. The instruction pointer is not directly available to the programmer; 
it is controlled implicitly by control-transfer instructions (jumps, returns, etc.), interrupts, and 
exceptions. 

The EIP register is advanced from one instruction boundary to the next. Because of instruction 
prefetching, it is only an approximate indication of the bus activity which loads instructions 
into the processor. See Chapter 18 for detailed information on prefetching. 




3.4. INSTRUCTION FORMAT 

The information encoded in an instruction includes a specification of the operation to be 
performed, the type of the operands to be manipulated, and the location of these operands. If 
an operand is located in memory, the instruction also must select, explicitly or implicitly, the 
segment which contains the operand. 

An instruction can have various parts and formats. The exact format of instructions is shown in 
Appendix A; the parts of an instruction are described below. Of these parts, only the opcode is 
always present. The other parts may or may not be present, depending on the operation 
involved and the location and type of the operands. The parts of an instruction, in order of 
occurrence, are listed below: 

• Prefixes: one or more bytes preceding an instruction which modify the operation of the 
instruction. The following prefixes can be used by application programs: 

1. Segment override: explicitly specifies which segment register an instruction should
use, instead of the default segment register. The segment override prefixes include:

2EH CS segment override prefix
36H SS segment override prefix
3EH DS segment override prefix
26H ES segment override prefix
64H FS segment override prefix
65H GS segment override prefix

2. Address size (67H): switches between 16- and 32-bit addressing. Either size can be
the default; this prefix selects the non-default size.

3. Operand size (66H): switches between 16- and 32-bit data size. Either size can be the
default; this prefix selects the non-default size.

4. Repeat: used with a string instruction to cause the instruction to be repeated for each
element of the string. The repeat prefixes include:

F3H REP prefix (used only with string instructions)
F3H REPE/REPZ prefix (used only with string instructions)
F2H REPNE/REPNZ prefix (used only with string instructions)

5. Lock (0F0H): used to ensure exclusive use of shared memory in multiprocessor
environments. This prefix can only be used with the following instructions: BTS,
BTR, BTC, XCHG, ADD, OR, ADC, SBB, AND, SUB, XOR, NOT, NEG, INC,
DEC, CMPXCHG, CMPXCHG8B, XADD.

Zero or one bytes are reserved for each group of prefixes. The prefixes are grouped as 
follows: 

— Instruction Prefixes: REP, REPE/REPZ, REPNE/REPNZ, LOCK 

— Segment Override Prefixes: CS, SS, DS, ES, FS, GS 

— Operand Size Override 

— Address Size Override 

For each instruction, one prefix may be used from each group. The effect of redundant 
prefixes (more than one prefix from a group) is undefined and may vary from processor to 
processor. 

• Opcode: specifies the operation performed by the instruction. Some operations have 
several different opcodes, each specifying a different form of the operation. 

• Register specifier: an instruction can specify one or two register operands. Register 
specifiers occur either in the same byte as the opcode or in the same byte as the 
addressing-mode specifier. 

• Addressing-mode specifier: when present, specifies whether an operand is a register or 
memory location; if in memory, specifies whether a displacement, a base register, an index 
register, and scaling are to be used. 

• SIB (scale, index, base) byte: when the addressing-mode specifier indicates the use of an 
index register to calculate the address of an operand, an SIB byte is included in the 
instruction to encode the base register, the index register, and a scaling factor. 

• Displacement: when the addressing-mode specifier indicates a displacement will be used 
to compute the address of an operand, the displacement is encoded in the instruction. A 
displacement is a signed integer of 32, 16, or 8 bits. The 8-bit form is used in the common 
case when the displacement is sufficiently small. The processor extends an 8-bit 
displacement to 16 or 32 bits, taking into account the sign. 

• Immediate operand: when present, directly provides the value of an operand. Immediate 
operands can be bytes, words, or doublewords. In cases where an 8-bit immediate operand 
is used with a 16- or 32-bit operand, the processor extends the eight-bit operand to an 
integer of the same sign and magnitude in the larger size. In the same way, a 16-bit 
operand is extended to 32-bits. 
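The prefix grouping rule in the Prefixes item above (at most one prefix from each group) can be sketched as a small scanner. This is a hypothetical helper, not part of any Intel tool; the names `PREFIX_GROUPS` and `split_prefixes` are invented here. The byte values are the prefix encodings listed in the text, with DS (3EH) and FS (64H) taken from the standard x86 encoding.

```python
PREFIX_GROUPS = {
    0xF3: "instruction",    # REP, REPE/REPZ
    0xF2: "instruction",    # REPNE/REPNZ
    0xF0: "instruction",    # LOCK
    0x2E: "segment",        # CS override
    0x36: "segment",        # SS override
    0x3E: "segment",        # DS override
    0x26: "segment",        # ES override
    0x64: "segment",        # FS override
    0x65: "segment",        # GS override
    0x66: "operand-size",   # operand size override
    0x67: "address-size",   # address size override
}


def split_prefixes(code: bytes):
    """Return (prefixes seen by group, offset of the first non-prefix byte).

    Raises ValueError when two prefixes from the same group appear, because
    the text describes the effect of redundant prefixes as undefined.
    """
    seen = {}
    offset = 0
    while offset < len(code) and code[offset] in PREFIX_GROUPS:
        group = PREFIX_GROUPS[code[offset]]
        if group in seen:
            raise ValueError(f"redundant {group} prefix at offset {offset}")
        seen[group] = code[offset]
        offset += 1
    return seen, offset
```

For the byte sequence F3H 66H A5H (a repeated string move with an operand size override), `split_prefixes` reports one instruction prefix and one operand-size prefix, with the opcode at offset 2.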






3.5. OPERAND SELECTION 

An instruction acts on zero or more operands. An example of a zero-operand instruction is the 
NOP instruction (no operation). An operand can be held in any of these places: 

• In the instruction itself (an immediate operand). 

• In a register (in the case of 32-bit operands, EAX, EBX, ECX, EDX, ESI, EDI, ESP, or 
EBP; in the case of 16-bit operands AX, BX, CX, DX, SI, DI, SP, or BP; in the case of 8- 
bit operands AH, AL, BH, BL, CH, CL, DH, or DL; the segment registers; or the 
EFLAGS register for flag operations). Use of 16-bit register operands requires use of the 
16-bit operand size prefix if the current default operand size is 32 bits. (See Chapter 11 for
information on setting the D-bit in the code segment descriptor to control default operand 
size.) 

• In memory. 

• At an I/O port. See Chapter 15 for information on I/O. 

Register and immediate operands are available on-chip — the latter because they are prefetched 
as part of interpreting the instruction. Memory operands residing in the on-chip cache can be 
accessed just as fast for most instructions. 

Of the instructions which have operands, some specify operands implicitly; others specify 
operands explicitly; still others use a combination of both. For example: 

Implicit operand: aam 

By definition, AAM (ASCII adjust for multiplication) operates on the contents of the AX 
register. 

Explicit operand: xchg eax, ebx 

The operands to be exchanged are encoded in the instruction with the opcode.

Implicit and explicit operands: push counter

The memory variable COUNTER (the explicit operand) is copied to the top of the stack (the 
implicit operand). 

Note that most instructions have implicit operands. All arithmetic instructions, for example, 
update the EFLAGS register. 

An instruction can explicitly reference one or two operands. Two-operand instructions, such as 
MOV, ADD, and XOR, generally overwrite one of the two participating operands with the 
result. This is one difference between the source operand (the one unaffected by the operation) 
and the destination operand (the one overwritten by the result). 

For most instructions, one of the two explicitly specified operands — either the source or the 
destination — can be either in a register or in memory. The other operand must be in a register 
or it must be an immediate source operand. This puts the explicit two-operand instructions into 
the following groups: 

• Register to register 






• Register to memory 

• Memory to register 

• Immediate to register 

• Immediate to memory 

Certain string instructions and stack manipulation instructions, however, transfer data from 
memory to memory. Both operands of some string instructions are in memory and are 
specified implicitly. Push and pop stack operations allow transfer between memory operands 
and the memory-based stack. 

Several three-operand instructions are provided, such as the IMUL, SHRD, and SHLD 
instructions. Two of the three operands are specified explicitly, as for the two-operand 
instructions, while a third is taken from the CL register or supplied as an immediate. Other 
three-operand instructions, such as the string instructions when used with a repeat prefix, take 
all their operands from registers. 



3.5.1. Immediate Operands

Certain instructions use data from the instruction itself as one (and sometimes two) of the 
operands. Such an operand is called an immediate operand. It can be a byte, word, or 
doubleword. For example: 

SHR PATTERN, 2 

One byte of the instruction holds the value 2, the number of bits by which to shift the variable 
PATTERN. 

TEST PATTERN, 0FFFF00FFH 

A doubleword of the instruction holds the mask which is used to test the variable PATTERN. 

IMUL CX, MEMWORD, 3 

A word in memory is multiplied by an immediate 3 and stored into the CX register. 

All arithmetic instructions (except divide) allow the source operand to be an immediate value. 
When the destination is the EAX or AL register, the instruction encoding is one byte shorter 
than with the other general registers. 



3.5.2. Register Operands 

Operands can be located in one of the 32-bit general registers (EAX, EBX, ECX, EDX, ESI, 
EDI, ESP, or EBP), in one of the 16-bit general registers (AX, BX, CX, DX, SI, DI, SP, or 
BP), or in one of the 8-bit general registers (AH, BH, CH, DH, AL, BL, CL, or DL). Sixty- 
four bit operands are also used in 32-bit register pairs for operations such as DIV and MUL. 
Register pairs are represented with a colon separating them. For example, in the register pair 
EDX:EAX, EDX contains the high order bits and EAX contains the low order bits of the 64-bit 
operand. 
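The EDX:EAX register-pair notation can be illustrated with a short Python model. It is a sketch of the arithmetic only; the helper names `join_pair`, `split_pair`, and `mul32` are assumptions introduced for this example.

```python
MASK32 = 0xFFFFFFFF


def join_pair(edx: int, eax: int) -> int:
    """Combine EDX:EAX into one 64-bit value (EDX holds the high-order bits)."""
    return ((edx & MASK32) << 32) | (eax & MASK32)


def split_pair(value: int):
    """Split a 64-bit value into the pair (EDX, EAX)."""
    return (value >> 32) & MASK32, value & MASK32


def mul32(eax: int, operand: int):
    """Model an unsigned 32-bit MUL: EDX:EAX = EAX * operand."""
    return split_pair((eax & MASK32) * (operand & MASK32))
```

For instance, `mul32(0x10000, 0x10000)` returns `(1, 0)`: the product 2^32 does not fit in EAX, so its only set bit lands in EDX.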

The Pentium processor has instructions for referencing the segment registers (CS, DS, ES, SS,
FS, and GS). These instructions are used by application programs only if system designers
have chosen a segmented memory model.

The Pentium processor also has instructions for changing the state of individual flags in the 
EFLAGS register. Instructions have been provided for setting and clearing flags which often 
need to be accessed. The other flags, which are not accessed so often, can be changed by 
pushing the contents of the EFLAGS register on the stack, making changes to it while it's on 
the stack, and popping it back into the register. 
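The push, modify, and pop sequence described above amounts to bit manipulation on a saved copy of EFLAGS. The following Python sketch shows only that masking step, using the architectural bit positions of the flags; the names `FLAG_BITS`, `set_flag`, and `flag_is_set` are invented for illustration. Whether popping the image actually changes a given system flag depends on privilege level and mode, which this sketch ignores.

```python
# Bit position of each named flag in the EFLAGS register.
FLAG_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "TF": 8, "IF": 9,
    "DF": 10, "OF": 11, "NT": 14, "RF": 16, "VM": 17, "AC": 18,
    "VIF": 19, "VIP": 20, "ID": 21,
}


def set_flag(image: int, name: str, value: bool) -> int:
    """Return a copy of a saved EFLAGS image with one flag set or cleared."""
    mask = 1 << FLAG_BITS[name]
    return (image | mask) if value else (image & ~mask)


def flag_is_set(image: int, name: str) -> bool:
    """Report whether the named flag is set in a saved EFLAGS image."""
    return bool(image & (1 << FLAG_BITS[name]))
```

All bits other than the one named are passed through unchanged, which matches the rule that reserved bits should be written back with the value previously read.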



3.5.3. Memory Operands 

Instructions with explicit operands in memory must reference the segment containing the 
operand and the offset from the beginning of the segment to the operand. Segments are 
specified using a segment-override prefix, which is a byte placed at the beginning of an 
instruction. If no segment is specified, simple rules assign the segment by default. The offset is 
specified in one of the following ways: 

1. Most instructions which access memory contain a byte for specifying the addressing 
method of the operand. The byte, called the modR/M byte, comes after the opcode and
specifies whether the operand is in a register or in memory. If the operand is in memory, 
the address is calculated from a segment register and any of the following values: a base 
register, an index register, a scaling factor, and a displacement. When an index register is 
used, the modR/M byte also is followed by another byte to specify the index register and 
scaling factor. This form of addressing is the most flexible. 

2. A few instructions use implied address modes: 

A MOV instruction with the AL, AX, or EAX register as either source or destination can 
address memory with a doubleword encoded in the instruction. This special form of the 
MOV instruction allows no base register, index register, or scaling factor to be used. This 
form is one byte shorter than the general-purpose form. 

String operations address memory in the DS segment using the ESI register (the MOVS,
CMPS, OUTS, and LODS instructions) or in the ES segment using the EDI register (the
MOVS, CMPS, INS, SCAS, and STOS instructions).

Stack operations address memory in the SS segment using the ESP register (the PUSH, 
POP, PUSHA, PUSHAD, POPA, POPAD, PUSHF, PUSHFD, POPF, POPFD, CALL, 
LEAVE, ENTER, INT, RET, IRET, and IRETD instructions, exceptions, and interrupts). 

3.5.3.1. SEGMENT SELECTION

Explicit specification of a segment is optional. If a segment is not specified using a segment- 
override prefix, the processor automatically chooses a segment according to the rules of Table 
3-3. 







Table 3-3. Default Segment Selection Rules

Type of Reference     Segment Used / Register Used     Default Selection Rule
Instructions          Code Segment / CS register       Automatic with instruction fetch.
Stack                 Stack Segment / SS register      All stack pushes and pops. Any memory reference
                                                       which uses ESP or EBP as a base register.
Local Data            Data Segment / DS register       All data references except when relative to
                                                       stack or string destination.
Destination Strings   E-Space Segment / ES register    Destination of string instructions.



Different kinds of memory access have different default segments. Data operands usually use 
the main data segment (the DS segment). However, the ESP and EBP registers are used for 
addressing the stack, so when either register is used, the stack segment (the SS segment) is 
selected. 



Segment-override prefixes are provided for each of the segment registers. Only the following 
special cases have a default segment selection which is not affected by a segment-override 
prefix: 

• Destination strings in string instructions use the ES segment 

• Destination of a push or source of a pop uses the SS segment 

• Instruction fetches use the CS segment 
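The default rules of Table 3-3, together with the three cases that ignore an override, can be written as a small decision function. This is a sketch of the rules as stated in the text; the reference kinds ("fetch", "stack", "string_dest", "data") and the function names are labels invented for this example.

```python
def default_segment(kind: str, base: str = None) -> str:
    """Return the default segment register for one memory reference.

    kind: "fetch", "stack" (push or pop), "string_dest", or "data".
    base: name of the base register of a "data" reference, if any.
    """
    if kind == "fetch":
        return "CS"
    if kind == "stack":
        return "SS"
    if kind == "string_dest":
        return "ES"
    if base in ("ESP", "EBP"):   # data reference relative to the stack
        return "SS"
    return "DS"


def effective_segment(kind: str, base: str = None, override: str = None) -> str:
    """Apply a segment-override prefix only where the text says it is honored."""
    if override is not None and kind == "data":
        return override
    return default_segment(kind, base)
```

So a data reference through EBP defaults to SS but can be redirected with an override, while the destination of a string instruction stays in ES regardless of any prefix.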

3.5.3.2. EFFECTIVE-ADDRESS COMPUTATION 

The modR/M byte provides the most flexible form of addressing. Instructions which have a 
modR/M byte after the opcode are the most common in the instruction set. For memory 
operands specified by a modR/M byte, the offset within the selected segment is the sum of four 
components: 

• A displacement 

• A base register 

• An index register 

• A scaling factor (the index register can be multiplied by a factor of 2, 4, or 8) 

The offset which results from adding these components is called an effective address. Each of 
these components can have either a positive or negative value, with the exception of the 
scaling factor. Figure 3-10 illustrates the full set of possibilities for modR/M addressing. 

The displacement component, because it is encoded in the instruction, is useful for relative 
addressing by fixed amounts, such as: 

• Location of simple scalar operands. 

• Beginning of a statically allocated array. 






• Offset to a field within a record. 

The base and index components have similar functions. Both use the same set of general 
registers. Both can be used for addressing which changes during program execution, such as: 

• Location of procedure parameters and local variables on the stack. 

• The beginning of one record among several occurrences of the same record type or in an 
array of records. 

• The beginning of one dimension of multiple dimension array. 

• The beginning of a dynamically allocated array. 

The uses of general registers as base or index components differ in the following respects: 

• The ESP register cannot be used as an index register. 

• When the ESP or EBP register is used as the base, the SS segment is the default selection. 
In all other cases, the DS segment is the default selection. 

SEGMENT + BASE + (INDEX * SCALE) + DISPLACEMENT

SEGMENT:       CS, SS, DS, ES, FS, or GS
BASE:          EAX, ECX, EDX, EBX, ESP, EBP, ESI, or EDI
INDEX:         EAX, ECX, EDX, EBX, EBP, ESI, or EDI
SCALE:         1, 2, 4, or 8
DISPLACEMENT:  no displacement, 8-bit displacement, or 32-bit displacement

Figure 3-10. Effective Address Computation



The scaling factor permits efficient indexing into an array when the array elements are 2, 4, or 
8 bytes. The scaling of the index register is done in hardware at the time the address is 
evaluated. This eliminates an extra shift or multiply instruction. 

The base, index, and displacement components can be used in any combination; any of these 
components can be null. A scale factor can be used only when an index also is used. Each 
possible combination is useful for data structures commonly used by programmers in high- 
level languages and assembly language. Suggested uses for some combinations of address 
components are described below. 



DISPLACEMENT 

The displacement alone indicates the offset of the operand. This form of addressing is used to 
access a statically allocated scalar operand. A byte, word, or doubleword displacement can be 
used. 






BASE 

The offset to the operand is specified indirectly in one of the general registers, as for "based" 
variables. 

BASE + DISPLACEMENT 

A register and a displacement can be used together for two distinct purposes: 

1. Index into an array when the element size is not 2, 4, or 8 bytes. The displacement 
component encodes the offset of the beginning of the array. The register holds the results 
of a calculation to determine the offset to a specific element within the array. 

2. Access a field of a record. The base register holds the address of the beginning of the 
record, while the displacement is an offset to the field. 

An important special case of this combination is access to parameters in a procedure activation 
record. A procedure activation record is the stack frame created when a subroutine is entered. 
In this case, the EBP register is the best choice for the base register, because it automatically 
selects the stack segment. This is a compact encoding for this common function. 

(INDEX * SCALE) + DISPLACEMENT 

This combination is an efficient way to index into a static array when the element size is 2, 4, 
or 8 bytes. The displacement addresses the beginning of the array, the index register holds the 
subscript of the desired array element, and the processor automatically converts the subscript 
into an index by applying the scaling factor. 

BASE + INDEX + DISPLACEMENT 

Two registers used together support either a two-dimensional array (the displacement holds the 
address of the beginning of the array) or one of several instances of an array of records (the 
displacement is an offset to a field within the record). 

BASE + (INDEX * SCALE) + DISPLACEMENT 

This combination provides efficient indexing of a two-dimensional array when the elements of 
the array are 2, 4, or 8 bytes in size. 
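The combinations above all reduce to one formula, which can be sketched in Python. This is a model of the offset arithmetic only (segment selection is left out), and it assumes 32-bit addressing in which the sum wraps at 32 bits; the function name `effective_address` and the sample addresses are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def effective_address(base: int = 0, index: int = 0, scale: int = 1,
                      disp: int = 0) -> int:
    """Compute BASE + (INDEX * SCALE) + DISPLACEMENT as a 32-bit offset."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & MASK32


# (INDEX * SCALE) + DISPLACEMENT: element 5 of a static array of doublewords
# that begins at offset 0x1000.
element = effective_address(index=5, scale=4, disp=0x1000)

# BASE + DISPLACEMENT: a procedure parameter 8 bytes above the frame base.
parameter = effective_address(base=0x7FF0, disp=8)

# BASE + (INDEX * SCALE) + DISPLACEMENT: row 2, column 3 of a two-dimensional
# array of doublewords with 40-byte rows, beginning at offset 0x2000.
cell = effective_address(base=2 * 40, index=3, scale=4, disp=0x2000)
```

In the last case the row offset still has to be computed by the program and placed in the base register; only the column scaling is done by the addressing hardware.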

3.6. INTERRUPTS AND EXCEPTIONS 

The processor has two mechanisms for interrupting program execution: 

1. Exceptions are synchronous events which are responses of the processor to certain 
conditions detected during the execution of an instruction. 

2. Interrupts are asynchronous events typically triggered by external devices needing 
attention. 

Interrupts and exceptions are alike in that both cause the processor to temporarily suspend the
program being run in order to run a program of higher priority. The major distinction between
these two kinds of interrupts is their origin. An exception is always reproducible by
re-executing the program which caused the exception, while an interrupt can have a complex,
timing-dependent relationship with programs.

Application programmers normally are not concerned with handling exceptions or interrupts. 
The operating system, monitor, or device driver handles them. More information on interrupts 
for system programmers can be found in Chapter 12. Certain kinds of exceptions, however, are 
relevant to application programming, and many operating systems give application programs 
the opportunity to service these exceptions. However, the operating system defines the 
interface between the application program and the exception mechanism of the processor. 
Table 3-4 lists the interrupts and exceptions. 

• A divide-error exception results when the DIV or IDIV instruction is executed with a zero
denominator or when the quotient is too large for the destination operand. (See Chapter 4
for more information on the DIV and IDIV instructions.)

• A debug exception can be sent back to an application program if it results from the TF
(trap) flag.

• A breakpoint exception results when an INT3 instruction is executed. This instruction is
used by some debuggers to stop program execution at specific points.

• An overflow exception results when the INTO instruction is executed and the OF
(overflow) flag is set. See Chapter 4 for a discussion of the INTO instruction.

• A bounds-check exception results when the BOUND instruction is executed with an array
index which falls outside the bounds of the array. See Chapter 4 for a discussion of the
BOUND instruction.

• The device-not-available exception occurs whenever the processor encounters an escape 
instruction and either the TS (task switched) or the EM (emulate coprocessor) bit of the 
CR0 control register is set.

• An alignment-check exception is generated for unaligned memory operations in user mode 
(privilege level 3), provided both AM and AC are set. Memory operations at supervisor 
mode (privilege levels 0, 1, and 2), or memory operations which default to supervisor 
mode, do not generate this exception. 
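The two conditions that raise the divide-error exception in the first item of the list above can be modeled for the unsigned 32-bit case. This is a sketch, assuming a divisor operand of 32 bits and a dividend in EDX:EAX; the Python exception class `DivideError` merely stands in for the processor exception and is not a real API.

```python
class DivideError(Exception):
    """Stands in for the processor's divide-error exception."""


def div32(edx: int, eax: int, divisor: int):
    """Model unsigned DIV with a 32-bit divisor; return (quotient, remainder)."""
    divisor &= 0xFFFFFFFF
    if divisor == 0:
        raise DivideError("zero denominator")
    dividend = ((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF)
    quotient, remainder = divmod(dividend, divisor)
    if quotient > 0xFFFFFFFF:
        # The quotient must fit in EAX, the 32-bit destination.
        raise DivideError("quotient too large for destination")
    return quotient, remainder
```

Note that the second condition can occur with a nonzero divisor: dividing EDX:EAX = 1:0 (that is, 2^32) by 1 produces a quotient that does not fit in 32 bits.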

The INT instruction generates an interrupt whenever it is executed; the processor treats this 
interrupt as an exception. Its effects (and the effects of all other exceptions) are determined by 
exception handler routines in the application program or the operating system. The INT 
instruction itself is discussed in Chapter 25. See Chapter 14 for a more complete description of 
exceptions. 







Table 3-4. Exceptions and Interrupts

Vector Number   Description
0               Divide Error
1               Debugger Call
2               NMI Interrupt
3               Breakpoint
4               INTO-detected Overflow
5               BOUND Range Exceeded
6               Invalid Opcode
7               Device Not Available
8               Double Fault
9               (Intel reserved. Do not use. Not used by Pentium™ processor.)
10              Invalid Task State Segment
11              Segment Not Present
12              Stack Exception
13              General Protection
14              Page Fault
15              (Intel reserved. Do not use.)
16              Floating-Point Error
17              Alignment Check
18              Machine Check Exception
19-31           (Intel reserved. Do not use.)
32-255          Maskable Interrupts






CHAPTER 4 
APPLICATION PROGRAMMING 



This chapter is an overview of the integer instructions which programmers can use to write 
application software for the Pentium processor. The instructions are grouped by categories of 
related functions. Additional application instructions for operating on floating-point operands 
are described in Chapter 6. 

The instructions not discussed in this chapter or Chapter 6 normally are used only by 
operating-system programmers. System-level instructions are discussed in Part II. 

The instruction set descriptions in Chapter 25 contain more detailed information on all 
instructions, including encoding, operation, timing, effect on flags, and exceptions which may 
be generated. 

For information on the introduction of new instructions which may not be supported on earlier 
versions of x86 microprocessors, see Chapter 23. 



4.1. DATA MOVEMENT INSTRUCTIONS

These instructions provide convenient methods for moving bytes, words, doublewords, or 
quadwords between memory and the processor registers. They come in three types: 

1. General-purpose data movement instructions.

2. Stack manipulation instructions.

3. Type-conversion instructions.



4.1.1. General-Purpose Data Movement Instructions

MOV (Move) transfers a byte, word, or doubleword from the source operand to the 
destination operand. The MOV instruction is useful for transferring data along any of these 
paths: 

• To a register from memory. 

• To memory from a register. 

• Between general registers. 

• Immediate data to a register. 

• Immediate data to memory. 

The MOV instruction cannot move from memory to memory or from a segment register to a
segment register. Memory-to-memory moves can be performed, however, by the string move
instruction MOVS. A special form of the MOV instruction is provided for transferring data
between the AL, AX, or EAX registers and a location in memory specified by a 32-bit offset
encoded in the instruction. This form of the instruction does not allow a base register,
index register, or scaling factor to be used. The encoding of this form is one byte shorter than
the encoding of the general-purpose MOV instruction. A similar encoding is provided for
moving an 8-, 16-, or 32-bit immediate into any of the general registers.

XCHG (Exchange) swaps the contents of two operands. This instruction takes the place of 
three MOV instructions. It does not require a temporary location to save the contents of one 
operand while the other is being loaded. The XCHG instruction is especially useful for 
implementing semaphores or similar data structures for process synchronization. 

The XCHG instruction can swap two byte operands, two word operands, or two doubleword 
operands. The operands for the XCHG instruction may be two register operands, or a register 
operand and a memory operand. When used with a memory operand, XCHG automatically 
activates the LOCK signal. (See Chapter 16 for more information on bus locking.) 



4.1.2. Stack Manipulation Instructions

PUSH (Push) decrements the stack pointer (ESP register), then copies the source operand to 
the top of stack (see Figure 4-1). The PUSH instruction often is used to place parameters on 
the stack before calling a procedure. Inside a procedure, it can be used to reserve space on the 
stack for temporary variables. The PUSH instruction operates on memory operands, immediate 
operands, and register operands (including segment registers). A special form of the PUSH 
instruction is available for pushing a 32-bit general register on the stack. This form has an 
encoding which is one byte shorter than the general-purpose form. 





[Diagram of the stack, 32 bits wide, before and after pushing a doubleword. After the push,
ESP points to the pushed doubleword at the new top of stack.]

Figure 4-1. PUSH Instruction



PUSHA (Push All Registers) saves the contents of the eight general registers on the stack (see 
Figure 4-2). This instruction simplifies procedure calls by reducing the number of instructions 
required to save the contents of the general registers. The processor pushes the general 
registers on the stack in the following order: EAX, ECX, EDX, EBX, the initial value of ESP 
before EAX was pushed, EBP, ESI, and EDI. The effect of the PUSHA instruction is reversed 
using the POPA instruction. 






POP (Pop) transfers the word or doubleword at the current top of stack (indicated by the ESP 
register) to the destination operand, and then increments the ESP register to point to the new 
top of stack. See Figure 4-3. POP moves information from the stack to a general register, 
segment register, or to memory. A special form of the POP instruction is available for popping 
a doubleword from the stack to a general register. This form has an encoding which is one byte 
shorter than the general-purpose form. 

POPA (Pop All Registers) pops the data saved on the stack by PUSHA into the general 
registers, except for the ESP register. The ESP register is restored by the action of reading the 
stack (popping). See Figure 4-4. 
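The ordering rules for PUSH, POP, PUSHA, and POPA can be checked against a toy stack model. This is a sketch for 32-bit operands only, with the stack kept in a Python dictionary keyed by address; the class `Stack32` and the functions `pusha` and `popa` are names invented for this example.

```python
class Stack32:
    """Toy model of a stack of doublewords addressed through ESP."""

    def __init__(self, esp: int):
        self.esp = esp
        self.mem = {}

    def push(self, value: int) -> None:
        self.esp -= 4                            # decrement ESP first
        self.mem[self.esp] = value & 0xFFFFFFFF  # then store at the new top

    def pop(self) -> int:
        value = self.mem[self.esp]               # read the current top
        self.esp += 4                            # then increment ESP
        return value


PUSHA_ORDER = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def pusha(stack: Stack32, regs: dict) -> None:
    """Push the eight general registers; the ESP slot holds the initial ESP."""
    initial_esp = stack.esp
    for name in PUSHA_ORDER:
        stack.push(initial_esp if name == "ESP" else regs[name])


def popa(stack: Stack32) -> dict:
    """Pop the registers in reverse order, discarding the saved ESP image."""
    regs = {}
    for name in reversed(PUSHA_ORDER):
        value = stack.pop()
        if name != "ESP":
            regs[name] = value
    return regs
```

After a `pusha` followed by a `popa`, ESP is back at its starting value purely because eight doublewords were popped, not because the saved ESP image was loaded.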





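[Diagram of the stack, 32 bits wide, before and after the PUSHA instruction. After PUSHA the
stack holds, from higher to lower addresses, EAX, ECX, EDX, EBX, OLD ESP, EBP, ESI, and
EDI, and ESP points to the EDI image.]

Figure 4-2. PUSHA Instruction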








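[Diagram of the stack, 32 bits wide, before and after popping a doubleword. Before the pop,
ESP points to the doubleword at the top of stack; after the pop, ESP points to the next
higher doubleword.]

Figure 4-3. POP Instruction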





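[Diagram of the stack, 32 bits wide, before and after the POPA instruction. Before POPA the
stack holds, from higher to lower addresses, the images of EAX, ECX, EDX, EBX, an ignored
doubleword (the saved ESP), EBP, ESI, and EDI, with ESP pointing to the EDI image. After
POPA, ESP points above the EAX image.]

Figure 4-4. POPA Instruction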






4.1.3. Type Conversion Instructions

The type conversion instructions convert bytes into words, words into doublewords, and
doublewords into 64-bit quantities (called quadwords). These instructions are especially useful
for converting signed integers, because they automatically fill the extra bits of the larger item 
with the value of the sign bit of the smaller item. This results in an integer of the same sign and 
magnitude, but a larger format. This kind of conversion, shown in Figure 4-5, is called sign 
extension. 

There are two kinds of type conversion instructions: 

• The CWD, CDQ, CBW, and CWDE instructions, which operate only on data in the EAX
register.

• The MOVSX and MOVZX instructions, which permit one operand to be in a general 
register while letting the other operand be in memory or a register. 



































[Diagram of a 16-bit value before sign extension (bit 15 is the sign bit S, bits 14 through 0
are the value bits N) and the 32-bit value after sign extension (bits 31 through 15 all hold
S, bits 14 through 0 are unchanged).]

Figure 4-5. Sign Extension



CWD (Convert Word to Doubleword) and CDQ (Convert Doubleword to Quad-Word)
double the size of the source operand. The CWD instruction copies the sign (bit 15) of the
word in the AX register into every bit position in the DX register. The CDQ instruction copies 
the sign (bit 31) of the doubleword in the EAX register into every bit position in the EDX 
register. The CWD instruction can be used to produce a doubleword dividend from a word 
before a word division, and the CDQ instruction can be used to produce a quadword dividend 
from a doubleword before doubleword division. The CWD and CDQ instructions are different 
mnemonics for the same opcode. Which one gets executed is determined by whether it is in a 
16- or 32-bit segment and the presence of any operand-size override prefixes. See Chapter 25 
for a detailed description of these instructions. 

CBW (Convert Byte to Word) copies the sign (bit 7) of the byte in the AL register into every 
bit position of the upper byte of the AX register. 

CWDE (Convert Word to Doubleword Extended) copies the sign (bit 15) of the word in the 
AX register into every bit position of the high word of the EAX register. 

MOVSX (Move with Sign Extension) extends an 8-bit value to a 16-bit value or an 8- or 16- 
bit value to 32-bit value by using the value of the sign to fill empty positions. 

MOVZX (Move with Zero Extension) extends an 8-bit value to a 16-bit value or an 8- or 16- 
bit value to 32-bit value by clearing the empty bit positions. 
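
The difference between sign extension and zero extension can be illustrated with a small Python model. This is a sketch of the behavior described above, not processor code; the function names sign_extend, zero_extend, and cwd are hypothetical and chosen only for this illustration.

```python
def sign_extend(value, from_bits, to_bits):
    """Model MOVSX, CBW, or CWDE: copy the sign bit into the new upper bits."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):                      # sign bit is set
        upper = ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
        value |= upper
    return value


def zero_extend(value, from_bits, to_bits):
    """Model MOVZX: the new upper bits are cleared (to_bits kept for symmetry)."""
    return value & ((1 << from_bits) - 1)


def cwd(ax):
    """Model CWD: DX receives 16 copies of bit 15 of AX; AX is unchanged."""
    dx = 0xFFFF if ax & 0x8000 else 0x0000
    return dx, ax
```

For example, sign_extend(0x80, 8, 16) gives 0xFF80 (the value -128 in both formats), while zero_extend(0x80, 8, 16) gives 0x0080 (the value 128).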



4.2. BINARY ARITHMETIC INSTRUCTIONS 

The arithmetic instructions operate on numeric data encoded in binary. Operations include 
add, subtract, multiply, and divide, as well as increment, decrement, compare, and change sign 
(negate). Both signed and unsigned binary integers are supported. The binary arithmetic 
instructions may also be used as steps in arithmetic on decimal integers. Source operands can 
be immediate values, general registers, or memory. Destination operands can be general 
registers or memory (except when the source operand is in memory). The basic arithmetic 
instructions have special forms for using an immediate value as the source operand and the 
AL, AX, or EAX registers as the destination operand. These forms are one byte shorter than 
the general-purpose arithmetic instructions. 

The arithmetic instructions update the ZF, CF, SF, and OF flags to report the kind of result 
which was produced. The kind of instruction used to test the flags depends on whether the data 
is being interpreted as signed or unsigned. The CF flag contains information relevant to 
unsigned integers; the SF and OF flags contain information relevant to signed integers. The ZF 
flag is relevant to both signed and unsigned integers; the ZF flag is set when all bits of the 
result are clear. 

Arithmetic instructions operate on 8-, 16-, or 32-bit data. The flags are updated to reflect the 
size of the operation. For example, an 8-bit ADD instruction sets the CF flag if the sum of the 
operands exceeds 255 (decimal). 

If the integer is unsigned, the CF flag may be tested after one of these arithmetic operations to 
determine whether the operation required a carry or borrow to be propagated to the next stage 
of the operation. The CF flag is set if a carry occurs (addition instructions ADD, ADC, AAA, 
and DAA) or borrow occurs (subtraction instructions SUB, SBB, AAS, DAS, CMP, and 
NEG). 

The INC and DEC instructions do not change the state of the CF flag. This allows the 
instructions to be used to update counters used for loop control without changing the reported 
state of arithmetic results. To test the arithmetic state of the counter, the ZF flag can be tested 
to detect loop termination, or the ADD and SUB instructions can be used to update the value 
held by the counter. 

The SF and OF flags support signed integer arithmetic. The SF flag has the value of the sign 
bit of the result. The most significant bit (MSB) of the magnitude of a signed integer is the bit 
next to the sign — bit 6 of a byte, bit 14 of a word, or bit 30 of a doubleword. The OF flag is set 
in either of these cases: 

• A carry was generated from the MSB into the sign bit but no carry was generated out of 
the sign bit (addition instructions ADD, ADC, INC, AAA, and DAA). In other words, the 
result was greater than the greatest positive number which could be represented in two's 
complement form. 






• A carry was generated from the sign bit into the MSB but no carry was generated into the 
sign bit (subtraction instructions SUB, SBB, DEC, AAS, DAS, CMP, and NEG). In other 
words, the result was smaller than the smallest negative number which could be 
represented in two's complement form. 

These status flags are tested by either kind of conditional instruction: Jcc (jump on condition 
cc) or SETcc (byte set on condition). 
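
The flag rules above can be sketched as a Python model of an 8-bit addition. This is an illustration under stated assumptions, not a definitive implementation: the function name add8 is hypothetical, and the OF test uses an equivalent formulation (both operands have the same sign and the result has the opposite sign) rather than tracking the individual carries into and out of the sign bit.

```python
def add8(a, b, carry_in=0):
    """Model an 8-bit ADD (or ADC when carry_in is 1); returns (result, flags)."""
    a &= 0xFF
    b &= 0xFF
    full = a + b + carry_in
    result = full & 0xFF
    flags = {
        "CF": int(full > 0xFF),          # unsigned sum exceeds 255
        "ZF": int(result == 0),          # all bits of the result are clear
        "SF": result >> 7,               # copy of the sign bit of the result
        # Signed overflow: operands share a sign that the result does not have.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags
```

Adding 200 and 100 sets CF because the unsigned sum exceeds 255, but leaves OF clear because the same bit patterns read as -56 and +100. Adding 127 and 1 leaves CF clear but sets OF, because +128 cannot be represented in a signed byte.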

4.2.1. Addition and Subtraction Instructions 

ADD (Add Integers) replaces the destination operand with the sum of the source and 
destination operands. The OF, SF, ZF, AF, PF, and CF flags are affected. 

ADC (Add Integers with Carry) replaces the destination operand with the sum of the source 
and destination operands, plus 1 if the CF flag is set. If the CF flag is clear, the ADC 
instruction performs the same operation as the ADD instruction. An ADC instruction is used to 
propagate carry when adding numbers in stages, for example when using 32-bit ADD 
instructions to sum quadword operands. The OF, SF, ZF, AF, PF, and CF flags are affected. 
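
Staged addition with ADD followed by ADC can be modeled as follows. The sketch assumes each quadword is held as a low and a high doubleword; the names add32 and add64_in_stages are hypothetical and used only for illustration.

```python
MASK32 = 0xFFFFFFFF


def add32(a, b, carry_in=0):
    """Model a 32-bit ADD (carry_in=0) or ADC (carry_in=CF); returns (result, CF)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32


def add64_in_stages(a_lo, a_hi, b_lo, b_hi):
    """Add two quadwords as ADD on the low dwords, then ADC on the high dwords."""
    lo, cf = add32(a_lo, b_lo)           # ADD: produces the carry
    hi, cf = add32(a_hi, b_hi, cf)       # ADC: consumes and regenerates the carry
    return lo, hi, cf
```

The same pattern applies to SUB followed by SBB, with the borrow taking the place of the carry.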

INC (Increment) adds 1 to the destination operand. The INC instruction preserves the state of 
the CF flag. This allows the use of INC instructions to update counters in loops without 
disturbing the status flags resulting from an arithmetic operation used for loop control. The ZF 
flag can be used to detect when carry would have occurred. Use an ADD instruction with an 
immediate value of 1 to perform an increment which updates the CF flag. A one-byte form of 
this instruction is available when the operand is a general register. The OF, SF, ZF, AF, and PF 
flags are affected. 

SUB (Subtract Integers) subtracts the source operand from the destination operand and 
replaces the destination operand with the result. If a borrow is required, the CF flag is set. The 
operands may be signed or unsigned bytes, words, or double words. The OF, SF, ZF, AF, PF, 
and CF flags are affected. 

SBB (Subtract Integers with Borrow) subtracts the source operand from the destination 
operand and replaces the destination operand with the result, minus 1 if the CF flag is set. If 
the CF flag is clear, the SBB instruction performs the same operation as the SUB instruction. 
An SBB instruction is used to propagate borrow when subtracting numbers in stages, for 
example when using 32-bit SUB instructions to subtract one quadword operand from another. 
The OF, SF, ZF, AF, PF, and CF flags are affected. 

DEC (Decrement) subtracts 1 from the destination operand. The DEC instruction preserves 
the state of the CF flag. This allows the use of the DEC instruction to update counters in loops 
without disturbing the status flags resulting from an arithmetic operation used for loop control. 
Use a SUB instruction with an immediate value of 1 to perform a decrement which updates the 
CF flag. A one-byte form of this instruction is available when the operand is a general register. 
The OF, SF, ZF, AF, and PF flags are affected. 



4.2.2. Comparison and Sign Change Instruction 

CMP (Compare) subtracts the source operand from the destination operand. It updates the 
OF, SF, ZF, AF, PF, and CF flags, but does not modify the source or destination operands. A 
subsequent Jcc or SETcc instruction can test the flags. 

NEG (Negate) subtracts a signed integer operand from zero. The effect of the NEG instruction 
is to change the sign of a two's complement operand while keeping its magnitude. The OF, SF, 
ZF, AF, PF, and CF flags are affected. 



4.2.3. Multiplication Instructions 

The processor has separate multiply instructions for unsigned and signed operands. The MUL 
instruction operates on unsigned integers, while the IMUL instruction operates on signed 
integers as well as unsigned. 

MUL (Unsigned Integer Multiply) performs an unsigned multiplication of the source 
operand and the AL, AX, or EAX register. If the source is a byte, the processor multiplies it by 
the value held in the AL register and returns the double-length result in the AH and AL 
registers. If the source operand is a word, the processor multiplies it by the value held in the 
AX register and returns the double-length result in the DX and AX registers. If the source 
operand is a doubleword, the processor multiplies it by the value held in the EAX register and 
returns the quadword result in the EDX and EAX registers. The MUL instruction sets the CF 
and OF flags when the upper half of the result is non-zero; otherwise, the flags are cleared. The 
state of the SF, ZF, AF, and PF flags is undefined. 

IMUL (Signed Integer Multiply) performs a signed multiplication operation. The IMUL 
instruction has three forms: 

1. A one-operand form. The operand may be a byte, word, or doubleword located in memory 
or in a general register. This instruction uses the EAX and EDX (or AX and DX) registers 
as implicit operands in the same way as the MUL instruction. 

2. A two-operand form. One of the source operands is in a general register while the other 
may be in a general register or memory. The result replaces the general-register operand. 

3. A three-operand form; two are source operands and one is the destination. One of the 
source operands is an immediate value supplied by the instruction; the second may be in 
memory or in a general register. The result is stored in a general register. The immediate 
operand is a two's complement signed integer. If the immediate operand is a byte, the 
processor automatically sign-extends it to the size of the second operand before 
performing the multiplication. 

The three forms are similar in most respects: 

• The length of the product is calculated to twice the length of the operands. 

• The CF and OF flags are set when significant bits are carried into the upper half of the 
result. The CF and OF flags are cleared when the upper half of the result is the sign- 
extension of the lower half. The state of the SF, ZF, AF, and PF flags is undefined. 

However, forms 2 and 3 differ from 1 because the product is truncated to the length of the 
operands before it is stored in the destination register. Because of this truncation, the OF flag 
should be tested to ensure that no significant bits are lost. (For ways to test the OF flag, see the 
JO, INTO, and PUSHF instructions.) 

Forms 2 and 3 of IMUL also may be used with unsigned operands because, whether the 
operands are signed or unsigned, the lower half of the product is the same. The CF and OF 
flags, however, cannot be used to determine if the upper half of the result is non-zero. 
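
The following Python sketch models the one-operand MUL and the truncating (two- or three-operand) IMUL for doubleword operands. It is an illustration of the rules above under the assumption that Python integers stand in for register contents; the names mul32, to_signed32, and imul32_truncating are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul32(eax, src):
    """Model MUL: returns (EDX, EAX, CF/OF); flags are set when EDX is non-zero."""
    product = (eax & MASK32) * (src & MASK32)
    lo, hi = product & MASK32, product >> 32
    return hi, lo, int(hi != 0)


def imul32_truncating(a, b):
    """Model IMUL forms 2 and 3: returns (truncated result, CF/OF)."""
    product = to_signed32(a) * to_signed32(b)
    result = product & MASK32
    overflow = int(to_signed32(result) != product)   # significant bits were lost
    return result, overflow
```

With the operands 0FFFFFFFFH and 2, both models return the same low doubleword (0FFFFFFFEH), but MUL reports a non-zero upper half while the truncating IMUL reports no overflow, because the signed product -2 fits in 32 bits.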



4.2.4. Division Instructions 

The Pentium processor has separate division instructions for unsigned and signed operands. 
The DIV instruction operates on unsigned integers, while the IDIV instruction operates on both 
signed and unsigned integers. In either case, a divide-error exception is generated if the divisor 
is zero or if the quotient is too large for the AL, AX, or EAX register. 

DIV (Unsigned Integer Divide) performs an unsigned division of the AL, AX, or EAX 
register by the source operand. The dividend (the accumulator) is twice the size of the divisor 
(the source operand); the quotient and remainder have the same size as the divisor, as shown in 
Table 4-1. 

Non-integral results are truncated toward 0. The remainder is always smaller than the divisor. 
For unsigned byte division, the largest quotient is 255. For unsigned word division, the largest 
quotient is 65,535. For unsigned doubleword division the largest quotient is 2^32 - 1. The state of 
the OF, SF, ZF, AF, PF, and CF flags is undefined. 

IDIV (Signed Integer Divide) performs a signed division of the accumulator by the source 
operand. The IDIV instruction uses the same registers as the DIV instruction. 

For signed byte division, the maximum positive quotient is +127, and the minimum negative 
quotient is -128. For signed word division, the maximum positive quotient is +32,767, and the 
minimum negative quotient is -32,768. For signed doubleword division the maximum positive 
quotient is 2^31 - 1, and the minimum negative quotient is -2^31. Non-integral results are 
truncated towards 0. The remainder always has the same sign as the dividend and is less than 
the divisor in magnitude. The state of the OF, SF, ZF, AF, PF, and CF flags is undefined. 



Table 4-1. Operands for Division 

Operand Size 
(Divisor)        Dividend        Quotient        Remainder 

Byte             AX register     AL register     AH register 
Word             DX and AX       AX register     DX register 
Doubleword       EDX and EAX     EAX register    EDX register 
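
The doubleword row of Table 4-1 and the truncation rules for signed division can be sketched in Python. This is a model under stated assumptions: Python exceptions stand in for the divide-error exception, and the names div32 and idiv_trunc are hypothetical. The signed helper does not perform the quotient range check.

```python
MASK32 = 0xFFFFFFFF


def div32(edx, eax, divisor):
    """Model DIV with a doubleword divisor; returns (quotient in EAX, remainder in EDX)."""
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    divisor &= MASK32
    if divisor == 0:
        raise ZeroDivisionError("divide error: divisor is zero")
    quotient, remainder = divmod(dividend, divisor)
    if quotient > MASK32:
        raise OverflowError("divide error: quotient too large for EAX")
    return quotient, remainder


def idiv_trunc(dividend, divisor):
    """Signed division truncating toward 0; the remainder takes the dividend's sign."""
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    return quotient, dividend - quotient * divisor
```

Python's own // operator rounds toward negative infinity, which is why idiv_trunc works on absolute values before restoring the sign.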



4.3. DECIMAL ARITHMETIC INSTRUCTIONS 

Decimal arithmetic is performed by combining the binary arithmetic instructions (already 
discussed in the prior section) with the decimal arithmetic instructions. The decimal arithmetic 
instructions are used in one of the following ways: 

• To adjust the results of a previous binary arithmetic operation to produce a valid packed or 
unpacked decimal result. 

• To adjust the inputs to a subsequent binary arithmetic operation so that the operation will 
produce a valid packed or unpacked decimal result. 




These instructions operate only on the AL or AH registers. Most use the AF flag. 



4.3.1. Packed BCD Adjustment Instructions 

DAA (Decimal Adjust after Addition) adjusts the result of adding two valid packed decimal 
operands in the AL register. A DAA instruction must follow the addition of two pairs of 
packed decimal numbers (one digit in each half-byte) to obtain a pair of valid packed decimal 
digits as results. The CF flag is set if a carry occurs. The SF, ZF, AF, PF, and CF flags are 
affected. The state of the OF flag is undefined. 

DAS (Decimal Adjust after Subtraction) adjusts the result of subtracting two valid packed 
decimal operands in the AL register. A DAS instruction must always follow the subtraction of 
one pair of packed decimal numbers (one digit in each half-byte) from another to obtain a pair 
of valid packed decimal digits as results. The CF flag is set if a borrow is needed. The SF, ZF, 
AF, PF, and CF flags are affected. The state of the OF flag is undefined. 
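
As an illustration, an ADD of two packed decimal bytes followed by DAA can be modeled in Python. The adjustment rule used here (add 6 when the low digit is invalid or AF is set, add 60H when the unadjusted sum exceeds 99H or CF is set) is the commonly published description of DAA and is an assumption of this sketch, not something stated in this section; the function name add_packed_bcd is hypothetical.

```python
def add_packed_bcd(a, b):
    """Model ADD AL,b followed by DAA for packed BCD bytes; returns (AL, CF)."""
    total = a + b
    al = total & 0xFF
    cf = total > 0xFF                               # carry out of the binary ADD
    af = ((a & 0x0F) + (b & 0x0F)) > 0x0F           # carry out of the low digit
    old_al = al
    if (al & 0x0F) > 9 or af:                       # low digit needs adjustment
        al = (al + 0x06) & 0xFF
    if old_al > 0x99 or cf:                         # high digit needs adjustment
        al = (al + 0x60) & 0xFF
        cf = True
    return al, int(cf)
```

For example, adding 38H and 45H gives the binary sum 7DH, which the adjustment turns into 83H, the packed decimal form of 38 + 45 = 83.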



4.3.2. Unpacked BCD Adjustment Instructions 

AAA (ASCII Adjust after Addition) changes the contents of the AL register to a valid 
unpacked decimal number, and clears the upper 4 bits. An AAA instruction must follow the 
addition of two unpacked decimal operands in the AL register. The CF flag is set and the 
contents of the AH register are incremented if a carry occurs. The AF and CF flags are 
affected. The state of the OF, SF, ZF, and PF flags is undefined. 

AAS (ASCII Adjust after Subtraction) changes the contents of the AL register to a valid 
unpacked decimal number, and clears the upper 4 bits. An AAS instruction must follow the 
subtraction of one unpacked decimal operand from another in the AL register. The CF flag is 
set and the contents of the AH register are decremented if a borrow is needed. The AF and CF 
flags are affected. The state of the OF, SF, ZF, and PF flags is undefined. 

AAM (ASCII Adjust after Multiplication) corrects the result of a multiplication of two valid 
unpacked decimal numbers. An AAM instruction must follow the multiplication of two 
decimal numbers to produce a valid decimal result. The upper digit is left in the AH register, 
the lower digit in the AL register. The SF, ZF, and PF flags are affected. The state of the AF, 
OF, and CF flags is undefined. 

AAD (ASCII Adjust before Division) modifies the numerator in the AH and AL registers to 
prepare for the division of two valid unpacked decimal operands, so that the quotient produced 
by the division will be a valid unpacked decimal number. The AH register should contain the 
upper digit and the AL register should contain the lower digit. This instruction adjusts the 
value and places the result in the AL register. The AH register will be clear. The SF, ZF, and 
PF flags are affected. The state of the AF, OF, and CF flags is undefined. 
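
The unpacked adjustments can be sketched in Python as follows. This is a simplified model for valid unpacked decimal inputs only; flag results other than CF are not modeled, and the function names aaa, aam, and aad are hypothetical stand-ins for the instructions.

```python
def aaa(ah, al, af=0):
    """Model AAA after adding two unpacked digits in AL; returns (AH, AL, CF)."""
    if (al & 0x0F) > 9 or af:
        al += 6
        ah = (ah + 1) & 0xFF            # carry into the next decimal digit
        cf = 1
    else:
        cf = 0
    return ah, al & 0x0F, cf            # upper 4 bits of AL are cleared


def aam(al):
    """Model AAM after a MUL of two unpacked digits; returns (AH, AL)."""
    return al // 10, al % 10


def aad(ah, al):
    """Model AAD before a DIV; returns (AH, AL) with the binary value in AL."""
    return 0, (ah * 10 + al) & 0xFF
```

For example, 7 times 9 leaves 63 (3FH) in AL; aam(63) splits it into the digits 6 and 3, and aad(6, 3) rebuilds the binary value 63.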



4.4. LOGICAL INSTRUCTIONS 

The logical instructions have two operands. Source operands can be immediate values, general 
registers, or memory. Destination operands can be general registers or memory (except when 
the source operand is in memory). The logical instructions modify the state of the flags. Short 
forms of the instructions are available when an immediate source operand is applied to a 
destination operand in the AL, AX, or EAX registers. The group of logical instructions 
includes: 

• Boolean operation instructions. 

• Bit test and modify instructions. 

• Bit scan instructions. 

• Rotate and shift instructions. 

• Byte set on condition. 

4.4.1. Boolean Operation Instructions 

The logical operations are performed by the AND, OR, XOR, and NOT instructions. 

NOT (Not) inverts the bits in the specified operand to form a one's complement of the 
operand. The NOT instruction is a unary operation which uses a single operand in a register or 
memory. NOT has no effect on the flags. 

The AND, OR, and XOR instructions perform the standard logical operations "and", "or", 
and "exclusive or". These instructions can use the following combinations of operands: 

• Two register operands. 

• A general register operand with a memory operand. 

• An immediate operand with either a general register operand or a memory operand. 

The AND, OR, and XOR instructions clear the OF and CF flags, leave the AF flag undefined, 
and update the SF, ZF, and PF flags. 

4.4.2. Bit Test and Modify Instructions 

This group of instructions operates on a single bit which can be in memory or in a general 
register. The location of the bit is specified as an offset from the low end of the operand. The 
value of the offset either may be given by an immediate byte in the instruction or may be 
contained in a general register. 

These instructions first assign the value of the selected bit to the CF flag. Then a new value is 
assigned to the selected bit, as determined by the operation. The state of the OF, SF, ZF, AF, 
and PF flags is undefined. Table 4-2 defines these instructions. 



Table 4-2. Bit Test and Modify Instructions 

Instruction                       Effect on CF Flag          Effect on Selected Bit 

BT (Bit Test)                     CF flag <- Selected Bit    no effect 
BTS (Bit Test and Set)            CF flag <- Selected Bit    Selected Bit <- 1 
BTR (Bit Test and Reset)          CF flag <- Selected Bit    Selected Bit <- 0 
BTC (Bit Test and Complement)     CF flag <- Selected Bit    Selected Bit <- NOT (Selected Bit) 
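
Table 4-2 can be expressed as a short Python model. The sketch assumes a register operand held as a non-negative integer and a bit offset that lies within the operand; the function name bit_op is hypothetical.

```python
def bit_op(op, operand, index):
    """Return (CF, new operand) for BT, BTS, BTR, or BTC."""
    mask = 1 << index
    cf = (operand >> index) & 1          # CF always receives the selected bit first
    if op == "BTS":
        operand |= mask                  # selected bit <- 1
    elif op == "BTR":
        operand &= ~mask                 # selected bit <- 0
    elif op == "BTC":
        operand ^= mask                  # selected bit <- NOT (selected bit)
    elif op != "BT":
        raise ValueError("unknown operation: " + op)
    return cf, operand
```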






4.4.3. Bit Scan Instructions 

These instructions scan a word or doubleword for a set bit and store the bit index (an integer 
representing the bit position) of the first set bit into a register. The bit string being scanned may 
be in a register or in memory. The ZF flag is set if the entire word is clear, otherwise the ZF 
flag is cleared. In the former case, the value of the destination register is left undefined. The 
state of the OF, SF, AF, PF, and CF flags is undefined. 

BSF (Bit Scan Forward) scans low-to-high (from bit 0 toward the upper bit positions). 

BSR (Bit Scan Reverse) scans high-to-low (from the uppermost bit toward bit 0). 
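
The two scan directions can be modeled in Python as shown in this sketch. Returning None for a zero operand is an assumption of the model that stands in for "ZF set, destination undefined"; the function names bsf and bsr are hypothetical.

```python
def bsf(value):
    """Index of the lowest set bit, or None when the operand is zero (ZF set)."""
    if value == 0:
        return None
    return (value & -value).bit_length() - 1   # isolate the lowest set bit


def bsr(value):
    """Index of the highest set bit, or None when the operand is zero (ZF set)."""
    if value == 0:
        return None
    return value.bit_length() - 1
```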

4.4.4. Shift and Rotate Instructions 

The shift and rotate instructions rearrange the bits within an operand. 
These instructions fall into the following classes: 

• Shift instructions. 

• Double shift instructions. 

• Rotate instructions. 



4.4.4.1. SHIFT INSTRUCTIONS 

Shift instructions apply an arithmetic or logical shift to bytes, words, and double words. An 
arithmetic shift right copies the sign bit into empty bit positions on the upper end of the 
operand, while a logical shift right fills high order empty bit positions with zeros. An 
arithmetic shift is a fast way to perform a simple calculation. For example, an arithmetic shift 
right by one bit position divides an integer by two. A logical shift right divides an unsigned 
integer or a positive integer, but a signed negative integer loses its sign bit. 

The arithmetic and logical shift right instructions, SAR and SHR, differ only in their treatment 
of the bit positions emptied by shifting the contents of the operand. Note that there is no 
difference between an arithmetic shift left and a logical shift left. Two names, SAL and SHL, 
are supported for this instruction in the assembler. 

A count specifies the number of bit positions to shift an operand. Bits can be shifted up to 31 
places. A shift instruction can give the count in any of three ways. One form of shift 
instruction always shifts by one bit position. The second form gives the count as an immediate 
operand. The third form gives the count as the value contained in the CL register. This last 
form allows the count to be a result from a calculation. Only the low five bits of the CL 
register are used. 

When the number of bit positions to shift is zero, no flags are affected. Otherwise, the CF flag 
is left with the value of the last bit shifted out of the operand. In a single-bit shift, the OF flag 
is set if the value of the uppermost bit (sign bit) was changed by the operation. Otherwise, the 
OF flag is cleared. After a shift of more than one bit position, the state of the OF flag is 
undefined. On a shift of one or more bit positions, the SF, ZF, PF, and CF flags are affected, 
and the state of the AF flag is undefined. If the count is greater than or equal to the size of the 
operand, the value of the CF flag is undefined. 

SAL (Shift Arithmetic Left) shifts the destination byte, word, or doubleword operand left by 
one bit position or by the number of bits specified in the count operand (an immediate value or 
a value contained in the CL register). Empty bit positions are cleared. See Figure 4-6. 

SHL (Shift Logical Left) is another name for the SAL instruction. It is supported in the 
assembler. 

SHR (Shift Logical Right) shifts the destination byte, word, or doubleword operand right by 
one bit position or by the number of bits specified in the count operand (an immediate value or 
a value contained in the CL register). Empty bit positions are cleared. See Figure 4-7. 

SAR (Shift Arithmetic Right) shifts the destination byte, word, or doubleword operand to the 
right by one bit position or by the number of bits specified in the count operand (an immediate 
value or a value contained in the CL register). The sign of the operand is preserved by clearing 
empty bit positions if the operand is positive, or setting the empty bits if the operand is 
negative. See Figure 4-8. 

Even though this instruction can be used to divide integers by an integer power of two, the 
type of division is not the same as that produced by the IDIV instruction. The quotient 
from the IDIV instruction is rounded toward zero, whereas the "quotient" of the SAR 
instruction is rounded toward negative infinity. This difference is apparent only for negative 
numbers. For example, when the IDIV instruction is used to divide -9 by 4, the result is -2 
with a remainder of -1. If the SAR instruction is used to shift -9 right by two bits, the result is 
-3. The "remainder" of this kind of division is +3; however, the SAR instruction stores only 
the high-order bit of the remainder (in the CF flag). 
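
The difference between the two right shifts, and the rounding behavior of SAR, can be checked with a Python model of doubleword shifts. This is a sketch, not processor code: returning None for CF on a zero count stands in for "no flags are affected", and the function names sar32 and shr32 are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def shr32(value, count):
    """Model SHR: vacated upper bits are cleared; returns (result, CF)."""
    count &= 0x1F                               # only the low five bits are used
    value &= MASK32
    if count == 0:
        return value, None
    return value >> count, (value >> (count - 1)) & 1


def sar32(value, count):
    """Model SAR: vacated upper bits copy the sign; returns (result, CF)."""
    count &= 0x1F
    if count == 0:
        return value & MASK32, None
    signed = to_signed32(value)
    return (signed >> count) & MASK32, (signed >> (count - 1)) & 1
```

Shifting the pattern for -9 (0FFFFFFF7H) right by two with sar32 gives the pattern for -3 with CF set, matching the example above, whereas a division that truncates toward zero gives -2.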



INITIAL STATE: 

    CF    OPERAND 
    X     10001000100010001000100010001111 

AFTER 1-BIT SHL/SAL INSTRUCTION: 

    1     00010001000100010001000100011110 

AFTER 10-BIT SHL/SAL INSTRUCTION: 

    0     00100010001000100011110000000000 


Figure 4-6. SHL/SAL Instruction 






INITIAL STATE: 

    OPERAND                             CF 
    10001000100010001000100010001111    X 

AFTER 1-BIT SHR INSTRUCTION: 

    01000100010001000100010001000111    1 

AFTER 10-BIT SHR INSTRUCTION: 

    00000000001000100010001000100010    0 


Figure 4-7. SHR Instruction 



INITIAL STATE (POSITIVE OPERAND): 

    OPERAND                             CF 
    01000100010001000100010001000111    X 

AFTER 1-BIT SAR INSTRUCTION: 

    00100010001000100010001000100011    1 

INITIAL STATE (NEGATIVE OPERAND): 

    OPERAND                             CF 
    11000100010001000100010001000111    X 

AFTER 1-BIT SAR INSTRUCTION: 

    11100010001000100010001000100011    1 


Figure 4-8. SAR Instruction 






4.4.4.2. DOUBLE-SHIFT INSTRUCTIONS 

These instructions provide the basic operations needed to implement operations on long 
unaligned bit strings. The double shifts operate either on word or double word operands, as 
follows: 

• Take two word operands and produce a one-word result (32-bit shift). 

• Take two double word operands and produce a double word result (64-bit shift). 

Of the two operands, the source operand must be in a register while the destination operand 
may be in a register or in memory. The number of bits to be shifted is specified either in the 
CL register or in an immediate byte in the instruction. Bits shifted out of the source operand 
fill empty bit positions in the destination operand, which also is shifted. Only the destination 
operand is stored. 

When the number of bit positions to shift is zero, no flags are affected. Otherwise, the CF flag 
is set to the value of the last bit shifted out of the destination operand, and the SF, ZF, and PF 
flags are affected. On a shift of one bit position, the OF flag is set if the sign of the operand 
changed, otherwise it is cleared. For shifts of more than one bit position, the state of the OF 
flag is undefined. For shifts of one or more bit positions, the state of AF flag is undefined. 

SHLD (Shift Left Double) shifts bits of the destination operand to the left, while filling empty 
bit positions with bits shifted out of the source operand (see Figure 4-9). The result is stored 
back into the destination operand. The source operand is not modified. 

SHRD (Shift Right Double) shifts bits of the destination operand to the right, while filling 
empty bit positions with bits shifted out of the source operand (see Figure 4-10). The result is 
stored back into the destination operand. The source operand is not modified. 
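
The data movement of the two double shifts can be sketched in Python for doubleword operands. The model treats the destination and source as one 64-bit quantity; returning None for CF on a zero count stands in for "no flags are affected", and the function names shld32 and shrd32 are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def shld32(dest, src, count):
    """Model SHLD: dest shifts left, filled from the top of src; returns (dest, CF)."""
    count &= 0x1F
    dest &= MASK32
    if count == 0:
        return dest, None
    wide = (dest << 32) | (src & MASK32)          # dest is the upper half
    cf = (dest >> (32 - count)) & 1               # last bit shifted out of dest
    return (wide >> (32 - count)) & MASK32, cf


def shrd32(dest, src, count):
    """Model SHRD: dest shifts right, filled from the bottom of src; returns (dest, CF)."""
    count &= 0x1F
    dest &= MASK32
    if count == 0:
        return dest, None
    wide = ((src & MASK32) << 32) | dest          # dest is the lower half
    cf = (dest >> (count - 1)) & 1                # last bit shifted out of dest
    return (wide >> count) & MASK32, cf
```

Only the new destination value is returned, reflecting that the source operand is not modified.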



[Figure: CF <- DESTINATION (MEMORY OR REGISTER), bits 31 through 0 <- SOURCE 
(REGISTER), bits 31 through 0.] 

Figure 4-9. SHLD Instruction 






[Figure: SOURCE (REGISTER), bits 31 through 0 -> DESTINATION (MEMORY OR 
REGISTER), bits 31 through 0 -> CF.] 

Figure 4-10. SHRD Instruction 



4.4.4.3. ROTATE INSTRUCTIONS 

Rotate instructions apply a circular permutation to bytes, words, and doublewords. Bits rotated 
out of one end of an operand enter through the other end. Unlike a shift, no bits are emptied 
during a rotation. 

Rotate instructions use only the CF and OF flags. The CF flag may act as an extension of the 
operand in two of the rotate instructions, allowing a bit to be isolated and then tested by a 
conditional jump instruction (JC or JNC). The CF flag always contains the value of the last bit 
rotated out of the operand, even if the instruction does not use the CF flag as an extension of 
the operand. The state of the SF, ZF, AF, and PF flags is not affected. 

In a single-bit rotation, the OF flag is set if the operation changes the uppermost bit (sign bit) 
of the destination operand. If the sign bit retains its original value, the OF flag is cleared. After 
a rotate of more than one bit position, the value of the OF flag is undefined. 

ROL (Rotate Left) rotates the byte, word, or doubleword destination operand left by one bit 
position or by the number of bits specified in the count operand (an immediate value or a value 
contained in the CL register). For each bit position of the rotation, the bit which exits from the 
left of the operand returns at the right. See Figure 4-11. 

ROR (Rotate Right) rotates the byte, word, or doubleword destination operand right by one 
bit position or by the number of bits specified in the count operand (an immediate value or a 
value contained in the CL register). For each bit position of the rotation, the bit which exits 
from the right of the operand returns at the left. See Figure 4-12. 

RCL (Rotate Through Carry Left) rotates bits in the byte, word, or doubleword destination 
operand left by one bit position or by the number of bits specified in the count operand (an 
immediate value or a value contained in the CL register). 

This instruction differs from ROL in that it treats the CF flag as a one-bit extension on the 
upper end of the destination operand. Each bit which exits from the left side of the operand 
moves into the CF flag. At the same time, the bit in the CF flag enters the right side. See Figure 
4-13. 

RCR (Rotate Through Carry Right) rotates bits in the byte, word, or doubleword destination 
operand right by one bit position or by the number of bits specified in the count operand (an 
immediate value or a value contained in the CL register). 






This instruction differs from ROR in that it treats CF as a one-bit extension on the lower end of 
the destination operand. Each bit which exits from the right side of the operand moves into the 
CF flag. At the same time, the bit in the CF flag enters the left side. See Figure 4-14. 
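
The difference between a plain rotate and a rotate through carry can be sketched in Python for doubleword operands. ROL rotates 32 bits, while RCL rotates the 33-bit quantity formed by CF and the operand. Returning None for CF on a zero count in rol32 stands in for "flags not affected"; the function names rol32 and rcl32 are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK33 = (1 << 33) - 1


def rol32(value, count):
    """Model ROL: returns (result, CF); CF is the last bit rotated out (new bit 0)."""
    count &= 0x1F
    value &= MASK32
    if count == 0:
        return value, None
    result = ((value << count) | (value >> (32 - count))) & MASK32
    return result, result & 1


def rcl32(value, cf, count):
    """Model RCL: rotate CF:value left as one 33-bit quantity; returns (result, CF)."""
    count &= 0x1F
    wide = ((cf & 1) << 32) | (value & MASK32)    # CF is the extension above bit 31
    wide = ((wide << count) | (wide >> (33 - count))) & MASK33
    return wide & MASK32, wide >> 32
```

ROR and RCR follow the same pattern in the opposite direction.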



[Figure: CF and DESTINATION (MEMORY OR REGISTER), bits 31 through 0.] 

Figure 4-11. ROL Instruction 

[Figure: DESTINATION (MEMORY OR REGISTER), bits 31 through 0, and CF.] 

Figure 4-12. ROR Instruction 

Figure 4-13. RCL Instruction 

[Figure: DESTINATION (MEMORY OR REGISTER), bits 31 through 0, and CF.] 

Figure 4-14. RCR Instruction 



4.4.4.4. FAST "bit blt" USING DOUBLE-SHIFT INSTRUCTIONS 

One purpose of the double shift instructions is to implement a bit string move, with arbitrary 
misalignment of the bit strings. This is called a "bit blt" (BIT BLock Transfer). A simple 
example is to move a bit string from an arbitrary offset into a doubleword-aligned byte string. 
A left-to-right string is moved 32 bits at a time if a double shift is used inside the move loop. 

        MOV   ESI,ScrAddr
        MOV   EDI,DestAddr
        MOV   EBX,DWordCnt
        MOV   CL,RelOffset      ; relative offset Dest-Src
        MOV   EDX,[ESI]         ; load first dword of source
        ADD   ESI,4             ; bump source address
BltLoop:
        LODSD                   ; new low order part in EAX
        SHLD  EDX,EAX,CL        ; EDX overwritten with aligned stuff
        XCHG  EDX,EAX           ; swap high and low dwords
        STOSD                   ; write out next aligned chunk
        DEC   EBX               ; decrement loop count
        JNZ   BltLoop

This loop is simple, yet allows the data to be moved in 32-bit chunks for the highest possible 
performance. Without a double shift, the best which can be achieved is 16 bits per loop 
iteration by using a 32-bit shift, and replacing the XCHG instruction with a ROR instruction by 
16 to swap the high and low words of registers. A more general loop than shown above would 
require some extra masking on the first doubleword moved (before the main loop), and on the 
last doubleword moved (after the main loop), but would have the same 32-bits per loop 
iteration as the code above. 

4.4.4.5. FAST BIT STRING INSERT AND EXTRACT 

The double shift instructions also make possible: 

• Fast insertion of a bit string from a register into an arbitrary bit location in a larger bit 
string in memory, without disturbing the bits on either side of the inserted bits 

• Fast extraction of a bit string into a register from an arbitrary bit location in a larger bit 
string in memory, without disturbing the bits on either side of the extracted bits 

The following coded examples illustrate bit insertion and extraction under various conditions: 

1. Bit String Insertion into Memory (when the bit string is 1-25 bits long, i.e., spans four 
bytes or less): 

; Insert a right-justified bit string from a register into 
; a bit string in memory. 

; Assumptions: 

; 1. The base of the string array is doubleword aligned. 
; 2. The length of the bit string is an immediate value 
;    and the bit offset is held in a register. 

; The ESI register holds the right-justified bit string 
; to be inserted. 

; The EDI register holds the bit offset of the start of the 
; substring. 

; The EAX and ECX registers are also used. 



        MOV   ECX,EDI              ; save original offset
        SHR   EDI,3                ; divide offset by 8 (byte addr)
        AND   CL,7H                ; get low three bits of offset
        MOV   EAX,[EDI]strg_base   ; move string dword into EAX
        ROR   EAX,CL               ; right justify old bit field
        SHRD  EAX,ESI,length       ; bring in new bits
        ROL   EAX,length           ; right justify new bit field
        ROL   EAX,CL               ; bring to final position
        MOV   [EDI]strg_base,EAX   ; replace doubleword in memory



2. Bit String Insertion into Memory (when the bit string is 1-31 bits long, i.e., spans five 
bytes or less): 

; Insert a right-justified bit string from a register into 
; a bit string in memory. 

; Assumptions: 

; 1. The base of the string array is doubleword aligned. 
; 2. The length of the bit string is an immediate value 
;    and the bit offset is held in a register. 

; The ESI register holds the right-justified bit string 
; to be inserted. 

; The EDI register holds the bit offset of the start of the 
; substring. 

; The EAX, EBX, ECX, and EDX registers also are used. 

        MOV   ECX,EDI                ; temp storage for offset
        SHR   EDI,5                  ; divide offset by 32 (dwords)
        SHL   EDI,2                  ; multiply by 4 (byte address)
        AND   CL,1FH                 ; get low five bits of offset
        MOV   EAX,[EDI]strg_base     ; move low string dword into EAX
        MOV   EDX,[EDI]strg_base+4   ; other string dword into EDX
        MOV   EBX,EAX                ; temp storage for part of string
        SHRD  EAX,EDX,CL             ; shift by offset within dword
        SHRD  EDX,EBX,CL             ; shift by offset within dword
        SHRD  EAX,ESI,length         ; bring in new bits
        ROL   EAX,length             ; right justify new bit field
        MOV   EBX,EAX                ; temp storage for string
        SHLD  EAX,EDX,CL             ; shift by offset within dword
        SHLD  EDX,EBX,CL             ; shift by offset within dword
        MOV   [EDI]strg_base,EAX     ; replace dword in memory
        MOV   [EDI]strg_base+4,EDX   ; replace dword in memory



3. Bit String Insertion into Memory (when the bit string is exactly 32 bits long, i.e., spans 
four or five bytes): 

; Insert a right-justified bit string from a register into 
; a bit string in memory. 

; Assumptions: 

; 1. The base of the string array is doubleword aligned. 
; 2. The length of the bit string is 32 bits 
;    and the bit offset is held in a register. 

; The ESI register holds the 32-bit string to be inserted. 

; The EDI register holds the bit offset to the start of the 
; substring. 

; The EAX, EBX, ECX, and EDX registers also are used. 

        MOV   ECX,EDI                ; save original offset
        SHR   EDI,5                  ; divide offset by 32 (dwords)
        SHL   EDI,2                  ; multiply by 4 (byte address)
        AND   CL,1FH                 ; isolate low five bits of offset
        MOV   EAX,[EDI]strg_base     ; move low string dword into EAX
        MOV   EDX,[EDI]strg_base+4   ; other string dword into EDX
        MOV   EBX,EAX                ; temp storage for part of string
        SHRD  EAX,EDX,CL             ; shift by offset within dword
        SHRD  EDX,EBX,CL             ; shift by offset within dword
        MOV   EAX,ESI                ; move 32-bit field into position
        MOV   EBX,EAX                ; temp storage for part of string
        SHLD  EAX,EDX,CL             ; shift by offset within dword
        SHLD  EDX,EBX,CL             ; shift by offset within dword
        MOV   [EDI]strg_base,EAX     ; replace dword in memory
        MOV   [EDI]strg_base+4,EDX   ; replace dword in memory



4. Bit string Extraction from Memory (when the bit string is 1-25 bits long, i.e., spans four 
bytes or less): 

; Extract a right-justified bit string into a register from 
; a bit string in memory. 

; Assumptions: 

; 1) The base of the string array is doubleword aligned. 
; 2) The length of the bit string is an immediate value 
;    and the bit offset is held in a register. 

; The EAX register holds the right-justified, zero-padded 
; bit string that was extracted. 

; The EDI register holds the bit offset of the start of the 
; substring. 

; The EDI and ECX registers also are used. 

        MOV   ECX,EDI              ; temp storage for offset
        SHR   EDI,3                ; divide offset by 8 (byte addr)
        AND   CL,7H                ; get low three bits of offset
        MOV   EAX,[EDI]strg_base   ; move string dword into EAX
        SHR   EAX,CL               ; shift by offset within dword
        AND   EAX,mask             ; extracted bit field in EAX






5. Bit string Extraction from Memory (when bit string is 1-32 bits long, i.e., spans five bytes 
or less): 

; Extract a right-justified bit string into a register from 
; a bit string in memory. 

; Assumptions: 

; 1) The base of the string array is doubleword aligned. 
; 2) The length of the bit string is an immediate 
;    value and the bit offset is held in a register. 

; The EAX register holds the right-justified, zero-padded 
; bit string that was extracted. 

; The EDI register holds the bit offset of the start of the 
; substring. 

; The EAX, ECX, and EDX registers also are used. 

        MOV   ECX,EDI                ; temp storage for offset
        SHR   EDI,5                  ; divide offset by 32 (dwords)
        SHL   EDI,2                  ; multiply by 4 (byte address)
        AND   CL,1FH                 ; get low five bits of offset
        MOV   EAX,[EDI]strg_base     ; move low string dword into EAX
        MOV   EDX,[EDI]strg_base+4   ; other string dword into EDX
        SHRD  EAX,EDX,CL             ; shift right by offset in dword
        AND   EAX,mask               ; extracted bit field in EAX
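
The insert and extract sequences can be cross-checked against a small software model. The following Python sketch is not part of the instruction set description; it assumes the bit string is held as a list of 32-bit doublewords with bit 0 of each doubleword as its lowest-numbered bit, and the names `extract_bits` and `insert_bits` are hypothetical. Like cases 2 and 5, it always reads two adjacent doublewords, so the list needs one doubleword past the one that holds the start of the field.

```python
MASK32 = 0xFFFFFFFF

def extract_bits(dwords, bit_offset, length):
    """Return `length` bits (1-32) starting at `bit_offset`, right-justified."""
    index = bit_offset >> 5           # like SHR EDI,5: which doubleword
    shift = bit_offset & 0x1F         # like AND CL,1FH: offset within it
    pair = dwords[index] | (dwords[index + 1] << 32)   # models EDX:EAX
    return (pair >> shift) & ((1 << length) - 1)       # shift right, then mask

def insert_bits(dwords, bit_offset, length, value):
    """Overwrite `length` bits (1-32) at `bit_offset`; neighbors are untouched."""
    index = bit_offset >> 5
    shift = bit_offset & 0x1F
    field_mask = ((1 << length) - 1) << shift
    pair = dwords[index] | (dwords[index + 1] << 32)
    pair = (pair & ~field_mask) | ((value << shift) & field_mask)
    dwords[index] = pair & MASK32               # write back low doubleword
    dwords[index + 1] = (pair >> 32) & MASK32   # write back high doubleword
```

A field that straddles a doubleword boundary, such as 8 bits at offset 28, exercises the same path the double shifts handle in the assembly versions.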



4.4.5. Byte-Set-On-Condition Instructions 

This group of instructions sets a byte to the value of zero or one, depending on any of the 16 
conditions defined by the status flags. The byte may be in a register or in memory. These 
instructions are especially useful for implementing Boolean expressions in high-level 
languages such as Pascal. 

Some languages represent a logical one as an integer with all bits set. This can be done by 
using the SETcc instruction with the mutually exclusive condition, then decrementing the 
result. 

SETcc (Set Byte on Condition cc) loads the value 1 into a byte if condition cc is true; clears 
the byte otherwise. See Appendix D for a definition of the possible conditions. 
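
The all-bits-set representation described above can be illustrated with a short model. This Python sketch is an illustration only; `setcc` and `all_ones_if` are hypothetical helper names, and the byte is modelled as an integer masked to 8 bits.

```python
def setcc(condition):
    """Model of SETcc: 1 if the condition holds, otherwise 0."""
    return 1 if condition else 0

def all_ones_if(condition):
    """SETcc on the opposite condition, then DEC the byte."""
    byte = setcc(not condition)   # 0 when the wanted condition is true
    return (byte - 1) & 0xFF      # 0 - 1 wraps to 0FFH; 1 - 1 gives 0
```

For example, a language that wants 0FFH for "equal" would use SETNE followed by DEC on the same byte.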



4.4.6. Test Instruction 

TEST (Test) performs the logical "and" of the two operands, clears the OF and CF flags, 
leaves the AF flag undefined, and updates the SF, ZF, and PF flags. The flags can be tested by 
conditional control transfer instructions or the byte-set-on-condition instructions. The operands 
may be bytes, words, or doublewords. 

The difference between the TEST and AND instructions is that the TEST instruction does not 
alter the destination operand. The difference between the TEST and BT instructions is that the 
TEST instruction can test the value of multiple bits in one operation, while the BT instruction 
tests a single bit. 



4.5. CONTROL TRANSFER INSTRUCTIONS 

The processor provides both conditional and unconditional control transfer instructions to 
direct the flow of execution. Conditional transfers are taken only for certain combinations of 
the state of the flags. Unconditional control transfers are always executed. 



4.5.1. Unconditional Transfer Instructions 

The JMP, CALL, RET, INT, and IRET instructions transfer execution to a destination in a 
code segment. The destination can be within the same code segment (near transfer) or in a 
different code segment (far transfer). The forms of these instructions which transfer execution 
to other segments are discussed in a later section of this chapter. If the model of memory 
organization used in a particular application does not make segments visible to application 
programmers, far transfers are not used. 

4.5.1.1. JUMP INSTRUCTION 

JMP (Jump) unconditionally transfers execution to the destination. The JMP instruction is a 
one-way transfer of execution; it does not save a return address on the stack. 

The JMP instruction transfers execution from the current routine to a different routine. The 
address of the routine is specified in the instruction, in a register, or in memory. The location 
of the address determines whether it is interpreted as a relative address or an absolute address. 

Relative Address. A relative jump uses a displacement (immediate mode constant used for 
address calculation) held in the instruction. The displacement is signed and variable-length 
(byte or doubleword). The destination address is formed by adding the displacement to the 
address held in the EIP register. The EIP register then contains the address of the next 
instruction to be executed. 
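
The relative address calculation can be sketched as follows. This Python model is an illustration, not a definition; `relative_target` is a hypothetical name, and it assumes that EIP already holds the address of the instruction following the jump when the displacement is added.

```python
def relative_target(next_eip, displacement, size_bytes):
    """Add a signed 1- or 4-byte displacement to EIP (32-bit wraparound)."""
    bits = 8 * size_bytes
    disp = displacement & ((1 << bits) - 1)
    if disp >> (bits - 1):            # sign bit set: negative displacement
        disp -= 1 << bits
    return (next_eip + disp) & 0xFFFFFFFF
```

A two-byte short jump located at 1000H with a displacement byte of 0FEH (-2) therefore targets itself, since EIP is 1002H when the displacement is applied.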

Absolute Address. An absolute jump is used with a 32-bit segment offset in either of the 
following ways: 

1. The program can jump to an address in a general register. This 32-bit value is copied into 
the EIP register and execution continues. 

2. The destination address can be a memory operand specified using the standard addressing 
modes. The operand is copied into the EIP register and execution continues. 

4.5.1.2. CALL INSTRUCTIONS 

CALL (Call Procedure) transfers execution and saves the address of the instruction following 
the CALL instruction for later use by a RET (Return) instruction. CALL pushes the current 
contents of the EIP register on the stack. The RET instruction in the called procedure uses this 
address to transfer execution back to the calling program. 






CALL instructions, like JMP instructions, have relative and absolute forms. 
Indirect CALL instructions specify an absolute address in one of the following ways: 

1. The program can jump to an address in a general register. This 32-bit value is copied into 
the EIP register, the return address is pushed on the stack, and execution continues. 

2. The destination address can be a memory operand specified using the standard addressing 
modes. The operand is copied into the EIP register, the return address is pushed on the 
stack, and execution continues. 

4.5.1.3. RETURN AND RETURN-FROM-INTERRUPT INSTRUCTIONS 

RET (Return From Procedure) terminates a procedure and transfers execution to the 
instruction following the CALL instruction which originally invoked the procedure. The RET 
instruction restores the contents of the EIP register which were pushed on the stack when the 
procedure was called. 

The RET instructions have an optional immediate operand. When present, this constant is 
added to the contents of the ESP register, which has the effect of removing any parameters 
pushed on the stack before the procedure call. 

IRET (Return From Interrupt) returns control to an interrupted procedure. The IRET 
instruction differs from the RET instruction in that it restores the EFLAGS register from the 
stack. The contents of the EFLAGS register are stored on the stack when an interrupt occurs. 
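
The interaction of CALL, RET, and the optional RET operand can be modelled with a few lines of Python. This is a sketch for near transfers only, under the assumption of a flat 32-bit stack; the class `Machine` and its method names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

class Machine:
    def __init__(self, esp):
        self.esp = esp
        self.eip = 0
        self.memory = {}              # address -> doubleword

    def push(self, value):
        self.esp = (self.esp - 4) & MASK32
        self.memory[self.esp] = value & MASK32

    def pop(self):
        value = self.memory[self.esp]
        self.esp = (self.esp + 4) & MASK32
        return value

    def call(self, target, return_address):
        self.push(return_address)     # address of instruction after CALL
        self.eip = target

    def ret(self, release_bytes=0):
        self.eip = self.pop()         # restore saved EIP
        # optional immediate: discard parameters pushed by the caller
        self.esp = (self.esp + release_bytes) & MASK32
```

With two doubleword parameters pushed before the call, a return that passes 8 as the operand leaves ESP where it was before the parameters were pushed.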

4.5.2. Conditional Transfer Instructions 

The conditional transfer instructions are jumps which transfer execution if the states in the 
EFLAGS register match conditions specified in the instruction. 

4.5.2.1. CONDITIONAL JUMP INSTRUCTIONS 

Table 4-3 shows the mnemonics for the jump instructions. The instructions listed as pairs are 
alternate names for the same instruction. The assembler provides these names for greater 
clarity in program listings. 

A form of conditional jump instruction is available which uses a displacement added to the 
contents of the EIP register if the specified condition is true. The displacement may be a byte 
or doubleword. The displacement is signed; it can be used to jump forward or backward. 







Table 4-3. Conditional Jump Instructions 

Mnemonic    Flag States               Description 

Unsigned Conditional Jumps 
JA/JNBE     (CF or ZF)=0              above/not below nor equal 
JAE/JNB     CF=0                      above or equal/not below 
JB/JNAE     CF=1                      below/not above nor equal 
JBE/JNA     (CF or ZF)=1              below or equal/not above 
JC          CF=1                      carry 
JE/JZ       ZF=1                      equal/zero 
JNC         CF=0                      not carry 
JNE/JNZ     ZF=0                      not equal/not zero 
JNP/JPO     PF=0                      not parity/parity odd 
JP/JPE      PF=1                      parity/parity even 

Signed Conditional Jumps 
JG/JNLE     ((SF xor OF) or ZF)=0     greater/not less nor equal 
JGE/JNL     (SF xor OF)=0             greater or equal/not less 
JL/JNGE     (SF xor OF)=1             less/not greater nor equal 
JLE/JNG     ((SF xor OF) or ZF)=1     less or equal/not greater 
JNO         OF=0                      not overflow 
JNS         SF=0                      not sign (non-negative) 
JO          OF=1                      overflow 
JS          SF=1                      sign (negative) 
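
The difference between the unsigned and signed rows of Table 4-3 is easiest to see by evaluating the flag expressions on concrete operands. The Python sketch below is an illustration under the assumption that the flags come from a 32-bit CMP (a subtraction whose result is discarded); `cmp_flags`, `CONDITIONS`, and `taken` are hypothetical names, and only a subset of the mnemonics is included.

```python
MASK32 = 0xFFFFFFFF

def cmp_flags(a, b):
    """Model the flags a 32-bit CMP a,b would produce."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "CF": int(a < b),                         # unsigned borrow
        "ZF": int(result == 0),
        "SF": result >> 31,
        "OF": ((a ^ b) & (a ^ result)) >> 31,     # signed overflow
    }

# Flag expressions copied from Table 4-3
CONDITIONS = {
    "JA":  lambda f: (f["CF"] | f["ZF"]) == 0,
    "JAE": lambda f: f["CF"] == 0,
    "JB":  lambda f: f["CF"] == 1,
    "JBE": lambda f: (f["CF"] | f["ZF"]) == 1,
    "JE":  lambda f: f["ZF"] == 1,
    "JNE": lambda f: f["ZF"] == 0,
    "JG":  lambda f: ((f["SF"] ^ f["OF"]) | f["ZF"]) == 0,
    "JGE": lambda f: (f["SF"] ^ f["OF"]) == 0,
    "JL":  lambda f: (f["SF"] ^ f["OF"]) == 1,
    "JLE": lambda f: ((f["SF"] ^ f["OF"]) | f["ZF"]) == 1,
}

def taken(mnemonic, a, b):
    """Would the jump be taken after CMP a,b?"""
    return CONDITIONS[mnemonic](cmp_flags(a, b))
```

Comparing 0FFFFFFFFH with 1 shows the split: as unsigned values the first operand is above the second, so JA is taken; as signed values it is -1, so JL is taken as well.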



4.5.2.2. LOOP INSTRUCTIONS 

The loop instructions are conditional jumps which use the value of the ECX register as a count 
for the number of times to run a loop. All loop instructions decrement the contents of the ECX 
register on each repetition and terminate when zero is reached. Four of the five loop 
instructions accept the ZF flag as a condition for terminating the loop before the count reaches 
zero. 

LOOP (Loop While ECX Not Zero) is a conditional jump instruction which decrements the 
contents of the ECX register before testing for the loop-terminating condition. If the contents 
of the ECX register are non-zero, the program jumps to the destination specified in the 
instruction. The LOOP instruction causes the execution of a block of code to be repeated until 
the count reaches zero. When zero is reached, execution is transferred to the instruction 
immediately following the LOOP instruction. If the value in the ECX register is zero when the 
instruction is first called, the count is pre-decremented to 0FFFFFFFFH and the LOOP runs 2^32 
times. 






LOOPE (Loop While Equal) and LOOPZ (Loop While Zero) are synonyms for the same 
instruction. These instructions are conditional jumps which decrement the contents of the ECX 
register before testing for the loop-terminating condition. If the contents of the ECX register 
are non-zero and the ZF flag is set, the program jumps to the destination specified in the 
instruction. When zero is reached or the ZF flag is clear, execution is transferred to the 
instruction immediately following the LOOPE/LOOPZ instruction. 

LOOPNE (Loop While Not Equal) and LOOPNZ (Loop While Not Zero) are synonyms for 
the same instruction. These instructions are conditional jumps which decrement the contents of 
the ECX register before testing for the loop-terminating condition. If the contents of the ECX 
register are non-zero and the ZF flag is clear, the program jumps to the destination specified in 
the instruction. When zero is reached or the ZF flag is set, execution is transferred to the 
instruction immediately following the LOOPNE/LOOPNZ instruction. 
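
The decrement-then-test order shared by these instructions can be captured in a single step function. The Python sketch below is a model for illustration; `loop_step` and its `kind` parameter are hypothetical, and ECX is treated as a 32-bit value that wraps on underflow.

```python
MASK32 = 0xFFFFFFFF

def loop_step(ecx, zf=None, kind="LOOP"):
    """Return (new ECX, jump taken?) for one LOOP-family instruction."""
    ecx = (ecx - 1) & MASK32          # decrement before testing
    taken = ecx != 0
    if kind == "LOOPE":               # also known as LOOPZ
        taken = taken and zf == 1
    elif kind == "LOOPNE":            # also known as LOOPNZ
        taken = taken and zf == 0
    return ecx, taken
```

Starting from an ECX value of zero, the first step wraps the count to 0FFFFFFFFH and takes the jump, which is the behavior the text describes for a loop entered with a zero count.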

4.5.2.3. EXECUTING A LOOP OR REPEAT ZERO TIMES 

JECXZ (Jump if ECX Zero) jumps to the destination specified in the instruction if the ECX 
register holds a value of zero. The JECXZ instruction is used in combination with the LOOP 
instruction and with the string scan and compare instructions. Because these instructions 
decrement the contents of the ECX register before testing for zero, a loop will run 2^32 times if 
the loop is entered with a zero value in the ECX register. The JECXZ instruction is used to 
create loops which fall through without executing when the initial value is zero. A JECXZ 
instruction at the beginning of a loop can be used to jump out of the loop if the count is zero. 
When used with repeated string scan and compare instructions, the JECXZ instruction can 
determine whether the loop terminated due to the count or due to satisfaction of the scan or 
compare conditions. 
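
The effect of guarding a counted loop with JECXZ can be demonstrated with a small simulation. This Python sketch is illustrative only; `run_counted_loop` is a hypothetical name, and the `cap` parameter exists solely so that the unguarded zero-count case can be observed without running 2^32 iterations.

```python
MASK32 = 0xFFFFFFFF

def run_counted_loop(count, guard_with_jecxz, cap=10):
    """Return (times the body ran, final ECX), stopping early at `cap`."""
    ecx = count & MASK32
    executed = 0
    if guard_with_jecxz and ecx == 0:   # JECXZ jumps past the loop
        return executed, ecx
    while executed < cap:
        executed += 1                   # loop body
        ecx = (ecx - 1) & MASK32        # LOOP decrements first
        if ecx == 0:                    # then falls through at zero
            break
    return executed, ecx
```

With a count of zero the guarded loop runs no iterations, while the unguarded loop is still running when the cap is reached, with ECX counting down from 0FFFFFFFFH.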



4.5.3. Software Interrupts 

The INT, INTO, and BOUND instructions allow the programmer to specify a transfer of 
execution to an exception or interrupt handler. 

INT n (Software Interrupt) calls the handler specified by an interrupt vector encoded in the 
instruction. The INT instruction may specify any interrupt type. This instruction is used to 
support multiple types of software interrupts or to test the operation of interrupt service 
routines. The interrupt service routine terminates with an IRET instruction, which returns 
execution to the instruction following the INT instruction. 

INTO (Interrupt on Overflow) calls the handler for the overflow exception, if the OF flag is 
set. If the flag is clear, execution continues without calling the handler. The OF flag is set by 
arithmetic, logical, and string instructions. This instruction causes a software interrupt for 
handling error conditions, such as arithmetic overflow. 

BOUND (Detect Value Out of Range) compares the signed value held in a general register 
against an upper and lower limit. The handler for the bounds-check exception is called if the 
value held in the register is less than the lower bound or greater than the upper bound. This 
instruction causes a software interrupt for bounds checking, such as checking an array index to 
make sure it falls within the range defined for the array. 






The BOUND instruction has two operands. The first operand specifies the general register 
being tested. The second operand is the base address of two words or doublewords at adjacent 
locations in memory. The lower limit is the word or doubleword with the lower address; the 
upper limit has the higher address. The BOUND instruction assumes that the upper limit and 
lower limit are in adjacent memory locations. These limit values cannot be register operands; if 
they are, an invalid-opcode exception occurs. 

The upper and lower limits of an array can reside just before the array itself. This puts the 
array bounds at a constant offset from the beginning of the array. Because the address of the 
array already will be present in a register, this practice avoids extra bus cycles to obtain the 
effective address of the array bounds. 
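
The comparison performed by BOUND can be sketched in Python as follows. This is a model for illustration; the exception class `BoundRangeExceeded` stands in for the bounds-check exception, and `to_signed32` and `bound` are hypothetical helper names. The two limits are passed as a pair in the order they would appear in memory, lower limit first.

```python
class BoundRangeExceeded(Exception):
    """Stands in for the bounds-check exception (hypothetical name)."""

def to_signed32(value):
    """Interpret a 32-bit pattern as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

def bound(register, limits):
    """Raise if the signed register value lies outside [lower, upper]."""
    lower, upper = (to_signed32(v) for v in limits)
    index = to_signed32(register)
    if index < lower or index > upper:
        raise BoundRangeExceeded(index)
```

Because the comparison is signed, a register holding 0FFFFFFFFH is treated as -1 and fails a check against the limits 0 and 9.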



4.6. STRING OPERATIONS 

String operations manipulate large data structures in memory, such as alphanumeric character 
strings. See also the section on I/O for information about the string I/O instructions (also 
known as block I/O instructions). 

The string operations are made by putting string instructions (which execute only one iteration 
of an operation) together with other features of the instruction set, such as repeat prefixes. The 
string instructions include: 

• MOVS — Move string 

• CMPS — Compare string 

• SCAS— Scan string 

• LODS— Load string 

• STOS— Store string 

After a string instruction executes, the string source and destination registers point to the next 
elements in their strings. The string instructions automatically increment or decrement the 
contents of these registers by the number of bytes occupied by each string element. A string 
element can be a byte, word, or doubleword. The string registers include: 

• ESI — Source index register 

• EDI — Destination index register 

String operations can begin at higher addresses and work toward lower ones, or they can begin 
at lower addresses and work toward higher ones. The direction is controlled by: 

• DF — Direction flag 

If the DF flag is clear, the registers are incremented. If the flag is set, the registers are 
decremented. These instructions set and clear the flag: 

• STD— Set direction flag 

• CLD — Clear direction flag 






To operate on more than one element of a string, a repeat prefix must be used, such as: 

• REP — Repeat while the ECX register not zero 

• REPE/REPZ — Repeat while the ECX register not zero and the ZF flag is set 

• REPNE/REPNZ — Repeat while the ECX register not zero and the ZF flag is clear 

Exceptions or interrupts that occur during a string instruction leave the registers in a state 
which allows the string instruction to be restarted. The source and destination registers point to 
the next string elements, the EIP register points to the string instruction, and the ECX register 
has the value it held following the last successful iteration. All that is necessary to restart the 
operation is to service the interrupt or fix the source of the exception, then execute an IRET 
instruction. 



4.6.1. Repeat Prefixes 

The repeat prefixes REP (Repeat While ECX Not Zero), REPE/REPZ (Repeat While 
Equal/Zero), and REPNE/REPNZ (Repeat While Not Equal/Not Zero) specify repeated 
operation of a string instruction. 

When a string instruction has a repeat prefix, the operation executes until one of the 
termination conditions specified by the prefix is satisfied. 

For each repetition of the instruction, the string operation may be suspended by an exception 
or interrupt. After the exception or interrupt has been serviced, the string operation can restart 
where it left off. This mechanism allows long string operations to proceed without affecting 
the interrupt response time of the system. 

All three prefixes shown in Table 4-4 cause the instruction to repeat until the ECX register is 
decremented to zero, if no other termination condition is satisfied. The repeat prefixes differ in 
their other termination condition. The REP prefix has no other termination condition. The 
REPE/REPZ and REPNE/REPNZ prefixes are used exclusively with the SCAS (Scan String) 
and CMPS (Compare String) instructions. The REPE/REPZ prefix terminates if the ZF flag is 
clear. The REPNE/REPNZ prefix terminates if the ZF flag is set. The ZF flag does not require 
initialization before execution of a repeated string instruction, because both the SCAS and 
CMPS instructions affect the ZF flag according to the results of the comparisons they make. 



Table 4-4. Repeat instructions 

Repeat Prefix    Termination Condition 1    Termination Condition 2 

REP              ECX=0                      none 
REPE/REPZ        ECX=0                      ZF=0 
REPNE/REPNZ      ECX=0                      ZF=1 
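
The two termination conditions in Table 4-4 can be seen working together in a model of REPNE SCASB. The Python sketch below is an illustration under the assumptions that the DF flag is clear and that memory is a simple byte sequence indexed by EDI; `repne_scasb` is a hypothetical name.

```python
def repne_scasb(memory, edi, ecx, al, zf=0):
    """Scan bytes for AL; return (EDI, ECX, ZF) when the repeat ends."""
    while ecx != 0:                        # condition 1: count exhausted
        zf = int(al == memory[edi])        # SCASB compares and sets ZF
        edi += 1                           # DF clear: EDI is incremented
        ecx -= 1
        if zf == 1:                        # condition 2: match found
            break
    return edi, ecx, zf
```

After the scan, ZF distinguishes a match from an exhausted count, and the element that matched is the one just before the final EDI. If ECX is zero on entry, no comparison is made and ZF keeps the value passed in.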






4.6.2. Indexing and Direction Flag Control 

Although the general registers are completely interchangeable under most conditions, the 
string instructions require the use of two specific registers. The source and destination strings 
are in memory addressed by the ESI and EDI registers. The ESI register points to source 
operands. By default, the ESI register is used with the DS segment register. A segment- 
override prefix allows the ESI register to be used with the CS, SS, ES, FS, or GS segment 
registers. The EDI register points to destination operands. It uses the segment indicated by the 
ES segment register; no segment override is allowed. The use of two different segment 
registers in one instruction permits operations between strings in different segments. 

When ESI and EDI are used in string instructions, they automatically are incremented or 
decremented after each iteration. String operations can begin at higher addresses and work 
toward lower ones, or they can begin at lower addresses and work toward higher ones. The 
direction is controlled by the DF flag. If the flag is clear, the registers are incremented. If the 
flag is set, the registers are decremented. The STD and CLD instructions set and clear this flag. 
Programmers should always put a known value in the DF flag before using a string instruction. 



4.6.3. String Instructions 

MOVS (Move String) moves the string element addressed by the ESI register to the location 
addressed by the EDI register. The MOVSB instruction moves bytes, the MOVSW instruction 
moves words, and the MOVSD instruction moves doublewords. The MOVS instruction, when 
accompanied by the REP prefix, operates as a memory-to-memory block transfer. To set up 
this operation, the program must initialize the ECX, ESI, and EDI registers. The ECX register 
specifies the number of elements in the block. 
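
The role of the DF flag in a block move is easiest to see when the source and destination overlap. The following Python sketch models REP MOVSB on a byte array; `rep_movsb` is a hypothetical name, and the overlapping-copy scenario is an illustration added here rather than a case taken from the text.

```python
def rep_movsb(memory, esi, edi, ecx, df=0):
    """Copy ECX bytes from [ESI] to [EDI]; return final (ESI, EDI, ECX)."""
    step = -1 if df else 1            # DF set: decrement; clear: increment
    while ecx != 0:
        memory[edi] = memory[esi]
        esi += step
        edi += step
        ecx -= 1
    return esi, edi, ecx
```

Moving five bytes two positions higher within the same buffer overwrites unread source bytes when DF is clear. Starting at the last element with DF set copies the block intact.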

CMPS (Compare Strings) subtracts the destination string element from the source string 
element and updates the AF, SF, PF, CF and OF flags. Neither string element is written back to 
memory. If the string elements are equal, the ZF flag is set; otherwise, it is cleared. CMPSB 
compares bytes, CMPSW compares words, and CMPSD compares doublewords. 

SCAS (Scan String) subtracts the destination string element from the EAX, AX, or AL 
register (depending on operand length) and updates the AF, SF, ZF, PF, CF and OF flags. The 
string and the register are not modified. If the values are equal, the ZF flag is set; otherwise, it 
is cleared. The SCASB instruction scans bytes; the SCASW instruction scans words; the 
SCASD instruction scans doublewords. 

When the REPE/REPZ or REPNE/REPNZ prefix modifies either the SCAS or CMPS 
instructions, the loop which is formed is terminated by the loop counter or the effect the SCAS 
or CMPS instruction has on the ZF flag. 

LODS (Load String) places the source string element addressed by the ESI register into the 
EAX register for doubleword strings, into the AX register for word strings, or into the AL 
register for byte strings. This instruction usually is used in a loop, where other instructions 
process each element of the string as they appear in the register. 

STOS (Store String) places the source string element from the EAX, AX, or AL register into 
the string addressed by the EDI register. This instruction usually is used in a loop, where it 
writes to memory the result of processing a string element read from memory with the LODS 
instruction. A REP STOS instruction is the fastest way to initialize a large block of memory. 



4.7. INSTRUCTIONS FOR BLOCK-STRUCTURED LANGUAGES 

These instructions provide machine-language support for implementing block-structured 
languages, such as C and Pascal. They include ENTER and LEAVE, which simplify procedure 
entry and exit in compiler-generated code. They support a structure of pointers and local 
variables on the stack called a stack frame. 

ENTER (Enter Procedure) creates a stack frame compatible with the scope rules of block- 
structured languages. In these languages, a procedure has access to its own variables and some 
number of other variables defined elsewhere in the program. The scope of a procedure is the 
set of variables to which it has access. The rules for scope vary among languages; they may be 
based on the nesting of procedures, the division of the program into separately-compiled files, 
or some other modularization scheme. 

The ENTER instruction has two operands. The first specifies the number of bytes to be 
reserved on the stack for dynamic storage in the procedure being entered. Dynamic storage is 
the memory allocated for variables created when the procedure is called, also known as 
automatic variables. The second parameter is the lexical nesting level (from 0 to 31) of the 
procedure. The nesting level is the depth of a procedure in the hierarchy of a block-structured 
program. The lexical level has no particular relationship to either the protection privilege level 
or to the I/O privilege level. 

The lexical nesting level determines the number of stack frame pointers to copy into the new 
stack frame from the preceding frame. A stack frame pointer is a doubleword used to access 
the variables of a procedure. The set of stack frame pointers used by a procedure to access the 
variables of other procedures is called the display. The first doubleword in the display is a 
pointer to the previous stack frame. This pointer is used by a LEAVE instruction to undo the 
effect of an ENTER instruction by discarding the current stack frame. 

Example: ENTER 2048,3 

Allocates 2K bytes of dynamic storage on the stack and sets up pointers to two previous stack 
frames in the stack frame for this procedure. 

After the ENTER instruction creates the display for a procedure, it allocates the dynamic 
(automatic) local variables for the procedure by decrementing the contents of the ESP register 
by the number of bytes specified in the first parameter. This new value in the ESP register 
serves as the initial top-of-stack for all PUSH and POP operations within the procedure. 

To allow a procedure to address its display, the ENTER instruction leaves the EBP register 
pointing to the first doubleword in the display. Because stacks grow down, this is actually the 
doubleword with the highest address in the display. Data manipulation instructions which 
specify the EBP register as a base register automatically address locations within the stack 
segment instead of the data segment. 

The ENTER instruction can be used in two ways: nested and non-nested. If the lexical level is 
0, the non-nested form is used. The non-nested form pushes the contents of the EBP register on 
the stack, copies the contents of the ESP register into the EBP register, and subtracts the first 
operand from the contents of the ESP register to allocate dynamic storage. The non-nested 
form differs from the nested form in that no stack frame pointers are copied. The nested form 
of the ENTER instruction occurs when the second parameter (lexical level) is not zero. 




The pseudocode in Example 4-1 shows the formal definition of the ENTER instruction. 
STORAGE is the number of bytes of dynamic storage to allocate for local variables, and 
LEVEL is the lexical nesting level. 

Example 4-1. ENTER Definition 

Push EBP 
Set a temporary value FRAME_PTR := ESP 
If LEVEL > 0 then 
    Repeat LEVEL-1 times 
        EBP := EBP-4 
        Push the doubleword pointed to by EBP 
    End Repeat 
    Push FRAME_PTR 
End if 
EBP := FRAME_PTR 
ESP := ESP-STORAGE 
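
The definition in Example 4-1 can be transcribed into a runnable model to trace how the display is built. The Python sketch below is an illustration; the class `Stack` and the functions `enter` and `leave` are hypothetical names, memory is a dictionary of doublewords, and return addresses pushed by CALL are omitted to keep the trace short. The `leave` function models LEAVE as copying EBP to ESP and then popping EBP.

```python
MASK32 = 0xFFFFFFFF

class Stack:
    def __init__(self, esp, ebp):
        self.esp, self.ebp, self.mem = esp, ebp, {}

    def push(self, value):
        self.esp = (self.esp - 4) & MASK32
        self.mem[self.esp] = value & MASK32

    def pop(self):
        value = self.mem[self.esp]
        self.esp = (self.esp + 4) & MASK32
        return value

def enter(s, storage, level):
    """Follow Example 4-1 step by step."""
    assert 0 <= level <= 31
    s.push(s.ebp)                         # Push EBP
    frame_ptr = s.esp                     # FRAME_PTR := ESP
    if level > 0:
        for _ in range(level - 1):        # copy the caller's display
            s.ebp -= 4
            s.push(s.mem[s.ebp])
        s.push(frame_ptr)                 # pointer to the new frame
    s.ebp = frame_ptr
    s.esp = (s.esp - storage) & MASK32    # allocate dynamic storage

def leave(s):
    """Discard the current frame: ESP := EBP, then pop EBP."""
    s.esp = s.ebp
    s.ebp = s.pop()
```

Calling `enter(s, 12, 1)` on a stack with ESP at 1000H leaves EBP at 0FFCH, a two-entry display, and three doublewords of dynamic storage below it. A following `enter(s, 16, 2)` copies one display entry from that frame before adding its own.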

The main procedure (in which all other procedures are nested) operates at the highest lexical 
level, level 1. The first procedure it calls operates at the next deeper lexical level, level 2. A 
level 2 procedure can access the variables of the main program, which are at fixed locations 
specified by the compiler. In the case of level 1, the ENTER instruction allocates only the 
requested dynamic storage on the stack because there is no previous display to copy. 

A procedure which calls another procedure at a lower lexical level gives the called procedure 
access to the variables of the caller. The ENTER instruction provides this access by placing a 
pointer to the calling procedure's stack frame in the display. 

A procedure which calls another procedure at the same lexical level should not give access to 
its variables. In this case, the ENTER instruction copies only that part of the display from the 
calling procedure which refers to previously nested procedures operating at higher lexical 
levels. The new stack frame does not include the pointer for addressing the calling procedure's 
stack frame. 

The ENTER instruction treats a re-entrant procedure as a call to a procedure at the same lexical 
level. In this case, each succeeding iteration of the re-entrant procedure can address only its 
own variables and the variables of the procedures within which it is nested. A re-entrant 
procedure always can address its own variables; it does not require pointers to the stack frames 
of previous iterations. 

By copying only the stack frame pointers of procedures at higher lexical levels, the ENTER 
instruction makes certain that procedures access only those variables of higher lexical levels, 
not those at parallel lexical levels (see Figure 4-15). 






MAIN (LEXICAL LEVEL 1) 
    PROCEDURE A (LEXICAL LEVEL 2) 
        PROCEDURE B (LEXICAL LEVEL 3) 
        PROCEDURE C (LEXICAL LEVEL 3) 
            PROCEDURE D (LEXICAL LEVEL 4) 

Figure 4-15. Nested Procedures 



Block-structured languages can use the lexical levels defined by ENTER to control access to 
the variables of nested procedures. In the figure, for example, if PROCEDURE A calls 
PROCEDURE B which, in turn, calls PROCEDURE C, then PROCEDURE C will have access 
to the variables of MAIN and PROCEDURE A, but not those of PROCEDURE B because they 
are at the same lexical level. The following definition describes the access to variables for the 
nested procedures in Figure 4-15. 

1. MAIN has variables at fixed locations. 

2. PROCEDURE A can access only the variables of MAIN. 

3. PROCEDURE B can access only the variables of PROCEDURE A and MAIN. 
PROCEDURE B cannot access the variables of PROCEDURE C or PROCEDURE D. 

4. PROCEDURE C can access only the variables of PROCEDURE A and MAIN. 
PROCEDURE C cannot access the variables of PROCEDURE B or PROCEDURE D. 

5. PROCEDURE D can access the variables of PROCEDURE C, PROCEDURE A, and 
MAIN. PROCEDURE D cannot access the variables of PROCEDURE B. 

In Figure 4-16, an ENTER instruction at the beginning of the MAIN program creates three 
doublewords of dynamic storage for MAIN, but copies no pointers from other stack frames. 
The first doubleword in the display holds a copy of the last value in the EBP register before the 
ENTER instruction was executed. The second doubleword (which, because stacks grow down, 
is stored at a lower address) holds a copy of the contents of the EBP register following the 
ENTER instruction. After the instruction is executed, the EBP register points to the first 
doubleword pushed on the stack, and the ESP register points to the last doubleword in the 
stack frame. 



                +--------------------+
                | OLD EBP            | <-- EBP
    DISPLAY     | MAIN'S EBP         |
                +--------------------+
    DYNAMIC     |                    |
    STORAGE     |                    | <-- ESP
                +--------------------+

Figure 4-16. Stack Frame After Entering MAIN



When MAIN calls PROCEDURE A, the ENTER instruction creates a new display (see Figure 
4-17). The first doubleword is the last value held in MAIN'S EBP register. The second 
doubleword is a pointer to MAIN'S stack frame which is copied from the second doubleword in 
MAIN'S display. This happens to be another copy of the last value held in MAIN'S EBP 
register. PROCEDURE A can access variables in MAIN because MAIN is at level 1. 
Therefore the base address for the dynamic storage used in MAIN is the current address in the 
EBP register, plus four bytes to account for the saved contents of MAIN'S EBP register. All 
dynamic variables for MAIN are at fixed, positive offsets from this value. 

When PROCEDURE A calls PROCEDURE B, the ENTER instruction creates a new display. 
(See Figure 4-18). The first doubleword holds a copy of the last value in PROCEDURE A's 
EBP register. The second and third doublewords are copies of the two stack frame pointers in 
PROCEDURE A's display. PROCEDURE B can access variables in PROCEDURE A and 
MAIN by using the stack frame pointers in its display. 

When PROCEDURE B calls PROCEDURE C, the ENTER instruction creates a new display 
for PROCEDURE C. (See Figure 4-19). The first doubleword holds a copy of the last value in 
PROCEDURE B's EBP register. This is used by the LEAVE instruction to restore 
PROCEDURE B's stack frame. The second and third doublewords are copies of the two stack 
frame pointers in PROCEDURE A's display. If PROCEDURE C were at the next deeper 
lexical level from PROCEDURE B, a fourth doubleword would be copied, which would be the 
stack frame pointer to PROCEDURE B's local variables. 

Note that PROCEDURE B and PROCEDURE C are at the same level, so PROCEDURE C is 
not intended to access PROCEDURE B's variables. This does not mean that PROCEDURE C 
is completely isolated from PROCEDURE B; PROCEDURE C is called by PROCEDURE B, 
so the pointer to the returning stack frame is a pointer to PROCEDURE B's stack frame. In 
addition, PROCEDURE B can pass parameters to PROCEDURE C either on the stack or 
through variables global to both procedures (i.e., variables in the scope of both procedures). 



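                +--------------------+
                | OLD EBP            |
                | MAIN'S EBP         |
                +--------------------+
                |                    |
                +--------------------+
                | MAIN'S EBP         | <-- EBP
    DISPLAY     | MAIN'S EBP         |
                | PROCEDURE A'S EBP  |
                +--------------------+
    DYNAMIC     |                    |
    STORAGE     |                    | <-- ESP
                +--------------------+

Figure 4-17. Stack Frame After Entering PROCEDURE A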



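                +--------------------+
                | OLD EBP            |
                | MAIN'S EBP         |
                +--------------------+
                |                    |
                +--------------------+
                | MAIN'S EBP         |
                | MAIN'S EBP         |
                | PROCEDURE A'S EBP  |
                +--------------------+
                |                    |
                +--------------------+
                | PROCEDURE A'S EBP  | <-- EBP
    DISPLAY     | MAIN'S EBP         |
                | PROCEDURE A'S EBP  |
                | PROCEDURE B'S EBP  |
                +--------------------+
    DYNAMIC     |                    |
    STORAGE     |                    | <-- ESP
                +--------------------+

Figure 4-18. Stack Frame After Entering PROCEDURE B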



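                +--------------------+
                | OLD EBP            |
                | MAIN'S EBP         |
                +--------------------+
                |                    |
                +--------------------+
                | MAIN'S EBP         |
                | MAIN'S EBP         |
                | PROCEDURE A'S EBP  |
                +--------------------+
                |                    |
                +--------------------+
                | PROCEDURE A'S EBP  |
                | MAIN'S EBP         |
                | PROCEDURE A'S EBP  |
                | PROCEDURE B'S EBP  |
                +--------------------+
                |                    |
                +--------------------+
                | PROCEDURE B'S EBP  | <-- EBP
    DISPLAY     | MAIN'S EBP         |
                | PROCEDURE A'S EBP  |
                | PROCEDURE C'S EBP  |
                +--------------------+
    DYNAMIC     |                    |
    STORAGE     |                    | <-- ESP
                +--------------------+

Figure 4-19. Stack Frame After Entering PROCEDURE C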



LEAVE (Leave Procedure) reverses the action of the previous ENTER instruction. The 
LEAVE instruction does not have any operands. The LEAVE instruction copies the contents of 
the EBP register into the ESP register to release all stack space allocated to the procedure. 
Then the LEAVE instruction restores the old value of the EBP register from the stack. This 
simultaneously restores the ESP register to its original value. A subsequent RET instruction 
then can remove any arguments and the return address pushed on the stack by the calling 
program for use by the procedure. 
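
The way ENTER builds a display and LEAVE releases it can be sketched with a small Python model. This is only an illustration of the behavior described above, not the processor's implementation: the Machine class, the starting values of ESP and EBP, the 12-byte dynamic storage size, and the placeholder return address 0xC0DE are all assumptions introduced here, and the nesting level is assumed to lie in the range 0 to 31.

```python
class Machine:
    """Toy model of a 32-bit stack with ESP and EBP (hypothetical helper)."""

    def __init__(self, esp=0x1000, ebp=0xAAAA):
        self.mem = {}              # address -> doubleword
        self.esp = esp
        self.ebp = ebp             # stands in for the "old EBP" of Figure 4-16

    def push(self, value):
        self.esp -= 4              # stacks grow down
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value

    def enter(self, size, level):
        """ENTER size, level: save EBP, build the display, allocate storage."""
        self.push(self.ebp)                # first doubleword: caller's EBP
        frame = self.esp
        for _ in range(1, level):          # copy level-1 pointers from the caller's display
            self.ebp -= 4
            self.push(self.mem[self.ebp])
        if level > 0:
            self.push(frame)               # last display entry: this frame's own EBP
        self.ebp = frame
        self.esp -= size                   # dynamic storage

    def leave(self):
        """LEAVE: release the frame, then restore the caller's EBP."""
        self.esp = self.ebp
        self.ebp = self.pop()

    def display(self, level):
        """Saved EBP followed by the display pointers of the current frame."""
        return [self.mem[self.ebp - 4 * i] for i in range(level + 1)]


def run_example():
    """MAIN calls A, A calls B, B calls C, as in Figures 4-16 through 4-19."""
    m = Machine()
    frames = {}
    for name, level in [("MAIN", 1), ("A", 2), ("B", 3), ("C", 3)]:
        m.push(0xC0DE)                     # placeholder return address pushed by CALL
        m.enter(12, level)
        frames[name] = m.ebp
    return m, frames
```

After run_example(), the display of PROCEDURE C holds PROCEDURE B's EBP followed by the frame pointers of MAIN, PROCEDURE A, and PROCEDURE C, which matches Figure 4-19; a call to leave() then makes PROCEDURE B's frame current again and leaves ESP pointing at the return address.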

4.8. FLAG CONTROL INSTRUCTIONS 

The flag control instructions change the state of bits in the EFLAGS register, as shown in 
Table 4-5. 



Table 4-5. Flag Control Instructions 

    Instruction                       Effect
    STC (Set Carry Flag)              CF <- 1
    CLC (Clear Carry Flag)            CF <- 0
    CMC (Complement Carry Flag)       CF <- NOT CF
    CLD (Clear Direction Flag)        DF <- 0
    STD (Set Direction Flag)          DF <- 1



4.8.1. Carry and Direction Flag Control Instructions 

The carry flag instructions are useful with instructions like the rotate-with-carry instructions 
RCL and RCR. They can initialize the carry flag, CF, to a known state before execution of an 
instruction which copies the flag into an operand. 

The direction flag control instructions set or clear the direction flag, DF, which controls the 
direction of string processing. If the DF flag is clear, the processor increments the string index 
registers, ESI and EDI, after each iteration of a string instruction. If the DF flag is set, the 
processor decrements these index registers. 



4.8.2. Flag Transfer Instructions 

Though specific instructions exist to alter the CF and DF flags, there is no direct method of 
altering the other application-oriented flags. The flag transfer instructions allow a program to 
change the state of the other flag bits using the bit manipulation instructions once these flags 
have been moved to the stack or the AH register. 

The LAHF and SAHF instructions deal with five of the status flags, which are used primarily 
by the arithmetic and logical instructions. 

LAHF (Load AH from Flags) copies the SF, ZF, AF, PF, and CF flags to the AH register bits 
7, 6, 4, 2, and 0, respectively (see Figure 4-20). The contents of the remaining bits 5, 3, and 1 
are left undefined. The contents of the EFLAGS register remain unchanged. 

SAHF (Store AH into Flags) copies bits 7, 6, 4, 2, and 0 from the AH register into the SF, ZF, 
AF, PF, and CF flags, respectively (see Figure 4-20). 
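
The bit mapping used by LAHF and SAHF can be sketched in Python as follows. This is a behavioral model only; the function names are hypothetical, and the AH bits that LAHF leaves undefined (5, 3, and 1) are simply modeled as 0.

```python
# Bit positions of the five status flags, identical in AH and in the low byte of EFLAGS.
FLAG_BITS = {"SF": 7, "ZF": 6, "AF": 4, "PF": 2, "CF": 0}
STATUS_MASK = sum(1 << bit for bit in FLAG_BITS.values())   # 0xD5


def lahf(eflags):
    """AH value after LAHF; the undefined bits 5, 3, and 1 are modeled as 0."""
    return eflags & STATUS_MASK


def sahf(eflags, ah):
    """EFLAGS after SAHF: only SF, ZF, AF, PF, and CF are replaced."""
    return (eflags & ~STATUS_MASK) | (ah & STATUS_MASK)


def set_flag(eflags, name):
    """Set one status flag by way of AH, in the style of LAHF / OR AH / SAHF."""
    ah = lahf(eflags) | (1 << FLAG_BITS[name])
    return sahf(eflags, ah)
```

For example, set_flag(0x0002, "ZF") yields 0x0042: ZF (bit 6) is set and every other bit is left alone.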



      7    6    5    4    3    2    1    0
    +----+----+----+----+----+----+----+----+
    | SF | ZF | 0  | AF | 0  | PF | 1  | CF |
    +----+----+----+----+----+----+----+----+

    THE BIT POSITIONS OF THE FLAGS ARE THE SAME, WHETHER THEY ARE HELD IN THE
    EFLAGS REGISTER OR THE AH REGISTER. BIT POSITIONS SHOWN AS 0 OR 1 ARE
    INTEL RESERVED. DO NOT USE.

Figure 4-20. Low Byte of EFLAGS Register 



The PUSHF and POPF instructions are not only useful for storing the flags in memory where 
they can be examined and modified, but also are useful for preserving the state of the EFLAGS 
register while executing a subroutine. 

PUSHF (Push Flags) pushes the lower word of the EFLAGS register onto the stack (see 
Figure 4-21). The PUSHFD instruction pushes the entire EFLAGS register onto the stack (the 
RF and VM flags read as clear, however). 



    [Figure: EFLAGS bit positions 31 through 0. PUSHFD/POPFD span all 32 bits;
    PUSHF/POPF span the lower word. BIT POSITIONS MARKED 0 OR 1 ARE INTEL
    RESERVED. DO NOT USE.]

Figure 4-21. Flags Used with PUSHF and POPF 



POPF (Pop Flags) pops a word from the stack into the EFLAGS register. Only bits 11, 10, 8, 
7, 6, 4, 2, and 0 are affected with all uses of this instruction. If the privilege level of the current 
code segment is 0 (most privileged), the IOPL bits (bits 13 and 12) also are affected. If the I/O 
privilege level (IOPL) is 0, the IF flag (bit 9) also is affected. The POPFD instruction pops a 
doubleword into the EFLAGS register, and it can change the state of the AC bit (bit 18) and 
the ID bit (bit 21), as well as the bits affected by a POPF instruction. 






4.9. NUMERIC INSTRUCTIONS 

The Pentium processor includes hardware and instructions for high-precision numeric 
operations on a variety of numeric data types, including 80-bit extended real and 64-bit long 
integer. Arithmetic, comparison, transcendental, and data transfer instructions are available. 
Frequently-used constants are also provided, to enhance the speed of numeric calculations. 

The numeric instructions are embedded in the instruction stream of the Pentium processor, as 
though they were being executed by a single device having both integer and floating-point 
capabilities. But the floating-point unit of the Pentium CPU actually works in parallel with the 
integer unit, resulting in higher performance. 

Refer to Chapter 5 to confirm the presence of a Pentium processor floating point unit. 
Chapter 6 describes the numeric instructions in more detail. 



4.10. SEGMENT REGISTER INSTRUCTIONS 

There are several distinct types of instructions which use segment registers. They are grouped 
together here because, if system designers choose an unsegmented model of memory 
organization, none of these instructions are used. The instructions which deal with segment 
registers include the following: 

1. Segment-register transfer instructions. 

       MOV SegReg, ...
       MOV ..., SegReg
       PUSH SegReg
       POP SegReg

2. Control transfers to another executable segment. 

       JMP far
       CALL far
       RET far

3. Data pointer instructions. 

       LDS reg, 48-bit memory operand
       LES reg, 48-bit memory operand
       LFS reg, 48-bit memory operand
       LGS reg, 48-bit memory operand
       LSS reg, 48-bit memory operand

4. Note that the following interrupt-related instructions also are used in unsegmented 
systems. Although they can transfer execution between segments when segmentation is 
used, this is transparent to the application programmer. 

       INT n
       INTO
       BOUND
       IRET

4.10.1. Segment-Register Transfer Instructions 

Forms of the MOV, POP, and PUSH instructions also are used to load and store segment 
registers. These forms operate like the general-register forms, except that one operand is a 
segment register. The MOV instruction cannot copy the contents of a segment register into 
another segment register. 

The POP and MOV instructions cannot place a value in the CS register (code segment); only 
the far control-transfer instructions affect the CS register. When the destination is the SS 
register (stack segment), interrupts are disabled until after the next instruction. 

No 16-bit operand size prefix is needed when transferring data between a segment register and 
a 32-bit general register. 



4.10.2. Far Control Transfer Instructions 

The far control-transfer instructions transfer execution to a destination in another segment by 
replacing the contents of the CS register. The destination is specified by a far pointer, which is 
a 16-bit segment selector and a 32-bit offset into the segment. The far pointer can be an 
immediate operand or an operand in memory. 

Far CALL. An intersegment CALL instruction places the values held in the EIP and CS 
registers on the stack. 

Far RET. An intersegment RET instruction restores the values of the CS and EIP registers 
from the stack. 



4.10.3. Data Pointer Instructions 

The data pointer instructions load a far pointer into the processor registers. A far pointer 
consists of a 16-bit segment selector, which is loaded into a segment register, and a 32-bit 
offset into the segment, which is loaded into a general register. 

LDS (Load Pointer Using DS) copies a far pointer from the source operand into the DS 
register and a general register. The source operand must be a memory operand, and the 
destination operand must be a general register. 

Example: lds esi, string_x 

Loads the DS register with the segment selector for the segment addressed by STRING_X, and 
loads the offset within the segment to STRING_X into the ESI register. Specifying the ESI 
register as the destination operand is a convenient way to prepare for a string operation, when 
the source string is not in the current data segment. 






LES (Load Pointer Using ES) has the same effect as the LDS instruction, except the segment 
selector is loaded into the ES register rather than the DS register. 

Example: les edi, destination_x 

Loads the ES register with the segment selector for the segment addressed by 
DESTINATION_X, and loads the offset within the segment to DESTINATION_X into the 
EDI register. This instruction is a convenient way to select a destination for string operation if 
the desired location is not in the current E-data segment. 

LFS (Load Pointer Using FS) has the same effect as the LDS instruction, except the FS 
register receives the segment selector rather than the DS register. 

LGS (Load Pointer Using GS) has the same effect as the LDS instruction, except the GS 
register receives the segment selector rather than the DS register. 

LSS (Load Pointer Using SS) has the same effect as the LDS instruction, except the SS 
register receives the segment selector rather than the DS register. This instruction is especially 
important, because it allows the two registers which identify the stack (the SS and ESP 
registers) to be changed in one uninterruptible operation. Unlike the other instructions which 
can load the SS register, interrupts are not inhibited at the end of the LSS instruction. The other 
instructions, such as POP SS, turn off interrupts to permit the following instruction to load the 
ESP register without an intervening interrupt. Since both the SS and ESP registers can be 
loaded by the LSS instruction, there is no need to disable or re-enable interrupts. 
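
The memory operand read by the data pointer instructions can be sketched in Python. In this model the 48-bit far pointer is stored with the 32-bit offset at the lower address, followed by the 16-bit selector. The helper names and the register dictionary are assumptions made for illustration, and the model ignores the descriptor checks that loading a segment register involves in protected mode.

```python
import struct


def read_far_pointer(memory, address):
    """Decode a 48-bit far pointer: 32-bit offset first, then the 16-bit selector."""
    offset, selector = struct.unpack_from("<IH", memory, address)
    return selector, offset


def lds(registers, memory, address, dest="ESI"):
    """Model of LDS dest, m16:32 - loads DS and one general register together."""
    selector, offset = read_far_pointer(memory, address)
    updated = dict(registers)
    updated["DS"] = selector
    updated[dest] = offset
    return updated
```

The same decoding applies to LES, LFS, LGS, and LSS; only the segment register that receives the selector differs.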

4.11. MISCELLANEOUS INSTRUCTIONS 

The following instructions do not fit in any of the previous categories, but are no less 
important. 

The CMPXCHG8B and CPUID instructions are new instructions on the Pentium processor and 
bring improved functionality by providing a single instruction to accomplish what previously 
took multiple instructions on earlier microprocessors. 

The BSWAP, XADD, and CMPXCHG instructions are not available on Intel386 DX or SX 
microprocessors. An Intel386 CPU can perform the same operations in multiple instructions. 
To use these instructions, always include functionally-equivalent code for Intel386 CPUs. 

To determine whether these new instructions can be used, the type of processor in a system 
needs to be determined. See Chapter 5 for code examples and information on determining the 
type of the different processors. 



4.11.1. Address Calculation Instruction 

LEA (Load Effective Address) puts the 32-bit offset to a source operand in memory (rather 
than its contents) into the destination operand. The source operand must be in memory, and the 
destination operand must be a general register. This instruction is especially useful for 
initializing the ESI or EDI registers before the execution of string instructions or initializing 
the EBX register before an XLAT instruction. The LEA instruction can perform any indexing 
or scaling which may be needed. 






Example: lea ebx, ebcdic_table 

Causes the processor to place the address of the starting location of the table labeled 
EBCDIC_TABLE into EBX. 
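
The address arithmetic that LEA performs can be sketched in Python. The function name and parameters are hypothetical; the model assumes the usual 32-bit addressing form of base, scaled index, and displacement, with scale factors of 1, 2, 4, or 8 and a result truncated to 32 bits.

```python
def effective_address(base=0, index=0, scale=1, displacement=0):
    """Offset that LEA would place in its destination register (no memory access)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF
```

For instance, effective_address(0x1000, 3, 4, 8) models an operand of the form [base + index*4 + 8] and yields 0x1014.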

4.11.2. No-Operation Instruction 

NOP (No Operation) occupies a byte of code space. When executed, it increments the EIP 
register to point at the next instruction, but affects nothing else. 

4.11.3. Translate Instruction 

XLATB (Translate) replaces the contents of the AL register with a byte read from a 
translation table in memory. The contents of the AL register are interpreted as an unsigned 
index into this table, with the contents of the EBX register used as the base address. The XLAT 
instruction does the same operation and loads its result into the same register, but it gets the 
byte operand from memory. This function is used to convert character codes from one alphabet 
into another. For example, an ASCII code could be used to look up its EBCDIC equivalent. 
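
The table lookup can be sketched in Python. The table below is deliberately partial and is an assumption made for illustration: it starts as an identity mapping and fills in only three ASCII-to-EBCDIC entries (space, '0', and 'A'); a real translation table would define all 256 entries.

```python
def build_partial_table():
    """256-entry table: identity, except for a few ASCII-to-EBCDIC entries."""
    table = bytearray(range(256))
    table[0x20] = 0x40      # space
    table[0x30] = 0xF0      # '0'
    table[0x41] = 0xC1      # 'A'
    return bytes(table)


def xlatb(al, table):
    """Model of XLATB: AL is an unsigned index into the table addressed by EBX."""
    return table[al & 0xFF]
```

Calling xlatb(ord("A"), build_partial_table()) returns 0xC1, the EBCDIC code for 'A'.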



4.11.4. Byte Swap Instruction 

BSWAP (Byte Swap) reverses the byte order in a 32-bit register operand. Bit positions 7..0 
are exchanged with 31.. 24, and bit positions 15.. 8 are exchanged with 23.. 16. This instruction 
is useful for converting between "big-endian" and "little-endian" data formats. Executing this 
instruction twice in a row leaves the register with the same value as before. This instruction also 
speeds execution of decimal arithmetic by operating on four digits at a time, as shown in 
Example 4-2. 

Example 4-2. ASCII Arithmetic Using BSWAP 

$title('ASCII Add/Subtract with BSWAP')

        name    ASCII_arith
code    segment er public use32

;       Add a string of 4 ASCII decimal digits together.
;       The upper nibble MUST be 3.
;       DS:[ESI] points at operand 1
;       DS:[EBX] points at operand 2
;       DS:[EDI] points at the destination

add10   proc    near

;       Perform ASCII add using BSWAP instruction

        mov     eax, [esi]          ; Get low four digits of first operand
        bswap   eax                 ; Put into big-endian form
        add     eax, 96969696H      ; Adjust for addition so carries work
        mov     ecx, [ebx]          ; Get low four digits of second operand
        bswap   ecx                 ; Put into big-endian form
        add     eax, ecx            ; Do the add with inter-digit carry
        rcr     ch, 1               ; Save the carry flag
        mov     edx, eax            ; Save the value
        and     eax, 0F0F0F0F0H     ; Extract the upper nibble
        sub     edx, eax            ; Zero out upper nibble of each byte
        shr     eax, 4              ; Prepare for fixup
        and     eax, 0A0A0A0AH      ; If non-zero upper nibble then form
                                    ; 10 as adjustment value to lower nibble
        add     eax, edx            ; Form adjusted lower nibble value
                                    ; Upper nibbles may be 1 from adjustment
        or      eax, 30303030H      ; Convert back to ASCII
        bswap   eax                 ; Back to little-endian
        mov     [edi], eax          ; Set destination
        rcl     ch, 1               ; Restore carry
        ret

add10   endp

;       Subtract a string of 4 ASCII decimal digits.
;       The upper nibble must be 3.
;       DS:[ESI] points at operand 1
;       DS:[EBX] points at operand 2
;       DS:[EDI] points at the destination

sub10   proc    near

;       Perform ASCII subtract using BSWAP instruction.

        mov     eax, [esi]          ; Get low four digits of first operand
        bswap   eax                 ; Put into big-endian form
        mov     ecx, [ebx]          ; Get low four digits of second operand
        bswap   ecx                 ; Put into big-endian form
        sub     eax, ecx            ; Do the subtraction with inter-digit borrow
        rcr     ch, 1               ; Save the carry flag
        mov     edx, eax            ; Save the value
        and     eax, 0F0F0F0F0H     ; Extract upper nibble, F if borrow happened
        sub     edx, eax            ; Zero out upper nibble of each byte
        shr     eax, 4              ; Prepare for fixup
        and     eax, 0A0A0A0AH      ; If non-zero upper nibble then form
                                    ; 10 as adjustment value to lower nibble
        add     eax, edx            ; Form adjusted lower nibble value
                                    ; Upper nibbles may be 1 from adjustment
        or      eax, 30303030H      ; Convert back to ASCII
        bswap   eax                 ; Convert to little-endian
        mov     [edi], eax          ; Set to destination
        rcl     ch, 1               ; Restore borrow
        ret

sub10   endp

code    ends
        end
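
The add routine of Example 4-2 can be traced with a short Python model. This is a sketch for checking the arithmetic by hand, not a replacement for the listing: the names bswap32 and ascii_add4 are hypothetical, the operands are passed as four-byte strings with the most significant digit first, and the carry flag is returned as a separate value.

```python
MASK32 = 0xFFFFFFFF


def bswap32(value):
    """Reverse the byte order of a 32-bit value, as BSWAP does."""
    return int.from_bytes((value & MASK32).to_bytes(4, "little"), "big")


def ascii_add4(op1, op2):
    """Add two strings of four ASCII digits; returns (four ASCII digits, carry)."""
    eax = bswap32(int.from_bytes(op1, "little"))    # mov eax,[esi] / bswap eax
    eax = (eax + 0x96969696) & MASK32               # bias each digit so carries ripple
    ecx = bswap32(int.from_bytes(op2, "little"))    # mov ecx,[ebx] / bswap ecx
    total = eax + ecx
    carry = total >> 32                             # CF after the add
    eax = total & MASK32
    edx = eax
    eax &= 0xF0F0F0F0                               # upper nibble: F where no carry left the digit
    edx -= eax                                      # keep only the lower nibbles
    eax >>= 4
    eax &= 0x0A0A0A0A                               # 10 where the digit needs adjusting
    eax += edx
    eax |= 0x30303030                               # back to ASCII
    return bswap32(eax).to_bytes(4, "little"), carry
```

For example, ascii_add4(b"1234", b"5678") returns (b"6912", 0), and ascii_add4(b"9999", b"0001") returns (b"0000", 1).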



4.11.5. Exchange-and-Add Instruction 



XADD (Exchange and Add) takes two operands: a source operand in a register and a 
destination operand in a register or memory. The source operand is replaced with the 
destination operand, and the destination operand is replaced with the sum of the source and 
destination operands. The flags reflect the result of the addition. This instruction can be 
combined with LOCK in a multiprocessing system to allow multiple processors to execute one 
do loop. 
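
The exchange-and-add operation, and the way it lets several processors share the iterations of one loop, can be sketched in Python. The model is sequential, so the atomicity that LOCK provides is assumed rather than demonstrated; claim_iterations and the worker names are hypothetical.

```python
def xadd(dest, src, bits=32):
    """Model of XADD: returns (new destination, new source)."""
    mask = (1 << bits) - 1
    return (dest + src) & mask, dest


def claim_iterations(counter, workers, total):
    """Each worker in turn claims the next loop index with XADD until none remain."""
    claimed = {worker: [] for worker in workers}
    while True:
        progressed = False
        for worker in workers:
            counter, index = xadd(counter, 1)   # old counter value is this worker's index
            if index < total:
                claimed[worker].append(index)
                progressed = True
        if not progressed:
            return claimed
```

Because each XADD returns the counter value that existed before the increment, no two workers ever receive the same index.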



4.11.6. Compare-and-Exchange Instructions 

CMPXCHG (Compare and Exchange) takes three operands: a source operand in a register, a 
destination operand in a register or memory, and the accumulator (i.e., the AL, AX, or EAX 
register, depending on operand size). If the values in the destination operand and the 
accumulator are equal, the destination operand is replaced with the source operand. Otherwise, 
the original value of the destination operand is loaded into the accumulator. The flags reflect 
the result which would have been obtained by subtracting the destination operand from the 
accumulator. The ZF flag is set if the values in the destination operand and the accumulator 
were equal, otherwise it is cleared. 

The CMPXCHG instruction is useful for testing and modifying semaphores. It performs a 
check to see if a semaphore is free. If the semaphore is free it is marked allocated, otherwise it 
gets the ID of the current owner. This is all done in one uninterruptible operation. In a single 
processor system, it eliminates the need to switch to level 0 to disable interrupts to execute 
multiple instructions. For multiple processor systems, CMPXCHG can be combined with 
LOCK to perform all bus cycles atomically. 
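
The semaphore test described above can be sketched in Python. The encoding is an assumption made for illustration: a semaphore value of 0 means free, and any other value is the ID of the owner. The function names are hypothetical.

```python
FREE = 0    # assumed encoding: 0 = free, otherwise the owner's ID


def cmpxchg(dest, accumulator, src):
    """Model of CMPXCHG: returns (new destination, new accumulator, ZF)."""
    if dest == accumulator:
        return src, accumulator, 1
    return dest, dest, 0


def try_acquire(semaphore, my_id):
    """Try to take the semaphore; returns (new semaphore value, owner ID)."""
    new_value, accumulator, zf = cmpxchg(semaphore, FREE, my_id)
    owner = my_id if zf else accumulator
    return new_value, owner
```

When the semaphore is free the caller becomes the owner; otherwise the semaphore is unchanged and the accumulator reports who holds it.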

CMPXCHG8B (Compare and Exchange 8 Bytes) takes three operands: a destination 
operand in memory, a 64-bit value in EDX:EAX and a 64-bit value in ECX:EBX. 
CMPXCHG8B compares the 64-bit value in EDX:EAX with the destination. If they are equal, 
the 64-bit value in ECX:EBX is stored in the destination. If EDX:EAX and the destination are 
not equal, the destination is loaded into EDX:EAX. The ZF flag is set if the values in the 
destination and EDX:EAX are equal, otherwise it is cleared. The CF, PF, AF, SF, and OF 
flags are unaffected. CMPXCHG8B can be combined with LOCK to perform all bus cycles in 
one uninterruptible operation. 
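
The 64-bit variant can be modeled the same way. In this Python sketch the destination is a single 64-bit integer and the register pairs are passed as separate 32-bit halves; the function name and return layout are hypothetical.

```python
def cmpxchg8b(dest, edx, eax, ecx, ebx):
    """Model of CMPXCHG8B: returns (new destination, new EDX, new EAX, ZF)."""
    expected = (edx << 32) | eax
    if dest == expected:
        return (ecx << 32) | ebx, edx, eax, 1
    return dest, dest >> 32, dest & 0xFFFFFFFF, 0
```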






4.11.7. CPUID Instruction 

CPUID provides information to software about the vendor and model of microprocessor 
on which it is executing. By loading a zero into EAX and then executing the CPUID 
instruction, the ECX, EDX, and EBX registers will contain a vendor identification string. The 
EAX register will contain the highest input value understood by the CPUID instruction. 
Software can then obtain additional information regarding which features are present by 
moving a one (or up to the highest value returned in EAX previously) into EAX and executing 
the CPUID instruction again. 

When a one is loaded into the EAX register before executing the CPUID instruction, the EAX 
register contains information regarding the family, model and stepping of the processor as 
shown in Figure 4-22. Bits 8-11 of the EAX register indicate what family the processor 
belongs to and will be 5 for the Pentium microprocessor. Bits 4-7 of the EAX register indicate 
the model and will be to indicate the first model in the Pentium family. Bits 0-3 of the EAX 
register indicate the Stepping ID which is a unique identifier for each revision level. 

The EBX and ECX registers are reserved following execution of this instruction with an input 
value of one, and the EDX register will contain information on which features are present on a 
particular processor. For more information on the feature bits of EDX, see Appendix H. 
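
Decoding the values returned by CPUID can be sketched in Python. The function names are hypothetical. The vendor string is assembled in the order EBX, EDX, ECX, which is the conventional ordering and is not spelled out in the text above; the register values used in the example are the ASCII bytes of "GenuineIntel".

```python
import struct


def decode_signature(eax):
    """Split the EAX value returned for input 1 into its three fields."""
    return {
        "family": (eax >> 8) & 0xF,     # bits 8-11
        "model": (eax >> 4) & 0xF,      # bits 4-7
        "stepping": eax & 0xF,          # bits 0-3
    }


def vendor_string(ebx, edx, ecx):
    """Assemble the 12-character vendor identification string (input 0)."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")
```

For example, vendor_string(0x756E6547, 0x49656E69, 0x6C65746E) yields "GenuineIntel", and decode_signature(0x0513) reports family 5, model 1, stepping 3 (an arbitrary value chosen only to exercise the fields).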

The ability to set and clear the ID flag in the EFLAGS register indicates whether the processor 
supports the CPUID instruction. The CPUID instruction can be executed at any privilege level 
to serialize instruction execution. Serializing instruction execution guarantees that any 
modifications to flags, registers, and memory for previous instructions are completed before 
the next instruction is fetched and executed. For more information on serializing operations, 
see Chapter 18. 



     31                          12 11       8 7        4 3        0
    +------------------------------+----------+----------+----------+
    |           RESERVED           |  FAMILY  |  MODEL   | STEPPING |
    +------------------------------+----------+----------+----------+

Figure 4-22. EAX Following the CPUID Instruction 






CHAPTER 5 
FEATURE DETERMINATION 



Identifying the type of processor present in a system may be necessary in order to determine 
which features are available to an application. Chapter 23 contains a complete list of which 
features are available for the different Intel architectures. The absence of an integrated 
floating-point unit (FPU) or numeric processor extension (NPX) may also need to be 
determined if software needs to emulate the floating-point instructions. 

This chapter discusses processor identification, as well as on-chip FPU and NPX presence 
detection and identification. Sample code is provided in Example 5-1. 



5.1. CPU IDENTIFICATION 

The setting of the flags stored by the PUSHF instruction, by interrupts, and by exceptions is 
different on the 32-bit processors than that stored by the 8086 and Intel 286 processors in bits 
12 and 13 (IOPL), 14 (NT), and 15 (reserved). These differences can be used to distinguish 
what type of processor is present in a system while an application is running. 

• 8086 processor: bits 12 through 15 are always set. 

• Intel 286 processor: bits 12 through 15 are always clear in real-address mode. 

• 32-bit processors: in real-address mode, bit 15 is always clear and bits 14 through 12 
have the last value loaded into them. In protected mode, bit 14 has the last value loaded 
into it, bit 15 is always clear, and IOPL depends on the CPL (if CPL is not 0, the IOPL is 
unchanged, otherwise it is updated). 

Other EFLAGS register bits that can be used to differentiate between the 32-bit processors 
include: 

• Bit 18 (AC), implemented on the Intel486 and Pentium processors, can be used to 
distinguish an Intel386 processor from the Intel486 and Pentium processors as it will 
always be clear on an Intel386 processor. 

• Bit 21 (ID) can be used to determine if an application can execute the CPUID instruction. 
This instruction supplies information to applications at runtime that identifies the vendor, 
family, model, stepping, and what features are implemented on the processor in the system 
an application is running on. The ability to set and clear this bit indicates that the CPUID 
instruction is supported by the processor. See Chapter 25 for details on this instruction. 

5.2. FPU DETECTION 

To determine whether an FPU or NPX is present in a system, applications can write to the 
status and control word registers using the FNINIT instruction and then verify the correct 
values are read back. Once an FPU or NPX is determined to be present, its type can then be 
determined. In most cases, the processor type will determine the type of FPU or NPX; 
however, an Intel386 microprocessor may work with either an Intel287 or Intel387 math 
coprocessor. To determine which of these is present, the infinity of the coprocessor must be 
checked. On the Intel287 math coprocessor, positive infinity is equal to negative infinity. On 
the Intel387 math coprocessor, however, positive infinity is not equal to negative infinity. 
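
The order of the flag tests in Section 5.1 can be sketched with a Python model. Everything here is a simplification introduced for illustration: make_flags_model stands in for the hardware by describing which bits survive a write to the flags register in real-address mode, and identify returns the cpu_type codes used by Example 5-1. On a processor that supports CPUID the real code reads the family from the CPUID instruction; the model returns 5 as a stand-in for that case.

```python
AC_BIT = 1 << 18
ID_BIT = 1 << 21
HIGH_NIBBLE = 0xF000            # flags bits 12 through 15


def make_flags_model(cpu):
    """Return a write-then-read model of the flags register (simplified assumption)."""
    def write_read(value):
        if cpu == "8086":
            return (value | HIGH_NIBBLE) & 0xFFFF   # bits 12-15 always set
        if cpu == "286":
            return value & 0x0FFF                   # bits 12-15 always clear
        writable = 0x7FFF                           # bit 15 always clear on 32-bit CPUs
        if cpu in ("486", "pentium"):
            writable |= AC_BIT
        if cpu == "pentium":
            writable |= ID_BIT
        return value & writable
    return write_read


def identify(write_read):
    """Apply the tests in the order described in Section 5.1."""
    if (write_read(0x0000) & HIGH_NIBBLE) == HIGH_NIBBLE:
        return 0                                    # 8086/8088
    if (write_read(HIGH_NIBBLE) & HIGH_NIBBLE) == 0:
        return 2                                    # Intel 286
    if (write_read(AC_BIT) & AC_BIT) == 0:
        return 3                                    # Intel386: AC cannot be set
    if (write_read(ID_BIT) & ID_BIT) == 0:
        return 4                                    # Intel486: ID cannot be set
    return 5                                        # CPUID available (stand-in value)
```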



5.3. SAMPLE CPUID IDENTIFICATION/FPU DETECTION CODE 

Example 5-1 is the Intel recommended method of determining the processor type as well as the 
presence and type of NPX or integrated FPU. This code has been modified from previous 
versions of Intel's recommended CPU identification code by modularizing the printing 
functions so that applications not running in a DOS environment can remove or change the 
print function to conform to the appropriate environment. Note that this code (and previous 
versions) is supported on the Intel 286 in real-address mode only. This example was created 
using Microsoft's assembler directives. 

Example 5-1. CPU Identification and FPU Detection 

;       Filename:       cpuid32.msm
;
;       This program has been developed by Intel Corporation.  You have
;       Intel's permission to incorporate this source code into your
;       product royalty free.
;
;       Intel specifically disclaims all warranties, express or implied,
;       and all liability, including consequential and other indirect
;       damages, for the use of this code, including liability for
;       infringement of any proprietary rights.  Intel does not assume
;       any responsibility for any errors which may appear in this code
;       nor any responsibility to update it.
;
;       This program determines the type of processor present in the
;       system it is running on.  It also determines the presence of a
;       floating-point unit or numeric processor extension on the
;       system.
;
;       This program is modularized and contains two parts:
;
;       Part 1: Identifies CPU type in cpu_type:
;               0=8086 processor
;               2=Intel 286 processor
;               3=Intel386(TM) processor
;               4=Intel486(TM) processor
;               5=Pentium(TM) processor
;
;               The presence of a floating-point unit is
;               indicated in fp_flag (1=present).
;
;               The variable infinity is used to determine if
;               an Intel287(TM) NPX (2) is being used with an Intel386 cpu
;               or an Intel387(TM) NPX (3) is being used.
;
;       Part 2: Prints out the appropriate message.  This part can
;               be removed or modified if this program is not used in a
;               DOS-based system.  Portions affected are at the end of the
;               data segment and the print procedure in the code
;               segment.
;
;       This program uses 32-bit instructions and operands.
;       For use on 16-bit assemblers, replace 32-bit instructions
;       with 16-bit versions and use the override prefix 66H, for
;       example:
;
;               Instead of:     POPFD EAX
;                               MOV ECX, EAX
;
;               Use:            DB 66H
;                               POPF AX
;                               DB 66H
;                               MOV CX, AX



        TITLE   CPUID
        DOSSEG
        .model  small
        .stack  100h
        .486

CPUID   MACRO
        db      0fh             ; Hardcoded opcode for CPUID instruction on Pentium CPU
        db      0a2h
ENDM

        .data
fp_status       dw      ?
saved_cpuid     dd      ?
vendor_id       db      12 dup (?)
cpu_type        db      ?
model           db      ?
stepping        db      ?
id_flag         db      0
fpu_present     db      0
intel_proc      db      0
infinity        db      0

; remove the remaining data declarations if not using print procedure

id_msg          db      "This system has a$"
fp_8087         db      " and an 8087 math coprocessor$"
fp_80287        db      " and an Intel287(TM) math coprocessor$"
fp_80387        db      " and an Intel387(TM) math coprocessor$"
c8086           db      "n 8086/8088 microprocessor$"
c286            db      "n Intel 286 microprocessor$"
c386            db      "n Intel386(TM) microprocessor$"
c486            db      "n Intel486(TM) DX microprocessor or Intel487(TM) SX math coprocessor$"
c486nfp         db      "n Intel486(TM) SX microprocessor$"
Pentium         db      " Pentium(TM) microprocessor",13,10,"$"
intel           db      "This system contains a Genuine Intel processor",13,10,"$"
modelmsg        db      "Model: $"
steppingmsg     db      "Stepping: $"
familymsg       db      "Processor Family: $"
period          db      ".",13,10,"$"
dataCR          db      ?,13,10,"$"
intel_id        db      "GenuineIntel"


;       The purpose of this code is to allow the user the
;       ability to identify the processor and coprocessor
;       that is currently in the system.  The algorithm of
;       the program is to first determine the processor
;       id.  When that is accomplished, the program continues
;       to then identify whether a coprocessor exists
;       in the system.  If a coprocessor or integrated
;       coprocessor exists, the program will identify
;       the coprocessor id.



. code 

start: mov ax, ©data 

mov ds , ax 
mov es, ax 
mov ebx, esp 
and esp, not 3 
pushf d 

call get_cpuid 
call check_fpu 
call print 
popf d 

mov esp, ebx 
mov ax, 4c Oh 
int 2 lh 



set segment register 

set segment register 

save current stack pointer to align 

align stack to avoid AC fault 

save for restoration at end 



restore original stack pointer 
terminate program 



get_cpuid proc
;
;       8086 CPU check
;       Bits 12-15 are always set on the 8086 processor.
;
check_8086:
        push    ebx
        push    ecx
        pushf                           ; save EFLAGS
        pop     bx                      ; store EFLAGS in BX
        mov     ax, 0fffh               ; clear bits 12-15
        and     ax, bx                  ;   in EFLAGS
        push    ax                      ; store new EFLAGS value on stack
        popf                            ; replace current EFLAGS value
        pushf                           ; set new EFLAGS
        pop     ax                      ; store new EFLAGS in AX
        and     ax, 0f000h              ; if bits 12-15 are set, then CPU
        cmp     ax, 0f000h              ;   is an 8086/8088
        mov     cpu_type, 0             ; turn on 8086/8088 flag
        je      end_get_cpuid           ; if CPU is 8086/8088, check for 8087



;
;       Intel 286 CPU check
;       Bits 12-15 are always clear on the Intel 286 processor.
;
check_80286:
        or      bx, 0f000h              ; try to set bits 12-15
        push    bx
        popf
        pushf
        pop     ax
        and     ax, 0f000h              ; if bits 12-15 are cleared, CPU=Intel 286
        mov     cpu_type, 2             ; turn on Intel 286 CPU flag
        jz      end_get_cpuid           ; if CPU is Intel 286,
                                        ;   check for Intel287 math coprocessor



;
;       Intel386 CPU check
;       The AC bit, bit #18, is a new bit introduced in the EFLAGS
;       register on the Intel486 DX CPU to generate alignment faults.
;       This bit can not be set on the Intel386 CPU.
;
check_Intel386:
        pushfd
        pop     eax                     ; get original EFLAGS
        mov     ecx, eax                ; save original EFLAGS
        xor     eax, 40000h             ; flip AC bit in EFLAGS
        push    eax                     ; save for EFLAGS
        popfd                           ; copy to EFLAGS
        pushfd                          ; push EFLAGS
        pop     eax                     ; get new EFLAGS value
        xor     eax, ecx                ; can't toggle AC bit, CPU=Intel386
        mov     cpu_type, 3             ; turn on Intel386 CPU flag
        je      end_get_cpuid           ; if CPU is Intel386, now check
                                        ; for an Intel287 or Intel387 MCP



;
;       Intel486 DX CPU, Intel487 SX MCP, and Intel486 SX CPU checking
;
;       Checking for ability to set/clear ID flag (Bit 21) in EFLAGS
;       which differentiates between a Pentium CPU or other
;       processor with the ability to use the CPUID instruction. If this
;       bit cannot be set, CPU=Intel486.
;
check_Intel486:
        mov     cpu_type, 4             ; turn on Intel486 CPU flag
        pushfd                          ; push original EFLAGS
        pop     eax                     ; get original EFLAGS in eax
        mov     ecx, eax                ; save original EFLAGS in ecx
        xor     eax, 200000h            ; flip ID bit in EFLAGS
        push    eax                     ; save for EFLAGS
        popfd                           ; copy to EFLAGS
        pushfd                          ; push EFLAGS
        pop     eax                     ; get new EFLAGS value
        xor     eax, ecx
        je      end_get_cpuid           ; if ID bit cannot be changed, CPU=Intel486
                                        ; without CPUID instruction functionality



;
;       Otherwise, execute CPUID instruction to determine vendor,
;       family, model and stepping.
;
check_vendor:
        mov     id_flag, 1              ; set flag indicating use of CPUID inst.
        mov     eax, 0                  ; set up for CPUID instruction
        CPUID                           ; macro for CPUID instruction
        mov     dword ptr vendor_id, ebx        ; Test for "GenuineIntel"
        mov     dword ptr vendor_id[+4], edx    ; vendor id
        mov     dword ptr vendor_id[+8], ecx
        mov     esi, offset vendor_id
        mov     edi, offset intel_id
        mov     ecx, length intel_id
compare:
        repe    cmpsb
        cmp     ecx, 0                  ; must be GenuineIntel if ecx = 0
        jne     cpuid_data

intel_processor:
        mov     intel_proc, 1

cpuid_data:
        mov     eax, 1
        CPUID
        mov     saved_cpuid, eax        ; save for future use
        and     eax, 0F00H              ; mask everything but family
        shr     eax, 8
        mov     cpu_type, al            ; set cpu_type with family

        mov     eax, saved_cpuid        ; restore data
        mov     stepping, al
        and     stepping, 0FH           ; isolate stepping info

        mov     eax, saved_cpuid
        mov     model, al
        and     model, 0F0H             ; isolate model info
        shr     model, 4

end_get_cpuid:
        pop     ecx
        pop     ebx
        ret

get_cpuid endp
check_fpu proc
;
;       Co-processor checking begins here for the
;       8086, Intel 286, and Intel386 CPUs. The algorithm is to
;       determine whether or not the floating-point
;       status and control words can be written to.
;       If they cannot, no coprocessor exists. If
;       the status and control words can be written
;       to, the correct coprocessor is then determined
;       depending on the processor id. Coprocessor
;       checks are first performed for an 8086, Intel 286
;       and an Intel486 DX CPU. If the coprocessor id is still
;       undetermined, the system must contain an Intel386 CPU.
;       The Intel386 CPU may work with either an Intel287 or
;       an Intel387 math coprocessor. The infinity of the
;       coprocessor must be checked to determine the correct
;       coprocessor id.
;
        push    eax
                                        ; check for 8087, Intel287, or
                                        ; Intel387 math coprocessor
        fninit                          ; reset FP status word
        mov     fp_status, 5a5ah        ; initialize temp word to
                                        ; non-zero value
        fnstsw  fp_status               ; save FP status word
        mov     ax, fp_status           ; check FP status word
        cmp     al, 0                   ; see if correct status was
                                        ; written
        je      check_control_word
        mov     fpu_present, 0          ; no fpu present
        jmp     end_check_fpu

check_control_word:
        fnstcw  fp_status               ; save FP control word
        mov     ax, fp_status           ; check FP control word
        and     ax, 103fh               ; see if selected parts
                                        ; look OK
        cmp     ax, 3fh                 ; check that 1's & 0's
                                        ; correctly read
        je      set_fpu_present
        mov     fpu_present, 0
        jmp     end_check_fpu
set_fpu_present:
        mov     fpu_present, 1
;
;       Intel287 and Intel387 math coprocessor check for the Intel386 CPU
;
check_infinity:
        cmp     cpu_type, 3
        jne     end_check_fpu
        fld1                            ; must use default control from FNINIT
        fldz                            ; form infinity
        fdiv                            ; 8087 and Intel287 MCP says +inf = -inf
        fld     st                      ; form negative infinity
        fchs                            ; Intel387 MCP says +inf <> -inf
        fcompp                          ; see if they are the same and remove them
        fstsw   fp_status               ; look at status from FCOMPP
        mov     ax, fp_status
        mov     infinity, 2             ; store Intel287 MCP for fpu infinity
        sahf                            ; see if infinities matched
        jz      end_check_fpu           ; jump if 8087 or Intel287 MCP is present
        mov     infinity, 3             ; store Intel387 MCP for fpu infinity

end_check_fpu:
        pop     eax
        ret

check_fpu endp



;
;       This procedure prints the appropriate cpuid string and
;       numeric processor presence status. If the CPUID instruction
;       was supported, prints out cpuid info.
;
print   proc
        push    eax
        push    ebx
        push    ecx
        push    edx
        cmp     id_flag, 1              ; if set to 1, cpu supported CPUID
                                        ; instruction
        je      print_cpuid_data        ; print detailed CPUID information
        mov     dx, offset id_msg       ; print initial message
        mov     ah, 9h
        int     21h

print_86:
        cmp     cpu_type, 0
        jne     print_286
        mov     dx, offset c8086
        mov     ah, 9h
        int     21h
        cmp     fpu_present, 0
        je      end_print
        mov     dx, offset fp_8087
        mov     ah, 9h
        int     21h
        jmp     end_print

print_286:
        cmp     cpu_type, 2
        jne     print_386
        mov     dx, offset c286
        mov     ah, 9h
        int     21h
        cmp     fpu_present, 0
        je      end_print
        mov     dx, offset fp_80287
        mov     ah, 9h
        int     21h
        jmp     end_print

print_386:
        cmp     cpu_type, 3
        jne     print_486
        mov     dx, offset c386
        mov     ah, 9h
        int     21h
        cmp     fpu_present, 0
        je      end_print
        cmp     infinity, 2
        jne     print_387
        mov     dx, offset fp_80287
        mov     ah, 9h
        int     21h
        jmp     end_print

print_387:
        mov     dx, offset fp_80387
        mov     ah, 9h
        int     21h
        jmp     end_print



print_486:
        cmp     fpu_present, 0
        je      print_Intel486sx
        mov     dx, offset c486
        mov     ah, 9h
        int     21h
        jmp     end_print

print_Intel486sx:
        mov     dx, offset c486nfp
        mov     ah, 9h
        int     21h
        jmp     end_print



print_cpuid_data:
        mov     edx, offset familymsg   ; print family msg
        mov     ah, 9h
        int     21h
        mov     al, cpu_type
        mov     byte ptr dataCR, al
        add     byte ptr dataCR, 30H    ; convert to ASCII
        mov     edx, offset dataCR      ; print family info
        mov     ah, 9h
        int     21h

        mov     edx, offset steppingmsg ; print stepping msg
        mov     ah, 9h
        int     21h
        mov     al, stepping
        mov     byte ptr dataCR, al
        add     byte ptr dataCR, 30H    ; convert to ASCII
        mov     edx, offset dataCR      ; print stepping info
        mov     ah, 9h
        int     21h

        mov     edx, offset modelmsg    ; print model msg
        mov     ah, 9h
        int     21h
        mov     al, model
        mov     byte ptr dataCR, al
        add     byte ptr dataCR, 30H    ; convert to ASCII
        mov     edx, offset dataCR      ; print model info
        mov     ah, 9h
        int     21h

end_print:
        pop     edx
        pop     ecx
        pop     ebx
        pop     eax
        ret

print   endp

end start 
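
The register handling in get_cpuid can be checked away from the hardware. The following Python sketch mirrors two steps of the listing: assembling the vendor string from EBX, EDX, and ECX in the order the listing stores them, and splitting the EAX value returned for function 1 into family, model, and stepping with the same masks and shifts. The function names and the sample register values are illustrative assumptions; nothing here executes CPUID.

```python
import struct


def vendor_string(ebx, edx, ecx):
    """Assemble the 12-byte vendor ID in the order the listing stores it."""
    # Each register contributes four ASCII bytes, least significant byte first.
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


def decode_signature(eax):
    """Split the CPUID function 1 EAX value with the listing's masks."""
    family = (eax & 0x0F00) >> 8    # and eax,0F00H / shr eax,8
    model = (eax & 0x00F0) >> 4     # and model,0F0H / shr model,4
    stepping = eax & 0x000F         # and stepping,0FH
    return family, model, stepping


if __name__ == "__main__":
    # Sample register values chosen for illustration, not read from a CPU.
    print(vendor_string(0x756E6547, 0x49656E69, 0x6C65746E))
    print(decode_signature(0x0513))
```

Keeping the decode separate from the instruction itself makes it easy to test the masks against known signature values before relying on them in assembly.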






CHAPTER 6 
NUMERIC APPLICATIONS 



The Pentium processor contains a high-performance numerics processing element that
provides significant numeric capabilities and direct support for floating-point,
extended-integer, and BCD data types. The Pentium Floating-point Unit (FPU) easily supports
powerful and accurate numeric applications through its implementation, with radix 2, of the
IEEE Standard 754 for Floating-Point Arithmetic. The Pentium FPU provides floating-point
performance comparable to that of large minicomputers while offering compatibility with
object code for 8087, Intel287™, Intel387 DX, Intel387 SX, and Intel487 DX math
coprocessors and the Intel486 DX processor.



6.1. INTRODUCTION TO NUMERIC APPLICATIONS

6.1.1. History

The 8087 numeric processor extension (NPX) was designed for use in 8086-family systems.
The 8086 was the first microprocessor family to partition the processing unit to permit
high-performance numeric capabilities. The 8087 NPX for this processor family implemented a
complete numeric processing environment in compliance with an early proposal for IEEE
Standard 754 for Binary Floating-Point Arithmetic.

With the Intel287 NPX, high-speed numeric computations were extended to 80286
high-performance multitasking and multiuser systems. Multiple tasks using the numeric
processor extension were afforded the full protection of the 80286 memory management and
protection features.

The Intel387 DX and SX math coprocessors are Intel's third generation numerics processors.
They implement the final IEEE Std 754, adding new trigonometric instructions, and using a
new design and CHMOS-III process to allow higher clock rates and require fewer clocks per
instruction. Together, the Intel387 math coprocessor with additional instructions and the
improved standard brought even more convenience and reliability to numerics programming
and made this convenience and reliability available to applications that need the high-speed
and large memory capacity of the 32-bit environment of the Intel386 microprocessor.

The Intel486 FPU is an on-chip equivalent of the Intel387 DX math coprocessor conforming to
both IEEE Std 754 and the more recent, generalized IEEE Std 854. Having the FPU on chip
results in a considerable performance improvement in numerics-intensive computation.

The Pentium FPU has been completely redesigned over the Intel486 FPU while maintaining
conformance to both the IEEE Std 754 and 854. Faster algorithms provide at least three times
the performance over the Intel486 FPU for common operations including ADD, MUL, and
LOAD. Many applications can achieve five times the performance of the Intel486 FPU or
more with instruction scheduling and pipelined execution.

6.1.2. Performance 

Today, floating-point performance is more important than ever. Applications of personal 
computer workstations, no longer limited to simple spreadsheets and business applications, 
now include sophisticated algorithms such as lab data analysis and three-dimensional graphics. 

Table 6-1 compares the execution times of several Pentium processor numeric instructions 
with the equivalent operations executed on a 66-MHz Intel486 DX2 processor. As indicated in 
the table, the 66-MHz Pentium CPU provides about three times the floating-point performance 
of a 66-MHz Intel486 DX2 CPU. A 66-MHz Pentium processor multiplies 32-bit and 64-bit 
floating-point numbers in about 45 nanoseconds. Of course, the actual performance of the 
processor in a given system depends on the characteristics of the individual application. 



Table 6-1. Numeric Processing Speed Comparisons

                                                      Approximate Performance Ratio:
                                                      66 MHz Pentium™ Processor ÷
Floating-Point Instruction                            66 MHz Intel486™ DX2
----------------------------------------------------  ------------------------------
FADD     ST, ST(i)              Addition              3.8
FDIV     dword_var              Division              2.2
FYL2X    ST(0), ST(1) assumed   Logarithm             3.1
FPATAN   ST(0) assumed          Arctangent            2.6
F2XM1    ST(0) assumed          Exponentiation        4.8
FLD      ST(0), ST(i)           Data Transfer         4.0



The processor coordinates its integer and floating-point activities in a manner transparent to
software. Moreover, built-in coordination facilities allow the integer pipe(s) to proceed with
other instructions while the FPU is simultaneously executing numeric instructions. Programs
can exploit this concurrency of execution to further increase system performance and
throughput; see Appendix H for how to obtain more information on floating-point instruction
pairing.



6.1.3. Ease of Use 

The 32-bit Intel architectures, with their on-chip FPU's (as in the Pentium and Intel486
CPU's) or NPX's (as in the Intel386 CPU with an Intel387 math coprocessor), are explicitly
designed to deliver stable, accurate results when programmed using straightforward "pencil
and paper" algorithms, bringing the functionality and power of accurate numeric computation
into the hands of the general user. IEEE Std 754 specifically addresses this issue, recognizing
the fundamental importance of making numeric computations both easy and safe to use.

These NPX's and FPU's therefore provide more than raw execution speed for
computation-intensive tasks. Their features are available in most high-level languages
available for these processors.






For example, most computers can overflow when two single-precision floating-point numbers
are multiplied together and then divided by a third, even if the final result is a perfectly valid
32-bit number. The FPU delivers the correctly rounded result. Other typical examples of
undesirable machine behavior in straightforward calculations occur when computing financial
rate of return, which involves the expression $(1 + i)^n$, or when solving for the roots of a
quadratic equation $ax^2 + bx + c = 0$:

$$ r = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} $$



If a does not equal 0, the formula is numerically unstable when the roots are nearly coincident 
or when their magnitudes are wildly different. The formula is also vulnerable to spurious 
over/underflows when the coefficients a, b, and c are all very big or all very tiny. When
single-precision (4-byte) floating-point coefficients are given as data and the formula is evaluated in
the FPU's normal way, keeping all intermediate results in its stack, the FPU produces 
impeccable single-precision roots. This happens because, by default and with no effort on the 
programmer's part, the FPU evaluates all those subexpressions with so much extra precision 
and range as to overwhelm almost any threat to numerical integrity. 

If double-precision data and results were at issue, a better formula would have to be used, and 
once again the FPU's default evaluation of that formula would provide substantially enhanced 
numerical integrity over mere double-precision evaluation. 
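
The effect described above can be illustrated without an FPU at hand. The Python sketch below evaluates the textbook quadratic formula twice on single-precision coefficients: once rounding every intermediate result to single precision, and once keeping intermediates in a wider format and rounding only the final root. Python's double-precision float stands in for the FPU's extended format here, which is an assumption of the sketch (a real FPU carries even more precision and range), and the helper names and the sample coefficients are illustrative.

```python
import math
import struct


def to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def quadratic_roots(a, b, c, rnd):
    """Textbook formula; rnd() is applied after every arithmetic step."""
    disc = rnd(rnd(b * b) - rnd(rnd(4.0 * a) * c))
    root = rnd(math.sqrt(disc))
    two_a = rnd(2.0 * a)
    return rnd(rnd(-b + root) / two_a), rnd(rnd(-b - root) / two_a)


# Coefficients exactly representable in single precision.
# The true roots are close to 10000 and 0.0001.
a, b, c = 1.0, -10000.0, 1.0

# Every intermediate held in single precision.
_, narrow = quadratic_roots(a, b, c, to_single)

# Intermediates held in a wider format, rounded to single only at the end.
_, wide = quadratic_roots(a, b, c, float)
wide = to_single(wide)

print("single intermediates:", narrow)
print("wide intermediates:  ", wide)
```

With single-precision intermediates the term 4ac is absorbed when it is subtracted from b squared, so the smaller root comes out as zero; with wider intermediates the same formula returns a root that is accurate to single precision.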

On most machines, straightforward algorithms will not deliver consistently correct results (and 
will not indicate when they are incorrect). To obtain correct results on traditional machines 
under all conditions usually requires sophisticated numerical techniques that go beyond typical 
programming practice. General application programmers using straightforward algorithms 
will produce much more reliable programs using the Intel architectures. This simple fact 
greatly reduces the software investment required to develop safe, accurate computation-based 
products. 

Beyond traditional numerics support for scientific applications, the Intel architectures have 
built-in facilities for commercial computing. They can process decimal numbers of up to 18
digits without round-off errors, performing exact arithmetic on integers as large as $2^{64}$ or $10^{18}$.
Exact arithmetic is vital in accounting applications where rounding errors may introduce 
monetary losses that cannot be reconciled. 
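
As an illustration of the 18-digit decimal type, the following Python sketch converts an integer to and from an 80-bit packed decimal image. It assumes the usual x87 packed BCD layout (two decimal digits per byte in the nine low-order bytes, least significant digits at the lowest address, sign in the top bit of the tenth byte); the function names are hypothetical, and the code only models the memory format rather than any FPU instruction.

```python
def to_packed_bcd(value):
    """Encode an integer of up to 18 decimal digits as 10 packed BCD bytes."""
    if abs(value) >= 10 ** 18:
        raise ValueError("packed decimal holds at most 18 digits")
    digits = abs(value)
    image = bytearray(10)
    for i in range(9):                  # bytes 0..8 hold the 18 digits
        low = digits % 10
        digits //= 10
        high = digits % 10
        digits //= 10
        image[i] = (high << 4) | low
    image[9] = 0x80 if value < 0 else 0x00   # sign byte
    return bytes(image)


def from_packed_bcd(image):
    """Decode a 10-byte packed BCD image back to an integer."""
    value = 0
    for byte in reversed(image[:9]):    # most significant digit pair first
        value = value * 100 + (byte >> 4) * 10 + (byte & 0x0F)
    return -value if image[9] & 0x80 else value


if __name__ == "__main__":
    cents = -123456789012345678
    image = to_packed_bcd(cents)
    print(image.hex(), from_packed_bcd(image))
```

Because every digit is stored exactly, an amount held as an integer count of cents survives the round trip unchanged, which is the property accounting code depends on.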

The Intel FPU's contain a number of optional numerical facilities that can be invoked by 
sophisticated users. These advanced features include directed rounding, gradual underflow, 
and programmed exception-handling facilities. 

These automatic exception-handling facilities permit a high degree of flexibility in numeric 
processing software, without burdening the programmer. While performing numeric 
calculations, the processor automatically detects exception conditions that can potentially 
damage a calculation (for example, X ÷ 0 or √X when X < 0). By default, on-chip exception
logic handles these exceptions so that a reasonable result is produced and execution may 
proceed without program interruption. Alternatively, the processor can invoke a software 
exception handler to provide special results whenever various types of exceptions are detected. 







6.1.4. Applications 

The Pentium FPU's versatility and performance make it appropriate for a broad array of 
numeric applications. In general, applications that exhibit any of the following characteristics 
can benefit by implementing numeric processing : 

• Numeric data vary over a wide range of values, or include nonintegral values. 

• Algorithms produce very large or very small intermediate results. 

• Computations must be very precise; i.e., a large number of significant digits must be 
maintained. 

• Performance requirements exceed the capacity of traditional microprocessors. 

• Consistently safe, reliable results must be delivered using a programming staff that is not 
expert in numerical techniques. 

Note also that software development costs can be reduced and performance improved for
systems that not only use real numbers but also operate on multiprecision binary or decimal
integer values.

A few examples, which show how the Pentium processor might be used in specific numerics 
applications, are described below. 

• Business data processing — The FPU's ability to accept decimal operands and produce 
exact decimal results of up to 18 digits greatly simplifies accounting programming. 
Financial calculations that use power functions can take advantage of the Intel 
architecture's exponentiation and logarithmic instructions. Many business software 
packages can benefit from the speed and accuracy of the FPU. 

• Simulation — The large (32-bit) memory space and raw speed of the processor make it 
suitable for attacking large simulation problems, which heretofore could only be executed 
on expensive mini and mainframe computers. For example, complex electronic circuit 
simulations using SPICE can be performed. Simulation of mechanical systems using finite 
element analysis can employ more elements, resulting in more detailed analysis or 
simulation of larger systems. 

• Graphics transformations — The FPU can be used in graphics applications such as 
computer-aided design (CAD), with the FPU performing many functions concurrently 
with the execution of integer instructions; these functions include rotation, scaling, and 
interpolation. 

• Process control — The FPU solves dynamic range problems automatically, and its extended 
precision allows control functions to be fine-tuned for more accurate and efficient 
performance. Using the Pentium processor to implement control algorithms also 
contributes to improved reliability and safety, while the processor's speed can be exploited 
in real-time operations. 

• Computer numerical control (CNC) — The FPU can move and position machine tool heads 
with accuracy in real-time. Axis positioning also benefits from the hardware trigonometric 
support provided by the FPU. 






• Robotics — The powerful computational abilities of the Pentium FPU are ideal for
on-board six-axis positioning.

• Navigation — Very small, lightweight, and accurate inertial guidance systems can be 
implemented with the FPU. Its built-in trigonometric functions can speed and simplify the 
calculation of position from bearing data. 

• Data acquisition — The FPU can be used to scan, scale, and reduce large quantities of data 
as it is collected, thereby lowering storage requirements and time required to process the 
data for analysis. 

• Digital Signal Processing (DSP) — All DSP-related applications, such as matrix 
multiplication and convolution, can benefit from the pipelined instruction implementation 
of the Pentium processor. 

The preceding examples are oriented toward traditional numerics applications. There are, in 
addition, many other types of systems that do not appear to the end user as computational, but 
can employ the 32-bit Intel architecture's numerical capabilities to advantage. The imaginative 
system designer has an opportunity similar to that created by the introduction of the 
microprocessor itself. Many applications can be viewed as numerically-based if sufficient 
computational power is available to support this view (e.g., character generation for a laser 
printer). This is analogous to the thousands of successful products that have been built around 
"buried" microprocessors, even though the products themselves bear little resemblance to 
computers. 



6.1.5. Programming Interface 

The Intel x86 architectures have a class of instructions known as ESCAPE instructions, all 
having a common format. These ESC instructions are numeric instructions for the FPU. These 
numeric instructions are part of a single integrated instruction set. 

Numeric processing centers around the floating-point register stack. Programmers can treat 
these eight 80-bit registers either as a fixed register set, with instructions operating on 
explicitly-designated registers, or as a classical stack, with instructions operating on the top 
one or two stack elements. 

Internally, the FPU holds all numbers in a uniform 80-bit extended format. Operands that may 
be represented in memory as 16-, 32-, or 64-bit integers, 32-, 64-, or 80-bit floating-point 
numbers, or 18-digit packed BCD numbers, are automatically converted into extended format 
as they are loaded into the FPU registers. Computation results are subsequently converted back 
into one of these destination data formats when they are stored into memory from the FPU 
registers. 

Table 6-2 lists each of the seven numeric data types supported by the FPU, showing the data 
format for each type. The table also shows the approximate range of normalized values that 
can be represented with each type. Denormal values are also supported in each of the real 
types, as required by IEEE Std 854. Denormals are discussed later in this chapter. 







Table 6-2. Numeric Data Types

                         Significant
                         Digits       Approximate Normalized
Data Type         Bits   (Decimal)    Range (Decimal)
----------------  -----  -----------  ----------------------------------------
Word integer      16     4            -32,768 ≤ x ≤ +32,767
Short integer     32     9            -2 × 10^9 ≤ x ≤ +2 × 10^9
Long integer      64     18           -9 × 10^18 ≤ x ≤ +9 × 10^18
Packed decimal    80     18           -99...99 ≤ x ≤ +99...99 (18 digits)
Single real       32     7            1.18 × 10^-38 < |x| < 3.40 × 10^38
Double real       64     15-16        2.23 × 10^-308 < |x| < 1.79 × 10^308
Extended real*    80     19           3.37 × 10^-4932 < |x| < 1.18 × 10^4932


* Equivalent to double extended format of IEEE Std 854. 
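
The real-number ranges in Table 6-2 follow from the exponent and fraction widths of each format. The Python sketch below recomputes them, assuming the standard field widths (8/23, 11/52, and 15/63 exponent/fraction bits) and working in base-10 logarithms so that the extended range, which is far outside what a Python float can hold, can still be evaluated. The helper names are illustrative, and the inward rounding step is an observation about how the table's figures appear to have been rounded, not a rule stated by the manual.

```python
import math

LOG10_2 = math.log10(2.0)


def decimal_parts(log10_value):
    """Split 10**log10_value into (mantissa, exponent), 1 <= mantissa < 10."""
    exponent = math.floor(log10_value)
    return 10.0 ** (log10_value - exponent), exponent


def normalized_range(exponent_bits, fraction_bits):
    """Smallest and largest normalized magnitudes of a binary real format."""
    bias = 2 ** (exponent_bits - 1) - 1
    # Smallest normalized value: 2**(1 - bias)
    smallest = decimal_parts((1 - bias) * LOG10_2)
    # Largest value: (2 - 2**-fraction_bits) * 2**bias
    largest = decimal_parts(
        bias * LOG10_2 + math.log10(2.0 - 2.0 ** -fraction_bits)
    )
    return smallest, largest


def inward(smallest, largest):
    """Round to three significant digits toward the inside of the range."""
    (lo_m, lo_e), (hi_m, hi_e) = smallest, largest
    return (math.ceil(lo_m * 100) / 100, lo_e), (math.floor(hi_m * 100) / 100, hi_e)


FORMATS = {"single": (8, 23), "double": (11, 52), "extended": (15, 63)}

for name, (e_bits, f_bits) in FORMATS.items():
    (lo_m, lo_e), (hi_m, hi_e) = inward(*normalized_range(e_bits, f_bits))
    print(f"{name:9s} {lo_m:.2f}e{lo_e} < |x| < {hi_m:.2f}e{hi_e}")
```

Rounding the lower bound up and the upper bound down keeps every printed bound inside the true range, which is why the figures differ slightly from a conventional round-to-nearest of the exact limits.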



All operands are stored in memory with the least significant digits starting at the initial 
(lowest) memory address. Numeric instructions access and store memory operands using only 
this initial address. See Chapter 24 for alignment strategies for the different processors. 

Table 6-3 lists the numeric instructions by class. No special programming tools are necessary 
to use the numerical capabilities, because all of the numeric instructions and data types are 
directly supported by the Intel ASM386/ASM486 Assembler, by high-level languages from 
Intel, and by assemblers and compilers produced by many independent software vendors. 
Numeric routines can be written in assembly language or any of the following higher-level 
languages from Intel: 

• PL/M-386/486 

• C-386/486 

• FORTRAN-386/486 

• ADA-386/486 



Table 6-3. Principal Numeric Instructions

Class               Instruction Types
------------------  --------------------------------------------------------------------
Data Transfer       Load (all data types), Store (all data types), Exchange
Arithmetic          Add, Subtract, Multiply, Divide, Subtract Reversed, Divide Reversed,
                    Square Root, Scale, Extract, Remainder, Integer Part, Change Sign,
                    Absolute Value
Comparison          Compare, Examine, Test
Transcendental      Tangent, Arctangent, Sine, Cosine, Sine and Cosine, 2^X - 1,
                    Y · Log2(X), Y · Log2(X+1)
Constants           0, 1, π, Log10(2), Loge(2), Log2(10), Log2(e)
Processor Control   Load Control Word, Store Control Word, Store Status Word,
                    Load Environment, Store Environment, Save, Restore,
                    Clear Exceptions, Initialize






All of these high-level languages provide programmers with access to the computational power 
and speed of the 32-bit Intel architectures without requiring an understanding of the underlying
architecture. Such architectural considerations as concurrency and synchronization are handled
automatically by these high-level languages. For the assembly language programmer, specific 
rules for handling these issues are discussed in a later section of this manual. 



6.2. ARCHITECTURE OF THE FLOATING-POINT UNIT 

To the programmer, the FPU appears as a set of additional registers, data types, and 
instructions. Refer to Chapter 25 for detailed explanations of the numerical instruction set. This 
section explains the numerical registers and data types of the FPU architecture. 



6.2.1. Numerical Registers 

The numerical registers consist of: 

• Eight individually-addressable 80-bit numeric registers, organized as a register stack. 

• Three 16-bit registers containing: 

— The FPU status word. 

— The FPU control word. 

— The tag word. 

• Error pointers, consisting of: 

— Two 16-bit registers containing selectors for the last instruction and operand. 

— Two 32-bit registers containing offsets for the last instruction and operand. 

— One 11-bit register containing the opcode of the last non-control FPU instruction.

All of the numeric instructions focus on the contents of these FPU registers.



6.2.1.1. THE FPU REGISTER STACK

The FPU register stack is shown in Figure 6-1. Each of the eight numeric registers in the stack 
is 80 bits wide and is divided into fields corresponding to the processor's extended real data 
type. 













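[Figure: FPU DATA REGISTERS R7 through R0, each 80 bits wide with SIGN (bit 79), EXPONENT
(bits 78-64), and SIGNIFICAND (bits 63-0) fields, plus a 2-bit TAG FIELD for each register;
the 16-bit CONTROL REGISTER, STATUS REGISTER, and TAG WORD; and the 48-bit INSTRUCTION
POINTER and DATA POINTER.]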



Figure 6-1. Floating-point Unit Register Set 



Numeric instructions address the data registers relative to the register on the top of the stack. 
At any point in time, this top-of-stack register is indicated by the TOP (stack TOP) field in the 
FPU status word. Load or push operations decrement TOP by one and load a value into the 
new top register. A store-and-pop operation stores the value from the current TOP register and 
then increments TOP by one. Like stacks in memory, the FPU register stack grows down 
toward lower-addressed registers. 

Many numeric instructions have several addressing modes that permit the programmer to
implicitly operate on the top of the stack, or to explicitly operate on specific registers relative
to the TOP. The ASM386/486 Assembler supports these register addressing modes, using the
expression ST(0), or simply ST, to represent the current Stack Top and ST(i) to specify the ith
register from TOP in the stack (0 ≤ i ≤ 7). For example, if TOP contains 011B (register 3 is the
top of the stack), the following statement would add the contents of two registers in the stack
(registers 3 and 5):

FADD ST, ST(2)

The stack organization and top-relative addressing of the numeric registers can simplify
subroutine programming by allowing routines to pass parameters on the register stack. By
using the stack to pass parameters rather than using "dedicated" registers, calling routines gain
flexibility in how they use the stack. As long as the stack is not full, each routine simply loads
the parameters onto the stack before calling a particular subroutine to perform a numeric
calculation. The subroutine then addresses its parameters as ST, ST(1), etc., even though TOP
may, for example, refer to physical register 3 in one invocation and physical register 5 in
another. Programmers can use the numeric registers like a conventional stack as described
here or, by using the pipelined architecture of the Pentium processor in conjunction with the
FXCH instruction, reduce the stack bottleneck and move toward a random-access register machine.
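
The top-relative addressing described in this section can be modeled in a few lines. The Python sketch below is a simplified, hypothetical model of the register stack: it tracks only the eight data registers and the 3-bit TOP field, and it omits tags, stack faults, and the 80-bit format. It assumes TOP is 0 after initialization, so the first load goes to physical register 7.

```python
class FpuRegisterStack:
    """Toy model of the eight FPU data registers and the TOP field."""

    def __init__(self):
        self.regs = [None] * 8      # physical registers R0..R7
        self.top = 0                # 3-bit TOP field of the status word

    def physical(self, i=0):
        """Physical register number addressed by ST(i)."""
        return (self.top + i) % 8

    def push(self, value):
        """Load: decrement TOP, then write the new top register."""
        self.top = (self.top - 1) % 8
        self.regs[self.top] = value

    def pop(self):
        """Store-and-pop: read the top register, then increment TOP."""
        value = self.regs[self.top]
        self.regs[self.top] = None
        self.top = (self.top + 1) % 8
        return value

    def st(self, i=0):
        """Value currently held in ST(i)."""
        return self.regs[self.physical(i)]

    def fadd_st_sti(self, i):
        """Model of FADD ST, ST(i): ST <- ST + ST(i)."""
        self.regs[self.top] = self.st(0) + self.st(i)


if __name__ == "__main__":
    stack = FpuRegisterStack()
    for value in (1.0, 2.0, 3.0, 4.0, 5.0):
        stack.push(value)
    # Five loads leave TOP at 3, so ST(2) refers to physical register 5.
    print(stack.top, stack.physical(2))
    stack.fadd_st_sti(2)
    print(stack.st(0))
```

A subroutine written against st(0), st(1), and so on behaves the same whatever physical register TOP happens to select, which is the flexibility the text attributes to stack-based parameter passing.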



6.2.1.2. THE FPU STATUS WORD



The 16-bit status word shown in Figure 6-2 reflects the overall state of the FPU. This status 
word may be stored into memory using the FSTSW/FNSTSW, FSTENV/FNSTENV, and 
FSAVE/FNSAVE instructions, and can be transferred into the AX register with the FSTSW 
AX/FNSTSW AX instructions, allowing the FPU status to be inspected by the Integer Unit. 



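Bit(s)   Field   Meaning
-------  ------  ---------------------------------------
15       B       FPU busy
14       C3      Condition code
13-11    TOP     Top of stack pointer
10       C2      Condition code
9        C1      Condition code
8        C0      Condition code
7        ES      Error summary status
6        SF      Stack fault
5        PE      Exception flag: precision
4        UE      Exception flag: underflow
3        OE      Exception flag: overflow
2        ZE      Exception flag: zero divide
1        DE      Exception flag: denormalized operand
0        IE      Exception flag: invalid operation

ES is set if any unmasked exception bit is set; cleared otherwise.
See Table 6-4 for interpretation of condition code.

TOP values:

000 = Register 0 is top of stack
001 = Register 1 is top of stack
  .
  .
111 = Register 7 is top of stack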



Figure 6-2. FPU Status Word 



The four FPU condition code bits (C3-C0) are similar to the flags in a CPU: the processor
updates these bits to reflect the outcome of arithmetic operations. The effect of the numeric
instructions on the condition code bits is summarized in Table 6-4. These condition code bits
are used principally for conditional branching. The FSTSW AX instruction stores the FPU
status word directly into the AX register, allowing these condition codes to be inspected
efficiently. The SAHF instruction can copy C3-C0 directly to the CPU's flag bits to simplify
conditional branching. Table 6-5 shows the mapping of these bits to the CPU flag bits.
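
The FSTSW AX and SAHF sequence can be traced on paper or, as in the Python sketch below, in a few lines of bit arithmetic. The sketch assumes the conventional placement of the condition codes in the status word (C0 in bit 8, C1 in bit 9, C2 in bit 10, C3 in bit 14), so that after SAHF they appear as CF, PF, and ZF, and it uses the conventional encoding of comparison results; the function names are illustrative.

```python
C0 = 1 << 8
C1 = 1 << 9
C2 = 1 << 10
C3 = 1 << 14


def flags_after_sahf(status_word):
    """CPU flags that receive condition codes after FSTSW AX followed by SAHF."""
    ah = (status_word >> 8) & 0xFF      # SAHF loads the flags from AH
    return {
        "CF": bool(ah & 0x01),          # from C0
        "PF": bool(ah & 0x04),          # from C2
        "ZF": bool(ah & 0x40),          # from C3
    }


def compare_outcome(status_word):
    """Interpret C3, C2, C0 as left by a compare instruction."""
    key = (
        bool(status_word & C3),
        bool(status_word & C2),
        bool(status_word & C0),
    )
    outcomes = {
        (False, False, False): "ST > operand",
        (False, False, True): "ST < operand",
        (True, False, False): "ST = operand",
        (True, True, True): "unordered",
    }
    return outcomes.get(key, "undefined")


if __name__ == "__main__":
    print(flags_after_sahf(C3), compare_outcome(C3))
```

This is the mechanism the check_fpu procedure in Chapter 5 relies on when it follows FSTSW with SAHF and then branches on the zero flag.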




Bits 11-13 of the status word point to the FPU register that is the current Top of Stack (TOP). 
The significance of the stack top has been described in the prior section on the register stack. 

Figure 6-2 shows the six exception flags in bits 0-5 of the status word. Bit 7 is the exception 
summary status (ES) bit. ES is set if any unmasked exception bits are set, and is cleared 
otherwise. Bits 0-5 indicate whether the FPU has detected one of six possible exception 
conditions since these status bits were last cleared or reset. (For definitions of exceptions, refer
to Chapter 7.) They are "sticky" bits, and can only be cleared by the instructions FINIT,
FCLEX, FLDENV, FSAVE, and FRSTOR. 

The B-bit (bit 15) is included for 8087 compatibility only. It reflects the contents of the ES bit 
(bit 7 of the status word). 

Bit 6 is the stack fault (SF) bit. This bit distinguishes invalid operations due to stack overflow
or underflow from other kinds of invalid operations. When SF is set, bit 9 (C1) distinguishes
between stack overflow (C1 = 1) and underflow (C1 = 0).
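
The field positions described above can be sketched in a few lines of Python. This is only an illustration of the bit layout, not code from this manual; the function names and the two-letter flag mnemonics are assumptions chosen for the example, and the positions of C3, C2, and C0 (bits 14, 10, and 8) follow the standard status word layout.

```python
def decode_status_word(sw: int) -> dict:
    """Split a 16-bit FPU status word into its fields (hypothetical helper)."""
    flag_names = ["IE", "DE", "ZE", "OE", "UE", "PE"]   # exception flags, bits 0-5
    return {
        "exceptions": [n for i, n in enumerate(flag_names) if (sw >> i) & 1],
        "SF": (sw >> 6) & 1,     # stack fault
        "ES": (sw >> 7) & 1,     # exception summary status
        "C0": (sw >> 8) & 1,
        "C1": (sw >> 9) & 1,
        "C2": (sw >> 10) & 1,
        "TOP": (sw >> 11) & 7,   # bits 11-13
        "C3": (sw >> 14) & 1,
        "B": (sw >> 15) & 1,     # mirrors ES, kept for 8087 compatibility
    }


def stack_fault_kind(sw: int):
    """Return 'overflow' or 'underflow' when IE and SF are both set, else None."""
    fields = decode_status_word(sw)
    if "IE" in fields["exceptions"] and fields["SF"]:
        return "overflow" if fields["C1"] else "underflow"
    return None
```

A handler would typically obtain the word with FSTSW AX (or from a saved environment) and pass it to a routine of this kind before deciding how to respond.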

6.2.1.3. CONTROL WORD 

The FPU provides the programmer with several processing options, which are selected by 
loading a word from memory into the control word. Figure 6-3 shows the format and encoding 
of the fields in the control word. 

The low-order byte of this control word configures the numerical exception masking. Bits 0-5 
of the control word contain individual masks for each of the six floating-point exception 
conditions recognized by the processor. The high-order byte of the control word configures the 
FPU processing options, including 

• Precision control 

• Rounding control 

The precision-control bits (bits 8-9) can be used to set the FPU internal operating precision at 
less than the default precision (64-bit significand). These control bits can be used to provide 
compatibility with the earlier-generation arithmetic processors having less precision than the 
Intel 32-bit FPU's. The precision-control bits affect the results of only the following five 
arithmetic instructions: ADD, SUB(R), MUL, DIV(R), and SQRT. No other operations are 
affected by PC. 

The rounding-control bits (bits 10-11) provide for the common round-to-nearest mode, as well
as directed rounding and true chop. Rounding control affects the arithmetic instructions (refer
to Section 6.3 in this chapter for lists of arithmetic and nonarithmetic instructions) and certain
nonarithmetic instructions, namely the (FLD constant) and (FST(P) mem) instructions.




Table 6-4. Condition Code Interpretation

Instruction                   C0            C3            C2                  C1
----------------------------  ------------  ------------  ------------------  ------------------
FCOM, FCOMP, FCOMPP,          Result of     Result of     Operands are not    Zero or O/U#
FTST, FUCOMPP, FICOM,         comparison    comparison    comparable
FICOMP

FXAM                          Operand       Operand       Operand class       Sign or O/U#
                              class         class

FPREM, FPREM1                 Q2            Q1            0 = reduction       Q0 or O/U#
                                                          complete
                                                          1 = reduction
                                                          incomplete

FST, FSTP, FADD, FMUL,        UNDEFINED     UNDEFINED     UNDEFINED           Roundup or O/U#
FDIV, FDIVR, FSUB,
FSUBR, FSCALE, FSQRT,
FPATAN, F2XM1, FYL2X,
FYL2XP1

FPTAN, FSIN, FCOS,            UNDEFINED     UNDEFINED     0 = reduction       Roundup or O/U#
FSINCOS                                                   complete            (UNDEFINED if
                                                          1 = reduction       C2 = 1)
                                                          incomplete

FCHS, FABS, FXCH,             UNDEFINED     UNDEFINED     UNDEFINED           Zero or O/U#
FINCSTP, FDECSTP,
Constant Loads, FXTRACT,
FLD, FILD, FBLD,
FSTP (ext. real)

FLDENV, FRSTOR                Each bit loaded from memory (all four bits)

FLDCW, FSTENV, FSTCW,         UNDEFINED     UNDEFINED     UNDEFINED           UNDEFINED
FSTSW, FCLEX

FINIT, FSAVE                  Zero          Zero          Zero                Zero

NOTES 

O/U#: When both IE and SF bits of status word are set, indicating a stack exception, this bit
distinguishes between stack overflow (C1=1) and underflow (C1=0).

Reduction: If FPREM or FPREM1 produces a remainder that is less than the modulus, reduction is
complete. When reduction is incomplete the value at the top of the stack is a partial remainder, which can
be used as input to further reduction. For FPTAN, FSIN, FCOS and FSINCOS, the reduction bit is set if the
operand at the top of the stack is too large. In this case, the original operand remains at the top of the
stack.

Roundup: When the PE bit of the status word is set, this bit indicates whether the last rounding in the
instruction was upward.

UNDEFINED: Do not rely on any specific value in these bits.




Table 6-5. Correspondence Between FPU and IU Flag Bits

FPU Flag    IU Flag
--------    -------
C0          CF
C1          (none)
C2          PF
C3          ZF
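
The correspondence in Table 6-5 follows from where the condition code bits land in AH after FSTSW AX. The Python sketch below models that path under stated assumptions: the function name is hypothetical, and the SAHF bit assignments (AH bit 0 to CF, bit 2 to PF, bit 6 to ZF) are general facts about the instruction rather than something spelled out in this section.

```python
def flags_after_fstsw_sahf(status_word: int) -> dict:
    """Model FSTSW AX followed by SAHF for the three mapped condition bits."""
    ah = (status_word >> 8) & 0xFF    # high byte of the status word ends up in AH
    return {
        "CF": ah & 1,                 # AH bit 0 = status bit 8  = C0
        "PF": (ah >> 2) & 1,          # AH bit 2 = status bit 10 = C2
        "ZF": (ah >> 6) & 1,          # AH bit 6 = status bit 14 = C3
    }
```

C1 (status bit 9) arrives in AH bit 1, which SAHF does not copy to any flag; that is why the table lists "(none)" for it.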



[Figure: layout of the 16-bit FPU control word]

    Bits 15-13   RESERVED
    Bit 12       (INFINITY CONTROL)*
    Bits 11-10   RC   ROUNDING CONTROL
    Bits 9-8     PC   PRECISION CONTROL
    Bits 7-6     RESERVED
    Bits 5-0     EXCEPTION MASKS:
                 Bit 5   PRECISION
                 Bit 4   UNDERFLOW
                 Bit 3   OVERFLOW
                 Bit 2   ZERO DIVIDE
                 Bit 1   DENORMALIZED OPERAND
                 Bit 0   INVALID OPERATION

ROUNDING CONTROL

    00 = ROUND TO NEAREST OR EVEN
    01 = ROUND DOWN (TOWARD -∞)
    10 = ROUND UP (TOWARD +∞)
    11 = CHOP (TRUNCATE TOWARD ZERO)

PRECISION CONTROL

    00 = 24 BITS (SINGLE PRECISION)
    01 = (RESERVED)
    10 = 53 BITS (DOUBLE PRECISION)
    11 = 64 BITS (EXTENDED PRECISION)

* THIS "INFINITY CONTROL" BIT IS NOT MEANINGFUL TO THE Intel387™ NPX, THE Intel486™ FPU, OR
THE PENTIUM™ FPU. TO MAINTAIN COMPATIBILITY WITH THE Intel287™ MATH COPROCESSOR, THIS
BIT CAN BE PROGRAMMED; HOWEVER, REGARDLESS OF ITS VALUE, THE Intel387 NPX, THE
Intel486 FPU AND THE PENTIUM FPU TREAT INFINITY IN THE AFFINE SENSE (-∞ < +∞).



Figure 6-3. FPU Control Word Format 
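
As an illustration of the encodings in Figure 6-3, the following Python sketch decodes a control word into its mask, precision, and rounding settings. The helper and table names are assumptions made for the example. The value 037FH used in the usage note is the commonly documented control word after FINIT; it is consistent with the defaults described in this chapter but is not quoted from it.

```python
ROUNDING = {0b00: "nearest", 0b01: "down", 0b10: "up", 0b11: "chop"}
PRECISION_BITS = {0b00: 24, 0b01: None, 0b10: 53, 0b11: 64}   # None = reserved encoding
MASK_NAMES = ["invalid operation", "denormalized operand", "zero divide",
              "overflow", "underflow", "precision"]           # bits 0-5


def decode_control_word(cw: int) -> dict:
    """Decode the exception masks, PC field, and RC field of a control word."""
    return {
        "masked": [name for bit, name in enumerate(MASK_NAMES) if (cw >> bit) & 1],
        "precision_bits": PRECISION_BITS[(cw >> 8) & 0b11],   # PC, bits 8-9
        "rounding": ROUNDING[(cw >> 10) & 0b11],              # RC, bits 10-11
    }
```

For example, decode_control_word(0x037F) reports all six exceptions masked, 64-bit precision, and round to nearest, while 0x0C7F selects chop rounding with 24-bit precision.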



6.2.1.4. THE FPU TAG WORD

The tag word (TW) indicates the contents of each register in the register stack, as shown in 
Figure 6-4. The TW is used by the FPU itself to distinguish between empty and nonempty
register locations. Programmers of exception handlers may use this tag information to check
the contents of a numeric register without performing complex decoding of the actual data in 
the register. The tag values from the TW correspond to physical registers 0-7. Programmers 
must use the current top-of-stack (TOP) pointer stored in the FPU status word to associate 
these tag values with the relative stack registers ST(0) through ST(7). 



[Figure: layout of the 16-bit tag word]

    Bits 15-14   TAG(7)
    Bits 13-12   TAG(6)
    Bits 11-10   TAG(5)
    Bits 9-8     TAG(4)
    Bits 7-6     TAG(3)
    Bits 5-4     TAG(2)
    Bits 3-2     TAG(1)
    Bits 1-0     TAG(0)

TAG VALUES:

    00 = VALID
    01 = ZERO
    10 = SPECIAL: INVALID (NaN, UNSUPPORTED), INFINITY, OR DENORMAL
    11 = EMPTY



Figure 6-4. Tag Word Format 

The exact values of the tags are generated during execution of the FSTENV and FSAVE 
instructions according to the actual contents of the nonempty stack locations. During execution 
of other instructions, the processor updates the TW only to indicate whether a stack location is 
empty or nonempty. As a result, the FPU tag word may not be the same as previously written 
when saving the FPU state, modifying the tag word, and reloading the FPU state. This can be 
demonstrated using the following steps to modify the FPU tag word. This example assumes
FPU register 0 has the value 0 and tag(0)=11 (empty). Example 6-1 contains the actual
assembly code to perform these steps.

1. FSAVE/FSTENV stores FPU state to memory M. M[tag(0)]=11 (empty).

2. Modify memory such that M[tag(0)]=10 (i.e., special, infinity, or denormal).

3. FLDENV loads FPU state from memory M to FPU.

4. FSAVE/FSTENV stores FPU state to memory M again. The value of M[tag(0)] will be 01
(i.e., indicates zero because FPU register 0 has the value of 0).

Example 6-1. Modifying the Tag Word 

        name    tagword

stack   stackseg 100

data    segment rw use16
fpstate  dw 47 dup (?)                  ; FSAVE writes 94 bytes (47 words)
fpstate2 dw 47 dup (?)                  ; in the 16-bit format
data    ends

code    segment er public use16
        assume  ds:data, ss:stack
start:
        mov     ax, data
        mov     ds, ax                  ; set segment register

        finit                           ; initialize FPU
        fldz                            ; load zero
        mov     bx, offset fpstate
        fsave   [bx]                    ; save FPU state

        mov     ax, [bx+4]              ; tag word, AX should be 7FFFh,
                                        ; top of the fp stack has
                                        ; zero value and the rest are empty

        mov     word ptr [bx+4], 3FFFh  ; now change the zero tag (01) to
                                        ; the valid tag (00)

        fldenv  [bx]                    ; now the tag word is 3FFFh
        mov     bx, offset fpstate2
        fsave   [bx]                    ; but we are saving 7FFFh to tag
                                        ; word
code    ends
        end     start, ds:data, ss:stack
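
The behavior demonstrated by the four numbered steps can also be modeled in a high-level language. The Python sketch below is a simplified model, not a description of the FPU's implementation: the class and function names are hypothetical, registers are represented as (biased exponent, 64-bit significand) pairs, and the classification rules are the usual ones for the extended real format.

```python
EMPTY, VALID, ZERO, SPECIAL = 0b11, 0b00, 0b01, 0b10


def classify(exponent: int, significand: int) -> int:
    """Tag reported by FSAVE/FSTENV for a nonempty extended-real register."""
    if exponent == 0 and significand == 0:
        return ZERO
    if exponent == 0 or exponent == 0x7FFF or not (significand >> 63):
        return SPECIAL               # denormal, infinity/NaN, or unsupported
    return VALID


class TagModel:
    def __init__(self):
        self.regs = [(0, 0)] * 8     # physical registers 0-7, all holding zero
        self.empty = [True] * 8      # the only tag state kept between saves

    def store_tags(self) -> int:     # models FSAVE/FSTENV
        tw = 0
        for i in range(8):
            tag = EMPTY if self.empty[i] else classify(*self.regs[i])
            tw |= tag << (2 * i)
        return tw

    def load_tags(self, tw: int) -> None:   # models FLDENV
        for i in range(8):
            self.empty[i] = ((tw >> (2 * i)) & 0b11) == EMPTY
```

Storing the tags of a fresh model gives FFFFh; changing tag(0) to 10, loading, and storing again gives FFFDh, that is, tag(0)=01, because only the empty/nonempty distinction survives the load and the register content is zero.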



6.2.1.5. OPCODE FIELD OF LAST INSTRUCTION

The opcode field in Figure 6-5 describes the 11-bit format of the last non-control FPU
instruction executed. The first and second instruction bytes (after all prefixes) are combined to 
form the opcode field. Since all floating-point instructions share the same five upper bits in the 
first instruction byte (following prefixes), they are not stored in the opcode field. Note that the 
second instruction byte is actually located in the low-order byte of the stored opcode field. 



[Figure: the three low-order bits of the first instruction byte (I10-I8) form bits 10-8 of the
opcode field; the eight bits of the second instruction byte (I7-I0) form bits 7-0.]



Figure 6-5. Opcode Field 
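
The construction of the opcode field can be sketched as follows. The function name is an assumption made for this example; the check on the five upper bits reflects the fact that ESC opcodes occupy the range D8H-DFH.

```python
def fpu_opcode_field(first_byte: int, second_byte: int) -> int:
    """Combine two instruction bytes (after prefixes) into the 11-bit opcode field."""
    if first_byte >> 3 != 0b11011:
        raise ValueError("first byte is not an ESC opcode (D8H-DFH)")
    # The five shared upper bits are dropped; the second byte becomes the low byte.
    return ((first_byte & 0b111) << 8) | second_byte
```

For instance, FSQRT is encoded as D9 FA, so its stored opcode field is 1FAH.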




6.2.1.6. THE NUMERIC INSTRUCTION AND DATA POINTERS

The instruction and data pointers provide support for programmed exception-handlers.
Whenever the processor decodes an ESC instruction other than FINIT, FCLEX, FLDCW,
FSTCW, FSTSW, FSTSW AX, FSTENV, FLDENV, FSAVE, FRSTOR, and FWAIT, it saves
the instruction address, opcode, and the operand address (if present) in registers that can be
accessed by the user. Contents of these registers remain unchanged when any of the control
instructions listed above is executed. Contents of the operand address register are undefined if
the prior ESC instruction (which is not one of the above) did not have a memory operand.

These registers can be accessed by the ESC instructions FSTENV, FLDENV, FSAVE and 
FRSTOR. The FINIT and FSAVE instructions clear these registers after writing them to 
memory. 

When stored in memory, the instruction and data pointers appear in one of four formats, 
depending on the operating mode of the processor (protected mode or real-address mode) and 
depending on the operand-size attribute in effect (32-bit operand or 16-bit operand). In virtual-
8086 mode, the real-address mode formats are used. Figures 6-6 through 6-9 show
these pointers as they are stored following an FSTENV instruction. The FSTENV and FSAVE 
instructions store this data into memory, allowing exception handlers to determine the precise 
nature of any numeric exceptions that may be encountered. 

For all the Intel x86 FPU and NPX architectures, the instruction address saved points to any 
prefixes that preceded the instruction, except the 8087, for which the instruction address points 
only to the ESC instruction opcode. 



32-BIT PROTECTED MODE FORMAT

    Offset   Bits 31-16                 Bits 15-0
    ------   ------------------------   ------------------------
    0H       RESERVED                   CONTROL WORD
    4H       RESERVED                   STATUS WORD
    8H       RESERVED                   TAG WORD
    CH       IP OFFSET (all 32 bits)
    10H      OPCODE 10...00             CS SELECTOR
    14H      DATA OPERAND OFFSET (all 32 bits)
    18H      RESERVED                   OPERAND SELECTOR

Figure 6-6. Protected Mode Numeric Instruction and Data Pointer Image in Memory, 32-Bit Format




32-BIT REAL-ADDRESS MODE FORMAT

    Offset   Contents (high-order field first)
    ------   ---------------------------------------------------
    0H       RESERVED, CONTROL WORD
    4H       RESERVED, STATUS WORD
    8H       RESERVED, TAG WORD
    CH       RESERVED, INSTRUCTION POINTER 15...00
    10H      INSTRUCTION POINTER 31...16, OPCODE 10...00
    14H      RESERVED, OPERAND POINTER 15...00
    18H      OPERAND POINTER 31...16, 000000000000

Figure 6-7. Real Mode Numeric Instruction and Data Pointer Image in Memory, 32-Bit Format



16-BIT PROTECTED MODE FORMAT

    Offset   Bits 15-0
    ------   ------------------
    0H       CONTROL WORD
    2H       STATUS WORD
    4H       TAG WORD
    6H       IP OFFSET
    8H       CS SELECTOR
    AH       OPERAND OFFSET
    CH       OPERAND SELECTOR

Figure 6-8. Protected Mode Numeric Instruction and Data Pointer Image in Memory, 16-Bit Format




16-BIT REAL-ADDRESS MODE AND VIRTUAL 8086 MODE FORMAT

    Offset   Contents (high-order field first)
    ------   ---------------------------------------
    0H       CONTROL WORD
    2H       STATUS WORD
    4H       TAG WORD
    6H       INSTRUCTION POINTER 15...0
    8H       IP 19..16, 0, OPCODE 10...0
    AH       OPERAND POINTER 15...0
    CH       DP 19..16, 00000000000

Figure 6-9. Real Mode Numeric Instruction and Data Pointer Image in Memory, 16-Bit Format
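
An exception handler written in a high-level language might unpack the two 16-bit images as in the Python sketch below. This is an illustration under stated assumptions: the function and key names are hypothetical, and the exact bit positions within the words at offsets 8H and CH of the real-address image (pointer bits 19..16 in bits 15-12, opcode in bits 10-0) are taken from the standard layout rather than read from the figure.

```python
import struct

FIELDS_PROTECTED = ["control_word", "status_word", "tag_word", "ip_offset",
                    "cs_selector", "operand_offset", "operand_selector"]


def parse_env16_protected(image: bytes) -> dict:
    """Unpack the seven little-endian words of the 16-bit protected mode image."""
    return dict(zip(FIELDS_PROTECTED, struct.unpack("<7H", image[:14])))


def parse_env16_real(image: bytes) -> dict:
    """Unpack the 16-bit real-address (and virtual-8086) mode image."""
    cw, sw, tw, ip_low, word8, dp_low, word_c = struct.unpack("<7H", image[:14])
    return {
        "control_word": cw,
        "status_word": sw,
        "tag_word": tw,
        "instruction_pointer": ((word8 >> 12) << 16) | ip_low,   # 20-bit address
        "opcode": word8 & 0x7FF,                                  # 11-bit opcode field
        "operand_pointer": ((word_c >> 12) << 16) | dp_low,       # 20-bit address
    }
```

The 32-bit formats can be handled the same way with seven doublewords in place of seven words.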



6.2.2. Computation Fundamentals 

This section covers numeric programming concepts that are common to all applications. It 
describes the FPU's internal number system and the various types of numbers that can be 
employed in numeric programs. The most commonly used options for rounding and precision 
(selected by fields in the control word) are described, with exhaustive coverage of less 
frequently used facilities deferred to later sections. Exception conditions that may arise during 
execution of floating-point instructions are also described along with the options that are 
available for responding to these exceptions. 

6.2.2.1. NUMBER SYSTEM

The system of real numbers that people use for pencil and paper calculations is conceptually 
infinite and continuous. There is no upper or lower limit to the magnitude of the numbers one 
can employ in a calculation, or to the precision (number of significant digits) that may be 
required to represent them. For any given real number, there are always arbitrarily many 
numbers both larger and smaller. There are also arbitrarily many numbers between any two 
real numbers. For example, between 2.5 and 2.6 are 2.51, 2.5897, 2.500001, etc. 

While ideally it would be desirable for a computer to be able to operate on the entire real 
number system, in practice this is not possible. Computers, no matter how large, ultimately 
have fixed-size registers and memories that limit the system of numbers that can be 
accommodated. These limitations determine both the range and the precision of numbers. The 
result is a set of numbers that is finite and discrete, rather than infinite and continuous. This 
sequence is a subset of the real numbers that is designed to form a useful approximation of the 
real number system. 

Figure 6-10 superimposes the basic floating-point number system on a real number line
(decimal numbers are shown for clarity, although the processor actually represents numbers in
binary). The dots indicate the subset of real numbers the processor can represent as data and
final results of calculations. The range of double-precision, normalized numbers is
approximately ±2.23 x 10^-308 to ±1.79 x 10^308. Applications that are required to deal with data
and final results outside this range are rare.



[Figure: real number line with dots marking the representable numbers in the NEGATIVE RANGE
and the POSITIVE RANGE]



Figure 6-10. Double-Precision Number System 

The finite spacing in Figure 6-10 illustrates that the FPU can represent a great many, but not 
all, of the real numbers in its range. There is always a gap between two adjacent floating-point 
numbers, and it is possible for the result of a calculation to fall in this space. When this occurs, 
the FPU rounds the true result to a number that it can represent. Thus, a real number that 
requires more digits than the FPU can accommodate (e.g., a 20-digit number) is represented 
with some loss of accuracy. Notice also that the representable numbers are not distributed 
evenly along the real number line. In fact, the same number of representable numbers exists 
between any two successive powers of 2 (i.e., as many representable numbers exist between 2 
and 4 as between 65,536 and 131,072). Therefore, the gaps between representable numbers are 
larger as the numbers increase in magnitude. All integers in the range ±2^64 (approximately
±10^19), however, are exactly representable.
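
The uneven spacing is easy to observe from a high-level language that uses the double format. The Python sketch below (function names are illustrative only) counts the representable doubles between two positive values by subtracting their bit patterns, and measures the gap to the next larger representable number.

```python
import struct


def bits(x: float) -> int:
    """Bit pattern of a double, read as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def count_between(lo: float, hi: float) -> int:
    """Number of representable doubles in [lo, hi), for 0 < lo < hi."""
    return bits(hi) - bits(lo)


def gap_above(x: float) -> float:
    """Distance from a positive double x to the next larger double."""
    next_up = struct.unpack("<d", struct.pack("<Q", bits(x) + 1))[0]
    return next_up - x
```

With these helpers, count_between(2.0, 4.0) and count_between(65536.0, 131072.0) both equal 2**52, while gap_above(65536.0) is 2**15 times larger than gap_above(2.0).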

In its internal operations, the FPU actually employs a number system that is a substantial 
superset of that shown in Figure 6-10. The internal format (called extended real) extends the 
representable (normalized) range to about ±3.37 x 10^-4932 to ±1.18 x 10^4932, and its precision to
about 19 (equivalent decimal) digits. This format is designed to provide extra range and 
precision for constants and intermediate results, and is not normally intended for data or final 
results. 




From a practical standpoint, the processor's set of real numbers is sufficiently large and dense 
so as not to limit the vast majority of applications. Compared to most computers, including 
mainframes, the processor provides a very good approximation of the real number system. It is 
important to remember, however, that it is not an exact representation, and that computer 
arithmetic on real numbers is inherently approximate. 

6.2.2.2. DATA TYPES AND FORMATS 

The processor recognizes seven numeric data types for memory-based values, divided into 
three classes: binary integers, packed decimal integers, and binary reals. How these formats are
stored in memory is discussed later in this section (the sign is always located in the highest-
addressed byte).

Figure 6-11 summarizes the format of each data type. In the figure, the most significant digits 
of all numbers (and fields within numbers) are the leftmost digits. 

6.2.2.2.1. Binary Integers

The three binary integer formats are identical except for length, which governs the range that
can be accommodated in each format. The leftmost bit is interpreted as the number's sign:
0=positive and 1=negative. Negative numbers are represented in standard two's complement
notation (the binary integers are the only format to use two's complement). The quantity zero is
represented with a positive sign (all bits are 0). The word integer format is identical to the 16-bit
signed integer data type; the short integer format is identical to the 32-bit signed integer
data type.

The binary integer formats exist in memory only. When used by the FPU, they are 
automatically converted to the 80-bit extended real format. All binary integers are exactly 
representable in the extended real format. 
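
A short Python sketch makes both points concrete: the memory image is an ordinary little-endian two's complement integer, and every such value fits in the 64-bit significand of the extended real format. The function names are assumptions for this example.

```python
def load_integer(raw: bytes) -> int:
    """Interpret a word (2-byte), short (4-byte), or long (8-byte) integer image."""
    if len(raw) not in (2, 4, 8):
        raise ValueError("expected a 2-, 4-, or 8-byte integer")
    # The sign bit is in the highest-addressed byte, so the image is little-endian.
    return int.from_bytes(raw, "little", signed=True)


def fits_extended_significand(value: int) -> bool:
    """True when the magnitude needs no more than 64 significand bits."""
    return abs(value).bit_length() <= 64
```

Even the most negative long integer, -2**63, has a 64-bit magnitude, which is why the conversion performed on loading never has to round.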

6.2.2.2.2. Decimal Integers 

Decimal integers are stored in packed decimal notation, with two decimal digits "packed" into
each byte, except the leftmost byte, which carries the sign bit (0=positive, 1=negative).
Negative numbers are not stored in two's complement form and are distinguished from positive 
numbers only by the sign bit. The most significant digit of the number is the leftmost digit. All 
digits must be in the range 0-9. 




[Figure: numerical data formats. The most significant byte is the highest addressed byte.]

    Data Format          Range      Precision   Layout (most significant bit first)
    ------------------   --------   ---------   ----------------------------------------------
    Word Integer         10^4       16 bits     two's complement, bits 15-0
    Short Integer        10^9       32 bits     two's complement, bits 31-0
    Long Integer         10^18      64 bits     two's complement, bits 63-0
    Packed BCD           10^18      18 digits   S (bit 79), X (bits 78-72), d17...d0 (bits 71-0)
    Single Precision     10^±38     24 bits     S (bit 31), biased exponent (bits 30-23),
                                                significand (bits 22-0)
    Double Precision     10^±308    53 bits     S (bit 63), biased exponent (bits 62-52),
                                                significand (bits 51-0)
    Extended Precision   10^±4932   64 bits     S (bit 79), biased exponent (bits 78-64),
                                                I (bit 63), significand (bits 62-0)

NOTES:

(1) S = SIGN BIT (0 = positive, 1 = negative)

(2) dn = DECIMAL DIGIT (TWO PER BYTE)

(3) X = BITS HAVE NO SIGNIFICANCE; IGNORED WHEN LOADING, ZEROS WHEN STORING

(4) THE IMPLICIT BINARY POINT LIES IMMEDIATELY TO THE RIGHT OF THE INTEGER BIT

(5) I = INTEGER BIT OF SIGNIFICAND; STORED IN TEMPORARY REAL, IMPLICIT IN
    SINGLE AND DOUBLE PRECISION

(6) EXPONENT BIAS (NORMALIZED VALUES):
    SINGLE: 127 (7FH)
    DOUBLE: 1023 (3FFH)
    EXTENDED REAL: 16383 (3FFFH)

(7) PACKED BCD: (-1)^S (D17...D0)

(8) REAL: (-1)^S (2^(E-BIAS)) (F0.F1...)



Figure 6-11. Numerical Data Formats 



The decimal integer format exists in memory only. When used by the FPU, it is automatically 
converted to the 80-bit extended real format. All decimal integers are exactly representable in 
the extended real format. 
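
The packed decimal layout can be illustrated with a pair of conversion routines. The Python sketch below is a minimal model with hypothetical function names; it assumes the 10-byte image described above, with digit pairs in bytes 0-8 (least significant pair at the lowest address) and the sign in bit 7 of byte 9.

```python
def decode_packed_bcd(raw: bytes) -> int:
    """Convert a 10-byte packed decimal image to an integer."""
    if len(raw) != 10:
        raise ValueError("packed decimal operands are 10 bytes long")
    value = 0
    for byte in reversed(raw[:9]):           # most significant digit pair first
        high, low = byte >> 4, byte & 0x0F
        if high > 9 or low > 9:
            raise ValueError("digit outside the range 0-9")
        value = value * 100 + high * 10 + low
    return -value if raw[9] & 0x80 else value


def encode_packed_bcd(value: int) -> bytes:
    """Convert an integer of at most 18 digits to a 10-byte packed decimal image."""
    if abs(value) >= 10**18:
        raise ValueError("more than 18 decimal digits")
    digits = abs(value)
    out = bytearray(10)
    for i in range(9):
        digits, pair = divmod(digits, 100)
        out[i] = ((pair // 10) << 4) | (pair % 10)
    if value < 0:
        out[9] = 0x80                        # sign bit; remaining bits stored as zeros
    return bytes(out)
```

Note that negating a value changes only the sign byte, in contrast to the two's complement binary integers.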



6.2.2.2.3. Real Numbers 

The processor represents real numbers of the form:

    (-1)^s x 2^E x (b_0 . b_1 b_2 b_3 ... b_{p-1})

where:
    s = 0 or 1
    E = any integer between Emin and Emax, inclusive
    b_i = 0 or 1
    p = number of bits of precision

Table 6-6 summarizes the parameters for each of the three real-number formats. 

The Pentium processor stores real numbers in a three-field binary format that resembles 
scientific, or exponential, notation. The format consists of the following fields: 

• The significand field, b_0 . b_1 b_2 b_3 ... b_{p-1}, is the number's significant digits. (The term
"significand" is analogous to the term "mantissa" used to describe floating-point numbers
on some computers.)

• The exponent field, e = E+bias, locates the binary point within the significant digits (and 
therefore determines the number's magnitude). (The term "exponent" is analogous to the 
term "characteristic" used to describe floating-point numbers on some computers.) 

• The 1-bit sign field, which indicates whether the number is positive or negative. Negative 
numbers differ from positive numbers only in the sign bits of their significands. 



Table 6-6. Summary of Format Parameters

                                      Format
    Parameter                 Single    Double    Extended
    ----------------------    ------    ------    --------
    Format width in bits      32        64        80
    p (bits of precision)     24        53        64
    Exponent width in bits    8         11        15
    Emax                      +127      +1023     +16383
    Emin                      -126      -1022     -16382
    Exponent bias             +127      +1023     +16383



Table 6-7 shows how the real number 178.125 (decimal) is stored in the single real format. 
The table lists a progression of equivalent notations that express the same value to show how a 
number can be converted from one form to another. (The ASM386/486 and PL/M386/486 
language translators perform a similar process when they encounter programmer-defined real 
number constants.) Note that not every decimal fraction has an exact binary equivalent. The 
decimal number 1/10, for example, cannot be expressed exactly in binary (just as the number 
1/3 cannot be expressed exactly in decimal). When a translator encounters such a value, it 
produces a rounded binary approximation of the decimal value. 




Table 6-7. Real Number Notation

    Notation                               Value
    -----------------------------------    ---------------------------------------
    Ordinary Decimal                       178.125
    Scientific Decimal                     1.78125E2
    Scientific Binary                      1.0110010001E111
    Scientific Binary (Biased Exponent)    1.0110010001E10000110
    Single Format (Normalized)             Sign:            0
                                           Biased Exponent: 10000110
                                           Significand:     01100100010000000000000
                                                            (integer bit 1 is implicit)
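
The last row of Table 6-7 can be checked mechanically. The following Python sketch (the function name is an assumption) packs a value into the single format and splits the resulting 32-bit word into its three fields.

```python
import struct


def single_fields(x: float):
    """Return (sign, biased exponent, 23-bit fraction) of x in single format."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31
    biased_exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF        # stored significand bits; integer bit is implicit
    return sign, biased_exponent, fraction


sign, biased, fraction = single_fields(178.125)
true_exponent = biased - 127          # subtract the single-format bias
value = (-1) ** sign * (1 + fraction / 2**23) * 2**true_exponent
```

For 178.125 the fields are 0, 10000110 and 01100100010000000000000 in binary, the true exponent is 7, and reassembling the fields gives back 178.125 exactly.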



The FPU usually carries the digits of the significand in normalized form. This means that, 
except for the value zero, the significand contains an integer bit and fraction bits as follows: 

    1.fff...ff

where the period indicates an assumed binary point. The number of fraction bits varies according to the
real format: 23 for single, 52 for double, and 63 for extended real. By normalizing real
numbers so that their integer bit is always a 1, the processor eliminates leading zeros in small
values (|x| < 1). This technique maximizes the number of significant digits that can be
accommodated in a significand of a given width. Note that, in the single and double formats, 
the integer bit is implicit and is not actually stored; the integer bit is physically present in the 
extended format only. 

If one were to examine only the significand with its assumed binary point, all normalized real 
numbers would have values greater than or equal to one and less than two. The exponent field 
locates the actual binary point in the significant digits. Just as in decimal scientific notation, a 
positive exponent has the effect of moving the binary point to the right, and a negative 
exponent effectively moves the binary point to the left, inserting leading zeros as necessary. 
An unbiased exponent of zero indicates that the position of the assumed binary point is also the 
position of the actual binary point. The exponent field, then, determines a real number's 
magnitude. 

In order to simplify comparing real numbers (e.g., for sorting), the processor stores exponents 
in a biased form. This means that a constant, called a bias, is added to the true exponent 
described above. As Table 6-6 shows, the value of this bias is different for each real format. It 
has been chosen so as to force the biased exponent to be a positive value. This allows two real 
numbers (of the same format and sign) to be compared as if they are unsigned binary integers. 
That is, when comparing them bitwise from left to right (beginning with the leftmost exponent 
bit), the first bit position that differs orders the numbers; there is no need to proceed further 
with the comparison. A number's true exponent can be determined simply by subtracting the 
bias value of its format. 
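
The benefit of the biased exponent can be demonstrated with the single format. In the Python sketch below (helper names are illustrative), two nonnegative reals are ordered by comparing their bit patterns as unsigned integers, without decoding the exponent or significand.

```python
import struct


def single_bits(x: float) -> int:
    """Bit pattern of x in single format, read as an unsigned 32-bit integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def less_than_nonnegative(x: float, y: float) -> bool:
    """Order two nonnegative reals by an unsigned integer comparison."""
    return single_bits(x) < single_bits(y)
```

Because the exponent occupies the high-order bits and is stored as a positive value, a larger exponent always yields a larger integer. For two negative numbers the same comparison orders them by magnitude, so the sense of the result is reversed.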

The single and double real formats exist in memory only. If a number in one of these formats 
is loaded into an FPU register, it is automatically converted to extended format, the format 
used for all internal operations. Likewise, data in registers can be converted to single or double 
real for storage in memory. The extended real format may be used in memory also, typically to
store intermediate results that cannot be held in registers.

Most applications should use the double format to store real-number data and results; it 
provides sufficient range and precision to return correct results with a minimum of 
programmer attention. The single real format is appropriate for applications that are 
constrained by memory, but it should be recognized that this format provides a smaller margin 
of safety. It is also useful for the debugging of algorithms, because roundoff problems will 
manifest themselves more quickly in this format. The extended real format should normally be 
reserved for holding intermediate results, loop accumulations, and constants. Its extra length is 
designed to shield final results from the effects of rounding and overflow/underflow in 
intermediate calculations. However, the range and precision of the double format are adequate 
for most microcomputer applications. 

6.2.2.3. ROUNDING CONTROL 

Internally, the FPU employs three extra bits (guard, round, and sticky bits) that enable it to 
round numbers in accord with the infinitely precise true result of a computation; these bits are 
not accessible to programmers. Whenever the destination can represent the infinitely precise 
true result, the FPU delivers it. Rounding occurs in arithmetic and store operations when the 
format of the destination cannot exactly represent the infinitely precise true result. For 
example, a real number may be rounded if it is stored in a shorter real format, or in an integer 
format. Or, the infinitely precise true result may be rounded when it is returned to a register. 

The FPU has four rounding modes, selectable by the RC field in the control word (see 
Figure 6-3). Given a true result b that cannot be represented by the target data type, the FPU 
determines the two representable numbers a and c that most closely bracket b in value
(a < b < c). The processor then rounds (changes) b to a or to c according to the mode selected by the
RC field as shown in Table 6-8. Rounding introduces an error in a result that is less than one 
unit in the last place to which the result is rounded. 

• "Round to nearest" is the default mode and is suitable for most applications; it provides 
the most accurate and statistically unbiased estimate of the true result. 

• The "chop" or "round toward zero" mode is provided for integer arithmetic applications. 

• "Round up" and "round down" are termed directed rounding and can be used to 
implement interval arithmetic. Interval arithmetic is used to determine upper and lower 
bounds for the true result of a multistep computation, when the intermediate results of the 
computation are subject to rounding. 

Rounding control affects only the arithmetic instructions (refer to Section 6.3. in this chapter 
for lists of arithmetic and nonarithmetic instructions). 




Table 6-8. Rounding Modes

    RC Field   Rounding Mode             Rounding Action
    --------   -----------------------   ----------------------------------------------
    00         Round to Nearest          Closer to b of a or c; if equally close, select
                                         even number (the one whose least
                                         significant bit is zero).
    01         Round Down (toward -∞)    a
    10         Round Up (toward +∞)      c
    11         Chop (toward 0)           Smaller in magnitude of a or c.

NOTE: a < b < c; a and c are successive representable numbers; b is not representable
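
The four rounding actions can be sketched for the simplest destination, an integer, where the bracketing values a and c are the floor and ceiling of the true result. This Python model is an illustration only; the function name is hypothetical and exact rational arithmetic stands in for the infinitely precise true result.

```python
import math
from fractions import Fraction


def round_to_integer(b: Fraction, rc: int) -> int:
    """Round the exact value b to an integer according to an RC field value."""
    a, c = math.floor(b), math.ceil(b)
    if a == c:
        return a                          # exactly representable: no rounding
    if rc == 0b00:                        # round to nearest, ties to even
        if b - a != c - b:
            return a if b - a < c - b else c
        return a if a % 2 == 0 else c
    if rc == 0b01:                        # round down (toward minus infinity)
        return a
    if rc == 0b10:                        # round up (toward plus infinity)
        return c
    if rc == 0b11:                        # chop (toward zero)
        return a if abs(a) < abs(c) else c
    raise ValueError("RC is a two-bit field")
```

Running the two directed modes on every step of a computation gives a lower and an upper bound for the true result, which is the basis of the interval arithmetic mentioned above.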



6.2.2.4. PRECISION CONTROL 

The FPU allows results to be calculated with either 64, 53, or 24 bits of precision in the 
significand as selected by the precision control (PC) field of the control word. The default 
setting (following FINIT), and the one that is best suited for most applications, is the full 64 
bits of significance provided by the extended real format. The other settings are required by the 
IEEE standard and are provided to obtain compatibility with the specifications of certain 
existing programming languages. Specifying less precision nullifies the advantages of the 
extended format's extended fraction length. When reduced precision is specified, the rounding 
of the fractional value clears the unused bits on the right to zeros. Precision Control affects 
only the instructions FADD, FSUB, FMUL, FDIV, and FSQRT. 
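
The effect of the PC field can be modeled by rounding an exact value to p significant bits. The Python sketch below assumes round-to-nearest mode and ignores exponent range limits; the function name is an assumption, and exact rational arithmetic stands in for the infinitely precise true result.

```python
from fractions import Fraction


def round_to_precision(x: Fraction, p: int) -> Fraction:
    """Round x to p significant bits (round to nearest, ties to even)."""
    if x == 0:
        return x
    sign = -1 if x < 0 else 1
    x = abs(x)
    # Find e such that 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)      # weight of the last retained bit
    return sign * round(x / ulp) * ulp    # round() on a Fraction rounds half to even


third = Fraction(1, 3)
errors = {p: abs(round_to_precision(third, p) - third) for p in (24, 53, 64)}
```

For the value 1/3 the rounding error shrinks with each step from 24 to 53 to 64 bits, which illustrates what is given up when less than the default precision is selected.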

6.3. FLOATING-POINT INSTRUCTION SET

The floating-point instructions available on the Pentium processor can be grouped into six 
functional classes: 

• Data Transfer Instructions 

• Nontranscendental Instructions 

• Comparison Instructions 

• Transcendental Instructions 

• Constant Instructions 

• Control Instructions 

In this chapter, the instruction classes are described as a collection of resources available to 
programmers. For details of format, encoding, and execution times, see the instruction 
reference pages in Chapter 25. 

The Intel387 math coprocessors and the Intel486 and Pentium FPU's have more instructions 
than the 8087/Intel287 math coprocessors. Some Intel386 DX microprocessor systems use an 
Intel287 math coprocessor. See Chapter 5 for examples of how to identify the processor type 
and determine what instructions are available. 




6.3.1. Source and Destination Operands

The typical floating-point instruction takes one or two operands, which can come from the 
FPU register stack or from memory. Many instructions, such as FSIN, automatically operate 
on the top FPU stack element. Others allow, or require, the programmer to code the operand(s) 
explicitly along with the instruction mnemonic. Still others accept one explicit operand and 
one implicit operand (usually the top FPU stack element). 

Whether specified by the programmer or supplied by default, floating-point operands are of 
two basic types, sources and destinations. A source operand provides an input to an 
instruction, but is not altered by its execution. Even when an instruction converts the source 
operand from one format to another (e.g., real to integer), the conversion is performed in an 
internal work area to avoid altering the source operand. A destination operand may also 
provide an input to an instruction; on execution, however, the instruction returns a result to the 
destination, overwriting its previous contents. 

Many instructions allow their operands to be coded in more than one way. For example, 
FADD (add real) may be written without operands, with only a source, or with a destination 
and a source. When both destination and source operands are specified, the destination must 
precede the source on the command line, and both must come from the FPU stack. 

Memory operands can be coded with any of the memory-addressing methods provided by the 
ModR/M byte. To review these methods (BASE + (INDEX x SCALE) + DISPLACEMENT),
refer to Chapter 3. Floating-point instructions with memory operands either read from memory 
or write to it; no floating-point instruction does both. For a detailed description of each 
instruction, including its range of possible encodings, see the reference pages in Chapter 25. 



6.3.2. Data Transfer Instructions 

These instructions (summarized in Table 6-9) move operands among elements of the register 
stack, and between the stack top and memory. Any of the seven data types can be converted to 
extended-real and loaded (pushed) onto the stack in a single operation; they can be stored to 
memory in the same manner. The data transfer instructions automatically update the FPU tag 
word to reflect whether the register is empty or full following the instruction. 



Table 6-9. Data Transfer Instructions 

Real                               Integer                         Packed Decimal 
FLD    Load Real                   FILD   Load Integer             FBLD   Load Packed Decimal 
FST    Store Real                  FIST   Store Integer 
FSTP   Store Real and Pop          FISTP  Store Integer and Pop    FBSTP  Store Packed Decimal and Pop 
FXCH   Exchange Register Contents 






6.3.3. Nontranscendental Instructions 

The nontranscendental instruction set provides a wealth of variations on the basic add, 
subtract, multiply, and divide operations, and a number of other useful functions. These range 
from a simple absolute value instruction to instructions which perform exact modulo division, 
round real numbers to integers, and scale values by powers of two. Table 6-10 shows the 
nontranscendental operations provided, apart from basic arithmetic. 



Table 6-10. Nontranscendental Instructions (Besides Arithmetic) 

Mnemonic     Operation 
FSQRT        Square Root 
FSCALE       Scale 
FXTRACT      Extract Exponent and Significand 
FPREM        Partial Remainder 
FPREM1*      IEEE Standard Partial Remainder 
FRNDINT      Round to Integer 
FABS         Absolute Value 
FCHS         Change Sign 

* Not available on 8087 or Intel287™ math coprocessor. 



The basic arithmetic instructions (addition, subtraction, multiplication and division) are 
designed to encourage the development of very efficient algorithms. In particular, they allow 
the programmer to reference memory as easily as the FPU register stack. Table 6-11 
summarizes the available operation/operand forms that are provided for basic arithmetic. In 
addition to the four normal operations, there are "reversed" subtraction and division 
instructions which eliminate the need for many exchanges between ST(0) and ST(1). The 
variety of instruction and operand forms gives the programmer unusual flexibility: 

• Operands can be located in registers or memory. 

• Results can be deposited in a choice of registers. 

• Operands can be a variety of numerical data types: extended real, double real, single real, 
short integer or word integer, with automatic conversion to extended real performed by the 
FPU. 






Table 6-11. Basic Arithmetic Instructions and Operands 

Instruction Form              Mnemonic Form    Operand Forms: Destination, Source 
Classical Stack               Fop              {ST(1), ST} 
Classical Stack, extra pop    FopP             {ST(1), ST} 
Register                      Fop              ST(i), ST or ST, ST(i) 
Register, pop                 FopP             ST(i), ST 
Real Memory                   Fop              {ST} single-real/double-real 
Integer Memory                FIop             {ST} word-integer/short-integer 

NOTES: 

Braces ({ }) surround implicit operands; these are not coded, but are supplied by the assembler. 

op = ADD     DEST <- DEST + SRC 
     SUB     DEST <- ST - Other Operand 
     SUBR    DEST <- Other Operand - ST 
     MUL     DEST <- DEST × SRC 
     DIV     DEST <- DEST ÷ SRC 
     DIVR    DEST <- SRC ÷ DEST 



Five basic instruction forms can be used across all six operations, as shown in Table 6-11. The 
classical stack form can be used to make the FPU operate like a classical stack machine. No 
operands are coded in this form, only the instruction mnemonic. The FPU picks the source 
operand from the stack top (ST) and the destination from the next stack element (ST(1)). After 
performing its calculation, it returns the result to ST(1) and then pops ST, effectively replacing 
the operands by the result. 

The register form is a generalization of the classical stack form; the programmer specifies the 
stack top as one operand and any register on the stack as the other operand. Coding the stack 
top as the destination provides a convenient way to access a constant, held elsewhere in the 
stack, from the stack top. The destination need not always be ST, however. The basic 
two-operand instructions allow the use of another register as the destination. Using ST as the 
source allows, for example, adding the stack top into a register used as an accumulator. 

Often the operand in the stack top is needed for one operation but then is of no further use in 
the computation. The register pop form can be used to pick up the stack top as the source 
operand, and then discard it by popping the stack. Coding operands of ST(1), ST with a 
register pop mnemonic is equivalent to a classical stack operation: the top is popped and the 
result is left at the new top. 

The two memory forms increase the flexibility of the nontranscendental instructions. They 
permit a real number or a binary integer in memory to be used directly as a source operand. 
This is useful in situations where operands are not used frequently enough to justify holding 
them in registers. Note that any memory-addressing method can be used to define these 
operands, so they can be elements in arrays, structures, or other data organizations, as well as 
simple scalars. 
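The forms described above can be pictured with a small software model. The C sketch below is only a toy illustration for the ADD operation: the type `fpu_model` and all function names are hypothetical, the tag word and exceptions are not modeled, and `double` stands in for the extended-real registers.

```c
/* Toy model of the FPU register stack; top is the physical index of ST(0). */
typedef struct {
    double   reg[8];
    unsigned top;                 /* 0..7; a push decrements it modulo 8 */
} fpu_model;

/* Address of ST(i) relative to the current stack top. */
double *st_reg(fpu_model *f, unsigned i)
{
    return &f->reg[(f->top + i) & 7u];
}

void model_init(fpu_model *f)
{
    f->top = 0;
    for (unsigned i = 0; i < 8; i++)
        f->reg[i] = 0.0;
}

void model_push(fpu_model *f, double v)      /* like FLD */
{
    f->top = (f->top - 1u) & 7u;
    *st_reg(f, 0) = v;
}

void model_pop(fpu_model *f)
{
    f->top = (f->top + 1u) & 7u;
}

/* Register form, FADD ST(i), ST: destination ST(i), source ST. */
void fadd_sti_st(fpu_model *f, unsigned i) { *st_reg(f, i) += *st_reg(f, 0); }

/* Register form, FADD ST, ST(i): destination ST, source ST(i). */
void fadd_st_sti(fpu_model *f, unsigned i) { *st_reg(f, 0) += *st_reg(f, i); }

/* Register pop form, FADDP ST(i), ST: add, then discard the stack top. */
void faddp_sti_st(fpu_model *f, unsigned i)
{
    fadd_sti_st(f, i);
    model_pop(f);
}

/* Classical stack form, FADD with no operands: same as FADDP ST(1), ST. */
void fadd_classical(fpu_model *f) { faddp_sti_st(f, 1); }

/* Memory form, FADD mem: destination is the implicit ST. */
void fadd_mem(fpu_model *f, double mem_operand) { *st_reg(f, 0) += mem_operand; }
```

In this model, pushing 1, 2, and 3 and then applying the classical stack form leaves 5 at the stack top and 1 below it, which matches the description of the two operands being replaced by the result.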






6.3.4. Comparison Instructions 

The instructions of this class allow numbers of all supported real and integer data types to be 
compared. Each of these instructions (Table 6-12) analyzes the top stack element, often in 
relationship to another operand, and reports the result as a condition code (flags C0, C2, and 
C3) in the status word. 



Table 6-12. Comparison Instructions 

Mnemonic     Operation 
FCOM         Compare Real 
FCOMP        Compare Real and Pop 
FCOMPP       Compare Real and Pop Twice 
FICOM        Compare Integer 
FICOMP       Compare Integer and Pop 
FTST         Test 
FUCOM*       Unordered Compare Real 
FUCOMP*      Unordered Compare Real and Pop 
FUCOMPP*     Unordered Compare Real and Pop Twice 
FXAM         Examine 



The basic operations are compare, test (compare with zero), and examine (report type, sign, 
and normalization). Special forms of the compare operation are provided to optimize 
algorithms by allowing direct comparisons with binary integers and real numbers in memory, 
as well as popping the stack after a comparison. 

The FSTSW AX (store status word) instruction can be used after a comparison to transfer the 
condition code to the AX register for inspection. The TEST instruction is recommended for 
using the FPU flags (once they are in the AX register) to control conditional branching. First 
check to see if the comparison resulted in unordered. This can happen, for instance, if one of 
the operands is a NaN. TEST the contents of the AX register against the constant 0400H; this 
will clear ZF (the Zero Flag of the EFLAGS register) if the original comparison was 
unordered, and set ZF otherwise. The JNZ instruction can then be used to transfer control (if 
necessary) to code that handles the case of unordered operands. With the unordered case now 
filtered out, TEST the contents of the AX register against the appropriate constant from 
Table 6-13, and then use the corresponding conditional branch. 






Table 6-13. TEST Constants for Conditional Branching 

Order            Constant    Branch 
ST > Operand     4500H       JZ 
ST < Operand     0100H       JNZ 
ST = Operand     4000H       JNZ 
Unordered        0400H       JNZ 



It is not always necessary to filter out the unordered case when using this algorithm for 
conditional jumps. If the software has been thoroughly tested, and incorporates periodic checks 
for QNaN results (as recommended previously), then it is not necessary to check for unordered 
every time a comparison is made. 
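The branching procedure built on Table 6-13 can be sketched in C as follows. This is only an illustration of the bit tests, assuming the 16-bit status word has already been copied out of the FPU (as FSTSW AX would do); the enumeration and function names are hypothetical.

```c
#include <stdint.h>

typedef enum { CMP_GREATER, CMP_LESS, CMP_EQUAL, CMP_UNORDERED } fpu_order;

/* Classify a comparison result from the status word, using the
   TEST constants of Table 6-13. */
fpu_order classify_compare(uint16_t status_word)
{
    if (status_word & 0x0400u)            /* C2 set: unordered, filtered first */
        return CMP_UNORDERED;
    if ((status_word & 0x4500u) == 0)     /* C3, C2, C0 all clear: ST > operand */
        return CMP_GREATER;
    if (status_word & 0x0100u)            /* C0 set: ST < operand */
        return CMP_LESS;
    return CMP_EQUAL;                     /* only C3 (4000H) remains: ST = operand */
}
```

Bits outside C3, C2, and C0 (the stack-top field, for example) are masked out by the constants and do not affect the classification.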

Instructions other than those in the comparison group can update the condition code. To ensure 
that the status word is not altered inadvertently, store it immediately following a comparison 
operation. 



6.3.5. Transcendental Instructions 

The instructions in this group (Table 6-14) perform the time-consuming core calculations for 
all common trigonometric, inverse trigonometric, hyperbolic, inverse hyperbolic, logarithmic, 
and exponential functions. The transcendentals operate on the top one or two stack elements, 
and they return their results to the stack. The trigonometric operations assume their arguments 
are expressed in radians. The logarithmic and exponential operations work in base 2. 



Table 6-14. Transcendental Instructions 

Mnemonic     Operation 
FSIN*        Sine 
FCOS*        Cosine 
FSINCOS*     Sine and Cosine 
FPTAN**      Tangent 
FPATAN       Arctangent of ST(1) ÷ ST 
F2XM1**      2^X - 1; X is in ST 
FYL2X        Y × log2(X); Y is in ST(1), X is in ST 
FYL2XP1      Y × log2(X + 1); Y is in ST(1), X is in ST 

NOTES: 

*Not available on 8087/Intel287™ math coprocessor. 
**Operand range extended over 8087/Intel287 math coprocessor. 



The Pentium processor uses new algorithms for transcendental instructions, achieving a higher 
level of accuracy for the same instructions than the Intel486 processor. Accuracy is measured 
in terms of units in the last place (ulp). For a given argument x, let f(x) and F(x) be the correct 
and computed (approximate) function values respectively. The error in ulps is defined to be 

    | f(x) - F(x) | / 2^(k-63) 

where k is an integer such that 1 ≤ 2^(-k) f(x) < 2. 
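The definition can be turned into a short calculation. The following C sketch is an illustration only: the function name is hypothetical, the magnitude of the correct value is used when choosing k (an assumption made so that negative values can be handled), and the correct value is assumed to be finite and nonzero.

```c
#include <math.h>

/* Error of 'computed' relative to 'exact', in ulps of a 64-bit significand:
   |exact - computed| / 2^(k-63), where 1 <= 2^(-k) * |exact| < 2. */
long double ulp_error(long double exact, long double computed)
{
    int e;
    (void)frexpl(fabsl(exact), &e);   /* |exact| = m * 2^e with 0.5 <= m < 1 */
    int k = e - 1;                    /* hence 1 <= 2^(-k) * |exact| < 2 */
    return ldexpl(fabsl(exact - computed), 63 - k);
}
```

For a correct value of 1.0, k is 0, so a computed value that is off by 2^(-52) has an error of 2^11 = 2048 ulps on this scale.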

On the Pentium processor, the worst case error on functions is less than 1 ulp when rounding to 
the nearest-even and less than 1.5 ulps when rounding in other modes. The functions are 
guaranteed to be monotonic, with respect to the input operands, throughout the domain 
supported by the instruction. See Appendix G for detailed information on transcendental 
accuracy. 

The trigonometric functions accept a practically unrestricted range of operands, whereas the 
other transcendental instructions require that arguments be more restricted in range. FPREM or 
FPREM1 can be used to bring the otherwise valid operand of a periodic function into range. 
Prologue and epilogue software can be used to reduce arguments for other instructions to the 
expected range and to adjust the result to correspond to the original arguments if necessary. 
The instruction descriptions in the reference pages of Chapter 25 document the allowed 
operand range for each instruction. 

When the argument of a trigonometric function is in range, it is automatically reduced by the 
appropriate multiple of 2π (in 66-bit precision), by means of the same mechanism used in the 
FPREM and FPREM1 instructions. The value of π used in the automatic reduction has been 
chosen so as to guarantee no loss of significance in the operand, provided it is within the 
specified range. The internal value of π in hexadecimal is: 

4 * 0.C90FDAA22168C234C 

A program may use an explicit value for π in computations whose results later appear as 
arguments to trigonometric functions. In such a case (in explicit reduction of a trigonometric 
operand outside the specified range, for example), the value used for π should be the same as 
the full 66-bit internal π. This will ensure that the results are consistent with the automatic 
argument reduction performed by the trigonometric functions. The 66-bit π cannot be 
represented as an extended-real value, so it must be encoded as two or more numbers. A 
common solution is to represent π as the sum of a highπ, which contains the 33 most-significant 
bits, and a lowπ, which contains the 33 least-significant bits. When using this two-part π, 
all computations should be performed separately on each part, with the results added 
only at the end. 
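A minimal C sketch of the two-part representation follows. It assumes the 66-bit value 4 * 0.C90FDAA22168C234C, which is 3.243F6A8885A308D3 in hexadecimal, and splits it into two constants that are each exactly representable as a double. The names `PI_HIGH`, `PI_LOW`, and `reduce_by_two_pi` are hypothetical, and the subtraction order shown (large part first, small part last) is one common way to keep the parts separate, not necessarily the only one.

```c
/* 66-bit pi = 3.243F6A8885A308D3 (hex), split into two exact doubles. */
static const double PI_HIGH = 0x3.243F6A88p0;     /* 33 most-significant bits  */
static const double PI_LOW  = 0x85A308D3p-64;     /* 33 least-significant bits */

/* Subtract n * 2*pi from x, applying each part of pi separately. */
double reduce_by_two_pi(double x, int n)
{
    double r = x - (2.0 * n) * PI_HIGH;   /* large part first  */
    return r - (2.0 * n) * PI_LOW;        /* small correction last */
}
```

Because `PI_HIGH` has only 33 significant bits, its product with a small integer is exact, so the rounding error of the reduction is confined to the final, small correction.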

The complications of maintaining a consistent value of π for argument reduction can be 
avoided, either by applying the trigonometric functions only to arguments within the range of 
the automatic reduction mechanism, or by performing all argument reductions (down to a 
magnitude less than π/4) explicitly in software. 



6.3.6. Constant Instructions 

Each of these instructions, shown in Table 6-15, pushes a commonly used constant onto the 
stack. (ST(7) must be empty to avoid an invalid exception.) The values have full extended real 
precision (64 bits) and are accurate to approximately 19 decimal digits. Because an external 
real constant occupies 10 memory bytes, the constant instructions, which are only two bytes 
long, save storage and improve execution speed, in addition to simplifying programming. 






Table 6-15. Constant Instructions 

Mnemonic     Operation 
FLDZ         Load +0.0 
FLD1         Load +1.0 
FLDPI        Load π 
FLDL2T       Load log2(10) 
FLDL2E       Load log2(e) 
FLDLG2       Load log10(2) 
FLDLN2       Load loge(2) 



The constants used by these instructions are stored internally in a format more precise than 
extended real. When loading the constant, the FPU rounds the more precise internal constant 
according to the RC (rounding control) bits of the control word. However, in spite of this 
rounding, the precision exception is not raised (to maintain compatibility). When the rounding 
control is set to round to nearest, the FPU produces the same constant that is produced by the 
8087 and Intel287 numeric coprocessors. 



6.3.7. Control Instructions 

The FPU control instructions are shown in Table 6-16. The FSTSW instruction is commonly 
used for conditional branching. The remaining instructions are not typically used in 
calculations; they provide control over the FPU for system-level activities. These activities 
include initialization of the FPU, numeric exception handling, and task switching. 

As shown in Table 6-16, certain instructions have alternative mnemonics. The instructions 
which initialize the FPU, clear exceptions, or store (all or part of) the FPU environment come 
in two forms: 

• Wait — the mnemonic is prefixed only with an F, such as FSTSW. This form checks for 
unmasked numeric exceptions. 

• No-wait — the mnemonic is prefixed with an FN, such as FNSTSW. This form ignores 
unmasked numeric exceptions. 

When a control instruction is coded using the no-wait form of the mnemonic, the 
ASM386/486 assembler does not precede the ESC instruction with a WAIT instruction. 
The processor does not test for a floating-point error condition before executing a control 
instruction. 

The only no-wait instructions are those shown in Table 6-16. All other floating-point 
instructions are automatically synchronized by the processor; all operands are transferred 
before the next instruction is initiated. Because of this automatic synchronization, non-control 
floating-point instructions need not be preceded by a WAIT instruction in order to execute 
correctly. 






Exception synchronization relies on the WAIT instruction. Since the Integer Unit and the FPU 
operate in parallel, it is possible in the case of a floating-point exception for the processor to 
disturb information vital to exception recovery before the exception-handler can be invoked. 
Coding a WAIT or FWAIT instruction in the proper place can prevent this. See the next 
section for details. 



Table 6-16. Control Instructions 

Mnemonic                  Operation 
FINIT/FNINIT              Initialize FPU 
FLDCW                     Load Control Word 
FSTCW/FNSTCW              Store Control Word 
FSTSW/FNSTSW              Store Status Word 
FSTSW AX/FNSTSW AX*       Store Status Word to AX Register 
FCLEX/FNCLEX              Clear Exceptions 
FSTENV/FNSTENV            Store Environment 
FLDENV                    Load Environment 
FSAVE/FNSAVE              Save State 
FRSTOR                    Restore State 
FINCSTP                   Increment Stack Top Pointer 
FDECSTP                   Decrement Stack Top Pointer 
FFREE                     Free Register 
FNOP                      No Operation 
FWAIT                     Report FPU Error 

NOTE: 

*Not available on 8087 math coprocessor. 

It should also be noted that the 8087 instructions FENI and FDISI and the Intel287 math 
coprocessor instruction FSETPM perform no function in the Pentium, Intel486 and 
Intel386/Intel387 processors. If these opcodes are detected in the instruction stream, the 32-bit 
processors perform no specific operation and no internal states are affected. Chapter 23 
contains a more complete description of the differences between floating-point operations on 
the Pentium and Intel486 processors and on the 8087, Intel287, and Intel387 DX numeric 
coprocessors. 



6.4. NUMERIC APPLICATIONS 

This section describes how programmers in assembly language and in a variety of higher-level 
languages can make use of the Intel486 processor's numerics capabilities. 

The level of detail in this section is intended to give programmers a basic understanding of the 
software tools that can be used for numeric programming, but this information does not 
document the full capabilities of these facilities. Complete documentation is available with 
each program development product. 



6.4.1. High-Level Languages 

A variety of Intel high-level languages are available that automatically make use of the 
numeric instruction set when appropriate. These languages include C-386/486 and 
PL/M-386/486. In addition, many high-level language compilers optimized for the Pentium 
processor are available from independent software vendors. 

Each of these high-level languages has special numeric libraries allowing programs to take 
advantage of the capabilities of the FPU. No special programming conventions are necessary 
to make use of the FPU when programming numeric applications in any of these languages. 

Programmers in PL/M-386/486 and ASM386/486 can also make use of many of these library 
routines by using routines contained in the Support Library. These libraries implement many 
of the functions provided by higher-level languages, including exception handlers, 
ASCII-to-floating-point conversions, and a more complete set of transcendental functions than 
that provided by the processor's numeric instruction set. 

6.4.1.1. C PROGRAMS 

C programmers automatically cause the C compiler to generate Intel486 numeric instructions 
when they use the double and float data types. The float type corresponds to the single real 
format; the double type corresponds to the double real format. The statement 
#include <math.h> causes mathematical functions such as sin and sqrt to return values of type 
double. Example 6-2 illustrates the ease with which C programs can make use of the 
processor's numerics capabilities. 

6.4.1.2. PL/M-386/486 

Programmers in PL/M-386/486 can access a very useful subset of the FPU's numeric 
capabilities. The PL/M-386/486 REAL data type corresponds to the single real (32-bit) format. 
This data type provides a range of about 8.43 × 10^-37 ≤ |X| ≤ 3.38 × 10^38, with about seven 
significant decimal digits. This representation is adequate for the data manipulated by many 
microcomputer applications. 






Example 6-2. Sample C Program 

/******************************* 
*                              * 
*       SAMPLE C PROGRAM       * 
*                              * 
*******************************/ 

/** Include stdio.h for printf **/ 
#include <stdio.h> 

/** Include math declarations for transcendentals and others **/ 
#include <math.h> 

#define PI 3.1415926535897932 

int main(void) 
{ 
    double sin_result, cos_result; 
    double angle_deg = 0.0, angle_rad; 
    int i, no_of_trial = 4; 

    for (i = 1; i <= no_of_trial; i++) { 
        /* Convert degrees to radians before calling sin and cos */ 
        angle_rad = angle_deg * PI / 180.0; 
        sin_result = sin(angle_rad); 
        cos_result = cos(angle_rad); 

        printf("sine of %f degrees equals %f\n", angle_deg, sin_result); 
        printf("cosine of %f degrees equals %f\n\n", angle_deg, 
               cos_result); 

        angle_deg = angle_deg + 30.0; 
    } 

    /** etc. **/ 

    return 0; 
} 

The utility of the REAL data type is extended by the PL/M-386/486 compiler's practice of 
holding intermediate results in the extended real format. This means that the full range and 
precision of the processor are utilized for intermediate results. Underflow, overflow, and 
rounding exceptions are most likely to occur during intermediate computations rather than 
during calculation of an expression's final result. Holding intermediate results in extended- 
precision real format greatly reduces the likelihood of overflow and underflow and eliminates 
roundoff as a serious source of error until the final assignment of the result is performed. 

The compiler generates floating-point instructions to evaluate expressions that contain REAL 
data types, whether variables or constants or both. This means that addition, subtraction, 
multiplication, division, comparison, and assignment of REALs will be performed by the FPU. 
INTEGER expressions, on the other hand, are evaluated by the Integer Unit. 

Five built-in procedures (Table 6-17) give the PL/M-386/486 programmer access to 
FPU control instructions. Prior to any arithmetic operations, a typical PL/M-386/486 program 
will set up the FPU using the INIT$REAL$MATH$UNIT procedure and then issue 
SET$REAL$MODE to configure the FPU. SET$REAL$MODE loads the FPU control word, 
and its 16-bit parameter has the format shown previously for the control word. The 
recommended value of this parameter is 033EH (round to nearest, 64-bit precision, all 
exceptions masked except invalid operation). Other settings may be used at the programmer's 
discretion. 
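The recommended value can be checked by pulling the control word apart. The C sketch below assumes the usual control word layout (exception masks in bits 0 through 5, precision control in bits 8 and 9, rounding control in bits 10 and 11); the structure and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    unsigned exception_masks;     /* bits 0-5: IM, DM, ZM, OM, UM, PM (1 = masked) */
    unsigned precision_control;   /* bits 8-9: 0 = 24-bit, 2 = 53-bit, 3 = 64-bit  */
    unsigned rounding_control;    /* bits 10-11: 0 = nearest, 1 = down, 2 = up, 3 = chop */
} control_fields;

control_fields decode_control_word(uint16_t cw)
{
    control_fields f;
    f.exception_masks   = cw & 0x3Fu;
    f.precision_control = (cw >> 8) & 0x3u;
    f.rounding_control  = (cw >> 10) & 0x3u;
    return f;
}
```

For 033EH the mask field is 3EH (every mask set except bit 0, the invalid-operation mask), precision control is 3 (64-bit), and rounding control is 0 (round to nearest), in agreement with the description above.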



Table 6-17. PL/M-386/486 Built-in Procedures 

Procedure               FPU Control Instruction    Description 
INIT$REAL$MATH$UNIT     FINIT                      Initialize FPU. 
SET$REAL$MODE           FLDCW                      Set exception masks, rounding 
                                                   precision, and infinity controls. 
GET$REAL$ERROR          FNSTSW & FNCLEX            Store, then clear, exception flags. 
SAVE$REAL$STATUS        FNSAVE                     Save FPU state. 
RESTORE$REAL$STATUS     FRSTOR                     Restore FPU state. 



If any exceptions are unmasked, an exception handler must be provided in the form of an 
interrupt procedure that is designated to be invoked via interrupt vector number 16. The 
exception handler can use the GET$REAL$ERROR procedure to obtain the low-order byte of 
the FPU status word and to then clear the exception flags. The byte returned by 
GET$REAL$ERROR contains the exception flags; these can be examined to determine the 
source of the exception. 

The SAVE$REAL$STATUS and RESTORE$REAL$STATUS procedures are provided for 
multitasking environments where a running task that uses the FPU may be preempted by 
another task that also uses the FPU. It is the responsibility of the operating system to issue 
SAVE$REAL$STATUS before it executes any statements that affect the FPU; these include 
the INIT$REAL$MATH$UNIT and SET$REAL$MODE procedures as well as arithmetic 
expressions. SAVE$REAL$STATUS saves the FPU state (registers, status, and control words, 
etc.) on the memory stack. RESTORE$REAL$STATUS reloads the state information; the 
preempting task must invoke this procedure before terminating in order to restore the FPU to 
its state at the time the running task was preempted. This enables the preempted task to resume 
execution from the point of its preemption. 

6.4.1.3. ASM386/486 

The ASM386/486 assembly language provides programmers with complete access to all of the 
facilities of the processor. 

6.4.1.3.1. Defining Data 

The ASM386/486 directives shown in Table 6-18 allocate storage for numeric variables and 
constants. As with other storage allocation directives, the assembler associates a type with any 
variable defined with these directives. The type value is equal to the length of the storage unit 
in bytes (10 for DT, 8 for DQ, etc.). The assembler checks the type of any variable coded in an 
instruction to be certain that it is compatible with the instruction. For example, the coding 
FIADD ALPHA will be flagged as an error if ALPHA's type is not 2 or 4, because integer 
addition is only available for word and short integer (doubleword) data types. The operand's 
type also tells the assembler which machine instruction to produce; although to the 
programmer there is only an FIADD instruction, a different machine instruction is required for 
each operand type. 



Table 6-18. ASM386/486 Storage Allocation Directives 

Directive    Interpretation       Data Types 
DW           Define Word          Word integer 
DD           Define Doubleword    Short integer, short real 
DQ           Define Quadword      Long integer, long real 
DT           Define Tenbyte       Packed decimal, temporary real 



On occasion it is desirable to use an instruction with an operand that has no declared type. For 
example, if register BX points to a short integer variable, a programmer may want to code 
FIADD [BX]. This can be done by informing the assembler of the operand's type in the 
instruction, coding FIADD DWORD PTR [BX]. The corresponding overrides for the other 
storage allocations are WORD PTR, QWORD PTR, and TBYTE PTR. 

The assembler does not, however, check the types of operands used in processor control 
instructions. Coding FRSTOR [BP] implies that the programmer has set up register BP to point 
to the location (probably in the stack) where the processor's 94-byte state record has been 
previously saved. 

The initial values for numeric constants may be coded in several different ways. Binary integer 
constants may be specified as bit strings, decimal integers, octal integers, or hexadecimal 
strings. Packed decimal values are normally written as decimal integers, although the 
assembler will accept and convert other representations of integers. Real values may be written 
as ordinary decimal real numbers (decimal point required), as decimal numbers in scientific 
notation, or as hexadecimal strings. Using hexadecimal strings is primarily intended for 
defining special values such as infinities, NaNs, and denormalized numbers. Most 
programmers will find that ordinary decimal and scientific decimal provide the simplest way to 
initialize numeric constants. Example 6-3 compares several ways of setting the various 
numeric data types to the same initial value. 



Example 6-3. Sample Numeric Constants 

; THE FOLLOWING ALL ALLOCATE THE CONSTANT: -126 
; NOTE TWO'S COMPLEMENT STORAGE OF NEGATIVE BINARY INTEGERS. 

                EVEN                          ; FORCE WORD ALIGNMENT 
WORD_INTEGER    DW  1111111110000010B         ; BIT STRING 
SHORT_INTEGER   DD  0FFFFFF82H                ; HEX STRING MUST START 
                                              ; WITH DIGIT 
LONG_INTEGER    DQ  -126                      ; ORDINARY DECIMAL 
SINGLE_REAL     DD  -126.0                    ; NOTE PRESENCE OF '.' 
DOUBLE_REAL     DQ  -1.26E2                   ; SCIENTIFIC 
PACKED_DECIMAL  DT  -126                      ; ORDINARY DECIMAL INTEGER 

; IN THE FOLLOWING, SIGN AND EXPONENT IS 'C005', 
; SIGNIFICAND IS 'FC00...00', 'R' INFORMS ASSEMBLER THAT 
; THE STRING REPRESENTS A REAL DATA TYPE. 
EXTENDED_REAL   DT  0C005FC00000000000000R    ; HEX STRING 

Note that preceding numeric variables and constants with the ASM386/486 EVEN directive 
ensures that the operands will be word-aligned in memory. The best performance is obtained 
when data transfers are aligned. See Chapter 24 for alignment strategies for the different 
processors. All numeric data types occupy integral numbers of words so that no storage is 
"wasted" if blocks of variables are defined together and preceded by a single EVEN 
declarative. 
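To make two of the storage formats above concrete, the following C sketch encodes an integer as a two's complement word and as a 10-byte packed decimal value. The function names are hypothetical. The packed decimal layout assumed here is the usual one: 18 BCD digits in bytes 0 through 8, least-significant digits first, and the sign in the top bit of byte 9. Magnitudes are assumed to be below 10^18.

```c
#include <stdint.h>

/* Word integer: the two's complement bit pattern of a 16-bit value. */
uint16_t encode_word_integer(int16_t value)
{
    return (uint16_t)value;
}

/* Packed decimal: two BCD digits per byte, low-order byte first,
   sign bit in bit 7 of the last (tenth) byte. */
void encode_packed_decimal(long long value, uint8_t out[10])
{
    unsigned long long magnitude =
        value < 0 ? 0ULL - (unsigned long long)value
                  : (unsigned long long)value;

    for (int i = 0; i < 9; i++) {
        unsigned low_digit  = (unsigned)(magnitude % 10);
        magnitude /= 10;
        unsigned high_digit = (unsigned)(magnitude % 10);
        magnitude /= 10;
        out[i] = (uint8_t)((high_digit << 4) | low_digit);
    }
    out[9] = value < 0 ? 0x80 : 0x00;
}
```

For -126 the word integer is FF82H, and the packed decimal bytes, from lowest address upward, are 26H, 01H, seven zero bytes, and 80H.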

6.4.1.3.2. Records and Structures 

The ASM386/486 RECORD and STRUC (structure) declaratives can be very useful in 
numeric programming. The record facility can be used to define the bit fields of the control, 
status, and tag words. Example 6-4 shows one definition of the status word and how it might 
be used in a routine that polls the FPU until it has completed an instruction. 

Example 6-4. Status Word Record Definition 

; RESERVE SPACE FOR STATUS WORD 
STATUS_WORD     DW      ? 

; LAY OUT STATUS WORD FIELDS 
STATUS          RECORD 
&               BUSY:           1, 
&               COND_CODE3:     1, 
&               STACK_TOP:      3, 
&               COND_CODE2:     1, 
&               COND_CODE1:     1, 
&               COND_CODE0:     1, 
&               INT_REQ:        1, 
&               S_FLAG:         1, 
&               P_FLAG:         1, 
&               U_FLAG:         1, 
&               O_FLAG:         1, 
&               Z_FLAG:         1, 
&               D_FLAG:         1, 
&               I_FLAG:         1 

; REDUCE UNTIL COMPLETE 
REDUCE: 
        FPREM1 
        FNSTSW  STATUS_WORD 
        TEST    STATUS_WORD, MASK COND_CODE2 
        JNZ     REDUCE 
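For readers more familiar with C, the same record layout can be expressed as bit masks. This is a sketch only: the macro and function names are hypothetical, and the bit positions follow from the field order and widths in the record above (BUSY in bit 15 down to I_FLAG in bit 0).

```c
#include <stdint.h>

#define SW_BUSY         0x8000u   /* bit 15      */
#define SW_COND_CODE3   0x4000u   /* bit 14      */
#define SW_STACK_TOP    0x3800u   /* bits 13-11  */
#define SW_COND_CODE2   0x0400u   /* bit 10      */
#define SW_COND_CODE1   0x0200u   /* bit 9       */
#define SW_COND_CODE0   0x0100u   /* bit 8       */
#define SW_INT_REQ      0x0080u   /* bit 7       */
#define SW_EXCEPTIONS   0x007Fu   /* bits 6-0: S, P, U, O, Z, D, I flags */

/* Stack-top pointer (0..7) extracted from the status word. */
unsigned stack_top(uint16_t sw)
{
    return (sw & SW_STACK_TOP) >> 11;
}

/* Nonzero while FPREM1 reports an incomplete reduction (COND_CODE2 set);
   this is the condition tested by the loop in the example above. */
int reduction_incomplete(uint16_t sw)
{
    return (sw & SW_COND_CODE2) != 0;
}

/* Low-order flag bits (S_FLAG through I_FLAG). */
unsigned exception_flags(uint16_t sw)
{
    return sw & SW_EXCEPTIONS;
}
```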

Because structures allow different but related data types to be grouped together, they often 
provide a natural way to represent "real world" data organizations. The fact that the structure 
template may be "moved" about in memory adds to its flexibility. Example 6-5 shows a simple 
structure that might be used to represent data consisting of a series of test score samples. This 
sample structure can be reorganized, if necessary, for the sake of more efficient execution. If 
the two double real fields were listed before the integer fields, then (provided that the structure 
is instantiated only at addresses divisible by eight) all the fields would be optimally aligned for 
efficient memory access and caching. A structure could also be used to define the organization 
of the information stored and loaded by the FSTENV and FLDENV instructions. 

Example 6-5. Structure Definition 

SAMPLE          STRUC 
N_OBS           DD      ?               ; SHORT INTEGER 
MEAN            DQ      ?               ; DOUBLE REAL 
MODE            DW      ?               ; WORD INTEGER 
STD_DEV         DQ      ?               ; DOUBLE REAL 
                                        ; ARRAY OF OBSERVATIONS -- WORD INTEGER 
TEST_SCORES     DW      1000 DUP (?) 
SAMPLE          ENDS 
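The reorganization suggested in the text can be sketched in C. As laid out in Example 6-5, the fields fall at byte offsets 0, 4, 12, 14, and 22, so neither double real field starts on an eight-byte boundary. The hypothetical C structure below lists the two double real fields first; unlike an assembler STRUC, a C compiler may insert padding of its own, but on common ABIs this ordering needs none.

```c
#include <stddef.h>
#include <stdint.h>

/* Reordered version of SAMPLE: widest fields first. */
typedef struct {
    double  mean;                /* double real,   expected offset 0  */
    double  std_dev;             /* double real,   expected offset 8  */
    int32_t n_obs;               /* short integer, expected offset 16 */
    int16_t mode;                /* word integer,  expected offset 20 */
    int16_t test_scores[1000];   /* word integers, expected offset 22 */
} sample_reordered;

/* Nonzero when a field at 'offset' is aligned to its own size. */
int is_naturally_aligned(size_t offset, size_t field_size)
{
    return offset % field_size == 0;
}
```

With this ordering every field is aligned to its own size, provided the structure itself starts at an address divisible by eight.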

6.4.1.3.3. Addressing Methods 

Numeric data in memory can be accessed with any of the memory addressing methods 
provided by the ModR/M byte and (optionally) the SIB byte. This means that numeric data 
types can be incorporated in data aggregates ranging from simple to complex according to the 
needs of the application. The addressing methods and the ASM386/486 notation used to 
specify them in instructions make the accessing of structures, arrays, arrays of structures, and 
other organizations direct and straightforward. Table 6-19 gives several examples of numeric 
instructions coded with operands that illustrate different addressing methods. 



Table 6-19. Addressing Method Examples 

Coding                       Interpretation 
FIADD ALPHA                  ALPHA is a simple scalar (mode is direct). 
FDIVR ALPHA.BETA             BETA is a field in a structure that is "overlaid" on 
                             ALPHA (mode is direct). 
FMUL QWORD PTR [BX]          BX contains the address of a long real variable 
                             (mode is register indirect). 
FSUB ALPHA [SI]              ALPHA is an array and SI contains the offset of an 
                             array element from the start of the array (mode is 
                             indexed). 
FILD [BP].BETA               BP contains the address of a structure on the CPU 
                             stack and BETA is a field in the structure (mode is 
                             based). 
FBLD TBYTE PTR [BX] [DI]     BX contains the address of a packed decimal array 
                             and DI contains the offset of an array element 
                             (mode is based indexed). 






6.4.1.4. COMPARATIVE PROGRAMMING EXAMPLE 

The following code examples show the PL/M-386/486 and ASM386/486 code for a simple 
numeric program, called ARRSUM. The program references an array (X$ARRAY), which 
contains 0-100 single real values; the integer variable N$OF$X indicates the number of array 
elements the program is to consider. ARRSUM steps through X$ARRAY accumulating three 
sums: 

• SUM$X, the sum of the array values 

• SUM$INDEXES, the sum of each array value times its index, where the index of the first 
element is 1, the second is 2, etc. 

• SUM$SQUARES, the sum of each array element squared 

Example 6-6. Sample PL/M-386/486 Program 

/************************************************************* 
*                                                            * 
*                      ARRAYSUM MODULE                       * 
*                                                            * 
*************************************************************/ 

array$sum: do; 
    declare (sum$x, sum$indexes, sum$squares) real; 
    declare x$array(100) real; 
    declare (n$of$x, i) integer; 
    declare control$FPU literally '033eh'; 

    /* Assume x$array and n$of$x are initialized */ 
    call init$real$math$unit; 
    call set$real$mode(control$FPU); 

    /* Clear sums */ 
    sum$x, sum$indexes, sum$squares = 0.0; 

    /* Loop through array, accumulating sums */ 
    do i = 0 to n$of$x - 1; 
        sum$x = sum$x + x$array(i); 
        sum$indexes = sum$indexes + (x$array(i) * float(i + 1)); 
        sum$squares = sum$squares + (x$array(i) * x$array(i)); 
    end; 

    /* etc. */ 

end array$sum; 

Example 6-7. Sample ASM386/486 Program

        name    arraysum

; Define initialization routine

        extrn   initFPU:far

; Allocate space for data

data    segment rw public

control_FPU     dw      033eh
n_of_x          dd      ?
x_array         dd      100 dup(?)
sum_x           dd      ?
sum_squares     dd      ?
sum_indexes     dd      ?

data    ends

; Allocate CPU stack space

stack   stackseg 400

; Begin code

code    segment er public

        assume  ds:data, ss:stack

start:
        mov     ax, data
        mov     ds, ax
        mov     ax, stack
        mov     ss, ax
        mov     esp, stackstart stack

; Assume x_array and n_of_x have been initialized

; Prepare the FPU or its emulator

        call    initFPU
        fldcw   control_FPU

; Clear three registers to hold running sums

        fldz
        fldz
        fldz

; Setup ECX as loop counter and ESI as index into x_array

        mov     ecx, n_of_x
        mov     eax, type x_array       ; EAX = bytes per array element
        imul    ecx                     ; EAX = n_of_x * element size
        mov     esi, eax

; ESI now contains index of last element + 1
; Loop through x_array and accumulate sum

sum_next:

; Back up one element and push on the stack

        sub     esi, type x_array
        fld     x_array[esi]

; Add to the sum and duplicate x on the stack

        fadd    st(3), st
        fld     st

; Square it, add into the sum of squares, and discard

        fmul    st, st
        faddp   st(2), st

; Multiply x by (index+1), add into the sum of indexes, and discard

        fimul   n_of_x                  ; n_of_x is an integer, so FIMUL
        faddp   st(2), st

; Reduce index for next iteration

        dec     n_of_x
        loop    sum_next

; Pop sums into memory

pop_results:

        fstp    sum_squares
        fstp    sum_indexes
        fstp    sum_x
        fwait

; Etc.

code    ends

        end     start, ds:data, ss:stack

(A true program, of course, would go beyond these steps to store and use the results of these 
calculations.) The control word is set with the recommended values: round to nearest, 64-bit 
precision, interrupts enabled, and all exceptions masked except invalid operation. It is assumed 
that an exception handler has been written to field the invalid operation if it occurs, and that it 
is invoked by interrupt pointer 16. 
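
The control word value 033EH used in both listings can be checked by splitting it into its fields. The following Python sketch is illustrative only and is not part of either program; the function name is hypothetical, and the field positions (exception masks in bits 0-5, precision control in bits 8-9, rounding control in bits 10-11) are taken from the standard FPU control word layout rather than from this section.

```python
def decode_control_word(cw):
    """Split an FPU control word into rounding mode, precision, and masks.

    Assumed layout: bits 0-5 exception masks, bits 8-9 precision control,
    bits 10-11 rounding control.
    """
    rounding = ["nearest", "down", "up", "chop"][(cw >> 10) & 0b11]
    # Precision control 01 is reserved, so it maps to None here.
    precision = {0b00: 24, 0b10: 53, 0b11: 64}.get((cw >> 8) & 0b11)
    names = ["invalid", "denormal", "zero_divide",
             "overflow", "underflow", "precision"]
    masked = {name: bool((cw >> bit) & 1) for bit, name in enumerate(names)}
    return rounding, precision, masked


rounding, precision, masked = decode_control_word(0x033E)
unmasked = [name for name, is_masked in masked.items() if not is_masked]
print(rounding, precision, unmasked)
```

Under these assumptions the decoded fields agree with the description above: round to nearest, 64-bit precision, and only the invalid operation exception unmasked.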

The PL/M-386/486 version of ARRAYSUM is very straightforward and illustrates how easily
the numerics capabilities of the Intel486 processor can be used in this language. After
declaring variables, the program calls built-in procedures to initialize the FPU and to load
the control word. The program clears the sum variables and then steps through X$ARRAY
with a DO-loop. The loop control takes into account PL/M-386/486's practice of considering
the index of the first element of an array to be 0. In the computation of SUM$INDEXES, the
built-in procedure FLOAT converts I+1 from integer to real because the language does not
support "mixed mode" arithmetic. One of the strengths of the Intel486 FPU, of course, is that it
does support arithmetic on mixed data types (because all values are converted internally to the
80-bit extended-precision real format).

The ASM386/486 version defines the external procedure INITFPU, which makes the different
initialization requirements of the processor and its emulator transparent to the source code. 
After defining the data and setting up the segment registers and stack pointer, the program calls 
INITFPU and loads the control word. The computation begins with the next three instructions, 
which clear three registers by loading (pushing) zeros onto the stack. As shown in Figure 6-12, 
these registers remain at the bottom of the stack throughout the computation while temporary 
values are pushed on and popped off the stack above them. 






After FLDZ, FLDZ, FLDZ:
    ST(0)    0.0     SUM_SQUARES
    ST(1)    0.0     SUM_INDEXES
    ST(2)    0.0     SUM_X

After FLD X_ARRAY[ESI]:
    ST(0)    2.5     X_ARRAY(19)
    ST(1)    0.0     SUM_SQUARES
    ST(2)    0.0     SUM_INDEXES
    ST(3)    0.0     SUM_X

After FADD ST(3),ST:
    ST(0)    2.5     X_ARRAY(19)
    ST(1)    0.0     SUM_SQUARES
    ST(2)    0.0     SUM_INDEXES
    ST(3)    2.5     SUM_X

After FLD ST:
    ST(0)    2.5     X_ARRAY(19)
    ST(1)    2.5     X_ARRAY(19)
    ST(2)    0.0     SUM_SQUARES
    ST(3)    0.0     SUM_INDEXES
    ST(4)    2.5     SUM_X

After FMUL ST,ST:
    ST(0)    6.25    X_ARRAY(19) squared
    ST(1)    2.5     X_ARRAY(19)
    ST(2)    0.0     SUM_SQUARES
    ST(3)    0.0     SUM_INDEXES
    ST(4)    2.5     SUM_X

After FADDP ST(2),ST:
    ST(0)    2.5     X_ARRAY(19)
    ST(1)    6.25    SUM_SQUARES
    ST(2)    0.0     SUM_INDEXES
    ST(3)    2.5     SUM_X

After FIMUL N_OF_X:
    ST(0)    50.0    X_ARRAY(19)*20
    ST(1)    6.25    SUM_SQUARES
    ST(2)    0.0     SUM_INDEXES
    ST(3)    2.5     SUM_X

After FADDP ST(2),ST:
    ST(0)    6.25    SUM_SQUARES
    ST(1)    50.0    SUM_INDEXES
    ST(2)    2.5     SUM_X

Figure 6-12. Instructions and Register Stack



The program uses the LOOP instruction to control its iteration through X_ARRAY; register
ECX, which LOOP automatically decrements, is loaded with N_OF_X, the number of array
elements to be summed. Register ESI is used to select (index) the array elements. The program
steps through X_ARRAY from back to front, so ESI is initialized to point at the element just
beyond the first element to be processed. The ASM386/486 TYPE operator is used to
determine the number of bytes in each array element. This permits changing X_ARRAY to a
double-precision real array by simply changing its definition (DD to DQ) and reassembling.

Figure 6-12 shows the effect of the instructions in the program loop on the FPU register stack.
The figure assumes that the program is in its first iteration, that N_OF_X is 20, and that
X_ARRAY(19) (the 20th element) contains the value 2.5. When the loop terminates, the three
sums are left as the top stack elements so that the program ends by simply popping them into 
memory variables. 
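
The stack behavior described above can be traced with a short model. The Python sketch below is an illustration only, not a translation produced by any Intel tool: the function name is hypothetical, the register stack is modeled as a list whose element 0 plays the role of ST(0), and the multiplication by (index + 1) stands in for the multiply by N_OF_X in the assembly loop.

```python
def arraysum_stack(x_array, n_of_x):
    """Model the ARRAYSUM loop on a list used as the FPU register stack."""
    st = [0.0, 0.0, 0.0]            # three FLDZ: squares, indexes, x
    for i in range(n_of_x - 1, -1, -1):   # back to front, like ESI
        st.insert(0, x_array[i])    # FLD x_array[esi]
        st[3] += st[0]              # FADD ST(3),ST
        st.insert(0, st[0])         # FLD ST
        st[0] *= st[0]              # FMUL ST,ST
        st[2] += st[0]              # FADDP ST(2),ST: add ...
        st.pop(0)                   # ... then pop
        st[0] *= i + 1              # multiply x by (index + 1)
        st[2] += st[0]              # FADDP ST(2),ST: add ...
        st.pop(0)                   # ... then pop
    return st                       # [sum_squares, sum_indexes, sum_x]


# Same assumptions as Figure 6-12: 20 elements, X_ARRAY(19) = 2.5.
# The other elements are set to zero here so that only the first
# iteration contributes to the sums.
x = [0.0] * 19 + [2.5]
print(arraysum_stack(x, 20))
```

With these inputs the returned list is 6.25, 50.0, 2.5 from the top of the stack down, which is the order in which the program pops SUM_SQUARES, SUM_INDEXES, and SUM_X.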

6.4.1.5. CONCURRENT PROCESSING

Because the Intel Pentium Processor Integer Unit (IU) and FPU execution units are separate, it 
is possible for the FPU to execute numeric instructions in parallel with integer instructions. 
This simultaneous execution of different instructions is called concurrency. 

No special programming techniques are required to gain the advantages of concurrent 
execution; numeric instructions are simply placed in line with the integer instructions. Integer 
and numeric instructions are initiated in the same order as they are encountered in the 
instruction stream. However, because numeric operations performed by the FPU generally 
require more time than integer operations, the IU can often execute several instructions before 
the FPU completes a numeric instruction previously initiated.

This concurrency offers obvious advantages in terms of execution performance, but 
concurrency also imposes several rules that must be observed in order to assure proper 
synchronization of the IU and FPU.

All Intel high-level languages automatically provide for and manage concurrency in the FPU. 
Assembly-language programmers, however, must understand and manage some areas of 
concurrency in exchange for the flexibility and performance of programming in assembly 
language. This section is for the assembly-language programmer or well-informed high-level- 
language programmer. 



6.4.1.6. MANAGING CONCURRENCY

The activities of numeric programs can be split into two major areas: program control and
arithmetic. The program control part performs activities such as deciding what functions to
perform, calculating addresses of numeric operands, and loop control. The arithmetic part
simply adds, subtracts, multiplies, and performs other operations on the numeric operands. The
processor is designed to handle these two parts separately and efficiently.

Concurrency management is required to check for an exception before letting the processor
change a value just used by the FPU. Almost any numeric instruction can, under the wrong
circumstances, produce a numeric exception. For programmers in higher-level languages, all
required synchronization is automatically provided by the appropriate compiler. For assembly-
language programmers exception synchronization remains the responsibility of the
programmer.

A complication is that a programmer may not expect their numeric program to cause numeric
exceptions, but in some systems, they may regularly happen. To better understand these points,
consider what can happen when the FPU detects an exception.

Depending on options determined by the software system designer, the processor can perform
one of two things when a numeric exception occurs:

• The FPU can provide a default fix-up for selected numeric exceptions. Programs can mask
individual exception types to indicate that the FPU should generate a safe, reasonable
result whenever that exception occurs. The default exception fix-up activity is treated by
the FPU as part of the instruction causing the exception; no external indication of the
exception is given. When exceptions are detected, a flag is set in the numeric status
register, but no information regarding where or when is available. If the FPU performs its
default action for all exceptions, then the need for exception synchronization is not
manifest. However, as will be shown later, this is not sufficient reason to ignore exception
synchronization when designing programs that use the FPU.

• As an alternative to the default fix-up of numeric exceptions, the IU can be notified 
whenever an exception occurs. When a numeric exception is unmasked and the exception 
occurs, the FPU stops further execution of the numeric instruction and signals this event. 
On the next occurrence of an ESC or WAIT instruction, the processor traps to a software 
exception handler. The exception handler can then implement any sort of recovery 
procedures desired for any numeric exception detectable by the FPU. Some ESC
instructions do not check for exceptions. These are the nonwaiting forms FNINIT,
FNSTENV, FNSAVE, FNSTSW, FNSTCW, and FNCLEX.

When the FPU signals an unmasked exception condition, it is requesting help. The fact that the 
exception was unmasked indicates that further numeric program execution under the arithmetic 
and programming rules of the FPU is unreasonable. 

If concurrent execution is allowed, the state of the processor when it recognizes the exception 
is undefined. It may have changed many of its internal registers and be executing a totally 
different program by the time the exception occurs. To handle this situation, the FPU has 
special registers updated at the start of each numeric instruction to describe the state of the 
numeric program when the failed instruction was attempted. 

Exception synchronization ensures that the FPU is in a well-defined state after an unmasked 
numeric exception occurs. Without a well-defined state, it would be impossible for exception 
recovery routines to determine why the numeric exception occurred, or to recover successfully 
from the exception. 

The following two sections illustrate the need to always consider exception synchronization 
when writing numeric code, even when the code is initially intended for execution with 
exceptions masked. If the code is later moved to an environment where exceptions are 
unmasked, the same code may not work correctly. An example of how some instructions 
written without exception synchronization will work initially, but fail when moved into a new 
environment, is shown in the following section. 



6.4.1.7. EXCEPTION SYNCHRONIZATION

In the following examples, three instructions are shown to load an integer, calculate its square
root, then increment the integer. The synchronous execution of the FPU will allow this
program to execute correctly when no exceptions occur on the FILD instruction.

Incorrect Error Synchronization:

        FILD    COUNT   ; FPU instruction
        INC     COUNT   ; integer instruction alters operand
        FSQRT           ; subsequent FPU instruction -- error from previous FPU
                        ; instruction detected here

Proper Error Synchronization:

        FILD    COUNT   ; FPU instruction
        FSQRT           ; subsequent FPU instruction -- error from previous FPU
                        ; instruction detected here
        INC     COUNT   ; integer instruction alters operand



This situation changes if the numeric register stack is extended to memory. To extend the FPU 
stack to memory, the invalid exception is unmasked. A push to a full register or pop from an 
empty register sets SF and causes an invalid exception. 

The recovery routine for the exception must recognize this situation, fix up the stack, then 
perform the original operation. The recovery routine will not work correctly in the first 
example shown in the figure. The problem is that the value of COUNT is incremented before 
the exception handler is invoked, so that the recovery routine will load an incorrect value of 
COUNT, causing the program to fail or behave unreliably. 
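
The ordering problem can be made concrete with a toy model. The Python sketch below is a deliberate simplification and not a description of the hardware: the class ToyFPU, its methods, and the run helper are hypothetical names, the register stack is assumed to be full so that FILD raises the stack exception, and the recovery routine is reduced to re-reading the operand from memory at the point where the exception is reported.

```python
class ToyFPU:
    """Toy model: a fault is reported at the NEXT FPU instruction."""

    def __init__(self, memory, free_slots):
        self.memory = memory
        self.free_slots = free_slots
        self.pending = None           # operand name of the faulting load
        self.seen_by_handler = None   # value the recovery routine reloads

    def _report_pending(self):
        if self.pending is not None:
            # The recovery routine re-reads the operand to retry the load.
            self.seen_by_handler = self.memory[self.pending]
            self.pending = None

    def fild(self, name):
        self._report_pending()
        if self.free_slots == 0:
            self.pending = name       # push to a full stack: deferred fault
        else:
            self.free_slots -= 1

    def fsqrt(self):
        self._report_pending()


def run(order):
    memory = {"COUNT": 9}
    fpu = ToyFPU(memory, free_slots=0)
    for op in order:
        if op == "FILD":
            fpu.fild("COUNT")
        elif op == "INC":
            memory["COUNT"] += 1
        elif op == "FSQRT":
            fpu.fsqrt()
    return fpu.seen_by_handler


print(run(["FILD", "INC", "FSQRT"]))   # handler reloads the altered COUNT
print(run(["FILD", "FSQRT", "INC"]))   # handler reloads the original COUNT
```

In this model the first ordering hands the recovery routine the incremented value, while the second ordering hands it the value that FILD was originally asked to load.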

6.4.1.8. PROPER EXCEPTION SYNCHRONIZATION

Exception synchronization relies on the WAIT instruction. Whenever an unmasked numerical 
exception occurs, the FPU asserts an error-condition signal internal to the processor. When the 
next WAIT instruction (or an ESC instruction other than FNINIT, FNCLEX, FNSTSW, 
FNSTSW AX, FNSTCW, FNSTENV, FNSAVE) is encountered, the error-condition signal is 
acknowledged and a software exception handler is invoked. (See Chapter 7 for a more detailed 
discussion of the various floating-point error-reporting mechanisms.) If this WAIT or ESC 
instruction is properly placed, the processor will not yet have disturbed any information vital to 
recovery from the exception. A WAIT instruction should also be placed after the last floating 
point instruction in an application so that any unmasked exceptions will be serviced before the 
task completes. 






CHAPTER 7 

SPECIAL COMPUTATIONAL SITUATIONS 



Besides being able to represent positive and negative numbers, the numerical data formats may 
be used to describe other entities. These special values provide extra flexibility, but most users 
will not need to understand them in order to use the numerics capabilities of the processor 
successfully. This section describes the special values that may occur in certain cases and the 
significance of each. The numeric exceptions are also described, for writers of exception 
handlers and for those interested in probing the limits of numeric computation. 

The material presented in this section is mainly of interest to programmers concerned with 
writing exception handlers. Many readers will only need to skim this section. 

When discussing these special computational situations, it is useful to distinguish between 
arithmetic instructions and nonarithmetic instructions. Nonarithmetic instructions are those 
that have no operands or transfer their operands without substantial change; arithmetic 
instructions are those that make significant changes to their operands. Table 7-1 defines these 
two classes of instructions. 



7.1 . SPECIAL NUMERIC VALUES 

The numerical data formats encompass encodings for a variety of special values in addition to 
the typical real or integer data values that result from normal calculations. These special values 
have significance and can express relevant information about the computations or operations 
that produced them. The various types of special values are: 

• Denormal real numbers 

• Zeros 

• Positive and negative infinity 

• NaN (Not-a-Number) 

• Indefinite 

• Unsupported formats 

The following sections explain the origins and significance of each of these special values. 
Tables 7-2 through 7-6 show how each of these special values is encoded for each of
the numeric data types. 







Table 7-1. Arithmetic and Nonarithmetic Instructions

Nonarithmetic Instructions            Arithmetic Instructions
------------------------------------  ------------------------
FABS                                  F2XM1
FCHS                                  FADD(P)
FCLEX                                 FBLD
FDECSTP                               FBSTP
FFREE                                 FCOM(P)(P)
FINCSTP                               FCOS
FINIT                                 FDIV(R)(P)
FLD (register-to-register)            FIADD
FLD (extended format from memory)     FICOM(P)
FLD constant                          FIDIV(R)
FLDCW                                 FILD
FLDENV                                FIMUL
FNOP                                  FIST(P)
FRSTOR                                FISUB(R)
FSAVE                                 FLD (conversion)
FST(P) (register-to-register)         FMUL(P)
FSTP (extended format to memory)      FPATAN
FSTCW                                 FPREM
FSTENV                                FPREM1
FSTSW                                 FPTAN
FWAIT                                 FRNDINT
FXAM                                  FSCALE
FXCH                                  FSIN
                                      FSINCOS
                                      FSQRT
                                      FST(P) (conversion)
                                      FSUB(R)(P)
                                      FTST
                                      FUCOM(P)(P)
                                      FXTRACT
                                      FYL2X
                                      FYL2XP1






Table 7-2. Binary Integer Encodings

Class                                Sign    Magnitude
-----------------------------------  ------  ---------
Positives   (Largest)                0       11..11
            (Smallest)               0       00..01
Zero                                 0       00..00
Negatives   (Smallest)               1       11..11
            (Largest/Indefinite*)    1       00..00

Magnitude field width:  Word: 15 bits   Short: 31 bits   Long: 63 bits

NOTES:

* If this encoding is used as a source operand (as in an integer load or integer arithmetic instruction), the
FPU interprets it as the largest negative number representable in the format: -2^15, -2^31, or -2^63. The
FPU delivers this encoding to an integer destination in two cases:

1. If the result is the largest negative number.

2. As the response to a masked invalid operation exception, in which case it represents the special value 
integer indefinite. 
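
The note above can be verified numerically. The following Python sketch is illustrative; both function names are hypothetical. It builds the encoding with the sign bit set and a zero magnitude, then reads it back as a two's complement value for each of the three integer widths.

```python
def integer_indefinite(width):
    """Bit pattern with the sign bit set and an all-zero magnitude."""
    return 1 << (width - 1)


def as_signed(bits, width):
    """Interpret a width-bit pattern as a two's complement integer."""
    return bits - (1 << width) if bits >> (width - 1) else bits


for name, width in (("word", 16), ("short", 32), ("long", 64)):
    pattern = integer_indefinite(width)
    print(name, hex(pattern), as_signed(pattern, width))
```

Because the same pattern serves both as the largest negative number and as integer indefinite, a program reading such a result from memory cannot tell the two cases apart from the value alone.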







Table 7-3. Packed Decimal Encodings

                                         Magnitude
Class                  Sign           digit  digit  digit  digit   ...  digit
---------------------  ----  -------  -----  -----  -----  ------  ---  -----
Positives   Largest    0     0000000  1001   1001   1001   1001    ...  1001
            Smallest   0     0000000  0000   0000   0000   0000    ...  0001
            Zero       0     0000000  0000   0000   0000   0000    ...  0000
Negatives   Zero       1     0000000  0000   0000   0000   0000    ...  0000
            Smallest   1     0000000  0000   0000   0000   0000    ...  0001
            Largest    1     0000000  1001   1001   1001   1001    ...  1001
Indefinite*            1     1111111  1111   1111   UUUU** UUUU    ...  UUUU

Field widths: sign and the seven bits that follow it, 1 byte; magnitude, 9 bytes.



NOTES: 

* The packed decimal indefinite is stored by FBSTP in response to a masked invalid operation exception. 
Attempting to load this value via FBLD produces an undefined result. 

** UUUU means bit values are undefined and may contain any value. 
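
The layout in Table 7-3 can be exercised with a small encoder and decoder. The Python sketch below is an illustration, not a model of FBLD or FBSTP: the function names are hypothetical, and it assumes the usual memory image of the format, in which the nine magnitude bytes are stored least significant digit pair first and the sign byte is stored last.

```python
def pack_bcd(value):
    """Encode an integer as a 10-byte packed decimal memory image."""
    if abs(value) >= 10**18:
        raise ValueError("packed decimal holds at most 18 digits")
    digits = f"{abs(value):018d}"
    # Two decimal digits per byte, least significant pair first.
    body = bytes(int(digits[i:i + 2], 16) for i in range(16, -1, -2))
    return body + (b"\x80" if value < 0 else b"\x00")


def unpack_bcd(raw):
    """Decode a 10-byte packed decimal image; rejects non-decimal digits."""
    if len(raw) != 10:
        raise ValueError("expected 10 bytes")
    digits = "".join(f"{b:02x}" for b in reversed(raw[:9]))
    magnitude = int(digits)     # raises ValueError for nibbles above 9
    return -magnitude if raw[9] & 0x80 else magnitude


print(pack_bcd(-12345).hex())
print(unpack_bcd(pack_bcd(10**18 - 1)))
```

An image whose magnitude contains nibbles above 1001, such as the indefinite encoding, is rejected by this decoder with a ValueError, which loosely mirrors the note that loading the indefinite value produces an undefined result.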






Table 7-4. Single and Double Real Encodings

                                                Biased      Significand
Class                           Sign            Exponent    ff--ff*
------------------------------  --------------  ----------  -----------
Positive NaNs   Quiet           0               11..11      11..11
                                0               11..11      10..00
                Signaling       0               11..11      01..11
                                0               11..11      00..01
Positive Reals  Infinity        0               11..11      00..00
                Normals         0               11..10      11..11
                                0               00..01      00..00
                Denormals       0               00..00      11..11
                                0               00..00      00..01
                Zero            0               00..00      00..00
Negative Reals  Zero            1               00..00      00..00
                Denormals       1               00..00      00..01
                                1               00..00      11..11
                Normals         1               00..01      00..00
                                1               11..10      11..11
                Infinity        1               11..11      00..00
Negative NaNs   Signaling       1               11..11      00..01
                                1               11..11      01..11
                Quiet           1 (Indefinite)  11..11      10..00
                                1               11..11      11..11

Field widths:   Single:                         8 bits      23 bits
                Double:                         11 bits     52 bits



NOTE: integer bit is implied and not stored. 
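
The classes in Table 7-4 can be recovered mechanically from a bit pattern. The Python sketch below does this for the single format; the function names are hypothetical and the class names are shortened versions of the row labels in the table.

```python
import struct


def classify_single(bits):
    """Name the Table 7-4 class of a 32-bit single real bit pattern."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF      # 8-bit biased exponent
    fraction = bits & 0x7FFFFF          # 23-bit fraction (integer bit implied)
    if exponent == 0xFF:
        if fraction == 0:
            kind = "infinity"
        elif fraction >> 22:            # top fraction bit set
            kind = "quiet NaN"
        else:
            kind = "signaling NaN"
    elif exponent == 0:
        kind = "zero" if fraction == 0 else "denormal"
    else:
        kind = "normal"
    return ("-" if sign else "+") + kind


def bits_of(x):
    """Return the single real bit pattern of a Python float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


for pattern in (0x7F800000, 0xFFC00000, 0x7F800001, 0x00000001, 0x80000000):
    print(hex(pattern), classify_single(pattern))
print(classify_single(bits_of(1.0)))
```

The pattern FFC00000H is the entry marked Indefinite in the table: a negative quiet NaN whose fraction is 10..00.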







Table 7-5. Extended Real Encodings

                                                   Biased      Significand
Class                              Sign            Exponent    i  ff--ff
---------------------------------  --------------  ----------  -----------
Positive NaNs   Quiet              0               11..11      1  11..11
                                   0               11..11      1  10..00
                Signaling          0               11..11      1  01..11
                                   0               11..11      1  00..01
Positive Reals  Infinity           0               11..11      1  00..00
                Normals            0               11..10      1  11..11
                                   0               00..01      1  00..00
                Pseudodenormals    0               00..00      1  11..11
                                   0               00..00      1  00..01
                Denormals          0               00..00      0  11..11
                                   0               00..00      0  00..01
                Zero               0               00..00      0  00..00
Negative Reals  Zero               1               00..00      0  00..00
                Denormals          1               00..00      0  00..01
                                   1               00..00      0  11..11
                Pseudodenormals    1               00..00      1  11..11
                                   1               00..00      1  00..00
                Normals            1               00..01      1  00..00
                                   1               11..10      1  11..11
                Infinity           1               11..11      1  00..00
Negative NaNs   Signaling          1               11..11      1  00..01
                                   1               11..11      1  01..11
                Quiet              1 (Indefinite)  11..11      1  10..00
                                   1               11..11      1  11..11

Field widths:                                      15 bits     64 bits






Table 7-6. Unsupported Formats

                                             Biased      Significand
Class                                 Sign   Exponent    i  ff--ff
------------------------------------  -----  ----------  -----------
Positive Pseudo-NaNs  Quiet           0      11..11      0  11..11
                                      0      11..11      0  10..00
                      Signaling       0      11..11      0  01..11
                                      0      11..11      0  00..01
Positive Reals        Pseudoinfinity  0      11..11      0  00..00
                      Unnormals       0      11..10      0  11..11
                                      0      00..01      0  00..00
Negative Reals        Unnormals       1      11..10      0  11..11
                                      1      00..01      0  00..00
                      Pseudoinfinity  1      11..11      0  00..00
Negative Pseudo-NaNs  Signaling       1      11..11      0  01..11
                                      1      11..11      0  00..01
                      Quiet           1      11..11      0  10..00
                                      1      11..11      0  11..11

Field widths:                                15 bits     64 bits



7.1.1. Denormal Real Numbers 

The processor generally stores nonzero real numbers in normalized floating-point form; that is, 
the integer (leading) bit of the significand is always a one. (Refer to the previous section for a 
review of operand formats.) This bit is explicitly stored in the extended format, and is 
implicitly assumed to be a one (1.) in the single and double formats. Since leading zeros are
eliminated, normalized storage allows the maximum number of significant digits to be held in 
a significand of a given width. 

When a numeric value becomes very close to zero, normalized floating-point storage cannot be
used to express the value accurately. The term tiny is used here to precisely define what values
require special handling. A number R is said to be tiny when -2^Emin < R < 0 or 0 < R < +2^Emin.
(As defined in the previous section, Emin is -126 for single format, -1022 for double format,
and -16382 for extended format.) In other words, a nonzero number is tiny if its exponent
would be too negative to store in the destination format.

To accommodate these instances, the processor can store and operate on reals that are not 
normalized, i.e., whose significands contain one or more leading zeros. Denormals typically 
arise when the result of a calculation yields a value that is tiny. 

Denormal values have the following properties: 

• The biased floating-point exponent is stored at its smallest value (zero) 

• The integer bit of the significand (whether explicit or implicit) is zero 

The leading zeros of denormals permit smaller numbers to be represented, at the possible cost 
of some lost precision (the number of significant bits is reduced by the leading zeros). In 
typical algorithms, extremely small values are most likely to be generated as intermediate, 
rather than final, results. By using the extended real format for holding intermediate values, 
quantities as small as ±3.37 x 10^-4932 can be represented; this makes the occurrence of
denormal numbers a rare phenomenon in numerical applications. Nevertheless, the processor 
can load, store, and operate on denormalized real numbers when they do occur. 

Denormals receive special treatment by the processor in three respects: 

• The processor avoids creating denormals whenever possible. In other words, it always 
normalizes real numbers except in the case of tiny numbers. 

• The processor provides the unmasked underflow exception to permit programmers to 
detect cases when denormals would be created. 

• The processor provides the denormal operand exception to permit programmers to detect 
cases when denormals enter into calculations. 

Denormalizing means incrementing the true result's exponent and inserting a corresponding 
leading zero in the significand, shifting the rest of the significand one place to the right. 
Denormal values may occur in any of the single, double, or extended formats. Table 7-7 shows 
the range of denormalized values in each format. 



Table 7-7. Denormalized Values

                    Smallest Magnitude        Largest Magnitude
Format              (Exact)     (Approx.)     (Exact)                (Approx.)
------------------  ----------  ------------  ---------------------  ---------
Single Precision    2^-149      10^-45        2^-126 - 2^-149        10^-38
Double Precision    2^-1074     10^-324       2^-1022 - 2^-1074      10^-308
Extended            2^-16445    10^-4951      2^-16382 - 2^-16445    10^-4932



Denormalization produces either a denormal or a zero. Denormals are readily identified by
their exponents, which are always the minimum for their formats; in biased form, this is always
the bit string: 00..00. This same exponent value is also assigned to the zeros, but a denormal
has a nonzero significand. A denormal in a register is tagged special. Tables 7-2 through
7-6 show how denormal values are encoded in each of the real data formats.

The denormalization process causes loss of significance if low-order one-bits are shifted off
the right of the significand. In a severe case, all the significand bits of the true result are shifted
out and replaced by the leading zeros. In this case, the result of denormalization is a true zero,
and, if the value is in a register, it is tagged as a zero.

Denormals are rarely encountered in most applications. Typical debugged algorithms generate 
extremely small results only during the evaluation of intermediate subexpressions; the final 
result is usually of an appropriate magnitude for its single or double format real destination. If 
intermediate results are held in temporary real, as is recommended, the greater range of this 
format makes underflow very unlikely. Denormals are likely to arise only when an application 
generates a great many intermediates, so many that they cannot be held on the register stack or 
in extended format memory variables. If storage limitations force the use of single or double 
format reals for intermediates, and small values are produced, underflow may occur, and, if 
masked, may generate denormals. 

When a denormal number in single or double format is used as a source operand and the 
denormal exception is masked, the FPU automatically normalizes the number when it is 
converted to extended format. 
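
The limits of the denormal range follow directly from Emin and the number of fraction bits in each format. The Python sketch below computes them exactly with rational arithmetic; the function name is hypothetical, and the fraction widths used (23, 52, and 63 bits) are those of the single, double, and extended formats.

```python
from fractions import Fraction


def denormal_range(fraction_bits, e_min):
    """Return (smallest, largest) denormal magnitudes as exact fractions.

    The smallest denormal has only the lowest fraction bit set; the
    largest has every fraction bit set and lies one such step below
    the smallest normal number, 2**e_min.
    """
    smallest = Fraction(2) ** (e_min - fraction_bits)
    largest = Fraction(2) ** e_min - smallest
    return smallest, largest


for name, bits, e_min in (("single", 23, -126),
                          ("double", 52, -1022),
                          ("extended", 63, -16382)):
    smallest, largest = denormal_range(bits, e_min)
    # The denominator is a power of two; its bit length gives the exponent.
    print(name, "smallest = 2^-%d" % (smallest.denominator.bit_length() - 1))
```

Exact fractions are used instead of floating-point values because the extended-format limits lie far below the range of a Python float.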



7.1.2. Zeros

The value zero in the real and decimal integer formats may be signed either positive or 
negative, although the sign of a binary integer zero is always positive. For computational 
purposes, the value of zero always behaves identically, regardless of sign, and typically the 
fact that a zero may be signed is transparent to the programmer. If necessary, the FXAM 
instruction may be used to determine a zero's sign. 

A programmer can code a zero, or it can be created by the FPU as its masked response to an 
underflow exception. If a zero is loaded or generated in a register, the register is tagged zero. 
Table 7-8 lists the results of instructions executed with zero operands and also shows how a 
zero may be created from nonzero operands. 
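
The behavior of signed zeros can be observed from any language that exposes IEEE double-precision values. The Python sketch below is illustrative only and does not use FXAM: the helper name sign_of is hypothetical, and the results reflect the default round-to-nearest mode in which Python performs its arithmetic.

```python
import math
import struct


def sign_of(x):
    """Return '+' or '-' for any float, including a signed zero."""
    return "-" if math.copysign(1.0, x) < 0 else "+"


pos_zero, neg_zero = 0.0, -0.0

print(pos_zero == neg_zero)                   # zeros compare equal
print(sign_of(pos_zero), sign_of(neg_zero))   # but the sign is retained
print(struct.pack(">d", neg_zero).hex())      # sign bit set, all else zero
print(sign_of(neg_zero + pos_zero))           # -0 plus +0
print(sign_of(neg_zero + neg_zero))           # -0 plus -0
```

The two sums correspond to the addition rows of Table 7-8: zeros of opposite sign add to +0 under round to nearest, and two negative zeros add to -0.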

7.1.3. Infinity 

The real formats support signed representations of infinities. These values are encoded with a 
biased exponent of all ones and a significand of 1.00..00; if the infinity is in a register, it is
tagged special. 

A programmer can code an infinity, or it can be created by the FPU as its masked response to 
an overflow or a zero divide exception. Note that depending on rounding mode, the masked 
response may create the largest valid value representable in the destination rather than infinity. 

The signs of the infinities are observed, and comparisons are possible. Infinities are always 
interpreted in the affine sense; that is, -∞ < (any finite number) < +∞. Arithmetic on infinities
is always exact and, therefore, signals no exceptions, except for the invalid operations 
specified in Table 7-9. 
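
These properties can be checked with ordinary IEEE double-precision arithmetic. The Python sketch below is an illustration and not a model of the FPU exception mechanism: the helper name is hypothetical, and Python floats behave as if the invalid operation exception were masked, so an invalid operation on infinities quietly returns a NaN instead of invoking a handler.

```python
import math

inf = math.inf


def is_between_infinities(x):
    """Affine interpretation: every finite number lies between the two."""
    return -inf < x < inf


print(is_between_infinities(1.0e308), is_between_infinities(-1.0e308))
print(inf + 1.0e308 == inf)      # exact: infinity plus finite is infinity
print(1.0 / inf)                 # finite divided by infinity is zero
print(math.isnan(inf - inf))     # invalid operation: result is a NaN
print(math.isnan(0.0 * inf))     # invalid operation: result is a NaN
```

The last two lines correspond to entries marked Invalid Operation in Table 7-9; the first three are exact and signal nothing.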






Table 7-8. Zero Operands and Results

Operation             Operands                        Result
--------------------  ------------------------------  --------------------------------
FLD, FBLD             ±0                              *0
FILD                  +0                              +0
FST, FSTP, FRNDINT    ±0                              *0
                      +X                              +0 (note 1)
                      -X                              -0 (note 1)
FBSTP                 ±0                              *0
FIST, FISTP           ±0                              *0
                      +X                              +0 (note 3)
                      -X                              -0 (note 4)
FCHS                  +0                              -0
                      -0                              +0
FABS                  ±0                              +0
Addition              +0 plus +0                      +0
                      -0 plus -0                      -0
                      +0 plus -0, -0 plus +0          ±0 (note 2)
                      -X plus +X, +X plus -X          ±0 (note 2)
                      ±0 plus ±X, ±X plus ±0          #X
Subtraction           +0 minus -0                     +0
                      -0 minus +0                     -0
                      +0 minus +0, -0 minus -0        ±0 (note 2)
                      +X minus +X, -X minus -X        ±0 (note 2)
                      ±0 minus ±X                     -#X
                      ±X minus ±0                     #X
Multiplication        ±0 x ±0                         ⊕0
                      ±0 x ±X, ±X x ±0                ⊕0
                      +X x +Y, -X x -Y                +0 (note 1)
                      +X x -Y, -X x +Y                -0 (note 1)
Division              ±0 ÷ ±0                         Invalid Operation
                      ±X ÷ ±0                         ⊕∞ (Zero Divide)
                      ±X ÷ ±∞                         ⊕0
                      +0 ÷ +X, -0 ÷ -X                +0
                      +0 ÷ -X, -0 ÷ +X                -0
                      -X ÷ -Y, +X ÷ +Y                +0 (note 1)
                      -X ÷ +Y, +X ÷ -Y                -0 (note 1)
FPREM, FPREM1         ±0 rem ±0                       Invalid Operation
                      ±X rem ±0                       Invalid Operation
                      +0 rem ±X                       +0
                      -0 rem ±X                       -0
                      +X rem ±Y                       +0 (Y exactly divides X)
                      -X rem ±Y                       -0 (Y exactly divides X)
FSQRT                 ±0                              *0
Compare               ±0 : +X                         ±0 < +X
                      ±0 : ±0                         ±0 = ±0
                      ±0 : -X                         ±0 > -X
FTST                  ±0                              ±0 = 0
FXAM                  +0                              C3 = 1; C2 = C1 = C0 = 0
                      -0                              C3 = C1 = 1; C2 = C0 = 0
FSCALE                ±0 scaled by -∞                 *0
                      ±0 scaled by +∞                 Invalid Operation
                      ±0 scaled by X                  *0
FXTRACT               +0                              ST = +0, ST(1) = -∞, Zero divide
                      -0                              ST = -0, ST(1) = -∞, Zero divide
FPTAN                 ±0                              *0
FSIN (or SIN result   ±0                              *0
  of FSINCOS)
FCOS (or COS result   ±0                              +1
  of FSINCOS)
FPATAN                ±0 ÷ +X                         *0
                      ±0 ÷ -X                         *π
                      ±X ÷ ±0                         #π/2
                      ±0 ÷ +0                         *0
                      ±0 ÷ -0                         *π
                      +∞ ÷ ±0                         +π/2
                      -∞ ÷ ±0                         -π/2
                      ±0 ÷ +∞                         *0
                      ±0 ÷ -∞                         *π
F2XM1                 +0                              +0
                      -0                              -0
FYL2X                 ±Y x log(±0)                    Zero Divide
                      ±0 x log(±0)                    Invalid Operation
FYL2XP1               +Y x log(±0+1)                  *0
                      -Y x log(±0+1)                  -*0

NOTES:

X and Y denote nonzero positive operands.

1   When extreme underflow denormalizes the result to zero.
2   Sign determined by rounding mode: + for nearest, up, or chop, - for down.
3   When 0 < X < 1 and rounding mode is not up.
4   When -1 < X < 0 and rounding mode is not down.
*   Sign of original zero operand.
#   Sign of original X operand.
-#  Complement of sign of original X operand.
⊕   Exclusive OR of the signs of the operands.






Table 7-9. Infinity Operands and Results

Operation                  Operands               Result

FLD, FBLD                  ±∞                     *∞
FST, FSTP, FRNDINT         ±∞                     *∞
FCHS                       +∞                     -∞
                           -∞                     +∞
FABS                       ±∞                     +∞
Addition                   +∞ plus +∞             +∞
                           -∞ plus -∞             -∞
                           +∞ plus -∞             Invalid Operation
                           -∞ plus +∞             Invalid Operation
                           ±∞ plus ±X             *∞
                           ±X plus ±∞             *∞
Subtraction                +∞ minus -∞            +∞
                           -∞ minus +∞            -∞
                           +∞ minus +∞            Invalid Operation
                           -∞ minus -∞            Invalid Operation
                           ±∞ minus ±X            *∞
                           ±X minus ±∞            -*∞
Multiplication             ±∞ × ±∞                ⊕∞
                           ±∞ × ±Y, ±Y × ±∞       ⊕∞
                           ±0 × ±∞, ±∞ × ±0       Invalid Operation
Division                   ±∞ ÷ ±∞                Invalid Operation
                           ±∞ ÷ ±X                ⊕∞
                           ±X ÷ ±∞                ⊕0
FPREM, FPREM1              ±∞ rem ±∞              Invalid Operation
                           ±∞ rem ±X              Invalid Operation
                           ±X rem ±∞              $X, Q = 0
FSQRT                      -∞                     Invalid Operation
                           +∞                     +∞







Table 7-9. Infinity Operands and Results (Contd.)

Operation                  Operands               Result

Compare                    +∞ : +∞                +∞ = +∞
                           -∞ : -∞                -∞ = -∞
                           +∞ : -∞                +∞ > -∞
                           -∞ : +∞                -∞ < +∞
                           +∞ : ±X                +∞ > X
                           -∞ : ±X                -∞ < X
                           ±X : +∞                X < +∞
                           ±X : -∞                X > -∞
FTST                       +∞                     +∞ > 0
                           -∞                     -∞ < 0
FSCALE                     ±∞ scaled by -∞        Invalid Operation
                           ±∞ scaled by +∞        *∞
                           ±∞ scaled by ±X        *∞
                           ±0 scaled by -∞        ±0 (note 1)
                           ±0 scaled by +∞        Invalid Operation
                           ±Y scaled by +∞        #∞
                           ±Y scaled by -∞        #0
FXTRACT                    ±∞                     ST = *∞, ST(1) = +∞
FXAM                       +∞                     C0 = C2 = 1; C1 = C3 = 0
                           -∞                     C0 = C1 = C2 = 1; C3 = 0
FPATAN                     ±∞ ÷ ±X                *π/2
                           ±Y ÷ +∞                #0
                           ±Y ÷ -∞                #π
                           ±∞ ÷ +∞                *π/4
                           ±∞ ÷ -∞                *3π/4
                           ±∞ ÷ ±0                *π/2
                           +0 ÷ +∞                +0
                           +0 ÷ -∞                +π
                           -0 ÷ +∞                -0
                           -0 ÷ -∞                -π




Table 7-9. Infinity Operands and Results (Contd.)

Operation                  Operands               Result

F2XM1                      +∞                     +∞
                           -∞                     -1
FYL2X                      ±∞ × log(1)            Invalid Operation
                           ±∞ × log(X>1)          *∞
                           ±∞ × log(0<X<1)        -*∞
                           ±Y × log(+∞)           #∞
                           ±0 × log(+∞)           Invalid Operation
                           ±Y × log(-∞)           Invalid Operation
FYL2XP1                    ±∞ × log(1)            Invalid Operation
                           ±∞ × log(X>0)          #∞
                           ±∞ × log(-1<X<0)       -*∞
                           ±Y × log(+∞)           #∞
                           ±0 × log(+∞)           Invalid Operation
                           ±Y × log(-∞)           Invalid Operation

NOTES:
X   Zero or nonzero, positive, finite operand.
Y   Nonzero positive, finite operand.
*   Sign of original infinity operand.
-*  Complement of sign of original infinity operand.
$   Sign of original operand.
⊕   Exclusive OR of the signs of the operands.
#   Sign of the original Y operand.
1   Sign of original zero operand.



7.1.4. NaN (Not-a-Number)

A NaN (Not a Number) is a member of a class of special values that exists in the real formats
only. A NaN has an exponent of 11..11B, may have either sign, and may have any significand
except 1Δ00..00B, which is assigned to the infinities. A NaN in a register is tagged special.

There are two classes of NaN: signaling (SNaN) and quiet (QNaN). Among the QNaNs, the 
value real indefinite is of special interest. 
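As a rough illustration of these encodings, the sketch below classifies a 64-bit double-real bit pattern. It assumes the double format (11 exponent bits, 52 fraction bits) because that is what Python floats use; the extended format differs in width and has an explicit integer bit. The names `classify` and `bits_of` are hypothetical helpers, not part of any Intel tool.

```python
import struct

EXP_MASK = 0x7FF0000000000000    # 11 exponent bits; all ones for NaN and infinity
FRAC_MASK = 0x000FFFFFFFFFFFFF   # 52 fraction bits
QUIET_BIT = 0x0008000000000000   # most significant fraction bit

def bits_of(x: float) -> int:
    """Return the 64-bit pattern of a Python float (an IEEE double)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def classify(bits: int) -> str:
    """Classify a double-real bit pattern as infinity, SNaN, QNaN, or neither."""
    if (bits & EXP_MASK) != EXP_MASK:
        return "not special"
    fraction = bits & FRAC_MASK
    if fraction == 0:
        return "infinity"          # zero fraction is reserved for the infinities
    return "QNaN" if fraction & QUIET_BIT else "SNaN"
```

For instance, `classify(0xFFF8000000000000)` reports a QNaN; that pattern is the double-real form of real indefinite (negative sign, fraction 100..00).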






7.1.4.1. SIGNALING NANS 

A signaling NaN is a NaN that has a zero as the most significant bit of its fraction. The rest of 
the significand may be set to any value. The FPU never generates a signaling NaN as a result; 
however, it recognizes signaling NaNs when they appear as operands. Arithmetic operations 
(as defined at the beginning of this chapter) on a signaling NaN cause an invalid-operation 
exception (except for load operations from the stack, FXCH, FCHS, and FABS). 

By unmasking the invalid operation exception, the programmer can use signaling NaNs to trap 
to the exception handler. The generality of this approach and the large number of NaN values 
that are available provide the sophisticated programmer with a tool that can be applied to a 
variety of special situations. 

For example, a compiler could use signaling NaNs as references to uninitialized (real) array 
elements. The compiler could preinitialize each array element with a signaling NaN whose 
significand contained the index (relative position) of the element. If an application program 
attempted to access an element that it had not initialized, it would use the NaN placed there by 
the compiler. If the invalid operation exception were unmasked, an interrupt would occur, and 
the exception handler would be invoked. The exception handler could determine which 
element had been accessed, since the operand address field of the exception pointers would 
point to the NaN, and the NaN would contain the index number of the array element. 
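The array-index idea can be sketched on raw bit patterns. This is only an illustration under stated assumptions: it uses the 64-bit double-real layout, stores `index + 1` so that the fraction is never zero (a zero fraction would encode infinity), and the helper names are hypothetical. The patterns are kept as integers because converting an SNaN to a native float may quiet it on some hardware.

```python
EXP_ALL_ONES = 0x7FF << 52        # exponent field of every NaN (double-real)
QUIET_BIT = 1 << 51               # most significant fraction bit; 0 for an SNaN
PAYLOAD_MASK = QUIET_BIT - 1      # remaining 51 fraction bits

def snan_for_index(index: int) -> int:
    """Build a signaling-NaN bit pattern that carries an array index."""
    payload = index + 1           # avoid an all-zero fraction
    if not 0 < payload <= PAYLOAD_MASK:
        raise ValueError("index does not fit in the NaN payload")
    return EXP_ALL_ONES | payload

def index_from_nan(bits: int) -> int:
    """Recover the array index from a NaN produced by snan_for_index."""
    return (bits & PAYLOAD_MASK) - 1

# Pre-initialize a hypothetical 4-element array with index-carrying SNaNs.
uninitialized = [snan_for_index(i) for i in range(4)]
```

An exception handler that found one of these patterns at the operand address could call `index_from_nan` to learn which element was read before being initialized.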

7.1.4.2. QUIET NANS 

A quiet NaN is a NaN that has a one as the most significant bit of its fraction. The processor
creates the quiet NaN real indefinite (defined below) as its default response to certain
exceptional conditions. The processor may derive other QNaNs by converting an SNaN. The
processor converts an SNaN by setting the most significant bit of its fraction to one, thereby
generating a QNaN. The remaining bits of the significand are not changed; therefore,
diagnostic information that may be stored in these bits of the SNaN is propagated into the
QNaN.

The processor will generate the special QNaN, real indefinite, as its masked response to an
invalid operation exception. This NaN is signed negative; its significand is encoded 1Δ100..00.
All other NaNs represent values created by programmers or derived from values created by 
programmers. 

Both quiet and signaling NaNs are supported in all operations. A QNaN is generated as the 
masked response for invalid-operation exceptions and as the result of an operation in which at 
least one of the operands is a QNaN. The processor applies the rules shown in Table 7-10 
when generating a QNaN. 






Table 7-10. Rules for Generating QNaNs

Operation                                        Action

Real operation on an SNaN and a QNaN.            Deliver the QNaN operand.
Real operation on two SNaNs.                     Deliver the QNaN that results from converting
                                                 the SNaN that has the larger significand.
Real operation on two QNaNs.                     Deliver the QNaN that has the larger significand.
Real operation on an SNaN and another number.    Deliver the QNaN that results from converting
                                                 the SNaN.
Real operation on a QNaN and another number.     Deliver the QNaN.
Invalid operation that does not involve NaNs.    Deliver the default QNaN real indefinite.



Note that handling of a QNaN operand has greater priority than all exceptions except certain 
invalid-operation exceptions (refer to the section "Exception Priority" in this chapter). 
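The rules of Table 7-10 can be modeled on bit patterns as follows. This is a sketch, not the processor's implementation: it assumes the double-real layout, assumes the caller has already decided that a no-NaN operation is invalid, and resolves a tie between equal significands in favor of the first operand, which the table does not specify. All names are hypothetical.

```python
EXP_ALL_ONES = 0x7FF << 52
FRAC_MASK = (1 << 52) - 1
QUIET_BIT = 1 << 51
REAL_INDEFINITE = 0xFFF8000000000000   # negative sign, fraction 100..00

def is_nan(bits: int) -> bool:
    return (bits & EXP_ALL_ONES) == EXP_ALL_ONES and (bits & FRAC_MASK) != 0

def is_snan(bits: int) -> bool:
    return is_nan(bits) and not bits & QUIET_BIT

def quiet(bits: int) -> int:
    """Convert an SNaN to a QNaN; the other fraction bits are preserved."""
    return bits | QUIET_BIT

def masked_nan_result(a: int, b: int) -> int:
    """Pick the QNaN delivered for operands a and b, following Table 7-10."""
    nans = [x for x in (a, b) if is_nan(x)]
    if not nans:
        return REAL_INDEFINITE          # invalid operation without NaN operands
    if len(nans) == 1:
        return quiet(nans[0])           # the only NaN, quieted if it was an SNaN
    if is_snan(a) != is_snan(b):
        return b if is_snan(a) else a   # one SNaN, one QNaN: deliver the QNaN
    larger = max(a, b, key=lambda x: x & FRAC_MASK)
    return quiet(larger)                # two of a kind: larger significand wins
```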

Quiet NaNs could be used, for example, to speed up debugging. In its early testing phase, a 
program often contains multiple errors. An exception handler could be written to save 
diagnostic information in memory whenever it was invoked. After storing the diagnostic data, 
it could supply a quiet NaN as the result of the erroneous instruction, and that NaN could point 
to its associated diagnostic area in memory. The program would then continue, creating a 
different NaN for each error. When the program ended, the NaN results could be used to 
access the diagnostic data saved at the time the errors occurred. Many errors could thus be 
diagnosed and corrected in one test run. 

In embedded applications which use computed results in further computations, an undetected 
QNaN can invalidate all subsequent results. Such applications should therefore periodically 
check for QNaNs and provide a recovery mechanism to be used if a QNaN result is detected. 



7.1.5. Indefinite 

For each numeric data type, one unique encoding is reserved for representing the special value 
indefinite. The processor produces this encoding as its response to a masked invalid-operation 
exception. 

In the case of reals, the indefinite value is a QNaN as discussed in the prior section. 

Packed decimal indefinite may be stored with a FBSTP instruction; attempting to use this 
encoding in a FBLD instruction, however, will have an undefined result; thus indefinite cannot 
be loaded from a packed decimal integer. 






In the binary integers, the same encoding may represent either indefinite or the largest negative
number supported by the format (-2^15, -2^31, or -2^63). The processor will store this encoding
as its masked response to an invalid operation, or when the value in a source register represents 
or rounds to the largest negative integer representable by the destination. In situations where its 
origin may be ambiguous, the invalid-operation exception flag can be examined to see if the 
value was produced by an exception response. When this encoding is loaded or used by an 
integer arithmetic or compare operation, it is always interpreted as a negative number; thus, 
indefinite cannot be loaded from a binary integer. 
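A short sketch of why the encoding is ambiguous: the integer indefinite pattern is simply the sign bit set with all other bits clear, which a two's-complement reader interprets as the most negative value. The helper names below are hypothetical.

```python
def integer_indefinite(width: int) -> int:
    """Bit pattern of integer indefinite for a 16-, 32-, or 64-bit integer."""
    return 1 << (width - 1)            # sign bit set, all other bits clear

def as_signed(pattern: int, width: int) -> int:
    """Interpret an unsigned bit pattern as a two's-complement integer."""
    if pattern >> (width - 1):
        return pattern - (1 << width)
    return pattern

for width in (16, 32, 64):
    pattern = integer_indefinite(width)
    print(f"{width}-bit: {pattern:#x} reads back as {as_signed(pattern, width)}")
```

Because the stored pattern alone cannot tell the two cases apart, a program has to consult the invalid-operation flag, as the text describes.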



7.1.6. Encoding of Data Types

Table 7-2 through Table 7-5 show how each of the special values just described is encoded for 
each of the numeric data types. In these tables, the least-significant bits are shown to the right 
and are stored in the lowest memory addresses. The sign bit is always the left-most bit of the 
highest-addressed byte. 

7.1.6.1. UNSUPPORTED FORMATS

The extended format permits many bit patterns that do not fall into any of the previously 
mentioned categories. Table 7-6 shows these unsupported formats. Some of these encodings 
were supported by the Intel287 math coprocessor; however, most of them are not supported by 
the Intel387, Intel486, and Pentium FPUs. These changes are required due to changes made in 
the final version of IEEE Std 754 that eliminated these data types. 

The categories of encodings formerly known as pseudo-NaNs, pseudoinfinities, and unnormal
numbers are not supported. The Intel387, Intel486, and Pentium FPUs raise the invalid-operation
exception when they are encountered as operands.

The encodings formerly known as pseudodenormal numbers are not generated by the Pentium
processor; however, they are correctly utilized when encountered as operands. The exponent is
treated as if it were 00..01 and the mantissa is unchanged. The denormal exception is raised.



7.1.7. Numeric Exceptions 

The FPU can recognize six classes of numeric exception conditions while executing numeric 
instructions: 

1. I - Invalid operation

   - Stack fault

   - IEEE standard invalid operation

2. Z - Divide-by-zero

3. D - Denormalized operand

4. O - Numeric overflow

5. U - Numeric underflow

6. P - Inexact result (precision)



7.1.8. Handling Numeric Exceptions 

When numeric exceptions occur, the processor takes one of two possible courses of action: 

• The FPU can itself handle the exception, producing the most reasonable result and 
allowing numeric program execution to continue undisturbed. 

• A software exception handler can be invoked to handle the exception. 

Each of the six exception conditions described above has a corresponding flag bit in the FPU
status word and a mask bit in the FPU control word. If an exception is masked (the
corresponding mask bit in the control word = 1), the processor takes an appropriate default
action and continues with the computation. If the exception is unmasked (mask = 0), a
software exception handler is invoked immediately before execution of the next WAIT or
floating-point instruction other than FNINIT, FNCLEX, FNSTSW, FNSTSW AX, FNSTCW,
FNSTENV, or FNSAVE. Depending on the value of the NE bit of the CR0 control register, the
exception handler is invoked either through interrupt vector 16 (NE = 1) or through
an external interrupt (NE = 0).

Note that when exceptions are masked, the FPU may detect multiple exceptions in a single 
instruction, because it continues executing the instruction after performing its masked 
response. For example, the FPU could detect a denormalized operand, perform its masked 
response to this exception, and then detect an underflow. 
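The relationship between flag bits and mask bits can be sketched with plain integers. The sketch assumes the usual layout in which the six exception flags occupy bits 0 through 5 of the status word and the corresponding masks occupy bits 0 through 5 of the control word; `unmasked_pending` is a hypothetical helper, not an FPU instruction.

```python
# Exception bits, identical positions in the status word (flags)
# and in the control word (masks).
FLAGS = {"I": 0x01, "D": 0x02, "Z": 0x04, "O": 0x08, "U": 0x10, "P": 0x20}

def unmasked_pending(status_word: int, control_word: int) -> list:
    """Return the exceptions that are flagged and whose mask bit is 0."""
    pending = status_word & ~control_word & 0x3F
    return [name for name, bit in FLAGS.items() if pending & bit]

ALL_MASKED = 0x037F                      # control word value after FINIT
status = FLAGS["D"] | FLAGS["U"]         # denormal operand, then underflow

print(unmasked_pending(status, ALL_MASKED))                 # nothing to handle
print(unmasked_pending(status, ALL_MASKED & ~FLAGS["U"]))   # underflow unmasked
```

With every exception masked, both flags are simply recorded; clearing the underflow mask makes that one condition a candidate for the software handler.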

7.1.8.1. AUTOMATIC EXCEPTION HANDLING

The processor has a default fix-up activity for every possible exception condition it may 
encounter. These masked-exception responses are designed to be safe and are generally 
acceptable for most numeric applications. 

As an example of how even severe exceptions can be handled safely and automatically using
the default exception responses, consider a calculation of the parallel resistance of several
values using only the standard formula (Figure 7-1). If R1 becomes zero, the circuit resistance
becomes zero. With the divide-by-zero and precision exceptions masked, the processor will
produce the correct result: the division 1/R1 returns infinity, the sum of the reciprocals
remains infinity, and the reciprocal of that sum is zero.







$$\text{EQUIVALENT RESISTANCE} = \frac{1}{\dfrac{1}{R_1} + \dfrac{1}{R_2} + \dfrac{1}{R_3}}$$



Figure 7-1. Arithmetic Example Using Infinity 
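The parallel-resistance example can be imitated in a high-level language. Python raises an error on float division by zero instead of returning infinity, so the sketch below supplies a hypothetical `masked_divide` helper that mimics the masked zero-divide response; it is an approximation of the behavior described here, not FPU code.

```python
import math

def masked_divide(a: float, b: float) -> float:
    """Divide a by b, imitating the masked responses for a zero divisor."""
    if b == 0.0:
        if a == 0.0 or math.isnan(a):
            return math.nan            # 0/0 is an invalid operation instead
        sign = math.copysign(1.0, a) * math.copysign(1.0, b)
        return math.copysign(math.inf, sign)
    return a / b

def parallel_resistance(*resistances: float) -> float:
    """Equivalent resistance 1 / (1/R1 + 1/R2 + ...)."""
    total = sum(masked_divide(1.0, r) for r in resistances)
    return masked_divide(1.0, total)

print(parallel_resistance(0.0, 100.0, 200.0))   # a shorted branch
print(parallel_resistance(100.0, 100.0))
```

With one resistance equal to zero, the intermediate infinity propagates through the sum and the final reciprocal yields zero, so no special-case code is needed in the formula itself.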

By masking or unmasking specific numeric exceptions in the FPU control word, programmers 
can delegate responsibility for most exceptions to the processor, reserving the most severe 
exceptions for programmed exception handlers. Exception-handling software is often difficult 
to write, and the masked responses have been tailored to deliver the most reasonable result for 
each condition. For the majority of applications, masking all exceptions yields satisfactory 
results with the least programming effort. Certain exceptions can usefully be left unmasked 
during the debugging phase of software development, and then masked when the clean 
software is actually run. An invalid-operation exception, for example, typically indicates a
program error that must be corrected.

The exception flags in the FPU status word provide a cumulative record of exceptions that 
have occurred since these flags were last cleared. Once set, these flags can be cleared only by 
executing the FCLEX (clear exceptions) instruction, by reinitializing the FPU with FINIT, or 
by overwriting the flags with an FRSTOR or FLDENV instruction. This allows a programmer 
to mask all exceptions, run a calculation, and then inspect the status word to see if any 
exceptions were detected at any point in the calculation. 

7.1.8.2. SOFTWARE EXCEPTION HANDLING

If the Pentium or Intel486 FPU encounters an unmasked exception condition, a software
exception handler is invoked immediately before execution of the next WAIT or non-control
floating-point instruction. The exception handler is invoked either through interrupt vector 16
or through an external interrupt, depending on the value of the NE bit of the CR0 control
register.

If NE = 1, an unmasked floating-point exception results in interrupt 16, immediately before the
execution of the next non-control floating-point or WAIT instruction. Interrupt 16 is an
operating-system call that invokes the exception handler. Chapter 14 contains a general
discussion of exceptions and interrupts.

If NE = 0 (and the IGNNE# input is inactive), an unmasked floating-point exception causes the
processor to freeze immediately before executing the next non-control floating-point or WAIT
instruction. The frozen processor waits for an external interrupt, which must be supplied by
external hardware in response to the FERR# output of the processor. (Regardless of the value
of NE, an unmasked numerical exception causes the FERR# output to be activated.) In this
case, the external interrupt invokes the exception-handling routine. If NE = 0 but the IGNNE#
input is active, the processor disregards the exception and continues. Error reporting via
external interrupt is supported for DOS compatibility. Chapter 23 contains further discussion
of compatibility issues.

If the Intel387 NPX encounters an unmasked exception condition, it signals the exception to 
the Intel386 CPU using the ERROR# status line between the two processors. See Chapter 23 
for differences in FPU exception handling. 

The exception-handling routine is normally a part of the systems software. Typical exception 
responses may include: 

• Incrementing an exception counter for later display or printing 

• Printing or displaying diagnostic information (e.g., the FPU environment and registers) 

• Aborting further execution, or using the exception pointers to build an instruction that will 
run without exception and executing it 

Applications programmers should consult their operating system's reference manuals for the 
appropriate system response to numerical exceptions. For systems programmers, some details 
on writing software exception handlers are provided in Chapter 14. 



7.1.9. Invalid Operation

This exception may occur in response to two general classes of operations:

1. Stack operations

2. Arithmetic operations

The stack flag (SF) of the status word indicates which class of operation caused the exception.
When SF is 1, a stack operation has resulted in stack overflow or underflow; when SF is 0, an
arithmetic instruction has encountered an invalid operand.

7.1.9.1. STACK EXCEPTION

When SF is 1, indicating a stack operation, the O/U# bit of the condition code (bit C1)
distinguishes between stack overflow and underflow as follows:

O/U# = 1    Stack overflow: an instruction attempted to push down a nonempty stack
            location.

O/U# = 0    Stack underflow: an instruction attempted to read an operand from an empty
            stack location.
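The decoding just described can be sketched as a small function over a saved status word. The bit positions used (IE in bit 0, SF in bit 6, C1 in bit 9) follow the usual status word layout and are stated here as an assumption of the sketch; `describe_invalid` is a hypothetical name.

```python
IE = 0x0001   # invalid-operation flag
SF = 0x0040   # stack flag
C1 = 0x0200   # condition code bit C1, doubling as O/U# after a stack fault

def describe_invalid(status_word: int) -> str:
    """Say which kind of invalid operation a status word records."""
    if not status_word & IE:
        return "no invalid operation"
    if not status_word & SF:
        return "invalid arithmetic operand"
    return "stack overflow" if status_word & C1 else "stack underflow"

print(describe_invalid(IE | SF | C1))
print(describe_invalid(IE | SF))
```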




When the invalid-operation exception is masked, the FPU returns the QNaN indefinite. This 
value overwrites the destination register, destroying its original contents. 

When the invalid-operation exception is not masked, an exception handler is invoked. TOP is 
not changed, and the source operands remain unaffected. 

7.1.9.2. INVALID ARITHMETIC OPERATION

This class includes the invalid operations defined in IEEE Std 854. The FPU reports an invalid 
operation in any of the cases shown in Table 7-11. Also shown in this table are the FPU's 
responses when the invalid exception is masked. When unmasked, an exception handler is 
invoked, and the operands remain unaltered. An invalid operation generally indicates a 
program error. 



Table 7-11. Masked Responses to Invalid Operations

Condition                                             Masked Response

Any arithmetic operation on an unsupported            Return the QNaN indefinite.
format.
Any arithmetic operation on a signaling NaN.          Return a QNaN (refer to the section "Rules
                                                      for Generating QNaNs").
Compare and test operations: one or both              Set condition codes "not comparable."
operands is a NaN.
Addition of opposite-signed infinities or             Return the QNaN indefinite.
subtraction of like-signed infinities.
Multiplication: ∞ × 0; or 0 × ∞.                      Return the QNaN indefinite.
Division: ∞ ÷ ∞; or 0 ÷ 0.                            Return the QNaN indefinite.
Remainder instructions FPREM, FPREM1 when             Return the QNaN indefinite; set C2 = 0.
modulus (divisor) is zero or dividend is ∞.
Trigonometric instructions FCOS, FPTAN, FSIN,         Return the QNaN indefinite; set C2 = 0.
FSINCOS when argument is ∞.
FSQRT of negative operand (except FSQRT               Return the QNaN indefinite.
(-0) = -0), FYL2X of negative operand (except
FYL2X (-0) = -∞), FYL2XP1 of operand more
negative than -1.
FIST(P) instructions when source register is          Store integer indefinite.
empty, a NaN, ∞, or exceeds representable
range of destination.
FBSTP instruction when source register is             Store packed decimal indefinite.
empty, a NaN, ∞, or exceeds 18 decimal digits.
FXCH instruction when one or both registers           Change empty registers to the QNaN indefinite
are tagged empty.                                     and then perform exchange.



7.1.10. Division by Zero

If an instruction attempts to divide a finite nonzero operand by zero, the FPU will report a
zero-divide exception. This is possible for F(I)DIV(R)(P) as well as the other instructions that
perform division internally: FYL2X and FXTRACT. The masked response for FDIV is to
return an infinity signed with the exclusive OR of the signs of the two operands. FYL2X returns
an infinity signed with the opposite sign of the nonzero operand. For FXTRACT, ST(1) is set
to -∞; ST is set to zero with the same sign as the original operand. If the divide-by-zero
exception is unmasked, an exception handler is invoked; the operands remain unaltered.
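The three masked responses can be summarized in a few lines. This is a behavioral sketch with hypothetical function names; each function returns only the value the text says is delivered and does not model the flags.

```python
import math

def fdiv_masked(dividend: float, zero: float) -> float:
    """Nonzero finite dividend over zero: infinity with the XOR of the signs."""
    sign = math.copysign(1.0, dividend) * math.copysign(1.0, zero)
    return math.copysign(math.inf, sign)

def fyl2x_masked(y: float) -> float:
    """Y * log2(zero): infinity with the opposite sign of the nonzero operand Y."""
    return math.copysign(math.inf, -y)

def fxtract_masked(zero: float) -> tuple:
    """FXTRACT of zero: returns (ST, ST(1)) = (zero with original sign, -infinity)."""
    return zero, -math.inf
```

For example, `fdiv_masked(3.0, -0.0)` yields negative infinity because the signs of the operands differ.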



7.1.11. Denormal Operand

If an arithmetic instruction attempts to operate on a denormal operand, the FPU reports the
denormal-operand exception. Denormal operands may have reduced significance due to lost
low-order bits; therefore, it may be advisable in certain applications to preclude operations on
these operands. This can be accomplished by an exception handler that responds to unmasked
denormal operand exceptions. Most users will mask this exception so that computation may
proceed; any loss of accuracy can then be analyzed by the user when the final result is delivered.

When this exception is masked, the FPU sets the DE-bit in the status word, then proceeds with 
the instruction. Gradual underflow and denormal numbers will produce results at least as good 
as, and often better than what could be obtained from a machine that flushes underflows to 
zero. In fact, a denormal operand in single- or double-precision format will be normalized to 
the extended-real format when loaded into the FPU. Subsequent operations will benefit from 
the additional precision of the extended-real format used internally. 
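The benefit of gradual underflow over flushing to zero can be seen with two nearby small numbers. The sketch uses Python floats, which are double-real, so it illustrates the principle in the double format only; `is_denormal` is a hypothetical helper.

```python
import sys

SMALLEST_NORMAL = sys.float_info.min        # 2**-1022 for double-real

def is_denormal(x: float) -> bool:
    """True if x is nonzero but below the smallest normal magnitude."""
    return x != 0.0 and abs(x) < SMALLEST_NORMAL

a = 1.5 * SMALLEST_NORMAL
b = 1.25 * SMALLEST_NORMAL
difference = a - b                          # 0.25 * 2**-1022, a denormal

print(a != b, difference != 0.0, is_denormal(difference))
```

With denormals, `a != b` guarantees `a - b != 0`; a machine that flushed the difference to zero would break that property and could, for instance, turn a guarded division into a divide by zero.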

When this exception is not masked, the DE-bit is set and the exception handler is invoked. The 
operands are not changed by the instruction and are available for inspection by the exception 
handler. 

The Pentium FPU, Intel486 FPU, and Intel387 math coprocessors handle denormal values 
differently than the 8087 and Intel287 math coprocessors. This change is due to revisions
made to the IEEE standard before it was approved. The difference in operation occurs when the
denormal exception is masked. The Pentium FPU, Intel486 FPU, and Intel387 math 
coprocessors will automatically normalize denormals. The 8087 and Intel287 math 
coprocessors will generate a denormal result. 

The difference in denormal handling is usually not an issue. The denormal operand exception 
is normally masked for the Intel387, Intel486, and Pentium FPUs. For programs that also run 
on an Intel287 math coprocessor, the denormal exception is often unmasked and an exception 
handler is provided to normalize any denormal values. Such an exception handler is redundant 
for the Pentium, Intel486 and Intel387 DX FPUs. The default exception handler should be 
used. See Chapter 23 for more information on the handling of exceptions by the various Intel 
architectures. 

A program can detect at run-time whether it is running on a Pentium, Intel486, or Intel387
FPU or the older 8087/Intel287 math coprocessors. See Chapter 5 for example code sequences 
to determine the presence of 8087/Intel287 and Intel387 math coprocessors, as well as 
processor type. This example can be used to selectively mask the denormal exception for 
Intel387 DX, Intel486 or Pentium FPUs. A denormal exception handler should also be 
provided to support 8087/Intel287 math coprocessors. This code example can also be used to 
set a flag to allow use of new instructions added to the Intel387, Intel486, and Pentium FPUs 
beyond the instructions of the 8087/Intel287 math coprocessors. 






7.1.12. Numeric Overflow and Underflow

If the exponent of a numeric result is too large for the destination real format, the FPU signals
a numeric overflow. Conversely, if the exponent of a result is too small to be represented in the
destination format, a numeric underflow is signaled. If either of these exceptions occurs, the
result of the operation is outside the range of the destination real format.

Typical algorithms are most likely to produce extremely large and small numbers in the 
calculation of intermediate, rather than final, results. Because of the great range of the 
extended-precision format, overflow and underflow are relatively rare events in most 
numerical applications. 

7.1.12.1. OVERFLOW 

The overflow exception can occur whenever the rounded true result would exceed in 
magnitude the largest finite number in the destination format. The exception can occur in the 
execution of most of the arithmetic instructions and in some of the conversion instructions; 
namely, FST(P), F(I)ADD(P), F(I)SUB(R)(P), F(I)MUL(P), FDIV(R)(P), FSCALE, FYL2X, 
and FYL2XP1. 

The response to an overflow condition depends on whether the overflow exception is masked: 

• Overflow exception masked. The value returned depends on the rounding mode as 
Table 7-12 illustrates. 

• Overflow exception not masked. The unmasked response depends on whether the 
instruction is supposed to store the result on the stack or in memory: 

- If the destination is the stack, then the true result is divided by 2^24,576 and rounded. (The
  bias 24,576 is equal to 3 × 2^13.) The significand is rounded to the appropriate
  precision (according to the precision control (PC) field of the control word, for those
  instructions controlled by PC, otherwise to extended precision). The roundup bit (C1)
  of the status word is set if the significand was rounded upward. The biasing of the
  exponent by 24,576 normally translates the number as nearly as possible to the middle
  of the exponent range so that, if desired, it can be used in subsequent scaled operations
  with less risk of causing further exceptions. With the instruction FSCALE, however, it
  can happen that the result is too large and overflows even after biasing. In this case,
  the unmasked response is exactly the same as the masked round-to-nearest response,
  namely ± infinity. The intention of this feature is to ensure the trap handler will
  discover that a translation of the exponent by -24,576 would not work correctly
  without obliging the programmer of Decimal-to-Binary or Exponential functions to
  determine which trap handler, if any, should be invoked.

- If the destination is memory (this can occur only with the store instructions), then no
  result is stored in memory. Instead, the operand is left intact in the stack. Because the
  data in the stack is in extended-precision format, the exception handler has the option
  either of reexecuting the store instruction after proper adjustment of the operand or of
  rounding the significand on the stack to the destination's precision as the standard
  requires. The exception handler should ultimately store a value into the destination
  location in memory if the program is to continue.



Table 7-12. Masked Overflow Results

Rounding Mode      Sign of True Result      Result

To nearest         +                        +∞
                   -                        -∞
Toward -∞          +                        Largest finite positive number
                   -                        -∞
Toward +∞          +                        +∞
                   -                        Largest finite negative number
Toward zero        +                        Largest finite positive number
                   -                        Largest finite negative number
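The masked overflow results follow directly from the direction of rounding, which the sketch below expresses as a lookup. It uses the double-real maximum from `sys.float_info` as a stand-in for "largest finite number" in whatever the destination format is; the mode names are labels chosen for this sketch, not control word encodings.

```python
import math
import sys

LARGEST = sys.float_info.max    # largest finite double-real magnitude

def masked_overflow_result(rounding_mode: str, negative: bool) -> float:
    """Value delivered for a masked overflow, by rounding mode and sign."""
    if rounding_mode == "nearest":
        return -math.inf if negative else math.inf
    if rounding_mode == "down":              # toward minus infinity
        return -math.inf if negative else LARGEST
    if rounding_mode == "up":                # toward plus infinity
        return -LARGEST if negative else math.inf
    if rounding_mode == "zero":              # chop
        return -LARGEST if negative else LARGEST
    raise ValueError(f"unknown rounding mode: {rounding_mode}")
```

The pattern to notice is that infinity is delivered only when rounding is allowed to move the result away from zero in the direction of the true result's sign.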



7.1.12.2. UNDERFLOW 

Underflow can occur in the execution of the instructions FST(P), FADD(P), FSUB(RP),
FMUL(P), F(I)DIV(RP), FSCALE, FPREM(1), FPTAN, FSIN, FSINCOS, FPATAN, F2XM1,
FYL2X, and FYL2XP1.

Two related events contribute to underflow: 

1. Creation of a tiny (denormal) result which, because it is so small, may cause some other
exception later (such as overflow upon division).

2. Creation of an inexact result; i.e. the delivered result differs from what would have been 
computed were both the exponent range and precision unbounded. 

Which of these events triggers the underflow exception depends on whether the underflow 
exception is masked: 

1. Underflow exception masked. The underflow exception is signaled when the result is both
tiny and inexact.

2. Underflow exception not masked. The underflow exception is signaled when the result is 
tiny, regardless of inexactness. 
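The two signaling rules reduce to a small truth table. In the sketch below, `tiny` and `inexact` are inputs that the caller is assumed to have determined already, and the function name is hypothetical; the precision (P) flag is included because an inexact result sets it in either case.

```python
def underflow_flags(tiny: bool, inexact: bool, underflow_masked: bool) -> set:
    """Return the subset of {'U', 'P'} flagged for a result."""
    flags = set()
    if inexact:
        flags.add("P")
    if tiny and (inexact or not underflow_masked):
        flags.add("U")        # masked: tiny and inexact; unmasked: tiny alone
    return flags

print(underflow_flags(tiny=True, inexact=False, underflow_masked=True))
print(underflow_flags(tiny=True, inexact=False, underflow_masked=False))
```

An exact denormal result therefore passes silently when underflow is masked but reaches the handler when it is unmasked.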

The response to an underflow exception also depends on whether the exception is masked: 

1. Masked response. The result is denormal or zero. The precision exception is also 
triggered. 

2. Unmasked response. The unmasked response depends on whether the instruction is
supposed to store the result on the stack or in memory:

- If the destination is the stack, then the true result is multiplied by 2^24,576 and rounded.
  (The bias 24,576 is equal to 3 × 2^13.) The significand is rounded to the appropriate
  precision (according to the precision control (PC) field of the control word, for those
  instructions controlled by PC, otherwise to extended precision). The roundup bit (C1)
  of the status word is set if the significand was rounded upward.

  The biasing of the exponent by 24,576 normally translates the number as nearly as
  possible to the middle of the exponent range so that, if desired, it can be used in
  subsequent scaled operations with less risk of causing further exceptions. With the
  instruction FSCALE, however, it can happen that the result is too tiny and underflows
  even after biasing. In this case, the unmasked response is exactly the same as the
  masked round-to-nearest response, namely ±0. The intention of this feature is to
  ensure the trap handler will discover that a translation by +24,576 would not work
  correctly without obliging the programmer of Decimal-to-Binary or Exponential
  functions to determine which trap handler, if any, should be invoked.

- If the destination is memory (this can occur only with the store instructions), then no
  result is stored in memory. Instead, the operand is left intact in the stack. Because the
  data in the stack is in extended-precision format, the exception handler has the option
  either of reexecuting the store instruction after proper adjustment of the operand or of
  rounding the significand on the stack to the destination's precision as the standard
  requires. The exception handler should ultimately store a value into the destination
  location in memory if the program is to continue.
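The exponent adjustment used by the unmasked overflow and underflow responses can be checked with integer arithmetic on exponents alone. The sketch assumes the extended-real normal exponent range of -16382 to +16383 and works on unbiased exponents; the function names are hypothetical.

```python
BIAS_ADJUST = 3 * 2**13            # 24,576
E_MIN, E_MAX = -16382, 16383       # normal exponent range, extended-real format

def adjusted_exponent(true_exponent: int) -> int:
    """Exponent delivered to the stack after the unmasked-response scaling."""
    if true_exponent > E_MAX:
        return true_exponent - BIAS_ADJUST     # overflow: divide by 2**24576
    if true_exponent < E_MIN:
        return true_exponent + BIAS_ADJUST     # underflow: multiply by 2**24576
    return true_exponent

def still_out_of_range(true_exponent: int) -> bool:
    """True when even the adjusted exponent does not fit (possible with FSCALE)."""
    return not E_MIN <= adjusted_exponent(true_exponent) <= E_MAX

print(adjusted_exponent(20000), adjusted_exponent(-20000))
print(still_out_of_range(50000))
```

An overflowed exponent of 20000 lands at -4576, comfortably inside the range, whereas an FSCALE result with exponent 50000 remains out of range after adjustment, which is the case where infinity is delivered instead.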



7.1.13. Inexact (Precision)

This exception condition occurs if the result of an operation is not exactly representable in the 
destination format. For example, the fraction 1/3 cannot be precisely represented in binary 
form. This exception occurs frequently and indicates that some (generally acceptable) accuracy 
has been lost. 
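The 1/3 example can be verified with exact rational arithmetic. The sketch compares the rounded double-real quotient against the true fraction; it illustrates inexactness in the double format only, and `is_exact` is a hypothetical helper.

```python
from fractions import Fraction

def is_exact(numerator: int, denominator: int) -> bool:
    """True if numerator/denominator is exactly representable as a double."""
    rounded = Fraction(numerator / denominator)   # exact value of the float
    return rounded == Fraction(numerator, denominator)

print(is_exact(1, 3))   # 1/3 has no finite binary expansion
print(is_exact(1, 4))   # 1/4 is a power of two
```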

By their nature, the transcendental instructions cause the inexact exception for their core cases. 
Table 7-13 lists the core cases for each of the transcendental instructions. 



Table 7-13. Transcendental Core Ranges

Instruction      Core Range

FSIN             |θ| < 2^63
FCOS             |θ| < 2^63
FSINCOS          |θ| < 2^63
FPTAN            |θ| < 2^63
FPATAN           no restriction
F2XM1            -1 < X < 1
FYL2X*           X > 0
FYL2XP1*         -(1 - (√2/2)) < ST < √2 - 1

NOTES:
*   For these 2-operand instructions, Y should be normal for the core cases.



The C1 (roundup) bit of the status word indicates whether the inexact result was rounded up
(C1 = 1) or chopped (C1 = 0).




The inexact exception accompanies the underflow exception when there is also a loss of 
accuracy. When underflow is masked, the underflow exception is signaled only when there is a 
loss of accuracy; therefore the precision flag is always set as well. When underflow is 
unmasked, there may or may not have been a loss of accuracy; the precision bit indicates 
which is the case. 

This exception is provided for applications that need to perform exact arithmetic only. Most 
applications will mask this exception. The FPU delivers the rounded or over/underflowed 
result to the destination, regardless of whether a trap occurs. 



7.1.14. Exception Priority

The processor deals with exceptions according to a predetermined precedence. Precedence in 
exception handling means that higher-priority exceptions are flagged and results are delivered 
according to the requirements of that exception. Lower-priority exceptions may not be flagged 
even if they occur. For example, dividing an SNaN by zero causes an invalid-operand 
exception (due to the SNaN) and not a zero-divide exception; the masked result is the QNaN 
real indefinite, not ∞. A denormal or inexact (precision) exception, however, can accompany a 
numeric underflow or overflow exception. 

The precedence among numeric exceptions is as follows: 

1. Invalid operation exception, subdivided as follows: 

a. Stack underflow. 

b. Stack overflow. 

c. Operand of unsupported format. 

d. SNaN operand. 

2. QNaN operand. Though this is not an exception, if one operand is a QNaN, dealing with it 
has precedence over lower-priority exceptions. For example, a QNaN divided by zero 
results in a QNaN, not a zero-divide exception. 

3. Any other invalid-operation exception not mentioned above or zero divide. 

4. Denormal operand. If masked, then instruction execution continues, and a lower-priority 
exception can occur as well. 

5. Numeric overflow and underflow. Inexact result (precision) can be flagged as well. 

6. Inexact result (precision). 
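
The precedence list above can be read as an ordered lookup. The following Python sketch is illustrative only: the string labels and the function name `governing_condition` are hypothetical, not FPU mnemonics. It returns only the condition that governs the delivered result; as the list notes, a denormal or inexact condition may still be flagged alongside a lower-priority overflow or underflow.

```python
# Precedence classes from the list above, highest priority first.
# The labels are illustrative names chosen for this sketch.
PRECEDENCE = [
    "stack_underflow",
    "stack_overflow",
    "unsupported_format",
    "snan_operand",
    "qnan_operand",                  # not an exception, but takes precedence
    "other_invalid_or_zero_divide",
    "denormal_operand",
    "overflow_or_underflow",
    "inexact",
]


def governing_condition(conditions):
    """Return the highest-priority condition present, or None if empty."""
    for name in PRECEDENCE:
        if name in conditions:
            return name
    return None
```

For example, an SNaN divided by zero presents both an SNaN operand and a zero divisor, and the sketch selects the SNaN operand, which matches the invalid-operation outcome described in the text.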



7.1.15. Standard Underflow/Overflow Exception Handler 

As long as the underflow and overflow exceptions are masked, no additional software is 
required to cause the output of the processor to conform to the requirements of IEEE Std 854. 
When unmasked, these exceptions give the exception handler an additional option in the case 
of store instructions. No result is stored in memory; instead, the operand is left intact on the 
stack. The handler may round the significand of the operand on the stack to the destination's 
precision as the standard requires, or it may adjust the operand and reexecute the faulting 
instruction. 






CHAPTER 8 

NUMERIC PROGRAMMING EXAMPLES 



The following sections contain examples of numeric programs written in ASM386/Intel486. 
These examples are intended to illustrate some of the techniques useful for programming 
numeric applications. 



8.1. CONDITIONAL BRANCHING EXAMPLE 

As discussed earlier, several numeric instructions post their results to the condition code bits of 
the FPU status word. Although there are many ways to implement conditional branching 
following a comparison, the basic approach is as follows: 

• Execute the comparison. 

• Store the status word. (The FPU status word can be stored directly into the AX register.) 

• Inspect the condition code bits. 

• Jump on the result. 

Example 8-1 is a code fragment that illustrates how two memory-resident double-format real 
numbers might be compared (similar code could be used with the FTST instruction). The 
numbers are called A and B, and the comparison is A to B. 



Example 8-1. Conditional Branching for Compares 

A       DQ      ?
B       DQ      ?

        FLD     A               ; LOAD A ONTO TOP OF FPU STACK
        FCOMP   B               ; COMPARE A:B, POP A
        FSTSW   AX              ; STORE RESULT TO AX REGISTER
;
; CPU AX REGISTER CONTAINS CONDITION CODES
; (RESULTS OF COMPARE)
; LOAD CONDITION CODES INTO FLAGS
;
        SAHF
;
; USE CONDITIONAL JUMPS TO DETERMINE ORDERING OF A TO B
;
        JP      A_B_UNORDERED   ; TEST C2 (PF)
        JB      A_LESS          ; TEST C0 (CF)
        JE      A_EQUAL         ; TEST C3 (ZF)
A_GREATER:                      ; C0 (CF) = 0, C3 (ZF) = 0

A_EQUAL:                        ; C0 (CF) = 0, C3 (ZF) = 1

A_LESS:                         ; C0 (CF) = 1, C3 (ZF) = 0

A_B_UNORDERED:                  ; C2 (PF) = 1



The comparison itself requires loading A onto the top of the FPU register stack and then 
comparing it to B, while popping the stack with the same instruction. The status word is then 
written into the AX register. 

A and B have four possible orderings, and bits C3, C2, and C0 of the condition code indicate 
which ordering holds. These bits are positioned in the upper byte of the FPU status word so as 
to correspond to the zero, parity, and carry flags (ZF, PF, and CF), when the byte is written 
into the flags. The code fragment sets ZF, PF, and CF of the EFLAGS register to the values of 
C3, C2, and C0 of the FPU status word, and then uses the conditional jump instructions to test 
the flags. The resulting code is extremely compact, requiring only seven instructions. 
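
The correspondence between the condition code bits and the flags can be modeled in a few lines. The following Python sketch is not the manual's code: the function names `compare_status` and `branch_after_sahf` are hypothetical, and the model only mimics the bit positions (C0 = status bit 8, C2 = bit 10, C3 = bit 14; CF = flag bit 0, PF = bit 2, ZF = bit 6) and the order of the three conditional jumps.

```python
C0, C2, C3 = 0x0100, 0x0400, 0x4000   # condition code bits in the status word
CF, PF, ZF = 0x01, 0x04, 0x40          # flag bits in the low byte of EFLAGS


def compare_status(a: float, b: float) -> int:
    """Model the condition codes a compare of A to B would post."""
    if a != a or b != b:               # a NaN operand makes the pair unordered
        return C3 | C2 | C0
    if a > b:
        return 0
    if a < b:
        return C0
    return C3                          # equal


def branch_after_sahf(status_word: int) -> str:
    """Model SAHF (upper status byte into flags) and the JP/JB/JE sequence."""
    flags = (status_word >> 8) & 0xFF  # AH holds the upper byte after FSTSW AX
    if flags & PF:                     # JP: C2 set
        return "A_B_UNORDERED"
    if flags & CF:                     # JB: C0 set
        return "A_LESS"
    if flags & ZF:                     # JE: C3 set
        return "A_EQUAL"
    return "A_GREATER"
```

Testing PF first matters in this model: an unordered result also sets C0 and C3, so checking the carry or zero flag first would misclassify it.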

The FXAM instruction updates all four condition code bits. Example 8-2 shows how a jump 
table can be used to determine the characteristics of the value examined. The jump table 
(FXAM_TBL) is initialized to contain the 32-bit displacements of 16 labels, one for each 
possible condition code setting. Note that four of the table entries contain the same value, 
"EMPTY." The first two of these correspond to the condition code settings that FXAM reports 
for an empty register. The two other table entries that contain "EMPTY" will never be used on 
the 32-bit processors with integrated FPU or the Intel387 math coprocessor, but may be used if 
the code is executed with an Intel287 math coprocessor. 



Example 8-2. Conditional Branching for FXAM 

; JUMP TABLE FOR EXAMINE ROUTINE
;
FXAM_TBL DD     POS_UNNORM, POS_NAN, NEG_UNNORM, NEG_NAN,
&               POS_NORM, POS_INFINITY, NEG_NORM,
&               NEG_INFINITY, POS_ZERO, EMPTY, NEG_ZERO,
&               EMPTY, POS_DENORM, EMPTY, NEG_DENORM, EMPTY

; EXAMINE ST AND STORE RESULT (CONDITION CODES)

        FXAM
        XOR     EAX, EAX                ; CLEAR EAX
        FSTSW   AX

; CALCULATE OFFSET INTO JUMP TABLE

        AND     AX, 0100011100000000B   ; CLEAR ALL BITS EXCEPT C3, C2-C0
        SHR     EAX, 6                  ; SHIFT C2-C0 INTO PLACE (000XXX00)
        SAL     AH, 5                   ; POSITION C3 (00X00000)
        OR      AL, AH                  ; DROP C3 IN ADJACENT TO C2
                                        ; (00XXXX00)
        XOR     AH, AH                  ; CLEAR OUT THE OLD COPY OF C3

; JUMP TO THE ROUTINE 'ADDRESSED' BY CONDITION CODE

        JMP     FXAM_TBL[EAX]

; HERE ARE THE JUMP TARGETS, ONE TO HANDLE
; EACH POSSIBLE RESULT OF FXAM

POS_UNNORM:

POS_NAN:

NEG_UNNORM:

NEG_NAN:

POS_NORM:

POS_INFINITY:

NEG_NORM:

NEG_INFINITY:

POS_ZERO:

EMPTY:

NEG_ZERO:

POS_DENORM:

NEG_DENORM:



The program fragment performs the FXAM and stores the status word. It then manipulates the 
condition code bits to finally produce a number in register EAX that equals the condition code 
times 4. This involves zeroing the unused bits of the status word, shifting C2-C0 to the right so 
that this 3-bit field is already multiplied by 4, and then moving C3 so that it is adjacent to C2. 
The resulting value is used as an index that selects one of the displacements from FXAM_TBL 
(the multiplication of the condition code is required because of the 4-byte length of each value 
in FXAM_TBL). The unconditional JMP instruction effectively vectors through the jump table to 
the labeled routine that contains code (not shown in the example) to process each possible 
result of the FXAM instruction. 
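
The bit manipulation in Example 8-2 can be checked by replaying it on integers. The following Python sketch is a model only, and the function name `fxam_table_offset` is hypothetical. It applies the same mask and shift counts as the AND, SHR, SAL, OR, and XOR sequence. With those counts the result is four times the 4-bit condition code, which matches a table of 4-byte DD entries.

```python
def fxam_table_offset(status_word: int) -> int:
    """Replay the AND/SHR/SAL/OR/XOR sequence of Example 8-2 on an integer."""
    eax = status_word & 0b0100011100000000   # AND: keep C3 (bit 14), C2-C0 (bits 10-8)
    eax >>= 6                                # SHR EAX, 6: C2-C0 land in bits 4-2
    ah = (eax >> 8) & 0xFF                   # C3 is now bit 0 of AH
    al = eax & 0xFF
    ah = (ah << 5) & 0xFF                    # SAL AH, 5: C3 moves to bit 5
    al |= ah                                 # OR AL, AH: C3 sits next to C2
    return al                                # XOR AH, AH: only AL remains
```

In this model a status word with only C3 set (condition code 1000B, positive zero) gives offset 32, the ninth table entry, and all other status bits are ignored.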






8.2. EXCEPTION HANDLING EXAMPLES 

There are many approaches to writing exception handlers. One useful technique is to consider 
the exception handler procedure as consisting of "prologue," "body," and "epilogue" sections 
of code. This procedure is invoked via interrupt number 16. 

In the transfer of control to the exception handler due to an INTR, NMI, or SMI, interrupts 
have been disabled by hardware. The prologue performs all functions that must be protected 
from possible interruption by higher-priority sources. Typically, this involves saving registers 
and transferring diagnostic information from the FPU to memory. When the critical processing 
has been completed, the prologue may re-enable interrupts to allow higher-priority interrupt 
handlers to preempt the exception handler. 

The body of the exception handler examines the diagnostic information and makes a response 
that is necessarily application-dependent. This response may range from halting execution, to 
displaying a message, to attempting to repair the problem and proceed with normal execution. 

The epilogue essentially reverses the actions of the prologue, restoring the processor so that 
normal execution can be resumed. The epilogue must not load an unmasked exception flag into 
the FPU or another exception will be requested immediately. 

The following code examples show the ASM386/Intel486 coding of three skeleton exception 
handlers. They show how prologues and epilogues can be written for various situations, but 
provide comments indicating only where the application dependent exception handling body 
should be placed. 

The first two are very similar; their only substantial difference is their choice of instructions to 
save and restore the FPU. The tradeoff here is between the increased diagnostic information 
provided by FNSAVE and the faster execution of FNSTENV. For applications that are 
sensitive to interrupt latency or that do not need to examine register contents, FNSTENV 
reduces the duration of the "critical region," during which the processor does not recognize 
another interrupt request. 

After the exception handler body, the epilogues prepare the processor to resume execution 
from the point of interruption (i.e., the instruction following the one that generated the 
unmasked exception). Notice that the exception flags in the memory image that is loaded into 
the FPU are cleared to zero prior to reloading (in fact, in these examples, the entire status word 
image is cleared). 
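
The effect of clearing the flags in the saved image, rather than in the FPU itself, can be sketched on a byte buffer. The following Python fragment is an illustration under stated assumptions: it assumes the 32-bit protected-mode image layout in which the status word begins at byte offset 4 (the `[EBP-104]` operand relative to an image at `[EBP-108]` in the examples), and the name `clear_exception_flags` is hypothetical.

```python
STATUS_WORD_OFFSET = 4   # assumed offset of the status word in the saved image


def clear_exception_flags(image: bytearray) -> None:
    """Zero the low byte of the status word image, as the byte store in the
    handler epilogues does. The exception flags live in this byte."""
    image[STATUS_WORD_OFFSET] = 0
```

Because only the memory image changes, the FPU sees the cleared flags when the image is reloaded, and no exception is requested on reload.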

Example 8-3 and Example 8-4 assume that the exception handler itself will not cause an 
unmasked exception. Where this is a possibility, the general approach shown in Example 8-5 
can be employed. The basic technique is to save the full FPU state and then to load a new 
control word in the prologue. Note that considerable care should be taken when designing an 
exception handler of this type to prevent the handler from being reentered endlessly. 

Example 8-3. Full-State Exception Handler 

SAVE_ALL PROC
;
; SAVE REGISTERS, ALLOCATE STACK SPACE
; FOR FPU STATE IMAGE
        PUSH    EBP
        MOV     EBP, ESP
        SUB     ESP, 108
; SAVE FULL FPU STATE, ENABLE INTERRUPTS
        FNSAVE  [EBP-108]
        STI
;
; APPLICATION-DEPENDENT EXCEPTION HANDLING
; CODE GOES HERE
;
; CLEAR EXCEPTION FLAGS IN STATUS WORD
; (WHICH IS IN MEMORY)
; RESTORE MODIFIED STATE IMAGE
        MOV     BYTE PTR [EBP-104], 0H
        FRSTOR  [EBP-108]
; DEALLOCATE STACK SPACE, RESTORE REGISTERS
        MOV     ESP, EBP
        POP     EBP
;
; RETURN TO INTERRUPTED CALCULATION
        IRET
SAVE_ALL ENDP



Example 8-4. Reduced-Latency Exception Handler 

SAVE_ENVIRONMENT PROC
;
; SAVE REGISTERS, ALLOCATE STACK SPACE
; FOR FPU ENVIRONMENT
        PUSH    EBP
        MOV     EBP, ESP
        SUB     ESP, 28
; SAVE ENVIRONMENT, ENABLE INTERRUPTS
        FNSTENV [EBP-28]
        STI
;
; APPLICATION-DEPENDENT EXCEPTION HANDLING
; CODE GOES HERE
;
; CLEAR EXCEPTION FLAGS IN STATUS WORD
; (WHICH IS IN MEMORY)
; RESTORE MODIFIED ENVIRONMENT IMAGE
        MOV     BYTE PTR [EBP-24], 0H
        FLDENV  [EBP-28]
; DEALLOCATE STACK SPACE, RESTORE REGISTERS
        MOV     ESP, EBP
        POP     EBP
;
; RETURN TO INTERRUPTED CALCULATION
        IRET
SAVE_ENVIRONMENT ENDP



Example 8-5. Reentrant Exception Handler 

LOCAL_CONTROL DW ?      ; ASSUME INITIALIZED

REENTRANT PROC
;
; SAVE REGISTERS, ALLOCATE STACK SPACE
; FOR FPU STATE IMAGE
        PUSH    EBP
        MOV     EBP, ESP
        SUB     ESP, 108
; SAVE STATE, LOAD NEW CONTROL WORD,
; ENABLE INTERRUPTS
        FNSAVE  [EBP-108]
        FLDCW   LOCAL_CONTROL
        STI
;
; APPLICATION-DEPENDENT EXCEPTION HANDLING
; CODE GOES HERE
; AN UNMASKED EXCEPTION GENERATED HERE WILL
; CAUSE THE EXCEPTION HANDLER TO BE REENTERED.
; IF LOCAL STORAGE IS NEEDED, IT MUST BE ALLOCATED
; ON THE STACK.
;
; CLEAR EXCEPTION FLAGS IN STATUS WORD
; (WHICH IS IN MEMORY)
; RESTORE MODIFIED STATE IMAGE
        MOV     BYTE PTR [EBP-104], 0H
        FRSTOR  [EBP-108]
; DEALLOCATE STACK SPACE, RESTORE REGISTERS
        MOV     ESP, EBP
        POP     EBP
;
; RETURN TO POINT OF INTERRUPTION
        IRET
REENTRANT ENDP



8.3. FLOATING-POINT TO ASCII CONVERSION EXAMPLES 

Numeric programs must typically format their results at some point for presentation and 
inspection by the program user. In many cases, numeric results are formatted as ASCII strings 
for printing or display. This example shows how floating-point values can be converted to 
decimal ASCII character strings. Example 8-6 was developed using Intel's assemblers. 
Modifications will need to be made to meet the requirements of other vendors' assemblers or 
their interfaces to high-level languages. 

Shortness, speed, and accuracy were chosen rather than providing the maximum number of 
significant digits possible. An attempt is made to keep integers in their own domain to avoid 
unnecessary conversion errors. 

Using the extended precision real number format, this routine achieves a worst case accuracy 
of three units in the 16th decimal position for a noninteger value or integers greater than 10^18. 
This is double precision accuracy. With values having decimal exponents less than 100 in 
magnitude, the accuracy is one unit in the 17th decimal position. 

Higher precision can be achieved with greater care in programming, larger program size, and 
lower performance. 



Example 8-6. Floating-Point to ASCII Conversion Routine 

SOURCE 

+1 $title('Convert a floating point number to ASCII')

                name    floating_to_ascii
                public  floating_to_ascii
                extrn   get_power_10:near, tos_status:near
;
;   This subroutine will convert the floating point
; number in the top of the NPX stack to an ASCII
; string and separate power of 10 scaling value
; (in binary). The maximum width of the ASCII string
; formed is controlled by a parameter which must be
; > 1. Unnormal values, denormal values, and pseudo
; zeros will be correctly converted. However,
; unnormals and pseudo zeros are no longer supported
; formats on the Intel486 processor (in conformance with
; the IEEE floating point standard) and hence
; not generated internally. A returned value will
; indicate how many binary bits of precision were lost
; in an unnormal or denormal value. The magnitude
; (in terms of binary power) of a pseudo zero will also
; be indicated. Integers less than 10**18 in magnitude
; are accurately converted if the destination ASCII
; string field is wide enough to hold all the digits.
; Otherwise the value is converted to scientific notation.
;
;   The status of the conversion is identified by the
; return value, it can be:
;
;       0   conversion complete, string_size is defined
;       1   invalid arguments
;       2   exact integer conversion, string_size is defined
;       3   indefinite
;       4   + NAN (Not A Number)
;       5   - NAN
;       6   + Infinity
;       7   - Infinity
;       8   pseudo zero found, string_size is defined
;
;   The PLM-386/486 calling convention is:
;
; floating_to_ascii:
;   procedure (number, denormal_ptr, string_ptr, size_ptr,
;   field_size, power_ptr) word external;
;   declare (denormal_ptr, string_ptr, size_ptr)
;   pointer;
;   declare field_size word,
;   string_size based size_ptr word;
;   declare number real;
;   declare denormal integer based denormal_ptr;
;   declare power integer based power_ptr;
;   end floating_to_ascii;
;
;   The floating point value is expected to be
; on the top of the FPU stack. This subroutine
; expects 3 free entries on the FPU stack and
; will pop the passed value off when done. The
; generated ASCII string will have a leading
; character either '-' or '+' indicating the sign
; of the value. The ASCII decimal digits will
; immediately follow. The numeric value of the
; ASCII string is (ASCII STRING.)*10**POWER. If
; the given number was zero, the ASCII string will
; contain a sign and a single zero character. The
; value string_size indicates the total length of
; ASCII string including the sign character.
; String(0) will always hold the sign. It is
; possible for string_size to be less than
; field_size. This occurs for zeroes or integer
; values. A pseudo zero will return a special
; return code. The denormal count will indicate
; the power of two originally associated with the
; value. The power of ten and ASCII string will
; be as if the value was an ordinary zero.
;
;   This subroutine is accurate up to a maximum of
; 18 decimal digits for integers. Integer values
; will have a decimal power of zero associated
; with them. For non-integers, the result will be
; accurate to within 2 decimal digits of the 16th
; decimal place (double precision). The exponentiate
; instruction is also used for scaling the value into
; the range acceptable for the BCD data type. The
; rounding mode in effect on entry to the
; subroutine is used for the conversion.
;
;   The following registers are not transparent:
;
;       eax ebx edx esi edi eflags



;
;   Define the stack layout.
;
ebp_save        equ     dword ptr [ebp]
es_save         equ     ebp_save + size ebp_save
return_ptr      equ     es_save + size es_save
power_ptr       equ     return_ptr + size return_ptr
field_size      equ     power_ptr + size power_ptr
size_ptr        equ     field_size + size field_size
string_ptr      equ     size_ptr + size size_ptr
denormal_ptr    equ     string_ptr + size string_ptr

parms_size      equ     size power_ptr + size field_size +
&                       size size_ptr + size string_ptr +
&                       size denormal_ptr
;
;   Define constants used
;
BCD_DIGITS      equ     18      ; Number of digits in bcd_value
WORD_SIZE       equ     4
BCD_SIZE        equ     10
MINUS           equ     1       ; Define return values
NAN             equ     4       ; The exact values chosen
INFINITY        equ     6       ; here are important. They must
INDEFINITE      equ     3       ; correspond to the possible return
PSUEDO_ZERO     equ     8       ; values and be in the same numeric
INVALID         equ     -2      ; order as tested by the program.
ZERO            equ     -4
DENORMAL        equ     -6
UNNORMAL        equ     -8
NORMAL          equ     0
EXACT           equ     2



;
;   Define layout of temporary storage area.
;
power_two       equ     word ptr [ebp - WORD_SIZE]
bcd_value       equ     tbyte ptr power_two - BCD_SIZE
bcd_byte        equ     byte ptr bcd_value
fraction        equ     bcd_value

local_size      equ     size power_two + size bcd_value
;
;   Allocate stack space for the temporaries so
; the stack will be big enough
;
stack           stackseg (local_size+6) ; Allocate stack space for locals

code            segment public er
                extrn   power_table:qword
;
;   Constants used by this function.
;
                even                    ; Optimize for 16 bits
const10         dw      10              ; Adjustment value for
                                        ; too big BCD
;
;   Convert the C3,C2,C1,C0 encoding from tos_status
; into meaningful bit flags and values.
;
status_table    db      UNNORMAL, NAN, UNNORMAL + MINUS,
&                       NAN + MINUS, NORMAL, INFINITY,
&                       NORMAL + MINUS, INFINITY + MINUS,
&                       ZERO, INVALID, ZERO + MINUS, INVALID,
&                       DENORMAL, INVALID, DENORMAL + MINUS, INVALID

floating_to_ascii proc
        call    tos_status              ; Look at status of ST(0)
;
;   Get descriptor from table
;
        movzx   eax, status_table[eax]
        cmp     al, INVALID             ; Look for empty ST(0)
        jne     not_empty
;
;   ST(0) is empty! Return the status value.
;
        ret     parms_size

;
;   Remove infinity from stack and exit.
;
found_infinity:
        fstp    st(0)                   ; OK to leave fstp running
        jmp     short exit_proc
;
;   String space is too small.
; Return invalid code.
;
small_string:
        mov     al, INVALID
exit_proc:
        leave                           ; Restore stack setup
        pop     es
        ret     parms_size
;
;   ST(0) is NAN or indefinite. Store the
; value in memory and look at the fraction
; field to separate indefinite from an ordinary NAN.
;
NAN_or_indefinite:
        fstp    fraction                ; Remove value from stack
                                        ; for examination
        test    al, MINUS               ; Look at sign bit
        fwait                           ; Insure store is done
        jz      exit_proc               ; Can't be indefinite if positive

        mov     ebx, 0C0000000H         ; Match against upper 32 bits of fraction
;
;   Compare bits 63-32
;
        sub     ebx, dword ptr fraction + 4
;
;   Bits 31-0 must be zero
;
        or      ebx, dword ptr fraction
        jnz     exit_proc
;
;   Set return value for indefinite value
;
        mov     al, INDEFINITE
        jmp     exit_proc

;
;   Allocate stack space for local variables
; and establish parameter addressability.
;
not_empty:
        push    es                      ; Save working register
        enter   local_size, 0           ; Setup stack addressing
;
;   Check for enough string space
;
        mov     ecx, field_size
        cmp     ecx, 2
        jl      small_string

        dec     ecx                     ; Adjust for sign character
;
;   See if string is too large for BCD
;
        cmp     ecx, BCD_DIGITS
        jbe     size_ok
;
;   Else set maximum string size
;
        mov     ecx, BCD_DIGITS
size_ok:
        cmp     al, INFINITY            ; Look for infinity
;
;   Return status value for + or - inf
;
        jge     found_infinity

        cmp     al, NAN                 ; Look for NAN or INDEFINITE
        jge     NAN_or_indefinite
;
;   Set default return values and check that
; the number is normalized.
;
        fabs                            ; Use positive value only
                                        ; sign bit in al has true sign of
                                        ; value
        xor     edx, edx                ; Form 0 constant
        mov     edi, denormal_ptr       ; Zero denormal count
        mov     [edi], dx
        mov     ebx, power_ptr          ; Zero power of ten value
        mov     [ebx], dx
        mov     dl, al
        and     dl, 1
        add     dl, EXACT
        cmp     al, ZERO                ; Test for zero
        jae     convert_integer         ; Skip power code if value is zero
        fstp    fraction
        fwait
        mov     al, bcd_byte + 7
        or      byte ptr bcd_byte + 7, 80h
        fld     fraction
        fxtract
        test    al, 80h
        jnz     normal_value

        fld1
        fsub
        ftst
        fstsw   ax
        sahf
        jnz     set_unnormal_count
;
;   Found a pseudo zero
;
        fldlg2                          ; Develop power of ten estimate
        add     dl, PSUEDO_ZERO - EXACT
        fmulp   st(2), st
        fxch                            ; Get power of ten
        fistp   word ptr [ebx]          ; Set power of ten
        jmp     convert_integer

set_unnormal_count:
        fxtract                         ; Get original fraction,
                                        ; now normalized
        fxch                            ; Get unnormal count
        fchs
        fistp   word ptr [edi]          ; Set unnormal count



;
;   Calculate the decimal magnitude associated
; with this number to within one order. This
; error will always be inevitable due to
; rounding and lost precision. As a result,
; we will deliberately fail to consider the
; LOG10 of the fraction value in calculating
; the order. Since the fraction will always
; be 1 <= f < 2, its LOG10 will not change
; the basic accuracy of the function. To
; get the decimal order of magnitude, simply
; multiply the power of two by LOG10(2) and
; truncate the result to an integer.
;
normal_value:
        fstp    fraction                ; Save the fraction field
                                        ; for later use
        fist    power_two               ; Save power of two
        fldlg2                          ; Get LOG10(2)
                                        ; Power_two is now safe to use
        fmul                            ; Form LOG10(of exponent of number)
        fistp   word ptr [ebx]          ; Any rounding mode will work here



;
;   Check if the magnitude of the number rules
; out treating it as an integer.
;
;   CX has the maximum number of decimal digits
; allowed.
;
        fwait                           ; Wait for power_ten to be valid
;
;   Get power of ten of value
;
        movsx   esi, word ptr [ebx]
        sub     esi, ecx                ; Form scaling factor necessary in ax
        ja      adjust_result           ; Jump if number will not fit
;
;   The number is between 1 and 10**(field_size).
; Test if it is an integer.
;
        fild    power_two               ; Restore original number
        sub     dl, NORMAL-EXACT        ; Convert to exact return value
        fld     fraction
        fscale                          ; Form full value, this
                                        ; is safe here
        fst     st(1)                   ; Copy value for compare
        frndint                         ; Test if it is an integer
        fcomp                           ; Compare values
        fstsw   ax                      ; Save status
        sahf                            ; C3=1 (ZF=1) implies it was an integer
        jz      convert_integer
        fstp    st(0)                   ; Remove non integer value
        add     dl, NORMAL-EXACT        ; Restore original return value
;
;   Scale the number to within the range allowed
; by the BCD format. The scaling operation should
; produce a number within one decimal order of
; magnitude of the largest decimal number
; representable within the given string width.
;
;   The scaling power of ten value is in si.
;



adjust_result:
        mov     eax, esi                ; Setup for pow10
        mov     word ptr [ebx], ax      ; Set initial power
                                        ; of ten return value
        neg     eax                     ; Subtract one for each order of
                                        ; magnitude the value is scaled by
        call    get_power_10            ; Scaling factor is returned as
                                        ; exponent and fraction
        fld     fraction                ; Get fraction
        fmul                            ; Combine fractions
        mov     esi, ecx                ; Form power of ten of the maximum
        shl     esi, 3                  ; BCD value to fit in
                                        ; the string
        fild    power_two               ; Combine powers of two
        faddp   st(2), st
        fscale                          ; Form full value,
                                        ; exponent was safe
        fstp    st(1)                   ; Remove exponent
;
;   Test the adjusted value against a table
; of exact powers of ten. The combined errors
; of the magnitude estimate and power function
; can result in a value one order of magnitude
; too small or too large to fit correctly in
; the BCD field. To handle this problem, pretest
; the adjusted value, if it is too small or
; large, then adjust it by ten and adjust the
; power of ten value.
;

test_power:
;
;   Compare against exact power entry. Use the next
; entry since cx has been decremented by one
;
        fcom    power_table[esi]+type power_table
        fstsw   ax                      ; No wait is necessary
        sahf                            ; If C3 = C0 = 0 then
                                        ; too big
        jb      test_for_small
        fidiv   const10                 ; Else adjust value
        and     dl, not EXACT           ; Remove exact flag
        inc     word ptr [ebx]          ; Adjust power of ten value
        jmp     short in_range          ; Convert the value to a BCD
                                        ; integer
test_for_small:
        fcom    power_table[esi]        ; Test relative size
        fstsw   ax                      ; No wait is necessary
        sahf                            ; If C0 = 0 then
                                        ; st(0) >= lower_bound
        jnc     in_range                ; Convert the value to a
                                        ; BCD integer
        fimul   const10                 ; Adjust value into range
        dec     word ptr [ebx]          ; Adjust power of ten value
in_range:
        frndint                         ; Form integer value
;
;   Assert: 0 <= TOS <= 999,999,999,999,999,999
; The TOS number will be exactly representable
; in 18 digit BCD format.
;
convert_integer:
        fbstp   bcd_value               ; Store as BCD format number

;
;   While the store BCD runs, setup registers
; for the conversion to ASCII.
;
        mov     esi, BCD_SIZE-2         ; Initial BCD index value
        mov     cx, 0F04h               ; Set shift count and mask
        mov     ebx, 1                  ; Set initial size of ASCII
                                        ; field for sign
        mov     edi, string_ptr         ; Get address of start of
                                        ; ASCII string
        mov     ax, ds                  ; Copy ds to es
        mov     es, ax
        cld                             ; Set autoincrement mode
        mov     al, '+'                 ; Clear sign field
        test    dl, MINUS               ; Look for negative value
        jz      positive_result

        mov     al, '-'
positive_result:
        stosb                           ; Bump string pointer
                                        ; past sign
        and     dl, not MINUS           ; Turn off sign bit
        fwait                           ; Wait for fbstp to finish



;
;   Register usage:
;       ah:     BCD byte value in use
;       al:     ASCII character value
;       dx:     Return value
;       ch:     BCD mask = 0Fh
;       cl:     BCD shift count = 4
;       ebx:    ASCII string field width
;       esi:    BCD field index
;       edi:    ASCII string field pointer
;       ds,es:  ASCII string segment base
;



;   Remove leading zeroes from the number.
;
skip_leading_zeroes:
        mov     ah, bcd_byte[esi]       ; Get BCD byte
        mov     al, ah                  ; Copy value
        shr     al, cl                  ; Get high order digit
        and     al, 0Fh                 ; Set zero flag
        jnz     enter_odd               ; Exit loop if leading
                                        ; non zero found
        mov     al, ah                  ; Get BCD byte again
        and     al, 0Fh                 ; Get low order digit
        jnz     enter_even              ; Exit loop if non zero
                                        ; digit found
        dec     esi                     ; Decrement BCD index
        jns     skip_leading_zeroes



;
;   The significand was all zeroes.
;
        mov     al, '0'                 ; Set initial zero
        stosb
        inc     ebx                     ; Bump string length
        jmp     short exit_with_value
;
;   Now expand the BCD string into digit
; per byte values 0-9.
;
digit_loop:
        mov     ah, bcd_byte[esi]       ; Get BCD byte
        mov     al, ah
        shr     al, cl                  ; Get high order digit
enter_odd:
        add     al, '0'                 ; Convert to ASCII
        stosb                           ; Put digit into ASCII
                                        ; string area
        mov     al, ah                  ; Get low order digit
        and     al, 0Fh
        inc     ebx                     ; Bump field size counter
enter_even:
        add     al, '0'                 ; Convert to ASCII
        stosb                           ; Put digit into ASCII area
        inc     ebx                     ; Bump field size counter
        dec     esi                     ; Go to next BCD byte
        jns     digit_loop
;
;   Conversion complete. Set the string
; size and remainder.
;
exit_with_value:
        mov     edi, size_ptr
        mov     word ptr [edi], bx
        mov     eax, edx                ; Set return value
        jmp     exit_proc

floating_to_ascii endp
code            ends
                end



+1 $title(Calculate the value of 10**eax)
;
;   This subroutine will calculate the
; value of 10**eax. For values of
; 0 <= eax < 19, the result will be exact.
; All registers are transparent
; and results are returned on the TOS
; as two numbers, exponent in st(1) and
; fraction in st(0). The exponent value
; can be larger than the largest
; exponent of an extended real format
; number. Three stack entries are used.
;
                name    get_power_10
                public  get_power_10, power_table

stack           stackseg 8

code            segment public er
;
;   Use exact values from 1.0 to 1e18.
;
                even                    ; Optimize 16 bit access
power_table     dq      1.0, 1e1, 1e2, 1e3
                dq      1e4, 1e5, 1e6, 1e7
                dq      1e8, 1e9, 1e10, 1e11
                dq      1e12, 1e13, 1e14, 1e15
                dq      1e16, 1e17, 1e18

get_power_10    proc
        cmp     eax, 18                 ; Test for 0 <= ax < 19
        ja      out_of_range
        fld     power_table[eax*8]      ; Get exact value
        fxtract                         ; Separate power
                                        ; and fraction
        ret                             ; OK to leave fxtract running
;
;   Calculate the value using the
; exponentiate instruction. The following
; relations are used:
;       10**x = 2**(log2(10)*x)
;       2**(I+F) = 2**I * 2**F
; if st(1) = I and st(0) = 2**F then
; fscale produces 2**(I+F)
;
out_of_range:
        fldl2t                          ; TOS = LOG2(10)
        enter   4, 0
;
;   Save power of 10 value, P
;
        mov     [ebp-4], eax
;
;   TOS,X = LOG2(10)*P = LOG2(10**P)
;
        fimul   dword ptr [ebp-4]
        fld1                            ; Set TOS = -1.0
        fchs
        fld     st(1)                   ; Copy power value
                                        ; in base two
        frndint                         ; TOS = I: -inf < I <= X
                                        ; where I is an integer
                                        ; Rounding mode does
                                        ; not matter
        fxch    st(2)                   ; TOS = X, st(1) = -1.0
                                        ; st(2) = I
        fsub    st, st(2)               ; TOS,F = X - I:
                                        ; -1.0 < TOS <= 1.0
;
;   Restore original rounding control
;
        pop     eax
        f2xm1                           ; TOS = 2**(F) - 1.0
        leave                           ; Restore stack
        fsubr                           ; Form 2**(F)
        ret                             ; OK to leave fsubr running
get_power_10    endp

code            ends
                end



+1 $title(Determine TOS register contents)
;
;   This subroutine will return a value
; from 0-15 in eax corresponding
; to the contents of FPU TOS. All
; registers are transparent and no
; errors are possible. The return
; value corresponds to C3,C2,C1,C0
; of FXAM instruction.
;
                name    tos_status
                public  tos_status

stack           stackseg 6

code            segment public er

tos_status      proc
        fxam                            ; Get status of TOS register
        fstsw   ax                      ; Get current status
        mov     al, ah                  ; Put bits 10-8 into bits 2-0
        and     eax, 4007h              ; Mask out bits c3,c2,c1,c0
        shr     ah, 3                   ; Put bit c3 into bit 11
        or      al, ah                  ; Put c3 into bit 3
        mov     ah, 0                   ; Clear return value
        ret
tos_status      endp

code            ends
                end



8.3.1. Function Partitioning 

Three separate modules implement the conversion. Most of the work of the conversion is done 
in the module FLOATING_TO_ASCII. The other modules are provided separately, because 
they have a more general use. One of them, GET_POWER_10, is also used by the ASCII to 
floating-point conversion routine. The other small module, TOS_STATUS, identifies what, if 
anything, is in the top of the numeric register stack. 



8.3.2. Exception Considerations 

Care is taken inside the function to avoid generating exceptions. Any possible numeric value is 
accepted. The only possible exception is insufficient space on the numeric register stack. 

The value passed in the numeric stack is checked for existence, type (NaN or infinity), and 
status (denormal, zero, sign). The string size is tested for a minimum and maximum value. If 
the top of the register stack is empty, or the string size is too small, the function returns with an 
error code. 

Overflow and underflow are avoided inside the function for very large or very small numbers. 



8.3.3. Special Instructions 

The functions demonstrate the operation of several numeric instructions, different data types, 
and precision control. Shown are instructions for automatic conversion to BCD, calculating the 
value of 10 raised to an integer value, establishing and maintaining concurrency, data 
synchronization, and use of directed rounding on the FPU. 

Without the extended precision data type and built-in exponential function, the double 
precision accuracy of this function could not be attained with the size and speed of the shown 
example. 
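
The way 10 raised to an integer power is built from a base-2 exponential can be sketched outside the FPU. The following Python fragment is an illustration only, not a translation of GET_POWER_10: the names `power_of_ten_parts` and `combine` are hypothetical, it works in double rather than extended precision, and it uses `math.ldexp` as a stand-in for the scaling step. It relies on the relations 10**P = 2**(log2(10)*P) and 2**(I+F) = 2**I * 2**F.

```python
import math


def power_of_ten_parts(p: int):
    """Split 10**p into an integer power of two I and a factor 2**F."""
    x = p * math.log2(10)     # x = log2(10**p)
    i = round(x)              # integer part I; any rounding mode works
    f = x - i                 # fractional remainder F, small in magnitude
    return i, 2.0 ** f        # exponent and fraction, returned separately


def combine(i: int, two_to_f: float) -> float:
    """Scale the fraction by 2**I to form the full value."""
    return math.ldexp(two_to_f, i)
```

Returning the exponent and fraction separately mirrors the design point in the text: the combined exponent can be added to another exponent before scaling, which avoids overflow or underflow in intermediate values.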

The function relies on the numeric BCD data type for conversion from binary floating-point to 
decimal. It is not difficult to unpack the BCD digits into separate ASCII decimal digits. The 
major work involves scaling the floating-point value to the comparatively limited range of 
BCD values. To print a 9-digit result requires accurately scaling the given value to an integer 
between 10^8 and 10^9. For example, the number +0.123456789 requires a scaling factor of 10^9 
to produce the value +123456789.0, which can be stored in 9 BCD digits. The scale factor 
must be an exact power of 10 to avoid changing any of the printed digit values. 
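
The unpacking step mentioned above can be sketched in Python rather than assembly. This is only an illustration, assuming the 10-byte packed BCD layout stored by the FPU (18 digits, two per byte, least significant byte first, sign in the high bit of the last byte). The function name unpack_bcd is hypothetical and does not appear in the manual's listings.

```python
def unpack_bcd(packed: bytes) -> str:
    """Unpack a 10-byte packed BCD value into a sign plus 18 ASCII digits."""
    if len(packed) != 10:
        raise ValueError("packed BCD operand must be 10 bytes")
    sign = "-" if packed[9] & 0x80 else "+"      # sign bit lives in the last byte
    digits = []
    for byte in reversed(packed[:9]):            # most significant byte first
        digits.append(chr(0x30 + (byte >> 4)))   # high nibble -> ASCII digit
        digits.append(chr(0x30 + (byte & 0x0F))) # low nibble  -> ASCII digit
    return sign + "".join(digits)


# +123456789 as packed BCD, least significant byte first
example = bytes([0x89, 0x67, 0x45, 0x23, 0x01, 0, 0, 0, 0, 0x00])
print(unpack_bcd(example))
```

Each nibble becomes one ASCII character by adding 30H, which is why the manual describes this step as the easy part of the conversion.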

These routines should exactly convert all values exactly representable in decimal in the field 
size given. Integer values that fit in the given string size are not scaled, but directly stored 
into the BCD form. Noninteger values exactly representable in decimal within the string size 
limits are also exactly converted. For example, 0.125 is exactly representable in binary or 
decimal. To convert this floating-point value to decimal, the scaling factor is 1000, resulting in 
125. When scaling a value, the function must keep track of where the decimal point lies in the 
final decimal value. 
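
The 0.125 example can be checked with exact rational arithmetic. The following Python sketch is an illustration only; scale_exact is a hypothetical helper, and Fraction stands in for the extended-precision arithmetic the routine relies on.

```python
from fractions import Fraction


def scale_exact(value: float, digits_after_point: int):
    """Scale value by 10**digits_after_point using exact rational arithmetic.

    Returns (integer part of the scaled value, whether scaling was exact,
    power value locating the decimal point).
    """
    scaled = Fraction(value) * 10 ** digits_after_point
    integer_part = scaled.numerator // scaled.denominator
    exact = scaled.denominator == 1
    return integer_part, exact, -digits_after_point


print(scale_exact(0.125, 3))   # 0.125 is exact in binary, so the result is exact
print(scale_exact(0.1, 3))     # 0.1 is not exact in binary, so a residue remains
```

The third element of the result is the bookkeeping the text asks for: the scaled integer 125 together with a power value of -3 still denotes 0.125.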



8.3.4. Description of Operation 

Converting a floating-point number to decimal ASCII takes three major steps: identifying the 
magnitude of the number, scaling it for the BCD data type, and converting the BCD data type 
to a decimal ASCII string. 

Identifying the magnitude of the result requires finding the value X such that the number is 
represented by I x 10^X, where 1.0 <= I < 10.0. Scaling the number requires multiplying it by a 
scaling factor 10^S, so that the result is an integer requiring no more decimal digits than 
provided for in the ASCII string. 

Once scaled, the numeric rounding modes and BCD conversion put the number in a form easy 
to convert to decimal ASCII by host software. 

Implementing each of these three steps requires attention to detail. To begin with, not all 
floating-point values have a numeric meaning. Values such as infinity, indefinite, or NaN may 
be encountered by the conversion routine. The conversion routine should recognize these 
values and identify them uniquely. 

Special cases of numeric values also exist. Denormals have numeric values, but should be 
recognized because they indicate that precision was lost during some earlier calculations. 

Once it has been determined that the number has a numeric value, and it is normalized (setting 
appropriate denormal flags, if necessary, to indicate this to the calling program), the value 
must be scaled to the BCD range. 



8.3.5. Scaling the Value 

To scale the number, its magnitude must be determined. It is sufficient to calculate the 
magnitude to an accuracy of 1 unit, or within a factor of 10 of the required value. After scaling 
the number, a check is made to see if the result falls in the range expected. If not, the result can 
be adjusted one decimal order of magnitude up or down. The adjustment test after the scaling 
is necessary due to inevitable inaccuracies in the scaling value. 

Because the magnitude estimate for the scale factor need only be close, a fast technique is 
used. The magnitude is estimated by multiplying the power of 2, the unbiased floating-point 
exponent, associated with the number by log10(2). Rounding the result to an integer produces an 
estimate of sufficient accuracy. Ignoring the fraction value can introduce a maximum error of 
0.32 in the result. 
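
A minimal Python sketch of this estimate follows. It is not the manual's assembly; estimate_magnitude is a hypothetical name, and math.frexp is used to obtain the binary exponent that the FPU routine would read from the operand.

```python
import math

LOG10_2 = math.log10(2.0)


def estimate_magnitude(x: float) -> int:
    """Estimate the decimal order of magnitude of a nonzero x from its binary exponent."""
    _, e = math.frexp(abs(x))     # |x| = m * 2**e with 0.5 <= m < 1
    unbiased = e - 1              # exponent of the 1.f form used by the FPU
    return round(unbiased * LOG10_2)


for value in (0.123456789, 1.0, 9.99, 12345.678, 1e10, 3e-7):
    print(value, estimate_magnitude(value), math.floor(math.log10(value)))
```

Because the fraction is ignored, the estimate can differ from the true order of magnitude by one, which is exactly why the text calls for an adjustment test after scaling.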

Using the magnitude of the value and size of the number string, the scaling factor can be 
calculated. Calculating the scaling factor is the most inaccurate operation of the conversion 
process. The relation 10^X = 2^(X * log2(10)) is used for this function. The exponentiate instruction 
F2XM1 is used. 






Due to restrictions on the range of values allowed by the F2XM1 instruction, the power of 2 
value is split into integer and fraction components. The relation 2^(I + F) = 2^I x 2^F allows using 
the FSCALE instruction to recombine the 2^F value, calculated through F2XM1, and the 2^I part. 
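
The split can be sketched in Python as follows. This is an illustration under stated assumptions, not the GET_POWER_10 listing: f2xm1 is a hypothetical stand-in for the F2XM1 instruction, restricted here to arguments between -1 and +1, and math.ldexp plays the role of FSCALE.

```python
import math


def f2xm1(f: float) -> float:
    """Stand-in for F2XM1: returns 2**f - 1 for a small argument."""
    assert -1.0 <= f <= 1.0, "argument outside the range assumed for F2XM1"
    return 2.0 ** f - 1.0


def power_of_ten(x: int) -> float:
    """Compute 10**x as 2**(I + F) = 2**I * 2**F."""
    p = x * math.log2(10.0)                  # 10**x == 2**p
    i = round(p)                             # integer component I
    f = p - i                                # fraction component F, |F| <= 0.5
    return math.ldexp(1.0 + f2xm1(f), i)     # FSCALE analogue: 2**F scaled by 2**I


print(power_of_ten(3), power_of_ten(-5))
```

Only the fraction F passes through the exponential, so its limited argument range is never exceeded no matter how large the requested power is.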

8.3.5.1. INACCURACY IN SCALING 

The inaccuracy in calculating the scale factor arises because of the trailing zeros placed into 
the fraction value of the power of two when stripping off the integer valued bits. For each 
integer valued bit in the power of 2 value separated from the fraction bits, one bit of precision 
is lost in the fraction field due to the zero fill occurring in the least significant bits. 

Up to 14 bits may be lost in the fraction because the largest allowed floating point exponent 
value is 2^14 - 1. These bits directly reduce the accuracy of the calculated scale factor, thereby 
reducing the accuracy of the scaled value. For numbers in the range of 10^±30, a maximum of 8 
bits of precision are lost in the scaling process. 

8.3.5.2. AVOIDING UNDERFLOW AND OVERFLOW 

The fraction and exponent fields of the number are separated to avoid underflow and overflow 
in calculating the scaling values. For example, to scale 10^-4932 to 10^8 requires a scaling factor 
of 10^4940, which cannot be represented by the Intel FPUs. 

By separating the exponent and fraction, the scaling operation involves adding the exponents 
separate from multiplying the fractions. The exponent arithmetic involves small integers, all 
easily represented by the Intel FPUs. 
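
The same idea can be demonstrated with ordinary double-precision values, whose range (about 10^±308) is narrower than the FPU's extended format. The Python sketch below is an assumption-laden illustration: scale_split is a hypothetical name, and math.frexp and math.ldexp stand in for the separation and recombination of exponent and fraction.

```python
import math


def scale_split(value: float, power10: int) -> float:
    """Return value * 10**power10 without forming 10**power10 as a single number."""
    vf, ve = math.frexp(value)            # value = vf * 2**ve
    p = power10 * math.log2(10.0)         # 10**power10 == 2**p
    si = math.floor(p)                    # integer exponent of the scale factor
    sf = 2.0 ** (p - si)                  # fraction of the scale factor, in [1, 2)
    # Multiply the fractions and add the exponents separately.
    return math.ldexp(vf * sf, ve + si)


# 10**600 is not representable as a double, yet the scaled result is.
print(scale_split(1e-300, 600))
```

The exponent arithmetic uses only small integers, so neither intermediate value can overflow or underflow even though the scale factor itself is out of range.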

8.3.5.3. FINAL ADJUSTMENTS 

It is possible that the power function (Get_Power_10) could produce a scaling value such that 
it forms a scaled result larger than the ASCII field could allow. For example, scaling 
9.9999999999999999 x 10^4900 by 1.00000000000000010 x 10^-4883 produces 
1.00000000000000009 x 10^18. The scale factor is within the accuracy of the FPU and the 
result is within the conversion accuracy, but it cannot be represented in BCD format. This is 
why there is a post-scaling test on the magnitude of the result. The result can be multiplied or 
divided by 10, depending on whether the result was too small or too large, respectively. 
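
A post-scaling test of this kind might look like the Python sketch below. The function name adjust and its parameters are hypothetical; the sketch assumes a nonzero scaled value and a field of the given number of decimal digits.

```python
def adjust(scaled: float, power: int, digits: int):
    """Bring |scaled| into [10**(digits-1), 10**digits), compensating in the power value."""
    upper = 10.0 ** digits
    lower = 10.0 ** (digits - 1)
    if abs(scaled) >= upper:       # too large for the field: divide by 10
        scaled /= 10.0
        power += 1
    elif abs(scaled) < lower:      # too small: multiply by 10
        scaled *= 10.0
        power -= 1
    return scaled, power


print(adjust(1234567890.0, -9, 9))
print(adjust(12345678.0, 0, 9))
```

A single step in either direction is enough because the magnitude estimate is accurate to within one decimal order.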



8.3.6. Output Format 

For maximum flexibility in output formats, the position of the decimal point is indicated by a 
binary integer called the power value. If the power value is zero, then the decimal point is 
assumed to be at the right of the rightmost digit. Power values greater than zero indicate how 
many trailing zeros are not shown. For each unit below zero, move the decimal point to the left 
in the string. 

The last step of the conversion is storing the result in BCD and indicating where the decimal 
point lies. The BCD string is then unpacked into ASCII decimal characters. The ASCII sign is 
set corresponding to the sign of the original value. 
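
How a caller might apply the power value to the digit string can be sketched in Python. This is one possible reading of the convention described above, with a hypothetical helper name place_decimal_point; the manual's routine itself only returns the digits and the power value.

```python
def place_decimal_point(digits: str, power: int) -> str:
    """Insert the decimal point implied by the power value into a digit string."""
    if power >= 0:
        return digits + "0" * power            # trailing zeros that were not shown
    shift = -power
    padded = digits.rjust(shift + 1, "0")      # guarantee a digit left of the point
    return padded[:-shift] + "." + padded[-shift:]


print(place_decimal_point("125", -3))
print(place_decimal_point("12", 2))
print(place_decimal_point("12345", -2))
```

Keeping the point position separate from the digits lets the same conversion result be printed in fixed, scientific, or any other layout.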






8.4. TRIGONOMETRIC CALCULATION EXAMPLES 

In this example, the kinematics of a robot arm is modeled with the 4 x 4 homogeneous 
transformation matrices proposed by Denavit and Hartenberg [1, 2]. The translational and 
rotational relationships between adjacent links are described with these matrices using the D-H 
matrix method. For each link, there is a 4 x 4 homogeneous transformation matrix that 
represents the link's coordinate system (L_i) at the joint (J_i) with respect to the previous link's 
coordinate system (J_(i-1), L_(i-1)). The following four geometric quantities completely describe the 
motion of any rigid joint/link pair (J_i, L_i), as Figure 8-1 illustrates. 

θ_i = The angular displacement of the x_i axis from the x_(i-1) axis by rotating around the z_(i-1) axis 
(anticlockwise). 

d_i = The distance from the origin of the (i-1)th coordinate system along the z_(i-1) axis to the x_i 
axis. 

a_i = The distance of the origin of the ith coordinate system from the z_(i-1) axis along the -x_i 
axis. 

α_i = The angular displacement of the z_i axis from the z_(i-1) axis about the x_i axis (anticlockwise). 



Figure 8-1. Relationships Between Adjacent Joints 






The D-H transformation matrix A_(i-1)^i for adjacent coordinate frames (from joint i-1 to joint i) is 
calculated as follows: 

$$A_{i-1}^{i} = T_{z,d} \times T_{z,\theta} \times T_{x,a} \times T_{x,\alpha}$$

where: 

T_(z,d) represents a translation along the z_(i-1) axis 
T_(z,θ) represents a rotation of angle θ about the z_(i-1) axis 
T_(x,a) represents a translation along the x_i axis 
T_(x,α) represents a rotation of angle α about the x_i axis 

$$A_{i-1}^{i} =
\begin{bmatrix}
\cos\theta_i & -\cos\alpha_i \sin\theta_i & \sin\alpha_i \sin\theta_i & a_i \cos\theta_i \\
\sin\theta_i & \cos\alpha_i \cos\theta_i & -\sin\alpha_i \cos\theta_i & a_i \sin\theta_i \\
0 & \sin\alpha_i & \cos\alpha_i & d_i \\
0 & 0 & 0 & 1
\end{bmatrix}$$



The composite homogeneous matrix T which represents the position and orientation of the 
joint/link pair with respect to the base system is obtained by successively multiplying the D-H 
transformation matrices for adjacent coordinate frames: 

$$T = A_{0}^{1} \times A_{1}^{2} \times \cdots \times A_{i-1}^{i}$$

Example 8-7 illustrates how the transformation process can be accomplished using the 
floating-point capabilities of the Intel architectures. The program consists of two major 
procedures. The first procedure, TRANS_PROC, is used to calculate the elements in each D-H 
matrix, A_(i-1)^i. The second procedure, MATRIXMUL_PROC, finds the product of two 
successive D-H matrices. 
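
Before reading the assembly listing, it may help to see the same computation in a few lines of Python. This is a reference sketch only, not a translation of Example 8-7; dh_matrix, matmul, and composite are hypothetical names, and the element formulas follow the matrix element comments in the listing (a11 through a44).

```python
import math


def dh_matrix(theta_deg: float, alpha_deg: float, a: float, d: float):
    """Build one 4 x 4 D-H matrix from the four link parameters."""
    theta = math.radians(theta_deg)
    alpha = math.radians(alpha_deg)
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [
        [ct, -ca * st,  sa * st, a * ct],
        [st,  ca * ct, -sa * ct, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ]


def matmul(A, B):
    """Tij = sum over k of Aik * Bkj for 4 x 4 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def composite(links):
    """Multiply the D-H matrices of successive links, base first."""
    T = dh_matrix(*links[0])
    for link in links[1:]:
        T = matmul(T, dh_matrix(*link))
    return T


# Two unit-length links, each rotated 90 degrees about z.
T = composite([(90.0, 0.0, 1.0, 0.0), (90.0, 0.0, 1.0, 0.0)])
print([round(row[3], 6) for row in T])   # position column of the end effector
```

A result computed this way can serve as a cross-check when testing the assembly procedures with the same link parameters.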



Example 8-7. Robot Arm Kinematics Example 

        NAME    ROT_MATRIX_CAL 

; This example illustrates the use 
; of the Intel486(TM) floating point 
; instructions, in particular, the 
; FSINCOS function which gives both 
; the SIN and COS values. 
; The program calculates the 
; composite matrix for base to end- 
; effector transformation. 
; Only the kinematics is considered in 
; this example. 
; 
; If the composite matrix mentioned above 
; is given by: 
; 
;       T1n = A1 x A2 x ... x An 
; 
; T1n is found by successively calling 
; trans_proc and matrixmul_proc until 
; all matrices have been exhausted. 
; 
; trans_proc calculates entries in each 
; A (A1, ..., An) while matrixmul_proc 
; performs the matrix multiplication for 
; Ai and Ai+1. matrixmul_proc in turn 
; calls matrix_row and matrix_elem to 
; do the multiplication. 



; Define stack space 

trans_stack     stackseg 400 

; Define the matrix structure for 
; 4x4 transformational matrices 

a_matrix        struc 
        a11     dq      ? 
        a12     dq      ? 
        a13     dq      ? 
        a14     dq      ? 
        a21     dq      ? 
        a22     dq      ? 
        a23     dq      ? 
        a24     dq      ? 
        a31     dq      0h 
        a32     dq      ? 
        a33     dq      ? 
        a34     dq      ? 
        a41     dq      0h 
        a42     dq      0h 
        a43     dq      0h 
        a44     dq      1.0     ; 64-bit real 1.0 (integer 1h does not encode 1.0) 
a_matrix        ends 

; Assume one joint in the storage 
; allocation and hence 
; two sets of parameters; however, 
; more joints are possible. 
; The angle and length fields are declared dq because 
; trans_proc reads them as qword operands indexed by ecx*8. 

alp_deg         struc 
        alpha_deg1      dq      ? 
        alpha_deg2      dq      ? 
alp_deg         ends 

tht_deg         struc 
        theta_deg1      dq      ? 
        theta_deg2      dq      ?       ; second set, matching alp_deg 
tht_deg         ends 

A_array         struc 
        A1      dq      ? 
        A2      dq      ? 
A_array         ends 

D_array         struc 
        D1      dq      ? 
        D2      dq      ? 
D_array         ends 



; trans_data is the data segment 

trans_data      segment rw public 
        Amx             a_matrix<> 
        Bmx             a_matrix<> 
        Tmx             a_matrix<> 
        ALPHA_DEG       alp_deg<> 
        THETA_DEG       tht_deg<> 
        A_VECTOR        A_array<> 
        D_VECTOR        D_array<> 
        ZERO            dd      0 
        d180            dd      180 
        NUM_JOINT       equ     1 
        NUM_ROW         equ     4 
        NUM_COL         equ     4 
        REVERSE         db      1h 
trans_data      ends 

        assume  ds:trans_data, es:trans_data 



; Trans code contains the procedures 
; for calculating matrix elements and 
; matrix multiplications 

trans_code      segment er public 
trans_proc      proc far 

; Calculate alpha and theta in radians 
; from their values in degrees 

        fldpi 
        fidiv   d180                    ; pi/180 (d180 is a 32-bit integer) 

; Duplicate pi/180 
        fld     st(0) 
        fmul    qword ptr ALPHA_DEG[ecx*8] 
        fxch    st(1) 
        fmul    qword ptr THETA_DEG[ecx*8] 

; theta (radians) in ST and 
; alpha (radians) in ST(1) 

; Calculate matrix elements 
; a11 = cos theta 
; a12 = -cos alpha * sin theta 
; a13 = sin alpha * sin theta 
; a14 = A * cos theta 
; a21 = sin theta 
; a22 = cos alpha * cos theta 
; a23 = -sin alpha * cos theta 
; a24 = A * sin theta 
; a32 = sin alpha 
; a33 = cos alpha 
; a34 = D 
; a31 = a41 = a42 = a43 = 0.0 
; a44 = 1 

; ebx contains the offset for the matrix 

        fsincos                         ; cos theta in ST 
                                        ; sin theta in ST(1) 
        fld     st(0)                   ; duplicate cos theta 
        fst     [ebx].a11               ; cos theta in a11 
        fmul    qword ptr A_VECTOR[ecx*8] 
        fstp    [ebx].a14               ; A * cos theta in a14 
        fxch    st(1)                   ; sin theta in ST 
        fst     [ebx].a21               ; sin theta in a21 
        fld     st                      ; duplicate sin theta 
        fmul    qword ptr A_VECTOR[ecx*8] 
        fstp    [ebx].a24               ; A * sin theta in a24 
        fld     st(2)                   ; alpha in ST 
        fsincos                         ; cos alpha in ST 
                                        ; sin alpha in ST(1) 
                                        ; sin theta in ST(2) 
                                        ; cos theta in ST(3) 
        fst     [ebx].a33               ; cos alpha in a33 
        fxch    st(1)                   ; sin alpha in ST 
        fst     [ebx].a32               ; sin alpha in a32 
        fld     st(2)                   ; sin theta in ST 
                                        ; sin alpha in ST(1) 
        fmul    st, st(1)               ; sin alpha * sin theta 
        fstp    [ebx].a13               ; stored in a13 
        fmul    st, st(3)               ; cos theta * sin alpha 
        fchs                            ; -cos theta * sin alpha 
        fstp    [ebx].a23               ; stored in a23 
        fld     st(2)                   ; cos theta in ST 
                                        ; cos alpha in ST(1) 
                                        ; sin theta in ST(2) 
                                        ; cos theta in ST(3) 
        fmul    st, st(1)               ; cos theta * cos alpha 
        fstp    [ebx].a22               ; stored in a22 
        fmul    st, st(1)               ; cos alpha * sin theta 



; To take advantage of parallel operations 
; between the IU and FPU 
        push    eax                     ; save eax 

; also move D into a34 in a faster way 
        mov     eax, dword ptr D_VECTOR[ecx*8] 
        mov     dword ptr [ebx + 88], eax 
        mov     eax, dword ptr D_VECTOR[ecx*8 + 4] 
        mov     dword ptr [ebx + 92], eax 
        pop     eax                     ; restore eax 

        fchs                            ; -cos alpha * sin theta 
        fstp    [ebx].a12               ; stored in a12 
                                        ; and all nonzero elements 
                                        ; have been calculated 
        ret 

trans_proc      endp 
matrix_elem     proc far 

; This procedure calculates the dot product of the ith row 
; of the first matrix and the jth column of the second 
; matrix: 
; 
;       Tij, where Tij = sum of Aik x Bkj over k 
; 
; parameters passed from the calling routine, 
; matrix_row: 
;       ESI = (i-1)*8 
;       EDI = (j-1)*8 
; local register, EBP = (k-1)*8 

        push    ebp                     ; save ebp 
        push    ecx                     ; ecx to be used as a tmp reg 
        mov     ecx, esi                ; save it for later indexing 

; locating the element in the first matrix, A 
        imul    ecx, NUM_COL            ; ecx contains offset due 
                                        ; to preceding rows; the 
                                        ; offset is from the beginning 
                                        ; of the matrix 
        xor     ebp, ebp                ; clear ebp, which will be 
                                        ; used as a temp reg to index (k) 
                                        ; across the ith row of the first 
                                        ; matrix as well as down the jth 
                                        ; column of the second matrix 

; clear Tij for accumulating Aik*Bkj 
        mov     dword ptr [edx][edi], ebp 
        mov     dword ptr [edx][edi + 4], ebp 

        push    ecx                     ; save on stack: esi * num_col = 
                                        ; the offset of the beginning of 
                                        ; the ith row from the 
                                        ; beginning of the A matrix 
NXT_k: 
        add     ecx, ebp                ; get to the kth column entry 
                                        ; of the ith row of the A matrix 

; load Aik into FPU 
        fld     qword ptr [eax][ecx] 

; locating Bkj 
        mov     ecx, ebp                ; ecx contains the offset of the 
        imul    ecx, NUM_ROW            ; beginning of the kth row from 
                                        ; the beginning of the B matrix 
        add     ecx, edi                ; get to the jth column 
                                        ; of the kth row of the B matrix 
        fmul    qword ptr [ebx][ecx]    ; Aik * Bkj 
        pop     ecx                     ; esi * num_col in ecx again 
        push    ecx                     ; also at top of program stack 

; add to the result in the output matrix, Tij 
        add     ecx, edi 

; accumulating the sum of Aik * Bkj 
        fadd    qword ptr [edx][ecx] 
        fstp    qword ptr [edx][ecx] 

; increment k by 1, ebp by 8 
        add     ebp, 8 

; Has k reached the width of the matrix yet? 
        cmp     ebp, NUM_COL*8 
        jl      NXT_k 

; Restore registers 
        pop     ecx                     ; clear esi*num_col from stack 
        pop     ecx                     ; restore ecx 
        pop     ebp                     ; restore ebp 
        ret 

matrix_elem     endp 

matrix_row      proc far 
        xor     edi, edi 
; scan across a row 
NXT_COL: 
        call    matrix_elem 
        add     edi, 8 
        cmp     edi, NUM_COL*8 
        jl      NXT_COL 
        ret 
matrix_row      endp 

matrixmul_proc  proc far 

; This procedure does the matrix multiplication by calling 
; matrix_row to calculate entries in each row. 
; The matrix multiplication is performed in the following 
; manner, 
;       Tij = Aik x Bkj 
; where i and j denote the row and column 
; respectively and k is the index for scanning 
; across the ith row of the first matrix and 
; the jth column of the second matrix. 

        mov     ebp, esp                        ; use base pointer for indexing 
        mov     edx, dword ptr [ebp+4]          ; offset Tmx in edx 
        mov     ebx, dword ptr [ebp+8]          ; offset Bmx in ebx 
        mov     eax, dword ptr [ebp+12]         ; offset Amx in eax 

; setup esi and edi 
; edi points to the column 
; esi points to the row 
        xor     esi, esi                        ; clear esi 
NXT_ROW: 
        call    matrix_row 
        add     esi, 8 
        cmp     esi, NUM_ROW*8 
        jl      NXT_ROW 
        ret     12                              ; pop off matrix pointers 

matrixmul_proc  endp 
trans_code      ends 

;****************************************** 
;               Main Program 
;****************************************** 

main_code       segment er 
START: 
        mov     esp, stackstart trans_stack 
        pushad                          ; save all registers 

; ECX denotes the number of joints where 
; number of matrices = NUM_JOINT + 1 
; Find the first matrix (from the base of the 
; system to the first joint) and call it Bmx 

        xor     ecx, ecx                ; 1st matrix 
        mov     ebx, offset Bmx 
        call    trans_proc              ; is Bmx 
        inc     ecx 

NXT_MATRIX: 
; From the 2nd matrix and on, it will be stored in Amx. 
; The result from the first matrix mult. is stored in 
; Tmx but will be accessed as Bmx in the next multiplication. 
; As a matter of fact, the roles of Bmx and Tmx alternate in 
; successive multiplications. This is achieved by reversing 
; the order of the Bmx and Tmx pointers being passed onto the 
; program stack. Thus, this is invisible to the matrix 
; multiplication procedure. 
; REVERSE serves as the indicator; 
; REVERSE = 0 means that the result is to be placed in Tmx 

        mov     ebx, offset Amx         ; find Amx 
        call    trans_proc 
        inc     ecx 
        xor     REVERSE, 1h 
        jnz     Bmx_as_Tmx 

; No reversing. Bmx as the second input 
; matrix while Tmx as the output matrix. 
        push    offset Amx 
        push    offset Bmx 
        push    offset Tmx 
        jmp     CONTINUE 

; Reversing. Tmx as the second input 
; matrix while Bmx as the output matrix. 
Bmx_as_Tmx: 
        push    offset Amx 
        push    offset Tmx              ; reversing the 
        push    offset Bmx              ; pointers passed 

CONTINUE: 
        call    matrixmul_proc 
        cmp     ecx, NUM_JOINT 
        jle     NXT_MATRIX 

; if REVERSE = 1 then the final answer 
; will be in Bmx, otherwise in Tmx. 
        popad 

main_code       ends 
        end     START, ds:trans_data, ss:trans_stack 



1. J. Denavit and R.S. Hartenberg, "A Kinematic Notation for Lower-Pair Mechanisms Based on Matrices," 
Applied Mechanics, June 1955, pp. 215-221. 

2. C.S. George Lee, "Robot Arm Kinematics, Dynamics, and Control," IEEE Computer, Dec. 1982. 






Part II 



System Programming 






CHAPTER 9 

REAL-ADDRESS MODE SYSTEM ARCHITECTURE 



The real-address mode of the Pentium processor runs programs written for the 8086, 8088, 
80186, or 80188 processors, or for the real-address mode of an Intel 286, Intel386, or Intel486 
processor. 

The architecture of the processor in this mode is almost identical to that of the 8086, 8088, 
80186, and 80188 processors. To a programmer, a 32-bit processor in real-address mode 
appears as a high-speed 8086 processor or real-mode Intel 286 processor with extensions to the 
instruction set and registers. The principal features of this architecture are defined in Chapter 3 
and Chapter 4. 

This chapter discusses certain additional topics which complete the system programmer's view 
of real-address mode: 

• Address formation. 

• Interrupt and exception handling. 

• Real-address mode exceptions. 

For information on input and output both in real-address mode and protected mode, refer to 
Chapter 15. 



9.1. ADDRESS TRANSLATION 

In real-address mode, the processor does not interpret selectors by referring to descriptors; 
instead, it forms linear addresses as an 8086 processor would. It shifts the selector left by four 
bits to form a 20-bit base address. The effective address is extended with four clear bits in the 
upper bit positions and added to the base address to create a linear address, as shown in Figure 
9-1. 



[Figure: the 16-bit segment selector, shifted left four bits, forms the 20-bit base (bits 19-0). 
The 16-bit effective address, extended with four clear upper bits, forms the offset (bits 19-0). 
Their sum is the linear address (bits 20-0).] 

Figure 9-1. 8086 Address Translation 






Because of the possibility of a carry, the resulting linear address may have as many as 21 
significant bits. An 8086 program may generate linear addresses anywhere in the range 0 to 
10FFEFH (1 megabyte plus approximately 64K bytes) of the linear address space. (Note, 
however, that on the Intel486 and Pentium processors, the A20M# signal can be used in real- 
address mode to mask address signal A20, thereby mimicking the 20-bit wrap-around behavior 
of the 8086 processor.) Because paging is not available in real-address mode, the linear address 
is used as the physical address. 
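
The address formation described above amounts to one shift and one add. The Python sketch below illustrates it; linear_address and its a20_masked parameter are hypothetical names used only for this illustration, with a20_masked modeling the effect of asserting A20M#.

```python
def linear_address(selector: int, offset: int, a20_masked: bool = False) -> int:
    """Form a real-address-mode linear address from a selector and a 16-bit offset."""
    base = (selector & 0xFFFF) << 4          # selector shifted left four bits
    linear = base + (offset & 0xFFFF)        # may carry into bit 20
    if a20_masked:
        linear &= ~(1 << 20)                 # address bit 20 forced to zero
    return linear


print(hex(linear_address(0x1234, 0x0010)))
print(hex(linear_address(0xFFFF, 0xFFFF)))
print(hex(linear_address(0xFFFF, 0xFFFF, a20_masked=True)))
```

The second call produces the highest address an 8086 program can generate, and the third shows the same reference wrapping into the first 64K bytes when bit 20 is masked.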

Unlike the 8086 and Intel 286 processors, but like the Intel386 and Intel486 processors, the 
Pentium processor can generate 32-bit effective addresses using an address override prefix; 
however, in real-address mode, the value of a 32-bit address may not exceed 65,535 without 
causing an exception. For full compatibility with Intel 286 real-address mode, pseudo- 
protection faults (interrupt 12 or 13 with no error code) occur if an effective address is 
generated outside the range 0 through 65,535. 



9.2. REGISTERS AND INSTRUCTIONS 

The register set available in real-address mode includes all the registers defined for the 8086 
processor plus the new registers introduced with the Intel386 processor and Intel387 
coprocessor: FS, GS, debug registers, control registers, test registers, and floating-point unit 
registers. New instructions which explicitly operate on the segment registers FS and GS are 
available, and the new segment-override prefixes can be used to cause instructions to use the 
FS and GS registers for address calculations. 

The instruction codes which generate invalid-opcode exceptions include instructions from 
protected mode which move or test protected-mode segment selectors and segment descriptors, 
i.e., the VERR, VERW, LAR, LSL, LTR, STR, LLDT, and SLDT instructions. Programs 
executing in real-address mode are able to take advantage of the new application-oriented 
instructions added to the architecture with the introduction of the 80186, 80188, Intel 286, 
Intel386, Intel486, and Pentium processors. 

Unlike the 8086 and Intel 286 processors, but like the Intel386 and Intel486 processors, the 
Pentium processor offers an operand-size override prefix which enables access to 32-bit 
operands. This prefix should not be used, however, if compatibility with the 8086 or Intel 286 
processors is desired. 



9.3. INTERRUPT AND EXCEPTION HANDLING 

Interrupts and exceptions in real-address mode work much as they do on an 8086 processor. 
Interrupts and exceptions call interrupt procedures through an interrupt table. The processor 
scales the interrupt or exception identifier by four to obtain an index into the interrupt table. 
The entries of the interrupt table are far pointers to the entry points of interrupt or exception 
handler procedures. When an interrupt occurs, the processor pushes the current values of the 
FLAGS, CS, and IP registers onto the stack, disables interrupts, clears the TF flag, and transfers control 
to the location specified in the interrupt table. An IRET instruction at the end of the handler 
procedure reverses these steps before returning control to the interrupted procedure. 
Exceptions do not return error codes in real-address mode. 






The primary difference in the interrupt handling of the 32-bit processors in real-address mode 
compared to the 8086 processor is that the location and size of the interrupt table depend on 
the contents of the IDTR register. Ordinarily, this fact is not apparent to programmers, 
because, after reset initialization, the IDTR register contains a base address of 0 and a limit of 
3FFH, which is compatible with the 8086 processor. However, the LIDT instruction can be 
used in real-address mode to change the base and limit values in the IDTR register. See 
Chapter 9 for details on the IDTR register, and the LIDT and SIDT instructions. If an interrupt 
occurs and its entry in the interrupt table is beyond the limit stored in the IDTR register, a 
double-fault exception is generated. 
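
The table lookup can be modeled in a few lines of Python. This sketch is an illustration, not processor documentation: lookup_handler and DoubleFault are hypothetical names, memory is modeled as a byte array, and the limit test (last byte of the 4-byte entry compared against the limit) is an assumption about how "beyond the limit" is evaluated.

```python
import struct


class DoubleFault(Exception):
    """Raised in this model when a vector's entry lies beyond the IDTR limit."""


def lookup_handler(memory, vector: int, idtr_base: int = 0, idtr_limit: int = 0x3FF):
    """Return the (CS, IP) far pointer stored in the interrupt table for a vector."""
    entry = vector * 4                           # identifier scaled by four
    if entry + 3 > idtr_limit:                   # assumed form of the limit check
        raise DoubleFault(f"vector {vector} is beyond the table limit")
    # Each entry is a far pointer: 16-bit offset first, then 16-bit segment.
    ip, cs = struct.unpack_from("<HH", memory, idtr_base + entry)
    return cs, ip


memory = bytearray(1024)
struct.pack_into("<HH", memory, 0x21 * 4, 0x1234, 0xF000)   # install a handler for 21H
print([hex(v) for v in lookup_handler(memory, 0x21)])
```

With the reset values (base 0, limit 3FFH) the model covers all 256 vectors, which matches the 8086-compatible layout described above.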

9.4. REAL-ADDRESS MODE EXCEPTIONS 

The processor reports some exceptions differently when executing in real-address mode than 
when executing in protected mode. Table 9-1 details the real-address-mode exceptions. 







Table 9-1. Exceptions and Interrupts 

                                                                                       Does the Return Address 
                                                                                       Point to the Instruction 
Description                          Vector   Source of the Exception                  Which Caused the Exception? 
----------------------------------   ------   --------------------------------------   --------------------------- 
Divide Error                         0        DIV and IDIV instructions                yes 
Debug                                1        Any                                      see note 1 
NMI                                  2        Nonmaskable Interrupt                    yes 
Breakpoint                           3        INT instruction                          no 
Overflow                             4        INTO instruction                         no 
Bounds Check                         5        BOUND instruction                        yes 
Invalid Opcode                       6        Reserved opcodes and improper            yes 
                                              use of LOCK prefix 
Device not available                 7        ESC or WAIT instructions                 yes 
Double Fault                         8        Interrupt table limit too small, fault   yes 
                                              occurring while handling another 
                                              fault 
Reserved                             9 
Invalid Task State Segment (note 3)  10       JMP, CALL, IRET instructions,            yes 
                                              interrupts and exceptions 
Segment not present (note 3)         11       Any instruction which changes            yes 
                                              segments 
Stack Exception                      12       Stack operation crosses address          yes 
                                              limit (beyond offset FFFFH) 
CS, DS, ES, FS, GS                   13       Word memory reference beyond             yes 
Segment Overrun                               offset FFFFH. An attempt to 
                                              execute past the end of CS 
                                              segment. 
Page Fault (note 3)                  14       Any instruction that references          yes 
                                              memory 
Reserved                             15 
Floating-Point Error                 16       ESC or WAIT instructions                 yes (note 2) 
Alignment Check (note 3)             17       Any data reference                       no 
Intel Reserved                       18-31 
Software Interrupt                   0-255    INT n instructions                       no 
Maskable Interrupt                   32-255                                            yes 


NOTES: 

1. Some debug exceptions point to the faulting instruction, others point to the following instruction. The 
exception handler can test the DR6 register to determine which has occurred. 

2. Floating-point errors are reported on the first ESC or WAIT instruction after the ESC instruction which 
generated the error. 

3. Exceptions 10, 11, 14 and 17 do not occur in Real Mode, but are possible in virtual 8086 mode. 




CHAPTER 10 

PROTECTED MODE SYSTEM ARCHITECTURE 

OVERVIEW 



Many of the architectural features of the processor are used only by system programmers. This 
chapter presents an overview of these features. Application programmers may need to read this 
chapter, and the following chapters which describe the use of these features, in order to 
understand the hardware facilities used by system programmers to create a reliable and secure 
environment for application programs. The system-level architecture also supports powerful 
debugging features which application programmers may wish to use during program 
development. 

The system-level features of the architecture include: 

• Memory Management 

• Protection 

• Multitasking 

• Exceptions and Interrupts 

• Input/Output 

• Initialization and Mode Switching 

• FPU Management 

• Debugging 

• Cache Management 

• Multiprocessing 

These features are supported by registers and instructions, all of which are introduced in the 
following sections. The purpose of this chapter is not to explain each feature in detail, but 
rather to place the remaining chapters about protected mode and systems programming in 
perspective. When a register or instruction is mentioned, it is accompanied by an explanation 
or a reference to a following chapter. 

10.1. SYSTEM REGISTERS 

The registers intended for use by system programmers fall into these categories: 

• EFLAGS Register 

• Memory-Management Registers 

• Control Registers 

• Debug Registers 

The system registers control the execution environment of application programs. Most systems 
restrict access to these facilities by application programs (although systems can be built where 






all programs run at the most privileged level, in which case application programs are allowed 
to modify these facilities). 



10.1.1. System Flags 

The system flags of the EFLAGS register control I/O, maskable interrupts, debugging, task 
switching, and the virtual-8086 mode. An application program should ignore these system 
flags, and should not attempt to change their state. In some systems, an attempt to change the 
state of a system flag by an application program results in an exception. These flags are shown 
in Figure 10-1. 



[Figure: layout of the EFLAGS register, bits 31 through 0, with the system flags marked:] 

ID      IDENTIFICATION FLAG 
VIP     VIRTUAL INTERRUPT PENDING 
VIF     VIRTUAL INTERRUPT FLAG 
AC      ALIGNMENT CHECK 
VM      VIRTUAL 8086 MODE 
RF      RESUME FLAG 
NT      NESTED TASK FLAG 
IOPL    I/O PRIVILEGE LEVEL 
IF      INTERRUPT ENABLE FLAG 
TF      TRAP FLAG 

BIT POSITIONS SHOWN AS 0 OR 1 ARE INTEL RESERVED. DO NOT USE. 
ALWAYS SET THEM TO THE VALUE PREVIOUSLY READ. 

Figure 10-1. System Flags 



ID (Identification Flag, bit 21) 

The ability of a program to set and clear the ID flag indicates that the processor supports the 
CPUID instruction. Refer to Chapter 25 for details about CPUID. 

VIP (Virtual Interrupt Pending Flag, bit 20) 

The VIP flag, together with the VIF flag, enables each application program in a multitasking 
environment to have a virtualized version of the system's IF flag. For more on the use of these 
flags in virtual-8086 mode and in protected mode, refer to Appendix H. 



VIF (Virtual Interrupt Flag, bit 19) 

The VIF is a virtual image of IF (the interrupt flag) used with VIP. 






AC (Alignment Check Mode, bit 18) 

Setting the AC flag and the AM bit in the CR0 register enables alignment checking on memory 
references. An alignment-check exception is generated when reference is made to an unaligned 
operand, such as a word at an odd byte address or a double word at an address which is not an 
integral multiple of four. Alignment-check exceptions are generated only in user mode 
(privilege level 3). Memory references which default to privilege level 0, such as segment 
descriptor loads, do not generate this exception even when caused by a memory reference in 
user-mode. 

The alignment-check exception can be used to check alignment of data. This is useful when 
exchanging data with other processors, such as the i860™ microprocessor, which require all 
data to be aligned. The alignment-check exception can also be used by interpreters to flag 
some pointers as special by misaligning the pointer. This eliminates overhead of checking each 
pointer and only handles the special pointer when used. 
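
The conditions for an alignment-check exception described above can be modeled in a few lines of C. This is a minimal sketch rather than a description of the hardware: the function names are hypothetical, and the operand size is assumed to be a power of two (2 for a word, 4 for a doubleword).

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_AM    (1u << 18)  /* Alignment Mask, bit 18 of CR0 */
#define EFLAGS_AC (1u << 18)  /* Alignment Check, bit 18 of EFLAGS */

/* True when addr is not an integral multiple of size.
 * Assumes size is a power of two (2 = word, 4 = doubleword). */
bool is_unaligned(uint32_t addr, uint32_t size)
{
    return (addr & (size - 1u)) != 0;
}

/* Models the rule in the text: the exception is generated only when
 * AM is set, AC is set, the reference is made at CPL 3, and the
 * operand is unaligned. */
bool alignment_check_faults(uint32_t cr0, uint32_t eflags, unsigned cpl,
                            uint32_t addr, uint32_t size)
{
    bool checking = (cr0 & CR0_AM) && (eflags & EFLAGS_AC) && cpl == 3;
    return checking && is_unaligned(addr, size);
}
```

An interpreter that tags special pointers by misaligning them would rely on the second function returning true only for the tagged pointers, so ordinary aligned references pay no checking cost.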

VM (Virtual-8086 Mode, bit 17)

Setting the VM flag places the processor in virtual-8086 mode, which is an emulation of the 
programming environment of an 8086 processor. See Chapter 22 for more information. 

RF (Resume Flag, bit 16) 

The RF flag temporarily disables debug faults so that an instruction can be restarted after a 
debug fault without immediately causing another debug fault. The debugger sets this flag with 
the IRETD instruction when returning to the interrupted program. The RF flag is not affected 
by the POPF, POPFD or IRET instructions. See Chapter 14 and Chapter 17 for details. 

NT (Nested Task, bit 14) 

The processor sets and tests the nested task flag to control chaining of interrupted and called 
tasks. The NT flag affects the operation of the IRET instruction. The NT flag is affected by the 
POPF, POPFD, and IRET instructions. Improper changes to the state of this flag can generate 
unexpected exceptions in application programs. See Chapter 13 and Chapter 14 for more 
information on nested tasks. 

IOPL (I/O Privilege Level, bits 12 and 13) 

The I/O privilege level is used by the protection mechanism to control access to the I/O 
address space. The privilege level of the code segment currently executing (CPL) and the 
IOPL determine whether this field can be modified by the POPF, POPFD, and IRET 
instructions. See Chapter 15 for more information. 

IF (Interrupt-Enable Flag, bit 9) 

Setting the IF flag puts the processor in a mode in which it responds to maskable interrupt 
requests (INTR interrupts). Clearing the IF flag disables these interrupts. The IF flag has no 
effect on either exceptions or nonmaskable interrupts (NMI interrupts). The CPL and IOPL 
determine whether this field can be modified by the CLI, STI, POPF, POPFD, and IRET 
instructions. See Chapter 14 for more details about interrupts. 
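
As an illustration of how CPL and IOPL interact, the sketch below extracts the IOPL field and applies the usual protected-mode rule that CLI and STI are permitted when CPL is numerically less than or equal to IOPL. That rule is stated here as an assumption taken from the general protection model (see Chapter 15); the refinements for virtual-8086 mode and for the CR4 VME and PVI bits are deliberately ignored, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_IF         (1u << 9)   /* Interrupt-Enable Flag, bit 9 */
#define EFLAGS_IOPL_SHIFT 12          /* IOPL occupies bits 12 and 13 */
#define EFLAGS_IOPL_MASK  (3u << EFLAGS_IOPL_SHIFT)

/* Returns the I/O privilege level (0..3) held in EFLAGS. */
unsigned eflags_iopl(uint32_t eflags)
{
    return (eflags & EFLAGS_IOPL_MASK) >> EFLAGS_IOPL_SHIFT;
}

/* Assumed rule: CLI and STI may change IF only when CPL <= IOPL. */
bool may_change_if(uint32_t eflags, unsigned cpl)
{
    return cpl <= eflags_iopl(eflags);
}

/* True when the processor responds to maskable (INTR) interrupts. */
bool maskable_interrupts_enabled(uint32_t eflags)
{
    return (eflags & EFLAGS_IF) != 0;
}
```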




TF (Trap Flag, bit 8) 

Setting the TF flag puts the processor into single-step mode for debugging. In this mode, the 
processor generates a debug exception after each instruction, which allows a program to be 
inspected as it executes each instruction. Single-stepping is just one of several debugging 
features of the processor. If an application program sets the TF flag using the POPF, POPFD, 
or IRET instructions, a debug exception is generated. See Chapter 14 and Chapter 17 for more 
information. 
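
The bit positions listed above can be collected into a small decoder. The following C sketch is illustrative only; the structure and function names are hypothetical, and the bit numbers are those given in the flag descriptions. The update helper changes only the requested bits, so reserved positions keep the value previously read.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the system flags; field names are illustrative. */
struct system_flags {
    bool id, vip, vif, ac, vm, rf, nt, intr_enable, trap;
    unsigned iopl;                    /* I/O privilege level, 0..3 */
};

static bool bit(uint32_t value, unsigned n)
{
    return ((value >> n) & 1u) != 0;
}

struct system_flags decode_system_flags(uint32_t eflags)
{
    struct system_flags f;
    f.id          = bit(eflags, 21);
    f.vip         = bit(eflags, 20);
    f.vif         = bit(eflags, 19);
    f.ac          = bit(eflags, 18);
    f.vm          = bit(eflags, 17);
    f.rf          = bit(eflags, 16);
    f.nt          = bit(eflags, 14);
    f.iopl        = (eflags >> 12) & 3u;   /* bits 12 and 13 */
    f.intr_enable = bit(eflags, 9);
    f.trap        = bit(eflags, 8);
    return f;
}

/* Sets or clears only the bits in mask; every other bit, including
 * the reserved ones, keeps the value previously read. */
uint32_t update_flags(uint32_t eflags, uint32_t mask, bool set)
{
    return set ? (eflags | mask) : (eflags & ~mask);
}
```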



10.1.2. Memory-Management Registers

Four registers of the processor specify the locations of the data structures which control 
segmented memory management, as shown in Figure 10-2. Special instructions are provided 
for loading and storing these registers. The GDTR and IDTR registers can be loaded with 
instructions which get a six-byte block of data from memory. The LDTR and TR registers can 
be loaded with instructions which take a 16-bit segment selector as an operand. The remaining 
bytes of these registers are then loaded automatically by the processor from the descriptor 
referenced by the operand. 
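
The six-byte block used to load GDTR and IDTR can be pictured as a 48-bit value with the 16-bit limit in bits 15 to 0 and the 32-bit linear base address in bits 47 to 16, stored in little-endian byte order. The C sketch below builds and takes apart such an image; the function names are hypothetical and the code only models the memory layout, it does not execute LGDT or LIDT.

```c
#include <stdint.h>

/* Builds the 48-bit image: base in bits 47..16, limit in bits 15..0. */
uint64_t table_register_image(uint16_t limit, uint32_t base)
{
    return ((uint64_t)base << 16) | limit;
}

/* Returns byte `index` (0..5) of the image as it would lie in memory:
 * bytes 0-1 hold the limit, bytes 2-5 hold the base, low byte first. */
uint8_t image_byte(uint64_t image, unsigned index)
{
    return (uint8_t)(image >> (8u * index));
}

uint16_t image_limit(uint64_t image)
{
    return (uint16_t)(image & 0xFFFFu);
}

uint32_t image_base(uint64_t image)
{
    return (uint32_t)(image >> 16);
}
```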



SYSTEM ADDRESS REGISTERS (GDTR, IDTR)
    BITS 47-16: 32-BIT LINEAR BASE ADDRESS
    BITS 15-0:  LIMIT

SYSTEM SEGMENT REGISTERS (TR, LDTR)
    BITS 15-0:  SELECTOR
    DESCRIPTOR REGISTERS (AUTOMATICALLY LOADED):
    32-BIT LINEAR BASE ADDRESS, 32-BIT SEGMENT LIMIT, ATTRIBUTES

Figure 10-2. Memory Management Registers 



Most systems protect the instructions which load memory-management registers from use by 
application programs (although a system in which no protection is used is possible). 

GDTR Global Descriptor Table Register 

This register holds the 32-bit base address and 16-bit segment limit for the global descriptor 
table (GDT). When a reference is made to data in memory, a segment selector is used to find a 
segment descriptor in the GDT or LDT. A segment descriptor contains the base address for a 
segment. See Chapter 11 for an explanation of segmentation.






LDTR Local Descriptor Table Register 

This register holds the 32-bit base address, 32-bit segment limit, descriptor attributes, and
16-bit segment selector for the local descriptor table (LDT). The segment which contains the LDT
has a segment descriptor in the GDT. There is no segment selector for the GDT. When a
reference is made to data in memory, a segment selector is used to find a segment descriptor in
the GDT or LDT. A segment descriptor contains the base address for a segment. See
Chapter 11 for an explanation of segmentation.

IDTR Interrupt Descriptor Table Register 

This register holds the 32-bit base address and 16-bit segment limit for the interrupt descriptor 
table (IDT). When an interrupt occurs, the interrupt vector is used as an index to get a gate 
descriptor from this table. The gate descriptor contains a pointer used to start up the interrupt 
handler. See Chapter 14 for details of the interrupt mechanism. 

TR Task Register 

This register holds the 32-bit base address, 32-bit segment limit, descriptor attributes, and
16-bit segment selector for the task currently being executed. It references a task state segment
(TSS) descriptor in the global descriptor table. See Chapter 13 for a description of the
multitasking features of the processor.



10.1.3. Control Registers

Figure 10-3 shows the format of the control registers CR0, CR1, CR2, CR3, and CR4. Most
systems prevent application programs from loading the control registers (although an
unprotected system would allow this). Application programs can read these registers; for
example, reading CR0 to determine if a numerics coprocessor is present. Forms of the MOV
instruction allow these registers to be loaded from or stored in general registers. For example:

MOV EAX, CR0
MOV CR3, EBX

Refer to Chapter 16 for a list of the initial values of all these registers. 






CR4
CR3   PAGE DIRECTORY BASE
CR2   PAGE FAULT LINEAR ADDRESS
CR1
CR0



Figure 10-3. Control Registers 



The CR0 register contains system control flags, which control modes or indicate states which
apply generally to the processor, rather than to the execution of an individual task. A program
should not attempt to change any of the reserved bit positions. Reserved bits should always be
set to the value previously read.

PG (Paging, bit 31 of CR0)

This bit enables paging when set and disables paging when clear. See Chapter 11 for more
information about paging. See Chapter 16 for information on how to enable paging.

When an exception is generated during paging, the CR2 register has the 32-bit linear address 
which caused the exception. See Chapter 14 for more information about handling exceptions 
generated during paging (page faults). 

When paging is used, the CR3 register has the 20 most-significant bits of the address of the 
page directory (the first-level page table). The CR3 register is also known as the page-directory 
base register (PDBR). Note that the page directory must be aligned to a page boundary, so the 
low 12 bits of the register are not used as address bits. Unlike the Intel386 DX processor, the 
Intel486 and Pentium processors assign functions to two of these bits. These are: 

• PCD (Page-Level Cache Disable, bit 4 of CR3) 

The state of this bit is driven on the PCD pin during bus cycles which are not paged, such 
as interrupt acknowledge cycles, when paging is enabled. It is driven during all bus cycles 
when paging is not enabled. The PCD pin is used to control caching in an external cache 
on a cycle-by-cycle basis. 

• PWT (Page-Level Writes Transparent, bit 3 of CR3) 

The state of this bit is driven on the PWT pin during bus cycles which are not paged, such 
as interrupt acknowledge cycles, when paging is enabled. It is driven during all bus cycles
when paging is not enabled. The PWT pin is used to control write-through in an external
cache on a cycle-by-cycle basis. 
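
The layout of CR3 described above (page directory base in the upper 20 bits, PCD in bit 4, PWT in bit 3) can be sketched in C as follows. The function names are hypothetical, and the remaining low-order bits are simply left at zero in this model.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR3_PWT      (1u << 3)      /* Page-Level Writes Transparent */
#define CR3_PCD      (1u << 4)      /* Page-Level Cache Disable */
#define CR3_PDB_MASK 0xFFFFF000u    /* 20 most-significant bits */

/* A page directory must start on a 4-kilobyte page boundary. */
bool is_page_aligned(uint32_t addr)
{
    return (addr & ~CR3_PDB_MASK) == 0;
}

/* Composes a CR3 value. The caller is expected to pass a page-aligned
 * physical address; any low-order bits are masked off here. */
uint32_t make_cr3(uint32_t page_dir_phys, bool pcd, bool pwt)
{
    return (page_dir_phys & CR3_PDB_MASK)
         | (pcd ? CR3_PCD : 0u)
         | (pwt ? CR3_PWT : 0u);
}

/* Recovers the physical address of the page directory from CR3. */
uint32_t cr3_page_directory_base(uint32_t cr3)
{
    return cr3 & CR3_PDB_MASK;
}
```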

CD (Cache Disable, bit 30 of CR0)

This bit enables the internal cache fill mechanism when clear and disables it when set. Cache
misses do not cause cache line fills when the bit is set. Note that cache hits are not disabled; to
completely disable the cache, the cache must be invalidated. See Chapter 18 for information on
caching.

NW (Not Write-through, bit 29 of CR0)

This bit enables write-throughs and cache invalidation cycles when clear and disables 
invalidation cycles and write-throughs which hit in the cache when set. See Chapter 18 for 
information on caching. 

AM (Alignment Mask, bit 18 of CR0)

This bit allows alignment checking when set and disables alignment checking when clear. 
Alignment checking is performed only when the AM bit is set, the AC flag is set, and the CPL 
is 3 (user mode). 

WP (Write Protect, bit 16 of CR0)

When set, this bit write-protects user-level pages against supervisor-level writes. When this bit
is clear, read-only user-level pages can be written by a supervisor process. This feature is
useful for implementing the copy-on-write method of creating a new process (forking) used by
some operating systems, such as UNIX.

NE (Numeric Error, bit 5 of CR0)

This bit enables the standard mechanism for reporting floating-point numeric errors when set. 
When NE is clear and the IGNNE# input is active, numeric errors are ignored. When the NE 
bit is clear and the IGNNE# input is inactive, a numeric error causes the processor to stop and 
wait for an interrupt. The interrupt is generated by using the FERR# pin to drive an input to the 
interrupt controller (the FERR# pin emulates the ERROR# pin of the Intel287 and Intel387 DX 
math coprocessors). The NE bit, IGNNE# pin, and FERR# pin are used with external logic to 
implement PC-style error reporting. 
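
The three cases described for the NE bit and the IGNNE# input reduce to a small decision function. The C sketch below is a model of that description only; the enumeration and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_NE (1u << 5)   /* Numeric Error, bit 5 of CR0 */

enum fp_error_action {
    FP_REPORT_STANDARD,    /* NE set: standard reporting mechanism */
    FP_IGNORE,             /* NE clear, IGNNE# active: error ignored */
    FP_WAIT_FOR_INTERRUPT  /* NE clear, IGNNE# inactive: processor stops
                              and waits for the interrupt that external
                              logic derives from FERR# */
};

enum fp_error_action numeric_error_action(uint32_t cr0, bool ignne_active)
{
    if (cr0 & CR0_NE)
        return FP_REPORT_STANDARD;
    return ignne_active ? FP_IGNORE : FP_WAIT_FOR_INTERRUPT;
}
```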

ET (Extension Type, bit 4 of CR0)

This bit is one to indicate support of Intel387 DX math coprocessor instructions (on the 
Pentium microprocessor, this bit is reserved). 

TS (Task Switched, bit 3 of CR0)

The processor sets the TS bit with every task switch and tests it when interpreting
floating-point arithmetic instructions. This bit allows delaying save/restore of numeric context until the
numeric data is actually used. The CLTS instruction clears this bit.






EM (Emulation, bit 2 of CR0)

When the EM bit is set, execution of a numeric instruction generates the
coprocessor-not-available exception. The EM bit must be set when the processor does not have a
floating-point unit.

MP (Monitor coprocessor, bit 1 of CR0)

On the Intel 286 and Intel386 DX processors, the MP bit controls the function of the WAIT 
instruction, which is used to synchronize with a coprocessor. When running Intel 286 and 
Intel386 DX CPU programs on processors with the Intel486 and Pentium FPUs, this bit should 
be set. The MP bit should be reset in the Intel486 SX CPU. 
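
The way the TS, EM and MP bits combine can be sketched as follows. The text above states the EM rule directly; the rules that a numeric (ESC) instruction also raises the coprocessor-not-available exception when TS is set, and that WAIT does so only when both MP and TS are set, are stated here as assumptions drawn from the instruction descriptions rather than from this section. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_MP (1u << 1)   /* Monitor coprocessor */
#define CR0_EM (1u << 2)   /* Emulation */
#define CR0_TS (1u << 3)   /* Task Switched */

/* A numeric (ESC) instruction is assumed to raise the
 * coprocessor-not-available exception when EM or TS is set. */
bool esc_raises_not_available(uint32_t cr0)
{
    return (cr0 & (CR0_EM | CR0_TS)) != 0;
}

/* WAIT is assumed to raise it only when both MP and TS are set. */
bool wait_raises_not_available(uint32_t cr0)
{
    return (cr0 & CR0_MP) && (cr0 & CR0_TS);
}

/* Models CLTS: clears TS and leaves every other bit unchanged. */
uint32_t clts(uint32_t cr0)
{
    return cr0 & ~CR0_TS;
}
```

An operating system that defers saving the numeric context would let the first numeric instruction after a task switch raise the exception, save and restore the context in the handler, and then clear TS.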

PE (Protection Enable, bit 0 of CR0)

Setting the PE bit enables segment-level protection. See Chapter 12 for more information 
about protection. See Chapter 16 for information on how to enable paging. 
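
For reference, the CR0 bit positions described above can be gathered into a set of masks, together with an update helper that follows the rule that reserved bits keep the value previously read. This is a sketch with hypothetical names; it models the register value only and does not access CR0.

```c
#include <stdint.h>

#define CR0_PE (1u << 0)    /* Protection Enable */
#define CR0_MP (1u << 1)    /* Monitor coprocessor */
#define CR0_EM (1u << 2)    /* Emulation */
#define CR0_TS (1u << 3)    /* Task Switched */
#define CR0_ET (1u << 4)    /* Extension Type */
#define CR0_NE (1u << 5)    /* Numeric Error */
#define CR0_WP (1u << 16)   /* Write Protect */
#define CR0_AM (1u << 18)   /* Alignment Mask */
#define CR0_NW (1u << 29)   /* Not Write-through */
#define CR0_CD (1u << 30)   /* Cache Disable */
#define CR0_PG (1u << 31)   /* Paging */

#define CR0_DEFINED (CR0_PE | CR0_MP | CR0_EM | CR0_TS | CR0_ET | CR0_NE | \
                     CR0_WP | CR0_AM | CR0_NW | CR0_CD | CR0_PG)

/* Returns the value to write back: requested defined bits are set or
 * cleared, and every reserved bit keeps the value found in `old`. */
uint32_t cr0_update(uint32_t old, uint32_t set, uint32_t clear)
{
    set   &= CR0_DEFINED;
    clear &= CR0_DEFINED;
    return (old & ~clear) | set;
}
```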

The CR4 register contains bits that enable certain architectural extensions. This register is new 
with the Pentium microprocessor. 

VME (Virtual-8086 Mode Extensions, bit 0 of CR4)

Setting this bit to 1 enables support for a virtual interrupt flag in virtual-8086 mode. This 
feature can improve the performance of virtual-8086 applications by eliminating the overhead 
of faulting to a virtual-8086 monitor for emulation of certain operations. Refer to Appendix H 
for more information on this feature. 

PVI (Protected-Mode Virtual Interrupts, bit 1 of CR4) 

Setting this bit to 1 enables support for a virtual interrupt flag in protected mode. This feature
can enable some programs designed for execution at privilege level 0 to execute at privilege
level 3. Refer to Appendix H for more information on this feature.

TSD (Time Stamp Disable, bit 2 of CR4) 

Setting this bit to 1 makes RDTSC (read from time stamp counter) a privileged instruction. 
Refer to Chapter 25 for details on the RDTSC instruction. 

DE (Debugging Extensions, bit 3 of CR4) 

Setting this bit to 1 enables I/O breakpoints. Refer to Chapter 17 for more information on 
debugging. 

PSE (Page Size Extensions, bit 4 of CR4) 

Setting this bit to 1 enables four-megabyte pages. Refer to Appendix H for information about 
this feature. 






MCE (Machine Check Enable, bit 6 of CR4) 

Setting this bit to 1 enables the machine check exception. 
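
The CR4 bits listed above can be written as masks in the same way. The sketch below also models the effect of TSD, taking "privileged" to mean that RDTSC is permitted only at privilege level 0 once the bit is set; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_VME (1u << 0)   /* Virtual-8086 Mode Extensions */
#define CR4_PVI (1u << 1)   /* Protected-Mode Virtual Interrupts */
#define CR4_TSD (1u << 2)   /* Time Stamp Disable */
#define CR4_DE  (1u << 3)   /* Debugging Extensions */
#define CR4_PSE (1u << 4)   /* Page Size Extensions */
#define CR4_MCE (1u << 6)   /* Machine Check Enable */

/* With TSD clear, RDTSC may be executed at any privilege level.
 * With TSD set, it is allowed only at CPL 0. */
bool rdtsc_allowed(uint32_t cr4, unsigned cpl)
{
    return !(cr4 & CR4_TSD) || cpl == 0;
}
```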



10.1.4. Debug Registers

The debug registers bring advanced debugging abilities to the processor, including data 
breakpoints and the ability to set instruction breakpoints without modifying code segments 
(useful in debugging ROM-based software). Only programs executing at the highest privilege 
level can access these registers. See Chapter 17 for a complete description of their formats and 
use. The debug registers are shown in Figure 10-4. 



DR7   LEN AND R/W FIELDS FOR BREAKPOINTS 3, 2, 1 AND 0
DR6
DR5   RESERVED
DR4   RESERVED
DR3   BREAKPOINT 3 LINEAR ADDRESS
DR2   BREAKPOINT 2 LINEAR ADDRESS
DR1   BREAKPOINT 1 LINEAR ADDRESS
DR0   BREAKPOINT 0 LINEAR ADDRESS

RESERVED BITS. DO NOT DEFINE.



Figure 10-4. Debug Registers 






10.2. SYSTEM INSTRUCTIONS 

System instructions deal with functions such as: 

1. Verification of pointer parameters (see Chapter 12):

   Instruction  Description                         Useful to     Protected from
                                                    Application?  Application?
   ARPL         Adjust RPL                          No            No
   LAR          Load Access Rights                  Yes           No
   LSL          Load Segment Limit                  Yes           No
   VERR         Verify for Reading                  Yes           No
   VERW         Verify for Writing                  Yes           No


2. Addressing descriptor tables (see Chapter 11):

   Instruction  Description                         Useful to     Protected from
                                                    Application?  Application?
   LLDT         Load LDT Register                   No            Yes
   SLDT         Store LDT Register                  Yes           No
   LGDT         Load GDT Register                   No            Yes
   SGDT         Store GDT Register                  No            No


3. Multitasking (see Chapter 13):

   Instruction  Description                         Useful to     Protected from
                                                    Application?  Application?
   LTR          Load Task Register                  No            Yes
   STR          Store Task Register                 Yes           No


4. Floating-Point Numerics (see Chapter 6):

   Instruction  Description                         Useful to     Protected from
                                                    Application?  Application?
   CLTS         Clear TS bit in CR0                 No            Yes
   ESC          Escape Instructions                 Yes           No
   WAIT         Wait Until Coprocessor Not Busy     Yes           No






5. Input and Output (see Chapter 15):

   Instruction  Description                         Useful to     Protected from
                                                    Application?  Application?
   IN           Input                               Yes           Can be
   OUT          Output                              Yes           Can be
   INS          Input String                        Yes           Can be
   OUTS         Output String                       Yes           Can be


6. Interrupt control (see Chapter 14):

   Instruction  Description                         Useful to     Protected from
                                                    Application?  Application?
   CLI          Clear IF flag                       Can be        Can be
   STI          Set IF flag                         Can be        Can be
   LIDT         Load IDT Register                   No            Yes
   SIDT         Store IDT Register                  No            No


7. Debugging (see Chapter 17):

   Instruction  Description                         Useful to     Protected from
                                                    Application?  Application?
   MOV          Load and store debug registers      No            Yes


8. Cache Management (see Chapter 18):

   Instruction  Description                         Useful to     Protected from
                                                    Application?  Application?
   INVD         Invalidate cache, no write-back     No            Yes
   WBINVD       Invalidate cache, with write-back   No            Yes
   INVLPG       Invalidate TLB entry                No            Yes




9. System Control:

   Instruction  Description                         Useful to     Protected from
                                                    Application?  Application?
   SMSW         Store MSW                           Yes           No
   LMSW         Load MSW                            No            Yes
   MOV          Load And Store Control Register     No            Yes
   HLT          Halt Processor                      No            Yes
   LOCK         Bus Lock                            No            No
   RSM          Return from system management mode  No            Yes



The SMSW and LMSW instructions are provided for compatibility with the 16-bit 
Intel 286 processor. Programs for 32-bit processors such as the Pentium microprocessor 
should not use these instructions. Instead, they should access the Control Registers using 
forms of the MOV instruction. The LMSW instruction does not affect the PG, CD, NW, 
AM, WP, NE or ET bits, and it cannot be used to clear the PE bit. 

The HLT instruction stops the processor until an enabled interrupt or RESET signal is 
received. (Note that the NMI and SMI interrupts are always enabled.) A special bus cycle 
is generated by the processor to indicate halt mode has been entered. Hardware may 
respond to this signal in a number of ways. An indicator light on the front panel may be 
turned on. An NMI interrupt for recording diagnostic information may be generated. Reset 
initialization may be invoked. Software designers may need to be aware of the response of 
hardware to halt mode. 

The LOCK instruction prefix is used to invoke a locked (atomic) read-modify-write 
operation when modifying a memory operand. The LOCK# signal is asserted and the 
processor does not respond to requests for bus control during a locked operation. This 
mechanism is used to allow reliable communications between processors in 
multiprocessor systems. 

In addition to the chapters mentioned above, detailed information about each of these 
instructions can be found in the instruction reference chapter, Chapter 25. 
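
The restrictions on LMSW noted above can be illustrated with a small model in C. It assumes that the machine status word corresponds to the low-order 16 bits of CR0 and that LMSW loads only the PE, MP, EM and TS bits (bits 0 to 3), which is consistent with the list of bits it is said not to affect. The function names are hypothetical, and the code models register values only.

```c
#include <stdint.h>

#define CR0_PE   (1u << 0)
#define MSW_LOAD 0x0000000Fu   /* PE, MP, EM, TS: assumed LMSW scope */

/* Models LMSW: bits 0..3 are taken from the operand, except that a
 * set PE bit cannot be cleared; all higher bits of CR0 are kept. */
uint32_t lmsw(uint32_t cr0, uint16_t msw)
{
    uint32_t low = msw & MSW_LOAD;
    low |= cr0 & CR0_PE;               /* PE, once set, stays set */
    return (cr0 & ~MSW_LOAD) | low;
}

/* Models SMSW: stores the low-order 16 bits of CR0. */
uint16_t smsw(uint32_t cr0)
{
    return (uint16_t)(cr0 & 0xFFFFu);
}
```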





CHAPTER 11

PROTECTED MODE MEMORY MANAGEMENT 



Memory management is a hardware mechanism which lets operating systems create simplified 
environments for running programs. For example, when several programs are running at the 
same time, they must each be given an independent address space. If they all had to share the 
same address space, each would have to perform difficult and time-consuming checks to avoid 
interfering with the others. 

Memory management consists of segmentation and paging. Segmentation is used to give each 
program several independent, protected address spaces. Paging is used to support an 
environment where large address spaces are simulated using a small amount of RAM and some 
disk storage. System designers can choose to use either or both of these mechanisms. When 
several programs are running at the same time, either mechanism can be used to protect 
programs against interference from other programs. 

Segmentation allows memory to be completely unstructured and simple, like the memory 
model of an 8-bit processor, or highly structured with address translation and protection. The 
memory management features apply to units called segments. Each segment is an independent, 
protected address space. Access to a segment is controlled by data which describes its size, the
privilege level required to access it, the kinds of memory references which can be made to it 
(instruction fetch, stack push or pop, read operation, write operation, etc.), and whether it is 
present in memory. 

Segmentation is used to control memory access, which is useful for catching bugs during 
program development and for increasing the reliability of the final product. It also is used to 
simplify the linkage of object code modules. There is no reason to write position-independent 
code when full use is made of the segmentation mechanism, because all memory references 
can be made relative to the base addresses of a module's code and data segments. Segmentation 
can be used to create ROM-based software modules, in which fixed addresses (fixed, in the 
sense that they cannot be changed) are offsets from a segment's base address. Different 
software systems can have the ROM modules at different physical addresses because the 
segmentation mechanism will direct all memory references to the right place. 

In a simple memory architecture, all addresses refer to the same address space. This is the 
memory model used by 8-bit microprocessors, such as the 8080 processor, where the logical 
address is the physical address. The 32-bit processors in protected mode can be used in this 
way by mapping all segments into the same address space and keeping paging disabled. This 
might be done where an older design is being updated to 32-bit technology without also 
adopting the new architectural features. 

An application also could make partial use of segmentation. A frequent cause of software 
failures is the growth of the stack into the instruction code or data of a program. Segmentation 
can be used to prevent this. The stack can be put in an address space separate from the address 
space for either code or data. Stack addresses always would refer to the memory in the stack 
segment, while data addresses always would refer to memory in the data segment. The stack 
segment would have a maximum size enforced by hardware. Any attempt to grow the stack 
beyond this size would generate an exception. 






A complex system of programs can make full use of segmentation. For example, a system in 
which programs share data in real time can have precise control of access to that data. Program 
bugs appear as exceptions generated when a program makes improper access. This is useful as 
an aid to debugging during program development, and it also can be used to trigger
error-recovery procedures in systems delivered to the end user.

Segmentation hardware translates a segmented (logical) address into an address for a 
continuous, unsegmented address space, called a linear address. If paging is enabled, paging 
hardware translates a linear address into a physical address. If paging is not enabled, the linear 
address is used as the physical address. The physical address appears on the address bus 
coming out of the processor. 

Paging is a mechanism used to simulate a large, unsegmented address space using a small, 
fragmented address space and some disk storage. Paging provides access to data structures 
larger than the available memory space by keeping them partly in memory and partly on disk. 

Paging is applied to units of 4 kilobytes called pages. When a program attempts to access a 
page which is on disk, the program is interrupted in a special way. Unlike other exceptions and 
interrupts, an exception generated due to address translation restores the contents of the 
processor registers to values which allow the exception-generating instruction to be
re-executed. This special treatment enables instruction restart; that is, it allows the operating
system to read the page from disk, update the mapping of linear addresses to physical 
addresses for that page, and restart the program. This process is transparent to the program. 
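
Because pages are 4 kilobytes in size, a linear address divides into a page number (the upper 20 bits) and an offset within the page (the lower 12 bits). The following C sketch shows that arithmetic; the function names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT 12                   /* 4 kilobytes = 2^12 bytes */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Number of the page that contains the linear address. */
uint32_t page_number(uint32_t linear)
{
    return linear >> PAGE_SHIFT;
}

/* Byte offset of the linear address within its page. */
uint32_t page_offset(uint32_t linear)
{
    return linear & (PAGE_SIZE - 1u);
}

/* Pages needed to hold `bytes` bytes, rounding up to whole pages. */
uint32_t pages_needed(uint32_t bytes)
{
    return (bytes >> PAGE_SHIFT) + ((bytes & (PAGE_SIZE - 1u)) != 0);
}
```

When a page is brought in from disk, only the mapping for its page number changes; the offset within the page is carried through translation unchanged.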

Paging is optional. If an operating system never enables the paging mechanism, linear 
addresses will be used as physical addresses. This might be done where a design using a 16-bit 
processor is being updated to use a 32-bit processor. An operating system written for a 16-bit 
processor does not use paging because the size of its address space is so small (64K bytes) that 
it is more efficient to swap entire segments between RAM and disk, rather than individual 
pages. 

Paging would be enabled for operating systems, such as UNIX, which can support
demand-paged virtual memory. Paging is transparent to application software, so an operating system
intended to support application programs written for 16-bit processors can run those programs 
with paging enabled. Unlike paging, segmentation is not transparent to application programs. 
Programs which use segmentation must be run with the segments they were designed to use. 



11.1. SELECTING A SEGMENTATION MODEL

A model for the segmentation of memory is chosen on the basis of reliability and performance. 
For example, a system which has several programs sharing data in real time would get 
maximum performance from a model which checks memory references in hardware. This 
would be a multisegment model. 

At the other extreme, a system which has just one program may get higher performance from 
an unsegmented or "flat" model. The elimination of "far" pointers and segment-override 
prefixes reduces code size and increases execution speed. Context switching is faster, because 
the contents of the segment registers no longer have to be saved or restored. 

Some of the benefits of segmentation also can be provided by paging. For example, data can 
be shared by mapping the same pages onto the address space of each program. 






11.1.1. Flat Model 

The simplest model is the flat model. In this model, all segments are mapped to the entire 
physical address space. A segment offset can refer to either code or data areas. To the greatest 
extent possible, this model removes the segmentation mechanism from the architecture seen by 
either the system designer or the application programmer. This might be done for a 
programming environment like UNIX, which supports paging but does not support 
segmentation. 

A segment is defined by a segment descriptor. At least two segment descriptors must be 
created for a flat model, one for code references and one for data references. Both descriptors 
have the same base address value. Whenever memory is accessed, the contents of one of the 
segment registers are used to select a segment descriptor. The segment descriptor provides the 
base address of the segment and its limit, as well as access control information (see Figure 
11-1). 

ROM usually is put at the top of the physical address space, because the processor begins 
execution at FFFF_FFF0H. RAM is placed at the bottom of the address space because the
initial base address for the DS data segment after reset initialization is 0. 

For a flat model, each descriptor has a base address of 0 and a segment limit of 4 gigabytes. By
setting the segment limit to 4 gigabytes, the segmentation mechanism is kept from generating 
exceptions for memory references which fall outside of a segment. Exceptions could still be 
generated by the paging or segmentation protection mechanisms, but these also can be 
removed from the memory model. 



SEGMENT REGISTERS -> CODE AND DATA SEGMENT DESCRIPTORS -> PHYSICAL MEMORY (EPROM, DRAM)



Figure 11-1. Flat Model 






11.1.2. Protected Flat Model

The protected flat model is like the flat model, except the segment limits are set to include only 
the range of addresses for which memory actually exists. A general-protection exception will 
be generated on any attempt to access unimplemented memory. This might be used for systems 
in which the paging mechanism is disabled, because it provides a minimum level of hardware 
protection against some kinds of program bugs. 

In this model, the segmentation hardware prevents programs from addressing nonexistent 
memory locations. The consequences of being allowed access to these memory locations are 
hardware-dependent. For example, if the processor does not receive a READY# signal (the 
signal used to acknowledge and terminate a bus cycle), the bus cycle does not terminate and 
program execution stops. 

Although no program should make an attempt to access these memory locations, an attempt 
may occur as a result of program bugs. Without hardware checking of addresses, it is possible 
that a bug could suddenly stop program execution. With hardware checking, programs fail in a 
controlled way. A diagnostic message can appear and recovery procedures can be attempted. 

An example of a protected flat model is shown in Figure 11-2. Here, segment descriptors have 
been set up to cover only those ranges of memory which exist. A code and a data segment 
cover the EPROM and DRAM of physical memory. The code segment base and limit can 
optionally be set to allow access to DRAM area. The data segment limit must be set to the sum 
of EPROM and DRAM sizes. If memory-mapped I/O is used, it can be addressed just beyond 
the end of DRAM area. 
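
The difference between the flat and the protected flat model comes down to the limit against which each memory reference is checked. The C sketch below models that check; the function name is hypothetical, the limit is taken to be the highest valid offset in the segment, and a false result stands for the general-protection exception.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when an access of `size` bytes starting at `offset`
 * lies entirely within a segment whose highest valid offset is
 * `limit`. The subtraction form avoids 32-bit overflow. */
bool segment_access_ok(uint32_t limit, uint32_t offset, uint32_t size)
{
    return size != 0
        && offset <= limit
        && (size - 1u) <= limit - offset;
}
```

With a limit of FFFF_FFFFH (the flat model) no offset is rejected. With the limit set to the top of the memory that actually exists, for example 003F_FFFFH for a hypothetical 4-megabyte system, a reference to unimplemented memory is caught by the processor instead of reaching the bus.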



SEGMENT REGISTERS (CS, ES, SS, DS, FS, GS) -> SEGMENT DESCRIPTORS (ACCESS, LIMIT, BASE ADDRESS) -> PHYSICAL MEMORY (EPROM, MEMORY I/O, DRAM)



Figure 11-2. Protected Flat Model 






11.1.3. Multisegment Model

The most sophisticated model is the multisegment model. Here, the full capabilities of the 
segmentation mechanism are used. Each program is given its own table of segment descriptors, 
and its own segments. The segments can be completely private to the program, or they can be 
shared with specific other programs. Access between programs and particular segments can be 
individually controlled. 

Up to six segments can be ready for immediate use. These are the segments which have 
segment selectors loaded in the segment registers. Other segments are accessed by loading 
their segment selectors into the segment registers (see Figure 11-3). 

Each segment is a separate address space. Even though they may be placed in adjacent blocks 
of physical memory, the segmentation mechanism prevents access to the contents of one 
segment by reading beyond the end of another. Every memory operation is checked against the 
limit specified for the segment it uses. An attempt to address memory beyond the end of the 
segment generates a general-protection exception. 

The segmentation mechanism only enforces the address range specified in the segment 
descriptor. It is the responsibility of the operating system to allocate separate address ranges to 
each segment. There may be situations in which it is desirable to have segments which share 
the same range of addresses. For example, a system can have both code and data stored in a 
ROM. A code segment descriptor would be used when the ROM is accessed for instruction 
fetches. A data segment descriptor would be used when the ROM is accessed as data. 






SEGMENT REGISTERS (CS, SS, DS, ES, FS, GS) -> SEGMENT DESCRIPTORS (ACCESS, LIMIT, BASE ADDRESS; ONE PER SEGMENT) -> PHYSICAL MEMORY



Figure 11-3. Multisegment Model 



11.2. SEGMENT TRANSLATION

A logical address consists of the 16-bit segment selector for its segment and a 32-bit offset into 
the segment. The logical address is checked for access rights and range. If it passes these tests, 
the logical address is translated into a linear address by adding the offset to the base address of 
the segment. The base address comes from the segment descriptor, a data structure in memory 
which provides the size and location of a segment, as well as access control information. The 
segment descriptor comes from one of two tables, the global descriptor table (GDT) or the 
local descriptor table (LDT). There is one GDT for all programs in the system and one LDT 
for each separate program being run. If the operating system allows, different programs can 
share the same LDT. The system also can be set up with no LDTs; all programs will then use 
the GDT. 

Every logical address is associated with a segment (even if the system maps all segments into 
the same linear address space). Although a program can have thousands of segments, only six 
can be available for immediate use. These are the six segments whose segment selectors are 
loaded in the processor. The segment selector holds information used to translate the logical 
address into the corresponding linear address. 






Separate segment registers exist in the processor for each kind of memory reference (code 
space, stack space, and data spaces). They hold the segment selectors for the segments 
currently in use. Access to other segments requires loading a segment register using a form of 
the MOV instruction. Up to four data spaces can be available at the same time, thus providing 
a total of six segment registers. 

When a segment selector is loaded, the base address, segment limit, and access control 
information also are loaded into the segment register. The processor does not reference the 
descriptor tables in memory again until another segment selector is loaded. The information 
saved in the processor allows it to translate addresses without making extra bus cycles. In 
systems in which multiple processors have access to the same descriptor tables, it is the 
responsibility of software to reload the segment registers when the descriptor tables are 
modified. If this is not done, an old segment descriptor cached in a segment register might be 
used after its memory-resident version has been modified.

The segment selector contains a 13-bit index into one of the descriptor tables. The index is 
scaled by eight (the number of bytes in a segment descriptor) and added to the 32-bit base 
address of the descriptor table. The base address comes from either the global descriptor table 
register (GDTR) or the local descriptor table register (LDTR). These registers hold the linear 
address of the beginning of the descriptor tables. A bit in the segment selector specifies which 
table to use, as shown in Figure 11-4.
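
The index scaling and table selection described above can be sketched in Python. This is an illustrative model only, not processor code: the function name descriptor_address and the parameters gdt_base and ldt_base (standing in for the base addresses held in the GDTR and LDTR) are assumptions made for the example.

```python
DESCRIPTOR_SIZE = 8  # bytes in one segment descriptor


def descriptor_address(selector, gdt_base, ldt_base):
    """Return the linear address of the descriptor named by a 16-bit selector.

    gdt_base and ldt_base are hypothetical stand-ins for the table base
    addresses held in the GDTR and LDTR registers.
    """
    index = (selector >> 3) & 0x1FFF   # bits 15..3: 13-bit index
    use_ldt = (selector >> 2) & 1      # bit 2: table indicator (0 = GDT, 1 = LDT)
    table_base = ldt_base if use_ldt else gdt_base
    # The index is scaled by the descriptor size and added to the table base.
    return table_base + index * DESCRIPTOR_SIZE
```

For instance, a selector of 0010H has an index of 2 and a clear table indicator, so it names the descriptor 16 bytes past the start of the GDT.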






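[Figure: the TI bit of the SEGMENT SELECTOR chooses the descriptor table. TI=0 selects the
GLOBAL DESCRIPTOR TABLE, located by the LIMIT and BASE ADDRESS in the GDTR. TI=1 selects the
LOCAL DESCRIPTOR TABLE, located by the SELECTOR, LIMIT, and BASE ADDRESS in the LDTR.]

Figure 11-4. TI Bit Selects Descriptor Table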



The translated address is the linear address, as shown in Figure 11-5. If paging is not used, it is
also the physical address. If paging is used, a second level of address translation produces the 
physical address. This translation is described in Section 11.3. 






[Figure: the 16-bit SELECTOR of the LOGICAL ADDRESS picks a SEGMENT DESCRIPTOR from the
DESCRIPTOR TABLE. The BASE ADDRESS from that descriptor is added to the 32-bit OFFSET to
form the 32-bit LINEAR ADDRESS.]



Figure 11-5. Segment Translation 



11.2.1. Segment Registers

Each kind of memory reference is associated with a segment register. Code, data, and stack 
references each access the segments specified by the contents of their segment registers. More 
segments can be made available by loading their segment selectors into these registers during 
program execution. 

Every segment register has a "visible" part and an "invisible" part, as shown in Figure 11-6. 
There are forms of the MOV instruction to load the visible part of these segment registers. The 
invisible part is loaded by the processor. 



[Figure: each segment register (CS, SS, DS, ES, FS, and GS) has a VISIBLE PART holding the
SELECTOR and an INVISIBLE PART holding the BASE ADDRESS, LIMIT, ETC.]



Figure 11-6. Segment Registers 






The operations which load these registers are instructions for application programs (described 
in Chapter 4). There are two kinds of these instructions: 

1. Direct load instructions such as the MOV, POP, LDS, LES, LSS, LGS, and LFS 
instructions. These instructions explicitly reference the segment registers. 

2. Implied load instructions such as the far pointer versions of the CALL and JMP 
instructions. These instructions change the contents of the CS register as an incidental part 
of their function. 

When one of these instructions is executed, the visible part of the segment register is loaded 
with a segment selector. The processor automatically loads the invisible part of the segment 
register with information (such as the base address) from the descriptor table. Because most 
instructions refer to segments whose selectors already have been loaded into segment registers, 
the processor can add the logical-address offset to the segment base address with no 
performance penalty. 



11.2.2. Segment Selectors

A segment selector points to the information which defines a segment, called a segment 
descriptor. A program may have more segments than the six whose segment selectors occupy 
segment registers. When this is true, the program uses forms of the MOV instruction to change 
the contents of these registers when it needs to access a new segment. 

A segment selector identifies a segment descriptor by specifying a descriptor table and a 
descriptor within that table. Segment selectors are visible to application programs as a part of a 
pointer variable, but the values of selectors are usually assigned or modified by link editors or 
linking loaders, not application programs. Figure 11-7 shows the format of a segment selector. 



  Bits 15-3   INDEX
  Bit  2      TI    TABLE INDICATOR
                    0 = GDT
                    1 = LDT
  Bits 1-0    RPL   REQUESTOR PRIVILEGE LEVEL
                    00 = MOST PRIVILEGED
                    11 = LEAST



Figure 11-7. Segment Selector 



Index: Selects one of 8192 descriptors in a descriptor table. The processor multiplies the index 
value by 8 (the number of bytes in a segment descriptor) and adds the result to the base address 
of the descriptor table (from the GDTR or LDTR register). 






Table Indicator bit: Specifies the descriptor table to use. A clear bit selects the GDT; a set bit 
selects the current LDT. 

Requestor Privilege Level: When this field of a selector contains a privilege level having a 
greater value (i.e., less privileged) than the program, it effectively overrides the program's 
privilege level for accesses that use that selector. When a program uses a less privileged 
segment selector, memory accesses take place at the lesser privilege level. This is used to 
guard against a security violation in which a less privileged program uses a more privileged 
program to access protected data. 

For example, system utilities or device drivers must run with a high level of privilege in order 
to access protected facilities such as the control registers of peripheral interfaces. But they 
must not interfere with other protected facilities, even if a request to do so is received from a 
less privileged program. If a program requested reading a sector of disk into memory occupied 
by a more privileged program, such as the operating system, the RPL can be used to generate a 
general-protection exception when the less privileged segment selector is used. This exception 
occurs even though the program using the segment selector would have a sufficient privilege 
level to perform the operation on its own. 

Because the first entry of the GDT is not used by the processor, a selector which has an index
of 0 and a table indicator of 0 (i.e., a selector which points to the first entry of the GDT) is used
as a "null selector." The processor does not generate an exception when a segment register 
(other than the CS or SS registers) is loaded with a null selector. It does, however, generate an 
exception when a segment register holding a null selector is used to access memory. This 
feature can be used to initialize unused segment registers. 
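
The selector fields and the null-selector rule described above can be sketched as follows. This is a minimal model under stated assumptions: the function names are hypothetical, and effective_privilege reflects only the rule given in the text (a numerically greater RPL overrides the program's privilege level), not the full protection checks of Chapter 12.

```python
def decode_selector(selector):
    """Split a 16-bit segment selector into its three fields."""
    return {
        "index": (selector >> 3) & 0x1FFF,  # bits 15..3
        "ti": (selector >> 2) & 1,          # bit 2: 0 = GDT, 1 = LDT
        "rpl": selector & 0x3,              # bits 1..0
    }


def is_null_selector(selector):
    """A null selector has an index of 0 and a table indicator of 0."""
    fields = decode_selector(selector)
    return fields["index"] == 0 and fields["ti"] == 0


def effective_privilege(program_level, rpl):
    """Accesses take place at the less privileged (numerically greater) level."""
    return max(program_level, rpl)
```

Note that a selector with index 0 and the table indicator set is not null; it names the first entry of the current LDT.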



11.2.3. Segment Descriptors

A segment descriptor is a data structure in memory which provides the processor with the size 
and location of a segment, as well as control and status information. Descriptors typically are 
created by compilers, linkers, loaders, or the operating system, but not application programs. 
Figure 11-8 illustrates the general descriptor format. All types of segment descriptors use a 
variation of this basic format. 






  Offset +4:
    Bits 31-24   BASE 31:24
    Bit  23      G
    Bit  22      D/B
    Bit  21      RESERVED
    Bit  20      AVL
    Bits 19-16   LIMIT 19:16
    Bit  15      P
    Bits 14-13   DPL
    Bit  12      S
    Bits 11-8    TYPE
    Bits 7-0     BASE 23:16

  Offset +0:
    Bits 31-16   BASE ADDRESS 15:00
    Bits 15-0    SEGMENT LIMIT 15:00

  AVL     AVAILABLE FOR USE BY SYSTEM SOFTWARE
  BASE    SEGMENT BASE ADDRESS
  D/B     DEFAULT OPERATION SIZE
          (0 = 16-BIT SEGMENT; 1 = 32-BIT SEGMENT)
  DPL     DESCRIPTOR PRIVILEGE LEVEL
  G       GRANULARITY
  LIMIT   SEGMENT LIMIT
  P       SEGMENT PRESENT
  S       DESCRIPTOR TYPE
          (0 = SYSTEM; 1 = APPLICATION)
  TYPE    SEGMENT TYPE



Figure 11-8. Segment Descriptors 



Base: Defines the location of the segment within the 4 gigabyte physical address space. The 
processor puts together the three base address fields to form a single 32-bit value. Segment 
base values should be aligned to 16 byte boundaries to allow programs to maximize 
performance by aligning code/data on 16 byte boundaries. 

Granularity bit: Turns on scaling of the Limit field by a factor of 4096 (2^12). When the bit is
clear, the segment limit is interpreted in units of one byte; when set, the segment limit is
interpreted in units of 4K bytes. Note that the twelve least significant bits of the address are not
tested when scaling is used. For example, a limit of 0 with the Granularity bit set results in
valid offsets from 0 to 4095. Also note that only the Limit field is affected. The base address
remains byte granular.

Limit: Defines the size of the segment. The processor puts together the two limit fields to form 
a 20-bit value. The processor interprets the segment size in one of two ways, depending on the 
setting of the Granularity bit: 

1. If the Granularity bit is clear, the segment size is from 1 byte to 1 megabyte, in increments
of 1 byte. 

2. If the Granularity bit is set, the segment size is from 4 kilobytes to 4 gigabytes, in 
increments of 4K bytes. 
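
The way the Base and Limit fields are put together, and the effect of the Granularity bit, can be sketched in Python. This is an illustration under stated assumptions: the descriptor is passed as two 32-bit values (the doubleword at offset +0 and the doubleword at offset +4), and the function name decode_descriptor and the dictionary keys are hypothetical.

```python
def decode_descriptor(low, high):
    """Decode a segment descriptor given its two doublewords.

    low  is the doubleword at offset +0, high the doubleword at offset +4.
    """
    # Three base fields: bits 15:00 in low, bits 23:16 and 31:24 in high.
    base = (low >> 16) | ((high & 0xFF) << 16) | (high & 0xFF000000)
    # Two limit fields: bits 15:00 in low, bits 19:16 in high.
    limit = (low & 0xFFFF) | (high & 0x000F0000)
    granular = (high >> 23) & 1
    if granular:
        # Limit counts 4K units; the low 12 address bits are not tested.
        effective_limit = (limit << 12) | 0xFFF
    else:
        effective_limit = limit
    return {
        "base": base,
        "limit": limit,
        "g": granular,
        "effective_limit": effective_limit,
        "type": (high >> 8) & 0xF,
        "s": (high >> 12) & 1,
        "dpl": (high >> 13) & 0x3,
        "p": (high >> 15) & 1,
    }
```

With a 20-bit limit of FFFFFH and the Granularity bit set, the effective limit is FFFF_FFFFH, that is, a 4 gigabyte segment.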






For expand-up segments, a logical address can have an offset ranging from 0 to the limit.
Other offsets generate exceptions. Expand-down segments reverse the sense of the Limit field;
they can be addressed with any offset except those from 0 to the limit (see the Type field,
below). This is done to allow segments to be created in which decreasing the value held in the
Limit field allocates new memory at the bottom of the segment's address space, rather than at 
the top. Expand-down segments are intended to hold stacks, but it is not necessary to use them. 
If a stack is going to be put in a segment which does not need to change size, it can be a 
normal data segment. 

S bit: Determines whether a given segment is a system segment or a code or data segment. If 
the S bit is set, then the segment is either a code or a data segment. If it is clear, then the 
segment is a system segment. 

D bit/B bit: In a code segment, this bit is called the D bit, and it indicates the default length for 
operands and effective addresses. If the D bit is set, then 32-bit operands and 32-bit effective 
addressing modes are assumed. If it is clear, then 16-bit operands and addressing modes are 
assumed. In a data segment, this bit is called the B bit, and it controls two aspects of stack 
operation: 

1. The size of the stack pointer register. If B = 1, pushes, pops, and calls all use the 32-bit ESP
register; if B = 0, stack operations use the 16-bit SP register.

2. The upper bound of an expand-down stack. In expand-down segments, the Limit field 
specifies the lower bound of the stack segment, while the upper bound is an address of all 
1-bits. If B = 1, the upper bound is FFFF_FFFFH; if B = 0, the upper bound is FFFFH. 

Type: The interpretation of this field depends on whether the segment descriptor is for an 
application segment or a system segment. System segments have a slightly different descriptor 
format, discussed in Chapter 12. The Type field of a memory descriptor specifies the kind of 
access which may be made to a segment, and its direction of growth (see Table 11-1). 

For data segments, the three lowest bits of the type field can be interpreted as expand-down 
(E), write enable (W), and accessed (A). For code segments, the three lowest bits of the type 
field can be interpreted as conforming (C), read enable (R), and accessed (A). 
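
The interpretation of the four Type bits for application segments can be sketched as a small decoder. The function name describe_segment_type is hypothetical, and the wording of the returned strings is chosen for the example; it follows Table 11-1 except that readable conforming code segments are labelled "Execute/Read" here for uniformity.

```python
def describe_segment_type(type_field):
    """Describe the 4-bit Type field of an application segment descriptor."""
    accessed = bool(type_field & 0x1)        # A bit (descriptor bit 8)
    if type_field & 0x8:                     # descriptor bit 11 set: code segment
        kind = "Code"
        parts = ["Execute/Read" if type_field & 0x2 else "Execute-Only"]  # R bit
        if type_field & 0x4:                 # C bit
            parts.append("conforming")
    else:                                    # descriptor bit 11 clear: data segment
        kind = "Data"
        parts = ["Read/Write" if type_field & 0x2 else "Read-Only"]       # W bit
        if type_field & 0x4:                 # E bit
            parts.append("expand-down")
    if accessed:
        parts.append("accessed")
    return kind + ": " + ", ".join(parts)
```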

Data segments can be read-only or read/write. Stack segments are data segments which must 
be read/write. Loading the SS register with a segment selector for any other type of segment 
generates a general-protection exception. If the stack segment needs to be able to change size, 
it can be an expand-down data segment. The meaning of the segment limit is reversed for an 
expand-down segment. The valid offsets in an expand-down segment are those which generate 
exceptions in expand-up segments. Expand-up segments must be addressed by offsets which
are less than or equal to the segment limit. Offsets into expand-down segments always must be
greater than the segment limit. This interpretation of the segment limit causes memory space to
be allocated at the bottom of the segment when the segment limit is decreased, which is correct 
for stack segments because they grow toward lower addresses. If the stack is given a segment 
which does not change size, the segment does not need to be expand-down. 
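
The two interpretations of the segment limit can be compared in a short sketch. It is a simplified model: it checks a single byte offset only, it takes the limit already scaled by the Granularity bit, and the function name offset_is_valid and its parameters are hypothetical.

```python
def offset_is_valid(offset, limit, expand_down=False, b_bit=True):
    """Return True if a byte offset passes the limit check for a segment.

    limit is the effective (already scaled) segment limit. b_bit matters only
    for expand-down segments, where it selects the upper bound.
    """
    if expand_down:
        upper_bound = 0xFFFFFFFF if b_bit else 0xFFFF
        # Valid offsets are exactly those that fault in an expand-up segment.
        return limit < offset <= upper_bound
    return 0 <= offset <= limit
```

In this model, lowering the limit of an expand-down segment from 0FFFH to 07FFH makes offsets 0800H through 0FFFH valid, which is the extra space at the bottom of the segment described above.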







Table 11-1. Application Segment Types

          11    10    9     8     Descriptor
  Type          E     W     A     Type         Description

  0       0     0     0     0     Data         Read-Only
  1       0     0     0     1     Data         Read-Only, accessed
  2       0     0     1     0     Data         Read/Write
  3       0     0     1     1     Data         Read/Write, accessed
  4       0     1     0     0     Data         Read-Only, expand-down
  5       0     1     0     1     Data         Read-Only, expand-down, accessed
  6       0     1     1     0     Data         Read/Write, expand-down
  7       0     1     1     1     Data         Read/Write, expand-down, accessed

          11    10    9     8     Descriptor
  Type          C     R     A     Type         Description

  8       1     0     0     0     Code         Execute-Only
  9       1     0     0     1     Code         Execute-Only, accessed
  10      1     0     1     0     Code         Execute/Read
  11      1     0     1     1     Code         Execute/Read, accessed
  12      1     1     0     0     Code         Execute-Only, conforming
  13      1     1     0     1     Code         Execute-Only, conforming, accessed
  14      1     1     1     0     Code         Execute/Read-Only, conforming
  15      1     1     1     1     Code         Execute/Read-Only, conforming, accessed



Code segments can be execute-only or execute/read. An execute/read segment might be used, 
for example, when constants have been placed with instruction code in a ROM. In this case, 
the constants can be read either by using an instruction with a CS override prefix or by placing 
a segment selector for the code segment in a segment register for a data segment. 

Code segments can be either conforming or non-conforming. A transfer of execution into a 
more privileged conforming segment keeps the current privilege level. A transfer into a
non-conforming segment at a different privilege level results in a general-protection exception,
unless a task gate is used (see Chapter 13 for a discussion of multitasking). System utilities 
which do not access protected facilities, such as data-conversion functions (e.g., 
EBCDIC/ASCII translation, Huffman encoding/decoding, math library) and some types of 
exceptions (e.g., Divide Error, INTO-detected overflow, and BOUND range exceeded) may be 
loaded in conforming code segments. 

The A (accessed) bit of the Type field is set by the processor to indicate that a segment has 
been loaded into a segment register. By clearing the A-bit initially, then testing it later, 
software can monitor segment usage. For example, a program development system might clear 
all of the Accessed bits for the segments of an application. If the application crashes, the states 
of these bits can be used to generate a map of all the segments accessed by the application. 
Unlike the breakpoints provided by the debugging mechanism (Chapter 17), the usage 
information applies to segment usage rather than linear address matches. 

The processor may update the Type field when a segment is accessed, even if the access is a 
read cycle. If the descriptor tables have been put in ROM, it may be necessary for hardware to 
prevent the ROM from being enabled onto the data bus during a write cycle. It also may be 
necessary to return the READY# signal to the processor when a write cycle to ROM occurs, 
otherwise the cycle does not terminate. These features of the hardware design are necessary for 
using ROM-based descriptor tables with the Intel386 DX processor, which always sets the
Accessed bit when a segment descriptor is loaded. The Intel486 and Pentium processors,
however, only set the Accessed bit if it is not already set. Writes to descriptor tables in ROM 
can be avoided by setting the Accessed bits in every descriptor. 

DPL (Descriptor Privilege Level): Defines the privilege level of the segment. This is used to 
control access to the segment, using the protection mechanism described in Chapter 12. 

Segment-Present bit: If this bit is clear, the processor generates a segment-not-present 
exception when a selector for the descriptor is loaded into a segment register. This is used to 
detect access to segments which have become unavailable. A segment can become unavailable 
when the system needs to create free memory. Items in memory, such as character fonts or 
device drivers, which currently are not being used are deallocated. An item is deallocated by 
marking the segment "not present" (this is done by clearing the Segment-Present bit). The 
memory occupied by the segment then can be put to another use. The next time the deallocated 
item is needed, the segment-not-present exception will indicate the segment needs to be loaded 
into memory. When this kind of memory management is provided in a manner invisible to 
application programs, it is called virtual memory. A system can maintain a total amount of 
virtual memory far larger than physical memory by keeping only a few segments present in 
physical memory at any one time. 

Figure 11-9 shows the format of a descriptor when the Segment-Present bit is clear. When this 
bit is clear, the operating system is free to use the locations marked Available to store its own 
data, such as information regarding the whereabouts of the missing segment. 



  Offset +4:
    Bits 31-16   AVAILABLE
    Bit  15      P (clear)
    Bits 14-13   DPL
    Bit  12      S
    Bits 11-8    TYPE
    Bits 7-0     AVAILABLE

  Offset +0:
    Bits 31-0    AVAILABLE



Figure 11-9. Segment Descriptor (Segment Not Present) 



11.2.4. Segment Descriptor Tables

A segment descriptor table is an array of segment descriptors. There are two kinds of 
descriptor tables: 

• The global descriptor table (GDT) 

• The local descriptor tables (LDT) 

There is one GDT for all tasks, and an LDT for each task being run. A descriptor table is an
array of segment descriptors, as shown in Figure 11-10. A descriptor table is variable in length
and can contain up to 8192 (2^13) descriptors. The first descriptor in the GDT is not used by the
processor. A segment selector to this "null descriptor" does not generate an exception when
loaded into a data segment register (DS, ES, FS, or GS), but it always generates an exception
when an attempt is made to access memory using the descriptor. By initializing the segment
registers with this segment selector, accidental reference to unused segment registers can be
guaranteed to generate an exception.



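[Figure: the GLOBAL DESCRIPTOR TABLE and a LOCAL DESCRIPTOR TABLE, each drawn as an array
of eight-byte descriptors at offsets +0, +8, +10, +18, +20, +28, +30, and +38 (addresses
shown in hexadecimal). The GDTR REGISTER holds the LIMIT and BASE ADDRESS of the GDT. The
first descriptor in the GDT is not used. An LDT DESCRIPTOR (GDT entry #2 in the figure)
supplies the ACCESS, LIMIT, and BASE ADDRESS of the LDT.]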



Figure 11-10. Descriptor Tables 



11.2.5. Descriptor Table Base Registers

The processor finds the global descriptor table (GDT) and interrupt descriptor table (IDT) 
using the GDTR and IDTR registers. These registers hold 32-bit base addresses for tables in 
the linear address space. They also hold 16-bit limit values for the size of these tables. 

When the IDTR and GDTR registers are loaded or stored, a 48-bit "pseudo-descriptor" is 
accessed in memory, as shown in Figure 11-11. To avoid alignment check faults in user mode
(privilege level 3), the pseudo-descriptor should be located at an odd word address (i.e., an
address which is 2 MOD 4). This causes the processor to store an aligned word, followed by an 
aligned doubleword. User-mode programs normally do not store pseudo-descriptors, but the 
possibility of generating an alignment check fault can be avoided by aligning pseudo- 
descriptors in this way. 
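
The 48-bit pseudo-descriptor and the alignment rule above can be sketched with Python's struct module. The layout assumed here is the one in Figure 11-11: a 16-bit limit in the low-order bytes followed by a 32-bit base address, stored little-endian. The function names are hypothetical.

```python
import struct


def pack_pseudo_descriptor(base, limit):
    """Build the 6-byte image accessed when the GDTR or IDTR is loaded or stored."""
    # "<HI": little-endian, 16-bit limit first, then the 32-bit base address.
    return struct.pack("<HI", limit, base)


def is_recommended_alignment(address):
    """True if address is an odd word address (2 MOD 4).

    At such an address the 16-bit limit is an aligned word and the 32-bit
    base that follows it is an aligned doubleword.
    """
    return address % 4 == 2
```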






  BIT POSITION   BYTE ORDER   FIELD
  47-16          5-2          BASE ADDRESS
  15-0           1-0          LIMIT



Figure 11-11. Pseudo-Descriptor Format 



The base addresses of the GDT and IDT should be aligned on an eight-byte boundary to 
maximize performance of cache line fills. 

The limit values for both the GDT and IDT are expressed in bytes. As with segments, the limit 
value is added to the base address to get the address of the last valid byte. A limit value of zero 
results in exactly one valid byte. Because segment descriptors are always eight bytes long, the 
limit should always be one less than an integral multiple of eight (i.e., 8N - 1). The LGDT and 
SGDT instructions write and read the GDTR register; the LIDT and SIDT instructions write 
and read the IDTR register. 
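
The limit arithmetic above can be illustrated with two small helpers. The names table_limit and descriptor_in_table are hypothetical, and the second helper is a simplified model that treats a descriptor as inside the table when all eight of its bytes lie at or below the limit.

```python
def table_limit(descriptor_count):
    """Limit value (8N - 1) for a table holding descriptor_count descriptors."""
    if not 1 <= descriptor_count <= 8192:
        raise ValueError("a descriptor table holds 1 to 8192 descriptors")
    return 8 * descriptor_count - 1


def descriptor_in_table(index, limit):
    """True if the last byte of descriptor number index lies within the limit."""
    return index * 8 + 7 <= limit
```

A table of three descriptors therefore has a limit of 23 (17H), and the largest possible table, with 8192 descriptors, has a limit of FFFFH, which fits the 16-bit limit field exactly.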

A third descriptor table is the local descriptor table (LDT). It is identified by a 16-bit segment 
selector held in the LDTR register. The LLDT and SLDT instructions write and read the 
segment selector in the LDTR register. The LDTR register also holds the base address and 
limit for the LDT, but these are loaded automatically by the processor from the segment 
descriptor for the LDT (which is taken from the GDT). The LDT should be aligned on an 
eight-byte boundary to maximize performance of cache line fills. 



11.3. PAGE TRANSLATION 

A linear address is a 32-bit address into a uniform, unsegmented address space. This address 
space can be a large physical address space (i.e., an address space composed of several 
gigabytes of RAM), or paging can be used to simulate this address space using a small amount 
of RAM and some disk storage. When paging is used, a linear address is translated into its 
corresponding physical address, or an exception is generated. The exception gives the 
operating system a chance to read the page from disk (perhaps sending a different page out to 
disk in the process), then restart the instruction which generated the exception. 

Paging is different from segmentation through its use of fixed-size pages. Unlike segments, 
which usually are the same size as the code or data structures they hold, pages have a fixed 
size. If segmentation is the only form of address translation which is used, a data structure 
which is present in physical memory will have all of its parts in memory. If paging is used, a 
data structure can be partly in memory and partly in disk storage. 






The information which either maps linear addresses into physical addresses or raises 
exceptions is held in data structures in memory called page tables. As with segmentation, this 
information is cached within the CPU to minimize the number of bus cycles required for 
address translation. Unlike segmentation, the address translation caches are completely 
invisible to application programs. The processor's caches for address translation information 
are called translation lookaside buffers (TLB). The TLBs satisfy most requests for reading the 
page tables. Extra bus cycles occur only when the TLBs cannot satisfy a request. This typically 
happens when a page has not been accessed for a long time. 



11.3.1. Paging Options

Paging is enabled when bit 31 (the PG bit) of the CR0 register is set. This bit usually is set by
the operating system during software initialization. (Refer to Chapter 16 for information on
how to change PG.) When paging is enabled, a second stage of address translation is used to
generate the physical address from the linear address. If paging is not enabled, the linear
address is used as the physical address. The PG bit must be set if the operating system is
running more than one program in virtual-8086 mode or if demand-paged virtual memory is
used.

11.3.2. Linear Address

Figure 11-12 shows the format of a linear address for a 4K page. 



  FORMAT FOR 4 KBYTE PAGE

    Bits 31-22   DIR
    Bits 21-12   TABLE
    Bits 11-0    OFFSET



Figure 11-12. Format of a Linear Address 

Figure 11-13 shows how the processor translates the DIRECTORY, TABLE, and OFFSET 
fields of a linear address into the physical address by consulting page tables. The addressing 
mechanism uses the DIRECTORY field as an index into a page directory. It uses the TABLE 
field as an index into the page table determined by the page directory. It uses the OFFSET field 
to address an operand within the page specified by the page table. 
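
The three fields of a linear address for a 4K page can be extracted with shifts and masks, as in this sketch. The function name split_linear_address is hypothetical; the bit positions are those of Figure 11-12.

```python
def split_linear_address(linear):
    """Split a 32-bit linear address into (directory, table, offset) fields."""
    directory = (linear >> 22) & 0x3FF  # bits 31..22: index into the page directory
    table = (linear >> 12) & 0x3FF      # bits 21..12: index into the page table
    offset = linear & 0xFFF             # bits 11..0: byte offset within the page
    return directory, table, offset
```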



11.3.3. Page Tables 

A page table is an array of 32-bit entries. A page table is itself a page, and contains 4096 bytes
of data or at most 1K 32-bit entries. Four kilobyte pages, including page directories and page
tables, are aligned to 4K-byte boundaries. Two levels of tables are used to address a page of
memory. At the highest level is a page directory. A page directory holds up to 1K entries that
address page tables of the second level. A page table of the second level addresses up to 1K
pages in physical memory. All the tables addressed by one page directory, therefore, can
address 1M (2^20) four-Kbyte pages. If each page contains 4K (2^12) bytes, the tables of one page
directory can span a linear address space of four gigabytes (2^20 x 2^12 = 2^32). For information
on support of page sizes larger than 4K, see Appendix H.



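[Figure: the DIR field of the linear address indexes the PAGE DIRECTORY to select a DIR ENTRY.
The DIR ENTRY locates a PAGE TABLE, which the TABLE field indexes to select a PG TBL ENTRY.
The PG TBL ENTRY locates the PAGE FRAME, and the OFFSET field selects the PHYS ADDRESS
within that frame.]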



Figure 11-13. Page Translation 



The physical address of the current page directory is stored in the CR3 register, also called the 
page directory base register (PDBR). Memory management software has the option of using 
one page directory for all tasks, one page directory for each task, or some combination of the 
two. See Chapter 16 for information on initialization of the CR3 register. See Chapter 13 for 
how the contents of the CR3 register can change for each task. 
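
The two-level lookup for 4K pages can be modelled in a few lines of Python. This is a sketch under stated assumptions, not a description of the hardware: physical memory is represented by a caller-supplied read_dword function, only the Present bit (bit 0 of each entry, described in Section 11.3.4.2) is checked, and the names translate, PageFault, and read_dword are hypothetical.

```python
class PageFault(Exception):
    """Raised when an entry needed for translation is marked not present."""


def translate(cr3, linear, read_dword):
    """Translate a 32-bit linear address to a physical address (4K pages).

    cr3 holds the physical address of the page directory; read_dword(addr)
    returns the 32-bit value stored at physical address addr.
    """
    dir_index = (linear >> 22) & 0x3FF
    table_index = (linear >> 12) & 0x3FF
    offset = linear & 0xFFF

    # First level: each entry is 4 bytes, so the index is scaled by 4.
    pde = read_dword((cr3 & 0xFFFFF000) + dir_index * 4)
    if not pde & 1:
        raise PageFault("page table not present")

    # Second level: the directory entry gives the page table's frame address.
    pte = read_dword((pde & 0xFFFFF000) + table_index * 4)
    if not pte & 1:
        raise PageFault("page not present")

    return (pte & 0xFFFFF000) | offset
```

For example, with a page directory at 1000H whose entry 1 points to a page table at 2000H, and with entry 3 of that table pointing to the frame at 5000H, the linear address 00403025H translates to 5025H.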



11.3.4. Page-Table Entries

Page-table and page-directory entries for 4K pages have one of the formats shown by Figure
11-14. For information on page-table and page-directory formats for pages larger than 4K, see
Appendix H.






  PAGE DIR ENTRY

    Bits 31-12   PAGE FRAME ADDRESS 31..12
    Bits 11-9    AVAIL   AVAILABLE FOR SYSTEMS PROGRAMMER USE
    Bit  8       RESERVED
    Bit  7       PAGE SIZE (0 INDICATES 4 KBYTE)
    Bit  6       RESERVED
    Bit  5       A       ACCESSED
    Bit  4       PCD     CACHE DISABLE
    Bit  3       PWT     WRITE-THROUGH
    Bit  2       U/S     USER
    Bit  1       R/W     WRITABLE
    Bit  0       P       PRESENT

  PAGE TABLE ENTRY

    Bits 31-12   PAGE FRAME ADDRESS 31..12
    Bits 11-9    AVAIL   AVAILABLE FOR SYSTEMS PROGRAMMER USE
    Bits 8-7     RESERVED
    Bit  6       D       DIRTY
    Bit  5       A       ACCESSED
    Bit  4       PCD     CACHE DISABLE
    Bit  3       PWT     WRITE-THROUGH
    Bit  2       U/S     USER
    Bit  1       R/W     WRITABLE
    Bit  0       P       PRESENT

  RESERVED: RESERVED BY INTEL CORPORATION (SHOULD BE ZERO)



Figure 11-14. Format of Page Directory and Page Table Entries for 4K Pages 



11.3.4.1. PAGE FRAME ADDRESS

The page frame address specifies the physical starting address of a page. In a page directory, 
the page frame address is the address of a page table. In a second-level page table, the page 
frame address is the address of the four kilobyte page that contains the desired memory 
operand or instructions. 
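
The fields of a page table entry for a 4K page can be pulled apart as in the following sketch, which uses the bit positions of Figure 11-14. The function name decode_pte and the dictionary keys are hypothetical names chosen for the example.

```python
def decode_pte(entry):
    """Decode a 32-bit page table entry for a 4K page."""
    return {
        "present": bool(entry & 0x001),        # bit 0
        "writable": bool(entry & 0x002),       # bit 1
        "user": bool(entry & 0x004),           # bit 2
        "write_through": bool(entry & 0x008),  # bit 3 (PWT)
        "cache_disable": bool(entry & 0x010),  # bit 4 (PCD)
        "accessed": bool(entry & 0x020),       # bit 5
        "dirty": bool(entry & 0x040),          # bit 6
        "avail": (entry >> 9) & 0x7,           # bits 11..9
        "frame_address": entry & 0xFFFFF000,   # bits 31..12
    }
```

Because pages are aligned to 4K-byte boundaries, the low 12 bits of the frame address are always zero, which is what frees bits 11 through 0 of the entry for these flags.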



11.3.4.2. PRESENT BIT 

The Present bit indicates whether the page frame address in a page table entry maps to a page 
in physical memory. When set, the page is in memory. 

When the Present bit is clear, the page is not in memory, and the rest of the page table entry is 
available for the operating system, for example, to store information regarding the 
whereabouts of the missing page. Figure 11-15 illustrates the format of a page table entry when 
the Present bit is clear. 






    Bits 31-1    AVAILABLE
    Bit  0       P (clear)



Figure 11-15. Format of a Page Table Entry for a Not-Present Page 

If the Present bit is clear in either level of page tables when an attempt is made to use a page 
table entry for address translation, a page-fault exception is generated. In systems which 
support demand-paged virtual memory, the following sequence of events then occurs: 

1. The operating system copies the page from disk storage into physical memory. 

2. The operating system loads the page frame address into the page table entry and sets its 
Present bit. Other bits, such as the dirty and accessed bits, may be set, too. 

3. Because a copy of the old page table entry may still exist in a translation lookaside buffer
(TLB), the operating system invalidates it. See Section 11.3.5 for a discussion of
TLBs and how to invalidate them.

4. The program which caused the exception is then restarted. 

Note that there is no Present bit in CR3 for the page directory itself. The page directory may be 
not-present while the associated task is suspended, but the operating system must ensure that 
the page directory indicated by the CR3 image in a process's TSS is present in physical 
memory before the process is dispatched. The page directory must also remain in memory as 
long as the task is active. 

11.3.4.3. ACCESSED AND DIRTY BITS

These bits provide data about page usage in both levels of page tables. The Accessed bit is 
used to report read or write access to a page or to a second-level page table. The Dirty bit is 
used to report write access to a page. These bits are set by the hardware; however, the 
processor does not implicitly clear either of these bits. 

The processor sets the Accessed bit in both levels of page table before a read or write operation 
to a page. The processor sets the Dirty bit before a write operation to an address mapped by 
that page table entry. Only the Dirty bit in the second-level page table is used; the processor 
does not use the Dirty bit of the page directory. 

The operating system may use the Accessed bit when it needs to create some free memory by 
sending a page or second-level page table to disk storage. By periodically clearing the 
Accessed bits in the page tables, it can see which pages have been used recently. Pages which 
have not been used are candidates for sending out to disk. 

The operating system may use the Dirty bit when a page is sent back to disk. By clearing the 
Dirty bit when the page is brought into memory, the operating system can see if it has received 
any write access. If there is a copy of the page on disk and the copy in memory has not
received any writes, there is no need to update disk from memory. 
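The way an operating system might use these two bits can be sketched in Python. This is a simplified illustration: entries are plain integers, the bit positions (Accessed in bit 5, Dirty in bit 6) follow the page table entry layout, and the function names are hypothetical.

```python
ACCESSED = 1 << 5  # set by the processor before a read or write
DIRTY = 1 << 6     # set by the processor before a write


def clear_accessed(entries):
    """Periodic sweep: clear every Accessed bit so later use can be detected."""
    return [entry & ~ACCESSED for entry in entries]


def eviction_candidates(entries):
    """Indices of pages that have not been touched since the last sweep."""
    return [i for i, entry in enumerate(entries) if not entry & ACCESSED]


def needs_write_back(entry, has_disk_copy):
    """A page can be dropped without a disk write only if it is clean
    and a copy already exists on disk."""
    return bool(entry & DIRTY) or not has_disk_copy
```

Because the processor never clears either bit, the sweep in `clear_accessed` and the clearing of the Dirty bit at page-in time are both the operating system's job.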

See Chapter 19 for how the processor updates the Accessed and Dirty bits in multiprocessor 
systems. 

11.3.4.4. READ/WRITE AND USER/SUPERVISOR BITS

The Read/Write and User/Supervisor bits are used for protection checks applied to pages, 
which the processor performs at the same time as address translation. See Chapter 12 for more 
information on protection. 

11.3.4.5. PAGE-LEVEL CACHE CONTROL BITS

The PCD and PWT bits are used for page-level cache management. Software can control the 
caching of individual pages or second-level page tables using these bits. See Chapter 18 for 
more information on caching. 



11.3.5. Translation Lookaside Buffers

The processor stores the most recently used page table entries in on-chip caches called 
translation lookaside buffers or TLBs. The Pentium microprocessor has separate TLBs for the
data and instruction caches. Most paging is performed using the contents of the TLBs. Bus 
cycles to the page tables in memory are performed only when the TLBs do not contain the 
translation information for a requested page. 

The TLBs are invisible to application programs (with PL>0), but not to operating systems
(PL=0). Operating-system programmers must invalidate the TLBs (dispose of their page table
entries) immediately after every change to entries in the page tables (including when the
Present bit is set to zero). If this is not done, stale entries which do not reflect the
changes might be used for address translation and, as a result, subsequent page table
references could be incorrect.

The operating system can invalidate the TLBs by loading the CR3 register. The CR3 register 
can be loaded in either of two ways: 

1. Explicit loading using MOV instructions, such as: 

MOV CR3, EAX 

2. Implicit loading by a task switch which changes the contents of the CR3 register. (See 
Chapter 13 for more information on task switching.) 

When the mapping of an individual page is changed, the operating system should use the 
INVLPG instruction. Where possible, INVLPG invalidates only an individual TLB entry; 
however, in some cases, INVLPG invalidates the entire instruction-cache TLB. 
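The need for invalidation can be illustrated with a toy Python model of a TLB. It is a sketch only: the class `Tlb` and its methods are hypothetical names, the page table is a dictionary from page number to entry, and `invlpg` and `load_cr3` merely imitate the effect of the INVLPG instruction and of a CR3 load on the cached entries.

```python
class Tlb:
    """Toy cache of page table entries, keyed by linear page number."""

    def __init__(self):
        self.entries = {}

    def lookup(self, page, page_table):
        """Return the cached entry; read the page table only on a miss."""
        if page not in self.entries:
            self.entries[page] = page_table[page]
        return self.entries[page]

    def invlpg(self, page):
        """Imitate INVLPG: drop the entry for one page, if it is cached."""
        self.entries.pop(page, None)

    def load_cr3(self):
        """Imitate a CR3 load: drop every cached entry."""
        self.entries.clear()
```

If the page table is changed after a lookup and neither method is called, `lookup` keeps returning the old entry, which is the stale-translation hazard described above.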






11.4. COMBINING SEGMENT AND PAGE TRANSLATION

Figure 11-16 combines Figure 11-5 and Figure 11-13 to summarize both stages of translation 
from a logical address to a physical address when paging is enabled. Options available in both 
stages of address translation can be used to support several different styles of memory 
management. 
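The two stages can be sketched in Python for the 4K-page case. This is an illustrative model under stated assumptions: the page directory is a dictionary of dictionaries that maps directly to page frame base addresses, present bits and protection checks are omitted, and the function names are hypothetical.

```python
def logical_to_linear(segment_base, offset):
    """Stage 1: segment translation adds the offset to the segment base."""
    return (segment_base + offset) & 0xFFFFFFFF


def split_linear(linear):
    """Split a linear address into its DIR, TABLE, and OFFSET fields."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF


def linear_to_physical(page_directory, linear):
    """Stage 2: page translation through the directory and one page table."""
    dir_index, table_index, offset = split_linear(linear)
    page_table = page_directory[dir_index]
    frame_base = page_table[table_index]
    return frame_base | offset
```

For example, with a segment base of 00400000H and an offset of 1234H, the linear address is 00401234H, which selects directory entry 1, page table entry 1, and byte 234H within the page frame.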



11.4.1. Flat Model 

When a 32-bit processor is used to run software written without segments, it may be desirable 
to remove the segmentation features of the processor. The 32-bit processors do not have a 
mode bit for disabling segmentation, but the same effect can be achieved by mapping the 
stack, code, and data spaces to the same range of linear addresses. The 32-bit offsets used by 
32-bit processor instructions can cover a four-gigabyte linear address space. 

When paging is used, the segments can be mapped to the entire linear address space. If more 
than one program is being run at the same time, the paging mechanism can be used to give 
each program a separate address space. 



11.4.2. Segments Spanning Several Pages

The architecture allows segments which are larger than the size of a page. For example, a large 
data structure may span thousands of pages. If paging were not used, access to any part of the 
data structure would require the entire data structure to be present in physical memory. With 
paging, only the page containing the part being accessed needs to be in memory. 



11.4.3. Pages Spanning Several Segments

Segments also can be smaller than the size of a page. If one of these segments is placed in a 
page which is not shared with another segment, the extra memory is wasted. For example, a 
small data structure, such as a 1-byte semaphore, occupies 4K bytes if it is placed in a page by 
itself. If many semaphores are used, it is more efficient to pack them into a single page. 



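[Figure: the SELECTOR of a logical address picks a SEGMENT DESCRIPTOR from the DESCRIPTOR TABLE, and the OFFSET is combined with it to form the LINEAR ADDRESS. For a 4K page, the linear address is split into DIR, TABLE, and OFFSET fields, which index the PAGE DIRECTORY (located through CR3), the PAGE TABLE, and the 4K PAGE FRAME containing the operand. For a 4M page, the linear address is split into DIR and OFFSET fields, and the 4M directory entry points directly to the 4M PAGE FRAME containing the operand.]

Figure 11-16. Combined Segment and Page Address Translation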



11.4.4. Non-Aligned Page and Segment Boundaries

The architecture does not enforce any correspondence between the boundaries of pages and 
segments. A page can contain the end of one segment and the beginning of another. Likewise, 
a segment can contain the end of one page and the beginning of another. 






11.4.5. Aligned Page and Segment Boundaries

Memory-management software may be simpler and more efficient if it enforces some 
alignment between page and segment boundaries. For example, if a segment which can fit in 
one page is placed in two pages, there may be twice as much paging overhead to support 
access to that segment. 



11.4.6. Page-Table Per Segment

An approach to combining paging and segmentation which simplifies memory-management 
software is to give each segment its own page table, as shown in Figure 11-17. This gives the 
segment a single entry in the page directory which provides the access control information for 
paging the segment. 



[Figure: each DESCRIPTOR in the LDT corresponds to one entry (PDE) in the PAGE DIRECTORY; each PDE selects one of the PAGE TABLES, whose entries (PTEs) point to PAGE FRAMES.]

Figure 11-17. Each Segment Can Have Its Own Page Table






12 



Protection 






CHAPTER 12 
PROTECTION 



Protection is necessary for reliable multitasking. Protection can be used to prevent tasks from 
interfering with each other. For example, protection can keep one task from overwriting the 
instructions or data of another task. 

During program development, the protection mechanism can give a clearer picture of program 
bugs. When a program makes an unexpected reference to the wrong memory space, the 
protection mechanism can block the event and report its occurrence. 

In end-user systems, the protection mechanism can guard against the possibility of software 
failures caused by undetected program bugs. If a program fails, its effects can be confined to a 
limited domain. The operating system can be protected against damage, so diagnostic 
information can be recorded and automatic recovery attempted. 

Protection can be applied to segments and pages. Two bits in a processor register define the 
privilege level of the program currently running (called the current privilege level or CPL). 
The CPL is checked during address translation for segmentation and paging. 

Although there is no control register or mode bit for turning off the protection mechanism, the 
same effect can be achieved by assigning privilege level 0 (the highest level of privilege) to all
segment selectors, segment descriptors, and page table entries. 



12.1. SEGMENT-LEVEL PROTECTION 

Protection provides the ability to limit the amount of interference a malfunctioning program 
can inflict on other programs and their data. Protection is a valuable aid in software 
development because it allows software tools (operating system, debugger, etc.) to survive in 
memory undamaged. When an application program fails, the software is available to report 
diagnostic messages, and the debugger is available for post-mortem analysis of memory and 
registers. In production, protection can make software more reliable by giving the system an 
opportunity to initiate recovery procedures. 

Each memory reference is checked to verify that it satisfies the protection checks. All checks 
are made before the memory cycle is started; any violation prevents the cycle from starting and 
results in an exception. Because checks are performed in parallel with address translation, there 
is no performance penalty. There are five protection checks: 

1. Type check

2. Limit check 

3. Restriction of addressable domain 

4. Restriction of procedure entry points 

5. Restriction of instruction set 

A protection violation results in an exception. See Chapter 14 for an explanation of the
exception mechanism. This chapter describes the protection violations which lead to 
exceptions. 



12.2. SEGMENT DESCRIPTORS AND PROTECTION 

Figure 12-1 shows the fields of a segment descriptor which are used by the protection 
mechanism. Individual bits in the Type field also are referred to by the names of their 
functions. 

When the operating system creates a descriptor, it sets the protection parameters. In general,
application programmers do not need to be concerned about protection parameters. 

When a program loads a segment selector into a segment register, the processor loads both the 
base address of the segment and the protection information. The invisible part of each segment 
register has storage for the base, limit, type, and privilege level. While this information is 
resident in the segment register, subsequent protection checks on the same segment can be 
performed with no performance penalty. 



12.2.1. Type Checking

In addition to the descriptors for application code and data segments, the processor has 
descriptors for system segments and gates. These are data structures used for managing tasks 
(Chapter 13) and exceptions and interrupts (Chapter 14). Table 12-1 lists all the types defined 
for system segments and gates. Note that not all descriptors define segments; gate descriptors 
hold pointers to procedure entry points. 



[Figure: layouts of the DATA SEGMENT DESCRIPTOR, CODE SEGMENT DESCRIPTOR, and SYSTEM SEGMENT DESCRIPTOR. In each, the doubleword at offset +4 holds BASE 31:24, LIMIT 19:16, the DPL, the type bits, and BASE 23:16; the doubleword at offset +0 holds SEGMENT BASE 15:00 and SEGMENT LIMIT 15:00.]

A      ACCESSED
B      BIG
C      CONFORMING
D      DEFAULT
DPL    DESCRIPTOR PRIVILEGE LEVEL
E      EXPAND-DOWN
G      GRANULARITY
LIMIT  SEGMENT LIMIT
R      READABLE
W      WRITABLE

Figure 12-1. Descriptor Fields Used for Protection







Table 12-1. System Segment and Gate Types

Type (Decimal)   Type (Binary)   Description
0                0000            reserved
1                0001            Available 16-Bit TSS
2                0010            LDT
3                0011            Busy 16-Bit TSS
4                0100            16-Bit Call Gate
5                0101            Task Gate
6                0110            16-Bit Interrupt Gate
7                0111            16-Bit Trap Gate
8                1000            reserved
9                1001            Available 32-Bit TSS
10               1010            reserved
11               1011            Busy 32-Bit TSS
12               1100            32-Bit Call Gate
13               1101            reserved
14               1110            32-Bit Interrupt Gate
15               1111            32-Bit Trap Gate



The Type fields of code and data segment descriptors include bits which further define the 
purpose of the segment (see Figure 12-1): 

• The Writable bit in a data-segment descriptor controls whether programs can write to the 
segment. 

• The Readable bit in an executable-segment descriptor specifies whether programs can read 
from the segment (e.g., to access constants stored in the code space). A readable, 
executable segment may be read in two ways: 

1. With the CS register, by using a CS override prefix.

2. By loading a selector for the descriptor into a data-segment register (the DS, ES, FS, 
or GS registers). 

Type checking can be used to detect programming errors which would attempt to use segments 
in ways not intended by the programmer. The processor examines type information on two 
kinds of occasions: 

1. When a selector for a descriptor is loaded into a segment register. Certain segment 
registers can contain only certain descriptor types; for example: 

— The CS register can be loaded only with a selector for an executable segment.

— Selectors of executable segments which are not readable cannot be loaded into data-segment registers.

— Only selectors of writable data segments can be loaded into the SS register. 

2. When instructions access segments whose descriptors are already loaded into segment 
registers. Certain segments can be used by instructions only in certain predefined ways; for 
example: 

— No instruction may write into an executable segment. 

— No instruction may write into a data segment if the writable bit is not set. 

— No instruction may read an executable segment unless the readable bit is set. 
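The example rules listed above can be collected into a small Python sketch. It models only the cases named in the text (it is not a complete statement of the processor's type checks), and the `Segment` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    executable: bool
    readable: bool = True    # Readable bit; meaningful for executable segments
    writable: bool = False   # Writable bit; meaningful for data segments


def can_load(register, seg):
    """Type check made when a selector is loaded into a segment register."""
    if register == "CS":
        return seg.executable
    if register == "SS":
        return not seg.executable and seg.writable
    # DS, ES, FS, GS: data segments, or executable segments that are readable
    return not seg.executable or seg.readable


def can_write(seg):
    """No writes to executable segments or to non-writable data segments."""
    return not seg.executable and seg.writable


def can_read(seg):
    """Executable segments may be read only if the Readable bit is set."""
    return not seg.executable or seg.readable
```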

12.2.2. Limit Checking 

The Limit field of a segment descriptor prevents programs from addressing outside the 
segment. The effective value of the limit depends on the setting of the G bit (Granularity bit). 
For data segments, the limit also depends on the E bit (Expansion Direction bit). The E bit is a 
designation for one bit of the Type field, when referring to data segment descriptors. 

When the G bit is clear, the limit is the value of the 20-bit Limit field in the descriptor. In this
case, the limit ranges from 0 to F_FFFFH (2^20 - 1 or 1 megabyte). When the G bit is set, the
processor scales the value in the Limit field by a factor of 2^12. In this case the limit ranges from
0FFFH (2^12 - 1 or 4K bytes) to FFFF_FFFFH (2^32 - 1 or 4 gigabytes). Note that when scaling
is used, the lower twelve bits of the address are not checked against the limit; when the G bit is
set and the segment limit is 0, valid offsets within the segment are 0 through 4095.

For all types of segments except expand-down data segments, the value of the limit is one less 
than the size, in bytes, of the segment. The processor causes a general-protection exception in 
any of these cases: 

• Attempt to access a memory byte at an address > limit 

• Attempt to access a memory word at an address > (limit - 1) 

• Attempt to access a memory double word at an address > (limit - 3) 

• Attempt to access a memory quadword at an address > (limit - 7) 

For expand-down data segments, the limit has the same function but is interpreted differently. 
In these cases the range of valid offsets is from (limit + 1) to 2^32 - 1 if B-bit = 1 and 2^16 - 1 if
B-bit = 0. An expand-down segment has maximum size when the segment limit is 0. 
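The limit rules can be written out as a short Python sketch. It is an illustration of the arithmetic only, with hypothetical function names; `size` is the operand size in bytes (1, 2, 4, or 8), and a return value of False corresponds to a general-protection exception.

```python
def effective_limit(limit_field, g_bit):
    """Scale the 20-bit Limit field by 2**12 when the Granularity bit is set.

    The low twelve bits are filled with ones because they are not checked.
    """
    if g_bit:
        return (limit_field << 12) | 0xFFF
    return limit_field


def offset_ok(offset, size, limit_field, g_bit, expand_down=False, b_bit=True):
    """Return True if every byte of the access lies inside the segment."""
    limit = effective_limit(limit_field, g_bit)
    last_byte = offset + size - 1
    if expand_down:
        top = 0xFFFFFFFF if b_bit else 0xFFFF
        return offset > limit and last_byte <= top
    return last_byte <= limit
```

Checking the last byte of the operand against the limit is equivalent to the per-size conditions listed above: a doubleword at an address greater than (limit - 3) has its last byte beyond the limit.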

Limit checking catches programming errors such as runaway subscripts and invalid pointer 
calculations. These errors are detected when they occur, so identification of the cause is easier. 
Without limit checking, these errors could overwrite critical memory in another module, and 
the existence of these errors would not be discovered until the damaged module crashed, an 
event which may occur long after the actual error. Protection can block these errors and report 
their source. 

In addition to limit checking on segments, there is limit checking on the descriptor tables. The 
GDTR, LDTR, and IDTR registers contain a 16-bit limit value. It is used by the processor to 
prevent programs from selecting a segment descriptor outside the descriptor table. The limit of 
a descriptor table identifies the last valid byte of the table. Because each descriptor is eight 
bytes long, a table which contains up to N descriptors should have a limit of 8N - 1. 
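The relationship between the table limit and the selectors it admits can be sketched as follows. The function names are hypothetical; the sketch relies on the selector layout in which bits 15 through 3 hold the descriptor index.

```python
DESCRIPTOR_SIZE = 8  # bytes per descriptor


def table_limit(descriptor_count):
    """Limit (offset of the last valid byte) for a table of N descriptors."""
    return DESCRIPTOR_SIZE * descriptor_count - 1


def selector_in_table(selector, limit):
    """Return True if the whole descriptor named by the selector lies
    at or below the table limit."""
    index = selector >> 3
    last_byte = index * DESCRIPTOR_SIZE + DESCRIPTOR_SIZE - 1
    return last_byte <= limit
```

A table of three descriptors therefore has a limit of 23; selector 0010H (index 2) is inside the table and selector 0018H (index 3) is not.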






A selector may be given a zero value. Such a selector refers to the first descriptor in the GDT, 
which is not used. Although this descriptor can be loaded into a segment register, any attempt 
to reference memory using this descriptor will generate a general-protection exception. 



12.2.3. Privilege Levels

The protection mechanism recognizes four privilege levels, numbered from 0 to 3. The greater
numbers mean lesser privileges. If all other protection checks are satisfied, a general-protection 
exception is generated if a program attempts to access a segment using a less privileged level 
(greater privilege number) than that applied to the segment. 

Although no control register or mode bit is provided for turning off the protection mechanism, 
the same effect can be achieved by assigning all privilege levels the value of 0. (The PE bit in 
the CR0 register is not an enabling bit for the protection mechanism alone; it is used to enable
protected mode, the mode of program execution in which the full 32-bit architecture is 
available. When protected mode is disabled, the processor operates in real-address mode, 
where it appears as a fast, enhanced 8086 processor.) 

Privilege levels can be used to improve the reliability of operating systems. By giving the 
operating system the greatest privilege (numerically lowest privilege level), it is protected from 
damage by bugs in other programs. If a program crashes, the operating system has a chance to 
generate a diagnostic message and attempt recovery procedures. 

Another level of privilege can be established for other parts of the system software, such as the 
programs which handle peripheral devices. If a device driver crashes, the operating system 
should be able to report a diagnostic message, so it makes sense to protect the operating system 
against bugs in device drivers. A device driver, however, may service an important peripheral 
such as a disk drive. If the application program crashes, the device driver should not corrupt 
the directory structure of the disk, so it makes sense to protect device drivers against bugs in 
applications. Device drivers should be given an intermediate privilege level between the 
operating system and the application programs. Application programs are given the least 
privilege (numerically greatest level). 

Figure 12-2 shows how these levels of privilege can be interpreted as rings of protection. The 
center is for the segments containing the most critical software, usually the kernel of an 
operating system. Outer rings are for less critical software. 

The following data structures contain privilege levels: 

• The lowest two bits of the CS segment register hold the current privilege level (CPL). This 
is the privilege level of the program being run. The lowest two bits of the SS register also 
hold a copy of the CPL. Normally, the CPL is equal to the privilege level of the code 
segment from which instructions are being fetched. The CPL changes when control is 
transferred to a code segment with a different privilege level. 

• Segment descriptors contain a field called the descriptor privilege level (DPL). The DPL is 
the privilege level applied to a segment. 

• Segment selectors contain a field called the requestor privilege level (RPL). The RPL is 
intended to represent the privilege level of the procedure which created the selector. If the 
RPL is a less privileged level than the CPL, it overrides the CPL. When a more privileged
program receives a segment selector from a less privileged program, the RPL causes the 
memory access to take place at the less privileged level. 

Privilege levels are checked when the selector of a descriptor is loaded into a segment register. 
The checks used for data access differ from those used for transfers of execution among 
executable segments; therefore, the two types of access are considered separately in the 
following sections. 



[Figure: concentric PROTECTION RINGS, with the most privileged level at the center.]

Figure 12-2. Protection Rings



12.3. RESTRICTING ACCESS TO DATA

To address operands in memory, a segment selector for a data segment must be loaded into a 
data-segment register (the DS, ES, FS, GS, or SS registers). The processor checks the 
segment's privilege levels. The check is performed when the segment selector is loaded. As 
Figure 12-3 shows, three different privilege levels enter into this type of privilege check. 

The three privilege levels which are checked are: 

1. The CPL (current privilege level) of the program. This is held in the two least-significant
bit positions of the CS register. 

2. The DPL (descriptor privilege level) of the segment descriptor of the segment containing
the operand.

3. The RPL (requestor's privilege level) of the selector used to specify the segment 
containing the operand. This is held in the two lowest bit positions of the segment register 
used to access the operand (the SS, DS, ES, FS, or GS registers). If the operand is in the 
stack segment, the RPL is the same as the CPL. 



[Figure: the DPL from the OPERAND SEGMENT DESCRIPTOR, the CPL from the CURRENT CODE SEGMENT REGISTER, and the RPL from the OPERAND SEGMENT SELECTOR are the inputs to the PRIVILEGE CHECK.]

CPL  CURRENT PRIVILEGE LEVEL
DPL  DESCRIPTOR PRIVILEGE LEVEL
RPL  REQUESTED PRIVILEGE LEVEL

Figure 12-3. Privilege Check for Data Access



Instructions may load a segment register only if the DPL of the segment is the same or a less 
privileged level (greater privilege number) than the less privileged of the CPL and the 
selector's RPL. 

The addressable domain of a task varies as its CPL changes. When the CPL is 0, data segments 
at all privilege levels are accessible; when the CPL is 1, only data segments at privilege levels 
1 through 3 are accessible; when the CPL is 3, only data segments at privilege level 3 are 
accessible. 
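The rule for loading a data-segment register reduces to a single comparison, sketched here in Python. The function names are hypothetical; privilege levels are the integers 0 through 3, with greater numbers meaning lesser privilege.

```python
def may_load_data_segment(cpl, rpl, dpl):
    """The segment's DPL must be numerically greater than or equal to the
    less privileged (numerically greater) of the CPL and the RPL."""
    return dpl >= max(cpl, rpl)


def accessible_levels(cpl, rpl=0):
    """Data-segment privilege levels reachable for a given CPL and RPL."""
    return [dpl for dpl in range(4) if may_load_data_segment(cpl, rpl, dpl)]
```

With an RPL of 0, `accessible_levels` reproduces the addressable domains described above; a selector with a greater RPL narrows the domain further even when the CPL is 0.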

Systems that use only two of the four possible privilege levels should use levels 0 and 3.






12.3.1. Accessing Data in Code Segments 

It may be desirable to store data in a code segment, for example, when both code and data are 
provided in ROM. Code segments may legitimately hold constants; it is not possible to write to 
a segment defined as a code segment, unless a data segment is mapped to the same address 
space. The following methods of accessing data in code segments are possible: 

1. Load a data-segment register with a segment selector for a nonconforming, readable, 
executable segment. 

2. Load a data-segment register with a segment selector for a conforming, readable, 
executable segment. 

3. Use a code-segment override prefix to read a readable, executable segment whose selector 
already is loaded in the CS register. 

The same rules for access to data segments apply to case 1. Case 2 is always valid because the
privilege level of a code segment with a set Conforming bit is effectively the same as the CPL, 
regardless of its DPL. Case 3 is always valid because the DPL of the code segment selected by 
the CS register is the CPL. 



12.4. RESTRICTING CONTROL TRANSFERS 

Control transfers are provided by the JMP, CALL, RET, INT, and IRET instructions, as well as 
by the exception and interrupt mechanisms. Exceptions and interrupts are special cases 
discussed in Chapter 14. This chapter discusses only the JMP, CALL, and RET instructions. 

The near forms of the JMP, CALL, and RET instructions transfer program control within the 
current code segment, and therefore are subject only to limit checking. The processor checks 
that the destination of the JMP, CALL, or RET instruction does not exceed the limit of the 
current code segment. This limit is cached in the CS register, so protection checks for near 
transfers do not degrade performance.

The operands of the far forms of the JMP and CALL instruction refer to other segments, so the 
processor performs privilege checking. There are two ways a JMP or CALL instruction can 
refer to another segment: 

1. The operand selects the descriptor of another executable segment.

2. The operand selects a call gate descriptor. 

As Figure 12-4 shows, two different privilege levels enter into a privilege check for a control 
transfer which does not use a call gate: 

1. The CPL (current privilege level). 

2. The DPL of the descriptor of the destination code segment. 






[Figure: the DPL and the Conforming bit (C) from the DESTINATION CODE SEGMENT DESCRIPTOR and the CPL from the CURRENT CODE SEGMENT REGISTER are the inputs to the PRIVILEGE CHECK.]

C    CONFORMING BIT
CPL  CURRENT PRIVILEGE LEVEL
DPL  DESCRIPTOR PRIVILEGE LEVEL

Figure 12-4. Privilege Check for Control Transfer Without Gate



Normally the CPL is equal to the DPL of the segment which the processor is currently 
executing. The CPL may, however, be greater (less privileged) than the DPL if the current 
code segment is a conforming segment (as indicated by the Type field of its segment 
descriptor). A conforming segment runs at the privilege level of the calling procedure. The 
processor keeps a record of the CPL cached in the CS register; this value can be different from 
the DPL in the segment descriptor of the current code segment. 

The processor only permits a JMP or CALL instruction directly into another segment if either 
of the following privilege rules is satisfied: 

• The DPL of the segment is equal to the CPL. 

• The segment is a conforming code segment, and its DPL is less (more privileged) than the 
CPL. 

Conforming segments are used for programs, such as math libraries and some kinds of 
exception handlers, which support applications but do not require access to protected system 
facilities. When control is transferred to a conforming segment, the CPL does not change, even 
if the selector used to address the segment has a different RPL. This is the only condition in 
which the CPL may be different from the DPL of the current code segment. 
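The two privilege rules for a far JMP or CALL that does not use a gate can be sketched as one Python function. The name `direct_transfer_allowed` is hypothetical, and only the privilege comparison is modeled; type and limit checks are left out.

```python
def direct_transfer_allowed(cpl, dest_dpl, dest_conforming):
    """Privilege rule for a far JMP or CALL directly into a code segment.

    Either the destination DPL equals the CPL, or the destination is a
    conforming segment whose DPL is numerically less (more privileged).
    """
    if dest_dpl == cpl:
        return True
    return dest_conforming and dest_dpl < cpl
```

In every permitted case the CPL is unchanged after the transfer: for a nonconforming destination the DPL already equals the CPL, and a conforming destination runs at the caller's level.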

Most code segments are not conforming. For these segments, control can be transferred 
without a gate only to other code segments at the same level of privilege. It is sometimes 
necessary, however, to transfer control to higher privilege levels. This is accomplished with the
CALL instruction using call-gate descriptors, which is explained in Chapter 13. The JMP 
instruction may never transfer control to a nonconforming segment whose DPL does not equal 
the CPL. 

12.5. GATE DESCRIPTORS

To provide protection for control transfers among executable segments at different privilege 
levels, the processor uses gate descriptors. There are four kinds of gate descriptors: 

• Call gates 

• Trap gates 

• Interrupt gates 

• Task gates 

Task gates are used for task switching and are discussed in Chapter 13. Chapter 14 explains 
how trap gates and interrupt gates are used by exceptions and interrupts. This chapter is 
concerned only with call gates. Call gates are a form of protected control transfer. They are 
used for control transfers between different privilege levels. They only need to be used in 
systems in which more than one privilege level is used. Figure 12-5 illustrates the format of a 
call gate. 



[Figure: 32-BIT CALL GATE. The upper doubleword holds OFFSET IN SEGMENT 31:16, the P bit, the DPL, the type field, and the DWORD COUNT; the lower doubleword holds the SEGMENT SELECTOR and OFFSET IN SEGMENT 15:00.]

DPL  DESCRIPTOR PRIVILEGE LEVEL
P    SEGMENT PRESENT

Figure 12-5. Call Gate

A call gate has two main functions: 

1. To define an entry point of a procedure.

2. To specify the privilege level required to enter a procedure. 

CALL and JMP instructions use call gate descriptors in the same manner as code segment 
descriptors. When the hardware recognizes that the segment selector for the destination refers 
to a gate descriptor, the operation of the instruction is determined by the contents of the call 
gate. A call gate descriptor may reside in the GDT or in an LDT, but not in the interrupt 
descriptor table (IDT). 






The selector and offset fields of a gate form a pointer to the entry point of a procedure. A call 
gate guarantees that all control transfers to other segments go to a valid entry point, rather than 
to the middle of a procedure (or worse, to the middle of an instruction). The operand of the 
control transfer instruction is not the segment selector and offset within the segment to the 
procedure's entry point. Instead, the segment selector points to a gate descriptor, and the offset 
is not used. Figure 12-6 shows this form of addressing. 



[Figure: the SELECTOR part of the DESTINATION ADDRESS picks a GATE DESCRIPTOR from the DESCRIPTOR TABLE; the OFFSET WITHIN SEGMENT part is NOT USED. The gate's SELECTOR picks a CODE SEGMENT DESCRIPTOR, and that descriptor's BASE combined with the gate's OFFSET gives the PROCEDURE ENTRY POINT.]

Figure 12-6. Call Gate Mechanism



As shown in Figure 12-7, four different privilege levels are used to check the validity of a 
control transfer through a call gate. 



[Figure: the DPL from the CALL GATE, the DPL from the DESTINATION CODE SEGMENT DESCRIPTOR, the CPL from the CURRENT CODE SEGMENT REGISTER, and the RPL from the CALL GATE SELECTOR are the inputs to the PRIVILEGE CHECK.]

CPL  CURRENT PRIVILEGE LEVEL
DPL  DESCRIPTOR PRIVILEGE LEVEL
RPL  REQUESTED PRIVILEGE LEVEL

Figure 12-7. Privilege Check for Control Transfer with Call Gate



The privilege levels checked during a transfer of execution through a call gate are: 

1. The CPL (current privilege level).

2. The RPL (requestor's privilege level) of the segment selector used to specify the call gate. 

3. The DPL (descriptor privilege level) of the gate descriptor. 

4. The DPL of the segment descriptor of the destination code segment. 




The DPL field of the gate descriptor determines from which privilege levels the gate may be 
used. One code segment can have several procedures which are intended for use from different 
privilege levels. For example, an operating system may have some services which are intended 
to be used by both the operating system and application software, such as routines to handle 
character I/O, while other services may be intended only for use by the operating system, such as
routines which initialize device drivers. 

Gates can be used for control transfers to more privileged levels or to the same privilege level 
(though they are not necessary for transfers to the same level). Only CALL instructions can use 
gates to transfer to more privileged levels. A JMP instruction can use a gate only to transfer 
control to a code segment with the same privilege level, or to a conforming code segment with 
the same or a more privileged level. 

For a JMP instruction to a nonconforming segment, both of the following privilege rules must 
be satisfied; otherwise, a general-protection exception is generated. 

• MAX (CPL,RPL) ≤ gate DPL

• Destination code segment DPL = CPL

For a CALL instruction (or for a JMP instruction to a conforming segment), both of the
following privilege rules must be satisfied; otherwise, a general-protection exception is
generated.

• MAX (CPL,RPL) ≤ gate DPL

• Destination code segment DPL ≤ CPL
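
The two sets of rules can be expressed as a single predicate. This is a minimal Python sketch, not the processor's actual logic; the function name `gate_transfer_allowed` and its parameters are hypothetical. A check passes when the numerically larger of CPL and RPL does not exceed the gate DPL, and a `False` result stands in for the general-protection exception.

```python
def gate_transfer_allowed(cpl, rpl, gate_dpl, target_dpl, is_call, conforming):
    """Return True if a transfer through a call gate passes both privilege rules."""
    # Rule 1 (both cases): the gate must be accessible to the caller.
    if max(cpl, rpl) > gate_dpl:
        return False
    # Rule 2: a CALL, or a JMP to a conforming segment, may reach a segment
    # that is at the same or a more privileged (numerically smaller) level.
    if is_call or conforming:
        return target_dpl <= cpl
    # A JMP to a nonconforming segment must stay at the same level.
    return target_dpl == cpl
```

For example, an application at level 3 may CALL a level 0 service through a gate whose DPL is 3, but a JMP through the same gate to the same nonconforming segment fails the second rule.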

12.5.1. Stack Switching 

A procedure call to a more privileged level does the following: 

1. Changes the CPL. 

2. Transfers control (execution). 

3. Switches stacks. 

All inner protection rings (privilege levels 0, 1, and 2) have their own stacks for receiving
calls from less privileged levels. If the caller were to provide the stack, and the stack was too 
small, the called procedure might crash as a result of insufficient stack space. Instead, the 
processor prevents less privileged programs from crashing more privileged programs by 
creating a new stack when a call is made to a more privileged level. The new stack is created, 
parameters are copied from the old stack, the contents of registers are saved, and execution 
proceeds normally. When the procedure returns, the contents of the saved registers restore the 
original stack. 

The processor finds the space to create new stacks using the task state segment (TSS), as 
shown in Figure 12-8. (Chapter 13 discusses the TSS in more detail.) Each task has its own 
TSS. The TSS contains initial stack pointers for the inner protection rings. The operating 
system is responsible for creating each TSS and initializing its stack pointers. (If the operating 
system does not use TSSs for multitasking, it still must allocate at least one TSS for this stack-
related purpose.) An initial stack pointer consists of a segment selector and an initial value for
the ESP register (an initial offset into the segment). The initial stack pointers are strictly read-
only values. The processor does not change them while the task runs. These stack pointers are
used only to create new stacks when calls are made to more privileged levels. These stacks 
disappear when the called procedure returns. The next time the procedure is called, a new stack 
is created using the initial stack pointer. 



[Figure 12-8 diagram: initial stack pointers in the 32-bit task state segment. Byte offsets (hexadecimal): ESP0 at 4, SS0 at 8, ESP1 at 0C, SS1 at 10, ESP2 at 14, SS2 at 18.]

Figure 12-8. Initial Stack Pointers in a TSS 



When a call gate is used to change privilege levels, a new stack is created by loading an 
address from the TSS. The processor uses the DPL of the destination code segment (the new 
CPL) to select the initial stack pointer for privilege level 0, 1, or 2. 
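
The selection of an initial stack pointer can be sketched as a lookup into a TSS image. The Python below is an illustration under stated assumptions: the function name `initial_stack_pointer` is hypothetical, the TSS is treated as a little-endian byte buffer, and the offsets follow Figure 12-8 (ESP0 at 4 and SS0 at 8, with each further level 8 bytes higher).

```python
import struct

def initial_stack_pointer(tss, new_cpl):
    """Return (SS selector, ESP) for privilege level 0, 1, or 2."""
    if new_cpl not in (0, 1, 2):
        raise ValueError("the TSS holds no initial stack pointer for this level")
    offset = 4 + 8 * new_cpl                     # ESPn, followed by SSn
    esp, ss = struct.unpack_from("<IH", tss, offset)
    return ss, esp

# Hypothetical TSS image with a level 0 stack at selector 0x0010, ESP 0xF000.
tss = bytearray(0x68)
struct.pack_into("<IH", tss, 4, 0x0000F000, 0x0010)
ss0, esp0 = initial_stack_pointer(tss, 0)
```

The lookup is read-only, which matches the statement that the processor never changes these values while the task runs.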

The DPL of the new stack segment must equal the new CPL; if not, a TSS fault is generated. It 
is the responsibility of the operating system to create stacks and stack-segment descriptors for 
all privilege levels which are used. The stacks must be read/write as specified in the Type 
fields of their segment descriptors. They must contain enough space, as specified in the Limit 
fields, to hold the contents of the SS and ESP registers, the return address, and the parameters 
and temporary variables required by the called procedure. 

As with calls within a privilege level, parameters for the procedure are placed on the stack. The 
parameters are copied to the new stack. The parameters can be accessed within the called 
procedure using the same relative addresses which would have been used if no stack switching 
had occurred. The count field of a call gate tells the processor how many doublewords (up to 
31) to copy from the caller's stack to the stack of the called procedure. If the count is 0, no 
parameters are copied. 






If more than 31 doublewords of data need to be passed to the called procedure, one of the 
parameters can be a pointer to a data structure, or the saved contents of the SS and ESP 
registers may be used to access parameters in the old stack space. 

The processor performs the following stack-related steps in executing a procedure call between 
privilege levels. 

1. The stack of the called procedure is checked to make certain it is large enough to hold the
parameters and the saved contents of registers; if not, a stack exception is generated. 

2. The old contents of the SS and ESP registers are pushed onto the stack of the called 
procedure as two doublewords (the 16-bit SS register is zero-extended to 32 bits; the zero- 
extended upper word is Intel reserved; do not use). 

3. The parameters are copied from the stack of the caller to the stack of the called procedure. 

4. A pointer to the instruction after the CALL instruction (the old contents of the CS and EIP 
registers) is pushed onto the new stack. The contents of the SS and ESP registers after the 
call point to this return pointer on the stack. 
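
Steps 2 through 4 above can be sketched as a function that builds the new stack frame. This Python model is illustrative only: the name `interlevel_call_frame` and its parameters are hypothetical, the limit check of step 1 is omitted, and the stack is represented as a list of doublewords in push order rather than as memory.

```python
def interlevel_call_frame(old_ss, old_esp, params, old_cs, old_eip, new_esp):
    """Return (frame, esp) for a call to a more privileged level.

    params lists the doublewords in the order the caller pushed them.
    frame[0] is pushed first; frame[-1] is the doubleword at the final ESP.
    """
    if len(params) > 31:
        raise ValueError("a call gate copies at most 31 doublewords")
    frame = [old_ss & 0xFFFF, old_esp]    # step 2: SS zero-extended, then ESP
    frame += list(params)                 # step 3: parameters keep their order
    frame += [old_cs & 0xFFFF, old_eip]   # step 4: return pointer
    return frame, new_esp - 4 * len(frame)

frame, esp = interlevel_call_frame(0x23, 0x1000, [1, 2, 3], 0x1B, 0x400, 0x8000)
```

With three parameters the frame holds seven doublewords, so the final ESP is 28 bytes below the initial stack pointer taken from the TSS.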

Figure 12-9 illustrates the stack frame before, during, and after a successful interlevel 
procedure call and return. 



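[Figure 12-9 diagram, listed in push order:
Old stack before the call: PARM 1, PARM 2, PARM 3; ESP points to PARM 3.
New stack after the call and before the return: OLD SS, OLD ESP, PARM 1, PARM 2, PARM 3, OLD CS, OLD EIP; ESP points to OLD EIP.
Old stack after the return with far RET n (n = 3): ESP points past the parameters.]

Figure 12-9. Stack Frame During Interlevel Call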

The TSS does not have a stack pointer for a privilege level 3 stack, because a procedure at 
privilege level 3 cannot be called by a less privileged procedure. The stack for privilege level 3 
is preserved by the contents of the SS and ESP registers which have been saved on the stack of
the privilege level called from level 3. 

A call using a call gate does not check the values of the words copied onto the new stack. The 
called procedure should check each parameter for validity. A later section discusses how the 
ARPL, VERR, VERW, LSL, and LAR instructions can be used to check pointer values. 






12.5.2. Returning from a Procedure 

The near forms of the RET instruction transfer control only within the current code segment,
and are therefore subject only to limit checking. The offset to the instruction following the CALL
instruction is popped from the stack into the EIP register. The processor checks that this offset 
does not exceed the limit of the current code segment. 

The far form of the RET instruction pops the return address which was pushed onto the stack 
by an earlier far CALL instruction. Under normal conditions, the return pointer is valid, 
because it was generated by a CALL or INT instruction. Nevertheless, the processor performs 
privilege checking because of the possibility that the current procedure altered the pointer or 
failed to maintain the stack properly. The RPL of the code-segment selector popped off the 
stack by the return instruction should have the privilege level of the calling procedure. 

A return to another segment can change privilege levels, but only toward less privileged levels. 
When a RET instruction encounters a saved CS value whose RPL is numerically greater (less 
privileged) than the CPL, a return across privilege levels occurs. A return of this kind performs 
these steps: 

1. The checks shown in Table 12-2 are made, and the CS, EIP, SS, and ESP registers are 
loaded with their former values, which were saved on the stack. 

2. The old contents of the SS and ESP registers (from the top of the current stack) are 
adjusted by the number of bytes indicated in the RET instruction. The resulting ESP value 
is not checked against the limit of the stack segment. If the ESP value is beyond the limit, 
that fact is not recognized until the next stack operation. (The contents of the SS and ESP 
registers for the returning procedure are not preserved; normally, their values are the same 
as those contained in the TSS.) 

3. The contents of the DS, ES, FS, and GS segment registers are checked. If any of these 
registers refer to segments whose DPL is less than the new CPL (excluding conforming 
code segments), the segment register is loaded with the null selector (Index = 0, TI = 0). 
The RET instruction itself does not signal exceptions in these cases; however, any 
subsequent memory reference using a segment register containing the null selector will 
cause a general-protection exception. This prevents less privileged code from accessing 
more privileged segments using selectors left in the segment registers by a more privileged 
procedure. 
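
The third step can be sketched as a filter over the data segment registers. The Python below is a simplified model, not the processor's implementation: the function name `scrub_segment_registers`, the `dpl_of` table, and the `conforming` set are hypothetical stand-ins for descriptor lookups.

```python
NULL_SELECTOR = 0

def scrub_segment_registers(seg_regs, dpl_of, new_cpl, conforming=()):
    """Return a copy of seg_regs with over-privileged selectors replaced by null.

    seg_regs   -- dict such as {"DS": 0x10, "ES": 0x23, "FS": 0, "GS": 0x23}
    dpl_of     -- dict mapping each non-null selector to its segment's DPL
    conforming -- selectors naming conforming code segments (never scrubbed)
    """
    scrubbed = {}
    for name, selector in seg_regs.items():
        too_privileged = (
            selector != NULL_SELECTOR
            and selector not in conforming
            and dpl_of[selector] < new_cpl   # DPL less than the new CPL
        )
        scrubbed[name] = NULL_SELECTOR if too_privileged else selector
    return scrubbed
```

After a return to level 3, a DS register that still names a level 0 data segment is replaced by the null selector, so the first memory reference through DS faults instead of exposing privileged data.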







Table 12-2. Interlevel Return Checks

Type of Check                                                   Exception Type        Error Code

Top-of-stack + 7 must be within stack segment limit             stack                 0
RPL of return code segment must be greater than the CPL         protection            Return CS
Return code segment selector must be non-null                   protection            Return CS
Return code segment descriptor must be within descriptor        protection            Return CS
  table limit
Return segment descriptor must be a code segment                protection            Return CS
Return code segment is present                                  segment not present   Return CS
DPL of return non-conforming code segment must equal RPL        protection            Return CS
  of return code segment selector, or DPL of return
  conforming code segment must be less than or equal to
  RPL of return code segment selector
ESP + N + 15* must be within the stack segment limit            stack fault           0
Segment selector at ESP + N + 12* must be non-null              protection            Return SS
Segment descriptor at ESP + N + 12* must be within              protection            Return SS
  descriptor table limit
Stack segment descriptor must be read/write                     protection            Return SS
Stack segment must be present                                   stack fault           Return SS
Old stack segment DPL must be equal to RPL of old code          protection            Return SS
  segment
Old stack segment selector must have an RPL equal to the        protection            Return SS
  DPL of the old stack segment

* N is the value of the immediate operand supplied with the RET instruction.



12.6. INSTRUCTIONS RESERVED FOR THE OPERATING SYSTEM 

Instructions which can affect the protection mechanism or influence general system 
performance can only be executed by trusted procedures. The processor has two classes of 
such instructions: 

1. Privileged instructions: those used for system control.

2. Sensitive instructions: those used for I/O and I/O-related activities.

12.6.1. Privileged Instructions

The instructions which affect protected facilities can be executed only when the CPL is 0 (most
privileged). If one of these instructions is executed when the CPL is not 0, a general-protection
exception is generated. These instructions include:

CLTS               Clear Task-Switched Flag
HLT                Halt Processor
INVD               Invalidate Cache
INVLPG             Invalidate TLB Entry
LGDT               Load GDT Register
LIDT               Load IDT Register
LLDT               Load LDT Register
LMSW               Load Machine Status Word
LTR                Load Task Register
MOV to/from CRn    Move to Control Register n
MOV to/from DRn    Move to Debug Register n
WBINVD             Write Back and Invalidate Cache



12.6.2. Sensitive Instructions 

Instructions which deal with I/O need to be protected, but they also need to be used by 
procedures executing at privilege levels other than 0 (the most privileged level). The
mechanisms for protection of I/O operations are covered in detail in Chapter 15. 



12.7. INSTRUCTIONS FOR POINTER VALIDATION

Pointer validation is necessary for maintaining isolation between privilege levels. It consists of
the following steps:

1. Check whether the supplier of the pointer is allowed to access the segment.

2. Check whether the segment type is compatible with its use.

3. Check whether the pointer offset exceeds the segment limit.

Although the processor automatically performs checks 2 and 3 during instruction execution,
software must assist in performing the first check. The ARPL instruction is provided for this
purpose. Software also can use steps 2 and 3 to check for potential violations, rather than
waiting for an exception to be generated. The LAR, LSL, VERR, and VERW instructions are
provided for this purpose.

LAR (Load Access Rights) is used to verify that a pointer refers to a segment of a compatible
privilege level and type. The LAR instruction has one operand: a segment selector for the
descriptor whose access rights are to be checked. Conforming code segments may be accessed
from any privilege level. For any other segment descriptor, the DPL must be numerically
greater than or equal to (no more privileged than) both the CPL and the selector's RPL. If the
descriptor is readable, the LAR instruction gets the second doubleword of the descriptor, masks
this value with 00FxFF00H, stores the result into the specified 32-bit destination register, and
sets the ZF flag. (The x indicates that the corresponding four bits of the stored value are
undefined.) Once loaded, the access rights can be tested. All valid descriptor types can be
tested by the LAR instruction. If the RPL or CPL is greater than the DPL, or if the segment
selector would exceed the limit for the descriptor table, zero is returned, and the ZF flag is
cleared.
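
The masking step performed by LAR, and the tests software might then apply to the result, can be sketched as follows. This Python fragment is illustrative; the name `access_rights` is hypothetical, the undefined nibble of the 00FxFF00H mask is simply cleared, and the bit positions used to decode the result (type in bits 11:8, S in bit 12, DPL in bits 14:13, P in bit 15, G in bit 23) are an assumption based on the standard segment descriptor layout.

```python
LAR_DEFINED_BITS = 0x00F0FF00   # 00FxFF00H with the undefined nibble cleared

def access_rights(descriptor_high):
    """Mask the second doubleword of a descriptor and decode the result."""
    rights = descriptor_high & LAR_DEFINED_BITS
    return {
        "type": (rights >> 8) & 0xF,
        "is_code_or_data": bool(rights & (1 << 12)),   # S bit
        "dpl": (rights >> 13) & 0x3,
        "present": bool(rights & (1 << 15)),
        "granularity": bool(rights & (1 << 23)),
    }

# Hypothetical descriptor: a present, writable, level 3 data segment with G set.
rights = access_rights(0x00CFF200)
```

A caller would inspect fields such as `type` and `dpl` before using the selector, instead of waiting for a later access to fault.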



LSL (Load Segment Limit) allows software to test the limit of a segment descriptor. If the 
descriptor referenced by the segment selector (in memory or a register) is readable at the CPL, 
the LSL instruction loads the specified 32-bit register with a 32-bit, byte granular limit 
calculated from the concatenated limit fields and the G bit of the descriptor. This only can be 
done for descriptors which describe segments (data, code, task state, and local descriptor 
tables); gate descriptors are inaccessible. (Table 12-3 lists in detail which types are valid and 
which are not.) Interpreting the limit is a function of the segment type. For example, 
downward-expandable data segments (stack segments) treat the limit differently than other 
kinds of segments. For both the LAR and LSL instructions, the ZF flag is set if the load was 
successful; otherwise, the ZF flag is cleared. 
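
The limit calculation described for LSL can be sketched directly. The Python below is a minimal illustration with a hypothetical function name, `byte_granular_limit`; it assumes the standard descriptor layout, with limit bits 15:0 in the first doubleword, limit bits 19:16 in bits 19:16 of the second doubleword, and the G bit in bit 23.

```python
def byte_granular_limit(descriptor_low, descriptor_high):
    """Return the 32-bit byte-granular limit of a segment descriptor."""
    # Concatenate the two limit fields into a 20-bit value.
    limit = (descriptor_high & 0x000F0000) | (descriptor_low & 0x0000FFFF)
    if descriptor_high & (1 << 23):
        # G bit set: the limit counts 4-KByte pages, so scale it to bytes
        # and include the last byte of the final page.
        limit = (limit << 12) | 0xFFF
    return limit
```

A 20-bit limit of FFFFFH with the G bit set therefore yields FFFFFFFFH, the largest possible segment.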



Table 12-3. Valid Descriptor Types for LSL Instruction

Type Code    Descriptor Type          Valid?

0            reserved                 no
1            reserved                 no
2            LDT                      yes
3            reserved                 no
4            reserved                 no
5            Task Gate                no
6            reserved                 no
7            reserved                 no
8            reserved                 no
9            Available 32-bit TSS     yes
A            reserved                 no
B            Busy 32-bit TSS          yes
C            32-bit Call Gate         no
D            reserved                 no
E            32-bit Interrupt Gate    no
F            32-bit Trap Gate         no



An additional check, the alignment check, can be applied at CPL = 3. When both the AM bit in
CR0 and the AC flag are set, unaligned memory references generate exceptions. This is useful
for programs which use the low two bits of pointers to identify the type of data structure they 
address. For example, a subroutine in a math library may accept pointers to numeric data 
structures. If the type of this structure is assigned a code of 10 (binary) in the lowest two bits of 
pointers to this type, math subroutines can correct for the type code by adding a displacement 
of -10 (binary). If the subroutine should ever receive the wrong pointer type, an unaligned 
reference would be produced, which would generate an exception. Alignment checking 
accelerates the processing of programs written in symbolic-processing (i.e., Artificial 
Intelligence) languages such as Lisp, Prolog, Smalltalk, and C++. It can be used to speed up 
pointer tag type checking. 
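
The pointer-tag scheme in this example can be modeled in a few lines. The Python below is a sketch under stated assumptions: the names `tag_pointer` and `load_numeric` are hypothetical, data structures are assumed to be doubleword-aligned, and a raised `MemoryError` stands in for the alignment exception.

```python
NUMERIC_TAG = 0b10   # type code from the example in the text

def tag_pointer(address, tag):
    """Store a two-bit type code in the low bits of a doubleword-aligned address."""
    if address % 4:
        raise ValueError("address must be doubleword-aligned")
    return address | tag

def load_numeric(tagged_pointer):
    """Return the address a math routine would reference, or raise if misaligned."""
    address = tagged_pointer - NUMERIC_TAG   # displacement of -10 (binary)
    if address % 4:
        raise MemoryError("alignment check: unaligned doubleword reference")
    return address
```

A pointer carrying the expected tag yields an aligned address after the displacement is applied; any other tag leaves the address misaligned, so the type error is caught without an explicit comparison.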






12.7.1. Descriptor Validation

The processor has two instructions, VERR and VERW, which determine whether a segment 
selector points to a segment which can be read or written using the CPL. Neither instruction 
causes a protection fault if the segment cannot be accessed. 

VERR (Verify for Reading) verifies a segment for reading and sets the ZF flag if that 
segment is readable using the CPL. The VERR instruction checks the following: 

• The segment selector points to a segment descriptor within the bounds of the GDT or an 
LDT. 

• The segment selector indexes to a code or data segment descriptor. 

• The segment is readable and has a compatible privilege level. 

The privilege check for data segments and nonconforming code segments verifies that the DPL
is numerically greater than or equal to (no more privileged than) both the CPL and the selector's
RPL. Conforming segments are not checked for privilege level.

VERW (Verify for Writing) provides the same capability as the VERR instruction for 
verifying writability. Like the VERR instruction, the VERW instruction sets the ZF flag if the 
segment can be written. The instruction verifies that the descriptor is within bounds, is a segment
descriptor, is writable, and has a DPL which is numerically greater than or equal to (no more
privileged than) both the CPL and the selector's RPL. Code segments are never writable, whether
conforming or not.
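
The checks made by VERR and VERW can be summarized as two predicates that return the resulting ZF value. This Python model is a simplification: the `Segment` class and the function names are hypothetical, and passing `None` stands in for a selector that falls outside the bounds of its descriptor table.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    kind: str                 # "code" or "data"; anything else is a system descriptor
    dpl: int
    readable: bool = True
    writable: bool = False
    conforming: bool = False

def privilege_ok(seg, cpl, rpl):
    """Conforming code segments skip the check; others need DPL >= CPL and RPL."""
    if seg.kind == "code" and seg.conforming:
        return True
    return seg.dpl >= max(cpl, rpl)

def verr(seg, cpl, rpl):
    """Return the ZF value a read verification would produce."""
    return (seg is not None and seg.kind in ("code", "data")
            and seg.readable and privilege_ok(seg, cpl, rpl))

def verw(seg, cpl, rpl):
    """Return the ZF value a write verification would produce."""
    return (seg is not None and seg.kind == "data"
            and seg.writable and seg.dpl >= max(cpl, rpl))
```

Neither function raises an error for an inaccessible segment, mirroring the fact that these instructions report the outcome in the ZF flag instead of faulting.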



12.7.2. Pointer Integrity and RPL

The requestor's privilege level (RPL) can prevent accidental use of pointers which crash more 
privileged code from a less privileged level. 

A common example is a file system procedure, FREAD (file_id, n_bytes, buffer_ptr). This 
hypothetical procedure reads data from a disk file into a buffer, overwriting whatever is 
already there. It services requests from programs operating at the application level, but it must 
run in a privileged mode in order to read from the system I/O buffer. If the application 
program passed this procedure a bad buffer pointer, one which pointed at critical code or data 
in a privileged address space, the procedure could cause damage which would crash the 
system. 

Use of the RPL can avoid this problem. The RPL allows a privilege override to be assigned to 
a selector. This privilege override is intended to be the privilege level of the code segment 
which generated the segment selector. In the above example, the RPL would be the CPL of the 
application program which called the system level procedure. The processor automatically 
checks any segment selector loaded into a segment register to determine whether its RPL 
allows access. 

To take advantage of the processor's checking of the RPL, the called procedure need only 
check that all segment selectors passed to it have an RPL for the same or a less privileged level 
as the original caller's CPL. This guarantees that the segment selectors are not more privileged 
than their source. If a selector is used to access a segment which the source would not be able 
to access directly, i.e. the RPL is less privileged than the segment's DPL, a general-protection
exception is generated when the selector is loaded into a segment register.

ARPL (Adjust Requested Privilege Level) adjusts the RPL field of a segment selector to be 
the larger (less privileged) of its original value and the value of the RPL field for a segment 
selector stored in a general register. The RPL fields are the two least significant bits of the 
segment selector and the register. The latter normally is a copy of the caller's CS register on the 
stack. If the adjustment changes the selector's RPL, the ZF flag is set; otherwise, the ZF flag is 
cleared. 
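
The adjustment made by ARPL can be sketched as a small function. The Python below is illustrative, with a hypothetical name and signature; it returns the adjusted selector together with the resulting ZF value.

```python
def arpl(selector, caller_cs):
    """Raise the selector's RPL to the caller's RPL if it is numerically lower.

    Returns (adjusted selector, ZF).
    """
    selector_rpl = selector & 3    # two least significant bits
    caller_rpl = caller_cs & 3
    if selector_rpl < caller_rpl:
        return (selector & ~3) | caller_rpl, True
    return selector, False

# A level 3 caller (CS = 0x001B) passes a selector that claims level 0.
adjusted, zf = arpl(0x0010, 0x001B)
```

In the FREAD example, the system procedure would apply this adjustment to the selector part of `buffer_ptr` before loading it, so that the access is checked at the privilege of the application and not at that of the procedure.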



12.8. PAGE-LEVEL PROTECTION 

Protection applies to both segments and pages. When the flat model for memory segmentation 
is used, page-level protection prevents programs from interfering with each other. 

Each memory reference is checked to verify that it satisfies the protection checks. All checks 
are made before the memory cycle is started; any violation prevents the cycle from starting and 
results in an exception. Because checks are performed in parallel with address translation, there 
is no performance penalty. There are two page-level protection checks: 

1. Restriction of addressable domain. 

2. Type checking. 

A protection violation results in an exception. See Chapter 14 for an explanation of the 
protected-mode exception mechanism. This chapter describes the protection violations which 
lead to exceptions. 



12.8.1. Page-Table Entries Hold Protection Parameters 

Figure 12-10 highlights the fields of a page table entry which control access to pages. The 
protection checks are applied for both first- and second-level page tables. 



[Figure 12-10 diagram: page table entry. Bits 31:12 hold the page frame address; the low 12 bits hold the AVAIL, D, A, PCD, PWT, U/S, R/W, and P fields. The highlighted protection fields are R/W (read/write) and U/S (user/supervisor).]



Figure 12-10. Protection Fields of a Page Table Entry 



12.8.1.1. RESTRICTING ADDRESSABLE DOMAIN 

Privilege is interpreted differently for pages than for segments. With segments, there are four 
privilege levels, ranging from 0 (most privileged) to 3 (least privileged). With pages, there are
two levels of privilege: 


1. Supervisor level (U/S=0): for the operating system, other system software (such as device
drivers), and protected system data (such as page tables).

2. User level (U/S=1): for application code and data.

The privilege levels used for segmentation are mapped into the privilege levels used for 
paging. If the CPL is 0, 1, or 2, the processor is running at supervisor level. If the CPL is 3, the 
processor is running at user level. When the processor is running at supervisor level, all pages
are accessible. When the processor is running at user level, only pages from the user level are 
accessible. 

12.8.1.2. TYPE CHECKING

Only two types of pages are recognized by the protection mechanism: 

1. Read-only access (R/W=0).

2. Read/write access (R/W=1).

When the processor is running at supervisor level with the WP bit in the CR0 register clear (its
state following reset initialization), all pages are both readable and writable (write-protection is
ignored). When the processor is running at user level, only pages which belong to user level
and are marked for read/write access are writable. User-level pages which are read/write or
read-only are readable. Pages from the supervisor level are neither readable nor writable from
user level. A page-fault exception is generated on any attempt to violate the protection
rules.

Unlike the Intel386 DX processor, the Intel486 and Pentium processors allow user-mode pages
to be write-protected against supervisor mode access. Setting the WP bit in the CR0 register
enables supervisor-mode sensitivity to user-mode, write-protected pages.
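
The page-level rules can be combined into one predicate. This Python sketch is a simplified model with hypothetical names. It maps CPL 0, 1, and 2 to supervisor level and CPL 3 to user level, and it assumes that a set WP bit makes supervisor writes honor R/W=0 on every read-only page, which goes slightly beyond the user-page case named in the text. A `False` result stands in for the exception.

```python
def page_access_allowed(cpl, user_page, writable_page, is_write, wp=False):
    """Return True if the page-level protection checks permit the access.

    user_page     -- U/S bit of the page (True = user level)
    writable_page -- R/W bit of the page (True = read/write)
    wp            -- WP bit of CR0
    """
    supervisor = cpl in (0, 1, 2)
    if supervisor:
        # All pages are readable; writes are refused only when WP is set
        # and the page is marked read-only (assumption noted above).
        return not (is_write and wp and not writable_page)
    if not user_page:
        return False              # supervisor pages are invisible to user level
    return writable_page or not is_write
```

With `wp=False` the supervisor branch always returns True, which matches the state following reset initialization.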

The supervisor write-protect feature is also useful for implementing the copy-on-write strategy
used by some operating systems, such as UNIX, for task creation (also called forking or
spawning). When a new task is created, it is possible to copy the entire address space of the
parent task. This gives the child task a complete, duplicate set of the parent's segments and
pages. An alternative strategy, copy-on-write, saves memory space and time by mapping the
child's segments and pages to the same segments and pages used by the parent task. A private 
copy of a page gets created only when one of the tasks writes to the page. By using the WP bit, 
the supervisor can detect an attempt to write to a user-level page, and can copy the page at that 
time. 



12.8.2. Combining Protection of Both Levels of Page Tables 

For any one page, the protection attributes of its page directory entry (first-level page table) 
may differ from those of its second-level page table entry. The processor checks the protection 
for a page by examining the protection specified in both the page directory (first-level page 
table) and the second-level page table. Table 12-4 shows the protection provided by the 
possible combinations of protection attributes when the WP bit is clear. 






12.8.3. Overrides to Page Protection 

Certain accesses are checked as if they are privilege-level 0 accesses, for any value of CPL:

• Access to segment descriptors (LDT, GDT, TSS and IDT). 

• Access to inner stack during a CALL instruction, or exceptions and interrupts, when a 
change of privilege level occurs. 



Table 12-4. Combined Page Directory and Page Table Protection

Page Directory Entry         Page Table Entry             Combined Effect
Privilege     Access Type    Privilege     Access Type    Privilege     Access Type

User          Read-Only      User          Read-Only      User          Read-Only
User          Read-Only      User          Read-Write     User          Read-Only
User          Read-Write     User          Read-Only      User          Read-Only
User          Read-Write     User          Read-Write     User          Read/Write
User          Read-Only      Supervisor    Read-Only      Supervisor    Read/Write*
User          Read-Only      Supervisor    Read-Write     Supervisor    Read/Write*
User          Read-Write     Supervisor    Read-Only      Supervisor    Read/Write*
User          Read-Write     Supervisor    Read-Write     Supervisor    Read/Write*
Supervisor    Read-Only      User          Read-Only      Supervisor    Read/Write*
Supervisor    Read-Only      User          Read-Write     Supervisor    Read/Write*
Supervisor    Read-Write     User          Read-Only      Supervisor    Read/Write*
Supervisor    Read-Write     User          Read-Write     Supervisor    Read/Write*
Supervisor    Read-Only      Supervisor    Read-Only      Supervisor    Read/Write*
Supervisor    Read-Only      Supervisor    Read-Write     Supervisor    Read/Write*
Supervisor    Read-Write     Supervisor    Read-Only      Supervisor    Read/Write*
Supervisor    Read-Write     Supervisor    Read-Write     Supervisor    Read/Write*

NOTE:

* If the WP bit of CR0 is set, the access type is Read-Only.
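
The sixteen rows of Table 12-4 follow from two rules, which can be sketched as a short function. The Python below is illustrative only; the name `combined_protection` is hypothetical, and the function reproduces the table for the case in which the WP bit is clear.

```python
def combined_protection(pde_user, pde_rw, pte_user, pte_rw):
    """Return (privilege, access type) as listed in Table 12-4 with WP clear.

    Each argument is the U/S or R/W bit of the page directory entry (pde)
    or the page table entry (pte), given as a boolean.
    """
    if pde_user and pte_user:
        # A page is user level only if both levels mark it user; it is
        # writable from user level only if both levels mark it read/write.
        access = "Read/Write" if (pde_rw and pte_rw) else "Read-Only"
        return "User", access
    # Any supervisor marking makes the page supervisor level, and with WP
    # clear the supervisor ignores write protection.
    return "Supervisor", "Read/Write"
```

In effect the more restrictive setting of each bit wins: one supervisor marking or one read-only marking at either level is enough to restrict user access to the page.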



12.9. COMBINING PAGE AND SEGMENT PROTECTION 

When paging is enabled, the processor first evaluates segment protection, then evaluates page 
protection. If the processor detects a protection violation at either the segment level or the page 
level, the operation does not go through; an exception occurs instead. If an exception is 
generated by segmentation, no paging exception is generated for the operation. 

For example, it is possible to define a large data segment which has some parts which are read-
only and other parts which are read-write. In this case, the page directory (or page table)
entries for the read-only parts would have the U/S and R/W bits specifying no write access for
all the pages described by that directory entry (or for individual pages specified in the second-
level page tables). This technique might be used, for example, to define a large data segment,
part of which is read-only (for shared data or ROMmed constants). This defines a flat data 
space as one large segment, with flat pointers used to access this flat space, while protecting 
shared data, shared files mapped into the virtual space, and supervisor areas. 






CHAPTER 13 
PROTECTED-MODE MULTITASKING 



The Pentium processor provides hardware support for multitasking. A task is a program which 
is running, or waiting to run while another program is running. A task is invoked by an 
interrupt, exception, jump, or call. When one of these forms of transferring execution is used 
with a destination specified by an entry in one of the descriptor tables, this descriptor can be a 
type which causes a new task to begin execution after saving the state of the current task. 
There are two types of task-related descriptors which can occur in a descriptor table: task state 
segment descriptors and task gates. When execution is passed to either kind of descriptor, a 
task switch occurs. 

A task switch is like a procedure call, but it saves more processor state information. A task 
switch transfers execution to a completely new environment, the environment of a task. This 
requires saving the contents of nearly all the processor registers, including the EFLAGS 
register and the segment registers. Unlike procedures, tasks are not re-entrant. A task switch 
does not push anything on the stack. The processor state information is saved in a data 
structure in memory, called a task state segment. 

The registers and data structures which support multitasking are: 

• Task state segment. 

• Task state segment descriptor. 

• Task register. 

• Task gate descriptor. 

With these structures, the processor can switch execution from one task to another, saving the 
context of the original task to allow the task to be restarted. The processor also offers two other 
task-management features: 

1. Interrupts and exceptions can cause task switches (if needed in the system design). The 
processor can not only perform a task switch to handle the interrupt or exception, but it 
can automatically switch back when the interrupt or exception returns. This mechanism 
can handle interrupts that occur during interrupt tasks. 

2. With each switch to another task, the processor also can switch to another LDT. This can 
be used to give each task a different logical-to-physical address mapping. This is an 
additional protection feature, because tasks can be isolated and prevented from interfering 
with one another. The PDBR register also is reloaded. This allows the paging mechanism 
to be used to enforce the isolation between tasks. 







Use of the multitasking mechanism is optional. In some applications, it may not be the best 
way to manage program execution. Where extremely fast response to interrupts is needed, the 
time required to save the processor state may be too great. A possible compromise in these 
situations is to use the task-related data structures, but perform task switching in software. This 
allows a smaller processor state to be saved. This technique can be one of the optimizations 
used to enhance system performance after the basic functions of a system have been 
implemented. 



13.1. TASK STATE SEGMENT 

The processor state information needed to restore a task is saved in a type of segment, called a 
task state segment or TSS. Figure 13-1 shows the format of a TSS for tasks designed for 32-bit 
CPUs (compatibility with 16-bit 80286 tasks is provided by a different kind of TSS; see 
Chapter 23). The fields of a TSS are divided into two main categories: 

1. Dynamic fields the processor updates with each task switch. These fields store: 

— The general registers (EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI). 

— The segment registers (ES, CS, SS, DS, FS, and GS). 

— The flags register (EFLAGS). 

— The instruction pointer (EIP). 

— The selector for the TSS of the previous task (updated only when a return is 
expected). 

2. Static fields the processor reads, but does not change. These fields are set up when a task is 
created. These fields store: 

— The selector for the task's LDT. 

— The PDBR of the task (CR3). 

— The logical address of the stacks for privilege levels 0, 1, and 2. 

— The T-bit (debug trap bit) which, when set, causes the processor to raise a debug 
exception when a task switch occurs. (See Chapter 17 for more information on 
debugging.) 

— The base address for the I/O permission bit map and interrupt redirection bitmap. If 
present, these maps are stored in the TSS at higher addresses. The base address points 
to the beginning of the I/O map and the end of the 32-byte interrupt map. (See 
Chapter 15 for more information about the I/O permission bit map and Chapter 22 for 
more information about interrupt redirection.) 






OFFSET   BITS 31-16                 BITS 15-0
------   ------------------------   ------------------------
64       I/O MAP BASE ADDRESS       0 (RESERVED), T IN BIT 0
60       0 (RESERVED)               SELECTOR FOR TASK'S LDT
5C       0 (RESERVED)               GS
58       0 (RESERVED)               FS
54       0 (RESERVED)               DS
50       0 (RESERVED)               SS
4C       0 (RESERVED)               CS
48       0 (RESERVED)               ES
44       EDI
40       ESI
3C       EBP
38       ESP
34       EBX
30       EDX
2C       ECX
28       EAX
24       EFLAGS
20       EIP
1C       CR3 (PDBR)
18       0 (RESERVED)               SS2
14       ESP2
10       0 (RESERVED)               SS1
C        ESP1
8        0 (RESERVED)               SS0
4        ESP0
0        0 (RESERVED)               LINK (OLD TSS SELECTOR)

ADDRESSES ARE SHOWN IN HEXADECIMAL

NOTE: BITS MARKED AS 0 ARE RESERVED. DO NOT USE.



Figure 13-1. 32-Bit Task State Segment 

If paging is used, it is important to avoid placing a page boundary within the part of the TSS 
which is read by the processor during a task switch (the first 104 bytes). If a page boundary is 
placed within this part of the TSS, the pages on either side of the boundary must be present at
the same time. In addition, if paging is used, the pages corresponding to the old task's TSS, the
new task's TSS, and the descriptor table entries for each should be marked as present and 
read/write. It is an unrecoverable error to receive a page fault or general-protection exception 
after the processor has started to read the TSS. 
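The layout in Figure 13-1 can be written down as a C structure. The following is a minimal sketch, not a definitive definition: the type name tss32_t, the field names, and the helper tss_crosses_page are hypothetical, and the 4-KB page size used by the helper is an assumption taken from the paging chapters rather than from this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the 32-bit TSS shown in Figure 13-1.
 * Offsets in the comments are hexadecimal, as in the figure. */
typedef struct tss32 {
    uint16_t link, reserved0;          /* 00: selector of previous task's TSS */
    uint32_t esp0;                     /* 04 */
    uint16_t ss0, reserved1;           /* 08 */
    uint32_t esp1;                     /* 0C */
    uint16_t ss1, reserved2;           /* 10 */
    uint32_t esp2;                     /* 14 */
    uint16_t ss2, reserved3;           /* 18 */
    uint32_t cr3;                      /* 1C: PDBR */
    uint32_t eip;                      /* 20 */
    uint32_t eflags;                   /* 24 */
    uint32_t eax, ecx, edx, ebx;       /* 28, 2C, 30, 34 */
    uint32_t esp, ebp, esi, edi;       /* 38, 3C, 40, 44 */
    uint16_t es, reserved4;            /* 48 */
    uint16_t cs, reserved5;            /* 4C */
    uint16_t ss, reserved6;            /* 50 */
    uint16_t ds, reserved7;            /* 54 */
    uint16_t fs, reserved8;            /* 58 */
    uint16_t gs, reserved9;            /* 5C */
    uint16_t ldt, reserved10;          /* 60: selector for the task's LDT */
    uint16_t trap;                     /* 64: bit 0 is the T-bit */
    uint16_t iomap_base;               /* 66: I/O map base address */
} tss32_t;

/* Every member is naturally aligned, so no padding is expected. */
static_assert(sizeof(tss32_t) == 104, "TSS must be 104 bytes");
static_assert(offsetof(tss32_t, cr3) == 0x1C, "CR3 at offset 1C");
static_assert(offsetof(tss32_t, ldt) == 0x60, "LDT selector at offset 60");

/* Returns 1 if the first 104 bytes of a TSS placed at linear_base would
 * straddle a page boundary (assumes 4-KB pages), otherwise 0. */
int tss_crosses_page(uint32_t linear_base)
{
    return ((linear_base & 0xFFFu) + sizeof(tss32_t) - 1u) > 0xFFFu;
}
```

A system that allocates TSSs dynamically could use a check like tss_crosses_page to decide whether both pages must be kept present, as the paragraph above requires.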



13.2. TSS DESCRIPTOR 

The task state segment, like all other segments, is defined by a descriptor. Figure 13-2 shows 
the format of a TSS descriptor. 



TSS DESCRIPTOR

OFFSET   BITS     FIELD
------   ------   -------------------
+4       31-24    BASE 31:24
         23       G
         22-21    0
         20       AVL
         19-16    LIMIT 19:16
         15       P
         14-13    DPL
         12       0
         11-8     TYPE (1 0 B 1)
         7-0      BASE 23:16
+0       31-16    BASE ADDRESS 15:00
         15-0     SEGMENT LIMIT 15:00

AVL    AVAILABLE FOR USE BY SYSTEM SOFTWARE
B      BUSY BIT
BASE   SEGMENT BASE ADDRESS
DPL    DESCRIPTOR PRIVILEGE LEVEL
G      GRANULARITY
LIMIT  SEGMENT LIMIT
P      SEGMENT PRESENT
TYPE   SEGMENT TYPE

Figure 13-2. TSS Descriptor 



The Busy bit in the Type field indicates whether the task is busy. A busy task is currently 
running or waiting to run. A Type field with a value of 9 indicates an inactive task; a value of
11 (decimal) indicates a busy task. Tasks are not recursive. The processor uses the Busy bit to
detect an attempt to call a task whose execution has been interrupted. 

The Base, Limit, and DPL fields and the Granularity bit and Present bit have functions similar 
to their use in data-segment descriptors. The Limit field must have a value equal to or greater 
than 67H, one byte less than the minimum size of a task state. An attempt to switch to a task 
whose TSS descriptor has a limit less than 67H generates an exception. A larger limit is 
required if an I/O permission map is used. A larger limit also may be required for the operating 
system, if the system stores additional data in the TSS. 
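The field packing in Figure 13-2 can be illustrated with a short C sketch. The function and macro names below are hypothetical, and the sketch assumes a present descriptor with byte granularity (G = 0) and AVL = 0; it is not taken from any particular operating system.

```c
#include <assert.h>
#include <stdint.h>

#define TSS_TYPE_INACTIVE 0x9u    /* Type 9: task is not busy */
#define TSS_TYPE_BUSY     0xBu    /* Type 11 (decimal): task is busy */
#define TSS_MIN_LIMIT     0x67u   /* smallest limit accepted for a 32-bit TSS */

/* Packs a TSS descriptor into a 64-bit value: the low 32 bits are the
 * doubleword at +0 and the high 32 bits are the doubleword at +4. */
uint64_t tss_descriptor(uint32_t base, uint32_t limit, unsigned dpl, int busy)
{
    assert(dpl <= 3u && limit <= 0xFFFFFu);
    uint32_t lo = (limit & 0xFFFFu) | ((base & 0xFFFFu) << 16);
    uint32_t hi = ((base >> 16) & 0xFFu)                               /* BASE 23:16 */
                | ((busy ? TSS_TYPE_BUSY : TSS_TYPE_INACTIVE) << 8)     /* TYPE */
                | ((uint32_t)dpl << 13)                                 /* DPL */
                | (1u << 15)                                            /* P */
                | (limit & 0xF0000u)                                    /* LIMIT 19:16 */
                | (base & 0xFF000000u);                                 /* BASE 31:24 */
    return ((uint64_t)hi << 32) | lo;
}

/* Extracts the 4-bit Type field (bits 11-8 of the upper doubleword). */
unsigned tss_descriptor_type(uint64_t d)
{
    return (unsigned)((d >> 40) & 0xFu);
}

/* Returns 1 if the limit is large enough for the processor-defined fields. */
int tss_limit_is_valid(uint32_t limit)
{
    return limit >= TSS_MIN_LIMIT;
}
```

A system that appends an I/O permission map or private data to the TSS would pass a correspondingly larger limit.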

A procedure with access to a TSS descriptor can cause a task switch. In most systems, the DPL 
fields of TSS descriptors should be less than 3, so only privileged software can perform task 
switching. 

Access to a TSS descriptor does not give a procedure the ability to read or modify the 
descriptor. Reading and modification only can be done using a data descriptor mapped to the 
same location in memory. Loading a TSS descriptor into a segment register generates an
exception. TSS descriptors only may reside in the GDT. An attempt to access a TSS using a
selector with a set TI bit (which indicates the current LDT) generates an exception. 
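The TI-bit rule can be expressed as a small check on the selector. This sketch assumes the standard selector layout described in the segmentation chapters (index in bits 15-3, TI in bit 2, RPL in bits 1-0); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the descriptor within its table. */
unsigned selector_index(uint16_t sel) { return sel >> 3; }

/* Table indicator: 0 selects the GDT, 1 selects the current LDT. */
unsigned selector_ti(uint16_t sel) { return (sel >> 2) & 1u; }

/* Requested privilege level. */
unsigned selector_rpl(uint16_t sel) { return sel & 3u; }

/* A selector can name a TSS descriptor only if it refers to the GDT. */
int selector_may_name_tss(uint16_t sel)
{
    assert(selector_index(sel) != 0 || selector_ti(sel) != 0); /* null selector not expected */
    return selector_ti(sel) == 0;
}
```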

13.3. TASK REGISTER 

The task register (TR) is used to find the current TSS. Figure 13-3 shows the path by which the 
processor accesses the TSS. 

The task register has both a visible part (i.e., a part which can be read and changed by 
software) and an invisible part (i.e., a part maintained by the processor and inaccessible to 
software). The selector in the visible portion indexes to a TSS descriptor in the GDT. The 
processor uses the invisible portion of the TR register to retain the base and limit values from 
the TSS descriptor. Keeping these values in a register makes execution of the task more 
efficient, because the processor does not need to fetch these values from memory to reference 
the TSS of the current task. 

The LTR and STR instructions are used to modify and read the visible portion of the task 
register. Both instructions take one operand, a 16-bit segment selector located in memory or a 
general register. 

LTR (Load task register) loads the visible portion of the task register with the operand, 
which must index to a TSS descriptor in the GDT. The LTR instruction also loads the invisible 
portion with information from the TSS descriptor. The LTR instruction is a privileged 
instruction; it may be executed only when the CPL is 0. The LTR instruction generally is used 
during system initialization to put an initial value in the task register; afterwards, the contents 
of the TR register are changed by events which cause a task switch. 

STR (Store task register) stores the visible portion of the task register in a general register or
memory. The STR instruction is not privileged.






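[Diagram: the selector in the visible part of the TR register indexes a TSS descriptor in the
global descriptor table. The invisible part of the TR register holds the base address and
segment limit taken from that descriptor, which locate the task state segment.]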

Figure 13-3. Task Register 



13.4. TASK GATE DESCRIPTOR

A task gate descriptor provides an indirect, protected reference to a task. Figure 13-4 illustrates 
the format of a task gate. 






TASK GATE DESCRIPTOR

OFFSET   BITS     FIELD
------   ------   --------------------
+4       31-16    RESERVED
         15       P
         14-13    DPL
         12       0
         11-8     TYPE (0 1 0 1)
         7-0      RESERVED
+0       31-16    TSS SEGMENT SELECTOR
         15-0     RESERVED

DPL    DESCRIPTOR PRIVILEGE LEVEL
P      SEGMENT PRESENT
TYPE   SEGMENT TYPE



Figure 13-4. Task Gate Descriptor 



The Selector field of a task gate indexes to a TSS descriptor. The RPL in this selector is not 
used. 

The DPL of a task gate controls access to the descriptor for a task switch. A procedure may not 
select a task gate descriptor unless the selector's RPL and the CPL of the procedure are 
numerically less than or equal to the DPL of the descriptor. This prevents less privileged 
procedures from causing a task switch. (Note that when a task gate is used, the DPL of the 
destination TSS descriptor is not used.) 
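The task gate format and the DPL rule above can be sketched in C as follows. The function names are hypothetical, reserved bits are written as zero, and the gate is assumed to be present (P = 1).

```c
#include <assert.h>
#include <stdint.h>

#define GATE_TYPE_TASK 0x5u   /* Type field of a task gate */

/* Packs a task gate: low 32 bits are the doubleword at +0, high 32 bits
 * are the doubleword at +4. */
uint64_t task_gate(uint16_t tss_selector, unsigned dpl)
{
    assert(dpl <= 3u);
    uint32_t lo = (uint32_t)tss_selector << 16;
    uint32_t hi = (1u << 15) | ((uint32_t)dpl << 13) | (GATE_TYPE_TASK << 8);
    return ((uint64_t)hi << 32) | lo;
}

/* Returns the TSS selector stored in a task gate. */
uint16_t task_gate_selector(uint64_t gate)
{
    return (uint16_t)(gate >> 16);
}

/* A procedure may select the gate only if both its CPL and the selector's
 * RPL are numerically less than or equal to the gate's DPL. */
int may_use_task_gate(unsigned cpl, unsigned rpl, unsigned gate_dpl)
{
    return cpl <= gate_dpl && rpl <= gate_dpl;
}
```

With a gate DPL of 3, any procedure can use the gate; with a DPL of 0, only code running at CPL 0 with an RPL of 0 can.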

A procedure with access to a task gate can cause a task switch, as can a procedure with access 
to a TSS descriptor. Both task gates and TSS descriptors are provided to satisfy three needs: 

1. The need for a task to have only one Busy bit. Because the Busy bit is stored in the TSS 
descriptor, each task should have only one such descriptor. There may, however, be 
several task gates which select a single TSS descriptor. 

2. The need to provide selective access to tasks. Task gates fill this need, because they can 
reside in an LDT and can have a DPL which is different from the TSS descriptor's DPL. A 
procedure which does not have sufficient privilege to use the TSS descriptor in the GDT 
(which usually has a DPL of 0) can still call another task if it has access to a task gate in 
its LDT. With task gates, the operating system can limit task switching to specific tasks. 

3. The need for an interrupt or exception to cause a task switch. Task gates also may reside in 
the IDT, which allows interrupts and exceptions to cause task switching. When an 
interrupt or exception supplies a vector to a task gate, the processor switches to the 
indicated task. 

Figure 13-5 illustrates how both a task gate in an LDT and a task gate in the IDT can identify 
the same task. 






[Diagram: a task gate in the local descriptor table and a task gate in the interrupt descriptor
table both select the same TSS descriptor in the global descriptor table, which locates the
task state segment.]

Figure 13-5. Task Gates Reference Tasks 



13.5. TASK SWITCHING 

The processor transfers execution to another task in any of four cases: 

1. The current task executes a JMP or CALL to a TSS descriptor. 

2. The current task executes a JMP or CALL to a task gate. 

3. An interrupt or exception indexes to a task gate in the IDT. 

4. The current task executes an IRET when the NT flag is set. 

The JMP, CALL, and IRET instructions, as well as interrupts and exceptions, are all ordinary 
mechanisms of the processor which can be used in circumstances in which no task switch
occurs. The descriptor type (when a task is called) or the NT flag (when the task returns) makes
the difference between the standard mechanism and the form which causes a task switch. 

To cause a task switch, a JMP or CALL instruction can transfer execution to either a TSS 
descriptor or a task gate. The effect is the same in either case: the processor transfers execution 
to the specified task. 

An exception or interrupt causes a task switch when it indexes to a task gate in the IDT. If it 
indexes to an interrupt or trap gate in the IDT, a task switch does not occur. See Chapter 14 for 
more information on the interrupt mechanism. 

An interrupt service routine always returns execution to the interrupted procedure, which may 
be in another task. If the NT flag is clear, a normal return occurs. If the NT flag is set, a task 
switch occurs. The task receiving the task switch is specified by the TSS selector in the TSS of 
the interrupt service routine. 

A task switch has these steps: 

1. Check that the current task is allowed to switch to the new task. Data-access privilege
rules apply to JMP and CALL instructions. The DPL of the TSS descriptor or the task
gate must be numerically greater than or equal to (i.e., the same or a lower privilege level
than) both the CPL and the RPL of the gate selector. Exceptions, interrupts, and IRET
instructions are permitted to switch tasks regardless of the DPL of the destination task
gate or TSS descriptor.

2. Check that the TSS descriptor of the new task is marked present and has a valid limit 
(greater than or equal to 67H). Errors restore any changes made in the processor state 
when an attempt is made to execute the error-generating instruction. This lets the return 
address for the exception handler point to the error-generating instruction, rather than the 
instruction following the error-generating instruction. The exception handler can fix the 
condition which caused the error, and restart the task. The intervention of the exception 
handler can be completely transparent to the application program. 

3. Save the state of the current task. The processor finds the base address of the current TSS 
in the task register. The processor registers are copied into the current TSS (the EAX, 
ECX, EDX, EBX, ESP, EBP, ESI, EDI, ES, CS, SS, DS, FS, GS, and EFLAGS registers, 
and the instruction pointer). 

4. Load the TR register with the selector to the new task's TSS descriptor, set the new task's
Busy bit, and set the TS bit in the CR0 register. The selector is either the operand of a JMP
or CALL instruction, or it is taken from a task gate.

5. Load the new task's state from its TSS and continue execution. The registers loaded are the
LDTR register; the PDBR (CR3); the EFLAGS register; the instruction pointer (EIP); the
general registers EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI; and the segment registers
ES, CS, SS, DS, FS, and GS. Any errors detected in this step occur in the context of the
new task. To an exception handler, the first instruction of the new task appears not to have
executed.

Note that the state of the old task is always saved when a task switch occurs. If the task is 
resumed, execution starts with the instruction which normally would have been next. The 
registers are restored to the values they held when the task stopped running. 

Every task switch sets the TS (task switched) bit in the CR0 register. The TS bit is useful to
system software for coordinating the operations of the integer unit with the floating-point unit.
The TS bit indicates that the context of the floating-point unit may be different from that of the
current task. Chapter 6 discusses the TS bit and the FPU in more detail. 

Exception service routines for exceptions caused by task switching (exceptions resulting from 
steps 5 through 17 shown in Table 13-1) may be subject to recursive calls if they attempt to
reload the segment selector which generated the exception. The cause of the exception (or the 
first of multiple causes) should be fixed before reloading the selector. 

The privilege level at which the old task was running has no relation to the privilege level of 
the new task. Because the tasks are isolated by their separate address spaces and task state 
segments, and because privilege rules control access to a TSS, no privilege checks are needed 
to perform a task switch. The new task begins executing at the privilege level indicated by the 
RPL of the new contents of the CS register, which are loaded from the TSS. 






Table 13-1. Checks Made during a Task Switch 



Step  Condition Checked                                 Exception (1)           Error Code Reference
----  ------------------------------------------------  ----------------------  --------------------
1     TSS descriptor is present in memory               NP                      New Task's TSS
2     TSS descriptor is not busy                        TS (for IRET); GP (for  Task's backlink TSS
                                                        JMP, CALL, INT)
3     TSS segment limit greater than or equal to 108    TS                      New Task's TSS
4     Registers are loaded from the values in the TSS
5     LDT selector of new task is valid (2)             TS                      New Task's LDT
6     Code segment DPL matches selector RPL             TS                      New Code Segment
7     SS selector is valid (2)                          TS                      New Stack Segment
8     Stack segment is present in memory                SF                      New Stack Segment
9     Stack segment DPL matches CPL                     TS                      New Stack Segment
10    LDT of new task is present in memory              TS                      New Task's LDT
11    CS selector is valid (2)                          TS                      New Code Segment
12    Code segment is present in memory                 NP                      New Code Segment
13    Stack segment DPL matches selector RPL            TS                      New Stack Segment
14    DS, ES, FS, and GS selectors are valid (2)        TS                      New Data Segment
15    DS, ES, FS, and GS segments are readable          TS                      New Data Segment
16    DS, ES, FS, and GS segments are present in        NP                      New Data Segment
      memory
17    DS, ES, FS, and GS segment DPL greater than or    TS                      New Data Segment
      equal to CPL (unless these are conforming
      segments)



NOTES: Future Intel processors may use a different order of checks. 

1. NP = Segment-not-present exception, GP = General-protection exception, TS = Invalid-TSS exception,
SF = Stack exception. 

2. A selector is valid if it is in a compatible type of table (e.g., an LDT selector may not be in any table 
except the GDT), occupies an address within the table's segment limit, and refers to a compatible type 
of descriptor (e.g., a selector in the CS register only is valid when it indexes to a descriptor for a code 
segment; the descriptor type is specified in its Type field). 
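The privilege rule from step 1 of the task switch and the first three checks of Table 13-1 can be modeled as below. This is a sketch under stated assumptions: the types and names are hypothetical, the limit test uses the 67H value given in Section 13.2, and reporting a privilege violation as a general-protection exception is an assumption based on the general protection rules rather than on the table.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SWITCH_OK, FAULT_GP, FAULT_NP, FAULT_TS } switch_result_t;
typedef enum { VIA_JMP, VIA_CALL, VIA_INT, VIA_IRET } switch_cause_t;

/* Simplified model of the TSS descriptor fields examined before a switch. */
typedef struct {
    int      present;
    int      busy;
    unsigned dpl;
    uint32_t limit;
} tss_desc_model_t;

switch_result_t check_task_switch(switch_cause_t cause, unsigned cpl,
                                  unsigned rpl, const tss_desc_model_t *d)
{
    assert(d != 0 && cpl <= 3u && rpl <= 3u);

    /* Privilege rules apply only to JMP and CALL. */
    if ((cause == VIA_JMP || cause == VIA_CALL) && (d->dpl < cpl || d->dpl < rpl))
        return FAULT_GP;

    /* Table 13-1, step 1: descriptor must be present. */
    if (!d->present)
        return FAULT_NP;

    /* Step 2: target must not be busy, except for IRET, which returns
     * to a task that is still marked busy. */
    if (cause == VIA_IRET) {
        if (!d->busy)
            return FAULT_TS;
    } else if (d->busy) {
        return FAULT_GP;
    }

    /* Step 3: limit must cover the processor-defined part of the TSS. */
    if (d->limit < 0x67u)
        return FAULT_TS;

    return SWITCH_OK;
}
```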



13.6. TASK LINKING 

The Link field of the TSS and the NT flag are used to return execution to the previous task. 
The NT flag indicates whether the currently executing task is nested within the execution of 
another task, and the Link field of the current task's TSS holds the TSS selector for the higher- 
level task, if there is one (see Figure 13-6). 






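[Diagram: a chain of nested tasks. The TSS of the top-level task has NT=0. The TSS of each
nested task has NT=1 and a Link field selecting the TSS of the task above it. The EFLAGS
register of the currently executing task has NT=1.]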




Figure 13-6. Nested Tasks 



When an interrupt, exception, or call causes a task switch, the processor copies the
segment selector for the current task state segment into the TSS for the new task and sets the 
NT flag. The NT flag indicates the Link field of the TSS has been loaded with a saved TSS 
selector. The new task releases control by executing an IRET instruction. When an IRET 
instruction is executed, the NT flag is checked. If it is set, the processor does a task switch to 
the previous task. Table 13-2 summarizes the uses of the fields in a TSS which are affected by 
task switching. 



Table 13-2. Effect of a Task Switch on Busy, NT, and Link Fields 



Field                    Effect of Jump           Effect of CALL           Effect of IRET
                                                  Instruction or           Instruction
                                                  Interrupt
-----------------------  -----------------------  -----------------------  -----------------------
Busy bit of new task     Bit is set. Must have    Bit is set. Must have    No change. Must be set.
                         been clear before.       been clear before.
Busy bit of old task     Bit is cleared.          No change. Bit is        Bit is cleared.
                                                  currently set.
NT flag of new task      No change.               Flag is set.             No change.
NT flag of old task      No change.               No change.               Flag is cleared.
Link field of new task   No change.               Loaded with selector     No change.
                                                  for old task's TSS.
Link field of old task   No change.               No change.               No change.



Note that the NT flag may be modified by software executing at any privilege level. It is
possible for a program to set its NT bit and execute an IRET instruction, which would have the
effect of invoking the task specified in the Link field of the current task's TSS. To keep 
spurious task switches from succeeding, the operating system should initialize the Link field of 
every TSS it creates. 



13.6.1. Busy Bit Prevents Loops

The Busy bit of the TSS descriptor prevents re-entrant task switching. There is only one saved 
task context, the context saved in the TSS; therefore, a task may be called only once before it
terminates. The chain of suspended tasks may grow to any length, due to multiple interrupts, 
exceptions, jumps, and calls. The Busy bit prevents a task from being called if it is in this 
chain. A re-entrant task switch would overwrite the old TSS for the task, which would break 
the chain. 

The processor manages the Busy bit as follows: 

1. When switching to a task, the processor sets the Busy bit of the new task.

2. When switching from a task, the processor clears the Busy bit of the old task if that task is 
not to be placed in the chain (i.e., the instruction causing the task switch is a JMP or IRET 
instruction). If the task is placed in the chain, its Busy bit remains set. 

3. When switching to a task, the processor generates a general-protection exception if the 
Busy bit of the new task already is set. 

In this way, the processor prevents a task from switching to itself or to any task in the chain, 
which prevents re-entrant task switching. 
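The Busy-bit rules above, together with the NT and Link updates in Table 13-2, can be captured in a small state model. This is only an illustrative sketch: task_model_t and model_switch are hypothetical names, and the model tracks just the three fields the table describes.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SW_JMP, SW_CALL_OR_INT, SW_IRET } switch_kind_t;

/* Per-task state relevant to task linking. */
typedef struct {
    int      busy;      /* Busy bit in the task's TSS descriptor */
    int      nt;        /* NT flag in the task's EFLAGS image */
    uint16_t link;      /* Link field of the task's TSS */
    uint16_t selector;  /* selector of the task's TSS descriptor */
} task_model_t;

/* Applies one task switch from 'from' to 'to'.
 * Returns 0 on success, or -1 if the Busy-bit rule rejects the switch. */
int model_switch(task_model_t *from, task_model_t *to, switch_kind_t kind)
{
    assert(from != 0 && to != 0);
    switch (kind) {
    case SW_JMP:
        if (to->busy) return -1;          /* re-entrant switch rejected */
        to->busy = 1;
        from->busy = 0;                   /* old task leaves the chain */
        break;
    case SW_CALL_OR_INT:
        if (to->busy) return -1;
        to->busy = 1;
        to->nt = 1;                       /* new task is nested */
        to->link = from->selector;        /* remember where to return */
        break;                            /* old task stays busy */
    case SW_IRET:
        if (!to->busy) return -1;         /* must return to a busy task */
        from->busy = 0;
        from->nt = 0;
        break;
    }
    return 0;
}
```

In this model, a task that calls a second task cannot be called back by it: the second task can return to it only with IRET.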

The Busy bit may be used in multiprocessor configurations, because the processor asserts a bus 
lock when it sets or clears the Busy bit. This keeps two processors from invoking the same task 
at the same time. (See Chapter 19 for more information on multiprocessing.) 



13.6.2. Modifying Task Linkages 

Modification of the chain of suspended tasks may be needed to resume an interrupted task 
before the task which interrupted it. A reliable way to do this is: 

1. Disable interrupts.

2. First change the Link field in the TSS of the interrupting task, then clear the Busy bit in
the TSS descriptor of the task being removed from the chain.

3. Re-enable interrupts.



13.7. TASK ADDRESS SPACE

The LDT selector and PDBR (CR3) field of the TSS can be used to give each task its own
LDT and page tables. Because segment descriptors in the LDTs are the connections between
tasks and segments, separate LDTs for each task can be used to set up individual control over
these connections. Access to any particular segment can be given to any particular task by
placing a segment descriptor for that segment in the LDT for that task. If paging is enabled,
each task can have its own set of page tables for mapping linear addresses to physical 
addresses. 

It also is possible for tasks to have the same LDT. This is a simple and memory-efficient way 
to allow some tasks to communicate with or control each other, without dropping the 
protection barriers for the entire system. 

Because all tasks have access to the GDT, it also is possible to create shared segments accessed 
through segment descriptors in this table. 



13.7.1. Task Linear-to-Physical Space Mapping

The choices for arranging the linear-to-physical mappings of tasks fall into two general classes: 

1. One linear-to-physical mapping shared among all tasks. When paging is not enabled, this 
is the only choice. Without paging, all linear addresses map to the same physical 
addresses. When paging is enabled, this form of linear-to-physical mapping is obtained by
using one page directory for all tasks. The linear space may exceed the available physical 
space if demand-paged virtual memory is supported. 

2. Independent linear-to-physical mappings for each task. This form of mapping comes from 
using a different page directory for each task. Because the PDBR (page directory base 
register) is loaded from the TSS with each task switch, each task may have a different page 
directory. 

The linear address spaces of different tasks may map to completely distinct physical addresses. 
If the entries of different page directories point to different page tables and the page tables 
point to different pages of physical memory, then the tasks do not share any physical 
addresses. 

The task state segments must lie in a space accessible to all tasks so that the mapping of TSS 
addresses does not change while the processor is reading and updating the TSSs during a task 
switch. The linear space mapped by the GDT also should be mapped to a shared physical 
space; otherwise, the purpose of the GDT is defeated. Figure 13-7 shows how the linear spaces 
of two tasks can overlap in the physical space by sharing page tables. 
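Sharing a page table between two page directories can be modeled in a few lines of C. The sketch below is a simplified model, not the hardware format: directory entries are C pointers instead of physical addresses, only the Present bit of a page-table entry is modeled, and the 10/10/12-bit split of the linear address is taken from the paging chapters. All names and frame addresses are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES     1024u
#define PTE_PRESENT 0x1u

typedef struct { uint32_t pte[ENTRIES]; } page_table_t;
typedef struct { page_table_t *pde[ENTRIES]; } page_dir_t;

/* Translates a linear address through the given page directory.
 * Returns 0 and stores the physical address, or -1 if unmapped. */
int translate(const page_dir_t *pd, uint32_t linear, uint32_t *phys)
{
    assert(pd != NULL && phys != NULL);
    const page_table_t *pt = pd->pde[linear >> 22];
    if (pt == NULL)
        return -1;
    uint32_t pte = pt->pte[(linear >> 12) & 0x3FFu];
    if ((pte & PTE_PRESENT) == 0)
        return -1;
    *phys = (pte & 0xFFFFF000u) | (linear & 0xFFFu);
    return 0;
}

page_table_t private_a, private_b, shared_pt;
page_dir_t dir_a, dir_b;   /* one directory per task, selected by the PDBR */

/* Task A and task B each get a private table in directory slot 0 and
 * both reference the same table in slot 1. */
void build_example_mappings(void)
{
    private_a.pte[0] = 0x00100000u | PTE_PRESENT;
    private_b.pte[0] = 0x00200000u | PTE_PRESENT;
    shared_pt.pte[0] = 0x00300000u | PTE_PRESENT;
    dir_a.pde[0] = &private_a;
    dir_a.pde[1] = &shared_pt;
    dir_b.pde[0] = &private_b;
    dir_b.pde[1] = &shared_pt;
}
```

With these mappings, linear address 0 reaches a different frame in each task, while linear addresses starting at 00400000H reach the same frame in both.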



13.7.2. Task Logical Address Space 

By itself, an overlapping linear-to-physical space mapping does not allow sharing of data 
among tasks. To share data, tasks must also have a common logical-to-linear space mapping; 
i.e., they also must have access to descriptors which point into a shared linear address space. 
There are three ways to create shared logical-to-physical address-space mappings: 

1. Through the segment descriptors in the GDT. All tasks have access to the descriptors in 
the GDT. If those descriptors point into a linear-address space which is mapped to a 
common physical-address space for all tasks, then the tasks can share data and 
instructions. 

2. Through shared LDTs. Two or more tasks can use the same LDT if the LDT selectors in
their TSSs select the same LDT for use in address translation. Segment descriptors in the
LDT addressing linear space mapped to overlapping physical space provide shared 
physical memory. This method of sharing is more selective than sharing by the GDT; the 
sharing can be limited to specific tasks. Other tasks in the system may have different LDTs 
which do not give them access to the shared areas. 

3. Through segment descriptors in the LDTs which map to the same linear address space. If 
the linear address space is mapped to the same physical space by the page mapping of the 
tasks involved, these descriptors permit the tasks to share space. Such descriptors are 
commonly called aliases. This method of sharing is even more selective than those listed 
above; other descriptors in the LDTs may point to independent linear addresses which are 
not shared. 



[Diagram: the PDBR fields in the TSSs of Task A and Task B point to separate page
directories. Each directory contains PDEs for the task's own page tables and a PDE for a
shared page table. The private page tables map Task A pages and Task B pages; the shared
page table maps the shared pages.]



Figure 13-7. Overlapping Linear-to-Physical Mappings 



CHAPTER 14

PROTECTED-MODE EXCEPTIONS AND INTERRUPTS



Exceptions and interrupts are forced transfers of execution to a task or a procedure. The task or 
procedure is called a handler. Interrupts occur at random times during the execution of a 
program, in response to signals from hardware. Exceptions occur when instructions are 
executed which provoke exceptions. Usually, the servicing of interrupts and exceptions is 
performed in a manner transparent to application programs. Interrupts are used to handle 
events external to the processor, such as requests to service peripheral devices. Exceptions 
handle conditions detected by the processor in the course of executing instructions, such as 
division by zero. 

There are two sources for interrupts and two sources for exceptions: 

1. Interrupts

— Maskable interrupts, which are received on the CPU's INTR input pin. Maskable 
interrupts do not occur unless the interrupt-enable flag (IF) is set. 

— Nonmaskable interrupts, which are received on the NMI (Non-Maskable Interrupt) 
input of the processor. The processor does not provide a mechanism to prevent 
nonmaskable interrupts. 

2. Exceptions 

— Processor-detected exceptions. These are further classified as faults, traps, and aborts. 

— Programmed exceptions. The INTO, INT 3, INT n, and BOUND instructions may 
trigger exceptions. These instructions often are called "software interrupts," but the 
processor handles them as exceptions. 

This chapter explains the features of the processor which control and respond to interrupts. 

14.1. EXCEPTION AND INTERRUPT VECTORS

The processor associates an identifying number with each different type of interrupt or 
exception. This number is called a vector. 

The NMI interrupt and the exceptions are assigned vectors in the range 0 through 31. Not all of
these vectors are currently used by the processor; unassigned vectors in this range are reserved 
for possible future uses. Do not use unassigned vectors. 

The vectors for maskable interrupts are determined by hardware. External interrupt controllers 
(such as Intel's 8259A Programmable Interrupt Controller) put the vector on the processor's 
bus during its interrupt-acknowledge cycle. Any vectors in the range 32 through 255 can be 
used. Table 14-1 shows the assignment of exception and interrupt vectors. 







Table 14-1. Exception and Interrupt Vectors 



Vector Number   Description
-------------   --------------------------------------
0               Divide Error
1               Debug Exception
2               NMI Interrupt
3               Breakpoint
4               INTO-detected Overflow
5               BOUND Range Exceeded
6               Invalid Opcode
7               Device Not Available
8               Double Fault
9               Coprocessor Segment Overrun (reserved)
10              Invalid Task State Segment
11              Segment Not Present
12              Stack Fault
13              General Protection
14              Page Fault
15              (Intel reserved. Do not use.)
16              Floating-Point Error
17              Alignment Check
18              Machine Check*
19-31           (Intel reserved. Do not use.)
32-255          Maskable Interrupts



NOTE: 

*Machine check is a model-specific exception, available on the Pentium™ microprocessor only. It may not 
be continued or may not be continued with a compatible implementation in future processor generations. 
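System software often mirrors Table 14-1 in a set of constants. The following sketch shows one way to do so; the enumerator and function names are hypothetical, and the classification simply restates the ranges given in the table.

```c
#include <assert.h>

/* Vector numbers from Table 14-1 (names are illustrative). */
enum {
    VEC_DIVIDE_ERROR      = 0,
    VEC_DEBUG             = 1,
    VEC_NMI               = 2,
    VEC_BREAKPOINT        = 3,
    VEC_OVERFLOW          = 4,
    VEC_BOUND             = 5,
    VEC_INVALID_OPCODE    = 6,
    VEC_DEVICE_NOT_AVAIL  = 7,
    VEC_DOUBLE_FAULT      = 8,
    VEC_COPROC_OVERRUN    = 9,
    VEC_INVALID_TSS       = 10,
    VEC_SEGMENT_NOT_PRES  = 11,
    VEC_STACK_FAULT       = 12,
    VEC_GENERAL_PROT      = 13,
    VEC_PAGE_FAULT        = 14,
    VEC_FP_ERROR          = 16,
    VEC_ALIGNMENT_CHECK   = 17,
    VEC_MACHINE_CHECK     = 18
};

typedef enum { VECTOR_ASSIGNED, VECTOR_RESERVED, VECTOR_MASKABLE } vector_class_t;

/* Classifies a vector number according to Table 14-1. */
vector_class_t classify_vector(unsigned vector)
{
    assert(vector <= 255u);
    if (vector >= 32u)
        return VECTOR_MASKABLE;            /* 32-255: maskable interrupts */
    if (vector == 15u || vector >= 19u)
        return VECTOR_RESERVED;            /* 15 and 19-31: do not use */
    return VECTOR_ASSIGNED;                /* NMI and exceptions */
}
```

An interrupt controller setup routine could use such a check to reject any attempt to program a hardware interrupt onto a vector below 32.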



Exceptions are classified as faults, traps, or aborts depending on the way they are reported and 
whether restart of the instruction which caused the exception is supported. 

Faults — A fault is an exception which is reported at the instruction boundary prior to the 
instruction in which the exception was detected. The fault is reported with the machine 
restored to a state which permits the instruction to be restarted. The return address for the fault 
handler points to the instruction which generated the fault, rather than the instruction following 
the faulting instruction. 

Traps — A trap is an exception which is reported at the instruction boundary immediately after 
the instruction in which the exception was detected. 

Aborts — An abort is an exception which does not always report the location of the instruction 
causing the exception and does not allow restart of the program which caused the exception.
Aborts are used to report severe errors, such as hardware errors and inconsistent or illegal
values in system tables. 



14.2. INSTRUCTION RESTART 

For most exceptions and interrupts, transfer of execution does not take place until the end of 
the current instruction. This leaves the EIP register pointing at the instruction which comes 
after the instruction which was being executed when the exception or interrupt occurred. If the 
instruction has a repeat prefix, transfer takes place at the end of the current iteration with the 
registers set to execute the next iteration. But if the exception is a fault, the processor registers 
are restored to the state they held before execution of the instruction began. This permits 
instruction restart. 

Instruction restart is used to handle exceptions which block access to operands. For example, 
an application program could make reference to data in a segment which is not present in 
memory. When the exception occurs, the exception handler must load the segment (probably 
from a hard disk) and resume execution beginning with the instruction which caused the 
exception. At the time the exception occurs, the instruction may have altered the contents of 
some of the processor registers. If the instruction read an operand from the stack, it is 
necessary to restore the stack pointer to its previous value. All of these restoring operations are 
performed by the processor in a manner completely transparent to the application program. 

When a fault occurs, the EIP register is restored to point to the instruction which received the 
exception. When the exception handler returns, execution resumes with this instruction. 
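The difference between the return address saved for a fault and for a trap can be shown with a one-line computation. This is a simplified sketch with hypothetical names; it assumes a straight-line instruction, so that the instruction following it in memory is also the next one to execute.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { EXC_FAULT, EXC_TRAP } exception_class_t;

/* Returns the EIP value the handler sees as its return address for an
 * instruction at insn_eip that is insn_len bytes long. */
uint32_t saved_return_eip(exception_class_t cls, uint32_t insn_eip, uint32_t insn_len)
{
    assert(insn_len > 0u);
    if (cls == EXC_FAULT)
        return insn_eip;              /* restart the faulting instruction */
    return insn_eip + insn_len;       /* continue after the trapping instruction */
}
```

For a three-byte instruction at 1000H, a fault handler returns to 1000H and re-executes it, while a trap handler returns to 1003H.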

14.3. ENABLING AND DISABLING INTERRUPTS 

Certain conditions and flag settings cause the processor to inhibit certain kinds of interrupts 
and exceptions. 

14.3.1. NMI Masks Further NMIs

While an NMI interrupt handler is executing, the processor disables additional calls to the 
procedure or task which handles the interrupt until the next IRET instruction is executed. This 
prevents stacking up calls to the interrupt handler. It is recommended that interrupt gates be 
used for NMIs in order to disable nested maskable interrupts, since an IRET instruction from 
the maskable-interrupt handler would re-enable NMI. 



14.3.2. IF Masks INTR 

The IF flag can turn off servicing of interrupts received on the INTR pin of the processor. 
When the IF flag is clear, INTR interrupts are ignored; when the IF flag is set, INTR interrupts 
are serviced. As with the other flag bits, the processor clears the IF flag in response to a 
RESET signal. The STI and CLI instructions set and clear the IF flag. 

CLI (Clear Interrupt-Enable Flag) and STI (Set Interrupt-Enable Flag) put the IF flag (bit 
9 in the EFLAGS register) in a known state. These instructions may be executed only if the 
CPL is at least as privileged as the IOPL (that is, CPL is numerically less than or equal to 
IOPL). A general-protection exception is generated if they are executed at a less privileged 
level. 

The IF flag also is affected by the following operations: 

• The PUSHF instruction stores all flags on the stack, where they can be examined and 
modified. The POPF instruction can be used to load the modified form back into the 
EFLAGS register. 

• Task switches and the POPF and IRET instructions load the EFLAGS register; therefore, 
they can be used to modify the setting of the IF flag. 

• Interrupts through interrupt gates automatically clear the IF flag, which disables interrupts. 
(Interrupt gates are explained later in this chapter). 
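
The rules above can be sketched as a small C model. This is an illustration only, not processor
code: the function names are hypothetical, and the model covers only the protected-mode behavior
described in this section.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_IF (1u << 9)   /* interrupt-enable flag, bit 9 of EFLAGS */

/* INTR interrupts are serviced only while IF is set. */
static bool intr_enabled(uint32_t eflags)
{
    return (eflags & EFLAGS_IF) != 0;
}

/* CLI and STI are permitted only when CPL is numerically <= IOPL;
 * otherwise a general-protection exception is generated. */
static bool cli_sti_permitted(unsigned cpl, unsigned iopl)
{
    return cpl <= iopl;
}

/* Effect on EFLAGS of an interrupt through an interrupt gate: IF is cleared. */
static uint32_t enter_interrupt_gate(uint32_t eflags)
{
    return eflags & ~EFLAGS_IF;
}
```

For example, a program running at CPL 3 with IOPL 0 is not permitted to execute CLI or STI,
while code at CPL 0 always is.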

14.3.3. RF Masks Debug Faults 

The RF flag in the EFLAGS register is used to prevent servicing an instruction breakpoint fault 
multiple times. RF works as follows: 

• Before entry into any fault handler, the processor sets the RF bit in the EFLAGS image 
that it pushes onto the stack of the handler. Normally the RF image on the stack does not 
need to be changed by software. 

• RF itself is set by the fault handler when it executes the IRETD instruction to return to the 
faulting instruction. IRETD transfers the EFLAGS image from the stack into the EFLAGS 
register. (POPF and POPFD do not transfer the RF image into the EFLAGS register.) 

• RF is cleared by the processor at successful termination of every instruction, except after 
the IRET instruction and after JMP, CALL, or INT instructions that cause a task switch. 
Therefore, RF remains set for no more than one instruction — the one executed 
immediately after the IRET. 

• When set, RF causes the processor to suppress reporting of instruction breakpoint faults. 

Because instruction breakpoint faults are the highest priority faults, they are always reported 
before any other faults for the same instruction. RF is zero for the first attempt to execute the 
instruction and one for all attempts to restart the instruction after an instruction breakpoint or 
any other fault. This ensures that an instruction breakpoint fault is reported only once. (See 
Chapter 17 for more information on debugging.) 
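
The sequence described above can be modeled in a few lines of C. This is a simplified sketch
with hypothetical function names; it assumes an instruction breakpoint is set on the instruction
being executed and ignores every other source of faults.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_RF (1u << 16)  /* resume flag, bit 16 of EFLAGS */

/* One attempt to execute an instruction that has an instruction
 * breakpoint set on it. Returns true if the breakpoint fault is
 * reported. Otherwise the instruction completes and RF is cleared. */
static bool attempt_reports_breakpoint(uint32_t *eflags)
{
    if (!(*eflags & EFLAGS_RF))
        return true;              /* RF clear: the fault is reported */
    *eflags &= ~EFLAGS_RF;        /* RF set: suppressed, then cleared on completion */
    return false;
}

/* EFLAGS image pushed on entry to a fault handler: RF is set, so that
 * IRETD restarts the instruction with breakpoint reporting suppressed. */
static uint32_t pushed_fault_image(uint32_t eflags)
{
    return eflags | EFLAGS_RF;
}
```

The first attempt reports the breakpoint; after the handler returns with the pushed image, the
second attempt completes without a second report and leaves RF clear.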



14.3.4. MOV or POP to SS Masks Some Exceptions and Interrupts 

Software which needs to change stack segments often uses a pair of instructions; for example: 

MOV SS, AX 

MOV ESP, StackTop 

If an interrupt or exception occurs after the segment selector has been loaded but before the 
ESP register has been loaded, these two parts of the logical address into the stack space are 
inconsistent for the duration of the interrupt or exception handler. 

To prevent this situation, the processor inhibits interrupts, debug exceptions, and single-step 
trap exceptions after either a MOV to SS instruction or a POP to SS instruction, until the 
instruction boundary following the next instruction is reached. General-protection faults may 
still be generated. If the LSS instruction is used to modify the contents of the SS register, the 
problem does not occur, because LSS loads SS and ESP in a single instruction. 



14.4. PRIORITY AMONG SIMULTANEOUS EXCEPTIONS AND 
INTERRUPTS 

If more than one exception or interrupt is pending at an instruction boundary, the processor 
services them in a predictable order. The priority among classes of exception and interrupt 
sources is shown in Table 14-2. While priority among these classes is consistent throughout the 
architecture, the priority of exceptions within each class is implementation-dependent and may 
vary from processor to processor. The processor first services a pending exception or interrupt from the 
class which has the highest priority, transferring execution to the first instruction of the 
handler. Lower priority exceptions are discarded; lower priority interrupts are held pending. 
Discarded exceptions are re-issued when the interrupt handler returns execution to the point of 
interruption. 
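
As an illustration of the class ordering in Table 14-2, the following C sketch picks the class to
service first from a set of pending classes. The bit-mask encoding and the names are assumptions
made for this example; the ordering inside a class is not modeled because it is
implementation-dependent.

```c
/* Classes of Table 14-2; a smaller number means a higher priority. */
enum event_class {
    CLASS_TRAP_PREVIOUS      = 1,  /* traps on the previous instruction */
    CLASS_EXTERNAL_INTERRUPT = 2,  /* NMI and maskable interrupts */
    CLASS_FETCH_FAULT        = 3,  /* faults from fetching the next instruction */
    CLASS_DECODE_FAULT       = 4,  /* faults from decoding the next instruction */
    CLASS_EXECUTE_FAULT      = 5   /* faults on executing an instruction */
};

/* pending: bit n (1..5) is set when an event of class n is pending.
 * Returns the class serviced first, or 0 when nothing is pending. */
static int highest_priority_class(unsigned pending)
{
    for (int c = CLASS_TRAP_PREVIOUS; c <= CLASS_EXECUTE_FAULT; c++)
        if (pending & (1u << c))
            return c;
    return 0;
}
```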



Table 14-2. Priority Among Simultaneous Exceptions and Interrupts 

Priority   Class     Descriptions 

Highest    Class 1   Traps on the Previous Instruction 
                     - Breakpoints 
                     - Debug Trap Exceptions (TF flag set, T bit in TSS set, or data/IO 
                       breakpoint) 

           Class 2   External Interrupts 
                     - NMI Interrupts 
                     - Maskable Interrupts 

           Class 3   Faults from Fetching the Next Instruction 
                     - Code Breakpoint Fault 
                     - Code Segment Limit Violation 
                     - Page Fault on Prefetch 

           Class 4   Faults from Decoding the Next Instruction 
                     - Illegal Opcode 
                     - Instruction length > 15 bytes 
                     - Coprocessor Not Available 

Lowest     Class 5   Faults on Executing an Instruction 
                     - General Detection 
                     - FP error (from previous FP instruction) 
                     - Interrupt on Overflow 
                     - Bound 
                     - Invalid TSS 
                     - Segment Not Present 
                     - Stack Exception 
                     - General Protection 
                     - Data Page Fault 
                     - Alignment Check 



14.5. INTERRUPT DESCRIPTOR TABLE 

The interrupt descriptor table (IDT) associates each exception or interrupt vector with a 
descriptor for the procedure or task which services the associated event. Like the GDT and 
LDTs, the IDT is an array of 8-byte descriptors. Unlike the GDT, the first entry of the IDT 
may contain a descriptor. To form an index into the IDT, the processor scales the exception or 
interrupt vector by eight, the number of bytes in a descriptor. Because there are only 256 
vectors, the IDT need not contain more than 256 descriptors. It can contain fewer than 256 
descriptors; descriptors are required only for the interrupt vectors which may occur. 

The IDT may reside anywhere in physical memory. As Figure 14-1 shows, the processor 
locates the IDT using the IDTR register. This register holds both a 32-bit base address and 16- 
bit limit for the IDT. The LIDT and SIDT instructions load and store the contents of the IDTR 
register. Both instructions have one operand, which is the address of six bytes in memory. 
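
The arithmetic involved can be sketched in C as follows. The helper names are hypothetical. The
sketch assumes that, as for the other descriptor tables, the limit is the offset of the last valid
byte of the table, and that the six-byte operand holds the 16-bit limit in its low two bytes
followed by the 32-bit base address.

```c
#include <stdbool.h>
#include <stdint.h>

#define GATE_SIZE 8u   /* bytes per IDT descriptor */

/* Byte offset of a vector's descriptor: the vector scaled by eight. */
static uint32_t idt_offset(uint8_t vector)
{
    return (uint32_t)vector * GATE_SIZE;
}

/* Limit for an IDT holding `count` descriptors (1..256). */
static uint16_t idt_limit_for(unsigned count)
{
    return (uint16_t)(count * GATE_SIZE - 1u);
}

/* True if the vector's whole descriptor lies within the limit. */
static bool idt_vector_in_limit(uint8_t vector, uint16_t limit)
{
    return idt_offset(vector) + (GATE_SIZE - 1u) <= limit;
}

/* Build the six-byte memory operand used by LIDT and SIDT
 * (little-endian: limit in bytes 0..1, base in bytes 2..5). */
static void pack_idtr_operand(uint8_t out[6], uint32_t base, uint16_t limit)
{
    out[0] = (uint8_t)(limit & 0xFFu);
    out[1] = (uint8_t)(limit >> 8);
    out[2] = (uint8_t)(base & 0xFFu);
    out[3] = (uint8_t)((base >> 8) & 0xFFu);
    out[4] = (uint8_t)((base >> 16) & 0xFFu);
    out[5] = (uint8_t)(base >> 24);
}
```

A full table of 256 descriptors therefore has a limit of 2047 (7FFH), and a table of 32
descriptors does not cover vector 32.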



[Figure: the IDTR register holds the IDT base address in bits 47..16 and the IDT limit in bits 
15..0. The base address points to the interrupt descriptor table, which holds the gates for 
interrupts #1, #2, #3, ... #N.] 



Figure 14-1. IDTR Locates IDT in Memory 



If a vector references a descriptor beyond the limit, the processor enters shutdown mode. In 
this mode, the processor stops executing instructions until an NMI interrupt is received or reset 
initialization is invoked. The processor generates a special bus cycle to indicate it has entered 
shutdown mode. Software designers may need to be aware of the response of hardware to 
receiving this signal. For example, hardware may turn on an indicator light on the front panel, 
generate an NMI interrupt to record diagnostic information, or invoke reset initialization. 

LIDT (Load IDT register) loads the IDTR register with the base address and limit held in the 
memory operand. This instruction can be executed only when the CPL is 0. It normally is used 
by the initialization code of an operating system when creating an IDT. An operating system 
also may use it to change from one IDT to another. 




SIDT (Store IDT register) copies the base and limit value stored in IDTR to memory. This 
instruction can be executed at any privilege level. 

14.6. IDT DESCRIPTORS 

The IDT may contain any of three kinds of descriptors: 

• Task gates 

• Interrupt gates 

• Trap gates 

Figure 14-2 shows the format of task gates, interrupt gates, and trap gates. (The task gate in an 
IDT is the same as the task gate in the GDT or an LDT already discussed in Chapter 13.) 







TASK GATE 

  +4:  bits 31..16  RESERVED 
       bit  15      P 
       bits 14..13  DPL 
       bits 12..8   0 0 1 0 1 
       bits 7..0    RESERVED 
  +0:  bits 31..16  TSS SEGMENT SELECTOR 
       bits 15..0   RESERVED 

INTERRUPT GATE 

  +4:  bits 31..16  OFFSET 31..16 
       bit  15      P 
       bits 14..13  DPL 
       bits 12..8   0 1 1 1 0 
       bits 7..5    0 0 0 
       bits 4..0    RESERVED 
  +0:  bits 31..16  SEGMENT SELECTOR 
       bits 15..0   OFFSET 15..0 

TRAP GATE 

  +4:  bits 31..16  OFFSET 31..16 
       bit  15      P 
       bits 14..13  DPL 
       bits 12..8   0 1 1 1 1 
       bits 7..5    0 0 0 
       bits 4..0    RESERVED 
  +0:  bits 31..16  SEGMENT SELECTOR 
       bits 15..0   OFFSET 15..0 

DPL        DESCRIPTOR PRIVILEGE LEVEL 
OFFSET     OFFSET TO PROCEDURE ENTRY POINT 
P          SEGMENT PRESENT BIT 
RESERVED   DO NOT USE 
SELECTOR   SEGMENT SELECTOR FOR DESTINATION CODE SEGMENT 



Figure 14-2. IDT Gate Descriptors 
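
As an illustration of the interrupt-gate and trap-gate layout, the following C sketch packs a
selector, an offset, and a DPL into the two doublewords of a descriptor. The type and function
names are hypothetical; the present bit is always set and the reserved bits are left clear.

```c
#include <stdint.h>

/* Value of bits 12..8 of the upper doubleword. */
enum gate_type {
    GATE_INTERRUPT = 0x0E,   /* 0 1 1 1 0 */
    GATE_TRAP      = 0x0F    /* 0 1 1 1 1 */
};

struct gate {
    uint32_t low;    /* doubleword at +0 */
    uint32_t high;   /* doubleword at +4 */
};

static struct gate make_gate(uint16_t selector, uint32_t offset,
                             unsigned dpl, enum gate_type type)
{
    struct gate g;
    /* +0: segment selector in bits 31..16, offset 15..0 in bits 15..0 */
    g.low = ((uint32_t)selector << 16) | (offset & 0xFFFFu);
    /* +4: offset 31..16, P (bit 15), DPL (bits 14..13), type (bits 12..8) */
    g.high = (offset & 0xFFFF0000u)
           | (1u << 15)
           | ((uint32_t)(dpl & 3u) << 13)
           | ((uint32_t)type << 8);
    return g;
}
```

For selector 0008H, offset 12345678H, and DPL 0, an interrupt gate encodes as 00085678H at +0
and 12348E00H at +4.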



14.7. INTERRUPT TASKS AND INTERRUPT PROCEDURES 

Just as a CALL instruction can call either a procedure or a task, so an exception or interrupt 
can "call" an interrupt handler as either a procedure or a task. When responding to an 
exception or interrupt, the processor uses the exception or interrupt vector to index to a 
descriptor in the IDT. If the processor indexes to an interrupt gate or trap gate, it calls the 
handler in a manner similar to a CALL to a call gate. If the processor finds a task gate, it 
causes a task switch in a manner similar to a CALL to a task gate. 




14.7.1. Interrupt Procedures 

An interrupt gate or trap gate indirectly references a procedure which runs in the context of the 
currently executing task, as shown in Figure 14-3. The selector of the gate points to an 
executable-segment descriptor in either the GDT or the current LDT. The offset field of the 
gate descriptor points to the beginning of the exception or interrupt handling procedure. 



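[Figure: the interrupt vector indexes an interrupt or trap gate in the IDT. The segment selector 
of the gate selects a segment descriptor in the GDT or LDT, which supplies the base address of 
the destination code segment. The offset of the gate is added to that base address to locate 
the interrupt procedure.] 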



Figure 14-3. Interrupt Procedure Call 



The processor calls an exception or interrupt handling procedure in much the same manner as a 
procedure call; the differences are explained in the following sections. 



14.7.1.1. STACK OF INTERRUPT PROCEDURE 

Just as with a transfer of execution using a CALL instruction, a transfer to an exception or 
interrupt handling procedure uses the stack to store the processor state. As Figure 14-4 shows, 
an interrupt pushes the contents of the EFLAGS register onto the stack before pushing the 
address of the interrupted instruction. 






Each frame is listed from higher addresses (top) to lower addresses (bottom). 

NO PRIVILEGE LEVEL CHANGE, NO ERROR CODE 

    (interrupted stack)      <-- OLD ESP 
    OLD EFLAGS 
    OLD CS 
    OLD EIP                  <-- NEW ESP 

NO PRIVILEGE LEVEL CHANGE, WITH ERROR CODE 

    (interrupted stack)      <-- OLD ESP 
    OLD EFLAGS 
    OLD CS 
    OLD EIP 
    ERROR CODE               <-- NEW ESP 

PRIVILEGE LEVEL CHANGE, NO ERROR CODE 

    (new stack)              <-- ESP FROM TSS 
    UNUSED | OLD SS 
    OLD ESP 
    OLD EFLAGS 
    UNUSED | OLD CS 
    OLD EIP                  <-- NEW ESP 

PRIVILEGE LEVEL CHANGE, WITH ERROR CODE 

    (new stack)              <-- ESP FROM TSS 
    UNUSED | OLD SS 
    OLD ESP 
    OLD EFLAGS 
    UNUSED | OLD CS 
    OLD EIP 
    ERROR CODE               <-- NEW ESP 



Figure 14-4. Stack Frame after Exception or Interrupt 



Certain types of exceptions also push an error code on the stack. An exception handler can use 
the error code to help diagnose the exception. 
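
The four stack frames can be summarized with a short C sketch. The structure below is a
hypothetical view of the largest frame as a handler would see it, starting at the new ESP; it
assumes 32-bit gates, so that every item occupies one doubleword.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Largest frame (privilege level change, with error code), in order
 * of increasing address starting at the handler's ESP. */
struct frame_full {
    uint32_t error_code;
    uint32_t eip;
    uint32_t cs;       /* upper 16 bits unused */
    uint32_t eflags;
    uint32_t esp;      /* present only on a privilege level change */
    uint32_t ss;       /* present only on a privilege level change */
};

/* Number of bytes pushed for each of the four cases. */
static size_t frame_bytes(bool privilege_change, bool has_error_code)
{
    size_t n = 3 * sizeof(uint32_t);       /* EFLAGS, CS, EIP */
    if (privilege_change)
        n += 2 * sizeof(uint32_t);         /* SS, ESP */
    if (has_error_code)
        n += sizeof(uint32_t);             /* error code */
    return n;
}
```

The frames are therefore 12, 16, 20, or 24 bytes long. A handler that receives an error code must
remove it from the stack before executing IRET.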



14.7.1.2. RETURNING FROM AN INTERRUPT PROCEDURE 

An interrupt procedure differs from a normal procedure in the method of leaving the 
procedure. The IRET instruction is used to exit from an interrupt procedure. The IRET 
instruction is similar to the RET instruction except that it increments the contents of the ESP 
register by an extra four bytes and restores the saved flags into the EFLAGS register. The 
IOPL field of the EFLAGS register is restored only if the CPL is 0. The IF flag is changed only 
if CPL ≤ IOPL. 



14.7.1.3. FLAG USAGE BY INTERRUPT PROCEDURE 

Interrupts using either interrupt gates or trap gates cause the TF flag to be cleared after its 
current value is saved on the stack as part of the saved contents of the EFLAGS register. In so 
doing, the processor prevents instruction tracing from affecting interrupt response. A 
subsequent IRET instruction restores the TF flag to the value in the saved contents of the 
EFLAGS register on the stack. 

The difference between an interrupt gate and a trap gate is its effect on the IF flag. An interrupt 
which uses an interrupt gate clears the IF flag, which prevents other interrupts from interfering 
with the current interrupt handler. A subsequent IRET instruction restores the IF flag to the 
value in the saved contents of the EFLAGS register on the stack. An interrupt through a trap 
gate does not change the IF flag. 
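
The flag handling on entry to and return from an interrupt procedure can be modeled as follows.
This is a simplified sketch with hypothetical names: it covers only the TF, IF, and IOPL fields
and assumes that the CPL and IOPL tested by IRET are those in effect when IRET executes.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_TF   (1u << 8)
#define EFLAGS_IF   (1u << 9)
#define EFLAGS_IOPL (3u << 12)

/* Flags in effect inside the handler. The interrupted program's
 * EFLAGS value has already been saved on the stack. */
static uint32_t flags_on_entry(uint32_t eflags, bool interrupt_gate)
{
    eflags &= ~EFLAGS_TF;            /* both gate types clear TF */
    if (interrupt_gate)
        eflags &= ~EFLAGS_IF;        /* only interrupt gates clear IF */
    return eflags;
}

/* Flags after IRET: the saved image is loaded, except that IOPL is
 * restored only at CPL 0 and IF is changed only if CPL <= IOPL. */
static uint32_t flags_after_iret(uint32_t current, uint32_t saved, unsigned cpl)
{
    unsigned iopl = (current & EFLAGS_IOPL) >> 12;
    uint32_t keep = 0;
    if (cpl != 0)
        keep |= EFLAGS_IOPL;         /* IOPL left unchanged */
    if (cpl > iopl)
        keep |= EFLAGS_IF;           /* IF left unchanged */
    return (saved & ~keep) | (current & keep);
}
```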



14.7.1.4. PROTECTION IN INTERRUPT PROCEDURES 

The privilege rule which governs interrupt procedures is similar to that for procedure calls: the 
processor does not permit an interrupt to transfer execution to a procedure in a less privileged 
segment (numerically greater privilege level). An attempt to violate this rule results in a 
general-protection exception. 

Because interrupts generally do not occur at predictable times, this privilege rule effectively 
imposes restrictions on the privilege levels at which exception and interrupt handling 
procedures can run. Either of the following techniques can be used to keep the privilege rule 
from being violated. 

• The exception or interrupt handler can be placed in a conforming code segment. This 
technique can be used by handlers for certain exceptions (divide error, for example). 
These handlers must use only the data available on the stack. If the handler needs data 
from a data segment, the data segment would have to have privilege level 3, which would 
make it unprotected. 

• The handler can be placed in a code segment with privilege level 0. This handler would 
always run, no matter what CPL the program has. 

14.7.2. Interrupt Tasks 

A task gate in the IDT indirectly references a task, as Figure 14-5 illustrates. The segment 
selector in the task gate addresses a TSS descriptor in the GDT. 



[Figure: the interrupt vector indexes a task gate in the IDT. The TSS selector in the task gate 
selects a TSS descriptor in the GDT, which supplies the base address of the TSS.] 



Figure 14-5. Interrupt Task Switch 



When an exception or interrupt calls a task gate in the IDT, a task switch results. Handling an 
interrupt with a separate task offers two advantages: 

• The entire context is saved automatically. 

• The interrupt handler can be isolated from other tasks by giving it a separate address 
space. This is done by giving it a separate LDT. 

A task switch caused by an interrupt operates in the same manner as the other task switches 
described in Chapter 13. The interrupt task returns to the interrupted task by executing an 
IRET instruction. 

Some exceptions return an error code. If the task switch is caused by one of these, the 
processor pushes the code onto the stack corresponding to the privilege level of the interrupt 
handler. 

When interrupt tasks are used in an operating system, there are actually two mechanisms which 
can dispatch tasks: the software scheduler (part of the operating system) and the hardware 
scheduler (part of the processor's interrupt mechanism). The software scheduler needs to 
accommodate interrupt tasks which may be dispatched when interrupts are enabled. 






14.8. ERROR CODE 

With exceptions related to a specific segment, the processor pushes an error code onto the 
stack of the exception handler (whether it is a procedure or task). The error code has the format 
shown in Figure 14-6. The error code resembles a segment selector; however instead of an 
RPL field, the error code contains two one-bit fields: 

1. The processor sets the EXT bit if an event external to the program caused the exception. 

2. The processor sets the IDT bit if the index portion of the error code refers to a gate 
descriptor in the IDT. 



    Bits 31..16   RESERVED 
    Bits 15..3    SELECTOR INDEX 
    Bit  2        TI 
    Bit  1        IDT 
    Bit  0        EXT 



Figure 14-6. Error Code 

If the IDT bit is not set, the TI bit indicates whether the error code refers to the GDT (TI bit 
clear) or to the LDT (TI bit set). The remaining 13 bits are the upper bits of the selector for the 
segment. In some cases the error code is null (i.e., all bits in the lower word are clear). 

The error code is pushed on the stack as a doubleword. This is done to keep the stack aligned 
on addresses which are multiples of four. The upper half of the doubleword is reserved. 
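
A handler might split the error code into its fields as in the following C sketch. The structure
and function names are hypothetical; the bit positions are those of Figure 14-6.

```c
#include <stdbool.h>
#include <stdint.h>

struct error_code_fields {
    bool     ext;     /* bit 0: event external to the program */
    bool     idt;     /* bit 1: index refers to a gate descriptor in the IDT */
    bool     ti;      /* bit 2: LDT (set) or GDT (clear); meaningful when idt is clear */
    uint16_t index;   /* bits 15..3: selector index */
};

static struct error_code_fields decode_error_code(uint32_t code)
{
    struct error_code_fields f;
    f.ext   = (code & 1u) != 0;
    f.idt   = (code & 2u) != 0;
    f.ti    = (code & 4u) != 0;
    f.index = (uint16_t)((code >> 3) & 0x1FFFu);  /* upper half is reserved */
    return f;
}
```

For example, the error code 0014H decodes to index 2 in the LDT, with the EXT and IDT bits clear.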



14.9. EXCEPTION CONDITIONS 

The following sections describe conditions which generate exceptions. Each description 
classifies the exception as a fault, trap, or abort. This classification provides information 
needed by system programmers for restarting the procedure in which the exception occurred: 

• Faults — The saved contents of the CS and EIP registers point to the instruction which 
generated the fault. 

• Traps — The saved contents of the CS and EIP registers stored when the trap occurs point 
to the instruction to be executed after the instruction which generated the trap. If a trap is 
detected during an instruction which transfers execution, the saved contents of the CS and 
EIP registers reflect the transfer. For example, if a trap is detected in a JMP instruction, the 
saved contents of the CS and EIP registers point to the destination of the JMP instruction, 
not to the instruction at the next address above the JMP instruction. 






• Aborts — An abort is an exception which permits neither precise location of the instruction 
causing the exception nor restart of the program which caused the exception. Aborts are 
used to report severe errors, such as hardware errors and inconsistent or illegal values in 
system tables. 



14.9.1. Interrupt 0 — Divide Error 

The divide-error fault occurs during a DIV or an IDIV instruction when the divisor is zero. 

14.9.2. Interrupt 1 — Debug Exceptions 

The processor generates a debug exception for a number of conditions; whether the exception 
is a fault or a trap depends on the condition, as shown below: 

    Condition                          Class 

    Instruction address breakpoint     fault 
    Data address breakpoint            trap 
    General detect                     fault 
    Single-step                        trap 
    Task-switch breakpoint             trap 

The processor does not push an error code for this exception. An exception handler can 
examine the debug registers to determine which condition caused the exception. See 
Chapter 17 for more detailed information about debugging and the debug registers. 



14.9.3. Interrupt 3 — Breakpoint 

The INT 3 instruction generates a breakpoint trap. The INT 3 instruction is one byte long, 
which makes it easy to replace an opcode in a code segment in RAM with the breakpoint 
opcode. The operating system or a debugging tool can use a data segment mapped to the same 
physical address space as the code segment to place an INT 3 instruction in places where it is 
desired to call the debugger. Debuggers use breakpoints as a way to suspend program 
execution in order to examine registers, variables, etc. 

The saved contents of the CS and EIP registers point to the byte following the breakpoint. If a 
debugger allows the suspended program to resume execution, it replaces the INT 3 instruction 
with the original opcode at the location of the breakpoint, and it decrements the saved contents 
of the EIP register before returning. See Chapter 17 for more information on debugging. 
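
The patch-and-resume sequence can be sketched in C over an ordinary byte array standing in for
the code segment. The structure and function names are hypothetical; 0CCH is the one-byte INT 3
opcode.

```c
#include <stdint.h>

#define INT3_OPCODE 0xCCu   /* one-byte breakpoint instruction */

struct breakpoint {
    uint32_t offset;       /* location of the breakpoint in the code */
    uint8_t  saved_byte;   /* original opcode byte at that location */
};

/* Replace the opcode byte at `offset` with INT 3 and remember the original. */
static struct breakpoint set_breakpoint(uint8_t *code, uint32_t offset)
{
    struct breakpoint bp = { offset, code[offset] };
    code[offset] = INT3_OPCODE;
    return bp;
}

/* The saved EIP points to the byte following the breakpoint. To resume,
 * restore the original opcode and back the saved EIP up by one byte. */
static uint32_t resume_from_breakpoint(uint8_t *code, struct breakpoint bp,
                                       uint32_t saved_eip)
{
    code[bp.offset] = bp.saved_byte;
    return saved_eip - 1u;
}
```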



14.9.4. Interrupt 4 — Overflow 

The overflow trap occurs when the processor executes an INTO instruction with the OF flag 
set. Because signed and unsigned arithmetic both use some of the same instructions, the 
processor cannot determine when overflow actually occurs. Instead, it sets the OF flag when 
the results, if interpreted as signed numbers, would be out of range. When doing arithmetic on 
signed operands, the OF flag can be tested directly or the INTO instruction can be used. 
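
The condition under which an addition sets the OF flag can be written out in C. This sketch
covers only a 32-bit ADD and uses hypothetical names: OF is set when both operands have the same
sign and the sign of the result differs from it.

```c
#include <stdbool.h>
#include <stdint.h>

/* OF after a 32-bit ADD of a and b. */
static bool add_sets_of(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;   /* wraps modulo 2^32, like the processor */
    /* same operand signs (~(a ^ b)) and result sign differs (a ^ r) */
    return ((~(a ^ b)) & (a ^ r) & 0x80000000u) != 0;
}

/* INTO generates the overflow trap only when OF is set. */
static bool into_traps(bool of)
{
    return of;
}
```

Adding 1 to 7FFFFFFFH sets OF, because the signed result is out of range. Adding 1 to FFFFFFFFH
does not: as signed arithmetic it is -1 + 1 = 0, even though the unsigned sum wraps.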






14.9.5. Interrupt 5 — Bounds Check 

The bounds-check fault is generated when the processor, while executing a BOUND 
instruction, finds that the operand exceeds the specified limits. A program can use the BOUND 
instruction to check a signed array index against signed limits defined in a block of memory. 
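
The test made by BOUND can be expressed in C as follows. The names are hypothetical; the sketch
assumes 32-bit operands, with the lower limit stored first in the memory block, followed by the
upper limit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Block of memory holding the signed limits. */
struct bounds {
    int32_t lower;
    int32_t upper;
};

/* True if the index lies outside the inclusive range [lower, upper],
 * the condition under which the bounds-check fault is generated. */
static bool bound_faults(int32_t index, const struct bounds *b)
{
    return index < b->lower || index > b->upper;
}
```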



14.9.6. Interrupt 6 — Invalid Opcode 

The invalid-opcode fault is generated when an invalid opcode is detected by the execution unit. 
(The exception is not detected until an attempt is made to execute the invalid opcode; i.e., 
prefetching an invalid opcode does not cause this exception.) No error code is pushed on the 
stack. The exception can be handled within the same task. 

This exception also occurs when the type of operand is invalid for the given opcode. Examples 
include an intersegment JMP instruction using a register operand, or an LES instruction with a 
register source operand. 

A third condition which generates this exception is the use of the LOCK prefix with an 
instruction which may not be locked. Only certain instructions may be used with bus locking, 
and only forms of these instructions which write to a destination in memory may be used. All 
other uses of the LOCK prefix generate an invalid-opcode exception. 

Following is a list of undefined opcodes that are reserved by Intel. These opcodes, even though 
undefined, do not generate interrupt 6. 

• D6 

• F1 

14.9.7. Interrupt 7 — Device Not Available 

The device-not-available fault is generated by either of two conditions: 

• The processor executes an ESC instruction, and the EM bit of the CR0 register is set. 

• The processor executes a WAIT instruction (with MP=1) or ESC instruction, and the TS 
bit of the CR0 register is set. 

Interrupt 7 thus occurs when the programmer wants ESC instructions to be handled by 
software (EM set), or when a WAIT or ESC instruction is encountered and the context of the 
floating-point unit is different from that of the current task. 

On the Intel 286 and Intel386 processors, the MP bit in the CR0 register is used with the TS bit 
to determine if WAIT instructions should generate exceptions. For programs running on the 
Pentium, Intel486 DX, and Intel487 SX processors, the MP bit should always be set. For 
programs running on the Intel486 SX, MP should be clear. 
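
The two conditions can be combined into a single predicate, sketched here in C. The names are
hypothetical; the bit positions of MP, EM, and TS are those of the CR0 register.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_MP (1u << 1)   /* monitor coprocessor */
#define CR0_EM (1u << 2)   /* emulation */
#define CR0_TS (1u << 3)   /* task switched */

enum fp_insn { FP_ESC, FP_WAIT };

/* True if executing the instruction generates interrupt 7. */
static bool device_not_available(enum fp_insn insn, uint32_t cr0)
{
    bool ts = (cr0 & CR0_TS) != 0;
    if (insn == FP_ESC)
        return (cr0 & CR0_EM) != 0 || ts;   /* ESC: EM set or TS set */
    return (cr0 & CR0_MP) != 0 && ts;       /* WAIT: MP and TS both set */
}
```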






14.9.8. Interrupt 8 — Double Fault 

Normally, when the processor detects an exception while trying to call the handler for a prior 
exception, the two exceptions can be handled serially. If, however, the processor cannot handle 
them serially, it signals the double-fault exception instead. To determine when two faults are to 
be signalled as a double fault, the processor divides the exceptions into three classes: benign 
exceptions, contributory exceptions, and page faults. Table 14-3 shows this classification. 
Then, comparing the classes of the first and second exception, the processor signals a double- 
fault in the cases indicated by Table 14-4. 



Table 14-3. Interrupt and Exception Classes 

Class                              Vector Number   Description 

Benign Exceptions and Interrupts   1               Debug Exceptions 
                                   2               NMI Interrupt 
                                   3               Breakpoint 
                                   4               Overflow 
                                   5               Bounds Check 
                                   6               Invalid Opcode 
                                   7               Device Not Available 
                                   16              Floating-Point Error 

Contributory Exceptions            0               Divide Error 
                                   10              Invalid TSS 
                                   11              Segment Not Present 
                                   12              Stack Fault 
                                   13              General Protection 

Page Faults                        14              Page Fault 



Table 14-4. Double Fault Conditions 

                     Second Exception 
First Exception      Benign    Contributory    Page Fault 

Benign               OK        OK              OK 
Contributory         OK        Double Fault    OK 
Page Fault           OK        Double Fault    Double Fault 



An initial segment or page fault encountered while prefetching instructions is outside the 
domain of Table 14-4. Any further faults generated while the processor is attempting to 
transfer control to the appropriate fault handler could still lead to a double-fault sequence. 
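
Tables 14-3 and 14-4 can be combined into a small decision function, sketched here in C. The
names are hypothetical, and vectors not listed in Table 14-3 are treated as benign purely as an
assumption of this example.

```c
#include <stdbool.h>

enum exc_class { BENIGN, CONTRIBUTORY, PAGE_FAULT };

/* Classification of Table 14-3, by vector number. */
static enum exc_class classify(unsigned vector)
{
    switch (vector) {
    case 0: case 10: case 11: case 12: case 13:
        return CONTRIBUTORY;
    case 14:
        return PAGE_FAULT;
    default:
        return BENIGN;   /* vectors 1..7 and 16; others assumed benign here */
    }
}

/* Table 14-4: true if the second exception, detected while calling the
 * handler for the first, is signalled as a double fault. */
static bool is_double_fault(unsigned first, unsigned second)
{
    enum exc_class a = classify(first);
    enum exc_class b = classify(second);
    if (a == BENIGN || b == BENIGN)
        return false;
    if (a == CONTRIBUTORY)
        return b == CONTRIBUTORY;
    return true;   /* first is a page fault; second is contributory or a page fault */
}
```

A general-protection exception followed by a page fault is thus handled serially, while a page
fault followed by a general-protection exception is a double fault.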

The processor always pushes an error code onto the stack of the double-fault handler; however, 
the error code is always 0. The faulting instruction may not be restarted. If any other exception 
occurs while attempting to call the double-fault handler, the processor enters shutdown mode. 
This mode is similar to the state following execution of a HLT instruction. No instructions are 
executed until an NMI interrupt or a RESET signal is received. If the shutdown occurs while 
the processor is executing an NMI interrupt handler, then only a RESET can restart the 
processor. The processor generates a special bus cycle to indicate it has entered shutdown 
mode. 



14.9.9. Interrupt 9 — (Intel reserved. Do not use.) 

Interrupt 9, the coprocessor-segment overrun abort, is generated in Intel386 CPU-based 
systems with an Intel387 math coprocessor when the Intel386 CPU detects a page or segment 
violation while transferring the middle portion of an Intel387 math coprocessor operand. This 
interrupt is generated neither by the Pentium processor nor by the Intel486 processor; interrupt 
13 occurs instead. 



14.9.10. Interrupt 10 — Invalid TSS 

An invalid-TSS fault is generated if a task switch to a segment with an invalid TSS is 
attempted. A TSS is invalid in the cases shown in Table 14-5. An error code is pushed onto the 
stack of the exception handler to help identify the cause of the fault. The EXT bit indicates 
whether the exception was caused by a condition outside the control of the program (e.g., if an 
external interrupt using a task gate attempted a task switch to an invalid TSS). 



Table 14-5. Invalid TSS Conditions 

Error Code Index   Description 

TSS segment        TSS segment limit less than 67H 
LDT segment        Invalid LDT or LDT not present 
Stack segment      Stack segment selector exceeds descriptor table limit 
Stack segment      Stack segment is not writable 
Stack segment      Stack segment DPL not compatible with CPL 
Stack segment      Stack segment selector RPL not compatible with CPL 
Code segment       Code segment selector exceeds descriptor table limit 
Code segment       Code segment is not executable 
Code segment       Non-conforming code segment DPL not equal to CPL 
Code segment       Conforming code segment DPL greater than CPL 
Data segment       Data segment selector exceeds descriptor table limit 
Data segment       Data segment not readable 



This fault can occur either in the context of the original task or in the context of the new task. 
Until the processor has completely verified the presence of the new TSS, the exception occurs 
in the context of the original task. Once the existence of the new TSS is verified, the task 
switch is considered complete; i.e., the TR register is loaded with a selector for the new TSS 
and, if the switch is due to a CALL or interrupt, the Link field of the new TSS references the 
old TSS. Any errors discovered by the processor after this point are handled in the context of 
the new task. 






To ensure a TSS is available to process the exception, the handler for an invalid-TSS exception 
must be a task called using a task gate. 

14.9.11. Interrupt 11 — Segment Not Present 

The segment-not-present fault is generated when the processor detects that the present bit of a 
descriptor is clear. The processor can generate this fault in any of these cases: 

• While attempting to load the CS, DS, ES, FS, or GS registers; loading the SS register, 
however, causes a stack fault. 

• While attempting to load the LDT register using an LLDT instruction; loading the LDT 
register during a task switch operation, however, causes an invalid-TSS exception. 

• While attempting to use a gate descriptor which is marked segment-not-present. 

This fault is restartable. If the exception handler loads the segment and returns, the interrupted 
program resumes execution. 

If a segment-not-present exception occurs during a task switch, not all the steps of the task 
switch are complete. During a task switch, the processor first loads all the segment registers, 
then checks their contents for validity. If a segment-not-present exception is discovered, the 
remaining segment registers have not been checked and therefore may not be usable for 
referencing memory. The segment-not-present handler should not rely on being able to use the 
segment selectors found in the CS, SS, DS, ES, FS, and GS registers without causing another 
exception. The exception handler should check all segment registers before trying to resume 
the new task; otherwise, general protection faults may result later under conditions which make 
diagnosis more difficult. There are three ways to handle this case: 

1. Handle the segment-not-present fault with a task. The task switch back to the interrupted 
task causes the processor to check the registers as it loads them from the TSS. 

2. Use the PUSH and POP instructions on all segment registers. Each POP instruction causes 
the processor to check the new contents of the segment register. 

3. Check the saved contents of each segment register in the TSS, simulating the test which 
the processor makes when it loads a segment register. 

This exception pushes an error code onto the stack. The EXT bit of the error code is set if an 
event external to the program caused an interrupt which subsequently referenced a not-present 
segment. The IDT bit is set if the error code refers to an IDT entry (e.g., an INT instruction 
referencing a not-present gate). 

An operating system typically uses the segment-not-present exception to implement virtual 
memory at the segment level. A not-present indication in a gate descriptor, however, usually 
does not indicate that a segment is not present (because gates do not necessarily correspond to 
segments). Not-present gates may be used by an operating system to trigger exceptions of 
special significance to the operating system. 






14.9.12. Interrupt 12 — Stack Exception 

A stack fault is generated under two conditions: 

• As a result of a limit violation in any operation which refers to the SS register. This 
includes stack-oriented instructions such as POP, PUSH, ENTER, and LEAVE, as well as 
other memory references which implicitly or explicitly use the SS register (for example, 
MOV AX, [BP+6] or MOV AX, SS:[EAX+6]). The ENTER instruction generates this 
exception when there is too little space for allocating local variables. 

• When attempting to load the SS register with a descriptor which is marked segment-not- 
present but is otherwise valid. This can occur in a task switch, a CALL instruction to a 
different privilege level, a return to a different privilege level, an LSS instruction, or a 
MOV or POP instruction to the SS register. 

When the processor detects a stack exception, it pushes an error code onto the stack of the 
exception handler. If the exception is due to a not-present stack segment or to overflow of the 
new stack during an interlevel CALL, the error code contains a selector to the segment which 
caused the exception (the exception handler can test the present bit in the descriptor to 
determine which exception occurred); otherwise, the error code is 0. 

An instruction generating this fault is restartable in all cases. The return address pushed onto 
the exception handler's stack points to the instruction which needs to be restarted. This 
instruction usually is the one which caused the exception; however, in the case of a stack 
exception from loading a not-present stack-segment descriptor during a task switch, the 
indicated instruction is the first instruction of the new task. 

When a stack exception occurs during a task switch, the segment registers may not be usable 
for addressing memory. During a task switch, the selector values are loaded before the 
descriptors are checked. If a stack exception is generated, the remaining segment registers have 
not been checked and may cause exceptions if they are used. The stack fault handler should not 
expect to use the segment selectors found in the CS, SS, DS, ES, FS, and GS registers without 
causing another exception. The exception handler should check all segment registers before 
trying to resume the new task; otherwise, general protection faults may result later under 
conditions where diagnosis is more difficult. 

14.9.13. Interrupt 13 — General Protection 

All protection violations which do not cause another exception cause a general-protection 
exception. This includes (but is not limited to): 

• Exceeding the segment limit when using the CS, DS, ES, FS, or GS segments. 

• Exceeding the segment limit when referencing a descriptor table. 

• Transferring execution to a segment which is not executable. 

• Writing to a read-only data segment or a code segment. 

• Reading from an execute-only code segment. 

• Loading the SS register with a selector for a read-only segment (unless the selector comes 
from a TSS during a task switch, in which case an invalid-TSS exception occurs). 

• Loading the SS, DS, ES, FS, or GS register with a selector for a system segment. 

• Loading the DS, ES, FS, or GS register with a selector for an execute-only code segment. 

• Loading the SS register with the selector of an executable segment. 

• Accessing memory using the DS, ES, FS, or GS register when it contains a null selector. 

• Switching to a busy task. 

• Violating privilege rules. 

• Exceeding the instruction length limit of 15 bytes (this only can occur when redundant 
prefixes are placed before an instruction). 

• Loading the CR0 register with a set PG bit (paging enabled) and a clear PE bit (protection 
disabled). 

• Interrupt or exception through an interrupt or trap gate from virtual-8086 mode to a 
handler at a privilege level other than 0. 

• Attempting to write a one into a reserved bit of CR4. 

The general-protection exception is a fault. In response to a general-protection exception, the 
processor pushes an error code onto the exception handler's stack. If loading a descriptor 
causes the exception, the error code contains a selector to the descriptor; otherwise, the error 
code is null. The source of the selector in an error code may be any of the following: 

• An operand of the instruction. 

• A selector from a gate which is the operand of the instruction. 

• A selector from a TSS involved in a task switch. 

14.9.14. Interrupt 14 — Page Fault 

A page fault occurs when paging is enabled (the PG bit in the CR0 register is set) and the 
processor detects one of the following conditions while translating a linear address to a 
physical address: 

• The page-directory or page-table entry needed for the address translation has a clear 
Present bit, which indicates that a page table or the page containing the operand is not 
present in physical memory. 

• The procedure does not have sufficient privilege to access the indicated page. 

If a page fault is caused by a page-level protection violation, the access bits in the page 
directory are set when the fault occurs. The access bit in the page table is set only if there are 
no page-level protection violations. 

The processor provides the page fault handler two items of information which aid in 
diagnosing the exception and recovering from it: 

• An error code on the stack. The error code for a page fault has a format different from that 
for other exceptions (see Figure 14-7). The error code tells the exception handler three 
things: 

a. Whether the exception was due to a not-present page, to an access rights violation, or 
to use of a reserved bit. 

b. Whether the processor was executing at user or supervisor level at the time of the 
exception. 

c. Whether the memory access which caused the exception was a read or write. 

• The contents of the CR2 register. The processor loads the CR2 register with the 32-bit 
linear address which generated the exception. The exception handler can use this address 
to locate the corresponding page directory and page table entries. If another page fault 
can occur during execution of the page fault handler, the handler should push the contents 
of the CR2 register onto the stack first, because a second page fault overwrites CR2. 
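
As an illustration of how a handler might use the address in CR2, the sketch below splits a 32-bit linear address into a page-directory index, a page-table index, and a byte offset. It assumes 4-KByte pages and the two-level 10/10/12-bit division of the linear address; the function name is hypothetical.

```python
def split_linear_address(cr2):
    """Split a 32-bit linear address, assuming 4-KByte pages.

    Returns (page-directory index, page-table index, offset within page).
    """
    directory_index = (cr2 >> 22) & 0x3FF  # bits 31..22
    table_index = (cr2 >> 12) & 0x3FF      # bits 21..12
    offset = cr2 & 0xFFF                   # bits 11..0
    return directory_index, table_index, offset
```

Because each entry is four bytes long, the byte offset of an entry within its table is the index multiplied by four.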



 31                                            4    3     2     1    0
+-----------------------------------------------+-----+-----+-----+---+
|                   RESERVED                    | RSV | U/S | W/R | P |
+-----------------------------------------------+-----+-----+-----+---+

P      0   THE FAULT WAS CAUSED BY A NOT-PRESENT PAGE.
       1   THE FAULT WAS CAUSED BY A PAGE-LEVEL PROTECTION VIOLATION.

W/R    0   THE ACCESS CAUSING THE FAULT WAS A READ.
       1   THE ACCESS CAUSING THE FAULT WAS A WRITE.

U/S    0   THE ACCESS CAUSING THE FAULT ORIGINATED WHEN THE
           PROCESSOR WAS EXECUTING IN SUPERVISOR MODE.
       1   THE ACCESS CAUSING THE FAULT ORIGINATED WHEN THE
           PROCESSOR WAS EXECUTING IN USER MODE.

RSV    1   THE PAGE FAULT OCCURRED BECAUSE A 1 WAS DETECTED
           IN ONE OF THE RESERVED BIT POSITIONS OF A PAGE TABLE
           ENTRY OR PAGE DIRECTORY ENTRY THAT WAS MARKED PRESENT.
       0   OTHERWISE.

Figure 14-7. Page Fault Error Code 
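
The fields of Figure 14-7 can be decoded as in the following sketch, which takes P as bit 0, W/R as bit 1, U/S as bit 2, and RSV as bit 3. The function name and the strings it returns are illustrative only.

```python
def decode_page_fault_error_code(code):
    """Decode the low four bits of a page-fault error code (Figure 14-7)."""
    return {
        "cause": "protection violation" if code & 0x1 else "not-present page",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "supervisor",
        "reserved_bit_set": bool(code & 0x8),
    }
```

A demand-paging handler would normally look at the cause field first: a not-present page may simply need to be brought into memory, whereas a protection violation usually indicates a program error or a copy-on-write style policy implemented by the operating system.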

14.9.14.1. PAGE FAULT DURING TASK SWITCH 

These operations during a task switch cause access to memory: 

1. Write the state of the original task in the TSS of that task. 

2. Read the GDT to locate the TSS descriptor of the new task. 

3. Read the TSS of the new task to check the types of segment descriptors from the TSS. 

4. May read the LDT of the new task in order to verify the segment registers stored in the 
new TSS. 

A page fault can result from any of these operations. In the last two cases the 
exception occurs in the context of the new task. The instruction pointer refers to the next 
instruction of the new task, not to the instruction which caused the task switch (or the last 
instruction to be executed, in the case of an interrupt). If the design of the operating system 
permits page faults to occur during task switches, the page-fault handler should be called 
through a task gate. 

14.9.14.2. PAGE FAULT WITH INCONSISTENT STACK POINTER 

Special care should be taken to ensure that a page fault does not cause the processor to use an 
invalid stack pointer (SS:ESP). Software written for Intel 16-bit processors often uses a pair of 
instructions to change to a new stack; for example: 

MOV SS, AX 

MOV SP, StackTop 

With the 32-bit processors, because the second instruction accesses memory, it is possible to 
get a page fault after the selector in the SS segment register has been changed but before the 
contents of the SP register have received the corresponding change. At this point, the two parts 
of the stack pointer SS:SP (or, for 32-bit programs, SS:ESP) are inconsistent. The new stack 
segment is being used with the old stack pointer. 

The processor does not use the inconsistent stack pointer if the handling of the page fault 
causes a stack switch to a well defined stack (i.e., the handler is a task or a more privileged 
procedure). However, if the page fault occurs at the same privilege level and in the same task 
as the page fault handler, the processor will attempt to use the stack indicated by the 
inconsistent stack pointer. 

In systems which use paging and handle page faults within the faulting task (with trap or 
interrupt gates), software executing at the same privilege level as the page fault handler should 
initialize a new stack by using the LSS instruction rather than the instruction pair shown above. 
When the page fault handler is running at privilege level 0 (the normal case), the problem is 
limited to programs which run at privilege level 0, typically the kernel of the operating system. 



14.9.15. Interrupt 16 — Floating-Point Error 

A floating-point-error fault signals an error generated by a floating-point arithmetic 
instruction. Interrupt 16 can occur only if the NE bit in the CR0 register is set. Numeric 
processing exceptions were introduced in Chapter 7. 

If NE = 1, an unmasked floating-point exception results in interrupt 16, immediately before the 
execution of the next non-control floating-point or WAIT instruction. Interrupt 16 is an 
operating-system call that invokes the exception handler. Chapter 14 contains a general 
discussion of exceptions and interrupts. 

If NE = 0 (and the IGNNE# input is inactive), an unmasked floating-point exception causes the 
processor to freeze immediately before executing the next non-control floating-point or WAIT 
instruction. The frozen processor waits for an external interrupt, which must be supplied by 
external hardware in response to the FERR# output of the Intel486 or Pentium processor (the 
FERR# output is similar to the ERROR# pin of the Intel387 math coprocessor). In this case, the 
external interrupt invokes the exception-handling routine. Regardless of the value of NE, an 
unmasked numerical exception causes the FERR# output of the Intel486 and Pentium 
processors to be activated. If NE = 0 but the IGNNE# input is active, the processor disregards 
the exception and continues. Error reporting via external interrupt is supported for DOS 
compatibility. Chapter 23 contains further discussion of compatibility issues. 
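
The three cases described above can be summarized in a small decision function. This is only a restatement of the text as code; the function name and the strings it returns are illustrative.

```python
def unmasked_fp_exception_response(ne, ignne_active):
    """Return (action, ferr_asserted) for an unmasked floating-point exception.

    ne:           value of the NE bit in CR0
    ignne_active: True if the IGNNE# input is active
    """
    ferr_asserted = True  # FERR# is activated regardless of NE
    if ne:
        action = "interrupt 16"
    elif ignne_active:
        action = "exception disregarded"
    else:
        action = "freeze until external interrupt"
    return action, ferr_asserted
```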






When handling numeric errors, the processor has two responsibilities: 

• It must not disturb the numeric context when an error is detected. 

• It must clear the error and attempt recovery from the error. 

Although the manner in which programmers may treat these responsibilities varies from one 
implementation to the next, most exception handlers will include these basic steps: 

• Store the FPU environment (control, status, and tag words, operand and instruction 
pointers) as it existed at the time of the exception. 

• Clear the exception bits in the status word. 

• Enable interrupts if disabled due to an INTR, NMI, or SMI exception. 

• Identify the exception by examining the status and control words in the saved 
environment. 

• Take some system-dependent action to rectify the exception. 

• Return to the interrupted program and resume normal execution. 

14.9.15.1. NUMERICS EXCEPTION HANDLING 

Recovery routines for numeric exceptions can take a variety of forms. They can change the 
arithmetic and programming rules of the FPU. These changes may redefine the default fix-up 
for an error, change the appearance of the FPU to the programmer, or change how arithmetic is 
defined on the FPU. 

A change to an exception response might be to perform denormal arithmetic on denormals 
loaded from memory. A change in appearance might be extending the register stack into 
memory to provide an "infinite" number of numeric registers. The arithmetic of the FPU can 
be changed to automatically extend the precision and range of variables when exceeded. All 
these functions can be implemented on the processor via numeric exceptions and associated 
recovery routines in a manner transparent to the application programmer. 

Some other possible application-dependent actions might include: 

• Incrementing an exception counter for later display or printing 

• Printing or displaying diagnostic information (e.g., the FPU environment and registers) 

• Aborting further execution 

• Storing a diagnostic value (a NaN) in the result and continuing with the computation 

Notice that an exception may or may not constitute an error, depending on the application. 
Once the exception handler corrects the condition causing the exception, the floating-point 
instruction that caused the exception can be restarted, if appropriate. This cannot be 
accomplished using the IRET instruction, however, because the trap occurs at the ESC or 
WAIT instruction following the offending ESC instruction. The exception handler must obtain 
(using FSAVE or FSTENV) the address of the offending instruction in the task that initiated it, 
make a copy of it, execute the copy in the context of the offending task, and then return via 
IRET to the current instruction stream. 

In order to correct the condition causing the numeric exception, exception handlers must 
recognize the precise state of the FPU at the time the exception handler was invoked, and be 
able to reconstruct the state of the FPU when the exception initially occurred. To reconstruct 
the state of the FPU, programmers must understand that different classes of exceptions are 
recognized at different times (before or after execution of a numeric instruction). 

Invalid operation, zero divide, and denormal operand exceptions are detected before an 
operation begins, whereas overflow, underflow, and precision exceptions are not raised until a 
true result has been computed. When a before exception is detected, the FPU register stack and 
memory have not yet been updated, and appear as if the offending instruction has not been 
executed. 

When an after exception is detected, the register stack and memory appear as if the instruction 
has run to completion; i.e., they may be updated. (However, in a store or store-and-pop 
operation, unmasked over/underflow is handled like a before exception; memory is not 
updated and the stack is not popped.) The following programming examples include an outline 
of several exception handlers to process numeric exceptions. 

14.9.15.2. SIMULTANEOUS EXCEPTION RESPONSE 

In cases where multiple exceptions arise simultaneously, the FPU signals one exception 
according to the precedence list below. This means, for example, that an SNaN divided by zero 
results in an invalid operation, not in a zero-divide exception; the masked result is the QNaN 
real indefinite, not ∞. A denormal or inexact (precision) exception, however, can accompany a 
numeric underflow or overflow exception. 

The precedence among numeric exceptions is as follows: 

1. Invalid operation exception, subdivided as follows: 

— Stack underflow. 

— Stack overflow. 

— Operand of unsupported format. 

— SNaN operand. 

2. QNaN operand. Though this is not an exception, if one operand is a QNaN, dealing with it 
has precedence over lower-priority exceptions. For example, a QNaN divided by zero 
results in a QNaN, not a zero-divide exception. 

3. Any other invalid-operation exception not mentioned above or zero divide. 

4. Denormal operand. If masked, then instruction execution continues, and a lower-priority 
exception can occur as well. 

5. Numeric overflow and underflow. Inexact result (precision) can be flagged as well. 

6. Inexact result (precision). 
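
The precedence list above can be expressed as an ordered lookup. The sketch below returns only the highest-priority condition; it does not model the lower-priority exceptions that may be flagged alongside it. The condition names are illustrative labels, and the relative order given to entries that share a level in the list (for example, overflow and underflow) is an arbitrary choice of this sketch.

```python
# Highest priority first, following the precedence list above.
PRECEDENCE = [
    "stack underflow",
    "stack overflow",
    "unsupported format",
    "snan operand",
    "qnan operand",             # not an exception, but handled ahead of the rest
    "other invalid operation",
    "zero divide",
    "denormal operand",
    "overflow",
    "underflow",
    "inexact result",
]


def first_signaled(conditions):
    """Return the highest-priority condition present, or None if there is none."""
    for name in PRECEDENCE:
        if name in conditions:
            return name
    return None
```

For example, an SNaN divided by zero presents both "snan operand" and "zero divide", and the function selects the former, as the text describes.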

14.9.16. Interrupt 17 — Alignment Check 

An alignment-check fault can be generated for access to unaligned operands, for example, a 
word stored at an odd byte address or a doubleword stored at an address which is not an 
integer multiple of four. Table 14-6 lists the alignment requirements by data type. To enable 
alignment checking, the following conditions must be true: 

• The AM bit in the CR0 register is set. 

• The AC flag is set. 

• The CPL is 3 (user mode). 



Table 14-6. Alignment Requirements by Data Type 

Data Type                     Address Must Be Divisible By
----------------------------  ---------------------------------
WORD                          2
DWORD                         4
Short REAL                    4
Long REAL                     8
TEMPREAL                      8
Selector                      2
48-bit Segmented Pointer      4
32-bit Flat Pointer           4
32-bit Segmented Pointer      2
48-bit "Pseudo-Descriptor"    4
FSTENV/FLDENV save area       4 or 2, depending on operand size
FSAVE/FRSTOR save area        4 or 2, depending on operand size
Bit String                    4
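
The enabling conditions and the divisors of Table 14-6 combine into a simple test, sketched below. The dictionary holds only the rows with a fixed divisor (the FSTENV/FLDENV and FSAVE/FRSTOR save areas depend on operand size and are left out); the function and parameter names are hypothetical.

```python
# Fixed-divisor rows of Table 14-6.
ALIGNMENT = {
    "WORD": 2,
    "DWORD": 4,
    "Short REAL": 4,
    "Long REAL": 8,
    "TEMPREAL": 8,
    "Selector": 2,
    "48-bit Segmented Pointer": 4,
    "32-bit Flat Pointer": 4,
    "32-bit Segmented Pointer": 2,
    '48-bit "Pseudo-Descriptor"': 4,
    "Bit String": 4,
}


def alignment_check_faults(data_type, address, am, ac, cpl):
    """True if the access would raise an alignment-check fault.

    Checking is enabled only when AM (CR0) and AC (EFLAGS) are both set
    and the current privilege level is 3.
    """
    enabled = bool(am) and bool(ac) and cpl == 3
    return enabled and address % ALIGNMENT[data_type] != 0
```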



Alignment checking is useful for programs which use the low two bits of pointers to identify 
the type of data structure they address. For example, a subroutine in a math library may accept 
pointers to numeric data structures. If the type of this structure is assigned a code of 10 
(binary) in the lowest two bits of pointers to this type, math subroutines can correct for the type 
code by adding a displacement of -10 (binary). If the subroutine should ever receive the wrong 
pointer type, an unaligned reference would be produced, which would generate an exception. 
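
The pointer-tagging arrangement just described can be worked through numerically. In the sketch below the type code 10 (binary) is the one from the example; the other names and the sample addresses are made up for illustration.

```python
TYPE_TAG = 0b10  # type code carried in the low two bits of the pointer


def operand_address(tagged_pointer):
    """Undo the type code by applying a displacement of -10 (binary)."""
    return tagged_pointer - TYPE_TAG


def is_dword_aligned(address):
    return address % 4 == 0


# A pointer with the expected tag yields an aligned operand address.
good_pointer = 0x2000 | 0b10
# A pointer tagged with a different type code (01 binary) does not.
wrong_pointer = 0x2000 | 0b01
```

With alignment checking enabled, a doubleword access through the address computed from wrong_pointer would be unaligned and would therefore generate the exception, which exposes the type error.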

Alignment-check faults are generated only in user mode (privilege level 3). Memory 
references which default to privilege level 0, such as segment descriptor loads, do not generate 
alignment-check faults, even when caused by a memory reference made in user mode. 

Storing a 48-bit pseudo-descriptor (the memory image of the contents of a descriptor table 
base register) in user mode can generate an alignment-check fault. Although user-mode 
programs do not normally store pseudo-descriptors, the fault can be avoided by aligning the 
pseudo-descriptor to an odd word address (i.e., an address which is 2 MOD 4). 

FSAVE and FRSTOR instructions generate unaligned references which can cause 
alignment-check faults. These instructions are rarely needed by application programs. 



14.9.17. Interrupt 18 — Machine Check 

Machine check is a model-specific exception, available only on the Pentium microprocessor. It 
may not be continued, or may not be continued with a compatible implementation, on future 
processor generations. Refer to the Pentium™ Processor Data Book for an explanation of its 
implementation and use. 



14.10. EXCEPTION SUMMARY 

Table 14-7 summarizes the exceptions recognized by the Pentium processor. 



Table 14-7. Exception Summary 

                                Return Address
                      Vector    Points to Faulting  Exception       Source of the
Description           Number    Instruction?        Type            Exception
--------------------  --------  ------------------  --------------  -------------------------------
Division by Zero      0         Yes                 FAULT           DIV and IDIV instructions
Debug Exceptions      1         (1)                 (1)             Any code or data reference
Breakpoint            3         No                  TRAP            INT 3 instruction
Overflow              4         No                  TRAP            INTO instruction
Bounds Check          5         Yes                 FAULT           BOUND instruction
Invalid Opcode        6         Yes                 FAULT           Reserved Opcodes
Device Not Available  7         Yes                 FAULT           ESC and WAIT instructions
Double Fault          8         Yes                 ABORT           Any instruction
Invalid TSS           10        Yes (2)             FAULT           JMP, CALL, IRET instructions,
                                                                    interrupts, and exceptions
Segment Not Present   11        Yes (2)             FAULT           Any instruction which
                                                                    changes segments
Stack Fault           12        Yes                 FAULT           Stack operations
General Protection    13        Yes                 FAULT/TRAP (3)  Any code or data reference
Page Fault            14        Yes                 FAULT           Any code or data reference
Floating-Point Error  16        Yes                 FAULT (4)       ESC and WAIT instructions
Alignment Check       17        Yes                 FAULT           Any data reference
Machine Check         18        (model dependent)
Software Interrupt    0 to 255  No                  TRAP            INT n instructions

Numbers in parentheses refer to the notes which follow the table. 




NOTES: 

1. Debug exceptions are either traps or faults. The exception handler can distinguish between traps and 
faults by examining the contents of the DR6 register. 

2. Restartability is conditional during task switches as documented in section 7.5. 

3. All general-protection faults are restartable. If the fault occurs while attempting to call the handler, the 
interrupted program is restartable, but the interrupt may be lost. 

4. Floating-point errors are not reported until the first ESC or WAIT instruction following the ESC 
instruction which generated the error. 



14.11. ERROR CODE SUMMARY 

Table 14-8 summarizes the error information that is available with each exception. 



Table 14-8. Error Code Summary 

                      Vector   Is an Error
Description           Number   Code Generated?
--------------------  -------  --------------------
Divide Error          0        No
Debug Exceptions      1        No
Breakpoint            3        No
Overflow              4        No
Bounds Check          5        No
Invalid Opcode        6        No
Device Not Available  7        No
Double Fault          8        Yes (always zero)
Invalid TSS           10       Yes
Segment Not Present   11       Yes
Stack Fault           12       Yes
General Protection    13       Yes
Page Fault            14       Yes (special format)
Floating-Point Error  16       No
Alignment Check       17       Yes (always zero)
Machine Check         18       (model dependent)
Software Interrupt    0-255    No
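
A handler stub needs to know whether an error code sits on the stack below the return address. The sketch below encodes the "Yes" rows of Table 14-8 as a set. Vector 18 is left out because its behavior is model dependent, and the software_interrupt parameter reflects the last row of the table (an INT n instruction does not generate an error code, whatever the vector). The names are illustrative.

```python
# Vectors for which Table 14-8 lists an error code.
VECTORS_WITH_ERROR_CODE = frozenset({8, 10, 11, 12, 13, 14, 17})


def pushes_error_code(vector, software_interrupt=False):
    """True if the exception delivers an error code to its handler."""
    if software_interrupt:
        return False
    return vector in VECTORS_WITH_ERROR_CODE
```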






CHAPTER 15 
INPUT/OUTPUT 



Input/output is accomplished through I/O ports, which are registers connected to peripheral 
devices. An I/O port can be an input port, an output port, or a bidirectional port. Some I/O 
ports are used for carrying data, such as the transmit and receive registers of a serial interface. 
Other I/O ports are used to control peripheral devices, such as the control registers of a disk 
controller. 

The input/output architecture is the programmer's model of how these ports are accessed. The 
discussion of this model includes: 

• Methods of addressing I/O ports. 

• Instructions which perform I/O operations. 

• The I/O protection mechanism. 



15.1. I/O ADDRESSING 

The processor allows I/O ports to be addressed in either of two ways: 

• Through a separate I/O address space accessed using I/O instructions. 

• Through memory-mapped I/O, where I/O ports appear in the address space of physical 
memory. 

The use of a separate I/O address space is supported by special instructions and a hardware 
protection mechanism. When memory-mapped I/O is used, the general-purpose instruction set 
can be used to access I/O ports, and protection is provided using segmentation or paging. Some 
system designers may prefer to use the I/O facilities built into the processor, while others may 
prefer the simplicity of a single physical address space. 

Hardware designers use these ways of mapping I/O ports into the address space when they 
design the address decoding circuits of a system. I/O ports can be mapped so that they appear 
in the I/O address space or the address space of physical memory (or both). 



15.1.1. I/O Address Space 

The processor provides a separate I/O address space, distinct from the address space for 
physical memory, where I/O ports can be placed. The I/O address space consists of 2^16 (64K) 
individually addressable 8-bit ports; any two consecutive 8-bit ports can be treated as a 16-bit 
port, and any four consecutive ports can be a 32-bit port. Extra bus cycles are required if a port 
crosses the boundary between two doublewords in physical memory. 

The M/IO# pin of the processor indicates when a bus cycle to the I/O address space occurs. 
When a separate I/O address space is used, it is the responsibility of the hardware designer to 
make use of this signal to select I/O ports rather than memory. In fact, the use of the separate 
I/O address space simplifies the hardware design because these ports can be selected by a 
single signal; unlike other processors, it is not necessary to decode a number of upper address 
lines in order to set up a separate I/O address space. 

A program can specify the address of a port in two ways. With an immediate byte constant, the 
program can specify: 

• 256 8-bit ports numbered 0 through 255. 

• 128 16-bit ports numbered 0, 2, 4, ... , 252, 254. 

• 64 32-bit ports numbered 0, 4, 8, ... , 248, 252. 

Using a value in the DX register, the program can specify: 

• 8-bit ports numbered 0 through 65535. 

• 16-bit ports numbered 0, 2, 4, ... , 65532, 65534. 

• 32-bit ports numbered 0, 4, 8, ... , 65528, 65532. 

The processor can transfer 8, 16, or 32 bits to a device in the I/O space. Like words in memory, 
16-bit ports should be aligned to even addresses so that all 16 bits can be transferred in a single 
bus cycle. Like doublewords in memory, 32-bit ports should be aligned to addresses which are 
multiples of four. The processor supports data transfers to unaligned ports, but there is a 
performance penalty because an extra bus cycle must be used. 
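
The port numbering given above follows directly from the port width and the addressing form. The following sketch enumerates the aligned port numbers for each case and tests a port for alignment; the function names are hypothetical.

```python
def aligned_ports(width_bytes, via_dx):
    """Aligned port numbers for a port of 1, 2, or 4 bytes.

    via_dx=False models an immediate byte constant (ports 0..255);
    via_dx=True models a port number held in the DX register (0..65535).
    """
    limit = 65536 if via_dx else 256
    return range(0, limit, width_bytes)


def is_aligned_port(port, width_bytes):
    """True if the port number is a multiple of the port width."""
    return port % width_bytes == 0
```

An access for which is_aligned_port returns False is still supported, but according to the text it incurs the performance penalty of an extra bus cycle.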

The IN and OUT instructions move data between a register and a port in the I/O address space. 
The instructions INS and OUTS move strings of data between the memory address space and 
ports in the I/O address space. 

I/O port addresses 0F8H through 0FFH are reserved for use by Intel Corporation. Do not 
assign I/O ports to these addresses. 

The exact order of bus cycles used to access ports which require more than one bus cycle is 
undefined and is not guaranteed to remain the same in future Intel products. If software needs 
to produce a particular order of bus cycles, this order must be specified explicitly. For 
example, to load a word-length port at 4H followed by loading a word port at 2H, two word- 
length instructions must be used, rather than a single doubleword instruction at 2H. 

Note that, although the processor automatically masks parity errors for certain types of bus 
cycles, such as interrupt acknowledge cycles, it does not mask parity for bus cycles to the I/O 
address space. Programmers may need to be aware of this behavior as a possible source of 
parity errors. 



15.1.2. Memory-Mapped I/O 

I/O devices may be placed in the address space for physical memory. This is called 
memory-mapped I/O. As long as the devices respond like memory components, they can be 
used with memory-mapped I/O. 

Memory-mapped I/O provides additional programming flexibility. Any instruction which 
references memory may be used to access an I/O port located in the memory space. For 
example, the MOV instruction can transfer data between any register and a port. The AND, 
OR, and TEST instructions may be used to manipulate bits in the control and status registers of 
peripheral devices (see Figure 15-1). Memory-mapped I/O can use the full instruction set and 
the full complement of addressing modes to address I/O ports. 



[Figure: map of PHYSICAL MEMORY showing regions labeled ROM, INPUT/OUTPUT PORT (three 
of them), and RAM within a single address space.] 

Figure 15-1. Memory-Mapped I/O 



Using an I/O instruction for an I/O write can also be advantageous because it guarantees that 
the write will be completed before the next instruction begins execution. If I/O writes are used 
to control system hardware, then this sequence of events is desirable, since it guarantees that 
the next instruction will be executed in the new system hardware state. Refer to Section 15.4 
for more information on serialization of I/O operations. 

If caching is enabled in real-address mode, designers should consider if it is advantageous to 
prevent caching of I/O data, whether by using the PCD bit of page table entries or by using the 
KEN# signal. 



15.2. I/O INSTRUCTIONS 

The I/O instructions provide access to the processor's I/O ports for the transfer of data. These 
instructions have the address of a port in the I/O address space as an operand. There are two 
kinds of I/O instructions: 

1. Those which transfer a single item (byte, word, or doubleword) to or from a register. 

2. Those which transfer strings of items (strings of bytes, words, or doublewords) located in 
memory. These are known as "string I/O instructions" or "block I/O instructions." 

These instructions cause the M/IO# signal to be driven low (logic 0) during a bus cycle, which 
indicates to external hardware that access to the I/O address space is taking place. 



15.2.1. Register I/O Instructions 

The I/O instructions IN and OUT move data between I/O ports and the EAX register (32-bit 
I/O), the AX register (16-bit I/O), or the AL (8-bit I/O) register. The IN and OUT instructions 
address I/O ports either directly, with the address of one of 256 port addresses coded in the 
instruction, or indirectly using an address in the DX register to select one of 64K port 
addresses. 

IN (Input from Port) transfers a byte, word, or doubleword from an input port to the AL, AX, 
or EAX registers. A byte IN instruction transfers 8 bits from the selected port to the AL 
register. A word IN instruction transfers 16 bits from the port to the AX register. A 
doubleword IN instruction transfers 32 bits from the port to the EAX register. 

OUT (Output to Port) transfers a byte, word, or doubleword from the AL, AX, or EAX 
registers to an output port. A byte OUT instruction transfers 8 bits from the AL register to the 
selected port. A word OUT instruction transfers 16 bits from the AX register to the port. A 
doubleword OUT instruction transfers 32 bits from the EAX register to the port. 
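
The register usage of IN and OUT can be modeled on a 32-bit value standing for EAX, as in the sketch below. It shows that a byte or word IN replaces only the AL or AX portion of the register, and that OUT sends only the low 8, 16, or 32 bits. The function names are hypothetical and no real port access takes place.

```python
def in_to_register(eax, value, width_bits):
    """Return the new EAX after an IN of 8, 16, or 32 bits.

    Only AL (8), AX (16), or all of EAX (32) is replaced.
    """
    mask = (1 << width_bits) - 1
    return (eax & ~mask & 0xFFFFFFFF) | (value & mask)


def out_from_register(eax, width_bits):
    """Return the value an OUT of 8, 16, or 32 bits sends to the port."""
    return eax & ((1 << width_bits) - 1)
```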



15.2.2. Block I/O Instructions 

The INS and OUTS instructions move blocks of data between I/O ports and memory. Block 
I/O instructions use an address in the DX register to address a port in the I/O address space. 
These instructions use the DX register to specify: 

• 8-bit ports numbered 0 through 65535. 

• 16-bit ports numbered 0, 2, 4, ... , 65532, 65534. 

• 32-bit ports numbered 0, 4, 8, ... , 65528, 65532. 

Block I/O instructions use either the (E)SI or (E)DI register to address memory. For each 
transfer, the (E)SI or (E)DI register is incremented or decremented, as specified by the DF flag. 

The INS and OUTS instructions, when used with repeat prefixes, perform block input or 
output operations. The repeat prefix REP modifies the INS and OUTS instructions to transfer 
blocks of data between an I/O port and memory. These block I/O instructions are string 
instructions (see Chapter 3 for more on string instructions). They simplify programming and 
increase the speed of data transfer by eliminating the need to use a separate LOOP instruction 
or an intermediate register to hold the data. 

The string I/O instructions operate on byte strings, word strings, or doubleword strings. After 
each transfer, the memory address in the ESI or EDI registers is incremented or decremented 
by 1 for byte operands, by 2 for word operands, or by 4 for doubleword operands. The DF flag 
controls whether the register is incremented (the DF flag is clear) or decremented (the DF flag 
is set). 






INS (Input String from Port) transfers a byte, word, or doubleword string element from an 
input port to memory. The INSB instruction transfers a byte from the selected port to the 
memory location addressed by the ES and EDI registers. The INSW instruction transfers a 
word. The INSD instruction transfers a doubleword. A segment override prefix cannot be used 
to specify an alternate destination segment. Combined with a REP prefix, an INS instruction 
makes repeated read cycles to the port, and puts the data into consecutive locations in memory. 

OUTS (Output String to Port) transfers a byte, word, or doubleword string element from 
memory to an output port. The OUTSB instruction transfers a byte from the memory location 
addressed by the DS and ESI registers to the selected port. The OUTSW instruction transfers a 
word. The OUTSD instruction transfers a doubleword. A segment override prefix can be used 
to specify an alternate source segment. Combined with a REP prefix, an OUTS instruction 
reads consecutive locations in memory, and writes the data to an output port. 

15.3. PROTECTED-MODE I/O 

When the processor is running in protected mode, I/O operates as in real-address mode, but 
with additional protection features: 

• References to memory-mapped I/O ports, like any other memory reference, are subject to 
access protection and control by both the segmentation and the paging mechanism. Refer 
to Chapter 12 for a complete discussion of memory protection. 

• The execution of I/O instructions is also subject to two protection mechanisms: 

a. The IOPL field in the EFLAGS register controls access to the I/O instructions. 

b. The I/O permission bit map of a TSS segment controls access to individual ports in 
the I/O address space. 

These protection mechanisms are available only when a separate I/O address space is used. 



15.3.1. I/O Privilege Level 

In systems where I/O protection is used, access to I/O instructions is controlled by the IOPL 
field in the EFLAGS register. This permits the operating system to adjust the privilege level 
needed to perform I/O. In a typical protection ring model, privilege levels 0 and 1 have access 
to the I/O instructions. This lets the operating system and the device drivers perform I/O, but 
keeps applications and less privileged device drivers from accessing the I/O address space. 
Applications access I/O through the operating system. 

The following instructions can be executed only if CPL ≤ IOPL: 

IN      Input 
INS     Input String 
OUT     Output 
OUTS    Output String 
CLI     Clear Interrupt-Enable Flag 
STI     Set Interrupt-Enable Flag 






These instructions are called "sensitive" instructions, because they are sensitive to the IOPL 
field. In virtual-8086 mode, the I/O permission bit map further limits access to I/O ports (see 
Chapter 23). 

To use sensitive instructions, a procedure must run at a privilege level at least as privileged as 
that specified by the IOPL field. Any attempt by a less privileged procedure to use a sensitive 
instruction results in a general-protection exception. Because each task has its own copy of the 
EFLAGS register, each task can have a different IOPL. 

A task can change IOPL only with the POPF and IRET instructions; however, such changes 
are privileged. No procedure may change its IOPL unless it is running at privilege level 0. An 
attempt by a less privileged procedure to change the IOPL does not result in an exception; the 
IOPL simply remains unchanged. 

The POPF instruction also may be used to change the state of the IF flag (as can the CLI and 
STI instructions); however, changes to the IF flag using the POPF instruction are IOPL- 
sensitive. A procedure may change the setting of the IF flag with a POPF instruction only if it 
runs with a CPL at least as privileged as the IOPL. An attempt by a less privileged procedure 
to change the IF flag does not result in an exception; the IF flag simply remains unchanged. 



15.3.2. I/O Permission Bit Map 

The processor can generate exceptions for references to specific I/O addresses. These 
addresses are specified in the I/O permission bit map in the TSS (see Figure 15-2). The size of 
the map and its location in the TSS are variable. The processor finds the I/O permission bit 
map with the I/O map base address in the TSS. The base address is a 16-bit offset into the TSS. 
This is an offset to the beginning of the bit map. The limit of the TSS is the limit on the size of 
the I/O permission bit map. 






[Figure: layout of the task state segment. The I/O permission bit map lies at the high end of the TSS and is followed by a byte containing 11111111. The I/O MAP BASE field, in the TSS doubleword at offset 64H, points to the start of the bit map.] 

NOTE: I/O map base must not exceed DFFFH. The last byte of the bit map must be followed 
by a byte with all bits set. 

Figure 15-2. I/O Permission Bit Map 

Because each task has its own TSS, each task has its own I/O permission bit map. Access to 
individual I/O ports can be granted to individual tasks. 

If CPL ≤ IOPL in protected mode, then the processor allows I/O operations to proceed. If CPL 
> IOPL, or if the processor is operating in virtual 8086 mode, then the processor checks the I/O 
permission map. Each bit in the map corresponds to an I/O port byte address; for example, the 
control bit for address 41 (decimal) in the I/O address space is found at bit position 1 of the 
sixth byte in the bit map. The processor tests all the bits corresponding to the I/O port being 
addressed; for example, a doubleword operation tests four bits corresponding to four adjacent 
byte addresses. If any tested bit is set, a general-protection exception is generated. If all tested 
bits are clear, the I/O operation proceeds. 

Because I/O port addresses are not necessarily aligned to word and doubleword boundaries, it 
is possible that the processor may need to access two bytes in the bit map when I/O permission 
is checked. For maximum speed, the processor has been designed to read two bytes for every 
access to an I/O port. To prevent exceptions from being generated when the ports with the 
highest addresses are accessed, an extra byte needs to come after the table. This byte must have 
all of its bits set, and it must be within the segment limit. 

It is not necessary for the I/O permission bit map to represent all the I/O addresses. I/O 
addresses not spanned by the map are treated as if they had set bits in the map. For example, if 
the TSS segment limit is 10 bytes past the bit map base address, the map has 11 bytes and the 
first 80 I/O ports are mapped. Higher addresses in the I/O address space generate exceptions. 






If the I/O bit map base address is greater than or equal to the TSS segment limit, there is no I/O 
permission map, and all I/O instructions generate exceptions. The base address must be less 
than or equal to 0DFFFH. 
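
The permission check described in this section can be sketched as a small model. The following Python is illustrative only: the function `io_access_allowed` and its arguments are hypothetical names, the bit map is passed as a byte sequence that includes the required trailing all-ones byte, and the processor's two-byte read is simplified to a per-port bit test.

```python
def io_access_allowed(bitmap, port, width, cpl=3, iopl=0, v86=False):
    """Return True if the modelled I/O access would proceed, False if it
    would raise a general-protection exception.

    bitmap -- bytes of the I/O permission bit map, INCLUDING the trailing
              all-ones byte (assumed representation for this sketch)
    port   -- first I/O port byte address
    width  -- 1, 2, or 4 bytes
    """
    # In protected mode (not virtual-8086 mode), CPL <= IOPL allows the access
    # without consulting the bit map.
    if not v86 and cpl <= iopl:
        return True
    for addr in range(port, port + width):
        byte_index, bit = divmod(addr, 8)
        # Ports not spanned by the map (or landing on the terminator byte)
        # are treated as if their bits were set.
        if byte_index >= len(bitmap) - 1:
            return False
        if (bitmap[byte_index] >> bit) & 1:
            return False
    return True


# An 11-byte map: 10 bytes covering ports 0..79 plus the all-ones byte.
example_map = bytearray(10) + b"\xff"
example_map[5] |= 1 << 1   # port 41: bit position 1 of the sixth byte
```

With `example_map`, a byte access to port 41 is refused, a doubleword access at port 40 is refused because it covers port 41, and any access at port 80 or above is refused because the map does not span it.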



15.3.3. Paging and Caching 

In protected mode, the paging mechanism can also help control cacheability of I/O buffers and 
memory-mapped I/O addresses. If caching is enabled, either external hardware or the paging 
mechanism (the PCD bit in the page table entry) must be used to prevent caching of I/O data. 

The operating system can also use the segmentation or paging mechanism to manage the data 
space used by the operands of I/O instructions. The AVL (available) fields in segment 
descriptors or page table entries can be used by the operating system to mark pages containing 
I/O buffers as unrelocatable and unswappable. 



15.4. ORDERING OF I/O 

When controlling I/O devices it is often important that memory and I/O operations be carried 
out in precisely the order programmed. For example, a program may write a command to an 
I/O port, then read the status of the I/O device from another I/O port. It is important that the 
status returned be the status of the device after it receives the command, not before. 
Programmers should take care, because there are situations in which the programmed order is 
not preserved by the processor. 

To optimize performance, the Pentium CPU allows memory reads to be reordered ahead of 
buffered writes in most situations. The Intel486 CPU allows memory reads to be reordered 
ahead of buffered writes in certain precisely-defined circumstances. (See the Intel486™ 
Microprocessor Hardware Reference Manual for further details about the operation of the 
write buffer.) Using memory-mapped I/O, therefore, creates the possibility that an I/O read 
might be performed before the memory write of a previous instruction. To eliminate this 
possibility on the Intel486 CPU, use an I/O instruction for the read. To eliminate this 
possibility on the Pentium CPU, insert one of the serializing instructions, such as CPUID, 
between operations. 

When I/O instructions are used instead of memory-mapped I/O, the situation is different in two 
respects: 

1. Some I/O writes are never buffered. The only I/O writes that the Intel486 CPU buffers are 
those from the OUTS instruction. The Pentium CPU does not buffer any I/O writes. 
Therefore, strict ordering of I/O operations is enforced by the processor. 

2. The processor synchronizes I/O instruction execution with external bus activity. Refer to 
Table 15-1. 






Table 15-1. I/O Serialization 

                 Processor Holds Execution of ...    Awaiting Completion of ... 
Current          Current          Next               Pending        Current 
Instruction      Instruction?     Instruction?       Stores?        Store? 
-----------      ------------     ------------       --------       -------- 
IN               YES              -                  YES            - 
INS              YES              -                  YES            - 
REP INS          YES              -                  YES            - 
OUT              -                YES                YES            YES 
OUTS             -                YES                YES            YES 
REP OUTS         -                YES                YES            YES 


Refer to Chapter 13 for more general information on memory access ordering and to 
Chapter 18 for information about other serializing instructions. 




CHAPTER 16 

INITIALIZATION AND MODE SWITCHING 



The processor is initialized to a known state following hardware reset in order for software 
execution to begin. When initialized, the processor provides model and stepping information to 
determine what features are available to software. For feature determination by applications at 
run-time, a code example and discussion is provided in Chapter 5. This chapter provides 
processor initialization state information and configuration requirements for both real-address 
and protected mode. This chapter also discusses the process of switching between real-address 
and protected modes which is normally part of the initialization process. A program example 
for switching to protected mode is provided. 

The floating-point units (FPU's) of the Intel x86 architectures (except the Intel 287 NPX) 
operate the same regardless of whether the processor is operating in real-address mode, in 
protected mode, or in virtual 8086 mode. 

To the numerics programmer, the operating mode affects only the manner in which the FPU 
instruction and data pointers are represented in memory following an FSAVE or FSTENV 
instruction. Each of these instructions produces one of four formats depending on both the 
operating mode and on the operand-size attribute in effect for the instruction. The differences 
are detailed in the discussion of the FSAVE and FSTENV instructions in Chapter 25. 



16.1. PROCESSOR INITIALIZATION 

The processor has an input, called the RESET pin, which invokes reset initialization. After 
RESET is asserted, some registers of the processor are set to known states. These known states, 
such as the contents of the EIP register, are sufficient to allow software to begin execution. 
Software then can build the data structures in memory, such as the GDT and IDT tables, which 
are used by system and application software. The internal caches, translation lookaside buffers 
(TLB's) and the branch target buffers (BTB's) are invalidated when RESET is asserted. 

Hardware asserts the RESET signal at power-up. Hardware may assert this signal at other 
times. For example, a button may be provided for manually invoking reset initialization. Reset 
also may be the response of hardware to receiving a halt or shutdown indication. 

The Pentium processor also has an INIT input, which is similar to RESET except it does not 
disturb the internal caches, model specific registers, or floating point state. INIT provides a 
method for switching from protected to real-address mode while maintaining the contents of 
the internal caches. The TLB's and BTB are invalidated by INIT being asserted. 



16.1.1. Processor State After Reset 

A self test may be requested at power-up. It is the responsibility of the hardware designer to 
provide the request for self test, if desired. If the self test is selected, it takes about 2^ clock 
periods to complete. (This clock count is model-specific and Intel reserves the right to change 
the exact number of periods without notification.) 






The EAX register is clear (zero) if the processor passed the test. A non-zero value in the EAX 
register after self test indicates the processor is faulty. If the self test is not requested, the 
contents of the EAX register after reset initialization is zero. 

The EDX register holds a component identifier and revision number after reset initialization, as 
shown in Figure 16-1. The DH register contains the value 3, 4, or 5 to indicate an Intel386 
CPU, Intel486 CPU, or Pentium CPU, respectively. Different values may be returned for the 
various proliferations of these families, for example the Intel386 SX CPU contains 23H. 
Binary object code can be made compatible with other Intel processors by using this number to 
select the correct initialization software. The DL register contains a unique identifier of the 
revision level. The upper word of EDX is reserved following reset. 



[Figure: layout of the EDX register after reset. Bits 31-16: RESERVED. Bits 15-8 (DH, upper byte of DX): DEVICE ID (5). Bits 7-0 (DL): STEPPING ID.] 

Figure 16-1. Contents of the EDX Register After Reset 



The state of the CR0 register for the Pentium processor following power-up is shown in 
Figure 16-2 (60000010H). This state puts the processor into real-address mode with paging 
disabled. The state of the flags and other registers following power-up is shown in Table 16-1. 






[Figure: layout of the CR0 register after reset, with the following field annotations. 
Bit 31 (PG) = 0: paging disabled. 
Bit 30 (CD) = 1: caching disabled. 
Bit 29 (NW) = 1: not write-through disabled. 
Bit 18 (AM) = 0: alignment check disabled. 
Bit 16 (WP) = 0: write-protect disabled. 
Bit 5 (NE) = 0: external floating-point error reporting. 
Bit 4 = 1: (not used). 
Bit 3 (TS) = 0: no task switch. 
Bit 2 (EM) = 0: ESC instructions not trapped. 
Bit 1 (MP) = 0: WAIT instructions not trapped. 
Bit 0 (PE) = 0: real mode. 
All other bits are reserved.] 

Figure 16-2. Contents of CR0 Register After Reset 
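
As a cross-check of the reset value, the following Python sketch decodes 60000010H into individual CR0 flags. The dictionary `CR0_BITS` and the function `decode_cr0` are hypothetical helper names used only for illustration; the bit positions are the architectural CR0 positions (PE = bit 0 through PG = bit 31), and bit 4 is labelled ET here as an assumption about its conventional name.

```python
# Architectural bit positions of the CR0 flags discussed in this chapter.
CR0_BITS = {
    "PE": 0, "MP": 1, "EM": 2, "TS": 3, "ET": 4, "NE": 5,
    "WP": 16, "AM": 18, "NW": 29, "CD": 30, "PG": 31,
}


def decode_cr0(value):
    """Return a dict mapping each CR0 flag name to 0 or 1."""
    return {name: (value >> bit) & 1 for name, bit in CR0_BITS.items()}


reset_state = decode_cr0(0x60000010)
```

In `reset_state` only CD, NW, and bit 4 are set, which corresponds to real-address mode (PE = 0) with paging disabled (PG = 0) and caching disabled.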







Table 16-1. Processor State Following Reset 

Register                        RESET Without BIST          INIT 
------------------------------  --------------------------  -------------------------- 
EFLAGS (Note 1)                 00000002H                   00000002H 
EIP                             0000FFF0H                   0000FFF0H 
CR0                             60000010H                   Note 2 
CR2/CR3/CR4                     00000000H                   00000000H 
CS                              selector=0F000H             selector=0F000H 
                                base=0FFFF0000H             base=0FFFF0000H 
                                limit=0FFFFH                limit=0FFFFH 
                                AR=Present, Read/Write,     AR=Present, Read/Write, 
                                Accessed                    Accessed 
SS, DS, ES, FS, GS              selector=0000               selector=0000 
                                base=0000H                  base=0000H 
                                limit=0FFFFH                limit=0FFFFH 
                                AR=Present, Read/Write,     AR=Present, Read/Write, 
                                Accessed                    Accessed 
EDX                             000005xxH                   000005xxH 
EAX                             0 (Note 3)                  0 
EBX, ECX, ESI, EDI, EBP, ESP    00000000H                   00000000H 
LDTR                            selector=0000H              selector=0000H 
                                base=00000000H              base=00000000H 
                                limit=0FFFFH                limit=0FFFFH 
                                AR=Present, Read/Write      AR=Present, Read/Write 
GDTR, IDTR                      base=00000000H              base=00000000H 
                                limit=0FFFFH                limit=0FFFFH 
                                AR=Present, Read/Write      AR=Present, Read/Write 
DR0, DR1, DR2, DR3              00000000H                   00000000H 
DR6                             FFFF0FF0H                   FFFF0FF0H 
DR7                             00000400H                   00000400H 
Time Stamp Counter              0                           Unchanged 
Control and Event Select        0                           Unchanged 
TR12                            0                           Unchanged 
All Other Model Specific        Undefined                   Unchanged 
Registers (MSR's) 
Data and Code Cache, TLB's      Invalid                     Invalid 





NOTES: 

1. The high ten bits of the EFLAGS register are undefined following power-up. Undefined bits are 
reserved. Software should not depend on the states of any of these bits. 

2. CD and NW are unchanged, bit 4 is set to 1, all other bits are cleared. 

3. If Built-in Self Test is invoked, EAX is 0 only if all tests passed. 



16.1.2. First Instruction Executed 

To generate an address, the base part of a segment register is added to the effective address to 
form the linear address. This is true for all modes of operation, although the base address is 
calculated differently in protected and real-address modes. To fetch an instruction, the base 
portion of the CS register is added to EIP to form a linear address (see Chapter 9 and 
Chapter 11 for details on calculating addresses). 

In real-address mode, when the value of the segment register selector is changed, the base 
portion will automatically be changed to this value multiplied by 16. However, immediately 
after reset, the base portion of the CS register behaves differently: It is not 16 times the selector 
value. Instead, the CS selector is 0F000H and the CS base is 0FFFF0000H. The first time the 
CS selector value is changed after reset, it will follow the above rule (base = selector * 16). As 
a result, after reset, the first instruction that is being fetched and executed is at physical 
address: CS.base + EIP = 0FFFFFFF0H. This is the address to locate the EPROM with the 
initialization code. This address is located 16 bytes below the uppermost address of the 
physical memory of the Pentium processor. 

Ensure that no far jump or far call is executed until the initialization is completed. If the first 
far jump/call is made during real mode, a new value enters the CS selector (16 bits) and sets 
the value of the CS base to 20 bits only; that is, the destination address falls in the address 
space 0 to 1M. Make sure that valid memory and code are present in this area. 

The base address for the data segments are set to the bottom of the physical address space 
(address 0), where RAM is expected to be. 
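
The address arithmetic in this section can be checked with a few lines of Python. This is a sketch for illustration; `real_mode_base` and `linear_address` are hypothetical helper names, and a 32-bit linear address is assumed.

```python
def real_mode_base(selector):
    """Segment base after a real-address-mode selector load: selector * 16."""
    return (selector & 0xFFFF) << 4


def linear_address(segment_base, offset):
    """Linear address = segment base + effective address (32-bit wrap assumed)."""
    return (segment_base + offset) & 0xFFFFFFFF


# Immediately after reset the CS base is 0FFFF0000H, not selector * 16.
first_fetch = linear_address(0xFFFF0000, 0x0000FFF0)

# After the first far jump that reloads CS with 0F000H, the normal rule applies
# and the same offset now falls below 1M.
after_far_jump = linear_address(real_mode_base(0xF000), 0xFFF0)
```

Here `first_fetch` evaluates to 0FFFFFFF0H, 16 bytes below the top of the 4-gigabyte physical address space, while `after_far_jump` evaluates to 0FFFF0H.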



16.2. FPU INITIALIZATION 

During system initialization, systems software can determine the absence or presence of a 
numeric processor extension. Systems software must then initialize the FPU or NPX and set 
flags in CR0 to reflect the state of the numeric environment. These activities can be quickly 
and easily performed as part of the overall system initialization. See Chapter 5 for determining 
the processor type and feature recognition. 

A hardware reset leaves the Pentium FPU in a state that is different from the state that is 
obtained by executing the FNINIT instruction as shown in Table 16-2. See Chapter 23 for a 
complete list of initialization differences between these processors following RESET. 

The state of the FPU registers following RESET or INIT is shown in Table 16-2. Following 
RESET, the Pentium FPU contains zeros in the ST0-ST7 stack registers with the tags set to zero (01). 
However, the tags are only visible to the programmer by using the FSAVE/FSTENV 
instructions. When these instructions are used, they interpret the stack locations as zero, 
returning tag values of 01. The Pentium processor, in addition, has an INIT pin which, when 
asserted, causes the processor to reset without altering the FPU state. An FNINIT instruction 
should be executed after reset. 

Initializing the FPU simply means placing the FPU in a known state unaffected by any activity 
performed earlier. A single FNINIT instruction performs this initialization. All the error masks 
are set, all registers are tagged empty, TOP is set to zero, and default rounding and precision 
controls are set. Table 16-2 shows the state of the FPU following FINIT or FNINIT. 



Table 16-2. FPU State Following FINIT or FNINIT 

Field                      Value          Interpretation 
-------------------------  -------------  ------------------------ 
Control Word               037FH 
  (Infinity Control)*      0              Affine 
  Rounding Control         00             Round to nearest 
  Precision Control        11             Extended 
  Exception Masks          111111         Exceptions masked 
Status Word                0000H 
  (Busy)                   0 
  Condition Code           0000 
  Stack Top                000            Register 0 is stack top 
  Exception Summary        0              No exceptions 
  Stack Flag               0 
  Exception Flags          000000         No exceptions 
Tag Word                   FFFFH 
  Tags                     11             Empty 
Registers                  Not changed    Not changed 
Exception Pointers 
  Instruction Code                        Cleared 
  Instruction Address                     Cleared 
  Operand Address                         Cleared 



NOTES: 

*The Pentium™, Intel486™, and Intel386™ processors do not have infinity control. This value is listed to 
emphasize that programs written for the Intel287 math coprocessor may not behave the same on the 32-bit 
processors if they depend on this bit. 
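
The control word value 037FH can be broken into its fields with a short Python sketch. The function name `decode_fpu_control_word` and the dictionary keys are hypothetical; the field positions assumed here are the x87 control word layout (exception masks in bits 0-5, precision control in bits 8-9, rounding control in bits 10-11, infinity control in bit 12).

```python
def decode_fpu_control_word(cw):
    """Split an x87 control word into the fields listed in Table 16-2."""
    return {
        "exception_masks": cw & 0x3F,          # bits 0-5, 1 = masked
        "precision_control": (cw >> 8) & 0x3,  # bits 8-9, 0b11 = extended
        "rounding_control": (cw >> 10) & 0x3,  # bits 10-11, 0b00 = nearest
        "infinity_control": (cw >> 12) & 0x1,  # bit 12, ignored on 32-bit parts
    }


after_fninit = decode_fpu_control_word(0x037F)
```

Decoding 037FH this way gives all six exception masks set, precision control 11 (extended), rounding control 00 (round to nearest), and infinity control 0.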



16.2.1. Configuring the Numerics Environment 

System software must load the appropriate values into the MP, EM, and NE bits of the control 
register (CR0) to control emulation of floating point instructions by software, 
synchronization between the FPU and CPU context, and software or external interrupt 
handling of floating-point exceptions. These bits are clear on hardware reset of the Pentium 
processor. 

The MP (Monitor coprocessor) bit determines whether WAIT instructions trap when the 
context of the FPU is different from that of the currently executing task. If MP = 1 and TS = 1, 
then a WAIT instruction will cause a Device Not Available fault (interrupt vector 7). The MP 
bit is used on the Intel 286, Intel386 DX, Intel386 SX and Intel486 SX microprocessors to 
support the use of a WAIT instruction to wait on a device other than a numeric coprocessor. 
The device reports its status through the BUSY# pin. Generally, the MP bit should be set for 
processors with integrated FPU and clear in processors without an integrated FPU or numeric 
processor extension. However, an operating system can choose to save the floating-point 
context at every context switch, in which case there would be no need to set the MP bit. 

The EM (EMulate coprocessor) bit determines whether ESC instructions are executed by the 
FPU (EM = 0) or trap via interrupt vector 7 to be handled by software (EM = 1). The EM bit is 
used on CPU/NPX systems so that numeric applications can be run in the absence of an NPX 
with a software emulator. For normal operation of Intel processors with an integrated FPU, the 
EM bit should be cleared to 0. The EM bit must be set in the Intel386 DX, Intel386 SX, or 
Intel486 SX CPUs if there is no NPX present. If the EM bit is set and no coprocessor or 
emulator is present, the system will hang. 

The interpretations of the different combinations of the EM and MP bits are shown in Table 16-3. 
Recommendations for the different processors are shown in Table 16-4. 



Table 16-3. EM and MP Bits Interpretations 

EM   MP   Interpretation 
--   --   ------------------------------------------------------- 
0    0    Numeric instructions are passed to FPU; WAIT ignores TS 
0    1    Numeric instructions are passed to FPU; WAIT tests TS 
1    0    Numeric instructions trap to emulator; WAIT ignores TS 
1    1    Numeric instructions trap to emulator; WAIT tests TS 


Table 16-4. Recommended Values by Processor 

EM   MP   Interpretation 
--   --   ------------------------------------------------------------ 
1    0    Intel386™ DX, Intel386 SX, and Intel486™ SX CPU's 
0    1    Intel386 DX and Intel387™ DX, Intel386 SX and Intel387 SX, 
          Intel487™ SX, Intel486 DX, Pentium™ CPU's 



The action taken for floating-point and wait instructions based on the value of these bits is 
shown in Table 16-5. 







Table 16-5. Action Taken for Different Combinations of EM, MP, and TS 

    CR0 Bits            Instruction Type 
EM    TS    MP      Floating Point    Wait 
--    --    --      --------------    ----------- 
0     0     0       Execute           Execute 
0     0     1       Execute           Execute 
0     1     0       Exception 7       Execute 
0     1     1       Exception 7       Exception 7 
1     0     0       Exception 7       Execute 
1     0     1       Exception 7       Execute 
1     1     0       Exception 7       Execute 
1     1     1       Exception 7       Exception 7 
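
The rows of Table 16-5 reduce to two simple conditions, shown here as a Python sketch. The function names `floating_point_action` and `wait_action` are hypothetical and exist only to restate the table; they are not part of any Intel-supplied software.

```python
def floating_point_action(em, ts, mp):
    """Action for a floating-point (ESC) instruction given the CR0 bits."""
    # The instruction traps when emulation is selected or a task switch
    # has occurred; MP does not affect ESC instructions.
    return "Exception 7" if (em or ts) else "Execute"


def wait_action(em, ts, mp):
    """Action for a WAIT instruction given the CR0 bits."""
    # WAIT traps only when both MP and TS are set; EM does not affect WAIT.
    return "Exception 7" if (mp and ts) else "Execute"


# Enumerate all eight combinations in the same row order as the table.
rows = [(em, ts, mp, floating_point_action(em, ts, mp), wait_action(em, ts, mp))
        for em in (0, 1) for ts in (0, 1) for mp in (0, 1)]
```

Each tuple in `rows` holds (EM, TS, MP, floating-point action, WAIT action), which makes it easy to compare the two conditions against the table row by row.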



The NE (Numeric Exception) bit determines whether unmasked floating-point exceptions are 
handled through interrupt vector 16 (NE = 1) or through external interrupt (NE = 0). In 
systems using an external interrupt controller to invoke numeric exception handlers, the NE bit 
should be cleared to 0. This option is used for compatibility with the error reporting scheme 
used in DOS based systems. Other systems can make use of the automatic error reporting 
through interrupt 16, and should set the NE bit to 1. Numeric exception handling is discussed 
in a later section. 



16.2.2. FPU Software Emulation 

Setting the EM bit to 1 causes the processor to trap via interrupt vector 7 (Device Not 
Available) to a software exception handler whenever it encounters an ESC instruction. Setting 
this bit has two uses: 

1. The EM bit is used to run numeric applications on an Intel processor without an integrated 
FPU or NPX using a software Intel387 emulator. 

2. Numeric applications designed to be run with a non-standard Intel387 emulator may not 
run successfully without the emulator. Setting the EM bit to 1 makes it possible to run 
such applications, or programs which use non-standard floating-point arithmetic. 

If a math coprocessor is not present in the system, floating point instructions can be emulated. 
The system is set up for software emulation as shown in Table 16-6: 






Table 16-6. Software Emulation Settings 

CR0 Bit    Value 
-------    ----- 
EM         1 
MP         0 
NE         1 



The EM bit must be set in order for software emulation to function properly. Setting the EM 
bit to 1 will cause the processor to trap via interrupt vector 7 (Device Not Available) to a 
software exception handler whenever it encounters an ESC instruction. If the EM bit is set and 
no coprocessor or emulator is present, the system will hang. 

The MP bit can be used during a task switch in protected-mode in conjunction with the TS bit 
to determine if WAIT instructions should trap when the context of the FPU is different from 
that of the currently executing task. When no FPU is present, this information may be 
irrelevant and therefore the bit should be set to 0. 

Regardless of the value of the NE bit, the Intel486 SX processor generates an interrupt vector 7 
(Device Not Available) upon encountering any floating point instruction. It is recommended 
that NE be set to 1 for normal operation. If a Floating Point Unit is present, this bit behaves as 
described in Table 16-3. 



16.3. CACHE ENABLING 

The cache is enabled by clearing the CD and NW bits in the CR0 register (they are set upon 
hardware reset as indicated above). This enables caching (write-through for the Intel486 
processor and write-back for the Pentium processor) and cache invalidation cycles. Because all 
cache lines are invalid following reset initialization, it is unnecessary to invalidate the cache 
before enabling caching. See Chapter 18 for complete details of cache handling, including 
implementation of a write-through cache on the Pentium processor using the PWT bit in the 
page table entry. 

Under circumstances where cache lines may be marked as valid, the cache may need to be 
flushed or invalidated before the cache is enabled. This may occur as a result of using the test 
registers to run test patterns through the cache memory as part of confidence testing during 
software initialization. See the Pentium™ Processor Data Book for model-specific details on 
cache testing. 



16.4. SOFTWARE INITIALIZATION IN REAL-ADDRESS MODE 

Note that the processor has several processing modes. It begins execution in a mode 
compatible with an 8086 processor, called real-address mode. After reset initialization, 
software must set up data structures needed for the processor to perform basic system 
functions, such as handling interrupts. If the processor remains in real-address mode, software 
must set up data structures in the form used by the 8086 processor. If the processor is going to 
operate in protected mode, software must set up data structures in the form used by protected 
mode and then switch modes (see Section 16.6). 



16.4.1. System Tables 

In real-address mode, no descriptor tables are used. The interrupt descriptor table (IDT), which 
starts at address 0 (unless the IDTR is changed), needs to be loaded with pointers to exception 
and interrupt handlers before interrupts can be enabled. 



16.4.2. NMI Interrupt 

The NMI interrupt is always enabled (except on nested NMI's). If the interrupt vector table and 
the NMI interrupt handler need to be loaded into RAM, there will be a period of time 
following reset initialization when an NMI interrupt cannot be handled. Hardware must 
provide a mechanism to prevent an NMI interrupt from being generated while software is 
unable to handle it. For example, the IDT and NMI interrupt handler can be provided in 
ROM. This allows an NMI interrupt to be handled immediately after reset initialization. Most 
systems enable/disable NMI by passing the NMI signal through an AND gate controlled by a 
bit in an I/O port. Hardware can clear the bit when the processor is reset, and software can set 
the bit when it is ready to handle NMI interrupts. System software designers should be aware 
of the mechanism used by hardware to protect software from NMI interrupts following reset. 



16.5. SOFTWARE INITIALIZATION IN PROTECTED MODE 

The data structures needed in protected mode are determined by the memory management 
features which are used. The processor supports segmentation models which range from a 
single, uniform address space (flat model) to a highly structured model with several 
independent, protected address spaces for each task (multisegmented model). Paging can be 
enabled for allowing access to large data structures which are partly in memory and partly on 
disk. Both of these forms of address translation require data structures which are set up by the 
operating system and used by the memory management hardware. 



16.5.1. System Tables 

A flat model without paging minimally requires a GDT with one code and one data segment 
descriptor. A null descriptor in the first GDT entry is also required. A flat model with paging 
may provide code and data descriptors for supervisor mode and another set of code and data 
descriptors for user mode. In addition, it requires a page directory and at least one second-level 
page table. (Note: the second-level page table can be eliminated if the page directory contains 
a directory entry pointing to itself, in which case the page directory and page table reside in the 
same page). The stack can be placed in a normal read/write data segment, so no descriptor for 
the stack is required. Before the GDT can be used, the base address and limit for the GDT 
must be loaded into the GDTR register using an LGDT instruction. 

A multi-segmented model may require additional segments for the operating system, as well as 
segments and LDTs for each application program. LDTs require segment descriptors in the 
GDT. Some operating systems allocate new segments and LDTs as they are needed. This 
provides maximum flexibility for handling a dynamic programming environment, such as an 
engineering workstation. However, many operating systems use a single LDT for all 
processes, allocating GDT entries in advance. An embedded system, such as a process 
controller, might pre-allocate a fixed number of segments and LDTs for a fixed number of 
application programs. This would be a simple and efficient way to structure the software 
environment of a system which requires real-time performance. 



16.5.2. Interrupts 

If hardware allows interrupts to be generated, the IDT and a gate for the interrupt handler need 
to be created. Before the IDT can be used, the base address and limit for the IDT must be 
loaded into the IDTR register using an LIDT instruction. See Chapter 14 for detailed 
information on this topic. 
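
To make the IDT requirement concrete, the sketch below packs one 32-bit interrupt gate into its 8-byte image. It is an illustrative model only; the function name and the example handler offset and selector are assumptions for this example, not values taken from this manual.

```python
import struct

def pack_interrupt_gate(offset, selector, dpl=0, present=True):
    """Pack a 32-bit interrupt gate (type 0EH) into its 8-byte image."""
    type_attr = (0x80 if present else 0x00) | ((dpl & 0x3) << 5) | 0x0E
    return struct.pack("<HHBBH",
                       offset & 0xFFFF,          # handler offset 15..0
                       selector & 0xFFFF,        # code segment selector
                       0,                        # reserved byte
                       type_attr,                # P, DPL, gate type
                       (offset >> 16) & 0xFFFF)  # handler offset 31..16

# Hypothetical handler at offset 12345678H in the segment selected by 0008H.
gate = pack_interrupt_gate(0x12345678, 0x0008)
```

An IDT built from such gates is described to LIDT by a 16-bit limit (eight times the number of gates, minus one) followed by the 32-bit base address.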



16.5.3. Paging 

Unlike segmentation, paging is controlled by a mode bit. If the PG bit in the CR0 register is 
clear (its state following reset initialization), the paging mechanism is completely absent from 
the processor architecture seen by programmers. 

If the PG bit is set, paging is enabled. The bit may be set using a MOV CR0 instruction. Before 
setting the PG bit, the following conditions must be true: 

• Software has created at least two page tables, the page directory and at least one second- 
level page table if 4K pages are used. For information on 4M pages, see Appendix H. 

• The PDBR register (same as the CR3 register) is loaded with the physical base address of 
the page directory. 

• The processor is in protected mode (paging is not available in real-address mode). If all 
other restrictions are met, the PG and PE bits can be set at the same time. 

The following guidelines for setting the PG bit (as with the PE bit) should be followed to 
maintain both upwards and downwards compatibility: 

1. The instruction setting the PG bit should be followed immediately with a JMP instruction. 
A JMP instruction immediately after the MOV CR0 instruction changes the flow of 
execution, so it has the effect of emptying the Intel386 and Intel486 processor of 
instructions which have been fetched or decoded. The Pentium processor, however, uses a 
branch target buffer (BTB) for branch prediction, eliminating the need for branch 
instructions to flush the prefetch queue. For more information on the BTB, see 
Appendix H. 

2. The code from the instruction which sets the PG bit through the JMP instruction must 
come from a page which is identity mapped (i.e., the linear address before the jump is the 
same as the physical address after paging is enabled). 
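
The page-table requirement and the identity-mapping guideline above can be illustrated with a small software model of two-level translation with 4K pages. This is a sketch under stated assumptions: the table address 2000H, the names, and the use of a Python dictionary to stand in for physical memory are all invented for the example.

```python
PAGE_SIZE = 4096
ENTRIES = 1024
PRESENT = 0x1
WRITABLE = 0x2

def identity_page_table(first_frame=0):
    """Second-level table mapping 1024 consecutive frames onto themselves."""
    return [((first_frame + i) * PAGE_SIZE) | PRESENT | WRITABLE
            for i in range(ENTRIES)]

def translate(linear, page_dir, tables):
    """Walk the page directory and one page table.

    tables maps the physical address of each page table to its entries.
    """
    pde = page_dir[(linear >> 22) & 0x3FF]
    if not pde & PRESENT:
        raise LookupError("page directory entry not present")
    pte = tables[pde & 0xFFFFF000][(linear >> 12) & 0x3FF]
    if not pte & PRESENT:
        raise LookupError("page table entry not present")
    return (pte & 0xFFFFF000) | (linear & 0xFFF)

# Hypothetical layout: one page table at physical 2000H identity-maps the
# first 4 megabytes, which is where the enabling code is assumed to run.
PT_ADDR = 0x2000
page_dir = [0] * ENTRIES
page_dir[0] = PT_ADDR | PRESENT | WRITABLE
tables = {PT_ADDR: identity_page_table()}
```

Code that sets the PG bit from any address in this first 4-megabyte range sees the same address before and after paging is enabled, which is the property guideline 2 asks for.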

The 32-bit Intel x86 architectures have different requirements for enabling paging and 
switching to protected mode. The Intel386 processor requires following steps 1 or 2 above. 
The Intel486 processor requires following both steps 1 and 2 above. The Pentium processor 
requires only step 2, but for upwards and downwards code compatibility with the Intel386 and 
Intel486 processors, it is recommended that both steps 1 and 2 be taken. 

See Chapter 11 for complete information on the paging mechanism. 

16.5.4. Tasks 

If the multitasking mechanism is not used and changes to more privileged segments are not 
allowed, it is unnecessary to initialize the TR register. 

If the multitasking mechanism is used or changes to more privileged segments are allowed 
(values for more privileged SS and ESP are obtained from the TSS), a TSS and a TSS 
descriptor for the initialization software must be created. TSS descriptors must not be marked 
as busy when they are created; TSS descriptors should be marked as busy by the CPU only as a 
side-effect of performing a task switch. As with descriptors for LDTs, TSS descriptors reside 
in the GDT. The LTR instruction is used to load a selector for the TSS descriptor of the 
initialization software into the TR register. This instruction marks the TSS descriptor as busy, 
but does not perform a task switch. The selector must be loaded before the software performs 
the first task switch, because a task switch copies the current task state into the TSS. After the 
LTR instruction has been used, further operations on the TR register are performed by task 
switching. As with segments and LDTs, TSSs and TSS descriptors can be either pre-allocated 
or allocated as needed. 

If changes to more privileged segments are allowed, a TSS and TSS descriptor need to be 
created. The processor uses the TSS to obtain the values for the more privileged stack segment 
selector and stack pointer values when transferring control to more privileged segments. 
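
The busy-bit rule above can be modeled on the access byte of a TSS descriptor. The sketch below is illustrative only; the function names are assumptions for this example, and the type codes 9 (available 32-bit TSS) and 0BH (busy 32-bit TSS) are the standard descriptor type values rather than something stated in this section.

```python
TSS_AVAILABLE = 0x9   # system descriptor type: available 32-bit TSS
TSS_BUSY = 0xB        # system descriptor type: busy 32-bit TSS

def tss_access_byte(dpl=0, busy=False):
    """Access byte for a present TSS descriptor (S bit clear)."""
    return 0x80 | ((dpl & 0x3) << 5) | (TSS_BUSY if busy else TSS_AVAILABLE)

def ltr_effect(access):
    """Model what LTR does to the access byte: it accepts only a present,
    available TSS descriptor and marks it busy, without a task switch."""
    if not access & 0x80:
        raise ValueError("TSS descriptor not present")
    if access & 0x1F != TSS_AVAILABLE:
        raise ValueError("descriptor is not an available TSS")
    return access | 0x02

initial = tss_access_byte()       # as created by initialization software
loaded = ltr_effect(initial)      # after LTR
```

Creating the descriptor with the busy type already set would make the modeled LTR fail, which mirrors the rule that software must not mark TSS descriptors busy itself.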

16.5.5. TLB, BTB and Cache Testing 

As part of the process of switching into protected mode, system programmers may wish to 
perform TLB, BTB, and cache testing. For more information on testing, see Appendix H. 

16.6. MODE SWITCHING 

In order to take full advantage of the 32-bit address space and instruction set, the processor 
must switch from its native real-address mode to protected mode. A system may also find it 
necessary to switch back into real-address mode for system operations. This section identifies 
the steps necessary for software to switch the processor from real-address mode to protected 
mode and from protected mode back into real-address mode. 



16.6.1. Switching to Protected Mode 

Before switching to protected mode, a minimum set of system data structures must be created, 
and the GDT, IDT, and TR registers must be initialized, as discussed in the previous section. 
Once these tables are created, system software can perform the steps to switch into protected 
mode. 

Protected mode is entered by setting the PE bit in the CR0 register. The MOV CR0 instruction 
may be used to set this bit. The same two guidelines for setting the PG bit to enable paging in 
Section 16.5.3. apply for setting the PE bit to enable protected mode. 

After entering protected mode, the segment registers continue to hold the contents they had in 
real address mode. Software should reload all the segment registers. Execution in protected 
mode begins with a CPL of 0. 
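
The relationship between the PE and PG bits can be summarized in a short model of the CR0 value. This is a sketch, not processor behavior captured in full: the function names are assumptions, and the only architectural facts used are that PE is bit 0 and PG is bit 31 of CR0.

```python
CR0_PE = 1 << 0    # protection enable
CR0_PG = 1 << 31   # paging enable

def enter_protected_mode(cr0):
    """Value to write back with MOV CR0 to enter protected mode."""
    return cr0 | CR0_PE

def enable_paging(cr0, page_directory_base, also_set_pe=False):
    """Value to write back with MOV CR0 to enable paging.

    Paging is not available in real-address mode, so PE must already be
    set or must be set by the same write.
    """
    if page_directory_base & 0xFFF:
        raise ValueError("page directory must be aligned on a 4K boundary")
    if also_set_pe:
        cr0 |= CR0_PE
    if not cr0 & CR0_PE:
        raise ValueError("paging requires protected mode")
    return cr0 | CR0_PG

def leave_protected_mode(cr0):
    """Value to write back with MOV CR0 to return to real-address mode."""
    if cr0 & CR0_PG:
        raise ValueError("clear PG before clearing PE")
    return cr0 & ~CR0_PE
```

In real startup code each of these writes would be followed by the JMP instruction recommended in Section 16.5.3.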



16.6.2. Switching Back to Real-Address Mode 

The processor re-enters real-address mode if software clears the PE bit in the CR0 register with 
a MOV CR0 instruction. A procedure which re-enters real-address mode should proceed as 
follows: 

1. If paging is enabled, perform the following sequence: 

— Transfer control to linear addresses which have an identity mapping (i.e., linear 
addresses equal physical addresses). Ensure the GDT and IDT are identity mapped. 

— Clear the PG bit in the CR0 register. 

— Move zero into the CR3 register to flush the TLB. 

2. Transfer control to a segment which has a limit of 64K (0FFFFH). This loads the CS 
register with the segment limit it needs to have in real mode. Ensure the GDT and IDT are 
in real-address memory (0-1 Meg). 

3. Load segment registers SS, DS, ES, FS, and GS with a selector for a descriptor containing 
the following values, which are appropriate for real mode: 

— Limit = 64K (0FFFFH) 

— Byte granular (G = 0) 

— Expand up (E = 0) 

— Writable (W = 1) 

— Present (P = 1) 

— Base = any value 

Note that if the segment registers are not reloaded, execution continues using the 
descriptors loaded during protected mode. 

4. Disable interrupts. A CLI instruction disables INTR interrupts. NMI interrupts can be 
disabled with external circuitry. 

5. Clear the PE bit in the CR0 register. 

6. Jump to the real mode program using a far JMP instruction. This flushes the instruction 
queue (of the Intel386 and Intel486 processors) and puts appropriate values in the access 
rights of the CS register. This step is not required on the Pentium processor; however, for 
downwards compatibility, a far JMP should be included as part of the procedure for 
switching back to real-address mode. 

7. Use the LIDT instruction to load the base and limit of the real-mode interrupt vector table. 

8. Enable interrupts. 

9. Load the segment registers as needed by the real-mode code. 
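
The descriptor values listed in step 3 can be checked mechanically. The following sketch decodes an 8-byte data segment descriptor and tests it against those values; the function names and the two sample descriptors are assumptions made for this example.

```python
import struct

def decode_descriptor(raw):
    """Split an 8-byte segment descriptor into the fields used below."""
    low, high = struct.unpack("<II", raw)
    access = (high >> 8) & 0xFF
    return {
        "base": (low >> 16) | ((high & 0xFF) << 16) | (high & 0xFF000000),
        "limit": (low & 0xFFFF) | (high & 0x000F0000),
        "present": bool(access & 0x80),
        "segment": bool(access & 0x10),      # S = 1: code or data segment
        "code": bool(access & 0x08),
        "expand_down": bool(access & 0x04),  # E bit of a data segment
        "writable": bool(access & 0x02),     # W bit of a data segment
        "page_granular": bool(high & 0x00800000),
    }

def suitable_for_real_mode(raw):
    """True if the descriptor has the values listed in step 3."""
    d = decode_descriptor(raw)
    return (d["limit"] == 0xFFFF and not d["page_granular"]
            and d["segment"] and not d["code"]
            and not d["expand_down"] and d["writable"] and d["present"])

# Hypothetical descriptors: a 64K read/write segment and a flat 4G one.
real_mode_like = struct.pack("<II", 0x0000FFFF, 0x00009200)
flat_4g = struct.pack("<II", 0x0000FFFF, 0x00CF9200)
```

The base is deliberately left out of the check, since step 3 allows any value.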






16.7. INITIALIZATION AND MODE SWITCHING EXAMPLE 

This section provides an initialization and mode switching example that can be incorporated 
into your application. Also provided are some assumptions about the Intel development tools 
that are used which include the ASM386/486 assembler and BLD386 builder. 



16.7.1. Goal of this Example 

The goal of this example is to move the CPU into protected mode right after reset using 
initialization code that resides in EPROM/Flash and run a simple application. 



16.7.2. Memory Layout Following Reset 

Based on the discussion in Section 16.1. and the values shown in Table 16-1, Figure 16-3 
shows the memory layout following processor reset and the starting point of this example. 



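[Figure 16-3 (diagram not reproduced): after reset, EIP = 0000FFF0H and CS.BASE = FFFF0000H, 
so the first instruction is fetched from CS.BASE + EIP, inside the 64K EPROM at the top of the 
address space. DS.BASE, ES.BASE, SS.BASE, and SP are 0 and point to the bottom of memory.] 

Figure 16-3. Processor State After Reset 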



16.7.3. The Algorithm 

The main steps of this example are shown in Table 16-7 along with the line numbers from the 
source listing of STARTUP.ASM given in Example 16-1. 



Table 16-7. The Algorithm and Related Listing Line Numbers 

ASM Lines 
From   To     Description 
-----  -----  ------------------------------------------------------------------
157    157    Jump (short) to the entry code in the EPROM 
162    169    Construct a temporary GDT in RAM with one entry: 
                0 - null 
                1 - R/W data segment, Base = 0, limit = 4GB 
171    172    Load the GDTR to point to the temp GDT 
174    177    Load CR0 with protected mode bit - move to PM 
179    181    Jump near to clear real mode queue 
184    186    Load DS, ES registers with GDT[1] descriptor; now both point to 
              the entire physical memory space. 
188    195    Perform specific board initialization that is imposed by the new 
              protected mode 
196    218    Copy the application's GDT from ROM into RAM 
220    238    Copy the application's IDT from ROM into RAM 
241    243    Load application's GDTR 
244    245    Load application's IDTR 
247    261    Copy the application's TSS from ROM into RAM 
263    267    Update TSS descriptor and other aliases in GDT (GDT alias or IDT 
              alias) 
277    277    Load the TR register (without task switch, using the LTR 
              instruction) 
282    286    Load SS, ESP with the value found in the application's TSS 
287    287    Push EFLAGS value found in the application's TSS 
288    288    Push CS value found in the application's TSS 
289    289    Push EIP value found in the application's TSS 
290    293    Load DS, ES with the value found in the application's TSS 
296    296    Perform IRET; pop the above values and enter the application code 



NOTES: 

If the switch into protected mode is made without changing the CS selector (by a far jump or 
far call), the original base value is retained. If there is no far jump after reset, the base stays at 
0FFFF0000H, which is the location of the EPROM. 

Interrupts are disabled after reset and should stay disabled; otherwise an interrupt may force a 
far jump. NMI is not disabled and must not be active until the initialization is done. 

The use of TEMP_GDT allows simple transfer of tables from the EPROM to anywhere in the RAM 
area. A GDT entry is constructed with its base pointing to address 0 and a limit of 4GB. Once the 
DS and ES registers are loaded with this descriptor, the TEMP_GDT is no longer needed and can 
be replaced by the application GDT. 

This code assumes one TSS and no LDTs. If the application has more TSSs, they must be 
copied into RAM. If there are LDTs, they may be copied as well. 

In some implementations, address lines A20 - A31 are not decoded after reset, to simulate the 
behaviour of the early 8086 chip. In the process of moving into protected mode it may be 
desirable to set these decoders to decode the complete address lines. 
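
The placement of the copied tables in RAM follows from simple arithmetic on RAM_START, the size of the STARTUP_DATA segment, and each table's limit. The sketch below reproduces that arithmetic; the structure sizes are taken from the DESC and TABLE_REG definitions in STARTUP.ASM, while the function name and the sample limits are assumptions chosen only to exercise the calculation.

```python
RAM_START = 0x400
DESC_SIZE = 8          # size of one DESC structure
TABLE_REG_SIZE = 6     # size of one TABLE_REG structure (DW + DD)

# STARTUP_DATA holds two descriptors, three TABLE_REG scratch areas,
# and one fill word that brings end_data to a doubleword boundary.
STARTUP_DATA_SIZE = 2 * DESC_SIZE + 3 * TABLE_REG_SIZE + 2

def ram_layout(gdt_limit, idt_limit, tss_limit):
    """Linear addresses at which the startup code places each copy.

    Each table occupies limit + 1 bytes, and the next one starts
    immediately after it.
    """
    gdt = RAM_START + STARTUP_DATA_SIZE
    idt = gdt + gdt_limit + 1
    tss = idt + idt_limit + 1
    free = tss + tss_limit + 1     # value stored at RAM_START by line 270
    return {"gdt": gdt, "idt": idt, "tss": tss, "free": free}

# Hypothetical sizes: 16 GDT entries, 32 IDT gates, a 104-byte TSS.
layout = ram_layout(gdt_limit=16 * 8 - 1, idt_limit=32 * 8 - 1, tss_limit=103)
```

With these sample sizes all three copies end well below 4000H, inside the area that the build file reserves for them.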



16.7.4. Tool Usage 

In this example, Intel software tools (ASM386 and BLD386) are used. 

The following are assumptions that are used when using the Intel ASM386 and BLD386 to 
generate the initialization code. 

• The ASM386 will generate the right operand size opcodes according to the code segment 
attribute. The attribute is assigned either by the ASM386 invocation controls or in the 
code segment definition. 

• If a code segment that is going to run in real-address mode is defined, it must be set to a 
USE16 attribute. If 32-bit operands (MOV EAX, EBX) are used in the segment, an 
operand prefix will automatically be generated which will force the CPU to execute a 
32-bit operation for this instruction although its default code segment attribute is 16-bit. 

• Intel's ASM386 assembler allows specific use of the 16- or 32-bit instructions, for 
example, LGDTW, LGDTD, IRETD. If you are using the generic instruction (LGDT) the 
default segment attribute will be used to generate the right opcode. 
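
The effect of the segment attribute on generated opcodes can be shown for the MOV EAX, EBX case mentioned above. The sketch is illustrative only and does not model ASM386 itself: the function name is invented, and it emits the 89H form of the instruction, although an assembler may equally choose the 8BH form.

```python
OPERAND_SIZE_PREFIX = 0x66

REG32 = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
         "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}

def encode_mov_reg32(dst, src, use16):
    """Encode MOV dst, src for two 32-bit registers.

    In a USE16 segment the default operand size is 16 bits, so the
    operand-size prefix 66H is needed to select the 32-bit form.
    """
    modrm = 0xC0 | (REG32[src] << 3) | REG32[dst]   # register-to-register
    body = bytes([0x89, modrm])                     # MOV r/m32, r32
    prefix = bytes([OPERAND_SIZE_PREFIX]) if use16 else b""
    return prefix + body

in_use16 = encode_mov_reg32("EAX", "EBX", use16=True)
in_use32 = encode_mov_reg32("EAX", "EBX", use16=False)
```

The same prefix byte is what the listing inserts by hand with DB 66H in front of LGDT and LIDT to force their 32-bit forms in a USE16 segment.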



Table 16-8. Relationship Between BLD Item and ASM Source File 

Item               ASM386 and             BLD386 Controls and          Effect 
                   Startup.A58            BLD file 
-----------------  ---------------------  ---------------------------  ----------------------------
Bootstrap          public startup         bootstrap                    Near jump at 0FFFFFFF0H to 
                   startup:               start(startup)               start 

GDT location       public GDT_EPROM       TABLE                        The location of the GDT will 
                   GDT_EPROM              GDT(location =               be programmed into the 
                   TABLE_REG <>           GDT_EPROM)                   GDT_EPROM location 

IDT location       public IDT_EPROM       TABLE                        The location of the IDT will 
                   IDT_EPROM              IDT(location =               be programmed into the 
                   TABLE_REG <>           IDT_EPROM)                   IDT_EPROM location 

RAM start          RAM_START equ 400H     memory( reserve =            RAM_START is used as the ram 
                                          (0..3FFFH))                  destination for moving the 
                                                                       tables. It must be excluded 
                                                                       from the application's 
                                                                       segment area. 

Location of the    TSS_INDEX EQU 10       TABLE GDT(                   Put the descriptor of the 
application TSS                           ENTRY=( 10:                  application TSS in GDT 
in the GDT                                PROTECTED_MODE_TASK))        entry 10 

EPROM size and     size and location of   SEGMENT startup.code         Initialization code size 
location           the initialization     (base= 0FFFF0000H)           must be less than 64K and 
                   code                   ...memory (RANGE(            resides at upper most 64K of 
                                          ROM_AREA =                   the 4GB memory space. 
                                          ROM(x..y)) 



16.7.5. STARTUP.ASM Listing 

The source code listing to move the CPU into protected mode is provided in Example 16-1. 
This listing does not include any opcode and offset information. 

Example 16-1. STARTUP.ASM 

DOS 5.0 (045-N) 386 (TM) MACRO ASSEMBLER STARTUP        09:44:51 08/19/92 PAGE 1 

DOS 5.0 (045-N) 386 (TM) MACRO ASSEMBLER V4.0, ASSEMBLY OF MODULE STARTUP 
OBJECT MODULE PLACED IN startup.obj 
ASSEMBLER INVOKED BY: f:\386tools\ASM386.EXE startup.a58 pw (132 ) 

LINE  SOURCE 

  1  NAME STARTUP 
  2 
  3 
  4  ; 
  5  ; ASSUMPTIONS: 
  6  ; 
  7  ; 1. The bottom 64K of memory is ram, and can be used for 
  8  ;    scratch space by this module. 
  9  ; 
 10  ; 2. The system has sufficient free usable ram to copy the 
 11  ;    initial GDT, IDT, and TSS 
 12  ; 
 13 
 14 
 15  ; configuration data - must match with build definition 
 16 
 17  CS_BASE         EQU     0FFFF0000H 
 18 
 19  ; CS_BASE is the linear address of the segment STARTUP_CODE 
 20  ; - this is specified in the build language file 
 21 
 22  RAM_START       EQU     400H 
 23 
 24  ; RAM_START is the start of free, usable ram in the linear 
 25  ; memory space. The GDT, IDT, and initial TSS will be 
 26  ; copied above this space, and a small data segment will be 
 27  ; discarded at this linear address. The 32-bit word at 
 28  ; RAM_START will contain the linear address of the first 
 29  ; free byte above the copied tables - this may be useful if 
 30  ; a memory manager is used. 
 31 
 32  TSS_INDEX       EQU     10 
 33 
 34  ; TSS_INDEX is the index of the TSS of the first task to 
 35  ; run after startup 
 36 
 37 
 38 
 39 
 40  ; STRUCTURES and EQU 
 41  ; structures for system data 
 42 
 43  ; TSS structure 
 44  TASK_STATE      STRUC 
 45      link            DW ? 
 46      link_h          DW ? 
 47      ESP0            DD ? 
 48      SS0             DW ? 
 49      SS0_h           DW ? 
 50      ESP1            DD ? 
 51      SS1             DW ? 
 52      SS1_h           DW ? 
 53      ESP2            DD ? 
 54      SS2             DW ? 
 55      SS2_h           DW ? 
 56      CR3_reg         DD ? 
 57      EIP_reg         DD ? 
 58      EFLAGS_reg      DD ? 
 59      EAX_reg         DD ? 
 60      ECX_reg         DD ? 
 61      EDX_reg         DD ? 
 62      EBX_reg         DD ? 
 63      ESP_reg         DD ? 
 64      EBP_reg         DD ? 
 65      ESI_reg         DD ? 
 66      EDI_reg         DD ? 
 67      ES_reg          DW ? 
 68      ES_h            DW ? 
 69      CS_reg          DW ? 
 70      CS_h            DW ? 
 71      SS_reg          DW ? 
 72      SS_h            DW ? 
 73      DS_reg          DW ? 
 74      DS_h            DW ? 
 75      FS_reg          DW ? 
 76      FS_h            DW ? 
 77      GS_reg          DW ? 
 78      GS_h            DW ? 
 79      LDT_reg         DW ? 
 80      LDT_h           DW ? 
 81      TRAP_reg        DW ? 
 82      IO_map_base     DW ? 
 83  TASK_STATE      ENDS 
 84 
 85  ; basic structure of a descriptor 
 86  DESC            STRUC 
 87      lim_0_15        DW ? 
 88      bas_0_15        DW ? 
 89      bas_16_23       DB ? 
 90      access          DB ? 
 91      gran            DB ? 
 92      bas_24_31       DB ? 
 93  DESC            ENDS 
 94 
 95  ; structure for use with LGDT and LIDT instructions 
 96  TABLE_REG       STRUC 
 97      table_lim       DW ? 
 98      table_linear    DD ? 
 99  TABLE_REG       ENDS 
100 
101  ; offset of GDT and IDT descriptors in builder generated GDT 
102  GDT_DESC_OFF    EQU     1*SIZE(DESC) 
103  IDT_DESC_OFF    EQU     2*SIZE(DESC) 
104 
105  ; equates for building temp GDT in ram 
106  LINEAR_SEL      EQU     1*SIZE(DESC)    ; selector for TEMP_GDT entry 1 
107  LINEAR_PROTO_LO EQU     00000FFFFH      ; LINEAR_ALIAS 
108  LINEAR_PROTO_HI EQU     000CF9200H 
109 
110  ; Protection Enable Bit in CR0 
111  PE_BIT          EQU     1B 
112 
113  ; 
114 
115  ; DATA SEGMENT 
116 
117  ; Initially, this data segment starts at linear 0, due to 
118  ; CPU powerup state. 
119 
120  STARTUP_DATA    SEGMENT RW 
121 
122  free_mem_linear_base    LABEL   DWORD 
123  TEMP_GDT                LABEL   BYTE    ; must be first in segment 
124  TEMP_GDT_NULL_DESC      DESC    <> 
125  TEMP_GDT_LINEAR_DESC    DESC    <> 
126 
127  ; scratch areas for LGDT and LIDT instructions 
128  TEMP_GDT_SCRATCH        TABLE_REG <> 
129  APP_GDT_RAM             TABLE_REG <> 
130  APP_IDT_RAM             TABLE_REG <> 
131  ; align end_data 
132  fill                    DW ? 
133 
134  ; last thing in this segment - should be on a dword boundary 
135  end_data                LABEL   BYTE 
136 
137  STARTUP_DATA    ENDS 
138 
139 
140 
141  ; CODE SEGMENT 
142  STARTUP_CODE    SEGMENT ER PUBLIC USE16 
143 
144  ; filled in by builder 
145      PUBLIC  GDT_EPROM 
146  GDT_EPROM       TABLE_REG <> 
147 
148  ; filled in by builder 
149      PUBLIC  IDT_EPROM 
150  IDT_EPROM       TABLE_REG <> 
151 
152  ; entry point into startup code - the bootstrap will vector 
153  ; here with a near JMP generated by the builder. This 
154  ; label must be in the top 64K of linear memory. 
155 
156      PUBLIC  STARTUP 
157  STARTUP: 
158 
159  ; DS,ES address the bottom 64K of flat linear memory 
160      ASSUME  DS:STARTUP_DATA, ES:STARTUP_DATA 
161  ; See Figure 16-4 
162  ; load GDTR with temporary GDT 
163      LEA     EBX,TEMP_GDT            ; build the TEMP_GDT in low ram, 
164      MOV     DWORD PTR [EBX],0       ; where we can address 
165      MOV     DWORD PTR [EBX]+4,0 
166      MOV     DWORD PTR [EBX]+8,LINEAR_PROTO_LO 
167      MOV     DWORD PTR [EBX]+12,LINEAR_PROTO_HI 
168      MOV     TEMP_GDT_scratch.table_linear,EBX 
169      MOV     TEMP_GDT_scratch.table_lim,15 
170 
171      DB      66H                     ; execute a 32 bit LGDT 
172      LGDT    TEMP_GDT_scratch 
173 
174  ; enter protected mode 
175      MOV     EBX,CR0 
176      OR      EBX,PE_BIT 
177      MOV     CR0,EBX 
178 
179  ; clear prefetch queue 
180      JMP     CLEAR_LABEL 
181  CLEAR_LABEL: 
182 
183  ; make DS and ES address 4G of linear memory 
184      MOV     CX,LINEAR_SEL 
185      MOV     DS,CX 
186      MOV     ES,CX 
187 
188  ; do board specific initialization 
189 
190 
191 
192 
193 
194 
195  ; See Figure 16-5 
196  ; copy EPROM GDT to ram at: 
197  ;   RAM_START + size (STARTUP_DATA) 
198      MOV     EAX,RAM_START 
199      ADD     EAX,OFFSET (end_data) 
200      MOV     EBX,RAM_START 
201      MOV     ECX,CS_BASE 
202      ADD     ECX,OFFSET (GDT_EPROM) 
203      MOV     ESI,[ECX].table_linear 
204      MOV     EDI,EAX 
205      MOVZX   ECX,[ECX].table_lim 
206      MOV     APP_GDT_ram[EBX].table_lim,CX 
207      INC     ECX 
208      MOV     EDX,EAX 
209      MOV     APP_GDT_ram[EBX].table_linear,EAX 
210      ADD     EAX,ECX 
211      REP MOVS BYTE PTR ES:[EDI],BYTE PTR DS:[ESI] 
212 
213  ; fixup GDT base in descriptor 
214      MOV     ECX,EDX 
215      MOV     [EDX].bas_0_15+GDT_DESC_OFF,CX 
216      ROR     ECX,16 
217      MOV     [EDX].bas_16_23+GDT_DESC_OFF,CL 
218      MOV     [EDX].bas_24_31+GDT_DESC_OFF,CH 
219 
220  ; copy EPROM IDT to ram at: 
221  ;   RAM_START+size (STARTUP_DATA)+SIZE (EPROM GDT) 
222      MOV     ECX,CS_BASE 
223      ADD     ECX,OFFSET (IDT_EPROM) 
224      MOV     ESI,[ECX].table_linear 
225      MOV     EDI,EAX 
226      MOVZX   ECX,[ECX].table_lim 
227      MOV     APP_IDT_ram[EBX].table_lim,CX 
228      INC     ECX 
229      MOV     APP_IDT_ram[EBX].table_linear,EAX 
230      MOV     EBX,EAX 
231      ADD     EAX,ECX 
232      REP MOVS BYTE PTR ES:[EDI],BYTE PTR DS:[ESI] 
233 
234  ; fixup IDT pointer in GDT 
235      MOV     [EDX].bas_0_15+IDT_DESC_OFF,BX 
236      ROR     EBX,16 
237      MOV     [EDX].bas_16_23+IDT_DESC_OFF,BL 
238      MOV     [EDX].bas_24_31+IDT_DESC_OFF,BH 
239 
240  ; load GDTR and IDTR 
241      MOV     EBX,RAM_START 
242      DB      66H                     ; execute a 32 bit LGDT 
243      LGDT    APP_GDT_ram[EBX] 
244      DB      66H                     ; execute a 32 bit LIDT 
245      LIDT    APP_IDT_ram[EBX] 
246 
247  ; move the TSS 
248      MOV     EDI,EAX 
249      MOV     EBX,TSS_INDEX*SIZE(DESC) 
250      MOV     ECX,GDT_DESC_OFF        ; build linear address for TSS 
251      MOV     GS,CX 
252      MOV     DH,GS:[EBX].bas_24_31 
253      MOV     DL,GS:[EBX].bas_16_23 
254      ROL     EDX,16 
255      MOV     DX,GS:[EBX].bas_0_15 
256      MOV     ESI,EDX 
257      LSL     ECX,EBX 
258      INC     ECX 
259      MOV     EDX,EAX 
260      ADD     EAX,ECX 
261      REP MOVS BYTE PTR ES:[EDI],BYTE PTR DS:[ESI] 
262 
263  ; fixup TSS pointer 
264      MOV     GS:[EBX].bas_0_15,DX 
265      ROL     EDX,16 
266      MOV     GS:[EBX].bas_24_31,DH 
267      MOV     GS:[EBX].bas_16_23,DL 
268      ROL     EDX,16 
269  ; save start of free ram at linear location RAMSTART 
270      MOV     free_mem_linear_base+RAM_START,EAX 
271 
272  ; assume no LDT used in the initial task - if necessary, 
273  ; code to move the LDT could be added, and should resemble 
274  ; that used to move the TSS 
275 
276  ; load TR 
277      LTR     BX      ; No task switch, only descriptor loading 
278  ; See Figure 16-6 
279  ; load minimal set of registers necessary to simulate task 
280  ; switch 
281 
282 
283      MOV     AX,[EDX].SS_reg         ; start loading registers 
284      MOV     EDI,[EDX].ESP_reg 
285      MOV     SS,AX 
286      MOV     ESP,EDI                 ; stack now valid 
287      PUSH    DWORD PTR [EDX].EFLAGS_reg 
288      PUSH    DWORD PTR [EDX].CS_reg 
289      PUSH    DWORD PTR [EDX].EIP_reg 
290      MOV     AX,[EDX].DS_reg 
291      MOV     BX,[EDX].ES_reg 
292      MOV     DS,AX           ; DS and ES no longer linear memory 
293      MOV     ES,BX 
294 
295  ; simulate far jump to initial task 
296      IRETD 
297 
298  STARTUP_CODE    ENDS 
*** WARNING #377 IN 298, (PASS 2) SEGMENT CONTAINS PRIVILEGED INSTRUCTION(S) 
299 
300  END STARTUP, DS:STARTUP_DATA, SS:STARTUP_DATA 
301 
302 

ASSEMBLY COMPLETE, 1 WARNING, NO ERRORS. 


16.7.6. MAIN.ASM Source Code 

The file MAIN.ASM shown in Example 16-2 defines the data and stack segments for this 
application and can be substituted with the main module task written in a high-level language 
that is invoked by the IRET instruction executed by STARTUP.ASM. 

Example 16-2. MAIN.ASM 

NAME    main_module 
data    SEGMENT RW 
        dw 1000 dup(?) 
DATA    ENDS 

stack   stackseg 800 

CODE    SEGMENT ER use32 PUBLIC 
main_start: 
        nop 
        nop 
        nop 
CODE    ENDS 

END main_start, ds:data, ss:stack 



[Figure 16-4 (diagram not reproduced): the EPROM occupies 0FFFF0000H through 0FFFFFFFFH and 
holds the startup code, which begins at START: [CS.BASE + EIP] and performs JUMP NEAR START, 
CONSTRUCT TEMP_GDT, LGDT, and MOVE TO PROTECTED MODE. Low RAM holds TEMP_GDT (GDT[0] is the 
null entry; GDT[1] has BASE = 0 and LIMIT = 4G) and GDT_SCRATCH. After the switch, 
DS, ES = GDT[1].] 

Figure 16-4. Constructing Temp_GDT and Switching to Protected Mode 
(Lines 162-172 of List File) 



[Figure 16-5 (diagram not reproduced): the startup code moves the GDT, IDT, and TSS from ROM 
(the EPROM ending at 0FFFFFFFFH) to RAM, where the copies GDT RAM, IDT RAM, and TSS RAM are 
placed above RAM_START. It then fixes the aliases and executes LTR.] 

Figure 16-5. Moving The GDT, IDT, and TSS from ROM to RAM 
(Lines 196-261 of List File) 



[Figure 16-6 (diagram not reproduced): register values are taken from the TSS copy in RAM 
above RAM_START, in this order: SS = TSS.SS; ESP = TSS.ESP; PUSH TSS.EFLAG; PUSH TSS.CS; 
PUSH TSS.EIP; ES = TSS.ES; DS = TSS.DS; IRET.] 

Figure 16-6. Task Switching 
(Lines 282-296 of List File) 



16.7.7. Supporting Files 

The batch file shown in Example 16-3 can be used to assemble the source code files 
STARTUP.ASM and MAIN.ASM and build the final application. 

Example 16-3. Batch File to Assemble, Compile and Build the Application 

ASM386 STARTUP.ASM 
ASM386 MAIN.ASM 
BLD386 STARTUP.OBJ, MAIN.OBJ buildfile(EPROM.BLD) bootstrap(STARTUP) Bootload 



The BLD386 has several functions in this example: 

• It allocates physical memory locations to segments and tables. 

• It generates tables using the build file and the input files. 

• It links object files and resolves references. 

• It generates a bootloadable file to be programmed into the EPROM. 

Example 16-4 shows the build file used as input to BLD386 to perform the above functions. 



Example 16-4. Build File 

INIT_BLD_EXAMPLE; 

SEGMENT 
        *SEGMENTS (DPL = 0) 
    ,   startup.startup_code (BASE = 0FFFF0000H) 
    ; 

TASK 
        BOOT_TASK (OBJECT = startup, INITIAL, DPL = 0, 
                   NOT INTENABLED) 
    ,   PROTECTED_MODE_TASK (OBJECT = main_module, DPL = 0, 
                   NOT INTENABLED) 
    ; 

TABLE 
    GDT ( 
        LOCATION = GDT_EPROM 
    ,   ENTRY = ( 
            10:     PROTECTED_MODE_TASK 
        ,   startup.startup_code 
        ,   startup.startup_data 
        ,   main_module.data 
        ,   main_module.code 
        ,   main_module.stack 
        ) 
    ), 

    IDT ( 
        LOCATION = IDT_EPROM 
    ); 

MEMORY 
    ( 
        RESERVE = (0..3FFFH 
                -- Area for the GDT, IDT, TSS copied from ROM 
        ,       60000H..0FFFEFFFFH) 
    ,   RANGE = (ROM_AREA = ROM (0FFFF0000H..0FFFFFFFFH)) 
                -- Eprom size 64K 
    ,   RANGE = (RAM_AREA = RAM (4000H..05FFFFH)) 
    ); 

END 




CHAPTER 17 
DEBUGGING 



The Pentium processor has advanced debugging facilities which are particularly important for 
sophisticated software systems, such as multitasking operating systems. The failure conditions 
for these software systems can be very complex and time-dependent. The debugging features 
of the Pentium processor give the system programmer valuable tools for looking at the 
dynamic state of the processor. 

The debugging support is accessed through the debug registers. The debug registers of the 
Pentium processor hold the addresses of memory and I/O locations, called breakpoints, which 
invoke debugging software (unlike the Intel386 and Intel486 processors which allowed 
debugging of memory accesses only). An exception is generated when a memory or I/O 
operation is made to one of these addresses. A breakpoint is specified for a particular form of 
memory or I/O access, such as an instruction fetch, doubleword memory write operation or a 
word I/O read operation. The debug registers support both instruction breakpoints and data 
breakpoints. 

With other processors, instruction breakpoints are set by replacing normal instructions with 
breakpoint instructions. When the breakpoint instruction is executed, the debugger is called. 
But with the debug registers of the Pentium processor, this is not necessary. By eliminating the 
need to write into the code space, the debugging process is simplified (there is no need to shadow 
the ROM code space in RAM) and breakpoints can be set in ROM-based software. In addition, 
breakpoints can be set on reads and writes to data, which allows real-time monitoring of 
variables. 

17.1. DEBUGGING SUPPORT 

The features of the architecture which support debugging include: 

• Reserved debug interrupt vector — Specifies a procedure or task to be called when an 
event for the debugger occurs. 

• Debug address registers — Specifies the addresses of up to four breakpoints. 

• Debug control register — Specifies the forms of memory or I/O access for the 
breakpoints. 

• Debug status register — Reports conditions which were in effect at the time of the 
exception. 

• Trap bit of TSS (T-bit) — Generates a debug exception when an attempt is made to 
perform a task switch to a task with this bit set in its TSS. 

• Resume flag (RF) — Suppresses multiple exceptions to the same instruction. 

• Trap flag (TF) — Generates a debug exception after every execution of an instruction. 

• Breakpoint instruction — Calls the debugger (generates a debug exception). This 
instruction is an alternative way to set code breakpoints. It is especially useful when more 
than four breakpoints are desired, or when breakpoints are being placed in the source 
code. 

• Reserved interrupt vector for breakpoint exception — Calls a procedure or task when a 
breakpoint instruction is executed. 

These features allow a debugger to be called either as a separate task or as a procedure in the 
context of the current task. The following conditions can be used to call the debugger: 

• Task switch to a specific task. 

• Execution of the breakpoint instruction. 

• Execution of any instruction. 

• Execution of an instruction at a specified address. 

• Read or write of a byte, word, or doubleword at a specified memory address. 

• Write to a byte, word, or doubleword at a specified memory address. 

• Input of a byte or word at a specified I/O address. 

• Output of a byte, word, or doubleword at a specified I/O address. 

• Attempt to change the contents of a debug register. 

17.2. DEBUG REGISTERS 

Six registers control debugging. These registers are accessed by forms of the MOV instruction. 
A debug register may be the source or destination operand for one of these instructions. The 
debug registers are privileged resources; the MOV instructions which access them may be 
executed only at privilege level 0. An attempt to read or write the debug registers from any 
other privilege level generates a general-protection exception. Figure 17-1 shows the format 
of the debug registers. 



[Figure 17-1 (register diagram not reproduced): DR0 through DR3 each hold a 32-bit breakpoint 
linear address (breakpoints 0 through 3). DR4 and DR5 are reserved. DR6 is the debug status 
register. DR7 is the debug control register; bits 31 through 16 hold the LEN3, R/W3, LEN2, R/W2, 
LEN1, R/W1, LEN0, and R/W0 fields. Shaded bits are reserved and must not be used.] 

Figure 17-1. Debug Registers 



17.2.1. Debug Address Registers (DR0-DR3)

Each of these registers holds the linear address for one of the four breakpoints. That is, 
breakpoint comparisons are made before physical address translation occurs. Each breakpoint 
condition is specified further by the contents of the DR7 register. 



17.2.2. Debug Control Register (DR7) 

The debug control register shown in Figure 17-1 specifies the type of memory or I/O access
associated with each breakpoint. Each address in registers DR0 to DR3 corresponds to a field
R/W0 to R/W3 in the DR7 register. The DE (Debug Extensions) bit in the CR4 register
determines how the R/W bits are interpreted. When the DE bit is set, the processor interprets
these bits as follows:

00 — Break on instruction execution only 

01 — Break on data writes only 

10 — Break on I/O reads or writes 

11 — Break on data reads or writes but not instruction fetches 

When the DE bit is clear, the Pentium processor interprets the R/W bits the same as the 
Intel486 and Intel386 processors, which is as follows: 

00 — Break on instruction execution only 

01 — Break on data writes only 

10 — undefined 

11 — Break on data reads or writes but not instruction fetches

The LEN0 to LEN3 fields in the DR7 register specify the size of the breakpointed location. A 
size of 1, 2, or 4 bytes may be specified. The length fields are interpreted as follows: 

00 — one-byte length 

01 — two-byte length 

10 — undefined 

11 — four-byte length 

If R/Wn is 00 (instruction execution), then LENn should also be 00. The effect of using any
other length is undefined. 

The GD bit enables the debug register protection condition that is flagged by BD of DR6. Note 
that GD is cleared at entry to the debug exception handler by the processor. This allows the 
handler free access to the debug registers. 

The low eight bits of the DR7 register (fields L0 to L3 and G0 to G3) individually enable the
four address breakpoint conditions. There are two levels of enabling: the local (L0 through L3)
and global (G0 through G3) levels. The local enable bits are automatically cleared by the
processor with every task switch to avoid unwanted breakpoint conditions in the new task. 
They are used to set breakpoint conditions in a single task. The global enable bits are not 
cleared by a task switch. They are used to enable breakpoint conditions which apply to all 
tasks. 
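The field layout described above can be sketched as a small helper that builds a DR7 value. This is a minimal illustration, not code from the manual: the function name dr7_enable and the enum names are hypothetical, the bit positions are those of the DR7 layout in Figure 17-1, and actually loading the result into DR7 would require a privileged MOV at privilege level 0, which is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* R/W field encodings from the text (BP_IO is defined only when CR4.DE is set). */
enum { BP_EXEC = 0, BP_WRITE = 1, BP_IO = 2, BP_RW = 3 };
/* LEN field encodings from the text; the value 2 is undefined. */
enum { BP_LEN1 = 0, BP_LEN2 = 1, BP_LEN4 = 3 };

/* Hypothetical helper: return dr7 with breakpoint n (0..3) configured and
   enabled. Ln is bit 2n, Gn is bit 2n+1, R/Wn occupies bits 16+4n..17+4n,
   and LENn occupies bits 18+4n..19+4n. */
static uint32_t dr7_enable(uint32_t dr7, unsigned n, unsigned rw,
                           unsigned len, int global)
{
    assert(n < 4 && rw < 4 && len < 4 && len != 2);
    dr7 &= ~((uint32_t)0xF << (16 + 4 * n));   /* clear R/Wn and LENn */
    dr7 &= ~((uint32_t)0x3 << (2 * n));        /* clear Ln and Gn */
    dr7 |= (uint32_t)rw  << (16 + 4 * n);
    dr7 |= (uint32_t)len << (18 + 4 * n);
    dr7 |= (uint32_t)1   << (2 * n + (global ? 1 : 0));
    return dr7;
}
```

For example, dr7_enable(0, 0, BP_WRITE, BP_LEN4, 0) describes a locally enabled four-byte write breakpoint on the address held in DR0.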



17.2.3. Debug Status Register (DR6) 

The debug status register shown in Figure 17-1 reports conditions sampled at the time the
debug exception was generated. Among other information, it reports which breakpoint
triggered the exception. The register is updated only if the exception is taken; in that case,
all of its bits are updated.

When an enabled breakpoint generates a debug exception, it loads the low four bits of this 
register (B0 through B3) before entering the debug exception handler. The B bit is set if the 
condition described by the DR, LEN, and R/W bits is true, even if the breakpoint is not 
enabled by the L and G bits. The processor sets the B bits for all breakpoints which match the 
conditions present at the time the debug exception is generated, whether or not they are 
enabled. 






The BT bit is associated with the T bit (debug trap bit) of the TSS (see Chapter 10 for the 
format of a TSS). The processor sets the BT bit before entering the debug handler if a task 
switch has occurred to a task with a set T bit in its TSS. There is no bit in the DR7 register to 
enable or disable this exception; the T bit of the TSS is the only enabling bit. 

The BS bit is associated with the TF flag. The BS bit is set if the debug exception was
triggered by the single-step execution mode (TF flag set). The single-step mode is the
highest-priority debug exception; when the BS bit is set, any of the other debug status bits
also may be set.

The BD bit is set if the next instruction will read or write one of the eight debug registers while
they are being used by in-circuit emulation, provided that the GD bit in DR7 is set to one.

Note that the contents of the DR6 register are never cleared by the processor. To avoid any 
confusion in identifying debug exceptions, the debug handler should clear the register before 
returning. 



17.2.4. Debug Registers DR4 and DR5 

Although debug registers 4 and 5 have been documented as reserved, previous generations of 
processors aliased these registers to debug registers 6 and 7, respectively. When debug 
extensions are not enabled (CR4.DE=0), the Pentium processor remains compatible with 
existing software by aliasing these references. However, when debug extensions are enabled 
(CR4.DE=1), attempts to reference debug registers 4 or 5 will result in an Undefined Opcode 
Exception (#UD). 
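The aliasing rule above can be written as a small lookup. This is only a sketch of the rule as stated in the text: the function name effective_debug_reg is hypothetical, the return value -1 is an arbitrary stand-in for the #UD case, and the DE bit is taken to be bit 3 of CR4.

```c
#include <assert.h>
#include <stdint.h>

#define CR4_DE (1u << 3)   /* Debug Extensions bit in CR4 */

/* Map the debug register number named in a MOV instruction to the register
   actually accessed. Returns -1 where the text says #UD is raised. */
static int effective_debug_reg(unsigned reg, uint32_t cr4)
{
    assert(reg < 8);
    if (reg == 4 || reg == 5)
        return (cr4 & CR4_DE) ? -1 : (int)reg + 2;   /* DR4->DR6, DR5->DR7 */
    return (int)reg;
}
```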



17.2.5. Breakpoint Field Recognition 

The address and LEN bits for each of the four breakpoint conditions define a range of 
sequential byte addresses for a data or I/O breakpoint. The LEN bits permit specification of a 
one-, two-, or four-byte range. Two-byte ranges must be aligned on word boundaries 
(addresses which are multiples of two) and four-byte ranges must be aligned on doubleword 
boundaries (addresses which are multiples of four). I/O breakpoints must be aligned on 
doubleword boundaries and may only be one or two bytes. I/O breakpoint addresses are zero 
extended from 16 to 32 bits for purposes of comparison with the breakpoint address in the 
selected debug register. These requirements are enforced by the processor; it uses the LEN 
bits to mask the lower address bits in the debug registers. Unaligned data or I/O breakpoint 
addresses do not yield the expected results. 

A data breakpoint for reading or writing is triggered if any of the bytes participating in an 
access is within the range defined by a breakpoint address register and its LEN bits. Table 17-1 
gives some examples of combinations of addresses and fields with references which do and do 
not cause traps. 







Table 17-1. Breakpointing Examples

Operation                            Address (hex)   Length (in bytes)
----------------------------------   -------------   -----------------
Register Contents: DR0               A0001           1 (LEN0 = 00)
Register Contents: DR1               A0002           1 (LEN1 = 00)
Register Contents: DR2               B0002           2 (LEN2 = 01)
Register Contents: DR3               C0000           4 (LEN3 = 11)

Data Operations Which Trap           A0001           1
                                     A0002           1
                                     A0001           2
                                     A0002           2
                                     B0002           2
                                     B0001           4
                                     C0000           4
                                     C0001           2
                                     C0003           1

Data Operations Which Do Not Trap    A0000           1
                                     A0003           4
                                     B0000           2
                                     C0004           4



A data breakpoint for an unaligned operand can be made from two sets of entries in the 
breakpoint registers where each entry is byte-aligned, and the two entries together cover the 
operand. This breakpoint generates exceptions only for the operand, not for any neighboring 
bytes. 
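The matching rule behind Table 17-1 can be modeled in a few lines. The sketch below is an illustration under stated assumptions, not the processor's actual comparison logic: the names len_bytes and data_bp_hit are hypothetical, and the model simply masks the low address bits according to LEN and tests whether the accessed bytes overlap the resulting range.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes selected by a LEN field (00, 01, or 11). */
static uint32_t len_bytes(unsigned len)
{
    assert(len == 0 || len == 1 || len == 3);
    return len + 1;
}

/* Nonzero if an access of `size` bytes at `addr` touches any byte of the
   range described by breakpoint address `dr` and its LEN field. */
static int data_bp_hit(uint32_t dr, unsigned len, uint32_t addr, uint32_t size)
{
    uint32_t n = len_bytes(len);
    uint64_t lo = dr & ~(n - 1);          /* LEN masks the low address bits */
    uint64_t hi = lo + n;                 /* one past the last covered byte */
    return addr < hi && (uint64_t)addr + size > lo;
}
```

With DR2 = B0002 and LEN2 = 01, data_bp_hit(0xB0002, 1, 0xB0001, 4) is nonzero and data_bp_hit(0xB0002, 1, 0xB0000, 2) is zero, matching the corresponding rows of the table.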

Instruction breakpoint addresses must have a length specification of one byte (LEN = 00); the 
behavior of code breakpoints for other operand sizes is undefined. The processor recognizes an 
instruction breakpoint address only when it points to the first byte of an instruction. If the 
instruction has any prefixes, the breakpoint address must point to the first prefix. 

It is recommended that debuggers execute the LGDT instruction before returning to the 
program being debugged to ensure that breakpoints are detected. 



17.3. DEBUG EXCEPTIONS 

Two of the interrupt vectors of the Pentium processor are reserved for debug exceptions. The 
debug exception is the usual way to invoke debuggers designed for the Pentium processor. 



17.3.1. Interrupt 1 — Debug Exceptions 

The handler for this exception usually is a debugger or part of a debugging system. The 
processor generates a debug exception for any of several conditions. The debugger can check 
flags in the DR6 and DR7 registers to determine which condition caused the exception and 
which other conditions also might apply. Table 17-2 shows the states of these bits for each 
kind of breakpoint condition. 






Instruction breakpoints are faults; other debug exceptions are traps. The debug exception may 
report either or both at one time. The following sections present details for each class of debug 
exception. 



Table 17-2. Debug Exception Conditions

Flags Tested                         Description
----------------------------------   ----------------------------------------------
BS = 1                               Single-step trap
B0 = 1 and (GE0 = 1 or LE0 = 1)      Breakpoint defined by DR0, LEN0, and R/W0
B1 = 1 and (GE1 = 1 or LE1 = 1)      Breakpoint defined by DR1, LEN1, and R/W1
B2 = 1 and (GE2 = 1 or LE2 = 1)      Breakpoint defined by DR2, LEN2, and R/W2
B3 = 1 and (GE3 = 1 or LE3 = 1)      Breakpoint defined by DR3, LEN3, and R/W3
BD = 1                               Debug registers in use for in-circuit emulation
BT = 1                               Task switch
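A debug handler might apply the tests of Table 17-2 roughly as follows. This is a sketch with hypothetical names (bp_reported, debug_causes, and the CAUSE_ constants). It assumes that the per-breakpoint enables written GEn and LEn in the table are the Gn and Ln bits in the low byte of DR7, and that BD, BS, and BT are bits 13, 14, and 15 of DR6. The dr6 and dr7 arguments are values already read from the registers.

```c
#include <assert.h>
#include <stdint.h>

#define DR6_BD (1u << 13)
#define DR6_BS (1u << 14)
#define DR6_BT (1u << 15)

/* Causes of one debug exception, as a bit set (bits 0..3 are breakpoints 0..3). */
enum { CAUSE_STEP = 1 << 4, CAUSE_GD = 1 << 5, CAUSE_TASK = 1 << 6 };

/* Nonzero if breakpoint n matched (Bn) and is enabled (Ln or Gn). */
static int bp_reported(uint32_t dr6, uint32_t dr7, unsigned n)
{
    assert(n < 4);
    return ((dr6 >> n) & 1) && ((dr7 >> (2 * n)) & 3);
}

static unsigned debug_causes(uint32_t dr6, uint32_t dr7)
{
    unsigned causes = 0;
    for (unsigned n = 0; n < 4; n++)
        if (bp_reported(dr6, dr7, n))
            causes |= 1u << n;
    if (dr6 & DR6_BS) causes |= CAUSE_STEP;
    if (dr6 & DR6_BD) causes |= CAUSE_GD;
    if (dr6 & DR6_BT) causes |= CAUSE_TASK;
    return causes;
}
```

Filtering the B bits through the enable bits matters because the processor sets a B bit for every matching breakpoint, enabled or not.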



17.3.1.1. INSTRUCTION-BREAKPOINT FAULT

The processor reports an instruction breakpoint before it executes the breakpointed instruction 
(i.e., a debug exception caused by an instruction breakpoint is a fault). 

The RF flag permits the debug exception handler to restart instructions which cause faults 
other than debug faults. When a debug fault occurs, the system software must set the RF bit in 
the copy of the EFLAGS register which is pushed on the stack in the debug exception handler 
routine. This bit is set in preparation for resuming the program's execution at the breakpoint 
address without generating another breakpoint fault on the same instruction. (Note: The RF bit 
does not cause breakpoint traps to be ignored, nor other kinds of faults.) The RF flag is set by 
the IRETD instruction (but not by POPF or POPFD) to the value specified by the saved copy 
of the EFLAGS register in order to disable the generation of a code breakpoint exception on 
the instruction immediately following the IRETD. 

The processor clears the RF flag at the successful completion of every instruction except after 
the IRET instruction and JMP, CALL, or INT instructions which cause a task switch. 

The processor does not set the RF flag in the copy of the EFLAGS register pushed on the stack 
before entry into any fault handler. When the fault handler is entered for instruction 
breakpoints, for example, the debug handler should set the RF flag in the copy of the EFLAGS 
register pushed on the stack; so that when the IRET instruction is executed, returning control 
from the exception handler, the RF flag in the EFLAGS register will be set, and execution will 
resume at the breakpointed instruction without generating another breakpoint for the 
same instruction. 

Code breakpoints are the highest priority faults and are therefore guaranteed to be serviced 
before any other faults which may be detected during the decoding or execution of an 
instruction. If after a debug fault, the RF flag is set and the debug handler retries the faulting 
instruction, it is possible that retrying the instruction will generate other faults. The restart of 
the instruction after these faults also occurs with the RF flag set, so repeated debug faults 
continue to be suppressed. The processor clears the RF flag only after successful completion of 
the instruction. 
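The handler's part in this sequence is a single bit operation on the saved EFLAGS image. The sketch below assumes a hypothetical saved_frame structure that mirrors the three doublewords pushed for an exception taken without a privilege change (EIP, CS, EFLAGS) and takes RF to be bit 16 of EFLAGS. Obtaining a pointer to the frame and returning with IRETD are left to the surrounding handler.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_RF (1u << 16)   /* resume flag */

/* Hypothetical view of the frame pushed on entry to the debug handler. */
struct saved_frame {
    uint32_t eip;      /* address of the breakpointed instruction */
    uint32_t cs;
    uint32_t eflags;   /* copy of EFLAGS that IRETD will restore */
};

/* Set RF in the saved image so that resuming at frame->eip does not raise
   the same instruction-breakpoint fault again. */
static void resume_past_instruction_bp(struct saved_frame *frame)
{
    assert(frame != 0);
    frame->eflags |= EFLAGS_RF;
}
```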






17.3.1.2. DATA MEMORY AND I/O BREAKPOINTS

Data memory and I/O breakpoint exceptions are traps; i.e., the processor generates an 
exception for a breakpoint after executing the instruction which accesses the breakpointed 
memory or I/O location. 

Because data breakpoints are traps, the original data is overwritten before the trap exception is 
generated. If a debugger needs to save the contents of a write breakpoint location, it should 
save the original contents before setting the breakpoint. The handler can report the saved 
value after the breakpoint is triggered. The data in the debug registers can be used to address 
the new value stored by the instruction which triggered the breakpoint. 

The Pentium processor, like the Intel486 processor, ignores the GE and LE bits in DR7. If any 
of the Ln/Gn bits is set (or single stepping is enabled), instruction pairing is inhibited and the 
Pentium processor slows execution so that most breakpoints are reported exactly on the 
instruction that generated them. In the Intel386 DX processor, exact data breakpoint matching 
does not occur unless it is enabled by setting either the LE or the GE bit. 

The Pentium processor, however, cannot report data breakpoints exactly for the REP MOVS
and REP STOS instructions. For these instructions, the breakpoint is reported at the
completion of the iteration after the one in which the breakpoint occurs. This delay allows
the processor to execute the load, the store, the updates to ESI, EDI, and ECX, and the check
for completion on each iteration of these repeated instructions in a single clock.

Repeated INS and OUTS instructions that generate an I/O breakpoint debug exception trap
after the completion of the first iteration. Repeated INS and OUTS instructions that generate a
memory breakpoint debug exception trap after the iteration in which the memory address
breakpoint location is accessed.

17.3.1.3. GENERAL-DETECT FAULT 

The general-detect fault occurs when an attempt is made to use the debug registers at the same 
time they are being used by in-circuit emulation when the GD bit in DR7 is set to one. This 
additional protection feature guarantees that emulators can have full control over the debug 
registers when required. The exception handler can detect this condition by checking the state 
of the BD bit of the DR6 register. 

17.3.1.4. SINGLE-STEP TRAP 

This trap occurs if the TF flag was set before the instruction was executed. Note that the 
exception does not occur after an instruction which sets the TF flag. For example, if the POPF 
instruction is used to set the TF flag, a single-step trap does not occur until after the instruction 
following the POPF instruction. 

The processor clears the TF flag before calling the exception handler. If the TF flag was set in 
a TSS at the time of a task switch, the exception occurs after the first instruction is executed in 
the new task. 

The single-step flag normally is not cleared by privilege changes inside a task. The INT
instructions, however, do clear the TF flag. Therefore, software debuggers which single-step
code must recognize and emulate INT n or INTO instructions rather than executing them
directly. To maintain protection, the operating system should check the current execution
privilege level after any single-step trap to see if single stepping should continue at the current
privilege level.

The interrupt priorities guarantee that, if an external interrupt occurs, single stepping stops. 
When both an external interrupt and a single step interrupt occur together, the single step 
interrupt is processed first. This clears the TF flag. After saving the return address or switching 
tasks, the external interrupt input is examined before the first instruction of the single step 
handler executes. If the external interrupt is still pending, then it is serviced. The external 
interrupt handler does not run in single-step mode. To single step an interrupt handler, single 
step an INT n instruction which calls the interrupt handler.

17.3.1.5. TASK-SWITCH TRAP

The debug exception also occurs after a task switch if the T bit of the new task's TSS is set. 
The exception occurs after control has passed to the new task, but before the first instruction of 
that task is executed. The exception handler can detect this condition by examining the BT bit 
of the DR6 register. 

Note that, if the debug exception handler is a task, the T bit of its TSS should not be set. 
Failure to observe this rule will put the processor in a loop. 



17.3.2. Interrupt 3 — Breakpoint Instruction

The breakpoint trap is caused by execution of the INT 3 instruction. Typically, a debugger 
prepares a breakpoint by replacing the first opcode byte of an instruction with the opcode for 
the breakpoint instruction. When execution of the INT 3 instruction calls the exception 
handler, the return address points to the first byte of the instruction following the INT 3 
instruction. 

With older processors, this feature is used extensively for setting instruction breakpoints. With 
the Pentium, Intel486, and Intel386 processors, this use is more easily handled using the debug 
registers. However, the breakpoint exception still is useful for breakpointing debuggers, 
because the breakpoint exception can call another exception handler. The breakpoint exception 
also can be useful when it is necessary to set a greater number of breakpoints than permitted by 
the debug registers, or when breakpoints are being placed in the source code of a program 
under development. 






CHAPTER 18 

CACHING, PIPELINING AND BUFFERING 



The Pentium processor has many features that work together to yield extremely high 
performance — features such as caches, buffers, and pipelining. In general, these features 
work behind the scenes; that is, programs automatically run faster without having to explicitly 
take these performance features into account. In spite of this transparent implementation, some 
programmers may wish to take maximum advantage of these features. This chapter provides 
the information necessary to do so. It also documents the few cases in which systems 
programmers must explicitly take these performance features into account. The features 
discussed are: 

• Internal instruction and data caches. 

• Address translation caches. 

• Prefetch queues. 

• Write buffers. 

• Execution pipelining. 



18.1. INTERNAL INSTRUCTION AND DATA CACHES 

The Pentium microprocessor has separate data and instruction caches on-chip. Caches raise 
system performance by satisfying an internal read request more quickly than a bus cycle to 
memory. They also reduce the processor's use of the external bus when the same locations are 
accessed multiple times. Having separate caches for instructions and data allows simultaneous 
cache look-up. Up to two data references and up to 32 bytes of raw opcodes can be accessed in 
one clock. The caches are fully transparent to applications software. 

Caching is available in all execution modes: real mode, protected mode, and virtual-8086 
mode. For a properly designed, single-processor system, the caching does not require further 
control once it is enabled during system initialization. 

The data and instruction caches hold 8K bytes each. The cache line width of the Pentium CPU 
is 256 bits or 32 bytes. A line can be filled from memory with a four-transfer burst cycle. 
External caches are not likely to use cache lines smaller than those of the internal cache. 

Cache lines can only be mapped to 32-byte aligned blocks of main memory. (A 32-byte 
aligned block begins at an address which is clear in its low-order five bits.) The caches do not 
support partially-filled cache lines, so caching even a single doubleword requires caching an 
entire line. 
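The alignment rule can be made concrete with a little address arithmetic. The helper names line_base and lines_touched below are hypothetical, and the sketch only models which 32-byte aligned blocks an access falls in. It says nothing about whether those lines are actually cached.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32u

/* Address of the 32-byte aligned block containing addr (low five bits clear). */
static uint32_t line_base(uint32_t addr)
{
    return addr & ~(CACHE_LINE_BYTES - 1);
}

/* Number of cache lines touched by an access of `size` bytes at `addr`. */
static uint32_t lines_touched(uint32_t addr, uint32_t size)
{
    assert(size > 0);
    uint32_t first = line_base(addr);
    uint32_t last  = line_base(addr + size - 1);
    return (last - first) / CACHE_LINE_BYTES + 1;
}
```

A doubleword at 101E (hex), for instance, straddles the lines based at 1000 and 1020, so caching it involves two full lines.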

The processor allows any area of memory to be cached, although both software and hardware 
can disallow certain areas from being cached — software by setting the PCD bit in the 
respective page table entries; hardware by deasserting the KEN# signal for bus cycles with 
addresses that fall within those areas. When both software and hardware agree that a requested 
datum is cacheable, the processor reads an entire 32-byte line into the appropriate cache. This 
operation is called a cache line fill. Cache line fills are generated only for read misses, not for
write misses. A store that misses the cache does not copy the missed line into cache from
memory, but rather posts the datum in a write buffer, then sends it to the external bus when the 
bus is available. 

The CPU can use an external second-level cache outside of the processor chip. An external 
cache can improve performance by providing a larger cache or wider line, or by allowing the 
processor bus to run faster than the memory bus. 

Caches require special consideration in multiprocessor systems. When one processor accesses 
data cached in another processor, it must not receive incorrect data. If it modifies data, all other 
processors which access that data must receive the modified data. This property is called cache 
consistency. The CPU provides mechanisms which maintain cache consistency in the presence 
of multiple processors and external caches. 

The operation of internal and external caches is transparent to application software, but 
knowledge of the behavior of these caches may be useful in optimizing software performance. 
For example, knowledge of cache dimensions and replacement algorithms indicates how large
a data structure can be operated on at once without causing cache thrashing. In
multiprocessor systems, maintenance of cache consistency may, in rare circumstances, require 
intervention by system software. For these rare cases, the Pentium microprocessor provides 
privileged cache control operations. 



18.1.1. Data Cache 

In the data cache, a cache protocol known as MESI maintains consistency with caches of other 
processors and with an external cache. The data cache has two status bits per tag; so, each line 
can be in one of the states defined in Table 18-1. The state of a cache line can change as the 
result of either internal or external activity related to that line. In general, the operation of the 
MESI protocol is transparent to programs. 



Table 18-1. MESI Cache Line States

Cache Line State:            M                E                S                    I
                             Modified         Exclusive        Shared               Invalid
--------------------------   --------------   --------------   ------------------   ----------------
This cache line is valid?    Yes              Yes              Yes                  No

The memory copy is...        ...out of date   ...valid         ...valid

Copies exist in caches of    No               No               Maybe                Maybe
other processors?

A write to this line...      ...does not go   ...does not go   ...goes to bus and   ...goes directly
                             to bus           to bus           updates cache        to bus
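The last row of Table 18-1 can be read as a simple decision on the line state. The sketch below encodes only that row. The enum and function names are hypothetical, and the state changes a write may cause (which depend on WB/WT# and other signals) are deliberately not modeled.

```c
#include <assert.h>

enum mesi { MESI_M, MESI_E, MESI_S, MESI_I };

/* Nonzero if, per Table 18-1, a write to a line in `state` goes to the bus. */
static int write_goes_to_bus(enum mesi state)
{
    assert((unsigned)state <= (unsigned)MESI_I);
    return state == MESI_S || state == MESI_I;
}

/* Nonzero if the write updates a copy held in the cache. A write to an
   invalid line goes directly to the bus and leaves the cache untouched. */
static int write_updates_cache(enum mesi state)
{
    assert((unsigned)state <= (unsigned)MESI_I);
    return state != MESI_I;
}
```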



18.1.2. Data Cache Update Policies 

A cache adheres to an update policy to determine when a write operation must update main 
memory. (The update policy does not affect read operations.) The update policies supported by 
the Pentium microprocessor data cache are: 

• Write-through — a write request to a line in the cache triggers updates to both cache 
memory and main memory. Write-through is useful for applications such as a graphics 
frame buffer, where writes must update memory so that they can be seen on the graphics 
display. 

• Write-back — a write request to a line in the cache updates only the cache memory. The 
write-back policy reduces bus traffic by eliminating many unnecessary writes to memory. 
Writes to a line in the cache are not immediately forwarded to main memory; instead, they 
are accumulated in the cache. The modified cache line is written to main memory later, 
when a write-back operation is performed. Write-back operations are triggered when 
cache lines need to be deallocated, such as when new cache lines are being allocated in a 
cache which is already full. Write-back operations also are triggered by the mechanisms 
used to maintain cache consistency. 

The processor allows any area of memory to be subject to either policy. Both software and 
hardware have control over which policy is employed — software through the PWT bit of 
page table entries; hardware through the WB/WT# signal. 

The internal caches of the Pentium microprocessor can be used with external caches which are 
write-through, write-back, or a mixture of both. 

18.1.3. Instruction Cache

The instruction cache implements only the "SI" part of the MESI protocol, because the 
instruction cache is not writable. 

The instruction cache monitors changes in the data cache to maintain consistency between the 
caches when instructions are modified. For more information, refer to Section 18.2.3. 



18.2. OPERATION OF THE INTERNAL CACHES 

Software controls the operating mode of the caches by setting or clearing the CD and NW bits
of CR0. After RESET, both bits are set to one (caches disabled). Software can leave caching
disabled, or it can enable caching by updating the CD and NW bits of CR0.



18.2.1. Cache Control Bits

Table 18-2 summarizes the modes controlled by the CD and NW bits of CR0. For normal
operation and highest performance, these bits should be set to zero. To completely disable the
cache, the following two steps must be performed:

1. CD and NW must be set to 1.

2. The caches must be flushed.

If the cache is not flushed, cache hits on reads will still occur and data will be read from the 
cache. In addition, the cache must be flushed after being disabled to prevent any 
inconsistencies with main memory. 







Table 18-2. Cache Operating Modes

CD   NW   Purpose/Description
--   --   ----------------------------------------------------------------------
0    0    Normal highest performance cache operation.
          Read hits access the cache.
          Read misses may cause replacement.
          Write hits update the cache.
          Only writes to shared lines and write misses appear externally.
          Write hits can change shared lines to exclusive under control of
          WB/WT#.
          Invalidation is allowed.

0    1    Invalid setting.
          A general-protection exception with an error code of zero is generated.

1    0    Cache disabled. Memory consistency maintained. Existing
          contents locked in cache.
          Read hits access the cache.
          Read misses do not cause replacement.
          Write hits update cache.
          Only write hits to shared lines and write misses update memory.
          Write hits can change shared lines to exclusive under control of
          WB/WT#.
          Invalidation is allowed.

1    1    Cache disabled. Memory consistency not maintained.
          Read hits access the cache.
          Read misses do not cause replacement.
          Write hits update cache but not memory.
          Write hits change exclusive lines to modified.
          Shared lines remain shared after write hit.
          Write misses access memory.
          Invalidation is inhibited.
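The four CD/NW combinations can be sketched as operations on a CR0 image. The names below are hypothetical, and CD and NW are taken to be bits 30 and 29 of CR0. The code only computes values: writing CR0 and flushing the caches (for example with WBINVD) are privileged operations that a real initialization routine would perform separately.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_NW (1u << 29)
#define CR0_CD (1u << 30)

enum cache_mode {
    CACHE_NORMAL,             /* CD=0, NW=0 */
    CACHE_INVALID_SETTING,    /* CD=0, NW=1: general-protection exception */
    CACHE_DISABLED_COHERENT,  /* CD=1, NW=0: memory consistency maintained */
    CACHE_DISABLED_STALE      /* CD=1, NW=1: memory consistency not maintained */
};

static enum cache_mode cache_mode_of(uint32_t cr0)
{
    if (cr0 & CR0_CD)
        return (cr0 & CR0_NW) ? CACHE_DISABLED_STALE : CACHE_DISABLED_COHERENT;
    return (cr0 & CR0_NW) ? CACHE_INVALID_SETTING : CACHE_NORMAL;
}

/* Return cr0 with the requested CD and NW bits; other bits are preserved. */
static uint32_t cr0_with_cache_bits(uint32_t cr0, int cd, int nw)
{
    assert(cd || !nw);                 /* reject the invalid CD=0, NW=1 setting */
    cr0 &= ~(CR0_CD | CR0_NW);
    if (cd) cr0 |= CR0_CD;
    if (nw) cr0 |= CR0_NW;
    return cr0;
}
```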



18.2.2. Cache Management Instructions 

The INVD and WBINVD instructions are used to invalidate the contents of the internal and 
external caches. The INVD instruction invalidates all internal (data and instruction) cache 
entries and generates a special bus cycle which indicates that external caches also should be 
invalidated. (The response of external hardware to receiving a cache invalidation bus cycle is 
dependent on system implementation.) INVD should be used with care. It does not write back 
modified cache lines; therefore, it can cause the data cache to become inconsistent with other 
memories in the system. Unless there is a specific requirement or benefit to invalidate a cache
without writing back the modified lines (e.g., testing or fault recovery where cache coherency
with main memory is not a concern), software should use the WBINVD instruction.

The WBINVD instruction first writes back any modified lines in the data cache, then 
invalidates the contents of its instruction and data caches. It ensures that cache coherency with 
main memory will be maintained regardless of system configuration (i.e., write-through or 
write-back). Following this, it generates special bus cycles to indicate that external caches 
should also write back modified data and invalidate their contents. 






18.2.3. Self-Modifying Code 

Unlike the Intel486 microprocessor, the Pentium microprocessor has separate caches for data 
and instructions. In spite of this difference in implementation, the Pentium microprocessor 
supports updates to instructions in a manner that is completely compatible with the Intel486 
microprocessor. 

A write to an instruction that is in the instruction cache causes the instruction to be invalidated 
in the instruction cache. This check is based on the physical address of the instruction. In 
addition, the Pentium microprocessor checks whether a write may modify an instruction that 
has been prefetched for execution; if so, it invalidates the prefetch queue. This check is based 
on the linear address of the instruction. 

Because the linear address of the write is checked against the linear address of the instructions 
that have been prefetched, special care must be taken for self-modifying code to work correctly 
when the physical addresses of the instruction and the written data are the same, but the linear 
addresses differ. In such cases, it is necessary to execute a serializing operation after the write 
and before executing the modified instruction. See the section on serializing operations below 
for more information. (Note that the check on linear addresses described above is not in 
practice a concern for compatibility. Applications that include self-modifying code use the 
same linear address for modifying and fetching the instruction. Systems software, such as a 
debugger, that might possibly modify an instruction using a different linear address than that 
used to fetch the instruction, will execute a serializing operation, such as IRET, before the 
modified instruction is executed.) 



18.3. PAGE-LEVEL CACHE MANAGEMENT 

When paging is enabled, two bits in entries of the page directory and second-level page tables 
are used to manage the caching of pages and to drive processor output pins. (These bits are 
reserved on Intel386 processors.) 

The PCD and PWT bits control caching on a page-by-page basis. The PCD bit (page-level 
cache disable) affects the operation of the internal cache. Both the PCD bit and the PWT bit 
(page-level write-through) drive processor output pins (called PCD and PWT) for controlling 
external caches. The treatment of these signals by external hardware depends on system 
design; for example, some hardware systems may control the caching of pages by decoding 
some of the high address bits. 

There are three potential sources of the bits used to drive the PCD and PWT outputs of the 
processor: the CR3 register, the page directory, and the second-level page tables. The 
processor outputs are driven by the CR3 register for bus cycles where paging is not used to 
generate the address, such as the loading of an entry in the page directory. The outputs are 
driven by a page directory entry when an entry from a second-level page table is accessed. The 
outputs are driven by a second-level page table entry when instructions or data in memory are 
accessed. When paging is disabled, these bits are ignored (that is, the CPU assumes PCD=1 
and PWT=1). See Chapter 9 for descriptions of the PCD and PWT bits in CR3. 






18.3.1. PCD Bit 

When the PCD bit of a page table entry is set, caching of data from the page is disabled, even 
if hardware requests caching by asserting the KEN# input. When the PCD bit is clear, caching 
may be requested by hardware on a cycle-by-cycle basis. 

The ability to disable caching is useful for pages which contain memory-mapped I/O ports and 
for pages which do not provide a performance benefit when cached, such as initialization 
software. 

Regardless of the page-table entries, the processor ignores the PCD output (i.e. assumes 
PCD=1) whenever the CD (Cache Disable) bit in CRO is set. 



18.3.2. PWT Bit

When a page table entry has a set PWT bit (bit position 3), a write-through caching policy is 
specified for data in the corresponding page. Clearing the PWT bit on the Pentium 
microprocessor enables a write-back policy for the page. External caches can also use the 
output signal driven by the PWT bit to control update policy on a page-by-page basis. 
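As an illustration of the two page-level bits, the sketch below edits a page table entry image. The names are hypothetical. PWT is bit 3 as stated above, and PCD is assumed to be the adjacent bit 4. The helper changes only these two bits, and the effective policy still depends on the CD bit of CR0 and on the KEN# and WB/WT# inputs described elsewhere in this chapter.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PWT (1u << 3)   /* page-level write-through */
#define PTE_PCD (1u << 4)   /* page-level cache disable */

enum page_policy { PAGE_WRITE_BACK, PAGE_WRITE_THROUGH, PAGE_UNCACHED };

/* Return the entry with PWT and PCD set for the requested policy. */
static uint32_t pte_set_policy(uint32_t pte, enum page_policy p)
{
    assert((unsigned)p <= (unsigned)PAGE_UNCACHED);
    pte &= ~(PTE_PWT | PTE_PCD);
    if (p == PAGE_WRITE_THROUGH) pte |= PTE_PWT;
    if (p == PAGE_UNCACHED)      pte |= PTE_PCD;
    return pte;
}
```

An operating system might use PAGE_UNCACHED for pages that hold memory-mapped I/O ports and PAGE_WRITE_THROUGH for a frame buffer, following the uses suggested in the text.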

18.4. ADDRESS TRANSLATION CACHES 

Refer to Chapter 11 for information on the address translation caches (TLBs).

18.5. CACHE REPLACEMENT ALGORITHM 

The data and instruction caches use a least-recently-used (LRU) algorithm to choose which line of 
a set is overwritten when a miss causes a line fill and all lines in the set contain valid data. The 
address-translation cache uses a pseudo-LRU algorithm. These algorithms are controlled by 
LRU bits in the tags of each cache. The states of the valid bits take precedence over the LRU 
bits. If any of the lines in the set is invalid, an invalid line is used for the line fill, and the LRU 
bits are not used. RESET initializes the valid bits so that two Pentium CPUs executing the 
same code on identical boards have exactly the same series of cache hits, misses, and 
replacements. 
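
The precedence of valid bits over LRU bits can be sketched as follows. This is a simplified software model, not the hardware's tag encoding: the set is represented by a list of valid flags and an explicit least-to-most-recently-used ordering, both of which are assumptions of the example.

```python
def choose_victim(valid, lru_order):
    """Pick the line of a set to overwrite on a line fill.

    valid     -- list of booleans, one per line in the set
    lru_order -- line indices ordered from least to most recently used
    """
    # Valid bits take precedence: any invalid line is used first.
    for index, is_valid in enumerate(valid):
        if not is_valid:
            return index
    # All lines hold valid data, so fall back to the least recently used line.
    return lru_order[0]
```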



18.6. EXECUTION PIPELINING AND PAIRING 

The Pentium processor achieves approximately two times the integer execution speed of the 
Intel486 microprocessor through a superscalar architecture capable of executing two 
instructions in parallel. Two pipelines operate in parallel allowing integer instructions to 
execute in a single clock in each pipeline. The allocation of instructions to a pipeline is 
performed automatically by the processor. The processor preserves the appearance of strict 
sequential execution even in the presence of interrupts and exceptions. 






Refer to Appendix H for information about how to optimize programs to exploit the 
performance potential of the Pentium processor. 



18.7. WRITE BUFFERS 

The Pentium processor utilizes write buffers for memory operands and for each pipeline. Write 
buffers improve performance by allowing the processor to proceed with the next pair of 
instructions even though one of the current instructions writes to memory when the bus is 
busy. The write buffers can be filled in parallel when instructions in both pipes write to memory 
during the same clock; however, they are always emptied in the same sequence in which the 
write requests were generated by software. 
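
The ordering guarantee described above behaves like a first-in, first-out queue. The sketch below is a toy model under that assumption; the class name `WriteBufferModel` and its methods are invented for illustration and say nothing about the number or size of the real buffers.

```python
from collections import deque


class WriteBufferModel:
    """Toy model: writes may be posted together but drain in program order."""

    def __init__(self):
        self._pending = deque()
        self.memory_log = []     # order in which writes reach the bus

    def post(self, *writes):
        """Accept the writes issued in one clock, listed in program order."""
        self._pending.extend(writes)

    def drain_one(self):
        """Called when the bus is free: retire the oldest pending write."""
        if self._pending:
            self.memory_log.append(self._pending.popleft())
```

Posting two writes in the same clock and a third in the next clock, then draining three times, leaves `memory_log` in the same order as the writes were generated.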

In general, the existence of these buffers is transparent to programmers. The Pentium processor 
ensures that memory read operations are never reordered ahead of prior pending write 
operations; however, for compatibility with future processors, programmers should follow the 
ordering guidelines presented in Chapter 19. Refer also to Chapter 15 for information about the 
interaction of I/O instructions with the memory write buffers. 



18.8. SERIALIZING INSTRUCTIONS 

After executing certain instructions the Pentium processor serializes instruction execution. This 
means that any modifications to flags, registers, and memory for previous instructions are 
completed before the next instruction is fetched and executed. For example, when a new value 
is loaded into CR0 to enable protected mode, the processor always fetches the next instruction 
with protection enabled. 

When the processor serializes instruction execution, it ensures that it has completed any 
modifications to memory, including flushing any internally buffered stores; it then waits for 
the EWBE# pin to go active before fetching and executing the next instruction. Pentium 
processor systems can use the EWBE# pin to indicate that a store is pending externally. In this 
manner, a system designer can ensure that all externally pending stores are complete before the 
processor begins to fetch and execute the next instruction. 

The processor serializes instruction execution after executing any of the following instructions: 



CPUID     LGDT    MOV to Debug Register 
INVD      LIDT    MOV to Control Register 
INVLPG    LLDT    RSM 
IRET      LTR     WBINVD 
IRETD             WRMSR 



The CPUID instruction can be executed at any privilege level to serialize instruction execution. 






With regard to serialization, note that: 

1. The Pentium processor does not generally write back the contents of modified data in its 
data cache to external memory when it serializes instruction execution. Software can force 
modified data to be written back by executing the instruction WBINVD. 

2. Whenever an instruction is executed to enable/disable paging (that is, change the PG bit of 
CR0), this instruction must be followed with a jump. The instruction at the target of the 
branch is fetched with the new value of PG (i.e., paging enabled/disabled); however, the 
jump instruction itself is fetched with the previous value of PG. Intel386, Intel486 and 
Pentium processors have slightly different requirements in this regard. See Chapter 23 for 
more information. In all other respects a MOV to CR0 that changes PG is serializing. Any 
MOV to CR0 that does not change PG is completely serializing. 

3. Whenever an instruction is executed to change the contents of CR3 while paging is 
enabled, the next instruction is fetched using the translation tables that correspond to the 
new value of CR3. Therefore the MOV to CR3 instruction and the sequentially following 
instructions should be located on a page whose linear address is mapped to the same 
physical address by both the old and new values of CR3. 

4. The Pentium processor implements branch-prediction techniques to improve performance 
by prefetching the destination of a branch instruction before the branch instruction is 
executed. Consequently, instruction execution is not generally serialized when a branch 
instruction is executed. 

5. The I/O instructions are not completely serializing; the processor does not wait for these 
instructions to complete before it prefetches the next instruction. However, they do have 
some serializing properties that cause them to function in a manner that is compatible with 
processor generations prior to the Pentium processor. Refer to Chapter 15 for more 
information. 






CHAPTER 19 
MULTIPROCESSING 



The Pentium processor supports multiprocessing both on the processor bus and on a memory 
bus via secondary cache units. Due to the high bandwidth demands of multiprocessor systems, 
Intel recommends the use of a secondary cache. 

Multiprocessors can increase particular aspects of system performance. For example, a 
computer graphics system may use an i860 CPU for fast rendering of raster images, while a 
Pentium processor is used to support a standard operating system, such as UNIX, IBM OS/2, 
or Microsoft Windows. Alternatively, multiple Pentium microprocessors can be used in a 
symmetric system architecture with an operating system such as multiprocessor UNIX. 

Multiprocessing systems are sensitive to the following design issues: 

• Maintaining cache consistency — When one processor accesses data cached in another 
processor, it must not receive incorrect data. If it modifies data, all other processors which 
access that data must receive the modified data. 

• Reliable communication — Processors need to be able to communicate with each other in a 
way which eliminates interference when more than one processor simultaneously accesses 
the same area in memory. 

• Write ordering — In some circumstances, it is important that memory writes be observed 
externally in precisely the same order as programmed. 

Cache consistency is discussed in Chapter 18. Reliable communication and write ordering are 
discussed in the following sections. 



19.1. LOCKED BUS CYCLES 

While the system architecture of multiprocessor systems varies greatly, they generally have a 
need for reliable communication with memory. A processor in the act of updating the 
Accessed bit of a segment descriptor, for example, should reject other attempts to update the 
descriptor until the operation is complete. 

It also is necessary to have reliable communication with other processors. Bus masters need to 
exchange data in a reliable way. For example, a bit in memory may be shared by several bus 
masters for use as a signal that some resource, such as a peripheral device, is idle. A bus master 
may test this bit, see that the resource is free, and change the state of the bit. The state would 
indicate to other potential bus masters that the resource is in use. A problem could arise if 
another bus master reads the bit between the time the first bus master reads the bit and the time 
the state of the bit is changed. This condition would indicate to both potential bus masters that 
the resource is free. They may interfere with each other as they both attempt to use the 
resource. The processor prevents this problem through support of locked bus cycles; requests 
for control of the bus are ignored during locked cycles. 
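
The race described above can be reproduced in a deterministic software model. The sketch below is illustrative only: `SharedBit`, `unlocked_claim_interleaved`, and `locked_test_and_set` are hypothetical names, and the "locked" version simply performs the read and the write as one step to stand in for a locked bus cycle.

```python
class SharedBit:
    """A flag in shared memory: False means the resource is idle."""

    def __init__(self):
        self.busy = False


def unlocked_claim_interleaved(bit):
    """Two bus masters both read the bit before either one writes it."""
    a_sees_free = not bit.busy   # master A reads
    b_sees_free = not bit.busy   # master B reads before A has written
    if a_sees_free:
        bit.busy = True          # master A writes
    if b_sees_free:
        bit.busy = True          # master B writes
    return a_sees_free, b_sees_free


def locked_test_and_set(bit):
    """Read and write as one indivisible step, as a locked cycle guarantees."""
    was_free = not bit.busy
    bit.busy = True
    return was_free
```

In the unlocked interleaving both masters conclude that the resource is free; with the indivisible test-and-set only the first caller does.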

The Pentium processor protects the integrity of certain critical memory operations by asserting 
an output signal called LOCK#. It is the responsibility of the hardware designer to use this 
signal to control memory access among processors. 

The processor automatically asserts LOCK# during certain critical memory 
operations. Software can specify which other memory operations need to have LOCK# 
asserted. 

The features of the general-purpose multiprocessing interface include: 

• The LOCK# signal, which appears on a pin of the processor. 

• The LOCK instruction prefix, which allows software to assert LOCK#. 

• Automatic assertion of LOCK# for some kinds of memory operations. 

19.1.1. LOCK Prefix and the LOCK# Signal 

The LOCK prefix and its bus signal should be used only to prevent other bus masters from 
interrupting a data movement operation. The LOCK prefix can be used with the following 
Pentium CPU instructions when they modify memory. An invalid-opcode exception results 
from using the LOCK prefix before any other instruction, or with these instructions when no 
write operation is made to memory (i.e., when the destination operand is in a register). 

• Bit test and change: the BTS, BTR, and BTC instructions. 

• Exchange: the XCHG, XADD, CMPXCHG, and CMPXCHG8B instructions (no LOCK 
prefix is needed for the XCHG instruction). 

• One-operand arithmetic and logical: the INC, DEC, NOT, and NEG instructions. 

• Two-operand arithmetic and logical: the ADD, ADC, SUB, SBB, AND, OR, and XOR 
instructions. 
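
The rule for when the LOCK prefix is legal can be written as a simple predicate over the list above. This is a sketch for illustration; the function name and the mnemonic-string interface are assumptions, and a result of False corresponds to the invalid-opcode exception described in the text.

```python
LOCKABLE_INSTRUCTIONS = {
    "BTS", "BTR", "BTC",                        # bit test and change
    "XCHG", "XADD", "CMPXCHG", "CMPXCHG8B",     # exchange
    "INC", "DEC", "NOT", "NEG",                 # one-operand arithmetic/logical
    "ADD", "ADC", "SUB", "SBB", "AND", "OR", "XOR",  # two-operand
}


def lock_prefix_allowed(mnemonic, destination_is_memory):
    """True if LOCK may precede the instruction; False means invalid opcode."""
    return mnemonic.upper() in LOCKABLE_INSTRUCTIONS and destination_is_memory
```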

A locked instruction is guaranteed to lock only the area of memory defined by the destination 
operand, but may lock a larger memory area. For example, typical 8086 and 80286 
configurations lock the entire physical memory space. 

Semaphores (shared memory used for signalling between multiple processors) should be 
accessed using identical address and length. For example, if one processor accesses a 
semaphore using word access, other processors should not access the semaphore using byte 
access. 

The integrity of the lock is not affected by the alignment of the memory field. The LOCK# 
signal is asserted for as many bus cycles as necessary to update the entire operand. 

19.1.2. Automatic Locking 

There are some critical memory operations for which the processor automatically asserts the 
LOCK# signal. These operations are: 

• Acknowledging interrupts. After an interrupt request, the interrupt controller uses the data 
bus to send the interrupt vector of the source of the interrupt to the processor. The 
processor asserts LOCK# to ensure no other data appears on the data bus during this time. 

• Setting the Busy bit of a TSS descriptor. The processor tests and sets the Busy bit in the 
Type field of the TSS descriptor when switching to a task. To ensure that two different 
processors do not switch to the same task simultaneously, the processor asserts the 
LOCK# signal while testing and setting this bit. 

• Updating segment descriptors. When loading a segment descriptor, the processor will set 
the Accessed bit if the bit is clear. During this operation, the processor asserts LOCK# so 
the descriptor will not be modified by another processor while it is being updated. For this 
action to be effective, operating-system procedures which update descriptors should use 
the following steps: 

— Use a locked operation when updating the access-rights byte to mark the descriptor 
not-present, and specify a value for the Type field which indicates the descriptor is 
being updated. 

— Update the fields of the descriptor. (This may require several memory accesses; 
therefore, LOCK cannot be used.) 

— Use a locked operation when updating the access-rights byte to mark the descriptor as 
valid and present. 

Note that the Intel386 DX processor always updates the Accessed bit, whether it is clear or 
not. The Intel486 and Pentium processors only update the Accessed bit if it is not already 
set. 

• Updating page-directory and page-table entries. When updating page-directory and page- 
table entries, the processor uses locked cycles to set the Accessed and Dirty bits. 

• Executing an XCHG instruction. The Pentium processor always asserts LOCK# during an 
XCHG instruction which references memory (even if the LOCK prefix is not used). 
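
The three-step descriptor update procedure listed above under "Updating segment descriptors" might be modeled as follows. This is a software sketch under several assumptions: a `threading.Lock` stands in for a LOCK#-protected bus cycle, the `Descriptor` class is a simplification, and `TYPE_BEING_UPDATED` is a hypothetical marker value that an operating system would choose for itself.

```python
import threading

PRESENT_BIT = 0x80           # P bit (bit 7) of the access-rights byte
TYPE_BEING_UPDATED = 0x0F    # hypothetical "being updated" marker chosen by the OS

_bus_lock = threading.Lock()  # stands in for a locked bus cycle


class Descriptor:
    def __init__(self, base, limit, access_rights):
        self.base = base
        self.limit = limit
        self.access_rights = access_rights


def locked_store_access_rights(desc, value):
    """Update the access-rights byte as a single locked operation."""
    with _bus_lock:
        desc.access_rights = value & 0xFF


def update_descriptor(desc, new_base, new_limit, new_access_rights):
    # Step 1: mark not-present with a Type value meaning "being updated".
    locked_store_access_rights(desc, TYPE_BEING_UPDATED & ~PRESENT_BIT)
    # Step 2: update the other fields (several accesses, so not locked).
    desc.base = new_base
    desc.limit = new_limit
    # Step 3: mark the descriptor valid and present again.
    locked_store_access_rights(desc, new_access_rights | PRESENT_BIT)
```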



19.2. MEMORY ACCESS ORDERING 

The Pentium microprocessor is a strongly ordered machine. "Strongly ordered" means that, in 
spite of parallel instruction execution, internal and external cache inquiry write-backs, and 
write buffering, the order in which writes are programmed is the order in which they are 
observed externally. In the case of I/O operations, both reads and writes always appear in 
programmed order. However, to optimize performance, the Pentium CPU allows memory 
reads to be reordered ahead of buffered writes in most situations. 

Strong ordering helps software designed for execution by a uniprocessor system work 
correctly in a multiprocessor or multimaster environment. Such software does not necessarily 
consider the possible effect of the reordering of memory writes. Strong ordering, however, 
exacts a performance penalty and therefore may not be implemented in future processors. 

Software intended to operate correctly in future, high-performance, weakly-ordered systems 
should not depend on the strongly ordered properties of the Pentium microprocessor. Instead, it 
should ensure that those accesses to shared variables which are intended to control concurrent 
execution among processors are explicitly ordered through the use of appropriate ordering 
operations. The ordering operations available on the Pentium microprocessor include the 
locking operations discussed in Section 19.1 and the serializing operations discussed in 
Chapter 18. 






CHAPTER 20 
SYSTEM MANAGEMENT MODE 



System Management Mode (SMM) helps systems developers provide very high level systems 
functions, such as power management or security, in a manner that is transparent not only to 
application software but also to operating systems. 

SMM is one of the major operating modes, on a level with protected mode, real-address mode, 
or virtual-86 mode. SMM, however, is intended for use only by firmware, not by applications 
software or general-purpose systems software. Figure 20-1 shows how the processor can enter 
SMM from any of the other modes, then return. The external signal SMI# causes the processor 
to switch to SMM. The instruction RSM exits SMM. The SMI# signal might be generated, for 
example, by closing the lid of a portable computer. 

SMM is transparent to applications programs and operating systems because: 

• The only way to enter SMM is via a type of non-maskable interrupt triggered by an 
external signal. 

• The processor begins executing SMM code from a separate address space, referred to as 
system management RAM (SMRAM). 

• Upon entry into SMM, the processor saves the register state of the interrupted program in 
a part of SMRAM called the SMM state dump record. 

• All interrupts normally handled by the operating system or by applications are disabled 
upon entry into SMM. 

• A special instruction RSM restores processor registers from the SMM state dump record 
and returns control to the interrupted program. 

SMM is similar to real-address mode in that there are no privilege levels or address mapping. 
An SMM program can execute I/O and other system instructions and can address four 
gigabytes of memory. 



20.1. THE SMI INTERRUPT 

When an SMI# signal is recognized on an instruction execution boundary, the processor waits 
for all stores to complete (including those pending externally). The processor then saves its 
register state to SMRAM space and begins to execute the SMM handler. 

SMI# has greater priority than debug exceptions and external interrupts. This means that if 
more than one of these conditions occurs at an instruction boundary, only the SMI# processing 
occurs, not a debug exception or external interrupt. 

Subsequent SMI# and NMI requests are not acknowledged while the processor is in SMM. The 
machine check enable bit in CR4 is cleared as well. SMI# and NMI interrupt requests that 
occur in SMM are latched and executed when the processor exits SMM with the RSM 
instruction. 







Upon entry into SMM, external interrupts that require handlers are disabled (the IF bit in 
EFLAGS is cleared). This is necessary, because, while the processor is in SMM, it is running 
in a separate memory space. Consequently, the vectors stored in the interrupt descriptor table 
(IDT) for the prior mode are not applicable. To enable exception handling, the SMM program 
must set up new interrupt and exception vectors. The interrupt vector table for SMM has the 
same format as for real-address mode. Refer to Chapter 9 for information on the real-mode 
interrupt vector and changing the IDT register. Until it correctly sets up the interrupt vector 
table, the SMM handler program must not generate an exception. Even though interrupts are 
disabled, exceptions can still occur. Only correctly written software can prevent internal 
exceptions. When new exception vectors are set up, internal exceptions can be serviced. 

Also upon entry into SMM, single-step exceptions are disabled (the TF bit of EFLAGS is zero) 
and address breakpoint exceptions are disabled (DR7 is cleared). To use the debugging 
features of the processor to debug the SMM handler itself, the SMM handler must ensure that 
an appropriate handler is available and installed in the IDT, then load the appropriate values 
into the debug registers or EFLAGS. 

20.2. SMM INITIAL STATE 

After the processor recognizes SMI# and saves the register state, it changes its state to the 
values shown in Table 20-1. 



Table 20-1. SMM Initial State 

Register                        Content 

General Purpose Registers       Undefined 
EFLAGS                          00000002H 
EIP                             00008000H 
CS Selector                     3000H. This value gives an initial instruction base 
                                address of 30000H; subsequently the instruction base 
                                address is the value of the prior State Dump Base 
                                field, even though CS remains 3000H. 
DS,ES,FS,GS,SS Selectors        0000H (giving base addresses of 00000000H) 
CS,DS,ES,FS,GS,SS Limit         FFFFFFFFH (4 gigabytes) 
CS,DS,ES,FS,GS,SS Attributes    16-bit, expand up 
CR0                             Bits 0,2,3, & 31 cleared (PE,EM,TS & PG); rest unchanged 
CR4                             00000000H 
DR6                             Undefined 
DR7                             00000400H 
GDTR, LDTR, IDTR, TSSR          Undefined 
Model Specific Registers        Unmodified 



External hardware is responsible for flushing the data cache, invalidating both the data and 
instruction caches, and keeping them disabled during SMM. 



20.2.1. System Management Mode Execution 

The processor begins executing the SMM handler at offset 8000H in the CS segment. The code 
segment base address is initially 30000H. This base address can be changed, however. 

When the System Management Mode handler is invoked, the processor's PE (protection) and 
PG (paging) bits in CR0 are cleared to zero, putting the processor in an environment similar to 
real-address mode. Because the segment bases (other than CS) are cleared to zero and the 
segment limits are set to FFFFFFFFH, the address space can be treated as a single flat 4GB 
linear space that is unsegmented. The processor, however, still generates addresses as in real 
mode. When a segment selector is loaded with a 16-bit value, that value is still shifted 4 bits to 
the left and loaded into the segment base. Loading a segment register in SMM does not modify 
the limit and attributes in the hidden parts of descriptor registers. 
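
The real-mode style of address formation in SMM can be illustrated with a short calculation. This is a sketch with hypothetical function names; it does not model what happens if the sum of base and offset exceeds 32 bits.

```python
def smm_segment_base(selector):
    """A 16-bit selector value is shifted left 4 bits to form the segment base."""
    return (selector & 0xFFFF) << 4


def smm_linear_address(selector, offset):
    """Base plus a 32-bit offset (limits are FFFFFFFFH, so offsets are not capped at 64K)."""
    return smm_segment_base(selector) + (offset & 0xFFFFFFFF)
```

With the initial values CS = 3000H and EIP = 8000H, `smm_linear_address(0x3000, 0x8000)` gives 38000H. The largest base that a selector load can produce is `smm_segment_base(0xFFFF)`, or FFFF0H, which is why segment bases loaded this way stay within the first megabyte.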





The default operand size and the default address size are 16 bits; however, operand-size 
override and address-size override prefixes can be used as needed to directly access data 
anywhere within the four-gigabyte logical address space. 

With operand-size override prefixes, the SMM handler can use jumps, calls, and returns to 
transfer control to any location within the four-gigabyte space. Note, however, the following 
restrictions: 

• Any control transfer that does not have an operand-size override prefix truncates EIP to 16 
low-order bits. 

• Due to the real-mode style of base-address formation, a long jump, call, interrupt, or 
exception cannot transfer control to a segment with a base address of more than 20 bits 
(one megabyte). 

• An interrupt or exception cannot transfer control to a segment offset of more than 16 bits 
(64 kilobytes). 

• If exceptions or interrupts are allowed to occur, only the low-order 16 bits of the return 
address are pushed onto the stack. If the offset of the interrupted procedure is greater than 
64 Kbyte, it is not possible for the interrupt handler to return control to that procedure 
without some software adjustment of the return address on the stack. 
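
The truncation effects in the restrictions above amount to masking with FFFFH. The following sketch uses invented helper names to show the arithmetic.

```python
def eip_after_transfer(target_offset, operand_size_override):
    """EIP after a control transfer, with or without an operand-size override prefix."""
    if operand_size_override:
        return target_offset & 0xFFFFFFFF
    return target_offset & 0xFFFF      # truncated to the 16 low-order bits


def pushed_return_offset(eip):
    """Only the low-order 16 bits of the return address are pushed."""
    return eip & 0xFFFF


def return_needs_adjustment(eip):
    """True if the interrupted offset cannot be recovered from the pushed value alone."""
    return pushed_return_offset(eip) != eip
```

For a handler running at offset 12345H, the value pushed on an interrupt is 2345H, so `return_needs_adjustment(0x12345)` is true and software must repair the return address on the stack.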



20.3. SMRAM PROCESSOR STATE DUMP FORMAT 

Table 20-2 shows the organization of the state dump record in the SMRAM area. The physical 
locations of the registers are relative to the value loaded into the CS Base, which is initially 
30000H but which can be changed. The absolute location of the registers is: (CS Base + 
Register Offset). 

The fields at offsets FFA8H-FFFFH hold the register state of the processor at the time of the 
SMI# interrupt. The remaining fields are explained in the following sections. 






Table 20-2. State Dump Format 

Register Offset (Hexadecimal)   Register 

FFFC                            CR0 
FFF8                            CR3 
FFF4                            EFLAGS 
FFF0                            EIP 
FFEC                            EDI 
FFE8                            ESI 
FFE4                            EBP 
FFE0                            ESP 
FFDC                            EBX 
FFD8                            EDX 
FFD4                            ECX 
FFD0                            EAX 
FFCC                            DR6 
FFC8                            DR7 
FFC4                            TR* 
FFC0                            LDTR* 
FFBC                            GS* 
FFB8                            FS* 
FFB4                            DS* 
FFB0                            SS* 
FFAC                            CS* 
FFA8                            ES* 
FFA7-FF04                       RESERVED 
FF02                            Halt Auto Restart 
FF00                            I/O Trap Restart 
FEFC                            SMM Revision Identifier 
FEF8                            State Dump Base 
FEF7-FE00                       RESERVED 



NOTES: 

Areas marked RESERVED should not be used by the SMM handler. Writing to these areas may cause the 
processor to malfunction. Software that depends on the contents of these areas will not be compatible with 
future processor generations. 

*Upper 2 bytes are RESERVED. 
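
The rule (CS Base + Register Offset) can be applied to the offsets of Table 20-2 as in the sketch below. The dictionary and function names are illustrative assumptions; only the offsets and the default base of 30000H come from the text.

```python
STATE_DUMP_OFFSETS = {
    "CR0": 0xFFFC, "CR3": 0xFFF8, "EFLAGS": 0xFFF4, "EIP": 0xFFF0,
    "EDI": 0xFFEC, "ESI": 0xFFE8, "EBP": 0xFFE4, "ESP": 0xFFE0,
    "EBX": 0xFFDC, "EDX": 0xFFD8, "ECX": 0xFFD4, "EAX": 0xFFD0,
    "DR6": 0xFFCC, "DR7": 0xFFC8, "TR": 0xFFC4, "LDTR": 0xFFC0,
    "GS": 0xFFBC, "FS": 0xFFB8, "DS": 0xFFB4, "SS": 0xFFB0,
    "CS": 0xFFAC, "ES": 0xFFA8,
    "HALT_AUTO_RESTART": 0xFF02, "IO_TRAP_RESTART": 0xFF00,
    "SMM_REVISION_ID": 0xFEFC, "STATE_DUMP_BASE": 0xFEF8,
}

DEFAULT_CS_BASE = 0x30000   # initial value; it can be changed by relocation


def state_dump_address(field, cs_base=DEFAULT_CS_BASE):
    """Absolute location of a state dump field: CS Base + Register Offset."""
    return cs_base + STATE_DUMP_OFFSETS[field]
```

With the default base, `state_dump_address("EAX")` is 3FFD0H and `state_dump_address("STATE_DUMP_BASE")` is 3FEF8H.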



The registers named in Table 20-2 are visible; that is, the SMM handler can read their values 
from the state dump record. Some (but not all) of these items can be changed by the SMM 
handler and the changed values will be restored to the processor registers by the RSM 
instruction. Table 20-3 shows which items can be changed and which cannot. Table 20-3 also 
indicates that some processor registers are saved in the state dump record but are not visible. 
These items are stored in RESERVED areas, but their locations and formats may be different 
in different processor versions. The last row of Table 20-3 shows the processor registers that 
are not automatically saved and restored by SMI# and RSM. If the SMM handler changes 
these registers, it must also save and restore them. 



Table 20-3. State Disposition 

State Item                                        Saved and   Readable?   Writeable? 
                                                  Restored? 

EDI, ESI, EBP, ESP, EBX, EDX, ECX, EAX,           YES         YES         YES 
EFLAGS, EIP 

CR0, CR3, DR6, DR7, TR, LDTR, GS, FS, DS, SS,     YES         YES         NO 
CS, ES 

CR1, CR2, CR4, hidden descriptor registers for    YES         NO          NO 
GDT, LDT, IDT, CS, DS, ES, FS, GS 

DR0-DR3, FP registers STn, FCS, FSW, tag word,    NO          NO          NO 
FP instruction pointer, FP opcode and operand 
pointer 



20.3.1. System Management Mode Revision Identifier (Offset FEFCH) 

The 32-bit SMM Revision Identifier specifies the version of SMM and the extensions that are 
available on the processor. Figure 20-2 shows the format of the SMM Revision Identifier. 



[Figure layout: bits 31-18 RESERVED (zero); bit 17 SMM BASE RELOCATION; bit 16 I/O TRAP 
EXTENSION; bits 15-0 SMM REVISION LEVEL] 

Figure 20-2. SMM Revision Identifier 

The fields of the SMM Revision Identifier are shown in Table 20-4. A one in bit 16 or bit 17 
indicates that the processor supports the corresponding feature. 







Table 20-4. SMM Revision Identifier 

Bits     Comments 

0..15    Base SMM version identifier 
16       The processor supports I/O Trap Restart 
17       The processor supports SMRAM relocation 

NOTE: All other bits are RESERVED. 
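
A handler could decode the SMM Revision Identifier along the following lines. This is a sketch only: the function name and the returned dictionary keys are invented here, and the value used in the usage note is a made-up example, not a value read from a real processor.

```python
def decode_smm_revision(identifier):
    """Split the 32-bit SMM Revision Identifier into the fields of Table 20-4."""
    return {
        "revision_level": identifier & 0xFFFF,            # bits 0..15
        "io_trap_restart": bool(identifier & (1 << 16)),  # bit 16
        "smram_relocation": bool(identifier & (1 << 17)), # bit 17
    }
```

For the hypothetical value 00030000H, both feature flags decode as true and the revision level is zero. A handler would typically check `smram_relocation` before writing to the State Dump Base slot.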



20.3.2. I/O Trap Restart (Offset FF00H) 

The I/O Trap Restart slot gives the SMM handler the option of causing the RSM instruction to 
automatically re-execute an interrupted I/O instruction. If, when the RSM instruction is 
executed, the I/O Trap Restart slot contains the value FFH, the CPU automatically re-executes 
the I/O instruction that SMI# has trapped. If the I/O Trap Restart slot contains the value 00H 
when the RSM instruction is executed, the CPU does not re-execute the I/O instruction. The 
CPU automatically initializes the I/O Trap Restart slot to zero during SMI# processing. The 
SMM handler should set the I/O Trap Restart slot to FFH only when an SMI# traps at an 
I/O-instruction boundary. Operation is unpredictable if the processor executes an RSM instruction 
and finds that the I/O Trap Restart slot is set to FFH but the interrupted instruction is not an I/O 
instruction. 
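
The outcomes described for the I/O Trap Restart slot can be tabulated as a small function. This is an illustrative model with invented names; values other than 00H and FFH are not described in the text, so the sketch reports them as unspecified rather than guessing.

```python
def rsm_io_restart_action(slot_value, interrupted_was_io):
    """Describe what RSM does for a given I/O Trap Restart slot value."""
    if slot_value == 0x00:
        return "no re-execution"              # the value the CPU initializes
    if slot_value == 0xFF:
        if interrupted_was_io:
            return "re-execute I/O instruction"
        return "unpredictable"                # FFH set, but not an I/O instruction
    return "unspecified"                      # not covered by the text
```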



20.3.3. Halt Auto Restart (Offset FF02H) 

If SMI# is recognized while the processor is halted, the processor sets the value of the Halt 
Auto Restart slot to 1; otherwise the processor clears this value to 0. If this field is 1, the SMM 
handler can change its value to control whether the processor resumes the HLT instruction 
upon returning from the handler with the RSM instruction. Table 20-5 shows the possibilities. 



Table 20-5. Halt Auto Restart 

Value at Entry   Value at Exit   Processor Action on Exit 

0                0               Returns to next instruction in interrupted program 
0                1               Unpredictable 
1                0               Returns to instruction after HLT 
1                1               Returns to interrupted HLT instruction 
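
The four cases of Table 20-5 can be expressed as a lookup keyed on the entry and exit values of the slot. The names below are assumptions made for this sketch.

```python
HALT_AUTO_RESTART_ACTIONS = {
    (0, 0): "returns to next instruction in interrupted program",
    (0, 1): "unpredictable",
    (1, 0): "returns to instruction after HLT",
    (1, 1): "returns to interrupted HLT instruction",
}


def halt_auto_restart_action(value_at_entry, value_at_exit):
    """Processor action on RSM for the given Halt Auto Restart slot values."""
    return HALT_AUTO_RESTART_ACTIONS[(value_at_entry, value_at_exit)]
```

A handler that wants the processor to stay halted after servicing an SMI# leaves the slot at 1; clearing it to 0 makes execution continue after the HLT instruction.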



20.3.4. State Dump Base (Offset FEF8H) 

The processor contains an invisible internal register that specifies the physical base address for 
the state dump record and for the first instruction of the SMM handler. Processors that support 
SMRAM relocation (including the Pentium microprocessor) save the value of this register in 
the State Dump Base slot during SMI# processing. 






The Pentium processor reloads the internal register from the State Dump Base slot when executing 
the RSM instruction, which makes it possible to change the value of this register. The initial 
value for the State Dump Base and the value stored in the invisible internal register in the 
processor is 030000H. 

This is for compatibility with existing SMM systems where the default SMRAM area is 
minimally defined to be the 32-Kbyte region starting at 38000H. Now that the SMRAM 
location is variable, the SMRAM area is minimally defined to be the 32-Kbyte region starting 
at [8000H + CS Base]. 



20.4. RELOCATING SMRAM 

The SMM Revision Identifier indicates whether the processor supports the relocation of 
SMRAM. Relocating SMRAM to noncacheable addresses can prevent SMI# processing from 
disturbing cache contents. 

SMRAM relocation is implemented through the use of a location in the SMRAM state dump 
(State Dump Base slot) and an invisible internal register. The 4-byte State Dump Base field 
corresponds to the invisible internal register that the processor uses upon entering SMM to 
determine the location of the SMM state dump and the location of the first instruction of the 
SMM handler. 

When an SMI# is serviced, the value in the invisible register on the processor is stored to the 
State Dump Base field. Upon executing the RSM instruction, the processor reloads the 
invisible register from the State Dump Base slot. 

The SMM handler can modify the value of the State Dump Base slot in the state dump record. 
Then, when subsequent SMI#'s are generated, the processor uses the new value to generate the 
location used for the SMM state dump and for the code-segment base. The state dump location 
must be aligned on a 32-Kbyte boundary. 
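
The alignment rule and the resulting SMRAM locations can be checked with simple arithmetic. The sketch below uses hypothetical names; the constants 8000H (handler entry offset and 32-Kbyte size) come from the text, while A0000H in the usage note is only an example address.

```python
SMRAM_ALIGNMENT = 0x8000        # 32 Kbytes
HANDLER_ENTRY_OFFSET = 0x8000   # initial EIP within the code segment


def is_valid_state_dump_base(base):
    """The state dump location must be aligned on a 32-Kbyte boundary."""
    return base % SMRAM_ALIGNMENT == 0


def smram_layout(base):
    """Handler entry point and minimal SMRAM region for a given base."""
    if not is_valid_state_dump_base(base):
        raise ValueError("state dump base is not 32-Kbyte aligned")
    start = base + HANDLER_ENTRY_OFFSET
    return {
        "handler_entry": start,
        "minimal_region": (start, start + SMRAM_ALIGNMENT - 1),
    }
```

With the default base of 30000H the handler entry is 38000H and the minimal region runs from 38000H to 3FFFFH. A relocated base such as A0000H passes the alignment check, whereas 31000H does not.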

Note that assertion of the INIT signal does not change the value of the internal state dump base 
register. 

Note also that when the processor loads a new state dump base, the CS selector is not affected 
by the change. 



20.5. RETURNING FROM SMM 

The RSM instruction leaves SMM and returns control to the interrupted program. The RSM 
instruction can be executed only in SMM; an attempt to execute this instruction outside of 
SMM generates an invalid opcode exception. 

When the RSM instruction is executed, the processor state that was previously stored upon 
entrance to SMM is restored, and control returns to the interrupted application. If the processor 
detects invalid state information, it enters the shutdown state; this happens only in the 
following situations: 

• The value stored in the State Dump Base field is not a 32-Kbyte aligned address. 

• A reserved bit of CR4 is set to 1. 

• A combination of bits in CR0 is illegal; namely, (PG=1 and PE=0) or (NW=1 and CD=0). 
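
The three shutdown conditions can be collected into one check, sketched below. The CR0 bit positions (PE bit 0, NW bit 29, CD bit 30, PG bit 31) are standard; the set of defined CR4 bits is an assumption of this example (bits 0-4 and 6) and is passed as a parameter so that it can be adjusted. The function name is hypothetical.

```python
CR0_PE = 1 << 0
CR0_NW = 1 << 29
CR0_CD = 1 << 30
CR0_PG = 1 << 31

ASSUMED_CR4_DEFINED_BITS = 0x5F   # assumption: bits 0-4 and 6 are defined


def rsm_causes_shutdown(state_dump_base, cr0, cr4,
                        cr4_defined_bits=ASSUMED_CR4_DEFINED_BITS):
    """True if RSM would find invalid state information and shut down."""
    if state_dump_base % 0x8000 != 0:
        return True                                   # not 32-Kbyte aligned
    if cr4 & ~cr4_defined_bits & 0xFFFFFFFF:
        return True                                   # a reserved CR4 bit is 1
    pg, pe = bool(cr0 & CR0_PG), bool(cr0 & CR0_PE)
    nw, cd = bool(cr0 & CR0_NW), bool(cr0 & CR0_CD)
    return (pg and not pe) or (nw and not cd)         # illegal CR0 combination
```

An SMM handler that edits the state dump record could run a check of this kind before executing RSM, since the processor itself responds to these conditions by entering shutdown rather than raising an exception.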

In shutdown mode, the processor stops executing instructions until an NMI interrupt is 
received or reset initialization is invoked. The processor generates a special bus cycle to 
indicate it has entered shutdown mode. Hardware designers may choose from a variety of 
responses to the shutdown signal; for example, turning on an indicator light on the front panel, 
generating an NMI interrupt to record diagnostic information, or invoking reset initialization. 

If the SMM handler has modified any system state that is not restored by RSM (such as the 
floating-point registers), then it should restore that state before executing RSM. 






Part III 

Compatibility 




CHAPTER 21 
MIXING 16-BIT AND 32-BIT CODE 

The Pentium processor running in protected mode, like the Intel486 and Intel386 processors, is 
a complete 32-bit architecture, but it supports programs written for the 16-bit architecture of 
earlier Intel processors. There are three levels of this support: 

1. Running 8086 and 80286 code with complete compatibility. 

2. Mixing 16-bit modules with 32-bit modules. 

3. Mixing 16-bit and 32-bit addresses and data within one module. 

The first level is discussed in Chapter 9, Chapter 22, and Chapter 23. This chapter shows how 
16-bit and 32-bit modules can cooperate with one another, and how one module can use both 
16-bit and 32-bit operands and addressing. 

The Pentium processor functions most efficiently when the processor can distinguish between 
pure 16-bit modules and pure 32-bit modules. A pure 16-bit module has these characteristics: 

• All segments occupy 64 Kbytes or less. 

• Data items are primarily 8 bits or 16 bits wide. 

• Pointers to code and data have 16-bit offsets. 

• Control is transferred only among 16-bit segments. 

A pure 32-bit module has these characteristics: 

• Segments may occupy more than 64 Kbytes (0 bytes to 4 gigabytes). 

• Data items are primarily 8 bits or 32 bits wide. 

• Pointers to code and data have 32-bit offsets. 

• Control is transferred only among 32-bit segments. 

A program written for a 16-bit processor would be pure 16-bit code. A new program written for
the protected mode of the Pentium processor would be pure 32-bit code.

21.1. USING 16-BIT AND 32-BIT ENVIRONMENTS

The features of the architecture which permit the Pentium processor to mix 16-bit and 32-bit
address and operand sizes include:

• The D-bit (default bit) of code-segment descriptors, which determines the default choice 
of operand-size and address-size for the instructions of a code segment. (In real-address 
mode and virtual-8086 mode, which do not use descriptors, the default is 16 bits.) A code 
segment whose D-bit is set is a 32-bit segment; a code segment whose D-bit is clear is a 
16-bit segment. The D-bit eliminates the need to put the operand size and address size in 
instructions when all instructions use operands and effective addresses of the same size. 






• Instruction prefixes to override the default choice of operand size and address size 
(available in protected mode as well as in real-address mode and virtual-8086 mode). 

• Separate 32-bit and 16-bit gates for intersegment control transfers (including call gates, 
interrupt gates, and trap gates). The operand size for the control transfer is determined by 
the type of gate, not by the D-bit or prefix of the transfer instruction. 

• Registers which can be used both for 16-bit and 32-bit operands and effective-address 
calculations. 

• The B bit (Big bit) of the stack segment descriptor, which specifies the size of stack
pointer (the 32-bit ESP register or the 16-bit SP register) used by the processor for implicit
stack references. The B bit for all data descriptors also controls the upper address limit
for expand-down segments.

21.2. MIXING 16-BIT AND 32-BIT OPERATIONS

The Pentium processor has two instruction prefixes which allow mixing of 32-bit and 16-bit 
operations within one segment: 

• The operand-size prefix (66H) 

• The address-size prefix (67H) 

These prefixes reverse the default size selected by the Default bit. For example, the processor 
can interpret the MOV mem, reg instruction in any of four ways: 

• In a 32-bit segment: 

1. Moves 32 bits from a 32-bit register to memory using a 32-bit effective address. 

2. If preceded by an operand-size prefix, moves 16 bits from a 16-bit register to memory 
using a 32-bit effective address. 

3. If preceded by an address-size prefix, moves 32 bits from a 32-bit register to memory 
using a 16-bit effective address. 

4. If preceded by both an address-size prefix and an operand-size prefix, moves 16 bits 
from a 16-bit register to memory using a 16-bit effective address. 

• In a 16-bit segment: 

1. Moves 16 bits from a 16-bit register to memory using a 16-bit effective address. 

2. If preceded by an operand-size prefix, moves 32 bits from a 32-bit register to memory 
using a 16-bit effective address. 

3. If preceded by an address-size prefix, moves 16 bits from a 16-bit register to memory 
using a 32-bit effective address. 

4. If preceded by both an address-size prefix and an operand-size prefix, moves 32 bits 
from a 32-bit register to memory using a 32-bit effective address. 
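
The eight cases above reduce to one rule: each prefix toggles its own attribute away from the
segment default. The following Python sketch models only that rule. The function name
effective_sizes and its arguments are illustrative and do not correspond to any processor
interface; the prefix values 66H and 67H are those given in the text.

```python
OPERAND_SIZE_PREFIX = 0x66  # operand-size prefix byte named in the text
ADDRESS_SIZE_PREFIX = 0x67  # address-size prefix byte named in the text


def effective_sizes(d_bit, prefixes=()):
    """Return (operand_size, address_size) in bits for one instruction.

    d_bit    -- D-bit of the code segment: 1 selects 32-bit defaults, 0 selects 16-bit
    prefixes -- prefix bytes that precede the instruction (illustrative model only)
    """
    default = 32 if d_bit else 16
    other = 16 if d_bit else 32
    operand = other if OPERAND_SIZE_PREFIX in prefixes else default
    address = other if ADDRESS_SIZE_PREFIX in prefixes else default
    return operand, address


# Enumerate the four cases for a 32-bit segment, then for a 16-bit segment.
for d_bit in (1, 0):
    for prefixes in ((), (0x66,), (0x67,), (0x67, 0x66)):
        print(d_bit, [hex(p) for p in prefixes], effective_sizes(d_bit, prefixes))
```

The two attributes are independent, which is why any instruction can reach any of the four
combinations from either kind of segment.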

These examples show that any instruction can generate any combination of operand size and 
address size regardless of whether the instruction is in a 16- or 32-bit segment. The choice of 
the 16- or 32-bit default for a code segment is based upon these criteria: 






1. The need to address instructions or data in segments which are larger than 64 Kbytes.

2. The predominant size of operands. 

3. The addressing modes desired. 

The Default bit should be given a setting which allows the predominant size of operands to be 
accessed without operand-size prefixes. 

21.3. SHARING DATA AMONG MIXED-SIZE CODE SEGMENTS 

Because the choice of operand size and address size is specified in code segments and their 
descriptors, data segments can be shared freely among both 16-bit and 32-bit code segments. 
The only limitation is imposed by pointers with 16-bit offsets, which can point only to the first
64 Kbytes of a segment. When a data segment with more than 64 Kbytes is to be shared among 
16- and 32-bit segments, the data which is to be accessed by the 16-bit segments must be 
located within the first 64 Kbytes. 

A stack which spans less than 64 Kbytes can be shared by both 16- and 32-bit code segments. 
This class of stacks includes: 

• Stacks in expand-up segments with the Granularity and Big bits clear. 

• Stacks in expand-down segments with the Granularity and Big bits clear. 

• Stacks in expand-up segments with the Granularity bit set and the Big bit clear, in which
the stack is contained completely within the lower 64 Kbytes. (Offsets greater than
0FFFFH can be used for data, other than the stack, which is not shared.)

The B-bit of a stack segment cannot, in general, be used to change the size of stack used by a
16-bit code segment. The size of stack pointer used by the processor for implicit stack
references is controlled by the B-bit of the data-segment descriptor for the stack. Implicit
references are those caused by interrupts, exceptions, and instructions such as PUSH, POP,
CALL, and RET. Although it might seem that the B-bit could be used to enlarge the stack
segment for 16-bit programs beyond 64 Kbytes, this does not work, because the B-bit does not
control explicit stack references, such as accesses to parameters or local variables. A 16-bit
code segment can use a "big" stack only if the code is modified so that all explicit references to
the stack are preceded by the address-size prefix, causing those references to use 32-bit
addressing, and all explicit writes to the stack pointer are preceded by an operand-size prefix.

In big, expand-down segments (the Big and Expand-down bits set), all offsets may be greater
than 64K; therefore, 16-bit code cannot use this kind of stack segment unless the code segment
is modified to use 32-bit addressing. (See Chapter 12 for more information about the B and E
bits.)

21.4. TRANSFERRING CONTROL AMONG MIXED-SIZE CODE 
SEGMENTS 

When transferring control among procedures in 16-bit and 32-bit code segments, programmers
must be aware of four points:

• Addressing limitations imposed by pointers with 16-bit offsets.

• Matching of operand-size attribute in effect for the CALL/RET instruction pair and the
Interrupt/IRET pair for managing the stack correctly.

• Translation of parameters, especially pointer parameters.

• Validity of the stack pointer when 16-bit gates are used (see Section 21.4.2).

Clearly, 16-bit effective addresses cannot be used to address data or code located beyond 
0FFFFH in a 32-bit segment, nor can large 32-bit parameters be squeezed into a 16-bit word;
however, except for these obvious limits, most interface problems between 16-bit and 32-bit 
modules can be solved. Some solutions involve inserting interface code between modules. 



21.4.1. Size of Code-Segment Pointer

For control-transfer instructions which use a pointer to identify the next instruction (i.e., those 
which do not use gates), the size of the offset portion of the pointer is determined by the 
operand-size attribute. The implications of the use of two different sizes of code-segment 
pointer are: 

• A JMP, CALL, or RET instruction from a 32-bit segment to a 16-bit segment is always 
possible using a 32-bit operand size. 

• A JMP, CALL, or RET instruction from a 16-bit segment using a 16-bit operand size 
cannot address a destination in a 32-bit segment if the address of the destination is greater 
than 0FFFFH.

An interface procedure can provide a mechanism for transfers from 16-bit segments to 
destinations in 32-bit segments beyond 64K. The requirements for this kind of interface 
procedure are discussed later in this chapter. 



21.4.2. Stack Management for Control Transfer

Because stack management is different for 16-bit CALL and RET instructions than for 32-bit
CALL and RET instructions, the operand size of the RET instruction must match the CALL
instruction. (See Figure 21-1.) A 16-bit CALL instruction pushes the contents of the 16-bit IP
register and (for calls between privilege levels) the 16-bit SP register. The matching RET
instruction also must use a 16-bit operand size to pop these 16-bit values from the stack into
the 16-bit registers. A 32-bit CALL instruction pushes the contents of the 32-bit EIP register
and (for interlevel calls) the 32-bit ESP register. The matching RET instruction also must use a
32-bit operand size to pop these 32-bit values from the stack into the 32-bit registers. If the two
parts of a CALL/RET instruction pair do not have matching operand sizes, the stack will not be
managed correctly and the values of the instruction pointer and stack pointer will not be
restored to correct values.
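
The effect of a mismatch can be shown with simple arithmetic. The following Python sketch
assumes far calls as drawn in Figure 21-1, where every saved item (CS, IP or EIP, and for
interlevel calls SS and SP or ESP) occupies one stack slot of the operand size. Parameter
copying is ignored, and the function names are illustrative only.

```python
def far_transfer_bytes(operand_size, interlevel=False):
    """Bytes of return information a far CALL pushes (or a far RET pops).

    Assumes one stack slot of the operand size per saved item:
    CS and (E)IP always, plus SS and (E)SP for interlevel transfers.
    """
    if operand_size not in (16, 32):
        raise ValueError("operand size must be 16 or 32")
    slot = operand_size // 8
    items = 4 if interlevel else 2
    return slot * items


def stack_pointer_error(call_size, ret_size, interlevel=False):
    """Bytes left on the stack (positive) or over-popped (negative) after RET."""
    pushed = far_transfer_bytes(call_size, interlevel)
    popped = far_transfer_bytes(ret_size, interlevel)
    return pushed - popped


print(stack_pointer_error(16, 16))  # matched pair
print(stack_pointer_error(32, 16))  # 32-bit CALL returned from with a 16-bit RET
```

A nonzero result means the stack pointer is wrong after the return; in addition, the RET has
interpreted part of the saved EIP or CS as a different field.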

While executing 32-bit code, if a call to 16-bit code at a higher or equal privilege level (i.e.,
DPL≤CPL) is made via a 16-bit call gate, then the upper 16 bits of the ESP register may be
unreliable upon returning to the 32-bit code (i.e., after executing a RET in the 16-bit code
segment).

When the CALL instruction and its matching RET instruction are in segments which have D
bits with the same values (i.e., both have 32-bit defaults or both have 16-bit defaults), the
default settings may be used. When the CALL instruction and its matching RET instruction are
in segments which have different D-bit values, an operand-size prefix must be used.



[Figure 21-1 shows four stack layouts, each drawn 32 bits wide with the direction of
expansion marked. The contents are listed here from the first item pushed to the last.

WITHOUT PRIVILEGE TRANSITION
After 16-bit call: PARM2, PARM1, CS, IP (SP points to IP).
After 32-bit call: PARM2, PARM1, CS, EIP (ESP points to EIP).

WITH PRIVILEGE TRANSITION
After 16-bit call: SS, SP, PARM2, PARM1, CS, IP.
After 32-bit call: SS, ESP, PARM2, PARM1, CS, EIP.]



Figure 21-1. Stack After Far 16- and 32-Bit Calls 



There are three ways for a 16-bit procedure to make a 32-bit call: 

1. Use a 16-bit call to a 32-bit interface procedure. The interface procedure uses a 32-bit call 
to the intended destination. 

2. Make the call through a 32-bit call gate. 

3. Modify the 16-bit procedure, inserting an operand-size prefix before the call, to change it 
to a 32-bit call. 

Likewise, there are three ways to cause a 32-bit procedure to make a 16-bit call: 

1. Use a 32-bit call to a 32-bit interface procedure. The interface procedure uses a 16-bit call 
to the intended destination. 

2. Make the call through a 16-bit call gate (the offset cannot exceed 0FFFFH).

3. Modify the 32-bit procedure, inserting an operand-size prefix before the call, thereby
changing it to a 16-bit call. (Be certain that the return offset does not exceed 0FFFFH.)

Programmers can use any of the preceding methods to make a CALL instruction in a 16-bit 
segment match the corresponding RET instruction in a 32-bit segment, or to make a CALL 
instruction in a 32-bit segment match the corresponding RET instruction in a 16-bit segment. 

21.4.2.1. CONTROLLING THE OPERAND SIZE FOR A CALL

The operand-size attribute in effect for the CALL instruction is specified by the D bit for the
segment containing the CALL instruction and by any operand-size instruction prefix.

When the selector of the pointer referenced by a CALL instruction selects a gate descriptor, the
type of call is determined by the type of call gate. Call gates with descriptor type 4 have a
16-bit operand-size attribute; call gates with descriptor type 12 have a 32-bit operand-size
attribute. The offset to the destination is taken from the gate descriptor; therefore, even a 16-bit
procedure can call a procedure located more than 64 Kbytes from the base of a 32-bit segment,
because a 32-bit call gate contains a 32-bit offset.
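
As a small illustration, the Python sketch below maps the two call-gate descriptor types named
above to their operand-size attribute and to the largest destination offset each can hold. The
constant and function names are illustrative only; no other descriptor types are modeled.

```python
CALL_GATE_16 = 4   # descriptor type of a 16-bit call gate, as given in the text
CALL_GATE_32 = 12  # descriptor type of a 32-bit call gate, as given in the text


def call_gate_operand_size(descriptor_type):
    """Return the operand-size attribute (16 or 32) selected by a call gate."""
    if descriptor_type == CALL_GATE_16:
        return 16
    if descriptor_type == CALL_GATE_32:
        return 32
    raise ValueError("not a call-gate descriptor type")


def max_destination_offset(descriptor_type):
    """Largest destination offset the gate can hold."""
    return (1 << call_gate_operand_size(descriptor_type)) - 1


print(hex(max_destination_offset(CALL_GATE_16)))
print(hex(max_destination_offset(CALL_GATE_32)))
```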

An unmodified 16-bit code segment which has run successfully on an 8086 processor or in 
real-mode on an Intel 286 processor will have a D-bit which is clear and will not use operand- 
size override prefixes; therefore, it will use 16-bit versions of the CALL instruction. The only 
modification needed to make a 16-bit procedure produce a 32-bit call is to relink the call to a 
32-bit call gate. 

21.4.2.2. CHANGING SIZE OF A CALL

When adding 32-bit gates to 16-bit procedures, it is important to consider the number of 
parameters. The count field of the gate descriptor specifies the size of the parameter string to 
copy from the current stack to the stack of the more privileged procedure. The count field of a 
16-bit gate specifies the number of 16-bit words to be copied, whereas the count field of a 32- 
bit gate specifies the number of 32-bit doublewords to be copied; therefore, the 16-bit 
procedure must use an even number of words as parameters. 
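
The conversion of the count field can be sketched as follows. The function name is
illustrative; the only rule modeled is the one stated above, namely that a 32-bit gate counts
doublewords, so the number of 16-bit parameter words must be even.

```python
def count_for_32bit_gate(word_count):
    """Convert a parameter count in 16-bit words to the doubleword count
    that a 32-bit gate would need in order to copy the same parameter bytes."""
    if word_count < 0:
        raise ValueError("count cannot be negative")
    if word_count % 2:
        raise ValueError("an odd number of words cannot be expressed in doublewords")
    return word_count // 2


print(count_for_32bit_gate(6))  # six words are copied as three doublewords
```

A 16-bit procedure that passes an odd number of words would need a padding word added to its
parameter list before it could be relinked to a 32-bit gate.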



21.4.3. Interrupt Control Transfers

With a control transfer caused by an exception or interrupt, a gate is used. The operand-size 
attribute for the interrupt is determined by the gate descriptor in the interrupt descriptor table 
(IDT). 

A 32-bit interrupt or trap gate (descriptor type 14 or 15) to a 32-bit interrupt handler can be 
used to interrupt either 32-bit or 16-bit procedures. However, sometimes it is not practical to 
permit an interrupt or exception to call a 16-bit handler when 32-bit code is running, because a 
16-bit interrupt procedure has a return offset of only 16 bits saved on its stack. If the 32-bit 
procedure is running at an address beyond 0FFFFH, the 16-bit interrupt procedure cannot
provide the return address. 






21.4.4. Parameter Translation

When segment offsets or pointers (which contain segment offsets) are passed as parameters 
between 16-bit and 32-bit procedures, some translation is required. If a 32-bit procedure passes 
a pointer to data located beyond 64K to a 16-bit procedure, the 16-bit procedure cannot use it. 
Except for this limitation, interface code can perform any format conversion between 32-bit 
and 16-bit pointers which may be needed. 
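
A minimal sketch of the offset translation that interface code might perform is shown below in
Python. Pointers are modeled as (selector, offset) pairs; this representation and the function
names are assumptions made for illustration, not part of the architecture.

```python
def widen_pointer(selector, offset16):
    """Translate a pointer with a 16-bit offset for use by 32-bit code.
    The offset is zero-extended; the selector is unchanged."""
    return selector & 0xFFFF, offset16 & 0xFFFF


def narrow_pointer(selector, offset32):
    """Translate a pointer with a 32-bit offset for use by 16-bit code.
    Fails when the data lies beyond the first 64 Kbytes of the segment."""
    if not 0 <= offset32 <= 0xFFFFFFFF:
        raise ValueError("offset is not a 32-bit value")
    if offset32 > 0xFFFF:
        raise OverflowError("16-bit code cannot address beyond 0FFFFH")
    return selector & 0xFFFF, offset32


print(narrow_pointer(0x0017, 0x00001234))
```

The failure case corresponds to the limitation stated above: no translation can make data
beyond 64K reachable through a 16-bit offset.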

Parameters passed by value between 32-bit and 16-bit code also may require translation 
between 32-bit and 16-bit formats. The form of the translation is application-dependent. 

21.4.5. The Interface Procedure 

Placing interface code between 32-bit and 16-bit procedures can be the solution to several 
interface problems: 

• Allowing procedures in 16-bit segments to call procedures with offsets greater than 
0FFFFH in 32-bit segments.

• Matching operand size between CALL and RET instructions. 

• Translating parameters (data). 

• Possible invalidation of the upper bits of the ESP register. 

The interface code is simplified where these restrictions are followed:

• Interface code resides in a code segment whose D-bit is set, which indicates a default
operand size of 32 bits.

• All procedures which may be called by 16-bit procedures have offsets which are not
greater than 0FFFFH.

• All return addresses saved by 16-bit procedures also have offsets not greater than
0FFFFH.

The interface code becomes more complex if any of these restrictions are violated. For
example, if a 16-bit procedure calls a 32-bit procedure with an entry point beyond 0FFFFH,
the interface code will have to provide the offset to the entry point. The mapping between 16-
and 32-bit addresses is performed automatically only when a call gate is used, because the
descriptor for a call gate contains a 32-bit address. When a call gate is not used, the interface
code must provide the 32-bit address.

The interface code calls procedures in other segments. There may be two kinds of interface: 

• Where 16-bit procedures call 32-bit procedures. The interface code is called by 16-bit 
CALL instructions and uses the operand-size prefix before RET instructions for
performing a 16-bit RET instruction. Calls to 32-bit segments are 32-bit CALL 
instructions (by default, because the D-bit is set), and the 32-bit code returns with 32-bit 
RET instructions. 







• Where 32-bit procedures call 16-bit procedures. The interface code is called by 32-bit 
CALL instructions, and returns with 32-bit RET instructions (by default, because the D-bit 
is set). CALL instructions to 16-bit procedures use the operand-size prefix; 16-bit 
procedures return with 16-bit RET instructions. 




CHAPTER 22 
VIRTUAL-8086 MODE 



The Pentium processor supports execution of one or more 8086 or 8088 programs in a
Pentium processor protected-mode environment. An 8086 program runs in this environment as
part of a virtual-8086 task. Virtual-8086 tasks take advantage of the hardware support for 
multitasking offered by the protected mode. Not only can there be multiple virtual-8086 tasks, 
each one running an 8086 program, but virtual-8086 tasks can be multitasked with other 
Pentium processor tasks. 

The purpose of a virtual-8086 task is to form a "virtual machine" for running programs written 
for the 8086 processor. A complete virtual machine consists of hardware and system software. 
The emulation of an 8086 processor is the result of software using hardware in the following 
ways: 

• The hardware provides a virtual set of registers (through the TSS), a virtual memory space 
(the first megabyte of the linear address space of the task), and virtual interrupt support 
and directly executes all instructions which deal with these registers and with this address 
space. 

• The software controls the external interfaces of the virtual machine (I/O, interrupts, and 
exceptions) in a manner consistent with the larger environment in which it runs. Software 
can choose to emulate I/O and interrupt and exception handling or let the hardware 
execute them directly without software intervention. 

Software which supports virtual 8086 machines is called a virtual-8086 monitor. The Pentium 
processor includes extensions to its virtual-8086 mode of operation that improve the 
performance of applications by eliminating the overhead of faulting to a virtual-8086 monitor 
for emulation of certain operations. For more information on the virtual mode extensions on 
the Pentium processor, see Appendix H. 

22.1. EXECUTING 8086 CPU CODE

The processor runs in virtual-8086 mode when the VM (virtual machine) bit in the EFLAGS 
register is set. The processor tests this flag under two general conditions: 

1. When loading segment registers, to know whether to use 8086-style address translation. 

2. When decoding instructions, to determine which instructions are sensitive to IOPL and 
which instructions are not supported (as in real mode). 



22.1.1. Registers and Instructions

The register set available in virtual-8086 mode includes all the registers defined for the 8086
processor plus new registers introduced after the 8086 processor (FS and GS). Instructions
which explicitly operate on the segment registers FS and GS are available. The
segment-override prefixes can be used to cause instructions to use the FS and GS registers for
address calculations. Instructions can take advantage of 32-bit operands through the use of the
operand-size prefix.

Programs running as virtual-8086 tasks can take advantage of the new application-oriented 
instructions added to the architecture by the introduction of the Intel 286, Intel386, Intel486, 
and Pentium processors: 

• New instructions introduced on the Intel 286 processors. 

— PUSH immediate data 

— Push all and pop all (PUSHA and POPA)

— Multiply immediate data 

— Shift and rotate by immediate count 

— String I/O 

— ENTER and LEAVE instructions 

— BOUND instruction 

• New instructions introduced on the Intel386 processors. 

— LSS, LFS, LGS instructions 

— Long-displacement conditional jumps 

— Single-bit instructions 

— Bit scan instructions 

— Double-shift instructions 

— Byte set on condition instruction 

— Move with sign/zero extension 

— Generalized multiply instruction 

• New instructions introduced on the Intel486 processor. 

— BSWAP instruction

— XADD instruction 

— CMPXCHG instruction 

• New instructions introduced on the Pentium processor. 

— CMPXCHG8B instruction 

— CPUID instruction 

Existing interrupt flag sensitive instructions provide significant performance improvement 
when using the virtual mode extensions of the Pentium processor. See Appendix H for more 
information. 



22.1.2. Address Translation

In virtual-8086 mode, the Pentium processor does not interpret 8086 selectors by referring to
descriptors; instead, it forms linear addresses as an 8086 processor would. It shifts the selector
left by four bits to form a 20-bit base address. The effective address is extended with four clear
bits in the upper bit positions and added to the base address to create a linear address, as shown
in Figure 22-1.

Because of the possibility of a carry, the resulting linear address may have as many as 21
significant bits. An 8086 program may generate linear addresses anywhere in the range 0 to
10FFEFH (1 megabyte plus approximately 64 Kbytes) of the task's linear address space.

Virtual-8086 tasks generate 32-bit linear addresses. While an 8086 program can use only the 
lowest 21 bits of a linear address, the linear address can be mapped using paging to any 32-bit 
physical address. 

Unlike the 8086 and 80286 processors, but like the Intel386 and Intel486 processors, the
Pentium processor can generate 32-bit effective addresses using an address-size override
prefix. However, in virtual-8086 mode, the value of a 32-bit address may not exceed 65,535
without causing an exception. Protection faults (interrupt 12 or 13 with no error code) occur if
an effective address is generated outside the range 0 through 65,535.
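
The address formation described in this section can be sketched in a few lines of Python. The
function and exception names are illustrative; the exception stands in for the protection
fault described above and does not model which interrupt number is raised.

```python
class ProtectionFault(Exception):
    """Stands in for the fault raised when an effective address exceeds 65,535."""


def v86_linear_address(selector, effective_address):
    """Form a linear address the way the text describes for virtual-8086 mode:
    selector shifted left four bits, plus the zero-extended effective address."""
    if not 0 <= effective_address <= 0xFFFF:
        raise ProtectionFault("effective address outside 0 through 65,535")
    base = (selector & 0xFFFF) << 4  # 20-bit base address
    return base + effective_address  # may carry into bit 20


print(hex(v86_linear_address(0x1234, 0x0010)))
print(hex(v86_linear_address(0xFFFF, 0xFFFF)))  # highest address reachable
```

The second call produces the upper bound 10FFEFH quoted above, which requires 21 bits.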





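[Figure 22-1 shows the 16-bit segment selector placed in bits 19 through 4 to form the base,
with the low four bits clear; the 16-bit effective address placed in bits 15 through 0 as the
offset, with bits 19 through 16 clear; and their sum, the linear address, occupying bits 20
through 0.]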



Figure 22-1. 8086 Address Translation 



22.2. STRUCTURE OF A VIRTUAL-8086 TASK 

A virtual-8086 task consists of the 8086 program to be run and the 32-bit "native mode" code 
which serves as the virtual-machine monitor. The task must be represented by a 32-bit TSS 
(not a 16-bit TSS). The processor enters virtual-8086 mode to run the 8086 program and 
returns to protected mode to run the monitor or other 32-bit protected mode tasks. 

To run in virtual-8086 mode, an existing 8086 processor program needs the following: 

• A virtual-8086 monitor. 

• Operating-system services. 

The virtual-8086 monitor is 32-bit protected-mode code which runs at privilege level 0 (most
privileged). The monitor mostly consists of initialization, exception-handling procedures, and
I/O emulation in order to virtualize the PC platform. As with any other Pentium CPU program,
code-segment descriptors for the monitor must exist in the GDT or in the task's LDT. The
linear addresses above 10FFEFH are available for the virtual-8086 monitor, the operating
system, and other system software. The monitor also may need data-segment descriptors so it
can examine the interrupt vector table or other parts of the 8086 program in the first megabyte
of the address space.

In general, there are two options for implementing the 8086 operating system: 

1. The 8086 operating system may run as part of the 8086 program. This approach is 
desirable for either of the following reasons: 

— The 8086 application code modifies the operating system. 

— There is not sufficient development time to reimplement the 8086 operating system as 
a Pentium CPU operating system. 

2. The 8086 operating system may be implemented or emulated in the virtual-8086 monitor. 
This approach is desirable for any of the following reasons: 

— Operating system functions can be more easily coordinated among several virtual- 
8086 tasks. 

— The functions of the 8086 operating system can be easily emulated by calls to the 
Pentium CPU operating system. 

Note that the approach chosen for implementing the 8086 operating system may result in
different virtual-8086 tasks using different 8086 operating systems.



22.2.1. Paging for Virtual-8086 Tasks

Paging is not necessary for a single virtual-8086 task, but paging is useful or necessary for any 
of the following reasons: 

• Creating multiple virtual-8086 tasks. Each task must map the lower megabyte of linear 
addresses to different physical locations. 

• Emulating the address wraparound which occurs at 1 megabyte. With members of the
8086 family, it is possible to specify addresses larger than 1 megabyte. For example, with
a selector value of 0FFFFH and an offset of 0FFFFH, the address formed would be
10FFEFH (1 megabyte plus 65519 bytes). The 8086 processor, which can form addresses
only up to 20 bits long, truncates the high-order bit, thereby "wrapping" this address to
0FFEFH. The Pentium processor, however, does not truncate such an address. If any 8086
processor programs depend on address wraparound, the same effect can be achieved in a
virtual-8086 task by mapping linear addresses between 100000H and 110000H and linear
addresses between 0 and 10000H to the same physical addresses.

• Creating a virtual address space larger than the physical address space. 

• Sharing 8086 operating system or ROM code which is common to several 8086 programs 
running in multitasking. 

• Redirecting or trapping references to memory-mapped I/O devices.
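
The wraparound emulation listed above can be sketched as follows. The Python model below
assumes 4-Kbyte pages and represents a task's page tables as a plain dictionary from linear
page number to physical page number; both the representation and the function names are
assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumes 4-Kbyte pages


def add_wraparound_aliases(page_map):
    """Return a copy of page_map in which the linear pages covering
    100000H up to 110000H map to the same physical pages as 0 up to 10000H."""
    aliased = dict(page_map)
    first_high_page = 0x100000 // PAGE_SIZE
    for page in range(0x10000 // PAGE_SIZE):  # 16 pages cover 64 Kbytes
        aliased[first_high_page + page] = page_map[page]
    return aliased


def translate(page_map, linear):
    """Translate a linear address to a physical address through page_map."""
    return page_map[linear // PAGE_SIZE] * PAGE_SIZE + linear % PAGE_SIZE


# Hypothetical task whose first 64 Kbytes live at physical pages 200H through 20FH.
task_map = add_wraparound_aliases({page: 0x200 + page for page in range(16)})
print(hex(translate(task_map, 0x10FFEF)), hex(translate(task_map, 0x0FFEF)))
```

Both addresses reach the same physical byte, so a program that relies on 10FFEFH wrapping to
0FFEFH sees the behavior it expects.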






22.2.2. Protection within a Virtual-8086 Task 

Protection is not enforced between the segments of an 8086 program. To protect the system 
software running in a virtual-8086 task from the 8086 application program, software designers 
may follow either of these approaches: 

• Reserve the first megabyte (plus 64 Kbytes) of each task's linear address space for the 
8086 processor program. An 8086 processor task cannot generate addresses outside this 
range. 

• Use the U/S bit of page-table entries to protect the virtual-machine monitor and other 
system software in each virtual-8086 task's space. When the processor is in virtual-8086 
mode, the CPL is 3 (least privileged). Therefore, an 8086 processor program has only user 
privileges. If the pages of the virtual-machine monitor have supervisor privilege, they 
cannot be accessed by the 8086 program. 
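
The second approach can be illustrated with a short Python sketch of the U/S check. It assumes
the U/S bit is bit 2 of a page-table entry, with 1 meaning user and 0 meaning supervisor, and
it ignores the present and read/write bits. The names are illustrative only.

```python
PTE_USER = 1 << 2  # U/S bit of a page-table entry (assumed position: bit 2)


def page_access_allowed(pte, cpl):
    """Return True if code running at the given CPL may access the page.
    Only the U/S check is modeled; present and R/W bits are ignored."""
    if cpl == 3:                  # user privilege, as in virtual-8086 mode
        return bool(pte & PTE_USER)
    return True                   # CPL 0, 1, and 2 have supervisor privilege


monitor_page = 0x00123003                    # hypothetical entry with U/S clear
program_page = 0x00045003 | PTE_USER         # hypothetical entry with U/S set
print(page_access_allowed(monitor_page, 3), page_access_allowed(program_page, 3))
```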

22.3. ENTERING AND LEAVING VIRTUAL-8086 MODE 

Figure 22-2 summarizes the ways to enter and leave an 8086 program. Virtual-8086 mode is 
entered when the VM flag is set. There are two ways to do this: 

1. A switch to a task loads the image of the EFLAGS register from the new TSS. The TSS of
the new task must be a 32-bit TSS, not a 16-bit TSS, because the 16-bit TSS does not load 
the high word of the EFLAGS register, which contains the VM flag. A set VM flag in the 
new contents of the EFLAGS register indicates that the new task is executing 8086 
instructions; therefore, while loading the segment registers from the TSS, the processor 
forms base addresses in the 8086 style. 

2. An IRET instruction from a procedure of a task loads the EFLAGS register from the stack.
A set VM flag indicates that the procedure to which control is being returned is an 8086
procedure. The CPL at the time the IRET instruction is executed must be 0; otherwise, the
processor does not change the state of the VM flag.







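[Figure 22-2 is a mode transition diagram with three states: the 8086 program (V86 mode),
the V86 monitor (protected mode), and other CPU tasks (protected mode). Initial entry to the
8086 program is by task switch or IRET. An interrupt or exception leads from the 8086
program to the V86 monitor, and IRET leads back. Task switches connect the other CPU tasks
with both the 8086 program and the V86 monitor.]

Figure 22-2. Entering and Leaving Virtual-8086 Mode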



When a task switch is used to enter virtual-8086 mode, the segment registers are loaded from a
TSS. When an IRET instruction is used to set the VM flag, however, the segment registers are
loaded from the segment-register images on the PL0 stack (see Figure 22-3).

The processor leaves virtual-8086 mode when an interrupt or exception occurs. There are two 
cases: 

1. The interrupt or exception causes a task switch. A task switch from a virtual-8086 task to 
any other task loads the EFLAGS register from the TSS of the new task. If the new TSS is 
a 32-bit TSS and the VM flag in the new contents of the EFLAGS register is clear or if the 
new TSS is a 16-bit TSS, the processor clears the VM flag of the EFLAGS register, loads 
the segment registers from the new TSS using protected mode address formation, and 
begins executing the instructions of the new task in 32-bit protected mode. 

2. The interrupt or exception calls a privilege-level 0 procedure (most privileged). The
processor stores the current contents of the EFLAGS register on the stack, then clears the
VM flag. The interrupt or exception handler, therefore, runs as "native" 32-bit protected-
mode code. If an interrupt or exception calls a procedure in a conforming segment or in a
segment at a privilege level other than 0 (most privileged), the processor generates a
general-protection exception; the error code is the selector of the code segment to which a
call was attempted.






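[Figure 22-3 shows the privilege-level 0 stack in two cases. Entries are listed from the ESP
value taken from the TSS down to the new ESP. Each entry is 32 bits wide; the upper half of
each entry that holds a segment register is unused.

WITHOUT ERROR CODE: OLD GS, OLD FS, OLD DS, OLD ES, OLD SS, OLD ESP, OLD EFLAGS,
OLD CS, OLD EIP (new ESP).

WITH ERROR CODE: OLD GS, OLD FS, OLD DS, OLD ES, OLD SS, OLD ESP, OLD EFLAGS,
OLD CS, OLD EIP, ERROR CODE (new ESP).]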



Figure 22-3. Privilege Level Stack After Interrupt in Virtual-8086 Mode 



System software does not change the state of the VM flag directly, but instead changes states 
in the image of the EFLAGS register stored on the stack or in the TSS. The virtual-8086 
monitor sets the VM flag in the EFLAGS image on the stack or in the TSS when first creating 
a virtual-8086 task. Exception and interrupt handlers can examine the VM flag on the stack. If 
the interrupted procedure was running in virtual-8086 mode, the handler may need to call the 
virtual-8086 monitor. 
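
The test a handler performs on the saved EFLAGS image can be sketched in Python as follows.
The sketch assumes the frame layout of Figure 22-3 without an error code, read from the new
ESP upward as nine little-endian doublewords, and assumes the VM flag is bit 17 of EFLAGS. The
field and function names are illustrative only.

```python
import struct

EFLAGS_VM = 1 << 17  # VM flag, in the high word of EFLAGS (assumed position: bit 17)

# Order of the doublewords starting at the new ESP (no error code).
FRAME_FIELDS = ("eip", "cs", "eflags", "esp", "ss", "es", "ds", "fs", "gs")
SEGMENT_FIELDS = ("cs", "ss", "es", "ds", "fs", "gs")


def parse_v86_frame(raw):
    """Decode the 36-byte stack frame into a dictionary of saved registers."""
    frame = dict(zip(FRAME_FIELDS, struct.unpack("<9I", raw[:36])))
    for name in SEGMENT_FIELDS:
        frame[name] &= 0xFFFF  # upper halves of segment entries are unused
    return frame


def interrupted_in_v86(frame):
    """Return True if the interrupted procedure was running in virtual-8086 mode."""
    return bool(frame["eflags"] & EFLAGS_VM)


# Hypothetical frame contents for illustration.
raw = struct.pack("<9I", 0x0100, 0x2000, 0x00020202, 0xFFFE, 0x3000, 0, 0, 0, 0)
frame = parse_v86_frame(raw)
print(interrupted_in_v86(frame), hex(frame["cs"]), hex(frame["eip"]))
```

When the test succeeds, the saved CS and SS values are 8086-style selectors rather than
protected-mode selectors, which is one reason the handler may need to call the monitor.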



22.3.1. Transitions Through Task Switches

A task switch to or from a virtual-8086 task may have any of three causes: 

1. An interrupt which calls a task gate.

2. An action of the scheduler of the 32-bit operating system. 

3. Executing an IRET instruction when the NT flag is set. 

In any of these cases, the processor changes the VM flag in the EFLAGS register according to 
the image in the new TSS. If the new TSS is a 16-bit TSS, the upper word of the EFLAGS 
register is not in the TSS; the processor clears the VM flag in this case. The processor updates 
the VM flag prior to loading the segment registers from their images in the new TSS. The new 
setting of the VM flag determines whether the processor interprets the new segment-register 
images as 8086 selectors, 80286 selectors or 32-bit selectors. 
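
The rule for the VM flag on a task switch can be modeled in a few lines of Python. The sketch
assumes the VM flag is bit 17 of EFLAGS and treats the flags image of a 16-bit TSS as 16 bits
wide; the function names are illustrative, and the distinction between 80286 and 32-bit
selectors is not modeled.

```python
EFLAGS_VM = 1 << 17  # VM flag (assumed position: bit 17 of EFLAGS)


def eflags_after_task_switch(tss_flags_image, tss_is_32bit):
    """Return the EFLAGS value loaded from the new TSS.
    A 16-bit TSS holds only the low word, so the VM flag is always clear."""
    if tss_is_32bit:
        return tss_flags_image & 0xFFFFFFFF
    return tss_flags_image & 0xFFFF


def selector_style(eflags):
    """Return how the segment-register images in the new TSS are interpreted."""
    return "8086" if eflags & EFLAGS_VM else "protected-mode"


print(selector_style(eflags_after_task_switch(0x00020202, True)))
print(selector_style(eflags_after_task_switch(0x00020202, False)))
```

Because the flag is updated before the segment registers are loaded, the same selector images
produce different base addresses depending on the result.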






22.3.2. Transitions Through Trap Gates and Interrupt Gates 

The processor may leave virtual-8086 mode as the result of an exception or interrupt which 
calls a trap or interrupt gate. The exception or interrupt handler returns to the 8086 program by 
executing an IRET instruction. 

Exceptions and interrupts can be handled in one of three ways: 

1. By the virtual-8086 monitor. 

2. By the 8086 program's interrupt handler, to which the virtual-8086 monitor passes control. 

3. By a protected-mode interrupt service routine. 

If the interrupt or exception is one which the monitor needs to handle and the VM flag is set in 
the EFLAGS image stored on the stack, the interrupt handler passes control to the virtual-8086 
monitor. The virtual-8086 monitor may choose one of the first two methods listed above. If 
the exception or interrupt is one which the monitor does not need to handle, the IOPL can be 
set to 3 allowing the protected-mode interrupt handler to execute for all virtual-mode 
interrupts. 

Because it was designed to run on an 8086 processor, an 8086 program in a virtual-8086 task 
has an 8086-style interrupt table, which starts at linear address 0. However, for exceptions and 
interrupts requiring virtual-8086 monitor intervention and a transition into protected mode, the 
processor does not use this table directly. Instead, the processor calls handlers through the 
IDT. The IDT entry for an interrupt or exception in a virtual-8086 task must contain either: 

• A task gate. 

• A 32-bit interrupt gate (descriptor type 14) or 32-bit trap gate (descriptor type 15), which 
must point to a nonconforming, privilege-level 0 (most privileged) code segment. 

Interrupts and exceptions which call 32-bit trap or interrupt gates use privilege-level 0. The 
contents of the segment registers are stored on the stack for this privilege level. Figure 22-3 
shows the format of this stack after an exception or interrupt which occurs while a virtual-8086 
task is running an 8086 program. 

After the processor saves the 8086 segment registers on the stack for privilege level 0, it clears 
the segment registers before running the handler procedure. This lets the interrupt handler 
safely save and restore the DS, ES, FS, and GS registers as though they were Pentium CPU 
selectors. Interrupt handlers, which may be called in the context of either a regular task or a 
virtual-8086 task, can use the same code sequences for saving and restoring the registers for 
any task. Clearing these registers before execution of the IRET instruction does not cause a 
trap in the interrupt handler. Interrupt procedures which expect values in the segment registers 
or which return values in the segment registers must use the register images saved on the stack 
for privilege level 0. Interrupt handlers which need to know whether the interrupt occurred in 
virtual-8086 mode can examine the VM flag in the stored contents of the EFLAGS register. 

Sending an interrupt or exception back to the 8086 program involves the following steps: 
1. Use the 8086 interrupt vector to locate the appropriate handler procedure. 






2. Store the FLAGS, CS and IP values of the 8086 program on the privilege-level 3 stack 
(least privileged). 

3. Change the return link on the privilege-level 0 stack to point to the privilege-level 3 
handler procedure. 

4. Execute an IRET instruction to pass control to the handler. 

5. When the IRET instruction from the privilege-level 3 handler again calls the virtual-8086 
monitor, restore the return link on the privilege-level 0 stack to point to the original, 
interrupted, privilege-level 3 procedure. 

6. Execute an IRET instruction to pass control back to the interrupted procedure. 
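
Steps 1 through 3 above can be sketched in C against a simulated memory image. This is only an illustration under stated assumptions: the structure `v86_regs`, the helper names, and the 1-megabyte wraparound in `lin` are hypothetical simplifications, and the sketch does not model any flag changes that a monitor might also make before returning to the handler.

```c
#include <stdint.h>

/* Hypothetical copy of the 8086 program's state taken from the
   privilege-level 0 stack (only the fields this sketch needs). */
struct v86_regs {
    uint16_t ip, cs, flags, sp, ss;
};

/* Linear address of an 8086 segment:offset pair. Wrapping to 1 MB is a
   simplification of this sketch so that a 1 MB buffer is sufficient. */
static uint32_t lin(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}

/* Push one 16-bit word onto the 8086 program's own stack. */
static void push16(uint8_t *mem, struct v86_regs *r, uint16_t v)
{
    r->sp = (uint16_t)(r->sp - 2);
    mem[lin(r->ss, r->sp)] = (uint8_t)(v & 0xFF);
    mem[lin(r->ss, (uint16_t)(r->sp + 1))] = (uint8_t)(v >> 8);
}

/* Step 1: read the 8086 interrupt vector (table starts at linear 0).
   Step 2: store FLAGS, CS and IP on the privilege-level 3 stack.
   Step 3: point the return link at the 8086 handler. */
void reflect_interrupt(uint8_t *mem, struct v86_regs *r, uint8_t vector)
{
    uint32_t entry = (uint32_t)vector * 4;
    uint16_t new_ip = (uint16_t)(mem[entry] | (mem[entry + 1] << 8));
    uint16_t new_cs = (uint16_t)(mem[entry + 2] | (mem[entry + 3] << 8));

    push16(mem, r, r->flags);
    push16(mem, r, r->cs);
    push16(mem, r, r->ip);

    r->cs = new_cs;
    r->ip = new_ip;
}
```

A monitor would then copy the updated CS, IP, SS and SP values back into the privilege-level 0 stack frame and execute IRET (step 4).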

If the IOPL is set to 3 and the DPL of the interrupt gate is set to 3, INT n instructions 
trap with the given vector number n. Handlers for interrupt vectors whose IDT gates must have a 
DPL of 3 can examine the VM bit in the EFLAGS image on the stack to determine if the interrupt 
needs to be redirected to the virtual-8086 monitor or passed to the 8086 program's interrupt 
handler. 

22.4. SENSITIVE INSTRUCTIONS 

When the Pentium processor is running in virtual-8086 mode, the CLI, STI, PUSHF, POPF, 
INT n, and IRET instructions are sensitive to IOPL. The IN, INS, OUT, and OUTS 
instructions, which are sensitive to IOPL in protected mode, are not sensitive in virtual-8086 
mode. Following is a complete list of instructions which are sensitive in virtual-8086 mode: 

CLI — Clear Interrupt-Enable Flag 

STI — Set Interrupt-Enable Flag 

PUSHF — Push Flags 

POPF — Pop Flags 

INT n — Software Interrupt 

IRET — Interrupt Return 

The CPL is always 3 while running in virtual-8086 mode; if the IOPL is less than 3, an attempt 
to use the instructions listed above triggers a general-protection exception. These instructions 
are sensitive to IOPL in order to give the virtual-8086 monitor a chance to emulate the 
facilities they affect. For information on the behavior of these instructions using the virtual 
mode extensions, see Appendix H. 
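
The IOPL test described above amounts to a two-bit field comparison. The following C sketch, with hypothetical function names, extracts the IOPL field (bits 12 and 13 of EFLAGS) and reports whether a sensitive instruction would trigger a general-protection exception in virtual-8086 mode; it assumes the virtual mode extensions are not in use.

```c
#include <stdint.h>

#define EFLAGS_IOPL_SHIFT 12   /* IOPL occupies bits 12 and 13 of EFLAGS */

/* Extracts the two-bit IOPL field from an EFLAGS value. */
unsigned iopl_of(uint32_t eflags)
{
    return (eflags >> EFLAGS_IOPL_SHIFT) & 3u;
}

/* In virtual-8086 mode the CPL is always 3, so CLI, STI, PUSHF, POPF,
   INT n and IRET fault whenever the IOPL is less than 3. */
int v86_sensitive_faults(uint32_t eflags)
{
    return iopl_of(eflags) < 3;
}
```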

22.5. VIRTUAL INTERRUPT SUPPORT 

Many 8086 programs written for non-multitasking systems set and clear the IF flag to control 
interrupts. This may cause problems in a multitasking environment. As a result, virtual 
monitors running on the Intel386 and Intel486 processors require maintaining a virtual 
interrupt flag in software. All instructions affecting the IF flag trap to the virtual-8086 monitor 
for emulation on these processors. For more information on Pentium processor support of a 
virtual interrupt flag, see Appendix H. 
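
A software-maintained virtual interrupt flag of the kind described above might look like the following C sketch. The `v86_task` structure and the `emulate_*` names are hypothetical; the sketch shows only the bookkeeping a monitor could do after a sensitive instruction has trapped, not the trap handling itself.

```c
#include <stdint.h>

#define FLAGS_IF (1u << 9)   /* IF is bit 9 of FLAGS */

/* Hypothetical per-task state kept by a virtual-8086 monitor. */
struct v86_task {
    int virtual_if;          /* software copy of the task's IF flag */
};

void emulate_cli(struct v86_task *t) { t->virtual_if = 0; }
void emulate_sti(struct v86_task *t) { t->virtual_if = 1; }

/* Value PUSHF should appear to push: the real flags with IF replaced
   by the task's virtual interrupt flag. */
uint16_t emulate_pushf_value(const struct v86_task *t, uint16_t real_flags)
{
    uint16_t v = (uint16_t)(real_flags & ~FLAGS_IF);
    if (t->virtual_if)
        v = (uint16_t)(v | FLAGS_IF);
    return v;
}

/* POPF: record the requested IF in the virtual flag and return the value
   to load into the real flags, keeping the real IF unchanged. */
uint16_t emulate_popf(struct v86_task *t, uint16_t popped, uint16_t real_flags)
{
    t->virtual_if = (popped & FLAGS_IF) != 0;
    return (uint16_t)((popped & ~FLAGS_IF) | (real_flags & FLAGS_IF));
}
```

With this arrangement the 8086 program sees the interrupt flag it expects, while the real IF flag stays under the control of the 32-bit operating system.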






22.6. EMULATING 8086 OPERATING SYSTEM CALLS 

Many 8086 operating systems are called by pushing parameters onto the stack, then executing 
an INT n instruction. The INT n instruction is sensitive to IOPL to allow the virtual-8086 
monitor to emulate the function of the 8086 operating system or send the interrupt back to the 
8086 operating system. 

When the IOPL<3, INT n instructions are intercepted by the virtual-8086 monitor. When the 
IOPL=3, interrupts are serviced by the protected-mode interrupt service routine in a manner 
compatible with the Intel486 processor. On the Intel386 and Intel486 processors, all INT n 
instructions running in virtual-8086 mode require interception by the virtual-8086 monitor 
when the IOPL is less than 3. For information on Pentium processor virtual mode extension 
support of interrupt handling, see Appendix H. 

Table 22-1 shows the action the Pentium processor takes in virtual-8086 mode for a 
software interrupt, based on the IOPL. 



Table 22-1. Software Interrupt Operation 

IOPL   Processor Action
----   ----------------
=3     Interrupt from Virtual-8086 Mode to Protected Mode:
         Clears VM and TF flags
         If service through interrupt gate, clears IF flag
         Changes to PL0 stack using TSS values
         Pushes GS, FS, DS and ES onto PL0 stack
         Clears GS, FS, DS and ES to 0
         Pushes SS, ESP, EFLAGS, CS and EIP of interrupted task onto PL0 stack
         Sets CS and EIP from interrupt gate

<3     General protection exception



22.7. VIRTUAL I/O 

Many 8086 programs written for non-multitasking systems directly access I/O ports. This may 
cause problems in a multitasking environment. If more than one program accesses the same 
port, they may interfere with each other. Most multitasking systems require application 
programs to access I/O ports through the operating system. This results in simplified, 
centralized control. 

The processor provides I/O protection for creating I/O which is compatible with the 
environment and transparent to 8086 programs. Designers may take any of several possible 
approaches to protecting I/O ports: 

• Protect the I/O address space and generate exceptions for all attempts to perform I/O 
directly. 






• Let the 8086 program perform I/O directly. 

• Generate exceptions on attempts to access specific I/O ports. 

• Generate exceptions on attempts to access specific memory-mapped I/O ports. 

The method of controlling access to I/O ports depends upon whether they are I/O-mapped or 
memory-mapped. 



22.7.1. I/O-Mapped I/O 

The I/O permission bit map can be used to generate exceptions on attempts to access specific 
I/O addresses. The I/O permission bit map of each virtual-8086 task determines which I/O 
addresses generate exceptions for that task. Because each task may have a different I/O 
permission bit map, the addresses which generate exceptions for one task may be different 
from the addresses for another task. This differs from protected mode because the IOPL is not 
checked. See Chapter 8 for more information about the I/O permission bit map. 
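
The bit-map test can be sketched in C as follows. The function name and the `bitmap_bytes` parameter are hypothetical; the sketch assumes the usual convention that a set bit denies access, that every byte of a multi-byte access must be permitted, and that ports beyond the end of the map are treated as denied.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if an access of `width` bytes (1, 2 or 4) starting at `port`
   is permitted by `bitmap`, and 0 if it would generate an exception. */
int io_access_allowed(const uint8_t *bitmap, size_t bitmap_bytes,
                      uint16_t port, unsigned width)
{
    for (unsigned i = 0; i < width; i++) {
        uint32_t p = (uint32_t)port + i;
        if (p / 8 >= bitmap_bytes)
            return 0;                       /* beyond the map: denied */
        if (bitmap[p / 8] & (1u << (p % 8)))
            return 0;                       /* bit set: denied */
    }
    return 1;
}
```

Because each virtual-8086 task has its own TSS, a monitor could give each task a different map and so trap a different set of ports per task.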



22.7.2. Memory-Mapped I/O 

In systems which use memory-mapped I/O, the paging facilities of the processor can be used to 
generate exceptions for attempts to access I/O ports. The virtual-8086 monitor may use paging 
to control memory-mapped I/O in these ways: 

• Map part of the linear address space of each task which needs to perform I/O to the 
physical address space where I/O ports are placed. By putting the I/O ports at different 
addresses (in different pages), the paging mechanism can enforce isolation between tasks. 

• Map part of the linear address space to pages which are not-present. This generates an 
exception whenever a task attempts to perform I/O to those pages. System software then 
can interpret the I/O operation being attempted. 

Software emulation of the I/O space may require too much operating system intervention 
under some conditions. In these cases, it may be possible to generate an exception for only the 
first attempt to access I/O. The system software then may determine whether the program can be 
given exclusive control of I/O temporarily; if so, the protection of the I/O space may be lifted 
and the program allowed to run at full speed. 



22.7.3. Special I/O Buffers 

Buffers of intelligent controllers (for example, a bit-mapped frame buffer) also can be 
emulated using page mapping. The linear space for the buffer can be mapped to a different 
physical space for each virtual-8086 task. The virtual-8086 monitor then can control which 
virtual buffer to copy onto the real buffer in the physical address space. 



22.8. DIFFERENCES FROM 8086 CPU 

In general, virtual-8086 mode will run software written for the 8086 and 8088 processors. The 
following list shows the minor differences between the 8086 processor and the virtual-8086 
mode of the Pentium processor and other 32-bit processors. 

1. Instruction clock counts. 

The 32-bit processors take fewer clocks for most instructions than the 8086 processor. The 
areas most likely to be affected include: 

— Delays required by I/O devices between I/O operations. 

— Assumed delays with 8086 processor operating in parallel with an 8087. 

2. Divide exceptions point to the DIV instruction. 

Divide exceptions on the Pentium processor always leave the saved CS:IP value pointing 
to the instruction which failed. On the 8086 processor, the CS:IP value points to the next 
instruction. 

3. Undefined 8086 processor opcodes. 

Opcodes which were not defined for the 8086 processor generate an invalid-opcode exception or 
execute as one of the new instructions defined for the Pentium processor. 

4. Value written by PUSH SP. 

The Pentium processor pushes a different value on the stack for PUSH SP than the 8086 
processor. The Pentium processor pushes the value in the SP register before it is 
decremented as part of the push operation; the 8086 processor pushes the value of the SP 
register after it is decremented. If the pushed value is important, replace PUSH SP 
instructions with the following three instructions: 

PUSH BP 
MOV BP, SP 
XCHG BP, [BP] 

This code functions as the 8086 PUSH SP instruction on the Pentium processor. 

5. Shift or rotate by more than 31 bits. 

The Pentium processor masks all shift and rotate counts to the lowest five bits. This limits 
the count to a maximum of 31 bit positions. 
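
The effect of the count masking can be illustrated with a short C sketch; the function names are hypothetical. On a processor that does not mask the count, a count of 32 would shift every bit out of a 16-bit operand, whereas with masking the same count behaves as a count of 0.

```c
#include <stdint.h>

/* Count actually used after masking to the lowest five bits. */
unsigned effective_shift_count(unsigned count)
{
    return count & 0x1Fu;
}

/* Left shift of a 16-bit operand using the masked count. */
uint16_t shl16_masked(uint16_t value, unsigned count)
{
    unsigned n = effective_shift_count(count);
    return (uint16_t)((uint32_t)value << n);
}
```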

6. Redundant prefixes. 

The Pentium processor limits instructions to 15 bytes. The only way to violate this limit is 
with redundant prefixes before an instruction. A general-protection exception is generated 
if the limit on instruction length is violated. The 8086 processor has no instruction length 
limit. 

7. Operand crossing offset 0 or 65,535. 

On the 8086 processor, an attempt to access a memory operand which crosses offset 
65,535 (e.g., MOV a word to offset 65,535) or offset 0 (e.g., PUSH a word when the 
contents of the SP register are 1) causes the offset to wrap around modulo 65,536. The 
Pentium processor generates an exception in these cases: a general-protection exception if 
the segment is a data segment (i.e., if the CS, DS, ES, FS, or GS register is being used to 
address the segment), or a stack exception if the segment is a stack segment (i.e., if the SS 
register is being used). 
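
A monitor or porting tool that wants to predict these cases could use a check along the following lines. This is a minimal C sketch with hypothetical function names; it treats the PUSH case as a word write at the decremented SP value.

```c
#include <stdint.h>

/* Returns 1 if an operand of `size` bytes starting at `offset` extends
   past offset 65,535 within a 64K segment. */
int crosses_offset_65535(uint16_t offset, unsigned size)
{
    return size > 1 && (uint32_t)offset + size - 1 > 0xFFFFu;
}

/* Returns 1 if pushing a word with the given SP crosses the segment end;
   with SP = 1 the word would be written at offset 65,535. */
int push_word_crosses(uint16_t sp)
{
    uint16_t new_sp = (uint16_t)(sp - 2);
    return crosses_offset_65535(new_sp, 2);
}
```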

8. Sequential execution across offset 65,535. 

On the 8086 processor, if sequential execution of instructions proceeds past offset 65,535, 
the processor fetches the next instruction byte from offset 0 of the same segment. On the 
Pentium processor, the processor generates a general-protection exception. 

9. LOCK is restricted to certain instructions. 

The LOCK prefix and its output signal should only be used to prevent other bus masters 
from interrupting a data movement operation. The LOCK prefix only may be used with 
the following Pentium CPU instructions when they modify memory. An invalid-opcode 
exception results from using LOCK before any other instruction, or with these instructions 
when no write operation is made to memory. 

— Bit test and change: the BTS, BTR, and BTC instructions. 

— Exchange: the XCHG, XADD, CMPXCHG, and CMPXCHG8B instructions (no LOCK 
prefix is needed for the XCHG instruction). 

— One-operand arithmetic and logical: the INC, DEC, NOT, and NEG instructions. 

— Two-operand arithmetic and logical: the ADD, ADC, SUB, SBB, AND, OR, and 
XOR instructions. 

10. Single-stepping external interrupt handlers. 

The priority of the Pentium processor single-step exception is different from that of the 
8086 processor. This change prevents an external interrupt handler from being 
single-stepped if the interrupt occurs while a program is being single-stepped. The Pentium 
processor single-step exception has higher priority than any external interrupt. The 
Pentium processor will still single-step through an interrupt handler called by the INT 
instruction or by an exception. 

11. IDIV exceptions for quotients of 80H or 8000H. 

The Pentium processor can generate the largest negative number as a quotient from the 
IDIV instruction. The 8086 processor generates a divide-error exception instead. 

12. Flags in stack. 

The contents of the EFLAGS register stored by the PUSHF instruction, by interrupts, and 
by exceptions are different from those stored by the 8086 processor in bit positions 12 
through 15. On the 8086 processor these bits are stored as though they were set, but in 
virtual-8086 mode bit 15 is always clear, and bits 14 through 12 have the last value loaded 
into them. 

13. NMI interrupting NMI handlers. 

After an NMI interrupt is accepted by the Pentium processor, the NMI interrupt is masked 
until an IRET instruction is executed. 

14. Floating-point errors call the floating-point-error exception. 

Floating-point exceptions on the Pentium processor call the floating-point error exception 
handler. If an 8086 processor uses another exception for the 8087 interrupt, both exception 
vectors should call the floating-point error exception handler. The Pentium processor has 
signals which, with the addition of external logic, support user-defined error reporting for 
emulation of the interrupt mechanism used in many personal computers. 

15. Numeric exception handlers should allow prefixes. 

On the Pentium processor, the value of the CS and IP registers saved for floating-point 
exceptions points at any prefixes which come before the ESC instruction. On the 8086 
processor, the saved CS:IP points to the ESC instruction. 

16. Floating-Point Unit does not use interrupt controller. 




The floating-point error signal to the Pentium processor does not pass through an interrupt 
controller (an INT signal from 8087 coprocessor does). Some instructions in a 
coprocessor-error exception handler may need to be deleted if they use the interrupt 
controller. The Pentium processor has signals which, with the addition of external logic, 
support user-defined error reporting for emulation of the interrupt mechanism used in 
many personal computers. 

17. Response to bus hold. 

Unlike the 8086 and Intel 286 processors, the Pentium processor responds to requests for 
control of the bus from other potential bus masters, such as DMA controllers, between 
transfers of parts of an unaligned operand, such as two words which form a doubleword. 

18. CPL is 3 in virtual-8086 mode. 

The 8086 processor does not support protection, so it has no CPL. Virtual-8086 mode uses 
a CPL of 3, which prevents the execution of privileged instructions. These are: 

— LIDT instruction 

— LGDT instruction 

— LMSW instruction 

— Special forms of the MOV instruction for loading and storing the control registers 

— CLTS instruction 

— HLT instruction 

— INVD instruction 

— WBINVD instruction 

— INVLPG instruction 

— RDMSR instruction 

— WRMSR instruction 

— RSM instruction 

These instructions may be executed while the processor is in real-address mode following 
reset initialization. They allow system data structures, such as descriptor tables, to be set 
up before entering protected mode. Since virtual-8086 mode is entered from protected 
mode, these structures will already be set up. 

19. Denormal exception handling is different. See Chapter 23 for details on exception 
handling differences. 



22.9. DIFFERENCES FROM Intel 286 CPU 

The differences between virtual-8086 mode and Intel 286 real-address mode affect the 
interface between applications and the operating system. The application runs at privilege level 
3 (user mode), so all attempts to use privilege-protected instructions and architectural features 
generate calls to the virtual-machine monitor. The monitor examines these calls and emulates 
them. 






22.9.1. Privilege Level 

Programs running in virtual-8086 mode have a privilege level of 3 (user mode), which 
prevents the execution of privileged instructions. These are: 

• LIDT instruction 

• LGDT instruction 

• LMSW instruction 

• Special forms of the MOV instruction for loading and storing the control and debug 
registers 

• CLTS instruction 

• HLT instruction 

• INVD instruction 

• WBINVD instruction 

• INVLPG instruction 

• RDMSR instruction 

• WRMSR instruction 

• RSM instruction 

Virtual-8086 mode is entered from protected mode, so it should have no need for these 
instructions. These instructions, while not executable in virtual-8086 mode, can be executed in 
real-address mode. 



22.9.2. Bus Lock 

The Intel 286 processor implements the bus lock function differently than the Intel386, 
Intel486, and Pentium processors. This fact may or may not be apparent to 8086 programs, 
depending on how the virtual-8086 monitor handles the LOCK prefix. Instructions with the 
LOCK prefix are sensitive to the IOPL; software designers can choose to emulate its function. 
If, however, 8086 programs are allowed to execute LOCK directly, programs which use forms 
of memory locking specific to the 8086 processor may not run properly when run on the 
Pentium and other 32-bit processors. 

The LOCK prefix and its bus signal only should be used to prevent other bus masters from 
interrupting a data movement operation. The LOCK prefix only may be used with the 
following Pentium CPU instructions when they modify memory. An invalid-opcode exception 
results from using the LOCK prefix before any other instruction, or with these instructions 
when no write operation is made to memory (i.e., when the destination operand is in a 
register). 

• Bit test and change: the BTS, BTR, and BTC instructions. 

• Exchange: the XCHG, XADD, and CMPXCHG instructions (no LOCK prefix is needed 
for the XCHG instruction). 






• One-operand arithmetic and logical: the INC, DEC, NOT, NEG instructions. 

• Two-operand arithmetic and logical: the ADD, ADC, SUB, SBB, AND, OR, and XOR 
instructions. 

A locked instruction is guaranteed to lock only the area of memory defined by the destination 
operand, but may lock a larger memory area. For example, typical 8086 and Intel 286 
configurations lock the entire physical memory space. 

Unlike the 8086 and Intel 286 processors, the Intel386, Intel486 and Pentium processors 
respond to requests for control of the bus from other potential bus masters, such as DMA 
controllers, between transfers of parts of an unaligned operand, such as two words which form 
a doubleword. 



22.10. DIFFERENCES FROM Intel386 AND Intel486 CPU'S 

Real-address mode behavior is the same on the Intel386, Intel486, and Pentium processors. 
When the virtual mode extensions are disabled (VME bit in CR4 is set to zero), the virtual- 
8086 mode behavior of the Pentium processor is the same as on the Intel386 and Intel486 
processors. By enabling the virtual mode extensions (VME bit in CR4 is set to one), however, 
the virtual-8086 mode performance of the Pentium processor is significantly improved. See 
Appendix H for information on these extensions. For maximum performance, 
programs ported to the Pentium processor should be run with the cache enabled. 






CHAPTER 23 
COMPATIBILITY 



The Pentium CPU is fully binary compatible with the Intel486 DX and SX CPU's, the Intel386 
DX and SX CPU's, the Intel 286 CPU and the 8086/8088 CPU's. Compatibility means that, 
within certain limited constraints, programs that execute on any previous generations of 
compatible microprocessors will produce identical results when executed on the Pentium CPU. 
There are, however, slightly different implementations of architectural features. These 
limitations and any implementation differences are listed in this chapter. 

The Pentium processor also includes extensions to the registers, instruction set, and control 
functions of the Intel486 architecture just as the Intel486 CPU included extensions to the 
Intel386 CPU. Those extensions have been defined with consideration for compatibility with 
previous and future microprocessors. This chapter also summarizes the compatibility 
considerations for those extensions. 



23.1. RESERVED BITS 

Throughout this manual, certain bits are marked as reserved in many register and memory 
layout descriptions. When bits are marked as undefined or reserved, it is essential for 
compatibility with future processors that software treat these bits as having a future, though 
unknown effect. Software should follow these guidelines in dealing with reserved bits: 

• Do not depend on the states of any reserved bits when testing the values of registers or 
memory locations which contain such bits. Mask out the reserved bits before testing. 

• Do not depend on the states of any reserved bits when storing to memory or to a register. 

• Do not depend on the ability to retain information written into any reserved bits. 

• When loading a register, always load the reserved bits with the values indicated in the 
documentation, if any, or reload them with values previously stored from the same 
register. 
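
The guidelines above translate into two small masking idioms, sketched here in C. The function names and the `defined_mask` parameter (a mask with a 1 in every documented, non-reserved bit position of the register in question) are hypothetical.

```c
#include <stdint.h>

/* Compare only the defined bits of a register value; reserved bits are
   masked out before testing. */
int defined_bits_equal(uint32_t reg_value, uint32_t defined_mask,
                       uint32_t wanted)
{
    return (reg_value & defined_mask) == (wanted & defined_mask);
}

/* Build a value to load into a register: defined bits come from
   `new_bits`, reserved bits are reloaded from the value previously
   stored from the same register. */
uint32_t update_preserving_reserved(uint32_t stored, uint32_t defined_mask,
                                    uint32_t new_bits)
{
    return (stored & ~defined_mask) | (new_bits & defined_mask);
}
```

For example, software that changes one control bit would first store the register, pass the stored value and the new bit through `update_preserving_reserved`, and load the result, so that reserved bits are written back exactly as they were read.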

Depending on the values of reserved register bits will make software dependent upon the 
unspecified manner in which the Pentium processor handles these bits. Depending upon 
reserved values risks incompatibility with future processors. AVOID ANY SOFTWARE 
DEPENDENCE UPON THE STATE OF RESERVED PENTIUM PROCESSOR REGISTER 
BITS. 

Software written for an Intel386 or Intel486 CPU which uses reserved bits correctly will port 
to the Pentium processor without generating general-protection exceptions. 



23.2. INTEGER UNIT 

This section identifies the new features and the implementation differences of existing features 
in the integer unit, which includes added registers and flags, exception handling, memory 
management and protected mode features. 



23.2.1. New Functions and Modes 

New control functions defined for the Pentium processor are enabled by mode bits in newly 
defined registers, discussed below, that were not present in the Intel486 architecture. The 
instructions that are executed to read and write these new registers are undefined on the 
Intel486 processor, and an invalid opcode exception occurs when an attempt is made to 
execute one of these instructions on the Intel486 processor. Consequently, programs that 
execute correctly on the Intel486 processor cannot erroneously enable these functions. 
However, when an instruction is executed to write one of the new registers and an attempt is 
made to set a reserved bit to a value other than the original value, then a general protection 
exception occurs on the Pentium processor so programs that execute on the Pentium processor 
cannot erroneously enable functions that may be implemented in future processors. The 
Pentium processor does not check for attempts to set reserved bits in model-specific registers. 
It is the obligation of the software writer to enforce this discipline. These reserved bits may be 
used in future Intel processors. 



23.2.2. Serializing Instructions 

Certain instructions have been defined to serialize instruction execution to ensure that 
modifications to flags, registers and memory are completed before the next instruction is 
fetched and executed. Because the Pentium processor uses branch-prediction techniques to 
improve performance, instruction execution is not generally serialized when a branch 
instruction is executed. As a result, branch instructions do not necessarily flush the prefetch 
queue on the Pentium processor and serializing instructions should replace branch instructions 
used for this purpose. Refer to Chapter 18 and the Pentium™ Processor Data Book for more 
information on serializing instructions. For more information on branch prediction, see 
Appendix H. 



23.2.3. Detecting the Presence of New Features 

As the Pentium processor provides extensions to the architecture of the Intel486 processor, 
other models within the processor family have provided both extensions to previous models 
and features specific to that model (such as testability functions). Consequently, software that 
wishes to use the extensions or specific features must identify on which model it is executing 
to determine what features are available. Programmers have developed code sequences that 
can be executed to distinguish between the 8086, Intel 286, Intel386, and Intel486 
microprocessors. The code sequences commonly test which bits in the processor's FLAGS 
register are implemented. (For an example see Chapter 5.) The CPUID instruction has been 
defined to provide a straightforward way for software to identify what family, model and 
stepping of processor it is running on. This can be accomplished as follows: 

1. One of the code sequences described in Chapter 5 can be executed to determine that the 
software is executing on an Intel486 CPU or a later model that implements a superset of 
the Intel486 architecture. (This is typically done by testing the ability to change the value 
of the AC flag.) 

2. Having determined that the processor is "at least" an Intel486 processor, a software 
sequence can test whether it is able to change the value of the ID bit. If software is able to 
change the value of the ID bit, then the processor supports the CPUID instruction. 

3. The sequence can then continue by executing the CPUID instruction. In order to use a 
particular architecture extension, software should check that the appropriate feature bit 
returned by this instruction is set. Refer to this instruction in Chapter 25 for more 
information about its operation. 
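
The decision logic of steps 1 and 2 can be sketched in C without inline assembly by hiding the EFLAGS access behind two function pointers. Everything named here is hypothetical: `eflags_ops`, `classify_cpu`, and the `sim_*` functions, which simulate an EFLAGS register whose writable bits are given by `sim_writable` so that the logic can be tried on any host. Real code would supply routines that read and write EFLAGS (for example with PUSHFD and POPFD). The AC flag is bit 18 and the ID flag is bit 21 of EFLAGS.

```c
#include <stdint.h>

#define EFLAGS_AC (1u << 18)
#define EFLAGS_ID (1u << 21)

/* Hypothetical accessors for the EFLAGS register. */
struct eflags_ops {
    uint32_t (*read)(void);
    void (*write)(uint32_t value);
};

enum cpu_class { CPU_PRE_486, CPU_486_NO_CPUID, CPU_HAS_CPUID };

/* Tries to flip one flag bit, restores the old value, and reports
   whether the change was observed. */
static int can_toggle(const struct eflags_ops *ops, uint32_t bit)
{
    uint32_t before = ops->read();
    ops->write(before ^ bit);
    uint32_t after = ops->read();
    ops->write(before);
    return ((before ^ after) & bit) != 0;
}

enum cpu_class classify_cpu(const struct eflags_ops *ops)
{
    if (!can_toggle(ops, EFLAGS_AC))
        return CPU_PRE_486;          /* AC flag not implemented */
    if (!can_toggle(ops, EFLAGS_ID))
        return CPU_486_NO_CPUID;     /* no CPUID instruction */
    return CPU_HAS_CPUID;            /* safe to execute CPUID (step 3) */
}

/* Simulated register for exercising the logic on any host. */
static uint32_t sim_eflags = 0x00000002u;
static uint32_t sim_writable;

static uint32_t sim_read(void) { return sim_eflags; }

static void sim_write(uint32_t v)
{
    sim_eflags = (sim_eflags & ~sim_writable) | (v & sim_writable);
}
```

Only after `classify_cpu` reports `CPU_HAS_CPUID` should software execute the CPUID instruction and examine the feature bits it returns.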



23.2.4. Undefined Opcodes 

All new instructions defined for the Pentium processor use binary encodings for which the 
invalid opcode exception occurs when an attempt is made to execute these instructions on the 
Intel486 processor. Consequently, programs that execute correctly on the Intel486 processor 
cannot erroneously execute these instructions and thereby produce unexpected results. 

23.2.5. Clock Counts 

Each processor takes fewer clocks for most instructions than earlier processors. The areas most 
likely to be affected include: 

• Delays required by I/O devices between I/O operations. 

• Assumed delays with 8086 processor operating in parallel with an 8087. 

23.2.6. Initialization and Reset 

This section identifies the state of the integer and floating-point units for the various 
microprocessors and numeric processor extensions. 

23.2.6.1. INTEGER UNIT INITIALIZATION AND RESET 

Table 23-1 identifies the values of the integer unit registers following hardware reset for the 
32-bit Intel x86 microprocessors. These values are the same regardless of whether the Built-in 
Self Test (BIST) is invoked. 

23.2.6.2. FPU/NPX INITIALIZATION AND RESET 

The Pentium and Intel486 processors, following RESET, contain 0 in the ST0-ST7 stack registers 
with the tags set to valid (10) (but visible to the programmer as 01 via FSAVE/FSTENV). The 
Pentium processor, in addition, has an INIT pin which, when asserted, causes the processor to 
reset without altering the FPU state. The state of the Intel486 FPU is left unchanged when the 
Built-in Self Test (BIST) is not requested during RESET. 







Table 23-1. Processor State Following Power-Up 

Register                      Pentium™ CPU      Intel486™ CPU     Intel386™ CPU
----------------------------  ----------------  ----------------  ----------------
EFLAGS (1)                    00000002H         00000002H         00000002H
EIP                           0000FFF0H         0000FFF0H         0000FFF0H
CR0                           60000010H         60000010H         7FFFFFE0H
CR2                           00000000H         00000000H         00000000H
CR3                           00000000H         00000000H         00000000H
CR4                           00000000H         00000000H         00000000H
CS                            0F000H            0F000H            0F000H
                              base=0FFFF000H    base=0FFFF000H    base=0FFFF000H
                              limit=0FFFFH      limit=0FFFFH      limit=0FFFFH
                              AR=00000093H      AR=0FF3F93FFH     AR=0FF3F93FFH
SS, DS, ES, FS, GS            0000              0000              0000
                              base=00000000H    base=00000000H    base=00000000H
                              limit=0FFFFH      limit=0FFFFH      limit=0FFFFH
                              AR=00000093H      AR=0FF3F93FFH     AR=0FF3F93FFH
EDX                           0000x5xxH         0000x4xxH         00000308H
EAX                           0 (2)             0 (2)             0C51BB653H
EBX, ECX, ESI, EDI, EBP, ESP  00000000H         00000000H         00000000H
GDTR, LDTR                    00000000          xxxx0000          00000000
                              base=00000000H    base=00000000H    base=00000000H
                              limit=0FFFFH      limit=0FFFFH      limit=0FFFFH
                              AR=00000082H      AR=0FFFFFFFFH     AR=0FFFFFFFFH
IDTR                          00000000          xxxx0000          00000000
                              base=00000000H    base=00000000H    base=00000000H
                              limit=0FFFFH      limit=0FFFFH      limit=0FFFFH
                              AR=00000082H      AR=0FFFFFFFFH     AR=0FFFFFFFFH
DR0, DR1, DR2, DR3            00000000H         00000000H         00000000H
DR6                           FFFF0FF0H         FFFF1FF0H         FFFF1FF0H
DR7                           00000400H         00001400H         00001400H
Time Stamp Counter                              NA (3)            NA
Control and Event Select                        NA                NA
TR12                                            NA                NA
All Other MSR's               Undefined         NA                NA
Data and Code Cache           Invalid           Invalid           NA
TLB(s)                        Invalid           Invalid           NA





NOTES: 

1. The high ten (14 for the Intel486 and Intel386 CPU's) bits of the EFLAGS register are undefined 
following power-up. Undefined bits are reserved. Software should not depend on the states of any of 
these bits. 

2. If Built-in Self Test is invoked, EAX is 0 only if all tests passed. 

3. Not Applicable. 

Upon hardware RESET on a system with an Intel386 CPU and an Intel387 math coprocessor, 
the floating-point registers will remain unchanged unless the BIST is requested. When the 
BIST is requested, hardware RESET has almost the same effect as the FINIT instruction; the 
only difference is that FINIT leaves the stack registers unchanged, while hardware RESET 
with BIST resets them to 0. This could show up as a difference in the value of the tag word 
observed after the FSAVE/FSTENV instructions are executed. The FINIT instruction clears 
both the data and instruction error pointers. 

Following an Intel386 processor reset, the processor identifies the type of its coprocessor 
(Intel287 or Intel387 DX math coprocessor) by sampling its ERROR# input some time after 
the falling edge of RESET and before execution of the first ESC instruction. The Intel287 
coprocessor keeps its ERROR# output in inactive state after hardware reset; the Intel387 
coprocessor keeps its ERROR# output in active state after hardware reset. Upon hardware 
RESET or FINIT, the Intel387 math coprocessor signals an error condition. The Pentium and 
Intel486 processors, like the Intel287 coprocessor, do not. Table 23-2 provides a summary of 
the differences between the Intel386, Intel486, and Pentium FPUs following power-up.



Table 23-2. FPU and NPX State Following Power-Up

Register               Pentium™ FPU     Intel486™ FPU    Intel387™ NPX

Control Word           0040H            037FH            037FH

Status Word            0000H            0000H            0000H

Tag Word               5555H            0FFFFH           0FFFFH

IP Offset              00000000H        00000000H        00000000H

Data Operand Offset    00000000H        00000000H        00000000H

CS Selector            0000H            0000H            0000H

Operand Selector       0000H            0000H            0000H

FSTACK                 All zeroes       All zeroes       All zeroes



NOTES: 

The state of the FPU is left unchanged on the Pentium following INIT. The state of the FPU is left
unchanged on the Intel486 FPU following RESET without BIST.



23.2.6.3. Intel486 SX MICROPROCESSOR AND Intel487 SX MATH 
COPROCESSOR INITIALIZATION 

This interface is designed for two distinct sockets: one for the Intel486 SX CPU and one for 
end-user/dealer upgrade with the Intel487 SX math coprocessor. Refer to the Intel486™ SX
Microprocessor/Intel487™ SX Math CoProcessor Data Book for more details. The following
should be considered when designing an Intel486 SX CPU/Intel487 SX MCP system. 






1. The timing loops should be independent of clock speed and clocks per instruction. One 
way to attain this is to implement these loops in hardware and not in software (e.g., BIOS). 

2. The initialization routine should check the presence of a math coprocessor (e.g., Intel487
SX math coprocessor) and should set the floating-point related bits in the CR0 register
accordingly (see Chapter 10 for a complete description of these bits). The recommended
bit pattern is given in Table 23-3. The FSTCW instruction will give a value of FFFFh for
the Intel486 SX microprocessor and 037Fh for the Intel487 SX math coprocessor.



Table 23-3. Recommended Values of the FP Related Bits for Intel486™ SX
Microprocessor/Intel487™ SX Math Coprocessor System

CR0 Bit    Intel486™ SX Microprocessor    Intel487™ SX Math Coprocessor

EM         1                              0

MP         0                              1

NE         1                              0, for DOS systems
                                          1, for user-defined exception handler



Following is an example code sequence to initialize the system and check for the presence of 
Intel486 SX microprocessor/Intel487 SX math coprocessor. Refer to Chapter 5 for complete 
CPU and coprocessor identification information. 



    fninit
    fstcw   mem_loc
    mov     ax, mem_loc
    cmp     ax, 037fh
    jz      Intel487_SX_Math_CoProcessor_present    ;ax=037fh
    jmp     Intel486_SX_microprocessor_present      ;ax=ffffh

If the Intel487 SX math coprocessor is not present, the following code can be run to set the
CR0 register for the Intel486 SX microprocessor.

    mov     eax, cr0
    and     eax, 0fffffffdh     ;make MP=0
    or      eax, 0024h          ;make EM=1 and NE=1
    mov     cr0, eax

The above initialization will cause any floating-point instruction to generate interrupt 7.
The software emulation will then take control to execute these instructions. This code is not
required if an Intel487 SX math coprocessor is present in the system; in that case the typical
initialization routine for the Intel486 SX microprocessor will be adequate.
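The bit arithmetic performed by the AND/OR pair above can be checked away from the hardware. The following is a minimal C sketch, not part of the original manual: the helper name cr0_for_emulation is hypothetical, and the masks assume the CR0 layout in which MP is bit 1, EM is bit 2, and NE is bit 5.

```c
#include <stdint.h>

/* CR0 bit masks (assumed layout: MP = bit 1, EM = bit 2, NE = bit 5). */
#define CR0_MP 0x00000002u
#define CR0_EM 0x00000004u
#define CR0_NE 0x00000020u

/* Hypothetical helper: computes the CR0 value for an Intel486 SX system
 * without an Intel487 SX, mirroring the assembly sequence in the text. */
uint32_t cr0_for_emulation(uint32_t cr0)
{
    cr0 &= ~CR0_MP;            /* and eax, 0fffffffdh : MP=0        */
    cr0 |= CR0_EM | CR0_NE;    /* or  eax, 0024h      : EM=1, NE=1  */
    return cr0;
}
```

Applied to the power-up CR0 value of 60000010H listed in Table 23-1, this helper yields 60000034H.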

The interpretation of different combinations of the EM and MP bits is shown in Table 23-4. 






Table 23-4. EM and MP Bits Interpretations

EM    MP    Interpretation

0     0     Numeric instructions are passed to FPU; WAIT ignores TS

0     1     Numeric instructions are passed to FPU; WAIT tests TS

1     0     Numeric instructions trap to emulator; WAIT ignores TS

1     1     Numeric instructions trap to emulator; WAIT tests TS
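The four rows of Table 23-4 reduce to two independent decisions, one per bit. The sketch below expresses that in C for illustration only; the type fp_policy and the function fp_policy_from_cr0 are hypothetical names, and the bit positions (MP = bit 1, EM = bit 2) are assumed from the CR0 layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of how numeric instructions and WAIT are treated. */
typedef struct {
    bool numeric_traps_to_emulator;  /* EM=1: numeric instructions trap     */
    bool wait_tests_ts;              /* MP=1: WAIT tests the TS bit         */
} fp_policy;

/* Decode the EM (bit 2) and MP (bit 1) bits of a CR0 value. */
fp_policy fp_policy_from_cr0(uint32_t cr0)
{
    fp_policy p;
    p.numeric_traps_to_emulator = (cr0 & 0x4u) != 0;
    p.wait_tests_ts             = (cr0 & 0x2u) != 0;
    return p;
}
```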



23.2.7. New Instructions 

This section identifies the introduction of new instructions for the 32-bit microprocessors. 

23.2.7.1. NEW PENTIUM PROCESSOR INSTRUCTIONS

The Pentium processor introduces three new application instructions: 

• CMPXCHG8B instruction 

• CPUID instruction 

• RDTSC instruction — For more information on RDTSC, see Appendix H. 

There are four new system instructions, used for reading from and writing to the new control 
register (CR4) and model specific registers, and resuming from system management mode: 

• MOV CR4, r32 and MOV r32, CR4 

• RDMSR 

• WRMSR 

• RSM 

The form of the MOV instruction used to access the test registers has been removed on the 
Pentium processor. New test registers have been defined for the cache, the TLB's and the BTB 
which are accessed through the model-specific registers on the Pentium processor. For more 
information on the test registers used with the RDMSR and WRMSR instructions, see 
Appendix H. 

23.2.7.2. NEW Intel486 PROCESSOR INSTRUCTIONS 

The Intel486 CPU introduced three new application instructions: 

• BSWAP instruction 

• XADD instruction 

• CMPXCHG instruction 

Three new system instructions, used for managing the cache and TLB, were introduced: 






• INVD instruction 

• WBINVD instruction 

• INVLPG instruction 

23.2.7.3. NEW Intel386 PROCESSOR INSTRUCTIONS 

New instructions introduced on the Intel386 processor include: 

• LSS, LFS, LGS instructions

• Long-displacement conditional jumps

• Single-bit instructions

• Bit scan instructions

• Double-shift instructions

• Byte set on condition instruction

• Move with sign/zero extension

• Generalized multiply instruction

• MOV to and from control registers

• MOV to and from test registers (now obsolete)

• MOV to and from debug registers

23.2.8. Obsolete Instructions 

The following instructions are no longer supported:

• MOV to and from test registers (removed from the Pentium processor).
Execution of these instructions generates an invalid opcode fault.

23.2.9. Flags 

This section discusses the flag bits added to the EFLAGS register, as shown in Figure 23-1.



Bit      Flag

21       ID FLAG (ID)
20       VIRTUAL INTERRUPT PENDING (VIP)
19       VIRTUAL INTERRUPT FLAG (VIF)
18       ALIGNMENT CHECK (AC)
17       VIRTUAL 8086 MODE (VM)
16       RESUME FLAG (RF)
14       NESTED TASK (NT)
13-12    I/O PRIVILEGE LEVEL (IOPL)
11       OVERFLOW FLAG (OF)
10       DIRECTION FLAG (DF)
9        INTERRUPT ENABLE FLAG (IF)
8        TRAP FLAG (TF)
7        SIGN FLAG (SF)
6        ZERO FLAG (ZF)
4        AUXILIARY CARRY FLAG (AF)
2        PARITY FLAG (PF)
0        CARRY FLAG (CF)

Bit positions shown as 0 or 1 are Intel reserved.
Do not use. Always set them to the value previously read.

Pentium™ processor flag additions: ID, VIP, VIF
Intel486™ processor flag additions: AC

Figure 23-1. Pentium™ Processor EFLAGS Register



23.2.9.1. NEW PENTIUM PROCESSOR FLAGS

The Pentium processor adds the following three bits to the EFLAGS register:

• VIF — For more information, see Appendix H. 

• VIP — For more information, see Appendix H. 

• ID — The ability to set and clear the IDentification Flag indicates that the processor 
supports the CPUID instruction. 

23.2.9.2. NEW Intel486 PROCESSOR FLAGS 

The AC flag (bit position 18), in conjunction with the AM bit in the CR0 register, controls
alignment checking.

23.2.10. Control Registers 

This section identifies the addition of new control registers and control register bits in the
32-bit Intel x86 microprocessors. See Figure 23-2 for extensions to the control registers for the
Intel486 and Pentium processors. These extensions are discussed further in the following
subsections.



CR4
CR3    PAGE DIRECTORY BASE
CR2    PAGE FAULT LINEAR ADDRESS
CR1
CR0

Shading in the figure distinguishes three kinds of bits: Intel reserved bits (do not depend on
the state of these bits), Pentium™ processor control register extensions, and Intel486™
processor control register extensions.

Figure 23-2. Control Register Extensions



23.2.10.1. PENTIUM PROCESSOR CONTROL REGISTERS 

The recommended values for the CD and NW bits in CR0 (00) implement a write-back
strategy for the data cache of the Pentium processor. On the Intel486 processor, these values
implement a write-through strategy. See Table 23-5 for a comparison of these bits on the
Intel486 and Pentium processors. For complete information on caching, refer to Chapter 18.

One new Control Register (CR4) is defined. CR4 contains bits that enable certain extensions to 
the Intel486 architecture provided in the Pentium processor. These include: 



• VME — For more information, see Appendix H.

• PVI — For more information, see Appendix H.

• TSD — For more information, see Appendix H.

• DE — While this bit is 1, Debugging Extensions are enabled, providing support for I/O
breakpoints. Refer to Chapter 17 for more information.

• PSE — For more information, see Appendix H.

• MCE — While this bit is 1, Machine Check Exceptions are enabled, allowing exception
handling for certain hardware error conditions. Refer to the Pentium™ Processor Data
Book for more information.

The content of CR4 is zero following reset. 
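Software that manipulates CR4 usually does so through named masks. The following C sketch shows one possible set of definitions; it is an illustration rather than text from the manual. The bit positions (VME = 0, PVI = 1, TSD = 2, DE = 3, PSE = 4, MCE = 6) are stated here as an assumption, since this section defers to Appendix H for most of these bits, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed Pentium CR4 bit assignments; bit 5 is not used here. */
#define CR4_VME 0x01u   /* bit 0 */
#define CR4_PVI 0x02u   /* bit 1 */
#define CR4_TSD 0x04u   /* bit 2 */
#define CR4_DE  0x08u   /* bit 3: Debugging Extensions      */
#define CR4_PSE 0x10u   /* bit 4 */
#define CR4_MCE 0x40u   /* bit 6: Machine Check Exceptions  */

/* Hypothetical helper: return a CR4 image with the given feature bits set. */
uint32_t cr4_enable(uint32_t cr4, uint32_t mask)
{
    return cr4 | mask;
}

/* Hypothetical helper: test whether every bit in mask is set. */
bool cr4_is_enabled(uint32_t cr4, uint32_t mask)
{
    return (cr4 & mask) == mask;
}
```

Because CR4 is zero following reset, every extension starts out disabled, and a system enables only the features it has handlers for.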

23.2.10.2. Intel486 PROCESSOR CONTROL REGISTERS

Five new bits are defined in the CR0 register for the Intel486 processor:

• NE — The Numeric Error bit enables the standard mechanism for reporting floating-point 
numeric errors. 

• WP — The Write Protect bit write-protects user-level pages against supervisor-mode 
accesses. 

• AM — The Alignment Mask bit, in conjunction with the AC (Alignment Check) flag, 
controls whether alignment checking is performed. 

• NW — The Not Write-through bit enables write-throughs and cache invalidation cycles 
when clear and disables invalidation cycles and write-throughs which hit in the cache 
when set. 

• CD — The Cache Disable bit enables the internal cache when clear and disables the cache 
when set. 

Two new bits have been defined in the CR3 register:

• PCD — The state of the Page-Level Cache Disable bit is driven on the PCD pin during
bus cycles which are not paged, such as interrupt acknowledge cycles, when paging is
enabled. The PCD pin is used to control caching in an external cache on a cycle-by-cycle
basis.

• PWT — The state of the Page-Level Write Through bit is driven on the PWT pin during
bus cycles which are not paged, such as interrupt acknowledge cycles, when paging is
enabled. The PWT pin is used to control write-through in an external cache on a
cycle-by-cycle basis.

Table 23-5. Cache Mode Differences Between the Pentium™ and Intel486™ Processors

CD=0, NW=0

    Pentium™ CPU Description:
        Normal highest performance cache operation.
        Read hits access the cache.
        Read misses may cause replacements. These lines will enter the Exclusive or
        Shared state under the control of the WB/WT# pin.
        Write hits update the cache.
        Only writes to Shared lines and write misses appear externally.
        Writes to Shared lines can be changed to the Exclusive state under the control
        of the WB/WT# pin.
        Invalidations are allowed.

    Intel486™ CPU Description:
        Normal highest performance cache operation.
        Read hits access the cache.
        Read misses may cause replacements.
        Write hits update the cache.
        All writes appear externally.
        Invalidations are allowed.

CD=0, NW=1

    Pentium™ CPU Description:
        Invalid Operation. GP(0)

    Intel486™ CPU Description:
        Invalid Operation. GP(0)

CD=1, NW=0

    Pentium™ CPU Description:
        Cache disabled. Memory consistency maintained. Contents locked in cache.
        Read hits access the cache.
        Read misses do not cause replacement.
        Write hits update the cache.
        Only writes to Shared lines and write misses update external memory.
        Writes to Shared lines can be changed to the Exclusive state under the control
        of the WB/WT# pin.
        Invalidations are allowed.

    Intel486™ CPU Description:
        Cache disabled. Memory consistency maintained. Contents locked in cache.
        Read hits access the cache.
        Read misses do not cause replacement.
        Write hits update the cache.
        All writes update external memory.
        Invalidations are allowed.

CD=1, NW=1

    Pentium™ CPU Description:
        Cache disabled. Memory consistency not maintained.
        Read hits access the cache.
        Read misses do not cause replacement.
        Write hits update the cache, but do not access memory.
        Write hits will cause Exclusive state lines to change to Modified state.
        Shared lines will remain in the Shared state after write hits.
        Write misses access memory.
        Inquire and Invalidation Cycles do not affect the cache state or contents.
        This is the state after reset.

    Intel486™ CPU Description:
        Cache disabled. Memory consistency not maintained.
        Read hits access the cache.
        Read misses do not cause replacement.
        Write hits update the cache, but do not access memory.
        Write misses access memory.
        Inquire and Invalidation Cycles do not affect the cache state or contents.
        This is the state after reset.
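The CD/NW combinations of Table 23-5 can be classified with a small lookup. The C sketch below is illustrative only: the enum and function names are hypothetical, and the bit positions (CD = bit 30, NW = bit 29 of CR0) are an assumption consistent with the power-up CR0 value of 60000010H in Table 23-1.

```c
#include <stdint.h>

/* Hypothetical labels for the four rows of Table 23-5. */
typedef enum {
    CACHE_NORMAL,                   /* CD=0, NW=0 */
    CACHE_INVALID_GP0,              /* CD=0, NW=1: invalid, GP(0)            */
    CACHE_DISABLED_CONSISTENT,      /* CD=1, NW=0: consistency maintained    */
    CACHE_DISABLED_NOT_CONSISTENT   /* CD=1, NW=1: state after reset         */
} cache_mode;

/* Classify a CR0 value by its CD (bit 30) and NW (bit 29) bits. */
cache_mode cache_mode_from_cr0(uint32_t cr0)
{
    unsigned cd = (cr0 >> 30) & 1u;
    unsigned nw = (cr0 >> 29) & 1u;

    if (cd == 0u)
        return nw == 0u ? CACHE_NORMAL : CACHE_INVALID_GP0;
    return nw == 0u ? CACHE_DISABLED_CONSISTENT : CACHE_DISABLED_NOT_CONSISTENT;
}
```

The same classification applies to both processors; what differs is the write policy behind CACHE_NORMAL, which is write-back on the Pentium processor and write-through on the Intel486 processor.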

23.2.11. Debug Registers 

The Pentium processor includes extensions to the Intel486 processor debugging support for 
breakpoints on I/O references. To use the new breakpoint features, it is necessary to set 
CR4.DE to 1. 

23.2.11.1. DIFFERENCES IN DR6

It is not possible to write a 1 to reserved bit 12 in DR6 on the Pentium processor. However, on 
the Intel486 processor, it is possible to write a 1 in bit position 12. 

See "Initialization Values" in this chapter for differences of this register at processor reset. 

23.2.11.2. DIFFERENCES IN DR7

The Pentium processor determines the type of breakpoint access by the bits R/W0 to R/W3 in
DR7 as follows:

00 Break on instruction execution only

01 Break on data writes only

10 Undefined if CR4.DE=0; break on I/O reads or writes but not instruction fetches if
CR4.DE=1

11 Break on data reads or writes but not instruction fetches
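The R/W encoding above, including its dependence on CR4.DE, can be captured in a short decoder. This C sketch is an illustration, not part of the manual: the names are hypothetical, and it assumes the R/Wn field for breakpoint n occupies the two DR7 bits starting at bit 16 + 4n.

```c
#include <stdint.h>

/* Hypothetical labels for the breakpoint conditions listed in the text. */
typedef enum {
    BP_EXECUTE,          /* 00: instruction execution only          */
    BP_DATA_WRITE,       /* 01: data writes only                    */
    BP_IO_READ_WRITE,    /* 10 with CR4.DE=1: I/O reads or writes   */
    BP_UNDEFINED,        /* 10 with CR4.DE=0                        */
    BP_DATA_READ_WRITE   /* 11: data reads or writes                */
} bp_type;

/* Decode the R/Wn field of DR7 for breakpoint n (0..3).
 * cr4_de is nonzero when Debugging Extensions are enabled. */
bp_type dr7_breakpoint_type(uint32_t dr7, unsigned n, int cr4_de)
{
    unsigned rw = (dr7 >> (16u + 4u * n)) & 0x3u;

    switch (rw) {
    case 0u:  return BP_EXECUTE;
    case 1u:  return BP_DATA_WRITE;
    case 2u:  return cr4_de ? BP_IO_READ_WRITE : BP_UNDEFINED;
    default:  return BP_DATA_READ_WRITE;
    }
}
```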

On the Pentium processor, reserved bits 11, 12, 14 and 15 are hard-wired to 0. On the
Intel486 CPU, however, bit 12 can be set.

See "Initialization Values" above for differences of this register at processor reset. 

23.2.11.3. DEBUG REGISTERS 4 AND 5

Although the DR4 and DR5 registers have been documented as "Reserved", previous 
generations of processors aliased references to these registers to Debug Registers 6 and 7, 
respectively. When Debug Extensions are not enabled (CR4.DE=0), the Pentium processor 
remains compatible with existing software by allowing these aliased references. However, 
when Debug Extensions are enabled (CR4.DE=1), attempts to reference DR4 or DR5 will 
result in an invalid opcode exception. 
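The aliasing rule for DR4 and DR5 can be summarized as a mapping from the register number named in the instruction to the register actually accessed. The C sketch below is a model for illustration; the function name and the use of -1 to stand for the invalid opcode exception are hypothetical conventions.

```c
/* Model of debug register references on the Pentium processor.
 * n is the register number named in the MOV instruction (0..7);
 * cr4_de is nonzero when Debug Extensions are enabled.
 * Returns the register number actually accessed, or -1 to represent
 * an invalid opcode exception. */
int resolve_debug_register(int n, int cr4_de)
{
    if (n == 4 || n == 5) {
        if (cr4_de)
            return -1;     /* CR4.DE=1: DR4/DR5 references fault        */
        return n + 2;      /* CR4.DE=0: DR4 -> DR6, DR5 -> DR7 (alias)  */
    }
    return n;              /* all other registers map to themselves     */
}
```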

23.2.12. Test Registers 

The test registers used on the Intel486 CPU for testing the cache and TLB have been
reimplemented as model specific registers (discussed below) on the Pentium processor. The MOV
to and from test register instructions generate invalid opcode exceptions on the Pentium
processor. For more information on the use of the test registers, see Appendix H.



23.2.13. Model Specific Registers 

Certain features of the Pentium processor that are described in the Pentium™ Processor Data 
Book are specific to the Pentium processor and may not be continued in the same way in future 
processors. Examples are functions for testability, performance monitoring, and machine check 
errors. These features are accessed through Model Specific Registers. The new instructions 
RDMSR and WRMSR are used to read and write these registers. In order to use such model- 
specific features, software should check that the "Family" number reported by the CPUID 
instruction is equal to 5. Software which uses these registers and functions may be 
incompatible with future processors. For more information, see Appendix H. 

Refer to the Pentium™ Processor Data Book for more information. 
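The family check recommended above amounts to extracting one four-bit field. The C sketch below assumes the processor signature has already been obtained (for example, from EAX after executing CPUID with EAX=1) and that the family number occupies bits 11 through 8 of that signature; both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract the family field, assumed to be bits 11..8 of the signature. */
unsigned cpu_family_from_signature(uint32_t signature)
{
    return (signature >> 8) & 0xFu;
}

/* Guard for model-specific features: use them only when family == 5. */
bool pentium_msr_features_usable(uint32_t signature)
{
    return cpu_family_from_signature(signature) == 5u;
}
```

A caller would test pentium_msr_features_usable() once at start-up and fall back to a generic path when it returns false, so that the model-specific code is never reached on another processor family.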



23.2.14. Exceptions 

This section identifies the introduction of new exceptions in the 32-bit microprocessor family 
and implementation differences in existing exception handling. 

23.2.14.1. NEW PENTIUM PROCESSOR EXCEPTIONS 

The Pentium processor includes the following extensions and conditions to the Intel486 
architecture for exceptions and interrupts: 

• Exception #13 — A General-Protection exception occurs when an attempt is made to 
write 1 to a reserved bit position of a special register. 

• Exception #14 — A Page Fault exception occurs when a 1 is detected in any of the 
reserved bit positions of a page table entry, page directory entry, or page directory pointer 
during address translation by the Pentium processor. 

• Exception #18 — A Machine Check Exception is newly defined for reporting parity errors 
and other hardware errors. This is a model-specific exception and may not be 
implemented the same in future processors. For compatibility reasons, the MCE bit in the 
CR4 register acts as the machine check enable bit. When this bit is clear (which it is at 
reset), the processor inhibits generation of the machine check abort. In the event that a 
system is using the machine check interrupt vector for another purpose and the MCE bit in 
CR4 is set, the interrupt routine must examine the state of the CHK bit in the model- 
specific Machine Check Type register to determine the cause of the interrupt. See the 
Pentium™ Processor Data Book for more information on the Machine Check Type 
register and model-specific registers. 

See Chapter 14 for details on exceptions and interrupts. 






23.2.14.2. NEW Intel486 PROCESSOR EXCEPTIONS 

The Intel486 processor includes the following extensions and conditions to the Intel386
architecture for exceptions and interrupts:

• Exception #17 — An Alignment Check exception reports unaligned memory references 
when alignment checking is being performed. 

23.2.14.3. NEW Intel386 PROCESSOR EXCEPTIONS

The Intel386 processor introduced new conditions which can occur even in systems designed 
for the Intel 286 processor. 

• Exception #6 — The Invalid Opcode exception can result from improper use of the LOCK 
instruction prefix. 

• Exception #14 — A Page Fault exception can occur in a 16-bit program if the operating 
system enables paging. Paging can be used in a system with 16-bit tasks if all tasks use the 
same page directory. Because there is no place in a 16-bit TSS to store the PDBR register, 
switching to a 16-bit task does not change the value of the PDBR register. Tasks ported 
from the Intel 286 processor should be given 32-bit TSSs so they can make full use of 
paging. 

• Exception #13 — The Intel386 processor set a limit of 15 bytes on instruction length. The 
only way to violate this limit is by putting redundant prefixes before an instruction. A 
general-protection exception is generated if the limit on instruction length is violated. The 
8086 processor has no instruction length limit. 

23.2.14.4. INTERRUPT PROPAGATION DELAY 

External hardware interrupts on the Pentium processor may be recognized on different 
instruction boundaries due to the pipelined execution of the Pentium processor and possibly an 
extra instruction passing through the v-pipe concurrently with an instruction in the u-pipe. 
When the two instructions complete execution, the interrupt is then serviced. Therefore, the
EIP pushed onto the stack when servicing the interrupt on the Pentium processor may be
different than that for the Intel486 processor (i.e., it is serviced later).

23.2.14.5. PRIORITY OF EXCEPTIONS 

Exception priorities are broken down into several major categories:

• Traps on the previous instruction 

• External interrupts 

• Faults on fetching the next instruction 

• Faults in decoding the next instruction 

• Faults on executing an instruction 

There are no changes in the priority of these major categories between the different processors;
however, exceptions within these categories are implementation dependent and may change
from processor to processor. To obtain information on exception priority within these
categories, see Appendix H.

23.2.14.6. DIVIDE-ERROR EXCEPTIONS

Divide-error exceptions on the Pentium, Intel486, and Intel386 processors always leave the 
saved CS:IP value pointing to the instruction which failed. On the 8086 processor, the CS:IP 
value points to the next instruction. 

The Pentium, Intel486, and Intel386 processors can generate the largest negative number as a 
quotient for the IDIV instruction (80H and 8000H). The 8086 processor generates a divide- 
error exception instead. 
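The difference in IDIV quotient range can be made concrete with a small model of the 8-bit form (AX divided by an 8-bit operand, quotient in AL). This C sketch is an illustration under that assumption; the function name is hypothetical and no real divide instruction is executed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the 8-bit IDIV quotient check.
 * Returns true when a divide-error exception would be generated.
 * is_8086 selects the 8086 rule, under which the largest negative
 * quotient (80H, i.e. -128) is rejected rather than returned. */
bool idiv8_divide_error(int16_t dividend, int8_t divisor, bool is_8086)
{
    if (divisor == 0)
        return true;                       /* division by zero always faults */

    int quotient = (int)dividend / (int)divisor;   /* truncates toward zero */
    int min_quotient = is_8086 ? -127 : -128;

    return quotient < min_quotient || quotient > 127;
}
```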

23.2.14.7. WRITES USING THE CS REGISTER PREFIX 

Following a switch from protected mode to real-address mode, the Intel486 processor requires 
the coding of a far jump control flow instruction prior to performing a write using the CS 
segment register prefix (for example: MOV CS:[0], EAX). The far jump in protected mode on 
the Intel486 processor reloads the CS access rights to be writable. If this requirement is not 
met, a general protection exception occurs. This requirement has been eliminated on the 
Pentium processor which leaves the access rights unchanged and ignores code segment access 
right protection checks in real-address mode. As a result, the code segment register can be 
used as a prefix in a write operation in real-address mode without generating an exception. For 
upwards and downwards compatibility, however, programmers may wish to include the far 
jump instruction prior to any writes to the code segment in real-address mode. 

The code segment cannot be written to in protected mode on either the Intel486 or Pentium
processor.

23.2.14.8. NMI INTERRUPTS 

After an NMI interrupt is recognized by the Intel 286, Intel386, Intel486 and Pentium 
processors, the NMI interrupt is masked until the first IRET instruction is executed, unlike the 
8086 processor. 

23.2.14.9. INTERRUPT VECTOR TABLE LIMIT 

The LIDT instruction can be used to set a limit on the size of the interrupt vector table. The 
double fault exception is generated if an interrupt or exception attempts to read a vector 
beyond the limit. Shutdown then occurs on the 32-bit Intel x86 processors if the double fault 
handler vector is beyond the limit. (The 8086 processor does not have a shutdown mode nor a 
limit.) 



23.2.15. Descriptor Types and Contents 

Operating-system code which manages space in descriptor tables often writes an invalid
value into the access-rights field of descriptor-table entries to identify unused entries.
Access rights values of 80H and 00H remain invalid for the Intel 286, Intel386, Intel486, and
Pentium processors. Other values which were invalid on the Intel 286 processor may be valid on
the 32-bit processors because uses for these bits have been defined.



23.2.16. Changes in Segment Descriptor Loads 

On the Intel386 processor, loading a segment descriptor always causes a locked read and
write to set the accessed bit of the descriptor. On the Pentium and Intel486 processors, the
locked read and write occur only if the bit is not already set.



23.2.17. Task Switching and Task State Segments 

This section identifies the implementation differences of task switching, additions to the task 
state segment and the handling of TSS's and TSS selectors. 

23.2.17.1. PENTIUM PROCESSOR TASK STATE SEGMENTS

The Pentium CPU TSS may contain additional information used in virtual-8086 mode by the 
virtual mode extensions to the Pentium CPU. For more information on virtual mode 
extensions, see Appendix H. 

23.2.17.2. TSS SELECTOR WRITES

During task state saves, the Intel486 CPU writes two-byte selectors into a 32-bit TSS, leaving 
the upper 16 bits undefined. For performance reasons, the Pentium CPU writes four-byte 
selectors into the TSS with the upper two bytes being zero. For compatibility reasons, code 
should not depend on the value of the upper 16 bits of the selector in the TSS. 

23.2.17.3. ORDER OF READS/WRITES TO THE TSS

The order of reads and writes into the TSS is processor dependent. The Pentium CPU may 
generate different page fault addresses (CR2) in the same TSS area than the Intel486 CPU, if a 
TSS crosses a page boundary (which is not recommended). 

23.2.17.4. USING A 16-BIT TSS WITH 32-BIT CONSTRUCTS 

Task switches using 16-bit TSSs should be used only for pure 16-bit code. Any new code
written using 32-bit constructs (operands, addressing, or the upper word of the EFLAGS
register) should use only 32-bit TSSs, because the 32-bit processors do not save the upper
16 bits of EFLAGS to a 16-bit TSS. A task switch back to a 16-bit task that was executing in
virtual mode will never re-enable the virtual mode, as this bit was not saved in the upper
half of the EFLAGS value in the TSS. Therefore, it is strongly recommended that any code
using 32-bit constructs use only a 32-bit TSS to ensure correct behavior in a multitasking
environment.






23.2.17.4.1. Differences In I/O Map Base Addresses 

The Intel486 processor considers the TSS segment to be a 16-bit segment and wraps around
the 64K boundary. Any I/O accesses check for permission to access this I/O address at the I/O
base address plus the I/O offset. If the I/O map base address exceeds the specified limit of
0DFFFH, an I/O access will wrap around and obtain the permission for the I/O address at an
incorrect location within the TSS. A TSS limit violation does not occur in this situation on the
Intel486 processor. However, the Pentium processor considers the TSS to be a 32-bit segment
and a limit violation occurs when the I/O base address plus the I/O offset is greater than the
TSS limit. By following the recommended specification for the I/O base address to be less
than 0DFFFH, the Intel486 processor will not wrap around and access incorrect locations
within the TSS for I/O port validation and the Pentium processor will not experience general
protection faults. Figure 23-3 demonstrates the different areas accessed by the Intel486 and
Pentium processors.
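The two behaviors can be compared with plain address arithmetic. The following C sketch models only the "I/O base address plus I/O offset" computation described above; the function names are hypothetical, and the offset is passed in directly rather than derived from a port number.

```c
#include <stdbool.h>
#include <stdint.h>

/* Intel486 model: the TSS is treated as a 16-bit segment, so the sum
 * wraps around the 64K boundary and no limit violation is reported. */
uint32_t i486_io_check_offset(uint32_t io_map_base, uint32_t io_offset)
{
    return (io_map_base + io_offset) & 0xFFFFu;
}

/* Pentium model: the TSS is treated as a 32-bit segment, so a sum
 * greater than the TSS limit is a limit violation (general protection
 * fault). Returns true when the fault would occur. */
bool pentium_io_check_faults(uint32_t io_map_base, uint32_t io_offset,
                             uint32_t tss_limit)
{
    return io_map_base + io_offset > tss_limit;
}
```

With an I/O map base of 0FFFFH and an offset of 10H, the case shown in Figure 23-3, the Intel486 model lands at offset 0FH inside the TSS while the Pentium model reports a fault.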



Intel486™ Processor: An I/O access at port 10H checks the bitmap at I/O map base address
0FFFFH + 10H = offset 0FH from the beginning of the TSS segment, due to wrap around occurring.

Pentium™ Processor: An I/O access at port 10H checks the bitmap at I/O map base address
0FFFFH + 10H, which is outside the segment for I/O validation because it exceeds the segment
limit. Wrap around does not occur; a general protection fault occurs.

Figure 23-3. I/O Map Base Address Differences



23.2.17.4.2. Caching, Pipelining, Prefetching 

The Pentium processor includes separate instruction and data caches. The data cache supports
a write-back (or alternatively write-through, on a line-by-line basis) policy for memory
updates. Refer to Chapter 18 and the Pentium™ Processor Data Book for more information
about the organization and operation of the Pentium processor caches.

The Intel486 processor includes a single internal cache for both instructions and data. 

The meaning of bits CD and NW in CR0 has been redefined so the recommended value (00)
enables write-back for the data cache of the Pentium processor. In the Intel486 processor the
same value for these bits enables write-through for the cache. However, it is possible for
external system hardware to force the Pentium processor to disable caching or to use a
write-through policy should that be required. Refer to Chapter 18 and the Pentium™ Processor
Data Book for more information about hardware control of the Pentium processor caches.

The Pentium processor supports page-level cache management in the same manner as the 
Intel486 by using the PCD and PWT bits in CR3, page directory pointers, page directory 
entries, and page table entries. The Intel486 processor, however, is not affected by the state of 
the PWT bit since the internal cache of the Intel486 processor is a write-through cache. 

23.2.17.5. SELF MODIFYING CODE WITH CACHE ENABLED

On the Intel486 processor, a write to an instruction in the cache will modify it in both cache 
and memory. If the instruction was prefetched before the write, however, the old version of 
the instruction could be the one executed. To prevent this, it is necessary to flush the 
instruction prefetch unit of the Intel486 processor by coding a jump instruction immediately 
after any write that modifies an instruction. The Pentium processor, however, checks whether a 
write may modify an instruction that has been prefetched for execution. This check is based 
on the linear address of the instruction. If the linear address of an instruction is found to be 
present in the prefetch queue, the Pentium processor flushes the prefetch queue, eliminating 
the need to code a jump instruction after any writes that modify an instruction. 

Because the linear address of the write is checked against the linear address of the instructions 
that have been prefetched, special care must be taken for self-modifying code to work correctly 
when the physical addresses of the instruction and the written data are the same, but the linear 
addresses differ. In such cases, it is necessary to execute a serializing operation to flush the 
prefetch queue after the write and before executing the modified instruction. See Chapter 18 
for more information on serializing instructions. 

NOTE 

The check on linear addresses described above is not in practice a concern 
for compatibility. Applications that include self-modifying code use the same 
linear address for modifying and fetching the instruction. Systems software, 
such as a debugger, that might possibly modify an instruction using a 
different linear address than that used to fetch the instruction must execute a 
serializing operation, such as IRET, before the modified instruction is 
executed. 



23.2.18. Paging 

This section identifies enhancements made to the paging unit and implementation differences
in the paging mechanism.

23.2.18.1. PENTIUM PROCESSOR PAGING 

The Pentium processor provides an extension to the memory management/paging functions of 
the Intel486 CPU to support larger page sizes. See Appendix H for more information. 

23.2.18.2. Intel486 PROCESSOR PAGING 

Two bits introduced in the Intel486 processor have been defined in page table entries for 
controlling caching of pages: 

• PCD — The Page-Level Cache Disable bit controls caching on a page-by-page basis. 

• PWT — The Page-Level Write Through bit controls the use of a write-through or write-back 
policy on a page-by-page basis. Since the internal cache of the Intel486 processor is 
a write-through cache, it is not affected by the state of the PWT bit. 

23.2.18.3. ENABLING AND DISABLING PAGING 

Paging is enabled and disabled by a MOV CR0, REG instruction that modifies the PG bit. The 
Intel386 CPU family, the Intel486 CPU family, and the Pentium CPU have slightly different 
requirements on the following code used to enable and disable paging: 

1. MOV CR0, REG followed immediately by a short JMP instruction. 

2. Identity map the entire sequence bounded by the MOV and JMP instructions. 

The Intel386 family of CPUs requires that step 1 or step 2 be performed. The Intel486 family of 
CPUs requires that both steps 1 and 2 be performed. The Pentium CPU requires only step 2. Although 
a JMP instruction need not follow immediately, it is recommended, for upwards and 
downwards compatibility, that both requirements be observed. Specifically, the instructions 
modifying the PG bit should be followed immediately by a JMP instruction and those 
instructions should reside on a page whose linear and physical addresses are identical. 

23.2.19. Stack Operations 

This section identifies the differences in stack implementation between the various 
microprocessors. 

23.2.19.1. PUSH SP 

The Pentium CPU, Intel486, Intel386, and Intel 286 processors push a different value on the 
stack for a PUSH SP instruction than the 8086 processor. The 32-bit processors push the value 
of the SP register before it is decremented as part of the push operation; the 8086 processor 
pushes the value of the SP register after it is decremented. If the value pushed is important, 
replace PUSH SP instructions with the following three instructions: 

PUSH BP 
MOV BP, SP 
XCHG BP, [BP] 

This code functions as the 8086 processor PUSH SP instruction on the Pentium processor. 
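
The difference can be illustrated with a small Python model. This is a sketch only: the 
function names and the dictionary used to stand in for stack memory are illustrative 
assumptions, and each dictionary entry represents one 16-bit word keyed by its offset. 

```python
MASK16 = 0xFFFF

def push_sp_8086(sp, mem):
    """8086 behavior: SP is decremented first, then the new SP is stored."""
    sp = (sp - 2) & MASK16
    mem[sp] = sp
    return sp

def push_sp_286_and_later(sp, mem):
    """Intel 286 and later: the value SP held before the decrement is stored."""
    old_sp = sp
    sp = (sp - 2) & MASK16
    mem[sp] = old_sp
    return sp

def push_bp_mov_xchg(sp, bp, mem):
    """Model of PUSH BP / MOV BP, SP / XCHG BP, [BP]; returns (SP, BP)."""
    sp = (sp - 2) & MASK16
    mem[sp] = bp                    # PUSH BP
    bp = sp                         # MOV BP, SP
    addr = bp                       # address used by the XCHG memory operand
    bp, mem[addr] = mem[addr], bp   # XCHG BP, [BP]
    return sp, bp
```

In this model the three-instruction sequence leaves the decremented SP value on the stack, as 
PUSH SP does on the 8086 processor, and restores the original contents of BP. 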

23.2.19.2. FLAGS PUSHED ON THE STACK 

The setting of the flags stored by the PUSHF instruction, by interrupts, and by exceptions is 
different on the 32-bit processors than that stored by the 8086 and Intel 286 processors in bits 
12 and 13 (IOPL), 14 (NT), and 15 (reserved). The differences are as follows: 

• 8086 processor — bits 12 through 15 are always set. 

• Intel 286 processor — bits 12 through 15 are always clear in real-address mode. 

• 32-bit processors in real-address mode, bit 15 is always clear, and bits 14 through 12 have 
the last value loaded into them. 

Other bits that can be used to differentiate between the 32-bit processors include: 

• Bit 18 (AC) can be used to distinguish an Intel386 processor from the Intel486 and 
Pentium processors. Since it is not implemented on the Intel386 processor, it will always 
be clear. 

• Bit 21 (ID) can be used to determine if an application can execute the CPUID instruction. 
This instruction supplies information to applications at runtime that identifies the family, 
model, stepping, vendor and what features are implemented on the processor in a system. 
The ability to set and clear this bit indicates that the CPUID instruction is supported on a processor. 

• Bits 19 and 20 will always be zero on processors that do not support virtual mode 
extensions. For more information on virtual mode extensions, see Appendix H. 

These differences can be used to distinguish what type of processor is present and are used in 
the CPU identification code example in Chapter 5. 
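
The order in which these tests are usually applied can be sketched in Python. This is not the 
Chapter 5 example; it is a simplified model in which write_flags is a hypothetical callable 
that loads a value into the flags register and returns the value read back. Bits 19 and 20 
are not examined here. 

```python
AC_BIT = 1 << 18   # Alignment Check flag
ID_BIT = 1 << 21   # Identification flag

def classify(write_flags):
    """Classify a processor from which flag bits can be changed.

    write_flags(v) is an assumed helper: it models loading v into
    (E)FLAGS in real-address mode and returns the value read back.
    """
    # 8086: bits 12 through 15 always read back as set.
    if (write_flags(0x0000) & 0xF000) == 0xF000:
        return "8086"
    # Intel 286 in real-address mode: bits 12 through 15 always clear.
    if (write_flags(0xF000) & 0xF000) == 0:
        return "Intel 286"
    # Intel386: the AC bit is not implemented and stays clear.
    if (write_flags(AC_BIT) & AC_BIT) == 0:
        return "Intel386"
    # No ID bit: the CPUID instruction is not available.
    if (write_flags(ID_BIT) & ID_BIT) == 0:
        return "Intel486 class, CPUID not available"
    return "CPUID supported"
```

A complete test would also confirm that the AC and ID bits can be cleared again after being 
set; that step is omitted to keep the sketch short. 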

23.2.19.3. SELECTOR PUSHES/POPS 

On selector pushes, the Intel486 CPU writes two bytes onto four byte stacks and decrements 
ESP by 4. The Pentium CPU writes 4 bytes with the upper two bytes being zeros. 

On selector pops, the Intel486 CPU reads only 2 bytes. The Pentium CPU reads 4 bytes and 
discards the upper 2 bytes. This may have an effect if the ESP is close to the stack segment 
limit. On the Pentium CPU, ESP+4 may be above the stack limit in which case a fault will be 
generated. On the Intel486 CPU, ESP+2 may be less than the stack limit and no fault is 
generated. 
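
The limit check can be modeled with one comparison. The sketch below assumes an expand-up 
stack segment whose last valid offset is stack_limit; the function name and parameters are 
illustrative only. 

```python
def selector_pop_faults(esp, stack_limit, bytes_read):
    """Return True if reading bytes_read bytes at ESP passes the segment limit.

    Assumes an expand-up stack segment; stack_limit is the last valid offset.
    Use bytes_read=2 to model the Intel486 CPU and 4 to model the Pentium CPU.
    """
    last_byte = esp + bytes_read - 1
    return last_byte > stack_limit
```

With a limit of 0FFFFH and ESP equal to 0FFFEH, the 2-byte read stays within the segment 
while the 4-byte read does not. 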

23.2.19.4. ERROR CODE PUSHES 

The Intel486 CPU implements the error code pushed on the stack as a 16-bit value. When 
pushed onto a 32-bit stack, the Intel486 CPU only pushes 2 bytes and updates ESP by 4. The 
Pentium CPU error code is a full 32 bits with the upper 16 bits set to zero. The Pentium CPU, 
therefore, pushes 4 bytes and updates ESP by 4. Any code that relies on the state of the upper 
16 bits may produce inconsistent results. 



23.2.19.5. FAULT HANDLING EFFECTS ON THE STACK 

During the handling of certain instructions, such as CALL and PUSHA, faults may occur in 
different sequences for the different processors. For example, during far calls, the Intel486 
CPU pushes the old CS and EIP before a possible branch fault is resolved. A branch fault is a 
fault from a branch instruction occurring from a segment limit or access rights violation. If a 
branch fault is taken, the Intel486 CPU will have corrupted memory below the stack pointer. 
However, ESP is backed up in order to make the instruction restartable. The Pentium CPU 
issues the branch before the pushes. Therefore, if a branch fault does occur, the Pentium CPU 
does not corrupt memory below the stack pointer. This implementation difference, however, 
does not constitute a compatibility problem, as only values at or above the stack pointer are 
considered to be valid. 



23.2.19.6. INTERLEVEL RET/IRET FROM A 16-BIT INTERRUPT OR CALL GATE 

If a call or interrupt is made from a 32-bit stack environment through a 16-bit gate, only 16 
bits of the old ESP can be pushed onto the stack. On the subsequent RET/IRET, the 16-bit ESP 
is popped but the full 32-bit ESP is updated since control is being resumed in a 32-bit stack 
environment. The Intel486 processor writes the SS selector into the upper 16 bits of ESP. The 
Pentium CPU writes zeros into the upper 16 bits. 
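
The resulting ESP values can be compared with a short Python sketch. The function and its 
cpu argument are illustrative assumptions; only the two behaviors described above are 
modeled. 

```python
def esp_after_16bit_return(popped_sp, ss_selector, cpu):
    """Model the 32-bit ESP after RET/IRET through a 16-bit gate."""
    low = popped_sp & 0xFFFF
    if cpu == "Intel486":
        # The SS selector is written into the upper 16 bits of ESP.
        return ((ss_selector & 0xFFFF) << 16) | low
    if cpu == "Pentium":
        # Zeros are written into the upper 16 bits of ESP.
        return low
    raise ValueError("unmodeled processor: " + cpu)
```

Software that depends on the upper 16 bits of ESP after such a return sees different values 
on the two processors. 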



23.2.20. Mixing 16- and 32-Bit Segments 

The features of the 16-bit Intel 286 processor are an object-code compatible subset of those of 
the Pentium processor. The Default bit in segment descriptors indicates whether the processor 
is to treat a code, data, or stack segment as a 16-bit or 32-bit segment. 

The segment descriptors used by the Intel 286 processor are supported by the 32-bit processors 
if the Intel-reserved word (highest word) of the descriptor is clear. On the 32-bit Intel x86 
processors, this word includes the upper bits of the base address and the segment limit. 

The segment descriptors for data segments, code segments, local descriptor tables (there are no 
descriptors for global descriptor tables), and task gates are the same for the 16- and 32-bit 
processors. Other 16-bit descriptors (TSS segment, call gate, interrupt gate, and trap gate) are 
supported by the 32-bit processors. The 32-bit processors also have descriptors for TSS 
segments, call gates, interrupt gates, and trap gates which support the 32-bit architecture. Both 
kinds of descriptors can be used in the same system. 

For those segment descriptors common to both 16 and 32-bit processors, clear bits in the 
reserved word cause the 32-bit processors to interpret these descriptors exactly as an Intel 286 
processor does; for example: 

• Base Address — The upper eight bits of the 32-bit base address are clear, which limits base 
addresses to 24 bits. 

• Limit — The upper four bits of the limit field are clear, restricting the value of the limit 
field to 64 Kbytes. 

• Granularity bit — The Granularity bit is clear, indicating the value of the 16-bit limit is 
interpreted in units of 1 byte. 

• Big bit — In a data-segment descriptor, the B bit is clear in the segment descriptor used by 
the 32-bit processors, indicating the segment is no larger than 64 Kbytes. 

• Default bit — In a code-segment descriptor, the D bit is clear, indicating 16-bit addressing 
and operands are the default. In a stack-segment descriptor, the D bit is clear, indicating 
use of the SP register (instead of the ESP register) and a 64 Kbyte maximum segment 
limit. 
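
The effect of a clear reserved word can be seen by decoding a descriptor. The following 
Python sketch assumes the standard 8-byte descriptor layout of the 32-bit processors (limit 
bits 15-0 in bytes 0-1, base bits 23-0 in bytes 2-4, access rights in byte 5, limit bits 
19-16 with the G and D/B bits in byte 6, and base bits 31-24 in byte 7). The function name 
and the returned dictionary are illustrative. 

```python
def decode_descriptor(raw):
    """Decode an 8-byte segment descriptor given in memory order."""
    if len(raw) != 8:
        raise ValueError("a segment descriptor is 8 bytes long")
    limit = raw[0] | (raw[1] << 8) | ((raw[6] & 0x0F) << 16)
    base = raw[2] | (raw[3] << 8) | (raw[4] << 16) | (raw[7] << 24)
    granularity = (raw[6] >> 7) & 1
    d_or_b = (raw[6] >> 6) & 1
    # With G clear the limit counts bytes; with G set it counts 4-Kbyte units.
    size_bytes = (limit + 1) << (12 if granularity else 0)
    return {
        "base": base,
        "limit": limit,
        "granularity": granularity,
        "d_or_b": d_or_b,
        "access": raw[5],
        "size_bytes": size_bytes,
    }
```

When bytes 6 and 7 are zero, as in an Intel 286 descriptor, the base cannot exceed 24 bits, 
the segment cannot exceed 64 Kbytes, and both the G and D/B bits decode as clear. 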

For formats of these descriptors and documentation of their use see the iAPX 286 
Programmer's Reference Manual. For information on mixing 16 and 32-bit code in 
applications, see Chapter 21. 



23.2.21. Segment and Address Wraparound 

This section discusses differences in segment and address wraparound between the Pentium, 
Intel486, Intel386, Intel 286, and 8086 processors. 

23.2.21.1. SEGMENT WRAPAROUND 

On the 8086 processor, an attempt to access a memory operand which crosses offset 65,535 
(0FFFFH) or offset 0 (e.g., MOV a word to offset 65,535 or PUSH a word when SP = 1) causes 
the offset to wrap around modulo 65,536 (010000H). With the Intel 286 processor, any base 
and offset combination which addresses beyond 16 megabytes wraps around to the first 
megabyte of the address space. The Pentium, Intel486, and Intel386 processors in real-address 
mode generate an exception in these cases: a general-protection exception if the segment is a 
data segment (i.e. if the CS, DS, ES, FS, or GS register is being used to address the segment) 
or a stack exception if the segment is a stack segment (i.e., if the SS register is being used). An 
exception to this behavior occurs when a stack access is datum aligned, and the stack pointer is 
pointing to the last aligned datum of that size at the top of the stack (ESP=0FFFFFFFCH). 
When this data is popped, no segment limit violation occurs and the stack pointer will wrap 
around to 0. 

The address space of the Pentium and Intel486 processors may wrap around at 1 megabyte in 
real-address mode. An external pin A20M# forces wraparound if enabled. On members of the 
8086 family, it is possible to specify addresses greater than 1 megabyte. For example, with a 
selector value 0FFFFH and an offset of 0FFFFH, the effective address would be 10FFEFH (1 
megabyte + 65519 bytes). The 8086 processor, which can form addresses up to 20 bits long, 
truncates the uppermost bit, which "wraps" this address to 0FFEFH. However, the Pentium and 
Intel486 processors do not truncate this bit if A20M# is not enabled. 
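
The arithmetic in this example can be checked with a few lines of Python. The function name 
and the a20_masked parameter are illustrative; setting a20_masked models either an 8086 
processor or a later processor with A20M# enabled. 

```python
def real_mode_address(selector, offset, a20_masked):
    """Form a real-address-mode address from a selector and a 16-bit offset."""
    linear = ((selector & 0xFFFF) << 4) + (offset & 0xFFFF)
    if a20_masked:
        linear &= ~(1 << 20)   # force address bit 20 to zero
    return linear
```

For selector 0FFFFH and offset 0FFFFH, the function yields 10FFEFH without masking and 
0FFEFH with masking. 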

If a stack operation wraps around the address limit, shutdown occurs. (The 8086 processor 
does not have a shutdown mode nor a limit.) 



23.2.22. Write Buffers and Memory Ordering 

The Pentium processor has two write buffers, one corresponding to each of the pipelines, to 
enhance the performance of consecutive writes to memory. These write buffers can be filled 
simultaneously in one clock, e.g., by two simultaneous write misses in the two pipelines. Writes 
in these buffers are driven out on the external bus in the order they were generated by the 
processor core. No reads (as a result of cache miss) are reordered around previously generated 
writes sitting in the write buffers. The implication of this is that the write buffers will be 
flushed or emptied before a subsequent bus cycle is run on the external bus. 

It should be noted that only memory writes are buffered and I/O writes are not. The Pentium 
and Intel486 processors do not synchronize the completion of memory writes on the bus and 
instruction execution after the write. The OUT instruction or a serializing instruction needs to 
be executed to synchronize writes with the next instruction. Refer to Chapter 18 for 
information on serializing instructions. 

No re-ordering of read cycles occurs on the Pentium processor. Specifically, the write buffers 
are flushed before the IN instruction is executed. 

On the Intel486 CPU, under certain conditions, a memory read will go onto the external bus 
before the memory writes pending in the buffer even though the writes occurred earlier in the 
program execution. A memory read will only be reordered in front of all writes pending in the 
buffers if all writes pending in the buffers are cache hits and the read is a cache miss. Under 
these conditions, the Intel486 processor will not read from an external memory location that 
needs to be updated by one of the pending writes. 
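
The reordering condition can be stated as a predicate. The Python sketch below is only a 
restatement of the rule given above; the function names and the list-of-booleans 
representation of the write buffers are assumptions made for illustration. 

```python
def intel486_may_reorder_read(read_is_cache_miss, pending_writes_are_hits):
    """Return True if the Intel486 CPU may run the read before pending writes.

    pending_writes_are_hits holds one boolean per buffered write, True when
    that write hit the on-chip cache.
    """
    if not pending_writes_are_hits:
        return False   # nothing is buffered, so nothing can be bypassed
    return read_is_cache_miss and all(pending_writes_are_hits)

def pentium_may_reorder_read(read_is_cache_miss, pending_writes_are_hits):
    """The Pentium processor does not reorder reads around buffered writes."""
    return False
```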

Locked bus cycles are used for read-modify-write accesses to memory. During a locked bus 
cycle, the Intel486 processor will always access external memory; it will never look for the 
location in the on-chip cache. All data pending in the Intel486 processor's write buffers will be 
written to memory before a locked cycle is allowed to proceed to the external bus. Thus, the 
locked bus cycle can be used for eliminating the possibility of reordering read cycles on the 
Intel486 processor. If the line is present in the cache, the Pentium processor will write it back if 
it was dirty and invalidate the line. 

I/O reads are never reordered in front of buffered memory writes on the Intel486 processor. 
This ensures an update of all memory locations before reading the status from an I/O device. 



23.2.23. Bus Locking 

The LOCK prefix and its bus signal should be used only to prevent other bus masters from 
interrupting a data movement operation. The LOCK prefix may be used only with the 
following Pentium CPU, Intel486 CPU, and Intel386 CPU instructions when they modify 
memory. An invalid-opcode exception results from using the LOCK prefix before any other 
instruction, or with these instructions when no write operation is made to memory (i.e., when 
the destination operand is in a register). 

• Bit test and change: the BTS, BTR, and BTC instructions. 

• Exchange: the XCHG, XADD, CMPXCHG, and CMPXCHG8B instructions (no LOCK 
prefix is needed for the XCHG instruction). 

• One-operand arithmetic and logical: the INC, DEC, NOT, NEG instructions. 

• Two-operand arithmetic and logical: the ADD, ADC, SUB, SBB, AND, OR, and XOR 
instructions. 
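
The rule can be summarized as a lookup. The following Python sketch encodes the list above; 
the set name and function name are illustrative, and the check models only the two 
conditions stated in this section. 

```python
LOCKABLE = {
    "BTS", "BTR", "BTC",
    "XCHG", "XADD", "CMPXCHG", "CMPXCHG8B",
    "INC", "DEC", "NOT", "NEG",
    "ADD", "ADC", "SUB", "SBB", "AND", "OR", "XOR",
}

def lock_prefix_is_valid(mnemonic, destination_is_memory):
    """Return False where the text says an invalid-opcode exception results."""
    return mnemonic.upper() in LOCKABLE and destination_is_memory
```

For example, LOCK ADD with a memory destination is accepted, while LOCK ADD with a register 
destination and LOCK MOV are both rejected. 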

The Intel 286 processor performs the bus lock function differently than the Intel486 processor. 
Programs which use forms of memory locking specific to the Intel 286 processor may not run 
properly when run on the Intel486 processor. 



A locked instruction is guaranteed to lock only the area of memory defined by the destination 
operand, but may lock a larger memory area. For example, typical 8086 and Intel 286 
configurations lock the entire physical memory space. Programmers should not depend on this. 

On the Intel 286 processor, the LOCK prefix is sensitive to IOPL; if CPL is less privileged 
than the IOPL, a general protection exception is generated. On the Intel386 DX, Intel486, and 
Pentium processors, no check against IOPL is performed. 



23.2.24. Bus Hold 

Unlike the 8086 and Intel 286 processors, but like the Intel386 and Intel486 processors, the 
Pentium processor responds to requests for control of the bus from other potential bus masters, 
such as DMA controllers, between transfers of parts of an unaligned operand, such as two 
words which form a doubleword. Unlike the Intel386 processor, the Pentium and Intel486 
processors respond to bus hold during reset initialization. 



23.2.25. Two Ways to Run Intel 286 CPU Tasks 

When porting 16-bit programs to the Pentium processor, there are two approaches to consider: 

1. Porting an entire 16-bit software system to a 32-bit processor, complete with the old operating 
system, loader, and system builder. 

In this case, all tasks will have 16-bit TSSs. The 32-bit processor is being used as if it were 
a faster version of the 16-bit processor. 

2. Porting selected 16-bit applications to run in a 32-bit processor environment with a 32-bit 
operating system, loader, and system builder. 

In this case, the TSSs used to represent 286 tasks should be changed to 32-bit TSSs. It is 
possible to mix 16 and 32-bit TSSs, but the benefits are small and the problems are great. 
All tasks in a 32-bit software system should have 32-bit TSSs. It is not necessary to change 
the 16-bit object modules themselves; TSSs are usually constructed by the operating 
system, by the loader, or by the system builder. See Chapter 21 for more discussion of the 
interface between 16-bit and 32-bit code. 

Because the 32-bit processors use the contents of the reserved word of 16-bit segment 
descriptors, 16-bit programs which place values in this word may not run correctly on the 32- 
bit processors. 



23.3. FLOATING-POINT UNIT 

This section addresses the issues that must be faced when transporting numerical software to a 
Pentium processor with integrated FPU from one of its predecessor systems. To software, the 
Pentium processor looks very much like an Intel486 DX system, an Intel486 SX and Intel487 
SX math coprocessor system, or an Intel386 CPU and Intel387 math coprocessor system. 
Software which runs on any of these systems will run with at most minor modifications on the 
Pentium processor. To transport code directly from an Intel 286 CPU with an Intel287 math 
coprocessor-based system or an 8086 CPU with an 8087 math coprocessor-based system to the 
Intel486 processor, certain additional issues must be addressed. 



23.3.1. Control Register Bits 

This section summarizes the differences in control register bits that may affect numerical 
software. 



23.3.1.1. EXTENSION TYPE (ET) BIT 

The ET (Extension Type) bit of the CR0 control register is used in the Intel386 processor to 
indicate whether the math coprocessor in the system is an Intel287 math coprocessor (ET=0) or 
an Intel387 DX math coprocessor (ET=1). This bit is not used by Pentium processor or 
Intel486 processor hardware. The ET bit is hardwired to "1". 

23.3.1.2. NUMERIC EXCEPTION (NE) BIT 

The NE (Numeric Exception) bit of the CR0 register is used in the Pentium and Intel486 
processors to determine whether unmasked floating-point exceptions are reported internally via 
interrupt vector 16 (NE=1) or through external interrupt (NE=0). On reset, the NE bit is 
initialized to 0, so software using the automatic internal error-reporting mechanism must set 
this bit to 1. This bit is nonexistent on the Intel386 processor. 

23.3.1.3. MONITOR COPROCESSOR (MP) BIT 

As on the Intel 286 and Intel386 processors, the MP (Monitor coprocessor) bit of the CR0 
control register determines whether WAIT instructions trap when the context of the FPU is 
different from that of the currently-executing task. If MP=1 and TS=1, then a WAIT 
instruction will cause a Device Not Available fault (interrupt vector 7). The MP bit is used on 
the Intel 286 and Intel386 microprocessors to support the use of a WAIT instruction to wait on 
a device other than a numeric coprocessor. The device reports its status through the BUSY# 
pin. Since the Pentium and Intel486 processors do not have such a pin, the MP bit has no 
relevant use, and should be set to 1 for normal operation. 
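
The interaction of these CR0 bits can be sketched in Python. The bit positions used below 
(MP = bit 1, TS = bit 3, ET = bit 4, NE = bit 5) come from the CR0 register layout rather 
than from this section, and the function names are illustrative. 

```python
CR0_MP = 1 << 1   # Monitor coprocessor
CR0_TS = 1 << 3   # Task switched
CR0_ET = 1 << 4   # Extension type
CR0_NE = 1 << 5   # Numeric exception

def wait_causes_device_not_available(cr0):
    """WAIT traps to interrupt vector 7 only when both MP and TS are set."""
    return bool(cr0 & CR0_MP) and bool(cr0 & CR0_TS)

def fp_error_reporting(cr0):
    """Return how unmasked floating-point exceptions are reported."""
    if cr0 & CR0_NE:
        return "interrupt vector 16"
    return "external interrupt"
```

After reset NE is 0, so fp_error_reporting returns "external interrupt" until software sets 
the bit. 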

23.3.1.4. FPU STATUS WORD 

This section identifies differences to the FPU status word for the different Intel architecture 
processors/math coprocessors, as well as the reason for the differences, and their impact on 
software. 

• Bits C3-C0 — After FINIT and hardware reset, these bits are set to zero on the Pentium 
CPU, Intel486 CPU, and Intel387 math coprocessor. After FINIT and hardware reset, the 
Intel287 and 8087 math coprocessors leave these bits intact (they contain the prior value). 
This has no impact on software and provides a consistent state after reset. Transcendental 
instruction results in the core range of the Pentium processor (as defined in Chapter 7) 
may differ from the Intel486 DX and Intel487 SX CPUs by around 2 to 3 units in the last 
place (ulps). As a result, C1 may also differ. 



• Bits C3, C1, C0 — After an incomplete FPREM/FPREM1, these bits are set to zero on the 
Pentium processor, Intel486 processor and the Intel387 math coprocessor. On the 8087 
and the Intel287 math coprocessor, these bits are left intact following incomplete 
FPREM/FPREM1 execution. 

• Bit C2 — Bit 10 serves as an incomplete bit for FPTAN on the Pentium and Intel486 
processors and the Intel387 math coprocessor. This bit is undefined for FPTAN on the 
Intel287 and 8087 math coprocessors. This change has no impact on software as 
programs do not check C2 after FPTAN. This upgrade allows fast checking of operand 
range. 

• Status Word Bit 6 for Stack Fault — When an invalid-operation exception occurs on the 
Pentium CPU, Intel486 CPU, or Intel387 math coprocessor due to stack overflow or 
underflow, not only is bit 0 (IE) of the status word set, but also bit 6 is set to indicate a 
stack fault and bit 9 (C1) specifies overflow or underflow. Bit 6 is called SF and serves to 
distinguish invalid exceptions caused by stack overflow/underflow from those caused by 
numeric operations. When an invalid-operation exception occurs on the Intel287 or 8087 
math coprocessor due to stack overflow or underflow, only bit 0 (IE) of the status word is 
set. Bit 6 is RESERVED. This has no impact on software. Existing exception handlers 
need not change, but may be upgraded to take advantage of the additional information. 
Newly written handlers will be more effective. This upgrade provides performance 
improvement. 
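
An upgraded exception handler might use the additional bits as in the following Python 
sketch. The function name and returned strings are illustrative. The polarity of C1 (1 for 
overflow, 0 for underflow) is not stated in this section and is assumed here from the usual 
x87 convention. 

```python
def describe_invalid_operation(status_word):
    """Interpret IE (bit 0), SF (bit 6) and C1 (bit 9) of the FPU status word."""
    ie = status_word & 1
    sf = (status_word >> 6) & 1
    c1 = (status_word >> 9) & 1
    if not ie:
        return "no invalid-operation exception"
    if sf:
        # Assumed polarity: C1 = 1 means overflow, C1 = 0 means underflow.
        return "stack overflow" if c1 else "stack underflow"
    return "invalid arithmetic operand"
```

On the Intel287 and 8087 math coprocessors bit 6 is reserved, so a handler of this kind 
cannot separate the stack-fault cases there. 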

23.3.1.5. CONTROL WORD 

Only affine closure is supported for infinity control on the Pentium CPU, Intel486 CPU, and 
Intel387 NPX. Bit 12 remains programmable but has no effect on operation. On the Intel287 
and 8087 math coprocessors, both affine and projective closures are supported. After RESET, 
the default value in the control word is projective. Software that requires projective infinity 
arithmetic may give different results. This change was made in order to conform to IEEE 
Standard 754. 

23.3.1.6. TAG WORD 

This section describes the differences in the tag word for the different Intel architectures, the 
reason for the differences, and their impact on software. 

• When