Part of this archived thread from comp.sys.unisys: https://groups.google.com/d/msg/comp.arch/GtQ5o96eMaI/HPXiRiYKPv0J

Subject: Re: segmentation or lack thereof
From: Paul Kimpel
Date: 12/9/2007, 12:16 PM
Newsgroups: comp.arch,alt.os.development,comp.sys.unisys

On 11/25/2007 11:19 AM, Louis Krupp wrote:
> Paul Kimpel wrote:
>> On 11/25/2007 7:29 AM, John L wrote:
>>>> Any comments from the B5500/6700/A-Series builders or customers?
>>>
>>> I must admit I'd forgotten about the Burroughs machines.  My impression is that they're the healthiest segmented machines around today, but they also suffer from performance and address space issues.
>>>
>> John's last statement is so bald and completely unsupported that I have to take issue with it. Every architecture has performance and address space issues, depending on the problem space in which you want to consider it.
>
> John said it was his "impression," which is something short of a claim to knowing the one and only truth.  I'm sure he's willing to stand corrected, just as the rest of us are open to education.
>
>> John's comment about address space is one that I really cannot understand, though. All of the MCP machines going back to the B5500 have had huge virtual address spaces, just not ones that are allocated all in one piece. I'm not sure how you could compute the total virtual address space limit on the current systems, but it's easily many trillions of bytes PER TASK. You would hit some practical limitations, such as the maximum size of the system's segment descriptor tables, long before running out of virtual address space provided by the architecture. In almost 40 years of working with these systems, I've never seen an application on them that came even close to pressuring the virtual address space, let alone exceeding it. There were serious PHYSICAL address space problems with the B6700/7700 systems in the late 1970s and early '80s, but these were resolved by the mid-80s in the A Series line, and done so largely without impact on existing applications.
>
> Can you give an overview of how A-Series segmentation works?  "RTFM" is OK in my case, since I have the manual...
>
> Louis
>
> (I've trimmed follow-ups, since as far as I know, Linux hasn't been ported to the A-Series.  Of course, I'm willing to stand corrected on that.)

First, Louis' point concerning my comment on John L's "impression" is well taken. As I read John's sentence concerning the Burroughs machines, the first clause was an impression, but the second was a conclusion. If that was not what John intended, then I apologize.

Louis asked if I could give an overview of how the Burroughs segmentation works, and John Ahlstrom has privately encouraged me to do so. I think I know this reasonably well from a software perspective, but as John L points out, the architecture is phenomenally complicated. The elements of the architecture are also extremely synergistic, and it's impossible to talk about one aspect (such as segmentation) in the absence of others. As a result, this is a very long post -- about 9900 words. In order to understand the context in which memory addressing and segmentation operate, I also have to talk to some degree about stacks, compilers, codefiles, and the very tight integration between the hardware architecture and the MCP (operating system) architecture.
This isn't easy, so please bear with me while I lay down some groundwork and then try to describe (without pictures) how segmentation and memory addressing work in this very complex but fascinating architecture.

Background. I will start with a short history lesson. The origin of the Burroughs stack/descriptor architecture was the B5000 of 1962. Bob Barton is generally credited with the conceptual design of the architecture. This machine was a major departure from anything Burroughs had attempted before. It was clearly influenced by the Rice University computer (Barton had spent some time at Rice), particularly by the Rice system's use of "codewords", from which the concept of descriptors arose. I believe the Ferranti Atlas was also an influence on the B5000 design.

A somewhat updated version of the architecture was released as the B5500 in 1964 and stayed in production until 1970. At the very end there was briefly a version called the B5700, but it was really a B5500. (The B5900 mentioned below is a contemporary of the other Bx900 models and not a variant of the B5500.) Burroughs began work on a successor machine, the B6500, in 1965, but it was not released until 1969. It had a very difficult introduction to customer use, and did not work properly until it was updated and re-released as the B6700 in 1971. All B6500 CPUs in the field were then replaced with B6700s.

The B6500/B6700 was a substantially different design from the B5500 -- the two instruction sets were quite different, as were the formats of descriptors and other control words. Limits on the size of both physical and virtual address spaces were increased dramatically. A very different way of handling character-oriented data was also introduced. About the only things that got carried over without change were the word size (48 bits), the integer and floating point word formats, and the six-bit character code, known as BCL. The MCP operating system was also completely redesigned. Nonetheless, the two designs are conceptually similar, and both the B6500/6700 hardware and system software should be seen as refinements of those for the B5500.

The Burroughs B6700 is the basis for the Unisys ClearPath MCP systems that are still being sold today. There have been some tweaks to the instruction set, and a major change was made in the mid-1980s to the way memory is addressed to accommodate still larger physical address spaces, but the processor architecture (at least from the perspective of user-level software) has remained essentially the same since 1970. The architecture is now referred to internally as "E-mode" (I think the "E" stands for "emulation"), and its formal specification has gone through a number of levels, known as Beta, Gamma, Delta, and (the latest) Epsilon.

The machines have had a variety of marketing names over the years. Since the various names can be confusing to those not familiar with the product line, here is a summary:

    Burroughs B5000/5500/5700 -- the original early 1960s architecture.
    Burroughs B6500/6700/7700/6800/7800/B5900/6900/7900
    Burroughs/Unisys A Series (MicroA, A1-A7, A9-A13, A14-A19)
    Unisys ClearPath 4600/4800/5600/5800/6800
    Unisys ClearPath NX4200 (the first of the models where the hardware
        is a standard Intel box and the MCP architecture is emulated on
        the Intel processor. This approach is currently used on all
        small-to-medium performance models)
    Unisys ClearPath LX5000/6000/7000 (later emulated systems)
    Unisys ClearPath Libra (these are also referred to as the "CS"
        series; the Libra 300, 400, and 520 models are emulated systems)

With so many models, we need a generic name, so I'll follow the current Unisys convention and refer to them as MCP systems (acknowledging that other, now obsolete, Burroughs/Unisys architectures also used operating systems named "MCP").

With that background, I will try to describe how memory addressing and segmentation work on the B6700, since discussing that particular model gives a good conceptual base. The mechanism was similar, but more primitive, on the B5500. This mechanism changed with the A Series to expand the physical address space and implement improvements in some other areas. I'll discuss those differences later. Even though the B6700 has been obsolete for 30 years, I'll talk about it mostly in the present tense.

Words and Tags. The best place to start is with word formats. The B6700 (like the B5500) has a 48-bit data word. One of the significant changes from the B5500, however, was the addition of some extra bits to the word, called the "tag". The B6700 and later systems through E-mode Beta had a three-bit tag. E-mode Gamma and subsequent systems use a four-bit tag. The tag indicates what type of information the word contains. Generally, tags 0, 2, 4, and 6 are data types that user-level code can freely manipulate; the rest are control types that are used by a combination of the hardware and operating system kernel. The tag values for the B6700 are:

    0 = single-precision data word (numeric or character data)
    1 = IRW (indirect reference word) or SIRW (stuffed indirect
        reference word). These are effectively indirect addresses to
        words in a program stack. A brief discussion of what "stuffed"
        means is covered in the section on Addressing Data, below.
    2 = double-precision data word (typically numeric)
    3 = general purpose protected control word: stack linkage, code
        segment descriptor. All words in a code segment (one containing
        executable instructions) also must have tags of 3.
    4 = SIW (step index word): control word for the STBR (step branch)
        looping instruction -- a nice idea, but the performance was
        poor, and STBR is no longer supported on current systems. This
        tag value is now used by the MCP as a special marker word in
        stacks (e.g., for fault trap indicators).
    5 = data descriptor. These words are the basis for data
        segmentation and are discussed in detail below.
    6 = uninitialized operand. Words with this tag value can be
        overwritten, but reading them generates a fault interrupt. Used
        as the initial and NULL value for pointers.
    7 = PCW (program control word). The entry point to a procedure
        (subroutine). Also used as one form of dynamic branch address.

The additional tag values introduced with E-mode Gamma are primarily used for optimizations of descriptors after a segment has been allocated and are not significant for this conceptual discussion. There are user-mode instructions that can read and set the tag on the word at top of stack. You could theoretically do quite a bit of mischief with this on the B6700, but the compilers are a trusted component of the system, so in practice this was not a problem.
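If it helps to see the idea in more familiar terms, here is a rough sketch in Python (a teaching toy of my own; the names are invented and nothing like this exists on the real machines) of a tagged word and the data-versus-control split described above:

    # Illustrative model of a B6700 tagged word: 48 data bits plus a 3-bit tag.
    # The names below paraphrase the tag list above; this is not a description
    # of any real Unisys data structure.

    TAG_NAMES = {
        0: "single-precision operand",
        1: "IRW/SIRW (indirect reference)",
        2: "double-precision operand",
        3: "protected control word (stack linkage, code)",
        4: "SIW (now an MCP marker word)",
        5: "data descriptor",
        6: "uninitialized operand",
        7: "PCW (program control word)",
    }

    USER_DATA_TAGS = {0, 2, 4, 6}   # freely manipulated by user-level code;
                                    # the rest are managed by hardware and the MCP

    def describe(tag, value):
        kind = "data" if tag in USER_DATA_TAGS else "control"
        return "tag %d (%s): %s word, value %012X" % (tag, TAG_NAMES[tag], kind, value)

    print(describe(0, 42))                  # an ordinary operand
    print(describe(5, 0x000006400000))      # a descriptor-shaped value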
In more recent systems, the microcode carefully restricts which tag values can be set on data words in the stack. The Set Tag instruction (STAG) running in user-level code cannot now, for example, set tag 5 on a data word with bit 47=1 (i.e., create a present data descriptor -- see below for what this means). This brings up the subject of "codefiles" (files containing executable code) and compilers. A codefile is a controlled entity under the MCP. Codefiles can be generated only by compilers, which are programs that must be "blessed" using a privileged MCP command. There is no assembler for these systems, nor anything that allows a user to generate arbitrary sequences of object code. Recompiling the source code for, say, the COBOL-85 compiler does not generate a compiler, just an ordinary program that inputs source code and outputs a file of instructions and other run-time data. That file is not a codefile, however. It cannot be executed and cannot be turned into a codefile. There is no facility for a program (even one running with the highest privileges) to read that file and present any of its data to the hardware as a code segment. Words of instructions in a code segment must have tags of 3. There is no way for user-level code to set those tags, let alone create a code segment and branch to it. The I/O hardware has a variant of the disk read operation that will apply tags of 3 to data as it is being read and transferred to memory, but only the MCP has access to the physical I/O hardware, so there is no way for user programs to initiate such an operation. [When I say "no way" here, I mean that the hardware/MCP architecture is specifically designed to prohibit such a thing, and I'm not aware of even a conceptual attack that could get around the design. There have been some holes discovered in the past, though, and of course it's entirely possible that there are more that have not yet been brought to light. I won't speculate on the likelihood of their existence. To me, the weakest part of this design is the trust that must be placed in the compilers. If you can generate arbitrary code, or modify a valid codefile externally (e.g., on a backup tape) and reload it, you can get around at least some of the system's protections, but it's still not easy).] This discussion of codefiles and compilers highlights a basic characteristic of MCP memory segmentation: code and data are entirely separate entities and are allocated and managed by the system separately. There are code segments and data segments, and while they are allocated from the same system-global heap and may be adjacent in physical memory, logically they are separate and addressed entirely differently. Both types of segments can be created only by the MCP. The contents of code segments are loaded solely from codefiles. Code segments are read-only, and as we will see, are automatically reentrant. Data Segment Descriptors. Descriptors are called such because they "describe" an area of memory. MCP systems are a form of capability architecture, and the descriptors are the capability -- you have to have access to the descriptor to access the data it describes. Descriptors are the basis of memory addressing and memory segmentation. A tag-5 word in the B6700 architecture represents a data descriptor. The word has a number of fields which I will identify using what is called partial-word notation, [s:n], where "s" is the starting bit number and "n" is the length of the field in bits. 
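To make the notation concrete, here is a small helper in Python (entirely my own illustration -- there is no such operation on the real machines, and the sample word is made up) that extracts an [s:n] field from a 48-bit word:

    def field(word, s, n):
        """Return the [s:n] partial-word field of a 48-bit word.

        s is the starting (high-order) bit number and n is the field length
        in bits; bit 47 is the high-order bit and bit 0 the low-order bit.
        This only illustrates the notation, not a real machine instruction.
        """
        assert 0 <= s <= 47 and 1 <= n <= s + 1
        return (word >> (s - n + 1)) & ((1 << n) - 1)

    w = 0x800006400123        # a made-up 48-bit value, shown in hex
    print(field(w, 47, 1))    # [47:1]  -> 1
    print(field(w, 39, 20))   # [39:20] -> 100
    print(field(w, 19, 20))   # [19:20] -> 291 (0x123)

These particular fields are the ones that matter in the descriptor layout that follows.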
Bit 47 is high order and bit 0 is low order; all MCP systems use big-endian addressing.     [47:1]  Presence bit (commonly known as the "P-bit")     [46:1]  Copy bit     [45:1]  Indexed bit     [44:1]  Paged bit     [43:1]  Read-only bit     [42:3]  Element Size field               0 = single precision words               1 = double precision words               2 = four-bit characters (packed decimal)               3 = six-bit characters (BCL, now obsolete)               4 = eight-bit characters (EBCDIC)     [39:20] Length or index field (determined by [45:1])     [19:20] Segment starting address Accessing the data in a segment is done by means of "indexing" the descriptor with a zero-relative offset into the segment. There are a series of instructions that do this, as will be detailed below. The Presence bit indicates whether the memory area represented by the descriptor is physically present in memory. It is the basis for the virtual memory mechanism. "Virtual memory" is IBM's clever marketing term that everyone has adopted, but it does not accurately describe what goes on in MCP systems. "Automated memory segment overlay" is a more accurate term, and is what Burroughs called it before IBM's term caught everyone's attention. When the Presence bit is 1, the segment is physically present in memory starting at the real address in [19:20]. (Note that in A Series and later systems, the real memory address is no longer contained in descriptors, as will be discussed later). When the Presence bit is zero, the segment is not present in memory (or has never been allocated) and the descriptor is said to be "absent" (although it's really the data area that is absent). Attempting to access a segment through an absent descriptor generates a "Presence bit interrupt" -- what other systems would call a page fault. When handling this interrupt, the MCP interprets the value in the address field as follows:   * If zero, this indicates that the segment has never been allocated.     The MCP simply allocates an area of length specified by [39:20],     relocating or overlaying other physically-present segments as     necessary. The MCP clears the allocated area to binary zeroes     (primarily to wipe out any nasty tags that may be there from prior     use of that space), fixes up the address field in the descriptor and     sets its Presence bit. Exiting from the interrupt handler causes the     hardware to restart the instruction which generated the interrupt.     The compilers generate code in the initialization section of     procedures to build these "untouched" descriptors directly in the     program's stack. The physical memory area is not allocated until     (and unless) the program actually references it.   * Some types of objects require additional information from the     compiler (e.g., for multi-dimensional array-of-arrays structures,     the length field in the descriptor only specifies the length of the     first dimension so certain bit patterns in the address field for     untouched descriptors point to data structures the compiler has     built within the codefile that carry the necessary information for     the other dimensions).   * Otherwise, the segment was once present in memory but has been     rolled out, usually due to pressure from other memory allocation     activity. 
The value in the address field of the descriptor indicates     the location within the task's "overlay file" (a temporary, unnamed     file the MCP allocates for each task) where the data associated with     this descriptor has been rolled out. The MCP allocates an     appropriate area of memory, reads the segment from the overlay file,     fixes up the descriptor with the new real memory address, and exits     the interrupt handler to restart the instruction that was     interrupted. The Copy bit indicates that a tag-5 word is not the original descriptor for an area, but rather a copy of it. A non-copy descriptor is said to be the "mom" descriptor for an area, and there can only be one such mom. Copy descriptors are generated automatically by a number of instructions, principally those involved with indexing and loading words on the top of stack (e.g., to pass an array as a parameter to a procedure). Present copies point directly to the segment's memory area; absent copies point to the mom. In the B6700, there is no central page table, per se, to keep track of memory areas. The mom descriptor effectively serves the purpose of a page table entry. Every area of physical memory (allocated or available) is surrounded by a set of "memory link" words that are used by the MCP memory allocation routines to keep track of allocated areas and locate available areas of appropriate size. For allocated areas, one of these link words points back to the mom descriptor. The handling of moms and copies is another thing that has changed with the A Series and later models. The Indexed bit indicates whether the descriptor points to a whole segment or to one element within a segment. When the bit is 0, the length/index field contains the length of the segment. Some indexing instructions generate an "indexed" copy descriptor with both the Indexed and Copy bits set to 1 (all indexed descriptors are by definition copies). Indexed descriptors are typically used as a pointer, e.g., as a destination address for store operators or as a call-by-reference parameter to a procedure. They are also used as starting addresses for string manipulation instructions. Algol has a "pointer" type; it represents an indexed data descriptor. In an indexed descriptor, the length field is replaced by a zero- relative offset into the segment. The physical address of that element (assuming the segment is present in memory) is the sum of the base address in [19:20] and the offset in the length field. Depending on the value of the Element Size field [42:3], a descriptor could be word oriented or character oriented. Physically, memory is accessed as whole words. When a character-oriented descriptor is indexed, its length field is replaced by a specially-encoded offset. Bits [35:16] are a word offset within the segment; bits [39:4] are a character offset within that word. This obviously limits the offset for character pointers to 393215 for eight-bit characters and twice that for four-bit packed decimal digits (although this limit can be effectively eased by the use of paged areas, discussed next). The string instructions understand these character offsets and transparently handle character operations that start in the middle of words. The Paged bit indicates whether the memory area for the descriptor is monolithic (0) or paged (1). 
If paged (the original term for this is "segmented", but talking about segmented segments can be confusing), the address field does not point to the address of the data, but instead to a vector of other descriptors. Each of these descriptors in turn points to a data segment. In memory, this looks identical to a two-dimensional array-of-arrays structure. The difference is in how the software accesses the data. With a standard two-dimensional array, the software must explicitly compute and apply an index for each dimension. With a paged segment, the software is unaware of the second dimension. When the indexing instructions encounter a descriptor with the Paged bit set, they partition the index value into a page number and page offset and automatically re-index the second dimension. If an indexed copy is generated, it is a copy of the descriptor for the page, not the original (paged) descriptor, and the index offset will be that within the page. This helps alleviate the smaller limit for word offset available with indexed character descriptors. For the B6700, the page size was 256 words. Data areas were typically paged when they exceeded 1024 words, but the programmer had some control over this. The page size on current systems is 8192 words. The paging threshold is adjustable by the system administrator, and by default is also 8192 words. As with all memory areas in the system, the individual pages are not allocated until first referenced. The array of page descriptors (called a "dope vector") is not even allocated until one of the pages is initially accessed. All pages are individually relocatable and overlayable. String operations that run off the end of a physical memory area generate a "segmented [paged] array" interrupt. The MCP responds to this interrupt by locating the next page in sequence (if there is one) and restarting the operation. If there is no next page (which would also be the case if the string operation ran off the end of an unpaged segment), the result is a "Segmented Array Error", with which everyone who has programmed for MCP systems is no doubt intimately familiar. Unless trapped, this error (a form of boundary violation) is fatal to the program. The Read-only bit is just that -- if 1, it marks the segment as non- writeable. This is primarily used for segments that represent constant pools generated by the compiler and which are loaded from the codefile. The Element Size field, as discussed above, determines whether the descriptor represents single precision word, double precision word, or character-oriented data. It is quite common, especially with COBOL programs, to have multiple descriptors with differing size fields pointing to the same segment. Only one of these could be the mom, of course; the rest would have to be copies. This allows the software to address the same memory area as a mixture of word and character fields. The B5500 strictly used six-bit characters; the B6700 was basically an eight-bit EBCDIC machine, but could also handle six- and four-bit characters. Support for the six-bit codes was dropped from the architecture in the Bx900 models, ca. 1980. I understand that the latest Libra models have added support for 16-bit characters. The role of the Length/index and Address fields has been covered in the discussion above, so nothing additional needs to be said about them here. Code Segment Descriptors. Thus far, the discussion has been about data descriptors. There are also descriptors for code segments. 
They are similar to data descriptors, but simpler. Code segment descriptors have a tag of 3 and the Presence bit, Copy bit, Length, and Address fields of data descriptors. Bits [45:6] are not used. Code segments cannot be indexed or paged. They are by definition read only. The only element size they support is single precision words. Code segment descriptors live in a special type of data segment called the Segment Dictionary. An image of this segment is built by the compiler (all descriptors being in their absent, untouched form, of course) and stored in the codefile. The Segment Dictionary is loaded by the MCP as part of task initiation. In addition to code segment descriptors, the Segment Dictionary may contain (read-only) data descriptors for constant pools and scalar constant values. The Segment Dictionary in memory is actually a type of stack, although not for push/ pop type of activity. As we will see, stacks in MCP systems are a central element in the addressing environment, and it is for this purpose that a Segment Dictionary is loaded as a stack. Segment Dictionaries are also sharable -- if multiple tasks are initiated from the same codefile, the Segment Dictionary is loaded only once and the separate tasks are linked to this common copy. Thus, all of the object code and read-only constant pools for a program are automatically reentrant. Object code is addressed using a three-part index. The first part is the "segment number", which is the code segment descriptor's offset within the Segment Dictionary. The second part is the word offset within the segment. The third part is the instruction syllable (byte) offset within the word. These numbers are zero-relative and generally written in hex, so an address of 03C:0041:3 indicates segment #60, word offset 65, byte offset 3. The processor uses variable-length instructions and can branch to any syllable offset within a segment. By using one of several dynamic branch instructions with a Program Control Word (PCW, tag 7) as an argument, a program can branch across segments -- in fact PCWs allow programs to branch and call procedures across stacks. When a program branches to a different code segment (either directly or by means of a procedure call), the segment is made present if it is not already so, and the segment length and base memory address are loaded into registers within the processor designated for that purpose. Intra-segment branches use only the word and syllable offsets, and therefore do not need to continually reference the code segment descriptor. The Presence bit in a code segment descriptor indicates physical presence or absence of the segment in memory, just as for data segments. For an absent descriptor, the address field indicates the offset within the codefile where the segment starts. Code segments and constant pools are not loaded from the codefile until first reference. When a program is initiated, the Segment Dictionary is the only thing that is loaded from the codefile. Everything else is loaded as a result of Presence bit interrupts. Since codefiles and code segments are guaranteed to be read only, code segments (and constant pools loaded from the codefile) are never rolled out to an overlay file. The physical memory area is simply deallocated and the codefile offset restored in the absent descriptor (that offset is stored in one of the memory link words while the segment is present and the descriptor's address field points to the area in memory). 
The code or data is simply reloaded from the codefile the next time a task trips over the descriptor's Presence bit. Resizing Segments. Descriptors obviously support bounds checking, along with the dynamic relocation and overlay of real memory areas, but they have another significant advantage -- the dynamic resizing of data segments. Since the length of a segment is part of the descriptor, the basis for bounds checking is centralized. The physical memory area can be resized by a user program at run time, the length in the descriptor will be updated, and future indexing operations will check against the new length. The B6700 and all later systems support the programmatic resizing of data segments. Code segments are immutable, so resizing them is not a meaningful operation. Support for resizing varies by language, but in Algol, it is performed using the RESIZE intrinsic procedure.     RESIZE(A,N) resizes the array A to a new length of N. The unit of N     is determined by the Element Size field in the descriptor. The old     segment is deallocated and its contents are lost. The segment with     the new length will not be allocated until it is next referenced.     RESIZE(A,N,RETAIN) allocates a new segment of length N and copies up     to N units from the old segment to the new one. If the new length is     shorter, any remaining units from the old segment are lost; if the     new length is longer, the remainder of the segment is filled with     binary zeros. Once the data is copied to the new segment, the old     segment is deallocated.     RESIZE(A,N,PAGED) is similar to the RETAIN option, but creates the     new data segment with a paged data descriptor. RESIZE works on individual segments. In the case of multi-dimensional array-of-arrays structures, the row for each final dimension can be resized independently and to differing lengths. Actually, RESIZE can be applied to any of the dimensions of a multi- dimensional array. Given an Algol declaration of     ARRAY M [0:99, 0:49, 0:63, 0:4095]; RESIZE(M[4,7,*,*], 75, RETAIN) would resize the M[4,7] dope vector in the third dimension from its original length of 64 to a new length of 75. This would create 11 new, untouched descriptors of length 4096 at the end of that dope vector. These resizable array-of-arrays structures are often used, especially in Algol, to implement flexible, safe, dynamic storage allocation schemes for user programs. Addressing Data -- Indexing and the Stack. Note that in the discussion thus far, nothing has been said about user- level programs having access to memory addresses -- real or virtual. Data descriptors for the B6700 can hold real memory addresses, but the content of descriptors is controlled only by hardware instructions and the MCP, not user-level code. User-level code (and all but small parts of the MCP kernel, for that matter) does not manipulate addresses, it manipulates offsets into memory areas. Separate memory areas generally relate to separate objects for a programming language (e.g., arrays for Algol, Pascal, and Fortran, or 01 levels for COBOL). Since there are no addresses, there can be no invalid addresses. You can try to use an invalid offset, of course, but that offset must be applied against a descriptor, where it is ALWAYS checked against the length field. If the offset is less than zero or greater than or equal to the length, your task becomes the happy recipient of the "Invalid Index" bounds violation interrupt. 
Depending on the language you are using, you can trap this interrupt (actually, what you trap is the MCP's response to the interrupt), but you can't restart the operation which caused it -- your only options are to branch to a recovery routine or enter the catch portion of a try/catch block. If you do not trap the interrupt, your task is terminated by the MCP. Another thing that has not been mentioned yet is registers. There are certainly registers in the processor -- the B6700 had dozens of them -- but the instruction set accesses them implicitly. None of them are accessed directly by user-level code. (Actually, on the B6700 that wasn't quite true -- compilers would generate code to access some of the registers directly, but this was not common. Access to these registers has been tightened up considerably on more recent systems). Instead of loading addresses directly into registers and using those to read and store real or virtual memory locations, user-level programs on MCP systems access data in two ways: (a) directly in a stack or (b) outside of a stack by going through a descriptor that is in a stack. A stack in the MCP architecture is simultaneously three things:     (a) a memory area for push/pop expression evaluation,     (b) the stack frames (activation records) and return history for         procedure calls, and     (c) the basis for addressing global and local variables in a         program. The B5000 and all of its descendants were designed to support Algol. To understand how stack addressing works, you need to understand how Algol programs can be structured. If you don't know Algol, think Pascal -- they're very similar. An Algol program is constructed as a series of nested blocks. Each block can contain local variables (including procedure declarations), but also has addressability to the variables in all of its more global (i.e., containing) blocks. The body of a procedure can be considered a block, whether it has local declarations or not. The scope rules for identifiers in this scheme are the same as for a two-level language such as C, it's just that there can be more levels. In MCP systems, this nesting is referred to as the "lexicographical level" (lex level or LL). The global code ("outer block") of most user programs runs at LL=2. First-level procedures (such as you would have in a C program) run at LL=3. Any procedures declared within those first-level procedures (something you can't do in C) would run at LL=4, and so forth. The Segment Dictionary is loaded as a stack with an environment of LL=1. Therefore, multiple tasks initiated from the same codefile see their code segments and constant pools as globals at a higher level -- hence the Segment Dictionary and all of the segments based on it are reentrant. The MCP stack (another that is purely an addressing environment, not a data space for a task) is at LL=0. Note that stack addressing can cross stacks. Also note that on the B6700, all MCP globals were in the scope of the user programs. This allowed them to call MCP service procedures directly, as if they were global to the user program (which, in fact, they were). There is no "service call" or "branch communicate" or other major environment change to access O/S services -- user programs simply call what to them are global procedures. This accessibility to the MCP stack was recognized as a serious security issue fairly early on, and later systems blocked direct access to LL=0. 
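If nested procedures are unfamiliar, here is a rough Python analogue of the nesting and lex-level numbering just described (Python's nested functions stand in for Algol's nested procedures and blocks; the LL numbers in the comments follow the convention above and are purely illustrative -- no stacks or descriptors are involved here):

    # A rough analogue of Algol block nesting and lex levels (LL).

    def outer_block():            # LL=2: the program's outer block
        i = 0                     # a global of the program

        def first_level():        # LL=3: like a top-level procedure in C
            j = i + 1             # can see the LL=2 variable i

            def second_level():   # LL=4: something C cannot express
                return i + j      # can see both enclosing environments

            return second_level()

        return first_level()

    print(outer_block())          # -> 1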
User programs now still call MCP service routines as normal procedures, but access to the procedure entry points and other global objects is provided through indirect addresses in the Segment Dictionary. These indirect addresses are initially set up to cause a fault interrupt when first accessed, at which point the MCP verifies the access and replaces the intentionally bad word with a valid SIRW (tag=1, Stuffed Indirect Reference Word) to the MCP global. Subsequent references to the service routine merely require a one-level indirection to reach the entry point. The "stuffed" for an SIRW refers its ability to address a word in a stack outside the current scope chain, and in particular, across stacks. What gets stuffed in the word is sufficient environment information to allow out-of-scope, cross-stack addressing. A normal IRW (also tag=1) only addresses within the current scope chain. There is an instruction (STFF) that converts a normal IRW to an SIRW. Perhaps inevitably, it is commonly called the "get stuffed" operator. (Current systems no longer generate normal IRWs -- all reference words are now generated as SIRWs unconditionally, so STFF has become a no-op). The local variables for a block are allocated in the stack. In the case of procedures, this provides efficient recursion. Simple blocks (i.e., nested BEGIN/ENDs containing declarations) are implemented as parameterless procedure calls. Therefore, more-global variables are lower in the stack, or possibly in a separate stack (note that unlike some systems, MCP stacks grow from low addresses to high addresses -- a push increments the top-of-stack address, not decrements it). Addressing within the current scope chain is a very common thing to do, so MCP systems provide a series of base registers, called the "D" (for "display") registers. There is one D register for each lex level, and it contains the absolute (real) memory address to the base of a block's stack frame. The B6700 had 32 such D registers, but this proved (for once) to be more than necessary. Later systems cut back to 16 D registers (allowing user procedures nested 13 levels deep -- I doubt that I've ever coded anything that goes more than four levels deep). The B5900, which was the first microcoded processor and was based on bit- slice chips, tried to get by with four D registers (0, 1, 2, and current LL), but that didn't work too well, and that approach was abandoned in later designs. The simplest addressing mode for MCP systems is based on these D registers and uses a construct known as an "address couple". The address couple has two fields, LL (which selects a D register) and an offset from the address in that D register. This is written "(LL,offset)" -- thus (2,17) refers to the 17-th (zero relative) word in the global (LL=2) address space of a program. For an Algol program, this address space would be the outer block; for COBOL, it would be WORKING-STORAGE; for FORTRAN, it would be COMMON; for a C program it would be the environment for static declarations. With that introduction to stack addressing, here is the key concept: scalar variables are allocated in the stack; non-scalar variables (arrays, structures, records, whatever) are allocated outside the stack. A descriptor is allocated in the stack to access that non-scalar area. 
To illustrate this, here is a simple Algol program:     1:  BEGIN     2:  INTEGER I;          % I is allocated at (2,2)     3:  ARRAY A[0:99];      % descriptor for A at (2,3)     4:  I:= 1;     5:  DO     6:    A[I]:= A[I-1]+I     7:  UNTIL I:= I+1 > 100;     8:  END. The stack offset for I starts at 2 because there are two linkage words at the base of a stack frame for (a) the procedure return address and (b) a control word called the MSCW (Mark Stack Control Word) that allows the processor to reconstruct the D registers to the values for the calling environment upon procedure (or in this case, block) exit. Here is the code that Algol would generate for this snippet (note that this is slightly idealized and out of order from what the current Algol compiler would generate, but both the concept and effect are accurate). Line 1: (nothing) Line 2: ZERO            push a zero value onto the stack at (2,2) Line 3: LT48 000006400000 push a skeletal descriptor with length=100                           onto the stack at (2,3)         LT8 5           push a literal 5 onto the stack         STAG            set a tag=5 onto the skeletal descriptor; this                         leaves an untouched descriptor at (2,3)         PUSH            flush the top-of-stack (TOS) words into memory         LT16 2048       push literal 2048 (this is a marker word for the                         MCP BLOCKEXIT procedure called at the end)         BSET 47         set bit 47 in TOS word         LT8 6           push a literal 6         STAG            set tag=6 on the BLOCKEXIT marker word Line 4: ONE             push a literal 1 onto the stack         NAMC (2,2)      Name Call: push an IRW for I onto the stack         STOD            Store Destructive: store the 1 to the (2,2)                         address; pop both the IRW and literal 1 Line 5: (nothing) Line 6: VALC (2,2)      Value Call: push a copy of the value of I         NAMC (2,3)      push an IRW to the descriptor for A         INDX            Index: pop both parameters; push an indexed copy                         descriptor for A[I] (think: address of A[I])         VALC (2,2)      push another copy of I         ONE             push a literal 1         SUBT            Subtract: pop both parameters and push the                         value of I-1         NAMC (2,3)      push IRW to the descriptor for A         NXLV            index and load value: index A by I-1; pop both                         parameters and push the value of A[I-1]         VALC (2,2)      push a copy of the value of I         ADD             pop top two values; push the value of A[I-1]+1         STOD            store A[I-1]+I in A[I]; pop both addr & value Line 7: VALC (2,2)      push a copy of the value of I         ONE             push a literal 1         ADD             pop top two values and push the value of I+1         NTGR            integerize TOS word with rounding         NAMC (2,2)      push an IRW for I onto the stack         STON            Store Nondestructive: store I+1 back into I;                         pop the address but leave value I+1 on TOS         LT8 100         push a literal 100 onto the stack         GRTR            Compare Greater: [(I+1) > 100]; pop both values;                         push result of comparison: 1 if true, 0 if false         BRFL 4:4        branch false: if low-order bit of TOS word is 0,                         branch to Line 6 (word 4, syllable 4 in the                         current code segment); pop the TOS word Line 8: MKST          
  construct and push a MSCW (Mark Stack Control                         Word) in preparation for a procedure call         NAMC (1,4)      push an IRW to the PCW for the MCP's BLOCKEXIT                         procedure (actually, for an MCP intrinsic, it                         would be an IRW to an SIRW in the Segment                         Dictionary to the PCW in the MCP stack for                         BLOCKEXIT). This procedure is not passed any                         parameters, but if it were, they would be pushed                         into the stack at this point.         ENTR            Enter: call the BLOCKEXIT procedure         EXIT            Exit Procedure: cut back the stack (thus                         destroying this activation record), and exit the                         outer block of the program (this exits back into                         an MCP procedure which terminates the task and                         disposes of the stack's memory and related                         resources) When this program is initiated, the MCP reads some information from the codefile that tells it how to set up the data stack, including a recommended initial size. If the Segment Dictionary is not already present (due to another task executing the same codefile), a "code stack" is allocated for the Segment Dictionary and its image is loaded from the codefile. There is a base area of the data stack that the MCP uses for task management, which it also sets up. No program globals are loaded, however -- this will be done by stack-building code generated by the compiler for the outer block's data segment (as is shown for I and A in the example above). Instead, the MCP creates a dummy stack frame that makes it appear as if this task has called a procedure, but the return address from that call is set up as the entry point to the outer block's code segment. The MCP also constructs a TOSCW (Top of Stack Control Word) at the base of the stack, which tells the hardware how to find the top of stack and the base of the top stack frame. From that, the processor can reconstruct all of the stack linkage, D registers, return instruction address, and so forth. After building the initial stack image, the MCP simply links the task into the READYQ, the prioritized list of tasks waiting for a processor. Once the task rises to the top of this queue, a processor is assigned to it, at which point the processor "exits" into the entry point. The B6700 has a instruction, MVST (Move to Stack) that switches the processor from its current stack to another one. This instruction constructs a TOSCW for the current stack and uses the TOSCW for the new stack to reconstruct stack linkage and registers settings. Later systems did context switching in different ways, but it appears that on the current Libra systems, MVST is once again how it's done. Note that once the MCP sets up the initial stack image and releases the new task to the READYQ, all further saving and restoring of registers and other state information is handled automatically by the hardware. Since all registers have specific purposes (i.e., there are no general- purpose registers being used who knows when and for what), the hardware knows when the value of a register needs to be pushed into memory or recalled. This applies not only to context switches between tasks, but also to all procedure calls. 
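Purely as a conceptual sketch (the class and field names here are mine, and the real MVST is a single instruction operating on hardware state, not software), the stack-switch flow described above could be pictured like this in Python:

    # Toy model of MVST-style stack switching: processor state is parked in a
    # TOSCW-like record at the base of the outgoing stack and rebuilt from the
    # TOSCW of the incoming stack. All field names are invented.

    class Stack:
        def __init__(self, name):
            self.name = name
            self.toscw = {}          # stands in for the Top of Stack Control Word

    class Processor:
        def __init__(self, stack):
            self.stack = stack
            self.top_of_stack = 0
            self.d_registers = {}
            self.next_instruction = None

        def move_to_stack(self, new_stack):
            # "MVST": save enough state to resume the current stack later...
            self.stack.toscw = {
                "top_of_stack": self.top_of_stack,
                "d_registers": dict(self.d_registers),
                "next_instruction": self.next_instruction,
            }
            # ...then rebuild the processor's state from the new stack's TOSCW.
            saved = new_stack.toscw
            self.top_of_stack = saved.get("top_of_stack", 0)
            self.d_registers = dict(saved.get("d_registers", {}))
            self.next_instruction = saved.get("next_instruction")
            self.stack = new_stack

    cpu = Processor(Stack("task A"))
    cpu.move_to_stack(Stack("task B"))   # task A's state now lives in its TOSCW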
Hardware interrupts are implemented as a forced procedure call on the stack that currently has the interrupted processor, so the same state-saving mechanism is used for interrupts as well. Back on the Algol example above, the very first thing that happens when the processor exits to the entry point is a Presence bit interrupt as it detects the Presence bit is zero in the descriptor for the outer block's code segment. Execution continues once this code segment is made present. The stack-building code at the beginning of the outer block creates the local variables for the stack frame and pushes them onto the stack. In the case of integer I, this is simply a literal zero; in the case of array A, the code constructs an untouched data descriptor of length 100. The 100 words of memory for the array will not be allocated until the descriptor is first "touched" and its zero Presence bit detected. This will happen the first time the NXLV (Index and Load Value) instruction is executed in Line 6. Note that the INDX (Index) instruction executed earlier does not cause a Presence bit interrupt, since it only generates an indexed copy descriptor and does not attempt to access an array element. The INDX instruction effectively acts as a "load address" instruction. Bounds checking takes place on both INDX and NXLV, however. The program then proceeds to initialize the value of I (some compilers would fold this assignment into the stack-building code, but Algol does not), and execute the DO loop that iterates through the elements of array A. To someone used to register-based architectures, this code probably looks like it generates a lot of memory accesses -- all those VALC and index operators, not to mention the stack pushes and pops. On the B6700 that certainly was true, as there was essentially no caching, except for two TOS registers. More recent implementations use caching extensively, however, and most of the apparent memory references would stay inside the processor. Another thing that may be apparent is that Algol does very little optimization. It is a one-pass compiler and, for better or worse, emits instructions in pretty much a what-you-code-is-what-you-get manner. This program contains an intentional bug, which becomes apparent on the last iteration of the DO loop. The value of I is 100, which is greater than the upper bound of array A. When I compiled this and ran it on an MCP system, I got the following message from the MCP:           2228 MSRHI3:INVALID INDEX @ (00100600)           2228 HISTORY: 003:0001:3 (00100600).     F-DS  2228 (PAUL)OBJECT/SIMPLE/ALGOL ON OPS. This indicates a bounds violation on line 100600 (a sequence number that is part of the source file line) at code segment address 003:0001:3, which is the INDX instruction for Line 6 in the example above. This bounds checking is not due to any debug or diagnostic mode I enabled for the compiler or the object code -- it's implicit in the segmented addressing mechanism for the architecture and cannot be turned off. The value 2228 is the MCP-assigned task number. F-DS indicates the program was terminated (discontinued, or "DS-ed" in MCP parlance) due to a fault interrupt. Although it is not apparent from this example, the HISTORY line is a trace of return addresses -- it shows the history of procedure calls that got to the point where the fault occurred. 
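To make that fault concrete, here is a toy Python model of the behavior described above -- allocate on first touch, bounds check on every index -- run with the same off-by-one error. The class and its names are my own invention, not anything in the MCP:

    class InvalidIndex(Exception):
        """Stands in for the Invalid Index bounds-violation interrupt."""

    class ToyDescriptor:
        """Toy model of an untouched data descriptor of a given length."""
        def __init__(self, length):
            self.length = length
            self.area = None                     # absent: nothing allocated yet

        def index(self, i):
            # Bounds check happens on every indexing operation.
            if i < 0 or i >= self.length:
                raise InvalidIndex("index %d not in [0, %d)" % (i, self.length))
            if self.area is None:                # "Presence bit interrupt":
                self.area = [0] * self.length    # allocate on first touch
            return i

        def load(self, i):
            k = self.index(i)
            return self.area[k]

        def store(self, i, v):
            k = self.index(i)
            self.area[k] = v

    a = ToyDescriptor(100)                       # like ARRAY A[0:99]
    try:
        i = 1
        while True:
            a.store(i, a.load(i - 1) + i)        # A[I] := A[I-1] + I
            i += 1
            if i > 100:                          # same off-by-one as the Algol above
                break
    except InvalidIndex as fault:
        print("INVALID INDEX:", fault)           # the MCP would DS the task here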
Assuming this bug did not exist (i.e., the comparison on line 7 was "> 99" instead of ">100"), the loop would have terminated when the value of I reached 100 and control would have fallen into the END statement for the block. The NTGR instruction for line 7 is due to the numeric format used with all MCP systems since the B5000 -- integers are implemented as a subset of floating-point values, and integer overflow generates a floating-point result. NTGR normalizes the TOS value as an integer and generates a fault interrupt if it exceeds the limits of integer representation (+/- 2**39-1). The call to BLOCKEXIT at the end of the code for the block is generated by the compiler to dispose of any complex objects (arrays, files, etc.) that were declared in this stack frame. The compiler generates a tag-6 marker word at the end of stack-building code that serves as a parameter to BLOCKEXIT. This marker word contains a bitmask indicating which types of resources BLOCKEXIT should look for. Failure to call BLOCKEXIT when required would result in memory leaks, and the presence of this call is another example of the trust the system places in its compilers. More recent E-mode levels include a "blockexit bit" in one of the stack linkage words that can be used by the MCP to enforce proper disposal of stack frame resources before the frame can be exited. A Series Enhancements to the Descriptor Mechanism. The Address field of data and code segment descriptors is 20 bits wide, which allows for a total of 1048576 words (6 MB). The B6700 has the same maximum physical memory size, so the field width was adequate. In the late 1960s and early '70s 1 MW seemed near infinite, but as systems became larger through the '70s (and especially as the use on-line applications and data bases grew during this period), this upper limit on physical memory of 1 MW became grossly inadequate. The B6800/7800/6900/7900 implemented a somewhat crude paging technique (the infamous "Global(tm)Memory") that helped somewhat on multi-processor systems, but the physical address space for a given processor at any one time remained at 1 MW. The A Series models introduced starting in the early 1980s addressed this problem by implementing a concept known as ASD (Actual Segment Descriptor). Heretofore, the "mom" descriptor in a program's data stack was the owner of a memory area and pointed directly to it when the segment was present in memory. There was no room to expand the address field in the 48 bit descriptor word, so the role of owner was moved from the mom to a central ASD table in memory. Instead of a real memory address, the Address field in descriptors now holds an index into this ASD table. On the latest processors, each entry in this table contains eight 48-bit words, of which only the first three are used by the hardware. The actual location, length, and status of each allocated memory area is now stored in these table entries, hence the ASD name. It functions similarly to the page table in other virtual memory architectures, except that the "pages" are variable-length segments. Most processors use caching to reduce the incidence of real memory accesses to this table, effectively implementing a form of TLB. The concept of "mom" and "copy" descriptors no longer exists in the architecture, at least not in anything like the way it was in the B6700. All descriptors in data segments (except untouched descriptors) point to an ASD table entry. 
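Continuing the toy-model style from above (the names and structure are mine; the real ASD entry layout is in the E-mode specifications cited at the end of this post), the extra level of indirection looks conceptually like this:

    # Toy picture of ASD-style addressing: a descriptor no longer carries a
    # real memory address, only an index into a central table whose entries
    # hold the segment's location, length, and presence. Invented field names.

    asd_table = {}     # index -> {"present": bool, "base": int, "length": int}

    def make_segment(asd_index, base, length):
        asd_table[asd_index] = {"present": True, "base": base, "length": length}

    def resolve(descriptor, offset):
        """descriptor is just {'asd': index}; returns a physical word address."""
        entry = asd_table[descriptor["asd"]]
        if not entry["present"]:
            raise RuntimeError("Presence bit interrupt: MCP would (re)load the segment")
        if not 0 <= offset < entry["length"]:
            raise RuntimeError("Invalid Index")
        return entry["base"] + offset

    make_segment(7, base=0x40000, length=100)
    original = {"asd": 7}
    a_copy = {"asd": 7}      # copies share the table index, so relocating the
                             # segment means updating asd_table[7], not the copies
    print(hex(resolve(a_copy, 10)))   # -> 0x4000a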
In fact, since E-mode level Gamma introduced the use of four-bit tags (initially for the A11 and A16 models in the early 1990s), tag-5 words are used only to represent untouched descriptors. Once the area is allocated, descriptors accessible to user-level code (now called "virtual segment descriptors" or VSDs) use tag values of C, D, E, and F to identify various combinations of indexed/unindexed and paged/unpaged descriptors. The ASD index field in these VSD words is currently 23 bits in length, allowing for a maximum of just over 8 million segments in the system. The maximum length of a segment is still limited to 2**20-1 words, although it appears this could be extended. The physical address field in the ASD is currently 36 bits, allowing a maximum physical memory space of 64 GW or 384 GB. The large Libra systems I have encountered recently have 4-8 GW of physical memory, which appears to be more than adequate. With so much physical memory available, most systems today run with large amounts of the memory space unallocated, and almost no overlay due to memory allocation pressure takes place. As a result, the Presence bit mechanism now largely serves as an allocate-on-first-reference capability. Should you fill up the physical memory, however, the automated overlay ("virtual memory") mechanism will still do its thing. The size of the ASD table is established at system initialization time, based on the total size of physical memory and a factor that is settable by the system administrator. Running out of ASD table entries is a no- no, and causes the system to halt. A serious issue with the B6700 design was management of copy descriptors. Every time the presence, location, or size of a segment changed, not only did the mom descriptor need to be fixed up, but all of the copies, too. The only way to find those copies was to search for them, so copy descriptors were only permitted in stacks. There were (still are) special instructions to search for these copies, and a considerable software investment was made to minimize the number of stacks that needed to be searched, but stack-search overhead could be fierce, especially on a system that was near or beyond the thrashing point. The ASD implementation considerably helped this situation. Copies no longer contain real addresses to the data area or to the mom -- instead they point to the ASD table entry for the segment. This table index does not change through the life of the segment, so copies no longer need to be found and fixed up when the location, size, or status of a segment changes. One of the nicest aspects of the ASD implementation was that it had essentially no impact on user applications. Since descriptors are managed by the MCP, the details of how the indexing instructions compute real addresses are opaque to user-level code. It was necessary to recompile some programs, although not specifically to support the ASD addressing changes -- the B6700-era compilers emitted code that would access fields of descriptors (e.g., to determine the length of an area in Algol, you could obtain a tagless copy of the descriptor and isolate bits [39:20]). Starting with E-mode level Gamma, the length field was not even in the VSDs accessible by user-level code anymore, so new instructions (e.g., GLEN to determine the length of a segment) were implemented to perform these functions, and the old methods (along with the Algol syntax that supported them) were deprecated. 
Some other model-specific instruction sequences (such as directly accessing processor registers) were eliminated at the same time, all of which improved the security of the system and reduced somewhat the reliance on trusted compilers. That reliance was not eliminated entirely, however. There was a lengthy transition period that allowed users to recompile their programs so that their codefiles would be compliant with newer processors.

Issues with Descriptors and the MCP Architecture in General. The first thing almost everyone comments on when first being exposed to the stack- and descriptor-based architecture of MCP systems is the memory access overhead of push/pop on the stack and of having to index through descriptors to reach data. The second thing that gets comments is the lack of user-accessible registers. There is no question that these characteristics of the architecture add overhead and (at least in the myopic view in which most seem to consider performance issues) degrade performance. This overhead is at least partially offset, however, by:

    * the lack of unnecessary state saving,
    * the efficiencies resulting from variable-length memory segments,
    * the efficiencies resulting from avoiding unnecessary memory
      allocation by delaying it until first reference,
    * the efficiencies resulting from code and data segments being
      closely related to language objects,
    * the efficiencies resulting from being able safely to access data
      and code across addressing environments and across tasks
      (marshalling data across process boundaries is a foreign concept
      in the MCP),
    * the efficiencies in context switching,
    * the efficiencies in interrupt handling, and
    * the efficiencies resulting from hardware and operating system
      environments that were designed specifically for each other.

In an I/O-intensive, transaction-server environment (which is what the MCP systems are designed for), this performance trade-off balances out better than you might think. Where the architecture loses at the micro level of instruction performance, it gains at the macro level of system performance. For a server, that's what you want. Need to do high-performance numerical computation? MCP systems are probably not the ones you should consider using. Need to do high-performance transaction processing and safely balance the needs of hundreds or thousands of tasks competing for processors, memory, and I/O paths? MCP systems do quite well in that solution space.

There is another aspect to performance that I do not think is considered often enough -- reliability performance. The lack of low-level bounds checking in other systems terrifies me, and it should terrify you. The idea that bounds violations can be prevented simply by programmers "being careful" is both silly and irresponsible. Giving addresses to programmers is like giving whiskey and car keys to teenagers -- sooner or later something stupid is going to happen, and it's probably going to be sooner. I say this as a programmer myself. The current problems we have with malware are largely due to unchecked memory accesses and allowing data to be treated as code. These are problems that MCP systems simply do not have.

As I have said before in this space, there is a cost to using descriptors and hardware-enforced bounds protection. There is also a cost to not using descriptors and hardware-enforced bounds protection. In my opinion, the MCP architecture has two major problems -- and neither of them relates to performance.

The first is the reliance on trusted compilers. As discussed briefly above, tweaks to the architecture over the past 30 years have improved this situation, but the security of the system is still too dependent on the quality of code that the compilers generate. Barring a social engineering attack, it is quite difficult to get untrusted code into the system in a form that can be executed. Once an untrustworthy compiler is authorized, though, havoc is possible. I am not aware that penetration of untrusted code has ever been a problem since the early B6700 days, when some major holes were exposed in the enforcement of codefile integrity, but this is too ripe an area for potential abuse, and one that requires enforcement in too many parts of the system, to be considered an adequate aspect of the architecture.

The second problem is that the segmentation, addressing, and memory management mechanisms of the system are built for hierarchical, block-structured languages such as Algol and Pascal. Memory objects effectively "belong" to the block that declared them, and are automatically deallocated when that block exits. This approach also works fine for COBOL, and is adequate for FORTRAN (at least through FORTRAN-77). It works poorly, though, for languages which rely on heap-based memory management, where an object can live on after its originating environment no longer exists.

The MCP compilers and operating system go to great lengths to prevent "up-level pointers" -- the existence of references to a locally-allocated segment that can be stored in a more global scope. The system does not have an efficient way to locate and invalidate such up-level references when the locally-allocated segment is deallocated, so their use is prohibited.

Current MCP systems support a C compiler, but the performance of its code is not all that good, partly because C is basically a high-level assembler for register-based, flat-address architectures (to which the MCP architecture is a nearly complete antithesis), and partly because the C heap is currently implemented as a series of array rows, with C pointers implemented using integer offsets into the runtime environment's heap space. It works, but the result is not very efficient.

The problem is even worse for object-oriented languages such as Java. A descriptor-based capability architecture should be a good fit with an object-oriented language (a descriptor, after all, is a primitive form of object), but the current MCP architecture is too closely tied to the Algol memory model to work well natively with Java. I consider this to be both a real shame, and a real threat to the future viability of the MCP architecture.

Those issues aside, I think the MCP architecture is still the most interesting, if least appreciated, one on the market today. There are other interesting aspects of the architecture that I have either passed over quickly (such as stack linkage and cross-stack addressing) or ignored altogether (such as the process model and the concepts of task families and synchronous and asynchronous dependent tasks). Then there are the MCP stack, the stack vector, procedure calls, accidental entries ("thunks"), parameter passing, string and data movement instructions, server libraries, and connection libraries. These are all also well integrated with the segmentation, stack, and addressing issues discussed above.
In fact, one of the endlessly fascinating things about this architecture to me is that, for all of its complexity, everything fits into a nicely integrated whole. It's quite elegant, really.

For those willing to RTFM, the Unisys support web site allows free access to essentially all of the documentation for current systems. You can access this through the front door by going to http://support.unisys.com/ and clicking the "Access documentation" link at the very bottom of the page. Click through the agreement and you will be presented with a page that can search the documentation. You can also access documents directly if you know their URL. Here are some useful ones:

ClearPath Libra 680/690 (latest version of E-mode level Epsilon spec):
http://public.support.unisys.com/c71/docs/Libra680-1.1/68787530-004.pdf

ClearPath NX6820/6830 (E-mode level Delta spec):
http://public.support.unisys.com/aseries/docs/Hardware/70205547-001.pdf

Current Algol language reference:
http://public.support.unisys.com/aseries/docs/ClearPath-MCP-11.1/PDF/86000098-507.pdf

Bitsavers has a good collection of documents for the older MCP systems under:

    http://www.bitsavers.org/pdf/burroughs/

In particular, you might want to look at the following documents under that URL:

Narrative Description of the B5500 MCP:
B5000_5500_5700/1023579_Narrative_Description_Of_B5500_MCP_Oct66.pdf

B5500 Reference Manual (architecture reference):
B5000_5500_5700/1021326_B5500_RefMan_May67.pdf

B5500 Extended Algol:
B5000_5500_5700/1028024_B5500_ExtendedAlgol_Jul67.pdf

Elliot Organick's 1973 book on the MCP architecture:
B5000_5500_5700/Organick_B5700_B6700_1973.pdf

B6700 Reference Manual (architecture reference):
B6500_6700/1058633_B6700_RefMan_May72.pdf

A good paper by Hauck and Dent on the B6500/6700 stack mechanism:
B6500_6700/1035441_B6500_B7500_Stack_Mechanism_1968.pdf

If on first reading you don't understand this architecture, you're running about average. I will happily try to reply to questions and comments.