Part of this archived thread from comp.sys.unisys: https://groups.google.com/d/msg/comp.arch/GtQ5o96eMaI/HPXiRiYKPv0J

Subject: Re: segmentation or lack thereof
From: Paul Kimpel
Date: 12/9/2007, 12:16 PM
Newsgroups: comp.arch,alt.os.development,comp.sys.unisys

On 11/25/2007 11:19 AM, Louis Krupp wrote:
> Paul Kimpel wrote:
>> On 11/25/2007 7:29 AM, John L wrote:
>>>> Any comments from the B5500/6700/A-Series builders or customers?
>>>
>>> I must admit I'd forgotten about the Burroughs machines.  My impression is that they're the healthiest segmented machines around today, but they also suffer from performance and address space issues.
>>>
>> John's last statement is so bald and completely unsupported that I have to take issue with it. Every architecture has performance and address space issues, depending on the problem space in which you want to consider it.
>
> John said it was his "impression," which is something short of a claim to knowing the one and only truth.  I'm sure he's willing to stand corrected, just as the rest of us are open to education.
>
>> John's comment about address space is one that I really cannot understand, though. All of the MCP machines going back to the B5500 have had huge virtual address spaces, just not ones that are allocated all in one piece. I'm not sure how you could compute the total virtual address space limit on the current systems, but it's easily many trillions of bytes PER TASK. You would hit some practical limitations, such as the maximum size of the system's segment descriptor tables, long before running out of virtual address space provided by the architecture. In almost 40 years of working with these systems, I've never seen an application on them that came even close to pressuring the virtual address space, let alone exceeding it. There were serious PHYSICAL address space problems with the B6700/7700 systems in the late 1970s and early '80s, but these were resolved by the mid-80s in the A Series line, and done so largely without impact on existing applications.
>
> Can you give an overview of how A-Series segmentation works?  "RTFM" is OK in my case, since I have the manual...
>
> Louis
>
> (I've trimmed follow-ups, since as far as I know, Linux hasn't been ported to the A-Series.  Of course, I'm willing to stand corrected on that.)

First, Louis' point concerning my comment on John L's "impression" is well taken. As I read John's sentence concerning the Burroughs machines, the first clause was an impression, but the second was a conclusion. If that was not what John intended, then I apologize.

Louis asked if I could give an overview of how the Burroughs segmentation works, and John Ahlstrom has privately encouraged me to do so. I think I know this reasonably well from a software perspective, but as John L points out, the architecture is phenomenally complicated. The elements of the architecture are also extremely synergistic, and it's impossible to talk about one aspect (such as segmentation) in the absence of others. As a result, this is a very long post -- about 9900 words. In order to understand the context in which memory addressing and segmentation operate, I also have to talk to some degree about stacks, compilers, codefiles, and the very tight integration between the hardware architecture and the MCP (operating system) architecture.
This isn't easy, so please bear with me while I lay down some groundwork and then try to describe (without pictures) how segmentation and memory addressing work in this very complex but fascinating architecture.

Background. I will start with a short history lesson. The origin of the Burroughs stack/descriptor architecture was the B5000 of 1962. Bob Barton is generally credited with the conceptual design of the architecture. This machine was a major departure from anything Burroughs had attempted before. It was clearly influenced by the Rice University computer (Barton had spent some time at Rice), particularly by the Rice system's use of "codewords", from which the concept of descriptors arose. I believe the Ferranti Atlas was also an influence on the B5000 design.

A somewhat updated version of the architecture was released as the B5500 in 1964 and stayed in production until 1970. At the very end there was briefly a version called the B5700, but it was really a B5500. (The B5900 mentioned below is a contemporary of the other Bx900 models and not a variant of the B5500.) Burroughs began work on a successor machine, the B6500, in 1965, but it was not released until 1969. It had a very difficult introduction to customer use, and did not work properly until it was updated and re-released as the B6700 in 1971. All B6500 CPUs in the field were then replaced with B6700s.

The B6500/B6700 was a substantially different design from the B5500 -- the two instruction sets were quite different, as were the formats of descriptors and other control words. Limits on the size of both physical and virtual address spaces were increased dramatically. A very different way of handling character-oriented data was also introduced. About the only things that got carried over without change were the word size (48 bits), the integer and floating point word formats, and the six-bit character code, known as BCL. The MCP operating system was also completely redesigned. Nonetheless, the two designs are conceptually similar, and both the B6500/6700 hardware and system software should be seen as refinements of those for the B5500.

The Burroughs B6700 is the basis for the Unisys ClearPath MCP systems that are still being sold today. There have been some tweaks to the instruction set, and a major change was made in the mid-1980s to the way memory is addressed to accommodate still larger physical address spaces, but the processor architecture (at least from the perspective of user-level software) has remained essentially the same since 1970. The architecture is now referred to internally as "E-mode" (I think the "E" stands for "emulation"), and its formal specification has gone through a number of levels, known as Beta, Gamma, Delta, and (the latest) Epsilon.

The machines have had a variety of marketing names over the years. Since the various names can be confusing to those not familiar with the product line, here is a summary:

    Burroughs B5000/5500/5700 -- the original early 1960s architecture.
    Burroughs B6500/6700/7700/6800/7800/B5900/6900/7900
    Burroughs/Unisys A Series (MicroA, A1-A7, A9-A13, A14-A19)
    Unisys ClearPath 4600/4800/5600/5800/6800
    Unisys ClearPath NX4200 (the first of the models where the hardware
        is a standard Intel box and the MCP architecture is emulated on
        the Intel processor. This approach is currently used on all
        small-to-medium performance models)
    Unisys ClearPath LX5000/6000/7000 (later emulated systems)
    Unisys ClearPath Libra (these are also referred to as the "CS"
        series; the Libra 300, 400, and 520 models are emulated systems)

With so many models, we need a generic name, so I'll follow the current Unisys convention and refer to them as MCP systems (acknowledging that other, now obsolete, Burroughs/Unisys architectures also used operating systems named "MCP").

With that background, I will try to describe how memory addressing and segmentation work on the B6700, since discussing that particular model gives a good conceptual base. The mechanism was similar, but more primitive, on the B5500. This mechanism changed with the A Series to expand the physical address space and implement improvements in some other areas. I'll discuss those differences later. Even though the B6700 has been obsolete for 30 years, I'll talk about it mostly in the present tense.

Words and Tags. The best place to start is with word formats. The B6700 (like the B5500) has a 48-bit data word. One of the significant changes from the B5500, however, was the addition of some extra bits to the word, called the "tag". The B6700 and later systems through E-mode Beta had a three-bit tag. E-mode Gamma and subsequent systems use a four-bit tag. The tag indicates what type of information the word contains. Generally, tags 0, 2, 4, and 6 are data types that user-level code can freely manipulate; the rest are control types that are used by a combination of the hardware and operating system kernel. The tag values for the B6700 are:

    0 = single-precision data word (numeric or character data)
    1 = IRW (indirect reference word) or SIRW (stuffed indirect
        reference word). These are effectively indirect addresses to
        words in a program stack. A brief discussion of what "stuffed"
        means is covered in the section on Addressing Data, below.
    2 = double-precision data word (typically numeric)
    3 = general purpose protected control word: stack linkage, code
        segment descriptor. All words in a code segment (one containing
        executable instructions) also must have tags of 3.
    4 = SIW (step index word): control word for the STBR (step branch)
        looping instruction -- a nice idea, but the performance was
        poor, and STBR is no longer supported on current systems. This
        tag value is now used by the MCP as a special marker word in
        stacks (e.g., for fault trap indicators).
    5 = data descriptor. These words are the basis for data
        segmentation and are discussed in detail below.
    6 = uninitialized operand. Words with this tag value can be
        overwritten, but reading them generates a fault interrupt. Used
        as the initial and NULL value for pointers.
    7 = PCW (program control word). The entry point to a procedure
        (subroutine). Also used as one form of dynamic branch address.

The additional tag values introduced with E-mode Gamma are primarily used for optimizations of descriptors after a segment has been allocated and are not significant for this conceptual discussion. There are user-mode instructions that can read and set the tag on the word at top of stack. You could theoretically do quite a bit of mischief with this on the B6700, but the compilers are a trusted component of the system, so in practice this was not a problem.
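If it helps to see the idea in more familiar terms, here is a rough sketch in Python (a teaching toy of my own; the names are invented and nothing like this exists on the real machines) of a tagged word and the data-versus-control split described above:

    # Illustrative model of a B6700 tagged word: 48 data bits plus a 3-bit tag.
    # The names below paraphrase the tag list above; this is not a description
    # of any real Unisys data structure.

    TAG_NAMES = {
        0: "single-precision operand",
        1: "IRW/SIRW (indirect reference)",
        2: "double-precision operand",
        3: "protected control word (stack linkage, code)",
        4: "SIW (now an MCP marker word)",
        5: "data descriptor",
        6: "uninitialized operand",
        7: "PCW (program control word)",
    }

    USER_DATA_TAGS = {0, 2, 4, 6}   # freely manipulated by user-level code;
                                    # the rest are managed by hardware and the MCP

    def describe(tag, value):
        kind = "data" if tag in USER_DATA_TAGS else "control"
        return "tag %d (%s): %s word, value %012X" % (tag, TAG_NAMES[tag], kind, value)

    print(describe(0, 42))                  # an ordinary operand
    print(describe(5, 0x000006400000))      # a descriptor-shaped value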
In more recent systems, the microcode carefully restricts which tag values can be set on data words in the stack. The Set Tag instruction (STAG) running in user-level code cannot now, for example, set tag 5 on a data word with bit 47=1 (i.e., create a present data descriptor -- see below for what this means). This brings up the subject of "codefiles" (files containing executable code) and compilers. A codefile is a controlled entity under the MCP. Codefiles can be generated only by compilers, which are programs that must be "blessed" using a privileged MCP command. There is no assembler for these systems, nor anything that allows a user to generate arbitrary sequences of object code. Recompiling the source code for, say, the COBOL-85 compiler does not generate a compiler, just an ordinary program that inputs source code and outputs a file of instructions and other run-time data. That file is not a codefile, however. It cannot be executed and cannot be turned into a codefile. There is no facility for a program (even one running with the highest privileges) to read that file and present any of its data to the hardware as a code segment. Words of instructions in a code segment must have tags of 3. There is no way for user-level code to set those tags, let alone create a code segment and branch to it. The I/O hardware has a variant of the disk read operation that will apply tags of 3 to data as it is being read and transferred to memory, but only the MCP has access to the physical I/O hardware, so there is no way for user programs to initiate such an operation. [When I say "no way" here, I mean that the hardware/MCP architecture is specifically designed to prohibit such a thing, and I'm not aware of even a conceptual attack that could get around the design. There have been some holes discovered in the past, though, and of course it's entirely possible that there are more that have not yet been brought to light. I won't speculate on the likelihood of their existence. To me, the weakest part of this design is the trust that must be placed in the compilers. If you can generate arbitrary code, or modify a valid codefile externally (e.g., on a backup tape) and reload it, you can get around at least some of the system's protections, but it's still not easy).] This discussion of codefiles and compilers highlights a basic characteristic of MCP memory segmentation: code and data are entirely separate entities and are allocated and managed by the system separately. There are code segments and data segments, and while they are allocated from the same system-global heap and may be adjacent in physical memory, logically they are separate and addressed entirely differently. Both types of segments can be created only by the MCP. The contents of code segments are loaded solely from codefiles. Code segments are read-only, and as we will see, are automatically reentrant. Data Segment Descriptors. Descriptors are called such because they "describe" an area of memory. MCP systems are a form of capability architecture, and the descriptors are the capability -- you have to have access to the descriptor to access the data it describes. Descriptors are the basis of memory addressing and memory segmentation. A tag-5 word in the B6700 architecture represents a data descriptor. The word has a number of fields which I will identify using what is called partial-word notation, [s:n], where "s" is the starting bit number and "n" is the length of the field in bits. 
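To make the notation concrete, here is a small helper in Python (entirely my own illustration -- there is no such operation on the real machines, and the sample word is made up) that extracts an [s:n] field from a 48-bit word:

    def field(word, s, n):
        """Return the [s:n] partial-word field of a 48-bit word.

        s is the starting (high-order) bit number and n is the field length
        in bits; bit 47 is the high-order bit and bit 0 the low-order bit.
        This only illustrates the notation, not a real machine instruction.
        """
        assert 0 <= s <= 47 and 1 <= n <= s + 1
        return (word >> (s - n + 1)) & ((1 << n) - 1)

    w = 0x800006400123        # a made-up 48-bit value, shown in hex
    print(field(w, 47, 1))    # [47:1]  -> 1
    print(field(w, 39, 20))   # [39:20] -> 100
    print(field(w, 19, 20))   # [19:20] -> 291 (0x123)

These particular fields are the ones that matter in the descriptor layout that follows.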
Bit 47 is high order and bit 0 is low order; all MCP systems use big-endian addressing.     [47:1]  Presence bit (commonly known as the "P-bit")     [46:1]  Copy bit     [45:1]  Indexed bit     [44:1]  Paged bit     [43:1]  Read-only bit     [42:3]  Element Size field               0 = single precision words               1 = double precision words               2 = four-bit characters (packed decimal)               3 = six-bit characters (BCL, now obsolete)               4 = eight-bit characters (EBCDIC)     [39:20] Length or index field (determined by [45:1])     [19:20] Segment starting address Accessing the data in a segment is done by means of "indexing" the descriptor with a zero-relative offset into the segment. There are a series of instructions that do this, as will be detailed below. The Presence bit indicates whether the memory area represented by the descriptor is physically present in memory. It is the basis for the virtual memory mechanism. "Virtual memory" is IBM's clever marketing term that everyone has adopted, but it does not accurately describe what goes on in MCP systems. "Automated memory segment overlay" is a more accurate term, and is what Burroughs called it before IBM's term caught everyone's attention. When the Presence bit is 1, the segment is physically present in memory starting at the real address in [19:20]. (Note that in A Series and later systems, the real memory address is no longer contained in descriptors, as will be discussed later). When the Presence bit is zero, the segment is not present in memory (or has never been allocated) and the descriptor is said to be "absent" (although it's really the data area that is absent). Attempting to access a segment through an absent descriptor generates a "Presence bit interrupt" -- what other systems would call a page fault. When handling this interrupt, the MCP interprets the value in the address field as follows:   * If zero, this indicates that the segment has never been allocated.     The MCP simply allocates an area of length specified by [39:20],     relocating or overlaying other physically-present segments as     necessary. The MCP clears the allocated area to binary zeroes     (primarily to wipe out any nasty tags that may be there from prior     use of that space), fixes up the address field in the descriptor and     sets its Presence bit. Exiting from the interrupt handler causes the     hardware to restart the instruction which generated the interrupt.     The compilers generate code in the initialization section of     procedures to build these "untouched" descriptors directly in the     program's stack. The physical memory area is not allocated until     (and unless) the program actually references it.   * Some types of objects require additional information from the     compiler (e.g., for multi-dimensional array-of-arrays structures,     the length field in the descriptor only specifies the length of the     first dimension so certain bit patterns in the address field for     untouched descriptors point to data structures the compiler has     built within the codefile that carry the necessary information for     the other dimensions).   * Otherwise, the segment was once present in memory but has been     rolled out, usually due to pressure from other memory allocation     activity. 
The value in the address field of the descriptor indicates     the location within the task's "overlay file" (a temporary, unnamed     file the MCP allocates for each task) where the data associated with     this descriptor has been rolled out. The MCP allocates an     appropriate area of memory, reads the segment from the overlay file,     fixes up the descriptor with the new real memory address, and exits     the interrupt handler to restart the instruction that was     interrupted. The Copy bit indicates that a tag-5 word is not the original descriptor for an area, but rather a copy of it. A non-copy descriptor is said to be the "mom" descriptor for an area, and there can only be one such mom. Copy descriptors are generated automatically by a number of instructions, principally those involved with indexing and loading words on the top of stack (e.g., to pass an array as a parameter to a procedure). Present copies point directly to the segment's memory area; absent copies point to the mom. In the B6700, there is no central page table, per se, to keep track of memory areas. The mom descriptor effectively serves the purpose of a page table entry. Every area of physical memory (allocated or available) is surrounded by a set of "memory link" words that are used by the MCP memory allocation routines to keep track of allocated areas and locate available areas of appropriate size. For allocated areas, one of these link words points back to the mom descriptor. The handling of moms and copies is another thing that has changed with the A Series and later models. The Indexed bit indicates whether the descriptor points to a whole segment or to one element within a segment. When the bit is 0, the length/index field contains the length of the segment. Some indexing instructions generate an "indexed" copy descriptor with both the Indexed and Copy bits set to 1 (all indexed descriptors are by definition copies). Indexed descriptors are typically used as a pointer, e.g., as a destination address for store operators or as a call-by-reference parameter to a procedure. They are also used as starting addresses for string manipulation instructions. Algol has a "pointer" type; it represents an indexed data descriptor. In an indexed descriptor, the length field is replaced by a zero- relative offset into the segment. The physical address of that element (assuming the segment is present in memory) is the sum of the base address in [19:20] and the offset in the length field. Depending on the value of the Element Size field [42:3], a descriptor could be word oriented or character oriented. Physically, memory is accessed as whole words. When a character-oriented descriptor is indexed, its length field is replaced by a specially-encoded offset. Bits [35:16] are a word offset within the segment; bits [39:4] are a character offset within that word. This obviously limits the offset for character pointers to 393215 for eight-bit characters and twice that for four-bit packed decimal digits (although this limit can be effectively eased by the use of paged areas, discussed next). The string instructions understand these character offsets and transparently handle character operations that start in the middle of words. The Paged bit indicates whether the memory area for the descriptor is monolithic (0) or paged (1). 
If paged (the original term for this is "segmented", but talking about segmented segments can be confusing), the address field does not point to the address of the data, but instead to a vector of other descriptors. Each of these descriptors in turn points to a data segment. In memory, this looks identical to a two-dimensional array-of-arrays structure. The difference is in how the software accesses the data. With a standard two-dimensional array, the software must explicitly compute and apply an index for each dimension. With a paged segment, the software is unaware of the second dimension. When the indexing instructions encounter a descriptor with the Paged bit set, they partition the index value into a page number and page offset and automatically re-index the second dimension. If an indexed copy is generated, it is a copy of the descriptor for the page, not the original (paged) descriptor, and the index offset will be that within the page. This helps alleviate the smaller limit for word offset available with indexed character descriptors. For the B6700, the page size was 256 words. Data areas were typically paged when they exceeded 1024 words, but the programmer had some control over this. The page size on current systems is 8192 words. The paging threshold is adjustable by the system administrator, and by default is also 8192 words. As with all memory areas in the system, the individual pages are not allocated until first referenced. The array of page descriptors (called a "dope vector") is not even allocated until one of the pages is initially accessed. All pages are individually relocatable and overlayable. String operations that run off the end of a physical memory area generate a "segmented [paged] array" interrupt. The MCP responds to this interrupt by locating the next page in sequence (if there is one) and restarting the operation. If there is no next page (which would also be the case if the string operation ran off the end of an unpaged segment), the result is a "Segmented Array Error", with which everyone who has programmed for MCP systems is no doubt intimately familiar. Unless trapped, this error (a form of boundary violation) is fatal to the program. The Read-only bit is just that -- if 1, it marks the segment as non- writeable. This is primarily used for segments that represent constant pools generated by the compiler and which are loaded from the codefile. The Element Size field, as discussed above, determines whether the descriptor represents single precision word, double precision word, or character-oriented data. It is quite common, especially with COBOL programs, to have multiple descriptors with differing size fields pointing to the same segment. Only one of these could be the mom, of course; the rest would have to be copies. This allows the software to address the same memory area as a mixture of word and character fields. The B5500 strictly used six-bit characters; the B6700 was basically an eight-bit EBCDIC machine, but could also handle six- and four-bit characters. Support for the six-bit codes was dropped from the architecture in the Bx900 models, ca. 1980. I understand that the latest Libra models have added support for 16-bit characters. The role of the Length/index and Address fields has been covered in the discussion above, so nothing additional needs to be said about them here. Code Segment Descriptors. Thus far, the discussion has been about data descriptors. There are also descriptors for code segments. 
They are similar to data descriptors, but simpler. Code segment descriptors have a tag of 3 and the Presence bit, Copy bit, Length, and Address fields of data descriptors. Bits [45:6] are not used. Code segments cannot be indexed or paged. They are by definition read only. The only element size they support is single precision words. Code segment descriptors live in a special type of data segment called the Segment Dictionary. An image of this segment is built by the compiler (all descriptors being in their absent, untouched form, of course) and stored in the codefile. The Segment Dictionary is loaded by the MCP as part of task initiation. In addition to code segment descriptors, the Segment Dictionary may contain (read-only) data descriptors for constant pools and scalar constant values. The Segment Dictionary in memory is actually a type of stack, although not for push/ pop type of activity. As we will see, stacks in MCP systems are a central element in the addressing environment, and it is for this purpose that a Segment Dictionary is loaded as a stack. Segment Dictionaries are also sharable -- if multiple tasks are initiated from the same codefile, the Segment Dictionary is loaded only once and the separate tasks are linked to this common copy. Thus, all of the object code and read-only constant pools for a program are automatically reentrant. Object code is addressed using a three-part index. The first part is the "segment number", which is the code segment descriptor's offset within the Segment Dictionary. The second part is the word offset within the segment. The third part is the instruction syllable (byte) offset within the word. These numbers are zero-relative and generally written in hex, so an address of 03C:0041:3 indicates segment #60, word offset 65, byte offset 3. The processor uses variable-length instructions and can branch to any syllable offset within a segment. By using one of several dynamic branch instructions with a Program Control Word (PCW, tag 7) as an argument, a program can branch across segments -- in fact PCWs allow programs to branch and call procedures across stacks. When a program branches to a different code segment (either directly or by means of a procedure call), the segment is made present if it is not already so, and the segment length and base memory address are loaded into registers within the processor designated for that purpose. Intra-segment branches use only the word and syllable offsets, and therefore do not need to continually reference the code segment descriptor. The Presence bit in a code segment descriptor indicates physical presence or absence of the segment in memory, just as for data segments. For an absent descriptor, the address field indicates the offset within the codefile where the segment starts. Code segments and constant pools are not loaded from the codefile until first reference. When a program is initiated, the Segment Dictionary is the only thing that is loaded from the codefile. Everything else is loaded as a result of Presence bit interrupts. Since codefiles and code segments are guaranteed to be read only, code segments (and constant pools loaded from the codefile) are never rolled out to an overlay file. The physical memory area is simply deallocated and the codefile offset restored in the absent descriptor (that offset is stored in one of the memory link words while the segment is present and the descriptor's address field points to the area in memory). 
The code or data is simply reloaded from the codefile the next time a task trips over the descriptor's Presence bit. Resizing Segments. Descriptors obviously support bounds checking, along with the dynamic relocation and overlay of real memory areas, but they have another significant advantage -- the dynamic resizing of data segments. Since the length of a segment is part of the descriptor, the basis for bounds checking is centralized. The physical memory area can be resized by a user program at run time, the length in the descriptor will be updated, and future indexing operations will check against the new length. The B6700 and all later systems support the programmatic resizing of data segments. Code segments are immutable, so resizing them is not a meaningful operation. Support for resizing varies by language, but in Algol, it is performed using the RESIZE intrinsic procedure.     RESIZE(A,N) resizes the array A to a new length of N. The unit of N     is determined by the Element Size field in the descriptor. The old     segment is deallocated and its contents are lost. The segment with     the new length will not be allocated until it is next referenced.     RESIZE(A,N,RETAIN) allocates a new segment of length N and copies up     to N units from the old segment to the new one. If the new length is     shorter, any remaining units from the old segment are lost; if the     new length is longer, the remainder of the segment is filled with     binary zeros. Once the data is copied to the new segment, the old     segment is deallocated.     RESIZE(A,N,PAGED) is similar to the RETAIN option, but creates the     new data segment with a paged data descriptor. RESIZE works on individual segments. In the case of multi-dimensional array-of-arrays structures, the row for each final dimension can be resized independently and to differing lengths. Actually, RESIZE can be applied to any of the dimensions of a multi- dimensional array. Given an Algol declaration of     ARRAY M [0:99, 0:49, 0:63, 0:4095]; RESIZE(M[4,7,*,*], 75, RETAIN) would resize the M[4,7] dope vector in the third dimension from its original length of 64 to a new length of 75. This would create 11 new, untouched descriptors of length 4096 at the end of that dope vector. These resizable array-of-arrays structures are often used, especially in Algol, to implement flexible, safe, dynamic storage allocation schemes for user programs. Addressing Data -- Indexing and the Stack. Note that in the discussion thus far, nothing has been said about user- level programs having access to memory addresses -- real or virtual. Data descriptors for the B6700 can hold real memory addresses, but the content of descriptors is controlled only by hardware instructions and the MCP, not user-level code. User-level code (and all but small parts of the MCP kernel, for that matter) does not manipulate addresses, it manipulates offsets into memory areas. Separate memory areas generally relate to separate objects for a programming language (e.g., arrays for Algol, Pascal, and Fortran, or 01 levels for COBOL). Since there are no addresses, there can be no invalid addresses. You can try to use an invalid offset, of course, but that offset must be applied against a descriptor, where it is ALWAYS checked against the length field. If the offset is less than zero or greater than or equal to the length, your task becomes the happy recipient of the "Invalid Index" bounds violation interrupt. 
Depending on the language you are using, you can trap this interrupt (actually, what you trap is the MCP's response to the interrupt), but you can't restart the operation which caused it -- your only options are to branch to a recovery routine or enter the catch portion of a try/catch block. If you do not trap the interrupt, your task is terminated by the MCP. Another thing that has not been mentioned yet is registers. There are certainly registers in the processor -- the B6700 had dozens of them -- but the instruction set accesses them implicitly. None of them are accessed directly by user-level code. (Actually, on the B6700 that wasn't quite true -- compilers would generate code to access some of the registers directly, but this was not common. Access to these registers has been tightened up considerably on more recent systems). Instead of loading addresses directly into registers and using those to read and store real or virtual memory locations, user-level programs on MCP systems access data in two ways: (a) directly in a stack or (b) outside of a stack by going through a descriptor that is in a stack. A stack in the MCP architecture is simultaneously three things:     (a) a memory area for push/pop expression evaluation,     (b) the stack frames (activation records) and return history for         procedure calls, and     (c) the basis for addressing global and local variables in a         program. The B5000 and all of its descendants were designed to support Algol. To understand how stack addressing works, you need to understand how Algol programs can be structured. If you don't know Algol, think Pascal -- they're very similar. An Algol program is constructed as a series of nested blocks. Each block can contain local variables (including procedure declarations), but also has addressability to the variables in all of its more global (i.e., containing) blocks. The body of a procedure can be considered a block, whether it has local declarations or not. The scope rules for identifiers in this scheme are the same as for a two-level language such as C, it's just that there can be more levels. In MCP systems, this nesting is referred to as the "lexicographical level" (lex level or LL). The global code ("outer block") of most user programs runs at LL=2. First-level procedures (such as you would have in a C program) run at LL=3. Any procedures declared within those first-level procedures (something you can't do in C) would run at LL=4, and so forth. The Segment Dictionary is loaded as a stack with an environment of LL=1. Therefore, multiple tasks initiated from the same codefile see their code segments and constant pools as globals at a higher level -- hence the Segment Dictionary and all of the segments based on it are reentrant. The MCP stack (another that is purely an addressing environment, not a data space for a task) is at LL=0. Note that stack addressing can cross stacks. Also note that on the B6700, all MCP globals were in the scope of the user programs. This allowed them to call MCP service procedures directly, as if they were global to the user program (which, in fact, they were). There is no "service call" or "branch communicate" or other major environment change to access O/S services -- user programs simply call what to them are global procedures. This accessibility to the MCP stack was recognized as a serious security issue fairly early on, and later systems blocked direct access to LL=0. 
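If nested procedures are unfamiliar, here is a rough Python analogue of the nesting and lex-level numbering just described (Python's nested functions stand in for Algol's nested procedures and blocks; the LL numbers in the comments follow the convention above and are purely illustrative -- no stacks or descriptors are involved here):

    # A rough analogue of Algol block nesting and lex levels (LL).

    def outer_block():            # LL=2: the program's outer block
        i = 0                     # a global of the program

        def first_level():        # LL=3: like a top-level procedure in C
            j = i + 1             # can see the LL=2 variable i

            def second_level():   # LL=4: something C cannot express
                return i + j      # can see both enclosing environments

            return second_level()

        return first_level()

    print(outer_block())          # -> 1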
User programs now still call MCP service routines as normal procedures, but access to the procedure entry points and other global objects is provided through indirect addresses in the Segment Dictionary. These indirect addresses are initially set up to cause a fault interrupt when first accessed, at which point the MCP verifies the access and replaces the intentionally bad word with a valid SIRW (tag=1, Stuffed Indirect Reference Word) to the MCP global. Subsequent references to the service routine merely require a one-level indirection to reach the entry point. The "stuffed" for an SIRW refers its ability to address a word in a stack outside the current scope chain, and in particular, across stacks. What gets stuffed in the word is sufficient environment information to allow out-of-scope, cross-stack addressing. A normal IRW (also tag=1) only addresses within the current scope chain. There is an instruction (STFF) that converts a normal IRW to an SIRW. Perhaps inevitably, it is commonly called the "get stuffed" operator. (Current systems no longer generate normal IRWs -- all reference words are now generated as SIRWs unconditionally, so STFF has become a no-op). The local variables for a block are allocated in the stack. In the case of procedures, this provides efficient recursion. Simple blocks (i.e., nested BEGIN/ENDs containing declarations) are implemented as parameterless procedure calls. Therefore, more-global variables are lower in the stack, or possibly in a separate stack (note that unlike some systems, MCP stacks grow from low addresses to high addresses -- a push increments the top-of-stack address, not decrements it). Addressing within the current scope chain is a very common thing to do, so MCP systems provide a series of base registers, called the "D" (for "display") registers. There is one D register for each lex level, and it contains the absolute (real) memory address to the base of a block's stack frame. The B6700 had 32 such D registers, but this proved (for once) to be more than necessary. Later systems cut back to 16 D registers (allowing user procedures nested 13 levels deep -- I doubt that I've ever coded anything that goes more than four levels deep). The B5900, which was the first microcoded processor and was based on bit- slice chips, tried to get by with four D registers (0, 1, 2, and current LL), but that didn't work too well, and that approach was abandoned in later designs. The simplest addressing mode for MCP systems is based on these D registers and uses a construct known as an "address couple". The address couple has two fields, LL (which selects a D register) and an offset from the address in that D register. This is written "(LL,offset)" -- thus (2,17) refers to the 17-th (zero relative) word in the global (LL=2) address space of a program. For an Algol program, this address space would be the outer block; for COBOL, it would be WORKING-STORAGE; for FORTRAN, it would be COMMON; for a C program it would be the environment for static declarations. With that introduction to stack addressing, here is the key concept: scalar variables are allocated in the stack; non-scalar variables (arrays, structures, records, whatever) are allocated outside the stack. A descriptor is allocated in the stack to access that non-scalar area. 
To illustrate this, here is a simple Algol program:     1:  BEGIN     2:  INTEGER I;          % I is allocated at (2,2)     3:  ARRAY A[0:99];      % descriptor for A at (2,3)     4:  I:= 1;     5:  DO     6:    A[I]:= A[I-1]+I     7:  UNTIL I:= I+1 > 100;     8:  END. The stack offset for I starts at 2 because there are two linkage words at the base of a stack frame for (a) the procedure return address and (b) a control word called the MSCW (Mark Stack Control Word) that allows the processor to reconstruct the D registers to the values for the calling environment upon procedure (or in this case, block) exit. Here is the code that Algol would generate for this snippet (note that this is slightly idealized and out of order from what the current Algol compiler would generate, but both the concept and effect are accurate). Line 1: (nothing) Line 2: ZERO            push a zero value onto the stack at (2,2) Line 3: LT48 000006400000 push a skeletal descriptor with length=100                           onto the stack at (2,3)         LT8 5           push a literal 5 onto the stack         STAG            set a tag=5 onto the skeletal descriptor; this                         leaves an untouched descriptor at (2,3)         PUSH            flush the top-of-stack (TOS) words into memory         LT16 2048       push literal 2048 (this is a marker word for the                         MCP BLOCKEXIT procedure called at the end)         BSET 47         set bit 47 in TOS word         LT8 6           push a literal 6         STAG            set tag=6 on the BLOCKEXIT marker word Line 4: ONE             push a literal 1 onto the stack         NAMC (2,2)      Name Call: push an IRW for I onto the stack         STOD            Store Destructive: store the 1 to the (2,2)                         address; pop both the IRW and literal 1 Line 5: (nothing) Line 6: VALC (2,2)      Value Call: push a copy of the value of I         NAMC (2,3)      push an IRW to the descriptor for A         INDX            Index: pop both parameters; push an indexed copy                         descriptor for A[I] (think: address of A[I])         VALC (2,2)      push another copy of I         ONE             push a literal 1         SUBT            Subtract: pop both parameters and push the                         value of I-1         NAMC (2,3)      push IRW to the descriptor for A         NXLV            index and load value: index A by I-1; pop both                         parameters and push the value of A[I-1]         VALC (2,2)      push a copy of the value of I         ADD             pop top two values; push the value of A[I-1]+1         STOD            store A[I-1]+I in A[I]; pop both addr & value Line 7: VALC (2,2)      push a copy of the value of I         ONE             push a literal 1         ADD             pop top two values and push the value of I+1         NTGR            integerize TOS word with rounding         NAMC (2,2)      push an IRW for I onto the stack         STON            Store Nondestructive: store I+1 back into I;                         pop the address but leave value I+1 on TOS         LT8 100         push a literal 100 onto the stack         GRTR            Compare Greater: [(I+1) > 100]; pop both values;                         push result of comparison: 1 if true, 0 if false         BRFL 4:4        branch false: if low-order bit of TOS word is 0,                         branch to Line 6 (word 4, syllable 4 in the                         current code segment); pop the TOS word Line 8: MKST          
  construct and push a MSCW (Mark Stack Control                         Word) in preparation for a procedure call         NAMC (1,4)      push an IRW to the PCW for the MCP's BLOCKEXIT                         procedure (actually, for an MCP intrinsic, it                         would be an IRW to an SIRW in the Segment                         Dictionary to the PCW in the MCP stack for                         BLOCKEXIT). This procedure is not passed any                         parameters, but if it were, they would be pushed                         into the stack at this point.         ENTR            Enter: call the BLOCKEXIT procedure         EXIT            Exit Procedure: cut back the stack (thus                         destroying this activation record), and exit the                         outer block of the program (this exits back into                         an MCP procedure which terminates the task and                         disposes of the stack's memory and related                         resources) When this program is initiated, the MCP reads some information from the codefile that tells it how to set up the data stack, including a recommended initial size. If the Segment Dictionary is not already present (due to another task executing the same codefile), a "code stack" is allocated for the Segment Dictionary and its image is loaded from the codefile. There is a base area of the data stack that the MCP uses for task management, which it also sets up. No program globals are loaded, however -- this will be done by stack-building code generated by the compiler for the outer block's data segment (as is shown for I and A in the example above). Instead, the MCP creates a dummy stack frame that makes it appear as if this task has called a procedure, but the return address from that call is set up as the entry point to the outer block's code segment. The MCP also constructs a TOSCW (Top of Stack Control Word) at the base of the stack, which tells the hardware how to find the top of stack and the base of the top stack frame. From that, the processor can reconstruct all of the stack linkage, D registers, return instruction address, and so forth. After building the initial stack image, the MCP simply links the task into the READYQ, the prioritized list of tasks waiting for a processor. Once the task rises to the top of this queue, a processor is assigned to it, at which point the processor "exits" into the entry point. The B6700 has a instruction, MVST (Move to Stack) that switches the processor from its current stack to another one. This instruction constructs a TOSCW for the current stack and uses the TOSCW for the new stack to reconstruct stack linkage and registers settings. Later systems did context switching in different ways, but it appears that on the current Libra systems, MVST is once again how it's done. Note that once the MCP sets up the initial stack image and releases the new task to the READYQ, all further saving and restoring of registers and other state information is handled automatically by the hardware. Since all registers have specific purposes (i.e., there are no general- purpose registers being used who knows when and for what), the hardware knows when the value of a register needs to be pushed into memory or recalled. This applies not only to context switches between tasks, but also to all procedure calls. 
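Purely as a conceptual sketch (the class and field names here are mine, and the real MVST is a single instruction operating on hardware state, not software), the stack-switch flow described above could be pictured like this in Python:

    # Toy model of MVST-style stack switching: processor state is parked in a
    # TOSCW-like record at the base of the outgoing stack and rebuilt from the
    # TOSCW of the incoming stack. All field names are invented.

    class Stack:
        def __init__(self, name):
            self.name = name
            self.toscw = {}          # stands in for the Top of Stack Control Word

    class Processor:
        def __init__(self, stack):
            self.stack = stack
            self.top_of_stack = 0
            self.d_registers = {}
            self.next_instruction = None

        def move_to_stack(self, new_stack):
            # "MVST": save enough state to resume the current stack later...
            self.stack.toscw = {
                "top_of_stack": self.top_of_stack,
                "d_registers": dict(self.d_registers),
                "next_instruction": self.next_instruction,
            }
            # ...then rebuild the processor's state from the new stack's TOSCW.
            saved = new_stack.toscw
            self.top_of_stack = saved.get("top_of_stack", 0)
            self.d_registers = dict(saved.get("d_registers", {}))
            self.next_instruction = saved.get("next_instruction")
            self.stack = new_stack

    cpu = Processor(Stack("task A"))
    cpu.move_to_stack(Stack("task B"))   # task A's state now lives in its TOSCW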
Hardware interrupts are implemented as a forced procedure call on the stack that currently has the interrupted processor, so the same state-saving mechanism is used for interrupts as well. Back on the Algol example above, the very first thing that happens when the processor exits to the entry point is a Presence bit interrupt as it detects the Presence bit is zero in the descriptor for the outer block's code segment. Execution continues once this code segment is made present. The stack-building code at the beginning of the outer block creates the local variables for the stack frame and pushes them onto the stack. In the case of integer I, this is simply a literal zero; in the case of array A, the code constructs an untouched data descriptor of length 100. The 100 words of memory for the array will not be allocated until the descriptor is first "touched" and its zero Presence bit detected. This will happen the first time the NXLV (Index and Load Value) instruction is executed in Line 6. Note that the INDX (Index) instruction executed earlier does not cause a Presence bit interrupt, since it only generates an indexed copy descriptor and does not attempt to access an array element. The INDX instruction effectively acts as a "load address" instruction. Bounds checking takes place on both INDX and NXLV, however. The program then proceeds to initialize the value of I (some compilers would fold this assignment into the stack-building code, but Algol does not), and execute the DO loop that iterates through the elements of array A. To someone used to register-based architectures, this code probably looks like it generates a lot of memory accesses -- all those VALC and index operators, not to mention the stack pushes and pops. On the B6700 that certainly was true, as there was essentially no caching, except for two TOS registers. More recent implementations use caching extensively, however, and most of the apparent memory references would stay inside the processor. Another thing that may be apparent is that Algol does very little optimization. It is a one-pass compiler and, for better or worse, emits instructions in pretty much a what-you-code-is-what-you-get manner. This program contains an intentional bug, which becomes apparent on the last iteration of the DO loop. The value of I is 100, which is greater than the upper bound of array A. When I compiled this and ran it on an MCP system, I got the following message from the MCP:           2228 MSRHI3:INVALID INDEX @ (00100600)           2228 HISTORY: 003:0001:3 (00100600).     F-DS  2228 (PAUL)OBJECT/SIMPLE/ALGOL ON OPS. This indicates a bounds violation on line 100600 (a sequence number that is part of the source file line) at code segment address 003:0001:3, which is the INDX instruction for Line 6 in the example above. This bounds checking is not due to any debug or diagnostic mode I enabled for the compiler or the object code -- it's implicit in the segmented addressing mechanism for the architecture and cannot be turned off. The value 2228 is the MCP-assigned task number. F-DS indicates the program was terminated (discontinued, or "DS-ed" in MCP parlance) due to a fault interrupt. Although it is not apparent from this example, the HISTORY line is a trace of return addresses -- it shows the history of procedure calls that got to the point where the fault occurred. 
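To make that fault concrete, here is a toy Python model of the behavior described above -- allocate on first touch, bounds check on every index -- run with the same off-by-one error. The class and its names are my own invention, not anything in the MCP:

    class InvalidIndex(Exception):
        """Stands in for the Invalid Index bounds-violation interrupt."""

    class ToyDescriptor:
        """Toy model of an untouched data descriptor of a given length."""
        def __init__(self, length):
            self.length = length
            self.area = None                     # absent: nothing allocated yet

        def index(self, i):
            # Bounds check happens on every indexing operation.
            if i < 0 or i >= self.length:
                raise InvalidIndex("index %d not in [0, %d)" % (i, self.length))
            if self.area is None:                # "Presence bit interrupt":
                self.area = [0] * self.length    # allocate on first touch
            return i

        def load(self, i):
            k = self.index(i)
            return self.area[k]

        def store(self, i, v):
            k = self.index(i)
            self.area[k] = v

    a = ToyDescriptor(100)                       # like ARRAY A[0:99]
    try:
        i = 1
        while True:
            a.store(i, a.load(i - 1) + i)        # A[I] := A[I-1] + I
            i += 1
            if i > 100:                          # same off-by-one as the Algol above
                break
    except InvalidIndex as fault:
        print("INVALID INDEX:", fault)           # the MCP would DS the task here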
Assuming this bug did not exist (i.e., the comparison on line 7 was "> 99" instead of ">100"), the loop would have terminated when the value of I reached 100 and control would have fallen into the END statement for the block. The NTGR instruction for line 7 is due to the numeric format used with all MCP systems since the B5000 -- integers are implemented as a subset of floating-point values, and integer overflow generates a floating-point result. NTGR normalizes the TOS value as an integer and generates a fault interrupt if it exceeds the limits of integer representation (+/- 2**39-1). The call to BLOCKEXIT at the end of the code for the block is generated by the compiler to dispose of any complex objects (arrays, files, etc.) that were declared in this stack frame. The compiler generates a tag-6 marker word at the end of stack-building code that serves as a parameter to BLOCKEXIT. This marker word contains a bitmask indicating which types of resources BLOCKEXIT should look for. Failure to call BLOCKEXIT when required would result in memory leaks, and the presence of this call is another example of the trust the system places in its compilers. More recent E-mode levels include a "blockexit bit" in one of the stack linkage words that can be used by the MCP to enforce proper disposal of stack frame resources before the frame can be exited. A Series Enhancements to the Descriptor Mechanism. The Address field of data and code segment descriptors is 20 bits wide, which allows for a total of 1048576 words (6 MB). The B6700 has the same maximum physical memory size, so the field width was adequate. In the late 1960s and early '70s 1 MW seemed near infinite, but as systems became larger through the '70s (and especially as the use on-line applications and data bases grew during this period), this upper limit on physical memory of 1 MW became grossly inadequate. The B6800/7800/6900/7900 implemented a somewhat crude paging technique (the infamous "Global(tm)Memory") that helped somewhat on multi-processor systems, but the physical address space for a given processor at any one time remained at 1 MW. The A Series models introduced starting in the early 1980s addressed this problem by implementing a concept known as ASD (Actual Segment Descriptor). Heretofore, the "mom" descriptor in a program's data stack was the owner of a memory area and pointed directly to it when the segment was present in memory. There was no room to expand the address field in the 48 bit descriptor word, so the role of owner was moved from the mom to a central ASD table in memory. Instead of a real memory address, the Address field in descriptors now holds an index into this ASD table. On the latest processors, each entry in this table contains eight 48-bit words, of which only the first three are used by the hardware. The actual location, length, and status of each allocated memory area is now stored in these table entries, hence the ASD name. It functions similarly to the page table in other virtual memory architectures, except that the "pages" are variable-length segments. Most processors use caching to reduce the incidence of real memory accesses to this table, effectively implementing a form of TLB. The concept of "mom" and "copy" descriptors no longer exists in the architecture, at least not in anything like the way it was in the B6700. All descriptors in data segments (except untouched descriptors) point to an ASD table entry. 
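Continuing the toy-model style from above (the names and structure are mine; the real ASD entry layout is in the E-mode specifications cited at the end of this post), the extra level of indirection looks conceptually like this:

    # Toy picture of ASD-style addressing: a descriptor no longer carries a
    # real memory address, only an index into a central table whose entries
    # hold the segment's location, length, and presence. Invented field names.

    asd_table = {}     # index -> {"present": bool, "base": int, "length": int}

    def make_segment(asd_index, base, length):
        asd_table[asd_index] = {"present": True, "base": base, "length": length}

    def resolve(descriptor, offset):
        """descriptor is just {'asd': index}; returns a physical word address."""
        entry = asd_table[descriptor["asd"]]
        if not entry["present"]:
            raise RuntimeError("Presence bit interrupt: MCP would (re)load the segment")
        if not 0 <= offset < entry["length"]:
            raise RuntimeError("Invalid Index")
        return entry["base"] + offset

    make_segment(7, base=0x40000, length=100)
    original = {"asd": 7}
    a_copy = {"asd": 7}      # copies share the table index, so relocating the
                             # segment means updating asd_table[7], not the copies
    print(hex(resolve(a_copy, 10)))   # -> 0x4000a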
In fact, since E-mode level Gamma introduced the use of four-bit tags (initially for the A11 and A16 models in the early 1990s), tag-5 words are used only to represent untouched descriptors. Once the area is allocated, descriptors accessible to user-level code (now called "virtual segment descriptors" or VSDs) use tag values of C, D, E, and F to identify various combinations of indexed/unindexed and paged/unpaged descriptors. The ASD index field in these VSD words is currently 23 bits in length, allowing for a maximum of just over 8 million segments in the system. The maximum length of a segment is still limited to 2**20-1 words, although it appears this could be extended. The physical address field in the ASD is currently 36 bits, allowing a maximum physical memory space of 64 GW or 384 GB. The large Libra systems I have encountered recently have 4-8 GW of physical memory, which appears to be more than adequate. With so much physical memory available, most systems today run with large amounts of the memory space unallocated, and almost no overlay due to memory allocation pressure takes place. As a result, the Presence bit mechanism now largely serves as an allocate-on-first-reference capability. Should you fill up the physical memory, however, the automated overlay ("virtual memory") mechanism will still do its thing. The size of the ASD table is established at system initialization time, based on the total size of physical memory and a factor that is settable by the system administrator. Running out of ASD table entries is a no- no, and causes the system to halt. A serious issue with the B6700 design was management of copy descriptors. Every time the presence, location, or size of a segment changed, not only did the mom descriptor need to be fixed up, but all of the copies, too. The only way to find those copies was to search for them, so copy descriptors were only permitted in stacks. There were (still are) special instructions to search for these copies, and a considerable software investment was made to minimize the number of stacks that needed to be searched, but stack-search overhead could be fierce, especially on a system that was near or beyond the thrashing point. The ASD implementation considerably helped this situation. Copies no longer contain real addresses to the data area or to the mom -- instead they point to the ASD table entry for the segment. This table index does not change through the life of the segment, so copies no longer need to be found and fixed up when the location, size, or status of a segment changes. One of the nicest aspects of the ASD implementation was that it had essentially no impact on user applications. Since descriptors are managed by the MCP, the details of how the indexing instructions compute real addresses are opaque to user-level code. It was necessary to recompile some programs, although not specifically to support the ASD addressing changes -- the B6700-era compilers emitted code that would access fields of descriptors (e.g., to determine the length of an area in Algol, you could obtain a tagless copy of the descriptor and isolate bits [39:20]). Starting with E-mode level Gamma, the length field was not even in the VSDs accessible by user-level code anymore, so new instructions (e.g., GLEN to determine the length of a segment) were implemented to perform these functions, and the old methods (along with the Algol syntax that supported them) were deprecated. 
Some other model-specific instruction sequences (such as directly accessing processor registers) were eliminated at the same time, all of which improved the security of the system and reduced somewhat the reliance on trusted compilers. That reliance was not eliminated entirely, however. There was a lengthy transition period that allowed users to recompile their programs so that their codefiles would be compliant with newer processors.

Issues with Descriptors and the MCP Architecture in General. The first thing almost everyone comments on when first being exposed to the stack- and descriptor-based architecture of MCP systems is the memory access overhead of push/pop on the stack and of having to index through descriptors to reach data. The second thing that gets comments is the lack of user-accessible registers. There is no question that these characteristics of the architecture add overhead and (at least in the myopic view in which most seem to consider performance issues) degrade performance. This overhead is at least partially offset, however, by:

    * the lack of unnecessary state saving,
    * the efficiencies resulting from variable-length memory segments,
    * the efficiencies resulting from avoiding unnecessary memory
      allocation by delaying it until first reference,
    * the efficiencies resulting from code and data segments being
      closely related to language objects,
    * the efficiencies resulting from being able safely to access data
      and code across addressing environments and across tasks
      (marshalling data across process boundaries is a foreign concept
      in the MCP),
    * the efficiencies in context switching,
    * the efficiencies in interrupt handling, and
    * the efficiencies resulting from hardware and operating system
      environments that were designed specifically for each other.

In an I/O-intensive, transaction-server environment (which is what the MCP systems are designed for), this performance trade-off balances out better than you might think. Where the architecture loses at the micro level of instruction performance, it gains at the macro level of system performance. For a server, that's what you want. Need to do high-performance numerical computation? MCP systems are probably not the ones you should consider using. Need to do high-performance transaction processing and safely balance the needs of hundreds or thousands of tasks competing for processors, memory, and I/O paths? MCP systems do quite well in that solution space.

There is another aspect to performance that I do not think is considered often enough -- reliability performance. The lack of low-level bounds checking in other systems terrifies me, and it should terrify you. The idea that bounds violations can be prevented simply by programmers "being careful" is both silly and irresponsible. Giving addresses to programmers is like giving whiskey and car keys to teenagers -- sooner or later something stupid is going to happen, and it's probably going to be sooner. I say this as a programmer myself. The current problems we have with malware are largely due to unchecked memory accesses and allowing data to be treated as code. These are problems that MCP systems simply do not have.

As I have said before in this space, there is a cost to using descriptors and hardware-enforced bounds protection. There is also a cost to not using descriptors and hardware-enforced bounds protection. In my opinion, the MCP architecture has two major problems -- and neither of them relates to performance.

The first is the reliance on trusted compilers. As discussed briefly above, tweaks to the architecture over the past 30 years have improved this situation, but the security of the system is still too dependent on the quality of code that the compilers generate. Barring a social engineering attack, it is quite difficult to get untrusted code into the system in a form that can be executed. Once an untrustworthy compiler is authorized, though, havoc is possible. I am not aware that penetration of untrusted code has ever been a problem since the early B6700 days, when some major holes were exposed in the enforcement of codefile integrity, but this is too ripe an area for potential abuse, and one that requires enforcement in too many parts of the system, to be considered an adequate aspect of the architecture.

The second problem is that the segmentation, addressing, and memory management mechanisms of the system are built for hierarchical, block-structured languages such as Algol and Pascal. Memory objects effectively "belong" to the block that declared them, and are automatically deallocated when that block exits. This approach also works fine for COBOL, and is adequate for FORTRAN (at least through FORTRAN-77). It works poorly, though, for languages which rely on heap-based memory management, where an object can live on after its originating environment no longer exists.

The MCP compilers and operating system go to great lengths to prevent "up-level pointers" -- the existence of references to a locally-allocated segment that can be stored in a more global scope. The system does not have an efficient way to locate and invalidate such up-level references when the locally-allocated segment is deallocated, so their use is prohibited.

Current MCP systems support a C compiler, but the performance of its code is not all that good, partly because C is basically a high-level assembler for register-based, flat-address architectures (to which the MCP architecture is a nearly complete antithesis), and partly because the C heap is currently implemented as a series of array rows, with C pointers implemented using integer offsets into the runtime environment's heap space. It works, but the result is not very efficient.

The problem is even worse for object-oriented languages such as Java. A descriptor-based capability architecture should be a good fit with an object-oriented language (a descriptor, after all, is a primitive form of object), but the current MCP architecture is too closely tied to the Algol memory model to work well natively with Java. I consider this to be both a real shame, and a real threat to the future viability of the MCP architecture.

Those issues aside, I think the MCP architecture is still the most interesting, if least appreciated, one on the market today. There are other interesting aspects of the architecture that I have either passed over quickly (such as stack linkage and cross-stack addressing) or ignored altogether (such as the process model and the concepts of task families and synchronous and asynchronous dependent tasks). Then there are the MCP stack, the stack vector, procedure calls, accidental entries ("thunks"), parameter passing, string and data movement instructions, server libraries, and connection libraries. These are all also well integrated with the segmentation, stack, and addressing issues discussed above.
In fact, one of the endlessly fascinating things about this architecture to me is that, for all of its complexity, everything fits into a nicely integrated whole. It's quite elegant, really.

For those willing to RTFM, the Unisys support web site allows free access to essentially all of the documentation for current systems. You can access this through the front door by going to http://support.unisys.com/ and clicking the "Access documentation" link at the very bottom of the page. Click through the agreement and you will be presented with a page that can search the documentation. You can also access documents directly if you know their URL. Here are some useful ones:

ClearPath Libra 680/690 (latest version of E-mode level Epsilon spec):
http://public.support.unisys.com/c71/docs/Libra680-1.1/68787530-004.pdf

ClearPath NX6820/6830 (E-mode level Delta spec):
http://public.support.unisys.com/aseries/docs/Hardware/70205547-001.pdf

Current Algol language reference:
http://public.support.unisys.com/aseries/docs/ClearPath-MCP-11.1/PDF/86000098-507.pdf

Bitsavers has a good collection of documents for the older MCP systems under:

    http://www.bitsavers.org/pdf/burroughs/

In particular, you might want to look at the following documents under that URL:

Narrative Description of the B5500 MCP:
B5000_5500_5700/1023579_Narrative_Description_Of_B5500_MCP_Oct66.pdf

B5500 Reference Manual (architecture reference):
B5000_5500_5700/1021326_B5500_RefMan_May67.pdf

B5500 Extended Algol:
B5000_5500_5700/1028024_B5500_ExtendedAlgol_Jul67.pdf

Elliot Organick's 1973 book on the MCP architecture:
B5000_5500_5700/Organick_B5700_B6700_1973.pdf

B6700 Reference Manual (architecture reference):
B6500_6700/1058633_B6700_RefMan_May72.pdf

A good paper by Hauck and Dent on the B6500/6700 stack mechanism:
B6500_6700/1035441_B6500_B7500_Stack_Mechanism_1968.pdf

If on first reading you don't understand this architecture, you're running about average. I will happily try to reply to questions and comments.