CACM Manuscript Documentation Unit TENEX, A Paged Time Sharing System for the PDP-10* Daniel G. Bobrow Jerry D. Burchfiel Daniel L. Murphy Raymond S. Tomlinson Bolt Beranek and Newman Inc. Computer Science Division Cambridge, Massachusetts 02138 Abstract: TENEX is a new time sharing system implemented on a DEC PDP-10 augmented by special paging hardware developed at BBN. This report specifies a set of goals which we feel are important for any time sharing system. It then describes how the TENEX design and implementation achieves these goals. These include specifications for a powerful multiprocess large memory virtual machine, intimate terminal interaction, comprehensive uniform file and I/O capabilities, and clean flexible system structure. Although our implementation required some compromise to achieve a system operational within six months of hardware checkout, TENEX has proven to be a good interactive system with flexible multi-process facilities and reliable operation. Key Words: TENEX, Paging, Virtual Machines, Time sharing system, Scheduling algorithm, Process structure, PDP-10 CR Categories: 2.44 4.32 4.39 4.42 - - - - - - - - - - - - - - *The work reported here was supported in part by the Advanced Research Projects Agency of the DOD, and in part by BBN. In addition to the authors, significant contributions to the design and implementation of TENEX were made by T. R. Strollo, who led the technical staff, and J. R. Barnaby who implemented the EXEC subsystem. Others who contributed include T. Myer, E. Fiala, D. Wallace and J. Elkind. Page 2 1.0 INTRODUCTION TENEX is a new time sharing operating system implemented on the DEC PDP-10. It was developed because no existing system of the appropriate size and cost could meet the requirements of the research projects at BBN. During the development phase of TENEX we formulated a set of goals which we hoped would produce a workable and well integrated system, as well as realize our specific requirements. Our background included knowledge and use of a number of other systems including the DEC PDP-1 systems designed at BBN(1), the Berkeley system for the SDS 940(10), MIT CTSS(3), the DEC 10/50 system(5), and MULTICS(4). We used good design ideas from each of these systems, and tried to avoid what we felt were problems of operation and implementation which we saw in these systems. The scale of the system and available resources imposed certain constraints on the implementation, including that 1) minimal change be made to the PDP-10 processor, and none to the basic address computation, and 2) the system had to be in service for users within six months of the operation of the hardware and at that time dominate our previous service, the DEC 10/50 and the SDS 940 systems. Our design goals, presented below, are generally understood. Some, however, are often overlooked and usually not emphasized. We considered all of these important in the design of TENEX. Our design goals fall into three broad categories with several specific objectives under each. I. State of the Art Virtual Machine. a. Paged virtual address space equal to or greater than the addressing capability of the processor with full provision for protection and sharing. b. Multiple process capability in virtual machine with appropriate communication facilities. c. File system integrated into virtual address space, built on multi-level symbolic directory structure with protection, and providing consistent access to all external I/O devices and data streams. d. Extended instruction repertoire making available many common operations as single instructions. II. Good Human Engineering Throughout System a. An executive command language interpreter which provides direct access to a large variety of small, commonly used system functions, and access to and control over all other subsystems and user programs. Command language forms should be extremely versatile, adapting to the skill and experience of the user. b. Terminal interface design should facilitate intimate interaction between program and user, provide extensive interrupt capability, and full ASCII character set. c. Virtual machine functions should provide all necessary options, with reasonable default values simplifying common cases, and require no system-created objects to be placed in the user address space. d. The system should encourage and facilitate cooperation among Page 3 users as well as provide protection against undesired interaction. III. The system must be Implementable, Maintainable, and Modifiable. a. Software must be modular with well defined interfaces and with provision for adding or changing modules clearly considered. b. Software must be debuggable and reliable, allowing use of available debugging aids and including internal redundancy checks. c. System should run efficiently, allow dynamic manual adjustment of service if desired, and allow extensive reconfiguration without reassembly. d. System should contain instrumentation to clearly indicate performance. Page 4 2.0 HARDWARE DEVELOPMENT FOR TENEX Hardware development and modification for TENEX was limited to that necessary to achieve the goals specified above. This effort included an address mapping (paging) device implemented with then current DEC modules, and some changes to the PDP-10 processor. A hard limit on the latter resulted from the lack of physical room available in the processor. Both processor changes and paging facilities had to be designed to allow the standard DEC time sharing software and diagnostics to run. 2.1 The BBN Pager The BBN pager is an interface between the PDP-10 processor and the memory bus. It provides individual mapping (relocation) of each page (512 words) of both user and monitor address spaces using separate maps for each. The pager uses "associative registers" and core memory tables to store the mapping information. On each memory request from the processor, the 9 high-order bits of the address and the request-type level (read, write, execute) are compared in parallel with the contents of each associative register. If a match is found, the register containing the match also contains eleven high-order address bits to reference up to one million words of physical core. If no match is found, reference is made to a 512 word "page table" in physical core memory. The word selected in this page table is determined by a dispatch based on the original 9 high-order address bits. In the simple case of a private page which is in core, the 11 high-order address bits and protection bits are found in this word and are automatically loaded into an associative register by the pager. There are three other cases: A. The page is not in core, is protected from the requested type of access, or is non-existent; in this case a page fault (trap) will occur. B. The page is shared; in this case the map contains a "shared" pointer to a system table which contains the location information for the page. C. The page belongs to another process; in this case, the entry contains an "indirect" pointer to an entry in another page table from which the location information is obtained. The goal of program (code and data) sharing was given extensive consideration in the design of the BBN Pager. The indirect and shared pointer mechanism allows pages to be actively shared (be represented in more than one address space) but still have the current address (core or secondary storage) stored in only one place. This allows memory control tables and data structures to be kept simple, and the memory management software to move pages without extensive computational overhead. The pager permits individual pages to be shared for write as well as read references, so two or more processes may communicate by sharing a common page into which any or all may write. Rather than enforce a discipline of pure procedures, with private data in another segment, a unique "copy-on-write" facility Page 5 allows users to share large portions of an address space containing procedures and data, obtaining private copies of only those pages which are changed. This is implemented through an independent per-page status bit available to users which will produces a trap on a write reference, and a system procedure by which a private copy is then created. For example, this permits shared programs to be prepared with pre-constructed data areas which will be kept shared if not modified, and put in private storage if changed. One final unique feature is that the pager maintains a record of the activity of the pages in core memory in a "core status" table. The pager notes when a page has been referenced, which processes have used that page, and whether the page has been written into. This information is used by the memory management software to be described. 2.2 Processor Modifications Except for the pager trapping facilities, all of the TENEX virtual machine facilities (monitor calls) are reached via a new system call instruction, JSYS, added to the PDP-10 processor. JSYS provides an independent transfer mechanism into the monitor which does not conflict with "UUO" system calls used by DEC software. It accomplishes a transfer from the user program to the specified monitor routine in one instruction time via a block of cells called the JSYS transfer vector. The JSYS address provides the index to the proper transfer vector entry. The state of the processor, including the return address is stored in a location specified by the transfer vector, usually in a separate data area, so that it is suitable for reentrant code. The JSYS transfer vector occupies exactly one page in the monitor space and could be mapped independently for each process, but this is not done in the current system. There exists a context, either user or monitor, for each instruction execution. The JSYS system call may be executed in either context, with the "callee" operating in monitor context. One hardware modification to the PDP-10 adds a bit to the state word of the processor to record the context of the "caller". Two ways of accessing the caller memory context were added to the PDP-10. One is an execute instruction which allows current or previous context to be specified for each memory reference of the object instruction. The second access instruction group moves data between the AC's (general registers) of the current context, and memory of the previous context. AC's from the previous context are accessed using a pager 'AC base register' which specifies the location of the stored AC's. Page 6 3.0 THE TENEX VIRTUAL MACHINE A user process running under TENEX operates on a virtual machine similar to a PDP-10 arithmetic processor with 256K of virtual memory. The paging hardware traps processor references to any data not in core, and a core manager performs the necessary I/O to make the referenced page available. Such traps are invisible to the user process. The virtual processor does not make available to the user the direct I/O instructions of the PDP-10. But through instructions which call monitor routines, the virtual machine provides facilities that are considerably more powerful and sophisticated than typical hardware configurations used directly. 3.1 Virtual Memory Structure The TENEX virtual memory may be viewed as a linear address space of 256K words, and programs may use it in this fashion. However, the existence of the paging hardware means that the monitor must deal with memory in pages of 512 words, and some of the power which the mapping hardware provides is accessible to a user program. The contents of the virtual memory at any time are specified by the virtual memory map of 512 slots, which the user may read or write. The contents of each slot specify the page in that position in the virtual address space, and the type of access allowable (read and/or write and/or execute) for that page. In the simplest case, a map slot may contain (a pointer to) a private page, i.e. a page shared with no other processes in the system. A private page is automatically created whenever a process makes a reference to a page for which the map slot is empty. A slot may also contain an indirect pointer to a page in this or some other process. A memory reference to a location in such a page will be executed just as though the instruction had directly addressed the page pointed to. Any change made to the page by either process will be seen by both processes. If the owner of the page changes the contents of his memory map, then the process with the indirect pointer will see the change. A virtual memory slot may also contain a pointer to a page from a file in the file system, as discussed later. 3.2 Job Structure A job is a set of one or more hierarchically related processes, and it has the following attributes. 1. The name of user who initiated the job 2. An account number to charge costs associated with use of system resources. 3. Some open files. 4. A hierarchy of running and/or suspended processes. A job may also have one or more terminal or other devices assigned and attached. Page 7 3.2.1 Process Hierarchy TENEX permits each job to have multiple simultaneously runnable processes. The relationship among them is defined by a structure which looks like an inverted tree defined by the capability for direct control and killing. A process always has exactly one immediately superior process and may have one or more inferior processes. Two processes are said to be parallel if they have the same immediate superior. In TENEX, a process may create processes inferior, but not parallel or superior in the structure. A process can communicate with other members of the structure by (a) sharing memory (b) direct control (superior to inferior only), or (c) pseudo (software simulated) interrupts as described in 3.3. Although not completely general, a tree structure process heirarchy implicitly provides the protection and reference facilities that are wanted in most applications. These include referencing inferior processes as a class for freezing, killing, and resuming, fielding of interrupts and special conditions by a superior process, and protection of the superior process from inferiors. Currently in TENEX, multiple processes are used: 1. To enable the EXEC to run user programs, handling faults and servicing user requested interrupts. 2. To allow programs to block for an arbitrary set of events; one process waits for each event and signals the main process when it occurrs. 3. To implement an invisible debugging program, completely protected from malfunction of the program under test. 3.3 Pseudo-Interrupt TENEX provides a facility for a process to receive asynchronous signals from other processes, from terminals, or as the result of its own execution. The various processes in a job may explicitly direct interrupts to each other for purposes of communication. A process may enable an interrupt which will occur whenever the user hits a particular key on the controlling terminal. Finally, a process may use the pseudo-interrupt system to detect any of a set of unusual conditions, including illegal references to memory, processor overflow conditions, end-of-file and data errors. 3.4 Other Monitor Functions Other functions which form a part of the virtual machine include: a) Functions which provide information to the program about the state of the system or job. (Time of day, runtime used, name of user, etc.) b) Functions which save and restore the computational environment of a process, to allow restarting of a suspended program. c) Functions which provide frequently needed forms of I/O Page 8 conversions such as fixed or floating point number input and output, and date and time to string conversions. 3.5 Backward Compatibility (DEC 10/50 monitors) Since TENEX was being implemented on a machine for which a large useful program library existed, mostly for use under the DEC 10/50 time sharing monitor, we felt it was highly desirable to be able to run such programs under the new monitor system. We felt it should be possible to run binary images of old programs, i.e. without reassembling. Toward this end, all of the TENEX monitor calls were implemented with the JSYS instruction, reserving all old monitor calls for their previous use. Secondly, routines were designed which implemented all of the existing 10/50 monitor calls in terms of the available TENEX monitor calls. This set of routines implements all of the functions available in the 10/50 monitor except those specifically intended for the maintenance of the system. Assembled together as a compatability package, they occupy slightly less than 2.5K of core. The package is kept as a core image file and is never seen by programs which use only TENEX monitor calls. However, the functions are automatically made available to 10/50 type programs by the monitor. When a program makes its first 10/50 type monitor call, the TENEX monitor maps the compatability package into a remote portion of the process address space, an area not usually available on a 10/50 system. Subsequent 10/50 type monitor calls cause a transfer to the compatability package which then interprets the call. The compatibility routines are placed in the user space for several reasons: a) regular use can be made of the pseudo interrupt system, b) the compability package (which requires constant maintenance) can be maintained as a separate module, totally independent from the monitor, and c) the monitor is protected from malfunction by the compatiblity routines. Page 9 4.0 USER INTERACTION WITH TENEX 4.1 Terminal Interaction Capabilities The terminal service module of TENEX was designed to provide any type of interactive behavior a program might find useful. Many programs, especially the command language interpreter described below, benefit by having many short interactions with the user, often one or a few characters. Full duplex terminals are preferred for use with TENEX for these reasons and for the reason that the user can in fact anticipate the machine's responses and begin typing input before output is completed. Algorithms for echoing typein ensure that the typescript is an accurate record of the dialog. Half-duplex terminals may be used but at some cost in convenience. 4.2 Executive Command Language Users at terminals communicate and work with TENEX primarily through a command language interpreter called the TENEX Executive, or EXEC. The EXEC is an interactive, well human engineered program which can accept commands from a user's teletype or from a file. It is implemented as a reentrant, shared program which runs in user mode, usually as the top level process in the structure. The EXEC provides the user with a multitude of facilities which are activated by simple, easy-to-learn commands. These facilities provide access to the system (e.g. LOGIN); utility operations on files and file directories; initiation of private programs and subsystems; limited debugging aids; initiation of batch (detached operations); printout of user information and system statistics; and system maintenance. The EXEC was designed with two primary objectives--ease of learning and ease of use. To ease the learning process, all commands are English words which are descriptive of the facility being activated. (e.g. COPY to copy information from one file to another, STATISTICS to obtain a listing of current system statistics). Each command begins with a keyword. Depending on the command, the initial keyword may be followed by arguments such as file names, numbers, and additional keywords, and/or "noise words" to make the command more readable. The noise words are enclosed in parentheses to distinguish them from the arguments. In order to help novice users, two special assistance features were incorporated. First, when the EXEC requires input from the user during a command interaction, (for instance, to collect arguments of that command) a cue is typed to indicate to the user what is expected. For example, an interaction which renames a file might be: @REN$AME (EXISTING FILE) ALPHA$.MAC (TO BE) BETA The user's input has been underlined. The $ indicates a typed ESC (ASCII escape, code 33(8)) which invokes the EXEC's verbose cueing responses in parentheses. In this example the user typed only three Page 10 letters REN followed by ESC which invoked command completion by the EXEC, a feature which makes the language particularly easy to use. An ESC after any initial substring of a command or argument (such as a file name) invokes completion. If the substring is insufficient for unique identification of the intended input, the EXEC rings the teletype's bell and awaits additional characters. If the initial substring cannot be recognized the EXEC types '?' to ask the user to retype that input. If the novice user still doesn't understand what is expected in his response, he may invoke the second special assistence feature by typing the character '?'. This causes the EXEC to type out a list of all options available to the user at that point, then request a response. To summarize, three general styles of input may be used, distinguished by syntactic structure, including special input terminators. Hence the styles do not require different input modes and thus may be intermixed freely within a session or even with a statement, adapting to the state of knowledge and verbosity of the user. The input styles allow: 1. Complete Input. A complete command may be typed in. with all keywords and noise words given in their entirety, and without use of any non-printing characters. 2. Abbreviations. The user may shorten a command in two ways: he can omit noise words completely, and he can shorten keywords. Any keyword may be abbreviated with any initial substring (terminated with space) long enough to distinguish it from the other keywords acceptable in that context. 3. Completion. The user types the same characters as in abbreviated input, except he terminates each field (keyword or argument) with the ESC key. This produces a print-out of the complete command--each ESC causes the rest of the field (if an abbreviated keyword or file name) and any following noise words (with enclosing parentheses) to be printed. The EXEC also provides editing characters to permit the user to correct typing errors in his input. These editing characters permit the user to delete the last character of his typed input, the last word, or all of it. He can also ask for his edited input to be retyped for clarity. 4.3 Interrupt and Escape Characters ASCII Control-C is the EXEC's attention character. When typed by the user, it causes any running program to be stopped and control to be given to the EXEC via the pseudo interrupt system. The user may then continue his program or take any other action. Another terminal interrupt character, control-T is serviced by the EXEC. It interrupts a user's EXEC process to type out total CPU and console time used, and status of the process being run under the EXEC. Page 11 5.0 THE TENEX FILE SYSTEM The TENEX file system provides a general mechanism for obtaining information from and sending data to external devices attached to the TENEX system (12). Write only and read only devices are included in the file system so that all TENEX I/O may be handled uniformly. The first major function of the TENEX file system is to provide symbolic file name management. This includes two separate but related activities. The first involves translation of a symbolic name into an internal "file descriptor block" pointer associated with that name, second, it involves checking information concerned with 1) the file status, e.g. whether it exists, access rights, etc.; and 2) the process requesting access to the file. This second activity, known as File Access Protection, determines if this process should be allowed to know about the existence of this file, and if so, what access it is allowed. A symbolic name for TENEX files consists of up to five fields and thus conceptually represents a tree of maximum depth five. Not all nodes of this tree go down to maximum depth. This scheme was chosen rather than a full tree to simplify the problem of compatibility with existing DEC PDP-10 software and name lookup and recognition. We are currently considering the feasibility of implementing a full tree directory structure similar to MULTICS (11). At each level there would be set of information which is related to access rights, and media dependence of the data access for this node. Each node would represent a collection of related information, with the terminal nodes being files. The fundamental unit of storage in a TENEX file is a byte which may be from 1 to 36 bits in length. A stream of bytes constitutes a file which is the basic named element in the file system. Programs may reference files byte by byte in a sequential manner or, if the device permits, at random. Files may also be referenced by byte strings. No structure other than bytes and files is imposed on the user, and byte and string input and output are the basic operations. Of course, additional structure and other operations may be implemented by the user programs. 5.1 File Names A TENEX file is named by a file descriptor composed of 5 fields some of which are omitted for certain devices. The five fields are device name, directory name, file name, extension, and version number. The file name field is intended to designate a class of files which are related in some way. This convention is not enforced, but most users of TENEX follow the convention since it facilitates management of a users files. The extension field is intended to designate variously processed forms of the same information. A file's extension is frequently specified by a program. For example, PROG.MAC, PROG.REL, and PROG.SAV would be used to indicate the MACRO assembly code source, relocatable file, and binary image of a single program. The version number of a file enumerates successive versions of a file. Page 12 Normally each time a file is written a new version is automatically created by making its version number be one greater than the highest existing version. This protects a user from loss if he accidentally writes on the wrong file. Excess versions may be deleted by the user or automatically by the system when they have been put on a backup storage medium. Any of the fields of a file description may be abbreviated except for device and version. The appearance of an ESC in the file descriptor causes the portion of the field before the ESC to be looked up, and the system will supply the omitted characters and/or fields. Abbreviation without this output is not provided in order to insure that the typescript reflects exactly what was done. The system provides default values for each field except the file name. A default value is used for a field if the user omits any input for that field, e.g. the device and directory. This simplifies references to files in most common cases. 5.2 File Access Protection Because TENEX must service a diverse user community, it is essential that access to files be protected in a general way. Generally, access to a file depends on two things: the kind of access desired, and the relation of the program making the access to the owner of the file. Presently, a simple protection scheme is implemented in which the only possible relationships a program may bear to the file's owner are: 1. The directory attached to the job under which the program is running is the same as the owning directory. 2. The directory attached to the job under which the program is running is in the same group as the owning directory. 3. Neither 1 or 2. Five kinds of access are distinguished for a file: directory listing, read, write, execute and append. The above three relationships and five protection types are are related by 18 bits (a 3 by 6 binary matrix) in which a one indicates that a particular access is permitted for a particular relationship. If directory listing access is not permitted, the process requesting access is given an error return which is indistinguishable from the error for nonexistent file. This is important if the information that a file exists should not be generally available, as is the case for secure systems. Other access restrictions cause errors only when an attempt is made to open a file, as described below. For purposes of determining group access, a 36 bit word is administratively associated with each directory and each user. If the bitwise "and" of the user group word of the accessor and the directory group word of the accessee is non-zero, the group access permission is used. Provision has been made for a more general file protection system in which more general access relationships may be expressed in a special Page 13 file protection language. For example, access may be allowed only to an explicitly named set of users. 5.3 File Operations Using a file in TENEX is basically a four step process. First a correspondence is established between a file name and a Job File Number (JFN) which is a small index into a job table for files. Next the file is opened, establishing the mode and access permission and setting up monitor tables to permit data of the file to be accessed. Third, data is transferred to or from the file, and finally the file is closed fixing up the directory information and releasing the space occupied in system tables for the file. For purposes of file sharing, all instances of opening a particular file reference the same data. Data written in a file will be immediately seen by readers of the file. To protect against confusion resulting from multiple uncooperating simultaneous writers and readers of a file, a file can be opened with what we call thawed or unthawed access. With thawed access, a file may have any number of thawed writers and/or thawed readers, but no provision is made to guarantee that information is in consistent (frozen) state. With unthawed access, a file may have any number of unthawed readers; or exactly one unthawed writer; this prevents any potentially conflicting operations. Simultaneous accessors of a file must be all thawed or all unthawed. Page 14 6.0 THE MONITOR 6.1 Scheduler The TENEX scheduler is designed to meet a set of potentially conflicting requirements. The first is to provide an equitable distribution of CPU service, which we define as at least 1/N of real time where there are N jobs on the system. Secondly, because TENEX is designed to be a good interactive system, the scheduler must identify and give prompt service to jobs making interactive requests. Thirdly, because use of the CPU is intimately tied to the allocation of core memory, it must make efficient use of core memory to maximize CPU usage. Finally, the scheduler should have provision for administratively controlling the allocation of resources so as to obtain other than equal distribution if desired. 6.1.1 Balance Set Scheduling For the scheduler, we want a coherent policy which obeys Denning's "working set principle" (6). A priority rating, as described below, is given to each runnable process in the system, and an estimate is made of the working set size of each process. The jobs with highest priority whose total working sets will fit in core are called the balance set and may be run concurrently. When any process in the set page faults one of the others in the balance set is given CPU service. Denning (7,8) has shown that such balance set scheduling minimizes thrashing and tends to maximize system efficiency. Periodic monitoring of the entire set of runnable processes for changes in priorities and in working set sizes allows adjustment of balance set membership. 6.1.2 Setting Process Priorities To implement the basic scheduling function, a scheduling algorithm was chosen which groups processes together on a number of separate queues each with associated runtime quantum, similar to algorithms described by Corbato (3) and BBN (1). Lower queues in general have lower priorities but longer runtimes. A common problem with many schedulers of this type is that processes are placed on the highest priority queue after any interaction. Under conditions of heavy load or with poorly behaved interactive processes, it may happen that the interactive processes succeed in using all of the available time and so lock out the compute bound processes which have fallen to the lower queues. In TENEX, priority is based on a long term average ratio of CPU use to real time, and a process's priority after an interaction is determined by its priority before the interaction and the length of the interaction. Specifically, a process's priority is decreased while running at a constant rate, C, and increased while blocked at a rate of C/N, where N in the number of runnable processes in the system. This ensures that equitable service is given both to compute-bound and interactive jobs. Page 15 To improve response characteristics, an interactive "escape clause" is included in the scheduling algorithm. After a block wait of greater than minimum time, a process is given a short quantum at maximum priority. Priority and queue position after this burst are determined by the long term average. The effect of this provision is to ensure quick service to very short interactions, even when requested immediately after a long computation. 6.1.3 Resource Guarantees and Limitations In some cases CPU resource guarantees independent of load are desired, e.g. a demonstration which requires significant CPU time during a period of medium or heavy load, or a user who is willing to pay extra for premium service which does not degrade as the load on the machine increases. A facility is implemented in TENEX to handle these situations. A person with appropriate administrative access can assign to any job on the system a fraction, F, of guaranteed CPU service. For any job so designated, the scheduler will attempt to ensure that: C/T > F where C is the CPU seconds used by the process, and T is the real time since the process last unblocked. For example, if the parameter is set to 30%, the scheduler will provide at least 18 seconds of CPU service to the specified job during each minute of real time. This parameter acts as a ceiling as well as a floor for CPU service. That is, if there are other runnable processes on the system which are not declared special, then the scheduler will ensure that the special process receives no more than the stated fraction of CPU service. We have found that a process with this sort of resource guarantee displays very consistent interactive behaviour despite widely varying loads on the time sharing machine. 6.2 Core Management The information provided in the core status table by the paging hardware is essential to the proper management of core memory in TENEX to avoid thrashing and other forms of inefficient operation. Paging is done on demand. No ordinary pages are preloaded before a process is run, and in general, a process will not have all the pages of its virtual memory in core at once. When a process references a page which is not in core, a pager trap occurs and a core management routine is invoked. The run time since the last page fault is used for a running average of page fault times, i.e. interfault intervals in process time. A process is considered to have enough of its working set if the average page fault time equals PAV, a system parameter currently set to 67ms (or 2 drum revolutions). If the process is faulting more often than PAV, it is considered below its working set size, and a swap to bring in the requested page initiated. Control returns to the scheduler so that it can run another process until the swap is complete. If the process is faulting less often, a core managing routine is invoked to reduce the size of the process, i.e. remove some of its pages from core if space Page 16 is needed. We have found this algorithm is stable and works well; Denning and Schwartz (9) show formally that this is a good idea. To reduce the size of a process working set, a least recently used algorithm is used. The age of each page of a process is determined by a 9 bit logical age field stored by the pager in the core status table when a reference to that page causes a pager reload. Since collecting pages is costly, all sufficiently old pages are removed on a working set reduction. A quick reference however can catch a page before it leaves physical core. 6.3 System Measurements In order to observe and improve the performance of TENEX in regular service, various measuring functions were built into monitor routines. Some measures serve to indicate the efficiency of scheduling, the core/CPU balance, and the nature of the various processes running on the system. The scheduler maintains a set of integrals over time which give (as a fraction of real time): IDLE - time when no processes are requesting CPU service WAIT - time when all runnable processes are waiting for completion of page fault CORE - overhead time spent in core management TRAP - time spent handling pager traps Two relationships among these are: IDLE + WAIT = total time spent in scheduler REALTIME - IDLE - WAIT - TRAP = time spent running user processes Also maintained by the scheduler is an integral over time of the number of processes in the balance set, the number of transfers between core and secondary storage, and number of terminal interactions. One measure is of interest on a recurring basis to all users of the system. The scheduler maintains three exponential averages A(T+t)=A(T)*exp(-t/c)+N*(1-exp(-t/c)), with time constants = 1, 5, 15 minutes and N = number of runnable processes on the system. This indicates the true current load on the system better than the number of jobs logged in. Users often choose on the basis of these load figures what they do on the system at a particular time. Three figures are better than just the last one, because from these one can predict the trend as well as note local variation. 6.4 Debugging Aids Certain debugging procedures and aids used in the development of the system contributed greatly to the speed of development and integrity of the system. Our principal debugging aid is a program called DDT, available in several forms in the system. DDT is a program which allows memory locations to be examined and modified and breakpoints (return of control to DDT) to be placed in a running program. All interactions with DDT are symbolic, using the symbols defined in the source program and obtained from the assembler. Page 17 The form of DDT first used and still necessary for debugging basic level code is a stand alone version which resides in core memory along with monitor. It is used for debugging the scheduler, portions of the core manager, and other basic routines. The second form of DDT was added as soon as the basic monitor could support demand paging and create a virtual memory. This DDT exists in the monitor map and may be used as an ordinary program at a system terminal. It is capable of examining and changing the running monitor and all of the associated tables and other contents of the monitor virtual memory. Use of this form of DDT actually allows several persons to work on debugging portions of the monitor simultaneously. A third form of DDT is used with user programs and is cognizant of the access status (execute or write protected, etc) of pages of the user program. Debugging the system was further facilitated by the use of a considerable number of internal redundancy and consistency checks which call one of two recovery routines upon detecting any malfunction. If the system is attended by system personnal, these routines enter a DDT breakpoint and the state of the monitor can be examined to determine what has gone wrong. This has enabled us to find and correct obscure or infrequent faults in the software. If the system is unattended, then, depending on which routine was called and the severity of the fault, one of three things happens: 1) operation is continued with no interruption, 2) one process is crashed, or 3) the system is automatically reloaded and restarted. This usually provides continued operation in the event of unexpected hardware or software malfunction. Both routines cause the source of the inconsistency and a message describing the problem to be logged for further action by system personnel. Page 18 7.0 CONCLUSION One of the most valuable results of our work was the knowledge we gained of how to organize a hardware/software project of this size. Virtually all of the work on TENEX from initial inception to a useable system was done over a two year period. There were a total of six people principally involved in the design and implementation. An eighteen month part time study, hardware design and implementation culminated in a series of documents which describe in considerable detail each of the important modules of the system. These documents were carefully and closely followed during the actual coding of the system. The first stage of coding was completed in 6 months; at this stage the system was operating and capable of sustaining use by non-system users for work on their individual projects. The key design document, the JSYS Manual (extended machine code), was kept updated by a person who devoted full time to insuring its consistency and coherence, and in retrospect, it is our judgement that this contributed significantly to the overall integrity of the system. We felt it was extremely important to optimize the size of the tasks and the number of people working on the project. We felt that too many people working on a particular task or too great an overlap of people on separate tasks would result in serious inefficiency. Therefore, tasks given to each person were as large as could reasonably be handled by that person, and in so far as possible, tasks were or related in ways that were well defined and documented. We believe that this procedure was a major factor in the demonstrated integrity of the system as well as in the speed with which it was implemented. Page 19 8.0 BIBLIOGRAPHY (1) BBN MEDICAL INFORMATION TECHNOLOGY DEPARTMENT: "THE HOSPITAL COMPUTER PROJECT TIME SHARING EXECUTIVE SYSTEM" - BBN Report Number 1673, April 1968 (2) Bobrow, D.G., Burchfiel, J. D., Murphy, D. L., and Tomlinson, R. S. : "TENEX, A Paged Time Sharing System For the PDP-10" - BBN Report Numer 2180, August 1971 (3) Corbato, F. J., et al: "An Experimental Time-sharing System" - AFIPS Conference Proceedings Vol. 21 (1962 SJCC) (4) -- : "An Introduction and Overview of the Multics System" - AFIPS Conference Proceedings Vol. 27 (1965 FJCC) (5) Digital Equipment Corp.: "PDP-10 Reference Handbook" - DEC, 1971 (6) Denning, P.: "Working Set Model for Program Behavior" - Communications of the ACM, Vol. 11, No. 5, May, 1968 (7) -- : "Thrashing, It's Causes and Prevention" - AFIPS Conference Proceedings Vol. 33 (1968 FJCC) (8) Denning, P.: "Equipment Configuration in Balanced Computer Systems! - IEEE Transactions on Computers, Vol. C-18, No. 11, November, 1969 (9) Denning, P. and Schwartz "Properties of the working set model" Communications of the ACM (this issue) (10) Lampson, B., et al: "A User Machine in a Time Sharing System" - Proceedings of the IEEE, Vol. 54, No. 12, Dec. 1966. (11) Spier, M. J. and Organick, E.: "The Multics Interprocess Communication Facility" - Proceedings of the Second Symposium on Operating System Principles, Oct. 1969 (12) Wilkes, M. Time Sharing Computer Systems; American Elsevier 1968