A NEW KIND OF COMPUTING
by BERNARD A HODSON
The industry today is plagued by a variety of problems: insecure operating systems, viruses, worms, spam, theft of identity, intrusion into personal systems, wireless data interception, satellite data interception, hackers, and on and on. The costs to industry from spam alone are high, and viruses have played havoc with business activity, even putting some companies out of business. Security threats to individuals, companies and countries are increasing. It is high time that we addressed potential solutions and acted upon those that offer the most promise.
This paper describes one possible solution. It outlines a programming paradigm that could be developed as a standard. It has already been used successfully on several classes of computer, from mainframes through microcomputers to 8-bit RISC chips for smart cards and embedded systems. The proposed paradigm can apply to all levels of programming activity, with considerable flexibility for customisation. It has the potential to eliminate most of the problems listed above and is also small enough that the entire system could be encrypted for each computer and server. It would simplify or eliminate all operating systems.
The paradigm utilises an expandable language, which can be converted to a byte string on any computer. The byte string is completely independent of the target computer for the application. Using the rules established for the paradigm, any application written in terms of the expandable language can be processed by a simple compiler. In fact the rules are so simple that an application can be developed without a compiler at all, the byte string being generated by hand for acceptance by the run system. It will be explained at the end of the paper how readers can obtain a copy of the simple compiler and a basic expandable language, to try that phase out.
The run system processes the generated string of byte codes. To do this it uses a double numeric system which uniquely identifies every element needed within an application. This technique makes the virtual processor run system very tiny, from three or four thousand bytes for 8 bit RISC chips using typical smart card and embedded systems applications, to seven or eight thousand bytes for a microcomputer with simple graphics, to somewhat more for the processing of video images and other more demanding applications. The numeric coding system used uniquely identifies every activity to be carried out, making for a fast running application. The numeric codes of the paradigm are unique, enabling new capability to be added without affecting what has previously been developed.
To compile or not to compile, that is the question
The author has had experience in developing compilers and also a Java run system. Compilers for Fortran and Cobol (like those for C, C++, PL/I and other languages) generate a machine language structure which utilises a library of subroutines. In Java a string of byte codes is generated which requires a library of methods and similar structures. The libraries for both approaches tend to be large. Fortran and Cobol are quite limited in their capabilities while Java is verbose with a clumsy vocabulary. Even the very simple, ubiquitous 'hello world' application in Java needs a large number of methods and resources.
The compiler for the paradigm of this paper is itself very small and can be placed, if desired, at the front of the run system, taking just a few hundred more bytes (as the compiler and run system share routines). In that situation the language elements, rather than the byte codes, are presented to the system, which generates the byte codes first and then runs them. This mode is particularly useful for safety critical applications (where the compiler AND the application have to be tested whenever a change to either occurs). For the balance of the paper following this section, however, compilation is considered to have been done, the system being presented with a string of byte codes. The compile operation is shown in figure 1.
Figure 1. The compiler converts language elements to byte codes. [Diagram: application lines 1, 2, ... pass through the compiler, which is very small, to produce a string of byte codes (codes 0 to 255).]
All elements of the run system are static. The only part of the paradigm that can be considered variable is the generated byte stream, which will vary from application to application.
Every application consists of a string of language elements which may be associated with parameters such as numbers or variable names. With the exception of numeric data and literals all language elements and variables are converted to a single byte.
Each language element is associated with a number currently running from 0 to 255, although at present only a fraction are used. It is unlikely that more than 256 will ever be needed but, if so, the number can simply be increased to just short of 65536, without in any way affecting what has been developed before such an extension takes place.
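The single-byte numbering of variable names can be pictured as a small table kept by the compiler. The Python sketch below is only an illustration under stated assumptions: the class name SymbolTable, the numbering in order of first appearance, and the choice to start at 1 (so that 255 names fit in one byte) are all hypothetical and are not taken from the actual compiler.

```python
class SymbolTable:
    """Assigns each distinct name a number that fits in a single byte.

    Assumption: names are numbered 1, 2, 3, ... in order of first use.
    """

    def __init__(self, limit=255):
        self.limit = limit          # largest number of names allowed
        self.numbers = {}           # name -> assigned number

    def number_for(self, name):
        """Return the number for a name, assigning a new one if needed."""
        if name not in self.numbers:
            if len(self.numbers) >= self.limit:
                raise OverflowError("more names than single-byte codes")
            self.numbers[name] = len(self.numbers) + 1
        return self.numbers[name]


table = SymbolTable()
first = table.number_for("alpha")    # new name, gets the next free number
second = table.number_for("beta")
again = table.number_for("alpha")    # known name, same number as before
```

Because a name always maps to the same number once assigned, later additions never disturb codes that have already been generated.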
Some typical language elements are 'looping', 'screen', 'bitmap' and 'arith', each of which has a numeric code associated with it, such as 3, 5, + and *. Three of them appear in the following statements:
- looping 1 1 100 adr grt
- screen ^hello world to^ name
- arith alpha = beta + gamma / delta + 13
For the element 'looping' the numbers 1 1 100 represent looping parameters going from 1 by steps of 1 to 100. The symbols adr and grt represent transfer points for the true or false result of the operation. Such a language statement may result in the byte code sequence 2(1(1(d25.
Each ( indicates that the number following it has been converted to its binary equivalent: (1 holds the value 1, and (d holds the value 100, since the byte with value 100 prints as the character d. The final 2 and 5 indicate that the second and fifth named language statements are the transfer points, chosen according to the result of the looping arithmetic (this numbering is done automatically during the compile process). Other language elements give alternate forms of loop control.
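The encoding of the 'looping' statement can be sketched in a few lines of Python. This is an illustration only, not the author's compiler. The printed example mixes digits and characters, so the sketch assumes that the element code and the transfer-point numbers are small byte values (2, 2 and 5) and that ( is a literal marker byte. The function name encode_looping and the labels table are hypothetical.

```python
# Hypothetical sketch of encoding "looping 1 1 100 adr grt".
LOOPING = 2          # assumed element code for 'looping', as in the example
NUMERIC = ord("(")   # assumed marker: the next byte is a binary number


def encode_looping(statement, labels):
    """Return the byte codes for one looping statement.

    labels maps a transfer-point name to its ordinal among the named
    statements; a real compiler would build this table itself.
    """
    words = statement.split()
    if len(words) != 6 or words[0] != "looping":
        raise ValueError("expected: looping start step end true_label false_label")
    out = bytearray([LOOPING])
    for text in words[1:4]:              # start, step and end values
        value = int(text)
        if not 0 <= value <= 255:
            raise ValueError("this sketch only handles one-byte numbers")
        out += bytes([NUMERIC, value])
    for name in words[4:]:               # true and false transfer points
        out.append(labels[name])
    return bytes(out)


codes = encode_looping("looping 1 1 100 adr grt", {"adr": 2, "grt": 5})
print(list(codes))   # [2, 40, 1, 40, 1, 40, 100, 2, 5]
```

The seventh byte has the value 100, which is the character d when printed, matching the (d in the sequence shown in the text.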
The element 'screen' might result in the byte code sequence 511, indicating that the first literal is to be placed on the screen followed by the first variable, which would likely contain the name of the person receiving the message. It has been ascertained that few applications contain more than 256 variable names. While this is the limit in the initial system, extension to just under 65536 variable names can be accomplished without disturbing what has been developed previously.
The element 'bitmap' would have the single byte code +, which would trigger a sequence of activity in the run system asking for the name of the bitmap image that should be produced.
The final language element 'arith' might generate the byte codes *47+6/3+%51~13, in which the 4th variable receives the result of taking the 7th variable, adding the 6th divided by the 3rd, and then adding 13. In this case the % indicates that what follows is a floating point number whose length is 5 with positive sign and value 13.0. Again the relative numbers used are a function of the compiler, the programmer having no need to be aware of the coded sequences.
An initial reading may suggest that the structure is complicated but the numeric conversions are done by the very tiny compiler and are very simply processed by the run system. The coding system does not need to be specifically known by a programmer. The run system, from that byte code stream, does exactly what is required.
One important observation is that a spurious byte code introduced nefariously would likely cause the application to abort. For more critical applications a check sum could be added at the end of the byte codes giving the total value of all bytes, more or less guaranteeing security from hackers and virus activity. This would be checked at the beginning of an application.
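The check sum idea can be sketched as follows. This is a minimal illustration, assuming a two-byte check sum field holding the total of all byte values, wrapped to fit the field. The field width, the byte order and the function names seal and unseal are assumptions made for the example, not details of the actual system.

```python
CHECK_BYTES = 2   # assumed width of the check sum field


def checksum(codes):
    """Total value of all bytes, wrapped to fit the check sum field."""
    return sum(codes) % (256 ** CHECK_BYTES)


def seal(codes):
    """Append the check sum to the end of the byte codes."""
    return bytes(codes) + checksum(codes).to_bytes(CHECK_BYTES, "big")


def unseal(stream):
    """Verify the check sum before an application starts; return the codes."""
    if len(stream) < CHECK_BYTES:
        raise ValueError("stream too short to hold a check sum")
    body, tail = stream[:-CHECK_BYTES], stream[-CHECK_BYTES:]
    if checksum(body) != int.from_bytes(tail, "big"):
        raise ValueError("check sum mismatch: byte codes rejected")
    return body


sealed = seal(bytes([2, 40, 1]))
original = unseal(sealed)          # passes, returns the three byte codes
```

A plain total does not change when bytes are merely reordered, so a position-sensitive check would be needed to catch that case.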
The virtual machine processor - the run system
The run system consists of about 30 small modules in native code, the number of modules depending on the functionality included. Most of the modules are independent of each other so that the size of the virtual processor (VP) can be reduced for the client needs (e.g. smart card applications may not need the graphics or the bitmap modules). Even so the VP is very small for most client needs, ranging in size from about 4k bytes to 10k, depending on the functionality included.
Access to the VP is through a numeric code within the static part of the software (to be described in following sections).
Most of the modules require only a few bytes of machine code, the only exceptions being modules such as bitmap and the software floating point routines for add, subtract, multiply, divide and test floating point numbers (which are similar to but more accurate than the IEEE format). The technique of numeric coding for the static part of the system is what makes such a small VP size possible. The coding also enables the VP to go directly to both the module required and also its associated parameters.
Most of the modules are concerned with data moves from direct or indirect addresses, and with binary arithmetic and logic routines. These were identified as all that were necessary from a review of compiler generated code from many applications in a business environment.
Language and internal elements
In addition to the VP there is a static section (which is the main reason such a small but functionally powerful system can be built) consisting of a string of numbers associated with each language element (one for each element) and a set of internal element numeric strings which complement the modules in the VP. Both the language elements and the internal elements complementing the VP modules were identified by a study of existing business applications. Both can be augmented without affecting what has been developed previously, enabling a controlled development of the concept to take place.
The application programmer does not need to know the internal structure of the elements, this being of concern only to the very small number of people who are involved with system expansion. The general form of the elements is, however, illustrated in figure 2.
Figure 2. Typical structure of an element.
Name1 A, B, C, m, D, E, F, G, name2, n, H, I, o, name7, endit
Each element, whether language or internal, is given a mnemonic associated with its function, such as Name1 in the figure.
A, B, ... indicate identifiable mnemonics, each calling for a VP module with appropriate parameters, such as 'screen' or 'looping'.
m, n, ... indicate a branch operation depending on whether the result of the previous activity was 'true' or 'false'.
name2, name7, ... refer to other internal element strings which are to be used. The element called must not be a language element. This feature also contributes significantly to the very small size of the system, as several layers of internal elements may be addressed before the system returns to the VP module following the call.
endit is a special function indicating the end of an element. It need not be placed physically at the end of the element but should be its logical termination point.
The various items A, m, namex are assigned by system developers, the system itself being designed to process the byte codes generated by the compile operation. As was mentioned earlier the byte codes can be generated on any system with an appropriate compiler, or even be generated manually.
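The way an element of the form shown in figure 2 might be processed can be sketched with a small stack-based interpreter. This is a hedged illustration, not the actual run system. The item kinds ("module", "branch", "call", "endit"), the rule that a branch jumps when the previous result was false, and the function name run_element are all assumptions made for the example.

```python
def run_element(elements, modules, name):
    """Run one element, following calls to other internal elements.

    elements maps an element name to a list of (kind, argument) items.
    modules maps a module mnemonic to a function returning True or False.
    Returns the list of module mnemonics in the order they were executed.
    """
    trace = []
    stack = [(name, 0)]                  # (element name, position to resume)
    last_result = True
    while stack:
        current, pos = stack.pop()
        items = elements[current]
        while pos < len(items):
            kind, arg = items[pos]
            pos += 1
            if kind == "module":         # A, B, ...: call a VP module
                last_result = modules[arg]()
                trace.append(arg)
            elif kind == "branch":       # m, n, ...: assumed to jump on 'false'
                if not last_result:
                    pos = arg
            elif kind == "call":         # name2, ...: enter another element
                stack.append((current, pos))   # resume here afterwards
                stack.append((arg, 0))
                break
            elif kind == "endit":        # logical end of this element
                break
    return trace


elements = {
    "name1": [("module", "A"), ("branch", 3), ("module", "B"),
              ("call", "name2"), ("module", "C"), ("endit", None)],
    "name2": [("module", "D"), ("endit", None)],
}
modules = {"A": lambda: False, "B": lambda: True,
           "C": lambda: True, "D": lambda: True}
order = run_element(elements, modules, "name1")
```

In this example A returns false, so the branch skips B, the call runs D inside name2, and control then returns to name1 to run C.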
Structure of the numeric code
The numeric code is a number between 0 and 65535. Numbers above 65500 are for control functions, such as endit and error and termination activities. Numbers less than 512 indicate a logical transfer within the element, while numbers under 32768 but above 512 refer to an internal element name location. In this regard it is not expected that the internal elements will ever need more than 32768 bytes, requiring only about a quarter of that figure at the present time. As the concept becomes accepted as an industry standard it is conceivable that internal elements could go higher than 32768 but a strategy has been developed for an orderly upgrade should that situation ever occur.
The numbers between 32768 and 65500 refer to the use of modules within the VP. One part of the number indicates the specific module to be used, while the balance of the number uniquely identifies the location of the parameters to be used. It should be stressed once again that these numbers are allocated during the compile stage from the language statements of the application, and need not be known by an application developer.
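The ranges of the numeric code can be summarised in a short classification routine. The range boundaries follow the text. Several details are assumptions made purely for illustration: the treatment of the exact boundary values 512 and 65500, and in particular the split of a module number into a 5-bit module index and a 10-bit parameter location. The text says only that one part identifies the module and the balance locates the parameters.

```python
def classify(code):
    """Classify a numeric code from an element by the range it falls in."""
    if not 0 <= code <= 65535:
        raise ValueError("numeric code must lie between 0 and 65535")
    if code < 512:
        return ("transfer", code)        # logical transfer within the element
    if code < 32768:
        return ("element", code)         # location of an internal element
    if code <= 65500:
        offset = code - 32768
        module = offset >> 10            # assumed split: high bits pick the module
        params = offset & 0x3FF          # assumed split: low bits locate parameters
        return ("module", module, params)
    return ("control", code)             # endit, error, termination and the like


example = classify(32768 + (3 << 10) + 7)   # ('module', 3, 7)
```

With this assumed split the module index runs from 0 to 31, which is in line with a run system of about 30 modules.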
Most applications are relatively compact in their byte code structure and many applications can be resident simultaneously, even on smart cards with their limited real estate, as well as on embedded systems. In early development work a complete hospital information system, on-line accountancy, and a business credit reporting system were all using the same VP software, each with a string of their own applications.
Use of multiple applications with the same software does involve some control of the application names, to avoid duplication and ambiguity, but this is relatively simple to accomplish. The multiple applications can also be assigned one, two or three priority levels, if desired. In this scheme all priority ones are processed once, then one priority two; this cycle repeats until every priority two has been processed once, at which time one priority three is processed. This round robin priority ensures that all applications do see some light of day during the course of on-going operations. It is achieved by a simple 'roll-in roll-out' process of variable data within the application, including the stack process which controls the multi-layer operation of the internal and language elements.
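The order in which applications get their turn under this round robin priority can be sketched as a Python generator. This is one reading of the description above, offered as an assumption rather than the actual scheduler. The function name schedule is hypothetical, and the roll-in roll-out of variable data is not modelled.

```python
from itertools import cycle, islice


def schedule(p1, p2, p3):
    """Yield application names forever in round robin priority order.

    All priority ones run, then one priority two; once every priority two
    has had a turn, one priority three runs, and the cycle starts again.
    """
    if not (p1 or p2 or p3):
        raise ValueError("no applications to schedule")
    next_p3 = cycle(p3) if p3 else None
    while True:
        if p2:
            for app in p2:
                yield from p1            # every priority one, once
                yield app                # then a single priority two
        else:
            yield from p1
        if next_p3 is not None:
            yield next(next_p3)          # then a single priority three


turns = list(islice(schedule(["a", "b"], ["x", "y"], ["z"]), 9))
print(turns)   # ['a', 'b', 'x', 'a', 'b', 'y', 'z', 'a', 'b']
```

Even the single priority three application z gets a turn once per full pass, which is the property the scheme is meant to guarantee.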
Another useful feature is that the language elements can be in any ethnic language, and the multiple applications do not have to be in the same ethnic language. Even within a single application it is possible to use more than one ethnic language through the use of synonyms. Use of such synonyms adds slightly to VP processing but not significantly so.
The VP and the numeric elements are quite small, and the elements could readily be encrypted with any one of the current algorithms, being decrypted only during their use. This would add a minor but continuous process time penalty. An alternative would be to store the elements in encrypted form and decrypt them at the beginning of a run, which would be a reasonable strategy for discrete running applications but likely not quite as suitable for continuously running applications such as may be used for pipeline or nuclear power plant monitoring.
For those with less sensitive needs various check sums can be incorporated both for the VP and for the element segments, these check sums being verified at each run if necessary, or at random intervals. It is unlikely that any intrusion into the elements or the VP would go undetected.
The proposed standard for programming represents the culmination of years of development, most of it catering to the conventional approach. The standard now proposed is essentially a numeric table with a small number of associated modules that decode the numeric codes and carry out what previously has been done by computer instructions. The modules will grow very slowly as the concepts are accepted by industry, the numeric elements on a more accelerated basis as the functionality is enhanced. It is possible, with the approach outlined, to consider this as a single, unchanging technology that can accommodate all current and future application needs, from the molecular needs of nanotechnology to the mathematical expansion requirements of the most powerful supercomputers.
The earlier concepts demonstrated successful applications on mainframes, on mid-size and microcomputers, and on microcontrollers with RISC chips.
It will take several years for these concepts to become dominant in the industry but dominate it they certainly will. In the first instance they should be adapted to microcontrollers for smart cards and embedded systems, which constitute over 90% of all installed computers, but which are not dominated by monopolistic software vendors, and where only limited interaction is required between processors.
This will necessitate establishing simple networks based on the concepts (communications and control of high speed networks were part of earlier development), in particular with the use of smart cards for a variety of purposes such as health, financial transactions, personal identification and the like. This would place the concepts in the chips on the cards, in the card readers and in the servers controlling the network.
At the same time the numerical approach should be introduced to the embedded systems arena, by specific industries such as the automotive or aerospace.
Having successfully been introduced at the microcontroller level it could then move to the larger systems, at first integrating with their various operating systems, but then replacing them, as they will become redundant. Again it would best be done by industry (servers, graphics, video etc.) but after successful implementation with the microcontroller world most industries will, by that time, be ready to move.
In order for this to be done in a controlled fashion by the PC+ industrial groups, which tend to favour proprietary software, it would be useful to establish a working group to oversee the orderly development of the numeric approach.
Free from viruses, worms and identity theft
One of the reasons that worms and viruses continue to exist is that the current software approach is based on computer languages requiring a huge infrastructure. The complexity of the operating systems used is such that it is impossible to guarantee that there are no security loopholes. These loopholes are then exploited by nefarious persons to introduce code which is sent around the world, which can cause millions of users' systems to be compromised, and which can create a national security risk. Until this vulnerability is erased no user, no company and no country is safe from vicious attacks on its computing lifeblood.
One of the reasons for moving to the numeric approach is the need to get away from this multitude of problems. The numeric approach offers:
- A very small VP which is static, with no opportunity to introduce spurious code if check sums are included.
- A highly efficient VP which can be encrypted if necessary.
- A static set of elements which numerically describe suites of applications in a form where each number within an element points directly to the VP process required.
- A static set of numeric elements which can be verified through check sums.
- Even in the unlikely event that a spurious number was introduced into a numeric element, without affecting the check sum, applications would abort, due to the critical relationship that exists between each number within the element structure.
The one area where the numeric system does not have full control is in the byte stream code generated by the compile function. Even here, however, the byte code stream does not have access to the numerical elements nor to the small VP machine code. Neither, when multiple applications are running, can the data from one application corrupt the data from another application, with the exception that some applications can share data, and in such circumstances the application developer would have to handle that aspect.
Although the analysis has not yet been done it is believed that the VP is so tiny that it could be proved to be error free. In a similar way so could each of the numeric elements. Numeric elements and the VP are static so that, once verified, there would be little need, if any, for further verification.