7 - OIL INDUSTRY PART 2
by BERNARD A HODSON
We come now to one of the significant developments preceding the growth of the GENETIX concept. At this time in the oil industry there was a shortage of people who could programme the computer. Imperial Oil, at considerable expense (about $750,000 just to enter the data, which was a major expense in those days), had developed what at that time was the largest data base system in the oil industry, if not the world. It consisted of 19 categories of data relating to all oil and gas wells in Western Canada, covering things like rock porosity, rock permeability, chemical analyses of water encountered while drilling, type of oil show, pressures encountered, and eventually production volumes of oil or gas. Considerable effort was expended in trying to establish geological names for rock formations that were consistent across Canada (no small task in itself, as they varied from Province to Province), and how to handle geophysical data.
As an oil well is a legal entity and may produce oil or gas at several levels underground, it was necessary to develop a well identification system that would be acceptable in a Court of Law. To complicate matters further, the drilled well is not necessarily straight (sometimes wells are deliberately made to slant), and there may be oil- or gas-producing formations underground that are a considerable geographic distance from the co-ordinates of the well head at the surface. Life was further complicated by the fact that there had been several different survey methods in Canada since the Country was formed. In Manitoba they had the Red River system, where surveys were based on the current position of the river at the time of the survey (unfortunately the Red River frequently changes its course). In Saskatchewan and Alberta they had the LSD (Legal Sub Division) system, which divides the territory into segments six miles square, each with 36 sections of one mile square, which can in turn be divided into quarter sections. In British Columbia (BC) they had yet another survey method, called the BC Centizone survey system.
In order to get an acceptable legal identifier (needed because of the high returns on oil properties) it was necessary to go through all the Canadian archives, from when the Country was legally born to the present day, and carefully check the survey records (a number of discrepancies were found). All this was necessary because every data base has to have some way of identifying where the data comes from, or where it can be found. Eventually an acceptable identification method was established, along with cross-links between the different geological formation names.
Considering the investment required, the data base was established without any real knowledge of how it was going to be used. At first it was used maybe once a week and then, as word got around about its usefulness, once a day, then several times a day, the data retrieval requests continuing to increase as managers found they could get useful data from the "Well Data System" or WDS as it came to be known.
The problem was that we did not have enough programmers to handle all the data retrieval requests and do other work at the same time. At this time I said to one of my staff that in my experience there were only seven or eight standard functions in any computer program and that if we could generalise those functions we might ease the pressure on our staff. I gave him the task of examining as many of our computer programs as he could. Six weeks later he came back and said my surmise was wrong. When I asked how much higher than eight the number of basic functions might be, he replied, much to my surprise and delight, that there were only three. From this we developed a generalised program called GIRLS (Generalised Information Retrieval and Listing System) which eventually handled the bulk of retrieval requests from the WDS, and which was so user friendly (to use a modern phrase) that secretaries to managers, not programmers, could now generate quite complex retrieval requests. As an example, as the system became more sophisticated, they could ask for a contour map of a subsurface area showing all locations with a particular geological formation that had oil shows. We also developed at the same time a Generalised Edit and Maintenance (GEM) program that could be used to create new data files. This did not get much use, however, as most of our files had by then been created.
Up to this point retrieval of data from files had been handled by a program produced by the manufacturer called a Report Program Generator (RPG) but this was not user friendly and required people to "program" each report, a totally boring occupation for the highly sophisticated people we had on our staff. They were highly pleased to have this chore taken from their task lists by the GIRLS program.
This development had a major impact on the structure I later developed for GENETIX. Our philosophy in this early work was to generate computer code from the input documents used to generate a GIRLS request. The reason for this was partly because an interpreter of the GIRLS statements might have been relatively slow and partly because we were gaining experience. On some of the requests the amount of machine code generated would exceed the memory availability of the system, and the request had to be split in two.
This led to my later decision with GENETIX to eliminate all machine code generation.
Although the WDS had taken a fortune (for those days) to develop it paid off very handsomely. At that time in Alberta (in the 60s), there was a glut of oil and companies were limited in the amount of oil they could produce, having established "quotas". Every six months companies were allowed to request the Courts for an increase in their quota, and other companies had only a few weeks in which to object. Every six months Shell, Mobil, Texaco and others would submit a request for a quota increase and we would then spend considerable computer time playing this request against our own requirements.
At that time I believe we knew more about our competitors' oil and gas wells than they did. In any case the figures we produced from the WDS refuted the competitor claims and saved our company the hundreds of millions of dollars each year that would have been lost had our quota been reduced and our competitors' increased.
The Province of Alberta also had established a smaller data base which contained information on production and similar items. This was the "legal" data base for the Province and every oil company had to submit data on a monthly basis that was entered in that data base. The data was submitted on paper and was then keyed by the Province. This type of operation inevitably led to discrepancies between our WDS and the legal data base of the Province, so we needed to keep our own data as well as the erroneous data within the Provincial data base. This is one of the reasons I am still sceptical about government data bases, but for additional reasons which I will explain in a later chapter. You can almost guarantee that any data base created by government organisations will have at least five per cent, if not more, of its data erroneous.
In order to be of the most value the WDS had to be well documented, so that any user could be assured that the data coming from it was accurate and reflected the true state of affairs. Well locations had to be accurate, the units used understood (feet, not metres; pressure and flow parameters defined; geological names correct; etc.). Because of this and for other obvious reasons I developed a set of documentation standards for every program under my jurisdiction, but in particular for the WDS. The people that documented the WDS along the lines I proposed did an excellent job. These standards were asked for, and adopted, by the US affiliates, so I can only assume that their documentation had been extremely poor, a state, unfortunately, that has prevailed with many companies throughout the history of the computer industry and which, if anything, is getting worse, in spite of advances in software.
When the IBM 1410 was delivered to us it was one of the first production systems and it looked as though IBM had hired a bunch of people with no computer experience to develop its compilers and operating systems. For those without a computer background, it may be useful to say here that a compiler converts the program written by a computer programmer into machine code containing 1s and 0s (which is the basis of all computing) and the operating system then runs the application in machine code.
A typical COBOL compile, even for a simple application, took three hours, so we did one compile and then made all changes in the language of the machine. When we examined the generated code we saw routines which were accessed frequently, but which did absolutely nothing.
Although our specific interests were scientific, the wisdom-lacking people in our US head office had decreed a business oriented machine be purchased. The FORTRAN compiler wasn't quite as bad as the one for COBOL, but was still pretty awful. It was so bad, in fact, that I asked the company if, on my own time, I could develop a FORTRAN compiler, to which they agreed, granting me free time on the computer. This also led later on to the way in which certain of the GENETIX routines were implemented. We had our revenge later, after I had left Imperial, when the US company was in such a computing mess that one of my staff was given carte blanche to sort out the mess and get them back on track.
A few words on FORTRAN compilers are useful. In most cases a FORTRAN compiler was a multiple pass affair, each pass generating a pseudo language output, which then had to be re-read to generate yet further pseudo output. Up to that point there had never been a one pass FORTRAN compiler that generated machine code immediately, and many people said it could not be done. The compiler I developed was the world's first one pass FORTRAN compiler.
Even years later a PhD in computer science from Canada’s foremost University in computer science, the University of Waterloo, said it could not be done, which made me doubt the knowledge of the people who had taught him computer science.
The final pass of a compiler was usually a code optimisation. In my case a limited amount of optimisation was done on the single pass. Most compilers at the time did not process business data very easily, so the compiler I wrote incorporated, for the first time, some features for business data handling, which were not incorporated into the FORTRAN standard till ten years later.
Most compilers at the time tended to be what we called a kludge, a whole slew of code with very little useful segmentation. An acquaintance who has seen the structure of current operating systems such as Windows says little has changed.
The secret to the success of the compiler was the use of what are known as "modules" which did specific tasks. These modules were relatively easy to write and test on their own. Having tested them on their own, you knew they would work in the full compiler (this principle has been incorporated in the current GENETIX).
Others were experimenting with "modular programming" at the time and I was invited to present a paper at the first conference on modular programming, organised in Boston by Yourdon and Constantine, who later became well known for their books and knowledge on what became known as "structured programming". My modular approach developed for the compiler was later to lead to the concept of the "software genes" of the GENETIX paradigm.
One of the shortcomings of any FORTRAN compiler at the time was an inability to handle files or data prepared in the normal day to day operations of a company. The "object" programs generated by the compiler required that data on tape be in fixed length blocks of 120 or 132 characters. We had large amounts of variable length technical data stored in the WDS and it was inconvenient to try and break this up into fixed length segments. I therefore had to develop techniques for handling variable length data elements that could be handled by FORTRAN statements. The techniques I developed later became a fundamental part of the way that GENETIX handles data elements.
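The memoir does not spell out the actual technique used, but one standard way, then and now, of carrying variable-length elements through media that expect fixed-size blocks is to prefix each record with its length. A minimal sketch in modern Python (the record contents are purely illustrative, not data from the WDS):

```python
import struct

def pack_records(records):
    """Pack variable-length text records into one byte stream,
    each record prefixed with a 2-byte big-endian length field."""
    out = bytearray()
    for rec in records:
        data = rec.encode("ascii")
        out += struct.pack(">H", len(data)) + data
    return bytes(out)

def unpack_records(blob):
    """Recover the variable-length records from the stream by
    reading each length field and slicing out that many bytes."""
    records, i = [], 0
    while i < len(blob):
        (n,) = struct.unpack_from(">H", blob, i)
        records.append(blob[i + 2 : i + 2 + n].decode("ascii"))
        i += 2 + n
    return records
```

The packed stream can then be written out in whatever fixed block size the hardware demands, since record boundaries are recovered from the length fields rather than from the block structure.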
Another feature, the one which ten years later became a FORTRAN standard, related also to the processing of data files. In the WDS we often wanted to search for particular records and bypass those we were not interested in. The typical FORTRAN compiler of the day would read the data at run time and interpret its format, a very time consuming process on what by today's standards were very slow processors. The technique developed, later adapted in part to GENETIX, was to analyse the format of the record at compile time and generate an easily interpreted format that would speed up the operation of data search at run time. GENETIX, which does not need a compiling operation, later had some benefit from the approach used.
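The compile-time idea can be illustrated with a small sketch: parse a FORTRAN-style format string once into an easily interpreted list of (type, width) pairs, then slice records against that list at run time with no further parsing. This is an illustrative reconstruction in modern Python, not the original implementation, and it handles only a tiny subset of edit descriptors:

```python
import re

def precompile_format(fmt):
    """Turn a FORTRAN-style format such as 'I5,F8.2,A10' into a
    list of (type, width) pairs -- done once, ahead of time."""
    spec = []
    for item in fmt.split(","):
        m = re.fullmatch(r"([IFA])(\d+)(?:\.\d+)?", item.strip())
        if not m:
            raise ValueError(f"unsupported edit descriptor: {item}")
        spec.append((m.group(1), int(m.group(2))))
    return spec

def read_record(line, spec):
    """Slice a fixed-width record using the precompiled spec --
    no format-string parsing at run time."""
    fields, pos = [], 0
    for kind, width in spec:
        raw = line[pos : pos + width]
        pos += width
        if kind == "I":
            fields.append(int(raw))
        elif kind == "F":
            fields.append(float(raw))
        else:
            fields.append(raw.strip())
    return fields
```

When thousands of records are scanned to find a few of interest, paying the parsing cost once rather than per record is exactly the kind of saving that mattered on the slow processors of the day.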
Another advantage of the modular approach developed was that new features could be readily added to the FORTRAN compiler, without going through an horrendous testing procedure. Again, in later years, this became a standard feature of GENETIX. One example was an ability to compare alphabetic strings, which was difficult to do in compilers at that time.
The main reason for being able to create a one pass compiler was a technique I developed to handle forward looking statements. A typical statement might read GO TO ALPHA where ALPHA was somewhere in an area of the program that had not yet been read (the reason many said a one pass compiler was impossible). All I did was set up a simple transfer table and, when ALPHA was reached, place its location in the transfer table. It wasted all of one computer instruction.
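The transfer-table idea can be sketched in modern Python. Each label gets a slot; a forward GO TO compiles to an indirect jump through that slot, and when the label is finally reached its address is written into the slot. The class name and instruction mnemonics below are illustrative, not from the original compiler:

```python
class OnePassAssembler:
    """Single-pass resolution of forward GO TO references via a
    transfer table: unknown targets cost one indirect jump each."""

    def __init__(self):
        self.code = []            # emitted instructions
        self.transfer_table = []  # slot -> target address (filled in later)
        self.slot_of = {}         # label -> transfer-table slot

    def _slot(self, label):
        # Allocate a slot the first time a label is mentioned.
        if label not in self.slot_of:
            self.slot_of[label] = len(self.transfer_table)
            self.transfer_table.append(None)  # address not yet known
        return self.slot_of[label]

    def goto(self, label):
        # Works whether or not the label has been seen yet:
        # jump indirectly through the label's transfer-table slot.
        self.code.append(("JMP_INDIRECT", self._slot(label)))

    def define_label(self, label):
        # Label reached: record its address in the transfer table,
        # retroactively resolving every earlier GO TO to it.
        self.transfer_table[self._slot(label)] = len(self.code)

    def emit(self, instr):
        self.code.append(instr)
```

No second pass is needed: by the time the program text has been read once, every slot has been filled, and the only run-time cost is the single extra indirect jump per GO TO.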
The ideas from this compiler were presented at the annual meeting of GUIDE, an organisation of users of IBM business class machines. This was well received and no doubt reached England, as will become evident in a later chapter, when I describe an invitation to write compilers for an atomic energy research establishment using an advanced computer system.
Some time earlier a new University had been created in Ontario called the University of Waterloo. It had recently established a Department of Computer Science and I was asked to present a paper on the FORTRAN compiler at their first ever seminar. There was a need at that time for a speedy compiler that wasn't overly fussy about optimisation, so that students could generate computer programs quickly and, once marked, throw them away. Following the presentation they indicated they would like to develop a "quick and dirty" compiler that could speed up student processing. I was asked if I would like to work on the compiler with them but could not commit myself at the time.
Following my presentation on how such a compiler could be written they developed WATFOR, which was very successful and used by many Universities throughout the world. This was followed by WATFIV, WATBOL (for COBOL) and other offshoots. Regrettably the University never acknowledged the contribution I had made to their success, which I considered highly unethical.
Unfortunately there is a not insignificant minority in academic circles who like to steal other people's ideas. When I was later at the University of Manitoba the Head of the Mathematics Department at the time said that, in order to protect his intellectual ideas, he circulated any significant work to enough people (about 100 or so) that no one could then claim his ideas. Unfortunately with the University of Waterloo I was too naive, but I have since followed his example.
One other development is useful to record at this time, which had some bearing on GENETIX at a later date. Imperial had assembled a huge amount of technical data. There were technical reports, a slew of seismic and geological records, production records, partnership agreements, technical correspondence and what have you. It was necessary to develop some way of meaningfully accessing this data. At that time the person heading up the Technical Services group developed what was known as a "double dictionary" (similar to what we call today in the computer industry an "inverted list"). This involved the development of a set of key retrieval words and then placing alongside those words the names and locations of all documents containing that key word.
The dictionary was generated in two parts. To search for documents containing several key words you first consulted the list of documents with the first key word and matched it against the list of documents associated with the second key word. This usually shortened the list. You then did this with the third key word and so on. The developer of the scheme made many presentations on this technique and invariably came out with his favourite expression comparing data with information. He would state that Gina Lollobrigida had dimensions x,y,z which was data but to say that Gina was a "living doll" was information. It always made his point.
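The narrowing search described above is what we would now call intersecting inverted lists. A small illustrative sketch in Python (the key words and document names are invented for the example, not taken from the Imperial system):

```python
# Inverted list: each key word maps to the set of documents containing it.
inverted = {
    "porosity": {"report-12", "report-47", "report-90"},
    "devonian": {"report-47", "report-90", "report-103"},
    "oil-show": {"report-47", "report-55"},
}

def search(*keywords):
    """Intersect the document lists for each key word in turn,
    shortening the candidate list at every step."""
    result = None
    for kw in keywords:
        docs = inverted.get(kw, set())
        result = docs if result is None else result & docs
        if not result:  # list exhausted early: no document can match
            break
    return result or set()
```

Each additional key word can only shrink the candidate set, which is why the manual version of the procedure "usually shortened the list" at every step.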
He asked us if we could computerise the double dictionary approach, which we did, storing the dictionaries on magnetic tape. This method of handling data was adopted world wide by what became the Exxon Corporation, the media of storage being expanded first to magnetic disc and then to optical disc. The developer was moved to the US to head up the operation, followed shortly by one of my staff who had developed the computer programs.
This type of activity was a pleasure and also created its own difficulties. Our technical group was so highly regarded that we were asked to trouble shoot for Exxon around the world. We received so many invitations that we had to learn to say "no". One went to Peru for a while, another to Venezuela, others went elsewhere. We declined invitations to go to the Middle East and Europe, because we had our own company work to do. After I left Imperial, as I have mentioned earlier, one of my former staff was given "carte blanche" to go and clean up the computing mess within the Exxon empire, and apparently it was a real mess. Contrary to popular belief many of the largest companies had, and still have, bureaucratic morasses in their computer operations.
Software development has now been through three cycles and at the time in question we were still in the first cycle. Nothing much has been learned from the three cycles and we are still making a mess of software development in what is now the fourth cycle.
The first computers were very slow by today's standards and had very little memory. In consequence we had to be very innovative and develop small and very compact application programs which, nevertheless, did the job they were designed for. As an example one of our South American affiliates developed a 700 byte input/output control system (IOCS) which handled all the tasks for reading from keyboards, reading and writing to magnetic tape drives, and generating output for printing. This compared favourably with the far more resource consuming IOCS systems generated by the computer manufacturer. The GIRLS program we developed was far more versatile and much smaller than the lumbering Report Program Generator developed by the manufacturer.
As the industry developed certain components became cheaper, and more memory was made available. Rather than use these extra resources judiciously, programmers tended to become sloppy and used them irresponsibly. We thus had the build up to today's monstrous and unreliable operating systems, as well as the bloated application programs that are so prevalent.
The second cycle saw the advent of orbiting satellites, followed by the introduction of personal computers. Again the resources available for computing were limited and application developers had to be innovative. But as hardware became cheaper and more expansive the same sloppiness developed. The third cycle with hand held devices is going the same way.