Developing Applications in DATATRIEVE
DATATRIEVE as a programming language

Joe H. Gallagher, Ph. D.

We use DATATRIEVE because it is a fourth generation language, or at least has most, if not all, of the properties of one. Commands, statements, and clauses such as READY, STORE, MODIFY, PRINT, REPORT, PLOT, and CROSS give DATATRIEVE its power and its productivity advantage over third generation programming languages.

But DATATRIEVE also has just about all of the fundamental language constructs that are available in third generation languages. Like FORTRAN or other third generation languages, DATATRIEVE has language structures which correspond to the concepts of a character set, variable names, data typing, assignment statements, specification statements, control statements, subroutines and functions, input/output statements, and format statements.

To see just how closely some of the programming constructs in DATATRIEVE resemble those of FORTRAN (a widely known and widely used third generation language), I would like to compare the two on a construct-by-construct basis.

The character sets of VAX-DATATRIEVE and VAX-FORTRAN are almost the same: upper and lower case letters, digits, and most special characters are used in both. Only four characters differ between the two languages: the ampersand "&", used in FORTRAN to specify an alternate return label in a subroutine argument list; the vertical bar "|", used in DATATRIEVE as a concatenation operator; the question mark "?", used in DATATRIEVE as an edit-string for missing values; and the at sign "@", used in DATATRIEVE to invoke an indirect command file.

Variable names (called symbolic names in FORTRAN-land) are also almost exactly the same. Both languages allow names of up to 31 characters made up of letters, digits, and the special characters dollar sign ($) and underscore (_). However, DATATRIEVE also accepts the hyphen (-), which it interprets as and translates to an underscore. Both languages require that a name begin with a letter and end with a letter or digit, and the documentation for both languages strongly suggests that users avoid names containing the dollar sign, since such names are reserved for Digital. Also in both languages, it is possible to use a keyword like FIND or END (both of which occur in FORTRAN and DATATRIEVE) as a variable name, ignoring the admonition that a variable name cannot be a keyword, and sometimes get away with it. But in both languages it is very poor programming style and practice to use a variable name that is the same as a keyword; it will nearly always come back to haunt you with unpredictable results or debugging problems later.
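
The naming rules above can be condensed into a short Python sketch. The pattern encodes only the rules as stated in this article, not the full grammar of either product, and the function names are hypothetical.

```python
import re

# Rules as summarized in the article (a sketch, not either product's grammar):
# 1 to 31 characters; letters, digits, "$" and "_"; the first character is a
# letter and the last character is a letter or digit.
NAME_PATTERN = re.compile(r"[A-Za-z](?:[A-Za-z0-9$_]{0,29}[A-Za-z0-9])?")


def dtr_normalize(name):
    """DATATRIEVE accepts '-' in a name and translates it to '_'."""
    return name.replace("-", "_")


def is_valid_name(name, datatrieve=False):
    """Check a name against the shared rules, optionally applying
    DATATRIEVE's hyphen-to-underscore translation first."""
    if datatrieve:
        name = dtr_normalize(name)
    return NAME_PATTERN.fullmatch(name) is not None


def contains_dollar(name):
    """Names containing '$' are legal but reserved for Digital, so flag them."""
    return "$" in name


print(is_valid_name("FOO-REC"))                   # False: '-' is not allowed as-is
print(is_valid_name("FOO-REC", datatrieve=True))  # True: treated as FOO_REC
print(contains_dollar("MY$NAME"))                 # True: legal, but best avoided
```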

Data typing also has correspondences between FORTRAN and DATATRIEVE. Type declarations in FORTRAN have direct analogues in the PICTURE or USAGE clauses of DATATRIEVE record definitions and variable declarations. In fact, the data types

       DATATRIEVE                     FORTRAN

         BYTE                          BYTE 
         WORD                          INTEGER*2 
         LONG, COMP, or INTEGER        INTEGER*4 
         REAL, COMP-1                  REAL*4 
         DOUBLE, COMP-2                DOUBLE PRECISION 
         G_FLOATING                    REAL*8 /G_FLOATING 
         H_FLOATING                    REAL*16 
         PIC X(N)                      CHARACTER*N

have exact correspondences. The data types LOGICAL and COMPLEX in FORTRAN and PACKED, ZONED, QUAD, and DATE in DATATRIEVE do not have exact matches, but QUAD and DATE can easily be handled in FORTRAN as an INTEGER*4 array of dimension 2.
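
As an illustration of the "array of dimension 2" approach, here is a small Python sketch. It assumes that a QUAD or DATE field is a 64-bit little-endian quadword, which a VAX FORTRAN program would see as INTEGER*4 Q(2) with Q(1) holding the low longword. It also assumes that DATE holds standard VMS system time, a count of 100-nanosecond ticks since 17-NOV-1858. The function names are hypothetical.

```python
import struct
from datetime import datetime, timedelta


def quad_to_longwords(value):
    """Split a signed 64-bit value into the two signed 32-bit longwords
    (low, high) that FORTRAN would see as Q(1) and Q(2)."""
    low, high = struct.unpack("<ii", struct.pack("<q", value))
    return low, high


def longwords_to_quad(low, high):
    """Reassemble Q(1), Q(2) into the original 64-bit value."""
    return struct.unpack("<q", struct.pack("<ii", low, high))[0]


# Assumption: VMS system time counts 100 ns ticks from 17-NOV-1858 00:00.
VMS_EPOCH = datetime(1858, 11, 17)


def vms_date_to_datetime(low, high):
    """Convert a DATE held as two longwords into a Python datetime."""
    ticks = longwords_to_quad(low, high)
    return VMS_EPOCH + timedelta(microseconds=ticks // 10)


print(quad_to_longwords(2**32))   # (0, 1): low longword first, as on the VAX
```

Note that the low longword can print as a negative number in FORTRAN, because INTEGER*4 is signed; only the reassembled 64-bit value is meaningful.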

Specification statements in FORTRAN such as BLOCK DATA, COMMON, DATA, DIMENSION, EXTERNAL, IMPLICIT, and INTRINSIC have no real analogues in DATATRIEVE. Specification statements in third generation languages deal mostly with variable storage, allocation, or initialization. In fact, it would be most disappointing if a fourth generation language like DATATRIEVE were not smart enough to manage this sort of thing without help from a programmer.

Explicit assignment statements in DATATRIEVE work conceptually in exactly the same way as assignment statements in FORTRAN and almost all other third generation languages: the expression on the right-hand side of the equals sign is computed and assigned to the storage location named on the left-hand side. There are, however, some very important differences in the internal details of how certain numeric expressions are evaluated. FORTRAN has a set of very rigorous rules for evaluating intermediate or temporary numeric values. In contrast, DATATRIEVE converts all numeric values to DOUBLE PRECISION, evaluates the expression, and then converts the final result to a format and precision determined by a set of internal, intelligent heuristic rules. These heuristic rules are, in some cases, slightly different in DATATRIEVE-11 and VAX-DATATRIEVE.
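
The practical effect of this difference shows up in something as simple as A = J / K with J = 7 and K = 2. The Python sketch below contrasts the two strategies. It models FORTRAN's integer division (truncation toward zero) followed by conversion to REAL, and it models the DATATRIEVE behaviour as described above, with every operand converted to double precision before the division. The final conversion to the target field's format is not modeled here, and the function names are hypothetical.

```python
def fortran_int_div(j, k):
    """FORTRAN integer division: the quotient is truncated toward zero."""
    q = abs(j) // abs(k)
    return q if (j >= 0) == (k >= 0) else -q


def fortran_assign_real(j, k):
    # A = J / K with J, K INTEGER: the division is done in integer
    # arithmetic, and only the result is converted to REAL for A.
    return float(fortran_int_div(j, k))


def datatrieve_assign_real(j, k):
    # As described in the text: operands are converted to double precision
    # first, so the division itself is carried out in floating point.
    return float(j) / float(k)


print(fortran_assign_real(7, 2))     # 3.0
print(datatrieve_assign_real(7, 2))  # 3.5
```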

In DATATRIEVE there is a second kind of assignment statement: the implicit assignment statement implemented by a

         DECLARE variable COMPUTED BY . . .  .

statement. Here, the variable is evaluated whenever it is referenced. Its data type is implied rather than specified; DATATRIEVE makes some intelligent assumptions about the data type of the declared variable.
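
The difference between the two kinds of assignment can be sketched in Python, using a property to stand in for a COMPUTED BY variable. The class and the names PRICE, QTY, and TOTAL are hypothetical and serve only to show when each value is evaluated.

```python
class Workspace:
    """Hypothetical example: total_stored is set by an explicit assignment,
    while total_computed behaves like DECLARE TOTAL COMPUTED BY PRICE * QTY."""

    def __init__(self, price, qty):
        self.price = price
        self.qty = qty
        # Explicit assignment: the expression is evaluated once, right now.
        self.total_stored = self.price * self.qty

    @property
    def total_computed(self):
        # Re-evaluated every time the name is referenced.
        return self.price * self.qty


ws = Workspace(price=10.0, qty=3)
ws.qty = 4
print(ws.total_stored)    # 30.0: still the value from the assignment
print(ws.total_computed)  # 40.0: reflects the current QTY
```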

Control statements are the heart and soul of a programming language. DATATRIEVE has been implemented with a modicum of control statements; most third generation languages are inundated with control statements such as GOTOs which encourage non-structured programming techniques. The carefully planned absence of such control statements in DATATRIEVE guides programmers into the happy realm of structured programming.

The statement block is implemented in DATATRIEVE by the BEGIN-END block. FORTRAN uses the IF THEN, END IF block or the DO or DO WHILE, END DO implied block structures. The allowable contents of the program block in FORTRAN and DATATRIEVE are decidedly different. In FORTRAN, almost any other FORTRAN program statement is allowed within the statement block. In DATATRIEVE, commands (things which change the DICTIONARY or modify the environment) and certain statements such as FIND and SELECT may not be used within a BEGIN-END block. This restriction on the allowed contents of a BEGIN-END block in DATATRIEVE may, at first, appear to be a harsh and artificial restriction; however, it is no real impediment to the cogitative DATATRIEVE programmer.

The IF-THEN-ELSE, conditional execution of a statement or statement block, is implemented in a simple, straightforward way in DATATRIEVE. The corresponding logical branching in FORTRAN is somewhat more complex and is much easier to use in the N-way, rather than 2-way, case. The format of the IF-THEN-ELSE in DATATRIEVE is

   IF boolean-expression [THEN] statement-1 [ELSE statement-2]

When the IF-THEN-ELSE is combined with the BEGIN-END block, the format is

         IF boolean-expression THEN BEGIN
             statement
             . . .
         END ELSE BEGIN
             statement
             . . .
         END

Note that the ELSE keyword appears in the form of END ELSE BEGIN. One of the common DATATRIEVE programming errors occurs when inexperienced DATATRIEVE programmers try to start a line of DATATRIEVE code with an ELSE. The ELSE may occur at the end or in the middle of a line, but may not begin a line.

There are three looping structures in DATATRIEVE: the REPEAT, WHILE, and FOR statements. The REPEAT N BEGIN ... END in DATATRIEVE corresponds exactly to DO I=1, N ... END DO in FORTRAN, although the FORTRAN DO construct is more flexible. The WHILE (condition) BEGIN ... END in DATATRIEVE also corresponds exactly to DO WHILE (condition) ... END DO in FORTRAN. The FOR statement in DATATRIEVE executes the following statement or statement block once for each record in the FOR statement's record stream. The FOR statement has no analogue in any third generation language; in fact, it is the power of statements such as FOR that places DATATRIEVE in the class of fourth generation languages.

The ABORT statement is the last control statement in DATATRIEVE. Its action is quite complex; the VAX DATATRIEVE Reference Manual devotes about two pages to discussing its results. Depending upon whether SET ABORT or SET NO ABORT is in effect, DATATRIEVE either returns to command level or aborts only the current statement. The ABORT statement in DATATRIEVE works a little like RETURN or STOP in FORTRAN, or even a little like a NEXT statement in RATFOR.

Subroutines and functions are common structures in FORTRAN. A procedure in DATATRIEVE is a little like a subroutine; however, it is really more like a macro call that is expanded into in-line code. A procedure has no formal mechanism for passing arguments, nor any facility for dummy arguments. A procedure in DATATRIEVE is really more like an internal subroutine in interpreted BASIC, where the code is interpreted and executed in-line. DATATRIEVE has a rich collection of functions. The functions all begin with FN$ and include mathematical, trigonometric, string, time and date, and process and environmental functions. More advanced users can create new functions that can be called from within DATATRIEVE.
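
The macro-like behaviour of a procedure can be pictured with a tiny Python sketch. Here a procedure is modeled as nothing more than a named list of statement lines that is spliced into the statement stream wherever ":NAME" appears. The procedure library and its statements are hypothetical, and this illustrates the expansion idea, not how DATATRIEVE is actually implemented.

```python
# Hypothetical procedure library: each procedure is just a list of lines.
PROCEDURES = {
    "SET_TAX": ["TAX = PRICE * RATE"],
    "TOTALS": [":SET_TAX", "TOTAL = PRICE + TAX"],
}


def expand(lines, procedures=PROCEDURES):
    """Replace each ':NAME' line with the body of procedure NAME, recursively.

    Nothing is passed as an argument: the expanded lines simply refer to
    whatever variables (PRICE, RATE, TAX, ...) already exist, much as a
    DATATRIEVE procedure works through previously DECLAREd variables.
    """
    out = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(":"):
            out.extend(expand(procedures[stripped[1:].upper()], procedures))
        else:
            out.append(line)
    return out


print(expand(["PRICE = 100", "RATE = 0.05", ":totals", "PRINT TOTAL"]))
```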

Some of the input/output statements and format statements correspond directly between DATATRIEVE and FORTRAN. The PRINT statement in DATATRIEVE corresponds to PRINT and WRITE in FORTRAN, and the EDIT-STRING or FORMAT USING in DATATRIEVE corresponds to FORMAT in FORTRAN. But many DATATRIEVE statements correspond to complete applications in FORTRAN; the REPORT statement, the PLOT statement, or the modification of a field by READY FOO MODIFY; FOR FOO MODIFY USING PRICE = PRICE * 1.05 would require hundreds, if not thousands, of lines of FORTRAN code.

From this feature-by-feature comparison of the language constructs of DATATRIEVE and FORTRAN, it is clear that DATATRIEVE does indeed have all the tools necessary to make "programs" in the third generation sense. But what are the types of applications in DATATRIEVE where one should use third generation programming techniques?

One can answer this question by recalling what DATATRIEVE does best; DATATRIEVE was designed to process record streams. This means that when DATATRIEVE is dealing with a record stream, one record at a time, DATATRIEVE has natural and powerful ways of managing this situation. However, if one needs to manage more than one record at a time, third generation programming is needed.

Consider two examples using records with the following record definition:

         define record foo-record using
         01 foo-rec.
            03 name pic x(30).
            03 address pic x(30).
            03 city pic x(20).
            03 state pic xx.
            03 zip pic 9(5).
         ;

For the first example, suppose that the data for this mailing label application has been accumulated over a long period of time by several different data input clerks. There are now duplicate records in the data base. How does one find the duplicate records (assuming that NAME is the field that determines the duplicates)? Well, one could do the following:

         ready foo
         find foo
         sum 1 by name

and then look for the cases where NAME occurs two or more times. But if the FOO data set has a large number of records, scanning the output will become very, very tedious. A better way is to use third generation programming techniques and let DATATRIEVE find the duplicates. Consider the following procedure

         define procedure find_dups
         ready foo
         declare name1 pic x(30).
         declare name2 pic x(30).
         name1 = " "
         for foo sorted by name begin
             name2 = name
             if (name2 equal name1) then begin
                 print foo-rec
             end
             name1 = name2
         end
         end_procedure

which will list the second and subsequent duplicate records.
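
The FIND_DUPS procedure uses a sort-then-compare-with-previous technique that is independent of DATATRIEVE. The Python sketch below shows it on records held as dictionaries keyed by the FOO-RECORD field names. The record layout as dictionaries and the function name are assumptions made for illustration.

```python
def find_dups(records, key="name"):
    """Return the second and subsequent records whose key matches the key of
    the record before them in sorted order (the FIND_DUPS technique)."""
    dups = []
    previous = None                      # plays the role of NAME1
    for rec in sorted(records, key=lambda r: r[key]):
        current = rec[key]               # plays the role of NAME2
        if current == previous:
            dups.append(rec)
        previous = current
    return dups


records = [
    {"name": "SMITH", "zip": "01003"},
    {"name": "JONES", "zip": "02134"},
    {"name": "SMITH", "zip": "03301"},
    {"name": "SMITH", "zip": "04101"},
]
print([r["zip"] for r in find_dups(records)])   # ['03301', '04101']
```

Starting the "previous" value at None rather than a blank avoids one edge case: a first record whose NAME is entirely blank is not reported as a duplicate.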

For the second example, consider the creation of three-up mailing labels (sorted by zip code and then name) which are 3.5 inches wide and 15/16 inch high by three across. A procedure like

         define procedure do_labels
         ready foo
         set columns-page = 132        ! print line wider than 80
         ! first label
         declare abuff1 pic x(30).
         declare abuff2 pic x(30).
         declare abuff3 pic x(30).
         ! second label
         declare bbuff1 pic x(30).
         declare bbuff2 pic x(30).
         declare bbuff3 pic x(30).
         ! third label
         declare cbuff1 pic x(30).
         declare cbuff2 pic x(30).
         declare cbuff3 pic x(30).
         ! counter
         declare cnt usage is integer.

         cnt = 0
         for foo sorted by zip, name begin
           cnt = cnt + 1
           if (cnt gt 3) then cnt = 1
           if (cnt equal 1) then begin
             abuff1 = name
             abuff2 = address
             abuff3 = city||", "|state|"  "|zip
             bbuff1 = " "              ! these global variables
             bbuff2 = " "              ! are blanked out because
             bbuff3 = " "              ! the FOO data base may
             cbuff1 = " "              ! not contain a number
             cbuff2 = " "              ! of records which are
             cbuff3 = " "              ! evenly divisible by 3
           end
           if (cnt equal 2) then begin
             bbuff1 = name
             bbuff2 = address
             bbuff3 = city||", "|state|"  "|zip
           end
           if (cnt equal 3) then begin
             cbuff1 = name
             cbuff2 = address
             cbuff3 = city||", "|state|"  "|zip
             print abuff1(-),col 36,bbuff1(-),col 72,cbuff1(-)
             print abuff2(-),col 36,bbuff2(-),col 72,cbuff2(-)
             print abuff3(-),col 36,bbuff3(-),col 72,cbuff3(-)
             print skip 2
           end
         end                           ! end of for loop
         if (cnt equal 1, 2) then begin  ! print odd last ones
           print abuff1(-),col 36,bbuff1(-),col 72,cbuff1(-)
           print abuff2(-),col 36,bbuff2(-),col 72,cbuff2(-)
           print abuff3(-),col 36,bbuff3(-),col 72,cbuff3(-)
           print skip 2
         end
         end_procedure

uses a large number of third generation programming statements to handle the situation where one or two records are left over at the end of the file.
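
The same three-up layout can be sketched in Python to make the grouping and padding logic explicit. This sketch makes several assumptions. Records are dictionaries keyed by the FOO-RECORD field names. Labels start at print columns 1, 36, and 72, taken from the PRINT ... COL statements. Two blank lines separate each row of labels, on the assumption that this matches PRINT SKIP 2. The CITY line is simplified and does not reproduce the blank handling of the "|" and "||" operators. All function names are hypothetical.

```python
COLUMNS = (1, 36, 72)   # 1-based starting columns, as in PRINT ..., COL 36, ..., COL 72


def label_lines(rec):
    """The three printed lines of one label (CITY line simplified)."""
    return [rec["name"], rec["address"],
            f'{rec["city"]}, {rec["state"]}  {rec["zip"]}']


def place(texts):
    """Start each text at its column, like PRINT a, COL 36, b, COL 72, c."""
    line = ""
    for col, text in zip(COLUMNS, texts):
        line = line.ljust(col - 1) + text
    return line.rstrip()


def three_up(records):
    ordered = sorted(records, key=lambda r: (r["zip"], r["name"]))
    out = []
    for start in range(0, len(ordered), 3):
        group = [label_lines(r) for r in ordered[start:start + 3]]
        # Pad a short final group with blank labels; this is what blanking
        # BBUFF1-3 and CBUFF1-3 accomplishes in the procedure.
        group += [["", "", ""]] * (3 - len(group))
        for row in range(3):
            out.append(place([label[row] for label in group]))
        out += ["", ""]   # gap between rows of labels (assumed PRINT SKIP 2)
    return out
```

Working in fixed groups of three, with padding for a short final group, removes the need for the separate "print odd last ones" step that the DATATRIEVE version handles after its FOR loop.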

A third excellent example of the third generation language capabilities of DATATRIEVE can be found in the transcription of the Wombat Magic session from the 1987 Spring DECUS Symposium in Nashville. Doug Cropper, one of the developers of VAX DATATRIEVE, presented a method of calculating house mortgage or car payments in DATATRIEVE. The presentation appears in the August 1987 issue of the DECUS U.S. Chapter SIGs Newsletters, Volume 2, Number 12, on pages DTR-7 and DTR-8. Of course, the calculations could have been done just as easily in BASIC or some other third generation language.
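
Cropper's DATATRIEVE code is not reproduced here. For readers who want the arithmetic that such a payment calculation usually rests on, the standard level-payment (annuity) formula is sketched below in Python. It is not necessarily the method presented at the session, and the function name and parameters are hypothetical.

```python
def level_payment(principal, annual_rate_percent, years, periods_per_year=12):
    """Standard level-payment (annuity) formula:
        payment = P * r / (1 - (1 + r) ** -n)
    where r is the interest rate per period and n is the number of payments."""
    r = annual_rate_percent / 100.0 / periods_per_year
    n = years * periods_per_year
    if r == 0:
        return principal / n          # no interest: simple division
    return principal * r / (1.0 - (1.0 + r) ** -n)


print(round(level_payment(1000, 12, 1), 2))   # 88.85: $1000 at 12% for one year
```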

The last example of using third generation programming techniques in DATATRIEVE is the calculation of a mathematical function by infinite series methods. The example I have chosen is the calculation of the Error Function. The Error Function is an integral of the Gaussian or Normal probability density function; it is used in determining cumulative probability. The Error Function is calculated by an infinite series which looks like

 ERF(Z) = (2/SQRT(PI))*(Z - Z^3/(3) + Z^5/(2*5) - Z^7/(3*2*7) + 
         . . . + (-1)^N * Z^(2N+1)/(N!*(2N+1)) + . . .)

where the "^" means raised to the power of, and the "!" means factorial. A procedure to calculate this function is given by

         DEFINE PROCEDURE ERF
         ! N must be declared as DOUBLE outside of ERF and
         !   initialized to 0
         ! X must be declared as DOUBLE and
         !   initialized to the input argument
         ! Y must be declared as DOUBLE; it receives ERF(X)
         ! S must be declared as DOUBLE; it holds the current term
         DECLARE TWOBYROOTPI USAGE IS DOUBLE.
         TWOBYROOTPI = 1.1283791671        ! 2.0/FN$SQRT(PI)
         S = X                             ! first term of the series (N = 0)
         Y = S
         WHILE (FN$ABS(S) GT 1E-11) BEGIN  ! the 1E-11 controls precision
             N = N + 1
             S = -1.0 * S * X * X * (2*N - 1)/(N * (2*N + 1))
             Y = Y + S
         END
         Y = TWOBYROOTPI * Y
         END_PROCEDURE

It is used in the following way:

         DTR> declare x usage is double.
         DTR> declare n usage is double.
         DTR> declare s usage is double.
         DTR> declare y usage is double
         CON> edit-string is 9.9(10)
         CON> query-header is "Error"/"Function" .
         DTR> n = 0
         DTR> x = .25
         DTR> :erf
         DTR> print y

It is correct to 10 decimal places!
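
As a cross-check on the series, the same term-by-term recurrence can be written as a short Python sketch. Each term is obtained from the one before it by multiplying by -X*X*(2N - 1)/(N*(2N + 1)), and summation stops once a term's magnitude drops to 1E-11 or below. The function name erf_series is hypothetical. Python's standard math.erf is used only as the reference value.

```python
import math

TWO_BY_ROOT_PI = 2.0 / math.sqrt(math.pi)


def erf_series(x, tol=1e-11):
    """Sum the infinite series for ERF(X) term by term.

    Each new term is the previous one times -x*x*(2n - 1)/(n*(2n + 1));
    summation stops once a term's magnitude is at or below tol.
    """
    n = 0
    term = x            # the N = 0 term
    total = term
    while abs(term) > tol:
        n += 1
        term *= -x * x * (2 * n - 1) / (n * (2 * n + 1))
        total += term
    return TWO_BY_ROOT_PI * total


print(f"{erf_series(0.25):.10f}")   # 0.2763263902, matching math.erf(0.25)
```

The terms alternate in sign. As a result, the series converges quickly for small arguments such as 0.25 but loses accuracy through cancellation when |X| is large, and the same limitation applies to the DATATRIEVE version.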

I hope the examples have illustrated that when we use a fourth generation programming language like DATATRIEVE we can still use, when necessary, the techniques of third generation languages that have served us so well in the past.

Originally published in the newsletter of the DATATRIEVE/4GL SIG, The Wombat Examiner and 4GL Dispatch, Volume 9, Number 4, pages 2-7; in the Combined SIGs Newsletters of Digital Equipment Computer Users Society, Volume 3, Number 4, December 1987.