How to summarize data

Joe H. Gallagher, Ph. D.


When information is stored within a DATATRIEVE domain in many records, one often needs to summarize data by combining information from records with like values of a break field.  In affect, one would like to be able to do

report a in data-domain ... sorted by break-field
...
at bottom of break-field STORE b in summary-domain using begin
    summary-field      = break-field
    summary-count      = count
    summary-amount     = total amount
    end
end-report

Well, of course, this would be nice, but it doesn't work!  Such a feature (code in darkred) is within the syntax of the DATATRIEVE language, but it can not be using within a REPORT statement.  I did submit a PIR (product improvement request) asking that such a feature be implement with the following syntax.

summarize a in data-domain ... sorted by break-field
...
at bottom of break-field STORE b in summary-domain using begin
    summary-field      = break-field
    summary-count      = count
    summary-amount     = total amount
    end
end-summarize
 
There are two techniques for accomplishing such a data summarization using existing DATATRIEVE syntax.  The first method is to write the data into a text file using the report writer.  Then, create a domain whose file is the report writer output file.  The second method is to use a FOR loop and temporary variables to simulate the work of the report write.  Then, the data is STOREd into the summary domain.

The data-domain is used in each example.  A portion of the record definition of this domain would look like:

define domain data-domain using data-domain-record on ... ;
define record data-domain-record using
01 data-domain-rec.
   03 ...
   03 break-field ...  .
   03 amount ...  .
   03 ...
;
 

Technique 1: Report Writer file

Use the Report Writer to summarize the data and write it into the file TEMP.DAT using

report data-domain sorted by break-field on TEMP.DAT
set no report-header
set lines-page = 99999
at bottom of break-field print col 1, break-field (-),
    space 0, count(-) using 9(5),
    space 0, total amount(-) using 9(8)
end-report

Then, a domain to read the summarized data is:

define domain summary-domain using
    summary-domain-record on TEMP.DAT;
define record summary-domain-record using
01 summary-domain-rec.
   03 break-field ...  .
   03 summary-count pic 9(5).
   03 summary-amount pic 9(8).
;

One of the disadvantages to this technique is that the report writer may create some blank records which start with a space or a form feed at the beginning or end of the file TEMP.DAT.  One must, therefore, remember to explicitly exclude such records using an RSE (record selection expression) of the form

report summary-domain with
    break-field not starting with " ", fn$char(12) ...
 
 

Technique 2: FOR loop and temporary variable with STORE

To simulate what the report writer does, one needs a global variable for each break field and
a global variable for each value to summarize and a flag to sense the first time in the FOR loop.
 

define domain summary-domain using
    summary-domain-record on summary-domain.dat;
define record summary-domain-record using
01 summary-domain-rec.
   03 break-field ...  .
   03 summary-count    usage is long.
   03 summary-amount   usage is long.
;

declare t_break_field ... .
declare t_count      usage is long.
declare t_amount     usage is long.
declare first-time   pic x.

ready data-domain shared read
define file for summary-domain;
ready summary-domain shared write

t_break_field = " "
first-time = "Y"

for a in data-domain sorted by break-field begin
    if (break_field ne t_break_field) then begin
        if (first_time eq "Y") then begin
            first_time = "N"
            end else begin
            store b in summary-domain using begin
                break-field   = t_break_field
                summary_count = t_count
                summary_amount= t_amount
                end
            end
        t_amount    = amount
        t_count     = 1
        end else begin
        t_count = t_count + 1
        t_amount= t_amount + amount
        end
    t_break_field = break_field
    end
store b in summary-domain using begin
    break-field   = t_break_field
    summary_count = t_count
    summary_amount= t_amount
    end
 
Should there be more than one break field and each field is a character string (PIC X(n)), then one can easily extend the code above using

declare super_break_field computed by
    break_field1 | break_field2 | . . . | break_fieldN .

If one or more of the fields you wish to break on is a binary value (USAGE IS BYTE, INTEGER, LONG, QUAD, etc.), then you will need to make comparisons on each of the individual break fields.  When any of the break fields has changed, then store a summary record.  The reason that one can not easily use the super_break_field method above, is that once you have concatenated character string variables and binary variables, it is difficult (it requires some user-written functions) to recover the value of the binary variables from their character string representation.


An originally web publication.  February 15, 1998.

Joe H. Gallagher, Ph. D.
dtrwiz@ix.netcom.com
MAILTO 

BACK  Back