
Considerations on GDGs on Linux

1. Introduction. The mainframe has the notion of GDGs, i.e., Generation Data Groups. These are groups of files which are handled almost as a single entity.

Some notions. The file name of a GDG member consists of three parts:

<GDG-Base>.G<generation>V<version>

The limit is the maximal number of generations we want to keep.

Nr.  Notion               Meaning
1.   GDG base             common prefix
2.   generation           number in range 0001 to 9999, default is 0001; also called absolute generation
3.   relative generation  number in range -9998 to +9998; added to absolute generation
4.   version              number in range 00 to 99, default is 00
5.   limit                maximal number of files for a GDG, e.g., 5

Here is an example of a GDG with base DONKNUTH and limit 3:

  1. DONKNUTH.G0001V00
  2. DONKNUTH.G0002V00
  3. DONKNUTH.G0003V00

On MVS, a generation is accessed using the parenthesis notation with a relative generation.

  1. So DONKNUTH(0) is DONKNUTH.G0003V00. This is the "current" generation.
  2. Similarly, DONKNUTH(-1) is DONKNUTH.G0002V00. This is the "previous" generation.
  3. And DONKNUTH(-2) is DONKNUTH.G0001V00. This is the generation before the previous one.
  4. A new generation is created by specifying DONKNUTH(+1). That would be DONKNUTH.G0004V00.

Here is another example of a GDG with base DRitchie and limit 8:

  1. DRitchie.g9996v00
  2. DRitchie.g9997v00
  3. DRitchie.g9998v00
  4. DRitchie.g9999v00
  5. DRitchie.g0001v00
  6. DRitchie.g0002v00
  7. DRitchie.g0003v00
  8. DRitchie.g0004v00

The generation number 0000 is not used. In the above example DRitchie.g0000v00 will never exist, i.e., absolute generations jump from 9999 directly to 0001.
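The wraparound rule can be sketched in a few lines of C. This is only an illustration, not the actual gdg source: the names gdg_resolve and gdg_filename are hypothetical, the version is assumed to be fixed at 00, and the lowercase suffix follows the DRitchie example.

```c
#include <assert.h>
#include <stdio.h>

#define GDG_MAX_GEN 9999   /* absolute generations run 0001..9999; 0000 is never used */

/* Map the current absolute generation plus a relative generation to the
 * resulting absolute generation, wrapping 9999 -> 0001 and 0001 -> 9999. */
int gdg_resolve(int current, int relative)
{
    assert(current >= 1 && current <= GDG_MAX_GEN);
    assert(relative > -GDG_MAX_GEN && relative < GDG_MAX_GEN);

    /* Shift to a 0-based value, add the offset, and wrap.
     * C's % can yield a negative result, so fix that up explicitly. */
    int zero_based = (current - 1 + relative) % GDG_MAX_GEN;
    if (zero_based < 0)
        zero_based += GDG_MAX_GEN;
    return zero_based + 1;
}

/* Write "<base>.g<generation>v00" into buf; returns what snprintf() returns. */
int gdg_filename(char *buf, size_t size, const char *base, int generation)
{
    return snprintf(buf, size, "%s.g%04dv00", base, generation);
}
```

With current generation 9999 and relative generation +1, gdg_resolve() yields 1, so gdg_filename() would produce DRitchie.g0001v00 for base DRitchie.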

The version of a GDG file is of no relevance for the following discussion. Its purpose is to mark a specific generation as altered. For example, you could change DRitchie.g0003v00 to DRitchie.g0003v01. In that case DRitchie.g0003v00 will be gone. Only DRitchie.g0003v01 remains, along with the other generations, of course.

See Absolute generation and version numbers.

2. Solution sketch. This short memo outlines how GDGs can be implemented on Linux with the help of SQLite. SQLite is particularly suited to this task as it provides:

  1. Concurrency control, i.e., protection against parallel tasks accidentally overwriting each other
  2. Keeping all information in a single file

The solution is now to have two sets of files:

  1. A number of data files, which actually hold the data, named xxx.g0001v00, xxx.g0002v00, xxx.g0003v00, etc.
  2. An SQLite file xxx.db for managing the generations, i.e., recording which absolute generation number is the current one

The version part is kept constant at v00. We stick to lowercase file suffixes. Mass lowercasing of file names describes an approach to lowercase a huge number of files.

In addition there will be a single binary gdg, which controls the access to the above sets of files. Program gdg is entirely controlled by command line arguments.

gdg is written in C. It accesses the SQLite database file and returns a string, which is the file name of the generation in question. As gdg is used in many shell scripts, its startup time is important, and other languages are too slow for this purpose, see Performance Comparison C vs. Lua vs. LuaJIT vs. Java.

gdg takes two input strings via command line:

  1. The name of the SQLite database for the GDG, e.g., DONKNUTH.db
  2. The requested relative generation number, e.g., +2, or simply 2

For the example with DONKNUTH.db and +2, it will return

DONKNUTH.G0005V00

Likewise:

echo $(gdg DONKNUTH.db -2)

will print DONKNUTH.G0001V00.
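Parsing the second argument could look roughly like the following sketch. The function name gdg_parse_relative is hypothetical, and the accepted range of -9998 to +9998 is taken from the table in the introduction. strtol() already accepts an optional leading sign, so "+2" and "2" are treated alike.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a relative generation such as "+2", "2" or "-2" into *out.
 * Returns 0 on success, -1 if the text is malformed or out of range. */
int gdg_parse_relative(const char *arg, int *out)
{
    assert(arg != NULL && out != NULL);

    char *end;
    errno = 0;
    long value = strtol(arg, &end, 10);

    /* Reject overflow, empty input, and trailing garbage like "2x". */
    if (errno != 0 || end == arg || *end != '\0')
        return -1;
    /* Relative generations are limited to -9998..+9998. */
    if (value < -9998 || value > 9998)
        return -1;

    *out = (int)value;
    return 0;
}
```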

To create an entirely new GDG, use command line flag -c and specify the GDG base and limit. E.g., creating the above example DRitchie with limit 8:

gdg -c DRitchie.db 8

If the limit is omitted, it is assumed to be 1.

Similarly, the limit is changed by specifying command line argument -c. If the new limit is smaller than the previous one, multiple files might get deleted by gdg.

gdg does not use the parenthesis notation from MVS, as parentheses have a special meaning in the shell, and calling gdg from a shell is the most common use case.

It should be noted that gdg just reads strings and produces strings. It does not produce any files. However, gdg deletes files! I.e., according to the specified limit, surplus files are deleted. Forever. Whenever a positive relative generation is specified, all files which do not match the limit criterion are deleted by gdg.

Deleting the surplus files is done using scandir() and unlink().
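A minimal sketch of such a cleanup is shown below. It is not the actual gdg code. The names gdg_prune and gdg_generation_of are hypothetical, and the rule used here is an assumption: a file is kept if its generation lies within the last `limit` generations ending at the current one, counted with wraparound, and everything else matching <base>.gNNNNv00 is removed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Return the generation encoded in "<base>.gNNNNv00", or -1 if name does not match. */
static int gdg_generation_of(const char *name, const char *base)
{
    size_t n = strlen(base);
    if (strncmp(name, base, n) != 0 || strlen(name + n) != 9)
        return -1;
    const char *s = name + n;               /* expected: ".gNNNNv00" */
    if (s[0] != '.' || s[1] != 'g' || strcmp(s + 6, "v00") != 0)
        return -1;
    int gen = 0;
    for (int i = 2; i < 6; i++) {
        if (!isdigit((unsigned char)s[i]))
            return -1;
        gen = gen * 10 + (s[i] - '0');
    }
    return gen >= 1 ? gen : -1;             /* generation 0000 is never valid */
}

/* Delete every "<base>.gNNNNv00" in dir that is older than the `limit`
 * newest generations ending at `current`. Returns the number of deleted
 * files, or -1 if the directory cannot be read. */
int gdg_prune(const char *dir, const char *base, int current, int limit)
{
    assert(limit >= 1 && current >= 1 && current <= 9999);

    struct dirent **entries;
    int count = scandir(dir, &entries, NULL, alphasort);
    if (count < 0)
        return -1;

    int deleted = 0;
    for (int i = 0; i < count; i++) {
        int gen = gdg_generation_of(entries[i]->d_name, base);
        if (gen > 0) {
            /* Age 0 is the current generation, 1 the previous one, etc. */
            int age = (current - gen + 9999) % 9999;
            if (age >= limit) {
                char path[4096];
                int len = snprintf(path, sizeof path, "%s/%s", dir, entries[i]->d_name);
                if (len > 0 && (size_t)len < sizeof path && unlink(path) == 0)
                    deleted++;
            }
        }
        free(entries[i]);
    }
    free(entries);
    return deleted;
}
```

For DONKNUTH with limit 3 and current generation 4, this would remove DONKNUTH.g0001v00 and keep generations 2 to 4. Files that do not follow the naming pattern are left alone.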

gdg can take an optional third argument: a program name. This is just a string stored in the genhist table, see below.

3. Data model. The SQLite database stores the following table called genmgt ("generation management").

Nr.  Column      Type  Nullable  Example or meaning
1    base        text  not null  GDG base: file name excluding directory, e.g., DONKNUTH
2    generation  int   not null  current absolute generation, e.g., 73; starts with 1
3    limit       int   not null  total number of generations allowed, e.g., 5; default is 1

As only the file name without path of the GDG is stored, the GDG can be moved around the filesystem.

We might be interested in the history of accesses, i.e., who accessed the GDG with what program or job. For this we can use the table genhist ("generation history"). This table is entirely optional.

Nr.  Column      Type  Nullable  Example or meaning
1    generation  int   not null  historic absolute generation number, e.g., 71
2    uid         int   not null  user-id from getuid()
3    gid         int   not null  group-id from getgid()
4    pgmname     text  null      name of program or job, which accessed the i-th generation
5    atime       date  not null  access time

The following SQL statements will be used:

  1. Create an entirely new GDG base: insert into genmgt (...)
  2. Specify relative generation, e.g., (+1):
    • select generation, "limit" from genmgt
    • update genmgt set generation = ...
  3. Change limit: update genmgt set "limit" = ...

As limit is an SQL keyword, the column name has to be written in double quotes.
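Spelled out, the schema and the statements might look as follows, collected as C string constants that gdg could hand to sqlite3_prepare_v2(). This is a sketch under assumptions: the enum and the function gdg_sql are hypothetical names, the ? placeholders are meant to be bound with the sqlite3_bind_* functions, and the default current_timestamp for atime is an illustrative choice, not something the data model above prescribes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical identifiers for the statements gdg needs. */
enum gdg_stmt {
    GDG_CREATE_GENMGT,
    GDG_CREATE_GENHIST,
    GDG_INSERT_BASE,
    GDG_SELECT_STATE,
    GDG_UPDATE_GENERATION,
    GDG_UPDATE_LIMIT,
    GDG_INSERT_HISTORY,
    GDG_STMT_COUNT
};

/* SQL text per statement; "limit" is quoted because LIMIT is an SQL keyword. */
static const char *const gdg_sql_text[GDG_STMT_COUNT] = {
    [GDG_CREATE_GENMGT] =
        "create table if not exists genmgt ("
        " base text not null,"
        " generation int not null default 1,"
        " \"limit\" int not null default 1)",
    [GDG_CREATE_GENHIST] =
        "create table if not exists genhist ("
        " generation int not null,"
        " uid int not null,"
        " gid int not null,"
        " pgmname text,"
        " atime date not null default current_timestamp)",
    [GDG_INSERT_BASE] =
        "insert into genmgt (base, generation, \"limit\") values (?, ?, ?)",
    [GDG_SELECT_STATE] =
        "select generation, \"limit\" from genmgt",
    [GDG_UPDATE_GENERATION] =
        "update genmgt set generation = ?",
    [GDG_UPDATE_LIMIT] =
        "update genmgt set \"limit\" = ?",
    [GDG_INSERT_HISTORY] =
        "insert into genhist (generation, uid, gid, pgmname) values (?, ?, ?, ?)",
};

/* Look up the SQL text for a statement identifier. */
const char *gdg_sql(enum gdg_stmt id)
{
    assert((int)id >= 0 && (int)id < GDG_STMT_COUNT);
    return gdg_sql_text[id];
}
```

To protect parallel tasks from each other, the select and the update for a relative generation would presumably be wrapped in a single transaction.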

4. Outlook. Each GDG element might be a so-called partitioned dataset. In UNIX jargon this would be a directory. So ideally gdg can also handle a GDG of directories.

5. Effort estimation. Program gdg can be written in less than three man-days.