Stallmann, Brglez, Ghosh: Data Formats

The actual data used and generated to obtain the results reported in the paper can be reconstructed using the random number seeds in the file SEEDS_ISO (for isomorphic classes) or SEEDS_RND (for random graph classes). Each line of these files has the name of a class followed by the three 16-bit unsigned integers used to construct the seed for the IEEE 48-bit random number generator (rand48 library).

The data directory (after the desired bigraph classes are generated or the top-level directory in the other archive) contains a subdirectory for each bigraph class. These subdirectories are named as follows (using the organization of Table 2 in the paper). In all cases k is a two-digit encoding of the rank, padded with a 0 when needed.

The notation GB refers to the fact that these bigraphs are "balanced", i.e. b = 1 (each layer has the same number of nodes, off by one if there's an odd number). The "rnd0" really means "d = infinity" (or, originally, "d is not relevant"). A class iso(R), where R is a random class, is denoted by adding _scr (for "scrambled") as a suffix to the name of the directory for R.

Before any experiments are executed, each class directory with name C contains two files for each subject, one is C_s.dot, where s is the subject sequence number, encoded in 4 digits padded with 0's, and the other is C_s.dot.ORD. The first describes the bigraph itself; the second describes the permutation of each layer. C_s.dot contains a description of the bigraph in simplified dot format (based on the paper by Gansner et al.; our parser can deal with more general formats, but will ignore any nonessential information about the geometry of nodes).

Dot format.

The expected format of our dot files is

where each statement defines an edge

The graph name and node names must be legal C identifiers (begin with an alphabetic character, have any number of alphanumeric characters, and underscore counts as an alphabetic character). Arbitrary white space is allowed between and within statements. Comments delimited by /* ... */ count as white space.

Graphs generated by our programs always have a comment giving the three short integers of the rand48 seed right after the graph name. The graph name is usually the same as the file name (with the .dot extension).

Ord format.

An ord file contains a sequence of layer descriptions separated by white space. Each layer description is of the form where the layer_number is an integer (0 or 1 in the case of bigraphs, but the notation can be generalized to more than 2 layers) and node is the name of a node on the given layer. The current implementation requires that all nodes of the layer be present in the list and occur exactly once. The list specifies the left-to-right ordering of nodes on the given layer.

Anything between a # and the end of the line is interpreted as a comment. Arbitrary white space and/or comments can occur before the {, between nodes, or on either side of a layer description.

Other data files.

The program that executes our treatments generates an ord file with name graph_name_trt.ord where t is a 4-digit encoding of the treatment number. The execution scripts delete this file after it is no longer needed, but they can be edited to omit this step.

When a treatment is executed on subjects of a class, the execution scripts create a file TRt in the class directory, where t is a 2-digit encoding of the treatment number. The file has two columns (tab-separated), with graph names in the first column, and number of crossings after the treatment was applied in the second.

Summary files, created by the script sbg_summary, and containing summary information for a given class, have names of the form SUMMARY_class-directory. There are also files that record information about total time and number of iterations, called TIMING_SUMMARY_class-directory and ITR_SUMMARY_class-directory, respectively.

Files generated for comparison with the Jünger-Mutzel paper begin with the prefix GN_JM_. This is followed by S_size for the sparse graphs, where size is the number of nodes per layer, or 10_density for 10+10 graphs of increasing density. Similar graphs with prefix GC_JM_ are generated, as are most of our graphs, in such as way as to ensure connectedness.

There are also a few classes with prefix GN_ replacing the usual GB_ in the random classes. These, like the graphs used in the Jünger-Mutzel paper, use a simpler random-graph generation technique that does not ensure connectedness.


Matthias Stallmann ( Matt_Stallmann@ncsu.edu)
Last modified: Mon Mar 19 11:53:28 2001