Title: ManyBugs/README
Description: README for the ManyBugs dataset 
Author: Claire Le Goues, clegoues@cs.cmu.edu (borrowing somewhat from
unpublished text produced by Neal Holtschulte)
Date: June 1, 2014
Updated: January 6, 2016
Version: beta-2.0

*********
* 0. LICENSE
*********

See toplevel directory for license information.


*********
* 1. OVERVIEW
*********

This directory contains experimental scenarios and baseline results of running
AE, GenProg, and TSPRepair for the 185 defect scenarios in the ManyBugs dataset.  This
README is a roadmap to the materials in this directory.  More information is
available with the article; in other READMEs in the autorepairbenchmarks
package; and via email.  Always feel free to contact me/us with questions,
comments, complaints, pointers, and suggestions.  However, please read this
entire README first, because there are pointers throughout that might help you
make quick sense of some of the peculiarities of the data and materials in the
directory.

NOTE: I use the names of the three tools somewhat interchangeably throughout
this README because they're all built on top of the same internals; running each
involves passing different arguments to the same executable.

The ManyBugs directory contains two subdirectories: scenarios/ and results/.

*********
* 2. SUMMARY DATA
*********

bug_data.csv contains information on all of the defects in the ManyBugs dataset.
The data is also available in each of the scenario tarballs (see below), but you
may find it helpful to look at the summary data.  The first column is the
scenario name, corresponding to the tarball/directory name. Each subsequent
column is roughly self-explanatory (and labeled).  Additional detail on the
content currently in this spreadsheet is explained below in the context of the
bug-info provided with the tarball. 
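
As a quick way to get oriented, the sketch below prints the column headers of
bug_data.csv and the scenario names from its first column.  It assumes a plain
comma-separated file with a single header row and no quoted commas in the
header or the first column; the function names are illustrative and not part of
the dataset.

```shell
# List the column names in the header row, one per line.
# Assumes the header contains no quoted commas.
csv_columns() {
    head -n 1 "$1" | tr ',' '\n'
}

# Print the first column (the scenario name) for every data row.
csv_scenarios() {
    tail -n +2 "$1" | cut -d, -f1
}
```

For example, "csv_columns bug_data.csv" shows which labeled columns are
available before you open the file in a spreadsheet.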

*********
* 3. SCENARIOS
*********

scenarios/ contains the defect scenarios, packaged in .tar.gz form.  Each
scenario is named following a convention indicating the software system name and
revision identifiers in question.  For software associated with git
repositories, we include the date of the fix commit in the scenario name to
impose an easily-identifiable temporal ordering on the defects in each program
set.  

NOTE: because we parallelized development and experiments, there are cases where
the filenames corresponding to results on git-based scenarios in the results/
directory do not contain the dates.  The revisions are the same and the baseline
results are valid, and only the naming differs. You can refer to the SHA
identifiers for the revisions to match up the results with the scenarios.
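
One way to line up a dated scenario name with an undated results filename is to
drop the date component and compare what remains.  The sketch below assumes the
date appears as YYYY-MM-DD immediately after "-bug-", as in
libtiff-bug-2005-12-27-6f76e76-5dac30f; the function name is illustrative.

```shell
# Remove a YYYY-MM-DD date that directly follows "-bug-" in a scenario name.
# Names without such a date pass through unchanged.
strip_scenario_date() {
    printf '%s\n' "$1" |
        sed -E 's/-bug-[0-9]{4}-[0-9]{2}-[0-9]{2}-/-bug-/'
}
```

For example, "strip_scenario_date libtiff-bug-2005-12-27-6f76e76-5dac30f" prints
libtiff-bug-6f76e76-5dac30f, which can then be compared against the leading part
of the filenames under results/.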

If you'd like to use these scenarios with our virtual machine images, you *must*
untar them in /root/genprog-many-bugs/.  If not, you must reconfigure the
programs yourself, and hunt through the scripts to find the places where we
hard-coded paths.  This is doable, but annoying.
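
A minimal sketch of unpacking a scenario into the location the VM scripts
expect.  The function name and the optional second argument are illustrative;
the default destination is the path given above.

```shell
# Unpack one scenario tarball.
# $1 = path to the scenario .tar.gz
# $2 = destination directory (defaults to the path used on the VM images)
unpack_scenario() {
    dest=${2:-/root/genprog-many-bugs}
    mkdir -p "$dest" && tar -xzf "$1" -C "$dest"
}
```

Passing a different destination is only useful if you are prepared to fix the
hard-coded paths mentioned above.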

Note also that genprog-many-bugs on the VM you're using may include scripts and
so on related to benchmark setup.  Please ignore them; they're not related to
the use of the tarballs here.

Each scenario contains roughly the following contents, with some variation in
support (such as for program-specific compilation challenges) and byproduct
(from GenProg/AE setup runs) files.  This list is organized somewhat by topic
area/importance and then in order based on how important I, Claire, think it
will be to you, reader:

** Program source and compilation **

*** General materials  ***

program/ 
  The program source code tree, checked out and configured to build at the
  defect revision.

bug-info/ 

  log-msg.txt contains the log message associated with the fix revision
  (available in the majority of cases).  

  scenario-data.txt contains information about the defect scenario, including
  the version number of the ManyBugs release from which the scenario was taken
  (currently beta-2.0).  We provide links to bug database entries where we
  could find them (in a couple of cases).  We list the files or script code
  that correspond to the failing test cases.  We looked at the human patches to
  characterize which code constructs they syntactically modify.  We manually
  classified each defect to the best of our ability based on the log message,
  diffs, test cases, forum posts, etc., doing our best to be consistent with
  the tags we use, so that you can slice the scenarios however is most suitable
  for your work.  If you come up with new tags, new columns, or new
  classification schemes, we'd love to hear from you!

  We include PDFs of old forum posts or bug database entries when they're
  informative but difficult to find on the modern internet. 

diffs/
  contains subfolder(s) mimicking the path to the buggy file, files, or module
  to be repaired.  Within the subfolder are three files: the buggy version of
  the C file to be repaired, the *human-repaired* version of that C file, and a
  file containing the output of a diff of the two.  This folder is useful for
  examining the human-made changes that repaired a bug.  Note that these files
  are not preprocessed; they correspond to the output of diff as applied to the
  code that arrives from the repository at the revision numbers in question.

  For example, a hypothetical scenario php-bug-324110-324112 (back when php had
  an svn repository!) would have a diffs folder with the following contents:
  
  diffs/
    Zend/
      zend_exceptions.c-324110
      zend_exceptions.c-324112
      zend_exceptions.c-diff
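
  To compare the two revisions yourself, you can rerun diff on the pair of
  files.  The sketch below assumes the naming shown in the hypothetical example
  above (the source file name followed by a revision suffix).  The function
  name and the choice of unified output are illustrative; the shipped -diff
  file may use a different diff format.

```shell
# Show a unified diff between the buggy and human-repaired copies of a file.
# $1 = path prefix (e.g. diffs/Zend/zend_exceptions.c)
# $2 = buggy revision suffix, $3 = fixed revision suffix
# Note: diff exits with status 1 when the files differ, which is the
# expected case here.
show_human_fix() {
    diff -u "$1-$2" "$1-$3"
}
```

  For the hypothetical php scenario, this would be called as
  "show_human_fix diffs/Zend/zend_exceptions.c 324110 324112".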

preprocessed/
  contains the bug-implicated files at the buggy revision *as passed through the
  C preprocessor*.  Can be copied right in place into program/ to build it at
  the buggy revision afresh.  These files are what GenProg/AE uses as input, and
  unless you have a *very* fancy C parsing engine, you will probably do the same
  or similar.  These files correspond to their counterparts in diffs/ at the
  buggy revision, but after running through the C preprocessor and thus will
  look very different (like fixed/ at the fix revision).

fixed/
  contains the *preprocessed* code at the *human-repaired* revision.  This
  directory contains files that also appear in diffs/ (those corresponding to
  the fix revision), *but* they have been through the C preprocessor.  These are
  not GenProg/AE fixes.

bugged-program.txt
  lists the files in preprocessed/.  We have presliced the programs to contain
  only the implicated module(s); if you'd like to try to repair the entire
  program, refer to the program/ source tree, which is checked out and
  configured at the buggy revision.  *Don't delete this* if you intend to use
  compile.pl to compile program variants.

  For GenProg/AE users: this file is passed as input to GenProg/AE to specify
  the program under repair (--program)
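
  Putting preprocessed/ and bugged-program.txt together, restoring the buggy
  preprocessed files into the source tree might look like the sketch below.  It
  assumes bugged-program.txt holds one relative path per line and that the same
  relative path is valid under both preprocessed/ and program/; check a
  scenario's actual layout before relying on it.  The function name is
  illustrative.

```shell
# Copy each file listed in bugged-program.txt from preprocessed/ into program/.
# Run from the top of an unpacked scenario directory.
# Assumes one relative path per line, identical under both directories.
restore_buggy_sources() {
    while IFS= read -r rel; do
        [ -n "$rel" ] || continue               # skip blank lines
        cp "preprocessed/$rel" "program/$rel" || return 1
    done < bugged-program.txt
}
```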

compile.pl/*compile-helper*/etc
  compilation scripts.  compile.pl is the toplevel script for compiling a
  program variant; see the configuration file associated with a scenario for the
  exact script/format.  compile.pl takes the name of an executable at the top
  level of a subdirectory that holds the source code corresponding to a
  variant.  It refers to bugged-program.txt to check which files should be
  included in a variant, and then copies those files from the base variant
  directory into the program source tree, where it then recompiles the program.

  The argument style is mostly for legacy reasons and for interoperability with
  various GenProg machinations, but can be trivially worked around. For some
  history/an illustration: for defects that compile all of their source code
  into one new executable per variant, the argument passed to compile.pl would
  be the name of the executable to be compiled.  Thus, to compile variant 0,
  GenProg makes a folder 000000 to hold the source code corresponding to the
  variant as well as byproducts of the build process, and passes the argument
  000000/000000 to the compile script.  compile.pl just hacks this naming
  convention to find the source files and then builds in the program/ source
  directory, to save time and configuration energy.
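
  The naming convention above can be reproduced with a small helper.  This is a
  sketch: it assumes variant directories use six-digit zero padding (as in the
  000000 example) and that compile.pl is invoked through perl from the scenario
  directory; consult the scenario's configuration file for the exact
  invocation.  Both function names are illustrative.

```shell
# Build the zero-padded "NNNNNN/NNNNNN" argument for variant number $1.
# Pass a plain decimal integer without leading zeros.
variant_arg() {
    printf '%06d/%06d\n' "$1" "$1"
}

# Compile the variant whose sources already sit in the NNNNNN/ directory.
# Run from the top of an unpacked scenario directory.
compile_variant() {
    perl compile.pl "$(variant_arg "$1")"
}
```

  For variant 0, variant_arg prints 000000/000000, matching the argument
  described above.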

local-root
  install directory for scenarios that require a local prefix when configuring
  or building (valgrind, gzip, etc).

*** More esoteric material included primarily for GenProg legacy purposes ***

fixed-program.txt
  lists the files in fixed/.  Included for compatibility with our GenProg ICSE
  2012 experiments.

fault.lines/fix.lines
  filenames and line numbers that the human modified in the buggy revision (and
  that are modified in the fixed revision).  Each line contains a filename, a
  line number, and a floating point number that we have used as a weight in
  certain GenProg experiments appearing in ICSE 2012; non-GenProg users will
  probably ignore the weights, at least, if not both of these files altogether.
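
  If you only want the locations and not the weights, something like the
  following sketch may help.  The field separator is an assumption: the
  function accepts commas and/or whitespace between the three fields, so check
  it against an actual fault.lines file.  The function name is illustrative.

```shell
# Print "file:line" for each entry, ignoring the weight.
# $1 = fault.lines or fix.lines
# Accepts fields separated by commas and/or whitespace (an assumption).
fault_locations() {
    awk -F'[, \t]+' '{ sub(/^[ \t]+/, "") }
                     NF >= 2 { print $1 ":" $2 }' "$1"
}
```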


** Program and variant testing ** 

test.sh
  Script to run test cases on program variants.  Takes as arguments a test name
  and the name of a compiled executable to test; it *mostly* (see below) ignores
  the latter and, for each program scenario, assumes that it is testing the
  program as compiled in the source tree.  You need to provide something for the
  second argument, however, because when used by GenProg/AE, test.sh checks the
  name of the executable to determine if the test cases are being used to
  compute coverage information (and thus will complain if that argument is
  empty).  Providing random garbage as that argument is fine for non-GenProg/AE
  users.  Test cases are preceded by 'p' for initially passing tests and 'n' for
  the initially failing tests (those corresponding to the bug under repair).
  Tests are 1-indexed within each p/n class.  The mapping between the integers
  and the actual test cases for each scenario is somewhat arbitrary; each
  program has a different means for executing its test suite, and we had to
  break down those test suites into individual tests in some manner or another.
  Returns 0 if a test case passes and non-zero if it fails.  Can be run either
  by another repair program or manually, such as:

  $ ./test.sh p1 /root/mountpoint-genprog/genprog-many-bugs/wireshark-bug-37112-37111/sanity/./repair.sanity
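
  To run a whole suite rather than a single test, a loop along the following
  lines may be useful.  This is a sketch: the numbers of positive and negative
  tests are supplied by the caller, the word "placeholder" stands in for the
  second argument (which, as noted above, only needs to be non-empty for
  non-GenProg/AE use), and the function name is illustrative.

```shell
# Run tests p1..pN and n1..nM through a test script and print one line each.
# $1 = test script (e.g. ./test.sh)
# $2 = number of positive tests, $3 = number of negative tests
run_all_tests() {
    for spec in "p:$2" "n:$3"; do
        kind=${spec%%:*}
        count=${spec##*:}
        i=1
        while [ "$i" -le "$count" ]; do
            # Exit status 0 means the test passed; anything else is a failure.
            if "$1" "$kind$i" placeholder >/dev/null 2>&1; then
                echo "$kind$i PASS"
            else
                echo "$kind$i FAIL"
            fi
            i=$((i + 1))
        done
    done
}
```

  On an unrepaired scenario you would expect the p tests to pass and the n
  tests to fail.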

limit.c/limit
  a small C program that is compiled to the limit executable; helps sandbox test
  cases by limiting cpu time that they can consume, guarding against infinite
  loops.  If a test case exceeds a time limit (program-specific), the script
  terminates the test case and returns an exit code indicating failure. 
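
  For readers who want a similar guard without the limit executable, the sketch
  below uses the shell's built-in ulimit to cap CPU time.  It is an analogous
  technique, not a description of how limit.c works, and the function name is
  illustrative.

```shell
# Run a command with a cap on the CPU seconds it may consume.
# $1 = CPU-time limit in seconds; remaining arguments = command to run.
# The function body is a subshell, so the limit does not affect the caller.
run_with_cpu_limit() (
    ulimit -t "$1" || exit 125
    shift
    exec "$@"
)
```

  A command that exceeds the limit is killed by the kernel, so the function
  returns a non-zero status, which a test harness can treat as a failure.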

bug-failures/fix-failures
  text files listing the integer-indexed test cases failed by the program at the
  bug and fix revisions, respectively.  Test cases are referenced by positive
  integer, one number per line (same naming scheme as in test.sh).  The integers
  correspond to the (positive) test numbering in test.sh.  Note that test cases
  are 1-indexed, for legacy reasons.  Used to determine the initially passing
  and initially failing test cases for a scenario (basically bug-failures -
  fix-failures = bug test cases).  Test cases that both revisions fail are
  excluded from consideration.
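
  The set difference described above can be sketched as follows.  It assumes
  each file holds one integer per line, as stated; the function name is
  illustrative.

```shell
# Print the test numbers that fail at the bug revision but not at the fix
# revision, i.e. the tests that expose the bug under repair.
# $1 = bug-failures, $2 = fix-failures
negative_tests() {
    awk 'FILENAME == ARGV[1] { fixed_fail[$1] = 1; next }
         NF && !($1 in fixed_fail) { print $1 }' "$2" "$1"
}
```

  Tests that appear in both files are dropped by this difference, which matches
  the exclusion rule above.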


** GenProg/AE support and byproducts **

configuration-default, configuration-oracle
  configuration files for GenProg v2.0 containing default settings; GenProg/AE
  allows arguments to be passed either via command line or via configuration
  files.  Note that these files do not contain the full configuration of the
  experiments for which results appear in this directory; consult the output
  logs of those runs to see the full specification.  We typically use these as a
  starting point (as they contain the program to be repaired, the mechanisms for
  running the test script, etc), and add extra arguments after it in the call to
  GenProg, for example:

  $ /PATH/TO/GENPROG/repair configuration-default --seed 1 --search ga --appp 0.35 

  Arguments passed to GenProg/AE after the configuration file supersede those in
  the config file.  The oracle configuration is included for legacy purposes,
  but the experiments we discuss in the benchmarks journal article all build on
  the default config.
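
  Because later arguments override the configuration file, running several
  seeds only requires varying --seed.  The sketch below prints one command line
  per seed rather than executing anything.  It uses only the --seed and
  --search options shown above; the seed range is chosen by the caller and the
  function name is illustrative.

```shell
# Print one GenProg command line per seed, for seeds $2 through $3.
# $1 = path to the repair executable
genprog_seed_commands() {
    seed=$2
    while [ "$seed" -le "$3" ]; do
        echo "$1 configuration-default --seed $seed --search ga"
        seed=$((seed + 1))
    done
}
```

  Review the printed commands before running them; the full option set used in
  our experiments is recorded at the top of each debug log.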

coverage.path.neg and coverage.path.pos
  each contain a list of integers corresponding to statements executed by the
  failing and passing test cases, respectively.  The numbering scheme comes from
  GenProg/AE/CIL internals.

coverage/
  Byproduct of GenProg/AE, used to compute coverage information.  A version of
  the source files under repair that has been instrumented to print out
  statement-identifying information.  Non-GenProg/AE users should feel free to
  ignore. 

repair.cache
  the GenProg/AE test cache.  Does not necessarily contain a repair; the naming
  is misleading, for legacy purposes.  Can (and often should, for fresh runs) be
  removed without harm, but can also speed up your GenProg/AE runs.

default.cache
  the GenProg representation cache corresponding to the default configuration
  setup.  Possibly useful even to non-GenProg users because if you demarshal it
  appropriately (check out the GenProg code to find the original data structure
  setup), you can map integers to the statement numbers in the coverage.path.*
  files.

repair.debug.#
  repair.debug.# is a file produced by running GenProg/AE with a given random
  seed #. For example, if the random seed is zero, then repair.debug.0 is the
  name of the file that will be produced. repair.debug files contain the
  output of a GenProg or AE run.  If a scenario tarball includes one, it is more
  than likely leftover from a "sanity check" or coverage-producing run conducted
  while setting up the scenarios.  

sanity/
  source code corresponding to a variant of the program that was used for sanity
  checking by a previous AE/GenProg run; a byproduct.  Sometimes useful to
  consult if you are running AE/GenProg experiments, since the code has been
  preprocessed and passed through CIL, and thus may be more easily comparable to
  variant code as emitted by GenProg.  For other experiments, you can probably
  remove it, and can certainly ignore it.


*********
* 4. RESULTS
*********

results/ contains results, organized by bug repair technique: ae/ and
genprog2.0/.  See the associated article:

The ManyBugs and IntroClass Benchmarks for Automated Repair of C Programs, by
Claire Le Goues, Neal Holtschulte, Edward K. Smith, Yuriy Brun, Premkumar
Devanbu, Stephanie Forrest, and Westley Weimer. IEEE Transactions on Software
Engineering, Dec 2015.

for motivation for and details about the experimental setup.  Note also that all
of the debug logs begin with an enumeration of all of the command line options
and their settings for that particular run, enabling reproduction.

Within the repair technique directories, the results are organized by benchmark
program.  In the genprog2.0 directory, each program is further subdivided by
defect scenario; in ae/ there is no further subdivision beyond benchmark.  This
is simply a matter of organizational convenience: AE is deterministic, and thus
its results involve one execution of the repair algorithm per scenario, while
genprog2.0 involves 10 executions of the repair algorithm per scenario.

NOTE: as mentioned above, some of these results were obtained by running
experiments before we completed renaming of the git scenarios, and thus do not
all contain the dates as they do in the scenario package naming scheme, above;
the SHAs line up, however.

The results themselves are packaged in tgz format.  For each seed run for each
scenario (recalling again that AE will typically only have seed 1, since it is
deterministic), there will *typically* exist at least one "debug" tgz and one
"seed/results" tgz.

For example, the results/ae/libtiff directory contains results for libtiff
scenario libtiff-bug-2005-12-27-6f76e76-5dac30f:

debug: libtiff-bug-6f9f4d7-73757f3-debug-configuration-default-domU-12-31-39-0B-B0-66-2013-05-07-21-41.tgz
seed/results: libtiff-bug-6f9f4d7-73757f3-s1-configuration-default-domU-12-31-39-0B-B0-66-2013-05-07-21-41.tgz

The -s#- in the name refers to the seed; this is more important for genprog2.0.
The -configuration-default- refers to the configuration file used as a starting
point for the run.  The domU- through the year (2012, 2013, or 2014) identifies
the virtual machine on which the run was executed and is used to distinguish
between runs; the last component is typically the date (and possibly time) the
run was collected.
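
When processing many result files, the seed can be pulled out of the filename.
The sketch below relies only on the -s#-configuration- pattern described above;
the function name is illustrative.

```shell
# Extract the seed number from a "seed/results" tarball name.
# Prints nothing for names without an -s<N>-configuration- component
# (for example, the "debug" tarballs).
result_seed() {
    printf '%s\n' "$1" |
        sed -n -E 's/.*-s([0-9]+)-configuration-.*/\1/p'
}
```

Applied to the seed/results filename in the libtiff example above, this prints
1; applied to the debug filename, it prints nothing.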

The "debug" tgz contains *at least*:

debug.txt
  the debug log for the run.  This also includes the output of GenProg/AE, as
  would appear in any corresponding repair.debug.* file.

experiment-machine-script.sh
  The bash script used to launch the run on a machine.

It may also contain repair.debug.*, repair.cache, and repair/; if so, these
match those in the "seed" tgz.

The "seed" tgz, if available, contains:

repair.debug.#
   The log output of the run.  Starts with all command line options.  The number
   at the end is the seed with which the random number generator was
   initialized.  It will match the seed in the filename of the tgz.

experiment-machine-script.sh 
  Same as above

repair.cache
  The GenProg test cache.

repair/
  If this folder is available, it means the run found a repair, and this
  directory contains the source code corresponding to the repaired variant (as
  found on this run).
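
  Since the presence of repair/ signals a successful run, you can check for it
  without unpacking the archive.  This sketch assumes the directory is named
  exactly repair/ inside the tgz, at any depth; the function name is
  illustrative.

```shell
# Succeed if the tarball contains a repair/ directory at any depth.
# $1 = path to a "seed/results" .tgz
has_repair() {
    tar -tzf "$1" | grep -E '(^|/)repair/' >/dev/null
}
```

  Looping over the seed tarballs for a scenario and counting the ones for which
  has_repair succeeds gives the number of runs that found a repair.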

If a "seed" tgz appears to be missing for a scenario, this is usually because
the run timed out before hitting its population or generation maximum and thus
was killed before completing naturally.  You can find the partial
repair.debug.# file corresponding to this run in its debug tgz file, but you
will have to grep through the debug.txt to figure out which seed it corresponds
to (for genprog2.0 runs; for ae, seed is always 1).