Title: ManyBugs/README
Description: README for the ManyBugs dataset
Author: Claire Le Goues, clegoues@cs.cmu.edu (borrowing somewhat from unpublished text produced by Neal Holtschulte)
Date: June 1, 2014
Updated: January 6, 2016
Version: beta-2.0

*********
* 0. LICENSE
*********

See toplevel directory for license information.

*********
* 1. OVERVIEW
*********

This directory contains experimental scenarios and baseline results of running AE, GenProg, and TSPRepair for the 185 defect scenarios in the ManyBugs dataset. This README is a roadmap to the materials in this directory. More information is available with the article; in other READMEs in the autorepairbenchmarks package; and via email. Always feel free to contact me/us with questions, comments, complaints, pointers, and suggestions. However, please read this entire README first, because there are pointers throughout that might help you make quick sense of some of the peculiarities of the data and materials in the directory.

NOTE: I use the names of the three tools somewhat interchangeably throughout this README because they're all built on top of the same internals; running each involves passing different arguments to the same executable.

The ManyBugs directory contains two subdirectories: scenarios/ and results/

*********
* 2. SUMMARY DATA
*********

bug_data.csv contains information on all of the defects in the ManyBugs dataset. The data is also available in each of the scenario tarballs (see below), but you may find it helpful to look at the summary data. The first column is the scenario name, corresponding to the tarball/directory name. Each subsequent column is roughly self-explanatory (and labeled). Additional detail on the content currently in this spreadsheet is explained below in the context of the bug-info provided with the tarball.

*********
* 3. SCENARIOS
*********

scenarios/ contains the defect scenarios, packaged in .tar.gz form.
Each scenario is named following a convention indicating the software system name and revision identifiers in question. For software associated with git repositories, we include the date of the fix commit in the scenario name to impose an easily-identifiable temporal ordering on the defects in each program set.

NOTE: because we parallelized development and experiments, there are cases where the filenames corresponding to results on git-based scenarios in the results/ directory do not contain the dates. The revisions are the same and the baseline results are valid, and only the naming differs. You can refer to the SHA identifiers for the revisions to match up the results with the scenarios.

If you'd like to use these scenarios with our virtual machine images, you *must* untar them in /root/genprog-many-bugs/. If not, you must reconfigure the programs yourself, and hunt through the scripts to find the places where we hard-coded paths. This is doable, but annoying. Note also that genprog-many-bugs on the VM you're using may include scripts and so on related to benchmark setup. Please ignore them; they're not related to the use of the tarballs here.

Each scenario contains roughly the following contents, with some variation in support (such as for program-specific compilation challenges) and byproduct (from GenProg/AE setup runs) files. This list is organized somewhat by topic area/importance and then in order based on how important I, Claire, think it will be to you, reader:

** Program source and compilation **

*** General materials ***

program/
    The program source code tree, checked out and configured to build at the defect revision.

bug-info/
    log-msg.txt contains the log message associated with the fix revision (available in the majority of cases).

    scenario-data.txt contains information about the defect scenario, including version number of the ManyBugs release from which the scenario was taken (currently at version beta-2.0).
    We provide links to bug database entries where we could find them (in a couple of cases). We list the files or script code that corresponds to the failing test cases. We looked at the human patches to characterize which code constructs they syntactically modify. We manually classified the defect to the best of our ability based on the log message, diffs, test cases, forum posts, etc., doing our best to be consistent with the tags we use, so that you can slice the scenarios however is most suitable for your work. If you come up with new tags or new columns or classification schemes, we'd love to hear from you!

    We include PDFs of old forum posts or bug database entries when they're informative but difficult to find on the modern internet.

diffs/
    contains subfolder(s) mimicking the path to the buggy file, files, or module to be repaired. Within the subfolder are 3 files: the buggy version of the C file to be repaired, the *human-repaired* version of the C file to be repaired, and a file containing the output of a diff of the two. This folder is useful for examining the human-made changes that repaired a bug. Note that these are not preprocessed and correspond to the output of diff as applied to the code that arrives from the repository at the revision numbers in question.

    For example, a hypothetical scenario php-bug-324110-324112 (back when php had an svn repository!) would have a diffs folder with the following contents:

    diffs/
        Zend/
            zend_exceptions.c-324110
            zend_exceptions.c-324112
            zend_exceptions.c-diff

preprocessed/
    contains the bug-implicated files at the buggy revision *as passed through the C preprocessor*. Can be copied right in place into program/ to build it at the buggy revision afresh. These files are what GenProg/AE uses as input, and unless you have a *very* fancy C parsing engine, you will probably do the same or similar.
    These files correspond to their counterparts in diffs/ at the buggy revision, but they have been run through the C preprocessor and thus will look very different (as do the files in fixed/ at the fix revision).

fixed/
    contains the *preprocessed* code at the *human-repaired* revision. This directory contains files that also appear in diffs/ (those corresponding to the fix revision), *but* they have been through the C preprocessor. These are not GenProg/AE fixes.

bugged-program.txt
    lists the files in preprocessed/. We have presliced the programs to contain only the implicated module(s); if you'd like to try to repair the entire program, refer to the program/ source tree, which is checked out and configured at the buggy revision. *Don't delete this* if you intend to use compile.pl to compile program variants.

    For GenProg/AE users: this file is passed as input to GenProg/AE to specify the program under repair (--program).

compile.pl/*compile-helper*/etc
    compilation scripts. compile.pl is the toplevel script for compiling a program variant; see the configuration file associated with a scenario for the exact script/format. compile.pl takes the name of an executable at the top level of a subdirectory that holds the source code corresponding to a variant. It refers to bugged-program.txt to check which files should be included in a variant, and then copies those files from the base variant directory into the program source tree, where it then recompiles the program. The argument style is mostly for legacy reasons and for interoperability with various GenProg machinations, but can be trivially worked around.

    For some history/an illustration: for defects that compile all of their source code into one new executable per variant, the argument passed to compile.pl would be the name of the executable to be compiled.
    Thus, to compile variant 0, GenProg makes a folder 000000 to hold the source code corresponding to the variant as well as byproducts of the build process, and passes the argument 000000/000000 to the compile script. compile.pl just hacks this naming convention to find the source files and then builds in the program/ source directory, to save time and configuration energy.

local-root
    install directory for scenarios that require a local prefix when configuring or building (valgrind, gzip, etc).

*** More esoteric material included primarily for GenProg legacy purposes ***

fixed-program.txt
    lists the files in fixed/. Included for compatibility with our GenProg ICSE 2012 experiments.

fault.lines/fix.lines
    filenames and line numbers that the human modified in the buggy revision (and that are modified in the fixed revision). Each line contains a filename, a line number, and a floating point number that we have used as a weight in certain GenProg experiments appearing in ICSE 2012; non-GenProg users will probably ignore the weights, at least, if not both of these files altogether.

** Program and variant testing **

test.sh
    Script to run test cases on program variants. Takes as arguments a test name and the name of a compiled executable to test; it *mostly* (see below) ignores the latter and, for each program scenario, assumes that it is testing the program as compiled in the source tree. You need to provide something for the second argument, however, because when used by GenProg/AE, test.sh checks the name of the executable to determine if the test cases are being used to compute coverage information (and thus will complain if that argument is empty). Providing random garbage as that argument is fine for non-GenProg/AE users.

    Test cases are preceded by 'p' for initially passing tests and 'n' for the initially failing tests (those corresponding to the bug under repair). Tests are 1-indexed within each p/n class.
    The mapping between the integers and the actual test cases for each scenario is somewhat arbitrary; each program has a different means for executing its test suite, and we had to break down those test suites into individual tests in some manner or another.

    Returns 0 if a test case passes and non-zero if it fails. Can be run either by another repair program or manually, such as:

    $ ./test.sh p1 /root/mountpoint-genprog/genprog-many-bugs/wireshark-bug-37112-37111/sanity/./repair.sanity

limit.c/limit
    a small C program that is compiled to the limit executable; helps sandbox test cases by limiting the CPU time that they can consume, guarding against infinite loops. If a test case exceeds a time limit (program-specific), the script terminates the test case and returns an exit code indicating failure.

bug-failures/fix-failures
    text files listing integer-indexed test cases failed by the program at the bug and fix revisions, respectively. Test cases are referenced by a positive integer, one number per line (same naming scheme as in test.sh). The integers correspond to the (positive) test numbering in test.sh. Note that test cases are 1-indexed, for legacy reasons. Used to determine the initially passing and initially failing test cases for a scenario (basically bug-failures - fix-failures = bug test cases). Test cases that both revisions fail are excluded from consideration.

** GenProg/AE support and byproducts **

configuration-default, configuration-oracle
    configuration files for GenProg v2.0 containing default settings; GenProg/AE allows arguments to be passed either via command line or via configuration files. Note that these files do not contain the full configuration of the experiments for which results appear in this directory; consult the output logs of those runs to see the full specification.
    We typically use these as a starting point (as they contain the program to be repaired, the mechanisms for running the test script, etc), and add extra arguments after it in the call to GenProg, for example:

    $ /PATH/TO/GENPROG/repair configuration-default --seed 1 --search ga --appp 0.35

    Arguments passed to GenProg/AE after the configuration file supersede those in the config file. The oracle configuration is included for legacy purposes, but the experiments we discuss in the benchmarks journal article all build on the default config.

coverage.path.neg and coverage.path.pos
    contain lists of integers corresponding to statements executed by the failing and passing test cases, respectively. The numbering scheme comes from GenProg/AE/CIL internals.

coverage/
    Byproduct of GenProg/AE, used to compute coverage information. A version of the source files under repair that has been instrumented to print out statement-identifying information. Non-GenProg/AE users should feel free to ignore it.

repair.cache
    the GenProg/AE test cache. Does not necessarily contain a repair; the naming is misleading, for legacy reasons. Can (and often should, for fresh runs) be removed without harm, but can also speed up your GenProg/AE runs.

default.cache
    the GenProg representation cache corresponding to the default configuration setup. Possibly useful even to non-GenProg users because if you demarshal it appropriately (check out the GenProg code to find the original data structure setup), you can map integers to the statement numbers in the coverage.path.* files.

repair.debug.#
    repair.debug.# is a file produced by running GenProg/AE with a given random seed #. For example, if the random seed is zero, then repair.debug.0 is the name of the file that will be produced. repair.debug files contain the output of a GenProg or AE run. If a scenario tarball includes one, it is more than likely leftover from a "sanity check" or coverage-producing run conducted while setting up the scenarios.
sanity/
    source code corresponding to a variant of the program that was used for sanity checking by a previous AE/GenProg run; a byproduct. Sometimes useful to consult if you are running AE/GenProg experiments, since the code has been preprocessed and passed through CIL, and thus may be more easily comparable to variant code as emitted by GenProg. For other experiments, you can probably remove it, and can certainly ignore it.

*********
* 4. RESULTS
*********

results/ contains results, organized by bug repair technique: ae/ and genprog2.0/. See the associated article:

    The ManyBugs and IntroClass Benchmarks for Automated Repair of C Programs, by Claire Le Goues, Neal Holtschulte, Edward K. Smith, Yuriy Brun, Premkumar Devanbu, Stephanie Forrest, and Westley Weimer. IEEE Transactions on Software Engineering, Dec 2015.

for motivation for and details about the experimental setup. Note also that all of the debug logs begin with an enumeration of all of the command line options and their settings for that particular run, enabling reproduction.

Within the repair technique directories, the results are organized by benchmark program. In the genprog2.0 directory, each program is further subdivided by defect scenario; in ae/ there is no further subdivision beyond benchmark. This is simply a matter of organizational convenience: AE is deterministic, and thus its results involve one execution of the repair algorithm per scenario, while genprog2.0 involves 10 executions of the repair algorithm per scenario.

NOTE: as mentioned above, some of these results were obtained by running experiments before we completed renaming of the git scenarios, and thus do not all contain the dates as they do in the scenario package naming scheme, above; the SHAs line up, however.

The results themselves are packaged in tgz format.
For each seed run of each scenario (recalling again that AE will typically only have seed 1, since it is deterministic), there will *typically* exist at least one "debug" tgz and one "seed/results" tgz. For example, the results/ae/libtiff directory contains results for libtiff scenario libtiff-bug-2005-12-27-6f76e76-5dac30f:

debug:
    libtiff-bug-6f9f4d7-73757f3-debug-configuration-default-domU-12-31-39-0B-B0-66-2013-05-07-21-41.tgz
seed/results:
    libtiff-bug-6f9f4d7-73757f3-s1-configuration-default-domU-12-31-39-0B-B0-66-2013-05-07-21-41.tgz

The -s#- in the name refers to the seed; this is more important for genprog2.0. The -configuration-default- refers to the configuration file used as a starting point for the run. The domU- through the year (2012, 2013, or 2014) identifies the virtual machine on which the run was executed and is used to distinguish between runs; the last component is typically the date (and possibly time) the run was collected.

The "debug" tgz contains *at least*:

debug.txt
    the debug log for the run. This also includes the output of GenProg/AE, as would appear in any corresponding repair.debug.* file.
experiment-machine-script.sh
    The bash script used to launch the run on a machine.

It may also contain repair.debug.*, repair.cache, and repair/; if so, these match those in the "seed" tgz.

The "seed" tgz, if available, contains:

repair.debug.#
    The log output of the run. Starts with all command line options. The number at the end is the random seed used for the run. It will match the seed in the filename of the tgz.
experiment-machine-script.sh
    Same as above.
repair.cache
    The GenProg test cache.
repair/
    If this folder is available, it means the run found a repair, and this directory contains the source code corresponding to the repaired variant as found on this run.

If a "seed" tgz appears to be missing for a scenario, this is usually because the run timed out before hitting its population or generation maximum and thus was killed before completing naturally. You can find the partial repair.debug.# file corresponding to this run in its debug tgz file, but you will have to grep through the debug.txt to figure out which seed it corresponds to (for genprog2.0 runs; for ae, seed is always 1).
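
When matching results archives to scenarios, or checking which seeds are missing, it can help to split an archive name into its parts. The following is a minimal sketch in Python, not part of the ManyBugs tooling. The pattern and the helper name parse_result_name are assumptions inferred from the single libtiff example name given above, so archives that follow a different layout will simply fail to match.

```python
import re

# Layout inferred from the one example archive name in this README;
# real archives may deviate, so a failed match means "unknown", not "broken".
RESULT_NAME = re.compile(
    r"^(?P<scenario>.+?)"                       # e.g. libtiff-bug-6f9f4d7-73757f3
    r"-(?:(?P<debug>debug)|s(?P<seed>\d+))"     # "debug" tgz or "-s#-" seed tgz
    r"-(?P<config>configuration-[A-Za-z0-9]+)"  # starting configuration file
    r"-(?P<machine>domU(?:-[0-9A-Fa-f]{2})+?)"  # virtual machine identifier
    r"-(?P<stamp>\d{4}(?:-\d{2})+)"             # date (and possibly time)
    r"\.tgz$"
)


def parse_result_name(filename):
    """Split a results archive name into its parts, or return None."""
    m = RESULT_NAME.match(filename)
    if m is None:
        return None
    return {
        "scenario": m.group("scenario"),
        "kind": "debug" if m.group("debug") else "seed",
        "seed": int(m.group("seed")) if m.group("seed") else None,
        "config": m.group("config"),
        "machine": m.group("machine"),
        "stamp": m.group("stamp"),
    }


if __name__ == "__main__":
    example = ("libtiff-bug-6f9f4d7-73757f3-s1-configuration-default-"
               "domU-12-31-39-0B-B0-66-2013-05-07-21-41.tgz")
    print(parse_result_name(example))
```

Because some results on git-based scenarios omit the dates, compare the SHA identifiers in the parsed scenario field against the scenario tarball names rather than expecting the full names to be equal.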