These scripts are what I used to perform the staging-r1575 vs. 1.0.9
regression tests.

You need to get:

1) JGF Grande (single-threaded) from the JGF website
   (just extract to ./grande and run the compile script)
2) JGF Grande (multi-threaded) from the JGF website
   (just extract to ./thread_grande and run the compile script)
3) SPECjvm98 -- packaged for Ashes2 by Bruno Dufour
4) JOlden -- packaged for Ashes2 by Bruno Dufour

We cannot distribute SPECjvm98; ask Bruno about it if you have
permission to use it legally. (bruno.dufour@mail.mcgill.ca)

To run spec and jolden:

./compile_spec
./run_spec &> spec_times_$DATE
./create_report spec $DATE

./compile_jolden
./run_jolden &> jolden_times_$DATE
./create_report jolden $DATE

(See the wrapper sketch at the bottom of this file for one way to
set $DATE.)

To run grande and thread_grande:

./compile_grande
./run_grande &> grande_times_$DATE

** manually cut the grande_times_$DATE file into:
   1) grande_section1_times_$DATE
   2) grande_section2_times_$DATE
   3) grande_section3_times_$DATE

./create_report grande section1 $DATE
./create_report grande section2 $DATE
./create_report grande section3 $DATE

./compile_thread_grande
./run_thread_grande &> thread_grande_times_$DATE

** manually cut the thread_grande_times_$DATE file into:
   1) thread_grande_section1_times_$DATE
   2) thread_grande_section2_times_$DATE
   3) thread_grande_section3_times_$DATE

./create_report thread_grande section1 $DATE
./create_report thread_grande section2 $DATE
./create_report thread_grande section3 $DATE

(The csplit sketch at the bottom of this file shows one way to
automate the manual cut.)

The current configuration only runs the SizeA benchmarks from JGF.
These scripts need some modifications for flexibility (to use
different VMs, a variable number of VMs, etc.). It would also be
nice to package the JGF benchmarks the same way as the JOlden and
SPECjvm98 benchmarks, to reduce the script-adjusting overhead when
running a new set of benchmarks.

In the output files:

*_times_*         are the raw benchmark times
*_report_*        are reports, with user times and pass/fail status
                  (key below)
*_diff_*          are diffs that caused failure in the reports
*_paranoid_diff_* are diffs that don't ignore run-specific information

In *_report_*:

sd_t == switch-debug time
i_t  == inlined time
sd_s == switch-debug status
i_s  == inlined status

In *_diff_*:

Each SableVM output is compared against java.out, which is supposed
to be the correct output (from Sun's 1.4.1-b21 java). If there's a
problem, the name of the SableVM output file is printed, and then
the diff, e.g.:

./grande/section1/sablevm-1.0.9-switch-debug-JGFSerialBench.out
1c1,12
< Exception in thread "main" java.lang.VerifyError: (class: JGFSerialBench, method: JGFrun signature: ()V) Incompatible types for storing into array of arrays or objects
---
> *** Couldn't bind native method Java_java_lang_Class_getDeclaredFields ***
> *** or Java_java_lang_Class_getDeclaredFields__ ***
> java.lang.UnsatisfiedLinkError

If the outputs are identical, save for timing information
(grep -i -v -E 'time|total|msec|returned value'), the result is a
pass in the report file. (A sketch of the equivalent diff command is
at the bottom of this file.) The `paranoid' diffs are those that
keep everything -- but they're still worth scanning over; one time I
found negative numbers reported for timing information with my SpMT
stuff.

I ran the SableVMs with '-Y'.

Mail chris.pickett [at] mail.mcgill.ca if you have questions about
this.
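
$DATE is never set by these scripts; it's just a tag you choose
yourself. A minimal wrapper sketch for the spec run, assuming any
unique datestamp will do:

# Sketch only: pick a datestamp, then run the SPECjvm98 sequence
# exactly as described above (requires bash for '&>').
DATE=$(date +%Y-%m-%d)
./compile_spec
./run_spec &> spec_times_$DATE
./create_report spec $DATE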
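
One way to automate the manual cut of grande_times_$DATE, assuming
each JGF section's output begins with a line matching "Section <n>"
(check the file first -- the real marker may differ; '{*}' needs
GNU csplit):

# Sketch only: split at each "Section <n>" header, then rename
# the pieces to the filenames create_report expects.
csplit -f grande_part_ grande_times_$DATE '/Section [0-9]/' '{*}'
mv grande_part_01 grande_section1_times_$DATE
mv grande_part_02 grande_section2_times_$DATE
mv grande_part_03 grande_section3_times_$DATE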
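
The pass/fail comparison amounts to something like the following
sketch, using the filter quoted above (the exact invocation inside
create_report may differ; process substitution needs bash):

# Sketch only: strip timing lines from both outputs, then diff.
FILTER='time|total|msec|returned value'
diff <(grep -i -v -E "$FILTER" java.out) \
     <(grep -i -v -E "$FILTER" sablevm-1.0.9-switch-debug-JGFSerialBench.out)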