These are very basic download instructions for SableSpMT. Patches are welcome.

My current SableSpMT implementation:

  svn co svn://svn.sablevm.org/developers/chris/sandbox/sablevm sablespmt

I've merged changes up to and including revision 4620 of the SableVM trunk. Accordingly, you also need r4620 of sablevm-classpath:

  svn co -r4620 svn://svn.sablevm.org/sablevm-classpath/trunk sablevm-classpath-4620

You'll need my version of Soot in order to transform classfiles:

  svn co https://svn.sable.mcgill.ca/soot/soot/branches/cpicke soot-cpicke

In turn, you'll also want older versions of Jasmin and Polyglot, available in the current directory.

An outline of the build process follows:

1) Build my version of Soot.
2) Build sablevm-classpath-4620.
3) Build SableSpMT.
4) Transform benchmarks (and optionally classpath).

I am in the process of cleaning up the whole system, which I hope to complete by the end of the summer. Please note that the current SableSpMT implementation only works on x86_64; this is one of the things I plan to fix.

In case you're curious, I plan to:

1) Update to the Soot trunk.
2) Move my analyses into Soot itself.
3) Eliminate the need for Spmt.fork() / Spmt.join() insertion in Soot.
4) Move most of my code into a language-independent library for SpMT, libspmt.
5) Provide support for libspmt directly in the SableVM trunk.

Currently I am working actively on (4).

If you have any questions whatsoever, please don't hesitate to contact me. These are the questions I've been asked so far:

===============================================

Q1: What are the specific things you plan to optimize to improve the execution times?

A1: You should read the PASTE'05 paper for our profiling results.

1) Child forking. Currently we fork a child at every single INVOKE, and there is just way too much overhead.
We'd like to do better experiments with heuristics like those John Whaley and others have written about (we have some of these heuristics in SableSpMT already, most are pretty obvious, but they're turned off for the current two papers). Compiler analyses to find fork points could also be quite useful.

2) Speculative synchronization. Currently children must stop on MONITOR(ENTER|EXIT), synchronized INVOKE, and also any time a memory barrier is required by the JSR-133 Cookbook. If instead we record the lock and barrier operations in our dependence buffer, we should be able to enter and exit critical sections speculatively and to proceed past volatile and final field loads and stores.

3) Nested speculation. Although we allow a parent virtual/native thread to fork a new child virtual thread for each stack frame, children aren't allowed to fork their own children. Allowing this has the potential to expose a lot more parallelism. In the PASTE'05 paper you'll see that the priority queue spends a lot of its time being empty, which means that there aren't enough children being enqueued. The major problem with nested speculation is cleaning up when a parent dies and leaves behind a whole tree of children.

4) Return value prediction. Currently, at every fork point, five predictions are made (last value, stride, two-delta stride, context, and memoization), and one of these is selected by the hybrid predictor (see our VPW2 paper if you want more details). The context and memoization predictors are quite expensive, since they use a hash function and also quite big hashtables. It seems likely that the hybrid will dynamically stabilize on one predictor after, say, 1000 predictions at a given callsite, and that we can then turn the other four predictors off, saving overhead costs. We might also be able to limit hashtable expansion for the context and memoization predictors once accuracy stops improving, and therefore save on memory/cache/pagefault costs.
5) Load value prediction. When a speculative child violates a dependence it is forced to abort, and the LCPC paper shows that this happens fairly often (Table 3). If we could predict the value when we get to GET(FIELD|STATIC) or ALOAD using a simple non-table predictor (e.g. last value for this bytecode instruction), it might reduce the number of violations.

6) JIT compiler support. Starting in the fall, I'll be working at IBM to get TLS working in their JIT compiler. This may or may not have an advantage over doing it in SableVM, which is just an interpreter. We'll have to see. But I'd like to get speedup in SableVM first, so this one doesn't really count.
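To make the prediction ideas in (4) and (5) concrete, here is a minimal Java sketch of a last-value and a stride predictor, with a tiny hybrid that trusts whichever has been more accurate so far. This is purely illustrative: the class and field names are invented, and SableSpMT's real predictors (including the two-delta stride, context, and memoization predictors) live inside the VM and are considerably more involved.

```java
// Illustrative sketch only -- NOT SableSpMT's actual code. One instance
// would be kept per callsite (for return value prediction) or per
// bytecode instruction (for load value prediction).
public class Rvp {
    long lastValue;            // last observed value
    long stride;               // difference between the last two values
    int lastHits, strideHits;  // accuracy counters for the hybrid

    // Predict the next value: pick whichever sub-predictor has been
    // more accurate so far.
    long predict() {
        long lastPred = lastValue;
        long stridePred = lastValue + stride;
        return (strideHits > lastHits) ? stridePred : lastPred;
    }

    // Update predictor state with the actual observed value.
    void update(long actual) {
        if (actual == lastValue) lastHits++;
        if (actual == lastValue + stride) strideHits++;
        stride = actual - lastValue;
        lastValue = actual;
    }

    public static void main(String[] args) {
        Rvp p = new Rvp();
        // Feed a strided sequence 10, 20, ..., 100: the stride
        // predictor becomes accurate quickly, so the hybrid uses it.
        for (long v = 10; v <= 100; v += 10) {
            p.update(v);
        }
        System.out.println(p.predict()); // prints 110 (100 + stride 10)
    }
}
```

The point of the hybrid counters is exactly the stabilization argument in (4): once one sub-predictor dominates, the others are dead weight and could be switched off.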