These are very basic download instructions for SableSpMT. Patches are welcome.

My current SableSpMT implementation:

  svn co svn://svn.sablevm.org/developers/chris/sandbox/sablevm sablespmt

I've merged changes up to and including revision 4620 of the SableVM trunk. Accordingly, you also need r4620 of sablevm-classpath:

  svn co -r4620 svn://svn.sablevm.org/sablevm-classpath/trunk sablevm-classpath-4620

You'll need my version of Soot in order to transform classfiles:

  svn co https://svn.sable.mcgill.ca/soot/soot/branches/cpicke soot-cpicke

In turn, you'll also want older versions of Jasmin and Polyglot, available in the current directory.

An outline of the build process follows:

1) Build my version of Soot.
2) Build sablevm-classpath-4620.
3) Build SableSpMT.
4) Transform benchmarks (and optionally classpath).

I am in the process of cleaning up the whole system, which I hope to complete by the end of the summer. Please note that the current SableSpMT implementation only works on x86_64; this is one of the things I plan to fix.

In case you're curious, I plan to:

1) Update to the Soot trunk.
2) Move my analyses into Soot itself.
3) Eliminate the need for Spmt.fork() / Spmt.join() insertion in Soot.
4) Move most of my code into a language-independent library for SpMT, libspmt.
5) Provide support for libspmt directly in the SableVM trunk.

Currently I am working actively on (4).

If you have any questions whatsoever, please don't hesitate to contact me. These are the questions I've been asked so far:

===============================================

Q1: What are the specific things you plan to optimize to improve the execution times?

A1: You should read the PASTE'05 paper for our profiling results.

1) Child forking. Currently we fork a child at every single INVOKE, and there is just way too much overhead.
We'd like to do better experiments with heuristics like those John Whaley and others have written about (we have some of these heuristics in SableSpMT already, most are pretty obvious, but they're turned off for the current two papers). Compiler analyses to find fork points could also be quite useful.

2) Speculative synchronization. Currently children must stop on MONITOR(ENTER|EXIT), synchronized INVOKE, and also any time a memory barrier is required by the JSR-133 Cookbook. If instead we record the lock and barrier operations in our dependence buffer, we should be able to enter and exit critical sections speculatively and to proceed past volatile and final field loads and stores.

3) Nested speculation. Although we allow a parent virtual/native thread to fork a new child virtual thread for each stack frame, children aren't allowed to fork their own children. Allowing this has the potential to expose a lot more parallelism. In the PASTE'05 paper you'll see that the priority queue spends a lot of its time being empty, which means that there aren't enough children being enqueued. The major problem with nested speculation is cleaning up when a parent dies and leaves behind a whole tree of children.

4) Return value prediction. Currently, at every fork point, five predictions are made (last value, stride, two-delta stride, context, and memoization), and one of these is selected by the hybrid predictor (see our VPW2 paper if you want more details). The context and memoization predictors are quite expensive, since they use a hash function and also quite big hashtables. It seems likely that the hybrid will dynamically stabilize on one predictor after, say, 1000 predictions at a given callsite, and that we can then turn the other four predictors off, saving overhead costs. We might also be able to limit hashtable expansion for the context and memoization predictors once accuracy stops improving, and therefore save on memory/cache/pagefault costs.
5) Load value prediction. When a speculative child violates a dependence it is forced to abort, and the LCPC paper shows that this happens fairly often (Table 3). If we could predict the value when we get to GET(FIELD|STATIC) or ALOAD using a simple non-table predictor (e.g. last value for this bytecode instruction), it might reduce the number of violations.

6) JIT compiler support. Starting in the fall, I'll be working at IBM to get TLS working in their JIT compiler. This may or may not have an advantage over doing it in SableVM, which is just an interpreter. We'll have to see. But I'd like to get speedup in SableVM first, so this one doesn't really count.
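To make the prediction ideas in (4) and (5) concrete, here is a minimal Java sketch of a last-value and a stride predictor, with a tiny hybrid that trusts whichever has been more accurate so far. This is purely illustrative: the class and field names are invented, and SableSpMT's real predictors (including the two-delta stride, context, and memoization predictors) live inside the VM and are considerably more involved.

```java
// Illustrative sketch only -- NOT SableSpMT's actual code. One instance
// would be kept per callsite (for return value prediction) or per
// bytecode instruction (for load value prediction).
public class Rvp {
    long lastValue;            // last observed value
    long stride;               // difference between the last two values
    int lastHits, strideHits;  // accuracy counters for the hybrid

    // Predict the next value: pick whichever sub-predictor has been
    // more accurate so far.
    long predict() {
        long lastPred = lastValue;
        long stridePred = lastValue + stride;
        return (strideHits > lastHits) ? stridePred : lastPred;
    }

    // Update predictor state with the actual observed value.
    void update(long actual) {
        if (actual == lastValue) lastHits++;
        if (actual == lastValue + stride) strideHits++;
        stride = actual - lastValue;
        lastValue = actual;
    }

    public static void main(String[] args) {
        Rvp p = new Rvp();
        // Feed a strided sequence 10, 20, ..., 100: the stride
        // predictor becomes accurate quickly, so the hybrid uses it.
        for (long v = 10; v <= 100; v += 10) {
            p.update(v);
        }
        System.out.println(p.predict()); // prints 110 (100 + stride 10)
    }
}
```

The point of the hybrid counters is exactly the stabilization argument in (4): once one sub-predictor dominates, the others are dead weight and could be switched off.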