/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
 * This file is part of SableVM.                             *
 * See the file "LICENSE" for Copyright information and the  *
 * terms and conditions for copying, distribution and        *
 * modification of SableVM.                                  *
 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */

================================
Files in sablevm/src/libsablevm/
================================

### TODO: needs updating

Here we describe the core files in sablevm/src/libsablevm/ that
come together to make the libsablevm.so file at compile time.
Several of these files are written using m4 macros, to save space and
reduce code duplication.  It is strongly recommended that you learn
to read and write m4 following existing examples, although you can
look at the generated files if you want pure C.  Files generated from
m4 macros are not included in the following list.

When you start a new project, first go through the following list of
files and make a condensed list of those that you really need to look at.

bootstrap.m4.c:
	get ready to create java/lang/Class instances
	create a whole bunch of fundamental classes (lots of exception classes)

buffer.m4.c:
buffer.m4.h:
	dependence buffer for speculative execution (not released yet)

cast.list:
	list of all macro supported casts possible within SableVM C source code
	( _svmt_cp_info, _svmt_attribute_info, _svmt_type_info, jobject,
	 jarray, _svmt_native_ref, etc.)

cast.m4.c:
	macro m4svm_cast($1,$2,$3) to cast $3 to $2

cl_alloc.list:
	list of method declarations for class memory allocation

cl_alloc.m4.c:
cl_alloc.m4.h:
	macros to implement methods declared in cl_alloc.list

class_file_parser.h:
	short C preprocessor macro to parse class files.  not really sure
	why this is necessary ... Etienne says it may be obsolete.

class_loader.c:
	class loading, including that done during bootstrapping and
	invoked by the user code

class_loader_memory_manager.c:
class_loader_memory_manager.h:
	the class loader has its own memory manager (see Ch. 3 of thesis)

constants.h:
	constants defined within SableVM for:
		type state
		array types
		constant pool tags
		access flags
		types
		signal codes
		2 specific constants
		interpreter flags
		stack map types
		thread status
		instructions

---------- FROM THE sablevm-developer LIST ------------

>> all constants defined should go into constants.h and this should be a 
>> policy for anyone modifying the source.  i'm not sure if there are 
>> constants defined elsewhere in the VM.

Except for system-specific things, of course.

-------------------------------------------------------

direct_threaded.m4:
	single line macro definition (why is this file necessary?)

error.c:
	code for using signal handlers to throw exceptions

error.list:
	list of exceptions and errors handled by SableVM

error_bits.m4.h:
error_classes.m4.h:
error_init_methods.m4.h:
error_instances.m4.h:
	very short multicallable macros to declare error types.
	not sure why all four of these files exist ... they are
	very similar and all declare macros of the same name.


---------- FROM THE sablevm-developer LIST ------------

>> error_bits.m4.h:
>> error_classes.m4.h:
>> error_init_methods.m4.h:
>> error_instances.m4.h:
>>        very short multicallable macros to declare error types.
>>        not sure why all four of these files exist ... they are
>>        very similar and all declare macros of the same name.


That's the idea: They declare macros of the same name, yet expand
differently.  Look into src/libsablevm/Makefile.am again.
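
As a rough illustration of the idea, here is a minimal sketch that
uses the plain C preprocessor instead of m4.  The list contents and
every name in it (ERROR_LIST, ERROR, error_index_*, error_names,
error_name) are hypothetical and not taken from the SableVM sources.
It only shows how one list can be expanded in several ways by
redefining a macro of the same name before each expansion.

```c
#include <assert.h>

/* Hypothetical stand-in for error.list: one entry per error kind. */
#define ERROR_LIST \
  ERROR (NullPointerException) \
  ERROR (OutOfMemoryError) \
  ERROR (StackOverflowError)

/* First definition of ERROR: expand each entry to an enum constant. */
#define ERROR(name) error_index_##name,
enum { ERROR_LIST error_count };
#undef ERROR

/* Second definition of ERROR, same name: expand each entry to a string. */
#define ERROR(name) #name,
static const char *const error_names[] = { ERROR_LIST };
#undef ERROR

/* Look up the name that belongs to an index from the enum above. */
static const char *
error_name (int index)
{
  assert (index >= 0 && index < error_count);
  return error_names[index];
}
```

In SableVM the same effect comes from which m4 file is fed to the
preprocessing step, which is why Makefile.am is the place to look.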

-------------------------------------------------------
	
error_throwing.c:
	macro to implement throwing specific errors

fatal.c:
fatal.h:
	code to throw fatal errors that abort the VM

gc_copying.c:
	semi-space Cheney copying collector

gc_generational.c:
	currently broken generational GC ... Etienne suggested
	getting rid of it altogether (it will still exist in the
	repository if you "delete" it).

gc_none.c:
	no GC memory management (just allocate heap and operate until
	it is full).

global_alloc.list:
	list of method declarations for global allocations

global_alloc.m4.c:
global_alloc.m4.h:
	macros to implement malloc, calloc, and free methods as specified
	in global_alloc.list

global_refs.c:
global_refs.h:
	code to create and free native globals (in what context?).  requires
	synchronization on vm->global_mutex.

---------- FROM THE sablevm-developer LIST ------------

>> I need a better explanation for global_refs.* and local_refs.*

The idea is that you get type-safe allocation and free functions.  In
addition, the free function resets the pointer variable to NULL,
making sure any subsequent attempt to dereference the freed pointer
causes a segfault (instead of using an obsolete pointer value, which
can be very, very difficult to track).

Also, these functions do everything necessary to throw an exception
in case of an out-of-memory situation.

Being type-safe, you know that the allocated memory is of the right
size.  Once you have accumulated a lot of C development experience,
you'll know how easy it is to make the following mistake, when you
copy/paste code:

sometype1 *var = malloc (sizeof(sometype2));

It is just as easy to forget to check whether var == NULL.  You'll
also notice how annoying it is to retype the correct error handling
code.

The global_refs do all of the tedious work for you, so you get to
only write:

  if (_svmm_gzalloc_gc_map_node (env, method->parameters_gc_map) != JNI_OK)
    {
      return JNI_ERR;
    }

As for the ..._no_exception versions, they do the same, but they do not
create and throw an OutOfMemoryError.  They are necessary because, in
early bootstrapping, the VM has not yet created the heap, so it cannot
instantiate an Error object.
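
The pattern described above can be sketched in plain C as follows.
This is only an illustration under stated assumptions: the real code
is generated with m4 and also takes care of throwing the exception,
and all names here (SKETCH_OK, SKETCH_ERR, SKETCH_DEFINE_ALLOC,
sketch_zalloc_*, sketch_free_*, and the gc_map_node layout) are made
up for the example.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_OK 0
#define SKETCH_ERR (-1)

/* Hypothetical type to allocate; not the real SableVM structure. */
typedef struct gc_map_node
{
  int size;
} gc_map_node;

/* Generates a zeroing allocator and a free function for one type. */
#define SKETCH_DEFINE_ALLOC(type)                                   \
  static int sketch_zalloc_##type (type **ptr)                      \
  {                                                                 \
    type *tmp;                                                      \
    assert (ptr != NULL);                                           \
    /* the size is derived from the pointee, so it cannot drift */  \
    tmp = calloc (1, sizeof (*tmp));                                \
    if (tmp == NULL)                                                \
      return SKETCH_ERR; /* the real code would also throw here */  \
    *ptr = tmp;                                                     \
    return SKETCH_OK;                                               \
  }                                                                 \
  static void sketch_free_##type (type **ptr)                       \
  {                                                                 \
    assert (ptr != NULL);                                           \
    free (*ptr);                                                    \
    *ptr = NULL; /* a stale dereference now faults immediately */   \
  }

SKETCH_DEFINE_ALLOC (gc_map_node)
```

Because the functions take a pointer to the variable, passing a
variable of the wrong type is a compile-time diagnostic rather than a
silent size mismatch.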

-------------------------------------------------------
	
heap_manager.c:
	code to get object hashcodes and include GC's.
	it seems that the code at the end of this file is out of date, if not
	the whole file ... Etienne says "Possible".

heap_manager.h:
	empty file essentially ... Etienne says "Probably".

initialization.c:
	class initialization code.

inlined_threaded.m4:
	single line macro definition (why is this file necessary?)	

instructions.m4.c:
	implementations of all bytecodes

instructions_preparation.m4:
	single line macro definition (why is this file necessary?)	
	
instructions_preparation.m4.c:
	macros for preparing switch-, direct-, and inlined-threaded instructions

instructions_switch.m4:
	single line macro definition (why is this file necessary?)	

---------- FROM THE sablevm-developer LIST ------------

>> I don't know why we need direct_threaded.m4, inlined_threaded.m4, 
>> switch_threaded.m4,
>> instructions_preparation.m4, and instructions_switch.m4.  In general, I 
>> think the very short files  like these are unnecessary.

They are necessary.  Look in src/libsablevm/Makefile.am.  These files impact the
selection of the appropriate macro expansion.

-------------------------------------------------------
	
instructions_switch.m4.c:
	another switch-threaded macro, confused about this file.

---------- FROM THE sablevm-developer LIST ------------

>> I don't know why instructions_switch.m4.c exists, it seems similar 
>> to the content in instructions_preparation.m4.c and there are no files 
>> called instructions_inlined.m4.c, or instructions_direct.m4.c. 

It is necessary.  Unlike the direct-threaded and inlined-threaded
engines, the switch-based interpreter has to provide a real "switch"
statement of bytecode implementations to be included in
_svmf_interpreter() [interpreter.c].  Like the two other engines, it
also needs a separate "switch" that provides information about each
bytecode (in anticipation of method preparation [prepare_code.c]).

The key to understanding what is happening is to trace the execution
of _svmf_interpreter using a debugger (I suggest DDD) by setting
breakpoints at the appropriate locations.  For this, you need to
compile the 3 engines, ideally with --enable-debugging-features,
unless you want to be intrigued by the segfault used in the normal
control flow of the VM...
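
To make the "two switches" idea more concrete, here is a toy sketch
of a switch-based interpreter.  It is not derived from
instructions_switch.m4.c: the three opcodes, the stack size, and the
function names are all invented for the example.  One switch holds
the bytecode implementations; a separate one only reports information
about each bytecode (here, its length).

```c
#include <assert.h>
#include <stddef.h>

enum { OP_PUSH, OP_ADD, OP_HALT };	/* hypothetical opcodes */
#define STACK_MAX 16

/* Information switch: describes each bytecode without executing it. */
static int
instruction_length (int opcode)
{
  switch (opcode)
    {
    case OP_PUSH:
      return 2;			/* opcode plus one inline operand */
    case OP_ADD:
    case OP_HALT:
      return 1;
    default:
      return -1;		/* unknown opcode */
    }
}

/* Implementation switch: the body of the interpreter loop. */
static int
interpret (const int *code)
{
  int stack[STACK_MAX];
  int sp = 0;
  size_t pc = 0;

  for (;;)
    {
      switch (code[pc])
	{
	case OP_PUSH:
	  assert (sp < STACK_MAX);
	  stack[sp++] = code[pc + 1];
	  break;
	case OP_ADD:
	  assert (sp >= 2);
	  sp--;
	  stack[sp - 1] += stack[sp];
	  break;
	case OP_HALT:
	  return sp > 0 ? stack[sp - 1] : 0;
	default:
	  return -1;
	}
      /* the sketch reuses the information switch to advance pc */
      pc += (size_t) instruction_length (code[pc]);
    }
}
```

In the sketch the information switch is reused to advance the program
counter; in the VM it serves method preparation instead.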

-------------------------------------------------------

interpreter.c:
interpreter.h:
	the main interpreter engine that can be compiled to run in
	switch-, direct-, or inlined-threaded modes.

invoke_interface.c:
invoke_interface.h:
	JNI methods to create and destroy VM's, with AttachCurrentThread,
	DetachCurrentThread, and GetEnv still unimplemented.

java_lang_*:
	JNI methods for java.lang.* ... some of these are fundamental to VM
	operation, others are not and remain to be implemented.

---------- FROM THE sablevm-developer LIST ------------

>> move the java_lang_* stuff into libsablevm/java and remove vmlib.* 
>> altogether.

No.  The Auto* tools don't really support compilation of a single
library out of files in multiple directories.

If you want the C optimizer to do a good job, you want to compile a
single library in one shot.  I never understood why people do a per
file compilation of C code to .o.  It makes no sense; C wasn't
designed like that.  An optimizing compiler cannot inline functions or
do any global optimization if you arbitrarily separate your
compilation unit in smaller units on a source file boundary.

It only makes sense to generate a .(s?)o if you plan to reuse the same
functionality across many executables/libraries, and don't want your
optimizer to do any cross function boundary optimization.

libsablevm.so should only export a restricted set of symbols: JNI_*
and Java_*.  Using more than one compilation unit would cause
additional (read: internal!) symbols to be exported.

So, unless you want to fix the GNU auto* tools, you'll have to live
with the current file structure. 

-------------------------------------------------------

jnidefs.h:
	definitions of jobject, jarray, jfieldID, and jmethodID.  why do
	these get their own special file, instead of being included in
	types.h or jni.h?

	Etienne's answer:  Because jni.h is installed on the user's system
	and made available for Java programmers to be able to write JNI
	libraries to link with the VM.  The types above are "opaque"
	types, from a Java programmer's point of view, so that the same
	JNI library can work with *any* JVM implementing the JNI
	interface.  Of course, within SableVM itself, these types
	shouldn't be opaque, so there you have jnidefs.h to
	"enlighten" these types.

lib_init.c:
	not really sure, seems to be only for calling lt_dlinit(), and
	ensuring that it only gets called once.

	Etienne's answer:  You can invoke JNI_CreateJavaVM multiple times
	(concurrently) within a single process (as long as each call is
	on a different thread).  lib_init.c serves to make sure some
	global libsablevm initialization happens only once, using the
	standard POSIX pthread_once().
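
	A minimal sketch of that once-only pattern follows.  The names
	(lib_once, lib_init_count, lib_init_internal, lib_init) are
	assumptions made for the example and may not match lib_init.c;
	only pthread_once() and PTHREAD_ONCE_INIT are the real POSIX API.

```c
#include <pthread.h>

/* One control object per once-only initialization. */
static pthread_once_t lib_once = PTHREAD_ONCE_INIT;

/* Counts how often the initializer actually ran (for illustration). */
static int lib_init_count = 0;

/* Stand-in for the global setup, e.g. a call such as lt_dlinit(). */
static void
lib_init_internal (void)
{
  lib_init_count++;
}

/* Safe to call from any thread, any number of times.
   Returns 0 on success, an error number otherwise. */
static int
lib_init (void)
{
  return pthread_once (&lib_once, lib_init_internal);
}
```

	However many VMs are created in the process, the initializer
	body runs a single time.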

libsablevm.c:
	big include file for compilation to a single object (see Makefile.am)

---------- FROM THE sablevm-developer LIST ------------

>> all #include directives should go in libsablevm.c

except for pthread_rec*, and some other ones like in
_svmf_interpreter().

-------------------------------------------------------

link.c:
link.h:
	class, array, and type linking.

local_refs.c:
local_refs.h:
	similar to global_refs.c but for locals.

macros.c:
	casting functions and _svmf_is_same_object.  not really sure why
	these cast functions are necessary, can't we generate them using
	cast.list and cast.m4.c ... maybe this file is obsolete.
	
macros.h:
	some C preprocessor macros.  seems like these are probably
	obsolete given the widespread use of m4 now.

---------- FROM THE sablevm-developer LIST ------------

>> it seems that everything in macros.c should go in cast.list, except 
>> for _svmf_is_same_object which should go in one of the util files.  I'm 
>> not sure if macros.c is obsolete or not.  It seems that macros.h is 
>> probably obsolete as well, and if not, should probably be targeted for 
>> replacement with m4 macros in the future.

You are mostly right.  A couple of methods cannot be generated using
cast.list because of the additional conditional.  Also, the name
"macros.[ch]" is not really intuitive.  It is part of a few legacy
things from before the SableVM rearchitecturing to use m4...

-------------------------------------------------------

macros.m4:
	basic m4 macros for use within the rest of the VM.  learning how
	m4 works is a good idea if you want to understand SableVM.

Makefile.am:
	input file for automake

method_invoke.list:
	declarations of VM-critical nonvirtual and static methods.  not
	sure what the "specific" methods are all about.

---------- FROM THE sablevm-developer LIST ------------

>> unclear about "specific" methods in method_invoke.list

The non-"specific" method invocations are invocations of methods
declared in classes which are automatically loaded by the bootstrap
class loader.

The "specific" method invocations are used to invoke a "specific"
version of a method signature selected at runtime.  So, in the
"specific" case, you have an additional formal parameter for providing
a _svmt_method_info pointer.

For example, to invoke the <clinit> method of a class (part of class
initialization), the VM needs the _svmt_method_info which is specific
to that class.  It cannot use a generic _svmt_method_info from
method/class loaded at bootstrap time.
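
The difference can be sketched as follows.  This is a deliberately
simplified illustration: sketch_method_info is a hypothetical
stand-in for _svmt_method_info, and the function names and bodies are
invented, not those generated from method_invoke.list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-reduced stand-in for _svmt_method_info. */
typedef struct sketch_method_info
{
  const char *name;
  int (*body) (int);
} sketch_method_info;

static int body_double (int x) { return 2 * x; }
static int body_negate (int x) { return -x; }

/* A method resolved once, at bootstrap time, and kept by the VM. */
static const sketch_method_info bootstrap_method =
  { "double", body_double };

/* Non-"specific" invocation: the target is implied by the function. */
static int
invoke_bootstrap_method (int arg)
{
  return bootstrap_method.body (arg);
}

/* "Specific" invocation: an extra parameter selects the method
   at runtime, as needed for a per-class <clinit>. */
static int
invoke_specific_method (const sketch_method_info *method, int arg)
{
  assert (method != NULL && method->body != NULL);
  return method->body (arg);
}
```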

If you don't like the "specific" name, (I'm starting to dislike it
quite a bit), I'm open to suggestions for a replacement.

-------------------------------------------------------


method_invoke.m4.c:
method_invoke.m4.h:
	macros containing the bodies of methods declared in method_invoke.list

native.c:
native.h:
	code to enable native method execution in SableVM

native_interface.c:
	implementation of all JNI methods

native_interface.h:
	very small file defining the JNINativeInterface extern.

---------- FROM THE sablevm-developer LIST ------------

>> do you need native_interface.h?
>> 

These are remnants of the old SableVM code base where I was using
separate compilation units (as is most intuitive with auto* tools).

So, unless the "extern" needs a forward declaration, we could maybe
get rid of it.

[Order of inclusion in libsablevm.c can be a good technical reason to
keep a .h file. :-)]

-------------------------------------------------------

new_instance.c:
new_instance.h:
	methods to create objects and arrays.


predictor.m4.c:
predictor.m4.h:
	value predictor for speculative execution (not released yet)

prepare.c:
prepare.h:
	prepare interfaces and classes, but not method bodies.

prepare_code.c:
prepare_code.h:
	prepare method bodies (see Ch.2 of thesis)

pthread_rec_svm.c:
pthread_rec_svm.h:
	implementation of recursive locks on top of pthread

---------- FROM THE sablevm-developer LIST ------------

>> all type declarations should go in types.h .... this should be a
>> policy like for constants.h ... the stuff in system.h obviously
>> shouldn't be in there though.  % grep typedef * shows that only
>> pthread_rec_svm.* violates this.

pthread_rec_svm.* should remain so.  It is an implementation of
recursive mutexes on top of POSIX non-recursive mutexes.  These files
can easily be reused by other applications, with no dependency
whatsoever on other SableVM internal data structures & types.

This could go into a separate directory, yet I prefer to keep more
optimization opportunities by leaving them as part of the same
compilation unit.
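
For readers unfamiliar with the technique, here is one common way to
build a recursive lock from a plain mutex and a condition variable.
It is a sketch only and is not claimed to match pthread_rec_svm.c:
the rec_mutex type and the rec_mutex_* functions are hypothetical;
the pthread_* calls are the standard POSIX ones.

```c
#include <assert.h>
#include <pthread.h>

typedef struct rec_mutex
{
  pthread_mutex_t guard;	/* protects the fields below */
  pthread_cond_t released;	/* signaled when count drops to 0 */
  pthread_t owner;		/* meaningful only while count > 0 */
  unsigned count;		/* nesting depth of the owner */
} rec_mutex;

static int
rec_mutex_init (rec_mutex *m)
{
  assert (m != NULL);
  m->count = 0;
  if (pthread_mutex_init (&m->guard, NULL) != 0)
    return -1;
  if (pthread_cond_init (&m->released, NULL) != 0)
    {
      pthread_mutex_destroy (&m->guard);
      return -1;
    }
  return 0;
}

static void
rec_mutex_lock (rec_mutex *m)
{
  pthread_t self = pthread_self ();

  pthread_mutex_lock (&m->guard);
  if (m->count > 0 && pthread_equal (m->owner, self))
    m->count++;			/* re-entry by the current owner */
  else
    {
      while (m->count > 0)	/* wait until the owner lets go */
	pthread_cond_wait (&m->released, &m->guard);
      m->owner = self;
      m->count = 1;
    }
  pthread_mutex_unlock (&m->guard);
}

/* Returns 0 on success, -1 if the caller does not hold the lock. */
static int
rec_mutex_unlock (rec_mutex *m)
{
  int status = -1;

  pthread_mutex_lock (&m->guard);
  if (m->count > 0 && pthread_equal (m->owner, pthread_self ()))
    {
      if (--m->count == 0)
	pthread_cond_signal (&m->released);
      status = 0;
    }
  pthread_mutex_unlock (&m->guard);
  return status;
}
```

Nothing in the sketch depends on VM data structures, which is the
property that makes such code reusable elsewhere.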

-------------------------------------------------------

resolve.c:
resolve.h:
	code to resolve methods, classes, and interfaces when found
	in bytecode during execution.

splay_tree.list:
	list of splay tree kinds used within SableVM (there are 4, one
	each for types, gc_maps, sequences, and interface_method_signatures).

splay_tree.m4.c:
	macros to generate splay tree code for the kinds listed in
	splay_tree.list.

switch_threaded.m4:
	single line macro definition (why is this file necessary?)

system.c:
	system specific implementations

system.h:
	system specific header

thread.c:
thread.h:
	thread management functions (initialization, stopping for GC, etc.)

types.h:
	This file is really at the heart of SableVM.  It defines the data
   	structures used in SableVM.  When looking at the VM for a first
   	time, you should spend some time reading this file.

---------- FROM THE sablevm-developer LIST ------------

util.h:
	a few short utility C preprocessor macros ... not sure if these
	are defined elsewhere or not.  for example, it seems that the
	memset() wrapper should go into global_alloc.m4.c with the
	other memory management stuff

util.m4.c:
	some put/get macros.  seems like they complement those in util1.c
	but I'm not sure.

util1.c:
	various utilities for working with C code and for getting
	stack traces.  not sure why the put/get_BOOLEAN/REFERENCE_field/static
	methods are in here ...

util2.c:
	more utility code, this time mostly for working with Java data
	structures.  seems to me like the Java stuff should go in one util
	file and the C stuff in another.


>> util.h, util.m4.c., util1.c, and util2.c need to be cleaned up (see 
>> my comments in the file).  split
>> the util functions into two categories: those for working with the C 
>> language, and those for
>> working with Java data structures.  furthermore, util.h could be built 
>> from an m4 file and a list file it
>> seems.


Patches are welcome.  The reason it is not all in a single file is
because of compile dependencies (order of #include's).

-------------------------------------------------------

verifier.c:
verifier.h:
	bytecode verifier (not fully implemented).

vm_args.m4.c:
	macros to handle arguments to the VM from the command line (or from
	whoever creates it?)

vmlib.c:
vmlib.h:
	some JNI methods ... not sure if this file is perhaps obsolete (or at
	least part of it), because it seems some java.lang.* stuff is in here.
	the header file is empty.