c3po - 6 months ago
Java Question

SWIG get returntype from String as String array in java

For a small Java project I needed to interact with existing code written in C, so to make things easy (unfortunately I'm not a C/C++ programmer) I decided to use SWIG.

The generated wrapper code seems to work. However, when I call a function that is supposed to give me a NUL-delimited list of strings (that's what the C function is supposed to return, if I'm not mistaken), the wrapped code only returns the first String of the expected list. I assume the correct return type in Java would be a String array instead of a String. Is this assumption correct, and can this be handled by specifying a typemap in the SWIG interface file? Or am I on the wrong track?

The function is declared in the C header file as:

DllImport char *GetProjects dsproto((void));


The resulting declaration in the generated JNI Java file:

public final static native String GetProjects();


Any help/pointers would be greatly appreciated!

Answer

Solution 1 - Java

There are a bunch of different ways you can solve this problem in SWIG. I've started with a solution that only requires writing a little extra Java (inside the SWIG interface), which gets applied automatically to make your function return String[] with the semantics you want.

First up I wrote a small test.h file that lets us exercise the typemaps we're working towards:

static const char *GetThings(void) {
  return "Hello\0World\0This\0Is\0A\0Lot\0Of Strings\0";
}

Nothing special, just a single function that packs multiple strings into one buffer, each followed by \0, with the whole list terminated by a double \0 (the last one is implicit in C string literals).
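To make the layout concrete, here is a small Java sketch that builds the same kind of buffer from a list of strings. It assumes UTF-8 as the byte encoding, and the class PackedStrings and its pack method are hypothetical names used only for illustration. Note that a string which itself contains a \0 cannot be represented in this format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class PackedStrings {
    // Hypothetical helper: encodes each string as UTF-8 and follows it with a
    // single 0 byte, then appends one more 0 byte to mark the end of the list.
    static byte[] pack(List<String> items) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String s : items) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            out.write(b, 0, b.length);
            out.write(0); // terminator for this string
        }
        out.write(0); // extra terminator: an empty entry ends the list
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] packed = pack(Arrays.asList("Hello", "World"));
        // prints [72, 101, 108, 108, 111, 0, 87, 111, 114, 108, 100, 0, 0]
        System.out.println(Arrays.toString(packed));
    }
}
```

A buffer built like this can be handy for exercising Java-side parsing code without loading a native library.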

I then wrote the following SWIG interface to wrap it:

%module test
%{
#include "test.h"
%}

%include <carrays.i>

%array_functions(signed char, ByteArray);

%apply SWIGTYPE* { const char *GetThings };

%pragma(java) moduleimports=%{
import java.util.ArrayList;
import java.io.ByteArrayOutputStream;
%}

%pragma(java) modulecode=%{
static private String[] pptr2array(long in, boolean owner) {
  SWIGTYPE_p_signed_char raw=null;
  try {
    raw = new SWIGTYPE_p_signed_char(in, owner);
    ArrayList<String> tmp = new ArrayList<String>();
    int pos = 0;
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    while (ByteArray_getitem(raw, pos) != 0) {
      byte c;
      while ((c = ByteArray_getitem(raw, pos++)) != 0) {
        bos.write(c);
      }
      tmp.add(bos.toString());
      bos.reset();
    }
    return tmp.toArray(new String[tmp.size()]);
  }
  finally {
    if (owner && null != raw) {
      delete_ByteArray(raw);
    }
  }
}
%}

%typemap(jstype) const char *GetThings "String[]";
%typemap(javaout) const char *GetThings {
  return pptr2array($jnicall, $owner);
}

%include "test.h"

Essentially, this uses the carrays.i SWIG library file to expose a few functions that let us get, set and delete array elements through a raw pointer, exactly as we would in C. By default SWIG special-cases char * and maps it to java.lang.String, which is why the result was cut off at the first \0. The %apply line switches that off for just the function we're looking at, so its return value is treated as an opaque pointer instead. Using signed char for the array functions gets us what we want: element access that maps to byte in Java rather than to String.
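One detail worth checking is what a signed byte does to non-ASCII data. The sketch below (the class name SignedBytes is hypothetical, and it assumes the C strings are UTF-8 encoded) shows that the bytes of a multibyte character come out negative in Java but are never 0, so the "!= 0" checks in pptr2array still stop only at a real terminator.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SignedBytes {
    public static void main(String[] args) {
        // U+00E9 ("e" with acute accent) encodes to two bytes in UTF-8: 0xC3 0xA9.
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8);

        // Java's byte is signed, so both bytes show up as negative values.
        System.out.println(Arrays.toString(utf8)); // [-61, -87]

        // Neither byte is 0, so a "!= 0" terminator test treats them as data.
        for (byte b : utf8) {
            System.out.println(b != 0); // true (printed twice)
        }

        // Masking with 0xFF recovers the unsigned value when it is needed.
        System.out.println(utf8[0] & 0xFF); // 195
    }
}
```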

The jstype typemap simply changes the resulting function's return type to what we want: String[]. The javaout typemap describes how to convert what the JNI call returns (a long, because we deliberately stopped it from being wrapped as a normal NUL-terminated string) into that String[], using a bit of extra Java we wrote inside the module (pptr2array) to do the work.

Inside pptr2array we build up each String byte by byte and collect the results into the output array. I used an ArrayList because I'd rather grow it dynamically than make two passes over the data. A ByteArrayOutputStream is a neat way to build up a byte array one byte at a time, which has two main advantages:

  1. Multibyte encodings such as UTF-8 can be decoded correctly, because all of a string's bytes are decoded together (bos.toString() uses the platform's default charset). Casting each byte to char and appending it to a String(Builder) individually would mangle any multibyte character.
  2. We can reuse the same ByteArrayOutputStream, and therefore its internal buffer, for every string. Not really a big deal at this scale, but there's no harm in doing it from day 1.
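The loop in pptr2array can be tried out without any native code by running it over a plain byte[]. This is a sketch under a couple of assumptions: the hypothetical class SplitPacked and its split method read from an array instead of calling ByteArray_getitem, and each entry is decoded explicitly as UTF-8 rather than with the platform default charset that bos.toString() uses. The perByte method shows the per-byte char cast that point 1 warns against.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitPacked {
    // Pure-Java counterpart of pptr2array. Expects the double-NUL-terminated
    // layout; without the final 0 byte it runs off the end of the array.
    static String[] split(byte[] raw) {
        List<String> tmp = new ArrayList<>();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        int pos = 0;
        // A 0 byte at the start of an entry (an empty string) ends the list.
        while (raw[pos] != 0) {
            byte c;
            // Copy bytes up to this entry's terminator; pos++ also consumes
            // the terminator, so pos ends up at the start of the next entry.
            while ((c = raw[pos++]) != 0) {
                bos.write(c);
            }
            // Decode the whole entry at once so multibyte UTF-8 stays intact.
            tmp.add(new String(bos.toByteArray(), StandardCharsets.UTF_8));
            bos.reset(); // reuse the same buffer for the next entry
        }
        return tmp.toArray(new String[0]);
    }

    // The naive alternative: widen each byte to a char on its own.
    static String perByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two entries, "cafe" and "naive" with accented letters, in the packed layout.
        byte[] packed = "caf\u00e9\0na\u00efve\0\0".getBytes(StandardCharsets.UTF_8);
        String[] parts = split(packed);
        System.out.println(parts.length);                  // 2
        System.out.println(parts[0].equals("caf\u00e9"));  // true
        System.out.println(parts[1].equals("na\u00efve")); // true

        // Per-byte widening turns the two UTF-8 bytes of U+00E9 into two chars.
        String naive = perByte("caf\u00e9".getBytes(StandardCharsets.UTF_8));
        System.out.println(naive.length());                // 5
    }
}
```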

One more point to note: for $owner to be set correctly, indicating whether we're expected to free() the memory returned from the C function, you'll need to use %newobject. See the discussion of $owner in the SWIG documentation.


Solution 2 - JNI

If you prefer, you can write almost the same solution entirely in typemaps, making a few JNI calls instead:

%module test
%{
#include "test.h"
#include <assert.h>
%}

%typemap(jni) const char *GetThings "jobjectArray";
%typemap(jtype) const char *GetThings "String[]";
%typemap(jstype) const char *GetThings "String[]";
%typemap(javaout) const char *GetThings {
  return $jnicall;
}
%typemap(out) const char *GetThings {
  size_t count = 0;
  const char *pos = $1;
  while (*pos) {
    while (*pos++); // SKIP
    ++count;
  }
  $result = JCALL3(NewObjectArray, jenv, count, JCALL1(FindClass, jenv, "java/lang/String"), NULL);
  pos = $1;
  size_t idx = 0;
  while (*pos) {
    jobject str = JCALL1(NewStringUTF, jenv, pos);
    assert(idx<count);
    JCALL3(SetObjectArrayElement, jenv, $result, idx++, str);
    JCALL1(DeleteLocalRef, jenv, str); // the array now holds the string; release our local ref
    while (*pos++); // SKIP
  }
  //free($1); // Iff you need to free the C function's return value
}

%include "test.h"

Here we've done essentially the same thing, but added 3 more typemaps. The jtype and jni typemaps tell SWIG what the native method returns, as a Java type and a C (JNI) type respectively. The javaout typemap gets simpler: all it does is pass the String[] straight through.

The out typemap, however, is where the work happens. We allocate a Java String[] in the native code. Because a Java array's size is fixed when it is created, we first make one pass over the buffer simply to count how many strings there are (there's no neat way of doing this in one pass in C). Then, in a second pass, we call NewStringUTF on each string and store the result in the right slot of the array we created.
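For comparison, the same count-then-fill structure can be written in plain Java over a byte[]. This only illustrates the two-pass approach, not the typemap itself: the class TwoPass is hypothetical, and it decodes standard UTF-8 with new String(...), whereas NewStringUTF expects JNI's modified UTF-8 (the two differ only for NUL and supplementary characters).

```java
import java.nio.charset.StandardCharsets;

public class TwoPass {
    // Mirrors the shape of the "out" typemap: arrays have a fixed size, so the
    // first pass counts entries and the second pass fills the array.
    static String[] split(byte[] raw) {
        // Pass 1: count entries (same loop structure as the C code).
        int count = 0;
        int pos = 0;
        while (raw[pos] != 0) {
            while (raw[pos++] != 0) { } // skip this entry and its terminator
            ++count;
        }

        String[] result = new String[count];

        // Pass 2: decode each entry into its slot.
        pos = 0;
        int idx = 0;
        while (raw[pos] != 0) {
            int start = pos;
            while (raw[pos] != 0) {
                ++pos;
            }
            result[idx++] = new String(raw, start, pos - start, StandardCharsets.UTF_8);
            ++pos; // step past this entry's terminator
        }
        return result;
    }

    public static void main(String[] args) {
        // Same contents as GetThings() in test.h; Java needs the final \0 spelled out.
        byte[] packed = "Hello\0World\0This\0Is\0A\0Lot\0Of Strings\0\0"
                .getBytes(StandardCharsets.UTF_8);
        String[] parts = split(packed);
        System.out.println(parts.length);            // 7
        System.out.println(String.join(",", parts)); // Hello,World,This,Is,A,Lot,Of Strings
    }
}
```

The trade-off is the same one mentioned for Solution 1: two passes avoid a growable list, at the cost of walking the buffer twice.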

All that remains then is to free the value the function returned, if required. (In my example it's a const char * string literal, so we don't free it.)
