
Speeding up bulk Groovy eval() with caching

I am writing a program for bulk processing Excel files. Each row's data is put into a map, and the filename and the sheet name determine which script processes the fetched data. These scripts are not bundled into my program; they are not even classes implementing a specific interface.

This is the processing loop logic:

excelfile.eachLineOnSheet { line, sheet ->
  def data = extractData()
  def lineProcessorScript = determineLineProcessor(excelfile, sheet)

  Eval.xy(data, outputfile, lineProcessorScript)
}


Of course this is easy, but on large files I'd like to improve performance. First, I cached the line processors' source code, so that the .groovy files are read only once.
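
For illustration, a minimal sketch of that first step, assuming a plain map keyed by script path (the names scriptSourceCache and lineProcessorPath are mine, not from the original code):

// read each .groovy file only once and remember its text
def scriptSourceCache = [:].withDefault { path -> new File(path).text }

excelfile.eachLineOnSheet { line, sheet ->
  def data = extractData()
  def lineProcessorPath = determineLineProcessor(excelfile, sheet)

  // the file I/O is now cached, but the script text is still recompiled on every call
  Eval.xy(data, outputfile, scriptSourceCache[lineProcessorPath])
}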

Is it possible to make Eval.xy faster by caching the compiled script somehow?
I'd like to keep my scripts simple, so that they do not implement any interface or the like.


@Binil Thomas's answer helped me get started. I looked at the Groovy sources and saw that GroovyClassLoader has built-in caching, but on the code path Eval uses, the caching is turned off:

private Class parseClass(final GroovyCodeSource codeSource) throws CompilationFailedException {
  // Don't cache scripts
  return loader.parseClass(codeSource, false);
}

Why not cache scripts? That's exactly what I needed. :-) So I re-implemented what Eval does, based on the sources, and this came out:

// gcl is a GroovyClassLoader and cachedLineProcessorCodes a map (script path -> compiled Class),
// both created once outside the loop; InvokerHelper is org.codehaus.groovy.runtime.InvokerHelper
lineprocessors.each {

  // compile each .groovy file only once; parseClass(gsc, true) lets the class loader cache it
  if (cachedLineProcessorCodes[it] == null) {
    def gsc = new GroovyCodeSource(new File(it).getText(), it, 'DEFAULT_CODE_BASE')
    Class cc = gcl.parseClass(gsc, true)
    cachedLineProcessorCodes[it] = cc
  }

  // bind x and y just like Eval.xy does
  def binding = new Binding()
  binding.setVariable("x", linedata)
  binding.setVariable("y", lineProcFiles[it])

  // create a Script instance from the cached class and run it
  Script sc = InvokerHelper.createScript(cachedLineProcessorCodes[it], binding)
  sc.run()

  // this replaces the old, much slower call:
  //Eval.xy linedata, lineProcFiles[it], new File(it).getText()
}


In my case, when 7900 lines were processed by the Groovy scripts, the runtime decreased from ~73 s to ~5 s.
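
For completeness, a rough sketch of how this cached-class approach could slot back into the processing loop from the question; the compileCached helper and the cache/loader setup are my own illustration, not part of the original code:

import org.codehaus.groovy.runtime.InvokerHelper

def gcl = new GroovyClassLoader()
def cachedLineProcessorCodes = [:]   // script path -> compiled Class

// compile each line-processor script at most once, then reuse the compiled Class
def compileCached = { String path ->
  if (cachedLineProcessorCodes[path] == null) {
    def gsc = new GroovyCodeSource(new File(path).getText(), path, 'DEFAULT_CODE_BASE')
    cachedLineProcessorCodes[path] = gcl.parseClass(gsc, true)
  }
  cachedLineProcessorCodes[path]
}

excelfile.eachLineOnSheet { line, sheet ->
  def data = extractData()
  def scriptPath = determineLineProcessor(excelfile, sheet)

  // bind x and y the same way Eval.xy would, then run the cached script class
  def binding = new Binding()
  binding.setVariable("x", data)
  binding.setVariable("y", outputfile)
  InvokerHelper.createScript(compileCached(scriptPath), binding).run()
}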