I'm currently using the Stanford Parser, and it works great for splitting sentences. Most of our sentences come from AP, so it handles that task very well.
Here are the problems:
If you want to try sticking with the Stanford Tokenizer/Parser, look at the documentation page for the tokenizer.
If you just want to split sentences, you don't need to invoke the parser proper, and so you should be able to get away with a tiny amount of memory - a megabyte or two - by directly using DocumentPreprocessor.
While there is only limited customization of the tokenizer available, you can change the processing of quotes. You might want to try one of:
The first means no quote mapping of any kind; the second changes single or doubled ASCII quotes (if any) into left and right quotes as best it can.
And while the tokenizer splits words in various ways to match Penn Treebank conventions, you should be able to construct precisely the original text from the tokens returned (see the various other fields in the CoreLabel). Otherwise it's a bug.
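To make the round-trip idea concrete, here is a minimal sketch of rebuilding the original text from tokens. It does not call the Stanford library: `Token`, `reconstruct` and `sample` are hypothetical stand-ins, with fields named after the `word()`, `originalText()`, `before()` and `after()` accessors on CoreLabel. It assumes each token's `after` holds exactly the whitespace up to the next token, so only the first token's `before` is needed. As far as I know the real tokenizer only fills these fields when run in its invertible mode, so check the tokenizer documentation for that.

```java
import java.util.Arrays;
import java.util.List;

public class TokenRoundTrip {

    /** Hypothetical stand-in for a token; not the real CoreLabel class. */
    static final class Token {
        final String word;          // normalized form, e.g. -LRB- for "("
        final String originalText;  // the characters as they appeared in the input
        final String before;        // whitespace preceding the token
        final String after;         // whitespace following the token

        Token(String word, String originalText, String before, String after) {
            this.word = word;
            this.originalText = originalText;
            this.before = before;
            this.after = after;
        }
    }

    /** Rebuilds the input: leading whitespace, then each token's original text and trailing whitespace. */
    static String reconstruct(List<Token> tokens) {
        if (tokens.isEmpty()) {
            return "";
        }
        StringBuilder sb = new StringBuilder(tokens.get(0).before);
        for (Token t : tokens) {
            sb.append(t.originalText).append(t.after);
        }
        return sb.toString();
    }

    /** Hand-built tokens for: He said "hi" (twice). */
    static List<Token> sample() {
        return Arrays.asList(
            new Token("He", "He", "", " "),
            new Token("said", "said", " ", " "),
            new Token("``", "\"", " ", ""),     // opening quote, Treebank style
            new Token("hi", "hi", "", ""),
            new Token("''", "\"", "", " "),     // closing quote, Treebank style
            new Token("-LRB-", "(", " ", ""),
            new Token("twice", "twice", "", ""),
            new Token("-RRB-", ")", "", ""),
            new Token(".", ".", "", ""));
    }

    public static void main(String[] args) {
        System.out.println(reconstruct(sample()));
    }
}
```

Joining the normalized `word` values with spaces would instead give the Treebank-style string with `-LRB-` and doubled backquotes, which is why the original-text fields are the ones to use when you need the input back unchanged.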