user1629109 - 5 months ago
Java Question

Match a pattern and write the stream to a file using Java 8 Stream

I'm trying to read a huge file, extract the text within "quotes" from each line, put the extracted text into a set, and write the contents of the set to a file using Java 8 streams.


public class DataMiner {

    private static final Pattern quoteRegex = Pattern.compile("\"([^\"]*)\"");

    public static void main(String[] args) {
        String fileName = "c://exec.log";
        try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
            Set<String> dataSet = stream.
            // How do I perform the pattern match here?
            Files.write(Paths.get(fileName), dataSet);
        } catch (IOException e) {
        }
    }
}


Please help me. Thanks!

EDIT: Answers to the questions:

  1. No, a line never contains multiple quoted texts.

  2. I could have used a simple loop, but I want to use Java 8 streams.


Unfortunately, the Java 8 regular expression classes don't provide a stream of match results. They only offer Pattern.splitAsStream(), and splitting is not what you want here.

Note: A stream of match results was added in Java 9 as Matcher.results().
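If you can run on Java 9 or later, no helper class is needed. The following is only a rough sketch of that variant; the class name QuotedTextJava9 and the method extractQuoted are made up for illustration, and it assumes you want capture group 1 (the text between the quotes):

```java
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the original question or answer.
final class QuotedTextJava9 {
    private static final Pattern QUOTE_REGEX = Pattern.compile("\"([^\"]*)\"");

    // Collects the text between double quotes from every line (requires Java 9+).
    static Set<String> extractQuoted(Stream<String> lines) {
        return lines.flatMap(line -> QUOTE_REGEX.matcher(line).results()) // Stream<MatchResult>
                    .map(r -> r.group(1))                                 // text inside the quotes
                    .collect(Collectors.toSet());
    }
}
```

You would pass it the stream returned by Files.lines(...) and hand the resulting set to Files.write(...).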

You can however create a generic helper class for it yourself:

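// Needs java.util.{ArrayList, List}, java.util.regex.{Matcher, MatchResult, Pattern}, java.util.stream.Stream
public final class PatternStreamer {
    private final Pattern pattern;
    public PatternStreamer(String regex) {
        this.pattern = Pattern.compile(regex);
    }
    public Stream<MatchResult> results(CharSequence input) {
        List<MatchResult> list = new ArrayList<>();
        for (Matcher m = this.pattern.matcher(input); m.find(); )
            list.add(m.toMatchResult()); // snapshot of the current match
        return list.stream();
    }
}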

Then your code becomes easy by using flatMap():

private static final PatternStreamer quoteRegex = new PatternStreamer("\"([^\"]*)\"");
public static void main(String[] args) throws Exception {
    String inFileName = "c:\\exec.log";
    String outFileName = "c:\\exec_quoted.txt";
    try (Stream<String> stream = Files.lines(Paths.get(inFileName))) {
        Set<String> dataSet = stream.flatMap(quoteRegex::results)
                                    .map(r -> r.group(1)) // text between the quotes
                                    .collect(Collectors.toSet());
        Files.write(Paths.get(outFileName), dataSet);
    }
}

Since you only process one line at a time, the temporary List is fine. If the input string is very long and has a lot of matches, a Spliterator would be a better choice, because it can produce the matches lazily instead of collecting them all first. See the question "How do I create a Stream of regex matches?"
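For reference, a Spliterator-based variant might look roughly like the sketch below. It is not taken from the linked question; the class name LazyPatternStreamer is made up for illustration, and it assumes a sequential stream, since a Matcher must not be shared between threads:

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical lazy counterpart to a list-backed helper; works on Java 8.
final class LazyPatternStreamer {
    private final Pattern pattern;

    LazyPatternStreamer(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    // Returns a stream that calls Matcher.find() only when the next element is requested.
    Stream<MatchResult> results(CharSequence input) {
        Matcher matcher = pattern.matcher(input);
        Spliterator<MatchResult> spliterator =
            new Spliterators.AbstractSpliterator<MatchResult>(
                    Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
                @Override
                public boolean tryAdvance(Consumer<? super MatchResult> action) {
                    if (!matcher.find()) {
                        return false; // no more matches
                    }
                    action.accept(matcher.toMatchResult()); // snapshot of the current match
                    return true;
                }
            };
        return StreamSupport.stream(spliterator, false); // false = sequential stream
    }
}
```

It can be used with flatMap() in the same way as a list-backed helper, but no intermediate List is built, so a short-circuiting operation such as findFirst() stops searching after the first match.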