flyingmouse flyingmouse - 4 months ago 32
Java Question

How to split a .doc into several .doc using JAVA POI?

I am using

POI
to read
.doc
files, and I want to select some of the contents to form new
.doc
files. Specifically speaking, is it possible to write the content of a “paragraph” in the “range” to a new file? Thank you.

HWPFDocument doc = new HWPFDocument(fs);
Range range = doc.getRange();
for (int i = 0; i < range.numParagraphs(); i++) {
//here I wish to write the content in a Paragraph
//into a new .doc file "doc1""doc2"
//instead of doc.write(pathName) that only write one .doc file.
}

Answer

So here is the code that works with the current task. Here the criteria of selecting paragraphs is quite simple: paragraphs 11..20 go to the file "us.docx", and 21..30 - to "japan.docx".

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;


public class SplitDocs {

    public static void main(String[] args) {

        FileInputStream in = null;
        HWPFDocument doc = null;

        XWPFDocument us = null;
        XWPFDocument japan = null;
        FileOutputStream outUs = null;
        FileOutputStream outJapan = null;

        try {
            in = new FileInputStream("wto.doc");
            doc = new HWPFDocument(in);

            us = new XWPFDocument();
            japan = new XWPFDocument();

            Range range = doc.getRange();

            for (int parIndex = 0; parIndex < range.numParagraphs(); parIndex++) {  
                Paragraph paragraph = range.getParagraph(parIndex);

                String text = paragraph.text();
                System.out.println("***Paragraph" + parIndex + ": " + text);

                if ( (parIndex >= 11) && (parIndex <= 20) ) {
                    createParagraphInAnotherDocument(us, text);
                } else if ( (parIndex >= 21) && (parIndex <= 30) ) {
                    createParagraphInAnotherDocument(japan, text);
                }
            }

            outUs = new FileOutputStream("us.docx");
            outJapan = new FileOutputStream("japan.docx");
            us.write(outUs);
            japan.write(outJapan);

            in.close();
            outUs.close();
            outJapan.close();

        } catch (IOException e) {
            e.printStackTrace();
        }

    }

    private static void createParagraphInAnotherDocument(XWPFDocument document, String text)  {         XWPFParagraph newPar = document.createParagraph();
        newPar.createRun().setText(text, 0);
    }

}

I used .docx as the output as it is waaaaay easier to add new paragraphs to a .docx than to a .doc file. The method insertAfter(ParagraphProperties props, int styleIndex) for inserting a new Paragraph to a given range is now deprecated (i use POI version 3.10), and i couldn't find an easy and logical way to create a new Paragraph object in the empty .doc file. Whereas it's a pleasure to use straightforward and clean XWPFParagraph newPar = document.createParagraph();.

However, this code uses .doc as an input, as required in your task. Hope this will help :)

P.S. Here we use a simple choosing criteria, using paragraph indices. If you need something like font criteria, as you said, you will probably post another questions, or maybe you'll find the solution yourself. Anyway, with docx things get easier.

Comments