excitive - 5 months ago
Vb.net Question

Reading PDF Bookmarks in VB.NET using iTextSharp

I am making a tool that scans PDF files and searches for text in both the PDF bookmarks and the body text. I am using Visual Studio 2008 and VB.NET with iTextSharp.

How do I load the list of bookmarks from an existing PDF file?

Answer

It depends on what you mean by "bookmarks".

You want the outlines (the entries that are visible in the bookmarks panel):

The CreateOutlineTree example shows how to use the SimpleBookmark class to create an XML file containing the complete outline tree (in PDF jargon, bookmarks are called outlines).

Java:

PdfReader reader = new PdfReader(src);
List<HashMap<String, Object>> list = SimpleBookmark.getBookmark(reader);
SimpleBookmark.exportToXML(list,
        new FileOutputStream(dest), "ISO8859-1", true);
reader.close();

C#:

PdfReader reader = new PdfReader(pdfIn);
// Read the outline tree as a list of dictionaries
var list = SimpleBookmark.GetBookmark(reader);
using (MemoryStream ms = new MemoryStream()) {
    // Serialize the outline tree to XML in memory
    SimpleBookmark.ExportToXML(list, ms, "ISO8859-1", true);
    ms.Position = 0;
    using (StreamReader sr = new StreamReader(ms)) {
        return sr.ReadToEnd();
    }
}

The list object can also be used to examine the different bookmark elements one by one programmatically (this is all explained in the official documentation).
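As a rough illustration of examining the entries one by one, the sketch below walks such a list recursively and collects the titles that contain a search string. It does not call iText itself: sampleOutline() is a hand-built, hypothetical stand-in for the list returned by SimpleBookmark.getBookmark(reader), and the class and method names are made up for this example. It assumes that each entry stores its text under the "Title" key and its children under the "Kids" key, so check those keys against the iText version you are using.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;

public class BookmarkSearch {

    /** Recursively collects the titles of all outline entries that contain the search text. */
    @SuppressWarnings("unchecked")
    static void findTitles(List<HashMap<String, Object>> bookmarks, String needle, List<String> hits) {
        if (bookmarks == null) {
            return; // entry without children
        }
        String lowerNeedle = needle.toLowerCase(Locale.ROOT);
        for (HashMap<String, Object> entry : bookmarks) {
            // Assumption: the visible bookmark text is stored under "Title".
            Object title = entry.get("Title");
            if (title != null && title.toString().toLowerCase(Locale.ROOT).contains(lowerNeedle)) {
                hits.add(title.toString());
            }
            // Assumption: nested entries are stored as a list under "Kids".
            findTitles((List<HashMap<String, Object>>) entry.get("Kids"), needle, hits);
        }
    }

    /** Hypothetical stand-in for the list returned by SimpleBookmark.getBookmark(reader). */
    static List<HashMap<String, Object>> sampleOutline() {
        HashMap<String, Object> child = new HashMap<>();
        child.put("Title", "1.1 Installing the tool");

        List<HashMap<String, Object>> kids = new ArrayList<>();
        kids.add(child);

        HashMap<String, Object> parent = new HashMap<>();
        parent.put("Title", "1 Getting started");
        parent.put("Kids", kids);

        HashMap<String, Object> other = new HashMap<>();
        other.put("Title", "2 Reference");

        List<HashMap<String, Object>> outline = new ArrayList<>();
        outline.add(parent);
        outline.add(other);
        return outline;
    }

    public static void main(String[] args) {
        List<String> hits = new ArrayList<>();
        findTitles(sampleOutline(), "install", hits);
        for (String hit : hits) {
            System.out.println(hit);
        }
    }
}
```

With a real document you would pass the result of SimpleBookmark.getBookmark(reader) instead of sampleOutline(). The same recursion should carry over to VB.NET or C#, where iTextSharp returns a list of dictionaries rather than hash maps.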

You want the named destinations (specific places in the document you can link to by name):

If you meant named destinations instead, you need the SimpleNamedDestination class, as shown in the LinkActions example:

Java:

PdfReader reader = new PdfReader(src);
HashMap<String,String> map = SimpleNamedDestination.getNamedDestination(reader, false);
SimpleNamedDestination.exportToXML(map, new FileOutputStream(dest),
        "ISO8859-1", true);
reader.close();

C#:

PdfReader reader = new PdfReader(src);
// false = read the destinations stored as PDF string objects
Dictionary<string,string> map = SimpleNamedDestination
    .GetNamedDestination(reader, false);
using (MemoryStream ms = new MemoryStream()) {
    // Serialize the named destinations to XML in memory
    SimpleNamedDestination.ExportToXML(map, ms, "ISO8859-1", true);
    ms.Position = 0;
    using (StreamReader sr = new StreamReader(ms)) {
        return sr.ReadToEnd();
    }
}

The map object can also be used to examine the different named destinations one by one programmatically. Note the Boolean parameter that is used when retrieving the named destinations. Named destinations can be stored using a PDF name object as name, or using a PDF string object. The Boolean parameter indicates whether you want the former (true = stored as PDF name objects) or the latter (false = stored as PDF string objects) type of named destinations.
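To give an idea of what examining the map could look like, here is a small sketch that filters the destination names by a search string and reads a page number from each value. It works on a plain map, so the sample data in main is hypothetical, as are the class and method names. It also assumes that each value is a space-separated string beginning with the page number (something like "2 XYZ 36 806 0"); verify that layout against your own files before relying on it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class NamedDestinationSearch {

    /** Returns name -> page number for every destination whose name contains the search text. */
    static Map<String, Integer> findDestinations(Map<String, String> destinations, String needle) {
        Map<String, Integer> hits = new TreeMap<>();
        for (Map.Entry<String, String> entry : destinations.entrySet()) {
            if (!entry.getKey().contains(needle)) {
                continue;
            }
            // Assumption: the value starts with the page number, followed by the fit type.
            String firstToken = entry.getValue().trim().split("\\s+")[0];
            try {
                hits.put(entry.getKey(), Integer.parseInt(firstToken));
            } catch (NumberFormatException e) {
                // The value does not follow the assumed layout, so this entry is skipped.
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for SimpleNamedDestination.getNamedDestination(reader, false).
        Map<String, String> map = new HashMap<>();
        map.put("chapter1", "1 XYZ 36 806 0");
        map.put("chapter2", "5 Fit");
        map.put("appendix", "9 FitH 700");

        for (Map.Entry<String, Integer> hit : findDestinations(map, "chapter").entrySet()) {
            System.out.println(hit.getKey() + " is on page " + hit.getValue());
        }
    }
}
```

The result is returned in a TreeMap only so that the names come back in a stable, sorted order.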

Named destinations are predefined targets in a PDF file that can be found through their name. Although the official name is named destinations, some people refer to them as bookmarks too (but when we say bookmarks in the context of PDF, we usually want to refer to outlines).