Nampa Gwakondo - 3 years ago
Java Question

Creating new annotation sets in GATE

I have started learning GATE and would like to use it to extract information from unstructured documents. The information I am interested in is dates, locations, event information and people's names: I want to find events that happened at a specific location on a specific date, together with the names of the people involved. I have been reading the GATE manual, which is how I got a first idea of how to build a pipeline. However, I cannot figure out how to create my own annotation types and make sure they are added to a new annotation set, which should appear under the annotation sets on the right. I found similar questions such as "GATE - How to create a new annotation SET?", but they didn't help me either.

Let me explain what I did so far:


  1. Created a .lst file for my new NE and put it under the ANNIE resources/gazetteer directory

  2. Added the .lst file description to the list.def file

  3. Identified my patterns in the document, e.g. date formats like ddmm, dd.mm.yyyy

  4. Wrote a JAPE rule for each pattern in a separate .jape file

  5. Added the JAPE file names to the main.jape file

  6. Loaded the PR and my document into GATE

  7. Ran the application



This is what my JAPE rule looks like for one date format:

Phase: datesearching
Input: Token Lookup SpaceToken
Options: control = appelt

////////////////////////////////////Macros
//Initialization of regular expressions
Macro: DAY_ONE
({Token.kind == number,Token.category==CD, Token.length == "1"})

Macro: C
({Token.kind == number,Token.category==CD, Token.length == "2"})

Macro: YEAR
({Token.kind == number,Token.category==CD, Token.length == "4"})

Macro: MONTH
({Lookup.majorType=="Month"})

Rule: ddmmyyydash
(
(DAY_ONE|DAY_TWO)
({Token.string == ","}|{Token.string == "."} |{Token.string == "-"})
(MONTH)
({Token.string == ","}|{Token.string == "."} |{Token.string == "-"})
(YEAR)
)
:ddmmyyyydash
-->
:ddmmyyyydash.DateMonthYearDash= {rule = "ddmmyyyydash"}


Can someone please help me with what I should do to make sure that DateMonthYearDash is created as a new annotation set? How do I do it? Thanks a lot.

When I change the outputASName of the JAPE Transducer, the new set does not appear like the rest. This is how it looks:

[screenshot: annotation set list]

Answer

As already stated in the question you mention (GATE - How to create a new annotation SET?), you have two options:

  1. Change the outputASName of your JAPE transducer PR.
  2. Use the Annotation Set Transfer PR to copy or move the desired annotations from one annotation set to another.

JAPE function - explanation

A JAPE transducer (like many other GATE PRs) simply takes some input annotations and creates new output annotations based on them. The names of the input and output annotation sets are configured by the inputASName and outputASName run-time parameters: inputASName says where the transducer looks for input annotations, and outputASName says where it puts the output annotations.

What should be where

The input annotation set must contain the necessary input annotations before the JAPE transducer PR is executed. These annotations are usually created by preceding PRs in the pipeline. If they are missing from that set, the transducer will not see them and will not produce anything.

The output annotation set may be empty or may already contain anything before the JAPE execution; it doesn't matter. The important thing is that the new output annotations (DateMonthYearDash in your case) have been created there once the JAPE transducer PR has finished.
So after a successful JAPE execution you should see the new annotations there.
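
To make the role of the output set more concrete, here is a sketch of the same kind of rule with its right-hand side written in Java instead of the short `:label.Type = {...}` form. It is meant as an illustration only and has not been run against your document. The label name `date` and the simplified left-hand side are assumptions made for the example. `bindings` and `outputAS` are the variables JAPE makes available inside a Java right-hand side.

```jape
Phase: datesearching
Input: Token Lookup
Options: control = appelt

Rule: ddmmyyyydash
(
    {Token.kind == number, Token.length == "1"}
    {Token.string == "-"}
    {Lookup.minorType == "month"}
    {Token.string == "-"}
    {Token.kind == number, Token.length == "4"}
):date
-->
{
    // Annotations matched by the left-hand side under the label "date"
    AnnotationSet matched = bindings.get("date");

    FeatureMap features = Factory.newFeatureMap();
    features.put("rule", "ddmmyyyydash");

    // outputAS is the set selected by the transducer's outputASName parameter
    // (the default, unnamed set when the parameter is left empty)
    outputAS.add(matched.firstNode(), matched.lastNode(),
                 "DateMonthYearDash", features);
}
```

Written this way, it is visible that the grammar only decides the annotation type (DateMonthYearDash); which annotation set receives it is decided by the PR's parameters, not by the rule.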

Some terminology

Note that annotation sets have names, while annotations have a type, an id, offsets, features and the annotation set they belong to. So DateMonthYearDash in your grammar is an annotation type, not an annotation set.


JAPE correction

I found some issues in your JAPE grammar:

  1. Don't include SpaceToken in the Input line unless you explicitly use it in your grammar or you are sure there will be none inside the pattern. See also: Concept of Space Token in JAPE
  2. ({Lookup.majorType=="Month"}) -> ({Lookup.minorType=="month"})
  3. (DAY_ONE|DAY_TWO) -> (DAY_ONE), because no DAY_TWO macro is defined (the two-digit macro is named C)

Result after the corrections, with the grammar run after the ANNIE pipeline on the document "9 - January - 2017": [screenshot: GATE doc output]

JAPE grammar after corrections:

// SpaceToken is no longer listed in Input, so spaces between tokens are ignored
Phase: datesearching
Input: Token Lookup
Options: control = appelt

// One-digit day, e.g. 9
Macro: DAY_ONE
({Token.kind == number,Token.category==CD, Token.length == "1"})

// Four-digit year, e.g. 2017
Macro: YEAR
({Token.kind == number,Token.category==CD, Token.length == "4"})

// Month name found by the gazetteer
Macro: MONTH
({Lookup.minorType=="month"})

Rule: ddmmyyydash
(
    (DAY_ONE)
    ({Token.string == ","}|{Token.string == "."} |{Token.string == "-"})
    (MONTH)
    ({Token.string == ","}|{Token.string == "."} |{Token.string == "-"})
    (YEAR)
)
:ddmmyyyydash
-->
:ddmmyyyydash.DateMonthYearDash= {rule = "ddmmyyyydash"}
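
If two-digit days should match too, an alternative to dropping DAY_TWO is to define it. The following is a sketch of that variant, not something tested against the original document. It assumes the question's macro C was intended to be DAY_TWO, and it relies on the same ANNIE annotations as the grammar above (Token.category from the POS tagger, Lookup from the gazetteer).

```jape
Phase: datesearching
Input: Token Lookup
Options: control = appelt

// One-digit day, e.g. 9
Macro: DAY_ONE
({Token.kind == number, Token.category == CD, Token.length == "1"})

// Assumption: this is what the question's "Macro: C" was meant to be
Macro: DAY_TWO
({Token.kind == number, Token.category == CD, Token.length == "2"})

Macro: YEAR
({Token.kind == number, Token.category == CD, Token.length == "4"})

Macro: MONTH
({Lookup.minorType == "month"})

Rule: ddmmyyyydash
(
    ((DAY_ONE) | (DAY_TWO))
    ({Token.string == ","} | {Token.string == "."} | {Token.string == "-"})
    (MONTH)
    ({Token.string == ","} | {Token.string == "."} | {Token.string == "-"})
    (YEAR)
)
:ddmmyyyydash
-->
:ddmmyyyydash.DateMonthYearDash = {rule = "ddmmyyyydash"}
```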

What to do when JAPE does not produce anything

You have to investigate the input annotations and "debug" your JAPE grammar. Usually some expected input annotation is missing, or there is an extra annotation you did not expect to be there. GATE has a nice view for this purpose: the annotation stack. Some features of the input annotations can also have a different name or value than you expected (e.g. which is correct: {Lookup.majorType=="Month"} or {Lookup.minorType=="month"}?).

By "debugging" a JAPE grammar I mean: simplify the rule until it starts working, and keep trying it on a simple document where it should match for sure. In your case you can try it without the (DAY_ONE) part. If it still doesn't work, try only (MONTH)({Token.string == "-"})(YEAR), or even (MONTH) alone, until you find the mistake in the grammar.
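
The simplification steps can be sketched as a small throwaway grammar. This is only an illustration: the phase name, rule names and the DebugMonth / DebugMonthYear annotation types are made up for the example, and the Token.category test is left out on purpose so the rules depend on as few input annotations as possible. With control = all, every rule that matches leaves its own annotation, so the first debug annotation that is missing points at the pattern element that fails.

```jape
Phase: datedebug
Input: Token Lookup
Options: control = all

// Step 1: does the gazetteer mark the month at all?
Rule: DebugMonth
(
    {Lookup.minorType == "month"}
):m
-->
:m.DebugMonth = {rule = "DebugMonth"}

// Step 2: month, dash, four-digit year
Rule: DebugMonthYear
(
    {Lookup.minorType == "month"}
    {Token.string == "-"}
    {Token.kind == number, Token.length == "4"}
):my
-->
:my.DebugMonthYear = {rule = "DebugMonthYear"}
```

Run it on a one-line document such as "9 - January - 2017" and check in the annotation sets view which of the two debug types appear. If even DebugMonth is missing, the problem is in the input annotations (gazetteer or inputASName) rather than in the date rule itself.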
