JoJo - 5 months ago
JavaScript Question

Regex match varying number of words

Software - Adobe Professional XI

Programming - JavaScript, using regular expressions to match wildcard words

Background - I have multiple PDF drawings with a title block. Using JavaScript, digital signature fields are added based on the location of a word (found by matching it with a regex).

Currently I am testing whether the words for revision 1 are present in a drawing's title block.

The script searches for the revision number 1 followed by a date, a title (with a varying number of words) and four sets of initials.

The number 1 is static; the date, title and initials are all wildcards, as they differ for each drawing.

I am using regular expressions to match the words.

This part of the regular expression finds the number 1 and the date (this is working):

^1\s[0-9]{1,2}.[0-9]{1,2}.[0-9]{2}


The rest of the regular expression, which should match the title and the initials, is not working:

\s\w+(\s+\w+){1,8}
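To see why the title part of the regex in the script cannot pick out the initials on its own, here is a small check. It is plain JavaScript (runnable in Node or a browser console rather than Acrobat), and the sample string is the text after the date in the example line from the script. The variable names are illustrative only.

```javascript
// The title part of the question's regex, applied to the text after the date.
// \w+ also matches two-letter initials, so this part cannot tell where the
// title ends and the initials begin.
var afterDate = " THE REINFORCEMENT REVISED MM SB AE GM";
var m = afterDate.match(/\s\w+(\s+\w+){1,8}/);

console.log(m[0]); // the whole string, initials included
console.log(m[1]); // " GM": a repeated capture group keeps only its last repetition
```

So the title part happily swallows the initials, and its capture group only ever holds the last word it matched.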


If anyone can help with a regular expression that matches the title words and the initials, it would be much appreciated.

Once the regex matching works, I will split the text at each of the four sets of initials so the JavaScript can add digital signature fields at those locations.

Could you also help with how to split the words using a regex?
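For the splitting part, String.prototype.split accepts a regex, so splitting on /\s+/ breaks the line into words. Below is a minimal sketch; splitTitleBlockLine is a hypothetical helper, and it assumes the line has already passed the full regex check, so its first two words are the revision number and date and its last four words are the initials.

```javascript
// Hypothetical helper: split a matched title-block line into its parts.
// Assumes the line already matched the full regex, so it ends with exactly
// four sets of initials.
function splitTitleBlockLine(line) {
  var words = line.trim().split(/\s+/); // split on runs of whitespace
  return {
    revision: words[0],
    date: words[1],
    title: words.slice(2, -4).join(" "), // everything between date and initials
    initials: words.slice(-4)            // the last four words
  };
}

var parts = splitTitleBlockLine("1 26.05.16 THE REINFORCEMENT REVISED MM SB AE GM");
console.log(parts.title);    // THE REINFORCEMENT REVISED
console.log(parts.initials); // the four initials: MM, SB, AE, GM
```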

Here is the entire script (the JavaScript is working; I only need help with the regex):

var numWords = this.getPageNumWords(0); // number of words on page 0

// loop through the words on the page
for (var j = 0; j < numWords - 1; j++)
{
    // get a word pair to test
    var ckWords = this.getPageNthWord(0, j) + ' ' + this.getPageNthWord(0, j + 1);

    // example of the word string
    // 1 26.05.16 THE REINFORCEMENT REVISED MM SB AE GM

    // test the words
    if (ckWords.match(/^1\s[0-9]{1,2}.[0-9]{1,2}.[0-9]{2}\s\w+(\s+\w+){1,8}/))
    {
        console.println(ckWords);
    }
}
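One likely reason the title part never matches in this script: ckWords only ever holds two words (word j and word j + 1), while the regex needs at least four space-separated words (the 1, the date and two title words). The sketch below tests windows of several consecutive words instead. It runs outside Acrobat on a plain array; in Acrobat the array could be filled with this.getPageNthWord(0, i) for i from 0 to this.getPageNumWords(0) - 1. It assumes the words come back as in the example line (for instance, 26.05.16 as a single word). findRevisionLine and MAX_WORDS are hypothetical names, and the regex is the one from the answer below.

```javascript
// Sketch: test windows of several consecutive words instead of a word pair.
// `pageWords` stands in for the words returned by this.getPageNthWord(0, i)
// in Acrobat; it is a plain array here so the logic runs anywhere.
var titleBlockRe = /^1\s\d{1,2}\.\d{1,2}\.\d{2}\s\w+(?:\s+\w+){1,8}\s([A-Z]{2})\s([A-Z]{2})\s([A-Z]{2})\s([A-Z]{2})$/;

// Revision number + date + up to 9 title words + 4 initials = 15 words at most.
var MAX_WORDS = 15;

function findRevisionLine(words) {
  for (var j = 0; j < words.length; j++) {
    // Grow the window one word at a time, starting at word j.
    for (var n = 2; n <= MAX_WORDS && j + n <= words.length; n++) {
      var candidate = words.slice(j, j + n).join(" ");
      var match = candidate.match(titleBlockRe);
      if (match) {
        // start/count identify the words on the page; match[1..4] are the initials.
        return { start: j, count: n, initials: match.slice(1, 5) };
      }
    }
  }
  return null; // no revision-1 line found
}

// The question's regex matches the full line, but not a two-word pair:
var pairRe = /^1\s[0-9]{1,2}.[0-9]{1,2}.[0-9]{2}\s\w+(\s+\w+){1,8}/;
console.log(pairRe.test("1 26.05.16 THE REINFORCEMENT REVISED MM SB AE GM")); // true
console.log(pairRe.test("1 26.05.16"));                                       // false

var pageWords = ["DRAWING", "1", "26.05.16", "THE", "REINFORCEMENT", "REVISED",
                 "MM", "SB", "AE", "GM"];
console.log(findRevisionLine(pageWords)); // start 1, count 9, initials MM SB AE GM
```

Because the window grows from short to long, the first match is the shortest one ending in four two-letter words, which keeps it from running on into whatever text follows the initials on the page.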


(Image: PDF of the title block with its text)

Answer

Add the initials to the end of the regular expression so you can match them separately.

var ckWords = '1 26.05.16 THE REINFORCEMENT REVISED MM SB AE GM';

var match = ckWords.match(/^1\s\d{1,2}\.\d{1,2}\.\d{2}\s\w+(?:\s+\w+){1,8}\s([A-Z]{2})\s([A-Z]{2})\s([A-Z]{2})\s([A-Z]{2})$/);
console.log(match);

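This puts the four initials in capture groups 1 through 4 of the match (match[1] to match[4]). The title words are wrapped in a non-capturing group (?:...), so they do not take up a group number.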
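A short sketch of reading the initials out of the match, using the same regex and sample string as above. The null check is an addition for safety: String.prototype.match returns null when the text does not fit the pattern.

```javascript
var ckWords = '1 26.05.16 THE REINFORCEMENT REVISED MM SB AE GM';
var match = ckWords.match(/^1\s\d{1,2}\.\d{1,2}\.\d{2}\s\w+(?:\s+\w+){1,8}\s([A-Z]{2})\s([A-Z]{2})\s([A-Z]{2})\s([A-Z]{2})$/);

// match is null when the line doesn't fit the pattern, so check first.
if (match !== null) {
  // Groups 1-4 hold the initials; the (?:...) around the title words is
  // non-capturing, so it doesn't shift the group numbers.
  var initials = [match[1], match[2], match[3], match[4]];
  console.log(initials.join(" ")); // MM SB AE GM
}
```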

Also, remember that . has a special meaning in regular expressions (it matches any character except a line terminator), so you need to escape it as \. to match a literal period.
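A quick illustration of the difference, using the date part of the regex. The test strings are made up for the example.

```javascript
// Unescaped: "." matches any character except a line terminator.
var loose = /^1\s[0-9]{1,2}.[0-9]{1,2}.[0-9]{2}/;
// Escaped: "\." matches only a literal period.
var strict = /^1\s[0-9]{1,2}\.[0-9]{1,2}\.[0-9]{2}/;

console.log(loose.test("1 26/05/16"));  // true: "/" is accepted as a separator
console.log(strict.test("1 26/05/16")); // false
console.log(strict.test("1 26.05.16")); // true
```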
