lawyeR - 4 months ago
LaTeX Question

Inserting bold text with knitr and LaTeX for terms that have already been indexed

My PDF, produced by knitr and LaTeX in RStudio, has more than 200 indexed terms. I realized too late that it would help to bold those indexed terms so that I can spot them in the PDF. It seems plausible that the bolding can be automated.

The tiny text vector below shows what text in the .Rnw script looks like, except that each backslash is escaped with an additional backslash so that the strings are valid in R. For variety, the second string has a space between the index entry and the indexed words, and the third string is a non-indexed example. None of my indexed terms is longer than five words.

text <- c("blah blah \\index{words}words ramble on", "more blah more blah\\index{space words} space words ramble on",
"final blah\\textbf{bold words} ramble on")

library(stringr)


My attempts at a positive look-behind, using regex and the stringr package, to spot '\index{' and pull out the indexed word(s) have failed. I hoped the regex below would say, "if you find the word 'index' followed by an open brace, five or fewer words, and a close brace, extract the words." Nope:

wd <- str_extract(string = text, pattern = "(?<=index{\\w{1:5}})\\w+{1:5}")
Error in stri_match_first_regex(string, pattern, opts_regex = attr(pattern, :
Error in {min,max} interval. (U_REGEX_BAD_INTERVAL)
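The error most likely comes from the interval syntax: ICU (the regex engine behind stringr) writes intervals as `{min,max}` with a comma, so `{1:5}` is rejected, and the unescaped `{` right after `index` is also read as the start of an interval rather than a literal brace. A minimal sketch that avoids the look-behind altogether and uses a capture group with `str_match()` instead; the pattern mirrors the five-word limit from the question:

```r
library(stringr)

text <- c("blah blah \\index{words}words ramble on",
          "more blah more blah\\index{space words} space words ramble on",
          "final blah\\textbf{bold words} ramble on")

# Literal "\index{" (backslash and brace escaped), then a capture group holding
# one word followed by zero to four more words (at most five), then a literal "}".
# The interval is written {0,4} with a comma, not {0:4}.
m <- str_match(text, "\\\\index\\{(\\w+(?:\\s+\\w+){0,4})\\}")

# Column 1 is the full match; column 2 is the captured index term (NA if no match).
m[, 2]
# returns "words", "space words", NA
```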


Would someone be good enough to set me straight on how to extract the word(s) in braces? To be clear, my eventual goal is to enclose the index term (the word(s) following the open brace) in \textbf{ }. If you can give guidance on that step too, even better!

EDIT
Following the comments of Wiktor Stribiżew, to clarify: I would like all indexed words to appear in bold in the text. Thus, in the PDF the first string would read "blah blah words ramble on" with "words" in bold, the second would read "more blah more blah space words ramble on" with "space words" in bold, and so on. In the first example, the .Rnw file would need to do this by inserting \textbf{words}, with the indexed word(s) inside the braces. I don't know how to accomplish that.

Answer

Taking into account your last comment:

I want to retain the index portion, but bold the word(s) that are indexed and come immediately after it. Thus, "blah blah \\index{words}\\textbf{words}"

I believe you need:

(\\index\{(\w+(?:\s+\w+){0,4})\})

and replace with \1\\textbf{\2}. See the regex demo.

Explanation:

  • (\\index\{(\w+(?:\s+\w+){0,4})\}) - Group 1, capturing the whole pattern so that it can be referenced with \1
  • \\index\{ - a literal character sequence \index{
  • (\w+(?:\s+\w+){0,4}) - Group 2 (referenced as \2), capturing:
    • \w+ - one or more word chars (replace with \S+ to match one or more non-whitespace chars)
    • (?:\s+\w+){0,4} - zero to four sequences of:
      • \s+ - 1+ whitespaces
      • \w+ - 1+ word chars (may be replaced with \S+)
  • \} - a literal }
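To see the {0,4} bound at work (one word plus up to four more, i.e. the question's five-word limit), here is a small check on two made-up index entries; the strings in `x` are illustrative only, and `perl = TRUE` is used just to be explicit about the regex engine:

```r
# Same pattern as in the answer
pat <- "(\\\\index\\{(\\w+(?:\\s+\\w+){0,4})\\})"

x <- c("\\index{one two three four five}",      # five words: matches
       "\\index{one two three four five six}")  # six words: no match

grepl(pat, x, perl = TRUE)
# [1]  TRUE FALSE
```

Longer terms are simply left alone, so raising the upper bound (e.g. {0,9}) is the only change needed if some index entries have more words.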

See the R demo:

text <- c("blah blah \\index{words}words ramble on", "more blah more blah\\index{space words} space words ramble on","final blah\\textbf{bold words} ramble on")
gsub("(\\\\index\\{(\\w+(?:\\s+\\w+){0,4})\\})","\\1\\\\textbf{\\2}", text)
## => [1] "blah blah \\index{words}\\textbf{words}words ramble on"                            
##    [2] "more blah more blah\\index{space words}\\textbf{space words} space words ramble on"
##    [3] "final blah\\textbf{bold words} ramble on"
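Note that this substitution inserts a copy of the term, so the indexed words are printed twice in the PDF (the first string renders as "wordswords"). If the aim is instead to bold the words that already follow each index entry, as described in the question's EDIT, one option is a back-reference in the pattern. This is a minimal sketch, assuming the printed text repeats the index term verbatim right after the entry (optionally after whitespace); the variable name `bolded` is illustrative:

```r
text <- c("blah blah \\index{words}words ramble on",
          "more blah more blah\\index{space words} space words ramble on",
          "final blah\\textbf{bold words} ramble on")

# Group 1: the whole \index{...} entry; group 2: the term inside the braces;
# group 3: any whitespace after the entry. The back-reference \2 requires the
# same term to appear right after the entry, and that occurrence is the one
# wrapped in \textbf{}. perl = TRUE makes PCRE handle the pattern.
bolded <- gsub("(\\\\index\\{([^}]+)\\})(\\s*)\\2",
               "\\1\\3\\\\textbf{\\2}", text, perl = TRUE)

# writeLines() shows the actual characters, with single backslashes
writeLines(bolded)
# blah blah \index{words}\textbf{words} ramble on
# more blah more blah\index{space words} \textbf{space words} ramble on
# final blah\textbf{bold words} ramble on
```

Entries whose printed text differs from the index term (different case, plural, or sub-entries such as \index{a!b}) do not match and are left unchanged, so they would still need manual attention.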