jakr jakr - 1 year ago 128
Bash Question

replace every nth occurrence of a pattern using awk

I'm trying to replace every nth occurrence of a string in a text file.

I have a huge bibtex file (called in.bib) containing hundreds of entries, each beginning with "@". Every entry has a different number of lines. I want to write a string (e.g. "#") right before every (let's say) 6th occurrence of "@" so that, in a second step, I can use csplit to split the huge file at "#" into files containing 5 entries each.

The problem is to find and replace every fifth "@".

Since I need this repeatedly, the suggested answer in "printing with sed or awk a line following a matching pattern" won't do the job. Again, I am not looking for just one matching place but for many of them.

What I have so far:

awk '/^@/ && v++%5 {sub(/^@/, "\n#\n@")} {print > "out.bib"}' in.bib

replaces the 2nd through 5th occurrence (and no more).
(btw, I found and adapted this solution here: "Sed replace every nth occurrence". Initially, it was meant to replace every second occurrence--which it does.)

And, second:

awk -v p="@" -v n="5" '$0~p{i++}i==n{sub(/^@/, "\n#\n@")}{print > "out.bib"}' in.bib

replaces exactly the 5th occurrence and nothing else.
(adapted from the solution here: "Display only the n'th match of grep")

What I need (and am not able to write) is, imho, a loop. Would a for loop do the job? Something like:

for (i = 1; i <= 200; i * 5)
<find "@"> and <replace with "\n#\n@">
then print

The material I have looks like this:

title = {Jedno Kosova, Dva Srbije},
journal = {Ulaznica: Journal for Culture, Art and Social Issues},
author = {Karamanic, Slobodan},
year = {2007}

title = {Das Eigene, das Andere und ihre Vermischung. Zur Rolle von Sexualität und Reproduktion im Rassendiskurs des 19. Jahrhunderts},
comment = {Rest of lines snipped off here for usability -- as in following entries. All original entries may have a different number of lines.}

title = {Inter-agency coordination in United Nations peacebuilding}

address = {Bielefeld},
title = {Subjekt}

What I want is every sixth entry looking like this:

address = {Bielefeld},
title = {Subjekt}

Thanks for your help.

Answer Source
awk -v p="@" -v n="5" '$0~p{i++}i%n==0{sub(/^@/, "\n#\n@")}{print}' in.bib > out.bib
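A note on the counting, with a sketch: as far as I can tell, the command above puts the marker in front of the 5th, 10th, ... entry, so the first chunk would hold only four entries. In the first attempt from the question, `v++%5` is true for every count that is not a multiple of 5, which is why the 2nd through 5th entries got marked. If the goal is five entries per file, the marker belongs in front of entry 6, 11, 16, and so on. The variant below assumes every entry starts with "@" in column 1 and that no other line does; the sample in.bib it generates is made up purely for illustration.

```shell
# Hypothetical sample input: 12 minimal entries (keys and titles are invented).
for k in 1 2 3 4 5 6 7 8 9 10 11 12; do
  printf '@article{key%s,\n  title = {T%s}\n}\n\n' "$k" "$k"
done > in.bib

# Count lines that start with "@". Before entry n+1, 2n+1, ... print a line
# holding only "#"; every input line is then printed unchanged.
awk -v n=5 '/^@/ && ++i > 1 && (i - 1) % n == 0 { print "#" } { print }' in.bib > out.bib
```

Anchoring the match with `^@` instead of `$0~p` keeps lines that merely contain an "@" somewhere (an e-mail address in a field, say) from being counted. With GNU csplit, something like `csplit -z out.bib '/^#$/' '{*}'` should then split at the marker lines; `-z` and `{*}` are GNU extensions, so other csplit implementations may need an explicit repeat count.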