bosa djo - 4 months ago
Linux Question

Regex match last occurrence of all characters between two strings

I'm trying to extract the torrent name from torrent files.
Without looking too deeply into how torrent files are structured, I noticed that I only need to match the last occurrence of all characters between two strings, which in my case are ":" and "12:piece lengthi" (schematically, : * 12:piece lengthi).

Here is the beginning of Arch Linux iso torrent file:

d8:announce42:http://tracker.archlinux.org:6969/announce7:comment41:Arch Linux 2015.07.01 (www.archlinux.org)10:created by13:mktorrent 1.013:creation datei1435770645e4:infod6:lengthi677380096e4:name29:archlinux-2015.07.01-dual.iso12:piece lengthi


I need to extract archlinux-2015.07.01-dual.iso, which sits between ":" and "12:piece lengthi". I checked this pattern against other torrent files, and in my case it works. I can't figure out how to combine the regex

(?<=:)(.*)(?=12:piece lengthi)

and

:(?:.(?!:))+$

if they are even correct at all.
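One way to merge the two ideas into a single pattern is to replace the greedy .* with [^:]+, so the match cannot reach back past the last colon before the marker. This is only a sketch, assuming GNU grep built with PCRE support (the -P option). The function name name_between is made up for illustration. -a is there because torrent files are binary, and LC_ALL=C makes grep treat the input as plain bytes.

```shell
# Hypothetical helper: prints the text between the last ":" and
# "12:piece lengthi". Assumes GNU grep with -P (PCRE) support.
#   (?<=:)                  the match must start right after a colon
#   [^:]+                   one or more characters that are not a colon
#   (?=12:piece lengthi)    the match must be followed by the marker
name_between() {
    LC_ALL=C grep -aoP '(?<=:)[^:]+(?=12:piece lengthi)'
}

# Usage (reads standard input):
#   name_between < file.torrent
```

Like any pattern of this kind, it returns the wrong text when the name itself contains a colon.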

I'm trying to write a bash script with grep, awk, sed, or anything else that can be run as a Linux command.

Final, perfectly working solution (thoroughly tested). This works with all kinds of non-standard characters, for example Cyrillic:

torrent_title=$(tr -d "\n" < "$filename" | iconv -f utf-8 -t utf-8 -c | sed 's/.*:\(.*\)12:piece lengthi.*/\1/')


Update: All suggestions work, but torrent files are binary files. For example, I tried grep --text, and also strings file | piped to grep or sed, but random strings from the binary part of the file mess up the output.

Update 2 (SOLVED): the final command is this:

head -1 file.torrent | strings | tr -d "\n\r" | iconv -f utf-8 -t utf-8 -c | sed 's/.*:\(.*\)12:piece lengthi.*/\1/'


I figured out that the info is only in the first line of the file.
In my original example I forgot to copy a few more strings at the end:

d8:announce42:http://tracker.archlinux.org:6969/announce7:comment41:Arch Linux 2015.07.01 (www.archlinux.org)10:created by13:mktorrent 1.013:creation datei1435770645e4:infod6:lengthi677380096e4:name29:archlinux-2015.07.01-dual.iso12:piece lengthi524288e6:pieces25840:


which are part of the first line, so I needed to slightly change hek2mgl's sed answer.

Update 3: The right way to do it is to use a parser. I learned that the hard way.
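Torrent files are bencoded: strings are stored as <length>:<bytes>, integers as i<digits>e, and lists and dictionaries as l...e and d...e. A parser can therefore read the name by its declared length instead of guessing at delimiters. Below is a minimal bash sketch of that idea, not a full bencode implementation. All function names (torrent_name, find_key, and so on) are made up for illustration, and it assumes od, tr and sed are available. The file is converted to a hex string first so that binary bytes do not get lost in shell variables.

```shell
#!/usr/bin/env bash
# Sketch of a bencode walker that prints info -> name from a .torrent file.
# Shared state: hex (file contents as hex), pos (byte offset), len, str.

# Parse "<digits>:" at pos; set len and leave pos on the first content byte.
read_length() {
    len=0
    while [[ ${hex:pos*2:2} == 3[0-9] ]]; do     # 0x30..0x39 are ASCII digits
        len=$(( len * 10 + ${hex:pos*2+1:1} ))
        pos=$(( pos + 1 ))
    done
    [[ ${hex:pos*2:2} == 3a ]] || return 1       # 0x3a is ':'
    pos=$(( pos + 1 ))
}

# Read one bencoded string at pos into str (still hex-encoded).
read_string() {
    read_length || return 1
    str=${hex:pos*2:len*2}
    pos=$(( pos + len ))
}

# Skip one complete value (string, integer, list or dictionary) at pos.
skip_value() {
    case ${hex:pos*2:2} in
        69)                                      # 'i': integer, ends at 'e'
            while [[ ${hex:pos*2:2} != 65 ]]; do
                [[ -n ${hex:pos*2:2} ]] || return 1
                pos=$(( pos + 1 ))
            done
            pos=$(( pos + 1 )) ;;
        6c|64)                                   # 'l' or 'd': skip items up to 'e'
            pos=$(( pos + 1 ))
            while [[ ${hex:pos*2:2} != 65 ]]; do
                skip_value || return 1
            done
            pos=$(( pos + 1 )) ;;
        3[0-9]) read_string ;;                   # digit: a string
        *) return 1 ;;                           # truncated or malformed input
    esac
}

# pos must be on a 'd'. Walk the keys until one equals $1 (given as hex);
# on success pos is left on that key's value.
find_key() {
    [[ ${hex:pos*2:2} == 64 ]] || return 1
    pos=$(( pos + 1 ))
    while [[ ${hex:pos*2:2} != 65 ]]; do
        read_string || return 1
        [[ $str == "$1" ]] && return 0
        skip_value || return 1
    done
    return 1
}

# Print the torrent name stored under info -> name.
torrent_name() {
    local hex pos=0 len=0 str=""
    hex=$(od -An -v -tx1 "$1" | tr -d ' \n')
    find_key 696e666f || return 1                # "info"
    find_key 6e616d65 || return 1                # "name"
    read_string || return 1
    # Turn the hex digits back into bytes: "6162" -> "\x61\x62" -> "ab"
    printf '%b\n' "$(printf '%s' "$str" | sed 's/../\\x&/g')"
}

# Usage: torrent_name file.torrent
```

Because the name is read by its length prefix, a name that contains a colon, or even the text 12:piece lengthi, is still returned intact. For large files the hex string gets long and this becomes slow, so an existing bencode library in a scripting language is probably the more practical choice.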

Answer

I would use sed for that, like this:

sed 's/.*:\(.*\)12:piece lengthi/\1/' input.torrent
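This finds the right colon because the leading .* is greedy: .*: runs to the last colon that can still be followed by 12:piece lengthi, and the group captures only what lies between the two. Here is a quick check against the header from the question. The function name extract_name is made up for this example, and the trailing .* is an addition that discards whatever follows the marker.

```shell
# Hypothetical wrapper around the sed command above; reads standard input.
extract_name() {
    sed 's/.*:\(.*\)12:piece lengthi.*/\1/'
}

# The "info" part of the header from the question.
header='4:infod6:lengthi677380096e4:name29:archlinux-2015.07.01-dual.iso12:piece lengthi524288e6:pieces25840:'

printf '%s\n' "$header" | extract_name
# prints: archlinux-2015.07.01-dual.iso
```

The same greediness is also the weak spot: if the name itself contains a colon, only the part after its last colon is returned.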