Yohan Obadia - 4 months ago
R Question

Getting 2 substrings/groups before and after the nth-to-last "_"

Let's look at an example:

abc_def_ghi_jkl


If I choose n = 1, I want the output to be:

group1 = abc_def_ghi
group2 = jkl


If I choose n = 2, I want the output to be:

group1 = abc_def
group2 = ghi_jkl


Note: the _ that separates the two groups is removed.
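
For reference, the expected behaviour can be pinned down with a small sketch that uses no regex at all: split on every _, then re-join the pieces on either side of the nth _ from the end. The helper name split_nth_last_plain() is hypothetical (not from the question), and returning NA when there are fewer than n underscores is an assumption about what "no split" should mean.

```r
# split_nth_last_plain() is a hypothetical helper, shown only to pin down
# the expected output; it does not use a regex.
split_nth_last_plain <- function(x, n) {
  # Break the (single) string into its "_"-separated pieces.
  # Note: strsplit() drops a trailing empty piece, so "abc_" gives "abc".
  parts <- strsplit(x, "_", fixed = TRUE)[[1]]
  k <- length(parts) - n  # number of pieces that belong to group1
  if (n < 1 || k < 1) {
    # Fewer than n underscores: there is nothing to split on.
    return(c(group1 = NA_character_, group2 = NA_character_))
  }
  c(group1 = paste(parts[seq_len(k)], collapse = "_"),
    group2 = paste(parts[(k + 1):length(parts)], collapse = "_"))
}

split_nth_last_plain("abc_def_ghi_jkl", 1)  # group1 "abc_def_ghi", group2 "jkl"
split_nth_last_plain("abc_def_ghi_jkl", 2)  # group1 "abc_def", group2 "ghi_jkl"
```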

So far I have only figured out how to select the last group, but the match also includes the _:

(?:.(?!(?=\_)))+$
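
A quick way to see the problem in R is to extract the match with regmatches() and regexpr(); perl = TRUE is needed because the pattern uses lookarounds. The second pattern is only an illustration for n = 1: a lookbehind (?<=_) requires an _ before the match without including it in the match.

```r
x <- "abc_def_ghi_jkl"

# The pattern from the question; inside an R string the backslash is doubled.
regmatches(x, regexpr("(?:.(?!(?=\\_)))+$", x, perl = TRUE))
# [1] "_jkl"

# For n = 1 only: the lookbehind checks for "_" but leaves it out of the match.
regmatches(x, regexpr("(?<=_)[^_]*$", x, perl = TRUE))
# [1] "jkl"
```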


Note 2: I am currently focusing on the regex part, but the code will be used in R, if that helps in finding a solution.

Answer

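One way is to split on the nth occurrence of _ counting from the end of the string, using a lookahead (which requires perl = TRUE, since R's default regex engine does not support lookarounds):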

strsplit("abc_def_ghi_jkl", "_(?=([^_]*_){0}[^_]*$)", perl = TRUE)
                                     #    ^
                                     #  you can modify the quantifier here
#[[1]]                                         
#[1] "abc_def_ghi" "jkl"                    # split on the 1st

strsplit("abc_def_ghi_jkl", "_(?=([^_]*_){1}[^_]*$)", perl = TRUE)
#[[1]]
#[1] "abc_def" "ghi_jkl"                    # split on the 2nd

strsplit("abc_def_ghi_jkl", "_(?=([^_]*_){2}[^_]*$)", perl = TRUE)
#[[1]]
#[1] "abc"         "def_ghi_jkl"            # split on the 3rd

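_(?=([^_]*_){2}[^_]*$) matches an _ only when the lookahead (?=...) succeeds at that position. The lookahead requires exactly two repetitions of ([^_]*_) (any non-underscore characters followed by an _), then only non-underscore characters [^_]* up to the end of the string $. Because [^_]* can never consume an underscore, the matched _ must have exactly two other underscores to its right, so it is the 3rd _ from the end, and strsplit splits there. In general, the quantifier is n - 1 to split on the nth _ from the end.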
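
Since the quantifier is n - 1, the pattern can be built from n with sprintf(). This is a minimal sketch; the wrapper name split_nth_last() is hypothetical and not part of the answer. strsplit() is vectorised over x, and a string with fewer than n underscores is returned unsplit because the pattern never matches.

```r
# Hypothetical wrapper: build the lookahead pattern for a given n and split.
split_nth_last <- function(x, n) {
  stopifnot(n >= 1)
  # The matched "_" must have exactly n - 1 other underscores to its right.
  pattern <- sprintf("_(?=([^_]*_){%d}[^_]*$)", as.integer(n - 1))
  strsplit(x, pattern, perl = TRUE)
}

split_nth_last("abc_def_ghi_jkl", 2)
# [[1]]
# [1] "abc_def" "ghi_jkl"

# Vectorised over x; "abc" has no "_" and comes back unsplit.
split_nth_last(c("abc_def_ghi_jkl", "abc"), 2)
```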

Update: the same split with str_match from the stringr package, where (.*) captures group1 and the second capture group holds group2:

library(stringr)

str_match("abc_def_ghi_jkl", "(.*)_((?:[^_]*_){0}[^_]*$)")[,2:3]
# [1] "abc_def_ghi" "jkl"     

str_match("abc_def_ghi_jkl", "(.*)_((?:[^_]*_){1}[^_]*$)")[,2:3]
# [1] "abc_def" "ghi_jkl"

str_match("abc_def_ghi_jkl", "(.*)_((?:[^_]*_){2}[^_]*$)")[,2:3]
# [1] "abc"         "def_ghi_jkl"
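
If adding a stringr dependency is not desirable, a base-R sketch of the same capture-group approach uses regmatches() with regexec(), which accepts perl = TRUE in current R versions (needed here for the non-capturing group). The function name nth_last_groups() is hypothetical. Like str_match(), it gives NA for a string with fewer than n underscores.

```r
# Hypothetical base-R counterpart of the str_match() call above.
nth_last_groups <- function(x, n) {
  pattern <- sprintf("(.*)_((?:[^_]*_){%d}[^_]*$)", as.integer(n - 1))
  # One list element per string: c(full match, group1, group2),
  # or character(0) when the string does not match.
  m <- regmatches(x, regexec(pattern, x, perl = TRUE))
  out <- vapply(m, function(g) {
    if (length(g) == 3) g[2:3] else c(NA_character_, NA_character_)
  }, character(2))
  out <- t(out)  # one row per input string
  colnames(out) <- c("group1", "group2")
  out
}

# Row 1: "abc_def", "ghi_jkl"; row 2: NA, NA
nth_last_groups(c("abc_def_ghi_jkl", "abc"), 2)
```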