user3192046 - 7 days ago
R Question

Regular expression to extract text from a data frame and insert it into a new column

I have been hunting through all the posts on regular expressions and still cannot seem to make this work for me.

Example of a line (some words are redacted or changed):

df$text: "CommonWord #79 - EVENT type for 1200 seconds [Objects] xxx.xxx.xxx.xxx/## xxx.xxx.xxx.xxx/## Port: ##"

  1. I want to extract the numeric value after the # and place it in a new column. I tried:

    df$number <- sub("\\#([0-9]{2,4}).*", "\\1", df$text)

    The result is "CommonWord 79"; I can't seem to find the right regex to remove the first word.

    Resolved part 1:

    df$number <- sub(".*\\#(\\d{1,4}).*", "\\1", df$text)

  2. Next, I want to pull "EVENT type" and put it into another column. Both "EVENT" and "type" can change, so I need to pull the text after the "- " and before the " for".

  3. The last two regexes I need are for the IP addresses with their subnet masks, and for the port number (number only). I need all of these in new columns.

Sorry for the long-winded question. I've been beating my head against this one.

Answer

You just had to add ".*" at the start of the pattern so that it also matches any characters before the # and the number:

sub(".*\\#([0-9]{2,4}).*", "\\1", x)
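Running both patterns on the example line from the question shows the difference. `sub()` replaces only the part of the string that the pattern matches, so without the leading `.*` the unmatched prefix "CommonWord " is kept in front of the captured number. The string `x` below is just that example line, with the redacted parts left as posted; the `#>` lines show what R prints.

```r
# The example line from the question, redacted parts left as posted
x <- "CommonWord #79 - EVENT type for 1200 seconds [Objects] xxx.xxx.xxx.xxx/## xxx.xxx.xxx.xxx/## Port: ##"

# Without a leading .*, only "#79 - EVENT ... Port: ##" is matched and
# replaced by group 1, so the prefix "CommonWord " is left untouched
sub("\\#([0-9]{2,4}).*", "\\1", x)
#> [1] "CommonWord 79"

# With a leading .*, the whole line is matched and replaced by group 1
sub(".*\\#([0-9]{2,4}).*", "\\1", x)
#> [1] "79"
```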

# to create a new column

 df$new_col <- as.numeric(sub(".*\\#([0-9]{2,4}).*", "\\1", df$text))
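The answer above only covers part 1. For parts 2 and 3, here is a sketch in the same spirit, assuming every row follows the layout of the example line: "- " before the event type, " for" right after it, addresses written as a.b.c.d/nn, and "Port: " followed by the number. The two sample rows use made-up addresses, masks and ports in place of the redacted values, and the column names `event`, `ip1`, `ip2` and `port` are only illustrative.

```r
# Hypothetical sample rows: made-up addresses, masks and ports stand in
# for the redacted "xxx.xxx.xxx.xxx/##" and "Port: ##" values
df <- data.frame(
  text = c(
    "CommonWord #79 - EVENT type for 1200 seconds [Objects] 10.0.0.1/24 192.168.1.5/32 Port: 443",
    "CommonWord #1234 - OTHER kind for 60 seconds [Objects] 172.16.0.9/16 10.1.2.3/8 Port: 22"
  ),
  stringsAsFactors = FALSE
)

# Part 2: the text between the first "- " and the next " for".
# The lazy quantifiers (.*?) stop at the first delimiter;
# perl = TRUE selects the PCRE engine
df$event <- sub("^.*?- (.*?) for\\b.*$", "\\1", df$text, perl = TRUE)

# Part 3a: every "a.b.c.d/nn" token in each row (one list element per row)
ip_pattern <- "[0-9]{1,3}(\\.[0-9]{1,3}){3}/[0-9]{1,2}"
ips <- regmatches(df$text, gregexpr(ip_pattern, df$text))

# First and second match per row; indexing past the end gives NA
df$ip1 <- vapply(ips, `[`, character(1), 1, USE.NAMES = FALSE)
df$ip2 <- vapply(ips, `[`, character(1), 2, USE.NAMES = FALSE)

# Part 3b: the digits after "Port:", converted to a number
df$port <- as.numeric(sub(".*Port: *([0-9]+).*", "\\1", df$text))

df
```

If a row does not match, `sub()` returns the text unchanged, so `as.numeric()` gives `NA` (with a warning) for the port, while the `regmatches()` approach gives `NA` for missing addresses. Splitting an address from its mask is one more `sub()`, for example `sub("/.*", "", df$ip1)`.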