Sina Sh - 1 year ago
Python Question

Extract multiple substrings from a file and list them in another place using python/shell

I have a log file similar to the one below:

/* BUG: axiom too complex: SubClassOf(ObjectOneOf([NamedIndividual(]),DataHasValue(DataProperty(,^^(periodic, */
/* BUG: axiom too complex: SubClassOf(ObjectOneOf([NamedIndividual(]),DataHasValue(DataProperty(,^^(latency, */

I want to extract the fields t_Xi_xi, t_Ziz, XoX_type and YoY_type, as well as the values after ^^(, which in this case are latency and periodic.

Note: the letters standing in for X and Y differ throughout the file (e.g. with X="sina" and Y="Boom", t_Xi_xi becomes t_Sina_sina), so I guess a regex would be the better choice.

So the final result should look something like this:

t_Xi_xi XoX_type periodic
t_Ziz YoY_type latency
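A minimal Python sketch of the regex approach the question asks about. The log lines as pasted above appear to be missing the names themselves, so this assumes, as the `#` field separator in the awk answer below suggests, that each name appears right after a `#` (e.g. `<#t_Xi_xi>`) and consists of letters, digits and underscores, and that the value follows `^^(`. The pattern, the `extract` helper and the sample line are all hypothetical.

```python
import re

# Hypothetical pattern: a name right after '#', a second name after the next '#',
# then the word following the literal '^^('. \w+ matches letters, digits and '_',
# so it works whatever letters X and Y happen to be.
PATTERN = re.compile(r"#(\w+).*?#(\w+).*?\^\^\((\w+)")


def extract(line):
    """Return 'name1 name2 value' for a matching line, or None if it doesn't match."""
    m = PATTERN.search(line)
    return " ".join(m.groups()) if m else None


# Hypothetical log line, assuming the names appear as <#name> in the real file
sample = ("/* BUG: axiom too complex: SubClassOf(ObjectOneOf([NamedIndividual(<#t_Xi_xi>)]),"
          "DataHasValue(DataProperty(<#XoX_type>),^^(periodic, */")
print(extract(sample))  # t_Xi_xi XoX_type periodic
```

Because `\w+` is generic, no per-file letters need to be hard-coded; only the surrounding `#` and `^^(` markers are assumed fixed.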

I tried using a regex to extract them, hoping to then replace the rest of each line with " " using sed in the shell, but it didn't work.


Any help on how to do this in Python (or even in the shell itself) is appreciated.

Answer
$ awk -F'#|\\^\\^\\(' '{for (i=2; i<NF; i++) printf "%s%s", gensub(/[^[:alnum:]_].*/,"",1,$i), (i<(NF-1) ? OFS : ORS) }' file
t_Xi_xi XoX_type periodic
t_Ziz YoY_type latency

The above uses GNU awk for gensub(); with other awks you'd use sub() and a separate printf statement.
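For readers who want the same logic in Python, here is a rough, hedged translation of the awk one-liner: split each line on `#` or `^^(`, keep fields 2 through NF-1 (the awk loop `for (i=2; i<NF; i++)` skips the last field), and cut each field at its first character that is not a letter, digit or underscore, as `gensub()` does. Skipping the last field only yields the value if the real log has one more `#`-delimited part after it, which is what `i<NF` implies; the demo line below assumes that and is hypothetical, as are the helper names.

```python
import re
import sys

# Same separator regex as awk's -F'#|\\^\\^\\(': a '#' or the literal '^^('.
SEP = re.compile(r"#|\^\^\(")
# From the first char that is not alphanumeric or '_' to the end, like
# gensub(/[^[:alnum:]_].*/,"",1,$i) (ASCII-only here, unlike [:alnum:]).
TAIL = re.compile(r"[^A-Za-z0-9_].*")


def awk_like(line):
    """Mimic the awk loop: keep fields 2..NF-1, trim each, join with spaces."""
    fields = SEP.split(line.rstrip("\n"))
    if len(fields) < 3:  # awk's loop body never runs when NF < 3
        return None
    return " ".join(TAIL.sub("", f, count=1) for f in fields[1:-1])


def main(paths):
    # Print one result line per matching input line, like the awk command.
    for path in paths:
        with open(path) as fh:
            for line in fh:
                out = awk_like(line)
                if out is not None:
                    print(out)


# Hypothetical line with one more '#' part after the value, as i<NF implies
demo = ("/* BUG: axiom too complex: SubClassOf(ObjectOneOf([NamedIndividual(<#t_Ziz>)]),"
        "DataHasValue(DataProperty(<#YoY_type>),^^(latency,<#string>) */")
print(awk_like(demo))  # t_Ziz YoY_type latency

if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a name of your choosing (e.g. the hypothetical `extract_fields.py`), it can be run as `python extract_fields.py file`, mirroring the awk invocation.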