jake wong jake wong - 3 months ago 11
Python Question

Manipulation of values in Pandas via Regex

This is actually a follow up question of here. I had not been clear in my previous question, and since it has been answered, I felt it was better to post a new question instead.

I have a dataframe like below:

Column1 Column2 Column3 Column4 Column5
5FQ 1.047 S$55.3 UG44.2 as of 02/Jun/2016 S$8.2 mm
600 (1.047) S$23.3 AG5.6 as of 02/Jun/2016 S$58 mm
KI2 1.695 S$5.35 RR59.5 as of 02/Jun/2016 S$705 mm
88G 0.0025 S$(5.3) NW44.2 as of 02/Jun/2016 S$112 mm
60G 5.63 S$78.4 UG21.2 as of 02/Jun/2016 S$6.21 mm
90F (5.562) S$(88.3) IG46.2 as of 02/Jun/2016 S$8 mm


I am trying to use
regex
to drop all the words and letters, only keeping the numbers. However, if the number is enclosed within a
()
, I would like to make it a negative number instead.

Desired output

Column1 Column2 Column3 Column4 Column5
5 1.047 55.3 44.2 8.2
600 -1.047 23.3 5.6 58
2 1.695 5.35 59.5 705
88 0.0025 -5.3 44.2 112
60 5.63 78.4 21.2 6.21
90 -5.562 -88.3 46.2 8


Would this be possible? I've tried playing around with this code, but was not sure what the appropriate
regex
combination should be.

df.apply(lambda x: x.astype(str).str.extract(r'(\d+\.?\d*)', expand=True).astype(np.float))

Answer
r1 = r'\((\d+\.?\d*)\)'
r2 = r'(-?\d+\.?\d*)'
df.stack().str.replace(r1, r'-\1', 1) \
          .str.extract(r2, expand=False).unstack()

enter image description here