GSxxx - 1 year ago 53

Perl Question

Given this data:

```
A 1.20 GBP 1.2 GBP
B 1.2 GBP 1.20 GBP
C 01 GBP 1 GBP
D 1 GBP 01 GBP
E 1.0 GBP 1 GBP
F 1 GBP 1.0 GBP
G 2.10 GBP 3.2 GBP
H 4.1 GBP 3.20 GBP
I 04 GBP 3 GBP
J 4 GBP 03 GBP
K 4.0 GBP 3 GBP
L 4 GBP 3.0 GBP
```

I have to find lines where the values are different (using grep -P).

Fields are separated by exactly one space, and leading or trailing zeros do not change a value:

`3.2 = 03.20, 3.0 = 3`

I tried this

`grep -P '([1-9][0-9]*(?:\.[0-9]*[1-9])?)(\.?0*) ([A-Z]{3}) 0*(?!\1).* \3' filename`

Unfortunately it doesn't work properly. I'm not sure I'm using the negative lookahead correctly.

I know that there are many better ways to achieve this result.

However, I'm a student and this is an exercise that I have to solve using grep with regular expressions.

What I have tried works until the tests get trickier, so if you can help, just tell me what I'm doing wrong.

The result should be:

```
G 2.10 GBP 3.2 GBP
H 4.1 GBP 3.20 GBP
I 04 GBP 3 GBP
J 4 GBP 03 GBP
K 4.0 GBP 3 GBP
L 4 GBP 3.0 GBP
```

I have tested my solution and it additionally returns:

```
A 1.20 GBP 1.2 GBP
B 1.2 GBP 1.20 GBP
D 1 GBP 01 GBP
```

I also checked the regular expression at https://regex101.com/, and the result was surprising: for lines A and B the match covers only the digits after the period. Try it there to see what I mean.

I have not presented the whole exercise. Every number is followed by a currency, and there is an additional requirement that the two currencies be the same. So when I use `grep -v` it still doesn't work, and I understand why: the pattern has to contain exactly one negation.

Answer Source
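
As for what goes wrong in the attempt from the question, two things appear to combine. The pattern is not anchored, so the engine is free to start matching in the middle of a number, and `0*` can backtrack to zero characters, so `(?!\1)` ends up being tested in front of a leading `0`. A small sketch with `grep -o`, which prints only the matched part of a line, makes both visible (the function name `attempt` is only an illustration):

```shell
# The pattern from the question, wrapped in a function (hypothetical name)
# so it can be reused; -o prints only the part of the line that matched.
attempt() {
  grep -oP '([1-9][0-9]*(?:\.[0-9]*[1-9])?)(\.?0*) ([A-Z]{3}) 0*(?!\1).* \3'
}

# Line A: the match starts at "20", after the period, so \1 is "20"
# and the lookahead (?!20) trivially succeeds in front of "1.2".
echo 'A 1.20 GBP 1.2 GBP' | attempt    # prints: 20 GBP 1.2 GBP

# Line D: 0* gives back the leading zero, so (?!1) is tested in front
# of "01" instead of "1" and succeeds.
echo 'D 1 GBP 01 GBP' | attempt        # prints: 1 GBP 01 GBP
```

This suggests that a working pattern needs to pin down where the first number starts and to compare the two numbers as a whole, not just the text at one position.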

You can use this somewhat complex regex for the task (the output shown is for input without the currency column):

```
grep -P '\h+0*(?:(?:(\d+)\.?0*\h+0*\1\.?0*|(\d+\.\d*[1-9])0*\h+\g{2}0*)(*SKIP)(*F)|.*)$' file
G 2.10 3.2
H 4.1 3.20
I 04 3
J 4 03
K 4.0 3
L 4 3.0
```

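The PCRE verbs `(*SKIP)(*F)` are used to discard a match in an alternation: whatever the branch in front of them matches is skipped, and only text matched by the remaining branch is reported.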
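
To see the idiom in isolation, here is a minimal sketch on made-up input that is not part of the question. The helper name `digits_outside_quotes` is hypothetical; the only assumption is a `grep` built with PCRE support, which `-P` requires anyway.

```shell
# Hypothetical helper: print every run of digits that is NOT inside
# double quotes.
# Left branch: a quoted string is matched, then (*SKIP)(*F) forces a
# failure and tells the engine not to retry inside the text it consumed.
# Right branch: \d+ is the part that actually gets reported.
digits_outside_quotes() {
  grep -oP '"[^"]*"(*SKIP)(*F)|\d+'
}

echo 'a 12 "34" 56' | digits_outside_quotes
# prints:
# 12
# 56
```

The regex in the answer follows the same shape: the "values are equal" cases sit in the left branch and are thrown away, and `.*` in the right branch matches what is left.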

**Alternatively**, you can use this negative lookahead regex as well:

```
grep -P '^\S+\h+(?!0*(?:(\d+)\.?0*\h+0*\1\.?0*|(\d+\.\d*[1-9])0*\h+\g{2}0*)$)' file
G 2.10 3.2
H 4.1 3.20
I 04 3
J 4 03
K 4.0 3
L 4 3.0
```
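
The lines in the question carry a currency after each number, which the patterns above do not account for. The following is a sketch of how the lookahead variant might be adapted, not part of the original answer. It assumes every currency is a three-letter uppercase code, and it does not check that the two currencies are the same. It also swaps `\.?0*` for `(?:\.0*)?`, because with `\.?0*` a pair such as `10` and `1` would be treated as equal. The function names are hypothetical.

```shell
# Sample lines in the format used in the question (ID, value, currency,
# value, currency), plus two extra lines M and N added for illustration.
sample_with_currency() {
  printf '%s\n' \
    'A 1.20 GBP 1.2 GBP' 'B 1.2 GBP 1.20 GBP' 'C 01 GBP 1 GBP' \
    'D 1 GBP 01 GBP'     'E 1.0 GBP 1 GBP'    'F 1 GBP 1.0 GBP' \
    'G 2.10 GBP 3.2 GBP' 'H 4.1 GBP 3.20 GBP' 'I 04 GBP 3 GBP' \
    'J 4 GBP 03 GBP'     'K 4.0 GBP 3 GBP'    'L 4 GBP 3.0 GBP' \
    'M 10 GBP 1 GBP'     'N 1 GBP 10 GBP'
}

# Print lines whose two values differ.
# Branch 1: integers, optionally followed by ".000..." on either side.
# Branch 2: decimals that agree up to their last non-zero digit.
# Assumption: [A-Z]{3} stands in for the currency column.
different_values() {
  grep -P '^\S+\h+(?!0*(?:(\d+)(?:\.0*)?\h+[A-Z]{3}\h+0*\1(?:\.0*)?|(\d+\.\d*[1-9])0*\h+[A-Z]{3}\h+0*\g{2}0*)\h+[A-Z]{3}$)'
}

sample_with_currency | different_values
```

On this sample the command should print lines G through N and leave out A through F.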