Moleman - 29 days ago
C# Question

SVG Path Data Regex C#

I am looking for a regex that matches each of the individual values within an SVG path.

m4507-396.4c-2.2-3.5-2.6-7.3-0.2-11.4M4545.5-428.7c3.5 1.2 7 2.7 9.9 7.3 5.5 8.3 3.7 20.7 3 22.4-2.6 6.6-6.5 9.4-10.1 12.7-1.6 6-6 4.8-9.7 5.4-3.9 3.5-9.4 4.3-11.8 4.3-11.8 0.1

m3918.1-733.6c-7.7-0.7-18-5.3-23.5-10.5-5.6-5.2-11.6-15.5-12.1-20.5-0.3-3.2 0.4-4.5 3.2-5.7 1.8-0.8 2.6-0.8 7.2-0.6 4.6 0.2 12.9 1.6 13.5 2.2 0.1 0.1-1 2.3-2.6 4.9-4.3 7.1-4.3 7.2-3.6 8.9 0.8 2 4.2 4.7 8 6.6 2.8 1.4 3.6 1.6 6.2 1.6l3 0 3.4-5.4c1.9-3 3.5-5.4 3.7-5.5 0.5-0.2 7.2 10.1 8.5 13 0.7 1.5 1.3 3.6 1.4 5.1 0.2 2.3 0.1 2.6-1.1 3.7-2.2 2-8 2.9-15.2 2.2z

m 3726.1737,-460.61233 36.0937,-2.74129 c 8.4162,-1.4953 14.662,-7.69317 13.4018,-30.15418 -13.0333,-2.66897 -13.7567,-3.44411 -16.7523,-4.56882


I have included 3 examples of how the path data might be formatted.

The values may be separated by commas or whitespace, and two values may be 'touching' at a '-' character, so the regex needs to split those as well.

First Path Data Match Example:
Match 1: m
Match 2: 4507
Match 3: 396.4
Match 4: c

Third Path Data Match Example:
Match 1: m
Match 2: 3726.1737
Match 3: -460.61233
Match 4: -2.74129
Match 5: c


So on and so forth. You get the idea. Has anyone come across this before?

Answer

A lexer that does not care about semantics is easy:

/([mzlhvcsqta]|[\+\-]?(\d*\.\d+|\d+\.?)([eE][\+\-]?\d+)?)/i

SVG path data consists of single-letter commands and numbers. Each token is either a command letter:

[mzlhvcsqta]

Each of them can occur in upper or lower case, so the search must be case-insensitive (the i flag; in C#, drop the / delimiters and pass RegexOptions.IgnoreCase).

Or it is a number (split into multiple lines for readability):

[\+\-]?            // an optional sign
(\d*\.\d+|\d+\.?)  // an integer or a decimal number
([eE][\+\-]?\d+)?  // an optional exponent (scientific notation)
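Since the question is tagged C#, here is a minimal sketch of how the pattern above might be applied with System.Text.RegularExpressions. The class name SvgPathLexer and the method Tokenize are made up for illustration. The sketch uses non-capturing groups and assumes the exponent occurs at most once per number.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class SvgPathLexer
{
    // A command letter, or a signed number with an optional exponent.
    // Note: in .NET, \d also matches non-ASCII digits; use [0-9] to be strict.
    private static readonly Regex Token = new Regex(
        @"[mzlhvcsqta]|[+-]?(?:\d*\.\d+|\d+\.?)(?:[eE][+-]?\d+)?",
        RegexOptions.IgnoreCase);

    // Returns every command letter and number in order of appearance.
    // Commas and whitespace are skipped because they never match.
    public static List<string> Tokenize(string pathData)
    {
        var tokens = new List<string>();
        foreach (Match m in Token.Matches(pathData))
        {
            tokens.Add(m.Value);
        }
        return tokens;
    }
}
```

Calling SvgPathLexer.Tokenize("m4507-396.4c-2.2-3.5") should yield m, 4507, -396.4, c, -2.2, -3.5. The sign stays attached to the number that follows it, which is how SVG itself reads a 'touching' minus.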

The next step would be to analyze the sequence of tokens. For example, the count of numbers following a command varies between 0 (z) and 7 (a); some of them can be any number, while others must be non-negative or flags (0 or 1). But that is not the job of a lexer, and it would probably also overstrain what a regex can do.
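As a rough sketch of that next step, the tokens could be grouped by command and the argument counts checked in ordinary C# code rather than in the regex. The names SvgPathGrouper and Group are hypothetical. The arity table reflects the usual argument counts per command (for example 6 for c, 7 for a), where a command may repeat its argument set. The sketch does not check value ranges, and it will not cope with arc flags written without separators.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not part of any library.
public static class SvgPathGrouper
{
    // Arguments per repetition of each command, keyed by lower-case letter.
    private static readonly Dictionary<char, int> Arity = new Dictionary<char, int>
    {
        ['m'] = 2, ['l'] = 2, ['t'] = 2,
        ['h'] = 1, ['v'] = 1,
        ['c'] = 6, ['s'] = 4, ['q'] = 4,
        ['a'] = 7, ['z'] = 0
    };

    // Groups a flat token list into (command, arguments) pairs and
    // throws FormatException if an argument count does not fit.
    public static List<(char Command, List<double> Args)> Group(IEnumerable<string> tokens)
    {
        var result = new List<(char Command, List<double> Args)>();
        foreach (string token in tokens)
        {
            char first = token[0];
            if (char.IsLetter(first))
            {
                if (!Arity.ContainsKey(char.ToLowerInvariant(first)))
                    throw new FormatException("Unknown command: " + token);
                result.Add((first, new List<double>()));
            }
            else
            {
                if (result.Count == 0)
                    throw new FormatException("Number before any command.");
                // Invariant culture so '.' is always the decimal separator.
                double value = double.Parse(token, NumberStyles.Float, CultureInfo.InvariantCulture);
                result[result.Count - 1].Args.Add(value);
            }
        }

        foreach (var (command, args) in result)
        {
            int n = Arity[char.ToLowerInvariant(command)];
            bool ok = n == 0 ? args.Count == 0 : args.Count > 0 && args.Count % n == 0;
            if (!ok)
                throw new FormatException("Wrong argument count for command " + command);
        }
        return result;
    }
}
```

For the start of the first example path, the tokens m, 4507, -396.4, c, -2.2, -3.5, -2.6, -7.3, -0.2, -11.4 would group into an m with two arguments and a c with six.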