Martin Petri Bagger - 4 months ago 37

R Question

I am using the string-edit distance (Levenshtein-distance) to compare scan paths from an eye tracking experiment. *(Right now I am using the *

`stringdist`

Basically the letters of the strings refer to (gaze) position in a 6x4 matrix. The matrix is configured as follows:

`[,1] [,2] [,3] [,4]`

[1,] 'a' 'g' 'm' 's'

[2,] 'b' 'h' 'n' 't'

[3,] 'c' 'i' 'o' 'u'

[4,] 'd' 'j' 'p' 'v'

[5,] 'e' 'k' 'q' 'w'

[6,] 'f' 'l' 'r' 'x'

If I use the basic Levenshtein distance to compare strings, the comparison of

`a`

`g`

`a`

`x`

E.g.:

`'abc' compared to 'agc' -> 1`

'abc' compared to 'axc' -> 1

This means that the strings are equally (dis)similar

I would like to be able to put weights on the string comparison in a way that incorporates adjacency in the matrix. E.g. the distance between

`a`

`x`

`a`

`g`

One way could be to calculate the "walk" (horizontal and vertial steps) from one letter to the other in the matrix and divide by the max "walk"-distance (i.e. from

`a`

`x`

`a`

`g`

`a`

`x`

Is there a way to implement this (in either R or python)?

Answer

You need a version of the Wagner-Fisher algorithm that uses non-unit cost in its inner loop. I.e. where the usual algorithm has `+1`

, use `+del_cost(a[i])`

, etc. and define `del_cost`

, `ins_cost`

and `sub_cost`

as functions taking one or two symbols (probably just table lookups).