CodeGuy - 1 year ago 104

R Question

I have 5 (x,y) data points and I'm trying to find a best fit solution consisting of two lines which intersect at a point (x0,y0), and which follow these equations:

`y1 = (m1)(x1 - x0) + y0`

y2 = (m2)(x2 - x0) + y0

Specifically, I require that the intersection must occur between x=2 and x=3. Have a look at the code:

`#Initialize x1, y1, x2, y2`

x1 <- c(1,2)

y1 <- c(10,10)

x2 <- c(3,4,5)

y2 <- c(20,30,40)

g <- c(TRUE, TRUE, FALSE, FALSE, FALSE)

q <- nls(c(y1, y2) ~ ifelse(g == TRUE, m1 * (x1 - x0) + y0, m2 * (x2 - x0) + y0), start = c(m1 = -1, m2 = 1, y0 = 0, x0 = 2), algorithm = "port", lower = c(m1 = -Inf, m2 = -Inf, y0 = -Inf, x0 = 2), upper = c(m1 = Inf, m2 = Inf, y0 = Inf, x0 = 3))

coef <- coef(q)

m1 <- coef[1]

m2 <- coef[2]

y0 <- coef[3]

x0 <- coef[4]

#Plot the original x1, y1, and x2, y2

plot(x1,y1,xlim=c(1,5),ylim=c(0,50))

points(x2,y2)

#Plot the fits

x1 <- c(1,2,3,4,5)

fit1 <- m1 * (x1 - x0) + y0

lines(x1, fit1, col="red")

x2 <- c(1,2,3,4,5)

fit2 <- m2 * (x2 - x0) + y0

lines(x2, fit2, col="blue")

So, you can see the data points listed there. Then, I run it through my nls, get my parameters

`m1`

`m2`

`x0`

`y0`

But, take a look at the solution:

Clearly, the red line (which is supposed to only be based on the first 2 points) is not the best line of fit for the first 2 points. This is the same case with the blue line (the 2nd fit), which supposed to be is dependent on the last 3 points). What is wrong here?

Recommended for you: Get network issues from **WhatsUp Gold**. **Not end users.**

Answer Source

I'm not exactly sure what's wrong but I can get it to work by rearranging things a bit. Please note the comment in `?nls`

about "*Do not use ‘nls’ on artificial "zero-residual" data.*"; I added a bit of noise.

```
## Initialize x1, y1, x2, y2
x1 <- c(1,2)
y1 <- c(10,10)
x2 <- c(3,4,5)
y2 <- c(20,30,40)
## make single x, y vector
x <- c(x1,x2)
set.seed(1001)
## (add a bit of noise to avoid zero-residual artificiality)
y <- c(y1,y2)+rnorm(5,sd=0.01)
g <- c(TRUE,TRUE,FALSE,FALSE,FALSE) ## specify identities of points
## particular changes:
## * you have lower=upper=2 for x0. Did you want 2<x0<3?
## * specified data argument explicitly (allows use of predict() etc.)
## * changed name from 'q' to 'fit1' (avoid R built-in function)
fit1 <- nls(y ~ ifelse(g,m1,m1+delta_m)*(x - x0) + y0,
start = c(m1 = -1, delta_m = 2, y0 = 0, x0 = 2),
algorithm = "port",
lower = c(m1 = -Inf, delta_m = 0, y0 = -Inf, x0 = 2),
upper = c(m1 = Inf, delta_m = Inf, y0 = Inf, x0 = 3),
data=data.frame(x,y))
#Plot the original 'data'
plot(x,y,col=rep(c("red","blue"),c(2,3)),
xlim=c(1,5),ylim=c(0,50))
## add predicted values
xvec <- seq(1,5,length.out=101)
lines(xvec,predict(fit1,newdata=data.frame(x=xvec)))
```

**edit**: based `ifelse`

clause on point identity, not x position

**edit**: changed to require second slope to be > first slope

On a second look, I think the issue above is **probably** due to the use of separate vectors for `x1`

and `x2`

above, rather than a single `x`

vector: I suspect these got replicated by R to match up with the `g`

vector, which would have messed things up pretty badly. For example, this stripped-down example:

```
g <- c(TRUE, TRUE, FALSE, FALSE, FALSE)
ifelse(g,x1,x2)
## [1] 1 2 5 3 4
```

shows that `x2`

gets extended to `(3 4 5 3 4)`

before being used in the `ifelse`

clause. The scariest part is that normally one gets a warning such as this:

```
> x2 + 1:5
[1] 4 6 8 7 9
Warning message:
In x2 + 1:5 :
longer object length is not a multiple of shorter object length
```

but in this case there is no warning ...

Recommended from our users: **Dynamic Network Monitoring from WhatsUp Gold from IPSwitch**. ** Free Download**