Corey - 1 year ago 144
R Question

# LOESS warnings/errors related to span in R

I am running a LOESS regression in R and have come across warnings with some of my smaller data sets.

Warning messages:

1: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  :   pseudoinverse used at -2703.9

2: In simpleLoess(y, x, w, span, degree = degree, parametric =

3: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  :   reciprocal condition number  0

4: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric,  :   There are other near singularities as well. 6.1623e+005

These warnings are discussed in another post here: Understanding loess errors in R.

It seems that these warnings are related to the span set for the LOESS regression. I am trying to apply a methodology similar to one used with other data sets, where the acceptable smoothing span was between 0.3 and 0.6. In some cases I am able to adjust the span to avoid these issues, but in other data sets the span had to be increased beyond the acceptable range to avoid the warnings.

I am curious as to what specifically these warnings mean, and whether the regression is still usable (with a note that these warnings occurred) or completely invalid.

Here is an example of a data set that is having issues:

``````
Period  Value   Total1  Total2
-2950   0.104938272 32.4    3.4
-2715   0.054347826 46  2.5
-2715   0.128378378 37  4.75
-2715   0.188679245 39.75   7.5
-3500   0.245014245 39  9.555555556
-3500   0.163120567 105.75  17.25
-3500   0.086956522 28.75   2.5
-4350   0.171038825 31.76666667 5.433333333
-3650   0.143798024 30.36666667 4.366666667
-4350   0.235588972 26.6    6.266666667
-3500   0.228840125 79.75   18.25
-4933   0.154931973 70  10.8452381
-4350   0.021428571 35  0.75
-3500   0.0625  28  1.75
-2715   0.160714286 28  4.5
-2715   0.110047847 52.25   5.75
-3500   0.176923077 32.5    5.75
-3500   0.226277372 34.25   7.75
-2715   0.132625995 188.5   25
``````


Here is the code I am using:

``````
Analysis <- read.csv(file.choose(), header = TRUE)
plot(Value ~ Period, Analysis)
a <- order(Analysis$Period)  # plotting order along the Period axis
Analysis.lo <- loess(Value ~ Period, Analysis, weights = Total1)
pred <- predict(Analysis.lo, se = TRUE)
lines(Analysis$Period[a], pred$fit[a], col = "red", lwd = 3)
# pointwise 95% band around the fitted curve
lines(Analysis$Period[a], pred$fit[a] - qt(0.975, pred$df) * pred$se.fit[a], lty = 2)
lines(Analysis$Period[a], pred$fit[a] + qt(0.975, pred$df) * pred$se.fit[a], lty = 2)
``````

First image is without jittering

Second image is with jittering

The warnings are issued because the `loess` algorithm runs into numerical difficulties: `Period` has only a few distinct values, several of which are repeated many times, as you can see from your plot and from:

``````
table(Analysis$Period)
``````
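As a rough illustration of how heavily tied the values are, the sketch below re-enters the `Period` column from the question by hand (the vector name `period` is just for this example) and compares the number of observations with the number of distinct values.

```r
# Period values copied from the data set in the question
period <- c(-2950, -2715, -2715, -2715, -3500, -3500, -3500, -4350, -3650,
            -4350, -3500, -4933, -4350, -3500, -2715, -2715, -3500, -3500,
            -2715)

counts <- table(period)

length(period)          # number of observations
length(unique(period))  # number of distinct Period values
counts[counts > 1]      # values that occur more than once
```

With so few distinct values, most of the data sits in two or three vertical stacks on the plot.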

In that respect, `Period` behaves like a discrete variable (a factor) rather than the continuous one that proper smoothing requires. Adding some jitter removes the warnings:

``````
Analysis <- read.table(header = TRUE, text = "Period  Value   Total1  Total2
-2950   0.104938272 32.4    3.4
-2715   0.054347826 46  2.5
-2715   0.128378378 37  4.75
-2715   0.188679245 39.75   7.5
-3500   0.245014245 39  9.555555556
-3500   0.163120567 105.75  17.25
-3500   0.086956522 28.75   2.5
-4350   0.171038825 31.76666667 5.433333333
-3650   0.143798024 30.36666667 4.366666667
-4350   0.235588972 26.6    6.266666667
-3500   0.228840125 79.75   18.25
-4933   0.154931973 70  10.8452381
-4350   0.021428571 35  0.75
-3500   0.0625  28  1.75
-2715   0.160714286 28  4.5
-2715   0.110047847 52.25   5.75
-3500   0.176923077 32.5    5.75
-3500   0.226277372 34.25   7.75
-2715   0.132625995 188.5   25")

table(Analysis$Period)
# jitter() is random; call set.seed() first if you need a reproducible fit
Analysis$Period <- jitter(Analysis$Period, factor = 0.2)

plot(Value ~ Period, Analysis)
a <- order(Analysis$Period)  # plotting order along the Period axis
Analysis.lo <- loess(Value ~ Period, Analysis, weights = Total1)
pred <- predict(Analysis.lo, se = TRUE)
lines(Analysis$Period[a], pred$fit[a], col = "red", lwd = 3)
# pointwise 95% band around the fitted curve
lines(Analysis$Period[a], pred$fit[a] - qt(0.975, pred$df) * pred$se.fit[a], lty = 2)
lines(Analysis$Period[a], pred$fit[a] + qt(0.975, pred$df) * pred$se.fit[a], lty = 2)
``````

Increasing the `span` parameter has the effect of "squashing out", along the `Period` axis, the piles of repeated values where they occur: a wider span pulls more distinct `Period` values into each local fit. With small datasets you need a lot of squashing to compensate for the piling up of repeated `Period`s.
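To make the squashing concrete, here is a rough proxy, not what `loess` does internally. It assumes each local fit uses about `span * n` nearest points and that the farthest of them gets zero weight (as with a tricube kernel), then counts how many distinct `Period` values are left with non-zero weight around each data value. The helper `effective_distinct` is hypothetical, and the real algorithm fits at kd-tree vertices and interpolates, so treat the numbers as indicative only.

```r
period <- c(-2950, -2715, -2715, -2715, -3500, -3500, -3500, -4350, -3650,
            -4350, -3500, -4933, -4350, -3500, -2715, -2715, -3500, -3500,
            -2715)

# Hypothetical helper: distinct x values strictly inside the neighbourhood
# spanned by the k nearest points, with k assumed to be floor(span * n).
effective_distinct <- function(x, span) {
  k <- max(1, floor(span * length(x)))
  sapply(sort(unique(x)), function(x0) {
    d <- abs(x - x0)
    h <- sort(d)[k]                        # distance to the k-th nearest point
    inside <- if (h > 0) d < h else d == 0 # boundary points get zero weight
    length(unique(x[inside]))
  })
}

for (s in c(0.3, 0.6, 0.75, 1)) {
  cat("span =", s, ": fewest distinct Period values in a neighbourhood =",
      min(effective_distinct(period, s)), "\n")
}
```

A local quadratic (the `loess` default, `degree = 2`) needs at least three distinct `Period` values to be well determined. Under these assumptions, some neighbourhoods at the default `span = 0.75` contain only two, which is consistent with the pseudoinverse warning.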

From a practical viewpoint, I would generally still trust the regression, possibly after examining the graphical output. But I would definitely not increase `span` to achieve the squashing: it is much better to use a tiny amount of `jitter` for that purpose. `span` should be dictated by other considerations, such as the overall spread of your `Period` data.
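If you want to check the effect of the jitter rather than rely on the console output, one possible sketch is to collect the warnings from each fit and compare them. The helper `loess_warnings` is hypothetical (it is not part of any package), and the data frame is retyped from the question with only the columns the model uses.

```r
Analysis <- data.frame(
  Period = c(-2950, -2715, -2715, -2715, -3500, -3500, -3500, -4350, -3650,
             -4350, -3500, -4933, -4350, -3500, -2715, -2715, -3500, -3500,
             -2715),
  Value  = c(0.104938272, 0.054347826, 0.128378378, 0.188679245, 0.245014245,
             0.163120567, 0.086956522, 0.171038825, 0.143798024, 0.235588972,
             0.228840125, 0.154931973, 0.021428571, 0.0625, 0.160714286,
             0.110047847, 0.176923077, 0.226277372, 0.132625995),
  Total1 = c(32.4, 46, 37, 39.75, 39, 105.75, 28.75, 31.76666667, 30.36666667,
             26.6, 79.75, 70, 35, 28, 28, 52.25, 32.5, 34.25, 188.5)
)

# Hypothetical helper: fit the weighted loess and return the warning
# messages it raised instead of printing them.
loess_warnings <- function(data, ...) {
  msgs <- character(0)
  withCallingHandlers(
    loess(Value ~ Period, data, weights = Total1, ...),
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  msgs
}

set.seed(1)  # jitter() is random; fix the seed so the comparison is repeatable
jittered <- transform(Analysis, Period = jitter(Period, factor = 0.2))

length(loess_warnings(Analysis))             # warnings with tied Period values
length(loess_warnings(jittered))             # warnings after a small jitter
max(abs(jittered$Period - Analysis$Period))  # largest shift applied to any point
```

The last line shows how far the points were moved. With `factor = 0.2` the shift is a small fraction of the smallest gap between distinct `Period` values, so the fitted curve should change very little.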
