Alberto Mnemon - 2 months ago 33

R Question

I have recently posted a "very newie to R" question about the correct way of doing this, if you are interested in it you can find it [here].1

I have now managed to develop a simple R script that does the job, but now the results are what troubles me.

Long story short I'm using R to analyze

`lpp`

`mad.test`

`p.value`

These are the two not randomly distributed lpps.

Looking at them you can see some kind of clusters in the first one, but the second one only has three points, and seems to me that there is no way one can assure only three points are not corresponding to a random distribution. There are other tracks with one, two, three points but they all fall into the "random" lpps category, so I don't know why this one is different.

So here is the question:

I have also noticed that these two lpps have a much lower

`$statistic$rank`

`$statistic$rank`

My R script and all the shp files can be downloaded from here(850 Kb).

Thank you so much for your help.

Answer Source

It is impossible to give an universal answer to the question about how many points is needed for an analysis. Usually 0, 1 and 2 are too few for a standalone analysis. However, if they are part of repeated measurements of the same thing they might be interesting still. Also, I would normally say that your example with 3 points is too few to say anything interesting. However, an extreme example would be if you have a single long line segment where one point occurs close to one end and two other occur close to each other at the other end. This is not so likely to happen for CSR and you may be inclined to not believe that hypothesis. This appears to be what happened in your case.

Regarding your question about the rank you might want to read a bit more up on the Monte Carlo test you are preforming. Basically, you summarise the point pattern by a single number (maximum absolute deviation of linear K) and then you look at how extreme this number is compared to numbers generated at random from CSR. Assuming you use 99 simulations of CSR you have 100 numbers in total. If your data ranks as the most extreme (`$statistic$rank==1`

) among these it has p-value 1%. If it ranks as the 50th number the p-value is 50%. If you used another number of simulations you have to calculate accordingly. I.e. with 199 simulations rank 1 is 0.5%, rank 2 is 1%, etc.