fleems fleems - 27 days ago 9
R Question

SPSS, syntax: How to categorize every 260 cases as a separate level in a single variable

We are analyzing geological data: a set of roughly 26k measurements (elemental readings) of a seashell. A second, otherwise identical measurement (isotope readings) was also taken, but it has only 100 data points (because that's how the machine works).

Now we are trying to (cross-)label each block of 262 cases of the elemental reading (1) with the corresponding data point of the isotope reading (2). Thus cases 1-262 = 1, 263-524 = 2, et cetera.

DO IF (MISSING(interval_element)).
COMPUTE id_element = $SYSMIS.
ELSE IF (interval_element >= 1 AND interval_element <= 262).
COMPUTE id_element = 1.
ELSE IF (interval_element >= 263 AND interval_element <= 524).
COMPUTE id_element = 2.

* ...and so on, up to 100.

END IF.
EXECUTE.


Doing it manually is an option (see above), but we have 100 data points, so doing this with a loop/repeat command has piqued our interest for obvious reasons.

We looked at http://www.spss-tutorials.com/spss-loop-command/, but we didn't manage to get a solution from there. Most posts found on this site are about looping over different variables or between variables, whilst we want a new variable that identifies 262 cases per level (up to 100).

We are curious how to do this in SPSS syntax, and also how to do it in R. We're not that experienced, but we know it is a simple question; we are probably overlooking something small.

Thanks for reading!

EDIT

I think I figured out what the problem is. We want to group the first 260 cases by their position in the file, without using their values. The reason I chose interval_element is that we want to segment the distances.

But looking at it now, it doesn't matter much which variable we pick if we aren't using its values. interval_element holds 26k numbers in the range [.019926, 26.67770]. This explains why we got all ones: the values never exceed 262 anywhere in the data set.

Maybe I should have been a bit clearer; I thought I was by saying cases instead of values. My bad.
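To make the difference between grouping on values and grouping on case position concrete, here is a minimal Python sketch. The distances are made up (1000 evenly spaced values in roughly the same range as interval_element), and the names `distances`, `by_value` and `by_position` are only illustrative.

```python
BLOCK = 262  # cases per level, as in the question

# Hypothetical, already sorted distances in mm, spanning roughly 0.02 to 26.65.
distances = [0.02 + 0.02666 * i for i in range(1000)]

# Value-based grouping: compares each measurement itself against multiples of 262.
by_value = [1 + int(d // BLOCK) for d in distances]

# Position-based grouping: uses the 0-based position of each case in the sorted file.
by_position = [1 + i // BLOCK for i in range(len(distances))]

print(sorted(set(by_value)))     # [1]
print(sorted(set(by_position)))  # [1, 2, 3, 4]
```

Because no distance ever reaches 262, the value-based version can only ever produce level 1, while the position-based version starts a new level every 262 cases.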

SOLUTION

Using
id_element=1 + trunc(($CASENUM-0.1)/262)
gave NAs; after removing the -0.1 it worked just fine!

The variable interval_element is a distance measured in mm, so ordering it wasn't a problem.

@eli-k Thanks :)

Answer

First, a more efficient way to do what you're doing:

* creating an index variable.
sort cases by interval_element.
compute IEindex=$casenum.

recode IEindex
(1 thru 262=1)
(263 thru 524=2)
(525 thru 786=3)
.......
into id_element.

You'll still have to add 100 lines like this to create 100 levels in variable id_element, so a more sophisticated approach is called for. Now, loop and do repeat are usually used to loop through sets of variables. Since you only have one variable here, this isn't a classical looping problem. So instead of looping, I suggest a simple calculation that determines id_element directly:

compute id_element=1 + trunc((IEindex-0.1)/262).
exe.
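The arithmetic behind this formula can be checked outside SPSS. Below is a minimal Python sketch, assuming that `math.trunc` behaves like SPSS `trunc` (truncation toward zero) and that case numbers start at 1. The function names are hypothetical; the third function drops the -0.1 purely for comparison.

```python
import math

BLOCK = 262  # cases per level, as in the question


def level_with_offset(case_number: int, block: int = BLOCK) -> int:
    """Mirror of 1 + trunc((IEindex - 0.1) / 262)."""
    return 1 + math.trunc((case_number - 0.1) / block)


def level_integer(case_number: int, block: int = BLOCK) -> int:
    """Integer-only equivalent: shift to 0-based, divide, shift back."""
    return (case_number - 1) // block + 1


def level_without_offset(case_number: int, block: int = BLOCK) -> int:
    """Same formula with the -0.1 dropped, for comparison only."""
    return 1 + math.trunc(case_number / block)


# Print the level each variant assigns around the block boundaries.
for n in (1, 261, 262, 263, 524, 525):
    print(n, level_with_offset(n), level_integer(n), level_without_offset(n))
```

With the -0.1 offset, cases 262 and 524 stay in levels 1 and 2, matching the recode ranges above. Without it, every exact multiple of 262 moves up one level, so the first level holds only 261 cases.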

HTH.