I have a dataset containing something like this:
As an example of how I would approach this in SPSS, first let's read the example data into SPSS.
    data list list (",") / case group val1 val2 val3 val4 (6F1.0).
    begin data
    1,1,3,5,6,8
    2,1,2,7,5,4
    3,2,1,3,6,8
    4,2,5,4,3,7
    5,1,8,6,5,3
    end data.
    dataset name orig.
Then we can use SPLIT FILE and PROXIMITIES to get our distance matrix by group. Note, as you mentioned in the comments to flodel's answer, this produces a separate dataset that we need to work with. (Also note that case practically never matters in SPSS syntax, e.g. split file and SPLIT FILE are equivalent.)
    sort cases by group.
    split file by group.
    dataset declare dist.
    PROXIMITIES val1, val2, val3, val4
      /STANDARDIZE = Z
      /MEASURE = EUCLID
      /PRINT = NONE
      /MATRIX = OUT('dist').
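For readers who want to sanity-check the per-group distances outside SPSS, here is a rough Python sketch of what this step is meant to compute. It is not the SPSS implementation: it assumes the Z standardization is applied within each group and uses the sample standard deviation (n - 1), and the function name `group_distance_matrices` is made up for illustration.

```python
import math
import statistics
from collections import defaultdict

# Same example data as above: (case, group, val1, val2, val3, val4).
rows = [
    (1, 1, 3, 5, 6, 8),
    (2, 1, 2, 7, 5, 4),
    (3, 2, 1, 3, 6, 8),
    (4, 2, 5, 4, 3, 7),
    (5, 1, 8, 6, 5, 3),
]


def group_distance_matrices(rows):
    """Return {group: (case_ids, full symmetric distance matrix)}."""
    by_group = defaultdict(list)
    for case, group, *vals in rows:
        by_group[group].append((case, vals))

    result = {}
    for group, members in sorted(by_group.items()):
        cases = [case for case, _ in members]
        # Transpose so that each entry is one variable's values in this group.
        columns = list(zip(*(vals for _, vals in members)))
        # Assumption: z-scores are computed within the group, using the
        # sample standard deviation (n - 1 in the denominator).
        z_columns = [
            [(x - statistics.mean(col)) / statistics.stdev(col) for x in col]
            for col in columns
        ]
        # Transpose back so that each entry is one standardized case.
        z_rows = list(zip(*z_columns))
        matrix = [[math.dist(a, b) for b in z_rows] for a in z_rows]
        result[group] = (cases, matrix)
    return result


matrices = group_distance_matrices(rows)
```

Under those assumptions, group 2 has only two cases, so every standardized variable differs by sqrt(2) between them and the single pairwise distance is sqrt(8), roughly 2.83. A group containing a constant variable would need special handling, since its standard deviation is zero.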
Unlike R, basically everything within an SPSS data matrix is like an R data.frame, so SPLIT FILE functionally replaces nearly all of the different *ply functions in R. This is very convenient, but less flexible in general. So now we need to aggregate the distances in the dist dataset I saved the results to. We first sum across rows, and then sum by group via an AGGREGATE command.
    dataset activate dist.
    compute dist_sum = SUM(VAR1 to VAR3).
    *it appears SPSS keeps empty cases - we do not want them in the aggregation.
    select if MISSING(dist_sum) = 0.
    DATASET DECLARE dist_agg.
    AGGREGATE
      /OUTFILE='dist_agg'
      /BREAK=group
      /dist_sum = SUM(dist_sum)
      /N_Cases=N.
    dataset activate dist_agg.
    compute mean_dist = dist_sum /(N_Cases*(N_Cases - 1)).
Here I save the aggregated results into another dataset named dist_agg. Because SPSS (annoyingly) saves the full distance matrix, every pair is counted twice in the sum, so the divisor for the mean is not n*(n-1)/2 (as in the equivalent R syntax) but n*(n-1), assuming you do not want to count the diagonal elements towards the mean. Then we can just merge these back into the orig data file via a match files command.
    *merge back into the original dataset.
    dataset activate orig.
    match files file = *
      /table = 'dist_agg'
      /by group.
    exe.
    *clean out old datasets if you like.
    dataset close dist.
    dataset close dist_agg.
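As a quick check on the n*(n-1) divisor discussed above, here is a small Python sketch showing that summing the full symmetric matrix and dividing by n*(n-1) gives the same mean as summing each pair once and dividing by n*(n-1)/2. The three points and both function names are made up for illustration and are unrelated to the example data.

```python
import math
from itertools import combinations


def mean_from_full_matrix(matrix):
    """Mean pairwise distance from a full symmetric matrix with a zero diagonal."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)  # each pair appears twice
    return total / (n * (n - 1))


def mean_from_pairs(matrix):
    """Mean pairwise distance using each unordered pair exactly once."""
    n = len(matrix)
    total = sum(matrix[i][j] for i, j in combinations(range(n), 2))
    return total / (n * (n - 1) / 2)


# Hypothetical points, chosen so the distances are easy to verify by hand.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
full = [[math.dist(p, q) for q in points] for p in points]

full_mean = mean_from_full_matrix(full)
pair_mean = mean_from_pairs(full)
```

The pairwise distances here are 5, 10 and 5, so both functions return 20/3.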
The flexibility of R to go back and forth between data.frame objects makes SPSS a bit more clunky for this job. I could write a much more concise program to do this in SPSS's MATRIX language, but to do it across groups in MATRIX is a pain in the butt (compared to R's