user919367 - 9 months ago 55

R Question

My question is very simple. I have a data frame with various numbers in each row, more than 100 columns. The first column is always a nonzero number. What I want to do is replace each nonzero number in each row (excluding the first column) with the first number in the row (the value of the first column).

I was thinking along the lines of an `ifelse` and a `for` loop that iterates through rows, but there must be a simpler vectorised way to do it...

Answer

Another approach is to use `sapply`, which avoids an explicit loop over the rows. Assuming your data is in a data frame `df`:

```
df[,-1] <- sapply(df[,-1], function(x) {ind <- which(x!=0); x[ind] <- df[ind,1]; return(x)})
```

Here, we are applying the `function` over each and all columns of `df` except for the first column. In the `function`, `x` is each of these columns in turn:

- First, find the row indices of the column that are nonzero using `which`.
- Set these rows in `x` to the corresponding values in the rows of the first column of `df`.
- Return the column.
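As a minimal sketch of these steps, here is the same idea on a small toy data frame, following the question's wording (nonzero entries outside the first column are replaced). The data and the column names `id`, `a`, and `b` are made up for illustration.

```r
# Hypothetical toy data: the first column `id` is always nonzero.
df <- data.frame(id = c(5, 7, 9),
                 a  = c(0, 2, 0),
                 b  = c(1, 0, 3))

# Replace every nonzero entry outside the first column
# with the first-column value from the same row.
df[, -1] <- sapply(df[, -1], function(x) {
  ind <- which(x != 0)   # rows where this column is nonzero
  x[ind] <- df[ind, 1]   # copy the first-column values into those rows
  x                      # return the processed column
})

print(df)
```

Zeros stay where they were, and every other entry in `a` and `b` now matches the `id` value of its row.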

Note that the operations in the function are all "vectorized" over the column. That is, no looping over the rows of the column. The result from `sapply` is a matrix of the processed columns, which replaces all columns of `df` that are not the first column.
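To see the intermediate matrix for yourself, you can store the `sapply` result before assigning it back. This is only a sketch on made-up data; the name `res` and the toy columns are assumptions for illustration.

```r
# Hypothetical toy data, as before.
df <- data.frame(id = c(5, 7, 9), a = c(0, 2, 0), b = c(1, 0, 3))

# Keep the sapply() result in a separate object to inspect it.
res <- sapply(df[, -1], function(x) {
  ind <- which(x != 0)
  x[ind] <- df[ind, 1]
  x
})

is.matrix(res)    # the equal-length columns are simplified into a matrix
dim(res)          # one row per row of df, one column per processed column
colnames(res)     # column names are carried over from df[, -1]

df[, -1] <- res   # the matrix replaces every column except the first
```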

See this for an excellent review of the `*apply` family of functions.

Hope this helps.