Cebs - 9 months ago

R Question

I’m trying to obtain the proportion of individuals that share certain DNA sequences between two given points, and I want to do this with a specific sliding window. To show the problem, I created this example. First I create a data frame with four columns.

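    # Example data: two individuals, 2000 rows each
    x <- c(rep("sc256", times = 2000), rep("sc784", times = 2000))
    # pos1 and pos2 hold 2000 values each; data.frame() recycles them to 4000 rows
    pos1 <- round(runif(2000, 100, 5000), digits = 0)
    pos2 <- round(runif(2000, 100, 5000), digits = 0)
    # Chromosome label alternating 2, 1, 2, 1, ...
    y3 <- rep(c(2, 1), times = 2000)
    M1 <- data.frame(x, pos1, pos2, y3)
    colnames(M1) <- c("iid", "pos1", "pos2", "chr")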

I also create a function to obtain the proportion of individuals that have sequences in a particular interval.

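    roh_island <- function(pop, chr, p1, p2) {
      # Keep only the rows on the requested chromosome
      a <- pop[pop$chr == chr, ]
      # Rows whose segment lies entirely inside [p1, p2]
      island <- subset(a, pos1 >= p1 & pos2 <= p2)
      # Divide by the rows of the data passed in (pop), not the global M1
      n <- nrow(island) / nrow(pop)
      return(n)
    }

    roh_island(M1, 1, 345, 700)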

Now I want to turn this interval into a sliding window of size 10 that moves between positions 0 and 7000, so the windows would be [0,10), [10,20), …, [6990,7000]. I also need the new sliding-window function to store every window and the proportion of individuals in it in a data frame, so that I can plot it afterwards. I tried some sliding-window solutions that I found, but I could not make them work. Thanks

Answer
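
One possible approach, offered as a minimal sketch rather than a definitive answer: build the window start positions with seq() and apply the same test as roh_island() to each window. The names roh_windows, from, to and size are hypothetical and do not come from the question. The sketch assumes the same definition as the original function, i.e. a row counts only if its whole segment (pos1 to pos2) lies inside the window, and the denominator is the total number of rows in pop. Window edges are treated as inclusive on both sides, as in roh_island(), so a segment that starts and ends exactly on a shared boundary would be counted in both neighbouring windows.

```r
# Sketch: proportion of rows whose segment lies inside each window.
# roh_windows, from, to and size are hypothetical names (not from the question).
roh_windows <- function(pop, chr, from = 0, to = 7000, size = 10) {
  # Keep only the rows on the requested chromosome
  a <- pop[pop$chr == chr, ]
  starts <- seq(from, to - size, by = size)  # 0, 10, ..., to - size
  ends <- starts + size                      # 10, 20, ..., to
  # Same containment test as roh_island(), applied once per window
  prop <- vapply(seq_along(starts), function(i) {
    sum(a$pos1 >= starts[i] & a$pos2 <= ends[i]) / nrow(pop)
  }, numeric(1))
  # One row per window, ready for plotting
  data.frame(start = starts, end = ends, prop = prop)
}

# Small made-up data set so the result can be checked by hand
toy <- data.frame(
  iid  = c("a", "b", "c", "d"),
  pos1 = c(1, 12, 15, 3),
  pos2 = c(9, 18, 40, 8),
  chr  = c(1, 1, 1, 2)
)
res <- roh_windows(toy, chr = 1, from = 0, to = 30, size = 10)
print(res)
```

With the data from the question this would be called as roh_windows(M1, 1), and the result could then be plotted with something like plot(res$start, res$prop, type = "s"). Note that with windows of size 10 and randomly drawn positions between 100 and 5000, very few segments fit entirely inside one window, so most proportions will be 0; if the intent is to count segments that overlap a window rather than fit inside it, the condition inside the function would need to change.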

Source (Stackoverflow)