binning - Converting activity start and end time into binned data for multiple groups in R (dplyr, tidyr)




I have data that looks like this:

foo <- data.frame(userid = c("a","a","b","b","b"),
                  activity = factor(c("x","y","z","z","x")),
                  st = c(0, 20, 0, 10, 25),   # start time
                  et = c(20, 30, 10, 25, 30)) # end time

For each user, I want to convert the activity information into 5-minute time bins. The result should look like this:

result <- data.frame(userid = c("a", "b"),
                     x1 = c("x", "z"),
                     x2 = c("x", "z"),
                     x3 = c("x", "z"),
                     x4 = c("x", "z"),
                     x5 = c("y", "z"),
                     x6 = c("y", "x"))
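To make the bin numbering concrete, here is a minimal sketch of the arithmetic for a single row (user a, activity x, from 0 to 20). It assumes that bin k covers the half-open interval from (k - 1) * 5 up to, but not including, k * 5 minutes, which is what the desired result above implies. The names bin_width, first_bin, last_bin and cols are only illustrative.

```r
bin_width <- 5   # bin width in minutes, as in the question
st <- 0          # start time of user a's activity "x"
et <- 20         # end time of user a's activity "x"

# Assumption: bin k covers [(k - 1) * bin_width, k * bin_width),
# and st/et are multiples of bin_width.
first_bin <- st / bin_width + 1
last_bin  <- et / bin_width

# Column names of the bins that this activity fills.
cols <- paste0("x", seq(first_bin, last_bin))
cols
```

For this row the activity fills columns x1 through x4, matching the first row of the desired result.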

The following approach works, but it is quite cumbersome and slow. It takes 15 minutes on a modest-sized dataset.

library(dplyr)
library(tidyr)

lvls <- levels(foo$activity)

time_bin <- function(st, et, act) {
  bins <- seq(0, 30, by=5)
  tb <- as.integer(bins>=st & bins<et)*as.integer(act)
  tb[tb>0] <- lvls[tb]
  data.frame(tb=tb, bins=bins)
}

new_foo <- foo %>%
  rowwise() %>%
  do(data.frame(., time_bin(.$st, .$et, .$activity))) %>%
  select(-(activity:et)) %>%
  group_by(userid) %>%
  subset(tb>0) %>%
  spread(bins, tb)

Is there a faster or more convenient way of going about this?

You can try:

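library(data.table)
library(reshape2)

# One row per (userid, activity, bin end time); the unnamed column is called V1
dt = setDT(foo)[,seq(min(st)+5,max(et),5),.(userid,activity)]

# Reshape to one row per user and one column per bin
dcast(dt, userid~V1, value.var='activity')
#  userid 5 10 15 20 25 30
#1      a x  x  x  x  y  y
#2      b z  z  z  z  z  x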
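The same expand-then-reshape idea can be sketched in base R, in case data.table is not available. This is only an illustration under stated assumptions, not the answer's method: it expands every row separately (rather than every userid/activity group), and it assumes that st and et are multiples of the bin width and that et is greater than st. The names bin_width, edges, long and wide are made up for this example.

```r
# Same toy data as in the question.
foo <- data.frame(
  userid   = c("a", "a", "b", "b", "b"),
  activity = factor(c("x", "y", "z", "z", "x")),
  st       = c(0, 20, 0, 10, 25),    # start time
  et       = c(20, 30, 10, 25, 30),  # end time
  stringsAsFactors = FALSE
)

bin_width <- 5  # assumed bin width in minutes

# Expand each row into the right edges of the bins it covers,
# e.g. st = 0, et = 20 gives 5, 10, 15, 20.
edges <- Map(function(s, e) seq(s + bin_width, e, by = bin_width),
             foo$st, foo$et)
n <- lengths(edges)

# Long format: one row per (user, bin).
long <- data.frame(
  userid   = rep(foo$userid, n),
  bin      = unlist(edges),
  activity = rep(as.character(foo$activity), n),
  stringsAsFactors = FALSE
)

# Wide format: fill a character matrix with one row per user
# and one column per bin, using matrix indexing.
users <- unique(long$userid)
bins  <- sort(unique(long$bin))
wide  <- matrix(NA_character_,
                nrow = length(users), ncol = length(bins),
                dimnames = list(users, as.character(bins)))
wide[cbind(match(long$userid, users), match(long$bin, bins))] <- long$activity
wide
```

On the toy data, row a holds x for bins 5 to 20 and y for bins 25 and 30, and row b holds z for bins 5 to 25 and x for bin 30. Expanding per row means the sketch does not rely on a user's repeated activity being contiguous in time.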

