r - dplyr and checking previous 24-60 months for observations -

i have bunch of unique cusip codes(unique id) , need check create sure there 24-60 previous month observations, not sure how check dplyr

reproducible illustration :

tdata <- structure(list(cusip = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,  2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,  2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,  2, 2, 2, 2, 2, 2), fyear = c("1971", "1971", "1971", "1971",  "1971", "1971", "1971", "1971", "1971", "1971", "1971", "1971",  "1972", "1972", "1972", "1972", "1972", "1972", "1972", "1972",  "1972", "1972", "1972", "1972", "1972", "1973", "1973", "1973",  "1973", "1973", "1973", "1973", "1973", "1973", "1973", "1973",  "1973", "1974", "1974", "1974", "1974", "1974", "1974", "1974",  "1974", "1974", "1974", "1974", "1974", "1975", "1975", "1975",  "1975", "1975", "1975", "1975", "1975", "1975", "1975", "1975" ), datadate = c(19711231l, 19710129l, 19710226l, 19710331l, 19710430l,  19710528l, 19710630l, 19710730l, 19710831l, 19710930l, 19711029l,  19711130l, 19721231l, 19720131l, 19720229l, 19720330l, 19720428l,  19720531l, 19720630l, 19720731l, 19720831l, 19720929l, 19721031l,  19721130l, 19721229l, 19731231l, 19730131l, 19730228l, 19730330l,  19730430l, 19730531l, 19730629l, 19730731l, 19730831l, 19730928l,  19731031l, 19731130l, 19741231l, 19740131l, 19740228l, 19740329l,  19740430l, 19740531l, 19740628l, 19740731l, 19740830l, 19740930l,  19741031l, 19741129l, 19751231l, 19750131l, 19750228l, 19750331l,  19750430l, 19750530l, 19750630l, 19750731l, 19750829l, 19750930l,  19751031l)), .names = c("cusip", "fyear", "datadate"), class = c("grouped_df",  "tbl_df", "tbl", "data.frame"), row.names = c(na, -60l), vars = list(     cusip, fyear), drop = true, indices = list(0:11, 12:24, 25:36,      37:48, 49:59), group_sizes = c(12l, 13l, 12l, 12l, 11l), biggest_group_size = 13l, labels = structure(list(     cusip = c(2, 2, 2, 2, 2), fyear = c("1971", "1972", "1973",      "1974", "1975")), class = "data.frame", row.names = c(na,  -5l), .names = c("cusip", "fyear"), vars = list(cusip, fyear)))

logic

i thinking checking total months each year, don't know know extract previous months check if 24/60 >= 0.4. how edited code check previous 60 months , ensure there @ to the lowest degree 24 months including....

tdata %>%    group_by(cusip, fyear) %>%    mutate(month = substr(datadate, 5, 6) %>%    mutate(pre_countmonths = length(unique(month))

edit 04/07/2015 :

here logic next loops. 1 of challenges i'm having r branching outside of loops. possible way edit dplyr can utilized instead of for loops? take exclusively long run current data.

for(i in min(tdata$cusip):max(tdata$cusip)){      (j in min(tdata$fyear):max(tdata$fyear) {       monthcheck <- filter(tdata, cusip == & (fyear == j-1 | fyear == j-2 | fyear == j-3 | fyear == j-4))       if(length(monthcheck$month) / 40 >= 0.4) if(any(tdata$fyear == j)) tdata$check <- 1  }}

edit : 04/08/2015 - added total sample dataset main variables

small subset : https://www.dropbox.com/s/mf0o0tbgbame6k8/testdata.csv?dl=0

this got within time limit. hope gives ideas, , other users provide improve solutions.

mydf <- as_data_frame(list(cusip = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,  2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,  2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,  2, 2, 2, 2, 2, 2), fyear = c("1971", "1971", "1971", "1971",  "1971", "1971", "1971", "1971", "1971", "1971", "1971", "1971",  "1972", "1972", "1972", "1972", "1972", "1972", "1972", "1972",  "1972", "1972", "1972", "1972", "1972", "1973", "1973", "1973",  "1973", "1973", "1973", "1973", "1973", "1973", "1973", "1973",  "1973", "1974", "1974", "1974", "1974", "1974", "1974", "1974",  "1974", "1974", "1974", "1974", "1974", "1975", "1975", "1975",  "1975", "1975", "1975", "1975", "1975", "1975", "1975", "1975" ), datadate = c(19711231l, 19710129l, 19710226l, 19710331l, 19710430l,  19710528l, 19710630l, 19710730l, 19710831l, 19710930l, 19711029l,  19711130l, 19721231l, 19720131l, 19720229l, 19720330l, 19720428l,  19720531l, 19720630l, 19720731l, 19720831l, 19720929l, 19721031l,  19721130l, 19721229l, 19731231l, 19730131l, 19730228l, 19730330l,  19730430l, 19730531l, 19730629l, 19730731l, 19730831l, 19730928l,  19731031l, 19731130l, 19741231l, 19740131l, 19740228l, 19740329l,  19740430l, 19740531l, 19740628l, 19740731l, 19740830l, 19740930l,  19741031l, 19741129l, 19751231l, 19750131l, 19750228l, 19750331l,  19750430l, 19750530l, 19750630l, 19750731l, 19750829l, 19750930l,  19751031l)))  #  create normal data.frame mydf <- data.frame(mydf)  # create   info frame new cusip mydf2 <- mutate(mydf, cusip = 3)  ### create new   info frame missing 1   info point foo <- bind_rows(mydf, mydf2[-4, ])

in pseudo data, cusip 3 missing 1 month data. means, not have consecutive 24-60-month info cusip 3. first, created column month , column date object. then, ordered info cusp , datadate. wanted select info points remain between 24-60 month period. first filter part. grouped info cusp. using month, wanted check if have consecutive info points or not. expect lead(month)-month = 1, 11, or 0. if have 2 info points same month, expect 0. happening in data. final filter can revise. here, wanted remove cusip got false in check. in draft, filter seems doing right thing; not see info of cusip 3 in end. hope helps you.

mutate(foo, month = as.numeric(substr(datadate, 5, 6))) %>% mutate(datadate = as.posixct(gsub("^(\\d{4})(\\d{2}).*$", "\\1-\\2-01", datadate),                   format("%y-%m-%d"), tz = "gmt")) %>%   arrange(cusip, datadate) %>%                         filter(between(datadate,         datadate[tail(which(month == 6, arr.ind = true), n = 1)] - (60*60*24*30*60),        datadate[tail(which(month == 6, arr.ind = true), n = 1)] -(60*60*24*30*24))) %>% group_by(cusip) %>% mutate(check = abs(lead(month)-month) == 11|abs(lead(month)-month) == 1|abs(lead(month)-month) == 0) %>% filter(all(check == true | check %in% na))

r dplyr

Search This Blog

Five

r - dplyr and checking previous 24-60 months for observations -

Comments

Post a Comment

Popular posts from this blog

java - How to set log4j.defaultInitOverride property to false in jboss server 6 -

c - GStreamer 1.0 1.4.5 RTSP Example Server sends 503 Service unavailable -

Using ajax with sonata admin list view pagination -