json - Conditional sampling in Pig
I am using elephant-bird to parse nested JSON in Pig. I'd like to store a sample in which the probability of sampling a record depends on the value of a binary attribute "c" in the parsed JSON.
One way to do conditional sampling is to split the relation based on the value of "c", and then apply the SAMPLE operator to both subrelations, each with a different sampling probability.
Is there a more direct and efficient way to accomplish this, in one pass? If not, what is the recommended way to split and then combine the subrelations together? I am operating on big files, so efficiency is a concern. Thank you!
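For concreteness, here is a rough sketch of the split-and-sample approach described above, followed by one possible single-pass variant that filters on RANDOM() with a per-record threshold. The input path, the relation names, the sampling rates (0.1 and 0.5), and the assumption that "c" comes out of the parsed map as the string '0' or '1' are all made up for illustration and are not from my actual job.

```pig
-- Assumes the elephant-bird jars are already REGISTERed.
-- 'input.json' is a hypothetical path.
records = LOAD 'input.json'
          USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
          AS (json:map[]);

-- Pull the binary attribute out of the parsed map once.
-- Assumption: "c" is readable as the chararray '0' or '1'.
tagged = FOREACH records GENERATE (chararray)json#'c' AS c, json;

-- Approach 1: split on "c", sample each branch at its own rate, recombine.
SPLIT tagged INTO c_one IF c == '1', c_zero IF c == '0';
c_one_sampled  = SAMPLE c_one 0.1;   -- hypothetical rate for c = 1
c_zero_sampled = SAMPLE c_zero 0.5;  -- hypothetical rate for c = 0
combined = UNION c_one_sampled, c_zero_sampled;

-- Approach 2 (possible one-pass variant): keep each record with a
-- probability chosen by its own value of "c".
sampled = FILTER tagged BY RANDOM() <= (c == '1' ? 0.1 : 0.5);

STORE sampled INTO 'sampled_output';  -- hypothetical output path
```

The second form relies only on the built-in RANDOM() function and the bincond operator, so it should avoid the SPLIT and UNION steps, but I have not measured whether it is actually faster on large inputs.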
json apache-pig sample random-sample