hadoop - Comparing files field by field in Pig
I have two files like this:
file 1
id,sal,location,code
1000,1000,jupiter,f
1001,2000,jupiter,f
1002,3000,jupiter,f
1003,4000,jupiter,f
1004,5000,jupiter,f
file 2
id,sal,location,code
1000,2000,jupiter,f
1001,2000,jupiter,z
1002,3000,jupiter,f
1003,4000,jupiter,f
1004,5000,jupiter,f
When I compare file 1 with file 2, I need output like:
1000,sal
1001,code
Basically, it should tell me which field changed from the previous file, along with the id. Can this be done in Pig?
You can solve this problem, but the challenging part is the output format you mentioned. Producing that format requires slightly more complex logic.
I have handled most of the border cases; check the script against your own input to make sure it works for all combinations.
file1:
1000,1000,jupiter,f
1001,2000,jupiter,f
1002,3000,jupiter,f
1003,4000,jupiter,f
1004,5000,jupiter,f
file2:
1000,2000,jupiter,f
1001,2000,jupiter,z
1002,3000,jupiter,f
1003,4000,jupiter,f
1004,5000,jupiter,f
Pig script:
A = LOAD 'file1' USING PigStorage(',') AS (id,sal,location,code);
B = LOAD 'file2' USING PigStorage(',') AS (id,sal,location,code);
-- pair up the rows of the two files that share the same id
C = JOIN A BY id, B BY id;
-- for each field, emit the field name if it differs, or an empty string if it matches
D = FOREACH C GENERATE A::id AS id,
        ((A::sal == B::sal) ? '' : 'sal') AS sal,
        ((A::location == B::location) ? '' : 'location') AS location,
        ((A::code == B::code) ? '' : 'code') AS code;
-- drop rows whose fields are all unchanged between the two files
E = FILTER D BY NOT (sal == '' AND location == '' AND code == '');
-- the two statements below only format the output
F = FOREACH E GENERATE id, REPLACE(BagToString(TOBAG(sal,location,code),','),'(,,$|,$)','') AS finaloutput;
G = FOREACH F GENERATE id, REPLACE(finaloutput,',,',',');
DUMP G;
output:
(1000,sal)
(1001,code)
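The formatting step above has to clean up the empty slots that BagToString leaves between separators. A possible variation, shown here only as a sketch and not as part of the original answer, is to attach the comma to each field name as the name is emitted, so that only one trailing comma ever needs to be stripped. The alias `changed` and the explicit `chararray` types are assumptions made for this illustration; the file names and field names are the ones used above.

```pig
-- Sketch: same comparison as above, but builds the list of changed fields with CONCAT.
-- Assumes every column can be treated as chararray (an assumption, not from the original script).
A = LOAD 'file1' USING PigStorage(',') AS (id:chararray, sal:chararray, location:chararray, code:chararray);
B = LOAD 'file2' USING PigStorage(',') AS (id:chararray, sal:chararray, location:chararray, code:chararray);

-- pair up rows with the same id
C = JOIN A BY id, B BY id;

-- each changed field contributes "<name>,", each unchanged field contributes ''
D = FOREACH C GENERATE A::id AS id,
        CONCAT(
            CONCAT(((A::sal == B::sal) ? '' : 'sal,'),
                   ((A::location == B::location) ? '' : 'location,')),
            ((A::code == B::code) ? '' : 'code,')
        ) AS changed;

-- keep only ids where at least one field differs
E = FILTER D BY changed != '';

-- strip the single trailing comma left by the last changed field
F = FOREACH E GENERATE id, REPLACE(changed, ',$', '') AS changed;

DUMP F;
```

For the sample files this is intended to give the same two rows as the output above. The nested two-argument CONCAT calls are used so the sketch does not depend on a Pig version that accepts more than two arguments. Like the original script, the inner JOIN only reports ids present in both files, so added or removed ids would need a separate check.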
hadoop mapreduce apache-pig