hadoop - Comparing files field by field in Pig




I have two files like this:

file 1:

id,sal,location,code
1000,1000,jupiter,f
1001,2000,jupiter,f
1002,3000,jupiter,f
1003,4000,jupiter,f
1004,5000,jupiter,f

file 2:

id,sal,location,code
1000,2000,jupiter,f
1001,2000,jupiter,z
1002,3000,jupiter,f
1003,4000,jupiter,f
1004,5000,jupiter,f

When I compare file 1 with file 2, I need output like this:

1000,sal
1001,code

Basically, it should tell me which field changed from the previous file, along with the id. Can this be done in Pig?

Yes, you can solve this in Pig. The challenging part is the output format you mentioned, which requires slightly more complex logic.

I have also handled a few border cases. Please check the script against your own input to make sure it works for all combinations of changed fields.

file1:

1000,1000,jupiter,f
1001,2000,jupiter,f
1002,3000,jupiter,f
1003,4000,jupiter,f
1004,5000,jupiter,f

file2:

1000,2000,jupiter,f
1001,2000,jupiter,z
1002,3000,jupiter,f
1003,4000,jupiter,f
1004,5000,jupiter,f

pigscript:

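a = LOAD 'file1' USING PigStorage(',') AS (id,sal,location,code);
b = LOAD 'file2' USING PigStorage(',') AS (id,sal,location,code);
-- pair up the rows of both files that share the same id
c = JOIN a BY id, b BY id;
-- for each field, emit its name if the value differs, otherwise an empty string
d = FOREACH c GENERATE a::id AS id,
        ((a::sal == b::sal) ? '' : 'sal') AS sal,
        ((a::location == b::location) ? '' : 'location') AS location,
        ((a::code == b::code) ? '' : 'code') AS code;
-- remove the rows whose fields are all identical in the two files
e = FILTER d BY NOT (sal == '' AND location == '' AND code == '');
-- the two lines below are only used for formatting the output:
-- strip the leading and trailing commas left by unchanged fields, then collapse ',,' into ','
f = FOREACH e GENERATE id, REPLACE(BagToString(TOBAG(sal,location,code),','),'^,+|,+$','') AS finaloutput;
g = FOREACH f GENERATE id, REPLACE(finaloutput,',,',',');
DUMP g;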

output:

(1000,sal)
(1001,code)
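If you want to cross-check the result on a small sample without a Pig installation, the same logic (inner join on id, compare each field, list the names of the fields that differ) can be sketched in plain Python. This is only a rough sketch, not part of the Pig solution: the names `read_rows` and `changed_fields` are made up for illustration, the inputs are assumed to have no header line, and ids are assumed to be unique within each file.

```python
import csv
import io

# Column layout assumed from the sample files (no header line in the data).
FIELDS = ["id", "sal", "location", "code"]


def read_rows(text):
    """Parse header-less CSV text into a dict keyed by id (ids assumed unique)."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS)
    return {row["id"]: row for row in reader}


def changed_fields(old_text, new_text):
    """Return a list of (id, [names of changed fields]) for ids present in both inputs."""
    old, new = read_rows(old_text), read_rows(new_text)
    result = []
    for key in old:  # inner join on id, keeping the order of the first file
        if key not in new:
            continue
        diff = [f for f in FIELDS[1:] if old[key][f] != new[key][f]]
        if diff:  # skip rows that are identical in both files
            result.append((key, diff))
    return result


file1 = "\n".join([
    "1000,1000,jupiter,f",
    "1001,2000,jupiter,f",
    "1002,3000,jupiter,f",
    "1003,4000,jupiter,f",
    "1004,5000,jupiter,f",
])
file2 = "\n".join([
    "1000,2000,jupiter,f",
    "1001,2000,jupiter,z",
    "1002,3000,jupiter,f",
    "1003,4000,jupiter,f",
    "1004,5000,jupiter,f",
])

for row_id, diff in changed_fields(file1, file2):
    print(",".join([row_id] + diff))
```

Because the changed field names are collected in a list and joined only at the end, there are no empty placeholders to clean up afterwards, so this version is handy for checking the combinations where two or three fields change at once.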

hadoop mapreduce apache-pig
