java - Reading delimited data from text file into different RDDs
I am new to Apache Spark with Java. I have a text file, delimited by spaces, as below:
3,45.25,23.45 5,22.15,19.35 4,33.24,12.45 2,15.67,21.22
Here the columns mean:
1st column: index value
2nd column: latitude values
3rd column: longitude values

I am trying to parse this data into 2 or 3 RDDs (or pair RDDs). My code so far:
    JavaRDD<String> data = sc.textFile("hdfs://data.txt");
    JavaRDD<Double> data1 = data.flatMap(
        new FlatMapFunction<String, Double>() {
            public Iterable<Double> call(Double data) {
                return Arrays.asList(data.split(","));
            }
        });
Something like this (using Java 8 to improve readability)?
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import scala.Tuple2;
    import scala.Tuple3;

    JavaRDD<String> data = sc.textFile("hdfs://data.txt");

    // Split each line on commas, then convert the three fields to (index, latitude, longitude).
    JavaRDD<Tuple3<Integer, Float, Float>> parsedData = data
        .map((line) -> line.split(","))
        .map((line) -> new Tuple3<>(
            Integer.parseInt(line[0]),
            Float.parseFloat(line[1]),
            Float.parseFloat(line[2])))
        .cache(); // cache the parsed data to avoid recomputation in the subsequent mapToPair calls

    // index -> latitude
    JavaPairRDD<Integer, Float> latitudeByIndex =
        parsedData.mapToPair((line) -> new Tuple2<>(line._1(), line._2()));

    // index -> longitude
    JavaPairRDD<Integer, Float> longitudeByIndex =
        parsedData.mapToPair((line) -> new Tuple2<>(line._1(), line._3()));

    // index -> (latitude, longitude)
    JavaPairRDD<Integer, Tuple2<Float, Float>> pointByIndex =
        parsedData.mapToPair((line) -> new Tuple2<>(line._1(), new Tuple2<>(line._2(), line._3())));
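
The code above assumes one record per line. If several records share a line, separated by spaces as in the sample, each line would first have to be split on whitespace before the comma split. Here is a minimal plain-Java sketch of that step, with no Spark dependency so it can be tried on its own. PointParser, Point, parseRecord and parseLine are hypothetical names made up for this example, not part of the Spark API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Spark API.
final class PointParser {

    // One parsed record: index, latitude, longitude.
    static final class Point {
        final int index;
        final float latitude;
        final float longitude;

        Point(int index, float latitude, float longitude) {
            this.index = index;
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    // Parses a single "index,latitude,longitude" record.
    static Point parseRecord(String record) {
        String[] fields = record.trim().split(",");
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 comma-separated fields: " + record);
        }
        return new Point(
            Integer.parseInt(fields[0].trim()),
            Float.parseFloat(fields[1].trim()),
            Float.parseFloat(fields[2].trim()));
    }

    // Splits a line on runs of whitespace and parses every non-empty record in it.
    static List<Point> parseLine(String line) {
        List<Point> points = new ArrayList<>();
        for (String record : line.trim().split("\\s+")) {
            if (!record.isEmpty()) {
                points.add(parseRecord(record));
            }
        }
        return points;
    }
}
```

In Spark this would probably be plugged in as data.flatMap(line -> PointParser.parseLine(line)). With Spark 2.x and later the flatMap function has to return an Iterator, so .iterator() would be appended. Point would likely also need to implement java.io.Serializable before it could be cached or shuffled.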
java apache-spark