Authentication - Locally reading S3 files through Spark (or better: pyspark)
I want to read an S3 file from my (local) machine, through Spark (pyspark, really). Now, I keep getting authentication errors like:
java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties (respectively).
I looked everywhere, here and on the web, and tried many things, but apparently S3 has been changing over the last year or months, and all methods failed but one:
pyspark.SparkContext().textFile("s3n://user:password@bucket/key") (note the s3n [s3 did not work]). Now, I don't want to use a URL with the user and password, because they can appear in logs, and I am also not sure how to get them from the ~/.aws/credentials file anyway.
so, how can read locally s3 through spark (or, better, pyspark) using aws credentials standard ~/.aws/credentials file (ideally, without copying credentials there yet configuration file)?
PS: I tried os.environ["AWS_ACCESS_KEY_ID"] = … and os.environ["AWS_SECRET_ACCESS_KEY"] = …, but it did not work.
PPS: I am not sure where to "set the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties" (Google did not come up with anything). However, I did try many ways of setting these: SparkContext.setSystemProperty(), sc.setLocalProperty(), and conf = SparkConf(); conf.set(…); conf.set(…); sc = SparkContext(conf=conf). Nothing worked.
Yes, I have to use s3n instead of s3. The s3 scheme is some weird abuse of S3 whose benefits are unclear to me.
You can pass the credentials to the sc.hadoopFile or sc.newAPIHadoopFile calls:

    rdd = sc.hadoopFile('s3n://my_bucket/my_file',
                        conf={
                            'fs.s3n.awsAccessKeyId': '...',
                            'fs.s3n.awsSecretAccessKey': '...',
                        })
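If you want to pull the keys from the standard ~/.aws/credentials file instead of hard-coding them, note that it is a plain INI file, so Python's configparser can read it directly. A minimal sketch (the helper name and profile handling are my own, not part of any Spark or AWS API):

```python
import configparser
import os

def s3n_conf_from_credentials(path="~/.aws/credentials", profile="default"):
    # ~/.aws/credentials is INI-formatted, e.g.:
    #   [default]
    #   aws_access_key_id = ...
    #   aws_secret_access_key = ...
    parser = configparser.ConfigParser()
    parser.read(os.path.expanduser(path))
    section = parser[profile]
    # Map the standard AWS key names onto the Hadoop s3n properties.
    return {
        "fs.s3n.awsAccessKeyId": section["aws_access_key_id"],
        "fs.s3n.awsSecretAccessKey": section["aws_secret_access_key"],
    }
```

The returned dict can then be passed as the conf argument shown above, e.g. conf=s3n_conf_from_credentials(), so the keys never appear in a URL or in your code.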