Locally reading S3 files through Spark (or better: pyspark)
I want to read an S3 file from my (local) machine, through Spark (pyspark, really). Now, I keep getting authentication errors like
java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties (respectively).
I looked everywhere here and on the web, and tried many things, but apparently S3 has been changing over the last year or months, and all methods failed but this one:
pyspark.SparkContext().textFile("s3n://user:password@bucket/key")
(note the s3n [s3 did not work]). Now, I don't want to use a URL with the user and password, because they can appear in logs, and I am also not sure how to get them from the ~/.aws/credentials file anyway.
So, how can I read locally from S3 through Spark (or, better, pyspark) using the AWS credentials from the standard ~/.aws/credentials file (ideally, without copying the credentials there to yet another configuration file)?
PS: I tried os.environ["AWS_ACCESS_KEY_ID"] = … and os.environ["AWS_SECRET_ACCESS_KEY"] = …, but it did not work.
PPS: I am not sure where to "set the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties" (Google did not come up with anything). However, I did try many ways of setting these: SparkContext.setSystemProperty(), sc.setLocalProperty(), and conf = SparkConf(); conf.set(…); conf.set(…); sc = SparkContext(conf=conf). Nothing worked (a sketch of the SparkConf variant is below).
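For concreteness, the SparkConf attempt had roughly this shape (a sketch, with the property names taken from the error message and placeholder values):

from pyspark import SparkConf, SparkContext

# Set the s3n credential properties named in the error message
# before creating the SparkContext.
conf = SparkConf()
conf.set('fs.s3n.awsAccessKeyId', '...')      # placeholder, not a real key
conf.set('fs.s3n.awsSecretAccessKey', '...')  # placeholder, not a real secret
sc = SparkContext(conf=conf)

# Reading still fails with the same IllegalArgumentException
# (count() forces evaluation, since textFile alone is lazy).
rdd = sc.textFile('s3n://bucket/key')
rdd.count()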
Yes, you have to use s3n instead of s3. s3 is some weird abuse of S3, the benefits of which are unclear to me.
You can pass the credentials to the sc.hadoopFile or sc.newAPIHadoopFile calls:
rdd = sc.hadoopFile('s3n://my_bucket/my_file',
                    conf={'fs.s3n.awsAccessKeyId': '...',
                          'fs.s3n.awsSecretAccessKey': '...'})
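If you would rather pull the keys from the standard ~/.aws/credentials file than hard-code them, a minimal sketch could look like the following (it assumes a [default] profile in that file, and fills in the text input format and key/value classes that hadoopFile also expects):

import os
import configparser

from pyspark import SparkContext

# Read the keys from the standard ~/.aws/credentials file ([default] profile).
aws = configparser.ConfigParser()
aws.read(os.path.expanduser('~/.aws/credentials'))
access_key = aws['default']['aws_access_key_id']
secret_key = aws['default']['aws_secret_access_key']

sc = SparkContext()

# Plain text files use the standard Hadoop text input format and classes.
rdd = sc.hadoopFile(
    's3n://my_bucket/my_file',
    'org.apache.hadoop.mapred.TextInputFormat',
    'org.apache.hadoop.io.LongWritable',
    'org.apache.hadoop.io.Text',
    conf={
        'fs.s3n.awsAccessKeyId': access_key,
        'fs.s3n.awsSecretAccessKey': secret_key,
    },
)
lines = rdd.map(lambda kv: kv[1])  # keep the text, drop the byte-offset keys

Another commonly used variant is to set the same two properties on sc._jsc.hadoopConfiguration() (a non-public attribute, so treat it as a workaround) and then read with a plain sc.textFile('s3n://my_bucket/my_file').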
authentication amazon-s3 apache-spark credentials pyspark