Error connecting from pyspark to mongodb with password
I am running the following pyspark code that connects to MongoDB:
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

import config  # local module that holds MONGO_URL_AUTH

sparkConf = SparkConf().setMaster("local").setAppName("MongoSparkConnectorTour").set("spark.app.id", "MongoSparkConnectorTour")

# If executed via pyspark, sc is already instantiated
sc = SparkContext(conf=sparkConf)
sqlContext = SQLContext(sc)

# create and load dataframe from MongoDB URI
df = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource")\
    .option("spark.mongodb.input.uri", config.MONGO_URL_AUTH + "/spark.times")\
    .load()
In the Docker image:
CMD [ "spark-submit" \
, "--conf", "spark.mongodb.input.uri=mongodb://root:example@mongodb:27017/spark.times" \
, "--conf", "spark.mongodb.output.uri=mongodb://root:example@mongodb:27017/spark.output" \
, "--packages", "org.mongodb.spark:mongo-spark-connector_2.11:2.4.1" \
, "./spark.py" ]
config.MONGO_URL_AUTH is mongodb://root:example@mongodb:27017
but I get the following exception when running it:
db_1 | 2019-10-09T13:44:34.354+0000 I ACCESS [conn4] Supported SASL mechanisms requested for unknown user 'root@spark'
db_1 | 2019-10-09T13:44:34.378+0000 I ACCESS [conn4] SASL SCRAM-SHA-1 authentication failed for root on spark from client 172.22.0.4:49302 ; UserNotFound: Could not find user "root" for db "spark"
pyspark_1 | Traceback (most recent call last):
pyspark_1 | File "/home/ubuntu/./spark.py", line 35, in <module>
pyspark_1 | .option("spark.mongodb.input.uri", config.MONGO_URL_AUTH + "/spark.times")\
pyspark_1 | File "/home/ubuntu/spark-2.4.4-bin-hadoop2.7/python/pyspark/sql/readwriter.py", line 172, in load
pyspark_1 | return self._df(self._jreader.load())
pyspark_1 | File "/home/ubuntu/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
pyspark_1 | File "/home/ubuntu/spark-2.4.4-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
pyspark_1 | return f(*a, **kw)
pyspark_1 | File "/home/ubuntu/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
pyspark_1 | py4j.protocol.Py4JJavaError: An error occurred while calling o34.load.
pyspark_1 | : com.mongodb.MongoSecurityException: Exception authenticating MongoCredential{mechanism=SCRAM-SHA-1, userName='root', source='spark', password=<hidden>, mechanismProperties={}}
pyspark_1 | at com.mongodb.internal.connection.SaslAuthenticator.wrapException(SaslAuthenticator.java:173)
Everything works if I don't set a username and password on the mongodb container and just connect to the plain mongodb://mongodb:27017 address, and with pymongo I can connect even with the password (a sketch of that connection is below). So when a password is used, something is wrong in my Spark-to-MongoDB configuration, and I don't understand what it is.
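For reference, the working pymongo access looks roughly like this (a minimal sketch; the exact calls in my code may differ, and count_documents is just an illustrative read):

from pymongo import MongoClient
import config  # config.MONGO_URL_AUTH = "mongodb://root:example@mongodb:27017"

# Same credentials and host as the Spark config, no database in the URI
client = MongoClient(config.MONGO_URL_AUTH)
print(client.spark.times.count_documents({}))  # succeeds, unlike the Spark read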
MongoDB setup (part of the docker-compose file):
db:
  image: mongo
  restart: always
  networks:
    miasnet:
      aliases:
        - "miasdb"
  environment:
    MONGO_INITDB_ROOT_USERNAME: root
    MONGO_INITDB_ROOT_PASSWORD: example
    MONGO_INITDB_DATABASE: spark
  ports:
    - "27017:27017"
  volumes:
    - /data/db:/data/db
https://hub.docker.com/_/mongo says:
MONGO_INITDB_ROOT_USERNAME, MONGO_INITDB_ROOT_PASSWORD
These variables, used in conjunction, create a new user and set that user's password. This user is created in the admin authentication database and given the role of root, which is a "superuser" role.
You did not specify the authentication database, so in your case mongo uses the current database from the URI by default - spark.
You need to specify the "admin" authentication database in the connection string:
spark.mongodb.input.uri=mongodb://root:example@mongodb:27017/spark.times?authSource=admin
spark.mongodb.output.uri=mongodb://root:example@mongodb:27017/spark.output?authSource=admin
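With authSource=admin in place, the read from the question would look roughly like this (a sketch only; equivalently, the parameter can simply be appended to the URIs in the spark-submit --conf options shown above):

# Authenticate the root user against the admin database instead of "spark"
df = sqlContext.read.format("com.mongodb.spark.sql.DefaultSource")\
    .option("spark.mongodb.input.uri",
            config.MONGO_URL_AUTH + "/spark.times?authSource=admin")\
    .load()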