Not able to connect to MySQL DB through Python Spark
I want to save my processed RDD to a MySQL table. I am using a Spark DataFrame for this, but I get the following error:
py4j.protocol.Py4JJavaError: An error occurred while calling o216.jdbc.
: java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost:3306/student?user=root&password=root.
I added the MySQL connector jar to spark-shell:
spark-shell --driver-class-path /path-to-mysql-jar/mysql-connector-java-5.1.38-bin.jar
from pyspark import SparkContext
from pyspark.sql import SQLContext

# sc is the SparkContext that the pyspark shell provides; in a standalone
# script, create it first with sc = SparkContext().
sqlContext = SQLContext(sc)

file1 = sc.textFile("/home/hadoop/text1")
file2 = sc.textFile("/home/hadoop/text2")
file3 = file1.union(file2).coalesce(1).map(lambda line: line.split(','))

# Sum the third field for each (first, second) pair, then sort by key.
result = (file3.map(lambda x: ((x[0], x[1]), float(x[2])))
               .reduceByKey(lambda a, b: a + b)
               .sortByKey(True)
               .coalesce(1))

# createDataFrame needs one tuple per row, not a comma-joined string.
rows = result.map(lambda kv: (kv[0][0], kv[0][1], kv[1]))
schema_site = sqlContext.createDataFrame(rows, ["std_id", "std_code", "std_res"])
schema_site.registerTempTable("table1")

mysql_url = "jdbc:mysql://localhost:3306/test?user=root&password=root&driver=com.mysql.jdbc.Driver"
schema_site.write.jdbc(url=mysql_url, table="table1", mode="append")
I am using spark-1.5.0-bin-hadoop2.4 and have also set up the Hive Metastore.
So how can I load my RDD result into a MySQL table?
The input files are as follows.
file1 contents are:
1234567 65656545 12
1234567 65675859 11
file2 contents are:
1234567 65656545 12
1234567 65675859 11
and the resultant RDD looks like:
1234567 65656545 24
1234567 65675859 22
I created the table in MySQL with three columns:
std_id std_code std_res
I want the table output to be:
std_id std_code std_res
1234567 65656545 24
1234567 65675859 22
When passing a JDBC driver or other Java dependencies to your Spark program, you should use the --jars argument:
--jars Comma-separated list of local jars to include on the driver and executor classpaths.
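As a rough illustration of the comma-separated form, the sketch below assembles a spark-submit command line in Python. The helper name `build_spark_submit` and the second jar path are hypothetical, chosen only for the example; `--jars` is the option quoted above.

```python
def build_spark_submit(script, jars, spark_submit="./bin/spark-submit"):
    """Return the argv list for spark-submit with the given jars attached.

    build_spark_submit is a hypothetical helper for illustration only.
    """
    if not jars:
        raise ValueError("at least one jar path is required")
    # --jars takes a single comma-separated value, not one flag per jar.
    return [spark_submit, "--jars", ",".join(jars), script]


if __name__ == "__main__":
    cmd = build_spark_submit(
        "sample.py",
        # The second path is a made-up placeholder for an extra dependency.
        ["lib/mysql-connector-java-5.1.38-bin.jar", "lib/other-dependency.jar"],
    )
    # Print the command; on a machine with Spark installed, the same list
    # could be passed to subprocess.check_call to launch the job.
    print(" ".join(cmd))
```

Building the argument as a list keeps each jar path intact even if it contains spaces, and joining with commas matches the format the help text describes.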
I solved it by adding --jars /path/to/mysql/connector to spark-submit:
./bin/spark-submit --jars lib/mysql-connector-java-5.1.38-bin.jar sample.py
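Once the connector jar is on the classpath, the write still needs rows shaped like the three-column target table. The following is a minimal sketch, not the original poster's code: `to_row` and `write_to_mysql` are hypothetical helper names, and passing the driver class through the `driver` connection property is an assumption that may or may not be needed depending on the Spark version. It assumes `df` is a DataFrame whose columns match std_id, std_code, std_res.

```python
def to_row(pair):
    """Turn ((std_id, std_code), total) into a flat three-column tuple."""
    (std_id, std_code), total = pair
    return (std_id, std_code, total)


def write_to_mysql(df, table, host="localhost", port=3306, database="test",
                   user="root", password="root"):
    """Append df to a MySQL table through DataFrameWriter.jdbc.

    Hypothetical helper; the defaults mirror the values in the question.
    """
    url = "jdbc:mysql://{0}:{1}/{2}".format(host, port, database)
    properties = {
        "user": user,
        "password": password,
        # Assumption: name the driver class explicitly. com.mysql.jdbc.Driver
        # is the class name that appears in the question's JDBC URL.
        "driver": "com.mysql.jdbc.Driver",
    }
    df.write.jdbc(url=url, table=table, mode="append", properties=properties)
    return url


# Usage inside a job submitted with --jars, assuming `pairs` is an RDD of
# ((std_id, std_code), total) pairs and `sqlContext` is a SQLContext:
# rows = pairs.map(to_row)
# df = sqlContext.createDataFrame(rows, ["std_id", "std_code", "std_res"])
# write_to_mysql(df, "table1")
```

Keeping the credentials in the properties dictionary rather than in the URL query string makes the URL easier to read and keeps the password out of error messages that echo the URL.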