How to print PythonTransformedDStream
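I am trying to run the word count example that integrates an AWS Kinesis stream with Apache Spark. Random lines are put into Kinesis at regular intervals.
lines = KinesisUtils.createStream(...)
When I submit my application, lines.pprint() does not print any values.
When I try to print the lines object itself, I see <pyspark.streaming.dstream.TransformedDStream object at 0x7fa235724950>.
How can I print the PythonTransformedDStream object and check whether any data is being received?
I am sure this is not a credentials problem: if I use fake credentials, I get an access exception.
Reference code added: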
import sys

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream

if __name__ == "__main__":
    sc = SparkContext(appName="SparkKinesisApp")
    ssc = StreamingContext(sc, 1)  # 1-second batch interval

    # Arguments: app name, stream name, endpoint URL, region,
    # initial position, checkpoint interval in seconds.
    lines = KinesisUtils.createStream(
        ssc, "SparkKinesisApp", "myStream",
        "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
        InitialPositionInStream.LATEST, 2)

    # lines.saveAsTextFiles('/home/ubuntu/logs/out.txt')
    lines.pprint()

    # Parentheses let the chained calls span several lines.
    counts = (lines.flatMap(lambda line: line.split(" "))
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()
Since lines.pprint() does not print anything, please confirm that you are calling:
ssc.start()
ssc.awaitTermination()
as in the example here: https://github.com/apache/spark/blob/v2.1.0/examples/src/main/python/streaming/network_wordcount.py
pprint() should work when the environment is configured correctly:
http://spark.apache.org/docs/2.1.0/streaming-programming-guide.html#output-operations-on-dstreams
Output Operations on DStreams
print() - Prints the first ten elements of every batch of data in a DStream on the driver node running the streaming application. This is useful for development and debugging. Python API: This is called pprint() in the Python API.
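If pprint() output is hard to interpret, one option for checking whether a stream is receiving data is to register a callback with foreachRDD and report each batch explicitly, including empty ones. The following is only a sketch: describe_batch, log_batch and LIMIT are hypothetical names introduced here for illustration, not part of the Spark API. It relies on foreachRDD accepting a two-argument function (batch time, RDD) and on RDD.take.

```python
LIMIT = 10  # assumed cap on records shown per batch, mirroring pprint()'s default


def describe_batch(batch_time, records, limit=LIMIT):
    """Return a printable report for one batch (hypothetical helper)."""
    header = "Batch at %s" % batch_time
    if not records:
        # An explicit message makes "stream is running but empty" visible.
        return header + ": empty, no records received"
    shown = [repr(record) for record in records[:limit]]
    if len(records) > limit:
        shown.append("...")  # more records exist than are shown
    return "\n".join([header + ": showing up to %d records" % limit] + shown)


def log_batch(batch_time, rdd):
    """Callback for DStream.foreachRDD; it runs on the driver for each batch."""
    # take() brings at most LIMIT + 1 elements back to the driver.
    print(describe_batch(batch_time, rdd.take(LIMIT + 1)))


# Usage inside the streaming application, before ssc.start():
#     lines.foreachRDD(log_batch)
```

If not even the per-batch header line appears after ssc.start(), the problem is more likely in how the application was submitted than in the stream contents.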
It finally worked.
The command I had been using to submit the application, based on the example code at https://github.com/apache/spark/blob/master/external/kinesis-asl/src/main/python/examples/streaming/kinesis_wordcount_asl.py, was wrong.
The correct command that I used is:
$ bin/spark-submit --jars external/spark-streaming-kinesis-asl_2.11-2.1.0.jar --packages org.apache.spark:spark-streaming-kinesis-asl_2.11:2.1.0 /home/ubuntu/my_pyspark/spark_kinesis.py
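The value passed to --packages is a Maven coordinate of the form group:artifact_scalaVersion:sparkVersion, so the _2.11 suffix and the 2.1.0 version in the command above should match the Scala and Spark versions of the installation being used. A small sketch of how that string is assembled, assuming a hypothetical helper name kinesis_asl_package:

```python
def kinesis_asl_package(scala_version, spark_version):
    """Build the --packages coordinate for the Kinesis ASL connector.

    Hypothetical helper for illustration; both versions must match the
    Spark build that bin/spark-submit belongs to.
    """
    return "org.apache.spark:spark-streaming-kinesis-asl_%s:%s" % (
        scala_version, spark_version)


if __name__ == "__main__":
    # Reproduces the coordinate used in the command above.
    print(kinesis_asl_package("2.11", "2.1.0"))
```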