How do I read a plain text file of strings in PySpark?
I have a list of strings saved in a text file with no header, and I want to open it in a Databricks PySpark notebook and print every line.
abcdef
vcdfgrs
vcvdfrs
vfdedsew
kgsldkflfdlfd
text = sc.textFile("path.../filename.txt")
print(text.collect())
This code does not print the lines. Thanks for your help.
Here goes:
# define a function that takes a line and prints it
def f(line):
    print(line)

# build the "text file" contents as a list
my_list = [['my text line-1'], ['line-2 text2 my2'], ['some junk line-3']]

# create an RDD from the list (you already have one via sc.textFile)
txt_file = sc.parallelize(my_list)

# use foreach to call the function; print will work
txt_file.foreach(f)

# if you want each word per line, use flatMap