How to pass a list of values, json pyspark
>>> from pyspark.sql import SQLContext
>>> sqlContext = SQLContext(sc)
>>> rdd = sqlContext.jsonFile("tmp.json")
>>> rdd_new = rdd.map(lambda x: (x.name, x.age))
It works properly. But I have a list of values:
list1=["name","age","gene","xyz",.....]
When I pass each value in a loop:
for each_value in list1:
`rdd_new=rdd.map(lambda x:x.each_value)` I am getting an error.
I think what you need is to pass the names of the fields you want to `select`. In that case, see the following:
r1 = ssc.jsonFile("test.json")
r1.printSchema()
r1.show()
l1 = ['number','string']
s1 = r1.select(*l1)
s1.printSchema()
s1.show()
root
|-- array: array (nullable = true)
| |-- element: long (containsNull = true)
|-- boolean: boolean (nullable = true)
|-- null: string (nullable = true)
|-- number: long (nullable = true)
|-- object: struct (nullable = true)
| |-- a: string (nullable = true)
| |-- c: string (nullable = true)
| |-- e: string (nullable = true)
|-- string: string (nullable = true)
array                boolean null number object  string
ArrayBuffer(1, 2, 3) true    null 123    [b,d,f] Hello World
root
|-- number: long (nullable = true)
|-- string: string (nullable = true)
number string
123    Hello World
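This is done through the DataFrame. Note the way the argument list is passed: the `*` in `select(*l1)` unpacks the list into separate column names. For more, see this link.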
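To make the two ideas concrete without needing a Spark session, here is a minimal plain-Python sketch. `Record`, `pick_fields`, and the toy `select` function are hypothetical stand-ins, not part of pyspark. It shows why `x.each_value` fails (it looks for an attribute literally named `each_value`) and how `*` unpacks a list into separate arguments. The same `getattr` pattern should carry over to rows inside `rdd.map`, but that is an assumption I have not tested here.

```python
from collections import namedtuple

# Hypothetical stand-in for one record of the JSON data; not a real Spark Row.
Record = namedtuple("Record", ["name", "age", "gene"])


def pick_fields(record, field_names):
    """Return the values of the named fields, in order, as a tuple."""
    # getattr looks the attribute up by the string stored in `field`,
    # whereas record.field would look for an attribute literally called "field".
    return tuple(getattr(record, field) for field in field_names)


def select(*columns):
    """Hypothetical variadic function mimicking how select(*l1) receives names."""
    return list(columns)


records = [Record("Ann", 31, "g1"), Record("Bob", 45, "g2")]
list1 = ["name", "age"]

# One tuple per record, holding only the requested fields.
picked = [pick_fields(r, list1) for r in records]
print(picked)

# The star unpacks the list, so both calls receive the same arguments.
print(select(*list1) == select("name", "age"))
```

If you stay with DataFrames, `select(*l1)` as in the answer above is the simpler route, because you pass the whole list once instead of looping over it.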