Convert DataFrame schema in PySpark
I have a DataFrame:
+------------------+-------------------+--------------------+
|              name|                sku|         description|
+------------------+-------------------+--------------------+
|    Mary Rodriguez| hand-couple-manage|Senior word socia...|
|    Jose Henderson| together-table-oil|Apply girl treatm...|
|    Karen Villegas|     child-somebody|Every tell serve....|
|      Olivia Lynch|forget-matter-avoid|Perhaps environme...|
|     Whitney Wiley|    side-blue-dream|Quickly short soc...|
|  Brittany Johnson|        east-pretty|Indicate view sim...|
|       Paul Morris|    radio-window-us|Society month sho...|
|   Jason Patterson|   night-art-be-act|Entire around pla...|
|      Kiara Gentry|   compare-politics|Air my kind staff...|
+------------------+-------------------+--------------------+
and the following schema:
root
 |-- sku: string (nullable = true)
 |-- name_description: array (nullable = true)
 |    |-- element: string (containsNull = true)
How can I group by the column sku and use the values from name and description to build the column name_description, so that in PySpark each sku gets an array in the format [{"name":..., "description":...}, {"name":..., "description":...}, ...]?
Check the code below.
df.show(truncate=False)
+---------------+-------------------+-------------------+
|name           |sku                |description        |
+---------------+-------------------+-------------------+
|MaryRodriguez  |hand-couple-manage |Seniorwordsocia... |
|JoseHenderson  |together-table-oil |Applygirltreatm... |
|KarenVillegas  |child-somebody     |Everytellserve.... |
|OliviaLynch    |forget-matter-avoid|Perhapsenvironme...|
|WhitneyWiley   |side-blue-dream    |Quicklyshortsoc... |
|BrittanyJohnson|east-pretty        |Indicateviewsim... |
|PaulMorris     |radio-window-us    |Societymonthsho... |
|JasonPatterson |night-art-be-act   |Entirearoundpla... |
|KiaraGentry    |compare-politics   |Airmykindstaff...  |
+---------------+-------------------+-------------------+
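from pyspark.sql import functions as F

# Collect one (name, description) struct per row into an array for each sku
result = df.groupBy("sku").agg(
    F.collect_list(F.struct("name", "description")).alias("name_description")
)
# Render each grouped row as a single JSON string, as in the output below
result.select(F.to_json(F.struct("sku", "name_description")).alias("value")).show(truncate=False)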
+-------------------------------------------------------------------------------------------------------------+
|value                                                                                                        |
+-------------------------------------------------------------------------------------------------------------+
|{"sku":"hand-couple-manage","name_description":[{"name":"MaryRodriguez","description":"Seniorwordsocia..."}]}|
|{"sku":"night-art-be-act","name_description":[{"name":"JasonPatterson","description":"Entirearoundpla..."}]} |
|{"sku":"forget-matter-avoid","name_description":[{"name":"OliviaLynch","description":"Perhapsenvironme..."}]}|
|{"sku":"compare-politics","name_description":[{"name":"KiaraGentry","description":"Airmykindstaff..."}]} |
|{"sku":"child-somebody","name_description":[{"name":"KarenVillegas","description":"Everytellserve...."}]} |
|{"sku":"side-blue-dream","name_description":[{"name":"WhitneyWiley","description":"Quicklyshortsoc..."}]} |
|{"sku":"radio-window-us","name_description":[{"name":"PaulMorris","description":"Societymonthsho..."}]} |
|{"sku":"east-pretty","name_description":[{"name":"BrittanyJohnson","description":"Indicateviewsim..."}]} |
|{"sku":"together-table-oil","name_description":[{"name":"JoseHenderson","description":"Applygirltreatm..."}]}|
+-------------------------------------------------------------------------------------------------------------+