Filter an array of struct fields in a case class
I have a dataset with the following data structure:

    case class AddressData(
      addressId: String,
      customerId: String,
      address: String,
      number: Option[Int],
      road: Option[String],
      city: Option[String],
      country: Option[String]
    )

    case class CustomerDocument(
      customerId: String,
      forename: String,
      surname: String,
      address: Seq[AddressData]
    )

Schema:

    root
     |-- customerId: string (nullable = true)
     |-- forename: string (nullable = true)
     |-- surname: string (nullable = true)
     |-- accounts: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- customerId: string (nullable = true)
     |    |    |-- accountId: string (nullable = true)
     |    |    |-- balance: long (nullable = true)
     |-- address: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- addressId: string (nullable = true)
     |    |    |-- customerId: string (nullable = true)
     |    |    |-- address: string (nullable = true)
     |    |    |-- number: integer (nullable = true)
     |    |    |-- road: string (nullable = true)
     |    |    |-- city: string (nullable = true)
     |    |    |-- country: string (nullable = true)

Sample data:

customerId | forename | surname | address |
---|---|---|---|
IND0222 | Charles | Piper | [[ADR285,IND0222,424, Lexington Avenue, New York, United States of America]] |

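I need to check whether a given country (for example, Canada) appears in the address list and create a new column whose value is 'True' if that country is present or 'False' if it is not.

I'm not sure how to apply a filter condition inside an array of structs to achieve this. Any guidance would be appreciated. Thanks.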
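Since each row maps to a `CustomerDocument`, the condition "some address has country X" is simply `exists` over the `address` sequence. Below is a minimal plain-Scala sketch (no Spark; written as a script for the REPL or a `.sc` file) of that predicate. `hasCountry` is a hypothetical helper name, and the `address` string in the sample record is a placeholder, because the sample row lists only six values for the seven `AddressData` fields.

```scala
// The question's case classes, repeated so the sketch compiles on its own.
case class AddressData(
  addressId: String,
  customerId: String,
  address: String,
  number: Option[Int],
  road: Option[String],
  city: Option[String],
  country: Option[String]
)

case class CustomerDocument(
  customerId: String,
  forename: String,
  surname: String,
  address: Seq[AddressData]
)

// Hypothetical helper: true if any of the customer's addresses has the given country.
// Option.contains compares the wrapped value, so a missing (None) country never matches.
def hasCountry(doc: CustomerDocument, country: String): Boolean =
  doc.address.exists(_.country.contains(country))

// The question's sample row; the `address` string is a placeholder (not shown in the sample).
val charles = CustomerDocument("IND0222", "Charles", "Piper", Seq(
  AddressData("ADR285", "IND0222", "(not shown in sample)", Some(424),
    Some("Lexington Avenue"), Some("New York"), Some("United States of America"))))

println(hasCountry(charles, "Canada"))                   // false
println(hasCountry(charles, "United States of America")) // true
```

On a `Dataset[CustomerDocument]` the same function could be used through the typed API, e.g. `ds.filter(doc => hasCountry(doc, "Canada"))`. For just a flag column, a column expression is usually preferable, since Spark's optimizer cannot look inside a Scala lambda.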
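The code below helped me: it extracts the country field from the array of structs and checks it for the value.

    import org.apache.spark.sql.functions.array_contains // $"..." also needs import spark.implicits._
    val countryFlag = df.withColumn("isPresent", array_contains($"address.country", "Canada"))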
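The `array_contains` approach works because `$"address.country"` projects the `country` field out of every struct, giving an `array<string>`. Below is a self-contained sketch, assuming Spark 3.0 or later (needed for `functions.exists`) and a local `SparkSession`. The two extra customers and the `isPresentCi` column are hypothetical, added only for illustration. The sketch wraps the result in `coalesce(..., lit(false))` because `array_contains` returns NULL instead of false when the array is null, or when it contains a null country and no match. `exists` is shown as an alternative when the condition on a struct field is more than an exact match (here, case-insensitive).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array_contains, coalesce, col, exists, lit, lower}

// The question's case classes, repeated so the sketch is self-contained.
case class AddressData(
  addressId: String,
  customerId: String,
  address: String,
  number: Option[Int],
  road: Option[String],
  city: Option[String],
  country: Option[String]
)

case class CustomerDocument(
  customerId: String,
  forename: String,
  surname: String,
  address: Seq[AddressData]
)

// Local session for experimenting; in a real job the session already exists.
val spark = SparkSession.builder().master("local[*]").appName("country-flag").getOrCreate()
import spark.implicits._

val ds = Seq(
  // The question's sample row (the `address` string is a placeholder: not shown in the sample).
  CustomerDocument("IND0222", "Charles", "Piper", Seq(
    AddressData("ADR285", "IND0222", "(not shown in sample)", Some(424),
      Some("Lexington Avenue"), Some("New York"), Some("United States of America")))),
  // Hypothetical customer with a Canadian address.
  CustomerDocument("IND9999", "Test", "Customer", Seq(
    AddressData("ADR999", "IND9999", "(hypothetical)", None, None, Some("Toronto"), Some("Canada")))),
  // Hypothetical customer whose only address has no country: array_contains alone gives NULL.
  CustomerDocument("IND0000", "No", "Country", Seq(
    AddressData("ADR000", "IND0000", "(hypothetical)", None, None, None, None)))
).toDS()

val flagged = ds
  // Exact match on the array<string> of countries; coalesce turns a NULL result into false.
  .withColumn("isPresent",
    coalesce(array_contains($"address.country", "Canada"), lit(false)))
  // Spark 3.0+: exists evaluates an arbitrary predicate on each struct element.
  .withColumn("isPresentCi",
    coalesce(exists(col("address"), a => lower(a.getField("country")) === "canada"), lit(false)))

flagged.select("customerId", "isPresent", "isPresentCi").show()
```

If the new column must literally hold the strings 'True'/'False' rather than booleans, the flag could be mapped with `when($"isPresent", "True").otherwise("False")`.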