Column values change after changing the date format in Scala Spark
This is my DataFrame before any date formatting is applied:
+---------------------+---------------+-------------------------+----------------+------------+-----+-----------+-------------------------+---------------------------+-------------------------+-----------------------------+----------------------------+---------------------+-------------------------+----------------------+--------------------------+--------------+------------------------+-------------+---------------+-------------------------+
|Source_organizationId|Source_sourceId|FilingDateTime_1 |SourceTypeCode_1|DocumentId_1|Dcn_1|DocFormat_1|StatementDate_1 |IsFilingDateTimeEstimated_1|ContainsPreliminaryData_1|CapitalChangeAdjustmentDate_1|CumulativeAdjustmentFactor_1|ContainsRestatement_1|FilingDateTimeUTCOffset_1|ThirdPartySourceCode_1|ThirdPartySourcePriority_1|SourceTypeId_1|ThirdPartySourceCodeId_1|FFAction|!|_1|DataPartition_1|TimeStamp |
+---------------------+---------------+-------------------------+----------------+------------+-----+-----------+-------------------------+---------------------------+-------------------------+-----------------------------+----------------------------+---------------------+-------------------------+----------------------+--------------------------+--------------+------------------------+-------------+---------------+-------------------------+
|4295876589 |1 |1977-02-14T03:00:00+00:00|YUH |null |null |null |1976-12-31T00:00:00+00:00|true |false |1976-12-31T00:00:00+00:00 |0.82457 |false |540 |SS |1 |3013057 |1000716240 |I|!| |Japan |2018-05-03T07:03:27+00:00|
|4295876589 |8 |1984-02-14T03:00:00+00:00|YUH |null |null |null |1983-12-31T00:00:00+00:00|true |false |1983-12-31T00:00:00+00:00 |0.82457 |false |540 |SS |1 |3013057 |1000716240 |I|!| |Japan |2018-05-03T09:46:58+00:00|
|4295876589 |1 |1977-02-14T03:00:00+00:00|YUH |null |null |null |1976-12-31T00:00:00+00:00|true |false |1976-12-31T00:00:00+00:00 |0.82457 |false |540 |SS |1 |3013057 |1000716240 |I|!| |Japan |2018-05-03T07:30:16+00:00|
+---------------------+---------------+-------------------------+----------------+------------+-----+-----------+-------------------------+---------------------------+-------------------------+-----------------------------+----------------------------+---------------------+-------------------------+----------------------+--------------------------+--------------+------------------------+-------------+---------------+-------------------------+
This is how I change the date format:
val df2resultTimestamp = finalXmlDf
  .withColumn("FilingDateTime_1", date_format(col("FilingDateTime_1"), "yyyy-MM-dd'T'HH:mm:ss'Z'"))
  .withColumn("StatementDate_1", date_format(col("StatementDate_1"), "yyyy-MM-dd'T'HH:mm:ss'Z'"))
  .withColumn("CapitalChangeAdjustmentDate_1", date_format(col("CapitalChangeAdjustmentDate_1"), "yyyy-MM-dd'T'HH:mm:ss'Z'"))
  .withColumn("CumulativeAdjustmentFactor_1", regexp_replace(format_number($"CumulativeAdjustmentFactor_1".cast(DoubleType), 5), ",", ""))
This is the output I get. The values in the FilingDateTime_1 column (and the other date columns) have changed:
+---------------------+---------------+--------------------+----------------+------------+-----+-----------+--------------------+---------------------------+-------------------------+-----------------------------+----------------------------+---------------------+-------------------------+----------------------+--------------------------+--------------+------------------------+-------------+---------------+-------------------------+
|Source_organizationId|Source_sourceId|FilingDateTime_1 |SourceTypeCode_1|DocumentId_1|Dcn_1|DocFormat_1|StatementDate_1 |IsFilingDateTimeEstimated_1|ContainsPreliminaryData_1|CapitalChangeAdjustmentDate_1|CumulativeAdjustmentFactor_1|ContainsRestatement_1|FilingDateTimeUTCOffset_1|ThirdPartySourceCode_1|ThirdPartySourcePriority_1|SourceTypeId_1|ThirdPartySourceCodeId_1|FFAction|!|_1|DataPartition_1|TimeStamp |
+---------------------+---------------+--------------------+----------------+------------+-----+-----------+--------------------+---------------------------+-------------------------+-----------------------------+----------------------------+---------------------+-------------------------+----------------------+--------------------------+--------------+------------------------+-------------+---------------+-------------------------+
|4295876589 |1 |1977-02-14T08:30:00Z|YUH |null |null |null |1976-12-31T05:30:00Z|true |false |1976-12-31T05:30:00Z |0.82457 |false |540 |SS |1 |3013057 |1000716240 |I|!| |Japan |2018-05-03T07:03:27+00:00|
|4295876589 |8 |1984-02-14T08:30:00Z|YUH |null |null |null |1983-12-31T05:30:00Z|true |false |1983-12-31T05:30:00Z |0.82457 |false |540 |SS |1 |3013057 |1000716240 |I|!| |Japan |2018-05-03T09:46:58+00:00|
|4295876589 |1 |1977-02-14T08:30:00Z|YUH |null |null |null |1976-12-31T05:30:00Z|true |false |1976-12-31T05:30:00Z |0.82457 |false |540 |SS |1 |3013057 |1000716240 |I|!| |Japan |2018-05-03T07:30:16+00:00|
+---------------------+---------------+--------------------+----------------+------------+-----+-----------+--------------------+---------------------------+-------------------------+-----------------------------+----------------------------+---------------------+-------------------------+----------------------+--------------------------+--------------+------------------------+-------------+---------------+-------------------------+
The value should be 1984-02-14T03:00:00Z.
I don't know what I am missing here.
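You just need to apply the built-in to_timestamp function before date_format. When the string column is passed straight to date_format, Spark most likely casts it to a timestamp, which honors the +00:00 offset, and then renders it in the local (session) time zone. The 5 hour 30 minute shift in your output is consistent with a zone of UTC+05:30. Parsing with a pattern that stops before the offset keeps the time as written: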
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.DoubleType

// Parse each date string with an explicit pattern that has no offset field, then re-format it.
val df2resultTimestamp = finalXmlDf
  .withColumn("FilingDateTime_1", date_format(to_timestamp(col("FilingDateTime_1"), "yyyy-MM-dd'T'HH:mm:ss"), "yyyy-MM-dd'T'HH:mm:ss'Z'"))
  .withColumn("StatementDate_1", date_format(to_timestamp(col("StatementDate_1"), "yyyy-MM-dd'T'HH:mm:ss"), "yyyy-MM-dd'T'HH:mm:ss'Z'"))
  .withColumn("CapitalChangeAdjustmentDate_1", date_format(to_timestamp(col("CapitalChangeAdjustmentDate_1"), "yyyy-MM-dd'T'HH:mm:ss"), "yyyy-MM-dd'T'HH:mm:ss'Z'"))
  .withColumn("CumulativeAdjustmentFactor_1", regexp_replace(format_number(col("CumulativeAdjustmentFactor_1").cast(DoubleType), 5), ",", ""))
This should give you the correct output:
+---------------------+---------------+--------------------+----------------+------------+-----+-----------+--------------------+---------------------------+-------------------------+-----------------------------+----------------------------+---------------------+-------------------------+----------------------+--------------------------+--------------+------------------------+-------------+---------------+-------------------------+
|Source_organizationId|Source_sourceId|FilingDateTime_1 |SourceTypeCode_1|DocumentId_1|Dcn_1|DocFormat_1|StatementDate_1 |IsFilingDateTimeEstimated_1|ContainsPreliminaryData_1|CapitalChangeAdjustmentDate_1|CumulativeAdjustmentFactor_1|ContainsRestatement_1|FilingDateTimeUTCOffset_1|ThirdPartySourceCode_1|ThirdPartySourcePriority_1|SourceTypeId_1|ThirdPartySourceCodeId_1|FFAction|!|_1|DataPartition_1|TimeStamp |
+---------------------+---------------+--------------------+----------------+------------+-----+-----------+--------------------+---------------------------+-------------------------+-----------------------------+----------------------------+---------------------+-------------------------+----------------------+--------------------------+--------------+------------------------+-------------+---------------+-------------------------+
|4295876589 |1 |1977-02-14T03:00:00Z|YUH |null |null |null |1976-12-31T00:00:00Z|true |false |1976-12-31T00:00:00Z |0.82457 |false |540 |SS |1 |3013057 |1000716240 |I|!| |Japan |2018-05-03T07:03:27+00:00|
|4295876589 |8 |1984-02-14T03:00:00Z|YUH |null |null |null |1983-12-31T00:00:00Z|true |false |1983-12-31T00:00:00Z |0.82457 |false |540 |SS |1 |3013057 |1000716240 |I|!| |Japan |2018-05-03T09:46:58+00:00|
|4295876589 |1 |1977-02-14T03:00:00Z|YUH |null |null |null |1976-12-31T00:00:00Z|true |false |1976-12-31T00:00:00Z |0.82457 |false |540 |SS |1 |3013057 |1000716240 |I|!| |Japan |2018-05-03T07:30:16+00:00|
+---------------------+---------------+--------------------+----------------+------------+-----+-----------+--------------------+---------------------------+-------------------------+-----------------------------+----------------------------+---------------------+-------------------------+----------------------+--------------------------+--------------+------------------------+-------------+---------------+-------------------------+
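To see why the values moved by 5 hours 30 minutes, here is a minimal sketch that reproduces the effect with plain java.time, outside Spark. It is only an illustration, not Spark's actual code path. The object name OffsetShiftDemo and the helpers viaInstant and viaLocal are hypothetical, and the UTC+05:30 zone is an assumption inferred from the shift in the question's output.

```scala
import java.time.{LocalDateTime, OffsetDateTime, ZoneId, ZoneOffset}
import java.time.format.DateTimeFormatter

object OffsetShiftDemo {
  // Output pattern from the question: 'Z' is a quoted literal, so no real zone is printed.
  private val out = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'")

  // Honors the +00:00 offset in the input, then renders that instant in the given zone.
  def viaInstant(s: String, zone: ZoneId): String =
    OffsetDateTime.parse(s).atZoneSameInstant(zone).format(out)

  // Reads only the first 19 characters (date and time) and ignores the offset.
  def viaLocal(s: String): String =
    LocalDateTime.parse(s.take(19)).format(out)

  def main(args: Array[String]): Unit = {
    val raw = "1984-02-14T03:00:00+00:00"
    // Assumed local zone of UTC+05:30, matching the shift seen in the question.
    println(viaInstant(raw, ZoneOffset.ofHoursMinutes(5, 30))) // 1984-02-14T08:30:00Z
    println(viaInstant(raw, ZoneOffset.UTC))                   // 1984-02-14T03:00:00Z
    println(viaLocal(raw))                                     // 1984-02-14T03:00:00Z
  }
}
```

The first call mirrors what the question's code appears to do, and the last call mirrors the to_timestamp fix above. The middle call suggests another option: if the local zone is indeed the cause, setting the spark.sql.session.timeZone configuration to UTC before formatting should also leave the values unshifted, though I have not verified that against your data.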