How to change case of whole column to lowercase?
I want to change the case of a whole column to lowercase in a Spark Dataset.
Input
+------+--------------------+
|ItemID| Category name|
+------+--------------------+
| ABC|BRUSH & BROOM HAN...|
| XYZ|WHEEL BRUSH PARTS...|
+------+--------------------+
Desired Output
+------+--------------------+
|ItemID| Category name|
+------+--------------------+
| ABC|brush & broom han...|
| XYZ|wheel brush parts...|
+------+--------------------+
I tried collectAsList()
and toString()
, but that is a slow and cumbersome process for very large datasets.
I also found a function 'lower', but I don't know how to apply it to a Dataset.
Please suggest a simple or efficient way to do this. Thanks in advance.
Use the lower
function from org.apache.spark.sql.functions
.
For example:
df.select($"q1Content", lower($"q1Content")).show
Output:
+--------------------+--------------------+
| q1Content| lower(q1Content)|
+--------------------+--------------------+
|What is the step ...|what is the step ...|
|What is the story...|what is the story...|
|How can I increas...|how can i increas...|
|Why am I mentally...|why am i mentally...|
|Which one dissolv...|which one dissolv...|
|Astrology: I am a...|astrology: i am a...|
| Should I buy tiago?| should i buy tiago?|
|How can I be a go...|how can i be a go...|
|When do you use ...|when do you use ...|
|Motorola (company...|motorola (company...|
|Method to find se...|method to find se...|
|How do I read and...|how do i read and...|
|What can make Phy...|what can make phy...|
|What was your fir...|what was your fir...|
|What are the laws...|what are the laws...|
|What would a Trum...|what would a trum...|
|What does manipul...|what does manipul...|
|Why do girls want...|why do girls want...|
|Why are so many Q...|why are so many q...|
|Which is the best...|which is the best...|
+--------------------+--------------------+
Got it (using Functions#lower
, see the Javadoc):
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lower;

String columnName = "Category name";
src = src.withColumn(columnName, lower(col(columnName)));
src.show();
This replaces the old column with a new, lowercased one while keeping the rest of the Dataset intact.
+------+--------------------+
|ItemID| Category name|
+------+--------------------+
| ABC|brush & broom han...|
| XYZ|wheel brush parts...|
+------+--------------------+
First, add the static import:
import static org.apache.spark.sql.functions.lower;
Then place the lower
call where it belongs. Here is an example:
.and(lower(df1.col("field_name")).equalTo("offeringname"))
I read all the answers here and then tried it myself; for some reason I was stuck in IntelliJ IDEA for a few minutes before I understood it (the import side of things). If you run into this, just add the import IntelliJ suggests when it flags the unknown symbol.
Good luck.
In Scala you can do it like this:
import org.apache.spark.sql.functions._
val dfAfterLowerCase = dfInitial.withColumn("column_name", lower(col("column_name")))
dfAfterLowerCase.show()
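Putting the answers together, here is a minimal, self-contained Scala sketch. The session setup and the toy rows are illustrative assumptions, not from the question; only the withColumn/lower pattern comes from the answers above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lower}

object LowercaseColumn {
  def main(args: Array[String]): Unit = {
    // Local session for the sketch; use your own cluster config in practice
    val spark = SparkSession.builder()
      .appName("lowercase-column")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy data shaped like the question's Dataset
    val src = Seq(
      ("ABC", "BRUSH & BROOM HANDLES"),
      ("XYZ", "WHEEL BRUSH PARTS")
    ).toDF("ItemID", "Category name")

    // Replace the column in place with its lowercased version
    val result = src.withColumn("Category name", lower(col("Category name")))
    result.show()

    spark.stop()
  }
}
```

The same transform is also available through SQL syntax, e.g. `src.selectExpr("ItemID", "lower(`Category name`) as `Category name`")`, if you prefer expression strings over the column API.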