Search for a value in a column
I want to check whether a column contains a given value.
```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

df_init = pd.DataFrame({'id': ['1', '2'], 'val': [100, 200]})

spark = SparkSession.builder.appName('pandasToSparkDF').getOrCreate()
mySchema = StructType([StructField("id", StringType(), True),
                       StructField("val", IntegerType(), True)])
df = spark.createDataFrame(df_init, schema=mySchema)

if df.filter(df.id == "3"):
    print('Yes')
else:
    print('No')
```
It always prints 'Yes'. With a pandas DataFrame I would do:
```python
if '3' in df_init['id'].values:
    print('Yes')
else:
    print('No')
```
but with PySpark I don't know how to handle this. I tried using `contains` and `isin`, but I get the same result.
You can use `collect_list` to gather all the values of the 'id' column into a list, then check whether your element is in that list:
```python
from pyspark.sql import functions as F

if '3' in df.select(F.collect_list('id')).first()[0]:
    print("Yes")
else:
    print('No')
```
Or filter and check whether the resulting count is >= 1:
```python
if df.filter(df.id == "3").count() >= 1:
    print("Yes")
else:
    print('No')
```