How to select multiple elements in a string separated by ";" and "," in a DataFrame?

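Example:

First row of the DataFrame: name 1, age 1, country 1; name 2, age 2, country 2; name 3, age 3, country 3

Second row of the DataFrame: name a, age a, country a; name b, age b, country b; name c, age c, country c

I only want to select the countries from each row of the DataFrame and put them in a new column of the same DataFrame: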

country 1, country 2, country 3

country a, country b, country c

I tried the following, but I only get the last country of the last school in each row:

df["countries"] = df["school_info"].apply(lambda x: str(x).split(",")[-1].strip())

Output:

country 3

country c

Thanks!

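OK, now I understand what you're asking for.

  1. Build a temporary list of tuples that you want to turn into rows
  2. Use explode() to expand the list into rows
  3. Pick the values out of each row's tuple to form columns. For the purposes of the example I've picked out all the components and left the original encoded string in place
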
import pandas as pd

data = """name 1, age 1, country 1; name 2, age 2, country 2; name 3, age 3, country 3
name a, age a, country a; name b, age b, country b; name c, age c, country c"""

df = pd.DataFrame({"school_info": data.split("\n")})
# equivalent to the first assign() step below:
# df["data_tuple"] = df["school_info"].apply(lambda s: [tuple(t.split(",")) for t in s.split(";")])
df = (
    df.assign(data_tuple=lambda dfa: dfa["school_info"].apply(
        # build a list of tuples: ";" separates records, each tuple holds (name, age, country)
        # values keep their leading spaces; use t.strip() on each part if that matters
        lambda s: [tuple(t.split(",")) for t in s.split(";")]))
    # explode the list into rows and pick out each element of the resulting tuple
    .explode("data_tuple")
    .assign(
        name=lambda dfa: dfa["data_tuple"].apply(lambda t: t[0]),
        age=lambda dfa: dfa["data_tuple"].apply(lambda t: t[1]),
        country=lambda dfa: dfa["data_tuple"].apply(lambda t: t[2]),
    )
    .drop("data_tuple", axis=1)  # this was a temporary construct, drop it
)

print(df.to_string(index=False))

Output:

                                                                  school_info     name     age     country
 name 1, age 1, country 1; name 2, age 2, country 2; name 3, age 3, country 3   name 1   age 1   country 1
 name 1, age 1, country 1; name 2, age 2, country 2; name 3, age 3, country 3   name 2   age 2   country 2
 name 1, age 1, country 1; name 2, age 2, country 2; name 3, age 3, country 3   name 3   age 3   country 3
 name a, age a, country a; name b, age b, country b; name c, age c, country c   name a   age a   country a
 name a, age a, country a; name b, age b, country b; name c, age c, country c   name b   age b   country b
 name a, age a, country a; name b, age b, country b; name c, age c, country c   name c   age c   country c
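If what you want in the end is one comma-joined string of countries per original row (as in the question) rather than one row per record, the exploded values can be collapsed back by grouping on the original index. This is only a minimal sketch: it assumes the country is always the last comma-separated field of each record, and it reuses the column name countries from the question.

```python
import pandas as pd

df = pd.DataFrame({"school_info": [
    "name 1, age 1, country 1; name 2, age 2, country 2; name 3, age 3, country 3",
    "name a, age a, country a; name b, age b, country b; name c, age c, country c",
]})

# split each row on ";" and explode so every record gets its own row;
# explode() repeats the original index value for each record
records = df["school_info"].str.split(";").explode()

# assumption: the country is the last comma-separated field of a record
countries = records.str.split(",").str[-1].str.strip()

# group on the repeated index to rebuild one string per original row
df["countries"] = countries.groupby(level=0).agg(", ".join)

print(df["countries"].to_string())
```

Because the grouped result carries the original index, the assignment lines up with the source rows without any merge.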

If your rows are in a single column named school_info:

df["school_info"].apply(lambda r: ', '.join([c.split(",")[-1].strip() for c in r.split(";")]))

Input:

data = [["name 1, age 1, country 1; name 2, age 2, country 2; name 3, age 3, country 3"],
        ["name a, age a, country a; name b, age b, country b; name c, age c, country c"]]
df = pd.DataFrame(data, columns=['school_info'])

Output:

0    country 1, country 2, country 3
1    country a, country b, country c
Name: school_info, dtype: object
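The expression above returns a Series, so to get the new column the question asks for it still has to be assigned back to the DataFrame. A possible sketch is below; the helper name extract_countries is hypothetical, and it assumes the country is always the last comma-separated field of each ";"-separated record. It also skips empty records, which would otherwise appear if a row ended with a trailing ";".

```python
import pandas as pd


def extract_countries(info: str) -> str:
    """Join the last comma-separated field of every ';'-separated record."""
    # drop empty records, e.g. the one produced by a trailing ";"
    records = [r for r in str(info).split(";") if r.strip()]
    # assumption: the country is the last field of each record
    return ", ".join(r.split(",")[-1].strip() for r in records)


data = [["name 1, age 1, country 1; name 2, age 2, country 2; name 3, age 3, country 3"],
        ["name a, age a, country a; name b, age b, country b; name c, age c, country c"]]
df = pd.DataFrame(data, columns=["school_info"])

# store the result as a new column in the same DataFrame
df["countries"] = df["school_info"].apply(extract_countries)
print(df["countries"].to_string())
```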