How can I split a string ID in Python according to the position of the integers in the ID?

Question

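My pandas DataFrame has a column titled BinLocation that contains the location of a material in the warehouse. For example, if a part is in column A02, row 33 and level B21, its BinLocation ID is A02033B21. Some IDs use the format A0233B21 instead. The naming convention is inconsistent, but that was not up to me, and now I have to clean the data. I want to split the string so that for any given BinLocation I can return the column, row and level. Ultimately, I want to create three new columns (Column, Row, Level) in the DataFrame. In case it is unclear, the general structure of the ID is ColumnChar_ColumnInt_RowInt_LevelChar_LevelInt.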

Some BinLocation IDs separate the parts with hyphens, so for those I wrote the following code:

# Hyphenated IDs such as "A01-33-C13": column, row and level are separated by "-"
def forHyphenColumn(s):
    return s.split('-')[0]

def forHyphenRow(s):
    return s.split('-')[1]

def forHyphenLevel(s):
    return s.split('-')[2]

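How can I do the same for the other IDs? Also, is there a way to group the rows of the DataFrame by these columns? (So that everything in A02 is in one group, everything in CB-22 is in one group, and so on.)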
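For the hyphenated IDs alone, the three helper functions can be collapsed into a single vectorized call: pandas' `Series.str.split` with `expand=True` returns every piece as its own column. A minimal sketch, assuming every hyphenated ID has exactly three parts; the sample data is made up for illustration:

```python
import pandas as pd

# Illustrative data; assumes every hyphenated ID splits into exactly three parts
df = pd.DataFrame({"BinLocation": ["A01-33-C13", "A02-033-B21"]})

# expand=True returns a DataFrame with columns 0, 1, 2 instead of lists
parts = df["BinLocation"].str.split("-", expand=True)
parts.columns = ["Column", "Row", "Level"]

# Attach the new columns (same index as df)
df = df.join(parts)
print(df)
```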

Answer 1

If the first three characters of the string are always the Column and the last three are always the Level (so the Row is everything in between):

def forNotHyphenColumn(s):
    return s[:3]


def forNotHyphenLevel(s):
    return s[-3:]


def forNotHyphenRow(s):
    return s[3:-3]
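The same fixed-width slicing can also be written with pandas' vectorized `.str` accessor, which slices every string in the column without a Python function call per row. A minimal sketch, under the same assumption that Column and Level are always exactly three characters:

```python
import pandas as pd

df = pd.DataFrame({"BinLocation": ["A02033B21", "C02044C12", "A0233B21"]})

# .str[...] applies the slice to every string in the column
df["Column"] = df["BinLocation"].str[:3]   # first three characters
df["Row"] = df["BinLocation"].str[3:-3]    # everything in between
df["Level"] = df["BinLocation"].str[-3:]   # last three characters
print(df)
```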

You can then sort the DataFrame by Column by creating separate DataFrame columns for the BinLocation parts and using df.sort_values():

import pandas as pd

df = pd.DataFrame(data={"BinLocation": ["A02033B21", "C02044C12", "A0233B21"]})
# Create dataframe columns for BinLocation items
df["Column"] = df["BinLocation"].apply(forNotHyphenColumn)
df["Row"] = df["BinLocation"].apply(forNotHyphenRow)
df["Level"] = df["BinLocation"].apply(forNotHyphenLevel)
# Sort values
df.sort_values(by=["Column"], ascending=True, inplace=True)

df
#Out: 
#  BinLocation Column  Row Level
#0   A02033B21    A02  033   B21
#2    A0233B21    A02   33   B21
#1   C02044C12    C02  044   C12
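Sorting places rows with the same Column next to each other, but if you need actual groups (the second part of the question), `DataFrame.groupby` collects the rows that share a Column value. A minimal sketch, again assuming the fixed-width Column slice:

```python
import pandas as pd

df = pd.DataFrame({"BinLocation": ["A02033B21", "C02044C12", "A0233B21"]})
df["Column"] = df["BinLocation"].str[:3]

# groupby collects all rows that share the same Column value
groups = df.groupby("Column")
for column, rows in groups:
    print(column, rows["BinLocation"].tolist())

# Number of bin locations in each Column
counts = groups.size()
print(counts)
```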

Edit:

Use the hyphen functions inside apply() as well:

df = pd.DataFrame(data={"BinLocation": ["A02033B21", "C02044C12", "A0233B21", "A01-33-C13"]})
# Create dataframe columns for BinLocation items
df["Column"] = df["BinLocation"].apply(lambda x: forHyphenColumn(x) if "-" in x else forNotHyphenColumn(x))
df["Row"] = df["BinLocation"].apply(lambda x: forHyphenRow(x) if "-" in x else forNotHyphenRow(x))
df["Level"] = df["BinLocation"].apply(lambda x: forHyphenLevel(x) if "-" in x else forNotHyphenLevel(x))
# Sort values
df.sort_values(by=["Column"], ascending=True, inplace=True)

df
#Out: 
#  BinLocation Column  Row Level
#3  A01-33-C13    A01   33   C13
#0   A02033B21    A02  033   B21
#2    A0233B21    A02   33   B21
#1   C02044C12    C02  044   C12
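One caveat with this output: Row keeps whatever digits appeared in the ID, so A02033B21 and A0233B21 end up with Row values "033" and "33". These are different strings, so they will not group together. A minimal sketch of two ways to normalize them; the three-digit padding width is an assumption based on the sample IDs:

```python
import pandas as pd

df = pd.DataFrame({"BinLocation": ["A02033B21", "A0233B21"]})
df["Row"] = df["BinLocation"].str[3:-3]   # "033" and "33": two different strings

# Option 1: compare rows as numbers
df["RowNum"] = df["Row"].astype(int)

# Option 2: keep strings but left-pad with zeros (assumes rows never exceed 999)
df["RowPadded"] = df["Row"].str.zfill(3)
print(df)
```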

Answer 2

Here is an answer that:

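- uses Python regular-expression syntax to parse your IDs (it handles both the hyphenated and non-hyphenated forms, and the pattern can be adjusted as needed for other quirks of the historical IDs)
- puts the IDs into a normalized format
- adds columns for the ID components
- sorts by the ID components, so that rows are "grouped" together (although not in the pandas "groupby" sense)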

import re

import pandas as pd

df = pd.DataFrame({'BinLocation':['A0233B21', 'A02033B21', 'A02-033-B21', 'A02-33-B21', 'A02-33-B15', 'A02-30-B21', 'A01-33-B21']})
print(df)
print()
df['RawBinLocation'] = df['BinLocation']

def parse(s):
    # Optional hyphens between the parts; the row may have any number of digits
    m = re.match(r'^([A-Z])([0-9]{2})-?([0-9]+)-?([A-Z])([0-9]{2})$', s)
    if not m:
        return None
    colChar, colInt, rowInt, levelChar, levelInt = m.groups()
    return pd.Series((colChar, int(colInt), int(rowInt), levelChar, int(levelInt)))

df[['ColChar', 'ColInt', 'RowInt', 'LevChar', 'LevInt']] = df['BinLocation'].apply(parse)
# Rebuild every ID in one normalized format, e.g. A02-033-B21
df['BinLocation'] = df.apply(lambda x: f"{x.ColChar}{x.ColInt:02}-{x.RowInt:03}-{x.LevChar}{x.LevInt:02}", axis=1)
df.sort_values(by=['ColChar', 'ColInt', 'RowInt', 'LevChar', 'LevInt'], inplace=True, ignore_index=True)
print(df)

Output:

   BinLocation
0     A0233B21
1    A02033B21
2  A02-033-B21
3   A02-33-B21
4   A02-33-B15
5   A02-30-B21
6   A01-33-B21

   BinLocation RawBinLocation ColChar  ColInt  RowInt LevChar  LevInt
0  A01-033-B21     A01-33-B21       A       1      33       B      21
1  A02-030-B21     A02-30-B21       A       2      30       B      21
2  A02-033-B15     A02-33-B15       A       2      33       B      15
3  A02-033-B21       A0233B21       A       2      33       B      21
4  A02-033-B21      A02033B21       A       2      33       B      21
5  A02-033-B21    A02-033-B21       A       2      33       B      21
6  A02-033-B21     A02-33-B21       A       2      33       B      21
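As a possible variation on parse(), the same regex can be applied to the whole column at once with pandas' `Series.str.extract`, where named groups become the column names. Non-matching IDs come back as NaN, which makes it easy to list the historical IDs that still need a new rule. A minimal sketch; the sample data, including the non-conforming "XYZ" ID, is made up for illustration:

```python
import pandas as pd

# Illustrative data; "XYZ" is a made-up ID that does not fit the pattern
df = pd.DataFrame({"BinLocation": ["A0233B21", "A02-33-B15", "A01-33-B21", "XYZ"]})

# Same pattern as parse(), with named groups; each group name becomes a column
pattern = (r"^(?P<ColChar>[A-Z])(?P<ColInt>[0-9]{2})-?(?P<RowInt>[0-9]+)"
           r"-?(?P<LevChar>[A-Z])(?P<LevInt>[0-9]{2})$")
parts = df["BinLocation"].str.extract(pattern)

# Rows where the pattern did not match are all NaN: list them for manual cleanup
unmatched = df.loc[parts["ColChar"].isna(), "BinLocation"]
print(unmatched.tolist())

# Captured groups are strings; to_numeric converts the digit groups
# (they become floats here because the unmatched row is NaN)
for col in ["ColInt", "RowInt", "LevInt"]:
    parts[col] = pd.to_numeric(parts[col])

df = df.join(parts)
print(df)
```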
