Numbering rows generated by splitting a comma-separated Pandas DataFrame column
I have a Pandas DataFrame as follows:
+----------+---------------+-----------+---------------+
| List No. | List Item No. | Item Name | Issues |
+----------+---------------+-----------+---------------+
| 1 | 1 | A | foo, bar, baz |
| 1 | 2 | B | foo, bar |
| 2 | 3A | A | bar, quz |
| 2 | 3B | C | baz, foo, quz |
+----------+---------------+-----------+---------------+
The above can be generated with the following code:
import pandas as pd

data = {'List No.': ['1', '1', '2', '2'],
        'List Item No.': ['1', '2', '3A', '3B'],
        'Item Name': ['A', 'B', 'A', 'C'],
        'Issues': ['foo, bar, baz', 'foo, bar', 'bar, quz', 'baz, foo, quz']}
df = pd.DataFrame(data)
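I want to create one row per value in the Issues column. For example, the first row has 3 comma-separated values, so I want to create 3 rows from it, one for each value. The values themselves can be obtained with [item for sublist in df.Issues.str.split(',').tolist() for item in sublist]. However, I also want to create an issue number for each value within its original row, and I have not been able to do that.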
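To make the limitation concrete, here is a minimal sketch of the flattening approach mentioned above, using the same sample data. It returns a flat list of values, so the link back to the originating row is lost, which is why a per-row issue number cannot be derived from it directly. Splitting on ', ' rather than ',' is an assumption made here so the values carry no leading spaces.

```python
import pandas as pd

data = {'List No.': ['1', '1', '2', '2'],
        'List Item No.': ['1', '2', '3A', '3B'],
        'Item Name': ['A', 'B', 'A', 'C'],
        'Issues': ['foo, bar, baz', 'foo, bar', 'bar, quz', 'baz, foo, quz']}
df = pd.DataFrame(data)

# Split each cell into a list, then flatten all lists into one.
# The result has no information about which row each value came from.
flat = [item
        for sublist in df['Issues'].str.split(', ').tolist()
        for item in sublist]
print(flat)
```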
Expected output:
+----------+---------------+-----------+-----------+-------+
| List No. | List Item No. | Item Name | Issue No. | Issue |
+----------+---------------+-----------+-----------+-------+
| 1 | 1 | A | 1 | foo |
| 1 | 1 | A | 2 | bar |
| 1 | 1 | A | 3 | baz |
| 1 | 2 | B | 1 | foo |
| 1 | 2 | B | 2 | bar |
| 2 | 3A | A | 1 | bar |
| 2 | 3A | A | 2 | quz |
| 2 | 3B | C | 1 | baz |
| 2 | 3B | C | 2 | foo |
| 2 | 3B | C | 3 | quz |
+----------+---------------+-----------+-----------+-------+
Use DataFrame.explode with GroupBy.cumcount:
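# Split on ', ' (comma plus space) so the values carry no leading spaces,
# then explode each list into one row per value; the original index is repeated.
df1 = df.assign(Issues=df.Issues.str.split(', ')).explode('Issues')
# Number the rows within each original row (index level 0), starting at 1.
df1['Issue No.'] = df1.groupby(level=0).cumcount().add(1)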
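If the column position matters, use DataFrame.insert in place of the assignment above (insert raises a ValueError if the column already exists):
df1.insert(3, 'Issue No.', df1.groupby(level=0).cumcount().add(1))
print(df1)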
List No. List Item No. Item Name Issue No. Issues
0 1 1 A 1 foo
0 1 1 A 2 bar
0 1 1 A 3 baz
1 1 2 B 1 foo
1 1 2 B 2 bar
2 2 3A A 1 bar
2 2 3A A 2 quz
3 2 3B C 1 baz
3 2 3B C 2 foo
3 2 3B C 3 quz
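For reference, the steps above can be combined into one self-contained sketch. The final rename to 'Issue' and the reset_index call are additions that are not part of the answer itself; they are included only on the assumption that the result should match the column name and the clean row numbering of the expected output.

```python
import pandas as pd

data = {'List No.': ['1', '1', '2', '2'],
        'List Item No.': ['1', '2', '3A', '3B'],
        'Item Name': ['A', 'B', 'A', 'C'],
        'Issues': ['foo, bar, baz', 'foo, bar', 'bar, quz', 'baz, foo, quz']}
df = pd.DataFrame(data)

# One row per issue; the original row index is repeated for each value.
out = df.assign(Issues=df['Issues'].str.split(', ')).explode('Issues')

# Running count within each original row, inserted before the issue column.
out.insert(3, 'Issue No.', out.groupby(level=0).cumcount().add(1))

# Assumed cosmetic steps: match the expected column name and renumber rows.
out = out.rename(columns={'Issues': 'Issue'}).reset_index(drop=True)
print(out)
```

Grouping on level=0 works because explode keeps the original index, so all rows that came from the same source row share one index value.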