How to replicate pandas DataFrame rows and periodically change one column
I have a df:
df = pd.DataFrame([["A1", "B1", "C1", "P"],
                   ["A2", "B2", "C2", "P"],
                   ["A3", "B3", "C3", "P"]],
                  columns=["col_a", "col_b", "col_c", "col_d"])
col_a col_b col_c col_d
A1 B1 C1 P
A2 B2 C2 P
A3 B3 C3 P
...
The result I need is essentially to repeat each row so that col_d takes the values P, Q and R for every unique row:
col_a col_b col_c col_d
A1 B1 C1 P
A1 B1 C1 Q
A1 B1 C1 R
A2 B2 C2 P
A2 B2 C2 Q
A2 B2 C2 R
A3 B3 C3 P
A3 B3 C3 Q
A3 B3 C3 R
...
So far I only have:
new_df = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
This repeats the values, but col_d does not change.
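Edit:
I have now run into another requirement: for each unique combination of col_a and col_b, I need to add "S" to col_d.
The result would look like this, for example: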
col_a col_b col_c col_d
A1 B1 C1 P
A1 B1 C1 Q
A1 B1 C1 R
A1 B1 T S
A2 B2 C2 P
A2 B2 C2 Q
A2 B2 C2 R
A2 B2 T S
Thank you very much for your help!
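Assign the new values to col_d with DataFrame.assign and numpy.tile:
import numpy as np
import pandas as pd

L = ['P', 'Q', 'R']
# Repeat each row len(L) times, then cycle L through col_d
new_df = (pd.DataFrame(np.repeat(df.values, len(L), axis=0), columns=df.columns)
            .assign(col_d=np.tile(L, len(df))))
print(new_df)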
col_a col_b col_c col_d
0 A1 B1 C1 P
1 A1 B1 C1 Q
2 A1 B1 C1 R
3 A2 B2 C2 P
4 A2 B2 C2 Q
5 A2 B2 C2 R
6 A3 B3 C3 P
7 A3 B3 C3 Q
8 A3 B3 C3 R
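For readers who want to run the approach end to end, here is a minimal self-contained sketch. The helper name expand_rows is hypothetical (it is not part of pandas or NumPy), and the sample frame is the one from the question written with a comma between every element.

```python
import numpy as np
import pandas as pd

# Sample frame from the question, with a comma between every element.
df = pd.DataFrame(
    [["A1", "B1", "C1", "P"],
     ["A2", "B2", "C2", "P"],
     ["A3", "B3", "C3", "P"]],
    columns=["col_a", "col_b", "col_c", "col_d"],
)


def expand_rows(frame, column, values):
    """Hypothetical helper: repeat every row len(values) times and
    cycle `values` through `column`."""
    repeated = pd.DataFrame(
        np.repeat(frame.values, len(values), axis=0), columns=frame.columns
    )
    # np.tile repeats the whole list once per original row: P Q R P Q R ...
    return repeated.assign(**{column: np.tile(values, len(frame))})


new_df = expand_rows(df, "col_d", ["P", "Q", "R"])
print(new_df)
```

np.repeat duplicates each row in place (row 1 three times, then row 2, and so on), while np.tile repeats the whole list, which is why the two line up.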
Another similar idea is to repeat the index and select the repeated rows with DataFrame.loc:
L = ['P', 'Q', 'R']
new_df = (df.loc[df.index.repeat(len(L))]
            .assign(col_d=np.tile(L, len(df)))
            .reset_index(drop=True))
print(new_df)
col_a col_b col_c col_d
0 A1 B1 C1 P
1 A1 B1 C1 Q
2 A1 B1 C1 R
3 A2 B2 C2 P
4 A2 B2 C2 Q
5 A2 B2 C2 R
6 A3 B3 C3 P
7 A3 B3 C3 Q
8 A3 B3 C3 R
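One practical difference between the two variants appears when the frame mixes dtypes: df.values on mixed columns is a single object array, so the rebuilt frame loses the numeric dtypes, whereas df.loc keeps each column's dtype. The sketch below uses a hypothetical frame named mixed with an extra integer column n, which is not part of the question's data. It also assumes the index labels are unique (as with the default RangeIndex), since .loc with repeated labels on a duplicated index would select extra rows.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a numeric column, used only to show the dtype difference.
mixed = pd.DataFrame({"col_a": ["A1", "A2"], "n": [1, 2], "col_d": ["P", "P"]})
L = ["P", "Q", "R"]

# Variant 1: rebuild from the NumPy array (all columns pass through one object array).
via_values = (
    pd.DataFrame(np.repeat(mixed.values, len(L), axis=0), columns=mixed.columns)
    .assign(col_d=np.tile(L, len(mixed)))
)

# Variant 2: repeat the index labels and select with .loc (per-column dtypes are kept).
via_loc = (
    mixed.loc[mixed.index.repeat(len(L))]
    .assign(col_d=np.tile(L, len(mixed)))
    .reset_index(drop=True)
)

print(via_values.dtypes)
print(via_loc.dtypes)
```

If the numeric columns matter downstream, the .loc variant avoids a later astype call.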
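Edit:
For the additional requirement, add 'S' to the list and replace col_c with 'T' on the rows where col_d is 'S':
L = ['P', 'Q', 'R', 'S']
new_df = (pd.DataFrame(np.repeat(df.values, len(L), axis=0), columns=df.columns)
            .assign(col_d=np.tile(L, len(df)),
                    col_c=lambda x: x['col_c'].mask(x['col_d'].eq('S'), 'T')))
print(new_df)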
col_a col_b col_c col_d
0 A1 B1 C1 P
1 A1 B1 C1 Q
2 A1 B1 C1 R
3 A1 B1 T S
4 A2 B2 C2 P
5 A2 B2 C2 Q
6 A2 B2 C2 R
7 A2 B2 T S
8 A3 B3 C3 P
9 A3 B3 C3 Q
10 A3 B3 C3 R
11 A3 B3 T S
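The snippet above adds one S row per original row, which matches "each unique col_a and col_b" only when those pairs are unique in df. As a hedged alternative for data where a pair may occur on several rows, the sketch below builds the P/Q/R rows and the S rows separately and concatenates them. This is a different technique from the one above, the intermediate names pqr and s_rows are illustrative, and the final sort relies on the assumption that the labels P, Q, R, S sort alphabetically in the desired order. It needs pandas 1.2 or later for how="cross".

```python
import pandas as pd

# Sample frame from the question, with a comma between every element.
df = pd.DataFrame(
    [["A1", "B1", "C1", "P"],
     ["A2", "B2", "C2", "P"],
     ["A3", "B3", "C3", "P"]],
    columns=["col_a", "col_b", "col_c", "col_d"],
)

# P/Q/R for every original row (cross join with the list of labels).
pqr = df.drop(columns="col_d").merge(
    pd.Series(["P", "Q", "R"], name="col_d"), how="cross"
)

# Exactly one S row per unique (col_a, col_b) pair, with col_c set to T.
s_rows = df[["col_a", "col_b"]].drop_duplicates().assign(col_c="T", col_d="S")

# Assumption: sorting on col_d works only because P < Q < R < S alphabetically.
new_df = (
    pd.concat([pqr, s_rows], ignore_index=True)
    .sort_values(["col_a", "col_b", "col_d"])
    .reset_index(drop=True)
)
print(new_df)
```

On the question's sample data, where every pair is unique, this yields the same twelve rows as the repeat-and-mask version.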
If you already have the first DataFrame, you can use assign and explode:
l = ['P', 'Q', 'R']
new_df = df.assign(col_d=[l]*len(df)).explode('col_d')
Or merge:
new_df = df.drop(columns='col_d').merge(pd.Series(l, name='col_d'), how='cross')
Output:
col_a col_b col_c col_d
0 A1 B1 C1 P
1 A1 B1 C1 Q
2 A1 B1 C1 R
3 A2 B2 C2 P
4 A2 B2 C2 Q
5 A2 B2 C2 R
6 A3 B3 C3 P
7 A3 B3 C3 Q
8 A3 B3 C3 R
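The two variants return the same rows but not the same index: explode keeps each row's original index label, while a cross merge returns a fresh 0..n-1 index. The sketch below makes that visible. It wraps the list in a pd.Series before exploding, which is a small variation on the snippet above chosen to keep the example explicit, and the names exploded and crossed are illustrative.

```python
import pandas as pd

# Sample frame from the question, with a comma between every element.
df = pd.DataFrame(
    [["A1", "B1", "C1", "P"],
     ["A2", "B2", "C2", "P"],
     ["A3", "B3", "C3", "P"]],
    columns=["col_a", "col_b", "col_c", "col_d"],
)
l = ["P", "Q", "R"]

# explode: every row holds the full list, then each element becomes its own row.
exploded = df.assign(col_d=pd.Series([l] * len(df), index=df.index)).explode("col_d")

# cross merge: Cartesian product of the rows with the list (pandas >= 1.2).
crossed = df.drop(columns="col_d").merge(pd.Series(l, name="col_d"), how="cross")

print(exploded.index.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(crossed.index.tolist())   # [0, 1, 2, 3, 4, 5, 6, 7, 8]

# After resetting the index, both hold the same values in the same order.
exploded = exploded.reset_index(drop=True)
```

Calling reset_index(drop=True) after explode is usually enough when a clean index is needed.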
You can use complete from pyjanitor to generate the combinations easily:
# pip install pyjanitor
import pandas as pd
import janitor
df.complete(['col_a', 'col_b', 'col_c'], {'col_d': ['P','Q','R']})
col_a col_b col_c col_d
0 A1 B1 C1 P
1 A1 B1 C1 Q
2 A1 B1 C1 R
3 A2 B2 C2 P
4 A2 B2 C2 Q
5 A2 B2 C2 R
6 A3 B3 C3 P
7 A3 B3 C3 Q
8 A3 B3 C3 R
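Basically, you combine ['col_a', 'col_b', 'col_c'] with {'col_d': ['P','Q','R']}; using a dictionary lets you introduce new values into the data.
For the scenario where S needs to be introduced, you can break it into steps:
import numpy as np

(df
 .complete(['col_a', 'col_b'], {'col_d': ['P','Q','R', 'S']})
 # set col_c to 'T' on the new S rows, keep the existing value elsewhere
 .assign(col_c = lambda df: np.where(df.col_d.eq('S'), 'T', df.col_c))
 # fill the col_c gaps left on the new Q and R rows
 .ffill()
)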
col_a col_b col_c col_d
0 A1 B1 C1 P
1 A1 B1 C1 Q
2 A1 B1 C1 R
3 A1 B1 T S
4 A2 B2 C2 P
5 A2 B2 C2 Q
6 A2 B2 C2 R
7 A2 B2 T S
8 A3 B3 C3 P
9 A3 B3 C3 Q
10 A3 B3 C3 R
11 A3 B3 T S