How to separate strings in a dataframe column based on a delimiter?
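So, I have a dataframe that looks like this:
I want to separate the values in the 'Filename' column into strings based on "-" and ".", and also remove the extension name. Then I want to separate the values in the 'Path' column into strings based on "\" and ":". How can I do this?
It's not entirely clear what you're looking for here, but this is my best interpretation.
Setup: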
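import os
import pandas as pd

# Raw strings keep the backslashes literal ("\a" would otherwise become a bell character)
df = pd.DataFrame({
    "Filename": ["doc-hi.txt", "oh-my-god.txt"],
    "Path": [r"C:\asdf\asdf\asdf\kd.txt", r"C:\asdcsc.docx"]
})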
Splitting the strings
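# "separate the values in 'Filename' column into strings based on '-' and '.' and also remove the extension name"
# os.path.splitext drops the extension first; the regex then splits on "." or "-"
df["Filename_split"] = df["Filename"].apply(lambda name: os.path.splitext(name)[0]).str.split(r'\.|-')
# "separate the values in 'Path' column into strings based on '\' and ':'"
# The backslash must be escaped in the regex: r'\\|:' means "backslash or colon"
df["Path_split"] = df["Path"].str.split(r'\\|:')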
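The escaping of the backslash in the 'Path' pattern is easy to get wrong, so here is a minimal sketch of the difference using the standard `re` module. The sample path is made up for illustration only.

```python
import re

# Illustrative path only; a raw string keeps the backslashes literal.
path = r"C:\asdf\kd.txt"

# r'\|:' is the regex for the literal two-character text "|:",
# which never occurs in the path, so nothing is split.
unescaped = re.split(r'\|:', path)

# r'\\|:' means "a backslash OR a colon", the intended delimiters.
escaped = re.split(r'\\|:', path)

print(unescaped)
print(escaped)
```

The empty string in the second result comes from the ":" being immediately followed by "\", which leaves nothing between the two delimiters.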
Intermediate output
        Filename                      Path Filename_split                       Path_split
0     doc-hi.txt  C:\asdf\asdf\asdf\kd.txt      [doc, hi]  [C, , asdf, asdf, asdf, kd.txt]
1  oh-my-god.txt            C:\asdcsc.docx  [oh, my, god]               [C, , asdcsc.docx]
Joining the tokens back together
To rejoin the lists of strings into a single string, use str.join:
df['Filename_split'].str.join(' ')
df['Path_split'].str.join(' ')
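For reference, the whole split-then-join round trip can be sketched as one self-contained script. The helper name `split_and_join` and the `*_joined` column names are hypothetical, introduced here only for illustration; the sample data mirrors the setup above.

```python
import os
import pandas as pd


def split_and_join(df, sep=" "):
    """Hypothetical helper: split 'Filename' and 'Path' into tokens, then rejoin with sep."""
    out = df.copy()
    # Drop the extension, then split the stem on "." or "-".
    stems = out["Filename"].map(lambda name: os.path.splitext(name)[0])
    out["Filename_joined"] = stems.str.split(r"\.|-").str.join(sep)
    # Split the path on a backslash or a colon (backslash escaped in the regex).
    out["Path_joined"] = out["Path"].str.split(r"\\|:").str.join(sep)
    return out


demo = pd.DataFrame({
    "Filename": ["doc-hi.txt", "oh-my-god.txt"],
    "Path": [r"C:\asdf\asdf\asdf\kd.txt", r"C:\asdcsc.docx"],
})
result = split_and_join(demo)
print(result[["Filename_joined", "Path_joined"]])
```

Because ":" is directly followed by "\" in these paths, the split produces an empty token, so the joined path contains two consecutive separators after the drive letter. Filtering out empty tokens before joining would avoid that, if it matters for your use case.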