Pandas: extract part of a string from a column in a DataFrame and store it in a new column
I have the code below to create the DataFrame:
import pandas as pd
df = {'Connection':['Home 10Mbps','Broadband 5 Mbps','128 Kbps Internet','Discounted 512Kbps 2 years contract']}
df = pd.DataFrame(df)
df
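I need a way to extract only the bandwidth from the "Connection" column and store the result in a new column named "Bandwidth", as shown below:
Bandwidth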
10 Mbps
5 Mbps
128 Kbps
512 Kbps
Make sure the list contains every format that can appear in the column:
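import re

lst = ['10Mbps', '10 Mbps', '5 Mbps', '128 Kbps', '512Kbps', '512 Kbps']

# Tag each row with the list entry found in its Connection text
for i in lst:
    df.loc[df['Connection'].str.contains(i), 'bandwidth'] = i

# Normalise entries written without a space ("10Mbps") to "10 Mbps"
lst1 = []
for j in df.bandwidth:
    if " " not in j:
        # Pad the unit with spaces, then drop the trailing space
        lst1.append(re.sub("[A-Za-z]+", lambda ele: " " + ele[0] + " ", j)[:-1])
    else:
        lst1.append(j)
df['bandwidth'] = lst1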
# Output
Connection                           bandwidth
Home 10Mbps                          10 Mbps
Broadband 5 Mbps                     5 Mbps
128 Kbps Internet                    128 Kbps
Discounted 512Kbps 2 years contract  512 Kbps
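As an alternative to listing every format by hand, a regular expression can pull out the number and the unit in one pass with `Series.str.extract`. This is only a minimal sketch, assuming the unit is always Kbps, Mbps, or Gbps (Gbps does not appear in the sample data and is included as a guess) and that the first number followed by such a unit is the bandwidth:

```python
import pandas as pd

df = pd.DataFrame({
    "Connection": [
        "Home 10Mbps",
        "Broadband 5 Mbps",
        "128 Kbps Internet",
        "Discounted 512Kbps 2 years contract",
    ]
})

# Capture the number and the unit as separate groups.
# \s* tolerates a missing space between them ("10Mbps" vs "10 Mbps").
# The unit set [KMG]bps is an assumption based on the sample data.
parts = df["Connection"].str.extract(r"(\d+)\s*([KMG]bps)")

# Rejoin as "<number> <unit>"; rows without a match stay NaN.
df["bandwidth"] = parts[0] + " " + parts[1]

print(df)
```

Because the pattern requires a unit right after the digits, the "2" in "2 years contract" is not picked up. If other units or decimal values can occur, the pattern would need to be widened accordingly.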