Change the style of a pandas CSV column's values in Python
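I have a dataset like the following:
My problem is with the anotation column. I want to change the list style shown above to something like this:
- ['flight………………]
I mean, the values in the anotation column come in several styles, and I just want to convert them to my own style :)
Examples: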
['flight_search.price_range'] ==> ['flight-search price range']
['flight_search.stops'] ==> ['flight-search stop']
['flight_search.date.depart_origin'] ==> ['flight-search date depart origin']
After this conversion, the result should completely replace the old anotation column :)
Sample of the anotation column:
anotation
['flight_search.destination1']
['flight_search.origin']
['flight_search.destination1']
['flight_search.type']
['flight_search.type']
['flight_search.airline']
['flight_search.stops']
['flight_search.stops']
['flight_search.price_range']
['flight_search.price_range']
['flight1_detail.from.time']
['flight_search.date.depart_origin']
annotation = [['flight_search.destination1'],
['flight_search.origin'],
['flight_search.destination1'] ,
['flight_search.type'] ,
['flight_search.type'] ,
['flight_search.airline'],
['flight_search.stops'] ,
['flight_search.stops'] ,
['flight_search.price_range'] ,
['flight_search.price_range'] ,
['flight1_detail.from.time'] ,
['flight_search.date.depart_origin']]
empty = []
for i in annotation:
    # rewrite the single string held in each one-element list
    empty.append([i[0].replace("_", "-").replace(".", " ")])
Output:
[['flight-search destination1'],
['flight-search origin'],
['flight-search destination1'],
['flight-search type'],
['flight-search type'],
['flight-search airline'],
['flight-search stops'],
['flight-search stops'],
['flight-search price-range'],
['flight-search price-range'],
['flight1-detail from time'],
['flight-search date depart-origin']]
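For a DataFrame:
# for dataframe: assign the result back so it replaces the old column
df["annotation"] = df["annotation"].apply(lambda x: [x[0].replace("_", "-").replace(".", " ")])
I believe this should solve the problem, provided there are no typos in it.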
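One caveat if the column really comes from a CSV file: pandas reads a cell such as ['flight_search.stops'] as a plain string rather than a list, so x[0] would be the character '[' and not the label. A minimal sketch of one way around that, assuming the column is named anotation as in the question and that every cell is a valid Python list literal. The in-memory CSV text and the helper name restyle are made up for illustration:

```python
import ast
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file.
csv_text = """anotation
['flight_search.origin']
['flight_search.price_range']
['flight1_detail.from.time']
"""

df = pd.read_csv(io.StringIO(csv_text))

def restyle(cell):
    # Parse the text "['a_b.c']" into a real list, then rewrite each item.
    items = ast.literal_eval(cell)
    return [item.replace("_", "-").replace(".", " ") for item in items]

# Overwrite the old column with the converted lists.
df["anotation"] = df["anotation"].apply(restyle)
print(df["anotation"].tolist())
```

ast.literal_eval only evaluates literals, so it is a safer choice than eval for text read from a file.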
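Python's string replace() method may be an option. However, I see that you want the first underscore to become a hyphen (-) and the second underscore to become a space. I think this can be solved if you dig into regular expressions in Python. To keep things simple, this is what I have so far: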
mystring = 'flight_search.price_range'
mystring = mystring.replace("_", "-")
mystring = mystring.replace(".", " ")
See https://www.w3schools.com/python/ref_string_replace.asp
Edited code:
mystring = 'flight_search.price_range'
mystring = mystring.replace("_", "-",1)
mystring = mystring.replace(".", " ")
mystring = mystring.replace("_", " ")
print(mystring)
Result of the edited code:
flight-search price range
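The same steps can be wrapped in a function and applied to every one-element list from the question. A minimal sketch: the function name restyle_label is hypothetical, and using re.sub for the last two replacements is just one way to bring in the regular expressions mentioned earlier:

```python
import re

def restyle_label(label):
    # First underscore becomes a hyphen; the third argument limits
    # str.replace to a single replacement.
    label = label.replace("_", "-", 1)
    # Every remaining dot or underscore becomes a space.
    return re.sub(r"[._]", " ", label)

annotation = [['flight_search.price_range'],
              ['flight1_detail.from.time'],
              ['flight_search.date.depart_origin']]

restyled = [[restyle_label(item[0])] for item in annotation]
print(restyled)
```

This relies on the first underscore always being the one that should turn into a hyphen, which holds for the sample values shown in the question.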
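What you need to think about is which changes you have to make to the strings in the annotation column. With the df.replace() function you can apply simple changes across all columns.
However, if you need more control, you will need the apply() function. With it, you can use a custom function to specify exactly what should happen to each string in the column.
For example, you could start with the approach below and adjust the custom function until you get the result you want: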
import pandas as pd
annotation = ['flight_search.destination1',
'flight_search.origin',
'flight_search.destination1',
'flight_search.type' ,
'flight_search.type' ,
'flight_search.airline',
'flight_search.stops' ,
'flight_search.stops' ,
'flight_search.price_range' ,
'flight_search.price_range' ,
'flight1_detail.from.time' ,
'flight_search.date.depart_origin']
df = pd.DataFrame({"annotation":annotation})
def custom_func(string):
    # replace the underscore after the initial word with a hyphen
    string = string.replace("flight_", "flight-")
    string = string.replace("flight1_", "flight1-")  # is this a typo?
    # replace the other punctuation marks with a space
    for punctuation in ['_', '.']:
        string = string.replace(punctuation, " ")
    # return the formatted string
    return string

# apply the custom function to the annotation column
df["annotation"] = df["annotation"].apply(custom_func)
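If the column holds plain strings, as in the DataFrame above, the same result can probably be reached without a custom function by chaining the .str accessor. A sketch, assuming a pandas version in which Series.str.replace accepts the n and regex arguments, and assuming the first underscore is always the one that should become a hyphen:

```python
import pandas as pd

df = pd.DataFrame({"annotation": ["flight_search.price_range",
                                  "flight1_detail.from.time",
                                  "flight_search.date.depart_origin"]})

df["annotation"] = (
    df["annotation"]
    .str.replace("_", "-", n=1, regex=False)  # first underscore -> hyphen
    .str.replace(r"[._]", " ", regex=True)    # remaining dots/underscores -> space
)
print(df["annotation"].tolist())
```

Unlike custom_func, this version does not depend on the prefix being flight or flight1, so it may be easier to reuse if other prefixes show up in the data.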