Change the style of a pandas CSV column's values in Python

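I have a dataset like this:

My problem is with the anotation column. I want to change the list style shown above to something like this:

I mean, the values in the anotation column come in several styles, and I just want to convert them to my own style :)
Example: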

['flight_search.price_range'] ==> ['flight-search price range']  
['flight_search.stops'] ==> ['flight-search stop']  
['flight_search.date.depart_origin'] ==> ['flight-search date depart origin']  

After the conversion, the result should completely replace the old anotation column :)


Sample of the anotation column:

anotation
['flight_search.destination1']  
['flight_search.origin']  
['flight_search.destination1']  
['flight_search.type']  
['flight_search.type']  
['flight_search.airline']  
['flight_search.stops']  
['flight_search.stops']  
['flight_search.price_range']  
['flight_search.price_range']  
['flight1_detail.from.time']  
['flight_search.date.depart_origin']  
annotation = [
    ['flight_search.destination1'],
    ['flight_search.origin'],
    ['flight_search.destination1'],
    ['flight_search.type'],
    ['flight_search.type'],
    ['flight_search.airline'],
    ['flight_search.stops'],
    ['flight_search.stops'],
    ['flight_search.price_range'],
    ['flight_search.price_range'],
    ['flight1_detail.from.time'],
    ['flight_search.date.depart_origin'],
]

# every "_" becomes "-" and every "." becomes " "
empty = []
for i in annotation:
    empty.append([i[0].replace("_", "-").replace(".", " ")])

Output

[['flight-search destination1'],
 ['flight-search origin'],
 ['flight-search destination1'],
 ['flight-search type'],
 ['flight-search type'],
 ['flight-search airline'],
 ['flight-search stops'],
 ['flight-search stops'],
 ['flight-search price-range'],
 ['flight-search price-range'],
 ['flight1-detail from time'],
 ['flight-search date depart-origin']]

DataFrame

# for dataframe

# assign the result back so it replaces the old column
df["annotation"] = df["annotation"].apply(lambda x: [x[0].replace("_", "-").replace(".", " ")])

I believe this should solve the problem, provided there are no typos in it.
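One caveat if the column comes from `pd.read_csv`: a cell that looks like `['flight_search.stops']` is usually loaded as a plain string rather than a list, so `x[0]` would be the character `[`. A minimal sketch, assuming that is the case here, parses each cell with `ast.literal_eval` first. The inline CSV text is a made-up stand-in for the real file, and the column name follows the spelling used in the question.

```python
import ast
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file; each cell is the text of a list.
csv_text = """anotation
['flight_search.destination1']
['flight_search.price_range']
['flight1_detail.from.time']
"""

df = pd.read_csv(io.StringIO(csv_text))

# Turn the text "['a.b']" into the real Python list ['a.b'].
df["anotation"] = df["anotation"].apply(ast.literal_eval)

# Same replacement as in the answer, applied to the first element of each list.
df["anotation"] = df["anotation"].apply(
    lambda x: [x[0].replace("_", "-").replace(".", " ")]
)

print(df["anotation"].tolist())
```

If the cells are already real lists (for example because the DataFrame was built in memory), the `ast.literal_eval` step is unnecessary and should be skipped.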

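Python's string replace() method may be an option. However, I see that you want the first underscore to become - and the second underscore to become a space. I think this could be solved by digging into regular expressions in Python. To keep things simple, this is what I have so far: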

mystring = 'flight_search.price_range'
mystring = mystring.replace("_", "-")
mystring = mystring.replace(".", " ")

https://www.w3schools.com/python/ref_string_replace.asp

Edited code:

mystring = 'flight_search.price_range'
mystring = mystring.replace("_", "-",1)
mystring = mystring.replace(".", " ")
mystring = mystring.replace("_", " ")
print(mystring)

Result of the edited code: flight-search price range
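For comparison, the regular-expression route mentioned in this answer could look roughly like the sketch below. The function name `restyle` is made up for illustration. It assumes the rule is "first underscore becomes a hyphen, every remaining underscore or dot becomes a space", and it does not reproduce the `stops` ==> `stop` change from the question, which may simply be a typo there.

```python
import re


def restyle(label: str) -> str:
    """Hypothetical helper: 'flight_search.price_range' -> 'flight-search price range'."""
    # Replace only the first underscore with a hyphen.
    label = re.sub(r"_", "-", label, count=1)
    # Replace every remaining underscore or dot with a space.
    return re.sub(r"[_.]", " ", label)


samples = [
    "flight_search.price_range",
    "flight_search.date.depart_origin",
    "flight1_detail.from.time",
]
results = [restyle(s) for s in samples]
print(results)
```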

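What you need to think about is which changes you need to make to the strings in the annotation column. With the df.replace() function you can apply simple changes to all columns.

However, if you need more control, you will need the apply() function on the column. With it, you can use a custom function to specify exactly what should happen to each string in the column.

For example, you could start with the approach below and adjust the custom function until you get the result you want: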
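As a rough illustration of the simple case, `DataFrame.replace` with `regex=True` rewrites matching substrings in every string cell. The two-row frame here is only sample data, and note that this touches all columns and turns every underscore and dot into a space, so it cannot treat the first underscore differently.

```python
import pandas as pd

# Small sample frame, for illustration only.
df_simple = pd.DataFrame(
    {"annotation": ["flight_search.stops", "flight_search.price_range"]}
)

# With regex=True the pattern is matched inside each string cell,
# so every "_" and "." is replaced by a space.
df_simple = df_simple.replace(r"[_.]", " ", regex=True)

print(df_simple["annotation"].tolist())
```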

import pandas as pd

annotation = ['flight_search.destination1',  
'flight_search.origin',
'flight_search.destination1',
'flight_search.type' ,
'flight_search.type'  ,
'flight_search.airline',  
'flight_search.stops'  ,
'flight_search.stops'  ,
'flight_search.price_range' ,
'flight_search.price_range' ,
'flight1_detail.from.time' ,
'flight_search.date.depart_origin']

df = pd.DataFrame({"annotation":annotation})

def custom_func(string):
    # replace the initial word
    string = string.replace("flight_", "flight-")
    string = string.replace("flight1_", "flight1-") # is this a typo?
    
    # replace the other punctuation marks with a space
    for punctuation in ['_', '.']:
        string = string.replace(punctuation, " ")
    
    # return the formatted string
    return string

# apply the custom function to the annotation column
df["annotation"] = df["annotation"].apply(custom_func)
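
If the column holds plain strings, as in the example above, the same result can probably be reached without a custom function by chaining pandas' `.str.replace`. This is only a sketch: it assumes the rule is "first underscore becomes a hyphen, everything else becomes a space", and the name `df_alt` is used so it does not clash with the frame above.

```python
import pandas as pd

df_alt = pd.DataFrame(
    {
        "annotation": [
            "flight_search.price_range",
            "flight1_detail.from.time",
            "flight_search.date.depart_origin",
        ]
    }
)

df_alt["annotation"] = (
    df_alt["annotation"]
    # n=1 limits the literal replacement to the first underscore in each string
    .str.replace("_", "-", n=1, regex=False)
    # any remaining underscore or dot becomes a space
    .str.replace(r"[_.]", " ", regex=True)
)

print(df_alt["annotation"].tolist())
```

Unlike `custom_func`, this does not hard-code the `flight_` and `flight1_` prefixes, so it would also cover other prefixes that follow the same pattern.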