Change the value style of a pandas CSV column in Python

Question

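I have a dataset like the following:

My problem is with the anotation column. I want to change the list style shown above into something like this:

['flight………………]

What I mean is that the anotation column values come in several styles, and I just want to change them to my own style :)
Examples: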

['flight_search.price_range'] ==> ['flight-search price range']  
['flight_search.stops'] ==> ['flight-search stop']  
['flight_search.date.depart_origin'] ==> ['flight-search date depart origin']

After this conversion, the result should completely replace the old anotation column :)

Sample of the anotation column:

anotation
['flight_search.destination1']  
['flight_search.origin']  
['flight_search.destination1']  
['flight_search.type']  
['flight_search.type']  
['flight_search.airline']  
['flight_search.stops']  
['flight_search.stops']  
['flight_search.price_range']  
['flight_search.price_range']  
['flight1_detail.from.time']  
['flight_search.date.depart_origin']

Answer 1

annotation = [['flight_search.destination1'],  
['flight_search.origin'],
['flight_search.destination1']  ,
['flight_search.type']  ,
['flight_search.type']  ,
['flight_search.airline'],  
['flight_search.stops']  ,
['flight_search.stops']  ,
['flight_search.price_range']  ,
['flight_search.price_range']  ,
['flight1_detail.from.time']  ,
['flight_search.date.depart_origin']]

empty = []
for i in annotation:
    empty.append([i[0].replace("_","-").replace("."," ")])

Output

[['flight-search destination1'],
 ['flight-search origin'],
 ['flight-search destination1'],
 ['flight-search type'],
 ['flight-search type'],
 ['flight-search airline'],
 ['flight-search stops'],
 ['flight-search stops'],
 ['flight-search price-range'],
 ['flight-search price-range'],
 ['flight1-detail from time'],
 ['flight-search date depart-origin']]

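DataFrame

# for a DataFrame: assign the result back so it overwrites the old column

df["annotation"] = df["annotation"].apply(lambda x: [x[0].replace("_","-").replace("."," ")])

I believe this should solve the problem, provided there are no typos in it.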
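Two details of the snippet above may need attention. It turns every underscore into a hyphen, so price_range becomes price-range rather than the price range asked for in the question. Also, when the column is read back from a CSV file, each cell is usually the text "['flight_search.stops']" rather than a real list, so x[0] would be the character "[". A minimal sketch that handles both, assuming the cells look like the samples in the question (restyle_cell is a hypothetical helper name, not part of pandas):

```python
import ast

def restyle_cell(cell):
    # Assumption: a cell loaded from CSV is text such as "['flight_search.stops']";
    # ast.literal_eval turns that text back into a real Python list.
    items = ast.literal_eval(cell) if isinstance(cell, str) else cell
    restyled = []
    for item in items:
        item = item.replace("_", "-", 1)                 # first underscore -> hyphen
        item = item.replace("_", " ").replace(".", " ")  # the rest -> spaces
        restyled.append(item)
    return restyled

cells = ["['flight_search.price_range']", "['flight1_detail.from.time']"]
result = [restyle_cell(c) for c in cells]
print(result)
```

On a DataFrame, the same function could be passed to apply and assigned back, for example df["anotation"] = df["anotation"].apply(restyle_cell).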

Answer 2

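Python's string replace() method may be an option. However, I see that you want the first underscore to become - and the second underscore to become a space. I think this can be solved if you dig into regular expressions in Python. For simplicity, this is what I have so far: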

mystring = 'flight_search.price_range'
mystring = mystring.replace("_", "-")
mystring = mystring.replace(".", " ")

See https://www.w3schools.com/python/ref_string_replace.asp

Edited code:

mystring = 'flight_search.price_range'
mystring = mystring.replace("_", "-",1)
mystring = mystring.replace(".", " ")
mystring = mystring.replace("_", " ")
print(mystring)

Result of the edited code: flight-search price range
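The regular-expression route mentioned in this answer could look roughly like the sketch below. It uses re.sub with count=1 for the first underscore, then a character class for the remaining underscores and dots. The function name restyle is an assumption made for illustration.

```python
import re

def restyle(label):
    # Replace only the first underscore with a hyphen.
    label = re.sub(r"_", "-", label, count=1)
    # Replace every remaining underscore or dot with a space.
    return re.sub(r"[_.]", " ", label)

samples = [
    "flight_search.price_range",
    "flight1_detail.from.time",
    "flight_search.date.depart_origin",
]
converted = [restyle(s) for s in samples]
print(converted)
```

This gives the same result as the chained replace() calls, but keeps the "everything else becomes a space" rule in a single pattern.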

Answer 3

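What you need to think about is which changes you have to make to the strings in the annotation column. With the df.replace() function you can apply simple changes to all columns.

However, if you need more control, you will need the df.apply() function. With it, you can use a custom function to specify exactly what should happen to each string in the column.

For example, you could start with the approach below and adjust the custom function until you get the result you want: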

import pandas as pd

annotation = ['flight_search.destination1',  
'flight_search.origin',
'flight_search.destination1',
'flight_search.type' ,
'flight_search.type'  ,
'flight_search.airline',  
'flight_search.stops'  ,
'flight_search.stops'  ,
'flight_search.price_range' ,
'flight_search.price_range' ,
'flight1_detail.from.time' ,
'flight_search.date.depart_origin']

df = pd.DataFrame({"annotation":annotation})

def custom_func(string):
    # replace the initial word
    string = string.replace("flight_", "flight-")
    string = string.replace("flight1_", "flight1-") # is this a typo?
    
    # replace the other punctuation marks with a space
    for punctuation in ['_', '.']:
        string = string.replace(punctuation, " ")
    
    # return the formatted string
    return string

# apply the custom function to the annotation column
df["annotation"] = df["annotation"].apply(custom_func)
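As an alternative to a custom function, the same transformation can probably be written with pandas' vectorized string methods. This is a sketch under the assumption that the column holds plain strings, as in the DataFrame built above. It relies on the n parameter of Series.str.replace to limit the first replacement to one occurrence.

```python
import pandas as pd

df = pd.DataFrame({"annotation": [
    "flight_search.price_range",
    "flight1_detail.from.time",
    "flight_search.date.depart_origin",
]})

df["annotation"] = (
    df["annotation"]
    .str.replace("_", "-", n=1, regex=False)   # first underscore -> hyphen
    .str.replace(r"[_.]", " ", regex=True)     # remaining "_" and "." -> space
)
print(df["annotation"].tolist())
```

Unlike custom_func, this does not depend on the prefix being exactly flight_ or flight1_, so it also covers prefixes that are not listed in the sample data.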
