I want to pick the first 4 words of each row of a column and, based on the value, assign a new value to another newly created column using Python

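A sample of my dataset is shown below. I want to create a new column named 'Parking Type' and set its value to "Meter", "Ticket", or "Other" based on another column named 'Sign'. The 'Sign' column holds strings: some contain MTR, some contain TKT, and some contain neither. So I want to put "Meter" in the 'Parking Type' column if the row's 'Sign' value contains the string "MTR", and so on. I am doing something like this: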

pSignInfringe['Parking Type'] = pSignInfringe.Sign.apply(lambda x: "Meter" if x == "1P MTR M-SAT 7:30-19:30" or x == "1/2P MTR SAT 7:30-1930" else "Ticket")

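But that way it needs far too many or conditions. Is there a better way to do this? I am new to Python, so sorry if this is a beginner question. The DataFrame data is as follows: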

,Area Name,Street Name,Between Street 1,Between Street 2,Side Of Street,Street Marker,Arrival Time,Departure Time,Duration of Parking Event (in seconds),Sign,In Violation?,Street ID,Device ID,Month Number
8,City Square,FLINDERS STREET,SWANSTON STREET,RUSSELL STREET,3,1630N,2012-05-19 18:20:01,2012-05-19 19:19:58,3597,1/2P MTR SAT 7:30-1930,1,670,1123,5
10,Chinatown,RUSSELL STREET,Lt BOURKE STREET,BOURKE STREET,2,770E,2012-02-25 18:30:31,2012-02-25 21:02:36,9125,2P DIS M-SUN 0:00-23:59,1,1221,504,2
11,Princes Theatre,LONSDALE STREET,RUSSELL STREET,EXHIBITION STREET,1,C2858,2011-11-17 09:00:00,2011-11-17 10:41:06,6066,1P MTR M-SAT 7:30-19:30,1,894,1996,11
15,Southbank,COVENTRY STREET,DODDS STREET,WELLS STREET,4,9317S,2012-02-20 13:50:40,2012-02-20 16:33:33,9773,2P TKT A M-F 7:30-18:30,1,547,4054,2
28,Queensberry,VICTORIA STREET,KING STREET,HAWKE STREET,3,7642N,2012-02-15 11:32:34,2012-02-15 12:09:35,2221,1/4P M-SAT 7:30-18:30,1,1381,4001,2
30,Rialto,COLLINS STREET,KING STREET,WILLIAM STREET,3,2066N,2012-09-03 09:24:51,2012-09-03 10:45:41,4850,1/2P M-SAT 7:30-19:30,1,528,1290,9
45,Victoria Market,FRANKLIN STREET,QUEEN STREET,ELIZABETH STREET,1,C6628,2011-11-11 17:42:32,2011-11-11 19:50:44,7692,2P MTR M-SAT 7:30-20:30,1,681,2812,11
53,Hardware,LONSDALE STREET,QUEEN STREET,ELIZABETH STREET,1,C2942,2012-05-05 13:17:55,2012-05-05 14:59:35,6100,1P MTR M-SAT 7:30-19:30,1,894,2019,5
55,Hyatt,EXHIBITION STREET,Lt COLLINS STREET,COLLINS STREET,1,C364,2011-01-11 08:11:48,2011-01-11 16:48:39,31011,1P MTR M-SAT 7:30-19:30,1,647,243,1
56,Banks,QUEEN STREET,FLINDERS LANE,FLINDERS STREET,5,975W,2012-03-03 12:53:27,2012-03-03 14:06:27,4380,1P MTR M-SAT 7:30-19:30,1,1171,693,3

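If the desired 'Parking Type' value depends only on whether "MTR" is present, you may find this works better. It covers every case where MTR appears in the Sign field without having to hard-code all the possible values.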

pSignInfringe['Parking Type'] = pSignInfringe.Sign.apply(lambda x: "Meter" if 'MTR' in x else "Ticket")
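
The question also asks for "Ticket" and "Other", which the one-line lambda above does not tell apart. Here is a minimal sketch of a three-way version, assuming that signs containing "TKT" should become "Ticket" and anything else "Other". The helper name classify_sign is made up for illustration.

```python
def classify_sign(sign):
    """Map a sign description to a parking type (hypothetical helper)."""
    # Assumption: "MTR" marks a meter and "TKT" a ticket machine;
    # any sign with neither token is labelled "Other".
    if "MTR" in sign:
        return "Meter"
    if "TKT" in sign:
        return "Ticket"
    return "Other"


# Sample values taken from the Sign column in the question
signs = [
    "1/2P MTR SAT 7:30-1930",
    "2P TKT A M-F 7:30-18:30",
    "1/4P M-SAT 7:30-18:30",
]
parking_types = [classify_sign(s) for s in signs]
print(parking_types)
```

On the DataFrame from the question, the same helper could be passed to apply: pSignInfringe['Parking Type'] = pSignInfringe['Sign'].apply(classify_sign).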

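Try a for loop. You could also use a list comprehension, but I don't recommend it since you are just starting out with Python.

I wrote some code based on your description.

Take a look and let me know if it works.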

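# Collect one label per row, then assign the whole column at once
parking_types = []
for item in pSignInfringe['Sign']:
    if "MTR" in item:
        parking_types.append("Meter")
    else:
        parking_types.append("Ticket")
pSignInfringe['Parking Type'] = parking_types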
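
For reference, the list comprehension mentioned in this answer might look like the sketch below. The three-row DataFrame is only a stand-in for the real data, and the labels keep this answer's two-way Meter/Ticket split.

```python
import pandas as pd

# Small stand-in for the question's DataFrame (only the Sign column)
pSignInfringe = pd.DataFrame({
    "Sign": [
        "1P MTR M-SAT 7:30-19:30",
        "2P TKT A M-F 7:30-18:30",
        "1/2P M-SAT 7:30-19:30",
    ]
})

# Build one label per row in a single pass over the column
pSignInfringe["Parking Type"] = [
    "Meter" if "MTR" in item else "Ticket"
    for item in pSignInfringe["Sign"]
]
print(pSignInfringe)
```

Note that the last row has neither MTR nor TKT and still ends up as "Ticket" under this two-way rule.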

You can use .str.contains, which returns a boolean Series with the same index as the DataFrame, and then use that Series as an indexer:

# Rows whose Sign contains 'MTR' get 'Meter'; other rows are left as NaN
pSignInfringe.loc[
    pSignInfringe.Sign.str.contains('MTR'),
    'Parking Type'] = 'Meter'

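Note that pandas' .str.contains treats the pattern as a regular expression by default; pass regex=False to match it as a literal string.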

Using .str.contains with .loc avoids a generic apply call, which usually makes the code faster.
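
To cover all three values the question asks for without apply, one option is to combine .str.contains masks with numpy.select. This is a sketch under assumptions: the token "TKT" is assumed to mean "Ticket", "Other" is the fallback, and the small df below stands in for the real data.

```python
import numpy as np
import pandas as pd

# Stand-in data using Sign values from the question
df = pd.DataFrame({
    "Sign": [
        "2P MTR M-SAT 7:30-20:30",
        "2P TKT A M-F 7:30-18:30",
        "2P DIS M-SUN 0:00-23:59",
    ]
})

# Boolean masks; regex=False matches the tokens literally and
# na=False treats missing Sign values as "no match"
conditions = [
    df["Sign"].str.contains("MTR", regex=False, na=False),
    df["Sign"].str.contains("TKT", regex=False, na=False),
]
choices = ["Meter", "Ticket"]

# The first matching condition wins; rows matching none get the default
df["Parking Type"] = np.select(conditions, choices, default="Other")
print(df)
```

Every row gets a label this way, whereas a single .loc assignment leaves the non-matching rows as NaN.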