Why do I get different output when I put the 'years' and 'year' replacements in a different order in my code?
All I did was swap 'year' and 'years' between the first and second replace lines.
Here are the value counts of the original column:
10+ years 653
< 1 year 249
2 years 243
3 years 235
5 years 202
4 years 191
1 year 177
6 years 163
7 years 127
8 years 108
9 years 72
. 2
Name: Employment.Length, dtype: int64
First example ('years' on the first line, 'year' on the second):
import numpy as np
import pandas as pd

raw_data['Employment.Length'] = raw_data['Employment.Length'].str.replace('years',' ')
raw_data['Employment.Length'] = raw_data['Employment.Length'].str.replace('year',' ')
raw_data['Employment.Length'] = np.where(raw_data['Employment.Length'].str[:2]=='10',10,raw_data['Employment.Length'])
raw_data['Employment.Length'] = np.where(raw_data['Employment.Length'].str[0]=='<',0,raw_data['Employment.Length'])
raw_data['Employment.Length'] = pd.to_numeric(raw_data['Employment.Length'], errors = 'coerce')
Output:
10.0 653
0.0 249
2.0 243
3.0 235
5.0 202
4.0 191
1.0 177
6.0 163
7.0 127
8.0 108
9.0 72
Name: Employment.Length, dtype: int64
Second example ('year' on the first line, 'years' on the second):
raw_data_copy['Employment.Length'] = raw_data_copy['Employment.Length'].str.replace('year',' ')
raw_data_copy['Employment.Length'] = raw_data_copy['Employment.Length'].str.replace('years',' ')
raw_data_copy['Employment.Length'] = np.where(raw_data_copy['Employment.Length'].str[:2]=='10',10, raw_data_copy['Employment.Length'])
raw_data_copy['Employment.Length'] = np.where(raw_data_copy['Employment.Length'].str[0]=='<',0,raw_data_copy['Employment.Length'])
raw_data_copy['Employment.Length'] = pd.to_numeric(raw_data_copy['Employment.Length'], errors = 'coerce')
Output:
10.0 653
0.0 249
1.0 177
Name: Employment.Length, dtype: int64
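One more thing: when I comment out my second line (the one with 'year'), I get the same output as the first example, and when I comment out my second line (the one with 'years'), I get the same output as the second example.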
Third example:
raw_data_copy['Employment.Length'] = raw_data_copy['Employment.Length'].str.replace('years',' ')
#raw_data_copy['Employment.Length'] = raw_data_copy['Employment.Length'].str.replace('year',' ')
raw_data_copy['Employment.Length'] = np.where(raw_data_copy['Employment.Length'].str[:2]=='10',10, raw_data_copy['Employment.Length'])
raw_data_copy['Employment.Length'] = np.where(raw_data_copy['Employment.Length'].str[0]=='<',0,raw_data_copy['Employment.Length'])
raw_data_copy['Employment.Length'] = pd.to_numeric(raw_data_copy['Employment.Length'], errors = 'coerce')
Output:
10.0 653
0.0 249
2.0 243
3.0 235
5.0 202
4.0 191
6.0 163
7.0 127
8.0 108
9.0 72
Name: Employment.Length, dtype: int64
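If you first replace 'year' with ' ', then 'years' becomes ' s', and that 's' is no longer matched by your subsequent replacement of 'years'. A value like '2 years' therefore ends up as '2  s', which is not numeric, so pd.to_numeric(..., errors='coerce') turns it into NaN and it drops out of the counts. Only the rows caught by your np.where checks ('10...' and '<...') and '1 year' survive.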
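To see the effect of the order directly, here is a minimal sketch on two sample values taken from the column listing (the variable names are only for illustration). It applies both replacement orders and then the same to_numeric(..., errors='coerce') conversion; the extra .str.strip() is an assumption added so the result does not depend on how surrounding whitespace is parsed.

```python
import pandas as pd

# Two sample values in the same format as the 'Employment.Length' column
s = pd.Series(['2 years', '1 year'])

# Order of the second example: 'year' is replaced first, so 'years' -> ' s'
year_first = s.str.replace('year', ' ', regex=False).str.replace('years', ' ', regex=False)

# Order of the first example: 'years' is replaced before 'year' can split it
years_first = s.str.replace('years', ' ', regex=False).str.replace('year', ' ', regex=False)

print(year_first.tolist())   # ['2  s', '1  ']
print(years_first.tolist())  # ['2  ', '1  ']

# The leftover 's' makes '2  s' non-numeric, so errors='coerce' turns it into NaN
print(pd.to_numeric(year_first.str.strip(), errors='coerce').tolist())   # [nan, 1.0]
print(pd.to_numeric(years_first.str.strip(), errors='coerce').tolist())  # [2, 1]
```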
Instead of several successive replacements, use a single replacement with an optional s: 'year[s]?' (equivalently 'years?').
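import pandas as pd
s = pd.Series(['year', 'years', 'foo'])
# regex=True is required on pandas >= 2.0, where str.replace matches literally by default
s.str.replace('year[s]?', ' ', regex=True)
# 0
# 1
# 2    foo
# dtype: object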
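Applied to the whole column, the same idea can replace the chain of replace and np.where calls. This is a sketch assuming the values look exactly like the listing in the question ('10+ years', '< 1 year', 'N years', '.'); clean_employment_length is a hypothetical helper name, and mapping '10+' and '< 1' through Series.replace is a swapped-in alternative to the question's np.where checks, not the asker's method.

```python
import pandas as pd


def clean_employment_length(col: pd.Series) -> pd.Series:
    """Hypothetical helper: turn labels like '2 years' into numbers."""
    # One regex removes ' year' and ' years' alike, so there is no ordering problem
    cleaned = col.str.replace(r'\s*years?', '', regex=True).str.strip()
    # Map the two special labels, as the np.where calls in the question do
    cleaned = cleaned.replace({'10+': '10', '< 1': '0'})
    # Anything still non-numeric (e.g. '.') becomes NaN
    return pd.to_numeric(cleaned, errors='coerce')


sample = pd.Series(['10+ years', '< 1 year', '2 years', '1 year', '.'])
print(clean_employment_length(sample).tolist())  # [10.0, 0.0, 2.0, 1.0, nan]
```

On the real data this would be called as raw_data['Employment.Length'] = clean_employment_length(raw_data['Employment.Length']).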