Convert string variables into ints in a dataset

Question

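I'm trying to convert the values in a specific column of a dataset from strings to integers. I tried using a for loop, and although the loop does seem to iterate over the data, it doesn't convert any of the values. I'm sure I'm making a very basic mistake, but I'm new to this and can't figure it out.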

I downloaded a data file from https://www.kaggle.com/datasets/majunbajun/himalayan-climbing-expeditions and then went on to process the data so that I could run statistical analysis on it.

This is the start of the code

#import pandas
import pandas as pd
#import expeditions as csv file (raw string so \f and \t are not read as escape sequences)
exp = pd.read_csv(r'C:\file\path\to\expeditions.csv')
#create subset for success vs failure
exp_win_v_fail = exp[['termination_reason', 'basecamp_date', 'season']]
#drop successes in dispute
exp_win_v_fail = exp_win_v_fail[(exp_win_v_fail['termination_reason'] != 'Success (claimed)') & (exp_win_v_fail['termination_reason'] != 'Attempt rumoured')]

This is the part I can't figure out

#recode termination reason to be binary
for element in exp_win_v_fail['termination_reason']:
   if element == 'Success (main peak)':
     element = 1
   elif element == 'Success (subpeak)':
     element = 1
   else:
     element = 0

Any help is much appreciated.

Answer 1

To replace every value that starts with 'Success' with 1 and all other values with 0:

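from pandas import read_csv

# Regex matching values that start with 'Success' (recoded to 1)
RE = '^Success.*$'
# Regex matching values that do not contain 'Success' (recoded to 0)
NRE = '^((?!Success).)*$'
TR = 'termination_reason'
BD = 'basecamp_date'
SE = 'season'

data = read_csv('expeditions.csv')

# .copy() gives an independent frame, so the assignment below does not
# trigger pandas' SettingWithCopyWarning
exp_win_v_fail = data[[TR, BD, SE]].copy()

# enumerate pairs NRE with 0 and RE with 1
for v, re_ in enumerate((NRE, RE)):
    exp_win_v_fail[TR] = exp_win_v_fail[TR].replace(to_replace=re_, value=v, regex=True)

for e in exp_win_v_fail[TR]:
    print(e)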
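
As a possible alternative, the same recoding can be sketched without regular expressions. The example below uses a small hand-made DataFrame in place of expeditions.csv (the sample rows and the new column name `success` are assumptions for illustration, not taken from the dataset). It also shows why the loop in the question has no effect: assigning to `element` only rebinds the loop variable and never writes back to the column.

```python
import pandas as pd

# Hypothetical sample rows standing in for expeditions.csv (illustrative only)
df = pd.DataFrame({
    "termination_reason": [
        "Success (main peak)",
        "Success (subpeak)",
        "Bad weather (storms, high winds)",
        "Success (claimed)",
        "Attempt rumoured",
    ]
})

# Drop the disputed outcomes, as in the question
disputed = ["Success (claimed)", "Attempt rumoured"]
df = df[~df["termination_reason"].isin(disputed)].copy()

# Rebinding the loop variable changes only the local name, not the column
for element in df["termination_reason"]:
    element = 1
unchanged = df["termination_reason"].tolist()  # still the original strings

# Vectorized recode: boolean mask from startswith, then cast to int
df["success"] = df["termination_reason"].str.startswith("Success").astype(int)
print(df)
```

Writing the result to a new column keeps the original text available for checking; assigning back to `termination_reason` instead would overwrite it in place.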
