Convert string variables into ints in a dataset
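I am trying to convert the values in one column of a dataset from strings to integers. I tried a for loop, and although the loop does seem to iterate over the data, it does not convert any of the values. I'm sure I'm making a very basic mistake, but I'm new to this and can't work out what it is.
I downloaded a data file from https://www.kaggle.com/datasets/majunbajun/himalayan-climbing-expeditions and then went on to process the data so that I could run a statistical analysis on it.
Here is the start of the code: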
# import pandas
import pandas as pd
# import expeditions as csv file (raw string so \f and \t are not read as escape sequences)
exp = pd.read_csv(r'C:\file\path\to\expeditions.csv')
# create subset for success vs failure
exp_win_v_fail = exp[['termination_reason', 'basecamp_date', 'season']]
# drop successes in dispute
exp_win_v_fail = exp_win_v_fail[(exp_win_v_fail['termination_reason'] != 'Success (claimed)') & (exp_win_v_fail['termination_reason'] != 'Attempt rumoured')]
This is the part I can't figure out:
# recode termination reason to be binary
for element in exp_win_v_fail['termination_reason']:
    if element == 'Success (main peak)':
        element = 1
    elif element == 'Success (subpeak)':
        element = 1
    else:
        element = 0
Any help is much appreciated.
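To replace every value that starts with 'Success' with 1, and every other value with 0:
from pandas import read_csv
# matches values that start with 'Success'
RE = '^Success.*$'
# matches values that do not contain 'Success' anywhere
NRE = '^((?!Success).)*$'
TR = 'termination_reason'
BD = 'basecamp_date'
SE = 'season'
data = read_csv('expeditions.csv')
# .copy() so the assignment below works on an independent frame rather than a slice of data
exp_win_v_fail = data[[TR, BD, SE]].copy()
# enumerate pairs NRE with 0 and RE with 1
for v, re_ in enumerate((NRE, RE)):
    exp_win_v_fail[TR] = exp_win_v_fail[TR].replace(to_replace=re_, value=v, regex=True)
# print the recoded column to check the result
for e in exp_win_v_fail[TR]:
    print(e)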
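As for why the loop in the question changes nothing: `element = 1` only rebinds the local name `element` on each pass; it never writes back into the DataFrame. Below is a minimal sketch of an alternative that avoids both the loop and the regexes, assuming only the two success labels named in the question should count as 1. The sample rows and the new column name `success` are made up for illustration and are not taken from the real file.

```python
import pandas as pd

# Hypothetical stand-in for expeditions.csv; the real file has many more rows and columns.
exp = pd.DataFrame({
    'termination_reason': [
        'Success (main peak)',
        'Success (subpeak)',
        'Success (claimed)',
        'Attempt rumoured',
        'Bad weather',
    ],
    'season': ['Spring', 'Autumn', 'Spring', 'Winter', 'Autumn'],
})

# Drop the disputed outcomes, as in the question; .copy() gives an independent frame.
disputed = ['Success (claimed)', 'Attempt rumoured']
exp_win_v_fail = exp[~exp['termination_reason'].isin(disputed)].copy()

# Rebinding the loop variable only changes the local name, not the column.
for element in exp_win_v_fail['termination_reason']:
    element = 1
unchanged = exp_win_v_fail['termination_reason'].tolist()

# Vectorized recode: isin gives True/False per row, astype(int) turns that into 1/0.
successes = ['Success (main peak)', 'Success (subpeak)']
exp_win_v_fail['success'] = exp_win_v_fail['termination_reason'].isin(successes).astype(int)

print(exp_win_v_fail)
```

Writing the result to a separate column keeps the original labels available for checking; assigning to `exp_win_v_fail['termination_reason']` instead would overwrite them in place.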