Filling NaN values with binary digits
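Hello, I have some data in which the "Sex" column lists each person as Male or Female. When I load this data into Google Colab, every row in the "Sex" column shows NaN. Is there a way to make this column represent Male as 0 and Female as 1? I have tried using the replace function, but I keep getting the same error shown in the picture. Any help?
[Images: Code/Error, Data]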
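You are trying to replace string values with integers. You should instead write
df['sex'] = df['sex'].replace(to_replace=["Male","Female"], value=["0", "1"])
and then, if you want to go this route, convert the result to integers with something like
df['sex'] = df['sex'].astype(int)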
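The two steps above can be tried on a tiny frame. This is a minimal sketch, assuming the column is named `sex` and contains only the labels Male and Female; the sample rows are made up for illustration.

```python
import pandas as pd

# Made-up sample data; the column name `sex` follows the answer above.
df = pd.DataFrame({"sex": ["Male", "Female", "Female", "Male"]})

# Step 1: swap the labels for digit strings and assign the result back.
df["sex"] = df["sex"].replace(to_replace=["Male", "Female"], value=["0", "1"])

# Step 2: convert the digit strings to integers.
# astype(int) raises if a NaN or an unexpected label is still present.
df["sex"] = df["sex"].astype(int)

print(df["sex"].tolist())
```

If the column still contains NaN, the cast in step 2 fails, so missing values have to be handled first.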
# import pandas library
import pandas as pd
data = pd.read_csv(file)
# create a mapping dict
gender = {'male': 1,'female': 2}
# traverse the Gender column and write
# the mapped value wherever the key matches
data.Gender = [gender[item] for item in data.Gender]
print(data)
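Note that a plain `gender[item]` lookup raises `KeyError` for NaN or for a label with different casing or stray spaces. A hedged variant of the same list-comprehension idea is sketched below; the helper `encode` and the sentinel `-1` are hypothetical names and values chosen for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Made-up sample data, including a NaN and inconsistent casing/spacing.
data = pd.DataFrame({"Gender": ["male", "Female ", np.nan, "female"]})
gender = {"male": 1, "female": 2}

def encode(item, default=-1):
    """Look up a label in `gender`; return `default` for NaN or unknown labels."""
    if isinstance(item, str):
        return gender.get(item.strip().lower(), default)
    return default

data["Gender"] = [encode(item) for item in data["Gender"]]
print(data)
```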
I reproduced your sample data and explained in the comments how to parse it to get the desired result. Hope it helps.
#!/home/Karn_python3/bin/python
from __future__ import (absolute_import, division, print_function)
import pandas as pd
import numpy as np
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('max_colwidth', None)
pd.set_option('expand_frame_repr', False)
# Read CSV and create dataframe.
df = pd.read_csv('adult_test.csv')
# It appears your string values might have spaces around them, so strip
# them first to avoid any mapping/processing issues with the data.
df = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
# Create a dictionary and map it onto the desired column, which is easy and
# faster than replace.
m = {'Male': 0, 'Female': 1}
# There may be NaN values, so fill them with an int value of your choice
# (0 is used here) and convert the dtype to int; otherwise the column
# comes out as float.
df['Sex'] = df['Sex'].map(m).fillna(0).astype(int)
print(df.head(20))
Resulting output:
Age Workclass fnlwgt Education Education_Num Martial_Status Occupation Relationship Race Sex Capital_Gain Capital_Loss Hours_per_week Country Target
0 |1x3 Cross validator NaN NaN NaN NaN NaN NaN NaN NaN 0 NaN NaN NaN NaN NaN
1 25 Private 226802.0 11th 7.0 Never-married Machine-op-inspct Own-child Black 0 0.0 0.0 40.0 United-States <=50K.
2 38 Private 89814.0 HS-grad 9.0 Married-civ-spouse Farming-fishing Husband White 0 0.0 0.0 50.0 United-States <=50K.
3 28 Local-gov 336951.0 Assoc-acdm 12.0 Married-civ-spouse Protective-serv Husband White 0 0.0 0.0 40.0 United-States >50K.
4 44 Private 160323.0 Some-college 10.0 Married-civ-spouse Machine-op-inspct Husband Black 0 7688.0 0.0 40.0 United-States >50K.
5 18 NaN 103497.0 Some-college 10.0 Never-married NaN Own-child White 1 0.0 0.0 30.0 United-States <=50K.
6 34 Private 198693.0 10th 6.0 Never-married Other-service Not-in-family White 0 0.0 0.0 30.0 United-States <=50K.
7 29 NaN 227026.0 HS-grad 9.0 Never-married NaN Unmarried Black 0 0.0 0.0 40.0 United-States <=50K.
8 63 Self-emp-not-inc 104626.0 Prof-school 15.0 Married-civ-spouse Prof-specialty Husband White 0 3103.0 0.0 32.0 United-States >50K.
9 24 Private 369667.0 Some-college 10.0 Never-married Other-service Unmarried White 1 0.0 0.0 40.0 United-States <=50K.
10 55 Private 104996.0 7th-8th 4.0 Married-civ-spouse Craft-repair Husband White 0 0.0 0.0 10.0 United-States <=50K.
11 65 Private 184454.0 HS-grad 9.0 Married-civ-spouse Machine-op-inspct Husband White 0 6418.0 0.0 40.0 United-States >50K.
12 36 Federal-gov 212465.0 Bachelors 13.0 Married-civ-spouse Adm-clerical Husband White 0 0.0 0.0 40.0 United-States <=50K.
13 26 Private 82091.0 HS-grad 9.0 Never-married Adm-clerical Not-in-family White 1 0.0 0.0 39.0 United-States <=50K.
14 58 NaN 299831.0 HS-grad 9.0 Married-civ-spouse NaN Husband White 0 0.0 0.0 35.0 United-States <=50K.
15 48 Private 279724.0 HS-grad 9.0 Married-civ-spouse Machine-op-inspct Husband White 0 3103.0 0.0 48.0 United-States >50K.
16 43 Private 346189.0 Masters 14.0 Married-civ-spouse Exec-managerial Husband White 0 0.0 0.0 50.0 United-States >50K.
17 20 State-gov 444554.0 Some-college 10.0 Never-married Other-service Own-child White 0 0.0 0.0 25.0 United-States <=50K.
18 43 Private 128354.0 HS-grad 9.0 Married-civ-spouse Adm-clerical Wife White 1 0.0 0.0 30.0 United-States <=50K.
19 37 Private 60548.0 HS-grad 9.0 Widowed Machine-op-inspct Unmarried White 1 0.0 0.0 20.0 United-States <=50K.
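The approach above can be sketched on a small frame to show why stripping matters and what filling with 0 implies. The rows below are made up; the leading spaces are an assumption about how the CSV values might look. Using the nullable `Int64` dtype instead of `fillna(0)` is a swapped-in alternative, not what the answer does: it keeps missing rows as missing rather than counting them as Male.

```python
import numpy as np
import pandas as pd

# Made-up rows; note the leading space in front of each label.
df = pd.DataFrame({"Sex": pd.Series([" Male", " Female", np.nan, " Male"], dtype=object)})
m = {"Male": 0, "Female": 1}

# Without stripping, no key matches, so every row maps to NaN.
unstripped = df["Sex"].map(m)

# Strip, map, then cast to the nullable integer dtype so that
# missing values stay <NA> instead of being turned into 0.
encoded = df["Sex"].str.strip().map(m).astype("Int64")

print(unstripped.isna().all())
print(encoded)
```

Whether to fill with 0 or keep the gaps depends on how the column is used later; filling is simpler, but it makes missing rows indistinguishable from Male.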
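Just to organize this better: since we also have NaN values, it is better to include them in the dict, e.g. m = {'Male': 0, 'Female': 1, np.nan: 0}, so that we can map everything in one step instead of using fillna afterwards.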
df = pd.read_csv('adult_test.csv')
df = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
m = {'Male': 0, 'Female': 1, np.nan: 0}
df['Sex'] = df['Sex'].map(m)
print(df.head(20))
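A minimal sketch of the NaN-in-the-dict idea on a made-up object-dtype Series follows. It also shows that a label absent from the dict still comes out as NaN, so the dict has to cover every value that occurs.

```python
import numpy as np
import pandas as pd

m = {"Male": 0, "Female": 1, np.nan: 0}

# Made-up data with one missing value.
sex = pd.Series(["Male", "Female", np.nan, "Female"], dtype=object)
mapped = sex.map(m)
print(mapped.tolist())

# A label that is not a key of the dict is not mapped and becomes NaN.
other = pd.Series(["Male", "unknown"], dtype=object).map(m)
print(other.isna().tolist())
```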
Another solution using replace:
Just use replace, again with a dict ...
df = pd.read_csv('adult_test.csv')
df = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
m = {'Male': 0, 'Female': 1, np.nan: 0}
df = df.replace({'Sex': m})
print(df.head(20))
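The nested-dict form of `replace` can be sketched as follows on made-up rows. The explicit `astype(int)` at the end is an addition of this sketch, on the assumption that, depending on the pandas version, `replace` may leave the column with object dtype.

```python
import numpy as np
import pandas as pd

# Made-up rows; only the Sex column is targeted by the nested dict.
df = pd.DataFrame({
    "Sex": pd.Series(["Male", "Female", np.nan], dtype=object),
    "Age": [25, 38, 28],
})
m = {"Male": 0, "Female": 1, np.nan: 0}

# The outer key selects the column, the inner dict gives old -> new values.
df = df.replace({"Sex": m})

# Cast explicitly rather than relying on replace to infer an integer dtype.
df["Sex"] = df["Sex"].astype(int)
print(df)
```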
See @jpp's answer here for reference.