Why have my data values been replaced by "NaN" after using qcut?
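I am working with a pandas DataFrame of 9,000 rows and 6 columns. At this point, I am trying to convert the continuous variable 'Experience' (years on the job) into a categorical variable 'Level' of expertise (beginner, intermediate, advanced, expert) for each of 4 jobs (Commercial Manager, Business Developer, Web Marketer, Traffic Manager).
Since the years of experience are distributed differently for each job, I used "qcut" to split the data into 4 groups as follows:
(You can run the code below to get a sample of the DataFrame.)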
import pandas as pd

df = pd.DataFrame({'Job': ['Commercial Manager', 'Traffic Manager', 'Web Marketer', 'Commercial Manager', 'Commercial Manager', 'Web Marketer', 'Commercial Manager', 'Commercial Manager', 'Traffic Manager', 'Business Developer', 'Business Developer', 'Web Marketer', 'Traffic Manager', 'Traffic Manager', 'Commercial Manager', 'Business Developer', 'Traffic Manager', 'Commercial Manager', 'Business Developer', 'Business Developer', 'Web Marketer'],
                   'Experience': [1.00000, 3.00000, 3.00000, 1.50000, 2.00000, 6.00000, 0.00000, 4.00000, 8.00000, 5.00000, 0.50000, 3.00000, 3.00000, 0.00000, 2.00000, 3.00000, 0.50000, 3.00000, 3.00000, 8.00000, 3.50000]})

levels = ["beginner", "intermediate", "advanced", "expert"]
jobs = ["Commercial Manager", "Business Developer", "Web Marketer", "Traffic Manager"]

def convert(levels, jobs):
    for j in jobs:
        df["Level"] = pd.qcut(df.loc[df["Job"] == j, "Experience"].rank(method="first"), q = 4, labels = levels, duplicates = "drop")
    return df

convert(levels, jobs)
Here is the output after using "qcut":
Job Experience Level
0 Commercial Manager 1.00000 NaN
1 Traffic Manager 3.00000 intermediate
2 Web Marketer 3.00000 NaN
3 Commercial Manager 1.50000 NaN
4 Commercial Manager 2.00000 NaN
5 Web Marketer 6.00000 NaN
6 Commercial Manager 0.00000 NaN
7 Commercial Manager 4.00000 NaN
8 Traffic Manager 8.00000 expert
9 Business Developer 5.00000 NaN
10 Business Developer 0.50000 NaN
11 Web Marketer 3.00000 NaN
12 Traffic Manager 3.00000 intermediate
13 Traffic Manager 0.00000 beginner
14 Commercial Manager 2.00000 NaN
15 Business Developer 3.00000 NaN
16 Traffic Manager 0.50000 beginner
17 Commercial Manager 3.00000 NaN
18 Business Developer 3.00000 NaN
19 Business Developer 8.00000 NaN
20 Web Marketer 3.50000 NaN
It seems to work only for "Traffic Manager"; the Level for every other job has been replaced with NaN. I'm really lost. Any help?
You want to do this in a groupby operation:
import numpy
import pandas

levels = ["beginner", "intermediate", "advanced", "expert"]
jobs = ["Commercial Manager", "Business Developer", "Web Marketer", "Traffic Manager"]

df = pandas.DataFrame({
    'Job': numpy.random.choice(levels, size=150),  # note: this samples from levels, not jobs
    'Experience': numpy.random.uniform(0.25, 10.5, size=150)
}).assign(
    # the lambda receives the new frame, so df does not have to exist beforehand
    level=lambda d: d.groupby('Job', group_keys=False)['Experience']  # for each unique job...
        # apply a quantile (quartile) cut
        .apply(lambda g: pandas.qcut(g, q=4, labels=levels, duplicates="drop"))
)
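As for why the loop in the question leaves NaN everywhere: each pass assigns a Series that only covers the rows of one job to the whole "Level" column, so pandas aligns on the index, fills the other rows with NaN, and throws away the previous pass. Only the last job in the list ("Traffic Manager") survives. Here is a minimal sketch of that effect and of a loop that writes only into the matching rows. The two-job frame and the column name "Overwritten" are made up for illustration.

```python
import pandas as pd

# Small made-up frame: two jobs, four rows each.
df = pd.DataFrame({
    "Job": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Experience": [1.0, 2.0, 3.0, 4.0, 0.5, 3.0, 3.0, 8.0],
})
levels = ["beginner", "intermediate", "advanced", "expert"]

# Pattern from the question: every pass replaces the whole column,
# so rows of earlier jobs end up as NaN.
for j in ["A", "B"]:
    df["Overwritten"] = pd.qcut(
        df.loc[df["Job"] == j, "Experience"].rank(method="first"),
        q=4, labels=levels,
    )

# Alternative: create the column once, then fill only the current job's rows.
df["Level"] = pd.Series(index=df.index, dtype="object")
for j in ["A", "B"]:
    mask = df["Job"] == j
    ranked = df.loc[mask, "Experience"].rank(method="first")
    df.loc[mask, "Level"] = pd.qcut(ranked, q=4, labels=levels).astype(str)

print(df)
```

The groupby version does the same thing without the explicit loop, which is why it is usually the more convenient choice.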
I would change just two things in what Paul suggested (jobs instead of levels, and adding rank(method="first")), because there was still an error:
import numpy
import pandas

levels = ["beginner", "intermediate", "advanced", "expert"]
jobs = ["Commercial Manager", "Business Developer", "Web Marketer", "Traffic Manager"]

df = pandas.DataFrame({
    'Job': numpy.random.choice(jobs, size=150),
    'Experience': numpy.random.uniform(0.25, 10.5, size=150)
}).assign(
    # the lambda receives the new frame, so df does not have to exist beforehand
    level=lambda d: d.groupby('Job', group_keys=False)['Experience']  # for each unique job...
        # rank first so ties get distinct values, then apply a quantile (quartile) cut
        .apply(lambda g: pandas.qcut(g.rank(method="first"), q=4, labels=levels, duplicates="drop"))
)
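For reference, the same idea can be applied directly to the sample data from the question rather than to random data. This is only a sketch: the helper name to_level is made up, and group_keys=False is passed on the assumption that the result should keep the original row index so it lines up with df on assignment.

```python
import pandas as pd

df = pd.DataFrame({
    "Job": ["Commercial Manager", "Traffic Manager", "Web Marketer", "Commercial Manager",
            "Commercial Manager", "Web Marketer", "Commercial Manager", "Commercial Manager",
            "Traffic Manager", "Business Developer", "Business Developer", "Web Marketer",
            "Traffic Manager", "Traffic Manager", "Commercial Manager", "Business Developer",
            "Traffic Manager", "Commercial Manager", "Business Developer", "Business Developer",
            "Web Marketer"],
    "Experience": [1.0, 3.0, 3.0, 1.5, 2.0, 6.0, 0.0, 4.0, 8.0, 5.0, 0.5, 3.0,
                   3.0, 0.0, 2.0, 3.0, 0.5, 3.0, 3.0, 8.0, 3.5],
})
levels = ["beginner", "intermediate", "advanced", "expert"]

def to_level(experience):
    # Hypothetical helper: rank within the job so tied values become distinct,
    # then cut the ranks into quartiles.
    ranked = experience.rank(method="first")
    return pd.qcut(ranked, q=4, labels=levels)

# One qcut per job; the result keeps the original index and aligns with df.
df["Level"] = df.groupby("Job", group_keys=False)["Experience"].apply(to_level)

print(df.sort_values(["Job", "Experience"]))
```

Because rank(method="first") breaks ties by order of appearance, two people with the same experience in the same job can land in different levels. Whether that is acceptable depends on the use case.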