Replace unique values with new values in dataframe, pandas?
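I have a dataframe like the one below, and I want to anonymize it by replacing the unique values of a column. That is, I want to replace the last-name column with fake last names generated by the 'faker' library.
Here is the code snippet: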
import pandas as pd
from faker import Faker
fake = Faker()
print(fake.first_name())
print(fake.last_name())
# first names are used as the index below, so they must be defined
first = ('Mike', 'Dorothee', 'Tom', 'Bill', 'Pete', 'Kate')
last = ('Meyer', 'Maier', 'Meyer', 'Mayer', 'Meyr', 'Mair')
job = ('data analyst', 'programmer', 'computer scientist',
       'data scientist', 'accountant', 'psychiatrist')
language = ('Python', 'Perl', 'Java', 'Java', 'Cobol', 'Brainfuck')
df = pd.DataFrame(list(zip(last, job, language)),
                  columns=['last', 'job', 'language'],
                  index=first)
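The output I want is the last-name column replaced with fake names, where each real name is always replaced by the same fake one; for example, every Meyer should become the same fake last name.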
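The requirement can be phrased as a property of the old and new columns: each original surname must map to exactly one replacement, and (so that different people are not merged) distinct surnames should get distinct replacements. Below is a minimal sketch of a check for this, assuming both columns are pandas Series of the same length; the helper name `is_consistent_mapping` and the sample replacement names are hypothetical.

```python
import pandas as pd


def is_consistent_mapping(original: pd.Series, replaced: pd.Series) -> bool:
    """Hypothetical helper: True if `replaced` is a one-to-one relabelling of `original`."""
    # Compare positionally (to_numpy) so differing indexes cannot misalign the pairs.
    pairs = pd.DataFrame({"orig": original.to_numpy(), "new": replaced.to_numpy()})
    # Same input -> same output: every original value has exactly one replacement.
    same_in_same_out = (pairs.groupby("orig")["new"].nunique() == 1).all()
    # Different input -> different output: every replacement comes from one original.
    different_in_different_out = (pairs.groupby("new")["orig"].nunique() == 1).all()
    return bool(same_in_same_out and different_in_different_out)


last = pd.Series(["Meyer", "Maier", "Meyer", "Mayer", "Meyr", "Mair"])
good = pd.Series(["Smith", "Jones", "Smith", "Brown", "Lee", "Khan"])
bad = pd.Series(["Smith", "Jones", "Taylor", "Brown", "Lee", "Khan"])  # Meyer -> two names
print(is_consistent_mapping(last, good))  # True
print(is_consistent_mapping(last, bad))   # False
```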
Get all the unique names, create a dictionary that maps each unique name to a fake name, and map it onto your column:
import pandas as pd
first = ('Mike', 'Dorothee', 'Tom', 'Bill', 'Pete', 'Kate')
last = ('Meyer', 'Maier', 'Meyer', 'Mayer', 'Meyr', 'Mair')
job = ('data analyst', 'programmer', 'computer scientist',
       'data scientist', 'accountant', 'psychiatrist')
language = ('Python', 'Perl', 'Java', 'Java', 'Cobol', 'Brainfuck')
df = pd.DataFrame(list(zip(last, job, language)),
                  columns=['last', 'job', 'language'],
                  index=first)
print(df)
# get all unique names - a set easily handles a few tens of thousands of names
# (note: set iteration order is arbitrary, so the numbering can differ between runs)
all_names = set(df["last"])
# create the mapper: in practice you would use fake.last_name() instead of 42 + i
# (this needs `from faker import Faker` and `fake = Faker()`)
# mapper = {k: fake.last_name() for k in all_names}
mapper = {k: 42 + i for i, k in enumerate(all_names)}
# apply it: every occurrence of the same name gets the same replacement
df["last"] = df["last"].map(mapper)
print(df)
Output:
# before
           last                 job   language
Mike      Meyer        data analyst     Python
Dorothee  Maier          programmer       Perl
Tom       Meyer  computer scientist       Java
Bill      Mayer      data scientist       Java
Pete       Meyr          accountant      Cobol
Kate       Mair        psychiatrist  Brainfuck
# after
          last                 job   language
Mike        44        data analyst     Python
Dorothee    43          programmer       Perl
Tom         44  computer scientist       Java
Bill        45      data scientist       Java
Pete        46          accountant      Cobol
Kate        47        psychiatrist  Brainfuck
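Two caveats with the set-based mapper: a set has no guaranteed iteration order, so names may be numbered differently on each run, and calling `fake.last_name()` once per key can hand the same fake name to two different real names. The sketch below addresses both under stated assumptions: it walks `Series.unique()` (order of first appearance), uses a seeded `random.Random`, and redraws on collisions. `make_fake_name` and `build_mapper` are hypothetical helpers; `make_fake_name` stands in for `fake.last_name()` so the example runs without Faker.

```python
import random

import pandas as pd


def make_fake_name(rng: random.Random) -> str:
    # Hypothetical stand-in for Faker's fake.last_name(): a tiny pool,
    # so collisions are likely and the redraw logic below is exercised.
    pool = ["Smith", "Jones", "Brown", "Taylor", "Wilson", "Khan", "Lee", "Novak"]
    return rng.choice(pool)


def build_mapper(values: pd.Series, new_name, seed: int = 0, max_tries: int = 1000) -> dict:
    """Map each distinct value to a distinct fake name (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed -> same mapping on every run
    mapper = {}
    used = set()
    for name in values.unique():  # unique() keeps order of first appearance
        for _ in range(max_tries):
            candidate = new_name(rng)
            if candidate not in used:  # never give two real names the same fake one
                break
        else:
            raise ValueError(f"could not find an unused fake name for {name!r}")
        mapper[name] = candidate
        used.add(candidate)
    return mapper


first = ('Mike', 'Dorothee', 'Tom', 'Bill', 'Pete', 'Kate')
last = ('Meyer', 'Maier', 'Meyer', 'Mayer', 'Meyr', 'Mair')
df = pd.DataFrame({"last": last}, index=first)
mapper = build_mapper(df["last"], make_fake_name)
df["last"] = df["last"].map(mapper)
print(df)
```

With Faker itself, one option is to pass `lambda rng: fake.last_name()` as `new_name` (the `rng` argument is then ignored) and call `Faker.seed(0)` beforehand for reproducible names; the `max_tries` guard raises instead of looping forever when the pool of fake names runs out.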