Convert unique numbers to md5 hash using pandas
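Good morning, all.
I want to convert my social security numbers to MD5 hash hex numbers. The result should be a unique MD5 hex digest for each social security number.
My data looks like this: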
ob = onboard[['regions','lname','ssno']][:10]
ob
regions lname ssno
0 Northern Region (R1) Banderas 123456789
1 Northern Region (R1) Garfield 234567891
2 Northern Region (R1) Pacino 345678912
3 Northern Region (R1) Baldwin 456789123
4 Northern Region (R1) Brody 567891234
5 Northern Region (R1) Johnson 6789123456
6 Northern Region (R1) Guinness 7890123456
7 Northern Region (R1) Hopkins 891234567
8 Northern Region (R1) Paul 891234567
9 Northern Region (R1) Arkin 987654321
I tried the following code using hashlib:
import hashlib
ob['md5'] = hashlib.md5(['ssno'])
That gave me an error saying the argument must be a string, not a list. So I tried the following:
ob['md5'] = hashlib.md5('ssno').hexdigest()
regions lname ssno md5
0 Northern Region (R1) Banderas 123456789 a1b3ec3d8a026d392ad551701ad7881e
1 Northern Region (R1) Garfield 234567891 a1b3ec3d8a026d392ad551701ad7881e
2 Northern Region (R1) Pacino 345678912 a1b3ec3d8a026d392ad551701ad7881e
3 Northern Region (R1) Baldwin 456789123 a1b3ec3d8a026d392ad551701ad7881e
4 Northern Region (R1) Brody 567891234 a1b3ec3d8a026d392ad551701ad7881e
5 Northern Region (R1) Johnson 678912345 a1b3ec3d8a026d392ad551701ad7881e
6 Northern Region (R1) Johnson 789123456 a1b3ec3d8a026d392ad551701ad7881e
7 Northern Region (R1) Guiness 891234567 a1b3ec3d8a026d392ad551701ad7881e
8 Northern Region (R1) Hopkins 912345678 a1b3ec3d8a026d392ad551701ad7881e
9 Northern Region (R1) Paul 159753456 a1b3ec3d8a026d392ad551701ad7881e
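This is very close to what I need, but every hex digest comes out the same regardless of whether the social security numbers differ. I am trying to get a distinct hex digest for each social security number.
Any suggestions?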
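hashlib.md5 hashes a single string per call (bytes on Python 3); you can't pass it an array of values the way you can with some NumPy/Pandas functions. Instead, you could use a list comprehension to build a list of md5sums:
ob['md5'] = [hashlib.md5(str(val).encode('UTF-8')).hexdigest() for val in ob['ssno']]
On Python 3, hashlib expects bytes rather than str, so each value is encoded first (probably as UTF-8); str(val) also covers an ssno column stored as integers. The same applies if you are hashing to SHA256:
ob['sha256'] = [hashlib.sha256(str(val).encode('UTF-8')).hexdigest() for val in ob['ssno']]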
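To see why the second attempt in the question produced the same digest on every row, here is a minimal sketch using only hashlib. It assumes Python 3; the helper name md5_hex is hypothetical and not part of hashlib or pandas, and the sample values are copied from the question's ssno column. Hashing the literal string 'ssno' never looks at the data, while hashing each value yields one digest per distinct number.

```python
import hashlib


def md5_hex(value):
    """Return the MD5 hex digest of one value (hypothetical helper)."""
    # str() covers integer SSNs; encode() produces the bytes hashlib needs on Python 3.
    return hashlib.md5(str(value).encode("utf-8")).hexdigest()


# Sample values from the question's ssno column (891234567 appears twice).
ssnos = [123456789, 234567891, 345678912, 891234567, 891234567]

# Hashing the literal column name ignores the data: one digest repeated for every row.
constant = [hashlib.md5(b"ssno").hexdigest() for _ in ssnos]

# Hashing each value separately: equal inputs share a digest, distinct inputs do not.
per_value = [md5_hex(v) for v in ssnos]

print(len(set(constant)))   # 1
print(len(set(per_value)))  # 4
```

With the DataFrame from the question, the same helper could be applied as ob['md5'] = ob['ssno'].map(md5_hex), which is equivalent to the list comprehension above.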