Counting different characters in a large dataframe
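I want to count how many times different words appear in a data frame, then reshape the result into a new data frame that shows the count for each word.
For example, I have a data table like this: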
Col1 | Col2 | Col3 | Col4 | Col5 | Continues... |
---|---|---|---|---|---|
Passwords1 | GHSME12 | POWDER2 | JOHNC | PLOW01 | PLANE |
Usercode20 | HUNG1 | GHSME12 | PLOW01 | GORGE09 | JOHNC |
Usercode15 | PLOW01 | GORGE09 | JOHNC | POWDER2 | SYRUP9 |
Continues... | ... | ... | ... | ... | ... |
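I want to be able to count how many times each word in the data appears for each value of Col1. I could do something like WordX = number of items equal to wordX, but there are hundreds of passwords, which makes counting them by hand impractical. So I'm wondering whether I have to use a for loop and an empty data frame in this case to produce something like this: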
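Passwords | Passwords1 | Usercode20 | Usercode15 | Continues... |
---|---|---|---|---|
GHSME12 | 1 | 1 | 0 | ... |
POWDER2 | 1 | 0 | 1 | ... |
JOHNC | 1 | 1 | 1 | ... |
PLOW01 | 1 | 1 | 1 | ... |
PLANE | 1 | 0 | 0 | ... |
HUNG1 | 0 | 1 | 0 | ... |
GORGE09 | 0 | 1 | 1 | ... |
SYRUP9 | 0 | 0 | 1 | ... |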
If anyone has a good idea for solving this, I would greatly appreciate it. Thanks!
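    # Stack every column except Col1 into one 'values' column, pair each
    # value with its Col1 label (recycled by cbind), then cross-tabulate.
    table(cbind(stack(df, -Col1)['values'], df['Col1']))

             Col1
    values    Passwords1 Usercode15 Usercode20
      GHSME12          1          0          1
      GORGE09          0          1          1
      HUNG1            0          0          1
      JOHNC            1          1          1
      PLANE            1          0          0
      PLOW01           1          1          1
      POWDER2          1          1          0
      SYRUP9           0          1          0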
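The one-liner above can be unpacked into separate steps. The following is a minimal sketch, assuming `df` holds only the three sample rows shown in the question (the real data continues with more rows and columns) and that the unnamed sixth column is called `Col6`. The names `long`, `tab`, and `counts` are illustrative and not part of the original answer. Because the question asks for a new data frame, the last step converts the `table` object with `as.data.frame.matrix()`.

```r
# Sample data: only the three rows shown in the question (an assumption).
# "Col6" is a made-up name for the column the question labels "Continues...".
df <- data.frame(
  Col1 = c("Passwords1", "Usercode20", "Usercode15"),
  Col2 = c("GHSME12", "HUNG1", "PLOW01"),
  Col3 = c("POWDER2", "GHSME12", "GORGE09"),
  Col4 = c("JOHNC", "PLOW01", "JOHNC"),
  Col5 = c("PLOW01", "GORGE09", "POWDER2"),
  Col6 = c("PLANE", "JOHNC", "SYRUP9"),
  stringsAsFactors = FALSE
)

# Step 1: stack every column except Col1 into one long 'values' column.
# stack() works column by column, so the rows cycle through Col1 in order.
long <- stack(df, select = -Col1)

# Step 2: attach the matching Col1 label to each value. rep() spells out
# the recycling that cbind() performs implicitly in the one-liner.
long$Col1 <- rep(df$Col1, times = ncol(df) - 1)

# Step 3: cross-tabulate words against Col1.
tab <- table(values = long$values, Col1 = long$Col1)

# Step 4: convert the table into a plain data frame, one row per word
# (words become row names, Col1 values become columns).
counts <- as.data.frame.matrix(tab)
counts
```

No explicit for loop or pre-allocated empty data frame is needed: `table()` does the counting, and a word that appears more than once for the same Col1 value would simply get a count above 1.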
Tidyverse:
    library(tidyverse)
    df %>%
      # Long form: one row per (Col1, name, value) combination
      pivot_longer(-Col1) %>%
      # One column per Col1 value; length() counts the occurrences of each word
      pivot_wider(names_from = Col1, values_from = name,
                  values_fn = length, values_fill = 0)

    # A tibble: 8 x 4
      value   Passwords1 Usercode20 Usercode15
      <chr>        <int>      <int>      <int>
    1 GHSME12          1          1          0
    2 POWDER2          1          0          1
    3 JOHNC            1          1          1
    4 PLOW01           1          1          1
    5 PLANE            1          0          0
    6 HUNG1            0          1          0
    7 GORGE09          0          1          1
    8 SYRUP9           0          0          1
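A variant of the same idea makes the counting step explicit, which may be easier to follow than counting through `values_fn`. This is only a sketch, assuming the `dplyr` and `tidyr` packages are installed, that `df` contains just the three sample rows from the question, and that the unnamed sixth column is called `Col6`. The names `long_counts`, `wide_counts`, `column`, and `word` are illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data: only the three rows shown in the question (an assumption).
# "Col6" is a made-up name for the column the question labels "Continues...".
df <- data.frame(
  Col1 = c("Passwords1", "Usercode20", "Usercode15"),
  Col2 = c("GHSME12", "HUNG1", "PLOW01"),
  Col3 = c("POWDER2", "GHSME12", "GORGE09"),
  Col4 = c("JOHNC", "PLOW01", "JOHNC"),
  Col5 = c("PLOW01", "GORGE09", "POWDER2"),
  Col6 = c("PLANE", "JOHNC", "SYRUP9"),
  stringsAsFactors = FALSE
)

# Long form first, then tally each (Col1, word) pair explicitly.
long_counts <- df %>%
  pivot_longer(-Col1, names_to = "column", values_to = "word") %>%
  count(Col1, word, name = "n")

# Spread the tallies into one column per Col1 value; pairs that never
# occur are filled with 0.
wide_counts <- long_counts %>%
  pivot_wider(names_from = Col1, values_from = n, values_fill = 0L)

wide_counts
```

Keeping `long_counts` around can be useful in its own right, for example to filter or sort by count before reshaping to the wide layout.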