Remove duplicate-word sentences
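I have a list of sentences.
I want to treat lines like these as duplicates:
- White shoes women
- Shoes women white
- Women white shoes
and reduce them to this:
- White shoes women
Can I do this in Notepad++?
Or in some other software?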
Going with the "some other software" option.
Contents of input.txt:
White shoes women
Shoes women white
Women white shoes
Men black boots
Black boots men
Boots men black
girl yellow shirt
yellow girl shirt
pants blue boy
Python 3:
sentences = []
with open('input.txt', mode='r') as infile:
    for line in infile:
        # Normalise each line to a sorted list of lowercase words;
        # split() with no argument also drops the newline and extra spaces
        words = sorted(word.lower() for word in line.split())
        # Skip blank lines and any word list that was already seen
        if words and words not in sentences:
            sentences.append(words)
with open('output.txt', mode='w') as outfile:
    for sentence in sentences:
        # Join with single spaces so no trailing space is written
        outfile.write(' '.join(sentence) + '\n')
Contents of output.txt:
shoes white women
black boots men
girl shirt yellow
blue boy pants
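I don't think you can do this kind of job in Npp.
Here is a way to do it with Perl that keeps the case and word order of the first line in each group.
(Thanks to @jwpfox for the input sample.)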
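use Modern::Perl;
my $prev = '';
while (<DATA>) {
    chomp;
    # Key: the lowercased words in sorted order, joined with spaces
    my $str = join ' ', sort split ' ', lc $_;
    # Print only when the key differs from the previous line's key,
    # so this drops duplicates that sit on consecutive lines
    say $_ if $str ne $prev;
    $prev = $str;
}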
__DATA__
White shoes women
Shoes women white
Women white shoes
White shoes women
Shoes women white
Women white shoes
Men black boots
Black boots men
Boots men black
girl yellow shirt
yellow girl shirt
pants blue boy
Output:
White shoes women
Men black boots
girl yellow shirt
pants blue boy
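The Perl script compares each line only with the one before it, so it relies on duplicates being adjacent, as they are in the sample. As a rough sketch of the same idea for input where duplicates may be scattered, the Python below remembers every key it has seen. The names `dedupe_by_words` and `sample` are made up for illustration and are not part of any answer here.

```python
def dedupe_by_words(lines):
    """Keep the first line of each group that uses the same words.

    Case and word order are ignored when comparing, but the kept line
    is returned exactly as it was written.
    """
    seen = set()
    kept = []
    for line in lines:
        line = line.strip()
        # Key: lowercased words in sorted order
        key = tuple(sorted(line.lower().split()))
        if not key or key in seen:
            continue  # skip blank lines and repeats
        seen.add(key)
        kept.append(line)
    return kept


# Hypothetical sample with the duplicates deliberately interleaved
sample = [
    "White shoes women",
    "Men black boots",
    "Shoes women white",
    "Boots men black",
    "pants blue boy",
]
for line in dedupe_by_words(sample):
    print(line)
```

With this sample it prints "White shoes women", "Men black boots" and "pants blue boy", one per line. A set of tuples is used so that each lookup stays cheap even for long lists.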
A version in PHP:
<?php
$s = array(
    'White shoes women',
    'Shoes women white',
    'Women white shoes',
    'White shoes women',
    'Shoes women white',
    'Women white shoes',
    'Men black boots',
    'Black boots men',
    'Boots men black',
    'girl yellow shirt',
    'yellow girl shirt',
    'pants blue boy');
$prev = '';
foreach ($s as $line) {
    // Key: the lowercased words in sorted order, joined with spaces
    $list = explode(' ', strtolower($line));
    sort($list);
    $str = implode(' ', $list);
    // Strict comparison with the previous key; only consecutive duplicates are dropped
    if ($str !== $prev) echo $line, "\n";
    $prev = $str;
}
Output:
White shoes women
Men black boots
girl yellow shirt
pants blue boy