Regex to match all lines except duplicates

I have this text:

156.48.459.20 - - [11/Aug/2019
156.48.459.20 - - [11/Aug/2019
235.145.41.12 - - [11/Aug/2019
235.145.41.12 - - [11/Aug/2019
66.23.114.251 - - [11/Aug/2019

I want to match all lines from that day, so I wrote this simple regex: '/.*11\/Aug\/2019.*/'

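As you can see, the text contains two duplicated IPs, and I don't want to match the duplicate lines. After some searching I found this regex: (.).* Although that regex looks a bit odd, I tried to apply it to my current regex and ended up with (.*11\/Aug\/2019.*), which didn't work. Can someone help?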

This is the result I want:

156.48.459.20 - - [11/Aug/2019
235.145.41.12 - - [11/Aug/2019
66.23.114.251 - - [11/Aug/2019

Note: I am using the function preg_match_all():

preg_match_all('/(.*11\/Aug\/2019.*)/', $input_lines, $output_array);

Does it have to be a pure regex?

You can use PHP to keep only the unique lines:

<?php
$input_lines = '156.48.459.20 - - [11/Aug/2019
156.48.459.20 - - [11/Aug/2019
235.145.41.12 - - [11/Aug/2019
235.145.41.12 - - [11/Aug/2019
66.23.114.251 - - [11/Aug/2019';

preg_match_all( '/.*11\/Aug\/2019/m', $input_lines, $output_array );

// PHP associative array abuse incoming
// Flip the array so that the values become keys, then read the keys back
// Array keys are unique, so only one copy of each line survives
$output_array[ 0 ] = array_keys( array_flip( $output_array[ 0 ] ) );

var_dump( $output_array );

Output:

array(1) {
  [0]=>
  array(3) {
    [0]=>
    string(30) "156.48.459.20 - - [11/Aug/2019"
    [1]=>
    string(30) "235.145.41.12 - - [11/Aug/2019"
    [2]=>
    string(30) "66.23.114.251 - - [11/Aug/2019"
  }
}
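
If the flip trick feels too clever, array_unique() should do the same job more explicitly. A minimal sketch, assuming the same sample text as above; the variable name $unique_lines is only for illustration, and array_values() is there just to reindex the result from 0:

```php
<?php
// Same sample log as in the answer above.
$input_lines = "156.48.459.20 - - [11/Aug/2019\n"
    . "156.48.459.20 - - [11/Aug/2019\n"
    . "235.145.41.12 - - [11/Aug/2019\n"
    . "235.145.41.12 - - [11/Aug/2019\n"
    . "66.23.114.251 - - [11/Aug/2019";

// Collect every line that contains the date.
preg_match_all('/.*11\/Aug\/2019/', $input_lines, $output_array);

// array_unique() keeps the first occurrence of each value (with its key);
// array_values() then reindexes the result from 0.
$unique_lines = array_values(array_unique($output_array[0]));

print_r($unique_lines);
```

Unlike the flip approach, this keeps working if the matched values are not valid array keys, at the cost of being a little slower on very large arrays.
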

<?php
$txt = <<<'EOD'
156.48.459.20 - - [11/Aug/2019
156.48.459.20 - - [11/Aug/2019
235.145.41.12 - - [11/Aug/2019
235.145.41.12 - - [11/Aug/2019
66.23.114.251 - - [11/Aug/2019
EOD;

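$url = 'data:text/plain;base64,' . base64_encode($txt);
// replace the line above with the path to your log file: $url = '/path/to/file.log';

$result = [];

if ( false !== $handle = fopen($url, 'r') ) {
    while ( false !== $data = fgetcsv($handle, 1000, ' ') ) {
        // isset() skips blank or short lines that have no fourth field
        if ( isset($data[3]) && $data[3] === '[11/Aug/2019' )
            $result[$data[0]] = 1; // array keys are unique, so duplicate IPs collapse
    }
    fclose($handle);
}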

$result = array_keys($result);

print_r($result);
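
The same idea can be wrapped in a small function if you need it for more than one date. This is only a sketch: the function name uniqueIpsForDate() and its parameters are made up for illustration, and it assumes every log line starts with the IP followed by a space and contains the date as [DD/Mon/YYYY. The temporary file just stands in for a real log.

```php
<?php
/**
 * Hypothetical helper (not part of the answer above): returns the unique IPs
 * found on lines containing "[$date", in order of first appearance.
 */
function uniqueIpsForDate(string $path, string $date): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return [];
    }

    $seen = [];
    while (($line = fgets($handle)) !== false) {
        // Skip lines for other days (this also skips blank lines).
        if (strpos($line, '[' . $date) === false) {
            continue;
        }
        // Assumption: the IP is everything before the first space.
        $ip = strtok($line, ' ');
        if ($ip !== false) {
            $seen[$ip] = true; // array keys are unique, so duplicates collapse
        }
    }
    fclose($handle);

    return array_keys($seen);
}

// Usage with a temporary file standing in for the real log.
$path = tempnam(sys_get_temp_dir(), 'log');
file_put_contents($path, implode("\n", [
    '156.48.459.20 - - [11/Aug/2019',
    '156.48.459.20 - - [11/Aug/2019',
    '235.145.41.12 - - [11/Aug/2019',
    '66.23.114.251 - - [09/Aug/2019',
    '66.23.114.251 - - [11/Aug/2019',
]));

$ips = uniqueIpsForDate($path, '11/Aug/2019');
unlink($path);

print_r($ips);
```

Reading with fgets() keeps memory use flat on large logs, since only one line and the set of IPs seen so far are held at a time.
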

Almost a one-liner:

'~(?m)^(?:([\d.]*[- ]*\[11/Aug/2019.*)\R*(?=[\S\s]*?^\1$)|(?!.*\[11/Aug/2019).*\R*)~'

It deletes every line for another date, and every 11/Aug/2019 line that has an identical copy further down, so the last occurrence of each line is the one that stays.

PHP sample:

$target = <<<'EOS'
156.48.459.20 - - [11/Aug/2019
156.48.459.20 - - [11/Aug/2019
235.145.41.12 - - [11/Aug/2019
235.145.41.12 - - [11/Aug/2019
66.23.114.251 - - [11/Aug/2019
66.23.114.251 - - [09/Aug/2019
156.48.459.20 - - [11/Aug/2019
235.145.41.12 - - [11/Aug/2019
66.23.114.251 - - [01/Aug/2019
66.23.114.251 - - [11/Aug/2019
235.145.41.12 - - [11/Aug/2019
EOS;

$res = preg_replace ( '~(?m)^(?:([\d.]*[- ]*\[11/Aug/2019.*)\R*(?=[\S\s]*?^\1$)|(?!.*\[11/Aug/2019).*\R*)~', '', $target );

echo $res."\n";

Output:

156.48.459.20 - - [11/Aug/2019
66.23.114.251 - - [11/Aug/2019
235.145.41.12 - - [11/Aug/2019

Formatted for readability:

 (?m)
 ^
 (?:
      ( [\d.]* [- ]* \[ 11/Aug/2019 .* )  # (1)
      \R*
      (?= [\S\s]*? ^ \1 $ )               # an identical line follows later
   |
      (?! .* \[ 11/Aug/2019 )
      .*  \R*
 )
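
If you would rather keep calling preg_match_all() as in the question, a negative lookahead with a backreference can skip every line that has an identical copy later in the text. A sketch, assuming \n line endings and that keeping the last occurrence of each duplicate is acceptable; the sample text is the question's data plus one line for another day:

```php
<?php
$input_lines = <<<'EOD'
156.48.459.20 - - [11/Aug/2019
156.48.459.20 - - [11/Aug/2019
235.145.41.12 - - [11/Aug/2019
235.145.41.12 - - [11/Aug/2019
66.23.114.251 - - [09/Aug/2019
66.23.114.251 - - [11/Aug/2019
EOD;

// ^(...)$          capture one whole line that contains the date
// (?![\s\S]*^\1$)  ...but only if no identical line follows later
$pattern = '~^(.*\[11/Aug/2019.*)$(?![\s\S]*^\1$)~m';

preg_match_all($pattern, $input_lines, $output_array);

print_r($output_array[0]);
```

The lookahead rescans the rest of the text for every candidate line, so on large logs the array-based approaches are likely to be faster.
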