php: better way to split string into associative array
I have a string like this:
"ALARM_ID/I4=1010001 ALARM_STATE/U4=eventcode ALARM_TEXT/A=WMR_MAP_EXPORT LOTS/A[1]=[ STEFANO ] ALARM_STATE/U1=1 WAFER/U4=1 VI_KLARF_MAP/A=/test/klarf.map KLARF_STEPID/A=StepID KLARF_DEVICEID/A=DeviceID KLARF_EQUIPMENTID/A=EquipmentID KLARF_SETUP_ID/A=SetupID RULE_ID/U4=1234 RULE_FORMULA_EXPRESSION/A=a < b && c > d RULE_FORMULA_TEXT/A=1 < 0 && 2 > 3 RULE_FORMULA_RESULT/A=FAIL TIMESTAMP/A=10-Nov-2020 09:10:11 99999999"
My goal is to split it into an associative array:
Array
(
[ALARM_ID/I4] => 1010001
[ALARM_STATE/U4] => eventcode
[ALARM_TEXT/A] => WMR_MAP_EXPORT
[LOTS/A[1]] => [ STEFANO ]
[ALARM_STATE/U1] => 1
[WAFER/U4] => 1
[VI_KLARF_MAP/A] => /test/klarf.map
[KLARF_STEPID/A] => StepID
[KLARF_DEVICEID/A] => DeviceID
[KLARF_EQUIPMENTID/A] => EquipmentID
[KLARF_SETUP_ID/A] => SetupID
[RULE_ID/U4] => 1234
[RULE_FORMULA_EXPRESSION/A] => a < b && c > d
[RULE_FORMULA_TEXT/A] => 1 < 0 && 2 > 3
[RULE_FORMULA_RESULT/A] => FAIL
[TIMESTAMP/A] => 10-Nov-2020 09:10:11 99999999
)
The only (but probably dirty) way I have found is this script:
<?php
$msg = "ALARM_ID/I4=1010001 ALARM_STATE/U4=eventcode ALARM_TEXT/A=WMR_MAP_EXPORT LOTS/A[1]=[ STEFANO ] ALARM_STATE/U1=1 WAFER/U4=1 VI_KLARF_MAP/A=/test/klarf.map KLARF_STEPID/A=StepID KLARF_DEVICEID/A=DeviceID KLARF_EQUIPMENTID/A=EquipmentID KLARF_SETUP_ID/A=SetupID RULE_ID/U4=1234 RULE_FORMULA_EXPRESSION/A=a < b && c > d RULE_FORMULA_TEXT/A=1 < 0 && 2 > 3 RULE_FORMULA_RESULT/A=FAIL TIMESTAMP/A=10-Nov-2020 09:10:11 99999999";
$split = explode("=", $msg);
foreach($split as $k => $s) {
$s = explode(" ", $s);
$keys[] = array_pop($s);
if ($s) $values[] = implode(" ", $s);
}
/*
* this is needed if last parameter TIMESTAMP does not have ' ' (spaces) into value
*/
if (count($values) + 2 == count($keys)) array_push($values, array_pop($keys));
else $values[ count($values) - 1 ] .= " " . array_pop($keys);
$params = array_combine($keys, $values);
print_r($params);
?>
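Do you see a better way to split it, for example with a regular expression or a different (more elegant?) approach?
I put this code together using only basic PHP functions. I think regular expressions make code harder to read. Most of the time it is better to avoid them, even at the cost of more verbose code. They may also have an impact on performance.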
$message = "ALARM_ID/I4=1010001 ALARM_STATE/U4=eventcode ALARM_TEXT/A=WMR_MAP_EXPORT LOTS/A[1]=[ STEFANO ] ALARM_STATE/U1=1 WAFER/U4=1 VI_KLARF_MAP/A=/test/klarf.map KLARF_STEPID/A=StepID KLARF_DEVICEID/A=DeviceID KLARF_EQUIPMENTID/A=EquipmentID KLARF_SETUP_ID/A=SetupID RULE_ID/U4=1234 RULE_FORMULA_EXPRESSION/A=a < b && c > d RULE_FORMULA_TEXT/A=1 < 0 && 2 > 3 RULE_FORMULA_RESULT/A=FAIL TIMESTAMP/A=10-Nov-2020 09:10:11 99999999";
foreach (explode(' ', $message) as $word) {
if (strpos($word, '=')) {
if (isset($key)) $parameters[$key] = $value;
list($key, $value) = explode('=', $word);
}
else $value .= " $word";
}
$parameters[$key] = $value;
echo '<pre>';
print_r($parameters);
echo '</pre>';
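I chose to split on spaces, and then I look for the = character to find the words that contain a key.
There are of course other ways to do this, but because the message format is odd, every approach will involve some extra work.
This routine does not currently tolerate errors in the message string, but it can easily be extended to tolerate various kinds of input errors.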
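As a rough sketch of what such an extension might look like, the loop can be wrapped in a function that skips stray words before the first key, accepts an empty message, and keeps any further = characters inside the value. The function name parseMessage() and the "skip leading junk" policy are assumptions made for illustration, not part of the original routine. A word inside a value that itself contains = would still be read as a new key; that ambiguity is inherent in the format.

```php
<?php
// parseMessage() is a hypothetical helper name, not from the original answer.
function parseMessage(string $message): array
{
    $parameters = [];
    $key = null;

    foreach (explode(' ', trim($message)) as $word) {
        $pos = strpos($word, '=');
        if ($pos !== false && $pos > 0) {
            // A word with '=' after at least one character starts a new pair.
            // The limit of 2 keeps any further '=' inside the value.
            [$key, $value] = explode('=', $word, 2);
            $parameters[$key] = $value;
        } elseif ($key !== null) {
            // Any other word continues the value of the current key.
            $parameters[$key] .= " $word";
        }
        // Words seen before the first key are skipped (assumed policy).
    }

    return $parameters;
}

$parsed = parseMessage('junk ALARM_ID/I4=1010001 LOTS/A[1]=[ STEFANO ] TIMESTAMP/A=10-Nov-2020 09:10:11');
print_r($parsed);
```

Writing each pair into the array as soon as its key is seen avoids the separate "flush the last pair" step after the loop.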
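You can take advantage of the presence of / in all the keys:
([^\s=/]+/[^\s=]+)=(.*?)(?=\h+[^\s=/]+/|$)
Explanation
( Capture group 1
[^\s=/]+ Match 1+ times any character except whitespace, = or /
/[^\s=]+ Then match / followed by the rest of the key
) Close group 1
= Match literally
(.*?) Capture group 2, match any character except a newline, as few as possible
(?=\h+[^\s=/]+/|$) Assert that what follows is either the key format containing / (as used in group 1) or the end of the string
See a regex demo and a PHP demo.
Example code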
$re = '`([^\s=/]+/[^\s=]+)=(.*?)(?=\h+[^\s=/]+/|$)`';
$str = 'ALARM_ID/I4=1010001 ALARM_STATE/U4=eventcode ALARM_TEXT/A=WMR_MAP_EXPORT LOTS/A[1]=[ STEFANO ] ALARM_STATE/U1=1 WAFER/U4=1 VI_KLARF_MAP/A=/test/klarf.map KLARF_STEPID/A=StepID KLARF_DEVICEID/A=DeviceID KLARF_EQUIPMENTID/A=EquipmentID KLARF_SETUP_ID/A=SetupID RULE_ID/U4=1234 RULE_FORMULA_EXPRESSION/A=a < b && c > d RULE_FORMULA_TEXT/A=1 < 0 && 2 > 3 RULE_FORMULA_RESULT/A=FAIL TIMESTAMP/A=10-Nov-2020 09:10:11 99999999
';
preg_match_all($re, $str, $matches);
$result = array_combine($matches[1], $matches[2]);
print_r($result);
Output
Array
(
[ALARM_ID/I4] => 1010001
[ALARM_STATE/U4] => eventcode
[ALARM_TEXT/A] => WMR_MAP_EXPORT
[LOTS/A[1]] => [ STEFANO ]
[ALARM_STATE/U1] => 1
[WAFER/U4] => 1
[VI_KLARF_MAP/A] => /test/klarf.map
[KLARF_STEPID/A] => StepID
[KLARF_DEVICEID/A] => DeviceID
[KLARF_EQUIPMENTID/A] => EquipmentID
[KLARF_SETUP_ID/A] => SetupID
[RULE_ID/U4] => 1234
[RULE_FORMULA_EXPRESSION/A] => a < b && c > d
[RULE_FORMULA_TEXT/A] => 1 < 0 && 2 > 3
[RULE_FORMULA_RESULT/A] => FAIL
[TIMESTAMP/A] => 10-Nov-2020 09:10:11 99999999
)
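If the keys should all start with word characters separated by underscores, you can start the pattern with the repeating part [^\W_]+(?:_[^\W_]+)*
It matches word characters other than _, then repeatedly matches _ followed by word characters other than _, until the / is matched:
([^\W_]+(?:_[^\W_]+)*/[^\s=]*)=(.*?)(?=\h+[^\s=/]+/|$)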
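A minimal sketch of how this stricter pattern could be plugged into the same preg_match_all / array_combine code, using a shortened sample message chosen for illustration (the variable names simply mirror the example above):

```php
<?php
// Same approach as the example code above, with the stricter key pattern.
// $str is a shortened sample message used only for this sketch.
$re = '`([^\W_]+(?:_[^\W_]+)*/[^\s=]*)=(.*?)(?=\h+[^\s=/]+/|$)`';
$str = 'ALARM_ID/I4=1010001 LOTS/A[1]=[ STEFANO ] VI_KLARF_MAP/A=/test/klarf.map RULE_FORMULA_TEXT/A=1 < 0 && 2 > 3';

preg_match_all($re, $str, $matches);

// Group 1 holds the keys, group 2 the values.
$result = array_combine($matches[1], $matches[2]);
print_r($result);
```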
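The important thing for maintaining accuracy is to make sure that the "keys" are matched correctly.
Key strings will never contain a space or an equals sign. Value strings may contain either. A value string either runs to the end of the string or is followed by a space and then the next key (which contains no spaces or equals signs).
The key string can be matched "greedily" up to the first occurrence of =.
The value string must not be matched greedily. This ensures that the value does not extend into the next key-value pair.
The lookahead after the value string ensures that a potential following key is not damaged/consumed.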
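To illustrate why the value must be matched lazily, here is a small sketch on a made-up three-pair message that compares a lazy value group with a greedy one. The key class [^ =]+ follows the rule that keys contain neither spaces nor equals signs; the sample message and variable names are assumptions for illustration.

```php
<?php
// Made-up sample message with three pairs; the second value contains a space.
$msg = 'A/U4=1 B/A=x y C/A=3';

// Lazy value (.+?): stops as soon as the lookahead sees " key=" or the end.
preg_match_all('~([^ =]+)=(.+?)(?=$| [^ =]+=)~', $msg, $lazy);
$lazyPairs = array_combine($lazy[1], $lazy[2]);

// Greedy value (.+), for contrast only: runs straight to the end of the string.
preg_match_all('~([^ =]+)=(.+)(?=$| [^ =]+=)~', $msg, $greedy);
$greedyPairs = array_combine($greedy[1], $greedy[2]);

var_export($lazyPairs);   // three pairs: A/U4 => 1, B/A => x y, C/A => 3
echo "\n";
var_export($greedyPairs); // one pair: A/U4 => 1 B/A=x y C/A=3
```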
Pattern breakdown:
([^ =]+) #capture one or more non-space, non-equals characters (greedily) and store as capture group #1
= #match but do not capture an equals sign
(.+?) #capture one or more of any non-newline character (giving back when possible / non-greedy) and store as capture group #2
(?= #start lookahead
$ #match the end of the string
| #OR operator
 [^ =]+= #match a space, then one or more non-space and non-equals characters, then match an equals sign
) #end lookahead
Code: (Demo)
$msg = "ALARM_ID/I4=1010001 ALARM_STATE/U4=eventcode ALARM_TEXT/A=WMR_MAP_EXPORT LOTS/A[1]=[ STEFANO ] ALARM_STATE/U1=1 WAFER/U4=1 VI_KLARF_MAP/A=/test/klarf.map KLARF_STEPID/A=StepID KLARF_DEVICEID/A=DeviceID KLARF_EQUIPMENTID/A=EquipmentID KLARF_SETUP_ID/A=SetupID RULE_ID/U4=1234 RULE_FORMULA_EXPRESSION/A=a < b && c > d RULE_FORMULA_TEXT/A=1 < 0 && 2 > 3 RULE_FORMULA_RESULT/A=FAIL TIMESTAMP/A=10-Nov-2020 09:10:11 99999999";
// The key class excludes spaces so the separating space is not captured as part of the next key.
preg_match_all('~([^ =]+)=(.+?)(?=$| [^ =]+=)~', $msg, $out);
var_export(array_combine($out[1], $out[2]));
Output:
array (
'ALARM_ID/I4' => '1010001',
'ALARM_STATE/U4' => 'eventcode',
'ALARM_TEXT/A' => 'WMR_MAP_EXPORT',
'LOTS/A[1]' => '[ STEFANO ]',
'ALARM_STATE/U1' => '1',
'WAFER/U4' => '1',
'VI_KLARF_MAP/A' => '/test/klarf.map',
'KLARF_STEPID/A' => 'StepID',
'KLARF_DEVICEID/A' => 'DeviceID',
'KLARF_EQUIPMENTID/A' => 'EquipmentID',
'KLARF_SETUP_ID/A' => 'SetupID',
'RULE_ID/U4' => '1234',
'RULE_FORMULA_EXPRESSION/A' => 'a < b && c > d',
'RULE_FORMULA_TEXT/A' => '1 < 0 && 2 > 3',
'RULE_FORMULA_RESULT/A' => 'FAIL',
'TIMESTAMP/A' => '10-Nov-2020 09:10:11 99999999',
)