Manipulate massive array of strings
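I received a DVD from one of our suppliers containing more than 12K images. Before putting them on our web server, I need to resize, rename, and copy them.
To do this, I'm writing a PHP CLI program.
And it seems I'm a bit stuck with it...
All files follow a specific pattern.
Copying and renaming is not the problem; the string manipulation is.
To simplify the example code: let's say I have an array of strings, and I want to put them into a new array.
The original array looks like this: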
$names = array (
    'FIX1_VARA_000.1111_FIX2',
    'FIX1_VARB_000.1111.2_FIX2',
    'FIX1_VARB_222.2582_FIX2',
    'FIX1_VARC_555.8794_FIX2',
    'FIX1_VARD_111.0X00(2-5)_FIX2',
    'FIX1_VARA_112.01XX(09-13)_FIX2',
    'FIX1_VARB_444.XXX1(203-207).2_FIX2'
);
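Each string in this array starts with the same fixed part and ends with the same fixed part: FIX1 at the front and FIX2 at the end.
After FIX1 there is always an underscore, followed by a variable part, followed by another underscore. I'm not interested in the fixed parts or the variable part, so I cut them all away.
The remaining string can be one of two types:
If it contains only digits and dots, it is a valid string and I put it in the $clean array. For example: 000.1111 or 000.111.2
If the string contains something other than digits and dots, it always has one or more X's and a pair of parentheses enclosing two numbers separated by a -.
Like 444.XXX1(203-207).2
The numbers in the parentheses define a sequence, and each number in that sequence needs to replace the X's. The strings that should be put in the $clean array are: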
444.2031.2
444.2041.2
444.2051.2
444.2061.2
444.2071.2
This is the part I'm struggling with.
$clean = array();
foreach ($names as $name){
    $item = trim(strstr(str_replace(array('FIX1_', '_FIX2'),'',$name), '_'),'_');
    // $item gets the values:
    /*
     * 000.1111,
     * 000.1111.2,
     * 222.2582,
     * 555.8794,
     * 111.0X00(2-5),
     * 112.01XX(09-13),
     * 444.XXX1(203-207).2
     *
     */
    // If an item has no X in it, it can be put in the $clean array
    if (strpos($item,'X') === false){
        // this is true for the first 4 array values in the example
        $clean[] = $item;
    }
    else {
        // this is for the last 3 array values in the example
        $b = strpos($item,'(');
        $e = strpos($item,')');
        $sequence = substr($item,$b,$e-$b+1);
        $item = str_replace($sequence,'',$item);
        /* This is the part where I'm stuck */
        /* -------------------------------- */
        /* it should get the values in the sequence variable and iterate over them:
         *
         * So for $names[5] ('FIX1_VARA_112.01XX(09-13)_FIX2') I want the following values entered into the $clean array:
         * Value of $sequence = '(09-13)'
         *
         * 112.0109
         * 112.0110
         * 112.0111
         * 112.0112
         * 112.0113
         *
         */
    }
}
// NOW ECHO ALL VALUES IN $clean:
foreach ($clean as $c){
    echo $c . "\n";
}
The final output should be:
000.1111
000.1111.2
222.2582
555.8794
111.0200
111.0300
111.0400
111.0500
112.0109
112.0110
112.0111
112.0112
112.0113
444.2031.2
444.2041.2
444.2051.2
444.2061.2
444.2071.2
Any help with the "Here I'm stuck" part would be greatly appreciated.
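First, I assume all your files match a valid pattern, so none of them is malformed; otherwise, just add safety checks...
In $sequence you get (09-13). To work with the numbers, you have to remove the ( and the ), so create another variable: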
// $item no longer contains the parentheses at this point, so strip them from $sequence instead
$range = substr($sequence, 1, -1);
// you get '09-13'
Then you need to split it:
list($min, $max) = explode("-",$range);
// $min = '09', $max = '13'
$nbDigits = strlen($max);
// $nbDigits = 2
Then you need all the numbers from the minimum to the maximum:
$numbersList = array();
$min = (int)$min; // $min becomes 9, instead of '09'
$max = (int)$max;
for ($i = $min; $i <= $max; $i++) {
    // set a number, including leading zeros
    $numbersList[] = str_pad((string)$i, $nbDigits, '0', STR_PAD_LEFT);
}
Then you have to generate the file names with those numbers:
$xPlace = strpos($item,'X');
foreach($numbersList as $number) {
    $filename = $item;
    for($i=0; $i<$nbDigits; $i++) {
        // replacing one digit at a time, to replace each 'X'
        $filename[$xPlace+$i] = $number[$i];
    }
    $clean[] = $filename;
}
It should do the job. There may be a few bugs, but it's a good start, so give it a try :)
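For reference, the steps above can be pulled together into one function. This is only a sketch under a few assumptions: the name expandItem is made up for illustration, each item is assumed to contain a single contiguous run of X's and a single (min-max) group, and the width of the maximum is assumed to match the number of X's. It uses substr_replace instead of the character-by-character loop, which does the same replacement in one call.

```php
<?php
// Hypothetical helper (not from the question): expands one item such as
// '112.01XX(09-13)' into the list of strings it stands for.
// Assumes one run of X's and one "(min-max)" group per item.
function expandItem(string $item): array
{
    $b = strpos($item, '(');
    $e = strpos($item, ')');
    if ($b === false || $e === false || $e < $b) {
        return [$item]; // nothing to expand
    }

    // Text between the parentheses, e.g. '09-13'
    $range = substr($item, $b + 1, $e - $b - 1);
    // The item without the "(min-max)" group, e.g. '112.01XX'
    $template = substr($item, 0, $b) . substr($item, $e + 1);

    $xPlace = strpos($template, 'X');
    $parts = explode('-', $range);
    if ($xPlace === false || count($parts) !== 2) {
        return [$item]; // safety condition: pattern not as expected
    }

    list($min, $max) = $parts;
    $nbDigits = strlen($max); // width used for the leading zeros

    $result = [];
    for ($i = (int)$min; $i <= (int)$max; $i++) {
        $number = str_pad((string)$i, $nbDigits, '0', STR_PAD_LEFT);
        // Overwrite the X's with the padded number
        $result[] = substr_replace($template, $number, $xPlace, $nbDigits);
    }
    return $result;
}

foreach (expandItem('444.XXX1(203-207).2') as $line) {
    echo $line, "\n";
}
```

Items without parentheses come back unchanged, so the function can be called on every $item in the loop and the results merged into $clean.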
Like @stdob-- mentioned, regex really is what you want here. Here's a working version of the code:
$names = array (
    'FIX1_VARA_000.1111_FIX2',
    'FIX1_VARB_000.1111.2_FIX2',
    'FIX1_VARB_222.2582_FIX2',
    'FIX1_VARC_555.8794_FIX2',
    'FIX1_VARD_111.0X00(2-5)_FIX2',
    'FIX1_VARA_112.01XX(09-13)_FIX2',
    'FIX1_VARB_444.XXX1(203-207).2_FIX2'
);
$clean = array();
foreach ($names as $name){
    $item = trim(strstr(str_replace(array('FIX1_', '_FIX2'),'',$name), '_'),'_');
    // $item gets the values:
    /*
     * 000.1111,
     * 000.1111.2,
     * 222.2582,
     * 555.8794,
     * 111.0X00(2-5),
     * 112.01XX(09-13),
     * 444.XXX1(203-207).2
     *
     */
    // If an item has no X in it, it can be put in the $clean array
    if (strpos($item,'X') === false){
        // this is true for the first 4 array values in the example
        $clean[] = $item;
    }
    else {
        // Initialize the empty matches array (I prefer [] to array(), but pick your poison)
        $matches = [];
        // Check out: https://www.regex101.com/r/qG4jS4/1 to see visually how this works (also, regex101.com is just rad)
        // This uses capture groups, which get stored in the $matches array.
        preg_match('/\((\d*)-(\d*)\)/', $item, $matches);
        // Now we've got the array of values that we want to have in our clean array
        $range = range($matches[1], $matches[2]);
        // Since preg_match has our parentheses and digits grabbed for us, get rid of those from the string
        $item = str_replace($matches[0],'',$item);
        // Truly regrettable variable names, but you get the idea!
        foreach($range as $number){
            // Here's where it gets ugly. You're wanting the numbers to work like strings (have consistent length
            // like 09 and 13) but also work like numbers (when you create a sequence of numbers). That kind of
            // thinking begets hackery. This probably isn't your fault, but it seems helpful to point out.
            // Anyways, we can use the number of X's in the string to figure out how many characters we ought
            // to be adding. This is important because otherwise we'll end up with 112.019 instead of 112.0109.
            // PHP casts that '09' to (int) 9 when we run the range() function, so we lose the leading zero.
            $xCount = substr_count($item, 'X');
            if($xCount > strlen($number)){
                // This function adds a given number ($xCount, in our case) of a character ('0') to
                // the end of a string (unless it's given the STR_PAD_LEFT flag, in which case it adds
                // the padding to the left side)
                $number = str_pad($number, $xCount, '0', STR_PAD_LEFT);
            }
            // With a quick cheat by padding an empty string with the same number of X's we counted earlier...
            $xString = str_pad('', $xCount, 'X');
            // Now we can add the fixed string into the clean array.
            $clean[] = str_replace($xString, $number, $item);
        }
    }
}
// I also happen to prefer var_dump to echo, but again, your mileage may vary.
var_dump($clean);
It outputs:
array (size=18)
0 => string '000.1111' (length=8)
1 => string '000.1111.2' (length=10)
2 => string '222.2582' (length=8)
3 => string '555.8794' (length=8)
4 => string '111.0200' (length=8)
5 => string '111.0300' (length=8)
6 => string '111.0400' (length=8)
7 => string '111.0500' (length=8)
8 => string '112.0109' (length=8)
9 => string '112.0110' (length=8)
10 => string '112.0111' (length=8)
11 => string '112.0112' (length=8)
12 => string '112.0113' (length=8)
13 => string '444.2031.2' (length=10)
14 => string '444.2041.2' (length=10)
15 => string '444.2051.2' (length=10)
16 => string '444.2061.2' (length=10)
17 => string '444.2071.2' (length=10)
--EDIT-- Removed my warning about strpos and ==; it looks like someone already pointed that out in the comments.
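If you'd rather let a single regex do all the slicing, here is one possible variant. It is only a sketch: the function name expandWithRegex and the pattern are my own assumptions, and it expects one contiguous run of X's followed later by one (min-max) group. The padding width comes from the number of X's, and sprintf with a %0Nd format restores the leading zeros that are lost when the bounds are cast to integers.

```php
<?php
// Hypothetical helper: splits the item into prefix, X run, middle,
// range bounds and suffix with one pattern, then rebuilds each string.
// Assumes one run of X's and one "(min-max)" group per item.
function expandWithRegex(string $item): array
{
    $pattern = '/^(.*?)(X+)(.*?)\((\d+)-(\d+)\)(.*)$/';
    if (!preg_match($pattern, $item, $m)) {
        return [$item]; // no X run with a range: keep the item as it is
    }
    list(, $prefix, $xRun, $middle, $min, $max, $suffix) = $m;

    $width = strlen($xRun); // number of X's = digits to write
    $result = [];
    foreach (range((int)$min, (int)$max) as $n) {
        // e.g. sprintf('%02d', 9) gives '09'
        $result[] = $prefix . sprintf('%0' . $width . 'd', $n) . $middle . $suffix;
    }
    return $result;
}

$items = ['000.1111', '111.0X00(2-5)', '444.XXX1(203-207).2'];
$clean = [];
foreach ($items as $item) {
    $clean = array_merge($clean, expandWithRegex($item));
}
echo implode("\n", $clean), "\n";
```

Note that sprintf never truncates, so a bound with more digits than there are X's would make the result longer rather than raise an error; add a check if your file names can contain that case.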