Strip out a substring beginning with a specific word and ending with "."

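I need to strip a sentence, or rather the information about a product's ingredients, out of the middle of a text. The logic behind it is always the same: the section starts with "Ingredients" and ends with a period ".".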

For example (this is my $prodDesc):

Coca Cola is the most famous soft drink in America.
Ingredients: Carbon water, Sugar (sucrose or high-fructose corn syrup (HFCS) depending on country of origin), Caramel colour (E150d), Phosphoric Acid, Caffeine (34 mg/12 fl oz), natural Flavours. Nutrition Facts: 1 Serving Per Container - Serving Size: 1 Can. Total Fat 0g Sodium 45mg Total Carbohydrate 39g Total Sugars (Includes 39g Added Sugars) Cholesterol 0mg Protein 0g Vitamin D 0g Calcium 0g Iron 0g Potassium 0g

So far I have tried strpos, but because the ingredients sit in the middle of the text, I get everything from "Ingredients" to the end.

I only need this as output:

$prodIngredientsData = "Ingredients: Carbon water, Sugar (sucrose or high-fructose corn syrup (HFCS) depending on country of origin), Caramel colour (E150d), Phosphoric Acid, Caffeine (34 mg/12 fl oz), natural Flavours."

Given that $prodDesc is the description above, my attempt was:

$searchstring = $prodDesc;
$prodIngredientsData = false;
if (strpos($searchstring, "Ingredients") !== false)
{
    $sd_array = explode("Ingredients", $searchstring);
    $sd = end($sd_array);
    $prodIngredientsData = "Ingredients " . $sd;
}
else {
    $prodIngredientsData = false;
}

But as mentioned, I get everything from "Ingredients" to the end of the description. It should stop at the first period, which in the example gives "Ingredients... ...natural Flavours."

Try preg_match:

$prodIngredientsData = "Ingredients: Carbon water, Sugar (sucrose or high-fructose corn syrup (HFCS) depending on country of origin), Caramel colour (E150d), Phosphoric Acid, Caffeine (34 mg/12 fl oz), natural Flavours.";
// [^.]+ matches everything up to, but not including, the first period
preg_match('/(Ingredients:([^.]+))/', $prodIngredientsData, $matches);

echo $matches[0];

Output:

Ingredients: Carbon water, Sugar (sucrose or high-fructose corn syrup (HFCS) depending on country of origin), Caramel colour (E150d), Phosphoric Acid, Caffeine (34 mg/12 fl oz), natural Flavours

You need a regular expression. Something like preg_match('/Ingredients.*?\./', $string, $match);
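
As a minimal sketch of that suggestion, the lazy pattern can be wrapped in a small helper. The function name extractIngredients and the shortened sample text are illustrative assumptions, not part of the question.

```php
<?php
// Hypothetical helper: returns the first substring that starts with
// "Ingredients" and ends at the next period, or false if there is none.
function extractIngredients(string $text)
{
    // .*? is lazy, so the match stops at the first "." after "Ingredients";
    // the s modifier lets "." also match newlines inside the section.
    if (preg_match('/Ingredients.*?\./s', $text, $match) === 1) {
        return $match[0];
    }
    return false;
}

// Shortened sample description (assumed for illustration)
$prodDesc = "Coca Cola is a soft drink. Ingredients: Carbon water, Sugar, natural Flavours. Nutrition Facts: 1 Serving Per Container.";

echo extractIngredients($prodDesc), "\n";
// prints: Ingredients: Carbon water, Sugar, natural Flavours.
```

Returning false when nothing matches mirrors the $prodIngredientsData = false fallback used in the question.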

You can use strpos a second time to find the period, and then cut the string there.

$searchstring = $prodDesc;
$prodIngredientsData = false;
$ingredientsPos = strpos($searchstring, "Ingredients");
if ($ingredientsPos !== false) {
    $prodIngredientsData = substr($searchstring, $ingredientsPos);
    $stopPos = strpos($prodIngredientsData, ".");
    if ($stopPos !== false) {
        $prodIngredientsData = substr(
                    $prodIngredientsData,
                    0,
                    $stopPos + 1);
    }
}
echo $prodIngredientsData;

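You're almost there. $prodIngredientsData holds everything from "Ingredients" onward, so we only need to keep the part up to and including the first ".":

$searchstring = $prodDesc;
if (strpos($searchstring, "Ingredients") !== false)
{
    $sd_array = explode("Ingredients", $searchstring);
    $sd = end($sd_array);
    // No trailing space here: $sd already begins with ": Carbon water, ..."
    $prodIngredientsData = "Ingredients" . $sd;
    // Cut at the first period, if there is one
    $end_pos = strpos($prodIngredientsData, ".");
    if ($end_pos !== false) {
        $prodIngredientsData = substr($prodIngredientsData, 0, $end_pos + 1);
    }
} else {
    $prodIngredientsData = false;
}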

You can use preg_replace for this kind of task.

$strippedString = preg_replace('/Ingredients:[^\.]+\./', '', $prodIngredientsData);

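The regular expression Ingredients:[^\.]+\. matches a string (located anywhere in $prodIngredientsData) that starts with the literal Ingredients:, is followed by any characters other than a dot ([^\.]) occurring at least once (+), and ends with a dot (\.).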

Please note: if the ingredients contain a dot somewhere and then continue, this will only strip part of them.
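
To illustrate that caveat, here is a hedged sketch. The sample text is made up, and the stricter pattern is only one possible workaround: it assumes the closing period is followed by whitespace or the end of the text, so an abbreviation such as "approx. " would still end the match early.

```php
<?php
// Made-up sample: "B.12" contains a dot inside the ingredients list
$text = "Cola. Ingredients: Water, Vitamin B.12, Salt. Nutrition Facts: none.";

// Pattern from the answer: stops at the first dot, so a fragment is left behind
$naive = preg_replace('/Ingredients:[^\.]+\./', '', $text);

// Assumed workaround: lazy match up to a dot followed by whitespace or end of text
$strict = preg_replace('/Ingredients:.*?\.(?=\s|$)/s', '', $text);

echo $naive, "\n";  // prints: Cola. 12, Salt. Nutrition Facts: none.
echo $strict, "\n"; // the whole ingredients sentence is gone
```

Neither pattern trims the space that preceded the removed sentence, so a follow-up cleanup of double spaces may be needed.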

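You can use strpos to search for the start and the end, save the string in between, and then continue searching until the end of the text. Check the demo.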

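$begin_offset = 0;
$result = [];
$string = ""; // assign the product description here
// Find each "Ingredients" and the first "." after it; stop when either is missing
while(false !== ($begin_offset=strpos($string,"Ingredients",$begin_offset)) && false !== ($end_offset=strpos($string,".",$begin_offset))){
    // +1 so that the closing period is part of the result
    $result[] = substr($string,$begin_offset,$end_offset-$begin_offset+1);
    // continue searching after this match
    $begin_offset = $end_offset;
}
var_dump($result);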

Demo result:

array(2) {
  [0]=>
  string(195) "Ingredients: Carbon water, Sugar (sucrose or high-fructose corn syrup (HFCS) depending on country of origin), Caramel colour (E150d), Phosphoric Acid, Caffeine (34 mg/12 fl oz), natural Flavours."
  [1]=>
  string(77) "Ingredients: Carbon water, Sugar (sucrose or high-fructose corn syrup (HFCS)."
}
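
For comparison, the same collect-every-occurrence idea can be sketched with preg_match_all instead of a manual loop. This is an alternative technique, not the code behind the demo above, and the sample text is shortened and assumed for illustration.

```php
<?php
// Shortened sample with two "Ingredients" sections (assumed for illustration)
$string = "Cola. Ingredients: Carbon water, Sugar. Facts: none. Ingredients: Water, Salt. End.";

// Each match starts at "Ingredients" and lazily runs to the next period
preg_match_all('/Ingredients.*?\./s', $string, $matches);

// $matches[0] holds the full matches in order of appearance
$result = $matches[0];
var_dump($result);
```

With this sample, $result contains "Ingredients: Carbon water, Sugar." and "Ingredients: Water, Salt.".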