Searching a file for multiple strings and outputting the data
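How can I search a .tsv file for multiple matching strings and export the matches to a database?
What I want to do is search a large file named mdata.tsv (1.5 million rows) for the strings given in an array, and then output the column data of the matching rows.
This is the code I have so far, and it is where I am stuck: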
<?php
$file = fopen("mdata.tsv","r"); //open file
$movies = glob('./uploads/Videos/*/*/*/*.mp4', GLOB_BRACE); //Find all the movies
$movID = array(); //Array for movies IDs
//Get XML and add the IDs to $movID()
foreach ($movies as $movie){
    $pos = strrpos($movie, '/');
    $xml = simplexml_load_file((substr($movie, 0, $pos + 1) .'movie.xml'));
    array_push($movID, $xml->id);
}
//Loop through the TSV rows and search for the $tmdbID then print out the movies category.
foreach ($movID as $tmdbID) {
    while(($row = fgetcsv($file, 0, "\t")) !== FALSE) {
        fseek($file,0);
        $myString = $row[0];
        $b = strstr( $myString, $tmdbID );
        //Dump out the row for the sake of clarity.
        //var_dump($row);
        $myString = $row[0];
        if ($b == $tmdbID){
            echo 'Match ' . $row[0] .' '. $row[8];
        } // Displays movie ID and category
    }
}
fclose($file);
?>
Sample of the TSV file:
tt0043936 movie The Lawton Story The Lawton Story 0 1949 \N \N Drama,Family
tt0043937 short The Prize Pest The Prize Pest 0 1951 \N 7 Animation,Comedy,Family
tt0043938 movie The Prowler The Prowler 0 1951 \N 92 Drama,Film-Noir,Thriller
tt0043939 movie Przhevalsky Przhevalsky 0 1952 \N \N Biography,Drama
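It looks like you can simplify this code by using in_array() instead of the nested loops: read the file once and check whether the ID of the current row is in the list of wanted IDs. The one change needed for this to work is that the $movID array must hold strings, so cast each ID to a string when you store it.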
$file = fopen("mdata.tsv", "r"); // Open the TSV file
$movies = glob('./uploads/Videos/*/*/*/*.mp4', GLOB_BRACE); // Find all the movies
$movID = array(); // Array for movie IDs

// Read the movie.xml next to each video and add its ID to $movID
foreach ($movies as $movie) {
    $pos = strrpos($movie, '/');
    $xml = simplexml_load_file(substr($movie, 0, $pos + 1) . 'movie.xml');
    // Store ID as string
    $movID[] = (string) $xml->id;
}

// Read the TSV once and check each row's ID against the list
while (($row = fgetcsv($file, 0, "\t")) !== FALSE) {
    if (in_array($row[0], $movID)) {
        echo 'Match ' . $row[0] . ' ' . $row[8]; // Displays movie ID and category
    }
}
fclose($file);
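With 1.5 million rows, the in_array() call scans the whole ID list once per row, which may become slow if the list is long. As a rough sketch of one possible variation (not part of the answer above), the IDs can be turned into array keys with array_flip() so that each row needs only an isset() lookup. The function name findGenresById is hypothetical, and the sketch assumes the IDs are already plain strings and that the category sits in column index 8, as in the sample rows.

```php
<?php
/**
 * Return [id => category] for every TSV row whose first column is in $ids.
 * Hypothetical helper for illustration; $handle is an open, readable stream.
 */
function findGenresById($handle, array $ids): array
{
    // Flip the list so the IDs become array keys. isset() on a key is a
    // hash lookup, whereas in_array() walks the list for every row.
    $wanted = array_flip($ids);
    $matches = [];

    // Enclosure and escape are passed explicitly (these are fgetcsv's defaults).
    while (($row = fgetcsv($handle, 0, "\t", '"', '\\')) !== false) {
        // Skip blank or short lines that lack the category column (index 8).
        if (!isset($row[0], $row[8])) {
            continue;
        }
        if (isset($wanted[$row[0]])) {
            $matches[$row[0]] = $row[8];
        }
    }
    return $matches;
}
```

Passing the $file handle and the $movID array from the code above to findGenresById() would give an array that can be looped over to echo each match or to build the rows for a database insert.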