Looking for a better way to process a large XML file with Laravel and import portions into a MySQL database
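I have a fairly large (~65MB) XML file with nearly 1 million lines, and I'm using Laravel to parse and process its contents, then insert the new data into a MySQL database.
It's a music library that I update regularly, and the software I use generates this XML file.
The code itself works fine, but it takes a very long time: over 30 minutes to process about 50,000 records! I'm looking for a way to speed this up. If it helps, I'm using Laravel 6 on an Ubuntu server running Apache.
I basically read through the XML file, extract what I need, clean the data up a bit, and insert it into my database. Here is the relevant part of my code. Can anyone suggest a better way to make this more efficient? I'm not a Laravel expert, so any feedback is welcome.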
use App\Music;
Music::truncate(); //clear existing data
\DB::disableQueryLog(); //helps speed up queries by disabling log
ini_set('memory_limit', '512M'); //boost memory limit
ini_set('max_execution_time', '90'); //try to prevent time-out
//list of files to import (I sometimes have more than 1):
$files = [
    "path/to/my/database.xml",
    "path/to/my/database2.xml"
];
$video_files = ["mp4","mov","avi","flv"]; //used to identify music videos
foreach($files as $file){
    $reader = new XMLReader();
    if(!$reader->open($file)){
        die("Failed to open xml file!");
    }
    $doc = new DOMDocument;
    while ($reader->read() && $reader->name !== 'Song');
    while ($reader->name === 'Song'){
        $song = simplexml_import_dom($doc->importNode($reader->expand(), true));
        if(strpos($song['FilePath'], 'netsearch://') === false && strpos($song['FilePath'], ':/DJ Tools/') === false){
            $music = new Music; //create new instance of model
            foreach ($song->Tags as $tag){
                if(($tag['Author'] != "" || $tag['Title'] != "") && ($tag['Grouping'] != "Studio")){
                    $insert = true; //insert record or not
                    foreach($song->Infos as $info){
                        $music->length = gmdate("H:i:s",floatval($info['SongLength']));
                        $music->file_date = date("Y-m-d",intval($info['FirstSeen']));
                    }
                    if($insert == true){
                        $music->bpm = ($tag['Bpm'] > 0) ? round(1 / floatval($tag['Bpm']) * 60) : null; //to calculate use 1/bpm * 60 and round
                        $music->file_path = $song['FilePath'];
                        $music->artist = trim($tag['Author']);
                        $music->title = trim($tag['Title']);
                        $music->remix = trim($tag['Remix']);
                        $music->album = trim($tag['Album']);
                        $music->genre = trim($tag['Genre']);
                        $music->filetype = substr($song['FilePath'],-3);
                        $music->year = ($tag['Year'] > 0) ? intval($tag['Year']) : null;
                        //set the kind (audio, video or karaoke):
                        if(strpos($song['FilePath'], '/Karaoke/') !== false){
                            $kind = "karaoke";
                        }
                        elseif(in_array(strtolower(substr($song['FilePath'],-3)),$video_files)){
                            $kind = "video";
                        }
                        else{
                            $kind = "audio";
                        }
                        $music->kind = $kind;
                        $music->save(); //adds song to mysql
                    }//end if insert true
                } //end has title or author + non-studio
            } //end for each tag
        } //end not a netsearch file
        $reader->next('Song');
    } //end while
    $reader->close();
} //end for each files
The XML file is structured like this:
<Song FilePath="D:/Path/To/Music/Michael Jackson/The Ultimate Collection/2-03 Thriller.mp3" FileSize="12974048">
    <Tags Author="Michael Jackson" Title="Thriller" Genre="Pop" Album="The Ultimate Collection" Composer="Rod Temperton" TrackNumber="3/11" Grouping="Halloween" Year="2004" Bpm="0.504202" Key="G#m" Flag="1" />
    <Infos SongLength="356.960363" FirstSeen="1501430558" Bitrate="282" Cover="1" />
    <Comment>Great for parties</Comment>
    <Scan Version="801" Bpm="0.506077" AltBpm="0.379569" Volume="1.101067" Key="G#m" Flag="32768" />
    <Poi Pos="17.171541" Type="beatgrid" />
    <Poi Pos="0.634195" Type="automix" Point="realStart" />
    <Poi Pos="356.051882" Type="automix" Point="realEnd" />
    <Poi Pos="17.30" Type="automix" Point="fadeStart" />
    <Poi Pos="352.750" Type="automix" Point="fadeEnd" />
    <Poi Pos="41.695057" Type="automix" Point="cutStart" />
    <Poi Pos="343.074830" Type="automix" Point="cutEnd" />
    <Poi Pos="44.289569" Type="automix" Point="tempoStart" />
    <Poi Pos="298.550091" Type="automix" Point="tempoEnd" />
</Song>
<Song FilePath="D:/Path/To/Music/Black Sabbath/We Sold Our Soul for Rock 'n' Roll/09 Sweet Leaf.m4a" FileSize="10799807">
    <Tags Author="Black Sabbath" Title="Sweet Leaf" Genre="Heavy Metal" Album="We Sold Our Soul For Rock 'n' Roll" Composer="Geezer Butler" TrackNumber="9/14" Year="1987" Key="Am" Flag="1" />
    <Infos SongLength="306.456961" FirstSeen="1501430556" Bitrate="259" Cover="1" />
    <Scan Version="801" Bpm="0.411757" AltBpm="0.617438" Volume="0.680230" Key="Am" Flag="32768" />
    <Poi Pos="1.753537" Type="beatgrid" />
    <Poi Pos="0.220590" Type="automix" Point="realStart" />
    <Poi Pos="301.146848" Type="automix" Point="realEnd" />
    <Poi Pos="0.30" Type="automix" Point="fadeStart" />
    <Poi Pos="291.50" Type="automix" Point="fadeEnd" />
</Song>
...tens of thousands of more songs, nearly 1 million lines
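If you create an instance and insert a record on every loop iteration, each iteration creates one Music instance and runs one insert query, which is inefficient. What if you first saved the data to an array, chunked it, and then saved it to the database?
For example, say you have 1,000 music records. If you create a Music instance on every iteration, that's 1,000 instances and 1,000 database insert operations. But if you first save the music data to an array and then chunk it into 20 arrays (each holding 50 records), it will only perform 20 insert operations. Rather more efficient, isn't it?
So your code would look like this: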
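To make that arithmetic concrete, here is a minimal standalone sketch using PHP's built-in `array_chunk()`. The `$rows` data is a made-up stand-in for the parsed songs, not something taken from the real XML file.

```php
<?php
// Hypothetical stand-in data: 1,000 rows shaped like the records being imported.
$rows = [];
for ($i = 1; $i <= 1000; $i++) {
    $rows[] = ['title' => "Song $i"];
}

// Split into chunks of 50 rows; each chunk would become one multi-row INSERT.
$chunks = array_chunk($rows, 50);

echo count($chunks), " insert statements instead of ", count($rows), "\n";
// prints "20 insert statements instead of 1000"
```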
<?php
use App\Music;
Music::truncate(); //clear existing data
\DB::disableQueryLog(); //helps speed up queries by disabling log
ini_set('memory_limit', '512M'); //boost memory limit
ini_set('max_execution_time', '90'); //try to prevent time-out
//list of files to import (I sometimes have more than 1):
$files = [
    "path/to/my/database.xml",
    "path/to/my/database2.xml"
];
$video_files = ["mp4","mov","avi","flv"]; //used to identify music videos
//declare array of music here
$arrayOfMusic = [];
foreach($files as $file){
    $reader = new XMLReader();
    if(!$reader->open($file)){
        die("Failed to open xml file!");
    }
    $doc = new DOMDocument;
    while ($reader->read() && $reader->name !== 'Song');
    while ($reader->name === 'Song') {
        $song = simplexml_import_dom($doc->importNode($reader->expand(), true));
        if(strpos($song['FilePath'], 'netsearch://') === false && strpos($song['FilePath'], ':/DJ Tools/') === false) {
            foreach ($song->Tags as $tag) {
                if (($tag['Author'] != "" || $tag['Title'] != "") && ($tag['Grouping'] != "Studio")) {
                    $insert = true; //insert record or not
                    $length = $file_date = null; //reset so values from a previous song are not reused
                    foreach ($song->Infos as $info) {
                        $length = gmdate("H:i:s",floatval($info['SongLength']));
                        $file_date = date("Y-m-d",intval($info['FirstSeen']));
                    }
                    if($insert == true){
                        //set the kind (audio, video or karaoke):
                        if(strpos($song['FilePath'], '/Karaoke/') !== false){
                            $kind = "karaoke";
                        } elseif (in_array(strtolower(substr($song['FilePath'],-3)),$video_files)) {
                            $kind = "video";
                        } else{
                            $kind = "audio";
                        }
                        //Fill array of music
                        $arrayOfMusic[] = [
                            'bpm' => ($tag['Bpm'] > 0) ? round(1 / floatval($tag['Bpm']) * 60) : null, //to calculate use 1/bpm * 60 and round
                            'file_path' => (string) $song['FilePath'], //cast the SimpleXMLElement attribute to a plain string
                            'artist' => trim($tag['Author']),
                            'length' => $length ?? '0', //set $length to 0 if it cannot be found
                            'file_date' => $file_date ?? '0', //set $file_date to 0 if it cannot be found
                            'title' => trim($tag['Title']),
                            'remix' => trim($tag['Remix']),
                            'album' => trim($tag['Album']),
                            'genre' => trim($tag['Genre']),
                            'filetype' => substr($song['FilePath'],-3),
                            'year' => ($tag['Year'] > 0) ? intval($tag['Year']) : null,
                            'kind' => $kind,
                        ];
                    }//end if insert true
                } //end has title or author + non-studio
            } //end for each tag
        } //end not a netsearch file
        $reader->next('Song');
    } //end while
    $reader->close();
} //end for each files
//Chunk the array if $arrayOfMusic is not empty
if (!empty($arrayOfMusic)) {
    $arrayOfMusicChunked = array_chunk($arrayOfMusic, 30); //split the large array into chunks of up to 30 rows each
    //loop over the chunks and save each one with a single insert() call
    //note: insert() bypasses Eloquent model events and does not fill created_at/updated_at
    foreach ($arrayOfMusicChunked as $arrayOfMusicToSave) {
        Music::insert($arrayOfMusicToSave);
    }
}
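A possible variation, if holding every row in memory until the end becomes a concern: flush each batch as soon as it fills up while you are still reading the file. The sketch below is only an illustration under assumptions. `insertInBatches()` and `fakeSongs()` are hypothetical names, the generator stands in for the XMLReader loop, and the callback only counts rows where the real import would call `Music::insert($batch)`.

```php
<?php
/**
 * Hypothetical helper (not part of Laravel): collects rows from any iterable
 * and passes them to $flush in batches of at most $batchSize rows.
 * Returns the number of batches flushed.
 */
function insertInBatches(iterable $rows, int $batchSize, callable $flush): int
{
    $buffer = [];
    $batches = 0;
    foreach ($rows as $row) {
        $buffer[] = $row;
        if (count($buffer) === $batchSize) {
            $flush($buffer);   // in the Laravel app this could be Music::insert($buffer)
            $buffer = [];
            $batches++;
        }
    }
    if ($buffer !== []) {      // flush the final, partially filled batch
        $flush($buffer);
        $batches++;
    }
    return $batches;
}

// Stand-in for the XMLReader loop: a generator that yields one row per song.
function fakeSongs(int $count): Generator
{
    for ($i = 1; $i <= $count; $i++) {
        yield ['title' => "Song $i", 'kind' => 'audio'];
    }
}

$inserted = 0;
$batches = insertInBatches(fakeSongs(1050), 500, function (array $batch) use (&$inserted) {
    $inserted += count($batch); // a real callback would run the INSERT here
});

echo "$batches batches, $inserted rows\n";
// prints "3 batches, 1050 rows"
```

The batch size of 500 is arbitrary here; the right value depends on your row size and MySQL settings, so it is worth trying a few.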