Looking for a better way to process a large XML file with Laravel and import parts of it into a MySQL database

Joey R.

I have a fairly large (~65MB) XML file with nearly 1 million lines. I am using Laravel to parse and process its contents and then insert the new data into a MySQL database.

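It is a music library that I update regularly, and the software I use generates this XML file.
The code itself works fine, but it takes a very long time: more than 30 minutes to process about 50,000 records! I am looking for a way to speed this up. In case it helps, I am using Laravel 6 on an Ubuntu server running Apache.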

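I basically read the XML file, extract what I need, clean up the data a little, and insert it into my database. Here is the relevant part of my code. Can anyone suggest a more efficient way to do this? I am not a Laravel expert, so any feedback would be great.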

                use App\Music;
                Music::truncate(); //clear existing data
                \DB::disableQueryLog(); //helps speed up queries by disabling log
                ini_set('memory_limit', '512M'); //boost memory limit
                ini_set('max_execution_time', '90'); //try to prevent time-out

                //list of files to import (I sometimes have more than 1):
                $files = [                    
                    "path/to/my/database.xml",
                    "path/to/my/database2.xml"
                ];
                $video_files = ["mp4","mov","avi","flv"]; //used to identify music videos

                foreach($files as $file){
                    $reader = new XMLReader();
                    if(!$reader->open($file)){
                        die("Failed to open xml file!");
                    }
                    $doc = new DOMDocument;
                    while ($reader->read() && $reader->name !== 'Song');

                    while ($reader->name === 'Song'){
                        $song = simplexml_import_dom($doc->importNode($reader->expand(), true));
                        if(strpos($song['FilePath'], 'netsearch://') === false && strpos($song['FilePath'], ':/DJ Tools/') === false){

                            $music = new Music;  //create new instance of model

                            foreach ($song->Tags as $tag){
                                if(($tag['Author'] != "" || $tag['Title'] != "") && ($tag['Grouping'] != "Studio")){

                                    $insert = true; //insert record or not

                                    foreach($song->Infos as $info){
                                        $music->length = gmdate("H:i:s",floatval($info['SongLength']));
                                        $music->file_date = date("Y-m-d",intval($info['FirstSeen']));
                                    }

                                    if($insert == true){
                                        $music->bpm = ($tag['Bpm'] > 0) ? round(1 / floatval($tag['Bpm']) * 60) : null; //to calculate use 1/bpm * 60 and round
                                        $music->file_path = $song['FilePath'];
                                        $music->artist = trim($tag['Author']);
                                        $music->title = trim($tag['Title']);
                                        $music->remix = trim($tag['Remix']);
                                        $music->album = trim($tag['Album']);
                                        $music->genre = trim($tag['Genre']);
                                        $music->filetype = substr($song['FilePath'],-3);
                                        $music->year = ($tag['Year'] > 0) ? intval($tag['Year']) : null;

                                        //set the kind (audio, video or karaoke):
                                        if(strpos($song['FilePath'], '/Karaoke/') !== false){
                                            $kind = "karaoke";
                                        }
                                        
                                        elseif(in_array(strtolower(substr($song['FilePath'],-3)),$video_files)){
                                            $kind = "video";
                                        }
                                        else{
                                            $kind = "audio";
                                        }
                                        $music->kind = $kind;
                                        
                                        $music->save();  //adds song to mysql

                                    }//end if insert true
                                } //end has title or author + non-studio
                            } //end for each tag

                        } //end not a netsearch file
                        $reader->next('Song');
                    } //end while
                    $reader->close();
                } //end for each files

The structure of the XML file looks like this:

 <Song FilePath="D:/Path/To/Music/Michael Jackson/The Ultimate Collection/2-03 Thriller.mp3" FileSize="12974048">
  <Tags Author="Michael Jackson" Title="Thriller" Genre="Pop" Album="The Ultimate Collection" Composer="Rod Temperton" TrackNumber="3/11" Grouping="Halloween" Year="2004" Bpm="0.504202" Key="G#m" Flag="1" />
  <Infos SongLength="356.960363" FirstSeen="1501430558" Bitrate="282" Cover="1" />
  <Comment>Great for parties</Comment>
  <Scan Version="801" Bpm="0.506077" AltBpm="0.379569" Volume="1.101067" Key="G#m" Flag="32768" />
  <Poi Pos="17.171541" Type="beatgrid" />
  <Poi Pos="0.634195" Type="automix" Point="realStart" />
  <Poi Pos="356.051882" Type="automix" Point="realEnd" />
  <Poi Pos="17.30" Type="automix" Point="fadeStart" />
  <Poi Pos="352.750" Type="automix" Point="fadeEnd" />
  <Poi Pos="41.695057" Type="automix" Point="cutStart" />
  <Poi Pos="343.074830" Type="automix" Point="cutEnd" />
  <Poi Pos="44.289569" Type="automix" Point="tempoStart" />
  <Poi Pos="298.550091" Type="automix" Point="tempoEnd" />
 </Song>
 <Song FilePath="D:/Path/To/Music/Black Sabbath/We Sold Our Soul for Rock &apos;n&apos; Roll/09 Sweet Leaf.m4a" FileSize="10799807">
  <Tags Author="Black Sabbath" Title="Sweet Leaf" Genre="Heavy Metal" Album="We Sold Our Soul For Rock &apos;n&apos; Roll" Composer="Geezer Butler" TrackNumber="9/14" Year="1987" Key="Am" Flag="1" />
  <Infos SongLength="306.456961" FirstSeen="1501430556" Bitrate="259" Cover="1" />
  <Scan Version="801" Bpm="0.411757" AltBpm="0.617438" Volume="0.680230" Key="Am" Flag="32768" />
  <Poi Pos="1.753537" Type="beatgrid" />
  <Poi Pos="0.220590" Type="automix" Point="realStart" />
  <Poi Pos="301.146848" Type="automix" Point="realEnd" />
  <Poi Pos="0.30" Type="automix" Point="fadeStart" />
  <Poi Pos="291.50" Type="automix" Point="fadeEnd" />
 </Song>
 ...tens of thousands of more songs, nearly 1 million lines

Aulia Wiguna

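If you create a model instance and insert one record on every loop iteration, each iteration creates a Music instance and runs one insert query, which is inefficient. What if you collected the data into an array first, split it into chunks, and then saved the chunks to the database?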

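For example, suppose you have 1000 songs. If you create a Music instance on every iteration, you instantiate the model 1000 times and run 1000 insert operations. If you instead collect the songs into an array and split it into 20 chunks of 50 songs each, you run only 20 insert operations, which is far more efficient.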
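
To make that arithmetic concrete, here is a minimal sketch in plain PHP. The `$rows` array is made-up placeholder data standing in for the parsed songs, not anything from the real import.

```php
<?php
// 1000 placeholder rows standing in for parsed songs (hypothetical data).
$rows = [];
for ($i = 1; $i <= 1000; $i++) {
    $rows[] = ['title' => "Song $i"];
}

// Split the rows into batches of 50.
$batches = array_chunk($rows, 50);

// Each batch would become one multi-row insert instead of 50 single-row ones.
echo count($batches), " batches of ", count($batches[0]), " rows\n";
// prints "20 batches of 50 rows"
```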

So your code would look like this:

<?php

use App\Music;
Music::truncate(); //clear existing data
\DB::disableQueryLog(); //helps speed up queries by disabling log
ini_set('memory_limit', '512M'); //boost memory limit
ini_set('max_execution_time', '90'); //try to prevent time-out

//list of files to import (I sometimes have more than 1):
$files = [                    
    "path/to/my/database.xml",
    "path/to/my/database2.xml"
];
$video_files = ["mp4","mov","avi","flv"]; //used to identify music videos

//declare array of music here
$arrayOfMusic = [];

foreach($files as $file){
    $reader = new XMLReader();
    if(!$reader->open($file)){
        die("Failed to open xml file!");
    }
    $doc = new DOMDocument;
    while ($reader->read() && $reader->name !== 'Song');

    while ($reader->name === 'Song') {
        $song = simplexml_import_dom($doc->importNode($reader->expand(), true));
        if(strpos($song['FilePath'], 'netsearch://') === false && strpos($song['FilePath'], ':/DJ Tools/') === false) {

            foreach ($song->Tags as $tag) {
                if (($tag['Author'] != "" || $tag['Title'] != "") && ($tag['Grouping'] != "Studio")) {

                    $insert = true; //insert record or not

                    //reset per song so values from a previous song are not reused
                    $length = $file_date = null;
                    foreach ($song->Infos as $info) {
                        $length = gmdate("H:i:s",floatval($info['SongLength']));
                        $file_date = date("Y-m-d",intval($info['FirstSeen']));
                    }

                    if($insert == true){
                        //set the kind (audio, video or karaoke):
                        if(strpos($song['FilePath'], '/Karaoke/') !== false){
                            $kind = "karaoke";
                        } elseif (in_array(strtolower(substr($song['FilePath'],-3)),$video_files)) {
                            $kind = "video";
                        } else{
                            $kind = "audio";
                        }

                        //Fill array of music
                        $arrayOfMusic[] = [
                            'bpm'       => ($tag['Bpm'] > 0) ? round(1 / floatval($tag['Bpm']) * 60) : null, //to calculate use 1/bpm * 60 and round
                            'file_path' => (string) $song['FilePath'], //cast the SimpleXMLElement attribute to a plain string
                            'artist'    => trim($tag['Author']),
                            'length'    => $length ?? '0', //set $length to 0 if it cannot be found
                            'file_date' => $file_date ?? '0', //set $file_date to 0 if it cannot be found
                            'title'     => trim($tag['Title']),
                            'remix'     => trim($tag['Remix']),
                            'album'     => trim($tag['Album']),
                            'genre'     => trim($tag['Genre']),
                            'filetype'  => substr($song['FilePath'],-3),
                            'year'      => ($tag['Year'] > 0) ? intval($tag['Year']) : null,
                            'kind'      => $kind,
                        ];
                    }//end if insert true
                } //end has title or author + non-studio
            } //end for each tag

        } //end not a netsearch file
        $reader->next('Song');
    } //end while
    $reader->close();
} //end for each files

//Chunk the array if $arrayOfMusic is not empty
if (!empty($arrayOfMusic)) {
    $arrayOfMusicChunked = array_chunk($arrayOfMusic, 30); //split the large array into chunks of 30 rows each

    //loop over the chunks and save each one with a single insert() call
    foreach ($arrayOfMusicChunked as $arrayOfMusicToSave) {
        Music::insert($arrayOfMusicToSave);
    }
}
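
The code above keeps every row in memory until parsing has finished. If that becomes a problem with a file this size, one possible variation is to flush the buffer whenever it reaches the batch size. The sketch below is plain PHP and only illustrates the idea: `insertInBatches` is a hypothetical helper name, and the `$insert` callback stands in for whatever actually writes the rows.

```php
<?php
// Hypothetical helper: buffers rows and passes them to $insert in batches
// of at most $size rows. Returns the number of batches that were written.
function insertInBatches(iterable $rows, int $size, callable $insert): int
{
    $buffer = [];
    $batches = 0;

    foreach ($rows as $row) {
        $buffer[] = $row;

        // Flush as soon as the buffer is full so it never grows past $size.
        if (count($buffer) === $size) {
            $insert($buffer);
            $buffer = [];
            $batches++;
        }
    }

    // Write whatever is left over after the last full batch.
    if ($buffer !== []) {
        $insert($buffer);
        $batches++;
    }

    return $batches;
}
```

In the import, the callback could be something like `function (array $rows) { Music::insert($rows); }`, with the rows coming from the XML loop instead of one big `$arrayOfMusic` array.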

Source

  1. https://www.w3schools.com/php/func_array_chunk.asp
