How to extract all nested tar.gz and zip to a directory in PHP?

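I need to extract a tar.gz file in PHP. The file contains many JSON files, tar.gz files, zip files, and subdirectories. I only need to move the JSON files to the directory ./Dataset/processing, and then keep extracting the nested tar.gz and zip archives to get all the JSON files out of them as well. These archives can also contain nested folders/directories.

The structure looks like this: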

origin.tar.gz
 ├───sub1.tar.gz
 │   ├───sub2.tar.gz
 │   ├───├───a.json
 │   ├───├───├───├───├───├───...(unknown depth)
 │   ├───b.json
 │   ├───c.json
 ├───sub3.zip
 │   ├───sub4.tar.gz
 │   ├───├───d.json
 │   ├───├───├───├───├───├───...(unknown depth)
 │   ├───e.json
 │   ├───f.json
 ├───subdirectory
 │   ├───g.json
 ├───h.json
 ├───i.json
 |   ..........
 |   ..........
 |   ..........
 |   many of them

After extraction, ./Dataset should look like this:

Dataset/processing
 ├───a.json
 ├───b.json
 ├───c.json
 ├───d.json
 ├───e.json
 ├───f.json
 ├───g.json
 ├───h.json
 ├───i.json
 |   ..........
 |   ..........
 |   ..........
 |   many of them

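I know how to extract a tar.gz with PharData in PHP, but that only works one level deep. I am wondering whether some kind of recursion could make this work for multiple levels of nesting.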

$phar = new PharData('origin.tar.gz');
$phar->extractTo('/full/path'); // extract all files in the tar.gz

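I improved my code a bit and tried the following. It works for multiple levels of nesting, but it fails when there is a directory (a folder or nested folders) that also contains JSON files. Can anyone help me extract those as well?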

<?php

$path = './';

// Extraction of compressed file
function fun($path) {    
    $array = scandir($path); 
    for ($i = 0; $i < count($array); $i++) {
        if($i == 0 OR $i == 1){continue;}
        else {
            $item = $array[$i];
            $fileExt = explode('.', $item);

            // Getting the extension of the file
            $fileActualExt = strtolower(end($fileExt));
            if(($fileActualExt == 'gz') or ($fileActualExt == 'zip')){
                $pathnew = $path.$item; // Dataset ./data1.tar.gz
                $phar = new PharData($pathnew);
                // Moving the files
                $phar->extractTo($path);
                // Del the files
                unlink($pathnew);
                $i=0;
            }
        }
        $array = scandir($path);


    }
}
fun($path);

// Move only the json to ./dataset(I will add it later)
?>

Thanks in advance.

First, extract your tar.gz file as you mentioned:

$phar = new PharData('origin.tar.gz');
$phar->extractTo('/full/path'); // extract all files in the tar.gz

Then read the directory recursively and copy every JSON file to your destination directory. Here is my commented code:

$dirPath='./';       // the root path of your very first extraction of your tar.gz

recursion_readdir($dirPath,1);


function recursion_readdir($dirPath,$Deep=0){
    $resDir=opendir($dirPath);
    //compare with false explicitly, so an entry named "0" does not end the loop
    while(($basename=readdir($resDir))!==false){
        if($basename=='.' OR $basename=='..'){
            continue;
        }
        //current file path
        $path=rtrim($dirPath,'/').'/'.$basename;
        if(is_dir($path)){
            //it is directory, then go deeper (depth+1)
            recursion_readdir($path,$Deep+1);
        }else if(strtolower(pathinfo($basename,PATHINFO_EXTENSION))=='json'){
            //it is a json file, copy it to your destination path
            copy($path, './dest/' . $basename);
        }else if(preg_match('/\.(tar|tar\.gz|tgz)$/i',$basename)){
            //it is a tar or tar.gz file, open it by its full path (not just the basename)
            $phar = new PharData($path);
            //extract into its own sub directory and walk that directory right away,
            //because readdir() may not report entries created while the loop is running
            $target=$path.'.extracted';
            $phar->extractTo($target, null, true);
            recursion_readdir($target,$Deep+1);
        }
    }
    closedir($resDir);
}
function forChar($char='-',$times=0){
  $result='';
  for($i=0;$i<$times;$i++){
     $result.=$char;
  }
  return $result;
}
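
For what it's worth, the "walk everything and pick out the JSON files" part can also be written with PHP's SPL iterators instead of opendir()/readdir(). This is only a rough sketch, not the code from the answer above: the function name collect_json_files() and its two parameters are made up for illustration, and it assumes the destination may live inside the tree being scanned, so files already in the destination are skipped.

```php
<?php
// Hypothetical helper (not part of the answer above): copy every *.json file
// found anywhere under $root into $dest and return the copied file names.
function collect_json_files(string $root, string $dest): array
{
    if (!is_dir($dest) && !mkdir($dest, 0777, true)) {
        throw new RuntimeException("Cannot create $dest");
    }
    $destReal = realpath($dest);
    $copied = [];

    // The default mode of RecursiveIteratorIterator yields leaves only,
    // and SKIP_DOTS drops the "." and ".." entries.
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($it as $file) {
        if (!$file->isFile() || strtolower($file->getExtension()) !== 'json') {
            continue;
        }
        // Assumption: $dest may sit inside $root, so ignore what is already there.
        if (strpos($file->getRealPath(), $destReal . DIRECTORY_SEPARATOR) === 0) {
            continue;
        }
        $target = $destReal . DIRECTORY_SEPARATOR . $file->getFilename();
        if (copy($file->getPathname(), $target)) {
            $copied[] = $file->getFilename();
        }
    }
    sort($copied);
    return $copied;
}
```

Something like collect_json_files('./', './dest') would then be called once all archives have been extracted. Note that two JSON files with the same name in different folders will still overwrite each other in the destination, exactly as with copy() in the answer above.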

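After some research, I solved the problem myself.

There are three functions:

  • recursiveScanProtected(): extracts all the compressed files.
  • scanJSON(): scans for JSON files and moves them to the processing folder.
  • delete_files(): deletes everything except the processing folder, which contains the JSON files, and index.php in the root directory.

<?php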

// Root directory
$path = './';

// Directory where I want to extract the JSON files
$path_json = $path.'processing/';


// Function to extract all the compressed files
function recursiveScanProtected($dir) {
    if($dir != '') {
        $tree = glob(rtrim($dir, '/') . '/*');
        if (is_array($tree)) {
            for ($i = 0; $i < count($tree); $i++) {
                $file = $tree[$i];
                if (is_dir($file)) {
                    recursiveScanProtected($file); // Recursive call if directory
                } elseif (is_file($file)) {

                    $item = $file;
                    $fileExt = explode('.', $item);
                    // Getting the extension of the file
                    $fileActualExt = strtolower(end($fileExt));
                    // Check if the file is a zip or a tar.gz
                    if(($fileActualExt == 'gz') or ($fileActualExt == 'zip')){

                        // Each archive gets its own numbered sub directory
                        $target = rtrim($dir, '/') . '/' . $i;

                        // Open the archive and extract it - Overwriting true
                        $phar = new PharData($item);
                        $phar->extractTo($target, null, true);

                        // Release the archive, then del the compressed file
                        unset($phar);
                        unlink($item);

                        recursiveScanProtected($target); // Recursive call
                    }

                }
            }
        }
    }
}
recursiveScanProtected($path);


// Move the JSON files to processing
function scanJSON($dir, $path_json) {
    if($dir != '') {
        $tree = glob(rtrim($dir, '/') . '/*');
        if (is_array($tree)) {
            foreach($tree as $file) {
                if (is_dir($file)) {
                    // Do not scan processing recursively, but all other directories should be scanned
                    if($file != './processing'){
                        scanJSON($file, $path_json);
                    }
                } elseif (is_file($file)) {

                    $ext = pathinfo($file);

                    // Files without an extension have no 'extension' key
                    if(isset($ext['extension']) && strtolower($ext['extension']) == 'json'){
                        // Move the JSON files to processing
                        rename($file, $path_json.$ext['basename']);
                    }
                }
            }
        }
    }
}

scanJSON($path, $path_json);

/* 
 * php delete function that deals with directories recursively
 * It deletes everything except ./dataset/processing and index.php
 */
function delete_files($target) {

    if(is_dir($target)){
        $files = glob( $target . '*', GLOB_MARK ); //GLOB_MARK adds a slash to directories returned
        foreach( $files as $file ){
            if($file == './processing/' || $file == './index.php'){
                continue;
            } else{
                delete_files( $file );
            }
        }
        if($target != './'){
            rmdir( $target );
        }
    } elseif(is_file($target)) {
        unlink( $target );  
    }
}

delete_files($path);
?>
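
If the recursion gets hard to follow, the extraction step can also be sketched as a simple loop: scan the tree for archives, extract each one into its own directory, delete it, and repeat until a scan finds no archives. This is just a sketch under a few assumptions, not a drop-in replacement: extract_nested_archives() and the ".extracted" directory suffix are names I made up, and the list of extensions in the regular expression is my guess at what needs to be handled.

```php
<?php
// Hypothetical helper: keep extracting archives found under $root until none
// are left, and return how many archives were extracted in total.
function extract_nested_archives(string $root): int
{
    $extracted = 0;
    do {
        // Take a snapshot of the archives first, so the tree is not modified
        // while it is being iterated.
        $archives = [];
        $it = new RecursiveIteratorIterator(
            new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
        );
        foreach ($it as $file) {
            // Assumption: these extensions cover the archives in the data set.
            if ($file->isFile()
                && preg_match('/\.(tar\.gz|tgz|tar|zip)$/i', $file->getFilename())) {
                $archives[] = $file->getPathname();
            }
        }

        foreach ($archives as $archive) {
            // Each archive gets its own directory next to where it was found.
            $target = $archive . '.extracted';
            if (!is_dir($target)) {
                mkdir($target, 0777, true);
            }
            $phar = new PharData($archive);
            $phar->extractTo($target, null, true); // true = overwrite
            unset($phar);                          // release the archive first
            unlink($archive);
            $extracted++;
        }
    } while (!empty($archives)); // newly extracted archives trigger another pass
    return $extracted;
}
```

Because every archive is deleted right after it is extracted, each pass only sees archives that appeared in the previous pass, so the loop ends once the deepest level has been unpacked. After that, scanJSON() from above can move the JSON files as before.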