How to extract all nested tar.gz and zip archives to a directory in PHP?
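I need to extract a tar.gz file in PHP. The file contains many JSON files, tar.gz files, zip files, and subdirectories. I only need to move the JSON files into the directory ./Dataset/processing, and keep extracting the nested tar.gz and zip archives to get all the JSON files out of them. Those archives can also contain nested folders/directories.
The structure looks like this: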
origin.tar.gz
├───sub1.tar.gz
│ ├───sub2.tar.gz
│ ├───├───a.json
│ ├───├───├───├───├───├───...(unknown depth)
│ ├───b.json
│ ├───c.json
├───sub3.zip
│ ├───sub4.tar.gz
│ ├───├───d.json
│ ├───├───├───├───├───├───...(unknown depth)
│ ├───e.json
│ ├───f.json
├───subdirectory
│ ├───g.json
├───h.json
├───i.json
| ..........
| ..........
| ..........
| many of them
After extraction, ./Dataset should look like this:
Dataset/processing
├───a.json
├───b.json
├───c.json
├───d.json
├───e.json
├───f.json
├───g.json
├───h.json
├───i.json
| ..........
| ..........
| ..........
| many of them
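I know how to extract a tar.gz using PharData in PHP, but it only works one level deep. I was wondering whether some kind of recursion could make this work for multiple levels.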
$phar = new PharData('origin.tar.gz');
$phar->extractTo('/full/path'); // extract all files in the tar.gz
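I improved my code a bit and tried the following. It works for multiple levels of nesting, but fails when there is a directory (a folder or nested folders) that also contains JSON files. Can anyone help me extract those as well?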
<?php
$path = './';

// Extraction of compressed file
function fun($path) {
    $array = scandir($path);
    for ($i = 0; $i < count($array); $i++) {
        if ($i == 0 OR $i == 1) { continue; }
        else {
            $item = $array[$i];
            $fileExt = explode('.', $item);
            // Getting the extension of the file
            $fileActualExt = strtolower(end($fileExt));
            if (($fileActualExt == 'gz') or ($fileActualExt == 'zip')) {
                $pathnew = $path.$item; // Dataset ./data1.tar.gz
                $phar = new PharData($pathnew);
                // Moving the files
                $phar->extractTo($path);
                // Del the files
                unlink($pathnew);
                $i = 0;
            }
        }
        $array = scandir($path);
    }
}
fun($path);
// Move only the json to ./dataset (I will add it later)
?>
Thanks in advance.
First, extract your tar.gz file the way you mentioned:
$phar = new PharData('origin.tar.gz');
$phar->extractTo('/full/path'); // extract all files in the tar.gz
Then read the directory recursively and copy every JSON file into your destination directory. Here is my commented code:
$dirPath = './'; // the root path of your very first extraction of your tar.gz
recursion_readdir($dirPath, 1);

function recursion_readdir($dirPath, $Deep = 0) {
    $resDir = opendir($dirPath);
    // Compare against false so that an entry named "0" does not end the loop early
    while (($basename = readdir($resDir)) !== false) {
        if ($basename == '.' OR $basename == '..') {
            continue;
        }
        // current file path
        $path = $dirPath . '/' . $basename;
        if (is_dir($path)) {
            // it is a directory, so go deeper (depth + 1)
            recursion_readdir($path, $Deep + 1);
        } else if (strtolower(pathinfo($basename, PATHINFO_EXTENSION)) == 'json') {
            // it is a JSON file: copy it to your destination path (./dest must already exist)
            copy($path, './dest/' . $basename);
        } else if (strstr($basename, 'tar')) {
            // it is a tar.gz file: open it by its full path and extract it in place
            $phar = new PharData($path);
            $phar->extractTo($dirPath, null, true);
        }
    }
    closedir($resDir);
}

// Helper that repeats a character; not called by recursion_readdir() above
function forChar($char = '-', $times = 0) {
    $result = '';
    for ($i = 0; $i < $times; $i++) {
        $result .= $char;
    }
    return $result;
}
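A substring test such as strstr($basename, 'tar') also matches names like startup.log, so it can hand a non-archive to PharData. Here is a minimal sketch of a stricter check. It assumes the archives end in .tar.gz, .tgz, .tar, or .zip, and classifyEntry() is a hypothetical helper name that does not appear in the code above:

```php
<?php
// Hypothetical helper: classify a file name by its real suffix
// instead of searching for a substring anywhere in the name.
function classifyEntry(string $name): string
{
    $lower = strtolower($name);

    // pathinfo() returns only the text after the last dot
    if (pathinfo($lower, PATHINFO_EXTENSION) === 'json') {
        return 'json';
    }

    // Assumed list of archive suffixes; adjust it to your data
    foreach (['.tar.gz', '.tgz', '.tar', '.zip'] as $suffix) {
        $len = strlen($suffix);
        if (strlen($lower) > $len && substr($lower, -$len) === $suffix) {
            return 'archive';
        }
    }

    return 'other';
}
```

With this, the loop could branch on the returned string ('json', 'archive', or 'other') rather than on two separate strstr() calls.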
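After some research, I solved the problem.
There are 3 functions:
- recursiveScanProtected(): extracts all the compressed files.
- scanJSON(): scans for JSON files and moves them to the processing folder.
- delete_files(): deletes everything except the processing folder, which contains the JSON files, and index.php in the root directory.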
<?php
// Root directory
$path = './';
// Directory where I want to extract the JSON files
$path_json = $path.'processing/';

// Function to extract all the compressed files
function recursiveScanProtected($dir) {
    if ($dir != '') {
        // Normalize to exactly one trailing slash
        $dir = rtrim($dir, '/') . '/';
        $tree = glob($dir . '*');
        if (is_array($tree)) {
            for ($i = 0; $i < count($tree); $i++) {
                $file = $tree[$i];
                if (is_dir($file)) {
                    recursiveScanProtected($file); // Recursive call if directory
                } elseif (is_file($file)) {
                    // Getting the extension of the file
                    $fileExt = explode('.', $file);
                    $fileActualExt = strtolower(end($fileExt));
                    // Check if the file is a zip or a tar.gz
                    if (($fileActualExt == 'gz') or ($fileActualExt == 'zip')) {
                        // Open the archive with PharData
                        $phar = new PharData($file);
                        // Extract into a numbered subdirectory - Overwriting true
                        $phar->extractTo($dir.$i."/", null, true);
                        // Release the archive handle, then delete the compressed file
                        unset($phar);
                        unlink($file);
                        recursiveScanProtected($dir.$i); // Recursive call
                    }
                }
            }
        }
    }
}
recursiveScanProtected($path);

// Move the JSON files to processing
function scanJSON($dir, $path_json) {
    if ($dir != '') {
        $tree = glob(rtrim($dir, '/') . '/*');
        if (is_array($tree)) {
            foreach ($tree as $file) {
                if (is_dir($file)) {
                    // Do not scan processing recursively, but all other directories should be scanned
                    if ($file != './processing') {
                        scanJSON($file, $path_json);
                    }
                } elseif (is_file($file)) {
                    $ext = pathinfo($file);
                    // Files without an extension have no 'extension' key
                    if (strtolower($ext['extension'] ?? '') == 'json') {
                        // Move the JSON files to processing
                        rename($file, $path_json.$ext['basename']);
                    }
                }
            }
        }
    }
}
scanJSON($path, $path_json);

/*
 * php delete function that deals with directories recursively
 * It deletes everything except ./dataset/processing and index.php
 */
function delete_files($target) {
    if (is_dir($target)) {
        $files = glob($target . '*', GLOB_MARK); // GLOB_MARK adds a slash to directories returned
        foreach ($files as $file) {
            if ($file == './processing/' || $file == './index.php') {
                continue;
            } else {
                delete_files($file);
            }
        }
        if ($target != './') {
            rmdir($target);
        }
    } elseif (is_file($target)) {
        unlink($target);
    }
}
delete_files($path);
?>
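One thing to watch when flattening everything into a single processing folder: rename() overwrites an existing file, so two archives that both contain, say, a.json would leave only one copy. Here is a minimal sketch of one way around that, assuming a numeric suffix is acceptable for duplicates. uniqueDestination() is a hypothetical helper, not part of the code above:

```php
<?php
// Hypothetical helper: return a path inside $dir that does not exist yet,
// appending _1, _2, ... to the file name when the plain name is taken.
function uniqueDestination(string $dir, string $basename): string
{
    $info = pathinfo($basename);
    $stem = $info['filename'];
    $ext  = isset($info['extension']) ? '.' . $info['extension'] : '';

    $base      = rtrim($dir, '/') . '/';
    $candidate = $base . $basename;
    $n         = 1;

    // Keep trying numbered names until one is free
    while (file_exists($candidate)) {
        $candidate = $base . $stem . '_' . $n . $ext;
        $n++;
    }

    return $candidate;
}
```

In scanJSON() the move could then be written as rename($file, uniqueDestination($path_json, $ext['basename'])), if keeping every duplicate is what you want.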