Golang: Unzipping files in Go gives character encoding problems in file names when the archive was zipped on Windows

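I'm trying to unzip files in Go (Golang) using the zip library. The problem is that when the zip file was created on Windows, all special characters in the file names come out garbled. Windows probably uses the windows-1252 character encoding. I just can't figure out how to unzip these files. I have already tried golang.org/x/text/encoding/charmap and golang.org/x/text/transform, without success. I would expect the zip library to offer an alternative for changing the charmap.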

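Another problem: sometimes the application unzips files that were zipped on Windows, and sometimes files that were zipped on a different OS. So the application needs to recognize the character encoding.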

This is the code (thanks to: https://golangcode.com/unzip-files-in-go/):

package main

import (
    "archive/zip"
    "fmt"
    "io"
    "log"
    "os"
    "path/filepath"
    "strings"
)

func main() {

    files, err := Unzip("Edificações e Instalações Operacionais - 08.03 a 12.03.2021.zip", "output-folder")
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println("Unzipped:\n" + strings.Join(files, "\n"))
}

// Unzip will decompress a zip archive, moving all files and folders
// within the zip file (parameter 1) to an output directory (parameter 2).
func Unzip(src string, dest string) ([]string, error) {

    var filenames []string

    r, err := zip.OpenReader(src)
    if err != nil {
        return filenames, err
    }
    defer r.Close()

    for _, f := range r.File {

        // Store filename/path for returning and using later on
        fpath := filepath.Join(dest, f.Name)

        // Guard against ZipSlip: reject entries whose path escapes dest
        if !strings.HasPrefix(fpath, filepath.Clean(dest)+string(os.PathSeparator)) {
            return filenames, fmt.Errorf("%s: illegal file path", fpath)
        }

        filenames = append(filenames, fpath)

        if f.FileInfo().IsDir() {
            // Make Folder
            if err := os.MkdirAll(fpath, os.ModePerm); err != nil {
                return filenames, err
            }
            continue
        }

        // Make File
        if err = os.MkdirAll(filepath.Dir(fpath), os.ModePerm); err != nil {
            return filenames, err
        }

        outFile, err := os.OpenFile(fpath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, f.Mode())
        if err != nil {
            return filenames, err
        }

        rc, err := f.Open()
        if err != nil {
            outFile.Close() // do not leak the output file handle
            return filenames, err
        }

        _, err = io.Copy(outFile, rc)

        // Close the file without defer to close before next iteration of loop
        outFile.Close()
        rc.Close()

        if err != nil {
            return filenames, err
        }
    }
    return filenames, nil
}

This is the output:

If we print only the first zip entry:

package main
import "archive/zip"

func main() {
   s := "Edificações_e_Instalações_Operacionais_08_03_a_12_03_2021.zip"
   f, e := zip.OpenReader(s)
   if e != nil {
      panic(e)
   }
   defer f.Close()
   println(f.File[0].Name)
}

We get this result:

Edifica��es e Instala��es Operacionais - 08.03 a 12.03.2021/

According to this page:

In Brazil, however, the most widespread codepage —and that which DOS in Brazilian portuguese used by default— was code page 850.

https://wikipedia.org/wiki/Code_page_860

So we can modify the code to handle this:

package main

import (
   "archive/zip"
   "golang.org/x/text/encoding/charmap"
)

func main() {
   z := "Edificações_e_Instalações_Operacionais_08_03_a_12_03_2021.zip"
   f, e := zip.OpenReader(z)
   if e != nil {
      panic(e)
   }
   defer f.Close()
   s, e := charmap.CodePage850.NewDecoder().String(f.File[0].Name)
   if e != nil {
      panic(e)
   }
   println(s)
}

We get the correct result:

Edificações e Instalações Operacionais - 08.03 a 12.03.2021/

https://pkg.go.dev/golang.org/x/text/encoding/charmap
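For archives that may come from Windows or from another OS, one possible heuristic is to keep names that are already valid UTF-8 and decode only the rest. The sketch below is an assumption-laden illustration, not a feature of archive/zip: `entryName` and `fakeCP850` are hypothetical names, and `fakeCP850` is a stand-in that maps just two bytes so the example runs with the standard library alone. In real code you would pass `charmap.CodePage850.NewDecoder().String` instead.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// entryName returns a UTF-8 version of a zip entry name. Names that are
// already valid UTF-8 are returned unchanged; anything else is handed to
// decode, which is assumed to convert from the legacy code page.
func entryName(name string, decode func(string) (string, error)) (string, error) {
	if utf8.ValidString(name) {
		return name, nil
	}
	return decode(name)
}

// fakeCP850 is a stand-in decoder covering only the two bytes used below.
// It is NOT a real code page 850 decoder.
func fakeCP850(s string) (string, error) {
	return strings.NewReplacer("\x87", "ç", "\xe4", "õ").Replace(s), nil
}

func main() {
	for _, raw := range []string{"Edifica\x87\xe4es", "Edificações"} {
		name, err := entryName(raw, fakeCP850)
		if err != nil {
			panic(err)
		}
		fmt.Println(name)
	}
}
```

In the Unzip function from the question, the call would go where f.Name is joined with dest. The heuristic is not exact: a legacy-encoded name can occasionally happen to be valid UTF-8, and the code page still has to be chosen by the caller. The NonUTF8 field of zip.FileHeader may serve as an additional signal.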