CSV file: UTF-8 (with BOM) to ANSI / Windows-1251
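I want to create a batch file/macro that removes the first line of an automatically generated UTF-8 CSV and converts it to Windows code page 1251 ("ANSI").
I have searched the web and tried a lot of things, but I just can't find one that works...
Removing the first line is simple: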
@echo off
set "csv=test.csv"
more +1 "%csv%" >"%csv%.new"
move /y "%csv%.new" "export\%csv%" >nul
After that I got lost. I tried the TYPE approach from DOS:
cmd /a /c TYPE test.csv > ansi.csv
and many variations of it, but it either returns an empty CP1251 file or just another UTF file.
I also tried VBScript, but that returned another UTF-8 file, now without the BOM:
Option Explicit
Private Const adReadAll = -1
Private Const adSaveCreateOverWrite = 2
Private Const adTypeBinary = 1
Private Const adTypeText = 2
Private Const adWriteChar = 0
Private Sub UTF8toANSI(ByVal UTF8FName, ByVal ANSIFName)
    Dim strText
    With CreateObject("ADODB.Stream")
        .Open
        .Type = adTypeBinary
        .LoadFromFile UTF8FName
        .Type = adTypeText
        .Charset = "utf-8"
        strText = .ReadText(adReadAll)
        .Position = 0
        .SetEOS
        .Charset = "_autodetect" 'Use current ANSI codepage.
        .WriteText strText, adWriteChar
        .SaveToFile ANSIFName, adSaveCreateOverWrite
        .Close
    End With
End Sub
UTF8toANSI "UTF8-wBOM.txt", "ANSI1.txt"
UTF8toANSI "UTF8-noBOM.txt", "ANSI2.txt"
MsgBox "Complete!", vbOKOnly, WScript.ScriptName
Edit 1:
I tried VBScript again, converting to iso-8859-1 instead of cp1251:
Option Explicit
Private Const adReadAll = -1
Private Const adSaveCreateOverWrite = 2
Private Const adTypeBinary = 1
Private Const adTypeText = 2
Private Const adWriteChar = 0
Private Sub UTF8toANSI(ByVal UTF8FName, ByVal ANSIFName)
    Dim strText
    With CreateObject("ADODB.Stream")
        .Open
        .Type = adTypeBinary
        .LoadFromFile UTF8FName
        .Type = adTypeText
        .Charset = "utf-8"
        strText = .ReadText(adReadAll)
        .Position = 0
        .SetEOS
        .Charset = "iso-8859-1"
        .WriteText strText, adWriteChar
        .SaveToFile ANSIFName, adSaveCreateOverWrite
        .Close
    End With
End Sub
UTF8toANSI WScript.Arguments(0), WScript.Arguments(1)
However, this did not work either.
Edit 2:
I found a way to convert the files from UTF to ANSI using stringconverter.exe
(downloaded from http://www.computerperformance.co.uk/ezine/tools.htm):
Setlocal
Set _source=C:\Users\lloyd.EVD\delFirstBat\import
Set _dest=C:\Users\lloyd.EVD\delFirstBat\export
For /F "Tokens=*" %%I In ('dir /b /a-d "%_source%\*.CSV"') Do stringconverter "%_source%\%%~nxI" "%_dest%\%%~nxI" /ANSI
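Now, when I remove the first line of the file (before or after the conversion, it makes no difference), it turns back into UTF-8 without a BOM.
So what I need now is a script that removes the first line without changing the charset.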
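As a rough illustration of that last requirement, here is a minimal Python sketch (Python rather than batch or VBScript, and not part of the original post) that drops the first line at the byte level, so the remaining bytes are never decoded or re-encoded. The function name `drop_first_line` is hypothetical. It assumes an encoding in which a line feed is the single byte 0x0A (true for the Windows "ANSI" code pages and for UTF-8, not for UTF-16).

```python
from pathlib import Path


def drop_first_line(src, dst):
    """Copy src to dst without its first line, leaving all other bytes as-is."""
    data = Path(src).read_bytes()
    # Everything up to and including the first LF byte is the first line.
    # With CRLF endings the CR is dropped together with that line, and so
    # is a UTF-8 BOM if the file starts with one.
    _, sep, rest = data.partition(b"\n")
    # No line break at all means the whole file was a single line.
    Path(dst).write_bytes(rest if sep else b"")
```

Because the file is handled as raw bytes, the output has exactly the encoding of the input, whatever that was.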
The following VBScript may help: the procedure UTF8toANSI converts a UTF-8 encoded text file to another encoding.
Option Explicit
' ADODB.Stream constants (not predefined in VBScript)
Private Const adReadAll = -1
Private Const adSaveCreateOverWrite = 2
Private Const adTypeBinary = 1
Private Const adTypeText = 2
Private Const adWriteChar = 0
' Reads UTF8FName as UTF-8 and saves it to ANSIFName using the charset ANSICharSet.
Private Sub UTF8toANSI(ByVal UTF8FName, ByVal ANSIFName, ByVal ANSICharSet)
    Dim strText
    With CreateObject("ADODB.Stream")
        ' Set Type and Charset before Open so the file is decoded as UTF-8
        .Type = adTypeText
        .Charset = "utf-8"
        .Open
        .LoadFromFile UTF8FName
        strText = .ReadText(adReadAll)
        .Close
        ' Reopen the stream with the target charset and write the text out
        .Charset = ANSICharSet
        .Open
        .WriteText strText, adWriteChar
        .SaveToFile ANSIFName, adSaveCreateOverWrite
        .Close
    End With
End Sub
'UTF8toANSI WScript.Arguments(0), WScript.Arguments(1)
UTF8toANSI "D:\test\SO835837utf8.csv", "D:\test\SO835837ansi1250.csv", "windows-1250"
UTF8toANSI "D:\test\SO835837utf8.csv", "D:\test\SO835837ansi1251.csv", "windows-1251"
UTF8toANSI "D:\test\SO835837utf8.csv", "D:\test\SO835837ansi1253.csv", "windows-1253"
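For comparison, the same idea can be sketched in Python (a swapped-in language, not the ADODB.Stream approach above), combined with the first-line removal the question asks for. The function name `utf8_csv_to_ansi` and its parameters are hypothetical. One assumed difference: Python's `errors="replace"` writes `?` for every character the target charset lacks and does no accent stripping.

```python
from pathlib import Path


def utf8_csv_to_ansi(src, dst, charset="cp1251", drop_header=True):
    """Convert a UTF-8 file (with or without BOM) to a single-byte charset."""
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM if one is present.
    text = Path(src).read_bytes().decode("utf-8-sig")
    if drop_header:
        # Keep only what follows the first line break.
        _, sep, rest = text.partition("\n")
        text = rest if sep else ""
    # errors="replace" emits "?" for characters the charset cannot represent.
    # Writing bytes avoids any newline translation on Windows.
    Path(dst).write_bytes(text.encode(charset, errors="replace"))
```

A call such as `utf8_csv_to_ansi("test.csv", "export/test.csv")` would then cover both steps in one pass, assuming the `export` folder already exists.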
For a list of the charset names known to the system, see the subkeys of HKEY_CLASSES_ROOT\MIME\Database\Charset in the Windows registry:
for /F "tokens=5* delims=\" %# in ('reg query HKCR\MIME\Database\Charset') do @echo "%#"
Data (38835837utf8.csv file):
1st Line 1250 852 čeština (Čechie)
2nd Line 1251 966 русский (Россия)
3rd Line 1253 737 ελληνικά (Ελλάδα)
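The output shows that characters which cannot be converted to a given charset are converted using Character Decomposition Mapping (č => c, š => s, Č => C, etc.); where that does not apply, they are converted to a ? question mark (a common replacement character):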
==> chcp 1250
Active code page: 1250
==> type D:\test\SO835837ansi1250.csv
1st Line 1250 852 čeština (Čechie)
2nd Line 1251 966 ??????? (??????)
3rd Line 1253 737 ???????? (??????)
==> chcp 1251
Active code page: 1251
==> type D:\test\SO835837ansi1251.csv
1st Line 1250 852 cestina (Cechie)
2nd Line 1251 966 русский (Россия)
3rd Line 1253 737 ???????? (??????)
==> chcp 1253
Active code page: 1253
==> type D:\test\SO835837ansi1253.csv
1st Line 1250 852 cestina (Cechie)
2nd Line 1251 966 ??????? (??????)
3rd Line 1253 737 ελληνικά (Ελλάδα)
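The fallback visible in that output can be approximated with a short Python sketch (Python, not VBScript). This is an illustration of the decomposition idea only, not the conversion Windows actually performs, and `best_fit_encode` is a hypothetical helper: each character is encoded directly if possible, otherwise reduced to its base letter via Unicode decomposition, otherwise replaced by `?`.

```python
import unicodedata


def best_fit_encode(text, codec):
    """Encode text, falling back to base letters and then to '?'."""
    out = bytearray()
    for ch in text:
        try:
            # Characters the charset supports are encoded unchanged.
            out += ch.encode(codec)
            continue
        except UnicodeEncodeError:
            pass
        # Decompose (e.g. "č" -> "c" + combining caron) and drop the marks.
        base = "".join(
            c for c in unicodedata.normalize("NFD", ch)
            if not unicodedata.combining(c)
        )
        try:
            out += base.encode(codec)
        except UnicodeEncodeError:
            # No usable base letter either: use the replacement character.
            out += b"?"
    return bytes(out)


print(best_fit_encode("čeština (Čechie)", "cp1251").decode("cp1251"))
# prints "cestina (Cechie)"
```

Working character by character matters here: stripping accents from the whole string first would also turn the Cyrillic й into и, even though cp1251 can represent й directly.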