Perl save web page from cp1256 into utf-8

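I'm trying to save this web page, which seems to be cp1256-encoded, to a text file in UTF-8. The problem appears when I replace the HTML entity &#1548; with its Arabic character "،" before saving: the content of the saved file is no longer Arabic.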

#!C:\perl\bin\perl.exe
use Encode;
use LWP::Simple;

binmode STDOUT, ':encoding(UTF-8)';

my $url = qq{https://www.altafsir.com/Tafasir.asp?tMadhNo=1&tTafsirNo=7&tSoraNo=1&tAyahNo=1&tDisplay=yes&UserProfile=0&LanguageId=1};
my $content = get($url);

$content = decode('cp1256', $content);

my $ch = chr(0x60c);
# this line causes the problem
$content =~ s/\&#1548\;/$ch/mg;

open File, ">filecontent.txt" or die "Error creating file.\n";
binmode File, ':encoding(UTF-8)';
print File $content;
close File;

exit;
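The charset you decode with decides which characters you get. Here is a small offline illustration that is not part of the original question and uses only the core Encode module. The same single byte turns into different characters depending on the encoding name passed to decode. The byte 0xA1 is used only as an example:

```perl
use strict;
use warnings;
use Encode qw(decode);

# One raw byte, interpreted under two different charsets.
my $byte = "\xA1";

my $as_cp1256 = decode('cp1256',     $byte);   # ARABIC COMMA
my $as_latin1 = decode('iso-8859-1', $byte);   # INVERTED EXCLAMATION MARK

printf "cp1256:     U+%04X\n", ord $as_cp1256;
printf "iso-8859-1: U+%04X\n", ord $as_latin1;
```

Because a wrong guess silently produces the wrong characters, it is safer to take the charset from the response than to hard-code it.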

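Don't decode the content by hand. Instead, call decoded_content on the response returned by LWP::UserAgent, which takes the character set from the Content-Type header.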
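The general rule is to decode exactly once, work on characters, and encode exactly once on output. Below is a minimal offline sketch of that order of operations. The byte string $bytes is a hypothetical stand-in for a raw cp1256 page body. With a real response, decoded_content has already done the decoding step, so you would skip the decode call and start from its result:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical raw page bytes: 0xA1 is the Arabic comma in cp1256,
# followed by the same character written as a numeric HTML entity.
my $bytes = "\xA1 &#1548;";

# Step 1: decode once, turning bytes into a Perl character string.
my $text = decode('cp1256', $bytes);

# Step 2: work on characters; replace decimal entities with the real character.
$text =~ s/&#(\d+);/chr($1)/ge;
printf "U+%04X U+%04X\n", ord(substr($text, 0, 1)), ord(substr($text, 2, 1));

# Step 3: encode once on output; U+060C becomes the bytes D8 8C in UTF-8.
my $utf8 = encode('UTF-8', $text);
print join(' ', map { sprintf '%02X', ord } split //, $utf8), "\n";
```

Writing through a filehandle opened with the :encoding(UTF-8) layer, as the answer does, performs step 3 for you. In that case, print the character string directly instead of calling encode yourself.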

use strict;
use warnings;
use autodie;
use LWP::UserAgent qw();
require LWP::Protocol::https;
my $url = 'https://www.altafsir.com/Tafasir.asp'
    . '?tMadhNo=1&tTafsirNo=7&tSoraNo=1&tAyahNo=1'
    . '&tDisplay=yes&UserProfile=0&LanguageId=1';
my $ua = LWP::UserAgent->new;
my $response = $ua->get($url);
if ($response->is_success) {
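    # decoded_content decodes using the charset from the Content-Type header
    my $content = $response->decoded_content;
    # Replace the numeric entity with the real character
    # (\N{ARABIC COMMA} works without "use charnames" on Perl 5.16+)
    $content =~ s/&#1548;/\N{ARABIC COMMA}/g;
    open my $fh, '>:encoding(UTF-8)', 'filecontent.html';
    $fh->print($content);
    close $fh;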
} else {
    die $response->status_line;
}
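The substitution above handles only &#1548;. If the page contains other numeric entities, a more general replacement may help. Below is a sketch of a hypothetical helper, decode_numeric_refs, which is not part of LWP. It handles decimal and hexadecimal references in a single pass, so a decoded "&" can never start a second, unintended replacement. Named entities such as &amp; are left alone. For those, decode_entities from the HTML::Entities module is the usual tool.

```perl
use strict;
use warnings;

# Hypothetical helper: replace &#NNNN; (decimal) and &#xHHHH; (hex)
# numeric character references with the characters they denote.
# A single substitution pass avoids re-decoding text produced by an
# earlier replacement (e.g. "&#38;#x60C;" must become "&#x60C;").
sub decode_numeric_refs {
    my ($s) = @_;
    $s =~ s/&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));/chr(defined $1 ? hex($1) : $2)/ge;
    return $s;
}

my $decoded = decode_numeric_refs('a&#1548;b&#x60C;c &amp;');
printf "%s\n", join ' ', map { sprintf 'U+%04X', ord } split //, $decoded;
```

In the answer's code, this would replace the s/&#1548;/.../ line with $content = decode_numeric_refs($content);.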