Pango: Finding positions in Devanagari strings
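I'm using Pango to typeset Devanagari text.
Consider the string उम्कन्छौ, which consists of DEVANAGARI LETTER U, DEVANAGARI LETTER MA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER KA, DEVANAGARI LETTER NA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER CHA, and DEVANAGARI VOWEL SIGN AU.
When typesetting this string, I want to know where छ (CHA) starts so that I can place a visual marker there.
For an ordinary string I would take the width of the preceding part, उम्कन्, but that does not work here: as you can see, न् (half न) combines with छ, so the result is slightly off.
Is there a way to get the correct starting position of a letter when combining is involved?
I tried querying the Pango layout with index_to_pos(), but that seems to work at the byte level (not on characters).
This small Perl program shows the problem. The vertical line ends up too far to the right.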
use strict;
use warnings;
use utf8;
use Cairo;
use Pango;
my $surface = Cairo::PdfSurface->create ("out.pdf", 595, 842);
my $cr = Cairo::Context->create ($surface);
my $layout = Pango::Cairo::create_layout($cr);
my $font = Pango::FontDescription->from_string('Lohit Devanagari');
$layout->set_font_description($font);
# Two parts of the phrase. Phrase1 ends in न् (half न).
my $phrase1 = 'उम्कन्';
my $phrase2 = 'छौ';
# Set the first part of the phrase, and get its width.
$layout->set_markup($phrase1);
my $w = ($layout->get_size)[0]/1024;
# Set the complete phrase.
$layout->set_markup($phrase1.$phrase2);
my ($x, $y ) = ( 100, 100 );
# Show phrase.
$cr->move_to( $x, $y );
$cr->set_source_rgba( 0, 0, 0, 1 );
Pango::Cairo::show_layout($cr, $layout);
# Show marker at width.
$cr->set_line_width(0.25);
$cr->move_to( $x + $w, $y-10 );
$cr->line_to( $x + $w, $y+50 );
$cr->stroke;
$cr->show_page;
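You cannot measure a partial rendering. Instead, measure the whole rendering and walk through the string grapheme by grapheme to find the position. See also: https://gankra.github.io/blah/text-hates-you/#style-can-change-mid-ligature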
use strict;
use warnings;
use utf8;
use Cairo;
use Pango;
use List::Util qw(uniq);
use Encode qw(encode);
my $surface = Cairo::PdfSurface->create('out.pdf', 595, 842);
my $cr = Cairo::Context->create ($surface);
my $layout = Pango::Cairo::create_layout($cr);
my $font = Pango::FontDescription->from_string('Lohit Devanagari');
$layout->set_font_description($font);
my $phrase = 'उम्कन्छौ';
my @octets = split '', encode 'UTF-8', $phrase; # index_to_pos operates on octets
$layout->set_markup($phrase);
my ($x, $y) = (100, 100);
$cr->move_to($x, $y);
$cr->set_source_rgba(0, 0, 0, 1);
Pango::Cairo::show_layout($cr, $layout);
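$cr->set_line_width(0.25);
# Query the x position at every byte index and keep only the distinct values.
# Pango reports positions in Pango units (PANGO_SCALE = 1024), hence the division.
my @offsets = uniq map { $layout->index_to_pos($_)->{x}/1024 } 0..$#octets;
# (0, 9.859375, 16.09375, 27.796875, 33.953125, 49.1875)
# Draw a short tick at every distinct position.
for my $offset (@offsets) {
    $cr->move_to($x+$offset, $y-5);
    $cr->line_to($x+$offset, $y+25);
    $cr->stroke;
}
# Split into grapheme clusters. This assumes the n-th cluster starts at the
# n-th distinct offset, which holds for this phrase.
my @graphemes = $phrase =~ /\X/g; # qw(उ म् क न् छौ)
# Draw the long marker at the start of the first cluster beginning with छ.
while (my ($idx, $g) = each @graphemes) {
    if ($g =~ /^छ/) {
        $cr->move_to($x+$offsets[$idx], $y-10);
        $cr->line_to($x+$offsets[$idx], $y+50);
        $cr->stroke;
        last;
    }
}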
$cr->show_page;
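If matching grapheme indexes against the list of distinct offsets feels fragile, one possible shortcut is to compute the UTF-8 byte index of छ directly and pass that single index to index_to_pos. The following is only a sketch under the assumption, taken from the comment in the code above, that index_to_pos expects an octet index. The helper name utf8_byte_offset is made up for illustration and is not part of Pango. The Pango call appears only as a comment, so the snippet runs with core modules alone.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

# Hypothetical helper (not a Pango function): return the UTF-8 byte offset
# of the first occurrence of $needle in $text, or -1 if it does not occur.
sub utf8_byte_offset {
    my ($text, $needle) = @_;
    my $char_idx = index($text, $needle);      # offset counted in characters
    return -1 if $char_idx < 0;
    my $prefix = substr($text, 0, $char_idx);  # everything before the match
    return length(encode('UTF-8', $prefix));   # same prefix counted in octets
}

my $phrase   = 'उम्कन्छौ';
my $byte_idx = utf8_byte_offset($phrase, 'छ');
print "$byte_idx\n";   # prints 18: six preceding characters, three octets each

# With the layout from the program above, the marker position would then be
# (assumption: index_to_pos takes this octet index):
#   my $marker_x = $x + $layout->index_to_pos($byte_idx)->{x} / 1024;
```

This avoids depending on how \X splits the phrase, at the cost of trusting whatever position Pango reports for a character in the middle of a conjunct.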