Regex: How to match all uppercase characters of a non-ASCII string

Question

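I have strings like Japan Company and Chinese Company, and I use the regex /([A-Z])/g to get all the uppercase characters, then join the results to abbreviate them. But when the input string is not written in the English alphabet, the regex doesn't work.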

let string = "Japan Company";
console.log(string.match(/[A-Z]/g).join('')); // JC

But when I have a string like 日本の会社:

let string = "日本の会社";
console.log(string.match(/[A-Z]/g).join(''));

This throws an exception, because the result of string.match(/[A-Z]/g) is null and null has no join method.

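Since I'm trying to abbreviate these strings and ideographic characters have no uppercase form, the regex should instead match only the first character of each word, where words are separated by spaces.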

What general-purpose regex should I use for this?

Something like POSIX's [:upper:], but that isn't available in the JavaScript regex engine.

Answer 1

Based on your comments, I think this should do what you want.

function getUppercaseLetters(string) {
    // Find uppercase ASCII letters in the string
    let matches = string.match(/[A-Z]/g);
    // If there are none, fall back to the first character of each space-separated word
    // (empty words produced by repeated spaces are skipped)
    if (!matches) matches = string.split(" ").filter(Boolean).map(word => word[0]);
    // Join the characters and uppercase them; an empty array yields an empty string
    return matches.join("").toUpperCase();
}
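
The function above only looks for ASCII A-Z, so uppercase letters in other scripts (Greek, Cyrillic, and so on) are missed. As a possible variation, not part of the original answer, the Unicode property escape \p{Lu} together with the u flag is the closest JavaScript counterpart to POSIX's [:upper:]. The sketch below assumes an engine with ES2018 property-escape support; the name getUppercaseLettersUnicode is hypothetical.

```javascript
// Hypothetical helper (not from the original answer): a Unicode-aware variant.
function getUppercaseLettersUnicode(text) {
    // \p{Lu} matches an uppercase letter in any script; it requires the `u` flag.
    const matches = text.match(/\p{Lu}/gu);
    if (matches) return matches.join("");
    // No uppercase letters at all (e.g. Japanese text): fall back to the
    // first code point of each whitespace-separated word.
    return text
        .split(/\s+/)
        .filter(Boolean)
        .map(word => Array.from(word)[0])
        .join("")
        .toUpperCase();
}

console.log(getUppercaseLettersUnicode("Japan Company"));     // JC
console.log(getUppercaseLettersUnicode("Ελληνική Εταιρεία")); // ΕΕ
console.log(getUppercaseLettersUnicode("日本の会社"));          // 日
```

Array.from(word)[0] is used instead of word[0] so that a character outside the Basic Multilingual Plane is not cut in half.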

Answer 2

You can use

(string.match(/(?<!\S)\S/g) || [string]).join('')

See the JavaScript demo:

const strings = ["Japan Company", "japan company", "日本の会社"];
for (const string of strings) {
    console.log(string, '=>', (string.match(/(?<!\S)\S/g) || [string]).join('').toUpperCase())
}

The (?<!\S)\S regex matches a non-whitespace character that is either at the start of the string or immediately preceded by a whitespace character.
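
To make the behaviour of the lookbehind concrete, here is a small sketch of what the pattern returns for a few edge cases (leading whitespace, repeated spaces, whitespace-only input). The helper name wordStarts is hypothetical and exists only for illustration; it assumes an engine that supports lookbehind.

```javascript
// Hypothetical helper for illustration: returns the characters matched by the pattern.
function wordStarts(text) {
    // (?<!\S) succeeds when the previous character is not non-whitespace,
    // i.e. at the start of the string or right after whitespace.
    return text.match(/(?<!\S)\S/g) || [];
}

console.log(wordStarts("Japan Company").join(","));      // J,C
console.log(wordStarts("  leading  spaces").join(","));  // l,s
console.log(wordStarts("日本の会社").join(","));          // 日
console.log(wordStarts("   ").length);                   // 0
```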

A pattern without lookbehind, for Safari:

var strings = ["Japan Company", "japan company", "日本の会社"];
for (var i=0; i<strings.length; i++) {
    var m = strings[i].match(/(?:^|\s)(\S)/g)
    if (m === null) {
        console.log(strings[i], '=> ', strings[i])
    } else {
        console.log(strings[i], '=>', m.join('').replace(/\s+/g, '').toUpperCase())
    }
}
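
Because (?:^|\s) consumes the preceding whitespace, each match above includes that whitespace, which is why the loop strips it with replace(/\s+/g, ''). One alternative sketch, under the same no-lookbehind assumption, reads capture group 1 from RegExp.prototype.exec instead, so no cleanup is needed. The function name initials is hypothetical.

```javascript
// Hypothetical helper: collects capture group 1 of every match.
function initials(text) {
    var re = /(?:^|\s)(\S)/g;
    var out = '';
    var m;
    // With the g flag, exec resumes from re.lastIndex on each call.
    while ((m = re.exec(text)) !== null) {
        out += m[1];
    }
    // Same fallback as the loop above: return the input unchanged when nothing matched.
    return out === '' ? text : out.toUpperCase();
}

console.log(initials("Japan Company")); // JC
console.log(initials("japan company")); // JC
console.log(initials("日本の会社"));     // 日
```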
