Eliminate the "\u3000" error in Java

Question

When I try to compile a Java file, the compiler says "illegal character \u3000".

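After searching, I found that it is the ideographic (full-width) SPACE used in Chinese, Japanese and Korean text (U+3000 IDEOGRAPHIC SPACE, in the CJK Symbols and Punctuation block). Instead of deleting these special SPACEs by hand, I decided to write a simple Java program that searches for them and deletes them.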
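To confirm what the character is, the JDK's `Character` class can be queried directly. This is a minimal sketch; the class name `InspectIdeographicSpace` is illustrative. Note that although `Character.isWhitespace` reports `true` for U+3000, the Java Language Specification (§3.6) only accepts the ASCII space, horizontal tab, form feed and line terminators as white space between tokens, which is why javac rejects U+3000 as an illegal character.

```java
class InspectIdeographicSpace {
    // '\u3000' is a Unicode escape for the ideographic (full-width) space
    static final char IDEOGRAPHIC_SPACE = '\u3000';

    public static void main(String[] args) {
        char c = IDEOGRAPHIC_SPACE;
        System.out.println(Character.getName(c));                        // IDEOGRAPHIC SPACE
        System.out.println(Character.UnicodeBlock.of(c));                // CJK_SYMBOLS_AND_PUNCTUATION
        System.out.println(Character.getType(c) == Character.SPACE_SEPARATOR); // true
        // Java's library treats it as whitespace, but the javac tokenizer does not
        System.out.println(Character.isWhitespace(c));                   // true
    }
}
```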

But it did not print any index. So how can I write code to eliminate this special SPACE?

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.File;
import java.io.IOException;
import java.util.*;
public class BufferReadAFile {
    public static void main(String[] args) {

        //BufferedReader br = null;
        String sCurrentLine;
        String message = "";
        try {

            /*br = new BufferedReader(new FileReader("/Users/apple/Test/Instance1.java"));

            while ((sCurrentLine = br.readLine()) != null) {
                message += sCurrentLine;
            }
            */
            String content = new Scanner(new File("/Users/apple/Coding/Instance1.java")).useDelimiter("\\Z").next();
            //System.out.println(content);
            searchSubString(content.toCharArray(),"\\u3000".toCharArray());

        } catch (IOException e) {
            e.printStackTrace();
        } 

    }


    public static void searchSubString(char[] text, char[] ptrn) {
        int i = 0, j = 0;
        // pattern and text lengths
        int ptrnLen = ptrn.length;
        int txtLen = text.length;

        // initialize new array and preprocess the pattern
        int[] b = preProcessPattern(ptrn);

        while (i < txtLen) {
            while (j >= 0 && text[i] != ptrn[j]) {
                j = b[j];
            }
            i++;
            j++;

            // a match is found
            if (j == ptrnLen) {
                System.out.println("found substring at index:" + (i - ptrnLen));
                j = b[j];
            }
        }
    }


    public static int[] preProcessPattern(char[] ptrn) {
        int i = 0, j = -1;
        int ptrnLen = ptrn.length;
        int[] b = new int[ptrnLen + 1];

        b[i] = j;
        while (i < ptrnLen) {
            while (j >= 0 && ptrn[i] != ptrn[j]) {
                // if there is mismatch consider the next widest border
                // The borders to be examined are obtained in decreasing order from 
                //  the values b[i], b[b[i]] etc.
                j = b[j];
            }
            i++;
            j++;
            b[i] = j;
        }
        return b;
    }


}
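Because the pattern here is a single character, the KMP machinery is not strictly needed: a plain character scan finds every occurrence, and tracking line and column makes the result easier to act on than a raw index. A minimal sketch, assuming the file is UTF-8 encoded; the class name `FindIdeographicSpaces` and taking the path from the command line are illustrative choices.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class FindIdeographicSpaces {
    // Returns the 1-based "line:column" position of every U+3000 in text.
    static List<String> locate(String text) {
        List<String> hits = new ArrayList<>();
        int line = 1, column = 1;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\u3000') {
                hits.add(line + ":" + column);
            }
            if (c == '\n') {      // a newline starts the next line
                line++;
                column = 1;
            } else {
                column++;
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // the file to check is the first command-line argument (assumed UTF-8)
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (String hit : locate(text)) {
            System.out.println("U+3000 at " + hit);
        }
    }
}
```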

Answer 1

In my question, I was trying to use the KMP algorithm to find the index of the pattern in my Java file.

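If we use "\\u3000".toCharArray(), the search runs over each of its six characters (a backslash, u, 3, 0, 0, 0), which is not what we want. \u3000 is a special white space: it is the FULL-WIDTH space, used only in Chinese, Korean and Japanese text.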
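A small sketch of the difference between the two string literals; the class name and the sample text are illustrative. With the doubled backslash the pattern is six ordinary characters, so it never matches a file that contains the real U+3000 character.

```java
class EscapeVsCharacter {
    public static void main(String[] args) {
        char[] sixChars = "\\u3000".toCharArray(); // backslash, 'u', '3', '0', '0', '0'
        char[] oneChar  = "\u3000".toCharArray();  // the single ideographic space
        System.out.println(sixChars.length); // 6
        System.out.println(oneChar.length);  // 1

        String source = "int\u3000x = 1;";   // text containing a real U+3000
        System.out.println(source.indexOf("\\u3000")); // -1: the six-character text never appears
        System.out.println(source.indexOf("\u3000"));  // 3
    }
}
```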

If we write a sentence using FULL-WIDTH spaces, it looks like this:

Ｈｅｒｅ　ｉｓ　Ｆｕｌｌ－ｗｉｄｔｈ　ｄｅｍｏｎｓｔｒａｔｉｏｎ．

The space is very conspicuous there, but in a Java file it is not so obvious. That inspired me to write the code below:

import java.util.*;
import java.io.*;


public class CheckEmpty {
    public static void main(String[] args) {
        try {
            // with "\\Z" as the delimiter, next() returns the file contents as one token
            String content = new Scanner(new File("/Users/apple/Coding/Instance1.java")).useDelimiter("\\Z").next();
            if (content.contains(" ")) {
                System.out.println("English Space");
            }
            if (content.contains("\u3000")) {
                System.out.println("Backslash 3000");
            }

            if (content.contains("　")) { // notice the space is a SPECIAL SPACE
                System.out.println("C J K　ｆｕｌｌｗｉｄｔｈ");
                // Chinese Japanese Korean white space
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}

As expected, the result showed:

This means the Java file contains both normal and full-width spaces.
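CheckEmpty only answers yes or no for each kind of space. A sketch that counts them instead; the class name and the sample text are illustrative, and in practice the file contents read with `Scanner` would be passed to `count`.

```java
class CountSpaces {
    // Returns {number of ASCII spaces, number of full-width U+3000 spaces} in text.
    static long[] count(String text) {
        long ascii = text.chars().filter(c -> c == ' ').count();
        long fullWidth = text.chars().filter(c -> c == '\u3000').count();
        return new long[] {ascii, fullWidth};
    }

    public static void main(String[] args) {
        long[] n = count("int a = 1;\u3000int b = 2;");
        System.out.println(n[0] + " ASCII space(s), " + n[1] + " full-width space(s)");
        // prints "6 ASCII space(s), 1 full-width space(s)"
    }
}
```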

After that, I wanted to write another Java program to delete all the special spaces:

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Scanner;

public class DeleteTheSpecialSpace {

    public static void main(String[] args) {
        String path = "/Users/apple/Coding/Instance1.java";
        try {
            String content;
            // read the whole file; the Scanner is closed before the file is rewritten
            try (Scanner in = new Scanner(new File(path)).useDelimiter("\\Z")) {
                content = in.next();
            }

            // Strings are immutable: replaceAll returns a new String, so the result must be assigned
            content = content.replaceAll("　", ""); // notice the first argument is a SPECIAL SPACE (U+3000)

            // try-with-resources closes (and flushes) the writer; without close()
            // the buffered output may never reach the file
            try (PrintWriter out = new PrintWriter(path)) {
                out.println(content);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
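A minimal sketch of the same clean-up with `java.nio.file`, assuming the source file is UTF-8 encoded; the class name `ReplaceIdeographicSpaces` and reading the path from the command line are illustrative. One design choice differs from the program above: each U+3000 is replaced by an ordinary space instead of being deleted, because a full-width space that separates two tokens (for example a type and a variable name) would otherwise glue them together. The file is rewritten only if something changed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ReplaceIdeographicSpaces {
    // Replace every U+3000 with an ASCII space so adjacent tokens stay separated.
    static String normalize(String source) {
        return source.replace('\u3000', ' ');
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]); // .java file to clean, given on the command line
        String original = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        String cleaned = normalize(original);
        if (!cleaned.equals(original)) {   // only touch the file when needed
            Files.write(file, cleaned.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```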

Finally, magic happened: "Instance1.java" no longer had any errors, because all the full-width spaces had been eliminated.

It compiled successfully :)

Answer 2

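I don't think "\\u3000" is what you want. You can print the string out and see for yourself what it contains. You should use "\u3000" instead. Note the single backslash.

System.out.println("\\u3000"); // This prints out \u3000
System.out.println("\u3000");  // This prints out the CJK space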

Alternatively, you can use the actual CJK space character directly, as in one of the if checks in the CheckEmpty class.
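A short sketch showing that the escape, the literal character and a run-time construction all produce the same one-character string (the class and field names are illustrative). Unicode escapes are translated by the compiler before the source is tokenized, so "\u3000" and the typed full-width space are indistinguishable once compiled; the escape has the advantage of keeping the source ASCII-only, so the intent stays visible.

```java
class LiteralVsEscape {
    static final String ESCAPED = "\u3000"; // Unicode escape, translated before tokenizing
    static final String LITERAL = "　";      // the full-width space typed directly (hard to see)
    static final String FROM_CODE_POINT = new String(Character.toChars(0x3000)); // built at run time

    public static void main(String[] args) {
        System.out.println(ESCAPED.equals(LITERAL));         // true
        System.out.println(ESCAPED.equals(FROM_CODE_POINT)); // true
        System.out.println((int) ESCAPED.charAt(0));         // 12288, i.e. 0x3000
    }
}
```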


java

unicode

ascii

chinese-locale