How to ignore an ASCII Character before parsing?

import java.io.*;
import java.util.ArrayList;
import java.util.List;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;

public class TagText {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        // Initializing the tagger
        MaxentTagger tagger = new MaxentTagger("taggers/english-left3words-distsim.tagger");
        List<String> lines = new ArrayList<>();
        lines = new ReadCSV().readColumn("Tt2.csv", 4);
        for (String line : lines) {
            String tagged = tagger.tagString(line);
            System.out.println(tagged);
        }
    }
}

I am trying to parse a CSV file that contains a character (BIN 10010111, —), and I want the text parser to ignore this character. How can I do that?

So I guess you want to remove all special characters?

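I guess something like this would do it: replaceAll("[^\\w\\s]", ""); (the backslashes must be doubled inside a Java string literal). Note that this removes everything except letters, digits, underscores and whitespace, so punctuation is stripped as well.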
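
As a rough illustration of the regex idea above, here is a minimal sketch comparing the broad pattern with a narrower one that drops only characters outside the ASCII range. The class name StripChars, its two helper methods and the sample sentence are made up for this example; they are not part of the question's code.

```java
public class StripChars {
    // Hypothetical helper: keeps only word characters (letters, digits, '_') and whitespace.
    static String stripSpecial(String s) {
        return s.replaceAll("[^\\w\\s]", "");
    }

    // Hypothetical helper: removes only characters outside the 7-bit ASCII range.
    static String stripNonAscii(String s) {
        return s.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        // \u2014 is the em dash; the sample text is invented for the demo.
        String line = "Hello\u2014world, it's fine.";
        System.out.println(stripSpecial(line));   // Helloworld its fine
        System.out.println(stripNonAscii(line));  // Helloworld, it's fine.
    }
}
```

The narrower pattern leaves commas, apostrophes and full stops in place, which is probably preferable when the text is passed to a part-of-speech tagger afterwards. Replacing with " " instead of "" would avoid gluing the neighbouring words together.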

Edit: full code

import java.io.*;
import java.util.ArrayList;
import java.util.List;
import edu.stanford.nlp.tagger.maxent.MaxentTagger;

public class TagText {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        // Initialize the tagger
        MaxentTagger tagger = new MaxentTagger("taggers/english-left3words-distsim.tagger");
        // ReadCSV is a helper class that is not shown here
        List<String> lines = new ReadCSV().readColumn("Tt2.csv", 4);
        for (String line : lines) {
            // Remove U+FFFD (the Unicode replacement character) before tagging
            String tagged = tagger.tagString(line.replace("\uFFFD", ""));
            System.out.println(tagged);
        }
    }
}
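
A possible explanation for why the code above targets "\uFFFD", sketched under assumptions: the byte 10010111 (0x97) is not valid on its own in UTF-8, so a UTF-8 decoder substitutes U+FFFD for it, whereas in Windows-1252 the same byte is the em dash (U+2014). Whether Tt2.csv really is Windows-1252 and how ReadCSV decodes it are not stated in the question, so treat this as a guess. The class name DecodeDemo and the three-byte sample are invented for the demo.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Hypothetical helper: decodes raw bytes with the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // 'a', the stray byte 0x97 (binary 10010111), 'b'
        byte[] raw = {'a', (byte) 0x97, 'b'};

        // Malformed input is replaced with U+FFFD when decoding as UTF-8.
        String asUtf8 = decode(raw, StandardCharsets.UTF_8);
        // Assumption: the file was written as Windows-1252, where 0x97 is the em dash.
        String asCp1252 = decode(raw, Charset.forName("windows-1252"));

        System.out.println(asUtf8.equals("a\uFFFDb"));      // true
        System.out.println(asCp1252.equals("a\u2014b"));    // true
        System.out.println(asUtf8.replace("\uFFFD", ""));   // ab
    }
}
```

If that assumption holds, reading the file with the matching charset would keep the dash as a real character instead of a replacement marker, and it could then be removed or replaced with a space explicitly.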