Writing the simplest Newick parser for Unity3d (C# or ActionScript)

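I am trying to figure out how to read Newick files for many animal species, but I have not been able to find a "logical method / process" for sorting through a Newick string in a simple programming language. I can read C#, AS, JS, GLSL and HLSL.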

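I can't find any simple resources, and the wiki article doesn't even mention recursion. Pseudocode for parsing Newick would be great, but I can't find any.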

Does anyone know the fastest way to read a Newick file in Unity3d? Could you help put me on the right track with a logical process for sorting through Newick code, e.g.:

(A,B,(C,D));

Branch lengths are not important for now.

Target project file:

        (
            (
                (
                    (
                        (
                            (
                                Falco_rusticolus:0.846772,
                                Falco_jugger:0.846772
                            ):0.507212,
                            (
                                Falco_cherrug:0.802297,
                                Falco_subniger:0.802297
                            ):0.551687
                        ):0.407358,
                        Falco_biarmicus:1.761342
                    ):1.917030,
                    (
                        Falco_peregrinus:0.411352,
                        Falco_pelegrinoides:0.411352
                    ):3.267020
                ):2.244290,
                Falco_mexicanus:5.922662
            ):1.768128,
            Falco_columbarius:7.69079
        )

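If you don't have a background in formal grammars, implementing a parser can be difficult. The easiest approach therefore seems to be a parser generator such as ANTLR; then you only need to familiarize yourself with its grammar notation. You can generate a parser written in C# from the grammar.

Luckily, you can find a Newick grammar online: here

Update:

If you do the above by hand for a simple Newick grammar, you end up with something like the following: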

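using System.Collections.Generic; // List<T>
using System.Globalization;       // CultureInfo
using System.Linq;                // Where, ToArray

public class Branch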
{
    public double Length { get; set; }
    public List<Branch> SubBranches { get; set; } = new List<Branch>();
}
public class Leaf : Branch
{
    public string Name { get; set; }
}

public class Parser
{
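    // Index of the last consumed character. It starts at 0, so the
    // outermost '(' is treated as already consumed.
    private int currentPosition;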
    private string input;

    public Parser(string text)
    {
        input = new string(text.Where(c=>!char.IsWhiteSpace(c)).ToArray());
        currentPosition = 0;
    }
    public Branch ParseTree()
    {
        return new Branch { SubBranches = ParseBranchSet() };
    }
    private List<Branch> ParseBranchSet()
    {
        var ret = new List<Branch>();
        ret.Add(ParseBranch());
        while (PeekCharacter() == ',')
        {
            currentPosition++; // ','
            ret.Add(ParseBranch());
        }
        return ret;
    }
    private Branch ParseBranch()
    {
        var tree = ParseSubTree();
        currentPosition++; // ':'
        tree.Length = ParseDouble();
        return tree;
    }
    private Branch ParseSubTree()
    {
        if (char.IsLetter(PeekCharacter()))
        {
            return new Leaf { Name = ParseIdentifier() };
        }

        currentPosition++; // '('
        var branches = ParseBranchSet();
        currentPosition++; // ')'
        return new Branch { SubBranches = branches };
    }        
    private string ParseIdentifier()
    {
        var identifier = "";
        char c;
        while ((c = PeekCharacter()) != 0 && (char.IsLetter(c) || c == '_'))
        {
            identifier += c;
            currentPosition++;
        }
        return identifier;
    }
    private double ParseDouble()
    {
        var num = "";
        char c;
        while((c = PeekCharacter()) != 0 && (char.IsDigit(c) || c == '.'))
        {
            num += c;
            currentPosition++;
        }
        return double.Parse(num, CultureInfo.InvariantCulture);
    }
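    // Returns the next unread character (the one after currentPosition)
    // without consuming it, or '\0' at the end of the input.
    private char PeekCharacter()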
    {
        if (currentPosition >= input.Length-1)
        {
            return (char)0;
        }
        return input[currentPosition + 1];
    }
}

It can be used like this:

var tree = new Parser("((A:1, B:2):3, C:4)").ParseTree();
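
Once the tree is built, you would typically consume it with another small recursive function. Here is a minimal sketch, assuming the `Branch` and `Leaf` classes above (repeated so the snippet compiles on its own). `TreeWalker`, `CollectLeafNames` and `WalkerDemo` are hypothetical names introduced only for illustration:

```csharp
using System;
using System.Collections.Generic;

// Same shape as the Branch/Leaf classes defined with the parser above.
public class Branch
{
    public double Length { get; set; }
    public List<Branch> SubBranches { get; set; } = new List<Branch>();
}

public class Leaf : Branch
{
    public string Name { get; set; }
}

public static class TreeWalker
{
    // Depth-first walk: a leaf contributes its name, an inner branch recurses.
    public static void CollectLeafNames(Branch node, List<string> names)
    {
        var leaf = node as Leaf;
        if (leaf != null)
        {
            names.Add(leaf.Name);
            return;
        }
        foreach (var child in node.SubBranches)
        {
            CollectLeafNames(child, names);
        }
    }
}

public static class WalkerDemo
{
    public static void Main()
    {
        // Hand-built equivalent of ((A:1,B:2):3,C:4), so no parser is needed here.
        var inner = new Branch { Length = 3 };
        inner.SubBranches.Add(new Leaf { Name = "A", Length = 1 });
        inner.SubBranches.Add(new Leaf { Name = "B", Length = 2 });
        var tree = new Branch();
        tree.SubBranches.Add(inner);
        tree.SubBranches.Add(new Leaf { Name = "C", Length = 4 });

        var names = new List<string>();
        TreeWalker.CollectLeafNames(tree, names);
        Console.WriteLine(string.Join(", ", names)); // prints "A, B, C"
    }
}
```

The same pattern (check for a leaf, otherwise loop over the children) works for anything else you need from the tree, such as summing branch lengths or spawning Unity objects per node.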

By the way, the parser above implements the following grammar without any kind of error handling:

Tree -> "(" BranchSet ")"   
BranchSet -> Branch ("," Branch)*   
Branch -> Subtree ":" NUM
Subtree -> IDENTIFIER | "(" BranchSet ")"
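
Note that this grammar requires a ":" NUM after every branch, so a string without lengths such as `(A,B,(C,D));` from the question does not fit it. Below is a hedged sketch of a variant that makes names and lengths optional, accepts a trailing ";" and throws on malformed input. It assumes the relaxed grammar `Node -> ["(" Node ("," Node)* ")"] [NAME] [":" NUM]`, which is my own simplification and not the full Newick specification (no quoted names or comments). `NewickNode`, `LenientNewickParser` and `LenientDemo` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public class NewickNode
{
    public string Name = "";           // empty for unnamed inner nodes
    public double? Length;             // null when the input gives no length
    public List<NewickNode> Children = new List<NewickNode>();
}

public class LenientNewickParser
{
    private readonly string input;
    private int pos;                   // index of the next unread character

    public LenientNewickParser(string text)
    {
        // Strip whitespace up front, as the parser above does.
        input = new string(text.Where(c => !char.IsWhiteSpace(c)).ToArray());
    }

    public NewickNode Parse()
    {
        var root = ParseNode();
        if (Peek() == ';') pos++;      // the trailing ';' is optional here
        if (pos != input.Length)
            throw new FormatException("Unexpected '" + Peek() + "' at position " + pos);
        return root;
    }

    // Node -> ["(" Node ("," Node)* ")"] [NAME] [":" NUM]
    private NewickNode ParseNode()
    {
        var node = new NewickNode();
        if (Peek() == '(')
        {
            do
            {
                pos++;                 // skip '(' the first time, ',' afterwards
                node.Children.Add(ParseNode());
            } while (Peek() == ',');
            Expect(')');
        }
        node.Name = ReadToken();       // may be empty
        if (Peek() == ':')
        {
            pos++;                     // skip ':'
            string text = ReadToken();
            double value;
            if (!double.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out value))
                throw new FormatException("Bad branch length '" + text + "' before position " + pos);
            node.Length = value;
        }
        return node;
    }

    private void Expect(char expected)
    {
        if (Peek() != expected)
            throw new FormatException("Expected '" + expected + "' at position " + pos);
        pos++;
    }

    // Reads everything up to the next structural character.
    private string ReadToken()
    {
        int start = pos;
        while (pos < input.Length && "(),:;".IndexOf(input[pos]) < 0) pos++;
        return input.Substring(start, pos - start);
    }

    private char Peek()
    {
        return pos < input.Length ? input[pos] : '\0';
    }
}

public static class LenientDemo
{
    public static void Main()
    {
        var root = new LenientNewickParser("(A,B,(C,D));").Parse();
        Console.WriteLine(root.Children.Count);               // prints 3
        Console.WriteLine(root.Children[2].Children[1].Name); // prints D
    }
}
```

Positions in the error messages refer to the whitespace-stripped input, which is a trade-off of stripping up front.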

If you are interested in converting Newick to a JSON/regular object, I think I found a solution.

A quick Google search gave me links to a JS implementation:
https://www.npmjs.com/package/biojs-io-newick
https://github.com/daviddao/biojs-io-newick

And porting the JS code to AS3 was not hard for me:

// The function that converts Newick to a JSON-like object
function convertNewickToJSON(source:String):Object
{
    var ancestors:Array = [];
    var tree:Object = {};
    var tokens:Array = source.split(/\s*(;|\(|\)|,|:)\s*/);
    var subtree:Object;
    for (var i:int = 0; i < tokens.length; i++)
    {
        var token:String = tokens[i];
        switch (token)
        {
            case '(': // new children
                subtree = {};
                tree.children = [subtree];
                ancestors.push(tree);
                tree = subtree;
                break;

            case ',': // another branch
                subtree = {};
                ancestors[ancestors.length-1].children.push(subtree);
                tree = subtree;
                break;

            case ')': // optional name next
                tree = ancestors.pop();
                break;

            case ':': // optional length next
                break;

            default:
                var x = tokens[i-1];
                if (x == ')' || x == '(' || x == ',')
                {
                    tree.name = token;
                } else if (x == ':')
                {
                    tree.branch_length = parseFloat(token);
                }
        }
    }

    return tree;
};

// Util function for parsing an object into a string
function objectToStr(obj:Object, paramsSeparator:String = "", isNeedUseSeparatorForChild:Boolean = false):String
{
    var str:String = "";
    if (isSimpleType(obj))
    {
        str = String(obj);

    }else
    {
        var childSeparator:String = "";
        if (isNeedUseSeparatorForChild)
        {
            childSeparator = paramsSeparator;
        }
        for (var propName:String in obj)
        {
            if (str == "")
            {
                str += "{ ";
            }else
            {
                str += ", ";
            }
            str += propName + ": " + objectToStr(obj[propName], childSeparator) + paramsSeparator;
        }

        str += " }";
    }

    return str;
}

// One more util function
function isSimpleType(obj:Object):Boolean
{
    var isSimple:Boolean = false;
    if (typeof(obj) == "string" || typeof(obj) == "number" || typeof(obj) == "boolean")
    {
        isSimple = true;
    }

    return isSimple;
}

var tempNewickSource:String = "((((((Falco_rusticolus:0.846772,Falco_jugger:0.846772):0.507212,(Falco_cherrug:0.802297,Falco_subniger:0.802297):0.551687):0.407358,Falco_biarmicus:1.761342):1.917030,(Falco_peregrinus:0.411352,Falco_pelegrinoides:0.411352):3.267020):2.244290,Falco_mexicanus:5.922662):1.768128,Falco_columbarius:7.69079)";
var tempNewickJSON:Object = this.convertNewickToJSON(tempNewickSource);
var tempNewickJSONText:String = objectToStr(tempNewickJSON);
trace(tempNewickJSONText);

The code above produces the following trace output:

{ name: , children: { 0: { name: , children: { 0: { name: , children: { 0: { name: , children: { 0: { name: , children: { 0: { name: , children: { 0: { name: Falco_rusticolus, branch_length: 0.846772 }, 1: { name: Falco_jugger, branch_length: 0.846772 } }, branch_length: 0.507212 }, 1: { name: , children: { 0: { name: Falco_cherrug, branch_length: 0.802297 }, 1: { name: Falco_subniger, branch_length: 0.802297 } }, branch_length: 0.551687 } }, branch_length: 0.407358 }, 1: { name: Falco_biarmicus, branch_length: 1.761342 } }, branch_length: 1.91703 }, 1: { name: , children: { 0: { name: Falco_peregrinus, branch_length: 0.411352 }, 1: { name: Falco_pelegrinoides, branch_length: 0.411352 } }, branch_length: 3.26702 } }, branch_length: 2.24429 }, 1: { name: Falco_mexicanus, branch_length: 5.922662 } }, branch_length: 1.768128 }, 1: { name: Falco_columbarius, branch_length: 7.69079 } } }

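So this approach gives you a way to work with the Newick format the same way you would with JSON.

Judging by the title, you are interested not only in C# but also in an AS3 implementation (I'm not sure you will be able to use this in C# "out of the box", but perhaps you will be able to port it to C#).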
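
For reference, here is a rough sketch of how the AS3 function above might be ported to C#. It keeps the same token-and-stack logic, with a typed node class in place of the dynamic object. `TreeNode`, `NewickTokenParser` and `TokenDemo` are hypothetical names, and like the original it does no real error handling (an unbalanced `)` makes `Stack.Pop` throw, and a non-numeric length makes `double.Parse` throw):

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public class TreeNode
{
    public string Name = "";
    public double? BranchLength;       // null when no length was given
    public List<TreeNode> Children = new List<TreeNode>();
}

public static class NewickTokenParser
{
    public static TreeNode Parse(string source)
    {
        // The capture group makes Regex.Split keep the delimiters as tokens.
        string[] tokens = Regex.Split(source, @"\s*(;|\(|\)|,|:)\s*");
        var ancestors = new Stack<TreeNode>();
        var tree = new TreeNode();
        string previous = "";          // plays the role of tokens[i-1] in the AS3 code

        foreach (string token in tokens)
        {
            switch (token)
            {
                case "(":              // new children: descend into the first child
                    var child = new TreeNode();
                    tree.Children.Add(child);
                    ancestors.Push(tree);
                    tree = child;
                    break;
                case ",":              // another branch: add a sibling to the parent
                    var sibling = new TreeNode();
                    ancestors.Peek().Children.Add(sibling);
                    tree = sibling;
                    break;
                case ")":              // back to the parent; optional name next
                    tree = ancestors.Pop();
                    break;
                case ":":              // optional length next
                case ";":              // end of tree
                    break;
                default:               // a name or a length, depending on what came before
                    if (previous == ")" || previous == "(" || previous == ",")
                        tree.Name = token;
                    else if (previous == ":")
                        tree.BranchLength = double.Parse(token, CultureInfo.InvariantCulture);
                    break;
            }
            previous = token;
        }
        return tree;
    }
}

public static class TokenDemo
{
    public static void Main()
    {
        var root = NewickTokenParser.Parse("((A:1,B:2):3,C:4);");
        Console.WriteLine(root.Children[0].Children[1].Name); // prints B
        Console.WriteLine(root.Children[0].BranchLength);     // prints 3
    }
}
```

Because this version is a loop over tokens rather than a set of recursive functions, deeply nested trees cannot overflow the call stack, at the cost of the parsing rules being less visible than in a grammar-shaped parser.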