Parsing CSV to 2D Array in ExtendScript (ES3-compliant)

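I'm having trouble parsing a CSV file exported from a Vimeo review page in Adobe's ExtendScript. The problem is that ExtendScript is based on ES3, and most solutions don't seem to work because they rely on modern JS.

The CSV file also has a header row, an empty line at the end, double quotes around some but not all fields (which I'd like to remove), and potentially line breaks and special characters (including commas) inside fields. Is there a way to get a 'clean' 2D array?

I have tried the solutions here: Javascript code to parse CSV data and here: How can I parse a CSV string with Javascript, which contains comma in data?

But I couldn't get them to work, and I assume the problems are related to ExtendScript being outdated.

CSV file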

"Test Video-01.mp4",1,00:00:00,AVT,"test comment 1",--,"Tuesday, July 9, 2019 At 8:49 AM",No
"Test Video-01.mp4",2,00:00:00,AVT,"another at same timecode",--,"Tuesday, July 9, 2019 At 8:50 AM",Yes
,3,00:00:00,--,"another at same timecode","reply here from anon","Tuesday, July 9, 2019 At 8:54 AM",Yes
"Test Video-01.mp4",3,00:00:11,AVT,"really long comment Lorem ipsum dolor sit amet, Purus sit amet volutpat consequat mauris nunc congue nisi. Semper viverra nam libero justo laoreet sit amet cursus. Id interdum velit laoreet id. Bibendum est ultricies integer quis auctor elit sed vulputate. And some special chars to boot: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

Eros donec ac odio tempor orci dapibus. Nam libero justo laoreet sit amet. Pellentesque pulvinar pellentesque habitant morbi. Pellentesque eu tincidunt tortor aliquam nulla facilisi cras fermentum.",--,"Tuesday, July 9, 2019 At 8:50 AM",No
"Test Video-01.mp4",4,00:00:19,AVT,"another one different timecode",--,"Tuesday, July 9, 2019 At 8:50 AM",Yes
"Test Video-01.mp4",5,00:00:43,AVT,"comment here tooo",--,"Tuesday, July 9, 2019 At 8:51 AM",No
,6,00:00:43,AVT,"comment here tooo","reply to a comment","Tuesday, July 9, 2019 At 8:51 AM",No
,7,00:00:43,AVT,"comment here tooo","reply again","Tuesday, July 9, 2019 At 8:51 AM",No
,8,00:00:43,"PJ Palomaki","comment here tooo","Different person reply","Tuesday, July 9, 2019 At 8:52 AM",No
,9,00:00:43,--,"comment here tooo","Anon reply reply","Tuesday, July 9, 2019 At 8:53 AM",No
"Test Video-01.mp4",6,00:01:29,--,"Anon comment",--,"Tuesday, July 9, 2019 At 8:53 AM",No
,7,00:01:29,--,"Anon comment","Anon reply","Tuesday, July 9, 2019 At 8:53 AM",No
,,,,,,,

If I parse with split("\n"), fields containing line breaks get split. If I use split(","), any field containing a comma gets split.
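
To make both failures concrete, here is a minimal sketch. The `sample` string is a shortened, made-up stand-in for the real file, not a copy of it:

```javascript
// Made-up sample: 2 records; the first has a comma and a line break inside quotes.
var sample = '"a.mp4",1,"first line, with comma\nsecond line",No\n"b.mp4",2,"plain",Yes';

// Splitting on line breaks cuts the first record in the middle of its quoted field.
var byLine = sample.split("\n");

// Splitting on commas cuts the quoted comment at its embedded comma.
var byComma = byLine[0].split(",");
```

Here `byLine` ends up with three pieces for two records, and `byComma` with four pieces for what are really three (incomplete) fields.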

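Also, I'd like to include the parsing function in-line (in with the main script, rather than loading an external script) as I'd prefer to use a single file when deploying.

Thanks, PJ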

For ExtendScript projects I used the BabyParse library. I had to edit it a bit to be used in ExtendScript. Here is the gist. It will give you a JSON object that you can convert to a 2D array.
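
The conversion step might look like the sketch below. It assumes the parsed result contains an array of row objects keyed by column name; that shape is an assumption, so check it against what the gist actually returns. The helper name `objectsToRows` and its parameters are hypothetical and not part of BabyParse.

```javascript
// Hypothetical helper: turn an array of row objects into a 2D array.
// `records` (array of objects) and `fields` (column names, in output order)
// are assumed shapes, not taken from the BabyParse documentation.
function objectsToRows(records, fields) {
    var rows = [];
    for (var i = 0; i < records.length; i++) {
        var row = [];
        for (var j = 0; j < fields.length; j++) {
            var value = records[i][fields[j]];
            // Missing keys become empty strings so every row has the same width.
            row.push(value == null ? "" : value);
        }
        rows.push(row);
    }
    return rows;
}
```

Only plain `for` loops are used, since ES3 has no `Array.prototype.map` or `forEach`.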

Also, I'd like to include the parsing function in-line (in with the main script, rather than loading an external script) as I'd prefer to use a single file when deploying.

Use a build tool such as gulp for this. Or you can use ExtendScript's include syntax, // @include "path/to/file.jsx" or #include "path/to/file.jsx", and then use github.com/fabianmoronzirfas/extendscript-bundlr to combine them.

(All links are shameless self-promotion ;-))

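As far as I can tell, the problem is that this CSV file is not a valid CSV file at all. In particular, the "really long comment..." field has line breaks and double quotes inside it. These should be escaped somehow first. After that, parsing becomes a trivial task.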
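
Once embedded quotes are escaped by doubling them (the RFC 4180 convention, under which commas and line breaks inside a quoted field need no further escaping), the parsing step could look like the sketch below. The function name `parseCsv` is hypothetical, the code sticks to ES3-style syntax, and it assumes well-formed input.

```javascript
// Sketch of a character-by-character CSV parser in ES3-style syntax.
// Assumes well-formed input: a quote inside a quoted field is written as "".
function parseCsv(text) {
    var rows = [], row = [], field = "";
    var inQuotes = false;
    for (var i = 0; i < text.length; i++) {
        var c = text.charAt(i);
        if (inQuotes) {
            if (c === '"') {
                if (text.charAt(i + 1) === '"') {
                    field += '"';      // "" is an escaped quote
                    i++;
                } else {
                    inQuotes = false;  // closing quote
                }
            } else {
                field += c;            // commas and line breaks are plain data here
            }
        } else if (c === '"') {
            inQuotes = true;           // opening quote, not part of the value
        } else if (c === ",") {
            row.push(field);
            field = "";
        } else if (c === "\n" || c === "\r") {
            if (c === "\r" && text.charAt(i + 1) === "\n") { i++; } // treat CRLF as one break
            row.push(field);
            rows.push(row);
            row = [];
            field = "";
        } else {
            field += c;
        }
    }
    // Flush the last record when the text does not end with a line break.
    if (field !== "" || row.length > 0) {
        row.push(field);
        rows.push(row);
    }
    return rows;
}
```

The surrounding quotes are dropped from the values as a side effect of the state machine, and every value comes back as a string.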

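So the question is actually: what is the best way to find and handle the double quotes and line breaks in such text, so that it becomes valid CSV data and can then be converted into a 2D array?

I'm not sure the task can be done for arbitrary text. The combination of unwanted double quotes AND line breaks AND commas inside fields is likely to be an insurmountable obstacle.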
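
For text like the sample in the question, one heuristic that may be good enough is sketched below. It is an assumption rather than a general solution: inside a quoted field, a quote counts as the closing quote only when it is followed by a comma, a line break, or the end of the text, and any other quote is doubled. The stray quote in the sample is followed by `#`, so this rule would catch it. The function name `escapeStrayQuotes` is hypothetical.

```javascript
// Heuristic sketch, not a general solution: doubles quotes that appear inside
// a quoted field but are not followed by a comma, a line break, or end of text.
function escapeStrayQuotes(text) {
    var out = "";
    var inQuotes = false;
    for (var i = 0; i < text.length; i++) {
        var c = text.charAt(i);
        if (c !== '"') {
            out += c;
            continue;
        }
        var next = text.charAt(i + 1); // "" when past the end of the text
        if (!inQuotes) {
            inQuotes = true;           // opening quote
            out += c;
        } else if (next === '"') {
            out += '""';               // already escaped pair, copy it through
            i++;
        } else if (next === "," || next === "\n" || next === "\r" || next === "") {
            inQuotes = false;          // closing quote
            out += c;
        } else {
            out += '""';               // stray quote: escape it by doubling
        }
    }
    return out;
}
```

The heuristic fails when a stray quote happens to be followed by a comma, as in "he said "hi", then left", which is exactly the kind of arbitrary text where no rule can tell data from delimiters.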