Which is more efficient in my case: reading by char (getc) or reading by chunks (fgets)?

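My team at the university is programming a compiler in the C language. The compiler takes source code written in a subset of Go and outputs assembly-like bytecode. My question is: which approach is more efficient? Reading the source file character by character (getc) and changing the state of the FSM based on the current character, or reading it in chunks (fgets) and calling a helper function that contains the FSM to process individual lexemes and output tokens?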
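For reference, here is a minimal sketch of the getc-based approach from the question: a hand-written FSM lexer that pulls one character at a time with getc() and uses ungetc() for one character of lookahead. The token kinds, the struct token layout, the short keyword list and the function names are hypothetical, chosen only for illustration; a real lexer for the Go subset also needs string literals, comments, floats, the remaining operators and Go's automatic semicolon insertion.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token kinds; a real Go-subset lexer needs many more. */
enum token_kind { TOK_EOF, TOK_IDENT, TOK_KEYWORD, TOK_INT, TOK_PUNCT };

struct token {
    enum token_kind kind;
    char text[64];            /* the lexeme, truncated to 63 characters */
};

/* A few Go keywords, enough for this sketch. */
static const char *const keywords[] = {
    "func", "package", "return", "var", "if", "else", "for"
};

static int is_keyword(const char *s)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(s, keywords[i]) == 0)
            return 1;
    return 0;
}

/* Appends c to the lexeme, silently dropping characters beyond the buffer. */
static void append(struct token *tok, size_t *len, int c)
{
    if (*len < sizeof tok->text - 1)
        tok->text[(*len)++] = (char)c;
}

/* Reads the next token with getc(), one character per FSM transition.
   Returns 1 if a token was read, 0 at end of file. */
int next_token(FILE *in, struct token *tok)
{
    size_t len = 0;
    int c;

    assert(in != NULL && tok != NULL);

    /* State START: skip white space. */
    do
        c = getc(in);
    while (c != EOF && isspace(c));

    if (c == EOF) {
        tok->kind = TOK_EOF;
    } else if (isalpha(c) || c == '_') {
        /* State IDENT: letters, digits and underscores. */
        while (c != EOF && (isalnum(c) || c == '_')) {
            append(tok, &len, c);
            c = getc(in);
        }
        ungetc(c, in);        /* push back the lookahead; fails harmlessly if c == EOF */
        tok->text[len] = '\0';
        tok->kind = is_keyword(tok->text) ? TOK_KEYWORD : TOK_IDENT;
    } else if (isdigit(c)) {
        /* State INT: decimal digits. */
        while (c != EOF && isdigit(c)) {
            append(tok, &len, c);
            c = getc(in);
        }
        ungetc(c, in);
        tok->kind = TOK_INT;
    } else {
        /* Punctuation; ':' may start the two-character operator ":=". */
        tok->kind = TOK_PUNCT;
        append(tok, &len, c);
        if (c == ':') {
            int next = getc(in);
            if (next == '=')
                append(tok, &len, next);
            else
                ungetc(next, in);
        }
    }
    tok->text[len] = '\0';
    return tok->kind != TOK_EOF;
}
```

One design point in favor of getc(): the FSM state carries over naturally from one line to the next, whereas with fgets() a lexeme that spans chunks (a line longer than the buffer, or a multi-line /* ... */ comment) has to be stitched back together by the helper function.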

In practice, it shouldn't matter much.

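The CPU is much faster than most disks (SSDs included), and the page cache keeps recently read file data in RAM. Also, getc and fgets both take their bytes from the same user-space stdio buffer, so neither makes a system call per character.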

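Write your application, debug it, enable optimizations in your compiler (the one compiling your C code), and then benchmark it. The results are specific to the operating system and the compiler.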
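A minimal benchmark sketch for comparing the two reading styles, assuming the caller opens the file: two hypothetical functions that count newlines, one with getc() and one with fgets() and a 4096-byte buffer, plus a helper that times either one with clock(). Note that clock() measures processor time, not wall-clock time; time(1) gives the latter. The function names and the buffer size are illustrative choices, not anything prescribed by the answer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Counts '\n' characters one at a time with getc(). */
long count_newlines_getc(FILE *f)
{
    long n = 0;
    int c;
    while ((c = getc(f)) != EOF)
        if (c == '\n')
            n++;
    return n;
}

/* Counts '\n' characters by reading chunks with fgets().
   fgets() stops after a newline, so a chunk holds at most one, at its end;
   a line longer than the buffer arrives in several chunks.
   Assumes the file contains no NUL bytes (strlen would stop early). */
long count_newlines_fgets(FILE *f)
{
    char buf[4096];
    long n = 0;
    while (fgets(buf, sizeof buf, f) != NULL) {
        size_t len = strlen(buf);
        if (len > 0 && buf[len - 1] == '\n')
            n++;
    }
    return n;
}

/* Rewinds f, runs reader over it, stores the count in *result and
   returns the processor time spent, in seconds. */
double time_reader(long (*reader)(FILE *), FILE *f, long *result)
{
    assert(f != NULL && result != NULL);
    rewind(f);
    clock_t start = clock();
    *result = reader(f);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

For example, call time_reader(count_newlines_getc, f, &n) and time_reader(count_newlines_fgets, f, &n) on the same large file, several times each, since the first pass also warms the page cache.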

On Linux, read the documentation of GCC, GDB, gprof(1), time(1), syscalls(2), fopen(3), fflush(3), setvbuf(3), fgetc(3), fgets(3), getline(3), readline(3), time(7) and Advanced Linux Programming.
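Of the man pages listed above, setvbuf(3) lets you enlarge the stdio buffer, so that getc() refills it with fewer, larger reads. Below is a minimal sketch; the helper names and the 64 KiB size used in testing are hypothetical choices to benchmark, not recommendations. The helper passes its own buffer because the Linux setvbuf(3) man page notes that with a NULL buf only the buffering mode is changed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: gives the freshly opened stream f a fully buffered,
   caller-owned stdio buffer of `size` bytes. Call it before any other I/O
   on f. Returns the buffer, which must be freed only after fclose(f),
   or NULL on failure. */
char *set_big_buffer(FILE *f, size_t size)
{
    assert(f != NULL && size > 0);
    char *buf = malloc(size);
    if (buf == NULL)
        return NULL;
    if (setvbuf(f, buf, _IOFBF, size) != 0) {
        free(buf);
        return NULL;
    }
    return buf;
}

/* Reads the stream to the end with getc(). Most calls just take the next
   byte from the stdio buffer; the library refills it when it runs empty. */
long count_bytes_getc(FILE *f)
{
    long n = 0;
    while (getc(f) != EOF)
        n++;
    return n;
}
```

The buffer must stay valid until fclose(), which is why the helper hands it back to the caller instead of freeing it.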

If you happen to use GCC, study its source code for inspiration, read the documentation of its internals, and invoke it as gcc -O2 -ftime-report -ftime-report-details; you'll discover that most of the time spent by a compiler goes into optimizations, not parsing.

You should consider a metaprogramming approach: for your compiler project, use (or develop) programs that generate C code (inspired by ANTLR, GNU bison, iburg, SWIG, RPCGEN, etc.). Observe that GCC has a dozen specialized domain-specific languages and corresponding code generators. Consider perhaps using GPP (or, in November 2020, RefPerSys) for such purposes.
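To make the metaprogramming idea concrete, here is a toy generator sketch: a C function that writes C source for a keyword table and an is_keyword() lookup, in the spirit of (but far simpler than) the generators listed above. The names emit_keyword_table, keywords, MAX_KEYWORD_LEN and is_keyword are hypothetical, and the keywords are assumed to be plain identifiers that need no escaping inside a C string literal.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy code generator (hypothetical): writes to `out` the C source of a
   keyword table and an is_keyword() lookup function.
   Assumes n > 0 and that each word needs no escaping in a string literal. */
void emit_keyword_table(FILE *out, const char *const *words, size_t n)
{
    assert(out != NULL && words != NULL && n > 0);

    /* Longest keyword, so a lexer can skip the lookup for longer identifiers. */
    size_t longest = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(words[i]);
        if (len > longest)
            longest = len;
    }

    fprintf(out, "/* Generated by emit_keyword_table(); do not edit. */\n");
    fprintf(out, "#include <string.h>\n\n");
    fprintf(out, "#define MAX_KEYWORD_LEN %zu\n\n", longest);
    fprintf(out, "static const char *const keywords[] = {\n");
    for (size_t i = 0; i < n; i++)
        fprintf(out, "    \"%s\",\n", words[i]);
    fprintf(out, "};\n\n");
    fprintf(out, "int is_keyword(const char *s)\n{\n");
    fprintf(out, "    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)\n");
    fprintf(out, "        if (strcmp(s, keywords[i]) == 0)\n");
    fprintf(out, "            return 1;\n");
    fprintf(out, "    return 0;\n}\n");
}
```

In a build, a tiny program calling emit_keyword_table(stdout, ...) would be run by make to produce a .c file that is then compiled together with the rest of the compiler.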

Most of your compiler should transform internal compiler representations (e.g. simplified abstract syntax trees, perhaps inspired by GIMPLE, and you might consider using libgccjit, if your teacher allows it).
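A minimal sketch of what transforming an internal representation can look like: a hypothetical, heavily simplified AST for integer expressions and one pass over it, constant folding, which rewrites x * (2 + 3) into x * 5. The node layout and all function names are illustrative assumptions; they are not GIMPLE or libgccjit APIs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified AST for integer expressions. */
enum node_kind { NODE_INT, NODE_VAR, NODE_ADD, NODE_MUL };

struct node {
    enum node_kind kind;
    long value;              /* NODE_INT: the literal's value */
    const char *name;        /* NODE_VAR: the variable's name (not owned) */
    struct node *lhs, *rhs;  /* NODE_ADD, NODE_MUL: the operands */
};

static struct node *new_node(enum node_kind kind)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();             /* out of memory: this sketch simply gives up */
    n->kind = kind;
    n->value = 0;
    n->name = NULL;
    n->lhs = n->rhs = NULL;
    return n;
}

struct node *mk_int(long v) { struct node *n = new_node(NODE_INT); n->value = v; return n; }
struct node *mk_var(const char *name) { struct node *n = new_node(NODE_VAR); n->name = name; return n; }

struct node *mk_bin(enum node_kind kind, struct node *lhs, struct node *rhs)
{
    assert(kind == NODE_ADD || kind == NODE_MUL);
    struct node *n = new_node(kind);
    n->lhs = lhs;
    n->rhs = rhs;
    return n;
}

void free_tree(struct node *n)
{
    if (n == NULL)
        return;
    free_tree(n->lhs);
    free_tree(n->rhs);
    free(n);
}

/* One transformation pass, constant folding: bottom-up, every ADD/MUL node
   whose operands are both literals is rewritten in place into a literal.
   Integer overflow is not checked in this sketch. */
struct node *fold_constants(struct node *n)
{
    if (n == NULL || (n->kind != NODE_ADD && n->kind != NODE_MUL))
        return n;
    n->lhs = fold_constants(n->lhs);
    n->rhs = fold_constants(n->rhs);
    if (n->lhs->kind == NODE_INT && n->rhs->kind == NODE_INT) {
        long v = (n->kind == NODE_ADD) ? n->lhs->value + n->rhs->value
                                       : n->lhs->value * n->rhs->value;
        free(n->lhs);
        free(n->rhs);
        n->lhs = n->rhs = NULL;
        n->kind = NODE_INT;
        n->value = v;
    }
    return n;
}
```

Real compilers chain many such passes over their internal representation, which is where most of the compile time goes.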

On other operating systems and compilers, read their documentation.

And of course, read the Dragon Book.
