Does bufio.NewScanner in Golang read the entire file into memory instead of one line at a time?

I tried to read a file line by line with bufio.NewScanner, using the following function:
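// Needs imports: bufio, fmt, io, os, sync.
func TailFromStart(fd *os.File, wg *sync.WaitGroup) {
    defer wg.Done()

    fd.Seek(0, io.SeekStart) // rewind to the beginning of the file
    scanner := bufio.NewScanner(fd)
    for scanner.Scan() {
        line := scanner.Text()
        // Offset of the underlying file descriptor, not of the current line.
        offset, _ := fd.Seek(0, io.SeekCurrent)
        fmt.Println(offset)
        fmt.Println(line)
        offsetreset, _ := fd.Seek(offset, io.SeekStart)
        fmt.Println(offsetreset)
    }
    offset, err := fd.Seek(0, io.SeekCurrent)
    CheckError(err) // CheckError is the asker's own helper (not shown)
    fmt.Println(offset)
}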

I expected it to print increasing offsets, but it prints the same value on every iteration until the file reaches EOF:

127.0.0.1 - - [11/Aug/2016:22:10:39 +0530] "GET /ttt HTTP/1.1" 404 437 "-" "curl/7.38.0"
613
613
127.0.0.1 - - [11/Aug/2016:22:10:42 +0530] "GET /qqq HTTP/1.1" 404 437 "-" "curl/7.38.0"
613

613 is the total size of the file in bytes (the third column printed by wc):

cat /var/log/apache2/access.log | wc
  7      84     613

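Am I misunderstanding something, or does bufio.NewScanner read the whole file into memory and then iterate over that in-memory copy? If so, is there a better way to read a file line by line?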

See the documentation for func (s *Scanner) Buffer(buf []byte, max int):

Buffer sets the initial buffer to use when scanning and the maximum size of buffer that may be allocated during scanning. The maximum token size is the larger of max and cap(buf).
If max <= cap(buf), Scan will use this buffer only and do no allocation.

By default, Scan uses an internal buffer and sets the maximum token size to MaxScanTokenSize.

Buffer panics if it is called after scanning has started.

And:

MaxScanTokenSize is the maximum size used to buffer a token unless the user provides an explicit buffer with Scan.Buffer. The actual maximum token size may be smaller as the buffer may need to include, for instance, a newline.

MaxScanTokenSize = 64 * 1024

startBufSize = 4096 // Size of initial allocation for buffer.

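No. As @JimB said, it only reads as much as its buffer holds. See this test example:

For a file smaller than 4096 bytes, the whole content fits into the buffer on the first read,
but for a larger file only the first 4096 bytes are read.
Try this with a big file: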

package main

import (
    "bufio"
    "fmt"
    "os"
)

func main() {
    fd, err := os.Open("big.txt")
    if err != nil {
        panic(err)
    }
    defer fd.Close()

    n, err := fd.Seek(0, 0)
    if err != nil {
        panic(err)
    }
    fmt.Println("n =", n) // 0

    scanner := bufio.NewScanner(fd)
    for scanner.Scan() {
        fmt.Println(scanner.Text())
        break
    }

    offset, err := fd.Seek(0, 1)
    if err != nil {
        panic(err)
    }
    fmt.Println("offset =", offset) //4096

    offsetreset, err := fd.Seek(offset, 0)
    if err != nil {
        panic(err)
    }
    fmt.Println("offsetreset =", offsetreset) //4096

    offset, err = fd.Seek(0, 1)
    if err != nil {
        panic(err)
    }
    fmt.Println("offset =", offset) //4096

}

Output:

n = 0

offset = 4096
offsetreset = 4096
offset = 4096

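If individual lines can be longer than the default limit (MaxScanTokenSize, 64 KiB), you can give the scanner a larger buffer.

For example: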

scanner := bufio.NewScanner(file)
buf := make([]byte, 0, 64*1024)
scanner.Buffer(buf, 1024*1024) // allow lines (tokens) up to 1 MiB; raise this for longer lines
for scanner.Scan() {
    // do your stuff
}
if err := scanner.Err(); err != nil {
    // e.g. bufio.ErrTooLong if a line exceeds the maximum token size
}