读取 CSV 文件的最快方法 java

Fastest way to read a CSV file java

我一直在尝试使用 openCSV 读取几个 csv 文件(大约 20 MB),但到目前为止速度很慢。我试图读取 4 个 csv 文件,我将它们加载到我设计的堆中。我想知道,是否有任何其他方法可以在更短的时间内完成。

private Heap<VOMovingViolations> datosHeap; 

public void loadMovingViolations()
{
    Runtime garbage = Runtime.getRuntime(); 
    garbage.gc();
    try 
    {
        FileReader fileReaderMes1 = new FileReader(FECHAS[0]);
        FileReader fileReaderMes2 = new FileReader(FECHAS[1]); 
        FileReader fileReaderMes3 = new FileReader(FECHAS[2]); 
        FileReader fileReaderMes4 = new FileReader(FECHAS[3]); 
        CSVReader enero = new CSVReaderBuilder(fileReaderMes1).withSkipLines(1).build();
        CSVReader febrero = new CSVReaderBuilder(fileReaderMes2).withSkipLines(1).build();
        CSVReader marzo = new CSVReaderBuilder(fileReaderMes3).withSkipLines(1).build();
        CSVReader abril = new CSVReaderBuilder(fileReaderMes4).withSkipLines(1).build();

        String[] row; 


        while((row = enero.readNext()) != null)
        {
            int objectId = Integer.parseInt(row[0]); 
            int totalPaid = (int)Double.parseDouble(row[9]);
            short fi = Short.parseShort(row[8]);
            short penalty1 = Short.parseShort(row[10]);
            datosHeap.insert(new VOMovingViolations(objectId, totalPaid,  fi,  row[2], row[13],
                        row[12],row[14], row[15], row[4], row[3], penalty1));
        }

        while((row = febrero.readNext()) != null)
        {
            int objectId = Integer.parseInt(row[0]); 
            int totalPaid = (int)Double.parseDouble(row[9]);
            short fi = Short.parseShort(row[8]);
            short penalty1 = Short.parseShort(row[10]);
            datosHeap.insert(new VOMovingViolations(objectId, totalPaid,  fi,  row[2], row[13],
                        row[12],row[14], row[15], row[4], row[3], penalty1));
        }

        while((row = marzo.readNext()) != null)
        {
            int objectId = Integer.parseInt(row[0]); 
            int totalPaid = (int)Double.parseDouble(row[9]);
            short fi = Short.parseShort(row[8]);
            short penalty1 = Short.parseShort(row[10]);
            datosHeap.insert(new VOMovingViolations(objectId, totalPaid,  fi,  row[2], row[13],
                        row[12],row[14], row[15], row[4], row[3], penalty1));
        }

        while((row = abril.readNext()) != null)
        {
            int objectId = Integer.parseInt(row[0]); 
            int totalPaid = (int)Double.parseDouble(row[9]);
            short fi = Short.parseShort(row[8]);
            short penalty1 = Short.parseShort(row[10]);
            datosHeap.insert(new VOMovingViolations(objectId, totalPaid,  fi,  row[2], row[13],
                        row[12],row[14], row[15], row[4], row[3], penalty1));
        }

    } 
    catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } 


}

如果有人能给我任何帮助或任何想法,我将不胜感激。

tl;博士

读取一个 20 MB 的 CSV 文件,并在每行中实例化一个对象,总耗时不到 1 秒

详情

你没有定义“慢”这个词。所以我做了一个实验,一个随意的基准测试。

首先,我们创建一个包含 40,000 Person 条记录的 20 MB 文件。每个 Person 都有一个法语名字和一个姓氏,一个 UUID, and some arbitrary text as a description. The data is written as four columns in a CSV file in UTF-8. I used the Apache Commons CSV 可以读写的图书馆。

其次,读取这个写入的文件。每行数据被读入内存,然后用于实例化和收集一个Person对象。

读取此文件并为每一行实例化 Person 对象花费的总时间不到一秒。每行大约需要 20K nanoseconds. Actually, this includes reading the file twice, as we do a scan to count the number of rows of data to set initial capacity of the collected instances. Also, we are parsing a hex string input into the 128-bit value of a UUID,所以我们有一些时间花在数据处理上(不仅仅是阅读)。

对于 Java 16+,将 Person class 定义为 record. We override toString 以避免打印出较长的 description 内容。

record Person ( String givenName , String surname , UUID id , String description ) 
{
    static public  String LOREM_IPSUM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";

    @Override
    public String toString ()
    {
        return "Person{ " +
                "givenName='" + givenName + '\'' +
                " | surname='" + surname + '\'' +
                " | id='" + id + '\'' +
                " }";
    }
}

对于前面的Java,写一个常规的Personclass.

package work.basil.example;

import java.util.UUID;

public class Person
{
    // Static
   static public  String LOREM_IPSUM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";

    // Member variables.
    public String givenName, surname, description;
    public UUID id;

    public Person ( String givenName , String surname , UUID id , String description )
    {
        this.givenName = givenName;
        this.surname = surname;
        this.id = id;
        this.description = description ;
    }

    @Override
    public String toString ()
    {
        return "Person{ " +
                "givenName='" + givenName + '\'' +
                " | surname='" + surname + '\'' +
                " | id='" + id + '\'' +
                " }";
    }
}

这是完整的应用程序,它写入然后读取 20 MB 的文件。请学习和批评,因为我很快就把它掀起来了。我没有仔细检查我的工作。

您会找到一个 write 方法和一个 read 方法。 main 方法调用两者,并跟踪时间。

package work.basil.example;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

public class CsvSpeed
{
    public List < Person > read ( Path path )
    {
        // TODO: Add a check for valid file existing.

        List < Person > list = List.of();  // Default to empty list.
        try
        {
            // Prepare list.
            int initialCapacity = ( int ) Files.lines( path ).count();
            list = new ArrayList <>( initialCapacity );

            // Read CSV file. For each row, instantiate and collect `DailyProduct`.
            BufferedReader reader = Files.newBufferedReader( path );
            Iterable < CSVRecord > records = CSVFormat.RFC4180.withFirstRecordAsHeader().parse( reader );
            for ( CSVRecord record : records )
            {
                String givenName = record.get( "givenName" );
                String surname = record.get( "surname" );
                UUID id = UUID.fromString( record.get( "id" ) );
                String description = record.get( "description" );
                // Instantiate `Person` object, and collect it.
                Person person = new Person( givenName , surname , id , description );
                list.add( person );
            }
        } catch ( IOException e )
        {
            e.printStackTrace();
        }
        return list;
    }

    public void write ( final Path path )
    {
        ThreadLocalRandom random = ThreadLocalRandom.current();
        try ( final CSVPrinter printer = CSVFormat.RFC4180.withHeader( "givenName" , "surname" , "id" , "description" ).print( path , StandardCharsets.UTF_8 ) ; )
        {
            int limit = 40_000;  // 40_000 yields about 20 MB of data.
            List < String > givenNames = List.of( "Adrien" , "Aimon" , "Alerion" , "Alexis" , "Alezan" , "Ancil" , "Andre" , "Antoine" , "Archard" , "Aurélien" , "Averill" , "Baptiste" , "Barnard" , "Bartelemy" , "Bastien" , "Baylee" , "Beale" , "Beau" , "Beaumont" , "Beauregard" , "Bellamy" , "Berger" , "Blaize" , "Blondel" , "Boyce" , "Bruce" , "Brunelle" , "Brys" , "Burcet" , "Burnell" , "Burrell" , "Byron" , "Canaan" , "Carden" , "Carolas" , "Cavell" , "Chace" , "Chanler" , "Chante" , "Chappel" , "Charles" , "Chasen" , "Chason" , "Chemin" , "Chene" , "Cher" , "Chevalier" , "Cheyne" , "Clément" , "Clemence" , "Corbin" , "Coty" , "Cygne" , "Damien" , "Dandre" , "Dariel" , "Darl" , "Dauphine" , "Davet" , "Dax" , "Dean" , "Delice" , "Delmon" , "Destin" , "Dominique" , "Donatien" , "Duke" , "Eliott" , "Elroy" , "Enzo" , "Erwan" , "Etalon" , "Ethan" , "Fabron" , "Ferrand" , "Filberte" , "Florent" , "Florian" , "Fontaine" , "Forest" , "Fortune" , "Franchot" , "Francois" , "Fraser" , "Frayne" , "Gaëtan" , "Gabin" , "Gage" , "Gaige" , "Garland" , "Garner" , "Gaston" , "Gauge" , "Gaylord" , "Germain" , "Germaine" , "German" , "Gervaise" , "Giles" , "Gilles" , "Gitan" , "Grosvener" , "Guifford" , "Guion" , "Guy" , "Guzman" , "Henri" , "Holland" , "Hugo" , "Hugues" , "Hyacinthe" , "Jérémy" , "Jacquan" , "Jacques" , "Jacquez" , "Janvier" , "Jardan" , "Jay" , "Jaye" , "Jehan" , "Jemond" , "Jocquez" , "Jonathan" , "Jules" , "Julien" , "Justus" , "Karoly" , "Lado" , "Lafayette" , "Lamond" , "Lancelin" , "Landis" , "Landry" , "Laron" , "Larrimore" , "Laurent" , "LaValle" , "Leandre" , "Leggett" , "Leonce" , "Leron" , "Leverett" , "Lilian" , "Loïc" , "Lorenzo" , "Louis" , "Lowell" , "Luc" , "Lucien" , "Lukas" , "Macaire" , "Mace" , "Mahieu" , "Maison" , "Malleville" , "Manneville" , "Mantel" , "Marc" , "Marcel" , "Marion" , "Marius" , "Markez" , "Markis" , "Marmion" , "Marquis" , "Marquise" , "Marshall" , "Martial" , "Maslin" , "Mason" , "Matheo" , "Mathias" , "Mathys" , "Matthieu" , "Maxence" , "Mayson" , "Mehdi" , "Merle" , "Merville" , "Montague" , "Montaigu" , "Monte" , "Montgomery" , "Montreal" , "Montrel" , "Moore" , "Morel" , "Mortimer" , "Nerville" , "Neuveville" , "Nicolas" , "Noë" , "Noah" , "Noe" , "Norman" , "Norville" , "Nouel" , "Olivier" , "Onfroi" , "Paien" , "Parfait" , "Parnell" , "Pascal" , "Patrice" , "Paul" , "Peppin" , "Percival" , "Percy" , "Pernell" , "Peverell" , "Philipe" , "Pierpont" , "Pierre" , "Pomeroy" , "Prewitt" , "Purvis" , "Quennell" , "Quentin" , "Quincey" , "Quincy" , "Quintin" , "Rémi" , "Rafaelle" , "Ranger" , "Raoul" , "Raphaël" , "Rapier" , "Rawlins" , "Ray" , "Raynard" , "Remi" , "René" , "Renard" , "Rene" , "Reule" , "Reynard" , "Robin" , "Romain" , "Rondel" , "Roy" , "Royal" , "Ruff" , "Rush" , "Russel" , "Rustin" , "Sabastien" , "Sacha" , "Salomon" , "Samuel" , "Satordi" , "Saville" , "Scoville" , "Sebastien" , "Sennett" , "Severin" , "Shant" , "Shantae" , "Sidney" , "Siffre" , "Simeon" , "Simon" , "Sinclair" , "Sofiane" , "Somer" , "Stephane" , "Sully" , "Sydney" , "Sylvain" , "Talbot" , "Talon" , "Telford" , "Tempest" , "Teppo" , "Théo" , "Thayer" , "Thibault" , "Thibaut" , "Thiery" , "Tiennan" , "Tiennot" , "Titouan" , "Toussaint" , "Travaris" , "Tyson" , "Urson" , "Vachel" , "Valentin" , "Valere" , "Vallis" , "Verdun" , "Victoir" , "Victor" , "Waltier" , "William" , "Wyatt" , "Yanis" , "Yann" , "Yves" , "Yvon" , "Zosime" , "Abrial" , "Abrielle" , "Abril" , "Adele" , "Alair" , "Alerion" , "Amee" , "Angelique" , "Annette" , "Antonella" , "Arian" , "Ariane" , "Armandina" , "Aubree" , "Aubrielle" , "Audra" , "Avril" , "Bella" , "Berneta" , "Bette" , "Blaise" , "Blanche" , "Blasa" , "Bonte" , "Brie" , "Brienne" , "Brigit" , "Cachay" , "Calice" , "Camille" , "Camylle" , "Caprice" , "Caressa" , "Caroline" , "Catin" , "Celesta" , "Celeste" , "Cera" , "Cerise" , "Chablis" , "Chalice" , "Chambray" , "Champagne" , "Chandell" , "Chaney" , "Chantal" , "Chante" , "Chanterelle" , "Chantile" , "Chantilly" , "Chantrice" , "Charla" , "Charlotte" , "Charmane" , "Chaton" , "Chemin" , "Chenetta" , "Cher" , "Chere" , "Cheri" , "Cheryl" , "Christine" , "Cidney" , "Cinderella" , "Claire" , "Claudette" , "Colette" , "Cordelle" , "Cydnee" , "Daeja" , "Daija" , "Daja" , "Damzel" , "Darelle" , "Darlene" , "Darselle" , "Dejanelle" , "Deleena" , "Delice" , "Demeri" , "Deni" , "Denise" , "Desgracias" , "Desire" , "Desiree" , "Destanee" , "Destiny" , "Dior" , "Domanique" , "Dominique" , "Elaina" , "Elaine" , "Elayna" , "Elise" , "Eloisa" , "Elyse" , "Emeline" , "Emmaline" , "Emmeline" , "Estella" , "Estrella" , "Etiennette" , "Evette" , "Fabienne" , "Fabrienne" , "Fanchon" , "Fancy" , "Fawna" , "Fayana" , "Fayette" , "Fifi" , "Fleur" , "Fleurette" , "Fontanna" , "Fosette" , "Francine" , "Frederique" , "Gabriel" , "Gabriele" , "Gabrielle" , "Gaby" , "Garcelle" , "Gena" , "Genie" , "Georgette" , "Germaine" , "Gervaise" , "Gitana" , "Harriet" , "Heloisa" , "Holland" , "Honnetta" , "Isabelle" , "Ivette" , "Ivonne" , "Jacqueena" , "Jacquetta" , "Jacquiline" , "Jacyline" , "Jaime" , "Jakqueline" , "Janeen" , "Janelly" , "Janina" , "Janiqua" , "Janique" , "Jannnelle" , "Jaquita" , "Jardena" , "Jeanetta" , "Jermaine" , "Jessamine" , "Jewel" , "Jewell" , "Joli" , "Jolie" , "Josephine" , "Jozephine" , "Julieta" , "Karessa" , "Karmaine" , "Klara" , "Laine" , "Lanelle" , "Laramie" , "Layne" , "Layney" , "Leala" , "Leonette" , "Lissette" , "Lizette" , "Lourdes" , "Lucienne" , "Ly" , "Lyla" , "Lysette" , "Madelaine" , "Malerie" , "Manette" , "Marais" , "Marcelle" , "Marché" , "Mardi" , "Margo" , "Marguerite" , "Marie" , "Marie Claude" , "Marie Frances" , "Marie Joelle" , "Marie Pascale" , "Marie Sophie" , "Marjolaine" , "Marquise" , "Marvella" , "Mathieu" , "Matisse" , "Maurelle" , "Maurissa" , "Mavis" , "Melisande" , "Michelle" , "Miette" , "Mignon" , "Mimi" , "Mirya" , "Monet" , "Moniqua" , "Monteen" , "Musetta" , "Myrlie" , "Nadeen" , "Nadia" , "Nadiyah" , "Naeva" , "Nanon" , "Natalle" , "Naudia" , "Nettie" , "Nicholas" , "Nicki" , "Nicky" , "Nicole" , "Nicolette" , "Nicolina" , "Nicolle" , "Nikolette" , "Ninette" , "Ninon" , "Noelle" , "Nycole" , "Odelette" , "Opaline" , "Orane" , "Orva" , "Page" , "Parisa" , "Parnel" , "Parris" , "Patrice" , "Peridot" , "Pippi" , "Prairie" , "Rachele" , "Rachelle" , "Racquel" , "Raphaelle" , "Raquelle" , "Remi" , "Renée" , "Renea" , "Renelle" , "Renita" , "Risette" , "Rochelle" , "Romy" , "Rosabel" , "Rosiclara" , "Ruba" , "Russhell" , "Saleena" , "Salina" , "Satin" , "Sedona" , "Serene" , "Shandelle" , "Shanta" , "Shante" , "Shariah" , "Sharita" , "Sharleen" , "Sheree" , "Shereen" , "Sherell" , "Sherice" , "Sherry" , "Sidnee" , "Sidney" , "Sidnie" , "Sidonie" , "Sinclaire" , "Solange" , "Solen" , "Sorrel" , "Suzette" , "Sydnee" , "Sydney" , "Tallis" , "Tempest" , "Toinette" , "Turquoise" , "Veronique" , "Vignette" , "Villette" , "Violeta" , "Virginie" , "Voleta" , "Vonny" );
            List < String > surnames = List.of( "Arceneau" , "Aucoin" , "Babin" , "Babineaux" , "Benoit" , "Bergeron" , "Bernard" , "Bertrand" , "Bessette" , "Blanc" , "Blanchard" , "Bonnet" , "Boucher" , "Bourg" , "Bourque" , "Boutin" , "Bouvier" , "Braud" , "Broussard" , "Brun" , "Chevalier" , "David" , "Depaul" , "Desmarais" , "Disney" , "Dubois" , "Dupont" , "Dupuis" , "Durand" , "Fortescue" , "Fournier" , "Garnier" , "Gaudet" , "Gillet" , "Gillette" , "Girard" , "Gravois" , "Grosvenor" , "Lambert" , "Landry" , "Laroche" , "Laurent" , "Lefevre" , "Leroy" , "Leveque" , "Lisle" , "Martin" , "Michel" , "Molyneux" , "Moreau" , "Morel" , "Neville" , "Pelletier" , "Petit" , "Prideux" , "Renard" , "Richard" , "Robert" , "Rousseau" , "Roux" , "Rufus" , "Simon" , "Thomas" );
            for ( int i = 1 ; i <= limit ; i++ )
            {
                String givenName = givenNames.get( random.nextInt( 0 , givenNames.size() ) );
                String surname = surnames.get( random.nextInt( 0 , surnames.size() ) );
                UUID id = UUID.randomUUID();
                String description = Person.LOREM_IPSUM;
                printer.printRecord( givenName , surname , id , description );
            }
        } catch ( IOException e )
        {
            e.printStackTrace();
        }
    }

    public static void main ( final String[] args )
    {
        // Launch the app.
        CsvSpeed app = new CsvSpeed();

        // Write.
        String when = Instant.now().truncatedTo( ChronoUnit.SECONDS ).toString().replace( ":" , "•" );
        Path pathOutput = Paths.get( "/Users/basilbourque/persons.csv" );
        app.write( pathOutput );
        System.out.println( "Writing file: " + pathOutput );

        // Read.
        long start = System.nanoTime();
        Path pathInput = Paths.get( "/Users/basilbourque/persons.csv" );
        List < Person > list = app.read( pathInput );
        long stop = System.nanoTime();

        // Time.
        long elapsed = ( stop - start );
        Duration d = Duration.ofNanos( elapsed );
        System.out.println( "Reading elapsed: " + d );
        System.out.println( "Reading took nanos per row: " + ( elapsed / list.size() ) );
        System.out.println( "nanos elapsed: " + elapsed + "  |  list.size: " + list.size() );
    }
}

当运行:

Writing file: /Users/basilbourque/persons.csv

Reading elapsed: PT0.857816234S

Reading took nanos per row: 21445

nanos elapsed: 857816234 | list.size: 40000

技术栈:

  • Java 11.0.2 — Zulu 由 Azul Systems(基于 OpenJDK 构建)
  • 运行 IntelliJ 2019.1
  • macOS Mojave
  • MacBook Pro(视网膜显示屏,15 英寸,2013 年末)
  • 处理器:2.3 GHz Intel Core i7(4 核,8 hyper)
  • 16 GB 1600 MHz DDR3
  • 存储:Apple 内置固态

除了按照@Basil 的建议使用 java.nio 之外,简单地用 BufferedReader 包装 FileReader 应该可以实现显着的加速。

FileReader fileReaderMes1 = new BufferedReader( new FileReader(FECHAS[0]));

关于 csv-parsers-comparison we can find comparison between CSV Reader/Writer-s. The fastest is uniVocity CSV parser. Third is Jackson 我个人比较喜欢。使用 @Basil Bourque 很好的例子,我稍微改变了它并使用了 Jackson 类。方法阅读 returns MappingIterator,您可以使用它来初始化堆对象(请参阅我如何向 List 添加元素)。我没有包括时间细节,但您可以使用 Basil 和此解决方案自己完成:

import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectReader;
import com.fasterxml.jackson.databind.ObjectWriter;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

import java.io.File;
import java.io.FileWriter;
import java.time.Duration;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

public class CsvSpeed {

    public static void main(String[] args) throws Exception {
        File csvFile = new File("./resource/persons.csv").getAbsoluteFile();

        CsvSchema schema = CsvSchema.builder()
                .addColumn("givenName")
                .addColumn("surname")
                .addColumn("id")
                .addColumn("description")
                .build().withHeader();

        CsvSpeed csvSpeed = new CsvSpeed();
        csvSpeed.write(csvFile, schema);

        // Read.
        long start = System.nanoTime();
        MappingIterator<Person> personMappingIterator = csvSpeed.read(csvFile, schema);
        List<Person> persons = new ArrayList<>(40_000);
        personMappingIterator.forEachRemaining(persons::add);

        long stop = System.nanoTime();

        System.out.println(persons.size());

        // Time.
        long elapsed = (stop - start);
        Duration d = Duration.ofNanos(elapsed);
        System.out.println("Reading elapsed: " + d);
        System.out.println("Reading took nanos per row: " + (elapsed / persons.size()));
        System.out.println("nanos elapsed: " + elapsed + "  |  list.size: " + persons.size());
    }

    public MappingIterator<Person> read(final File path, CsvSchema schema) throws Exception {
        CsvMapper csvMapper = new CsvMapper();

        ObjectReader reader = csvMapper.readerFor(Person.class).with(schema);
        return reader.readValues(path);
    }

    public void write(final File path, CsvSchema schema) throws Exception {
        ThreadLocalRandom random = ThreadLocalRandom.current();

        CsvMapper csvMapper = new CsvMapper();
        ObjectWriter writer = csvMapper.writerFor(Person.class).with(schema);

        try (FileWriter fileWriter = new FileWriter(path)) {
            List<String> givenNames = Arrays.asList("Adrien", "Aimon", "Alerion", "Alexis", "Alezan", "Ancil", "Andre", "Antoine", "Archard", "Aurélien", "Averill", "Baptiste", "Barnard", "Bartelemy", "Bastien", "Baylee", "Beale", "Beau", "Beaumont", "Beauregard", "Bellamy", "Berger", "Blaize", "Blondel", "Boyce", "Bruce", "Brunelle", "Brys", "Burcet", "Burnell", "Burrell", "Byron", "Canaan", "Carden", "Carolas", "Cavell", "Chace", "Chanler", "Chante", "Chappel", "Charles", "Chasen", "Chason", "Chemin", "Chene", "Cher", "Chevalier", "Cheyne", "Clément", "Clemence", "Corbin", "Coty", "Cygne", "Damien", "Dandre", "Dariel", "Darl", "Dauphine", "Davet", "Dax", "Dean", "Delice", "Delmon", "Destin", "Dominique", "Donatien", "Duke", "Eliott", "Elroy", "Enzo", "Erwan", "Etalon", "Ethan", "Fabron", "Ferrand", "Filberte", "Florent", "Florian", "Fontaine", "Forest", "Fortune", "Franchot", "Francois", "Fraser", "Frayne", "Gaëtan", "Gabin", "Gage", "Gaige", "Garland", "Garner", "Gaston", "Gauge", "Gaylord", "Germain", "Germaine", "German", "Gervaise", "Giles", "Gilles", "Gitan", "Grosvener", "Guifford", "Guion", "Guy", "Guzman", "Henri", "Holland", "Hugo", "Hugues", "Hyacinthe", "Jérémy", "Jacquan", "Jacques", "Jacquez", "Janvier", "Jardan", "Jay", "Jaye", "Jehan", "Jemond", "Jocquez", "Jonathan", "Jules", "Julien", "Justus", "Karoly", "Lado", "Lafayette", "Lamond", "Lancelin", "Landis", "Landry", "Laron", "Larrimore", "Laurent", "LaValle", "Leandre", "Leggett", "Leonce", "Leron", "Leverett", "Lilian", "Loïc", "Lorenzo", "Louis", "Lowell", "Luc", "Lucien", "Lukas", "Macaire", "Mace", "Mahieu", "Maison", "Malleville", "Manneville", "Mantel", "Marc", "Marcel", "Marion", "Marius", "Markez", "Markis", "Marmion", "Marquis", "Marquise", "Marshall", "Martial", "Maslin", "Mason", "Matheo", "Mathias", "Mathys", "Matthieu", "Maxence", "Mayson", "Mehdi", "Merle", "Merville", "Montague", "Montaigu", "Monte", "Montgomery", "Montreal", "Montrel", "Moore", "Morel", "Mortimer", "Nerville", "Neuveville", "Nicolas", "Noë", "Noah", "Noe", "Norman", "Norville", "Nouel", "Olivier", "Onfroi", "Paien", "Parfait", "Parnell", "Pascal", "Patrice", "Paul", "Peppin", "Percival", "Percy", "Pernell", "Peverell", "Philipe", "Pierpont", "Pierre", "Pomeroy", "Prewitt", "Purvis", "Quennell", "Quentin", "Quincey", "Quincy", "Quintin", "Rémi", "Rafaelle", "Ranger", "Raoul", "Raphaël", "Rapier", "Rawlins", "Ray", "Raynard", "Remi", "René", "Renard", "Rene", "Reule", "Reynard", "Robin", "Romain", "Rondel", "Roy", "Royal", "Ruff", "Rush", "Russel", "Rustin", "Sabastien", "Sacha", "Salomon", "Samuel", "Satordi", "Saville", "Scoville", "Sebastien", "Sennett", "Severin", "Shant", "Shantae", "Sidney", "Siffre", "Simeon", "Simon", "Sinclair", "Sofiane", "Somer", "Stephane", "Sully", "Sydney", "Sylvain", "Talbot", "Talon", "Telford", "Tempest", "Teppo", "Théo", "Thayer", "Thibault", "Thibaut", "Thiery", "Tiennan", "Tiennot", "Titouan", "Toussaint", "Travaris", "Tyson", "Urson", "Vachel", "Valentin", "Valere", "Vallis", "Verdun", "Victoir", "Victor", "Waltier", "William", "Wyatt", "Yanis", "Yann", "Yves", "Yvon", "Zosime", "Abrial", "Abrielle", "Abril", "Adele", "Alair", "Alerion", "Amee", "Angelique", "Annette", "Antonella", "Arian", "Ariane", "Armandina", "Aubree", "Aubrielle", "Audra", "Avril", "Bella", "Berneta", "Bette", "Blaise", "Blanche", "Blasa", "Bonte", "Brie", "Brienne", "Brigit", "Cachay", "Calice", "Camille", "Camylle", "Caprice", "Caressa", "Caroline", "Catin", "Celesta", "Celeste", "Cera", "Cerise", "Chablis", "Chalice", "Chambray", "Champagne", "Chandell", "Chaney", "Chantal", "Chante", "Chanterelle", "Chantile", "Chantilly", "Chantrice", "Charla", "Charlotte", "Charmane", "Chaton", "Chemin", "Chenetta", "Cher", "Chere", "Cheri", "Cheryl", "Christine", "Cidney", "Cinderella", "Claire", "Claudette", "Colette", "Cordelle", "Cydnee", "Daeja", "Daija", "Daja", "Damzel", "Darelle", "Darlene", "Darselle", "Dejanelle", "Deleena", "Delice", "Demeri", "Deni", "Denise", "Desgracias", "Desire", "Desiree", "Destanee", "Destiny", "Dior", "Domanique", "Dominique", "Elaina", "Elaine", "Elayna", "Elise", "Eloisa", "Elyse", "Emeline", "Emmaline", "Emmeline", "Estella", "Estrella", "Etiennette", "Evette", "Fabienne", "Fabrienne", "Fanchon", "Fancy", "Fawna", "Fayana", "Fayette", "Fifi", "Fleur", "Fleurette", "Fontanna", "Fosette", "Francine", "Frederique", "Gabriel", "Gabriele", "Gabrielle", "Gaby", "Garcelle", "Gena", "Genie", "Georgette", "Germaine", "Gervaise", "Gitana", "Harriet", "Heloisa", "Holland", "Honnetta", "Isabelle", "Ivette", "Ivonne", "Jacqueena", "Jacquetta", "Jacquiline", "Jacyline", "Jaime", "Jakqueline", "Janeen", "Janelly", "Janina", "Janiqua", "Janique", "Jannnelle", "Jaquita", "Jardena", "Jeanetta", "Jermaine", "Jessamine", "Jewel", "Jewell", "Joli", "Jolie", "Josephine", "Jozephine", "Julieta", "Karessa", "Karmaine", "Klara", "Laine", "Lanelle", "Laramie", "Layne", "Layney", "Leala", "Leonette", "Lissette", "Lizette", "Lourdes", "Lucienne", "Ly", "Lyla", "Lysette", "Madelaine", "Malerie", "Manette", "Marais", "Marcelle", "Marché", "Mardi", "Margo", "Marguerite", "Marie", "Marie Claude", "Marie Frances", "Marie Joelle", "Marie Pascale", "Marie Sophie", "Marjolaine", "Marquise", "Marvella", "Mathieu", "Matisse", "Maurelle", "Maurissa", "Mavis", "Melisande", "Michelle", "Miette", "Mignon", "Mimi", "Mirya", "Monet", "Moniqua", "Monteen", "Musetta", "Myrlie", "Nadeen", "Nadia", "Nadiyah", "Naeva", "Nanon", "Natalle", "Naudia", "Nettie", "Nicholas", "Nicki", "Nicky", "Nicole", "Nicolette", "Nicolina", "Nicolle", "Nikolette", "Ninette", "Ninon", "Noelle", "Nycole", "Odelette", "Opaline", "Orane", "Orva", "Page", "Parisa", "Parnel", "Parris", "Patrice", "Peridot", "Pippi", "Prairie", "Rachele", "Rachelle", "Racquel", "Raphaelle", "Raquelle", "Remi", "Renée", "Renea", "Renelle", "Renita", "Risette", "Rochelle", "Romy", "Rosabel", "Rosiclara", "Ruba", "Russhell", "Saleena", "Salina", "Satin", "Sedona", "Serene", "Shandelle", "Shanta", "Shante", "Shariah", "Sharita", "Sharleen", "Sheree", "Shereen", "Sherell", "Sherice", "Sherry", "Sidnee", "Sidney", "Sidnie", "Sidonie", "Sinclaire", "Solange", "Solen", "Sorrel", "Suzette", "Sydnee", "Sydney", "Tallis", "Tempest", "Toinette", "Turquoise", "Veronique", "Vignette", "Villette", "Violeta", "Virginie", "Voleta", "Vonny");
            List<String> surnames = Arrays.asList("Arceneau", "Aucoin", "Babin", "Babineaux", "Benoit", "Bergeron", "Bernard", "Bertrand", "Bessette", "Blanc", "Blanchard", "Bonnet", "Boucher", "Bourg", "Bourque", "Boutin", "Bouvier", "Braud", "Broussard", "Brun", "Chevalier", "David", "Depaul", "Desmarais", "Disney", "Dubois", "Dupont", "Dupuis", "Durand", "Fortescue", "Fournier", "Garnier", "Gaudet", "Gillet", "Gillette", "Girard", "Gravois", "Grosvenor", "Lambert", "Landry", "Laroche", "Laurent", "Lefevre", "Leroy", "Leveque", "Lisle", "Martin", "Michel", "Molyneux", "Moreau", "Morel", "Neville", "Pelletier", "Petit", "Prideux", "Renard", "Richard", "Robert", "Rousseau", "Roux", "Rufus", "Simon", "Thomas");
            Iterable<Person> persons = () -> {
                return new Iterator<Person>() {
                    int counter = 40_000; //0_000;  // 40_000 yields about 20 MB of data.

                    @Override
                    public boolean hasNext() {
                        return counter-- > 0;
                    }

                    @Override
                    public Person next() {
                        String givenName = givenNames.get(random.nextInt(0, givenNames.size()));
                        String surname = surnames.get(random.nextInt(0, surnames.size()));
                        UUID id = UUID.randomUUID();
                        String description = Person.LOREM_IPSUM;
                        return new Person(givenName, surname, id, description);
                    }
                };
            };
            writer.writeValues(fileWriter).writeAll(persons);
        }
    }
}

class Person {
    // Static
    static public String LOREM_IPSUM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";

    // Member variables.
    private String givenName, surname, description;
    private UUID id;

    public Person() {

    }

    public Person(String givenName, String surname, UUID id, String description) {
        this.givenName = givenName;
        this.surname = surname;
        this.id = id;
        this.description = description;
    }

    public String getGivenName() {
        return givenName;
    }

    public void setGivenName(String givenName) {
        this.givenName = givenName;
    }

    public String getSurname() {
        return surname;
    }

    public void setSurname(String surname) {
        this.surname = surname;
    }

    public String getDescription() {
        return description;
    }

    public void setDescription(String description) {
        this.description = description;
    }

    public UUID getId() {
        return id;
    }

    public void setId(UUID id) {
        this.id = id;
    }

    @Override
    public String toString() {
        return "Person{ " +
                "givenName='" + givenName + '\'' +
                " | surname='" + surname + '\'' +
                " | id='" + id + '\'' +
                " }";
    }
}

这是 Basil 提供的我的解决方案版本,但它使用 univocity-parsers:

public class CsvSpeed {

public static class Person {
    // Static
    static public String LOREM_IPSUM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";

    // Member variables.
    @Parsed
    public String givenName, surname, description;

    public UUID id;

    @Parsed
    public void id(String id) {
        this.id = UUID.fromString(id);
    }

    @Override
    public String toString() {
        return "Person{ " +
                "givenName='" + givenName + '\'' +
                " | surname='" + surname + '\'' +
                " | id='" + id + '\'' +
                " }";
    }
}

public List<Person> read(Path path) {
    return new CsvRoutines(Csv.parseRfc4180()).parseAll(Person.class, path.toFile(), "UTF-8", 40_000);
}

public void write(final Path path) {
    ThreadLocalRandom random = ThreadLocalRandom.current();
    CsvWriter writer = new CsvWriter(path.toFile(), "UTF-8", Csv.writeRfc4180());
    writer.writeHeaders("givenName" , "surname" , "id" , "description");

    int limit = 40_000;  // 40_000 yields about 20 MB of data.
    List<String> givenNames = List.of("Adrien", "Aimon", "Alerion", "Alexis", "Alezan", "Ancil", "Andre", "Antoine", "Archard", "Aurélien", "Averill", "Baptiste", "Barnard", "Bartelemy", "Bastien", "Baylee", "Beale", "Beau", "Beaumont", "Beauregard", "Bellamy", "Berger", "Blaize", "Blondel", "Boyce", "Bruce", "Brunelle", "Brys", "Burcet", "Burnell", "Burrell", "Byron", "Canaan", "Carden", "Carolas", "Cavell", "Chace", "Chanler", "Chante", "Chappel", "Charles", "Chasen", "Chason", "Chemin", "Chene", "Cher", "Chevalier", "Cheyne", "Clément", "Clemence", "Corbin", "Coty", "Cygne", "Damien", "Dandre", "Dariel", "Darl", "Dauphine", "Davet", "Dax", "Dean", "Delice", "Delmon", "Destin", "Dominique", "Donatien", "Duke", "Eliott", "Elroy", "Enzo", "Erwan", "Etalon", "Ethan", "Fabron", "Ferrand", "Filberte", "Florent", "Florian", "Fontaine", "Forest", "Fortune", "Franchot", "Francois", "Fraser", "Frayne", "Gaëtan", "Gabin", "Gage", "Gaige", "Garland", "Garner", "Gaston", "Gauge", "Gaylord", "Germain", "Germaine", "German", "Gervaise", "Giles", "Gilles", "Gitan", "Grosvener", "Guifford", "Guion", "Guy", "Guzman", "Henri", "Holland", "Hugo", "Hugues", "Hyacinthe", "Jérémy", "Jacquan", "Jacques", "Jacquez", "Janvier", "Jardan", "Jay", "Jaye", "Jehan", "Jemond", "Jocquez", "Jonathan", "Jules", "Julien", "Justus", "Karoly", "Lado", "Lafayette", "Lamond", "Lancelin", "Landis", "Landry", "Laron", "Larrimore", "Laurent", "LaValle", "Leandre", "Leggett", "Leonce", "Leron", "Leverett", "Lilian", "Loïc", "Lorenzo", "Louis", "Lowell", "Luc", "Lucien", "Lukas", "Macaire", "Mace", "Mahieu", "Maison", "Malleville", "Manneville", "Mantel", "Marc", "Marcel", "Marion", "Marius", "Markez", "Markis", "Marmion", "Marquis", "Marquise", "Marshall", "Martial", "Maslin", "Mason", "Matheo", "Mathias", "Mathys", "Matthieu", "Maxence", "Mayson", "Mehdi", "Merle", "Merville", "Montague", "Montaigu", "Monte", "Montgomery", "Montreal", "Montrel", "Moore", "Morel", "Mortimer", "Nerville", "Neuveville", "Nicolas", "Noë", "Noah", "Noe", "Norman", "Norville", "Nouel", "Olivier", "Onfroi", "Paien", "Parfait", "Parnell", "Pascal", "Patrice", "Paul", "Peppin", "Percival", "Percy", "Pernell", "Peverell", "Philipe", "Pierpont", "Pierre", "Pomeroy", "Prewitt", "Purvis", "Quennell", "Quentin", "Quincey", "Quincy", "Quintin", "Rémi", "Rafaelle", "Ranger", "Raoul", "Raphaël", "Rapier", "Rawlins", "Ray", "Raynard", "Remi", "René", "Renard", "Rene", "Reule", "Reynard", "Robin", "Romain", "Rondel", "Roy", "Royal", "Ruff", "Rush", "Russel", "Rustin", "Sabastien", "Sacha", "Salomon", "Samuel", "Satordi", "Saville", "Scoville", "Sebastien", "Sennett", "Severin", "Shant", "Shantae", "Sidney", "Siffre", "Simeon", "Simon", "Sinclair", "Sofiane", "Somer", "Stephane", "Sully", "Sydney", "Sylvain", "Talbot", "Talon", "Telford", "Tempest", "Teppo", "Théo", "Thayer", "Thibault", "Thibaut", "Thiery", "Tiennan", "Tiennot", "Titouan", "Toussaint", "Travaris", "Tyson", "Urson", "Vachel", "Valentin", "Valere", "Vallis", "Verdun", "Victoir", "Victor", "Waltier", "William", "Wyatt", "Yanis", "Yann", "Yves", "Yvon", "Zosime", "Abrial", "Abrielle", "Abril", "Adele", "Alair", "Alerion", "Amee", "Angelique", "Annette", "Antonella", "Arian", "Ariane", "Armandina", "Aubree", "Aubrielle", "Audra", "Avril", "Bella", "Berneta", "Bette", "Blaise", "Blanche", "Blasa", "Bonte", "Brie", "Brienne", "Brigit", "Cachay", "Calice", "Camille", "Camylle", "Caprice", "Caressa", "Caroline", "Catin", "Celesta", "Celeste", "Cera", "Cerise", "Chablis", "Chalice", "Chambray", "Champagne", "Chandell", "Chaney", "Chantal", "Chante", "Chanterelle", "Chantile", "Chantilly", "Chantrice", "Charla", "Charlotte", "Charmane", "Chaton", "Chemin", "Chenetta", "Cher", "Chere", "Cheri", "Cheryl", "Christine", "Cidney", "Cinderella", "Claire", "Claudette", "Colette", "Cordelle", "Cydnee", "Daeja", "Daija", "Daja", "Damzel", "Darelle", "Darlene", "Darselle", "Dejanelle", "Deleena", "Delice", "Demeri", "Deni", "Denise", "Desgracias", "Desire", "Desiree", "Destanee", "Destiny", "Dior", "Domanique", "Dominique", "Elaina", "Elaine", "Elayna", "Elise", "Eloisa", "Elyse", "Emeline", "Emmaline", "Emmeline", "Estella", "Estrella", "Etiennette", "Evette", "Fabienne", "Fabrienne", "Fanchon", "Fancy", "Fawna", "Fayana", "Fayette", "Fifi", "Fleur", "Fleurette", "Fontanna", "Fosette", "Francine", "Frederique", "Gabriel", "Gabriele", "Gabrielle", "Gaby", "Garcelle", "Gena", "Genie", "Georgette", "Germaine", "Gervaise", "Gitana", "Harriet", "Heloisa", "Holland", "Honnetta", "Isabelle", "Ivette", "Ivonne", "Jacqueena", "Jacquetta", "Jacquiline", "Jacyline", "Jaime", "Jakqueline", "Janeen", "Janelly", "Janina", "Janiqua", "Janique", "Jannnelle", "Jaquita", "Jardena", "Jeanetta", "Jermaine", "Jessamine", "Jewel", "Jewell", "Joli", "Jolie", "Josephine", "Jozephine", "Julieta", "Karessa", "Karmaine", "Klara", "Laine", "Lanelle", "Laramie", "Layne", "Layney", "Leala", "Leonette", "Lissette", "Lizette", "Lourdes", "Lucienne", "Ly", "Lyla", "Lysette", "Madelaine", "Malerie", "Manette", "Marais", "Marcelle", "Marché", "Mardi", "Margo", "Marguerite", "Marie", "Marie Claude", "Marie Frances", "Marie Joelle", "Marie Pascale", "Marie Sophie", "Marjolaine", "Marquise", "Marvella", "Mathieu", "Matisse", "Maurelle", "Maurissa", "Mavis", "Melisande", "Michelle", "Miette", "Mignon", "Mimi", "Mirya", "Monet", "Moniqua", "Monteen", "Musetta", "Myrlie", "Nadeen", "Nadia", "Nadiyah", "Naeva", "Nanon", "Natalle", "Naudia", "Nettie", "Nicholas", "Nicki", "Nicky", "Nicole", "Nicolette", "Nicolina", "Nicolle", "Nikolette", "Ninette", "Ninon", "Noelle", "Nycole", "Odelette", "Opaline", "Orane", "Orva", "Page", "Parisa", "Parnel", "Parris", "Patrice", "Peridot", "Pippi", "Prairie", "Rachele", "Rachelle", "Racquel", "Raphaelle", "Raquelle", "Remi", "Renée", "Renea", "Renelle", "Renita", "Risette", "Rochelle", "Romy", "Rosabel", "Rosiclara", "Ruba", "Russhell", "Saleena", "Salina", "Satin", "Sedona", "Serene", "Shandelle", "Shanta", "Shante", "Shariah", "Sharita", "Sharleen", "Sheree", "Shereen", "Sherell", "Sherice", "Sherry", "Sidnee", "Sidney", "Sidnie", "Sidonie", "Sinclaire", "Solange", "Solen", "Sorrel", "Suzette", "Sydnee", "Sydney", "Tallis", "Tempest", "Toinette", "Turquoise", "Veronique", "Vignette", "Villette", "Violeta", "Virginie", "Voleta", "Vonny");
    List<String> surnames = List.of("Arceneau", "Aucoin", "Babin", "Babineaux", "Benoit", "Bergeron", "Bernard", "Bertrand", "Bessette", "Blanc", "Blanchard", "Bonnet", "Boucher", "Bourg", "Bourque", "Boutin", "Bouvier", "Braud", "Broussard", "Brun", "Chevalier", "David", "Depaul", "Desmarais", "Disney", "Dubois", "Dupont", "Dupuis", "Durand", "Fortescue", "Fournier", "Garnier", "Gaudet", "Gillet", "Gillette", "Girard", "Gravois", "Grosvenor", "Lambert", "Landry", "Laroche", "Laurent", "Lefevre", "Leroy", "Leveque", "Lisle", "Martin", "Michel", "Molyneux", "Moreau", "Morel", "Neville", "Pelletier", "Petit", "Prideux", "Renard", "Richard", "Robert", "Rousseau", "Roux", "Rufus", "Simon", "Thomas");
    for (int i = 1; i <= limit; i++) {
        String givenName = givenNames.get(random.nextInt(0, givenNames.size()));
        String surname = surnames.get(random.nextInt(0, surnames.size()));
        UUID id = UUID.randomUUID();
        String description = Person.LOREM_IPSUM;
        writer.writeRow(givenName, surname, id, description);
    }
    writer.close();
}

public static void main(final String[] args) {
    // Launch the app.
    CsvSpeed app = new CsvSpeed();

    // Write.
    String when = Instant.now().truncatedTo(ChronoUnit.SECONDS).toString().replace(":", "•");
    Path pathOutput = Paths.get("/tmp/persons.csv");
    app.write(pathOutput);
    System.out.println("Writing file: " + pathOutput);

    // Read.
    long start = System.nanoTime();
    Path pathInput = Paths.get("/tmp/persons.csv");
    List<Person> list = app.read(pathInput);
    long stop = System.nanoTime();

    // Time.
    long elapsed = (stop - start);
    Duration d = Duration.ofNanos(elapsed);
    System.out.println("Reading elapsed: " + d);
    System.out.println("Reading took nanos per row: " + (elapsed / list.size()));
    System.out.println("nanos elapsed: " + elapsed + "  |  list.size: " + list.size());
}
}

运行 在我的机器上我得到以下时间:

Writing file: /tmp/persons.csv
Reading elapsed: PT0.230395859S
Reading took nanos per row: 5759
nanos elapsed: 230395859  |  list.size: 40000

它没有显示一旦您将 JIT 引入和优化代码考虑在内,解析器的速度有多快。我更改了代码以生成 400K 条记录(生成 200 MB 的文件)。现在代码打印:

Reading elapsed: PT0.993483883S
Reading took nanos per row: 2483
nanos elapsed: 993483883  |  list.size: 400000

并且有 400 万行(几乎 2GB 的数据):

Reading elapsed: PT7.961481755S
Reading took nanos per row: 1990
nanos elapsed: 7961481755  |  list.size: 4000000