How to optimize nested looping or the method that is executed in every iteration?

Question

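I know the title of this question doesn't really describe the problem I'm struggling with. I have a text file full of purchase orders from an online bookstore. The file is about 900,000 lines long, and each line contains two comma-separated fields (customer_id,book_id).

I wanted to do some data mining and thought it would be interesting to compute some statistics about the books, so I wrote two methods: GetOrderCount(string x, string y) and GetAllPairs(). The first counts how many customers bought both of two specific books. The second does this for every possible pair of books (all combinations of size 2). However, it takes a very long time to run.

Is there anything specific in the code that could be taking so long? Is the nested loop in GetAllPairs() complex enough to justify using Parallel.For? I also picked some data structures because they made the code easier to reason about, but they may not be suited to this kind of operation. Any pointers on why this code is so slow would be perfect.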

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace BookStats
{
    class Order
    {
        // customer_id -> set of book_ids bought by that customer
        Dictionary<int, HashSet<String>> orders;
        // every distinct book_id that appears in the file
        List<string> books;

        public Order(String path)
        {
            orders = GetOrders(path, out books);
        }

        // Reads "customer_id,book_id" lines and groups the books by customer.
        private Dictionary<int, HashSet<string>> GetOrders(string path, out List<string> distinctBooks)
        {
            Dictionary<int, HashSet<string>> items = new Dictionary<int, HashSet<string>>();
            distinctBooks = new List<string>();
            List<string> allBooks = new List<string>();
            using (StreamReader sr = File.OpenText(path))
            {
                string s = String.Empty;
                while ((s = sr.ReadLine()) != null)
                {
                    string[] line = s.Split(',');
                    try
                    {
                        int id = int.Parse(line[0]);
                        allBooks.Add(line[1]);
                        if (items.ContainsKey(id))
                        {
                            items[id].Add(line[1]);
                        }
                        else
                        {
                            HashSet<string> customerBooks = new HashSet<string>();
                            customerBooks.Add(line[1]);
                            items.Add(id, customerBooks);
                        }
                    }
                    catch { }  // malformed lines are skipped silently
                }
            }
            distinctBooks.AddRange(allBooks.Distinct());
            return items;
        }

        // Counts the customers whose order contains both book x and book y.
        public int GetOrderCount(string x, string y)
        {
            int count = 0;
            foreach (KeyValuePair<int, HashSet<string>> order in orders)
            {
                var receipt = order.Value;
                if (receipt.Contains(x) && receipt.Contains(y))
                {
                    count++;
                }
            }
            return count;
        }

        // Calls GetOrderCount for every pair (i < j) of distinct books.
        public void GetAllPairs()
        {
            Stopwatch watch = new Stopwatch();
            watch.Start();
            for (int i = 0; i < books.Count; i++)
            {
                for (int j = i + 1; j < books.Count; j++)
                {
                    int count = GetOrderCount(books[i], books[j]);
                    Console.WriteLine(j);
                }
                Console.WriteLine(watch.Elapsed);  // elapsed time after each outer iteration
            }
        }

        public int GetBookCount()
        {
            return books.Count;
        }

        // Prints every book bought by the given customer.
        public void GetCustomerPurchase(int id)
        {
            foreach (string s in orders[id])
            {
                System.Console.WriteLine("Raamat " + s);
            }
        }
    }
}

EDIT: Edited the code to match the suggestions given by @Chris and @Anony-Mousse.
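On the Parallel.For question: the outer loop of GetAllPairs can run in parallel, because each iteration only reads orders and books. Below is a minimal sketch, assuming the data is already loaded into a Dictionary<int, HashSet<string>> as in the edited code. PairCounter and CountAllPairsParallel are hypothetical names, and collecting the results in a ConcurrentDictionary is an assumption (the original code computes count and then discards it). The sketch also drops the Console.WriteLine(j) call from the inner loop, since printing once per pair adds a large amount of console I/O on its own. At best, parallelism divides the running time by roughly the number of cores; it does not reduce the total work, which is still one scan over all customers per pair.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper: a Parallel.For variant of GetAllPairs.
static class PairCounter
{
    // Same logic as GetOrderCount: customers whose basket holds both books.
    static int CountBoth(Dictionary<int, HashSet<string>> orders, string x, string y)
    {
        int count = 0;
        foreach (HashSet<string> basket in orders.Values)
        {
            if (basket.Contains(x) && basket.Contains(y))
                count++;
        }
        return count;
    }

    // Returns (book_i, book_j) -> count for every pair i < j with a non-zero count.
    public static ConcurrentDictionary<(string, string), int> CountAllPairsParallel(
        Dictionary<int, HashSet<string>> orders, List<string> books)
    {
        var result = new ConcurrentDictionary<(string, string), int>();
        // Outer iterations are spread over thread-pool threads. orders and books
        // are only read here, which Dictionary, HashSet and List allow concurrently.
        Parallel.For(0, books.Count, i =>
        {
            for (int j = i + 1; j < books.Count; j++)
            {
                int count = CountBoth(orders, books[i], books[j]);
                if (count > 0)
                    result[(books[i], books[j])] = count;
            }
        });
        return result;
    }
}
```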

Answer 1

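Your loops are actually four levels deep: the third loop is inside GetOrderCount, and the fourth is inside Contains (with a List<string>, Contains scans the list element by element). That is probably what makes it slow. Use a profiler to see where the time actually goes before optimizing.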
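To put rough numbers on the nesting, let B be the number of distinct books, C the number of customers, and k the typical number of books per customer (symbols introduced here for illustration only). GetAllPairs calls GetOrderCount once per pair, and each call visits all C baskets with up to two Contains calls:

$$
N_{\text{pairs}} = \binom{B}{2} = \frac{B(B-1)}{2}, \qquad
T_{\text{List}} = O\!\left(\frac{B(B-1)}{2} \cdot C \cdot k\right), \qquad
T_{\text{HashSet}} = O\!\left(\frac{B(B-1)}{2} \cdot C\right)
$$

With a hypothetical B = 10,000 (the question does not give the real number of distinct books), that is 49,995,000 calls to GetOrderCount, each one a full pass over every customer. Switching to a hash set removes the factor k, but the B²·C term remains.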

For starters, replace

Dictionary<int, List<String>> orders;

with

Dictionary<int, HashSet<String>> orders;

and make the necessary changes to the code.
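A minimal sketch of those changes for the loading step, assuming the same customer_id,book_id line format. OrderLoader and Load are hypothetical names. It uses TryGetValue, which fetches or creates a customer's set with a single lookup instead of ContainsKey followed by the indexer, and int.TryParse instead of an empty catch block to skip malformed lines.

```csharp
using System.Collections.Generic;

// Hypothetical loader: builds customer_id -> set of book_ids.
static class OrderLoader
{
    public static Dictionary<int, HashSet<string>> Load(IEnumerable<string> lines)
    {
        var orders = new Dictionary<int, HashSet<string>>();
        foreach (string line in lines)
        {
            string[] fields = line.Split(',');
            // Skip malformed lines explicitly instead of catching exceptions.
            if (fields.Length != 2 || !int.TryParse(fields[0], out int id))
                continue;
            // One lookup: reuse the customer's existing set or create a new one.
            if (!orders.TryGetValue(id, out HashSet<string> basket))
            {
                basket = new HashSet<string>();
                orders.Add(id, basket);
            }
            // HashSet ignores duplicates; Trim guards against stray spaces.
            basket.Add(fields[1].Trim());
        }
        return orders;
    }
}
```

Calling OrderLoader.Load(File.ReadLines(path)) streams the file line by line, much like the StreamReader loop in the question.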

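Build optimized data structures, such as sorted inverted lists, to speed up the expensive operations. For Contains, for example, a set is also faster than a list: a hash set answers Contains in constant time on average, while a list has to scan its elements.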
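One reading of the sorted inverted list suggestion, as a sketch: map each book_id to a sorted array of the customer_ids who bought it, then count the customers who bought both books by merging two sorted arrays. Each query then touches only the buyers of those two books instead of every customer. InvertedIndex and CountBoth are hypothetical names, and the input is assumed to be the customer_id -> HashSet<string> dictionary from the edited question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical inverted index: book_id -> sorted customer_ids.
class InvertedIndex
{
    readonly Dictionary<string, int[]> buyers = new Dictionary<string, int[]>();

    public InvertedIndex(Dictionary<int, HashSet<string>> orders)
    {
        var lists = new Dictionary<string, List<int>>();
        foreach (KeyValuePair<int, HashSet<string>> order in orders)
        {
            foreach (string book in order.Value)
            {
                if (!lists.TryGetValue(book, out List<int> ids))
                {
                    ids = new List<int>();
                    lists.Add(book, ids);
                }
                // Each customer is added at most once per book (set semantics).
                ids.Add(order.Key);
            }
        }
        // Dictionary enumeration order is unspecified, so sort each list.
        foreach (KeyValuePair<string, List<int>> kv in lists)
        {
            int[] sorted = kv.Value.ToArray();
            Array.Sort(sorted);
            buyers.Add(kv.Key, sorted);
        }
    }

    // Customers who bought both x and y: merge-intersect two sorted arrays.
    public int CountBoth(string x, string y)
    {
        if (!buyers.TryGetValue(x, out int[] a) || !buyers.TryGetValue(y, out int[] b))
            return 0;
        int i = 0, j = 0, count = 0;
        while (i < a.Length && j < b.Length)
        {
            if (a[i] == b[j]) { count++; i++; j++; }
            else if (a[i] < b[j]) i++;
            else j++;
        }
        return count;
    }
}
```

GetAllPairs could call index.CountBoth(books[i], books[j]) in place of GetOrderCount. Each call then costs time proportional to the number of buyers of the two books rather than to the total number of customers.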
