LinkedHashMap vs LinkedHashSet for retrieving specific elements & retrieving in insertion order

Question

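I was working on a problem that clearly called for a Set, but I needed to retrieve elements from the set in insertion order + retrieve specific elements.

But finding the specific element was too slow (I guess O(n), since I have to go through the entire set to find and return it).

So I opted for a LinkedHashMap<someClass,someClass>, where each key-value mapping holds the same object.

Although this was faster, it used up twice as much memory, which is especially concerning if my key/value (both the same anyway) happened to take up a lot of space.

I'm hoping someone has a better solution or an optimization for my problem.

Edit: By the way, the comments on an SO answer might be helpful.

Edit: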

 import java.util.LinkedHashSet;
 import java.util.Scanner;
 import java.util.Set;

 // (The enclosing class declaration was omitted in the original post.)

 public Set<CompanyDummy> run(int numberOfCompanies)
 {
        Set<CompanyDummy> companies = new LinkedHashSet<>();

        // Create companies
        for (int idx = 0; idx < numberOfCompanies; idx++)
        {
            companies.add(new CompanyDummy(idx));
        }

        // Some code here to fill up each CompanyDummy element with engineers

        // At this point, there will be input
        // specifying which companies to merge via their index (not reference).
        // The problem guarantees those companies exist. Hence my reason why
        // I didn't do something like
        // if (set.contains(value)) return value;

        // Do note that when we merge companies u & v, v is closed down

        // `transactions` and `scanner` are assumed to be fields declared elsewhere (not shown)
        for (int idx = 0; idx < transactions; idx++)
        {
            int companyID = scanner.nextInt();
            int anotherCompanyID = scanner.nextInt();

            // This part is where I search through companies to find if one exists,
            // which is O(n).
            // Replacing sets with maps somehow makes the program faster
            // despite LinkedHashSet being backed by LinkedHashMap
            CompanyDummy company = findCompany(companies, companyID);
            CompanyDummy anotherCompany = findCompany(companies, anotherCompanyID);

            if (company != null && anotherCompany != null)
            {
                company.union(anotherCompany);
                companies.remove(anotherCompany);
            }
        }

        return companies;
 }

 // Linear scan: returns the company with the given index, or null if there is none
 private CompanyDummy findCompany(Set<CompanyDummy> companies, int id)
 {
        for (CompanyDummy company : companies)
        {
            if (company.getIndex() == id)
            {
                return company;
            }
        }

        return null;
 }

}

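class CompanyDummy
{
    private int index;
    // The Integer here just denotes the engineer.
    // Initialized here so that union() does not hit a null set.
    private Set<Integer> engineers = new LinkedHashSet<>();

    public CompanyDummy(int index)
    {
        this.index = index;
    }

    // Renamed from getindex() to match the getIndex() call in findCompany
    public int getIndex()
    {
        return index;
    }

    public void union(CompanyDummy someCompany)
    {
        this.engineers.addAll(someCompany.engineers);
    }
}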

Answer 1

Although this was faster, it used up twice as much memory which is especially concerning if my key/value(both same anyways) happened to take up a lot of space.

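I don't see how a LinkedHashMap would be noticeably faster. It might be slightly faster, given that LinkedHashSet is implemented with a backing LinkedHashMap instance and therefore carries a little overhead compared to using a LinkedHashMap directly.

As for memory, both will take exactly the same amount, regardless of the size of the key/value instances. You see, when you add an element X to a LinkedHashSet, it actually puts the entry <X,PRESENT> into the underlying LinkedHashMap (where PRESENT is a reference to some dummy Object). Using <X,X> instead of <X,PRESENT> makes no difference, since the LinkedHashMap only holds references to the key and the value, not copies of the key/value instances.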
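To make the point about references concrete, here is a minimal sketch. The `Big` class and the `ReferenceDemo` name are made up for illustration; the only thing being checked is that the collection hands back the very same object that was put in, so no second copy of the payload exists.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class ReferenceDemo {
    // Hypothetical stand-in for an element type that takes up a lot of space.
    static final class Big {
        final int[] payload = new int[1_000_000];
    }

    // True if the map's key and value are both the one object we inserted.
    static boolean mapSharesOneInstance() {
        Big x = new Big();
        Map<Big, Big> map = new LinkedHashMap<>();
        map.put(x, x); // stores two references to x, not two copies
        Map.Entry<Big, Big> entry = map.entrySet().iterator().next();
        return entry.getKey() == x && entry.getValue() == x;
    }

    // True if the set hands back the same instance that was added.
    static boolean setHoldsSameInstance() {
        Big x = new Big();
        Set<Big> set = new LinkedHashSet<>();
        set.add(x);
        return set.iterator().next() == x;
    }

    public static void main(String[] args) {
        System.out.println(mapSharesOneInstance());
        System.out.println(setHoldsSameInstance());
    }
}
```

Both checks use `==`, i.e. reference identity, which is what matters for the memory argument.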

But finding the specific element was too slow (I guess O(n) since I have to go through the entire set to find it)

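A HashSet/LinkedHashSet takes (expected) O(1) time to determine whether a specific element is already in the Set. You don't have to go through the entire Set. If the Set contains the element, you already have a reference to it. You don't have to look for the Set element that is equal to the element you are searching for.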
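A small sketch of that membership check, under one important assumption: the element type defines `equals`/`hashCode` on its index, which the `CompanyDummy` class in the question does not do. The `Company` class and the `ContainsDemo` helper names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class ContainsDemo {
    // Hypothetical element type whose identity is its index (assumption, see above).
    static final class Company {
        final int index;

        Company(int index) {
            this.index = index;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Company && ((Company) o).index == index;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(index);
        }
    }

    // Builds a set holding companies 0 .. n-1 in insertion order.
    static Set<Company> build(int n) {
        Set<Company> companies = new LinkedHashSet<>();
        for (int i = 0; i < n; i++) {
            companies.add(new Company(i));
        }
        return companies;
    }

    // Hash-based membership test; no iteration over the set is needed.
    static boolean hasCompany(Set<Company> companies, int id) {
        return companies.contains(new Company(id));
    }

    public static void main(String[] args) {
        Set<Company> companies = build(5);
        System.out.println(hasCompany(companies, 3));
        System.out.println(hasCompany(companies, 7));
    }
}
```

Note that `contains` only answers yes or no. `java.util.Set` has no `get`, so this does not hand back the stored instance when all you have is an id.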

Answer 2

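In Java, HashSet is implemented using a HashMap.

A HashSet is really not meant for retrieving a specific element. The key in a map should be something that identifies the value object mapped to it. Also, using complex objects as map keys is generally not a good idea unless you have a very good hashCode() implementation.

A plain map does not maintain insertion order either (LinkedHashMap does preserve it, because it keeps a linked list running through its entries). You can make the keys a sequence of integers that represents the insertion order. For example: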

Map<Integer, SomeClass> map = new HashMap<>();
map.put(0, object1);
map.put(1, object2);

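And so on. This would be better than a LinkedList, since a get takes O(1) time. Later, you can fetch the values in a for loop with a running index to get them in insertion order.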
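A minimal sketch of that running-index loop. It assumes the keys are exactly 0, 1, ..., size-1 with no gaps, so it stops being valid as soon as an entry is removed from the middle. `SequenceKeyDemo` is a made-up name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SequenceKeyDemo {
    // Reads the values back in insertion order by walking the integer keys.
    // Assumes the keys are 0 .. size-1 with no gaps.
    static <T> List<T> inInsertionOrder(Map<Integer, T> map) {
        List<T> result = new ArrayList<>();
        for (int i = 0; i < map.size(); i++) {
            result.add(map.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        map.put(0, "object1");
        map.put(1, "object2");
        System.out.println(inInsertionOrder(map));
    }
}
```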

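If you need the HashSet property of preventing duplicates in the collection, you will need to keep a set of some unique identifier for each object, for example a person's ID number. Before inserting an object into the map, check that its "unique key" is not already in the set; if it is not, insert the "unique key" into the set and the object into the map. It all depends on the nature of your objects.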
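Roughly, the check-then-insert step could look like the sketch below. The `UniqueInsertDemo` class, the string ids, and the string values are all placeholders for illustration. It relies on `Set.add` returning `false` when the element is already present, which folds the check and the insert into one call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class UniqueInsertDemo {
    private final Set<String> seenIds = new HashSet<>();              // unique keys seen so far
    private final Map<Integer, String> bySequence = new HashMap<>();  // insertion number -> value

    // Inserts the value only if its unique id has not been seen before.
    // Returns true if the value was inserted, false if it was a duplicate.
    boolean insert(String uniqueId, String value) {
        if (!seenIds.add(uniqueId)) {
            return false; // duplicate id: leave the map untouched
        }
        bySequence.put(bySequence.size(), value); // next free sequence number
        return true;
    }

    // Values in insertion order, read back via the running index.
    List<String> inInsertionOrder() {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < bySequence.size(); i++) {
            result.add(bySequence.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        UniqueInsertDemo demo = new UniqueInsertDemo();
        demo.insert("id-1", "Ann");
        demo.insert("id-2", "Bob");
        demo.insert("id-1", "Ann (duplicate)");
        System.out.println(demo.inInsertionOrder());
    }
}
```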

Let me know in the comments if you need anything else. Good luck!

Answer 3

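OK, now that I have seen the code, I can understand the problem properly and give you a solution.

You are not simply retrieving a specific element from a set. You are actually retrieving the element that has a specific value in a specific field, if such an element is a member of the set. That is why you cannot use the fast lookup operations of a set (or map)¹.

Solution:

If you want retrieval that is better than O(N), you need a Map<Integer, CompanyDummy>. The map must be populated so that it maps from an id to the CompanyDummy that has that id as its index. Then you can replace the findCompany calls with Map.get calls.

Note that replacing Set<CompanyDummy> with Map<CompanyDummy, CompanyDummy> will not help with this problem. Both will give you O(N) performance.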
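Here is one way the id-keyed map could look; treat it as a sketch rather than a drop-in replacement. `CompanyRegistry` and its nested `Company` are stand-ins for the code in the question. Using a LinkedHashMap specifically (so that iteration stays in insertion order) and the guard against merging a company with itself are my own additions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CompanyRegistry {
    // Stand-in for the question's CompanyDummy.
    static final class Company {
        final int index;
        final Set<Integer> engineers = new HashSet<>();

        Company(int index) {
            this.index = index;
        }

        void union(Company other) {
            engineers.addAll(other.engineers);
        }
    }

    // id -> company; LinkedHashMap keeps insertion order for iteration.
    private final Map<Integer, Company> companies = new LinkedHashMap<>();

    CompanyRegistry(int numberOfCompanies) {
        for (int idx = 0; idx < numberOfCompanies; idx++) {
            companies.put(idx, new Company(idx));
        }
    }

    // Replaces the linear findCompany scan with a hash lookup.
    Company find(int id) {
        return companies.get(id);
    }

    // Merges closeId into keepId and closes closeId down.
    // Returns false if either company is missing or both ids are the same.
    boolean merge(int keepId, int closeId) {
        Company keep = companies.get(keepId);
        Company close = companies.get(closeId);
        if (keep == null || close == null || keep == close) {
            return false;
        }
        keep.union(close);
        companies.remove(closeId);
        return true;
    }

    // Ids of the companies still open, in insertion order.
    List<Integer> remainingIds() {
        return new ArrayList<>(companies.keySet());
    }
}
```

With this layout each transaction costs two `get` calls and one `remove`, all hash operations, instead of two scans over the whole collection.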

You said this about your experiment of replacing LinkedHashSet<CompanyDummy> with LinkedHashMap<CompanyDummy, CompanyDummy>:

Although this was faster, it used up twice as much memory which is especially concerning if my key/value(both same anyways) happened to take up a lot of space.

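I doubt that either of those things is actually true.

Space usage of the two data structures should be the same, because a LinkedHashSet is actually a LinkedHashMap under the hood.
(IMO) it is unlikely that there is a significant real performance difference, and certainly not a 2-fold difference. The most likely explanation is that the method you used to measure and compare performance does not properly account for possible timing distortions caused by (various) JVM startup overheads; e.g. class loading, heap resizing and garbage collection, and JIT compilation.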

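1 - For completeness, if there is no requirement for your set / map to be ordered by insertion order, you could use a TreeSet<CompanyDummy> with a custom Comparator that orders CompanyDummy objects by their index value. Then you could use NavigableSet.ceiling with a dummy CompanyDummy instance to probe the set in O(logN).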
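The footnote's TreeSet idea could be sketched like this. `TreeSetProbeDemo` and `Company` are made-up names, and the check on the returned element's index is needed because `ceiling` returns the least element greater than or equal to the probe, not necessarily an exact match.

```java
import java.util.Comparator;
import java.util.TreeSet;

class TreeSetProbeDemo {
    // Stand-in for the question's CompanyDummy.
    static final class Company {
        final int index;

        Company(int index) {
            this.index = index;
        }
    }

    // Builds a TreeSet ordered by index (so insertion order is NOT preserved).
    static TreeSet<Company> build(int... ids) {
        TreeSet<Company> set =
                new TreeSet<Company>(Comparator.comparingInt((Company c) -> c.index));
        for (int id : ids) {
            set.add(new Company(id));
        }
        return set;
    }

    // Probes with a throwaway instance; O(log N) per lookup.
    static Company find(TreeSet<Company> set, int id) {
        Company hit = set.ceiling(new Company(id));
        return (hit != null && hit.index == id) ? hit : null;
    }

    public static void main(String[] args) {
        TreeSet<Company> set = build(5, 1, 3);
        System.out.println(find(set, 3) != null);
        System.out.println(find(set, 4) != null);
    }
}
```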
