Why is it not a good idea to store JPA entities in a HashMap/ConcurrentHashMap for caching? (ConcurrencyException, deadlocks, signal before wait)

I have a SpringBootApplication with spring-data-jpa, using EclipseLink as the JPA provider, that performs CRUD operations. I also have a CacheService that caches all the entities (at startup and after any update) and stores them in a ConcurrentHashMap.

I have read various posts and questions such as

  1. JPA multithreading org.eclipse.persistence.exceptions.ConcurrencyException
  2. and most of the Google search results on this link

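These indicate that the EntityManager is not thread-safe and that entities should not be shared across threads. (Some other posts mention that the EntityManagerFactory, on the other hand, is thread-safe.)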

Most of my questions are in the comments of the code below. I have also summarized them at the bottom.



// Sample code 
@SpringBootApplication
public class Application {

   public static void main(String[] args){
      SpringApplication.run(Application.class, args);
   }

}

/* Sample entity class */

@Entity
@Table
public class Student {

   @Id
   // other annotations for generation strategy.
   Integer studentId;

   //other fields below with different types of association like courses student is enrolled in

}


/* Sample Repository */
@Repository
public interface StudentRepository extends JpaRepository<Student, Integer>{} 


/** Sample Service */
@Service
public class StudentServiceImpl implements StudentService{

    @Autowired
    private StudentRepository studentRepository;

    @Autowired
    private CacheService cacheService;

    @Override
    public Student createStudent(Student student){
        student = studentRepository.saveAndFlush(student);
        cacheService.cacheEntity(student);
        return student;
    }

    @Override
    @Transactional
    public Student updateStudent(Integer studentId, StudentDTO dto){
        // Getting the object from the cache instead of the repository.
        // 1. Does the cached entity really enforce a write lock? We never went to the entity manager under this @Transactional.
        // Since the entity manager is not really thread-safe, in what scenarios can I run into issues like deadlocks or a ConcurrencyException?

        // When we cached the entities, we might have cached them using a different thread.
        // I know that entityManagerFactory is thread-safe, but I am not sure about its internals.

        // 2. In what scenarios could ConcurrencyManager issue a signal to release read/write locks
        // before a wait is triggered?
        Student student = cacheService.getStudent(studentId);
        student = updateStudentFromDTO(student, dto); // copies fields from dto to student
        student = studentRepository.saveAndFlush(student);
        cacheService.cacheEntity(student);
        return student;
    }

}


// Cache Service
@Service
public class CacheService {

    private Map<Integer, Student> studentsMap = new ConcurrentHashMap<>();

    @Autowired
    private StudentRepository studentRepository;


    @PostConstruct
    @Scheduled(initialDelay = 3600000, fixedDelay = 3600000)
    public void cacheEntities(){
        cacheStudents();
        // cache other entities below

    }

    public Student getStudent(Integer studentId){
        // What would happen once the object is handed to another class running on a different thread?
        // Multiple threads can then operate on the same instance, since it is no longer thread-safe.
        // Can this result in issues like deadlocks or a ConcurrencyException? Why and how?
        return studentsMap.get(studentId);
    }

    public void cacheEntity(Object o){
        if(o instanceof Student){
            Student s = (Student) o;
            synchronized(this){
                studentsMap.put(s.getStudentId(), s);
                // cache linked associations
                cacheCourses(s.getCourses()); // imagine this function does the same as cacheEntity, caching the courses in its own HashMap.
            }
        }
        // other objects caching block
    }

    public void cacheStudents(){

        List<Student> students = studentRepository.findAll();
        Map<Integer, Student> map = students.stream().collect(Collectors.toMap(Student::getStudentId, Function.identity()));

        synchronized(this){
            this.studentsMap = map;
        }

    }

}


//  RefreshService
@Service 
public class RefreshService{

    @Autowired
    EntityManagerFactory emf;

    @Autowired
    CacheService cacheService;

    public void refreshCache(){
        this.emf.getCache().evictAll();

        // 1. What can happen in the interim? Say one entity has already been taken from the cache
        //    and another thread evicts the cache.
        // 2. Now this entity is no longer managed by the entityManager. What would happen if a
        //    lazy fetch is performed on such an entity, or the user tries to save it?
        
        cacheService.cacheEntities();
    }

}


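Summary of questions:

  1. What are the possible issues that could occur due to such caching?
  2. Would the @Transactional annotation enforce any kind of locking on the entity if we get the object from the cache?
  3. What is the right way to have a custom cache of entities? (I know we could write a mapper and store DTOs in the HashMap, but what are the other options?)
  4. What are the implications if we have multiple instances of this application behind a load balancer, given that not all entities might be available in the cache on all instances? They might also be out of sync.

PS. I am not trying to implement the cacheService. I am trying to remove it from an existing application that has only one instance (after realizing why such caching is not a good idea).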

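Many of your questions should be asked independently, as they are about caching in general and do not change because JPA is involved. As the links and your post suggest, EntityManagers and everything read from them are not thread-safe. Having a single shared cache and pushing objects into it that are designed for a single-unit-of-work architecture is a bad idea. These warnings exist because there is absolutely no thread locking involved, and the container cannot help you. You can get around the issues (I have in past projects) by ensuring that everything needed is fully loaded and detached, and that the cached objects are never modified. Lazy loading on the fly will give you indeterminate behaviour; if you are lucky, you get an error. In short: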

  1. What are the possible issues that could occur due to such caching?

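It depends on the application, but neither these objects nor any JPA context they are tied to is thread-safe. Accessing something unfetched (lazy) from two threads is one problem area. These cached objects also hold on to JPA context objects (the EM they were read from), so they can tie up more resources than you expect.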
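To make the first point concrete, here is a minimal plain-Java sketch with no JPA involved. The `Student` class is a hypothetical stand-in for a cached entity, not the class from the question. It shows that a ConcurrentHashMap only protects the map structure: every caller receives the same mutable instance, so an edit in progress on one request is visible to all others.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SharedReferenceDemo {

    // Hypothetical stand-in for a cached JPA entity: plain mutable state, no locking.
    static class Student {
        private String name;
        Student(String name) { this.name = name; }
        String getName() { return name; }
        void setName(String name) { this.name = name; }
    }

    // The map itself is thread-safe; the objects stored in it are not.
    static final Map<Integer, Student> CACHE = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        CACHE.put(1, new Student("Alice"));

        // Two callers (think: two request threads) each look up the "same" student.
        Student seenByRequestA = CACHE.get(1);
        Student seenByRequestB = CACHE.get(1);

        // The map hands out the same instance, not a copy.
        System.out.println(seenByRequestA == seenByRequestB); // prints "true"

        // Request A edits its student before saving; request B observes the edit
        // immediately, whether or not A's transaction ever commits.
        seenByRequestA.setName("Bob");
        System.out.println(seenByRequestB.getName()); // prints "Bob"
    }
}
```

With a real entity the shared state also includes lazy collections and references back to the persistence context, which is where the non-deterministic behaviour comes from.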

  2. Would the @Transactional annotation enforce any kind of locking on the entity if we get the object from the cache?

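I don't see how it could. It only wraps the method in a transaction; it does not protect any resource that may have been loaded by another thread.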

  3. What is the right way to have a custom cache of entities? (I know we could write a mapper and store DTOs in a HashMap, but what are the other options?)

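You covered it yourself. Your own cache needs to hold detached objects. This is a bit harder with EclipseLink, where you need copies of the objects for them to be safe to cache, potentially made with EclipseLink's CopyGroup or your own copy mechanism. EclipseLink already uses a shared cache within the EMF, though, so reads through EntityManagers benefit if the object has been read in once before.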
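As a rough illustration of the "your own copy mechanism" option, the sketch below caches immutable snapshots instead of entities. All names here (`StudentSnapshot`, `SnapshotCache`, the `name` and `courseNames` fields) are assumptions made for the example; the question does not show the entity's fields. It deliberately uses plain Java rather than EclipseLink's CopyGroup, so it runs without a persistence provider.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SnapshotCacheDemo {

    // Hypothetical immutable copy of the data needed from a Student entity.
    // It holds no reference to the entity or to any persistence context.
    static final class StudentSnapshot {
        private final Integer studentId;
        private final String name;
        private final List<String> courseNames;

        StudentSnapshot(Integer studentId, String name, List<String> courseNames) {
            this.studentId = studentId;
            this.name = name;
            // Defensive copy, so later changes to the source list do not leak in.
            this.courseNames = Collections.unmodifiableList(new ArrayList<>(courseNames));
        }

        Integer getStudentId() { return studentId; }
        String getName() { return name; }
        List<String> getCourseNames() { return courseNames; }
    }

    // Cache of snapshots; safe to share because the values never change.
    static final class SnapshotCache {
        private final Map<Integer, StudentSnapshot> students = new ConcurrentHashMap<>();

        void put(StudentSnapshot snapshot) { students.put(snapshot.getStudentId(), snapshot); }
        StudentSnapshot get(Integer studentId) { return students.get(studentId); }
    }

    public static void main(String[] args) {
        List<String> courses = new ArrayList<>();
        courses.add("Math");

        SnapshotCache cache = new SnapshotCache();
        cache.put(new StudentSnapshot(1, "Alice", courses));

        // Changing the source list after caching does not affect the snapshot.
        courses.add("Physics");
        System.out.println(cache.get(1).getCourseNames()); // prints "[Math]"
    }
}
```

An update would then read the entity through the repository inside the transaction, modify it there, and replace the cached snapshot afterwards, rather than modifying anything taken from the cache.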

  4. What are the implications if we have multiple instances of this application behind a load balancer, given that not all entities might be available in the cache on all instances? They might also be out of sync.

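This is self-explanatory: you have to keep the cache in step with the data. EclipseLink's shared cache has coordination mechanisms that you can configure, but with your own cache you will need your own mechanism, unless you have a single cache shared by all application instances.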
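One possible shape for such a mechanism is invalidation by id: the instance that writes a row tells every instance to drop its cached copy. The sketch below simulates this in a single process. `InvalidationBus` and `InstanceCache` are hypothetical names, and the bus stands in for whatever real channel (a message topic, for example) would connect the instances; none of this is an EclipseLink or Spring API.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class InvalidationDemo {

    // Hypothetical local cache, one per application instance.
    static final class InstanceCache {
        private final Map<Integer, String> namesById = new ConcurrentHashMap<>();

        void put(Integer id, String name) { namesById.put(id, name); }
        String get(Integer id) { return namesById.get(id); }
        void evict(Integer id) { namesById.remove(id); }
    }

    // Hypothetical in-process stand-in for a channel shared by all instances.
    static final class InvalidationBus {
        private final List<InstanceCache> subscribers = new CopyOnWriteArrayList<>();

        void subscribe(InstanceCache cache) { subscribers.add(cache); }

        // Called by whichever instance committed a change to the given id.
        void publishInvalidation(Integer id) {
            for (InstanceCache cache : subscribers) {
                cache.evict(id);
            }
        }
    }

    public static void main(String[] args) {
        InvalidationBus bus = new InvalidationBus();
        InstanceCache instanceA = new InstanceCache();
        InstanceCache instanceB = new InstanceCache();
        bus.subscribe(instanceA);
        bus.subscribe(instanceB);

        // Both instances have cached the same student.
        instanceA.put(1, "Alice");
        instanceB.put(1, "Alice");

        // Instance A updates the database row, then announces the change.
        bus.publishInvalidation(1);

        // Instance B no longer serves the stale value and must reload on next access.
        System.out.println(instanceB.get(1)); // prints "null"
    }
}
```

Eviction keeps the message small and avoids shipping entity state between instances, at the cost of one extra database read after each change. A window remains between the commit and the delivery of the message in which other instances can still serve stale data.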