Thingsboard AWS server freezes

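I want to run a self-managed Thingsboard on AWS (t2.micro).

I have installed Thingsboard CE on a t2.micro AWS instance running Ubuntu 20.04 server.
I followed the AWS setup and Ubuntu install guides (PostgreSQL + in-memory queue service).

I also set up HAProxy using this guide.

I was able to log in to my Thingsboard successfully. I only changed the passwords and checked the basic functionality; I did not create any new dashboards or make any other modifications.

After that, I left the machine on with Thingsboard running. The next day I could not reach Thingsboard, and although the AWS instance was in the running state, I could no longer ssh into it. After a stop and start (a reboot did not help) the instance was fine again (both ssh and Thingsboard were reachable).

I can reproduce the failure by leaving the instance on. It seems that after a few hours (5-8 hours) Thingsboard (or something else, I am not sure) fails and causes the whole machine to freeze.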

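I checked two things:

  1. I checked CPU usage in AWS monitoring. After a few hours there appears to be a large jump in CPU load, after which it falls back to almost zero. While Thingsboard is running, the load is constant. See printscreen from AWS monitoring

  2. I checked the Thingsboard logs (in /var/log/thingsboard): there are some errors, but unfortunately most of them are not enough for me to guess what could be wrong with a fresh install. Here are some lines from the log: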

2021-11-12 00:21:59,626 [http-nio-0.0.0.0-8080-exec-13] INFO  o.a.coyote.http11.Http11Processor - Error parsing HTTP request header
     Note: further occurrences of HTTP request parsing errors will be logged at DEBUG level.
    java.lang.IllegalArgumentException: Invalid character found in method name

[0x160x030x010x00{0x010x000x00w0x030x030x170xb80xb80xe50xef0x000xb50x0a&0x930x020x00:0xde0xd70xa00xab0xb 70x8bU0xc00x92r0x9330x10O0x8c<o0xf70xf90x000x000x1a0xc0/0xc0+0xc00x110xc00x070xc00x130xc00x090xc00x140xc00x0a0x000x050x00/0x0050xc00x120x000x0a0x010x000x0040x000x050x000x050x010x000 x000x000x000x000x0a0x000x080x000x060x000x170x000x180x000x190x000x0b0x000x020x010x000x000x0d0x000x100x000x0e0x040x010x040x030x020x010x020x030x040x010x050x010x060x010xff0x010x000x010x00...].

  HTTP method names must be tokens
            at org.apache.coyote.http11.Http11InputBuffer.parseRequestLine(Http11InputBuffer.java:417)
            at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:261)
            at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
            at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:893)
            at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1707)
            at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
            at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
            at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
            at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
            at java.base/java.lang.Thread.run(Thread.java:829)
2021-11-12 00:22:01,486 [sql-queue-2-ts-4-thread-1] WARN  com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@4393afd0 (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
2021-11-12 00:22:01,487 [sql-queue-2-ts latest-8-thread-1] WARN  com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@75b9496b (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
2021-11-12 00:22:01,487 [sql-queue-0-ts latest-6-thread-1] WARN  com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@31849eec (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
2021-11-12 00:22:01,487 [sql-queue-0-ts-2-thread-1] WARN  com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@725fafe3 (This connection has been closed.). Possibly consider using a shorter maxLifetime value.

And some more:

2021-11-12 00:23:46,205 [sql-log-1-thread-1] INFO  o.t.s.dao.sql.TbSqlBlockingQueue - Queue-2 [TS Latest] queueSize [9] totalAdded [0] totalSaved [0] totalFailed [0]
2021-11-12 00:23:47,741 [sql-queue-0-ts-2-thread-1] WARN  o.h.e.jdbc.spi.SqlExceptionHelper - SQL Error: 0, SQLState: 08003
2021-11-12 00:23:47,742 [sql-queue-2-ts-4-thread-1] WARN  o.h.e.jdbc.spi.SqlExceptionHelper - SQL Error: 0, SQLState: 08003
2021-11-12 00:23:47,742 [sql-queue-2-ts latest-8-thread-1] WARN  o.h.e.jdbc.spi.SqlExceptionHelper - SQL Error: 0, SQLState: 08003
2021-11-12 00:23:47,742 [sql-queue-0-ts latest-6-thread-1] WARN  o.h.e.jdbc.spi.SqlExceptionHelper - SQL Error: 0, SQLState: 08003
2021-11-12 00:23:48,022 [sql-queue-0-ts-2-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - HikariPool-1 - Connection is not available, request timed out after 634223ms.
2021-11-12 00:23:48,058 [sql-queue-0-ts-2-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - This connection has been closed.
2021-11-12 00:23:48,022 [sql-queue-0-ts latest-6-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - HikariPool-1 - Connection is not available, request timed out after 634223ms.
2021-11-12 00:23:48,059 [sql-queue-0-ts latest-6-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - This connection has been closed.
2021-11-12 00:23:48,022 [sql-queue-2-ts latest-8-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - HikariPool-1 - Connection is not available, request timed out after 624177ms.
2021-11-12 00:23:48,059 [sql-queue-2-ts latest-8-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - This connection has been closed.
2021-11-12 00:23:48,023 [sql-queue-2-ts-4-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - HikariPool-1 - Connection is not available, request timed out after 627819ms.
2021-11-12 00:23:48,059 [sql-queue-2-ts-4-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - This connection has been closed.

And finally:

2021-11-12 00:33:10,919 [sql-queue-0-ts latest-6-thread-1] ERROR o.t.s.dao.sql.TbSqlBlockingQueue - [TS Latest] Failed to save 1 entities
org.springframework.transaction.CannotCreateTransactionException: Could not open JPA EntityManager for transaction; nested exception is org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
        at org.springframework.orm.jpa.JpaTransactionManager.doBegin(JpaTransactionManager.java:448)
        at org.springframework.transaction.support.AbstractPlatformTransactionManager.startTransaction(AbstractPlatformTransactionManager.java:400)
        at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:373)
        at org.springframework.transaction.interceptor.TransactionAspectSupport.createTransactionIfNecessary(TransactionAspectSupport.java:574)
        at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:361)
        at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:118)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750)
        at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:692)
        at org.thingsboard.server.dao.sqlts.insert.latest.psql.PsqlLatestInsertTsRepository$$EnhancerBySpringCGLIB$1b448c.saveOrUpdate(<generated>)
        at org.thingsboard.server.dao.sqlts.SqlTimeseriesLatestDao.lambda$init(SqlTimeseriesLatestDao.java:133)
        at org.thingsboard.server.dao.sql.TbSqlBlockingQueue.lambda$init(TbSqlBlockingQueue.java:71)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
        at org.hibernate.exception.internal.SQLExceptionTypeDelegate.convert(SQLExceptionTypeDelegate.java:48)
        at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:42)
        at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:113)
        at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:99)
        at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:111)
        at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getPhysicalConnection(LogicalConnectionManagedImpl.java:138)
        at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getConnectionForTransactionManagement(LogicalConnectionManagedImpl.java:276)
        at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.begin(LogicalConnectionManagedImpl.java:284)
        at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl$TransactionDriverControlImpl.begin(JdbcResourceLocalTransactionCoordinatorImpl.java:246)
        at org.hibernate.engine.transaction.internal.TransactionImpl.begin(TransactionImpl.java:83)
        at org.springframework.orm.jpa.vendor.HibernateJpaDialect.beginTransaction(HibernateJpaDialect.java:184)
        at org.springframework.orm.jpa.JpaTransactionManager.doBegin(JpaTransactionManager.java:402)
        ... 16 common frames omitted
Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 634223ms.
        at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:695)
        at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:197)
        at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:162)
        at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:128)
        at org.hibernate.engine.jdbc.connections.internal.DatasourceConnectionProviderImpl.getConnection(DatasourceConnectionProviderImpl.java:122)
        at org.hibernate.internal.NonContextualJdbcConnectionAccess.obtainConnection(NonContextualJdbcConnectionAccess.java:38)
        at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:108)
        ... 23 common frames omitted
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
        at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:877)
        at org.postgresql.jdbc.PgConnection.setNetworkTimeout(PgConnection.java:1610)
        at com.zaxxer.hikari.pool.PoolBase.setNetworkTimeout(PoolBase.java:560)
        at com.zaxxer.hikari.pool.PoolBase.isConnectionAlive(PoolBase.java:173)
        at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:186)
        ... 28 common frames omitted

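Interestingly, the time at which the CPU load peaks does not correlate exactly with the error messages in the log. Sorry for the long error messages, but at the moment I have no idea what the root cause is.

I have not tried reinstalling the whole machine yet.

My question is: how should I proceed? Has anyone run into a similar problem? Which logs, services, etc. should I check to find the root cause?

Should I try a machine with more resources? Should I try a different database and queue service?

In its current form, this Thingsboard instance is not stable enough even for testing.

Edit: sorry, I could not format the first part of the error output properly. Edit 2: the first link was wrong.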

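After I increased the RAM from 1 GB to 4 GB, the ThingsBoard server runs correctly. There are no more sporadic freezes. Since no other provable cause has been suggested and my system now works, I consider the question answered.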
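For anyone who wants to confirm that memory exhaustion is the cause before resizing, here is a minimal sketch that counts kernel out-of-memory messages in the system logs once the instance is reachable again. The function name `count_oom_events` is made up for this example, and the log paths are the usual Ubuntu defaults, which is an assumption about your setup.

```shell
# Sketch only: count lines that look like kernel OOM-killer activity.
# count_oom_events is a hypothetical helper name, not part of any tool.
count_oom_events() {
    # grep -c prints the number of matching lines (0 when nothing matches).
    grep -c -i -E 'out of memory|oom-killer' "$1" 2>/dev/null || true
}

# Assumed Ubuntu default log locations; unreadable or missing files are skipped.
for log in /var/log/syslog /var/log/kern.log; do
    if [ -r "$log" ]; then
        echo "$log: $(count_oom_events "$log") OOM-related lines"
    fi
done
```

A non-zero count around the time of the freeze would point toward memory pressure. Running `free -m` and `ps aux --sort=-%mem | head` while the instance is still responsive shows how much memory is left and which processes hold it.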

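Here are a few points:

  1. It looks like the OS ran out of memory and became unresponsive. To address this, try to manage Java heap memory. For a 4 GB instance, a Java heap limit of JAVA_OPTS="$JAVA_OPTS -Xms1024M -Xmx1024M" may be useful, because Java also uses some non-heap memory, and PostgreSQL and other processes need memory to run as well.

  2. t2 instances on AWS may slow the whole system down because of CPU throttling. Instances such as c6 or m5 are better-performing choices.

  3. The in-memory queue can cause out-of-memory problems and data loss if the message rate is too high or if processing gets congested because of a third party. Consider using Kafka to make your installation more stable and reliable.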
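As a sketch of point 1: on a package-based Ubuntu install the JVM options are, as far as I know, picked up from the ThingsBoard configuration file. The path in the comment below is an assumption about a standard .deb install, so check where your installation keeps it. The 1024M values are the ones suggested above for a 4 GB instance; a 1 GB instance would need something considerably smaller.

```shell
# /etc/thingsboard/conf/thingsboard.conf  (assumed location, verify on your system)
# Cap the JVM heap so that the JVM's non-heap memory, PostgreSQL, HAProxy
# and the OS itself still have room. Values are for a 4 GB instance.
export JAVA_OPTS="$JAVA_OPTS -Xms1024M -Xmx1024M"
```

The service has to be restarted for the change to take effect, for example with `sudo service thingsboard restart`.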