Google IoT per-device heartbeat alert using Stackdriver
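I'd like to alert on a missing heartbeat (or 0 bytes received) from any one of a large number of Google IoT Core devices. I can't seem to do this in Stackdriver. Instead, it only seems to let me alert on the entire device registry, which doesn't give me what I'm looking for (how would I know that a particular device has disconnected?)
So how does one go about doing this?
I have no idea why this question was downvoted as 'too broad'.
The truth is that Google IoT doesn't have per-device alerting; it only offers alerting on an entire device registry. If this is not true, please reply to this post. The page that clearly states this is here: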
Cloud IoT Core exports usage metrics that can be monitored
programmatically or accessed via Stackdriver Monitoring. These metrics
are aggregated at the device registry level. You can use Stackdriver
to create dashboards or set up alerts.
The importance of per-device alerts is built into the promise implied by this statement:
Operational information about the health and functioning of devices is
important to ensure that your data-gathering fabric is healthy and
performing well. Devices might be located in harsh environments or in
hard-to-access locations. Monitoring operational intelligence for your
IoT devices is key to preserving the business-relevant data stream.
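So if one device among many globally dispersed devices disconnects, getting an alert for it today is not easy. You have to build it yourself, and depending on what you are trying to do, that will call for different solutions.
In my case, I wanted to alert if the last heartbeat time or the last event or state publish was more than 5 minutes old. For this I need to run a looping function that scans the device registry and performs this check on a regular schedule. The usage of this API is outlined in another SO post:
For reference, here's a Firebase function I just wrote to check a device's online status. It probably needs some tweaks and further testing, but it should help anybody else get started: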
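The 5-minute staleness rule described above can be sketched as a small pure function. This is only an illustration under stated assumptions: the `DeviceTimestamps` shape and the names `lastSeenMs` and `findStaleDevices` are hypothetical, the timestamps are assumed to be RFC 3339 strings, and fetching the devices from the registry is left out entirely.

```typescript
// Hypothetical shape: only the timestamp fields discussed in this answer are modeled.
interface DeviceTimestamps {
  id: string;
  lastHeartbeatTime?: string; // RFC 3339 string, may be absent
  lastEventTime?: string;
  lastStateTime?: string;
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;

// Most recent parseable timestamp in epoch milliseconds, or undefined if there is none.
function lastSeenMs(device: DeviceTimestamps): number | undefined {
  const stamps = [device.lastHeartbeatTime, device.lastEventTime, device.lastStateTime]
    .map((s) => (s ? Date.parse(s) : NaN))
    .filter((ms) => !isNaN(ms));
  return stamps.length > 0 ? Math.max(...stamps) : undefined;
}

// IDs of devices that were never seen, or were last seen more than maxAgeMs ago.
function findStaleDevices(
  devices: DeviceTimestamps[],
  nowMs: number,
  maxAgeMs: number = FIVE_MINUTES_MS
): string[] {
  return devices
    .filter((d) => {
      const seen = lastSeenMs(d);
      return seen === undefined || nowMs - seen > maxAgeMs;
    })
    .map((d) => d.id);
}
```

A scheduled job could call `findStaleDevices(devices, Date.now())` on each run and send an alert for every ID returned. Passing the current time in as a parameter keeps the check easy to test.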
// Assumes `functions` (firebase-functions), `moment`, a Firestore handle `db`, and the
// DeviceManager instance `dm` from the gist linked below are already in scope.
//
// Example code to call this function
// const checkDeviceOnline = functions.httpsCallable('checkDeviceOnline');
// Include the 'wasOnline' key with the previously known online status, so the db is only updated when it changes
// const isOnline = (await checkDeviceOnline({ deviceID: 'XXXX', wasOnline: true })).data
export const checkDeviceOnline = functions.https.onCall(async (data, context) => {
    if (!context.auth) {
        throw new functions.https.HttpsError('failed-precondition', 'You must be logged in to call this function!');
    }

    // deviceID is passed in deviceID object key
    const deviceID = data.deviceID

    // Only write to the db when the status differs from what the caller last knew.
    // The update is awaited so the function does not return before the write completes.
    const dbUpdate = async (isOnline: boolean) => {
        if (('wasOnline' in data) && data.wasOnline !== isOnline) {
            await db.collection("devices").doc(deviceID).update({ online: isOnline })
        }
        return isOnline
    }

    // Latest "seen" timestamp (epoch seconds) on the device, or undefined if none is set
    const deviceLastSeen = (device: any) => {
        // We only want to use these to determine "latest seen timestamp"
        const stamps = ["lastHeartbeatTime", "lastEventTime", "lastStateTime", "lastConfigAckTime", "deviceAckTime"]
        return stamps
            .filter(key => device[key])
            .map(key => moment(device[key], "YYYY-MM-DDTHH:mm:ssZ").unix())
            .filter(epoch => !isNaN(epoch) && epoch > 0)
            .sort((a, b) => b - a)[0] // numeric sort, newest first
    }

    await dm.setAuth()

    const iotDevice: any = await dm.getDevice(deviceID)

    if (!iotDevice) {
        // 'not-found' is one of the error codes HttpsError accepts
        throw new functions.https.HttpsError('not-found', 'Failed to get device!');
    }

    console.log('iotDevice', iotDevice)

    // If there is no error status and there is last heartbeat time, assume device is online
    if (!iotDevice.lastErrorStatus && iotDevice.lastHeartbeatTime) {
        return dbUpdate(true)
    }

    // Add iotDevice.config.deviceAckTime to root of object
    // For some reason in all my tests, I NEVER receive anything on lastConfigAckTime, so this is my workaround
    if (iotDevice.config && iotDevice.config.deviceAckTime) iotDevice.deviceAckTime = iotDevice.config.deviceAckTime

    // If there is a last error status, let's make sure it's not a stale (old) one
    const lastSeenEpoch = deviceLastSeen(iotDevice)
    const errorEpoch = iotDevice.lastErrorTime ? moment(iotDevice.lastErrorTime, "YYYY-MM-DDTHH:mm:ssZ").unix() : false

    console.log('lastSeen:', lastSeenEpoch, 'errorEpoch:', errorEpoch)

    // Device should be online, the error timestamp is older than latest timestamp for heartbeat, state, etc
    if (lastSeenEpoch && errorEpoch && (lastSeenEpoch > errorEpoch)) {
        return dbUpdate(true)
    }

    // error status code 4 matches
    // lastErrorStatus.code = 4
    // lastErrorStatus.message = mqtt: SERVER: The connection was closed because MQTT keep-alive check failed.
    // will also be 4 for other mqtt errors like command not sent (qos 1 not acknowledged, etc)
    if (iotDevice.lastErrorStatus && iotDevice.lastErrorStatus.code && iotDevice.lastErrorStatus.code === 4) {
        return dbUpdate(false)
    }

    return dbUpdate(false)
})
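If a check like this is run across many devices on a schedule, the `wasOnline !== isOnline` comparison can be applied to a whole snapshot, so that alerts fire only when a device's state changes. The following is a rough sketch under that assumption; `OnlineMap`, `Transition` and `diffOnlineState` are hypothetical names, and where the previous snapshot is stored is left open.

```typescript
// Hypothetical snapshot type: device ID mapped to its online flag.
type OnlineMap = Record<string, boolean>;

interface Transition {
  deviceID: string;
  isOnline: boolean; // the new state
}

// Returns one entry per device whose online flag changed between two snapshots.
function diffOnlineState(previous: OnlineMap, current: OnlineMap): Transition[] {
  const changes: Transition[] = [];
  for (const deviceID of Object.keys(current)) {
    const isOnline = current[deviceID];
    // Like the `'wasOnline' in data` guard: with no previous value, report nothing.
    if (deviceID in previous && previous[deviceID] !== isOnline) {
      changes.push({ deviceID, isOnline });
    }
  }
  return changes;
}
```

Entries with `isOnline === false` are the ones to alert on. Devices seen for the first time are skipped here, which is a design choice you may want to reverse.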
I also created a function that works with commands, sending a command to the device to check whether it is online:
export const isDeviceOnline = functions.https.onCall(async (data, context) => {
    if (!context.auth) {
        throw new functions.https.HttpsError('failed-precondition', 'You must be logged in to call this function!');
    }

    // deviceID is passed in deviceID object key
    const deviceID = data.deviceID

    await dm.setAuth()

    // Only write to the db when the status differs from what the caller last knew.
    // The update is awaited so the function does not return before the write completes.
    const dbUpdate = async (isOnline: boolean) => {
        if (('wasOnline' in data) && data.wasOnline !== isOnline) {
            console.log('updating db', deviceID, isOnline)
            await db.collection("devices").doc(deviceID).update({ online: isOnline })
        } else {
            console.log('NOT updating db', deviceID, isOnline)
        }
        return isOnline
    }

    try {
        // A command can only be delivered to a connected device, so a failure here is treated as offline
        await dm.sendCommand(deviceID, 'alive?', 'alive')
        console.log('Assuming device is online after successful alive? command')
        return dbUpdate(true)
    } catch (error) {
        console.log("Unable to send alive? command", error)
        return dbUpdate(false)
    }
})
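This also uses my modified version of DeviceManager. You can find all the example code in this gist (to make sure you get the latest updates, and to keep the post here small):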
https://gist.github.com/tripflex/3eff9c425f8b0c037c40f5744e46c319
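All this code, just to check whether a device is online... something that could easily be handled by Google emitting some kind of event, or adding a simple way to do this. COME ON GOOGLE, GET IT TOGETHER!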