Retrieve Cluster Inactivity Time on Azure Databricks Notebook

I am new to Azure Databricks and am using it for a project.

The documentation on Automatic termination (here) says:

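You can also set auto termination for a cluster. During cluster creation, you can specify an inactivity period in minutes after which you want the cluster to terminate. If the difference between the current time and the last command run on the cluster is more than the inactivity period specified, Azure Databricks automatically terminates that cluster.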
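
To make the rule concrete, here is a minimal sketch of the comparison the documentation describes. The function name `exceeds_inactivity_period` and its parameters are made up for illustration; this is not how Azure Databricks implements the check internally.

```python
from datetime import datetime, timedelta

def exceeds_inactivity_period(current_time, last_command_time, inactivity_minutes):
    """Return True if the idle gap is longer than the configured inactivity period."""
    idle = current_time - last_command_time
    return idle > timedelta(minutes=inactivity_minutes)

# Hypothetical values: last command at 10:00, check performed at 10:45
last_command = datetime(2021, 8, 9, 10, 0, 0)
now = datetime(2021, 8, 9, 10, 45, 0)
print(exceeds_inactivity_period(now, last_command, 30))  # True: 45 min idle, 30 min limit
print(exceeds_inactivity_period(now, last_command, 60))  # False: 45 min idle, 60 min limit
```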

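Is there a workaround to retrieve a cluster's inactivity time (the difference between the current time and the last command run) from a notebook, through the Cluster API or any other method?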

# Function to retrieve cluster inactivity time
from datetime import datetime, timedelta

def cluster_inactivity_time(log_file_path):
  """Print how long ago the last line was written to the driver log."""

  # Read through log4j-active.log, keeping only the last non-empty line
  last_line = None
  with open(log_file_path, "r") as file:
    for line in file:
      if line.strip():
        last_line = line
  if last_line is None:
    raise ValueError(f"No log lines found in {log_file_path}")

  # Characters 9-16 of the last line hold its time as HH:MM:SS;
  # combine it with today's date to get a full timestamp
  now = datetime.now()
  last_run_time = datetime.strptime(last_line[9:17], "%H:%M:%S").time()
  last_run_datetime = datetime.combine(now.date(), last_run_time)
  if last_run_datetime > now:
    # The last line was written before midnight, so it belongs to yesterday
    last_run_datetime -= timedelta(days=1)

  # Difference between the current time and the last command run time
  inactivity_time = now - last_run_datetime
  hours, remainder = divmod(int(inactivity_time.total_seconds()), 3600)
  minutes, seconds = divmod(remainder, 60)
  print(f'The Cluster has been Inactive for {hours}:{minutes}:{seconds}')


# Function call
log_file_path = '/dbfs/cluster-logs/0809-101642-leap143/driver/log4j-active.log'
cluster_inactivity_time(log_file_path)

Output:

The Cluster has been Inactive for 0:0:35
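
Since the question also asks about the Cluster API, here is a rough sketch of that route. It assumes the Clusters API 2.0 `clusters/get` endpoint and that its response carries a `last_activity_time` field in epoch milliseconds. Check both against the API reference for your workspace, since that field is described as best-effort and may not be reported for every runtime. The workspace URL, token, and both helper names are placeholders.

```python
import json
import time
import urllib.parse
import urllib.request
from datetime import timedelta

def get_cluster_info(host, token, cluster_id):
    """Call GET /api/2.0/clusters/get and return the parsed JSON response."""
    query = urllib.parse.urlencode({"cluster_id": cluster_id})
    url = f"{host}/api/2.0/clusters/get?{query}"
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)

def inactivity_from_cluster_info(cluster_info, now_ms=None):
    """Return the idle time as a timedelta, or None if no activity time is reported."""
    # Assumption: last_activity_time is given in epoch milliseconds
    last_activity_ms = cluster_info.get("last_activity_time")
    if last_activity_ms is None:
        return None
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return timedelta(milliseconds=max(0, now_ms - last_activity_ms))

# Offline example with a made-up response; no request is sent here
sample = {"cluster_id": "0809-101642-leap143", "last_activity_time": 1628504202000}
print(inactivity_from_cluster_info(sample, now_ms=1628504237000))  # 0:00:35
```

In a notebook you would call `get_cluster_info` with your workspace URL and a personal access token, then pass the result to `inactivity_from_cluster_info`. Unlike the log-based approach, this does not depend on the log line format or on cluster log delivery being configured.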