Azure continuous webjob fails in some cases

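I have a continuous WebJob running on Azure. Since a larger deployment 8 hours ago, the status never reaches completed in some cases, while in other cases the job completes. I have enabled all the logging I could find and have spent hours trying to track down the problem.

The only error information I can find in the logs comes from job_log, which states: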

[11/15/2017 14:46:23 > e553e5: ERR ] Unhandled Exception: Microsoft.WindowsAzure.Storage.StorageException: The remote server returned an error: (404) Not Found. ---> System.Net.WebException: The remote server returned an error: (404) Not Found.
[11/15/2017 14:46:23 > e553e5: ERR ] at Microsoft.WindowsAzure.Storage.Shared.Protocol.HttpResponseParsers.ProcessExpectedStatusCodeNoException[T](HttpStatusCode expectedStatusCode, HttpStatusCode actualStatusCode, T retVal, StorageCommandBase1 cmd, Exception ex) in c:\Program Files (x86)\Jenkins\workspace\release_dotnet_master\Lib\Common\Shared\Protocol\HttpResponseParsers.Common.cs:line 50
[11/15/2017 14:46:23 > e553e5: ERR ] at Microsoft.WindowsAzure.Storage.Blob.CloudBlob.<DeleteBlobImpl>b__33(RESTCommand1 cmd, HttpWebResponse resp, Exception ex, OperationContext ctx) in c:\Program Files (x86)\Jenkins\workspace\release_dotnet_master\Lib\ClassLibraryCommon\Blob\CloudBlob.cs:line 3349
[11/15/2017 14:46:23 > e553e5: ERR ] at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.EndGetResponse[T](IAsyncResult getResponseResult) in c:\Program Files (x86)\Jenkins\workspace\release_dotnet_master\Lib\ClassLibraryCommon\Core\Executor\Executor.cs:line 299
[11/15/2017 14:46:23 > e553e5: ERR ] --- End of inner exception stack trace ---
[11/15/2017 14:46:23 > e553e5: ERR ] at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.EndExecuteAsync[T](IAsyncResult result) in c:\Program Files (x86)\Jenkins\workspace\release_dotnet_master\Lib\ClassLibraryCommon\Core\Executor\Executor.cs:line 50
[11/15/2017 14:46:23 > e553e5: ERR ] at Microsoft.WindowsAzure.Storage.Blob.CloudBlob.EndDelete(IAsyncResult asyncResult) in c:\Program Files (x86)\Jenkins\workspace\release_dotnet_master\Lib\ClassLibraryCommon\Blob\CloudBlob.cs:line 1729
[11/15/2017 14:46:23 > e553e5: ERR ] at Microsoft.WindowsAzure.Storage.Core.Util.AsyncExtensions.<>c__DisplayClass4.b__3(IAsyncResult ar) in c:\Program Files (x86)\Jenkins\workspace\release_dotnet_master\Lib\ClassLibraryCommon\Core\Util\AsyncExtensions.cs:line 114
[11/15/2017 14:46:23 > e553e5: ERR ] --- End of stack trace from previous location where exception was thrown ---
[11/15/2017 14:46:23 > e553e5: ERR ] at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
[11/15/2017 14:46:23 > e553e5: ERR ] at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
[11/15/2017 14:46:23 > e553e5: ERR ] at Microsoft.Azure.WebJobs.Host.Protocols.PersistentQueueWriter1.<DeleteAsync>d__6.MoveNext()
[11/15/2017 14:46:23 > e553e5: ERR ] --- End of stack trace from previous location where exception was thrown ---
[11/15/2017 14:46:23 > e553e5: ERR ] at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
[11/15/2017 14:46:23 > e553e5: ERR ] at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
[11/15/2017 14:46:23 > e553e5: ERR ] at Microsoft.Azure.WebJobs.Host.Loggers.CompositeFunctionInstanceLogger.<DeleteLogFunctionStartedAsync>d__e.MoveNext()
[11/15/2017 14:46:23 > e553e5: ERR ] --- End of stack trace from previous location where exception was thrown ---
[11/15/2017 14:46:23 > e553e5: ERR ] at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
[11/15/2017 14:46:23 > e553e5: ERR ] at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
[11/15/2017 14:46:23 > e553e5: ERR ] at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.<TryExecuteAsync>d__1.MoveNext()
[11/15/2017 14:46:23 > e553e5: ERR ] --- End of stack trace from previous location where exception was thrown ---
[11/15/2017 14:46:23 > e553e5: ERR ] at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
[11/15/2017 14:46:23 > e553e5: ERR ] at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
[11/15/2017 14:46:23 > e553e5: ERR ] at Microsoft.Azure.WebJobs.Host.Executors.TriggeredFunctionExecutor1.d__0.MoveNext()
[11/15/2017 14:46:23 > e553e5: ERR ] --- End of stack trace from previous location where exception was thrown ---
[11/15/2017 14:46:23 > e553e5: ERR ] at Microsoft.Azure.WebJobs.Host.Timers.BackgroundExceptionDispatcher.<>c__DisplayClass1.b__0()
[11/15/2017 14:46:23 > e553e5: ERR ] at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
[11/15/2017 14:46:23 > e553e5: ERR ] at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
[11/15/2017 14:46:23 > e553e5: ERR ] at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
[11/15/2017 14:46:23 > e553e5: ERR ] at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
[11/15/2017 14:46:23 > e553e5: ERR ] at System.Threading.ThreadHelper.ThreadStart()

Can anyone give me some ideas on how to debug this? I have run out of ideas.

The Main method of my WebJob looks like this:

    static void Main()
    {
        var config = new JobHostConfiguration();
        config.Queues.MaxPollingInterval = new TimeSpan(0, 0, 0, 30);
        config.Queues.MaxDequeueCount = 3;

        // Pass the configuration to the host; a JobHost created without it
        // would ignore the queue settings above.
        var host = new JobHost(config);

        // The following code ensures that the WebJob will be running continuously
        host.RunAndBlock();
    }

The function that processes queue messages looks like this:

    public static void ProcessQueueMessage([QueueTrigger("importqueue")] string msg)
    {
        try
        {
            WorkerWebJobCore wwjc = new WorkerWebJobCore();
            wwjc.RunCore(msg, TableStorageAccessResources.ImportQueue,
                TableStorageAccessResources.TableStorageDataOneId,
                TableStorageAccessResources.TableStorageDataOnePassword);
        }
        catch (Exception e)
        {
            // Note: the details of e are not included in the logged message.
            CommunicatorLog.Log.LogError("WebJobWorker","WebJobWorker","Error in processing queue message","ERRWJWF01");
        }
    }

So everything is wrapped in a try/catch, and I don't understand how it can still fail.

Thanks in advance.

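Apparently, running a version below Microsoft.Azure.WebJobs 2.0.0 made it impossible to get a useful error. When I finally got around to installing that version, it gave me a useful error message that pointed to the problem.

The problem was a wrong version of a DLL used by the WebJob's core work.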
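For anyone chasing a similar DLL version mismatch, one way to see which versions were actually loaded is to print them at startup. This is only a sketch: `AssemblyVersionReport` and `Describe` are hypothetical names made up for illustration, and the two name prefixes are assumptions taken from the namespaces in the stack trace above. It uses only standard .NET reflection calls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class AssemblyVersionReport
{
    // Returns one "Name, Version=x.y.z.w" line per loaded assembly whose name
    // starts with one of the given prefixes (all loaded assemblies if none given).
    public static IEnumerable<string> Describe(params string[] prefixes)
    {
        return AppDomain.CurrentDomain.GetAssemblies()
            .Select(a => a.GetName())
            .Where(n => prefixes.Length == 0 ||
                        prefixes.Any(p => n.Name.StartsWith(p, StringComparison.OrdinalIgnoreCase)))
            .OrderBy(n => n.Name, StringComparer.OrdinalIgnoreCase)
            .Select(n => n.Name + ", Version=" + n.Version);
    }

    public static void Main()
    {
        // Prefixes are assumptions based on the namespaces in the stack trace.
        foreach (var line in Describe("Microsoft.Azure.WebJobs", "Microsoft.WindowsAzure.Storage"))
            Console.WriteLine(line);
    }
}
```

Only assemblies that are already loaded show up, so in a real WebJob the call would need to run after the host has been created. Comparing the output before and after a deployment should show whether a different version ended up in the bin folder.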

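My guess is that something is wrong with a file in your queue or in the storage itself.

It seems to be trying to delete a file that no longer exists. Or perhaps the "larger" one is being deleted.

After digging deeper, there may also be a problem with how your WebJob is deployed. Perhaps the deployments sometimes differ? Take a look at these: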
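The frames in the trace (`PersistentQueueWriter` `DeleteAsync`, `CompositeFunctionInstanceLogger` `DeleteLogFunctionStartedAsync`, `FunctionExecutor` `TryExecuteAsync`) suggest that the failing delete runs in the host's own bookkeeping rather than inside `ProcessQueueMessage`. If so, that would explain why the try/catch in the function does not help. The sketch below only simulates that situation with plain delegates; `HostFailureSketch`, `UserFunction` and `HostExecute` are hypothetical names and are not part of the WebJobs SDK.

```csharp
using System;

public static class HostFailureSketch
{
    // Stand-in for the user's function: anything thrown by the work is caught here.
    public static void UserFunction(Action work)
    {
        try { work(); }
        catch (Exception e) { Console.WriteLine("caught in function: " + e.Message); }
    }

    // Stand-in for the host: runs the function, then does its own cleanup
    // (in the trace above, deleting a "function started" log entry).
    public static void HostExecute(Action work, Action hostCleanup)
    {
        UserFunction(work);
        hostCleanup(); // not covered by the try/catch inside UserFunction
    }

    public static void Main()
    {
        try
        {
            HostExecute(
                () => { throw new InvalidOperationException("work failed"); },
                () => { throw new InvalidOperationException("(404) Not Found"); });
        }
        catch (Exception e)
        {
            Console.WriteLine("escaped the function: " + e.Message);
        }
    }
}
```

The first exception is handled inside the function, while the second one escapes it. In the real host there is no outer catch like the one in `Main`, which would match the "Unhandled Exception" in the log.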

https://github.com/Azure/azure-webjobs-sdk/issues/922

Azure WebJob QueueTrigger message is not deleted from queue

https://github.com/Azure/azure-webjobs-sdk/issues/645