TPL Dataflow duplicate message to all consumers
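I'm currently writing an application using WPF and TPL Dataflow that should do the following:
- Load all files in a directory
- For each file, log something to the UI as soon as processing starts, then process it
- Log something to the UI once processing is finished
The problem is that logging to the UI has to happen on the UI thread, and the "started" message should only be logged right before a file actually starts being processed.
The only way I've been able to do this so far is by calling the dispatcher manually from inside the TPL transform block to update the UI: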
Application.Current.Dispatcher.Invoke(new Action(() =>
{
    ProcessedFiles.Add(optimizedFileResult);
}));
I'd like to do this with a Dataflow block instead, running it on the UI thread via:
ExecutionDataflowBlockOptions.TaskScheduler = TaskScheduler.FromCurrentSynchronizationContext();
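However, if I set this on the block that does the optimization, the optimization will also run single-threaded.
On the other hand, if I create a new block in front of the processing block and do the logging there, it reports "processing" before processing has actually started.
Sample code
I created some sample code to reproduce this issue: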
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
public class TplLoggingToUiIssue
{
    public TplLoggingToUiIssue()
    {
    }
    // Stand-in for enumerating the files in a directory
    public IEnumerable<string> RecurseFiles()
    {
        for (int i = 0; i < 20; i++)
        {
            yield return i.ToString();
        }
    }
    public async Task Go()
    {
        // Block 1: loads the "files"
        var block1 = new TransformBlock<string, string>(input =>
        {
            Console.WriteLine($"1: {input}");
            return input;
        }, new ExecutionDataflowBlockOptions()
        {
            MaxDegreeOfParallelism = 4,
            BoundedCapacity = 10,
            EnsureOrdered = false
        });
        // Block 2: the "started" UI logging (would run on the UI scheduler in WPF)
        var block2 = new TransformBlock<string, string>(input =>
        {
            Console.WriteLine($"2: {input}\t\t\tStarting {input} now (ui logging)");
            return input;
        }, new ExecutionDataflowBlockOptions()
        {
            //TaskScheduler = TaskScheduler.FromCurrentSynchronizationContext(), (Doesn't work in Console app, but you get the idea)
            MaxDegreeOfParallelism = 1,
            BoundedCapacity = 1,
            EnsureOrdered = false
        });
        // Block 3: the actual (slow, parallel) processing
        var block3 = new TransformBlock<string, string>(async input =>
        {
            Console.WriteLine($"3 start: {input}");
            await Task.Delay(5000);
            Console.WriteLine($"3 end: {input}");
            return input;
        }, new ExecutionDataflowBlockOptions()
        {
            MaxDegreeOfParallelism = 2,
            BoundedCapacity = 10,
            EnsureOrdered = false
        });
        // Block 4: the "finished" logging
        var block4 = new ActionBlock<string>(input =>
        {
            Console.WriteLine($"4: {input}");
        }, new ExecutionDataflowBlockOptions()
        {
            MaxDegreeOfParallelism = 1,
            BoundedCapacity = 1,
            EnsureOrdered = false
        });
        block1.LinkTo(block2, new DataflowLinkOptions() { PropagateCompletion = true });
        block2.LinkTo(block3, new DataflowLinkOptions() { PropagateCompletion = true });
        block3.LinkTo(block4, new DataflowLinkOptions() { PropagateCompletion = true });
        var files = RecurseFiles();
        await Task.Run(async () =>
        {
            foreach (var file in files)
            {
                Console.WriteLine($"Posting: {file}");
                var result = await block1.SendAsync(file);
                if (!result)
                {
                    Console.WriteLine("Result is false!!!");
                }
            }
        });
        Console.WriteLine("Completing");
        block1.Complete();
        await block4.Completion;
        Console.WriteLine("Done");
    }
}
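If you run this sample with only 6 "files" (i.e. with the loop bound in RecurseFiles changed from 20 to 6), you get the following output: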
Posting: 0
Posting: 1
Posting: 2
Posting: 3
Posting: 4
Posting: 5
1: 2
1: 1
1: 3
1: 0
1: 4
1: 5
2: 2 Starting 2 now (ui logging)
Completing
3 start: 2
2: 0 Starting 0 now (ui logging)
3 start: 0
2: 3 Starting 3 now (ui logging)
2: 1 Starting 1 now (ui logging)
2: 4 Starting 4 now (ui logging)
2: 5 Starting 5 now (ui logging)
3 end: 2
3 end: 0
3 start: 3
3 start: 1
4: 2
4: 0
3 end: 3
3 end: 1
4: 3
3 start: 4
3 start: 5
4: 1
3 end: 5
3 end: 4
4: 5
4: 4
Done
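As you can see from this output, the logging starts far too early. I also tried using a BroadcastBlock, but it overwrites values, so they get lost.
Ideally, the logging block would somehow wait until the processing block has capacity and then push one item.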
Here is a somewhat contrived approach, in which the async lambda passed as an argument to the ActionBlock is enhanced with start and finish events:
using System;
using System.ComponentModel;
using System.Threading.Tasks;
// Wraps an async per-item action with optional "started"/"finished" callbacks.
public static Func<TInput, Task> Enhance<TInput>(
    Func<TInput, Task> action,
    Action<TInput> onActionStarted = null,
    Action<TInput> onActionFinished = null,
    ISynchronizeInvoke synchronizingObject = null)
{
    return async (item) =>
    {
        RaiseEvent(onActionStarted, item, synchronizingObject);
        await action(item).ConfigureAwait(false);
        RaiseEvent(onActionFinished, item, synchronizingObject);
    };
}
// Invokes the callback on the synchronizing object's thread (e.g. a WinForms
// Form) when required, otherwise directly on the current thread.
private static void RaiseEvent<T>(Action<T> onEvent, T arg1,
    ISynchronizeInvoke synchronizingObject)
{
    if (onEvent == null) return;
    if (synchronizingObject != null && synchronizingObject.InvokeRequired)
    {
        synchronizingObject.Invoke(onEvent, new object[] { arg1 });
    }
    else
    {
        onEvent(arg1);
    }
}
Usage example:
private void Form_Load(object sender, EventArgs e)
{
    var block = new ActionBlock<string>(Enhance<string>(async item =>
    {
        await Task.Delay(5000); // Simulate some lengthy asynchronous job
    }, onActionStarted: item =>
    {
        this.Text = $"{item} started";
    }, onActionFinished: item =>
    {
        ListBoxCompleted.Items.Add(item);
    }, synchronizingObject: this), new ExecutionDataflowBlockOptions()
    {
        MaxDegreeOfParallelism = 2,
        BoundedCapacity = 10,
        EnsureOrdered = false
    });
}
The onActionStarted and onActionFinished callbacks will be invoked on the UI thread once for each processed item.
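The Enhance helper above relies on ISynchronizeInvoke, which WinForms controls implement but WPF windows do not (WPF marshals through a Dispatcher). A minimal sketch of the same idea for WPF follows. It assumes that you capture the UI thread's SynchronizationContext, which WPF installs on its UI thread, and pass it in. The class name DataflowUiHelpers and the uiContext parameter are illustrative, not part of any library.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical WPF-friendly variant of Enhance: callbacks are marshaled
// through a SynchronizationContext instead of ISynchronizeInvoke.
public static class DataflowUiHelpers
{
    public static Func<TInput, Task> Enhance<TInput>(
        Func<TInput, Task> action,
        Action<TInput> onActionStarted = null,
        Action<TInput> onActionFinished = null,
        SynchronizationContext uiContext = null)
    {
        return async item =>
        {
            RaiseEvent(onActionStarted, item, uiContext);
            await action(item).ConfigureAwait(false);
            RaiseEvent(onActionFinished, item, uiContext);
        };
    }

    private static void RaiseEvent<T>(Action<T> onEvent, T arg, SynchronizationContext uiContext)
    {
        if (onEvent == null) return;
        if (uiContext != null)
        {
            // Send runs the callback through the context (on the UI thread for a
            // WPF or WinForms context) and blocks until it has finished,
            // like ISynchronizeInvoke.Invoke in the WinForms version.
            uiContext.Send(_ => onEvent(arg), null);
        }
        else
        {
            // No context supplied: run the callback on the current thread.
            onEvent(arg);
        }
    }
}
```

In a WPF window, you can read SynchronizationContext.Current on the UI thread (for example, in the constructor or a Loaded handler) and pass it as uiContext. The returned delegate can then be given to an ActionBlock exactly as in the usage example above, and processing keeps whatever MaxDegreeOfParallelism you configure.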
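As the other answers show, there are several ways to solve this. I'd like to point out an alternative: use Progress<T>. Although it is designed mainly for use with Tasks, it also works with Dataflow, as shown here: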
private void Form1_Load(object sender, EventArgs e)
{
    // Created on the UI thread, so ProgressChanged is raised on the UI thread
    var progressReporter = new Progress<string>();
    progressReporter.ProgressChanged += (reporter, message) => label1.Text = message;
    var b1 = new ActionBlock<string>((input) =>
    {
        ((IProgress<string>)progressReporter).Report(input);
    }, new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = 10
    });
    b1.Post("a");
    b1.Post("b");
    b1.Post("c");
    b1.Post("d");
}
Overall, this looks like a clean alternative that avoids adding plumbing to the individual blocks.
More information can be found in this excellent blog post.
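Applied to the pipeline from the question, the Progress<T> idea suggests dropping the separate logging block (block2) and reporting from inside the processing delegate (block3). That way "Starting" is raised only when an item actually begins processing, not when it is merely buffered. Below is a minimal sketch, assuming the System.Threading.Tasks.Dataflow package. ProgressPipeline, CreateProcessingBlock, RunAsync and the simulatedWork delay are illustrative names, not taken from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Illustrative sketch: the processing block reports its own start and finish,
// so no separate logging block is needed in front of it.
public static class ProgressPipeline
{
    public static ActionBlock<string> CreateProcessingBlock(
        IProgress<string> progress, TimeSpan simulatedWork)
    {
        return new ActionBlock<string>(async input =>
        {
            // Raised only when this item actually starts processing.
            progress.Report($"Starting {input}");
            await Task.Delay(simulatedWork); // stands in for the real optimization
            progress.Report($"Finished {input}");
        }, new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 2, // processing stays parallel
            BoundedCapacity = 10
        });
    }

    public static async Task RunAsync(
        IEnumerable<string> files, IProgress<string> progress, TimeSpan simulatedWork)
    {
        var block = CreateProcessingBlock(progress, simulatedWork);
        foreach (var file in files)
        {
            // SendAsync waits for free capacity instead of rejecting the item.
            if (!await block.SendAsync(file))
                throw new InvalidOperationException($"Block declined {file}");
        }
        block.Complete();
        await block.Completion;
    }
}
```

In WPF, you would create the reporter on the UI thread, for example new Progress<string>(msg => ProcessedFiles.Add(msg)), so the handler runs there while the work itself stays multi-threaded. Progress<T> posts to the captured context asynchronously, so each UI entry appears shortly after the work starts rather than blocking the worker.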