TPL Dataflow: process the N latest messages
I'm trying to create some kind of queue that will process only the N latest received messages. Right now I have this:
private static void SetupMessaging()
{
    _messagingBroadcastBlock = new BroadcastBlock<string>(msg => msg, new ExecutionDataflowBlockOptions
    {
        //BoundedCapacity = 1,
        EnsureOrdered = true,
        MaxDegreeOfParallelism = 1,
        MaxMessagesPerTask = 1
    });

    _messagingActionBlock = new ActionBlock<string>(msg =>
    {
        Console.WriteLine(msg);
        Thread.Sleep(5000);
    }, new ExecutionDataflowBlockOptions
    {
        BoundedCapacity = 2,
        EnsureOrdered = true,
        MaxDegreeOfParallelism = 1,
        MaxMessagesPerTask = 1
    });

    _messagingBroadcastBlock.LinkTo(_messagingActionBlock, new DataflowLinkOptions { PropagateCompletion = true });
    _messagingBroadcastBlock.LinkTo(DataflowBlock.NullTarget<string>());
}
The problem is that if I post 1, 2, 3, 4, 5 I get 1, 2, 5, but I want it to be 1, 4, 5. Any suggestions are welcome.
UPD 1
I was able to get the following solution working:
class FixedCapacityActionBlock<T>
{
    private readonly ActionBlock<CancellableMessage<T>> _actionBlock;
    private readonly ConcurrentQueue<CancellableMessage<T>> _inputCollection = new ConcurrentQueue<CancellableMessage<T>>();
    private readonly int _maxQueueSize;
    private readonly object _syncRoot = new object();

    public FixedCapacityActionBlock(Action<T> act, ExecutionDataflowBlockOptions opt)
    {
        var options = new ExecutionDataflowBlockOptions
        {
            EnsureOrdered = opt.EnsureOrdered,
            CancellationToken = opt.CancellationToken,
            MaxDegreeOfParallelism = opt.MaxDegreeOfParallelism,
            MaxMessagesPerTask = opt.MaxMessagesPerTask,
            NameFormat = opt.NameFormat,
            SingleProducerConstrained = opt.SingleProducerConstrained,
            TaskScheduler = opt.TaskScheduler,
            //we intentionally ignore this value
            //BoundedCapacity = opt.BoundedCapacity
        };
        _actionBlock = new ActionBlock<CancellableMessage<T>>(cmsg =>
        {
            if (cmsg.CancellationTokenSource.IsCancellationRequested)
            {
                return;
            }
            act(cmsg.Message);
        }, options);
        _maxQueueSize = opt.BoundedCapacity;
    }

    public bool Post(T msg)
    {
        var fullMsg = new CancellableMessage<T>(msg);
        //what if next task starts here?
        lock (_syncRoot)
        {
            _inputCollection.Enqueue(fullMsg);
            var itemsToDrop = _inputCollection.Skip(1).Except(_inputCollection.Skip(_inputCollection.Count - _maxQueueSize + 1));
            foreach (var item in itemsToDrop)
            {
                item.CancellationTokenSource.Cancel();
                CancellableMessage<T> temp;
                _inputCollection.TryDequeue(out temp);
            }
            return _actionBlock.Post(fullMsg);
        }
    }
}
and
class CancellableMessage<T> : IDisposable
{
    public CancellationTokenSource CancellationTokenSource { get; set; }
    public T Message { get; set; }

    public CancellableMessage(T msg)
    {
        CancellationTokenSource = new CancellationTokenSource();
        Message = msg;
    }

    public void Dispose()
    {
        CancellationTokenSource?.Dispose();
    }
}
Although this works and does the job, the implementation looks dirty, and it is probably not thread safe either.
TPL Dataflow isn't a good fit for "last N messages", since it implies a queue or a pipeline (FIFO), not a stack (LIFO). Do you really need a dataflow library for this?

It's much easier with a ConcurrentStack<T>: you just introduce one producer task, which posts to the stack, and one consumer task, which gets messages from the stack while the number of handled ones is less than N (more about the Producer-Consumer pattern).

If you do need TPL Dataflow, you can use it in the consumer task to start handling the latest messages, but not in the producer, as that really isn't the way it is meant to be used. Also, there are other libraries with an event-based architecture that may fit your problem better.
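A minimal sketch of that stack-based approach (not code from this answer; the value of N, the field names, and the single pop-then-clear consumer cycle are illustrative assumptions):

// Sketch: one producer pushes messages onto a ConcurrentStack<string>,
// the consumer pops at most the N newest ones and discards everything older.
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class LastNWithStack
{
    private const int N = 2;
    private static readonly ConcurrentStack<string> _stack = new ConcurrentStack<string>();

    public static async Task Main()
    {
        // Producer: push incoming messages (the newest ends up on top).
        var producer = Task.Run(() =>
        {
            foreach (var msg in new[] { "1", "2", "3", "4", "5" })
                _stack.Push(msg);
        });
        await producer;

        // Consumer: pop up to N of the latest messages, then drop the rest.
        var latest = new string[N];
        int taken = _stack.TryPopRange(latest, 0, N); // returns how many were actually popped
        _stack.Clear();                               // everything older is discarded

        for (int i = taken - 1; i >= 0; i--)          // reverse to restore chronological order
            Console.WriteLine($"Processing: {latest[i]}");
    }
}

A real consumer would repeat the pop-then-clear cycle in a loop, or block until new messages arrive; the single pass above only illustrates the "latest N" semantics.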
Here is a TransformBlock and an ActionBlock implementation that drops the oldest messages in its queue whenever newer messages are received and the BoundedCapacity limit has been reached. It behaves very similarly to a Channel configured with BoundedChannelFullMode.DropOldest.
public static IPropagatorBlock<TInput, TOutput>
    CreateTransformBlockDropOldest<TInput, TOutput>(
    Func<TInput, Task<TOutput>> transform,
    ExecutionDataflowBlockOptions dataflowBlockOptions = null,
    IProgress<TInput> droppedMessages = null)
{
    if (transform == null) throw new ArgumentNullException(nameof(transform));
    dataflowBlockOptions = dataflowBlockOptions ?? new ExecutionDataflowBlockOptions();

    var boundedCapacity = dataflowBlockOptions.BoundedCapacity;
    var cancellationToken = dataflowBlockOptions.CancellationToken;
    var queue = new Queue<TInput>(Math.Max(0, boundedCapacity));

    var outputBlock = new BufferBlock<TOutput>(new DataflowBlockOptions()
    {
        BoundedCapacity = boundedCapacity,
        CancellationToken = cancellationToken
    });

    if (boundedCapacity != DataflowBlockOptions.Unbounded)
        dataflowBlockOptions.BoundedCapacity = checked(boundedCapacity * 2);
    // After testing, at least boundedCapacity + 1 is required.
    // Make it double to be sure that all non-dropped messages will be processed.

    var transformBlock = new ActionBlock<object>(async _ =>
    {
        TInput item;
        lock (queue)
        {
            if (queue.Count == 0) return;
            item = queue.Dequeue();
        }
        var result = await transform(item).ConfigureAwait(false);
        await outputBlock.SendAsync(result, cancellationToken).ConfigureAwait(false);
    }, dataflowBlockOptions);

    dataflowBlockOptions.BoundedCapacity = boundedCapacity; // Restore initial value

    var inputBlock = new ActionBlock<TInput>(item =>
    {
        var droppedEntry = (Exists: false, Item: (TInput)default);
        lock (queue)
        {
            transformBlock.Post(null);
            if (queue.Count == boundedCapacity) droppedEntry = (true, queue.Dequeue());
            queue.Enqueue(item);
        }
        if (droppedEntry.Exists) droppedMessages?.Report(droppedEntry.Item);
    }, new ExecutionDataflowBlockOptions()
    {
        CancellationToken = cancellationToken
    });

    PropagateCompletion(inputBlock, transformBlock);
    PropagateFailure(transformBlock, inputBlock);
    PropagateCompletion(transformBlock, outputBlock);
    _ = transformBlock.Completion.ContinueWith(_ => { lock (queue) queue.Clear(); },
        TaskScheduler.Default);

    return DataflowBlock.Encapsulate(inputBlock, outputBlock);

    async void PropagateCompletion(IDataflowBlock source, IDataflowBlock target)
    {
        try { await source.Completion.ConfigureAwait(false); } catch { }
        var exception = source.Completion.IsFaulted ? source.Completion.Exception : null;
        if (exception != null) target.Fault(exception); else target.Complete();
    }

    async void PropagateFailure(IDataflowBlock source, IDataflowBlock target)
    {
        try { await source.Completion.ConfigureAwait(false); } catch { }
        if (source.Completion.IsFaulted) target.Fault(source.Completion.Exception);
    }
}

// Overload with synchronous lambda
public static IPropagatorBlock<TInput, TOutput>
    CreateTransformBlockDropOldest<TInput, TOutput>(
    Func<TInput, TOutput> transform,
    ExecutionDataflowBlockOptions dataflowBlockOptions = null,
    IProgress<TInput> droppedMessages = null)
{
    return CreateTransformBlockDropOldest(item => Task.FromResult(transform(item)),
        dataflowBlockOptions, droppedMessages);
}

// ActionBlock equivalent
public static ITargetBlock<TInput>
    CreateActionBlockDropOldest<TInput>(
    Func<TInput, Task> action,
    ExecutionDataflowBlockOptions dataflowBlockOptions = null,
    IProgress<TInput> droppedMessages = null)
{
    if (action == null) throw new ArgumentNullException(nameof(action));
    var block = CreateTransformBlockDropOldest<TInput, object>(
        async item => { await action(item).ConfigureAwait(false); return null; },
        dataflowBlockOptions, droppedMessages);
    block.LinkTo(DataflowBlock.NullTarget<object>());
    return block;
}

// ActionBlock equivalent with synchronous lambda
public static ITargetBlock<TInput>
    CreateActionBlockDropOldest<TInput>(
    Action<TInput> action,
    ExecutionDataflowBlockOptions dataflowBlockOptions = null,
    IProgress<TInput> droppedMessages = null)
{
    return CreateActionBlockDropOldest(
        item => { action(item); return Task.CompletedTask; },
        dataflowBlockOptions, droppedMessages);
}
The idea is to store the queued items in an auxiliary Queue, and to pass dummy (null) values to the internal ActionBlock<object>. That block ignores the item it receives as an argument, and instead takes an item from the queue, if there is one. A lock is used to ensure that all non-dropped items in the queue will eventually be processed (unless, of course, an exception occurs).

There is also an extra feature: the optional IProgress<TInput> droppedMessages argument allows you to receive a notification each time a message is dropped.
Usage example:
_messagingActionBlock = CreateActionBlockDropOldest<string>(msg =>
{
    Console.WriteLine($"Processing: {msg}");
    Thread.Sleep(5000);
}, new ExecutionDataflowBlockOptions
{
    BoundedCapacity = 2,
}, new Progress<string>(msg =>
{
    Console.WriteLine($"Message dropped: {msg}");
}));
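For comparison, a minimal sketch of the Channel-based alternative mentioned above, assuming the System.Threading.Channels package is available (the capacity, messages, and delay mirror the example and are illustrative):

// Requires: using System.Threading.Channels;
// A bounded Channel with DropOldest gives similar "keep the latest" semantics out of the box.
var channel = Channel.CreateBounded<string>(new BoundedChannelOptions(2)
{
    FullMode = BoundedChannelFullMode.DropOldest
});

// Producer: TryWrite never blocks; when the channel is full, the oldest buffered item is evicted.
foreach (var msg in new[] { "1", "2", "3", "4", "5" })
    channel.Writer.TryWrite(msg);
channel.Writer.Complete();

// Consumer: processes whatever survived ("4" and "5" here, since no reader ran while posting).
await foreach (var msg in channel.Reader.ReadAllAsync())
{
    Console.WriteLine($"Processing: {msg}");
    await Task.Delay(5000);
}

The custom blocks above remain useful when the rest of the pipeline is built on TPL Dataflow; when it isn't, the Channel is the simpler option.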