Take all items except the last ones that satisfy a condition?
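My specific requirement is that I have an IEnumerable<IEnumerable<string>>, and I want to "take" all items in the outer enumeration except for any "empty" trailing items, where "empty" means all strings are null/empty or the inner enumeration is empty. Note that I want to keep any empty items that appear before the last non-empty one. For example: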
Item 1: a, b, c
Item 2: (nothing)
Item 3: a, f, g, h
Item 4: (nothing)
Item 5: (nothing)
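I want to keep items 1–3 but trim items 4 and 5.
In a more general sense, I have an enumeration of items and want to trim any trailing items that satisfy a condition, that is, the ones that appear after the last item that does not satisfy it.
To help choose an appropriate solution, I should add that the outer enumeration will usually contain a few hundred up to a few hundred thousand items, while the inner enumerations contain only a few items each. There will probably be only a couple of empty items that I need to trim.
My current solution is to put all the outer items into a list (after transforming them with .Select(...)) and then, in a loop, keep removing the last item while it is empty, until a non-empty item is found.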
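For reference, the list-based approach just described might look roughly like the sketch below. The names `ListTrimming` and `TrimTrailing` are made up for illustration, and the sketch assumes the outer items have already been materialized into a `List<T>`.

```csharp
using System;
using System.Collections.Generic;

public static class ListTrimming
{
    // Hypothetical helper: removes, in place, the trailing items
    // that satisfy the "empty" predicate.
    public static void TrimTrailing<T>(List<T> list, Func<T, bool> isEmpty)
    {
        // Removing the last element of a List<T> does not shift any other
        // elements, so each removal is cheap.
        while (list.Count > 0 && isEmpty(list[list.Count - 1]))
        {
            list.RemoveAt(list.Count - 1);
        }
    }
}
```

Empty items that sit before the last non-empty one are never touched, because the loop stops at the first non-empty item it meets from the end.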
How about this?
var trimmedItems = items.Reverse().SkipWhile(e => !e.Any()).Reverse();
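If you have a very large data set, this will require more memory than some other solutions you could come up with, but it is easy to read and follow.
juharr's suggestion is only slightly more complicated and performs better if you have a large number of items. Take() needs the number of items to keep, so the count of trailing empty items is subtracted from the total:
var trimmedItems = items.Take(items.Count() - items.Reverse().TakeWhile(e => !e.Any()).Count());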
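If the source happens to be a `List<T>` already, the same "work out how many items to keep, then take that many" idea can be sketched without `Reverse()` by using `List<T>.FindLastIndex`. The class and method names below (`TrailingTrim`, `WithoutTrailing`) are hypothetical, and this is a variation on the idea above rather than juharr's exact suggestion.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TrailingTrim
{
    // Hypothetical helper: keeps everything up to and including the last
    // item that does NOT satisfy the "empty" predicate.
    public static IEnumerable<T> WithoutTrailing<T>(List<T> list, Func<T, bool> isEmpty)
    {
        // FindLastIndex returns -1 when no item matches, so keepCount
        // becomes 0 when every item is empty.
        int keepCount = list.FindLastIndex(item => !isEmpty(item)) + 1;
        return list.Take(keepCount);
    }
}
```

Nothing is copied or reversed here; the list is only scanned from the end until the first non-empty item is found.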
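Here is the benchmarking code I'm using. It's meant to run in LINQPad, but you can change the results.Dump(); call to output the results to the console or something else if you prefer. Also, I'm using an IEnumerable<string> instead of an IEnumerable<IEnumerable<string>> just for simplicity, but that shouldn't affect the performance of the algorithms: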
/* This is a benchmarking template I use in LINQPad when I want to do a
 * quick performance test. Just give it a couple of actions to test and
 * it will give you a pretty good idea of how long they take compared
 * to one another. It's not perfect: You can expect a 3% error margin
 * under ideal circumstances. But if you're not going to improve
 * performance by more than 3%, you probably don't care anyway.*/
void Main()
{
    // Enter setup code here
    var items = new[] { "a, b, c",
                        "",
                        "a, f, g, h",
                        "",
                        "" }.AsEnumerable();
    var manyitems = Enumerable.Range(1, 10000).SelectMany(i => items);
    var actions = new[]
    {
        new TimedAction("Control", () =>
        {
            // ToList() is the one thing that all of these have to do.
            manyitems.ToList();
        }),
        new TimedAction("Reverse().SkipWhile().Reverse()", () =>
        {
            manyitems.Reverse().SkipWhile(e => !e.Any()).Reverse().ToList();
        }),
        new TimedAction("Take(Count() - Reverse().TakeWhile().Count())", () =>
        {
            // Take() needs the number of items to keep: total minus trailing empties.
            manyitems.Take(manyitems.Count() - manyitems.Reverse().TakeWhile(e => !e.Any()).Count()).ToList();
        }),
        new TimedAction("SkipLastWhile", () =>
        {
            manyitems.SkipLastWhile(e => !e.Any()).ToList();
        }),
        // Add tests as desired
    };
    const int TimesToRun = 100; // Tweak this as necessary
    TimeActions(TimesToRun, actions);
}
public static class EnumerableExtensions
{
    public static IEnumerable<T> SkipLastWhile<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        var skipBuffer = new List<T>();
        foreach (var item in source)
        {
            if (predicate(item))
                skipBuffer.Add(item);
            else
            {
                foreach (var skipped in skipBuffer)
                    yield return skipped;
                skipBuffer.Clear();
                yield return item;
            }
        }
    }
}
#region timer helper methods
// Define other methods and classes here
public void TimeActions(int iterations, params TimedAction[] actions)
{
    Stopwatch s = new Stopwatch();
    int length = actions.Length;
    var results = new ActionResult[actions.Length];
    // Perform the actions in their initial order.
    for (int i = 0; i < length; i++)
    {
        var action = actions[i];
        var result = results[i] = new ActionResult { Message = action.Message };
        // Do a dry run to get things ramped up/cached
        result.DryRun1 = s.Time(action.Action, 10);
        result.FullRun1 = s.Time(action.Action, iterations);
    }
    // Perform the actions in reverse order.
    for (int i = length - 1; i >= 0; i--)
    {
        var action = actions[i];
        var result = results[i];
        // Do a dry run to get things ramped up/cached
        result.DryRun2 = s.Time(action.Action, 10);
        result.FullRun2 = s.Time(action.Action, iterations);
    }
    results.Dump();
}

public class ActionResult
{
    public string Message { get; set; }
    public double DryRun1 { get; set; }
    public double DryRun2 { get; set; }
    public double FullRun1 { get; set; }
    public double FullRun2 { get; set; }
}

public class TimedAction
{
    public TimedAction(string message, Action action)
    {
        Message = message;
        Action = action;
    }
    public string Message { get; private set; }
    public Action Action { get; private set; }
}

public static class StopwatchExtensions
{
    public static double Time(this Stopwatch sw, Action action, int iterations)
    {
        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }
}
#endregion
Results: [benchmark output not included]
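The differences in the benchmark results are more pronounced if your IEnumerable is backed by a List, because LINQ can make some additional optimizations for Reverse():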
var manyitems = Enumerable.Range(1, 10000).SelectMany(i => items).ToList().AsEnumerable();
There is no standard, efficient LINQ solution. I would go with a custom "LINQ-like" extension method such as this:
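public static class EnumerableExtensions
{
    public static IEnumerable<T> SkipLastWhile<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        // Holds the current run of items that satisfy the predicate.
        var skipBuffer = new List<T>();
        foreach (var item in source)
        {
            if (predicate(item))
                skipBuffer.Add(item);
            else
            {
                // A non-matching item follows the run, so the run is not trailing: emit it.
                if (skipBuffer.Count > 0)
                {
                    foreach (var skipped in skipBuffer)
                        yield return skipped;
                    skipBuffer.Clear();
                }
                yield return item;
            }
        }
        // Whatever is still buffered here is the trailing run and is dropped.
    }
}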
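It requires additional space only for buffering the longest run of items that satisfy the skip predicate, whereas the LINQ Reverse method has to buffer the whole input sequence.
The usage is: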
var result = input.SkipLastWhile(e => !e.Any());
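The predicate `e => !e.Any()` only treats an inner enumeration with no items as empty. To match the question's wider definition (null, no items, or only null/empty strings), the predicate could be widened as in the sketch below. The extension method is repeated so that the block compiles on its own; `EmptyChecks.IsEmpty` and `Demo.Run` are illustrative names, and item 5 of the question's example is replaced here by an inner sequence holding only an empty and a null string, to exercise that case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    public static IEnumerable<T> SkipLastWhile<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        var skipBuffer = new List<T>();
        foreach (var item in source)
        {
            if (predicate(item))
            {
                skipBuffer.Add(item);
            }
            else
            {
                foreach (var skipped in skipBuffer)
                    yield return skipped;
                skipBuffer.Clear();
                yield return item;
            }
        }
    }
}

public static class EmptyChecks
{
    // Hypothetical predicate for the question's definition of "empty".
    public static bool IsEmpty(IEnumerable<string> inner)
    {
        // All() returns true for a sequence with no items, so this also
        // covers an inner enumeration that is simply empty.
        return inner == null || inner.All(s => string.IsNullOrEmpty(s));
    }
}

public static class Demo
{
    public static List<IEnumerable<string>> Run()
    {
        var input = new List<IEnumerable<string>>
        {
            new[] { "a", "b", "c" },      // item 1
            new string[0],                // item 2: kept, it precedes a non-empty item
            new[] { "a", "f", "g", "h" }, // item 3
            new string[0],                // item 4: trailing, trimmed
            new[] { "", null },           // item 5: only null/empty strings, trimmed
        };
        return input.SkipLastWhile(e => EmptyChecks.IsEmpty(e)).ToList();
    }
}
```

With this input, `Demo.Run()` keeps items 1–3 and drops the last two.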