TPL Dataflow pipeline design basics

March 11, 2014

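I am trying to create a well-designed TPL Dataflow pipeline that makes optimal use of system resources. My project is an HTML parser that adds the parsed values to a SQL Server database. I already have all the methods for the future pipeline. My question is: what is the best way to place them in Dataflow blocks, and how many blocks should I use? Some of the methods are CPU-bound and some are I/O-bound (loading from the Internet, SQL Server queries). For now I think that placing each I/O operation in a separate block is the right approach, as in this scheme (scheme image not included).
What are the basic rules for designing a pipeline in this situation?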

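One way to choose how to divide the blocks is to decide which parts you want to scale independently of the others. A good starting point is to separate the CPU-bound parts from the I/O-bound parts. I would consider merging the last two blocks, since both are I/O-bound (presumably against the same database).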
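As a rough illustration of that split, here is a minimal sketch using TPL Dataflow blocks. The `DownloadAsync` and `Parse` methods, the delays, and the parallelism and capacity numbers are hypothetical stand-ins, not the asker's actual code; the real download, parse, and save methods would take their place, and the numbers would need tuning by measurement.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ParserPipelineSketch
{
    // Hypothetical stand-in for an I/O-bound download.
    static async Task<string> DownloadAsync(string url)
    {
        await Task.Delay(10);                    // simulates network latency
        return "<html>" + url + "</html>";
    }

    // Hypothetical stand-in for CPU-bound parsing.
    static int Parse(string html) => html.Length;

    public static async Task<int[]> RunAsync(string[] urls)
    {
        var saved = new ConcurrentBag<int>();    // stands in for the database

        // I/O-bound stage: scaled independently, here with 8 concurrent requests.
        var download = new TransformBlock<string, string>(DownloadAsync,
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 8, BoundedCapacity = 32 });

        // CPU-bound stage: about one worker per core.
        var parse = new TransformBlock<string, int>(Parse,
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = Environment.ProcessorCount,
                BoundedCapacity = 32
            });

        // I/O-bound stage: one block for all database work.
        var save = new ActionBlock<int>(async value =>
        {
            await Task.Delay(5);                 // simulates a database round trip
            saved.Add(value);
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4, BoundedCapacity = 32 });

        // Propagate completion so that awaiting the last block is enough.
        var link = new DataflowLinkOptions { PropagateCompletion = true };
        download.LinkTo(parse, link);
        parse.LinkTo(save, link);

        foreach (var url in urls)
            await download.SendAsync(url);       // waits when the bounded block is full

        download.Complete();
        await save.Completion;

        return saved.OrderBy(v => v).ToArray();
    }
}
```

Each stage has its own `MaxDegreeOfParallelism`, so the I/O-bound and CPU-bound parts can be tuned separately, and `BoundedCapacity` keeps a fast stage from flooding a slower one.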

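I have posted a sample generic pipeline from Concurrent Programming on Windows. A good pipeline is a balanced pipeline, meaning that no stage becomes a bottleneck for the rest of the pipeline. With the sample code you can create as many threads as needed to execute each stage.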

Source code:

using System;
using System.Collections;
using System.Collections.Generic;

// IPipeline is not shown in the original post.
public class Pipeline<TSource, TDest> : IPipeline
{
    private readonly IPipelineStage[] _stages;

    // Creates a pipeline with a single stage.
    public Pipeline(Func<TSource, TDest> transform, int degree)
        : this(new IPipelineStage[0], new PipelineStage<TSource, TDest>(transform, degree)) {}

    // Copies the existing stages and appends the new last stage.
    internal Pipeline(IPipelineStage[] toCopy, IPipelineStage lastStage)
    {
        _stages = new IPipelineStage[toCopy.Length + 1];
        Array.Copy(toCopy, _stages, toCopy.Length);
        _stages[_stages.Length - 1] = lastStage;
    }

    // Returns a new pipeline whose output type is that of the added stage.
    public Pipeline<TSource, TNew> AddStage<TNew>(Func<TDest, TNew> transform, int degree)
    {
        return new Pipeline<TSource, TNew>(
            _stages, new PipelineStage<TDest, TNew>(transform, degree));
    }

    public IEnumerator<TDest> GetEnumerator(IEnumerable<TSource> arg)
    {
        IEnumerable er = arg;

        // Chain the stages: each one consumes the output of the previous one.
        for (int i = 0; i < _stages.Length; i++)
            er = _stages[i].Start(er);

        foreach (TDest elem in er)
            yield return elem;
    }
}

class PipelineStage<TInput, TOutput> : IPipelineStage
{
    private readonly Func<TInput, TOutput> _transform;
    private readonly int _degree;   // number of threads used to run this stage

    internal PipelineStage(Func<TInput, TOutput> transform, int degree)
    {
        _transform = transform;
        _degree = degree;
    }

    public IEnumerable Start(IEnumerable src)
    {
        //...
    }
}

interface IPipelineStage
{
    IEnumerable Start(IEnumerable src);
}
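To make "balanced" a bit more concrete, here is a small sketch of one possible heuristic for picking the `degree` of each stage: give every stage a number of workers proportional to its average cost per item, so that all stages end up with roughly the same throughput. The `StageBalancer` class and the cost figures are assumptions for illustration; they are not part of the book's sample, and real costs would have to be measured.

```csharp
using System;
using System.Linq;

public static class StageBalancer
{
    // Takes the average cost per item of each stage (in milliseconds) and
    // returns a worker count per stage. The cheapest stage gets one worker;
    // the others get enough workers to match its throughput.
    public static int[] DegreesFor(double[] costMsPerItem)
    {
        double cheapest = costMsPerItem.Min();
        return costMsPerItem
            .Select(cost => (int)Math.Ceiling(cost / cheapest))
            .ToArray();
    }
}
```

For example, `StageBalancer.DegreesFor(new[] { 30.0, 10.0, 20.0 })` returns `{ 3, 1, 2 }`: with those worker counts every stage handles about one item per 10 ms, so none of them holds the others back.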

Source: https://stackoverflow.com/questions/22297364
