A LINQ way to add items to a dictionary

July 3, 2017

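I'm trying to learn more about LINQ by implementing Peter Norvig's spelling corrector in C#.

The first part involves taking a large file of words (about 1 million) and putting it into a dictionary, where the key is the word and the value is the number of occurrences.

I would normally do it like this: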

foreach (var word in allWords)                                                    
{           
   if (wordCount.ContainsKey(word))
       wordCount[word]++;
   else
       wordCount.Add(word, 1);
}

where allWords is an IEnumerable<string>.

In LINQ, I'm currently doing this:

var wordCountLINQ = (from word in allWordsLINQ
                        group word by word
                        into groups
                        select groups).ToDictionary(g => g.Key, g => g.Count());

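I compared the two dictionaries by checking every <key, value> pair and they are identical, so both approaches produce the same result.

The foreach loop takes 3.82 seconds and the LINQ query takes 4.49 seconds.

I'm timing it with the Stopwatch class, and I'm running in RELEASE mode. I don't think the performance is bad; I'm just wondering whether there is a reason for the difference.

Am I writing the LINQ query in an inefficient way, or am I missing something?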

**Update:** here is the complete benchmark code sample:

public static void TestCode()
{
   //File can be downloaded from http://norvig.com/big.txt and consists of about a million words.
   const string fileName = @"path_to_file";
   var allWords = from Match m in Regex.Matches(File.ReadAllText(fileName).ToLower(), "[a-z]+", RegexOptions.Compiled)
                  select m.Value;

   var wordCount = new Dictionary<string, int>();
   var timer = new Stopwatch();            
   timer.Start();
   foreach (var word in allWords)                                                    
   {           
       if (wordCount.ContainsKey(word))
           wordCount[word]++;
       else
           wordCount.Add(word, 1);
   }
   timer.Stop();

   Console.WriteLine("foreach loop took {0:0.00} ms ({1:0.00} secs)\n",
           timer.ElapsedMilliseconds, timer.ElapsedMilliseconds / 1000.0);

   //Make LINQ use a different Enumerable (with exactly the same values);
   //if you don't, it suddenly becomes way faster, which I assume is a caching thing??
   var allWordsLINQ = from Match m in Regex.Matches(File.ReadAllText(fileName).ToLower(), "[a-z]+", RegexOptions.Compiled)
                  select m.Value;

   timer.Reset();
   timer.Start();
   var wordCountLINQ = (from word in allWordsLINQ
                           group word by word
                           into groups
                           select groups).ToDictionary(g => g.Key, g => g.Count());
   timer.Stop();

   Console.WriteLine("LINQ took {0:0.00} ms ({1:0.00} secs)\n",
           timer.ElapsedMilliseconds, timer.ElapsedMilliseconds / 1000.0);                     
}

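One of the reasons the LINQ version is slower is that two dictionaries are created instead of one:
1. (Internally) one from the group by operator; the group by also stores each individual word. You can verify this by calling ToArray() on a group instead of Count(). In your case, this is a lot of overhead that you don't actually need.
2. The ToDictionary method is basically a foreach over the actual LINQ query, in which the query's results are added to a new dictionary. Depending on the number of unique words, this can also take some time.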
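The first point can be checked with a small sketch. The class and method names here (`GroupingDemo`, `GroupContents`) are hypothetical and not part of the original benchmark; the sketch only assumes in-memory LINQ to Objects, as in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupingDemo
{
    // Maps each distinct word to the array of elements its group retained.
    // If GroupBy only counted, there would be nothing to copy into an array;
    // instead every occurrence of the word comes back out of the group.
    public static Dictionary<string, string[]> GroupContents(IEnumerable<string> words)
    {
        return words
            .GroupBy(word => word)
            .ToDictionary(g => g.Key, g => g.ToArray());
    }
}
```

Calling `GroupingDemo.GroupContents(new[] { "the", "cat", "the" })` should give a two-element array for "the" and a one-element array for "cat", which suggests the grouping step buffers every word before `Count()` is ever called.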
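Another reason the LINQ query is slightly slower is that LINQ relies on lambda expressions (the delegates in Dathan's answer), and calling a delegate adds a small cost compared with inline code.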
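To make the delegates visible, the query can be written in method syntax with each lambda pulled out into a named `Func`. This is only a rough equivalent of what the compiler generates for the query expression, and the names (`DelegateDemo`, `CountWithExplicitDelegates`, `keySelector`, `dictKey`, `dictValue`) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DelegateDemo
{
    public static Dictionary<string, int> CountWithExplicitDelegates(IEnumerable<string> words)
    {
        // Each lambda in the query ends up as a delegate instance like these.
        Func<string, string> keySelector = word => word;
        Func<IGrouping<string, string>, string> dictKey = g => g.Key;
        Func<IGrouping<string, string>, int> dictValue = g => g.Count();

        // keySelector is invoked once per word; dictKey and dictValue
        // are invoked once per distinct word.
        return words.GroupBy(keySelector).ToDictionary(dictKey, dictValue);
    }
}
```

In the foreach version the same work is done by inline code, so there is no delegate invocation per word.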
**Edit:** note that for some LINQ scenarios (such as LINQ to SQL, but not in-memory LINQ as is the case here), rewriting the query produces a more optimized plan:
from word in allWordsLINQ 
group word by word into groups 
select new { Word = groups.Key, Count = groups.Count() }
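Note, however, that this does not give you a dictionary, but rather a sequence of words and their counts. You can convert it into a dictionary with: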
(from word in allWordsLINQ 
group word by word into groups 
select new { Word = groups.Key, Count = groups.Count() })
.ToDictionary(g => g.Word, g => g.Count);

Source: https://stackoverflow.com/questions/2118671
