Dot-Net
Why is compiled RegEx performance slower than interpreted RegEx?
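I came across this article:
Performance: Compiled vs. Interpreted Regular Expressions. I modified the sample code to compile 1000 regular expressions and then run each one 500 times to take advantage of precompilation, but even in that case the interpreted regular expressions run 4 times faster!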
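This means the RegexOptions.Compiled option is completely useless; worse, it is actually slower! Much of the difference turned out to be due to JIT. Even after accounting for JIT in the code below, the compiled regular expressions still run a little slower, which makes no sense to me, but @Jim's answer provides a much cleaner version that works as expected. Can anyone explain why this is the case?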
Code, taken and modified from the blog post:
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Text.RegularExpressions;

    namespace RegExTester
    {
        class Program
        {
            static void Main(string[] args)
            {
                DateTime startTime = DateTime.Now;

                for (int i = 0; i < 1000; i++)
                {
                    CheckForMatches("some random text with email address, address@domain200.com" + i.ToString());
                }

                double msTaken = DateTime.Now.Subtract(startTime).TotalMilliseconds;
                Console.WriteLine("Full Run: " + msTaken);

                startTime = DateTime.Now;

                for (int i = 0; i < 1000; i++)
                {
                    CheckForMatches("some random text with email address, address@domain200.com" + i.ToString());
                }

                msTaken = DateTime.Now.Subtract(startTime).TotalMilliseconds;
                Console.WriteLine("Full Run: " + msTaken);

                Console.ReadLine();
            }

            private static List<Regex> _expressions;
            private static object _SyncRoot = new object();

            private static List<Regex> GetExpressions()
            {
                if (_expressions != null)
                    return _expressions;

                lock (_SyncRoot)
                {
                    if (_expressions == null)
                    {
                        DateTime startTime = DateTime.Now;

                        List<Regex> tempExpressions = new List<Regex>();
                        string regExPattern = @"^[a-zA-Z0-9]+[a-zA-Z0-9._%-]*@{0}$";

                        for (int i = 0; i < 2000; i++)
                        {
                            tempExpressions.Add(new Regex(
                                string.Format(regExPattern,
                                    Regex.Escape("domain" + i.ToString() + "." + (i % 3 == 0 ? ".com" : ".net"))),
                                RegexOptions.IgnoreCase));// | RegexOptions.Compiled
                        }

                        _expressions = new List<Regex>(tempExpressions);

                        DateTime endTime = DateTime.Now;
                        double msTaken = endTime.Subtract(startTime).TotalMilliseconds;
                        Console.WriteLine("Init:" + msTaken);
                    }
                }

                return _expressions;
            }

            static List<Regex> expressions = GetExpressions();

            private static void CheckForMatches(string text)
            {
                DateTime startTime = DateTime.Now;

                foreach (Regex e in expressions)
                {
                    bool isMatch = e.IsMatch(text);
                }

                DateTime endTime = DateTime.Now;
                //double msTaken = endTime.Subtract(startTime).TotalMilliseconds;
                //Console.WriteLine("Run: " + msTaken);
            }
        }
    }
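Compiled regular expressions match faster when used as intended. As others have pointed out, the idea is to compile them once and use them many times. The construction and initialization time is amortized over those many runs.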
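As a minimal sketch of the "compile once, use many times" pattern: the names EmailMatcher, DomainRegex, and CountMatches are hypothetical, and the pattern is adapted from the test string in the question rather than taken from either benchmark. The compiled regex is built a single time in a static field and then reused for every input.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class EmailMatcher
{
    // Constructed (and compiled) once, when the type is first used, then reused.
    private static readonly Regex DomainRegex = new Regex(
        @"^[a-zA-Z0-9]+[a-zA-Z0-9._%-]*@domain200\.com$",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Counts how many inputs match, reusing the same Regex instance each time.
    public static int CountMatches(IEnumerable<string> inputs)
    {
        int count = 0;
        foreach (string input in inputs)
        {
            if (DomainRegex.IsMatch(input))
                count++;
        }
        return count;
    }
}

static class Program
{
    static void Main()
    {
        var inputs = new[]
        {
            "address@domain200.com",
            "some random text",
            "ADDRESS@DOMAIN200.COM"
        };

        Console.WriteLine(EmailMatcher.CountMatches(inputs)); // prints 2
    }
}
```

Constructing the Regex inside the loop instead would pay the construction cost on every iteration, which is the opposite of the intended usage.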
I created a much simpler test that will show you that compiled regular expressions are unquestionably faster than uncompiled ones.
    using System;
    using System.Diagnostics;
    using System.Text.RegularExpressions;

    class Program
    {
        const int NumIterations = 1000;
        const string TestString = "some random text with email address, address@domain200.com";
        const string Pattern = "^[a-zA-Z0-9]+[a-zA-Z0-9._%-]*@domain0\\.\\.com$";

        private static Regex NormalRegex = new Regex(Pattern, RegexOptions.IgnoreCase);
        private static Regex CompiledRegex = new Regex(Pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);
        private static Regex DummyRegex = new Regex("^.$");

        static void Main(string[] args)
        {
            // Runs IsMatch `count` times against regex `r` and prints the elapsed time.
            var DoTest = new Action<string, Regex, int>((s, r, count) =>
            {
                Console.Write("Testing {0} ... ", s);
                Stopwatch sw = Stopwatch.StartNew();
                for (int i = 0; i < count; ++i)
                {
                    bool isMatch = r.IsMatch(TestString + i.ToString());
                }
                sw.Stop();
                Console.WriteLine("{0:N0} ms", sw.ElapsedMilliseconds);
            });

            // Make sure that DoTest is JITed
            DoTest("Dummy", DummyRegex, 1);
            DoTest("Normal first time", NormalRegex, 1);
            DoTest("Normal Regex", NormalRegex, NumIterations);
            DoTest("Compiled first time", CompiledRegex, 1);
            DoTest("Compiled", CompiledRegex, NumIterations);

            Console.WriteLine();
            Console.Write("Done. Press Enter:");
            Console.ReadLine();
        }
    }

Setting NumIterations to 500 gives me this:

    Testing Dummy ... 0 ms
    Testing Normal first time ... 0 ms
    Testing Normal Regex ... 1 ms
    Testing Compiled first time ... 13 ms
    Testing Compiled ... 1 ms

With 5 million iterations, I get:

    Testing Dummy ... 0 ms
    Testing Normal first time ... 0 ms
    Testing Normal Regex ... 17,232 ms
    Testing Compiled first time ... 17 ms
    Testing Compiled ... 15,299 ms

Here you can see that the compiled regular expression is at least 10% faster than the uncompiled version.
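Interestingly, if I remove RegexOptions.IgnoreCase from the regular expressions, the results for 5 million iterations are even more striking:

    Testing Dummy ... 0 ms
    Testing Normal first time ... 0 ms
    Testing Normal Regex ... 12,869 ms
    Testing Compiled first time ... 14 ms
    Testing Compiled ... 8,332 ms

Here, the compiled regular expression is 35% faster than the uncompiled one.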
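To make the amortization argument concrete, here is a rough back-of-the-envelope sketch using only the 5-million-iteration timings reported above. It assumes the "Compiled first time" figure approximates the one-time cost of the compiled regex (that figure also includes a single match, so this is an estimate, not a measurement), and the helper name MatchesToBreakEven is hypothetical.

```csharp
using System;

static class BreakEven
{
    // Number of matches needed before the per-match savings of the compiled
    // regex repay its one-time cost (all times in milliseconds).
    public static double MatchesToBreakEven(
        double interpretedTotalMs, double compiledTotalMs,
        long iterations, double oneTimeCostMs)
    {
        double savedPerMatchMs = (interpretedTotalMs - compiledTotalMs) / iterations;
        return oneTimeCostMs / savedPerMatchMs;
    }

    static void Main()
    {
        // Timings copied from the runs above; 17 ms and 14 ms are the
        // "Compiled first time" figures, assumed to be the one-time cost.
        double withIgnoreCase = MatchesToBreakEven(17232, 15299, 5000000, 17);
        double withoutIgnoreCase = MatchesToBreakEven(12869, 8332, 5000000, 14);

        Console.WriteLine("With IgnoreCase:    about {0:N0} matches", withIgnoreCase);
        Console.WriteLine("Without IgnoreCase: about {0:N0} matches", withoutIgnoreCase);
    }
}
```

Under these assumptions the break-even point is on the order of tens of thousands of matches per regex. That suggests the one-time cost may never be recovered when each expression is matched only a thousand or so times per timed run, as in the loops in the question.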
In my opinion, the blog post you cited is simply a flawed test.