XML parser performance in CPython 3.3 and PyPy 1.7

Web Application Development | William

In a recent article, I compared the performance of MiniDOM and the three ElementTree implementations (ElementTree, cElementTree and lxml.etree) for parsing XML in CPython 3.3. Given the utterly poor performance of the pure Python library MiniDOM in that comparison, I decided to give it another chance and ran the same tests in PyPy 1.7. Because lxml.etree and cElementTree are not available on this platform, I only ran the tests with plain ElementTree and MiniDOM. I also report the original benchmark results for CPython below for comparison.

Parser performance of XML libraries in CPython 3.3 and PyPy 1.7




While I also provide numbers for the memory usage of each library in this comparison, they are not directly comparable between PyPy and CPython, because the two platforms manage memory differently and because the overall memory that PyPy uses right from the start is much larger than for CPython. So the relative increase in memory may or may not be an accurate way to tell what each runtime does with the memory. However, it appears that PyPy manages to eliminate at least the severe memory problems of MiniDOM, as the total amount of memory used for the larger files is several times smaller than that used by CPython.

Memory usage of XML trees in CPython 3.3 and PyPy 1.7
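
For readers who want to reproduce this kind of memory number, the following is a minimal sketch and not the measurement code behind the figures above. It assumes a Unix system (the `resource` module is not available on Windows) and uses the growth in peak resident set size as a crude proxy for the size of the parsed tree. The helper names `peak_rss` and `peak_growth_while_parsing` are hypothetical. As noted above, a number obtained this way is not directly comparable between runtimes.

```python
import io
import resource  # Unix only; an assumption of this sketch
import xml.etree.ElementTree as ET


def peak_rss():
    """Peak resident set size of this process.

    The unit is platform dependent (kilobytes on Linux, bytes on macOS).
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def peak_growth_while_parsing(data):
    """Parse `data` (bytes) and return (tree, growth of the peak RSS)."""
    before = peak_rss()
    # The tree is returned so that it stays alive while the caller inspects it.
    tree = ET.parse(io.BytesIO(data))
    after = peak_rss()
    return tree, after - before


if __name__ == "__main__":
    sample = b"<root>" + b"<item>text</item>" * 100000 + b"</root>"
    tree, growth = peak_growth_while_parsing(sample)
    print("peak RSS grew by", growth)
```

Because the peak value never decreases, a growth of zero only means that the parse fit into memory the process had already claimed earlier, which is one more reason to treat such numbers with care.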


So, what do I take from this benchmark? If you have legacy MiniDOM code lying around, you want PyPy to run it. Under PyPy, MiniDOM is several times better than under CPython in terms of both memory and runtime. PyPy also runs ElementTree substantially faster than CPython runs the plain Python ElementTree.

However, for fast XML processing in general, the better performance of PyPy even for plain Python ElementTree is not really all that interesting, because it is still several times slower than cElementTree or lxml.etree in CPython. That means that you will often be able to process multiple files in CPython in the time that you need for just one in PyPy, even if the application code that does the processing gets a substantial JIT speed-up in PyPy. Even worse, the GIL in PyPy will keep your code from getting the parallel speed-up that you usually get with multi-threaded processing in lxml and CPython, e.g. in a web server setting.

So, as always, the decision depends on what your actual application does and which library it uses. Do your own benchmarks.
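
As a starting point for such a benchmark, here is a minimal sketch that times the two pure Python parsers covered in this comparison using only the standard library. It is not the script that produced the numbers above. The synthetic document and the helper names `make_sample_xml` and `time_parsers` are assumptions made for illustration, and real measurements should use your own files.

```python
import io
import timeit
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET


def make_sample_xml(n_items=1000):
    """Build a synthetic XML document as bytes (hypothetical test data)."""
    root = ET.Element("root")
    for i in range(n_items):
        item = ET.SubElement(root, "item", id=str(i))
        item.text = "value %d" % i
    return ET.tostring(root)


def time_parsers(data, repeat=3):
    """Return the best wall-clock parse time in seconds for each library."""
    parsers = {
        "ElementTree": lambda: ET.parse(io.BytesIO(data)),
        "MiniDOM": lambda: minidom.parse(io.BytesIO(data)),
    }
    return {
        name: min(timeit.repeat(parse, number=1, repeat=repeat))
        for name, parse in parsers.items()
    }


if __name__ == "__main__":
    results = time_parsers(make_sample_xml(20000))
    for name, seconds in sorted(results.items()):
        print("%-12s %.4f s" % (name, seconds))
```

Running the same script under both interpreters gives a rough side-by-side comparison. Taking the minimum over several repetitions reduces noise from other processes, and under PyPy it also gives the JIT a chance to warm up before the best run is recorded.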

