ANRC Universe
Module talk:Namespace detect
== Possible performance issues with Module:Namespace_detect ==
:<small>Moved here from my user talk page. &mdash; '''''[[User:Mr. Stradivarius|<span style="color: #194D00; font-family: Palatino, Times, serif">Mr. Stradivarius</span>]]''''' <sup>[[User talk:Mr. Stradivarius|♪ talk ♪]]</sup> 09:37, 20 March 2014 (UTC)</small>
Hi. Sorry for the bother. I just wanted to bring this matter to your attention. [[Module:Namespace_detect]] appears to be at the root of some performance issues. For example, I believe it is because of [[Module:Namespace_detect]] that it takes about 30 seconds to preview an edit on [[List_of_Unicode_characters]].

[[List_of_Unicode_characters]] is a large page. However, I believe most of the time is spent by several of the Unicode charts calling [[Module:Namespace_detect]] indirectly through [[Template:Lang]]. For example:
* [[Template:Unicode_chart_Kannada]] has 86 [[Template:Lang]] calls. EX: <nowiki>{{lang|kn|ಂ}}</nowiki>
* Each [[Template:Lang]] calls [[Template:Category handler]].
* [[Template:Category_handler]] is just a wrapper around [[Module:Category_handler]].
* [[Module:Category_handler]] has a _main function that calls nsDetect.getParamMappings().
* getParamMappings in [[Module:Namespace_detect]] has this block of code:
<syntaxhighlight lang=lua>
for nsid, ns in pairs(mw.site.subjectNamespaces) do
	if nsid ~= 0 then -- Exclude main namespace.
		local nsname = mw.ustring.lower(ns.name)
		local canonicalName = mw.ustring.lower(ns.canonicalName)
		mappings[nsname] = {nsname}
		if canonicalName ~= nsname then
			table.insert(mappings[nsname], canonicalName)
		end
		for _, alias in ipairs(ns.aliases) do
			table.insert(mappings[nsname], mw.ustring.lower(alias))
		end
	end
end
</syntaxhighlight>
* Note that this code loops 15 times (as there are 15 subject namespaces).
* Each iteration calls mw.ustring.lower 2 times (for simplicity's sake, we can ignore the mw.ustring.lower(alias) calls).
* So, for [[Template:Unicode_chart_Kannada]], there will be 2,580 calls to mw.ustring.lower (86 * 15 * 2).
* There are several more Unicode charts on the page that call lang, for example [[Template:Unicode chart Myanmar]].
* All told, there are approximately 100,000 calls to mw.ustring.lower from one edit to this page.
* Although mw.ustring.lower and Language:lc are relatively simple procs, there are overhead costs with going back and forth between Lua and PHP. Even at 3,300 calls per second, it will take the aforementioned 30 seconds to preview an edit.

I also have reason to believe that variations of this situation are repeated elsewhere on other pages. For context, I am the developer of [http://xowa.sourceforge.net XOWA] (an offline wiki app), and I do monthly parses of English Wikipedia's 4.4 million mainspace articles. Module:Namespace_detect is flagged as one of the most time-consuming #invokes. I never understood why, until tonight.

As a recommendation, you could move getParamMappings to a new [[Module:Namespace_detect/Data]] page and use "return mw.loadData('Module:Namespace_detect/Data')". This change would be straightforward and would improve performance, though you would have to change the Module to preserve the portable cfg table.

Let me know if you need more info. Thanks. [[User_talk:Gnosygnu|gnosygnu]] 08:28, 20 March 2014 (UTC)
:This should be at some template/module talk page, with just a link here. Perhaps Mr.S would like to move all this? I have not looked at the issue, but the requirement to detect the namespace multiple times should be removed (obviously!). After ten seconds' research, I'm guessing that {{tl|Lang}} detects the namespace each time it is invoked in order to decide whether to add a category. That's way over the top on a page like [[List of Unicode characters]] that apparently uses {{tl|Lang}} many times. A clever template coder should add a parameter which can be used to omit all that overhead. [[User:Johnuniq|Johnuniq]] ([[User talk:Johnuniq|talk]]) 09:27, 20 March 2014 (UTC)
:{{ec}} Hi Gnosygnu, and thank you for the detailed analysis! I wrote [[Module:Namespace detect]] back when I wasn't that experienced with Lua, and I was actually thinking that I should go back and have a look at it now that I have a few more modules under my belt. While it (maybe, hopefully) should be faster than the old wikitext version, we can obviously make things a lot better. I'll have a look at it, implement your suggestions, and see if there are any other changes that might need making. Best &mdash; '''''[[User:Mr. Stradivarius|<span style="color: #194D00; font-family: Palatino, Times, serif">Mr. Stradivarius</span>]]''''' <sup>[[User talk:Mr. Stradivarius|♪ talk ♪]]</sup> 09:34, 20 March 2014 (UTC)
:: Sorry about the wrong placement, and thanks for moving it. [[User_talk:Gnosygnu|gnosygnu]] 03:08, 21 March 2014 (UTC)

{{edit protected|ans=y}}
Please replace the contents of the page with [https://en.wikipedia.org/w/index.php?title=Module:Namespace_detect/sandbox&oldid=600467500 this]. This will cause getParamMappings to only run once per page, rather than once per #invoke, per the above discussion. [[User:Jackmcbarn|Jackmcbarn]] ([[User talk:Jackmcbarn|talk]]) 16:20, 20 March 2014 (UTC)
:Hold on a second - I have some other changes I'd like to make to the code before we put this up live. At the moment we are expanding all the arguments we are passed, for example. It would be better to only expand the ones necessary for the namespace the page is called from. And I'd like to implement Helder's suggestion above as well. &mdash; '''''[[User:Mr. Stradivarius|<span style="color: #194D00; font-family: Palatino, Times, serif">Mr. Stradivarius</span>]]''''' <sup>[[User talk:Mr. Stradivarius|♪ talk ♪]]</sup> 02:32, 21 March 2014 (UTC)
: Thanks for making the proposed changes. I made two changes now:
:* I re-added p.getParamMappings since it is public and other Modules call it. For example, [[Module:Category_handler]] has this: <code>local mappings = nsDetect.getParamMappings()</code>
:* <s>I wasn't sure if the <code>function p.table(frame)</code> was supposed to change. I reverted it back to the current version. Feel free to revert back your version if the change was deliberate.</s> It looks like this was deliberate. My apologies.
:Otherwise, it tested fine in a limited test on my machine.
:FWIW: my initial estimate of 100,000 calls looks like it is closer to 85,000 calls. This means there were roughly 2,833 calls to Module:Namespace_detect (85,000 / (15 * 2)).
:The page still does roughly 27,000 calls because of the matchesBlacklist function. This correlates with the above figure: 27,000 calls / 9 blacklisted terms => 3,000 calls to Module:Category_handler. I don't know if this can be fixed easily, as matchesBlacklist can't be made a static variable and no_cat = false is the default.
:At any rate, I'm hoping this should drop [[List_of_Unicode_characters]] to somewhere between 6 and 8 seconds to render (vs. approximately 30 now). [[User_talk:Gnosygnu|gnosygnu]] 03:08, 21 March 2014 (UTC)
:: You could probably use [[mw:Extension:TemplateSandbox]] to test it. (Also, I'm sure [[mw:Help:Extension:TemplateSandbox|TemplateSandbox's documentation]] could use some love if you're willing to help.) [[User:Anomie|Anomie]][[User talk:Anomie|⚔]] 12:26, 22 March 2014 (UTC)
::: Thanks {{ping|Anomie}} for the link. I didn't know about this extension before.
::: I tried it now<s>, and got mixed results</s>. I'm still getting the same profile render time (25 seconds), but the Lua Profile table is clearly different.
::: [https://en.wikipedia.org/w/index.php?title=List_of_Unicode_characters&action=edit List_of_Unicode_characters]
<pre>
Lua Profile:
    Scribunto_LuaSandboxCallback::getAllExpandedArguments   4540 ms   49.3%
    Scribunto_LuaSandboxCallback::lc                        1660 ms   18.0%
    Scribunto_LuaSandboxCallback::match                     1140 ms   12.4%
    recursiveClone <mw.lua:109>                              620 ms    6.7%
    Scribunto_LuaSandboxCallback::gsub                       360 ms    3.9%
    type                                                     100 ms    1.1%
    (for generator)                                          100 ms    1.1%
    <mw.language.lua:87>                                      80 ms    0.9%
    getParamMappings <Module:Namespace_detect:69>             80 ms    0.9%
    Scribunto_LuaSandboxCallback::loadPackage                 60 ms    0.7%
    [others]                                                 460 ms    5.0%
</pre>
::: [https://en.wikipedia.org/wiki/Special:TemplateSandbox?prefix=User%3AGnosygnu%2Fsandbox&page=List_of_Unicode_characters Sandbox]
<pre>
Lua Profile:
    Scribunto_LuaSandboxCallback::getAllExpandedArguments   3420 ms   57.0%
    Scribunto_LuaSandboxCallback::match                      760 ms   12.7%
    recursiveClone <mw.lua:109>                              540 ms    9.0%
    Scribunto_LuaSandboxCallback::gsub                       320 ms    5.3%
    dataWrapper <mw.lua:698>                                 140 ms    2.3%
    (for generator)                                          140 ms    2.3%
    Scribunto_LuaSandboxCallback::getExpandedArgument        140 ms    2.3%
    type                                                     120 ms    2.0%
    Scribunto_LuaSandboxCallback::lc                          60 ms    1.0%
    <mw.title.lua:50>                                         60 ms    1.0%
    [others]                                                 300 ms    5.0%
</pre>
::: Clearly the sandbox is picking up the new changes (lc falls from 18% to 1%). I'm pretty sure I'm using the correct prefix: User:Gnosygnu/sandbox. If I use an incorrect prefix, such as User:Gnosygnu/sandbox_invalid, I get the same Lua Profile as the current page.
::: <s>I'll play with this some more later, but I just wanted to let you know.</s> After further investigation, I'm beginning to think that the multiple lc Scribunto calls have a more dramatic effect in LuaStandalone than in LuaSandbox, presumably because LuaStandalone serializes all messages back and forth. As such, Special:TemplateSandbox may very well be correct. The new change will reduce the number of lc calls, but won't have any really meaningful effect (maybe 1 second faster). Any performance issues with the current page might be due to regular template expansion. [[User_talk:Gnosygnu|gnosygnu]] 16:47, 22 March 2014 (UTC)

{{ping|Gnosygnu|Jackmcbarn}} I've finished making my changes to [[Module:Namespace detect/sandbox]]:
* I've moved the configuration to a separate page, [[Module:Namespace detect/config]]. This is to try to make the distinction between code and configuration clearer.
* I have also implemented Helder's suggestion: the parameter config values can now be specified either as an array of strings or as just a string.
* All config values have now been made optional.
* The p.main function now uses [[Module:Arguments]], which means that arguments are now only fetched when they are needed. (Before, they were all expanded before being passed to p._main.)
* I've also simplified the p._main code to avoid an unnecessary for loop. As part of this, I have replaced the tail call from p.main to p._main with a retval. This is to make more explicit the fact that on finding no matches at all, the module should return nil for other Lua modules and the blank string for #invoke. If there are any objections to this, we can always revert it back to an implicit return value.
* Finally, I have revamped the p.table code so that the namespaces are now displayed in order, and so that {{para|talk|yes}} actually works.
Let me know what you think of the changes, and if everything looks good we can update the main module. &mdash; '''''[[User:Mr. Stradivarius|<span style="color: #194D00; font-family: Palatino, Times, serif">Mr. Stradivarius</span>]]''''' <sup>[[User talk:Mr. Stradivarius|♪ talk ♪]]</sup> 13:31, 22 March 2014 (UTC)
:{{ping|Mr. Stradivarius}} Thanks for the changes. I think they're fine. I've done a modified test on my machine, as well as with TemplateSandbox (see my comment above). "lc" is no longer a significant portion of execution time. I'm still seeing the same number of calls to "match", but I think the calling templates need to be changed. FWIW, my own tests (using XOWA) show no measurable performance difference with the latest changes. This may be because I'm using LuaStandalone rather than LuaSandbox. (LuaSandbox is not possible in a Windows / Java environment.) There may be other apples-to-oranges issues as well, though I still believe that "lc" is a significant performance cost. [[User_talk:Gnosygnu|gnosygnu]] 15:44, 22 March 2014 (UTC)
::Yes, I expected that my latest changes wouldn't increase performance for most pages. For pages using {{tl|namespace detect}} directly, the switch to [[Module:Arguments]] may provide a significant boost, depending on what wikitext would have otherwise been expanded. But most transclusions of [[Module:Namespace detect]] come through [[Module:Category handler]], and those uses aren't affected by that change. Your original suggestion is definitely the big performance-saver. For further performance savings we might want to consider removing the page and demospace parameters, as that would enable caching of the namespace data with mw.loadData. If we did the same thing with [[Module:Category handler]], we would also be able to cache the blacklist checks. The downside would be that the module test cases would no longer work, but I think that the performance benefits might outweigh that disadvantage. Another performance saving could be made by changing [[Module:Category handler]] to only expand arguments when necessary. That could be done by using [[Module:Arguments]], and by using metatables to pass arguments to [[Module:Namespace detect]] by proxy. &mdash; '''''[[User:Mr. Stradivarius|<span style="color: #194D00; font-family: Palatino, Times, serif">Mr. Stradivarius</span>]]''''' <sup>[[User talk:Mr. Stradivarius|♪ talk ♪]]</sup> 16:33, 22 March 2014 (UTC)
:::Okay. Thanks for the explanation. I'm beginning to think that even the "lc" calls won't make a noticeable difference because of LuaSandbox / LuaStandalone differences (see my comment above). If so, then the real performance problems may not be module-related, though I'm at a loss to suggest what. (Templates?) [[User_talk:Gnosygnu|gnosygnu]] 16:47, 22 March 2014 (UTC)
::::{{ping|Mr. Stradivarius}} I don't like using local variables to avoid table lookups. I think that makes the code more confusing, and I don't think it really helps much (if at all) with performance. Other than that, looks good. [[User:Jackmcbarn|Jackmcbarn]] ([[User talk:Jackmcbarn|talk]]) 00:51, 23 March 2014 (UTC)
:::::If this were a less performance-critical module I would probably agree with you, and I admit that I'm guilty of overusing this technique. However, if we're talking about tens of thousands of calls to a loop every time a page is parsed, I think it would make enough of a difference to be worth doing. I'm basing this on [http://www.lua.org/gems/sample.pdf Lua Performance Tips by Roberto Ierusalimschy], which says that using local variables is 30% faster than using table lookups. Of course, 30% faster than hardly anything is still hardly anything, so it might make sense to change some of those local variables back to table lookups to make the code easier to read. In particular I would guess that using them in the /data page is not important, since that is cached with mw.loadData now. But it would make sense to use local variables for the functions that are getting called multiple times per #invoke. &mdash; '''''[[User:Mr. Stradivarius|<span style="color: #194D00; font-family: Palatino, Times, serif">Mr. Stradivarius</span>]]''''' <sup>[[User talk:Mr. Stradivarius|♪ talk ♪]]</sup> 11:06, 23 March 2014 (UTC)
::::{{ping|Gnosygnu}} The real performance problems are definitely not module-related. When I previewed [[List of Unicode characters]] just now with the old version of [[Module:Namespace detect]], it took 40 seconds to parse, but the Lua time usage was only reported at about 4.3 seconds. So the other 36 seconds must be from something other than Lua. Templates are a very likely candidate - the current post-expand include size is 1944444/2048000 bytes, and I count 207 different templates and modules used - but I have also seen performance issues like this on pages containing lots of images. This would be a good question to ask at [[WP:VPT]], I think. &mdash; '''''[[User:Mr. Stradivarius|<span style="color: #194D00; font-family: Palatino, Times, serif">Mr. Stradivarius</span>]]''''' <sup>[[User talk:Mr. Stradivarius|♪ talk ♪]]</sup> 11:06, 23 March 2014 (UTC)
:::::{{ping|Mr. Stradivarius}} Yeah, sorry about that. I should've checked the profile at the start. On my machine, the lc calls to Scribunto have a much larger impact, but I'm beginning to think that this is because:
:::::* I only partially reconstructed [[List of Unicode characters]] and most of the templates were not brought over.
:::::* I'm using LuaStandalone because I'm on Windows. LuaStandalone serializes all messages back and forth from Lua to PHP. (In contrast, Wikipedia is using LuaSandbox, which hooks PHP directly to Lua.)
:::::* There really are 85,000+ calls to lc, and I saw a significant difference by skipping this section.
:::::In the future, I'll check the parser output more closely. I'll also try to set up a full enwiki environment on a machine here so I can get a better comparison.
:::::Thanks for making the changes. I'm hoping they will still help in some way. [[User_talk:Gnosygnu|gnosygnu]] 21:07, 23 March 2014 (UTC)
::::::Nothing to apologise about - your concerns were perfectly valid, and I want this module to run well on all wikis, not just the English Wikipedia. I've updated the main module now with the sandbox version, so we will now see if our efforts have paid off. :) Let me know if you spot anything strange happening. Best &mdash; '''''[[User:Mr. Stradivarius|<span style="color: #194D00; font-family: Palatino, Times, serif">Mr. Stradivarius</span>]]''''' <sup>[[User talk:Mr. Stradivarius|♪ talk ♪]]</sup> 11:58, 24 March 2014 (UTC)
:::::::Yeah, it looks like barely a second of difference, if even. Oh well. (*sigh*)
:::::::Thanks again for the changes. [[User_talk:Gnosygnu|gnosygnu]] 02:43, 25 March 2014 (UTC)
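
A minimal plain-Lua sketch can reproduce the call-count arithmetic in this thread. It is not the module's code and does not run inside MediaWiki. All the names in it are hypothetical:
* <code>subjectNamespaces</code> stands in for <code>mw.site.subjectNamespaces</code>, using 15 invented non-main namespaces with no aliases.
* <code>countingLower</code> stands in for <code>mw.ustring.lower</code> and also counts its calls.
* <code>newCachedGetter</code> only imitates building the mappings once per page.

In Scribunto, caching across #invoke calls would come from <code>mw.loadData</code>, as suggested above. The sketch just uses a closure.

<syntaxhighlight lang=lua>
-- Plain-Lua sketch (runs outside MediaWiki). Every name here is a
-- hypothetical stand-in, not a Scribunto API.

-- Counts calls to the stand-in lower-casing function.
local lowerCalls = 0

-- Hypothetical stand-in for mw.ustring.lower: lower-cases and counts the call.
local function countingLower(s)
	lowerCalls = lowerCalls + 1
	return string.lower(s)
end

-- Hypothetical stand-in for mw.site.subjectNamespaces: 15 invented
-- non-main namespaces keyed 1..15, each with no aliases.
local subjectNamespaces = {}
for i = 1, 15 do
	subjectNamespaces[i] = { name = 'Ns' .. i, canonicalName = 'Ns' .. i, aliases = {} }
end

-- Same shape as the getParamMappings loop quoted at the top of this section.
local function buildMappings()
	local mappings = {}
	for nsid, ns in pairs(subjectNamespaces) do
		if nsid ~= 0 then -- Exclude main namespace.
			local nsname = countingLower(ns.name)
			local canonicalName = countingLower(ns.canonicalName)
			mappings[nsname] = { nsname }
			if canonicalName ~= nsname then
				table.insert(mappings[nsname], canonicalName)
			end
			for _, alias in ipairs(ns.aliases) do
				table.insert(mappings[nsname], countingLower(alias))
			end
		end
	end
	return mappings
end

-- Returns a getter that builds the mappings on first use and reuses them
-- afterwards, imitating "once per page" instead of "once per #invoke".
local function newCachedGetter()
	local cache
	return function()
		if cache == nil then
			cache = buildMappings()
		end
		return cache
	end
end

-- Calls getMappings `invocations` times and returns how many
-- lower-casing calls that cost.
local function simulate(getMappings, invocations)
	lowerCalls = 0
	for _ = 1, invocations do
		getMappings()
	end
	return lowerCalls
end

-- 86 {{lang}} calls, as on Template:Unicode chart Kannada.
print(simulate(buildMappings, 86))      --> 2580 (86 * 15 * 2)
print(simulate(newCachedGetter(), 86))  --> 30   (15 * 2, built once)
</syntaxhighlight>

Rebuilding the mappings on each of the 86 calls from [[Template:Unicode_chart_Kannada]] costs 86 × 15 × 2 = 2,580 lower-casing calls, while building them once costs 30. Scaled to the roughly 2,833 invocations estimated for the whole page, the uncached version comes to about 85,000 calls (2,833 × 30 = 84,990). The sketch only counts calls and says nothing about per-call cost. As the profiling above showed, on LuaSandbox that reduction saved little wall-clock time.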