Cannot declare class CacheTime, because the name is already in use in CacheTime.php
Open, Unbreak Now! | Public | PRODUCTION ERROR

Description

Error
normalized_message
Cannot declare class CacheTime, because the name is already in use in CacheTime.php
Frame  Location  Call
#0  /srv/mediawiki/php-1.43.0-wmf.27/includes/parser/CacheTime.php(38)  unknown()
#1  /srv/mediawiki/php-1.43.0-wmf.27/includes/AutoLoader.php(226)  require()
#2  [internal function]  AutoLoader::autoload()
#3  [internal function]  spl_autoload_call()
#4  /srv/mediawiki/php-1.43.0-wmf.27/includes/json/JsonCodec.php(79)  class_exists()
#5  /srv/mediawiki/php-1.43.0-wmf.27/includes/parser/ParserCache.php(695)  MediaWiki\Json\JsonCodec->deserialize()
#6  /srv/mediawiki/php-1.43.0-wmf.27/includes/parser/ParserCache.php(295)  ParserCache->restoreFromJson()
#7  /srv/mediawiki/php-1.43.0-wmf.27/includes/parser/ParserCache.php(380)  ParserCache->getMetadata()
#8  /srv/mediawiki/php-1.43.0-wmf.27/includes/page/ParserOutputAccess.php(233)  ParserCache->get()
#9  /srv/mediawiki/php-1.43.0-wmf.27/includes/page/Article.php(726)  MediaWiki\Page\ParserOutputAccess->getCachedParserOutput()
#10  /srv/mediawiki/php-1.43.0-wmf.27/includes/page/Article.php(545)  Article->generateContentOutput()
#11  /srv/mediawiki/php-1.43.0-wmf.27/includes/page/ImagePage.php(155)  Article->view()
#12  /srv/mediawiki/php-1.43.0-wmf.27/includes/actions/ViewAction.php(78)  ImagePage->view()
#13  /srv/mediawiki/php-1.43.0-wmf.27/includes/actions/ActionEntryPoint.php(733)  ViewAction->show()
#14  /srv/mediawiki/php-1.43.0-wmf.27/includes/actions/ActionEntryPoint.php(510)  MediaWiki\Actions\ActionEntryPoint->performAction()
#15  /srv/mediawiki/php-1.43.0-wmf.27/includes/actions/ActionEntryPoint.php(146)  MediaWiki\Actions\ActionEntryPoint->performRequest()
#16  /srv/mediawiki/php-1.43.0-wmf.27/includes/MediaWikiEntryPoint.php(200)  MediaWiki\Actions\ActionEntryPoint->execute()
#17  /srv/mediawiki/php-1.43.0-wmf.27/index.php(58)  MediaWiki\MediaWikiEntryPoint->run()
#18  /srv/mediawiki/w/index.php(3)  require()
Impact/Notes

These errors started accumulating when I rolled the 1.43.0-wmf.28 train to group1 today. Note that the errors come from .27.

I rolled back to group0 in the meantime. Right now we're looking at ~8000 errors in the last 15 minutes, and some page accesses are resulting in HTTP 500 errors.

Event Timeline

dancy triaged this task as Unbreak Now! priority. Wed, Oct 23, 6:10 PM
dancy created this task.

Change #1082540 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/core@master] Autoloader: Use require_once rather than require

https://gerrit.wikimedia.org/r/1082540

Screenshot Capture - 2024-10-23 - 22-12-53.png (510×1 px, 54 KB)

The distribution of errors suggests that this is triggered by loading from cache somehow (and after rollback we are now left with cached objects with a now-unknown namespace) but a simple manual test works fine:

tgr@mwmaint2002:~$ mwscript shell.php enwiki
> $ct = new CacheTime
= CacheTime {#6125}
> var_export(serialize($ct))
'O:9:"CacheTime":4:{s:20:"' . "\0" . '*' . "\0" . 'mParseUsedOptions";a:0:{}s:13:"' . "\0" . '*' . "\0" . 'mCacheTime";s:0:"";s:15:"' . "\0" . '*' . "\0" . 'mCacheExpiry";N;s:19:"' . "\0" . '*' . "\0" . 'mCacheRevisionId";N;}'⏎

tgr@mwmaint2002:~$ mwscript shell.php testwiki
> unserialize('O:9:"CacheTime":4:{s:20:"' . "\0" . '*' . "\0" . 'mParseUsedOptions";a:0:{}s:13:"' . "\0" . '*' . "\0" . 'mCacheTime";s:0:"";s:15:"' . "\0" . '*' . "\0" . 'mCacheExpiry";N;s:19:"' . "\0" . '*' . "\0" . 'mCacheRevisionId";N;}')
= MediaWiki\Parser\CacheTime {#6345}

(Or does the parser cache use JSON serialization?)

The parser cache should use JSON serialization exclusively, and (as far as I know) none of the cache entries should actually mention the CacheTime class, since it is a parent of ParserOutput, not directly the name of the serialized class.

Further, JsonCodec *ought* to be tolerant of class aliases: T353883.

But obviously something isn't working quite as designed. I wish I knew what $class name is triggering the error, but we're getting the "cannot declare CacheTime" exception thrown before JsonCodec could throw its "Invalid target class {$class}" message. This can't be ParserOutput triggering the issue, however, because otherwise it would be taking the first if branch, not the elseif. Let me look more closely at what types are named in the JSON-serialized form of the parser output. Sometimes this is triggered by something added by getExtensionData by an extension running on a particular wiki -- are all the instances of this error on commons?

Oh, I see: ParserCache is calling JsonCodec::deserialize with CacheTime::class as the 'expected class'. It's not *actually* the expected class, the result will be a ParserOutput which is an instanceof CacheTime. Anyway, that should be relatively harmless.

I got confused a bit because the stack trace here is from the *rollback* version, i.e. the code *before* 3bc172d0e4e8048a415b6992af3b6db84929cc02, which is also the patch that fixed T353883: JsonCodec should be robust against class aliases. So the rollback version is known to be less tolerant of class aliases.

But neither of those is the actual root cause. On the rollback we're calling class_exists('CacheTime') in wmf.27, and there's no reason why that should fail unless the opcache is already 'corrupted' with the existence of MediaWiki\Parser\CacheTime and its class_alias, and tries to reload an already-loaded class.

And I think the same is true of the roll-forward to wmf.28, which isn't going to have the same failure stack trace but will probably transiently try to class_exists('MediaWiki\Parser\CacheTime') when CacheTime has already been loaded. In that case it might (transiently) fail the assertion at the beginning of deserialize, when it tests whether is_subclass_of('MediaWiki\Parser\CacheTime', JsonDeserializable::class) because (transiently) the namespaced class won't yet exist so it will try to reload it, then fail on the class_alias to CacheTime because that class *has* been loaded.

I'm still not 100% sure why we're still seeing failures after rollback to wmf.27, unless those opcaches haven't been cleared and still have MediaWiki\Parser\CacheTime loaded in them. Isn't there a maintenance script or some such to force-clear the opcache? If so, that should make the continued errors go away -- and it should also clear the errors after roll-forward to wmf.28.

Change #1082565 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/core@wmf/1.43.0-wmf.27] CacheTme: Add forward namespaced alias

https://gerrit.wikimedia.org/r/1082565

Summary based on a slack discussion between @Reedy, @cscott , and me.

Here is the sequence:

  1. Attempt to roll out wmf28 to group1.
  2. While wmf28 is on canaries, parser cache got updated to wmf28 serialization.
  3. ParserCache is shared - so wmf27 servers now started failing trying to load the wmf28 entries
  4. wmf28 rollout was aborted.
  5. But, existing wmf28 cache entries continued to cause failures
  6. Because wmf27 continued to fatal when accessing wmf28 cache entries, those entries would never refresh and the errors continued as long as those cache entries were hit.

https://logstash.wikimedia.org/goto/d2fa0f41334d98c4acc543bb4c9bd026 shows that this also happened with the group0 rollout, but the incident volume in step 2 was much smaller, so the group0 rollout went through and the errors dissipated.

As for why wmf27 code couldn't access the wmf28 entries, that is because wmf28 included namespacing of "CacheTime" to "MediaWiki\Parser\CacheTime" and the ParserCache JSON serialization includes the name of the class in a type property to make deserialization self-descriptive. However, wmf27 does not know about the namespaced class name. The autoloader doesn't find an explicit entry for "MediaWiki\Parser\CacheTime" so it must fall back to the directory rule, which says that MediaWiki\Parser is found in includes/parser -- and sure enough, includes/parser/CacheTime.php exists there, so it gets (re)loaded. But, that file has already been loaded as CacheTime in wmf27 and the code fatals.
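
To make the fallback concrete, here is a minimal sketch of the lookup order described above. It is not MediaWiki's real AutoLoader: resolveFile(), $classMap and $psr4 are hypothetical names, and the real autoloader also checks that the file exists before loading it. The point is only that the old and the new class name resolve to the same file on wmf.27.

```php
<?php
// Hypothetical, simplified resolver. NOT the real MediaWiki AutoLoader.

// Class map as wmf.27 would see it: only the un-namespaced name is listed.
$classMap = [
    'CacheTime' => 'includes/parser/CacheTime.php',
];

// PSR-4 style directory rule: namespace prefix => directory.
$psr4 = [
    'MediaWiki\\Parser\\' => 'includes/parser/',
];

function resolveFile(string $class, array $classMap, array $psr4): ?string {
    // 1. An explicit class map entry wins.
    if (isset($classMap[$class])) {
        return $classMap[$class];
    }
    // 2. Otherwise fall back to the directory rule.
    foreach ($psr4 as $prefix => $dir) {
        if (strpos($class, $prefix) === 0) {
            $relative = substr($class, strlen($prefix));
            return $dir . str_replace('\\', '/', $relative) . '.php';
        }
    }
    return null;
}

$old = resolveFile('CacheTime', $classMap, $psr4);
$new = resolveFile('MediaWiki\\Parser\\CacheTime', $classMap, $psr4);

// Both names point at the same file, so a plain `require` of $new
// would run a file that has already declared CacheTime.
var_dump($old === $new);
```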

Possible solutions: (a) require_once patch by Reedy above (b) Adding a forward-namespaced alias patch by Reedy above. (c) A CacheTime patch similar to this one for ParserOutput.
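
For option (b), a rough illustration of what a forward alias buys us, using PHP's class_alias(). DemoCacheTime and Demo\Parser\DemoCacheTime are made-up stand-ins, not the contents of the actual patch: once the alias is registered, a lookup of the future namespaced name succeeds without the autoloader ever touching the file again.

```php
<?php
// Stand-in for the wmf.27 file, which declares the class without a namespace.
class DemoCacheTime {
    public $mCacheTime = '';
}

// Forward alias: the future namespaced name now refers to the existing class.
class_alias('DemoCacheTime', 'Demo\\Parser\\DemoCacheTime');

// Second argument false: do not invoke the autoloader; the alias alone is enough.
$known = class_exists('Demo\\Parser\\DemoCacheTime', false);

// Instantiating through the alias yields an instance of the original class.
$obj = new \Demo\Parser\DemoCacheTime();

var_dump($known, $obj instanceof DemoCacheTime, get_class($obj));
```

get_class() reports the original name here, so anything wmf.27 writes back to the cache would presumably still use the old class name.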

Some additional details:

  • There is a try/catch in ParserCache::restoreFromJson() which was added after a similar issue arose with ParserOutput namespacing T353835: PHP Fatal Error from line 51 of /srv/mediawiki/php-1.42.0-wmf.9/includes/parser/ParserOutput.php: Cannot declare class ParserOutput, because the name is already in use, which /should/ convert the error into a cache miss, allowing the older version to rewrite the cache entry and thus "fix" it. However, this patch didn't work in practice because the "class redefinition" was a PHP E_FATAL which is not catchable. If the attempt to look up MediaWiki\Parser\CacheTime had "just" failed by returning false from class_exists, the JSON codec would have thrown a JsonException and the try/catch would have worked as designed, making this a transient failure instead of a persistent one which recurred every time the bad entry was touched by wmf.27.
  • Unlike the previous time (T353835) the failure occurred in ParserCache->getMetadata(), which is a serialized CacheTime object not a ParserOutput object. We do include serialization tests of CacheTime objects (see tests/phpunit/data/ParserCache/1.43_wmf.11-CacheTime-usedOptions.json for example) but we didn't follow the forward-compatibility process described in https://www.mediawiki.org/wiki/Manual:Parser_cache/Serialization_compatibility because I forgot that CacheTime objects were stored directly; at the time I was reviewing the CacheTime namespacing patch (https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1004161) I flagged CacheTime as "risky" but thought that the only class explicitly named in serialized ParserCache objects was ParserOutput. I had forgotten about the metadata cache which is colocated with the ParserOutput cache.
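
A sketch of the "works as designed" path from the first bullet, under simplifying assumptions: DemoJsonException, demoDeserialize(), demoRestoreFromCache() and the 'type' key are hypothetical stand-ins for JsonException, JsonCodec and ParserCache::restoreFromJson(). When class_exists() merely returns false, the failure is an ordinary exception and can be turned into a cache miss; a fatal raised while loading the class file never reaches the catch block.

```php
<?php
// Hypothetical stand-ins for the real JsonCodec / ParserCache logic.
class DemoJsonException extends Exception {}

function demoDeserialize(array $entry) {
    // The 'type' key is an assumption made for this illustration.
    $class = $entry['type'] ?? '';
    if (!class_exists($class)) {
        // Catchable: an unknown class is reported as an exception.
        throw new DemoJsonException("Invalid target class {$class}");
    }
    return new $class();
}

function demoRestoreFromCache(array $entry) {
    try {
        return demoDeserialize($entry);
    } catch (DemoJsonException $e) {
        // Treat as a cache miss; the caller can re-parse and overwrite the entry.
        return false;
    }
}

// An entry written by "newer" code, naming a class this code does not know.
$result = demoRestoreFromCache(['type' => 'Demo\\Missing\\ClassName']);
var_dump($result);
```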

Action items:

Change #1082565 merged by jenkins-bot:

[mediawiki/core@wmf/1.43.0-wmf.27] CacheTme: Add forward namespaced alias

https://gerrit.wikimedia.org/r/1082565

Mentioned in SAL (#wikimedia-operations) [2024-10-23T23:39:30Z] <reedy@deploy2002> Started scap sync-world: T378006

Mentioned in SAL (#wikimedia-operations) [2024-10-23T23:46:39Z] <reedy@deploy2002> Finished scap sync-world: T378006 (duration: 07m 09s)

Change #1082540 merged by jenkins-bot:

[mediawiki/core@master] AutoLoader: Use require_once rather than require

https://gerrit.wikimedia.org/r/1082540

Change #1082585 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/core@wmf/1.43.0-wmf.28] AutoLoader: Use require_once rather than require

https://gerrit.wikimedia.org/r/1082585

Change #1082586 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/core@wmf/1.43.0-wmf.27] AutoLoader: Use require_once rather than require

https://gerrit.wikimedia.org/r/1082586

Change #1082587 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/core@REL1_43] AutoLoader: Use require_once rather than require

https://gerrit.wikimedia.org/r/1082587

Change #1082588 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/core@REL1_42] AutoLoader: Use require_once rather than require

https://gerrit.wikimedia.org/r/1082588

Change #1082589 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/core@REL1_41] AutoLoader: Use require_once rather than require

https://gerrit.wikimedia.org/r/1082589

Change #1082590 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/core@REL1_39] AutoLoader: Use require_once rather than require

https://gerrit.wikimedia.org/r/1082590

I'll probably +2 and deploy the .27 and .28 patches post having had some sleep, before the train is due to run...

Change #1082590 merged by jenkins-bot:

[mediawiki/core@REL1_39] AutoLoader: Use require_once rather than require

https://gerrit.wikimedia.org/r/1082590

Change #1082589 merged by jenkins-bot:

[mediawiki/core@REL1_41] AutoLoader: Use require_once rather than require

https://gerrit.wikimedia.org/r/1082589

Change #1082588 merged by jenkins-bot:

[mediawiki/core@REL1_42] AutoLoader: Use require_once rather than require

https://gerrit.wikimedia.org/r/1082588

Change #1082587 merged by jenkins-bot:

[mediawiki/core@REL1_43] AutoLoader: Use require_once rather than require

https://gerrit.wikimedia.org/r/1082587

Nikerabbit added subscribers: abi_, Nikerabbit.

We had this on translatewiki.net too but unfortunately didn't report it. CC @abi_ .

From the code review by @tstarling at https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1082540/comments/38fed3bd_9e70f6b6 :

There used to be a stat call in require_once to resolve symlinks, which made it a lot slower than require. It was probably fixed a long time ago, but it would be prudent to do a benchmark.

For parser cache hits on my local test wiki, I was able to measure the instruction count with

perf stat -e instructions -p "$(pidof -d, php-fpm7.4)" &  sleep 0.4; ab -H'Cookie: PHP_ENGINE=7.4' -n100 https://mw.internal/index.php/Main_Page ; kill -INT %1 ; sleep 0.1 ; echo

After I reconfigured PHP-FPM to use a single static worker, the results became very stable indeed and show a change of 0.01% which is less than the error bars.

Individual runs:

old

       19962029384      instructions                                                          
       19960579927      instructions                                                          
       19964258854      instructions                                                          

new

       19962931316      instructions                                                          
       19967287719      instructions                                                          
       19963595292      instructions

strace showed that the number of stat calls was not significantly different.

I love how a concern about a potential performance regression gets raised and is then proved not to be a concern.

I am happy to deploy it ahead of the train, my guess is to do wmf.28 first (it is only on group0) and then follow with wmf.27 (rest of wikis)?

Possible solutions: (a) require_once patch by Reedy above

Wouldn't that just replace the duplicate declaration error with a class not found error? 1.27 still would not know what to do with MediaWiki\Parser\CacheTime (and, if your analysis is correct, 1.28 never had a problem to begin with).

OTOH the patch will cause HookContainer::mayBeCallable() to return an incorrect result when called with the same class (with an unloadable interface) multiple times; the patch had to change the tests to avoid that scenario. The documentation of that method claims that is a legitimate use case that might happen in production.

Possible solutions: (a) require_once patch by Reedy above

Wouldn't that just replace the duplicate declaration error with a class not found error? 1.27 still would not know what to do with MediaWiki\Parser\CacheTime (and, if your analysis is correct, 1.28 never had a problem to begin with).

Yes, but a class not found error would get handled properly by the existing try/catch in ParserCache::restoreFromJson(): the error is being triggered by a class_exists call that would simply return false and then throw a JsonException, which would then be caught by the try/catch and handled as a miss. (As opposed to the current situation, where the class redefinition results in an uncatchable E_FATAL.) The result would be a log ("unexpected failure during cache load") and a cache miss. Crucially, however, because it is a cache miss, 1.27 will update the cache entry, and the new entry will have CacheTime, not MediaWiki\Parser\CacheTime, and so future hits of this page will neither error nor log.
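
A small self-contained demonstration of that behaviour, assuming the autoloader does nothing more than swap require for require_once. The class names and the temp file are invented for the example. The second lookup hits a file that is already included, so require_once is a no-op and class_exists() quietly returns false instead of the file being re-run.

```php
<?php
// Write a throwaway class file that declares only the un-namespaced class.
$file = sys_get_temp_dir() . '/DemoOnceCacheTime_' . uniqid() . '.php';
file_put_contents($file, "<?php\nclass DemoOnceCacheTime {}\n");

// Hypothetical autoloader: both the old and the new name map to the same file,
// mimicking the class map entry plus the directory-rule fallback.
spl_autoload_register(static function (string $class) use ($file): void {
    $known = ['DemoOnceCacheTime', 'Demo\\Once\\DemoOnceCacheTime'];
    if (in_array($class, $known, true)) {
        // With plain `require`, the second load would redeclare the class (fatal).
        require_once $file;
    }
});

$oldName = class_exists('DemoOnceCacheTime');             // first load declares the class
$newName = class_exists('Demo\\Once\\DemoOnceCacheTime'); // file already included, nothing declared

unlink($file);
var_dump($oldName, $newName);
```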

The class_alias patch is 'better' but require_once will also prevent the crashes, at the cost of a temporarily lower cache hit rate. Either one should be sufficient to allow the transition to 1.28 and future rollbacks to 1.27 if other issues arise.

I don't really have any insight into the require_once vs require issue. As a temporary patch either should be sufficient to get us through the train deploy. If we were to make require_once permanent then I'd defer to y'all on whether or not that's a good idea.

@daniel wrote the HookContainer logic that seems to not work with require so maybe he has thoughts. I don't really understand why the tests failed, it just didn't seem specific to them being tests - the require -> require_once seemed to affect how the HookContainer methods behave when called with the same non-existent classname for the second time.

@hashar posted asking about the backports of the require_once patches to wmf.27/wmf.28:

Good morning, what an exciting night. I am happy to deploy the backported patches during the day and ahead of this evening MediaWiki train. But I would rather have a +1 from someone knowing a bit more about the issue, I lack self confidence to just do it ™.

There was a potential CPU usage concern raised by Tim but he benchmarked the diff and there is nothing significant showing up.

I responded:

Those are the require_once patches. Neither should be /required/ now that https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1082565 has landed on wmf.27. As far as we can tell, wmf.28 doesn't have any issue either itself or with back-compat with wmf.27, the issue was caused by forward-compatibility from wmf.27 to cache entries created by wmf.28 when it was on the test servers that were persistent after rollback.
Since the require_once isn't needed in wmf.27 (or wmf.28) I don't think we should backport them (IMO). The require_once patch is merged on master, so we can let it live on beta for a while and ride the train next week and handle any issues in a less time-constrained manner.

And @ssastry agreed:

Ya @Tgr has some questions about potential edge case breakage with the require_once patch .. so, we should get that answered and revert it on master if necessary before next week's train.

@hashar asked me to cross-post that here, and I'll make a comment on the two backport patches as well suggesting that they be abandoned (but I'll leave that decision to their author, @Reedy). We'll wait for some input from @daniel maybe w/r/t whether require_once should be reverted on master.

In T378006#10256711, @ssastry wrote (emphasis added by me):

As for why wmf27 code couldn't access the wmf28 entries, that is because wmf28 included namespacing of "CacheTime" to "MediaWiki\Parser\CacheTime" and the ParserCache JSON serialization includes the name of the class in a type property to make deserialization self-descriptive. However, wmf27 does not know about the namespaced class name. The autoloader doesn't find an explicit entry for "MediaWiki\Parser\CacheTime" so it must fall back to the directory rule, which says that MediaWiki\Parser is found in includes/parser -- and sure enough, includes/parser/CacheTime.php exists there, so it gets (re)loaded. But, that file has already been loaded as CacheTime in wmf27 and the code fatals.

I think this error would not have happened if either of the following had been true:

  • JsonCodec had checked a list of allowed classes (similar to the option that unserialize() provides) before trying to load the class. The new class name would not have been in the list, so the file would have not been loaded a second time for the new class name.
  • The list of PSR-4 namespaces had actually been removed from AutoLoader. c04cb6f6071073a9 added Core's namespaced class names to the class map in autoload.php, though did not remove the namespace list. Had that list not remained, AutoLoader would not have attempted to load the file a second time, as the class would not have been listed in the class map under its new name.
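
The first bullet could look roughly like this. demoIsLoadable() and the allow list are hypothetical and are not an existing JsonCodec API; the idea is borrowed from the allowed_classes option of unserialize(). The recording autoloader is only there to show that a rejected name never reaches autoloading.

```php
<?php
// Hypothetical helper: consult an allow list BEFORE any autoloading can happen.
function demoIsLoadable(string $class, array $allowed): bool {
    if (!in_array($class, $allowed, true)) {
        return false; // rejected without calling class_exists(), so no autoload
    }
    return class_exists($class);
}

// Record every class name the autoloader is asked for.
$autoloadCalls = [];
spl_autoload_register(static function (string $class) use (&$autoloadCalls): void {
    $autoloadCalls[] = $class;
});

$allowed = ['stdClass'];

$listed = demoIsLoadable('stdClass', $allowed);
$unlisted = demoIsLoadable('Demo\\Parser\\Unlisted', $allowed);

// $autoloadCalls stays empty: the unlisted name was never handed to the autoloader.
var_dump($listed, $unlisted, $autoloadCalls);
```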