
WMF-JobQueue · Component
Active · Public

Details

Description

The infrastructure used by Wikimedia Foundation for storage and execution of the MediaWiki job queue.

As of July 2018, the MediaWiki JobQueue infrastructure (at WMF) in a nutshell:

  • Jobs are submitted from MediaWiki web servers to Kafka using EventBus.
  • Jobs are scheduled using ChangeProp.
  • Jobs are executed via the rpc/RunSingleJob endpoint in wmf-config, on a dedicated "jobrunner" pool of MediaWiki app servers.
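To make the three stages above concrete, here is a toy in-process model in Python. It is a sketch under stated assumptions, not how the production services are built: a Python list stands in for a Kafka topic, and the functions submit, schedule and run_single_job are hypothetical stand-ins for the roles played by EventBus, ChangeProp and the RunSingleJob endpoint. The real components are separate services that communicate over the network.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    job_type: str
    params: dict = field(default_factory=dict)


class ToyTopic:
    """Stand-in for a Kafka topic: an append-only log read by offset."""

    def __init__(self):
        self.log = []

    def produce(self, job):
        self.log.append(job)
        return len(self.log) - 1  # offset assigned to the new message


def submit(topic, job):
    """Web-server side (EventBus's role; hypothetical): publish the job."""
    return topic.produce(job)


def run_single_job(job):
    """Job-runner side (RunSingleJob's role; hypothetical): run one job."""
    return f"ran {job.job_type}"


def schedule(topic, committed, runner):
    """Scheduler side (ChangeProp's role; hypothetical): read every message
    after the committed offset, dispatch each to the runner, and return the
    new committed offset together with the runner's results."""
    results = [runner(job) for job in topic.log[committed:]]
    return len(topic.log), results


topic = ToyTopic()
submit(topic, Job("refreshLinks"))
submit(topic, Job("htmlCacheUpdate"))
committed, out = schedule(topic, 0, run_single_job)
print(committed, out)  # 2 ['ran refreshLinks', 'ran htmlCacheUpdate']
```

Keeping the committed offset separate from the log is what lets a backlog exist at all: producers keep appending regardless of how far the scheduler has read.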

Workboard columns:

See also:

Recent Activity

Today

TheDJ closed T368364: Transcodes of audio-only samples are not running for new uploads as Resolved.

As far as I'm aware, this was fixed.

Tue, Jan 7, 12:43 PM · WMF-JobQueue, Regression, TimedMediaHandler-Transcode

Thu, Dec 19

lmata added a comment to T359472: Migrate MediaWiki.jobqueue to statslib.

Note: This work is expected to be done by April. Should be minor change; see other linked patches for examples.

Thu, Dec 19, 4:42 PM · Essential-Work, MW-Interfaces-Team, WMF-JobQueue, MediaWiki-Engineering, Observability-Metrics

Wed, Dec 18

HCoplin-WMF moved T359472: Migrate MediaWiki.jobqueue to statslib from Backlog (Triaged and Ready) to Next Up on the MW-Interfaces-Team board.
Wed, Dec 18, 6:37 PM · Essential-Work, MW-Interfaces-Team, WMF-JobQueue, MediaWiki-Engineering, Observability-Metrics
HCoplin-WMF added a project to T359472: Migrate MediaWiki.jobqueue to statslib: Essential-Work.
Wed, Dec 18, 2:53 PM · Essential-Work, MW-Interfaces-Team, WMF-JobQueue, MediaWiki-Engineering, Observability-Metrics
HCoplin-WMF added a comment to T359472: Migrate MediaWiki.jobqueue to statslib.

Note: This work is expected to be done by April.

Wed, Dec 18, 2:52 PM · Essential-Work, MW-Interfaces-Team, WMF-JobQueue, MediaWiki-Engineering, Observability-Metrics

Wed, Dec 11

Aklapper moved T380543: WMF-JobQueue Phabricator project description is out of date from Incoming to Projects to change / set on the Project-Admins board.
Wed, Dec 11, 8:19 PM · Project-Admins, MW-Interfaces-Team, Documentation, WMF-JobQueue
Aklapper renamed T380543: WMF-JobQueue Phabricator project description is out of date from WMF-JobQueue project description is out of date to WMF-JobQueue Phabricator project description is out of date.
Wed, Dec 11, 8:19 PM · Project-Admins, MW-Interfaces-Team, Documentation, WMF-JobQueue
Aklapper edited projects for T380543: WMF-JobQueue Phabricator project description is out of date, added: Project-Admins; removed Phabricator.
Wed, Dec 11, 8:19 PM · Project-Admins, MW-Interfaces-Team, Documentation, WMF-JobQueue

Nov 29 2024

akosiaris lowered the priority of T380544: Temporarily run more refreshLinks jobs on Commons from High to Low.

I'll switch it to Low; we can keep monitoring over the next few weeks and see how this pans out.

Nov 29 2024, 2:31 PM · Commons, serviceops, WMF-JobQueue
akosiaris added a comment to T380544: Temporarily run more refreshLinks jobs on Commons.

@Nikki thanks for the explanation. As you note yourself, speed-wise, there are many ups and downs.

Nov 29 2024, 2:28 PM · Commons, serviceops, WMF-JobQueue

Nov 24 2024

mdaniels5757 added a comment to T380544: Temporarily run more refreshLinks jobs on Commons.

Just processed those edit requests.

Nov 24 2024, 7:19 PM · Commons, serviceops, WMF-JobQueue

Nov 23 2024

Pppery edited projects for T175146: JobQueue: Unify JobRunner entry points, added: Patch-Needs-Improvement; removed Patch-For-Review.
Nov 23 2024, 9:33 PM · Patch-Needs-Improvement, Security, MW-Interfaces-Team, Platform Team Workboards (Initiatives), WMF-JobQueue, TechCom-RFC (TechCom-RFC-Closed), MediaWiki-Core-JobQueue, MediaWiki-Configuration
LucasWerkmeister added a comment to T380544: Temporarily run more refreshLinks jobs on Commons.

For the record, I just found another set of templates we want to update on most of the affected files (Cc-by(-sa)-layout should bypass the SDC_statement_exist template) and filed edit requests for them.

Nov 23 2024, 2:38 PM · Commons, serviceops, WMF-JobQueue

Nov 22 2024

Nikki added a comment to T380544: Temporarily run more refreshLinks jobs on Commons.

@LucasWerkmeister how was the 10-year estimate calculated?

Nov 22 2024, 1:24 PM · Commons, serviceops, WMF-JobQueue
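The thread does not say how the 10-year figure was derived. One common back-of-the-envelope method, shown here only as an illustration and not necessarily the one used, divides the backlog by the net drain rate: T = B / (μ − λ), where B is the backlog, μ the processing rate and λ the arrival rate. The numbers below are hypothetical, not measurements from this task.

```python
import math

SECONDS_PER_DAY = 86_400


def drain_time_seconds(backlog, processed_per_s, arriving_per_s):
    """Time for a backlog to clear if both rates stay constant.

    The backlog shrinks by (processed - arriving) jobs per second, so the
    estimate is backlog / (processed - arriving). If jobs arrive at least as
    fast as they are processed, the backlog never clears: return infinity.
    """
    net = processed_per_s - arriving_per_s
    if net <= 0:
        return math.inf
    return backlog / net


# Hypothetical numbers, chosen only to illustrate the formula.
t = drain_time_seconds(backlog=1_000_000, processed_per_s=12, arriving_per_s=10)
print(f"{t:.0f} s = {t / SECONDS_PER_DAY:.1f} days")  # 500000 s = 5.8 days
```

Because the denominator is a small difference between two larger, fluctuating rates, such estimates swing widely: a modest drop in throughput can turn a days-long estimate into years.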
akosiaris added a comment to T380544: Temporarily run more refreshLinks jobs on Commons.

This type of Kafka consumer lag for that job isn't unheard of. In fact, just recently we had much higher consumer lag for Commons specifically.

Nov 22 2024, 9:51 AM · Commons, serviceops, WMF-JobQueue

Nov 21 2024

Scott_French added a comment to T380544: Temporarily run more refreshLinks jobs on Commons.

Indeed, it looks like the refreshLinks_partitioner rule is easily keeping up with the "upstream" rate of new jobs [0] but the "real" refreshLinks rule on partition 3 (commons) has a rather deep backlog.

Nov 21 2024, 11:35 PM · Commons, serviceops, WMF-JobQueue
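Consumer lag, as discussed in the comments above, is conventionally measured per partition as the log-end offset minus the consumer group's committed offset. The minimal Python sketch below uses made-up offsets; in practice these numbers come from Kafka's own tooling or metrics. Treating partition 3 as Commons follows the note that the refreshLinks rule for Commons reads partition 3; the other partitions are placeholders. It shows how a partitioner can keep up overall while one partition builds a deep backlog.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition consumer lag: log-end offset minus committed offset,
    i.e. messages already written that the consumer group has not processed."""
    return {
        partition: end_offsets[partition] - committed_offsets[partition]
        for partition in end_offsets
    }


# Hypothetical offsets; partition 3 stands in for Commons.
end = {0: 1_200, 1: 3_400, 2: 800, 3: 98_000}
committed = {0: 1_195, 1: 3_390, 2: 800, 3: 41_000}
lag = consumer_lag(end, committed)
deepest = max(lag, key=lag.get)
print(lag, deepest)  # {0: 5, 1: 10, 2: 0, 3: 57000} 3
```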
LucasWerkmeister added a comment to T380544: Temporarily run more refreshLinks jobs on Commons.

That’s great, thank you!

Nov 21 2024, 11:12 PM · Commons, serviceops, WMF-JobQueue
LucasWerkmeister updated the task description for T380544: Temporarily run more refreshLinks jobs on Commons.
Nov 21 2024, 11:12 PM · Commons, serviceops, WMF-JobQueue
Platonides added a comment to T380544: Temporarily run more refreshLinks jobs on Commons.

I have processed those edit requests, so at least when a page contains multiple license templates, it will only need to be reparsed once.

Nov 21 2024, 11:11 PM · Commons, serviceops, WMF-JobQueue
Krinkle edited Description on WMF-JobQueue.
Nov 21 2024, 10:34 PM
Krinkle updated the task description for T380543: WMF-JobQueue Phabricator project description is out of date.
Nov 21 2024, 10:34 PM · Project-Admins, MW-Interfaces-Team, Documentation, WMF-JobQueue
Krinkle edited projects for T380543: WMF-JobQueue Phabricator project description is out of date, added: MW-Interfaces-Team; removed MediaWiki-Platform-Team.
Nov 21 2024, 10:32 PM · Project-Admins, MW-Interfaces-Team, Documentation, WMF-JobQueue
LucasWerkmeister added a comment to T380544: Temporarily run more refreshLinks jobs on Commons.

Currently, the number of links to Template:SDC statement has value (as counted by search – this lags somewhat behind the “real” number, as it depends on a further job, but it should be a decent approximation)

Nov 21 2024, 10:30 PM · Commons, serviceops, WMF-JobQueue
LucasWerkmeister added a parent task for T380544: Temporarily run more refreshLinks jobs on Commons: T343131: Commons database is growing way too fast.
Nov 21 2024, 9:51 PM · Commons, serviceops, WMF-JobQueue
AntiCompositeNumber added a project to T380544: Temporarily run more refreshLinks jobs on Commons: Commons.
Nov 21 2024, 9:45 PM · Commons, serviceops, WMF-JobQueue
LucasWerkmeister triaged T380544: Temporarily run more refreshLinks jobs on Commons as High priority.

Per IRC discussion, marking as High priority. @AntiCompositeNumber reports that this results in category changes being slow to propagate (Category:Johann Baptist Hops not showing up in Category:Hops (surname) yet).

Nov 21 2024, 9:43 PM · Commons, serviceops, WMF-JobQueue
LucasWerkmeister created T380544: Temporarily run more refreshLinks jobs on Commons.
Nov 21 2024, 9:40 PM · Commons, serviceops, WMF-JobQueue
Reedy renamed T380543: WMF-JobQueue Phabricator project description is out of date from WMF-JobQueue description is out of date to WMF-JobQueue project description is out of date.
Nov 21 2024, 9:35 PM · Project-Admins, MW-Interfaces-Team, Documentation, WMF-JobQueue
Reedy created T380543: WMF-JobQueue Phabricator project description is out of date.
Nov 21 2024, 9:35 PM · Project-Admins, MW-Interfaces-Team, Documentation, WMF-JobQueue
Scott_French closed T378385: Spike in JobQueue job backlog time (500ms -> 4-8 minutes) as Resolved.

Monitoring for sustained latency impact on low-traffic jobs is now live.

Nov 21 2024, 6:45 PM · FlaggedRevs, serviceops, WMF-JobQueue
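The comment does not describe how the deployed monitoring decides that latency impact is "sustained". As a hypothetical Python sketch, one simple rule alerts only when a metric stays above a threshold for several consecutive samples, so isolated spikes are ignored. The threshold, window length and sample values are all assumptions; the 4-8 minute figures echo this task's title.

```python
def sustained_breach(samples_ms, threshold_ms, min_consecutive):
    """Return True if at least `min_consecutive` consecutive samples exceed
    `threshold_ms`. Runs shorter than that are treated as transient spikes."""
    run = 0
    for value in samples_ms:
        run = run + 1 if value > threshold_ms else 0  # reset on a good sample
        if run >= min_consecutive:
            return True
    return False


# Hypothetical per-minute backlog-time samples in milliseconds.
spike = [500, 600, 240_000, 500, 500]
sustained = [500, 240_000, 300_000, 480_000, 500]
print(sustained_breach(spike, 60_000, 3))      # False
print(sustained_breach(sustained, 60_000, 3))  # True
```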

Nov 13 2024

Scott_French claimed T378385: Spike in JobQueue job backlog time (500ms -> 4-8 minutes).

Since the three job types critical to uploads have now been moved to dedicated consumers (T379035), the primary follow-up here is monitoring (T378609) to detect when these kinds of isolation failures occur so that we can reactively isolate the "antagonist" job.

Nov 13 2024, 8:02 PM · FlaggedRevs, serviceops, WMF-JobQueue

Nov 12 2024

Samwalton9-WMF moved T379476: Page deletion queued via Nuke is sometimes very slow to complete from Backlog to Bugs on the MediaWiki-extensions-Nuke board.
Nov 12 2024, 1:35 PM · WMF-JobQueue, Moderator-Tools-Team, MediaWiki-extensions-Nuke
Samwalton9-WMF moved T379476: Page deletion queued via Nuke is sometimes very slow to complete from Inbox to Triaged on the Moderator-Tools-Team board.
Nov 12 2024, 1:14 PM · WMF-JobQueue, Moderator-Tools-Team, MediaWiki-extensions-Nuke

Nov 11 2024

jijiki placed T377512: runJobs.log isn't being written to up for grabs.
Nov 11 2024, 4:07 PM · WMF-JobQueue, MW-on-K8s
jijiki moved T378385: Spike in JobQueue job backlog time (500ms -> 4-8 minutes) from Incoming 🐫 to Production Errors 🚜 on the serviceops board.
Nov 11 2024, 1:11 PM · FlaggedRevs, serviceops, WMF-JobQueue
jijiki triaged T378385: Spike in JobQueue job backlog time (500ms -> 4-8 minutes) as High priority.
Nov 11 2024, 1:07 PM · FlaggedRevs, serviceops, WMF-JobQueue
Scott_French added a comment to T378385: Spike in JobQueue job backlog time (500ms -> 4-8 minutes).

Thanks for flagging, all. Yes, this looks like another isolation failure on the low-traffic consumer, and appears to have largely self-resolved as of ~ 14:50 UTC on the 10th. I'll follow up on T379462 for this particular instance, and aim to prioritize T379035 when I'm back this week.

Nov 11 2024, 1:51 AM · FlaggedRevs, serviceops, WMF-JobQueue

Nov 10 2024

Wargo added a comment to T379476: Page deletion queued via Nuke is sometimes very slow to complete.

Happened to me some months ago.

Nov 10 2024, 6:24 PM · WMF-JobQueue, Moderator-Tools-Team, MediaWiki-extensions-Nuke
Samwalton9 added a comment to T378385: Spike in JobQueue job backlog time (500ms -> 4-8 minutes).

Possibly the cause of T379476?

Nov 10 2024, 10:30 AM · FlaggedRevs, serviceops, WMF-JobQueue
matej_suchanek added a project to T379476: Page deletion queued via Nuke is sometimes very slow to complete: WMF-JobQueue.
Nov 10 2024, 9:07 AM · WMF-JobQueue, Moderator-Tools-Team, MediaWiki-extensions-Nuke

Nov 9 2024

Myrealnamm added a comment to T378385: Spike in JobQueue job backlog time (500ms -> 4-8 minutes).

Subscribing myself. I've been seeing this for a while, and yes, today some mw tags are taking forever to update.

Nov 9 2024, 8:51 PM · FlaggedRevs, serviceops, WMF-JobQueue
Bawolff added a comment to T378385: Spike in JobQueue job backlog time (500ms -> 4-8 minutes).

That said, it does seem like the p99 for AssembleChunkUpload jobs has spiked to ~15 min for the last 2 hours (was fine before that point), so maybe that is just it. Maybe driven by a spike in ChangeDeletionNotification jobs. Sounds like a dedicated queue as Scott suggests would really help.

Nov 9 2024, 10:18 AM · FlaggedRevs, serviceops, WMF-JobQueue
Bawolff added a comment to T378385: Spike in JobQueue job backlog time (500ms -> 4-8 minutes).

@MBH let's open a separate new task to investigate, as the cause could be something different from the job queue issue this task is about. If you want, you could email the HAR file to me ( bawolff@gmail.com ).

Nov 9 2024, 10:01 AM · FlaggedRevs, serviceops, WMF-JobQueue
MBH added a comment to T378385: Spike in JobQueue job backlog time (500ms -> 4-8 minutes).

Now files wait ~10 minutes before publishing and then fail to publish with the errors "Unknown server error" and "Incorrect CSRF token". Uploading (the first UploadWizard step) was also very slow, with the same behavior as in the previous case: 3 files in the queue and all other files waiting; after several minutes those 3 files uploaded and the next 3 entered the queue.

[Screenshot attached: {A2518AE1-A179-4C24-B15D-0D827809598E}.png, 43 KB]

Nov 9 2024, 9:53 AM · FlaggedRevs, serviceops, WMF-JobQueue
MBH reopened T378385: Spike in JobQueue job backlog time (500ms -> 4-8 minutes) as "Open".

@Bawolff The problem described in T378276 has come back. I have recorded a HAR file; please give me an email address where I should send it. I will not clear any cookies because I don't know how to, so I'll just send you the raw file.

Nov 9 2024, 9:49 AM · FlaggedRevs, serviceops, WMF-JobQueue

Nov 5 2024

Bawolff merged T378276: Mass uploads to Commons doesn't work for me into T378385: Spike in JobQueue job backlog time (500ms -> 4-8 minutes).
Nov 5 2024, 2:22 AM · FlaggedRevs, serviceops, WMF-JobQueue
Scott_French added a comment to T378385: Spike in JobQueue job backlog time (500ms -> 4-8 minutes).

Thanks, @Bawolff - Yes, indeed, those both fan into the low-traffic consumer. While we don't really have a prioritization mechanism in this context that I'm aware of, it would probably be fairly straightforward to at least move them out of low-traffic to a dedicated consumer, as @Ladsgroup points out. I've opened T379035 to look into that.

Nov 5 2024, 1:07 AM · FlaggedRevs, serviceops, WMF-JobQueue
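The isolation idea discussed here (a dedicated consumer, or "lane", for latency-sensitive job types) can be sketched in Python as a routing table with a default. Everything below is illustrative: the lane names and concurrency values are hypothetical and do not reflect the production ChangeProp configuration, and only the two job types Bawolff names are routed, since the third upload-critical type moved under T379035 is not named in this feed. ChangeDeletionNotification, mentioned in Bawolff's comment as a possible driver of the spike, plays the flooding job. Concurrency is recorded but not enforced in this sketch.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Lane:
    name: str
    concurrency: int  # intended cap on concurrent jobs; not enforced here


# Hypothetical routing table, for illustration only.
LOW_TRAFFIC = Lane("low-traffic", concurrency=50)
DEDICATED = {
    "PublishStashedFile": Lane("publishStashedFile", concurrency=5),
    "AssembleUploadChunks": Lane("assembleUploadChunks", concurrency=5),
}


def lane_for(job_type):
    """Job types without a dedicated lane share the low-traffic consumer."""
    return DEDICATED.get(job_type, LOW_TRAFFIC)


def enqueue_all(job_types):
    """Append each job to its lane's FIFO queue, in arrival order."""
    queues = defaultdict(deque)
    for job_type in job_types:
        queues[lane_for(job_type).name].append(job_type)
    return queues


def jobs_ahead(queues, job_type):
    """Number of jobs queued in front of the first job of this type."""
    return list(queues[lane_for(job_type).name]).index(job_type)


# A flood of one job type arrives just before a user's upload job.
flood = ["ChangeDeletionNotification"] * 10_000 + ["PublishStashedFile"]
queues = enqueue_all(flood)
print(jobs_ahead(queues, "PublishStashedFile"))  # 0
```

Had PublishStashedFile stayed in the shared low-traffic lane, it would have sat behind all 10,000 flooding jobs; routing it to its own lane (with a small concurrency, as Ladsgroup suggests) removes that head-of-line blocking without needing a priority mechanism.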

Nov 4 2024

Ladsgroup added a comment to T378385: Spike in JobQueue job backlog time (500ms -> 4-8 minutes).

It shouldn't be too hard to give it a dedicated lane with small concurrency

Nov 4 2024, 10:12 AM · FlaggedRevs, serviceops, WMF-JobQueue

Oct 31 2024

Bawolff added a comment to T378385: Spike in JobQueue job backlog time (500ms -> 4-8 minutes).

Just as an aside, I believe PublishStashedFile and AssembleUploadChunks are considered low-traffic jobs. Unlike normal jobs, these are very latency-sensitive: they don't happen in the background, and the UI actually makes users wait while these jobs complete (see also T378276). It would be really great if these jobs could somehow be prioritized in a job queue backlog situation.

Oct 31 2024, 9:23 PM · FlaggedRevs, serviceops, WMF-JobQueue

Oct 30 2024

lmata moved T359472: Migrate MediaWiki.jobqueue to statslib from Inbox to Prioritized on the Observability-Metrics board.
Oct 30 2024, 6:58 PM · Essential-Work, MW-Interfaces-Team, WMF-JobQueue, MediaWiki-Engineering, Observability-Metrics