Description
Once the initial cleanup is done, we should set up a cron job to run purgeOldLogIPData.php daily, as is already done for CheckUser. Patch coming.
Status | Assigned | Task
---|---|---
Resolved | Jalexander | T160357 Allow those with CheckUser right to access AbuseLog private information on WMF projects
Resolved | Reedy | T179131 AbuseFilter should actively prune old IP data
Invalid | MarcoAurelio | T187053 Setup puppet cron to delete old data daily
Event Timeline
I see that https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/mediawiki/manifests/maintenance/purge_abusefilter.pp is already there, so it just needs to be set to run once we're done with the initial purge.
Weird: https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/profile/manifests/mediawiki/maintenance.pp;9bd6328c0f4499daf1cfe8e028fc6165d5a8601a$31 is also there.
Do we really have old data in production? Can someone with access check? @jcrespo?
Something isn't right for sure...
Running select * from abuse_filter_log where afl_ip <> "" ORDER BY afl_id limit 1; on enwiki returns a row with an afl_timestamp of 20160916011613 (16 September 2016).
Needs some investigation.
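To make the timestamp check concrete, here is a minimal sketch that parses a 14-digit MediaWiki timestamp (YYYYMMDDHHMMSS) and asks whether a row that old should already have had its IP purged. The 90-day window is an assumption: we believe it matches AbuseFilter's default for $wgAbuseFilterLogIPMaxAge, but the deployed value should be checked. The helper names and the "purge script merged by 2016-12-31" date are hypothetical, for illustration only.

```python
from datetime import datetime, timedelta

# ASSUMPTION: 90-day retention window for IP data in abuse_filter_log.
# Verify against the wiki's actual configuration before relying on it.
ASSUMED_MAX_AGE = timedelta(days=90)


def parse_mw_timestamp(ts: str) -> datetime:
    """Parse a 14-digit MediaWiki timestamp (YYYYMMDDHHMMSS, UTC)."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def should_have_been_purged(ts: str, now: datetime,
                            max_age: timedelta = ASSUMED_MAX_AGE) -> bool:
    """True if a row with this afl_timestamp is older than the retention window."""
    return parse_mw_timestamp(ts) < now - max_age


if __name__ == "__main__":
    oldest = "20160916011613"  # oldest enwiki row with a non-empty afl_ip
    print(parse_mw_timestamp(oldest))
    # Hypothetical reference date: end of the month the purge script was merged.
    print(should_have_been_purged(oldest, datetime(2016, 12, 31)))
```

If the script had been working since it was merged, a row from mid-September 2016 would already be past the assumed window by the end of December 2016, so seeing it with afl_ip still set points at the script rather than the cron.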
Looks like @Huji should've fixed it in https://github.com/wikimedia/mediawiki-extensions-AbuseFilter/commit/8ca391c8e0912438495cb6eb390b22ed123a9434 for T186928
reedy@tin:~$ mwscript extensions/AbuseFilter/maintenance/purgeOldLogIPData.php testwiki
The following extensions are required to be installed for this script to run: AbuseFilter. Please enable them and then try again.
This should be fixed as part of the train, but I've just cherry-picked the fixes to .20. I'll deploy and check again tomorrow or so.
@Reedy But the script has been running in prod, right? It wasn't long ago that we added $this->requireExtension( 'AbuseFilter' ); to it, and that was already fixed. So this is not a new script deployment, right? I mean, has the script been working and cleaning those rows? If not, enwiki alone will have ~20 million rows to clean.
https://gerrit.wikimedia.org/r/#/c/326723/
December 2016. It's been there a long time.
The afl_timestamp of 20160916011613 would suggest it had been broken for 3 months before that.
reedy@tin:~$ mwscript extensions/AbuseFilter/maintenance/purgeOldLogIPData.php testwiki
Purging old IP Address data from abuse_filter_log...
200 400 600 800 1000 1200 1400 1600 1800 2000 2200 2400 2600 2800 3000 3200 3400 3600 3800 4000 4200 4400 4600 4800 5000 5200 5400 5600 5800 6000 6200 6400 6600 6800 7000 7200 7400 7600 7800 8000 8200 8400 8414
8414 rows. Done.
So it definitely seems to work now
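The progress output above (running totals in steps of 200) is consistent with a batched purge. The sketch below illustrates that pattern; it is not the actual PHP maintenance script. It uses an in-memory SQLite table as a stand-in for MySQL, and assumes, as the afl_ip <> "" query earlier suggests, that purging means blanking afl_ip rather than deleting rows. The function name, batch size, cutoff value, and sample rows are hypothetical.

```python
import sqlite3

BATCH_SIZE = 200  # ASSUMPTION: inferred from the 200-row steps in the output


def purge_old_ip_data(conn: sqlite3.Connection, cutoff: str,
                      batch_size: int = BATCH_SIZE) -> int:
    """Blank afl_ip on rows with afl_timestamp < cutoff, one batch at a time.

    MediaWiki timestamps are fixed-width 14-digit strings, so comparing
    them as strings orders them chronologically. Returns rows updated.
    """
    total = 0
    while True:
        # Pick the next batch of rows that still carry IP data.
        ids = [row[0] for row in conn.execute(
            "SELECT afl_id FROM abuse_filter_log "
            "WHERE afl_ip <> '' AND afl_timestamp < ? "
            "ORDER BY afl_id LIMIT ?",
            (cutoff, batch_size),
        )]
        if not ids:
            break
        placeholders = ",".join("?" * len(ids))
        conn.execute(
            f"UPDATE abuse_filter_log SET afl_ip = '' "
            f"WHERE afl_id IN ({placeholders})",
            ids,
        )
        # Commit per batch to keep transactions small (a production script
        # would presumably also wait for replication here; assumption).
        conn.commit()
        total += len(ids)
        print(total, end=" ")  # running total, like the progress line
    print(f"\n{total} rows. Done.")
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE abuse_filter_log "
                 "(afl_id INTEGER PRIMARY KEY, afl_ip TEXT, afl_timestamp TEXT)")
    rows = [(i, "192.0.2.1", "20160916011613") for i in range(1, 451)]
    rows.append((451, "192.0.2.2", "20180201000000"))  # recent row, kept
    conn.executemany("INSERT INTO abuse_filter_log VALUES (?, ?, ?)", rows)
    purge_old_ip_data(conn, cutoff="20171101000000")
```

Working in small batches with a commit after each one keeps every transaction short, which matters on a table the size of enwiki's abuse_filter_log.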
Sorry for not checking the dates properly. We've been adding those $this->requireExtension calls recently too.
Given that the cron job was already in place and it was the script that was broken, I'm closing this.