Update to Phorge upstream 2024.35 release
Open, Medium, Public

Event Timeline

Aklapper added a parent task: Restricted Task. Aug 3 2024, 11:45 PM
Aklapper renamed this task from "[Placeholder] Update to Phorge upstream 2024.XX release" to "Update to Phorge upstream 2024.35 release". Sep 2 2024, 11:24 AM
Aklapper updated the task description.
Aklapper updated the task description.
Aklapper changed the task status from Stalled to Open. Sep 2 2024, 11:27 AM
Aklapper raised the priority of this task from Low to Medium.

DB upgrade may take a while:

MariaDB [phabricator_file]> SELECT COUNT(*) FROM file;
+----------+
| COUNT(*) |
+----------+
|   547777 |
+----------+

MariaDB [phabricator_file]> SELECT COUNT(*) FROM file_attachment;
+----------+
| COUNT(*) |
+----------+
| 29897115 |
+----------+

Oh good.

Also: why do ours seem backward from upstream?

Just as an indication: the storage upgrade, in a Phorge with file count 1.3M rows and file_attachment consisting in 9K rows, it may delete 170K rows in less than 1 second on average hardware.

– (change that introduced db upgrade)

And

NOTE: If you have 1M+ phabricator_file.file and 10K file_attachment - it may delete 200K rows in 2s

– (2024.35 changelog)

Meanwhile: we have ½M phabricator_file.file and 29M phabricator_file.file_attachment.

Why is our usage backwards from upstream?


For timing "may delete 200K rows in 2s"

If that's the pace of cleanup, then the timing doesn't seem too bad, really. Even if it deleted every one of our rows, that would take (29,500,000 / 200,000) * 2 = 295 seconds (about 5 minutes).
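
For reference, the same back-of-the-envelope estimate as a small Python sketch. It assumes deletion time scales linearly with the number of rows deleted, which is my assumption and not something the changelog promises; the function name is made up for illustration.

```python
def estimate_delete_seconds(rows_to_delete, ref_rows=200_000, ref_seconds=2.0):
    """Linearly extrapolate from the changelog's "200K rows in 2s" figure."""
    return rows_to_delete / ref_rows * ref_seconds


# Worst case: every row counted in file_attachment turns out to be an orphan.
worst_case = estimate_delete_seconds(29_897_115)
print(f"{worst_case:.0f} s (~{worst_case / 60:.1f} min)")
```

Real timing on production will depend on indexes, hardware, and replication load, so this is only a rough upper-bound guess.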

Why is our usage backwards from upstream?

This is speculation on my part, but it makes sense for Phabricator installations with a long history to have more references, maybe even more references than files. That is especially true for public installations, which attract spiders and generate a lot of extra temporary files and orphan references.

Just for extra handy reference, this is the (only) involved patch:

USE phabricator_file;

DELETE FROM file_attachment
 WHERE NOT EXISTS
  (SELECT *
   FROM file
   WHERE phid=file_attachment.filePHID);
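
To illustrate what that statement does, here is a minimal sketch on a toy dataset. It uses Python's built-in sqlite3 with an in-memory database instead of MariaDB, and the two tables are reduced to the columns the patch touches (the `id` column and the sample PHIDs are made up), so it says nothing about production timing.

```python
import sqlite3

# Toy stand-in for phabricator_file; the real tables have many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE file (phid TEXT PRIMARY KEY);
    CREATE TABLE file_attachment (id INTEGER PRIMARY KEY, filePHID TEXT);
    INSERT INTO file (phid) VALUES ('PHID-FILE-a'), ('PHID-FILE-b');
    INSERT INTO file_attachment (filePHID) VALUES
        ('PHID-FILE-a'), ('PHID-FILE-a'), ('PHID-FILE-b'),
        ('PHID-FILE-gone1'), ('PHID-FILE-gone2');
""")

# Same filter as the patch: attachments whose file row no longer exists.
ORPHAN_FILTER = """
    WHERE NOT EXISTS
      (SELECT * FROM file WHERE phid = file_attachment.filePHID)
"""

orphans_before = conn.execute(
    "SELECT COUNT(*) FROM file_attachment" + ORPHAN_FILTER
).fetchone()[0]
deleted = conn.execute("DELETE FROM file_attachment" + ORPHAN_FILTER).rowcount
remaining = conn.execute("SELECT COUNT(*) FROM file_attachment").fetchone()[0]

print(orphans_before, deleted, remaining)
```

Only the two attachments pointing at missing files are removed; the three that reference existing files stay.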

Eh, I'm kinda reluctant to run USE phabricator_file; SELECT COUNT(*) FROM file_attachment WHERE NOT EXISTS (SELECT * FROM file WHERE phid=file_attachment.filePHID); in our production instance because I'm worried it's gonna take a looong time. :-/

Would there be interest in getting a DB snapshot (replica on a real-size host) to test this on? We took this approach when upgrading VRTS: T355541: Install a temporary DB host in m2 to support VRTS migration.

Will this be needed in the end? I'd like to know so I can prepare for it.

Are there any significant schema changes on this upgrade?
