fix NoCacheLoadAndPost data corruption #2532
The size_orgmeta was not updated correctly in cases of insufficient cache, which resulted in the previously uploaded data not being loaded correctly when re-entering the NoCacheLoadAndPost process Signed-off-by: liubingrun <liubr1@chinatelecom.cn>
Is there any kind of test we can add that exercises these code paths?
Sure, will try and come back later.
@VVoidV Thanks for this PR.
I'm back, with bad news. Initially, I intended to limit the remaining disk space to verify my changes. However, I found that after writing the file, the ls command hangs. I rolled the code back to master, but the hang still occurred, so it seems a new deadlock has been introduced in master. I set fake_diskfree to 200 and copied a 180MB file to the test directory, and s3fs_release hangs on fdent_lock.
I am confident that version 1.94, combined with the following patch (picked from 1a50b9a04a82678b05e36927c323f42d94ca4a07), should not cause such a deadlock. 0001-bugfix-Fixed-a-deadlock-in-the-FdManager-ChangeEntit.patch
After testing with git bisect, I confirmed that the first bad commit is 1a50b9a.
Oh, it seems to be related to UpdateEntityToTempPath(), as FdEntity is being destructed twice. I tried to fix this issue. Please check the case when cachedir is not specified (using the /tmp directory). At that point, temporary files are already being used. Is there still a need to switch to another tmpfile? @ggtakec
ecec420 to 366d7d6
When the cachedir is not specified, FdEntity already uses a randomly prefixed key. Therefore, when handling UpdateEntityToTempPath, it is necessary to perform a search through the map. Otherwise, the fdent map will insert duplicate entries, leading to repeated destructors being called, which triggers the deadlock. Signed-off-by: liubingrun <liubr1@chinatelecom.cn>
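The "search through the map" idea in this commit message can be illustrated with a minimal sketch. It is not the s3fs implementation: `Entity`, `EntityMap`, `contains_entity`, and `insert_unique` are hypothetical stand-ins for FdEntity and the fdent map, and the map holds non-owning raw pointers only to keep the example small. The point is that a lookup by key alone cannot detect an entity already stored under a different (randomly prefixed) key, so the check compares the stored pointers instead.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for FdEntity (assumption, not the s3fs class).
struct Entity { std::string path; };

// Hypothetical stand-in for the fdent map; values are non-owning pointers.
using EntityMap = std::map<std::string, Entity*>;

// Returns true if `ent` is already stored under any key in the map.
bool contains_entity(const EntityMap& m, const Entity* ent)
{
    return std::any_of(m.begin(), m.end(),
        [ent](const EntityMap::value_type& kv){ return kv.second == ent; });
}

// Inserts `ent` under `key` only if the same entity is not already present
// under another key. Returns true if the entity was inserted.
bool insert_unique(EntityMap& m, const std::string& key, Entity* ent)
{
    if(contains_entity(m, ent)){
        return false;   // already registered, e.g. under a random temp key
    }
    return m.emplace(key, ent).second;
}
```

With a check like this, a second registration of the same entity under its real path is rejected, so the entity appears in the map once and would be destroyed once.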
366d7d6 to 1ddd713
if(GetStatsHasLock(st)){
    size_orgmeta = st.st_size;
} else {
    S3FS_PRN_ERR("fstat is failed by errno(%d), but continue...", errno);
Should this return errno instead?
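One possible reading of this suggestion, sketched with plain POSIX `fstat` rather than the project's own helpers: `refresh_cached_size` is a hypothetical function, and returning a negative errno value is an assumed convention for the example, not something this thread confirms for the code under review.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/stat.h>
#include <sys/types.h>

// Hypothetical helper (not s3fs code): refresh a cached file size from an
// open file descriptor and report failure to the caller instead of logging
// and continuing. Returns 0 on success or a negative errno value on failure.
int refresh_cached_size(int fd, off_t& cached_size)
{
    struct stat st;
    if(-1 == fstat(fd, &st)){
        // Read errno immediately, before another call can overwrite it.
        return -errno;
    }
    cached_size = st.st_size;
    return 0;
}
```

On failure the cached size is left untouched, so the caller can decide whether to abort or fall back to the previous value.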
if(GetStatsHasLock(st)){
    size_orgmeta = st.st_size;
} else {
    S3FS_PRN_ERR("fstat is failed by errno(%d), but continue...", errno);
Same as above.
It took me a while to reproduce this problem. Could you test it by mixing the following into your code?
This problem is probably not a problem with ... The above fix code has been filed as a PR (#2538), so I think it will be reviewed and merged. Thanks in advance for your kindness.
I observed that when the cache dir is not specified, the keys stored in the fent map are already random paths. However, the keys placed in except_fent by ChangeEntityToTempPath still use the actual paths. Because of this discrepancy, the update cannot find the corresponding entry in fent by path and falls into the else branch. This is the point where the FdEntity pointer is inserted a second time: lines 772 to 792 in b283ab2.
This relies on the std::enable_shared_from_this helper to create a shared_ptr from this. Fixes s3fs-fuse#2532.
FdEntity may have multiple references due to ChangeEntityToTempPath. This relies on the std::enable_shared_from_this helper to create a std::shared_ptr from this. Fixes s3fs-fuse#2532.
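As a rough illustration of the mechanism this commit message names, the sketch below shows how `std::enable_shared_from_this` lets an object hand out a `std::shared_ptr` to itself so that two map entries share ownership of one object. `SharedEntity`, `SharedEntityMap`, `Self`, and `add_alias` are hypothetical names for this example and are not taken from the s3fs source.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for FdEntity (assumption, not the s3fs class).
class SharedEntity : public std::enable_shared_from_this<SharedEntity>
{
    public:
        // Must only be called on an object already owned by a std::shared_ptr.
        std::shared_ptr<SharedEntity> Self() { return shared_from_this(); }
};

using SharedEntityMap = std::map<std::string, std::shared_ptr<SharedEntity>>;

// Registers the same entity under a second key. Both entries share one
// control block, so the object is destroyed exactly once, when the last
// reference goes away.
void add_alias(SharedEntityMap& m, const std::string& key, SharedEntity& ent)
{
    m[key] = ent.Self();
}
```

Because every reference comes from the same control block, erasing one map entry does not destroy the object while another entry still refers to it.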
After merging #2541, this PR was automatically closed for some reason, so I will reopen it.
Can you test the latest master and rebase or close this PR as needed?
Sorry for the late reply. I tested again with the 1.95 code. The cachedir is not specified, so the key in fdent is already a temp path. When ChangeEntityToTempPath() is called, the key in except_fent becomes the real path. As a result, UpdateEntityToTempPath does not find the entity and puts it back into fdent, which leaves two identical FdEntity entries in the fdent map.
@VVoidV I think I understand the problem. If the cache directory is not specified (there may be cases already ... I have posted a PR (as Draft) for this fix in #2582.
s3fs-fuse/src/fdcache_entity.cpp, lines 1149 to 1159 in df5364d