Introducing lazytime
The problem with atime is that it is supposed to be updated every time the associated file is accessed. Updating atime requires writing the file's inode back to disk, so atime tracking essentially turns every read operation into a write. For many workloads, the effect on performance can be severe. On top of that, there are few programs out there that make use of atime or depend on it being updated. So, ten years ago, it was common to mount filesystems with the "noatime" option, which disabled the tracking of access times entirely.
The problem, of course, is that "few programs" is not the same as "no programs"; it turns out that there are indeed utilities that break in the absence of atime tracking. A classic example is mail clients that use atime to determine whether a mailbox has been read since mail was last delivered to it. After some discussion, the kernel community added the "relatime" mount option in the 2.6.20 development cycle. Relatime will cause most atime updates to be suppressed, but it will allow an atime update if the current recorded atime is prior to the current ctime or mtime. Later on, relatime was tweaked to update atime once every 24 hours regardless (but only if the file is accessed, of course).
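The relatime rule described above can be sketched as a small decision function. This is only an illustration of the article's description, not the kernel's code; the function name, the use of plain second counts for timestamps, and the exact comparison operators are assumptions.

```python
DAY = 24 * 60 * 60  # seconds in 24 hours


def relatime_should_update(atime, mtime, ctime, now):
    """Return True if an access at time `now` should update atime.

    All arguments are timestamps in seconds. This follows the rule as the
    article states it; the names and comparisons are illustrative only.
    """
    # Original relatime rule: the recorded atime is older than the last
    # modification (mtime) or the last status change (ctime).
    if atime < mtime or atime < ctime:
        return True
    # Later tweak: refresh an atime that is at least 24 hours old.
    if now - atime >= DAY:
        return True
    # Otherwise the update is suppressed.
    return False
```

A mail client comparing atime with mtime still gets a useful answer under this rule, since the first read after a delivery always moves atime past mtime.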
Relatime works well enough for most systems, but there are still those who would like better atime tracking without paying the performance penalty for it. Some users also dislike the fact that relatime, for all its value, causes the system to not be fully compliant with the POSIX specification. For the most part, people have put up with the minimal deficiencies in relatime (or put up with the cost of atime updates), but there is now an alternative on the horizon.
That alternative takes the form of the lazytime mount option, posted as an ext4-specific patch by Ted Ts'o. With lazytime enabled, a filesystem will keep atime current in a file's in-memory inode. But that inode will not be written to disk until there is some other reason to do so, or until the inode itself is being pushed out of memory. The effect will be to have an atime that is always correct from the point of view of any program running on the system. The version of atime stored on disk may well lag significantly behind reality, though, and the current atime could be lost if the system were to crash.
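A toy model may make the in-memory versus on-disk distinction concrete. The class and method names below are hypothetical and exist only for illustration; the real patch works on kernel inodes, not Python objects.

```python
class LazyInode:
    """Toy model of an inode under lazytime (all names are hypothetical)."""

    def __init__(self, atime):
        self.atime = atime       # in-memory value, what running programs see
        self.disk_atime = atime  # value most recently written to disk

    def access(self, now):
        # A read updates only the in-memory atime; no disk write happens.
        self.atime = now

    def write_back(self):
        # Called when the inode is written for some other reason, or when
        # it is about to be pushed out of memory.
        self.disk_atime = self.atime

    def crash(self):
        # After a crash only the on-disk value survives.
        self.atime = self.disk_atime


inode = LazyInode(atime=100)
inode.access(now=200)   # programs now see atime == 200, the disk still has 100
inode.write_back()      # some unrelated change carries the new atime to disk
inode.access(now=300)
inode.crash()           # the update to 300 is lost; atime falls back to 200
```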
Dave Chinner was quick to point out that, while the option looked like it could be useful, the ext4 filesystem might not be the best place to implement it. If lazytime were to be implemented in the virtual filesystem (VFS) layer, then it would be available for all filesystems, not just ext4 and, perhaps most importantly, it would work the same way on all of them. Ted agreed that a VFS implementation might make sense; the next version of this patch seems almost certain to be implemented at that level.
Dave also suggested that delaying the writing of atime updates indefinitely might not be advisable.
Once again, Ted was amenable to this idea, so the next version will probably write out updated atime values a minimum of once every 24 hours. Without that change, atime updates could be held in memory for months at a time on a system like a database server (which keeps its files open indefinitely).
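The time bound could be pictured as a periodic pass over inodes with unwritten atime updates. This is a sketch under assumptions: the function name, the dictionary of "dirty since" times, and the idea of a periodic pass are all invented for illustration and are not taken from the patch.

```python
DAY = 24 * 60 * 60  # seconds in 24 hours


def flush_overdue(dirty_since, now, max_delay=DAY):
    """Pick the inodes whose atime update has waited at least `max_delay`.

    `dirty_since` maps an inode number to the time of its oldest unwritten
    atime update. Overdue entries are removed from the map (as if written
    to disk) and returned in sorted order. Hypothetical helper.
    """
    due = sorted(ino for ino, since in dirty_since.items()
                 if now - since >= max_delay)
    for ino in due:
        del dirty_since[ino]
    return due
```

With such a bound, a file held open by a long-running database server would still have its atime written once the delay expires, rather than months later.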
Finally, there is the question of whether lazytime should become the default mount option. It satisfies POSIX (or, at least, will after another fix or two) without incurring the cost of normal atime updates, so it does seem like a better option than relatime, which is the current default. Ted, seemingly, would like to change the default in the near future, while Dave is a bit more concerned about regressions and would like to wait a couple of years to see how things work out. That led to a question of whether the feature will see enough testing in the meantime, but, as Dave noted, there will probably be enough interest in the feature to ensure that people will try it out.
Whether that is true remains to be seen; relatime works well enough for most users, so there isn't necessarily a crowd of people looking to try a new filesystem mount option. But eventually some of the more adventurous distributions are likely to pick it up; at that point, any latent problems should probably come out before too long. So, when lazytime becomes the default in 2016 or so, it should indeed be well tested and shown to work without problems.
Introducing lazytime
Posted Nov 20, 2014 3:09 UTC (Thu)
by kokada (subscriber, #92849)
[Link] (1 responses)
Introducing lazytime
Posted Dec 4, 2014 10:02 UTC (Thu)
by chojrak11 (guest, #52056)
[Link]
Glad to hear that Linux is finally starting to use these critical performance-increasing tricks.
Mtime, too?
Posted Nov 20, 2014 5:09 UTC (Thu)
by ncm (guest, #165)
[Link] (8 responses)
It seems as if a similar treatment of mtime could be a big improvement.
Mtime, too?
Posted Nov 20, 2014 6:09 UTC (Thu)
by neilbrown (subscriber, #359)
[Link] (6 responses)
Isn't it?
My reading of the code suggests that mtime is updated by calls to file_update_time(), and that ultimately calls mark_inode_dirty_sync() which should cause the on-disk version to be updated fairly promptly.
file_update_time() is called when writing to a file (__generic_file_write_iter) or when making a mmapping writeable (filemap_page_mkwrite).
So while I haven't tested, it looks to me like the mtime is updated at the correct time and is written to storage promptly (not instantly - maybe 30second delay).
What is your evidence?
Mtime, too?
Posted Nov 20, 2014 10:07 UTC (Thu)
by epa (subscriber, #39769)
[Link]
One answer might be to add an 'mtime needs updating' flag to the inode. When a file is opened for write access (or left mmapped, even if the file descriptor is closed) then this 'needs updating' flag is set and written to disk immediately. If there is a crash, then on recovery the mtime on all such inodes is set to the current time.
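As a rough sketch of that idea, with the on-disk inode modelled as a dictionary: the field and function names are hypothetical, and the clean-close step that clears the flag is an assumption that the proposal does not spell out.

```python
def open_for_write(disk_inode):
    # The flag is set and (in the proposal) written to disk immediately.
    disk_inode["mtime_dirty"] = True


def clean_close(disk_inode, mtime):
    # Assumption: once the real mtime reaches disk, the flag can be cleared.
    disk_inode["mtime"] = mtime
    disk_inode["mtime_dirty"] = False


def recover_after_crash(disk_inodes, now):
    # On recovery, every inode still flagged gets the current time as mtime.
    for inode in disk_inodes:
        if inode["mtime_dirty"]:
            inode["mtime"] = now
            inode["mtime_dirty"] = False
```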
Mtime, too?
Posted Nov 20, 2014 16:37 UTC (Thu)
by SEJeff (guest, #51588)
[Link] (4 responses)
$ uname -r
3.8.5-201.fc18.x86_64
$ touch foobar
$ stat -c %Y foobar
1416501313
$ python
Python 2.7.3 (default, Aug 9 2012, 17:23:57)
[GCC 4.7.1 20120720 (Red Hat 4.7.1-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> FH = open('foobar', 'w')
>>>
[1]+ Stopped python
$ stat -c %Y foobar
1416501321
$ fg
python
>>> FH.write('test\n')
>>>
[1]+ Stopped python
$ stat -c %Y foobar
1416501321
$ fg
python
>>> FH.write('test\n')
>>>
[1]+ Stopped python
$ stat -c %Y foobar
1416501321
$ fg
python
>>> FH.close()
>>>
$ stat -c %Y foobar
1416501347
The evidence suggests that this is indeed the case. Oh, and if you're curious, this is on my Fedora 18 workstation (I'm too lazy to update) with a 3.8.5 kernel.
Mtime, too?
Posted Nov 20, 2014 16:58 UTC (Thu)
by mthambi (guest, #51395)
[Link] (2 responses)
After that mtime gets updated immediately. I am not sure whether it is actually written to disk.
Mtime, too?
Posted Nov 20, 2014 20:59 UTC (Thu)
by jefftaylor (guest, #95911)
[Link] (1 responses)
Using the `os` module gives the expected behaviour (linux 3.17.2, F20)
$ uname -r
3.17.2-200.fc20.x86_64
$ touch test
$ stat -c %Y test
1416516686
$ python3
Python 3.3.2 (default, Nov 3 2014, 15:32:43)
[GCC 4.8.3 20140911 (Red Hat 4.8.3-7)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> fd = os.open("test", os.O_WRONLY)
[2]+ Stopped python3
$ stat -c %Y test
1416516686 # Unchanged by os.open
$ fg
python3
>>> os.write(3, b"test\n")
5
[2]+ Stopped python3
$ stat -c %Y test
1416516760 # Changed by os.write
$ fg
python3
>>> os.write(3, b"test2\n")
6
[2]+ Stopped python3
$ stat -c %Y test
1416516780 # Changed by os.write
$ fg
python3
>>> os.close(3)
[2]+ Stopped python3
$ stat -c %Y test
1416516780 # Unchanged by os.close
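One likely explanation for the difference between the two transcripts is that Python file objects buffer small writes in user space, so no write() system call reaches the kernel until the buffer is flushed or the file is closed. The script below is a sketch of how that could be checked without stopping the interpreter; the helper names and the marker timestamp are assumptions made for the example.

```python
import os

OLD = 1_000_000_000  # arbitrary marker timestamp (a date in 2001)


def buffered_write_mtimes(path):
    """Return (mtime after write(), mtime after flush()) for a buffered file."""
    with open(path, "w") as fh:
        os.utime(path, (OLD, OLD))  # reset mtime after open() truncated the file
        fh.write("test\n")          # stays in Python's user-space buffer
        before_flush = os.stat(path).st_mtime
        fh.flush()                  # the write() system call happens here
        after_flush = os.stat(path).st_mtime
    return before_flush, after_flush


def raw_write_mtime(path):
    """Return the mtime seen right after an unbuffered os.write()."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.utime(path, (OLD, OLD))  # reset mtime to the marker value
        os.write(fd, b"test\n")     # goes straight to the kernel
        return os.stat(path).st_mtime
    finally:
        os.close(fd)
```

If the buffering explanation is right, the first function should report the marker value before the flush and a current timestamp after it, while the second should report a current timestamp immediately.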
Mtime, too?
Posted Nov 21, 2014 4:06 UTC (Fri)
by zev (subscriber, #88455)
[Link]
$ exec 3>foo
$ stat -c%Y foo
1416542579
$ echo bar >&3
$ stat -c%Y foo
1416542590
# note that subsequently closing does *not* update mtime
$ sleep 5
$ exec 3>&-
$ stat -c%Y foo
1416542590
Mtime, too?
Posted Dec 3, 2014 10:06 UTC (Wed)
by k8to (guest, #15413)
[Link]
I don't know when the inode is updated on disk.
For more odd ducks like NFS and CIFS, the mtime information viewed by different parties can vary, unfortunately. On modern Windows, the strongly equivalent modification time is only updated on file close, which is almost irrelevant except when writing portable code or when mounting those filesystems on unix.
Mtime, too?
Posted Nov 20, 2014 22:50 UTC (Thu)
by reubenhwk (guest, #75803)
[Link]
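/* I also had to test this to see for myself. My C code
 * shows mtime *is* updated as expected... */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    struct stat buf;

    /* First write: create (or truncate) the file and flush to the kernel. */
    FILE *fp = fopen("hello.txt", "w");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }
    fprintf(fp, "Hello World!\n");
    fflush(fp);
    stat("hello.txt", &buf);
    printf("mtime: %ld\n", (long)buf.st_mtime);
    fclose(fp);

    sleep(2);

    /* Second write, two seconds later: append and flush again. */
    fp = fopen("hello.txt", "a");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }
    fprintf(fp, "Hello World!\n");
    fflush(fp);
    stat("hello.txt", &buf);
    printf("mtime: %ld\n", (long)buf.st_mtime);
    fclose(fp);

    return 0;
}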
Introducing lazytime
Posted Nov 21, 2014 4:52 UTC (Fri)
by steveriley (guest, #83540)
[Link] (1 responses)
"In current systems, there is a mount option (called "relatime") that mitigates the worst problems caused by atime, but it has a few issues of its own."
What are the "few issues" that relatime has?
Introducing lazytime
Posted Nov 21, 2014 11:15 UTC (Fri)
by tialaramex (subscriber, #21167)
[Link]
For example, perhaps you intend to preserve files in a cache directory so long as they've been accessed at least once per hour. You have a flush process which checks once per hour, and any files that are more than an hour old and have an atime more than an hour old are deleted. With relatime this feature won't work out of the box. You have to configure relatime to ensure that atime is never more than an hour out of date (the default is a day).
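A flush process of that kind might look roughly like the following sketch. The function name and parameters are made up for illustration, and treating "more than an hour old" as a test on mtime is an assumption.

```python
import os
import time


def stale_files(directory, max_age=3600, now=None):
    """List regular files that are both old and not recently accessed.

    Hypothetical helper: a file counts as stale when its mtime and its
    atime are each more than `max_age` seconds before `now`.
    """
    now = time.time() if now is None else now
    stale = []
    for entry in os.scandir(directory):
        if not entry.is_file(follow_symlinks=False):
            continue
        st = entry.stat(follow_symlinks=False)
        if now - st.st_mtime > max_age and now - st.st_atime > max_age:
            stale.append(entry.path)
    return sorted(stale)
```

Under relatime, the atime that such a check sees can be up to a day behind the real last access, so a file that is read every few minutes could still be reported as stale.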
Introducing lazytime
Posted Nov 25, 2014 12:42 UTC (Tue)
by nix (subscriber, #2304)
[Link] (6 responses)
This is a fairly significant change in semantics, I'd say.
Introducing lazytime
Posted Nov 25, 2014 14:51 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link] (5 responses)
> But that inode will not be written to disk until there is some other reason to do so, or until the inode itself is being pushed out of memory.
Introducing lazytime
Posted Nov 25, 2014 18:53 UTC (Tue)
by dlang (guest, #313)
[Link]
Unless the system crashes, your software isn't going to see any difference; any filesystem actions you take will see the modified inode.
This change only affects when the changes hit disk.
Introducing lazytime
Posted Nov 25, 2014 22:43 UTC (Tue)
by nix (subscriber, #2304)
[Link] (3 responses)
Introducing lazytime
Posted Nov 26, 2014 4:31 UTC (Wed)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
Introducing lazytime
Posted Nov 27, 2014 17:34 UTC (Thu)
by nix (subscriber, #2304)
[Link]
Introducing lazytime
Posted Dec 4, 2014 14:40 UTC (Thu)
by nye (subscriber, #51576)
[Link]
>Add a new mount option which enables a new "lazytime" mode. This mode
>causes atime, mtime, and ctime updates to only be made to the
>in-memory version of the inode. The on-disk times will only get
>updated (a) when the inode table block for the inode needs to be
>updated for some non-time related change involving any inode in the
>block, (b) if userspace calls fsync(), or (c) the refcount on an
>undeleted inode goes to zero (in most cases, when the last file
>descriptor associated with the inode is closed).
So if you want that information to be committed to disk, fsync() will do the job.
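In application terms that might look like the sketch below, where the helper name is hypothetical and os.fsync() is the standard Python wrapper around fsync().

```python
import os


def write_and_sync(path, data):
    """Append `data` to `path`, then fsync() before closing.

    Hypothetical helper: returns the file size afterwards so that the
    caller can see the write took effect.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, data)
        # Ask the kernel to push the file's data and inode metadata to
        # storage; per the patch description quoted above, this is one of
        # the events that writes out the lazily updated timestamps.
        os.fsync(fd)
    finally:
        os.close(fd)
    return os.stat(path).st_size
```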
Since behaviour when the file is never fsync()ed is not well defined, this isn't *in theory* much of a change.
In practice, the difference roughly means going from
"Data written but not synced before OS crash or power loss will hopefully be written to disk, mostly in the right order if you're lucky, but there are no guarantees" to appending the proviso "except for isolated time updates, which probably won't".
In both cases, you could never be sure that the mtime was updated on disk after writing to the middle of your file, nor even that the write was either completed or not, and the file would need to be checked for consistency. This might, in certain circumstances, reduce the probability that you happen to get lucky, if you're using mtime to determine whether a file needs to be checked after crash/power loss (this scenario seems unlikely on the grounds that it probably won't work a great deal of the time).
Do correct me if I have misunderstood.
Introducing lazytime
Posted Apr 25, 2015 15:45 UTC (Sat)
by mcortese (guest, #52099)
[Link]
I don't care about performance, but all my disks are SSD and avoiding some writes looks like a good idea. I switched to noatime years ago and have never since regretted it.
Apart from email clients developed centuries ago, what are the "few" applications that really can't work without atime?
Introducing lazytime
Posted Aug 10, 2015 22:48 UTC (Mon)
by lsatenstein (guest, #34741)
[Link]
But with the open call, I would have added a symbol _FORCE_ATIME, which would reverse the noatime for that particular file, directory, or partition.
Of course, we would have to go through the exercise of review by all standard associations, but in the end, atime or noatime would be managed by the user. (If open() is sacrosanct, then extend the touch call to control or force atime.)
If I am wrong about this solution for managing atime, please post an explanation to me as to why it is not a good solution.
Thank you
Leslie