
Introducing lazytime

By Jonathan Corbet
November 19, 2014
POSIX-compliant filesystems maintain three timestamps for every file, corresponding to the times of the last change in the file's metadata or contents (known as its "ctime"), modification of the file's contents ("mtime"), and access of the file ("atime"). The first two timestamps are generally considered to be useful, but "atime" has long been seen as being too expensive for the benefits it provides. In current systems, there is a mount option (called "relatime") that mitigates the worst problems caused by atime, but it has a few issues of its own. Now a new "lazytime" option might replace relatime and provide the best of all worlds.

The problem with atime is that it is supposed to be updated every time the associated file is accessed. Updating atime requires writing the file's inode back to disk, so atime tracking essentially turns every read operation into a write. For many workloads, the effect on performance can be severe. On top of that, there are few programs out there that make use of atime or depend on it being updated. So, ten years ago, it was common to mount filesystems with the "noatime" option, which disabled the tracking of access times entirely.

The problem, of course, is that "few programs" is not the same as "no programs"; it turns out that there are indeed utilities that break in the absence of atime tracking. A classic example is mail clients that use atime to determine whether a mailbox has been read since mail was last delivered to it. After some discussion, the kernel community added the "relatime" mount option in the 2.6.20 development cycle. Relatime will cause most atime updates to be suppressed, but it will allow an atime update if the current recorded atime is prior to the current ctime or mtime. Later on, relatime was tweaked to update atime once every 24 hours regardless (but only if the file is accessed, of course).
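The relatime rule described above can be written down as a small decision function. This is only a sketch of the logic as the article describes it, not the kernel's code; the function name, the use of plain integer seconds, and the choice to treat equal timestamps as "needs an update" are all assumptions made for illustration.

```python
DAY = 24 * 60 * 60  # seconds


def relatime_needs_update(atime, mtime, ctime, now):
    """Return True if a relatime-style policy would update atime.

    All arguments are timestamps in seconds (hypothetical inputs).
    """
    # The file was modified since it was last read.
    if mtime >= atime:
        return True
    # The file's metadata changed since it was last read.
    if ctime >= atime:
        return True
    # The later tweak: refresh atime at least once per day of access.
    if now - atime >= DAY:
        return True
    # Otherwise the update is suppressed.
    return False
```

With this rule a mail client can still tell that a mailbox has unread mail (mtime is newer than atime), which is the case that noatime broke.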

Relatime works well enough for most systems, but there are still those who would like better atime tracking without paying the performance penalty for it. Some users also dislike the fact that relatime, for all its value, causes the system to not be fully compliant with the POSIX specification. For the most part, people have put up with the minimal deficiencies in relatime (or put up with the cost of atime updates), but there is now an alternative on the horizon.

That alternative takes the form of the lazytime mount option, posted as an ext4-specific patch by Ted Ts'o. With lazytime enabled, a filesystem will keep atime current in a file's in-memory inode. But that inode will not be written to disk until there is some other reason to do so, or until the inode itself is being pushed out of memory. The effect will be to have an atime that is always correct from the point of view of any program running on the system. The version of atime stored on disk may well lag significantly behind reality, though, and the current atime could be lost if the system were to crash.
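A toy model may make the split between the in-memory and on-disk copies easier to see. The class below is purely illustrative: `LazyInode` and its methods are invented names, and it models only the behavior described in the paragraph above, not the ext4 patch itself.

```python
class LazyInode:
    """Toy model of an inode whose atime is updated lazily."""

    def __init__(self, atime=0):
        self.mem_atime = atime    # what running programs see
        self.disk_atime = atime   # what would survive a crash
        self.disk_writes = 0      # number of inode writes issued

    def read(self, now):
        # A read only touches the in-memory copy; no disk write.
        self.mem_atime = now

    def stat_atime(self):
        # stat() is answered from memory, so it is always current.
        return self.mem_atime

    def write_back(self):
        # Called when the inode must be written for another reason,
        # or when it is about to be pushed out of memory.
        self.disk_atime = self.mem_atime
        self.disk_writes += 1

    def crash(self):
        # After a crash only the on-disk value remains.
        self.mem_atime = self.disk_atime
```

Any number of `read()` calls cost zero disk writes in this model, and `stat_atime()` still returns the latest access time; only `crash()` exposes the stale on-disk value.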

Dave Chinner was quick to point out that, while the option looked like it could be useful, the ext4 filesystem might not be the best place to implement it. If lazytime were to be implemented in the virtual filesystem (VFS) layer, then it would be available for all filesystems, not just ext4 and, perhaps most importantly, it would work the same way on all of them. Ted agreed that a VFS implementation might make sense; the next version of this patch seems almost certain to be implemented at that level.

Dave also suggested that delaying the writing of atime updates indefinitely might not be advisable:

However, we'd be fools to ignore the development of relatime, which in it's original form never updated the atime until m/ctime updated. 3 years after it was introduced, relatime was changed to limit the update delay to 24 hours (before it was made the default) so that applications that required regular updates to timestamps didn't silently stop working.

Once again, Ted was amenable to this idea, so the next version will probably write out updated atime values a minimum of once every 24 hours. Without that change, atime updates could be held in memory for months at a time on a system like a database server (which keeps its files open indefinitely).
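The trade-off can be made concrete with a rough count of on-disk atime writes for a file that stays open and is read at regular intervals. This is a simplified sketch, not the patch's logic: `count_atime_writes` is a made-up helper, and it assumes the deferred update is flushed on the first access after the delay has expired, whereas a real implementation would presumably use periodic writeback.

```python
HOUR = 60 * 60
DAY = 24 * HOUR


def count_atime_writes(access_times, max_delay=None):
    """Count atime writes to disk for a file that is never closed.

    max_delay=None  models deferring the update indefinitely.
    max_delay=0     models strict atime (every access is written).
    max_delay=DAY   models a 24-hour cap on the deferral.
    """
    writes = 0
    dirty_since = None  # time of the oldest access not yet on disk
    for t in access_times:
        if dirty_since is None:
            dirty_since = t
        if max_delay is not None and t - dirty_since >= max_delay:
            writes += 1
            dirty_since = None
    return writes


# A file read once an hour for 90 days without ever being closed.
hourly_for_90_days = [h * HOUR for h in range(90 * 24)]
```

Comparing `count_atime_writes(hourly_for_90_days, 0)`, `count_atime_writes(hourly_for_90_days, DAY)`, and `count_atime_writes(hourly_for_90_days)` shows the three regimes: one write per access, roughly one per day, and none at all until something else forces the inode out.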

Finally, there is the question of whether lazytime should become the default mount option. It satisfies POSIX (or, at least, will after another fix or two) without incurring the cost of normal atime updates, so it does seem like a better option than relatime, which is the current default. Ted, seemingly, would like to change the default in the near future, while Dave is a bit more concerned about regressions and would like to wait a couple of years to see how things work out. That led to a question of whether the feature will see enough testing in the meantime, but, as Dave noted, there will probably be enough interest in the feature to ensure that people will try it out.

Whether that is true remains to be seen; relatime works well enough for most users, so there isn't necessarily a crowd of people looking to try a new filesystem mount option. But eventually some of the more adventurous distributions are likely to pick it up; at that point, any latent problems should probably come out before too long. So, when lazytime becomes the default in 2016 or so, it should indeed be well tested and shown to work without problems.

Index entries for this article
Kernel: Filesystems/Access-time tracking



Introducing lazytime

Posted Nov 20, 2014 3:09 UTC (Thu) by kokada (subscriber, #92849) [Link] (1 responses)

By coincidence, yesterday I was reading the article about the relatime option here on LWN (https://lwn.net/Articles/244829/). Someone suggested something similar in the comments (https://lwn.net/Articles/244879/) and another person commented that Solaris already did something similar (https://lwn.net/Articles/244915/).

Introducing lazytime

Posted Dec 4, 2014 10:02 UTC (Thu) by chojrak11 (guest, #52056) [Link]

NTFS has had that for years. And more: for example, while a file is being written, its metadata is not updated on disk; that happens only on close or when another process specifically requests it using an equivalent of stat/fstat(2).

Glad to hear that Linux is finally starting to use these critical performance-increasing tricks.

Mtime, too?

Posted Nov 20, 2014 5:09 UTC (Thu) by ncm (guest, #165) [Link] (8 responses)

Most people are shocked to find that mtime is not updated when you write to a file, but only after you close the file descriptor. If you mmap the file and close the file descriptor, subsequent modifications also do not affect mtime. If you crash before you close the file, mtime will not have been updated, even if the write finished a week before the crash.

It seems as if a similar treatment of mtime could be a big improvement.

Mtime, too?

Posted Nov 20, 2014 6:09 UTC (Thu) by neilbrown (subscriber, #359) [Link] (6 responses)

> Most people are shocked to find that mtime is not updated when you write to a file

Isn't it?
My reading of the code suggests that mtime is updated by calls to file_update_time(), and that ultimately calls mark_inode_dirty_sync() which should cause the on-disk version to be updated fairly promptly.

file_update_time() is called when writing to a file (__generic_file_write_iter) or when making a mmapping writeable (filemap_page_mkwrite).

So while I haven't tested, it looks to me like the mtime is updated at the correct time and is written to storage promptly (not instantly - maybe 30second delay).

What is your evidence?

Mtime, too?

Posted Nov 20, 2014 10:07 UTC (Thu) by epa (subscriber, #39769) [Link]

While a 30 second (or 24 hour) delay may be acceptable for the infrequently-used atime field, failure to update mtime promptly could result in data loss (if there is a crash in the meantime). 'make' wouldn't build the right result, and so on.

One answer might be to add an 'mtime needs updating' flag to the inode. When a file is opened for write access (or left mmapped, even if the file descriptor is closed) then this 'needs updating' flag is set and written to disk immediately. If there is a crash, then on recovery the mtime on all such inodes is set to the current time.

Mtime, too?

Posted Nov 20, 2014 16:37 UTC (Thu) by SEJeff (guest, #51588) [Link] (4 responses)

Looks like the parent is correct. I didn't believe this either, but it only takes ~30 seconds to test. I touched a file, used stat -c %Y to see the mtime, opened it in write mode via the python shell, wrote to it ^Z, ran stat, etc, etc.

$ uname -r
3.8.5-201.fc18.x86_64
$ touch foobar
$ stat -c %Y foobar
1416501313
$ python
Python 2.7.3 (default, Aug 9 2012, 17:23:57)
[GCC 4.7.1 20120720 (Red Hat 4.7.1-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> FH = open('foobar', 'w')
>>>
[1]+ Stopped python
$ stat -c %Y foobar
1416501321
$ fg
python

>>> FH.write('test\n')
>>>
[1]+ Stopped python
$ stat -c %Y foobar
1416501321
$ fg
python

>>> FH.write('test\n')
>>>
[1]+ Stopped python
$ stat -c %Y foobar
1416501321
$ fg
python

>>> FH.close()
>>>
$ stat -c %Y foobar
1416501347

Looks like the evidence points that indeed this is the case. Oh and if you're curious, this is on my Fedora 18 workstation (I'm too lazy to update) with a 3.8.5 kernel.

Mtime, too?

Posted Nov 20, 2014 16:58 UTC (Thu) by mthambi (guest, #51395) [Link] (2 responses)

FH.flush() is needed to make python do the system call.

After that mtime gets updated immediately. I am not sure whether it is actually written to disk.
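The buffering effect is easy to reproduce in a script instead of with ^Z in an interactive session. The sketch below is an illustration only: `mtime_before_and_after_flush` is a made-up name, the file is a throwaway temporary file, and the mtime is first forced to an arbitrary old value (1000) so that any update is unmistakable.

```python
import os
import tempfile


def mtime_before_and_after_flush():
    """Return (mtime while data is still buffered, mtime after flush)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Append mode avoids the truncation that open(..., 'w') performs,
        # which by itself changes mtime.
        with open(path, "ab") as fh:
            os.utime(path, (1000, 1000))  # set a recognizable old mtime
            fh.write(b"test\n")           # stays in Python's buffer
            buffered = os.stat(path).st_mtime
            fh.flush()                    # now write(2) is really called
            flushed = os.stat(path).st_mtime
        return buffered, flushed
    finally:
        os.unlink(path)
```

On a local Linux filesystem the first value should still be 1000 and the second should be the current time, which matches the observation that the earlier test was measuring Python's buffering and not the kernel's handling of mtime.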

Mtime, too?

Posted Nov 20, 2014 20:59 UTC (Thu) by jefftaylor (guest, #95911) [Link] (1 responses)

Alternatively, using the `os` (os.{open|write|close}) module gives direct access to the low level file access primitives. Particularly important when mucking around with HW drivers; the builtin `write` buffers into 4096B chunks which might wreak havoc if your goal is to send data over a wire.

Using the `os` module gives the expected behaviour (linux 3.17.2, F20)

$ uname -r
3.17.2-200.fc20.x86_64
$ touch test
$ stat -c %Y test
1416516686
$ python3
Python 3.3.2 (default, Nov 3 2014, 15:32:43)
[GCC 4.8.3 20140911 (Red Hat 4.8.3-7)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> fd = os.open("test", os.O_WRONLY)
[2]+ Stopped python3
$ stat -c %Y test
1416516686 # Unchanged by os.open
$ fg
python3
>>> os.write(3, b"test\n")
5
[2]+ Stopped python3
$ stat -c %Y test
1416516760 # Changed by os.write
$ fg
python3
>>> os.write(3, b"test2\n")
6
[2]+ Stopped python3
$ stat -c %Y test
1416516780 # Changed by os.write
$ fg
python3
>>> os.close(3)
[2]+ Stopped python3
$ stat -c %Y test
1416516780 # Unchanged by os.close

Mtime, too?

Posted Nov 21, 2014 4:06 UTC (Fri) by zev (subscriber, #88455) [Link]

Or just use the shell...

$ exec 3>foo
$ stat -c%Y foo
1416542579
$ echo bar >&3
$ stat -c%Y foo
1416542590

# note that subsequently closing does *not* update mtime
$ sleep 5
$ exec 3>&-
$ stat -c%Y foo
1416542590

Mtime, too?

Posted Dec 3, 2014 10:06 UTC (Wed) by k8to (guest, #15413) [Link]

On unix, with native filesystems, mtime is updated for every write.

I don't know when the inode is updated on disk.

For more odd ducks like NFS and CIFS, the mtime information viewed by different parties can vary, unfortunately. On modern Windows, the strongly equivalent modification time is only updated on file close, which is almost irrelevant except when writing portable code or when mounting those filesystems on unix.

Mtime, too?

Posted Nov 20, 2014 22:50 UTC (Thu) by reubenhwk (guest, #75803) [Link]

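/* I also had to test this to see for myself.  My C code
 * shows mtime *is* updated as expected...
 */

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write one line to hello.txt, flush it to the kernel, and print the mtime. */
static void write_and_report(const char *mode)
{
	struct stat buf;
	FILE *fp = fopen("hello.txt", mode);

	if (fp == NULL) {
		perror("fopen");
		exit(EXIT_FAILURE);
	}
	fprintf(fp, "Hello World!\n");
	fflush(fp);	/* push the stdio buffer out with write(2) */
	if (stat("hello.txt", &buf) != 0) {
		perror("stat");
		exit(EXIT_FAILURE);
	}
	/* Report mtime while the file is still open. */
	printf("mtime: %ld\n", (long)buf.st_mtime);
	fclose(fp);
}

int main(void)
{
	write_and_report("w");
	sleep(2);	/* make the second mtime visibly later */
	write_and_report("a");
	return 0;
}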

Introducing lazytime

Posted Nov 21, 2014 4:52 UTC (Fri) by steveriley (guest, #83540) [Link] (1 responses)

The article states:

"In current systems, there is a mount option (called "relatime") that mitigates the worst problems caused by atime, but it has a few issues of its own."

What are the "few issues" that relatime has?

Introducing lazytime

Posted Nov 21, 2014 11:15 UTC (Fri) by tialaramex (subscriber, #21167) [Link]

If you rely on atime's official semantics you will be disappointed by relatime which of course does not provide them.

For example, perhaps you intend to preserve files in a cache directory so long as they've been accessed at least once per hour. You have a flush process which checks once per hour, and any files that are more than an hour old and have an atime more than an hour old are deleted. With relatime this feature won't work out of the box. You have to configure relatime to ensure atime is no worse than an hour (the default is a day) wrong.
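A minimal sketch of such a flush process is shown below, to make clear where it depends on atime. Everything in it is hypothetical: the function name `purge_stale`, the use of mtime as a stand-in for "how old the file is", and the flat cache directory layout.

```python
import os
import time

HOUR = 60 * 60


def purge_stale(cache_dir, now=None, max_age=HOUR):
    """Delete cache files that are old and have not been read recently.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for entry in os.scandir(cache_dir):
        if not entry.is_file(follow_symlinks=False):
            continue
        st = entry.stat(follow_symlinks=False)
        old = now - st.st_mtime > max_age      # file itself is old
        unread = now - st.st_atime > max_age   # no recent access recorded
        if old and unread:
            os.unlink(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

The `unread` test is the weak point: under relatime a file that is read every few minutes can still carry an atime up to a day old, so this loop would delete files that are in active use.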

Introducing lazytime

Posted Nov 25, 2014 12:42 UTC (Tue) by nix (subscriber, #2304) [Link] (6 responses)

It's not correct to say that this is just an atime-related change. It affects mtime too: if a file is written to without modifying its inode (say, it's preallocated or you're modifying the middle of it for some reason) *mtime* updates will be put off just like the atime ones.

This is a fairly significant change in semantics, I'd say.

Introducing lazytime

Posted Nov 25, 2014 14:51 UTC (Tue) by mathstuf (subscriber, #69389) [Link] (5 responses)

Any reason mtime wouldn't trigger under the "some other reason" condition here?

> But that inode will not be written to disk until there is some other reason to do so, or until the inode itself is being pushed out of memory.

Introducing lazytime

Posted Nov 25, 2014 18:53 UTC (Tue) by dlang (guest, #313) [Link]

It's also worth pointing out the difference between the inode being modified in RAM and the inode being pushed out to disk.

Unless the system crashes, your software isn't going to see any difference, any filesystem actions you take will see the modified inode.

This change is just affecting when the changes hit disk.

Introducing lazytime

Posted Nov 25, 2014 22:43 UTC (Tue) by nix (subscriber, #2304) [Link] (3 responses)

Because writing to the middle of a non-sparse file won't change any part of the inode other than the mtime -- so the inode wouldn't get written out under those conditions.

Introducing lazytime

Posted Nov 26, 2014 4:31 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (1 responses)

To restate: why would it be invalid for the mtime to not be a trigger to write the inode out to disk? (Maybe I'm missing some key data point here, but I don't think your answer shed any new light for me.)

Introducing lazytime

Posted Nov 27, 2014 17:34 UTC (Thu) by nix (subscriber, #2304) [Link]

My understanding is that after this patch, changes to the inode times (atime, mtime and ctime) are not such triggers, by definition: only changes to other things in the inode are. Clearly it *would* be valid -- heck, I'd prefer it, since unlike atime updates mtime updates don't really constitute a large volume of disk output in normal usage as far as I'm aware -- but that's not what this patch does.

Introducing lazytime

Posted Dec 4, 2014 14:40 UTC (Thu) by nye (subscriber, #51576) [Link]

Ted's actual description is somewhat clearer:

>Add a new mount option which enables a new "lazytime" mode. This mode
>causes atime, mtime, and ctime updates to only be made to the
>in-memory version of the inode. The on-disk times will only get
>updated when (a) when the inode table block for the inode needs to be
>updated for some non-time related change involving any inode in the
>block, (b) if userspace calls fsync(), or (c) the refcount on an
>undeleted inode goes to zero (in most cases, when the last file
>descriptor assoicated with the inode is closed).

So if you want that information to be committed to disk, fsync() will do the job.
Since behaviour when the file is never fsync()ed is not well defined, this isn't *in theory* much of a change.
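For an application that does care, the pattern would look something like the sketch below. It assumes, following the patch description quoted above, that fsync() is what pushes the deferred timestamps out along with the data; `write_durably` is a hypothetical helper name, not an existing API.

```python
import os


def write_durably(path, data):
    """Write data to path and fsync() before closing.

    Returns the file's mtime as seen after the sync.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to commit the data and the inode to storage.
        os.fsync(fd)
    finally:
        os.close(fd)
    return os.stat(path).st_mtime
```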

In practice, the difference roughly means going from
"Data written but not synced before OS crash or power loss will hopefully be written to disk, mostly in the right order if you're lucky, but there are no guarantees" to appending the proviso "except for isolated time updates, which probably won't".

In both cases, you could never be sure that the mtime was updated on disk after writing to the middle of your file, nor even that the write was either completed or not, and the file would need to be checked for consistency. This might, in certain circumstances, reduce the probability that you happen to get lucky, if you're using mtime to determine whether a file needs to be checked after crash/power loss (this scenario seems unlikely on the grounds that it probably won't work a great deal of the time).

Do correct me if I have misunderstood.

Introducing lazytime

Posted Apr 25, 2015 15:45 UTC (Sat) by mcortese (guest, #52099) [Link]

I don't care about performance, but all my disks are SSD and avoiding some writes looks like a good idea. I switched to noatime years ago and have never since regretted it.

Apart from email clients developed centuries ago, what are the "few" applications that really can't work without atime?

Introducing lazytime

Posted Aug 10, 2015 22:48 UTC (Mon) by lsatenstein (guest, #34741) [Link]

I am just an end-user with SSD disks. I would have preferred to have noatime as the absolute default for any fileopen or mount.
But with the open call, I would have added a symbol _FORCE_ATIME, which would reverse the noatime for that particular file or Directory or partition.

Of course, we would have to go through the exercise of review by all the standards associations, but in the end, atime or noatime would be managed by the user. (If open() is sacrosanct, then extend the touch call to control/force atime.)

If I am wrong about this solution for managing atime, please post an explanation to me as to why it is not a good solution.

Thank you

Leslie


Copyright © 2014, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds
