Log File Sync Waits (Doc ID 1376916.1)

This document provides troubleshooting steps for addressing 'log file sync' waits in Oracle databases, detailing the causes and diagnostic methods for high wait times. Key factors include issues with LGWR's I/O performance, excessive application commits, and the size of redo logs. Recommendations are provided for optimizing performance and reducing wait times through various strategies, including examining I/O performance and adjusting commit frequency.

Uploaded by

Daniel Ahmed

Copyright (c) 2024, Oracle. All rights reserved. Oracle Confidential.

Troubleshooting: 'Log file sync' Waits (Doc ID 1376916.1)

In this Document

Purpose
Troubleshooting Steps
What is a 'log file sync' wait?
What should be collected for initial diagnosis of 'log file sync' waits?
What causes high waits for 'log file sync'?
Issues affecting LGWR's IO Performance
Compare the average wait time for 'log file sync' to the average wait time for 'log file parallel write'.
Recommendations
Effect of Peaks of IO (or Bursts of IO) on 'log file sync' waits
Check LGWR Traces
Recommendations
Check to see if redo logs are large enough
Recommendations
Excessive Application Commits
Compare the average user commits and user rollbacks to user calls
Recommendations
Other Wait Events that may be relevant
Adaptive Log File Sync
Redo Synch Time Overhead
Known Issues
Other diagnostics to help analyze 'log file sync' waits?
Data Guard and 'Log File Sync'
Troubleshooting Other Performance Issues
Community Discussions
References

APPLIES TO:

Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Backup Service - Version N/A and later
Oracle Database Cloud Exadata Service - Version N/A and later
Oracle Database Cloud Service - Version N/A and later
Information in this document applies to any platform.

PURPOSE

The purpose of this note is to help troubleshoot issues where there are significant waits for the event 'log file sync'.

TROUBLESHOOTING STEPS

What is a 'log file sync' wait?


When a user session commits, all redo records generated by that session's transaction need to be flushed from memory to the
redo logfile to ensure that the changes made to the database by that transaction become permanent.

At the time of commit, the user session will post LGWR to write the log buffer (containing the current unwritten redo, including
this session's redo records) to the redo log file. Once LGWR knows that its write requests have completed, it will post the user
session to notify it that this has completed. The user session waits on 'log file sync' while waiting for LGWR to post it back to
confirm that all the redo it generated has made it safely onto disk.

The time between the user session posting the LGWR and the LGWR posting the user after the write has completed is the
wait time for 'log file sync' that the user session will show.

Note that in 11.2 and higher LGWR may dynamically switch from the default post/wait mode to a polling mode, in which it
maintains its writing progress in an in-memory structure. Sessions waiting on 'log file sync' can then periodically check that
structure (i.e. poll) to see whether LGWR has progressed far enough that the redo records representing their transactions have
made it to disk. In that case the wait time spans from posting LGWR until the session sees that sufficient progress has been
made.

NOTE: If a sync is ongoing, other sessions that want to commit (and thus flush log information) will also have to wait for
LGWR to complete the sync, and will therefore also wait on 'log file sync'.

What should be collected for initial diagnosis of 'log file sync' waits?

To initially analyze 'log file sync' waits the following information is helpful:

AWR report from a similar time frame and period where time waited for 'log file sync' is "acceptable" in order to use as
a baseline for reasonable performance for comparison purposes
AWR report when "excessive" 'log file sync' waits are occurring
Note: Each of the 2 reports should cover a period of 10-30 minutes.
LGWR trace file (including LGnn traces in 12.2 and higher)
The LGWR trace file will show warning messages for periods when 'redo writing' times may be high.

What causes high waits for 'log file sync'?

Waits for the 'log file sync' event can occur at any stage between a user process posting LGWR to write redo information
and that user process waking up to learn, by receiving a post or by polling, that LGWR has written the redo as requested.
This includes the time LGWR takes to write the redo from the log buffer to disk (to the local redo logs and, optionally,
to remote standby databases in SYNC mode) and the time LGWR takes to post back the user process.
For more information see:

Document:34592.1 WAITEVENT: "log file sync"

In terms of the most common causes, these are:

Issues affecting LGWR's I/O Performance


Excessive Application Commits

Details of these causes and how to troubleshoot them are outlined below:

Issues affecting LGWR's IO Performance

The primary question we are looking to answer here is "Is LGWR slow in writing to disk?". The following steps can assist in
determining whether this is the case or not.

Compare the average wait time for 'log file sync' to the average wait time for 'log file parallel write'.
The wait event 'log file parallel write' is waited for by LGWR while the actual write operation to the redo logs is occurring.
The duration of the event shows the time waited for the IO portion of the operation to complete. For more information on
'log file parallel write' see:

Document:34583.1 WAITEVENT: "log file parallel write" Reference Note

Looking at this event in conjunction with "log file sync" shows how much of the sync operation is spent on IO and also, by
inference, how much processing time is spent on the CPU.

The example above shows high wait times for both 'log file sync' and 'log file parallel write'

If the proportion of the 'log file sync' time spent on 'log file parallel write' is high, then most of the wait time is due to IO
(waiting for the redo to be written). The performance of LGWR in terms of IO should be examined. As a rule of thumb, an
average time for 'log file parallel write' over 5-10 milliseconds, maybe even lower, suggests a problem with the IO subsystem
(the typical time may be much smaller for more modern storage systems with lots of disk caching and/or non-moving parts e.g.
SSD, NVRAM, etc.).
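
The comparison described above can be sketched as a small calculation. This is only an illustration: the function names are hypothetical, the inputs are the two average wait times read by hand from an AWR report, the 5 ms figure is taken from the low end of the rule of thumb above, and the 50% cutoff for "most of the wait is IO" is an assumption rather than an Oracle recommendation.

```python
def io_share_of_sync(avg_sync_ms: float, avg_write_ms: float) -> float:
    """Fraction of the average 'log file sync' wait that is covered by
    the average 'log file parallel write' wait."""
    if avg_sync_ms <= 0:
        raise ValueError("avg_sync_ms must be positive")
    return avg_write_ms / avg_sync_ms


def classify_sync_wait(avg_sync_ms: float, avg_write_ms: float,
                       io_share_cutoff: float = 0.5,  # assumed cutoff
                       slow_write_ms: float = 5.0) -> str:
    """Rough triage of where the 'log file sync' time is going."""
    share = io_share_of_sync(avg_sync_ms, avg_write_ms)
    if share >= io_share_cutoff:
        # Most of the sync time is the redo write itself.
        return "io_slow" if avg_write_ms > slow_write_ms else "io_ok"
    # The surplus is not IO: look at CPU and commit activity instead.
    return "not_io"


# Illustrative numbers only (milliseconds).
print(classify_sync_wait(20.0, 15.0))  # write time dominates and is slow
print(classify_sync_wait(10.0, 1.0))   # write time is a small part of the wait
```

A result of "not_io" points towards the commit-frequency checks later in this note rather than towards the storage.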

Recommendations

Work with the system administrator to examine the file systems / logical volumes where the redologs are located with a
view to improving the performance of IO.
Avoid placing redo logfiles on older or less sophisticated RAID technologies that require the calculation of parity and
writes to multiple disks, such as RAID-5 or RAID-6, where there is very little front-end caching or buffering and no
dedicated CPU resources to mask that overhead.
Avoid placing redo logs on older generations of Solid State Disk (SSD) technologies.
Although Solid State Disk write performance is generally good on average, these devices may experience write peaks that
greatly increase some 'log file sync' waits, which may result in choppy performance or even transient database hangs.
(This should be tested, as there are cases where performance is still acceptable on SSD despite the uneven IO response
times.)
Oracle Engineered Systems (Exadata, SuperCluster and Oracle Database Appliance) have been optimized to leverage
SSDs and newer related technologies more effectively.

Look for other processes that may be writing to the same disk location or general IO paths and ensure that the storage
systems have sufficient bandwidth to cope with the required IO traffic activity. If they do not then consider adding /
modernizing the storage to increase the load it can handle or rebalance the existing IO activity as much as possible
across what is currently available.

Effect of Peaks of IO (or Bursts of IO) on 'log file sync' waits

Log writer (LGWR) tends to write in small bursts of activity as opposed to large chunks of I/O (which is the more typical
pattern). Most disks are not set up to work well with small bursts of activity and this might cause IO waits. But since, on
average, the disks work fine (they are just not working well with the bursts of activity), I/O vendors may report that there is
no issue with the disks. This, along with other I/O on the system, may mean that rather than the I/O performance being
consistent over time, there may be a burst of higher I/O followed by a period of lower I/O (the bursts may result in what are
called IO outliers). When averaged out, the peak information is 'lost' in the mass of lower I/O numbers, but these peaks can
cause a backlog of log data that needs to be written. This manifests itself as high waits for 'log file sync' even though the
average waits for 'log file parallel write' are well within normal I/O tolerances.

If you see a situation where you have high waits for 'log file sync' and the average waits for 'log file parallel write' are well
within normal I/O tolerances, be aware that this could be because there are peaks in the I/O that the average is not showing
you. One place to see this is in the WAIT HISTOGRAM sections of AWR. You could also use something like OSWatcher (see:
Document 301137.1) to identify I/O peaks and then try to correlate these with peaks in 'log file sync' (which would occur
slightly after). If you identify peaks of I/O then work with your storage vendor to deal with the bursts of activity.
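
To see why an average can hide such peaks, consider the following minimal sketch. The sample write times are invented for illustration, `summarize_writes` is a hypothetical helper, and the 500 ms outlier cutoff is simply an assumed value; the real data would come from the AWR wait histogram or from OS-level monitoring.

```python
from statistics import mean


def summarize_writes(write_ms, outlier_ms=500.0):
    """Summarize a list of individual redo write times (milliseconds)."""
    outliers = [t for t in write_ms if t >= outlier_ms]
    return {
        "avg_ms": mean(write_ms),
        "max_ms": max(write_ms),
        "outliers": len(outliers),
    }


# 998 fast writes of 1 ms plus two very slow writes (invented numbers).
samples = [1.0] * 998 + [2000.0, 1000.0]
summary = summarize_writes(samples)
print(summary)
```

Here the average stays just under 4 ms, which looks healthy against a 5-10 ms rule of thumb, yet two writes took a second or more and every session committing during those writes would have waited on 'log file sync'.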

Check LGWR Traces

Even if the average wait for 'log file parallel write' is in the normal range, there may be peaks where the write time is
much longer and will therefore influence waits on 'log file sync'. From 10.2.0.4, messages are written to the LGWR trace when
a redo write takes more than 500 ms. This is quite a high threshold, so a lack of messages does not necessarily mean there is
no problem. The messages look similar to the following:

*** 2011-10-26 10:14:41.718


Warning: log write elapsed time 21130ms, size 1KB
(set event 10468 level 4 to disable this warning)

*** 2011-10-26 10:14:42.929


Warning: log write elapsed time 4916ms, size 1KB
(set event 10468 level 4 to disable this warning)

NOTE: Peaks like those above may not have much influence on the average 'log file parallel write' time if they are few and
far between. However, if 100s of sessions are waiting for the 'log file parallel write' to complete, the total wait for
'log file sync' can be high, as the wait time is multiplied across those 100s of sessions. Therefore it is worth investigating
the reason for the high peaks in IO for the log writer.

See:

Document:601316.1 LGWR Is Generating Trace file with "Warning: Log Write Time 540ms, Size 5444kb" In 10.2.0.4
Database
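
As a rough aid for scanning a trace file for these warnings, the sketch below extracts the timestamp, elapsed time and write size from each message. It assumes the exact message layout shown in the excerpt above (other versions may word the warning differently, as the title of the referenced document suggests), and `parse_lgwr_warnings` is a hypothetical helper, not an Oracle-supplied tool.

```python
import re
from datetime import datetime

# Patterns written for the trace layout shown in the excerpt above.
TS_RE = re.compile(r"^\*\*\* (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")
WARN_RE = re.compile(r"Warning: log write elapsed time (\d+)ms, size (\d+)KB")


def parse_lgwr_warnings(trace_text):
    """Return (timestamp, elapsed_ms, size_kb) for each slow-write warning."""
    results = []
    last_ts = None
    for raw in trace_text.splitlines():
        line = raw.strip()
        ts_match = TS_RE.match(line)
        if ts_match:
            # Remember the most recent '***' timestamp line.
            last_ts = datetime.strptime(ts_match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            continue
        warn_match = WARN_RE.search(line)
        if warn_match:
            results.append((last_ts, int(warn_match.group(1)), int(warn_match.group(2))))
    return results


sample_trace = """*** 2011-10-26 10:14:41.718
Warning: log write elapsed time 21130ms, size 1KB
(set event 10468 level 4 to disable this warning)
*** 2011-10-26 10:14:42.929
Warning: log write elapsed time 4916ms, size 1KB
(set event 10468 level 4 to disable this warning)
"""

warnings = parse_lgwr_warnings(sample_trace)
for ts, elapsed_ms, size_kb in warnings:
    print(ts, elapsed_ms, size_kb)
```

Sorting the result by elapsed time, or bucketing it by hour, makes it easier to correlate the peaks with other activity on the system.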

Recommendations
Work with the system administrator to examine what else is happening at this time that may be causing the peaks in
LGWR writing to disk
Truss of the LGWR process when the slow down is occurring may help identify where time is going

NOTE: These warnings can be particularly useful for preempting potential issues. Even if a general problem in terms of
the average wait time is not being seen, the warnings highlight extreme peaks in IO time and so give the DBA a useful
indicator that LGWR is encountering intermittent issues. These can then be resolved before they cause outages or similar.

Check to see if redo logs are large enough

A 'log file sync' operation is performed every time the redo logs switch to the next log to ensure that everything is written
before the next log is started. Standard recommendations are that a log switch should occur at most once every 15 to 20
minutes. If switches occur more frequently than this, then more 'log file sync' operations will occur meaning more waiting for
individual sessions.

Check the time between log file switches in alert.log

Thu Nov 03 14:57:01 2011


Thread 1 advanced to log sequence 2501 (LGWR switch)
Current log# 5 seq# 2501 mem# 0: /opt/oracle/oradata/orcl/redo05a.log
Current log# 5 seq# 2501 mem# 1: /opt/oracle/logs/orcl/redo05b.log
Thu Nov 03 14:59:12 2011
Thread 1 advanced to log sequence 2502 (LGWR switch)
Current log# 6 seq# 2502 mem# 0: /opt/oracle/oradata/orcl/redo06a.log
Current log# 6 seq# 2502 mem# 1: /opt/oracle/logs/orcl/redo06b.log
Thu Nov 03 15:03:01 2011
Thread 1 advanced to log sequence 2503 (LGWR switch)
Current log# 4 seq# 2503 mem# 0: /opt/oracle/oradata/orcl/redo04a.log
Current log# 4 seq# 2503 mem# 1: /opt/oracle/logs/orcl/redo04b.log

In the above example we see log switches every 2 to 4 minutes, which is roughly 4 to 10 times more frequent than the
recommended interval of 15 to 20 minutes.
You can also check the average time for log switch in the AWR report

The example above shows that, based on the information in AWR, there are 29.98 redo log switches per hour: ~1
switch every 2 minutes. This is far more frequent than the accepted value of 1 switch every 15-20 minutes and will have an
effect on the time foreground processes need to wait for 'log file sync' waits to complete, because of the overhead of
initiating the sync operation more often than necessary.
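
Both checks above (the alert.log timestamps and the AWR switches-per-hour figure) reduce to simple arithmetic, sketched below. The helper names are hypothetical, the sample text is illustrative with all entries on the same day, and the parser assumes the older "Thu Nov 03 14:59:12 2011" timestamp layout with English day and month names; alert logs that use a different timestamp format would need a different format string.

```python
from datetime import datetime

TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Thu Nov 03 14:59:12 2011"


def switch_intervals_minutes(alert_text):
    """Minutes between consecutive 'advanced to log sequence' entries."""
    switch_times = []
    last_ts = None
    for raw in alert_text.splitlines():
        line = raw.strip()
        try:
            last_ts = datetime.strptime(line, TS_FORMAT)
            continue
        except ValueError:
            pass  # not a timestamp line
        if "advanced to log sequence" in line and last_ts is not None:
            switch_times.append(last_ts)
    return [(later - earlier).total_seconds() / 60.0
            for earlier, later in zip(switch_times, switch_times[1:])]


def minutes_per_switch(switches_per_hour):
    """Convert an AWR-style 'log switches per hour' figure to minutes."""
    return 60.0 / switches_per_hour


sample_alert = """Thu Nov 03 14:57:01 2011
Thread 1 advanced to log sequence 2501 (LGWR switch)
Thu Nov 03 14:59:12 2011
Thread 1 advanced to log sequence 2502 (LGWR switch)
Thu Nov 03 15:03:01 2011
Thread 1 advanced to log sequence 2503 (LGWR switch)
"""

intervals = switch_intervals_minutes(sample_alert)
print([round(m, 2) for m in intervals])
print(round(minutes_per_switch(29.98), 2))
```

Intervals well under 15 minutes, as in this sample, suggest the redo logs are too small for the redo generation rate.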

Recommendations

Increase the size of the redo logs


Document:602066.1 How To Maintain and/or Add Redo Logs
Document:779306.1 How To Add/Increase The Size Of Redo Log Files In Rac Environment?

Excessive Application Commits

In this case the question to answer is "Is the Application Committing too Frequently?".
If it is, then the excessive commit activity can cause performance issues, since commits flush redo from the log buffer to the
redo logs, which can cause waits for 'log file sync'.

To identify a potentially high commit rate, compare the average wait times: if the average wait time for 'log file sync' is
much higher than the average wait time for 'log file parallel write', then most of the time waiting is not spent waiting for
the redo to be written, and thus slow IO is not the cause of the problem. The surplus time is CPU activity and is most
commonly contention caused by over-committing.

Additionally, if the average time waited on 'log file sync' is low, but the number of waits is high, then the application might be
committing too frequently.

Compare the average user commits and user rollbacks to user calls

In the AWR or Statspack report, if the average user calls per commit/rollback calculated as "user calls/(user commits+user
rollbacks)" is less than 30, then commits are happening too frequently:

In the above example we see an average of 5.76 user calls per commit, which means commits are very frequent: about 5x
more frequent than recommended.
As a rule of thumb, we should expect at least 25 user calls per commit. This of course depends on the application design.
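
The ratio above is straightforward to compute from the AWR or Statspack statistics. In the sketch below the function names are hypothetical and the input counts are invented so that they reproduce the 5.76 figure; the default threshold of 30 is the value quoted in the formula above.

```python
def user_calls_per_commit(user_calls, user_commits, user_rollbacks):
    """user calls / (user commits + user rollbacks)."""
    transactions = user_commits + user_rollbacks
    if transactions == 0:
        raise ValueError("no commits or rollbacks in the interval")
    return user_calls / transactions


def commits_too_frequent(user_calls, user_commits, user_rollbacks, threshold=30.0):
    """True when the ratio falls below the threshold used in this note."""
    return user_calls_per_commit(user_calls, user_commits, user_rollbacks) < threshold


# Invented counts chosen to give 5.76 user calls per commit/rollback.
ratio = user_calls_per_commit(user_calls=5760, user_commits=990, user_rollbacks=10)
print(round(ratio, 2))
print(commits_too_frequent(5760, 990, 10))
```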

Recommendations

If there are lots of short duration transactions, see if it is possible to group transactions together so there are fewer
distinct COMMIT operations. Since it is mandatory for each commit to receive confirmation that the relevant REDO is on
disk, additional commits can add significantly to the overhead. Although commits can be "piggybacked" by Oracle,
reducing the overall number of commits by batching transactions can have a very beneficial effect.
See if any of the processing can use the COMMIT WRITE NOWAIT option, where the commit returns without waiting for the
redo to be written to disk (be sure to understand the semantics of this before writing application code to use it).
See if any activity can safely be done with NOLOGGING / UNRECOVERABLE options.
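
The first recommendation, grouping work so that there are fewer distinct COMMIT operations, can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely as a runnable stand-in for a database connection, so it says nothing about Oracle timings; the table, the helper names and the batch size of 100 are all assumptions made for the example. The point is only the pattern: commit once per batch rather than once per row.

```python
import sqlite3


def new_orders_db():
    """Create a throwaway in-memory database with a hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    return conn


def insert_rows(conn, rows, batch_size):
    """Insert rows, committing once every batch_size rows.

    Returns the number of COMMIT operations issued.
    """
    commits = 0
    pending = 0
    for row in rows:
        conn.execute("INSERT INTO orders (id, amount) VALUES (?, ?)", row)
        pending += 1
        if pending == batch_size:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        # Commit whatever is left over from the last partial batch.
        conn.commit()
        commits += 1
    return commits


rows = [(i, i * 1.5) for i in range(1, 1001)]
print(insert_rows(new_orders_db(), rows, batch_size=1))    # one commit per row
print(insert_rows(new_orders_db(), rows, batch_size=100))  # one commit per 100 rows
```

Whether batching is acceptable depends on the application: rows in an uncommitted batch are lost together if the session fails, so the batch boundary must match a unit of work the application can safely redo.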

Other Wait Events that may be relevant

Check the AWR report to see if there are other events related to the LGWR that are also shown to be taking a significant
amount of time as this may give a clue as to what is causing the problem.
Both foreground and background events should be checked.
For example the following AWR shows high waits for other foreground and background wait events that may actually indicate
a problem with the transfer of redo logs to a remote location and result in the foreground processes waiting on "log file sync".

Adaptive Log File Sync

Adaptive Log File sync was introduced in 11.2. The parameter controlling this feature, _use_adaptive_log_file_sync, is set to
false by default in 11.2.0.1 and 11.2.0.2.
In 11.2.0.3 the default is now true. When enabled, Oracle can switch between the 2 methods:

Post/wait, traditional method for posting completion of writes to redo log


Polling, a new method where the foreground process checks if the LGWR has completed the write.

For more information on this feature see:

Document 1541136.1 Waits for "log file sync" with Adaptive Polling vs Post/Wait Choice Enabled

Redo Synch Time Overhead

The statistic 'redo synch time overhead', included in 11.2.0.4 and 12c, records the difference between ideal and actual
'log file sync' times. If the difference is small, then large 'log file sync' waits may be attributed to outlier 'log file
parallel write' times.

Community Discussions

Still have questions? Consider posting a discussion in the Database Tuning Community.
