Commit c780a7a

Add CheckBuffer() to check on-disk pages without shared buffer loading
CheckBuffer() is designed to be a concurrent-safe function able to run sanity checks on a relation page without loading it into the shared buffers. The operation is done using a lock on the partition involved in the shared buffer mapping hashtable and an I/O lock for the buffer itself, preventing the risk of false positives due to any concurrent activity.

The primary use of this function is the detection of on-disk corruptions for relation pages. If a page is found in shared buffers, the on-disk page is checked if not dirty (a follow-up checkpoint would flush a valid version of the page if dirty anyway), as it could be possible that a page was present for a long time in shared buffers with its on-disk version corrupted. Such a scenario could lead to a corrupted cluster if a host is plugged off for example. If the page is not found in shared buffers, its on-disk state is checked. PageIsVerifiedExtended() is used to apply the same sanity checks as when a page gets loaded into shared buffers.

This function will be used by an upcoming patch able to check the state of on-disk relation pages using a SQL function.

Author: Julien Rouhaud, Michael Paquier
Reviewed-by: Masahiko Sawada
Discussion: https://postgr.es/m/CAOBaU_aVvMjQn=ge5qPiJOPMmOj5=ii3st5Q0Y+WuLML5sR17w@mail.gmail.com
1 parent 9e0f87a commit c780a7a

2 files changed: +95 -0 lines changed
src/backend/storage/buffer/bufmgr.c

Lines changed: 92 additions & 0 deletions
@@ -4585,3 +4585,95 @@ TestForOldSnapshot_impl(Snapshot snapshot, Relation relation)
 				(errcode(ERRCODE_SNAPSHOT_TOO_OLD),
 				 errmsg("snapshot too old")));
 }
+
+
+/*
+ * CheckBuffer
+ *
+ * Check the state of a buffer without loading it into the shared buffers. To
+ * avoid torn pages and possible false positives when reading data, a shared
+ * LWLock is taken on the target buffer pool partition mapping, and we check
+ * if the page is in shared buffers or not. An I/O lock is taken on the block
+ * to prevent any concurrent activity from happening.
+ *
+ * If the page is found as dirty in the shared buffers, it is ignored as
+ * it will be flushed to disk either before the end of the next checkpoint
+ * or during recovery in the event of an unsafe shutdown.
+ *
+ * If the page is found in the shared buffers but is not dirty, we still
+ * check the state of its data on disk, as it could be possible that the
+ * page stayed in shared buffers for a rather long time while the on-disk
+ * data got corrupted.
+ *
+ * If the page is not found in shared buffers, the block is read from disk
+ * while holding the buffer pool partition mapping LWLock.
+ *
+ * The page data is stored in a private memory area local to this function
+ * while running the checks.
+ */
+bool
+CheckBuffer(SMgrRelation smgr, ForkNumber forknum, BlockNumber blkno)
+{
+	char		buffer[BLCKSZ];
+	BufferTag	buf_tag;		/* identity of requested block */
+	uint32		buf_hash;		/* hash value for buf_tag */
+	LWLock	   *partLock;		/* buffer partition lock for the buffer */
+	BufferDesc *bufdesc;
+	int			buf_id;
+
+	Assert(smgrexists(smgr, forknum));
+
+	/* create a tag so we can look after the buffer */
+	INIT_BUFFERTAG(buf_tag, smgr->smgr_rnode.node, forknum, blkno);
+
+	/* determine its hash code and partition lock ID */
+	buf_hash = BufTableHashCode(&buf_tag);
+	partLock = BufMappingPartitionLock(buf_hash);
+
+	/* see if the block is in the buffer pool or not */
+	LWLockAcquire(partLock, LW_SHARED);
+	buf_id = BufTableLookup(&buf_tag, buf_hash);
+	if (buf_id >= 0)
+	{
+		uint32		buf_state;
+
+		/*
+		 * Found it. Now, retrieve its state to know what to do with it, and
+		 * release the pin immediately. We do so to limit overhead as much as
+		 * possible. We keep the shared LWLock on the target buffer mapping
+		 * partition for now, so this buffer cannot be evicted, and we acquire
+		 * an I/O Lock on the buffer as we may need to read its contents from
+		 * disk.
+		 */
+		bufdesc = GetBufferDescriptor(buf_id);
+
+		LWLockAcquire(BufferDescriptorGetIOLock(bufdesc), LW_SHARED);
+		buf_state = LockBufHdr(bufdesc);
+		UnlockBufHdr(bufdesc, buf_state);
+
+		/* If the page is dirty or invalid, skip it */
+		if ((buf_state & BM_DIRTY) != 0 || (buf_state & BM_TAG_VALID) == 0)
+		{
+			LWLockRelease(BufferDescriptorGetIOLock(bufdesc));
+			LWLockRelease(partLock);
+			return true;
+		}
+
+		/* Read the buffer from disk, with the I/O lock still held */
+		smgrread(smgr, forknum, blkno, buffer);
+		LWLockRelease(BufferDescriptorGetIOLock(bufdesc));
+	}
+	else
+	{
+		/*
+		 * Simply read the buffer. There's no risk of modification on it as
+		 * we are holding the buffer pool partition mapping lock.
+		 */
+		smgrread(smgr, forknum, blkno, buffer);
+	}
+
+	/* buffer lookup done, so now do its check */
+	LWLockRelease(partLock);
+
+	return PageIsVerifiedExtended(buffer, blkno, PIV_REPORT_STAT);
+}
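The patch takes a single partition lock derived from the buffer tag's hash, so only lookups that land in the same partition are serialized against the check. A minimal sketch of that idea follows, under loud assumptions: the partition is taken to be the hash value modulo a fixed partition count, and `sketch_partition_for_hash` and `sketch_same_partition` are hypothetical helpers, not PostgreSQL functions. The partition count is passed in by the caller rather than taken from the server's configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Map a buffer-tag hash to a partition index.
 * Assumption for this sketch: partition = hash modulo partition count.
 */
static uint32_t
sketch_partition_for_hash(uint32_t hash, uint32_t num_partitions)
{
    assert(num_partitions > 0);
    return hash % num_partitions;
}

/*
 * Two blocks contend on the same mapping lock only when their hashes
 * fall into the same partition.
 */
static bool
sketch_same_partition(uint32_t hash_a, uint32_t hash_b,
                      uint32_t num_partitions)
{
    return sketch_partition_for_hash(hash_a, num_partitions) ==
           sketch_partition_for_hash(hash_b, num_partitions);
}
```

With 128 partitions, for instance, hashes 5 and 133 would share a lock in this model while hashes 5 and 6 would not, which is why holding one partition lock in shared mode leaves most other buffer lookups unaffected.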

src/include/storage/bufmgr.h

Lines changed: 3 additions & 0 deletions
@@ -240,6 +240,9 @@ extern void AtProcExit_LocalBuffers(void);
 
 extern void TestForOldSnapshot_impl(Snapshot snapshot, Relation relation);
 
+extern bool CheckBuffer(struct SMgrRelationData *smgr, ForkNumber forknum,
+						BlockNumber blkno);
+
 /* in freelist.c */
 extern BufferAccessStrategy GetAccessStrategy(BufferAccessStrategyType btype);
 extern void FreeAccessStrategy(BufferAccessStrategy strategy);
