From:   Dave Chinner <david@fromorbit.com>
To:     linux-xfs@vger.kernel.org
Cc:     xfs@oss.sgi.com
Subject: [PATCH] xfs: quiesce the filesystem after recovery on readonly mount
Date:   Fri, 23 Sep 2016 10:11:40 +1000

From: Dave Chinner <dchinner@redhat.com>

Recently we've had a number of reports where log recovery on a v5
filesystem has reported corruptions that looked to be caused by
recovery being re-run over the top of an already-recovered
metadata. This has uncovered a bug in recovery (fixed elsewhere)
but the vector that caused this was largely unknown.

A kdump test started tripping over this problem - the system
would be crashed, the kdump kernel and environment would boot and
dump the kernel core image, and then the system would reboot. After
reboot, the root filesystem was triggering log recovery and
corruptions were being detected. The metadumps indicated the above
log recovery issue.

What is happening is that the kdump kernel and environment is
mounting the root device read-only to find the binaries needed to do
it's work. The result of this is that it is running log recovery.
However, because there were unlinked files and EFIs to be processed
by recovery, the completion of phase 1 of log recovery could not
mark the log clean. And because it's a read-only mount, the unmount
process does not write records to the log to mark it clean, either.
Hence on the next mount of the filesystem, log recovery was run
again across all the metadata that had already been recovered and
this is what triggered corruption warnings.

To avoid this problem, we need to ensure that a read-only mount
always updates the log when it completes the second phase of
recovery. We already handle this sort of issue with rw->ro remount
transitions, so the solution is as simple as quiescing the
filesystem at the appropriate time during the mount process. This
results in the log being marked clean so the mount behaviour
recorded in the logs on repeated RO mounts will change (i.e. log
recovery will no longer be run on every mount until a RW mount is
done). This is a user visible change in behaviour, but it is
harmless.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_mount.c | 14 ++++++++++++++
 fs/xfs/xfs_super.c |  2 +-
 fs/xfs/xfs_super.h |  1 +
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index faeead6..56e85a6 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -934,6 +934,20 @@ xfs_mountfs(
 	}
 
 	/*
+	 * Now the log is fully replayed, we can transition to full read-only
+	 * mode for read-only mounts. This will sync all the metadata and clean
+	 * the log so that the recovery we just performed does not have to be
+	 * replayed again on the next mount.
+	 *
+	 * We use the same quiesce mechanism as the rw->ro remount, as they are
+	 * semantically identical operations.
+	 */
+	if ((mp->m_flags & (XFS_MOUNT_RDONLY|XFS_MOUNT_NORECOVERY)) ==
+							XFS_MOUNT_RDONLY) {
+		xfs_quiesce_attr(mp);
+	}
+
+	/*
 	 * Complete the quota initialisation, post-log-replay component.
 	 */
 	if (quotamount) {
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 3409753..2d092f9 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1137,7 +1137,7 @@ xfs_restore_resvblks(struct xfs_mount *mp)
  * Note: xfs_log_quiesce() stops background log work - the callers must ensure
  * it is started again when appropriate.
  */
-static void
+void
 xfs_quiesce_attr(
 	struct xfs_mount	*mp)
 {
diff --git a/fs/xfs/xfs_super.h b/fs/xfs/xfs_super.h
index 529bce9..b6418ab 100644
--- a/fs/xfs/xfs_super.h
+++ b/fs/xfs/xfs_super.h
@@ -61,6 +61,7 @@ struct xfs_mount;
 struct xfs_buftarg;
 struct block_device;
 
+extern void xfs_quiesce_attr(struct xfs_mount *mp);
 extern void xfs_flush_inodes(struct xfs_mount *mp);
 extern void xfs_blkdev_issue_flush(struct xfs_buftarg *);
 extern xfs_agnumber_t xfs_set_inode_alloc(struct xfs_mount *,
From:   Eryu Guan <eguan@redhat.com>
To:     linux-xfs@vger.kernel.org
Cc:     xfs@oss.sgi.com, Eryu Guan <eguan@redhat.com>
Subject: [PATCH v2] xfs: undo block reservation correctly in xfs_trans_reserve()
Date:   Tue,  6 Sep 2016 20:14:40 +0800

"blocks" should be added back to fdblocks at undo time, not taken
away, i.e. the minus sign should not be used.

This is a regression introduced by commit 0d485ada404b ("xfs: use
generic percpu counters for free block counter"). And it's found by
code inspection, I didn't it in real world, so there's no
reproducer.

Signed-off-by: Eryu Guan <eguan@redhat.com>
---
v2:
- Remove "Fixes:" tag and describe relevant commit in commit log
- Update commit log to mention that it's found by code inspection
- Remove outer () from the "int64_t" cast

 fs/xfs/xfs_trans.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 5f3d33d..836eb80 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -217,7 +217,7 @@ undo_log:
 
 undo_blocks:
 	if (blocks > 0) {
-		xfs_mod_fdblocks(tp->t_mountp, -((int64_t)blocks), rsvd);
+		xfs_mod_fdblocks(tp->t_mountp, (int64_t)blocks, rsvd);
 		tp->t_blk_res = 0;
 	}