Commit ab93b4b

robn authored and behlendorf committed
linux/super: add tunable to request immediate reclaim of unused inodes

Traditionally, unused inodes would be held on the superblock inode cache until the associated on-disk file is removed or the kernel requests reclaim. On filesystems with millions of rarely-used files, this can be a lot of unusable memory.

Here we implement the superblock drop_inode method, and add a zfs_delete_inode tunable to control its behaviour. By default it continues the traditional behaviour, but when the tunable is enabled, we signal that the inode should be deleted immediately when the last reference is dropped, rather than cached. This releases the associated data to the dbuf cache and ARC, allowing them to be reclaimed normally.

Sponsored-by: Klara, Inc.
Sponsored-by: Fastmail Pty Ltd
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes #17746

1 parent ffe93ae commit ab93b4b

2 files changed: 62 additions and 5 deletions

man/man4/zfs.4

Lines changed: 22 additions & 1 deletion
@@ -18,7 +18,7 @@
 .\" own identifying information:
 .\" Portions Copyright [yyyy] [name of copyright owner]
 .\"
-.Dd August 14, 2025
+.Dd September 15, 2025
 .Dt ZFS 4
 .Os
 .
@@ -2583,6 +2583,27 @@ the xattr so as to not accumulate duplicates.
 .It Sy zio_requeue_io_start_cut_in_line Ns = Ns Sy 0 Ns | Ns 1 Pq int
 Prioritize requeued I/O.
 .
+.It Sy zfs_delete_inode Ns = Ns Sy 0 Ns | Ns 1 Pq int
+Sets whether the kernel should free an inode structure when the last reference
+is released, or cache it in memory.
+Intended for testing/debugging.
+.Pp
+A live inode structure "pins" various internal OpenZFS structures in memory,
+which can result in large amounts of "unusable" memory on systems with lots of
+infrequently-accessed files, until the kernel's memory pressure mechanism
+asks OpenZFS to release them.
+.Pp
+The default value of
+.Sy 0
+always caches inodes that appear to still exist on disk.
+Setting it to
+.Sy 1
+will immediately release unused inodes and their associated memory back to the
+dbuf cache or the ARC for reuse, but may reduce performance if inodes are
+frequently evicted and reloaded.
+.Pp
+This parameter is only available on Linux.
+.
 .It Sy zio_taskq_batch_pct Ns = Ns Sy 80 Ns % Pq uint
 Percentage of online CPUs which will run a worker thread for I/O.
 These workers are responsible for I/O work such as compression, encryption,
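
The man page entry above describes a 0/1 tunable that the commit registers as read-write. As a minimal sketch of inspecting or flipping it from userspace, the helpers below read and write an integer parameter file. The helper names `read_int_param` and `write_int_param` are hypothetical, and the path `/sys/module/zfs/parameters/zfs_delete_inode` is an assumption based on where Linux normally exposes module parameters; it is not stated in the commit. Writing typically requires root.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read an integer parameter from a sysfs-style text file.
 * Returns 0 and stores the value in *out on success, -1 on failure.
 */
int
read_int_param(const char *path, int *out)
{
	FILE *fp;
	int ok;

	assert(path != NULL && out != NULL);
	fp = fopen(path, "r");
	if (fp == NULL)
		return (-1);
	ok = (fscanf(fp, "%d", out) == 1);
	fclose(fp);
	return (ok ? 0 : -1);
}

/*
 * Write an integer parameter to a sysfs-style text file.
 * Errors from the kernel are usually reported when the buffered
 * write is flushed, so the fclose() result is checked as well.
 * Returns 0 on success, -1 on failure.
 */
int
write_int_param(const char *path, int value)
{
	FILE *fp;
	int ok;

	assert(path != NULL);
	fp = fopen(path, "w");
	if (fp == NULL)
		return (-1);
	ok = (fprintf(fp, "%d\n", value) > 0);
	if (fclose(fp) != 0)
		ok = 0;
	return (ok ? 0 : -1);
}
```

Under the path assumption above, calling `write_int_param("/sys/module/zfs/parameters/zfs_delete_inode", 1)` would request immediate release of unused inodes, and writing `0` would restore the default caching behaviour.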

module/os/linux/zfs/zpl_super.c

Lines changed: 40 additions & 4 deletions
@@ -22,6 +22,7 @@
 /*
  * Copyright (c) 2011, Lawrence Livermore National Security, LLC.
  * Copyright (c) 2023, Datto Inc. All rights reserved.
+ * Copyright (c) 2025, Klara, Inc.
  */
 
 
@@ -33,6 +34,12 @@
 #include <linux/iversion.h>
 #include <linux/version.h>
 
+/*
+ * What to do when the last reference to an inode is released. If 0, the kernel
+ * will cache it on the superblock. If 1, the inode will be freed immediately.
+ * See zpl_drop_inode().
+ */
+int zfs_delete_inode = 0;
 
 static struct inode *
 zpl_inode_alloc(struct super_block *sb)
@@ -77,11 +84,36 @@ zpl_dirty_inode(struct inode *ip, int flags)
 }
 
 /*
- * When ->drop_inode() is called its return value indicates if the
- * inode should be evicted from the inode cache. If the inode is
- * unhashed and has no links the default policy is to evict it
- * immediately.
+ * ->drop_inode() is called when the last reference to an inode is released.
+ * Its return value indicates if the inode should be destroyed immediately, or
+ * cached on the superblock structure.
+ *
+ * By default (zfs_delete_inode=0), we call generic_drop_inode(), which returns
+ * "destroy immediately" if the inode is unhashed and has no links (roughly: no
+ * longer exists on disk). On datasets with millions of rarely-accessed files,
+ * this can cause a large amount of memory to be "pinned" by cached inodes,
+ * which in turn pin their associated dnodes and dbufs, until the kernel starts
+ * reporting memory pressure and requests OpenZFS release some memory (see
+ * zfs_prune()).
  *
+ * When set to 1, we call generic_delete_inode(), which always returns "destroy
+ * immediately", resulting in inodes being destroyed immediately, releasing
+ * their associated dnodes and dbufs to the dbuf cache and the ARC to be
+ * evicted as normal.
+ *
+ * Note that the "last reference" doesn't always mean the last _userspace_
+ * reference; the dentry cache also holds a reference, so "busy" inodes will
+ * still be kept alive that way (subject to dcache tuning).
+ */
+static int
+zpl_drop_inode(struct inode *ip)
+{
+	if (zfs_delete_inode)
+		return (generic_delete_inode(ip));
+	return (generic_drop_inode(ip));
+}
+
+/*
  * The ->evict_inode() callback must minimally truncate the inode pages,
  * and call clear_inode(). For 2.6.35 and later kernels this will
  * simply update the inode state, with the sync occurring before the
@@ -470,6 +502,7 @@ const struct super_operations zpl_super_operations = {
 	.destroy_inode = zpl_inode_destroy,
 	.dirty_inode = zpl_dirty_inode,
 	.write_inode = NULL,
+	.drop_inode = zpl_drop_inode,
 	.evict_inode = zpl_evict_inode,
 	.put_super = zpl_put_super,
 	.sync_fs = zpl_sync_fs,
@@ -491,3 +524,6 @@ struct file_system_type zpl_fs_type = {
 	.mount = zpl_mount,
 	.kill_sb = zpl_kill_sb,
 };
+
+ZFS_MODULE_PARAM(zfs, zfs_, delete_inode, INT, ZMOD_RW,
+    "Delete inodes as soon as the last reference is released.");

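
To make the policy in zpl_drop_inode() easier to reason about outside the kernel, here is a small userspace model of the decision. Everything named `model_*` is hypothetical and exists only for illustration. The body of `model_generic_drop_inode()` reflects my understanding that the kernel helper treats either a zero link count or an unhashed inode as sufficient for immediate destruction; that detail is an assumption about the kernel, not something shown in this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two pieces of inode state that matter here. */
struct model_inode {
	unsigned int nlink;	/* link count; 0 means unlinked */
	bool hashed;		/* still present in the inode hash */
};

/*
 * Assumed model of generic_drop_inode(): destroy immediately only if
 * the inode has no links or is no longer hashed; otherwise cache it.
 */
int
model_generic_drop_inode(const struct model_inode *ip)
{
	assert(ip != NULL);
	return (ip->nlink == 0 || !ip->hashed);
}

/* Model of generic_delete_inode(): always "destroy immediately". */
int
model_generic_delete_inode(const struct model_inode *ip)
{
	assert(ip != NULL);
	return (1);
}

/*
 * Model of zpl_drop_inode(), with the tunable passed in explicitly
 * rather than read from a global. Returns 1 for "destroy immediately"
 * and 0 for "keep cached on the superblock".
 */
int
model_zpl_drop_inode(const struct model_inode *ip, int delete_inode)
{
	if (delete_inode)
		return (model_generic_delete_inode(ip));
	return (model_generic_drop_inode(ip));
}
```

In this model a linked, hashed inode is cached when the tunable is 0 and destroyed when it is 1, which is the only case where the two settings differ. Unlinked or unhashed inodes are destroyed under either setting.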
0 commit comments