summaryrefslogtreecommitdiff
path: root/fs/ceph
Commit message (Collapse)AuthorAge
* ceph: always update atime/mtime/ctime for new inodeYan, Zheng2018-04-16
| | | | | | | | | | For new inode, atime/mtime/ctime are uninitialized. Don't compare against them. Cc: stable@kernel.org Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* Merge tag 'ceph-for-4.17-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds2018-04-10
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ceph updates from Ilya Dryomov: "The big ticket items are: - support for rbd "fancy" striping (myself). The striping feature bit is now fully implemented, allowing mapping v2 images with non-default striping patterns. This completes support for --image-format 2. - CephFS quota support (Luis Henriques and Zheng Yan). This set is based on the new SnapRealm code in the upcoming v13.y.z ("Mimic") release. Quota handling will be rejected on older filesystems. - memory usage improvements in CephFS (Chengguang Xu). Directory specific bits have been split out of ceph_file_info and some effort went into improving cap reservation code to avoid OOM crashes. Also included a bunch of assorted fixes all over the place from Chengguang and others" * tag 'ceph-for-4.17-rc1' of git://github.com/ceph/ceph-client: (67 commits) ceph: quota: report root dir quota usage in statfs ceph: quota: add counter for snaprealms with quota ceph: quota: cache inode pointer in ceph_snap_realm ceph: fix root quota realm check ceph: don't check quota for snap inode ceph: quota: update MDS when max_bytes is approaching ceph: quota: support for ceph.quota.max_bytes ceph: quota: don't allow cross-quota renames ceph: quota: support for ceph.quota.max_files ceph: quota: add initial infrastructure to support cephfs quotas rbd: remove VLA usage rbd: fix spelling mistake: "reregisteration" -> "reregistration" ceph: rename function drop_leases() to a more descriptive name ceph: fix invalid point dereference for error case in mdsc destroy ceph: return proper bool type to caller instead of pointer ceph: optimize memory usage ceph: optimize mds session register libceph, ceph: add __init attribution to init funcitons ceph: filter out used flags when printing unused open flags ceph: don't wait on writeback when there is no more dirty pages ...
| * ceph: quota: report root dir quota usage in statfsLuis Henriques2018-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit changes statfs default behaviour when reporting usage statistics. Instead of using the overall filesystem usage, statfs now reports the quota for the filesystem root, if ceph.quota.max_bytes has been set for this inode. If quota hasn't been set, it falls back to the old statfs behaviour. A new mount option is also added ('noquotadf') to disable this behaviour. Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: quota: add counter for snaprealms with quotaLuis Henriques2018-04-02
| | | | | | | | | | | | | | | | | | | | | | By keeping a counter with the number of snaprealms that have quota set allows to optimize the functions that need to walk throught the realms hierarchy looking for quotas. Thus, if this counter is zero it's safe to assume that there are no realms with quota. Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: quota: cache inode pointer in ceph_snap_realmLuis Henriques2018-04-02
| | | | | | | | | | | | | | | | | | Keep a pointer to the inode in struct ceph_snap_realm. This allows to optimize functions that walk the realms hierarchy (e.g. in quotas). Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: fix root quota realm checkYan, Zheng2018-04-02
| | | | | | | | | | Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: don't check quota for snap inodeYan, Zheng2018-04-02
| | | | | | | | | | | | | | snap inode's i_snap_realm is not pointing to ceph_snap_realm. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: quota: update MDS when max_bytes is approachingLuis Henriques2018-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | When we're reaching the ceph.quota.max_bytes limit, i.e., when writing more than 1/16th of the space left in a quota realm, update the MDS with the new file size. This mirrors the fuse-client approach with commit 122c50315ed1 ("client: Inform mds file size when approaching quota limit"), in the ceph git tree. Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: quota: support for ceph.quota.max_bytesLuis Henriques2018-04-02
| | | | | | | | | | | | Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: quota: don't allow cross-quota renamesLuis Henriques2018-04-02
| | | | | | | | | | | | | | | | | | | | This patch changes ceph_rename so that -EXDEV is returned if an attempt is made to mv a file between two different dir trees with different quotas setup. Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: quota: support for ceph.quota.max_filesLuis Henriques2018-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for the max_files quota. It hooks into all the ceph functions that add new filesystem objects that need to be checked against the quota limits. When these limits are hit, -EDQUOT is returned. Note that we're not checking quotas on ceph_link(). ceph_link doesn't really create a new inode, and since the MDS doesn't update the directory statistics when a new (hard) link is created (only with symlinks), they are not accounted as a new file. Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: quota: add initial infrastructure to support cephfs quotasLuis Henriques2018-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the infrastructure required to support cephfs quotas as it is currently implemented in the ceph fuse client. Cephfs quotas can be set on any directory, and can restrict the number of bytes or the number of files stored beneath that point in the directory hierarchy. Quotas are set using the extended attributes 'ceph.quota.max_files' and 'ceph.quota.max_bytes', and can be removed by setting these attributes to '0'. Link: http://tracker.ceph.com/issues/22372 Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: rename function drop_leases() to a more descriptive nameYan, Zheng2018-04-02
| | | | | | | | | | Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: fix invalid point dereference for error case in mdsc destroyChengguang Xu2018-04-02
| | | | | | | | | | | | | | | | | | | | 1. set fsc->mdsc after successfully allocate all necessary memory in mdsc init. 2. if fsc->mdsc is NULL, just skip destroy operation in mdsc destroy. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: return proper bool type to caller instead of pointerChengguang Xu2018-04-02
| | | | | | | | | | | | | | | | Change to return true/false only for bool type return code. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: optimize memory usageChengguang Xu2018-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In current code, regular file and directory use same struct ceph_file_info to store fs specific data so the struct has to include some fields which are only used for directory (e.g., readdir related info), when having plenty of regular files, it will lead to memory waste. This patch introduces dedicated ceph_dir_file_info cache for readdir related thins. So that regular file does not include those unused fields anymore. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: optimize mds session registerChengguang Xu2018-04-02
| | | | | | | | | | | | | | | | | | Do memory allocation first, so that avoid unnecessary initialization of newly allocated session in error case. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph, ceph: add __init attribution to init funcitonsChengguang Xu2018-04-02
| | | | | | | | | | | | | | | | | | | | Add __init attribution to the functions which are called only once during initiating/registering operations and deleting unnecessary symbol exports. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: filter out used flags when printing unused open flagsChengguang Xu2018-04-02
| | | | | | | | | | | | | | | | Filter out used access mode flags when printing unused open flags. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: don't wait on writeback when there is no more dirty pagesYan, Zheng2018-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | In sync mode, writepages() needs to write all dirty pages. But it can only write dirty pages associated with the oldest snapc. To write dirty pages associated with next snapc, it needs to wait until current writes complete. If there is no more dirty pages, writepages() should not wait on writeback. Otherwise, dirty page writeback becomes very slow. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: invalidate pages that beyond EOF in ceph_writepages_start()Yan, Zheng2018-04-02
| | | | | | | | | | | | | | | | | | | | Dirty pages can be associated with different capsnap. Different capsnap may have different EOF value. So invalidating dirty pages according to the largest EOF value is wrong. Dirty pages beyond EOF, but associated with other capsnap, do not get invalidated. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: mark the cap cache as unreclaimableChengguang Xu2018-04-02
| | | | | | | | | | | | | | | | | | | | | | Releasing cap is affected by many factors (e.g., avail_count/reserve_count/min_count) and min_count could be specified high volume in client mount option. Hence it's better to mark cap cache as unreclaimable in case of non-trivial discrepancies between memory shown as reclaimable and what is actually reclaimed. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: change variable name to follow common ruleChengguang Xu2018-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | Variable name ci is mostly used for ceph_inode_info. Variable name fi is mostly used for ceph_file_info. Variable name cf is mostly used for ceph_cap_flush. Change variable name to follow above common rules in case of confusing. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: optimizing cap reservationChengguang Xu2018-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | When caps_avail_count is in a low level, most newly trimmed caps will probably go into ->caps_list and caps_avail_count will be increased. Hence after trimming, should recheck caps_avail_count to effectly reuse newly trimmed caps. Also, when releasing unnecessary caps follow the same rule of ceph_put_cap. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: release unreserved caps if having enough available capsChengguang Xu2018-04-02
| | | | | | | | | | | | | | | | | | When unreserving caps check if there is too mamy available caps in the ->caps_list, if so release unreserved caps. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: optimizing cap allocationChengguang Xu2018-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When setting high volume of caps_min_count or having many unreserved caps, unused caps may always keep in the ->caps_list even can't get new cap from kmem_cache_alloc because lack of maximum limitation of caps_avail_count. Hence reuse caps in ->caps_list if available, it's maybe better than setting max limitation of caps_avail_count and releasing unused caps when reaching the limit. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: adding protection for showing cap reservation infoChengguang Xu2018-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding spinlock protection during getting cap reservation ralated fields so that the numbers match below BUG_ON condition in the code. BUG_ON(mdsc->caps_total_count != mdsc->caps_use_count + mdsc->caps_reserve_count + mdsc->caps_avail_count); Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: use seq_show_option for string type optionsChengguang Xu2018-04-02
| | | | | | | | | | | | | | | | Using seq_show_option to replace seq_printf for string type options. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph, ceph: change permission for readonly debugfs entriesChengguang Xu2018-04-02
| | | | | | | | | | | | | | | | | | Remove write permission for debugfs entries which only have readonly function. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: keep consistent semantic in fscache related option combinationChengguang Xu2018-04-02
| | | | | | | | | | | | | | | | | | | | When specifying multiple fscache related options, the result isn't always the same as option order, this fix will keep strict consistent meaning by order. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: add newline to end of debug message formatChengguang Xu2018-04-02
| | | | | | | | | | | | | | | | | | | | Some of dout format do not include newline in the end, fix for the files which are in fs/ceph and net/ceph directories, and changing printk to dout for printing debug info in super.c Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph, ceph: move ceph_calc_file_object_mapping() to striper.cIlya Dryomov2018-04-02
| | | | | | | | | | | | ceph_calc_file_object_mapping() has nothing to do with osdmaps. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph, ceph: change ceph_calc_file_object_mapping() signatureIlya Dryomov2018-04-02
| | | | | | | | | | | | | | | | | | - make it void - xlen (object extent length) out parameter should be u32 because only a single stripe unit is mapped at a time Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
* | Merge tag 'fscache-next-20180406' of ↵Linus Torvalds2018-04-07
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull fscache updates from David Howells: "Three patches that fix some of AFS's usage of fscache: (1) Need to invalidate the cache if a foreign data change is detected on the server. (2) Move the vnode ID uniquifier (equivalent to i_generation) from the auxiliary data to the index key to prevent a race between file delete and a subsequent file create seeing the same index key. (3) Need to retire cookies that correspond to files that we think got deleted on the server. Four patches to fix some things in fscache and cachefiles: (4) Fix a couple of checker warnings. (5) Correctly indicate to the end-of-operation callback whether an operation completed or was cancelled. (6) Add a check for multiple cookie relinquishment. (7) Fix a path through the asynchronous write that doesn't wake up a waiter for a page if the cache decides not to write that page, but discards it instead. A couple of patches to add tracepoints to fscache and cachefiles: (8) Add tracepoints for cookie operators, object state machine execution, cachefiles object management and cachefiles VFS operations. (9) Add tracepoints for fscache operation management and page wrangling. And then three development patches: (10) Attach the index key and auxiliary data to the cookie, pass this information through various fscache-netfs API functions and get rid of the callbacks to the netfs to get it. This means that the cache can get at this information, even if the netfs goes away. It also means that the cache can be lazy in updating the coherency data. (11) Pass the object data size through various fscache-netfs API rather than calling back to the netfs for it, and store the value in the object. This makes it easier to correctly resize the object, as the size is updated on writes to the cache, rather than calling back out to the netfs. (12) Maintain a catalogue of allocated cookies. This makes it possible to catch cookie collision up front rather than down in the bowels of the cache being run from a service thread from the object state machine. This will also make it possible in the future to reconnect to a cookie that's not gone dead yet because it's waiting for finalisation of the storage and also make it possible to bring cookies online if the cache is added after the cookie has been obtained" * tag 'fscache-next-20180406' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: fscache: Maintain a catalogue of allocated cookies fscache: Pass object size in rather than calling back for it fscache: Attach the index key and aux data to the cookie fscache: Add more tracepoints fscache: Add tracepoints fscache: Fix hanging wait on page discarded by writeback fscache: Detect multiple relinquishment of a cookie fscache: Pass the correct cancelled indications to fscache_op_complete() fscache, cachefiles: Fix checker warnings afs: Be more aggressive in retiring cached vnodes afs: Use the vnode ID uniquifier in the cache key not the aux data afs: Invalidate cache on server data change
| * | fscache: Pass object size in rather than calling back for itDavid Howells2018-04-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass the object size in to fscache_acquire_cookie() and fscache_write_page() rather than the netfs providing a callback by which it can be received. This makes it easier to update the size of the object when a new page is written that extends the object. The current object size is also passed by fscache to the check_aux function, obviating the need to store it in the aux data. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Anna Schumaker <anna.schumaker@netapp.com> Tested-by: Steve Dickson <steved@redhat.com>
| * | fscache: Attach the index key and aux data to the cookieDavid Howells2018-04-04
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Attach copies of the index key and auxiliary data to the fscache cookie so that: (1) The callbacks to the netfs for this stuff can be eliminated. This can simplify things in the cache as the information is still available, even after the cache has relinquished the cookie. (2) Simplifies the locking requirements of accessing the information as we don't have to worry about the netfs object going away on us. (3) The cache can do lazy updating of the coherency information on disk. As long as the cache is flushed before reboot/poweroff, there's no need to update the coherency info on disk every time it changes. (4) Cookies can be hashed or put in a tree as the index key is easily available. This allows: (a) Checks for duplicate cookies can be made at the top fscache layer rather than down in the bowels of the cache backend. (b) Caching can be added to a netfs object that has a cookie if the cache is brought online after the netfs object is allocated. A certain amount of space is made in the cookie for inline copies of the data, but if it won't fit there, extra memory will be allocated for it. The downside of this is that live cache operation requires more memory. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Anna Schumaker <anna.schumaker@netapp.com> Tested-by: Steve Dickson <steved@redhat.com>
* | Merge branch 'work.misc' of ↵Linus Torvalds2018-04-06
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted stuff, including Christoph's I_DIRTY patches" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: move I_DIRTY_INODE to fs.h ubifs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call ntfs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call gfs2: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) calls fs: fold open_check_o_direct into do_dentry_open vfs: Replace stray non-ASCII homoglyph characters with their ASCII equivalents vfs: make sure struct filename->iname is word-aligned get rid of pointless includes of fs_struct.h [poll] annotate SAA6588_CMD_POLL users
| * get rid of pointless includes of fs_struct.hAl Viro2018-02-22
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | ceph: only dirty ITER_IOVEC pages for direct readYan, Zheng2018-03-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a page is already locked, attempting to dirty it leads to a deadlock in lock_page(). This is what currently happens to ITER_BVEC pages when a dio-enabled loop device is backed by ceph: $ losetup --direct-io /dev/loop0 /mnt/cephfs/img $ xfs_io -c 'pread 0 4k' /dev/loop0 Follow other file systems and only dirty ITER_IOVEC pages. Cc: stable@kernel.org Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | ceph: fix potential memory leak in init_caches()Chengguang Xu2018-03-01
| | | | | | | | | | | | | | | | | | There is lack of cache destroy operation for ceph_file_cachep when failing from fscache register. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | ceph: fix dentry leak when failing to init debugfsChengguang Xu2018-02-26
| | | | | | | | | | | | | | | | | | | | | | When failing from ceph_fs_debugfs_init() in ceph_real_mount(), there is lack of dput of root_dentry and it causes slab errors, so change the calling order of ceph_fs_debugfs_init() and open_root_dentry() and do some cleanups to avoid this issue. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | libceph, ceph: avoid memory leak when specifying same option several timesChengguang Xu2018-02-26
| | | | | | | | | | | | | | | | | | When parsing string option, in order to avoid memory leak we need to carefully free it first in case of specifying same option several times. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | ceph: flush dirty caps of unlinked inode ASAPZhi Zhang2018-02-26
|/ | | | | | | | | | Client should release unlinked inode from its cache ASAP. But client can't release inode with dirty caps. Link: http://tracker.ceph.com/issues/22886 Signed-off-by: Zhi Zhang <zhang.david2011@gmail.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* Merge tag 'ceph-for-4.16-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds2018-02-08
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ceph updates from Ilya Dryomov: "Things have been very quiet on the rbd side, as work continues on the big ticket items slated for the next merge window. On the CephFS side we have a large number of cap handling improvements, a fix for our long-standing abuse of ->journal_info in ceph_readpages() and yet another dentry pointer management patch" * tag 'ceph-for-4.16-rc1' of git://github.com/ceph/ceph-client: ceph: improving efficiency of syncfs libceph: check kstrndup() return value ceph: try to allocate enough memory for reserved caps ceph: fix race of queuing delayed caps ceph: delete unreachable code in ceph_check_caps() ceph: limit rate of cap import/export error messages ceph: fix incorrect snaprealm when adding caps ceph: fix un-balanced fsc->writeback_count update ceph: track read contexts in ceph_file_info ceph: avoid dereferencing invalid pointer during cached readdir ceph: use atomic_t for ceph_inode_info::i_shared_gen ceph: cleanup traceless reply handling for rename ceph: voluntarily drop Fx cap for readdir request ceph: properly drop caps for setattr request ceph: voluntarily drop Lx cap for link/rename requests ceph: voluntarily drop Ax cap for requests that create new inode rbd: whitelist RBD_FEATURE_OPERATIONS feature bit rbd: don't NULL out ->obj_request in rbd_img_obj_parent_read_full() rbd: use kmem_cache_zalloc() in rbd_img_request_create() rbd: obj_request->completion is unused
| * ceph: improving efficiency of syncfsChengguang Xu2018-01-30
| | | | | | | | | | | | | | | | | | | | write_inode() could be called variety of reasons, in the case of syncfs(2) there is no need to wait for flush getting completed in write_inode(), ->sync_fs is for guaranteeing flush completion for all inodes at that point. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: try to allocate enough memory for reserved capsZhi Zhang2018-01-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ceph_reserve_caps() may not reserve enough caps under high memory pressure, but it saved the needed caps number that expected to be reserved. When getting caps, crash would happen due to number mismatch. Now we will try to trim more caps when failing to allocate memory for caps need to be reserved, then try again. If still failing to allocate memory, return -ENOMEM. Signed-off-by: Zhi Zhang <zhang.david2011@gmail.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: fix race of queuing delayed capsYan, Zheng2018-01-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When called with CHECK_CAPS_AUTHONLY flag, ceph_check_caps() only processes auth caps. In that case, it's unsafe to remove inode from mdsc->cap_delay_list, because there can be delayed non-auth caps. Besides, ceph_check_caps() may lock/unlock i_ceph_lock several times, when multiple threads call ceph_check_caps() at the same time. It's possible that one thread calls __cap_delay_requeue(), another thread calls __cap_delay_cancel(). __cap_delay_cancel() should be called at very beginning of ceph_check_caps(), so that it does not race with __cap_delay_requeue(). Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: delete unreachable code in ceph_check_caps()Yan, Zheng2018-01-29
| | | | | | | | | | | | | | | | "revoking & (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)" has already been tested before calling try_nonblocking_invalidate() Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: limit rate of cap import/export error messagesYan, Zheng2018-01-29
| | | | | | | | | | Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: fix incorrect snaprealm when adding capsYan, Zheng2018-01-29
| | | | | | | | | | Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>