STOR-2682: option to recreate LSO symlinks #1889
base: master
@dobsonj: This pull request references STOR-2682, which is a valid Jira issue.
@dobsonj: GitHub didn't allow me to request PR reviews from the following users: openshift/storage. Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.
[APPROVALNOTIFIER] This PR is NOT APPROVED
@dobsonj: all tests passed!
LSO will throw an alert if `current-link-target` does not match the recommended symlink in `dev-disk-by-id-list`. `dev-disk-by-uuid` is not available for raw block volumes, but may still be used for filesystem volumes to ensure detected symlinks match the device with the correct on-disk identifier.
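The mismatch check behind this alert could look roughly like the sketch below. It assumes that `dev-disk-by-id-list` holds a JSON array of strings (a format suggested in review, not confirmed by the proposal) and that the recommended symlink is the first entry. The function name is hypothetical.

```python
import json

CURRENT_KEY = "storage.openshift.com/current-link-target"
LIST_KEY = "storage.openshift.com/dev-disk-by-id-list"


def symlink_mismatch(annotations):
    """Return True if the PV's current link target differs from the recommended one.

    Assumptions (not from the proposal): the list annotation is a JSON array
    of strings, and its first entry is the recommended symlink.
    """
    current = annotations.get(CURRENT_KEY)
    raw_list = annotations.get(LIST_KEY)
    if current is None or raw_list is None:
        return False  # nothing to compare yet
    candidates = json.loads(raw_list)
    if not candidates:
        return False
    return current != candidates[0]
```

A PV whose annotations are missing or whose list is empty is treated as "no mismatch" here, which is a design choice of this sketch rather than something the proposal specifies.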
In response to the alert, the administrator can review the annotations on the PV, and then add a `storage.openshift.com/recreate-symlink` annotation to the PV to tell diskmaker to recreate the symlink. This can be done to proactively switch to the recommended symlink, or it can be done reactively when a symlink is no longer valid (assuming there is another known valid by-id symlink). Diskmaker will recreate the symlink pointing to the new by-id symlink and remove the `recreate-symlink` annotation when it is complete.
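As an illustration of this annotation handshake, the sketch below builds JSON merge patch bodies that a client could send to the Kubernetes API to set the annotation and later clear it. The annotation value is a placeholder, since the proposal does not fix one, and the helper names are hypothetical.

```python
import json

RECREATE_KEY = "storage.openshift.com/recreate-symlink"


def set_annotation_patch(value):
    """Merge patch body that adds or overwrites the recreate-symlink annotation."""
    return json.dumps({"metadata": {"annotations": {RECREATE_KEY: value}}})


def clear_annotation_patch():
    """Merge patch body that removes the annotation (a null value deletes the key)."""
    return json.dumps({"metadata": {"annotations": {RECREATE_KEY: None}}})


print(set_annotation_patch("true"))  # "true" is a placeholder value
print(clear_annotation_patch())
```

The first body corresponds to the administrator's request; the second corresponds to diskmaker removing the annotation once the symlink has been recreated.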
I know we talked about this, but I guess in the future, if we wanted to give customers control over which symlink gets picked, we could define another annotation that specifies a user-specified policy? I know one of the issues we discussed is that the user's preferred policy would be lost when the PV is re-created, but maybe we could solve this by persisting a {symlinkBaseName: <preferred_policy>} mapping to a JSON file.
We are hoping that what we are doing currently will be enough to steer us away from most of the issues.
I'd suggest using a special value to pick the LSO-preferred symlink, something like: `storage.openshift.com/recreate-symlink: lso-preferred`
We can add other policies later.
3. If a UUID was previously discovered and recorded, the new by-id target must point to the same device.
4. Return the first valid link from the by-id list that meets these criteria. Throw an error if no valid symlink can be found.
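Rules 3 and 4 above can be sketched as follows. Only those two rules are covered, since the earlier rules are not shown here. "Valid" is approximated as "the link resolves to an existing path", which is an assumption, and the function and parameter names are hypothetical.

```python
import os


def pick_by_id_link(by_id_links, uuid_link=None):
    """Return the first usable by-id link, or raise if none qualifies.

    uuid_link is the previously recorded /dev/disk/by-uuid/... path, if any.
    Validity is approximated as "resolves to an existing path" (assumption).
    """
    expected = os.path.realpath(uuid_link) if uuid_link else None
    for link in by_id_links:
        if not os.path.exists(link):  # dangling or missing link
            continue
        if expected is not None and os.path.realpath(link) != expected:
            continue  # rule 3: must resolve to the same device as the UUID link
        return link  # rule 4: first link that passes
    raise LookupError("no valid by-id symlink found")
```

Comparing resolved paths with `os.path.realpath` is one way to check that two symlinks point at the same device node; diskmaker may well use a different mechanism.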
Diskmaker already watches for PV changes, but it will also need to watch for udev changes (?) to trigger reconcile to update the new annotations. It will need a reconcile trigger for detecting new UUID symlinks as well.
What happens if the user annotated the PV, but there is another symlink name that points to the same by-id symlink?
LSO's diskmaker will detect all valid by-id symlinks for each PV and add them as an annotation on the PV. It will also annotate the by-id symlink currently in use. For filesystem volumes, it will annotate the UUID of the filesystem to help find the corresponding disk. This means each PV will have up to three new annotations added by diskmaker:
* `storage.openshift.com/current-link-target`: current by-id symlink in use for the device
* `storage.openshift.com/dev-disk-by-id-list`: list of valid by-id symlinks for the device
Suggested change:
- * `storage.openshift.com/dev-disk-by-id-list`: list of valid by-id symlinks for the device
+ * `storage.openshift.com/dev-disk-by-id-list`: list of valid by-id symlinks for the device (as JSON array of strings)
### Workflow Description
LSO's diskmaker will detect all valid by-id symlinks for each PV and add them as an annotation on the PV. It will also annotate the by-id symlink currently in use. For filesystem volumes, it will annotate the UUID of the filesystem to help find the corresponding disk. This means each PV will have up to three new annotations added by diskmaker:
Please add a note on when it will detect the symlinks and update the annotations. On start? Every minute?
We want to annotate the PV with the UUID if an on-disk identifier can be found, for informational purposes at the very least. There are some open questions on how LSO can use the UUID, but these may be addressed in future revisions:
1. In addition to filesystem volumes, it would help to get the UUID of Ceph OSD volumes and annotate them. Ceph OSD volumes don't have by-uuid symlinks, can this be changed?
Getting the UUID of a Ceph BlueStore volume is easy:

    $ ceph-volume raw list /dev/nvme2n1 --format=json
    {
        "cc79d1c0-0965-446f-a301-e3e7780729b4": {
            "ceph_fsid": "f3d09b80-d608-4cd5-b06a-cbf81ee20b42",
            "device": "/dev/nvme2n1",
            "osd_id": 0,
            "osd_uuid": "cc79d1c0-0965-446f-a301-e3e7780729b4",
            "type": "bluestore"
        }
    }

It "only" requires ceph-volume.rpm, which brings in a lot of quite heavy dependencies and thus probably can't be in CoreOS. I think we can afford installing it in the diskmaker image, but it won't get us a nice /dev/disk/by-uuid symlink.
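For illustration, output in the shape shown above could be turned into a device-to-UUID map with a few lines of parsing. This is only a sketch: the helper name is hypothetical, and it assumes every entry carries the `device`, `osd_uuid`, and `type` fields seen in the sample.

```python
import json


def osd_uuids_by_device(raw_list_output):
    """Map device path -> OSD UUID from `ceph-volume raw list --format=json` output."""
    entries = json.loads(raw_list_output)
    return {
        entry["device"]: entry["osd_uuid"]
        for entry in entries.values()
        if entry.get("type") == "bluestore"
    }


# Sample taken from the command output quoted in this comment.
sample = """
{
  "cc79d1c0-0965-446f-a301-e3e7780729b4": {
    "ceph_fsid": "f3d09b80-d608-4cd5-b06a-cbf81ee20b42",
    "device": "/dev/nvme2n1",
    "osd_id": 0,
    "osd_uuid": "cc79d1c0-0965-446f-a301-e3e7780729b4",
    "type": "bluestore"
  }
}
"""
print(osd_uuids_by_device(sample))
```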
I filed a Ceph RFE: https://bugzilla.redhat.com/show_bug.cgi?id=2414811
I am not very optimistic, but I might be surprised :-).
2. If diskmaker uses UUID to recreate symlinks, we need to figure out how to solve snapshot restore for LocalVolume object -- is it possible to have UUID conflicts that resolve to the wrong disk?
If the symlink fix is always initiated by the user and they know what /dev/disk/by-id symlink will be used, then I don't think it's an issue.
Diskmaker will keep the link name and only change the link target. For example, if a PV has an existing symlink `/mnt/local-storage/localblock/scsi-0NVME_MODEL_abcde` pointing to `/dev/disk/by-id/scsi-0NVME_MODEL_abcde`, but there is a by-id link `/dev/disk/by-id/scsi-2ace42e0035eabcde`, setting `storage.openshift.com/recreate-symlink` will cause diskmaker to replace `/mnt/local-storage/localblock/scsi-0NVME_MODEL_abcde` with a new symlink pointing to `/dev/disk/by-id/scsi-2ace42e0035eabcde`.
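One way to keep the link name while swapping its target is to create a temporary link next to it and rename that over the original, as in the sketch below. This is an assumption about a possible approach, not a description of what diskmaker does; the function name and the `.tmp` suffix are hypothetical.

```python
import os


def retarget_symlink(link_path, new_target):
    """Point an existing symlink at new_target while keeping its name.

    A temporary link is created next to the original and renamed over it,
    so the link name never disappears. The ".tmp" suffix is an arbitrary choice.
    """
    tmp_path = link_path + ".tmp"
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)  # clean up a leftover from an interrupted run
    os.symlink(new_target, tmp_path)
    os.replace(tmp_path, link_path)  # atomic rename on POSIX filesystems
```

With the paths from the example above, this would be called as `retarget_symlink("/mnt/local-storage/localblock/scsi-0NVME_MODEL_abcde", "/dev/disk/by-id/scsi-2ace42e0035eabcde")`.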
How will users see if the operation succeeded or failed?
Please also describe the retry mechanism.
https://issues.redhat.com/browse/STOR-2682
/cc @openshift/storage