I've been playing around with Ceph and the Ceph RADOS Gateway recently, mostly using the excellent ceph-ansible project.
I was trying to change the IP address used by the RadosGW service by setting the radosgw_address_block variable, and ran into a bug, for which I've put in a Pull Request. The bug meant that while ceph.conf had been updated properly, the service hadn't been restarted, so I began restarting the service manually.
This led to a massive yak-shaving exercise because I had the wrong service name. With the wrong name, you get a confusing error message that makes it look like something has gone wrong that you can fix.
root@rgw1:/# service ceph-radosgw@rgw1 status
● ceph-radosgw@rgw1.service - Ceph rados gateway
Loaded: loaded (/lib/systemd/system/ceph-radosgw@.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/ceph-radosgw@.service.d
└─ceph-radosgw-systemd-overrides.conf
Active: inactive (dead)
Dec 20 04:03:11 rgw1 radosgw[8513]: 2017-12-20 04:03:11.233269 7f94c825ae80 -1 auth: unable to find a keyring on /var/lib/ceph/radosgw/ceph-rgw1/keyring: (2) No such file or directory
Dec 20 04:03:11 rgw1 radosgw[8513]: 2017-12-20 04:03:11.235727 7f94c825ae80 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
Dec 20 04:03:11 rgw1 radosgw[8513]: 2017-12-20 04:03:11.236865 7f94c825ae80 -1 Couldn't init storage provider (RADOS)
Dec 20 04:03:11 rgw1 systemd[1]: ceph-radosgw@rgw1.service: Main process exited, code=exited, status=5/NOTINSTALLED
Dec 20 04:03:11 rgw1 systemd[1]: ceph-radosgw@rgw1.service: Unit entered failed state.
Dec 20 04:03:11 rgw1 systemd[1]: ceph-radosgw@rgw1.service: Failed with result 'exit-code'.
Dec 20 04:03:11 rgw1 systemd[1]: ceph-radosgw@rgw1.service: Service hold-off time over, scheduling restart.
Dec 20 04:03:11 rgw1 systemd[1]: Stopped Ceph rados gateway.
Dec 20 04:03:11 rgw1 systemd[1]: ceph-radosgw@rgw1.service: Start request repeated too quickly.
Dec 20 04:03:11 rgw1 systemd[1]: Failed to start Ceph rados gateway.
At this point I found the keyring and linked it to /var/lib/ceph/radosgw/ceph-rgw1/keyring. That resulted in a different error.
root@rgw1:/# service ceph-radosgw@rgw1 status
● ceph-radosgw@rgw1.service - Ceph rados gateway
Loaded: loaded (/lib/systemd/system/ceph-radosgw@.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/ceph-radosgw@.service.d
└─ceph-radosgw-systemd-overrides.conf
Active: inactive (dead)
Dec 20 04:01:59 rgw1 systemd[1]: Stopped Ceph rados gateway.
Dec 20 04:01:59 rgw1 systemd[1]: Started Ceph rados gateway.
Dec 20 04:01:59 rgw1 radosgw[8404]: 2017-12-20 04:01:59.241222 7faf4c8aae80 -1 Couldn't init storage provider (RADOS)
Dec 20 04:01:59 rgw1 systemd[1]: ceph-radosgw@rgw1.service: Main process exited, code=exited, status=5/NOTINSTALLED
Dec 20 04:01:59 rgw1 systemd[1]: ceph-radosgw@rgw1.service: Unit entered failed state.
Dec 20 04:01:59 rgw1 systemd[1]: ceph-radosgw@rgw1.service: Failed with result 'exit-code'.
Dec 20 04:01:59 rgw1 systemd[1]: ceph-radosgw@rgw1.service: Service hold-off time over, scheduling restart.
Dec 20 04:01:59 rgw1 systemd[1]: Stopped Ceph rados gateway.
Dec 20 04:01:59 rgw1 systemd[1]: ceph-radosgw@rgw1.service: Start request repeated too quickly.
Dec 20 04:01:59 rgw1 systemd[1]: Failed to start Ceph rados gateway.
This error, Couldn't init storage provider (RADOS), is often due to permissions issues, so at that point I reconsidered what I was doing and figured out the real issue.
TL;DR: When restarting ceph-radosgw use ceph-radosgw@rgw.hostname and NOT ceph-radosgw@hostname.
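As a rough sketch of the naming rule in the TL;DR, the helpers below build the unit name and the keyring path that goes with it. The function names rgw_unit_name and rgw_keyring_path are made up for illustration. The keyring path is an assumption, extrapolated from the /var/lib/ceph/radosgw/ceph-rgw1/keyring path in the log output, for a cluster named "ceph". Your deployment may name its instances differently, so check before relying on it.

```shell
#!/bin/sh
# Sketch only: the "rgw.<short hostname>" instance naming is taken from the
# TL;DR; verify what your own deployment actually uses.

# Hypothetical helper: print the systemd unit name for the gateway on a host.
# Falls back to this machine's short hostname when no argument is given.
rgw_unit_name() {
    host="${1:-$(hostname -s)}"
    printf 'ceph-radosgw@rgw.%s.service\n' "$host"
}

# Hypothetical helper: print the keyring path implied by that instance name.
# Assumption: cluster name is "ceph" and the data directory follows the same
# pattern as the path radosgw complained about in the log output.
rgw_keyring_path() {
    host="${1:-$(hostname -s)}"
    printf '/var/lib/ceph/radosgw/ceph-rgw.%s/keyring\n' "$host"
}

# Print both values for the host "rgw1" used in this post.
rgw_unit_name rgw1
rgw_keyring_path rgw1

# On the gateway host itself (needs root and systemd), you could then run:
#   systemctl list-units --all 'ceph-radosgw@*'   # see which instances exist
#   systemctl restart "$(rgw_unit_name)"          # restart the right one
```

Listing the existing ceph-radosgw@ instances before restarting anything is probably the quickest way to avoid guessing the name in the first place.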