OpenStack + Proxmox + Ceph backup failing with “ceph init error” — any ideas?

Wesley

Hi everyone,
I’m running backups for an OpenStack environment, and the job keeps failing.

Platform: Proxmox + Ceph
Backup software: external backup system

The job fails every time with this error message:

[ERROR: 307#BdCephController error: ceph init error], open virtual disk error.

It looks like Ceph initialization is failing, but I’m not sure why.

Has anyone run into this before? What should I check?
Thanks!

Everett Kensington

Hi Wesley,
Issue Analysis: Root Causes of the Backup Failure
The failure is most likely caused by two problems on the Ceph client side: how the configuration is parsed, and which keyring is used.

Configuration Parsing Failure:

The Ceph client functions used by the backup system could not parse the bracketed, protocol-prefixed format of the Monitor addresses (mon_host) in ceph.conf.
As a result, the backup system could not establish a connection to the Ceph cluster.

Insufficient Keyring Permissions:

The Ceph keyring picked up by the backup plugin did not have sufficient permissions for the read/write operations a backup requires.
The backup needs the higher-privileged admin keyring (ceph.client.admin.keyring) rather than the keyring that was likely being used.
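To see which entities and capabilities a given keyring file carries, a small helper like the one below can be handy. This is only a sketch: it assumes the usual INI-style keyring layout (a `[client.name]` header followed by `key = ...` and `caps ... = ...` lines), the function name `show_keyring_caps` is made up for illustration, and a keyring file does not always list its caps.

```shell
# Print each entity in a Ceph keyring file with its caps lines.
# The secret key itself is deliberately not printed.
show_keyring_caps() {
    awk '
        # Entity header, e.g. [client.admin]
        /^\[.*\]/ { print; next }
        # Capability lines, e.g. caps mon = "allow *"
        /^[[:space:]]*caps[[:space:]]/ { sub(/^[[:space:]]+/, ""); print "  " $0 }
    ' "$1"
}
```

For example: `show_keyring_caps /etc/ceph/ceph.client.admin.keyring`. The authoritative capabilities are stored in the cluster, so on a cluster node `ceph auth get client.admin` is the more reliable check.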

Solution & Step-by-Step Instructions
Work through these steps in sequence to get backups working again.

Step 1: Copy and Correct Configuration Files
The goal of this step is to copy the production environment's configuration files to the backup system and correct the mon_host format.

Create Directories on the Backup System:
Run the following command as root on the backup system to create the three required directories:

mkdir -p /etc/ceph /etc/cinder /etc/nova

Copy Files from the OpenStack Production Environment:
Copy the entire contents of the following directories from the OpenStack production environment (source server) to the corresponding directories on the backup system:

Source: /etc/ceph/ → Target: /etc/ceph/
Source: /etc/cinder/ → Target: /etc/cinder/
Source: /etc/nova/ → Target: /etc/nova/
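
One way to do the copy from the backup system is with rsync over SSH. The sketch below rests on assumptions not stated in the thread: root SSH access to the source server, rsync installed on both sides, and a made-up function name, `copy_openstack_configs`. The `-L` flag copies the file a symlink points to rather than the link itself, which matters if files under /etc/ceph on the source are symlinks (as they may be on a Proxmox node).

```shell
# Pull /etc/ceph, /etc/cinder and /etc/nova from the source host into the same local paths.
# -a preserves permissions and timestamps; -L copies symlink targets instead of the links.
# Set DRY_RUN=1 to print the rsync commands without running them.
copy_openstack_configs() {
    src_host="$1"
    for name in ceph cinder nova; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "rsync -aL root@${src_host}:/etc/${name}/ /etc/${name}/"
        else
            rsync -aL "root@${src_host}:/etc/${name}/" "/etc/${name}/"
        fi
    done
}
```

Running `DRY_RUN=1 copy_openstack_configs <source-host>` first shows exactly what would be copied; drop `DRY_RUN=1` to perform the copy as root.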

Correct the ceph.conf File:
Open /etc/ceph/ceph.conf on the backup system in a text editor (such as vi or nano).
Locate the mon_host line.
Change its value to a plain, comma-separated list of IP addresses, without the brackets, the v1: prefix, or the port.

Before modification:
mon_host = [v1:10.10.34.64:6789], [v1:10.10.34.62:6789], [v1:10.10.34.63:6789], [v1:10.10.32.72:6789]

After modification:
mon_host = 10.10.34.64, 10.10.34.62, 10.10.34.63, 10.10.32.72

Note: Double-check each address for typos, such as a comma in place of a dot (10.10,34.62 instead of 10.10.34.62).
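
If you would rather not edit the file by hand, the same change can be scripted with sed. This is a minimal sketch that handles only the `[v1:IP:PORT]` form in the example; other address formats are left untouched, and the function name `normalize_mon_host` is just illustrative. It keeps a `.bak` copy of the original next to the file.

```shell
# Rewrite "[v1:IP:PORT]" entries on the mon_host line to bare IPv4 addresses.
# Only the mon_host line is changed; the original file is saved as <file>.bak.
normalize_mon_host() {
    conf="$1"
    cp -p "$conf" "$conf.bak" || return 1
    sed -E '/^[[:space:]]*mon_host[[:space:]]*=/ s/\[v1:([0-9.]+):[0-9]+\]/\1/g' \
        "$conf.bak" > "$conf"
}
```

Usage: `normalize_mon_host /etc/ceph/ceph.conf`, then compare the result with `diff /etc/ceph/ceph.conf.bak /etc/ceph/ceph.conf` before running the backup again.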

Step 2: Address Keyring File Conflict
If the backup still fails after completing Step 1, check for a specific keyring file that might be causing a conflict and preventing the system from reading the correct admin.keyring.

Check for the Conflicting File:
Check if the following file exists on the backup system:
ls -l /etc/pve/ceph/ceph.client.crash.keyring

Move the Conflicting File:
If this file exists, move it to a different directory (e.g., a temporary backup directory) to disable it. Do not delete it immediately in case it needs to be restored.

# Create a backup directory (if it doesn't exist)
mkdir -p /root/ceph-keyring-backup
# Move the conflicting keyring file
mv /etc/pve/ceph/ceph.client.crash.keyring /root/ceph-keyring-backup/

With that file out of the way, the backup system should fall back to the default, sufficiently privileged /etc/ceph/ceph.client.admin.keyring.
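
Before re-running the job, a quick sanity check of the files from Steps 1 and 2 can save a round trip. The sketch below only inspects files on disk; it does not contact the cluster, and `check_ceph_client_files` is a hypothetical helper name.

```shell
# Check that a Ceph client config directory has what the steps above set up.
# Prints one line per problem and returns non-zero if any were found.
check_ceph_client_files() {
    dir="${1:-/etc/ceph}"
    rc=0
    for f in ceph.conf ceph.client.admin.keyring; do
        if [ ! -r "$dir/$f" ]; then
            echo "missing or unreadable: $dir/$f"
            rc=1
        fi
    done
    # A "[" on the mon_host line means the bracketed format is still in place.
    if grep -Eq '^[[:space:]]*mon_host[[:space:]]*=.*\[' "$dir/ceph.conf" 2>/dev/null; then
        echo "mon_host still uses bracketed entries in $dir/ceph.conf"
        rc=1
    fi
    return "$rc"
}
```

Run it as `check_ceph_client_files /etc/ceph`. If the ceph command-line client happens to be installed on the backup system, `ceph -s --conf /etc/ceph/ceph.conf --name client.admin --keyring /etc/ceph/ceph.client.admin.keyring` is a more direct connectivity test.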

Step 3: Clean Up the Backup System Cache (Optional)
If the problem persists after the steps above, try clearing stale cached configuration files that may still exist on the backup system.

Clean the specific configuration directory on the backup system:
rm -rf /etc/backup_system/lan_free_ceph/*

Note: Before executing this, please confirm that the files in this directory can be removed. The system will regenerate them during the next task.
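
A more cautious variant of the cleanup is to archive the directory first and only then empty it. This is a sketch rather than anything the backup software prescribes; `archive_and_clear` is a made-up name, and the archive must be written somewhere outside the directory being cleared.

```shell
# Archive the contents of a directory to a tarball, then empty the directory.
# Nothing is deleted unless the archive was written successfully.
archive_and_clear() {
    dir="$1"
    archive="$2"
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    tar -czf "$archive" -C "$dir" . || return 1
    # Remove everything inside the directory (dotfiles included) but keep the directory itself.
    find "$dir" -mindepth 1 -delete
}
```

For example: `archive_and_clear /etc/backup_system/lan_free_ceph /root/lan_free_ceph-cache.tar.gz`. Unlike `rm -rf dir/*`, this also removes hidden files, and the tarball lets you restore the old state if needed.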

Cedric Winthrop

Everett Kensington Wow, that's a lot! You're on another level!

Everett Kensington

Hope this helps.