Introduction
When a disk in your ZFS pool fails, the pool enters a DEGRADED state. ZFS offers a robust recovery mechanism called resilvering, which allows you to replace the failed disk with a new one and rebuild the data. This article explains how to replace a failed disk and complete the resilvering process in your Proxmox ZFS pool.
Identifying the Problem
When a disk fails in the ZFS pool, the pool status will show as DEGRADED. Use the following command to check the pool status:
zpool status
Example output:
pool: rpool
state: DEGRADED
status: One or more devices could not be used because the label is missing or invalid. Sufficient replicas exist for the pool to continue functioning in a degraded state.
action: Replace the device using 'zpool replace'.
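For scripted checks, you can read the pool state directly from the pool's health property instead of parsing the full status text. The sketch below assumes the zpool command is on the PATH. The function names pool_health and pool_is_degraded are hypothetical helpers and are not part of ZFS.

```bash
# Hypothetical helper: print the health of a ZFS pool (e.g. ONLINE, DEGRADED).
# "zpool list -H -o health" prints only the value, with no header line.
pool_health() {
    local pool="${1:?usage: pool_health <pool>}"
    zpool list -H -o health "$pool"
}

# Hypothetical helper: succeed (exit status 0) only if the pool is DEGRADED.
pool_is_degraded() {
    [[ "$(pool_health "$1")" == "DEGRADED" ]]
}
```

Usage: pool_is_degraded rpool && echo "rpool needs attention"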
Steps to Replace the Failed Disk
1. Identify the Failed Disk
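From the zpool status output, note the identifier of the failed disk. Because ZFS can no longer read the disk's label, it lists the device by its numeric GUID instead of its name. For example: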
5446107257933431427 UNAVAIL 0 0 0
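On a pool with many disks, a short filter can pick the unusable devices out of the status output. This is a minimal sketch that assumes the standard NAME STATE READ WRITE CKSUM table layout. failed_devices is a hypothetical name, and treating UNAVAIL, FAULTED and REMOVED as "failed" is an assumption you may want to adjust.

```bash
# Hypothetical helper: print the names (or numeric GUIDs) of devices that
# zpool status reports as UNAVAIL, FAULTED or REMOVED.
failed_devices() {
    local pool="${1:?usage: failed_devices <pool>}"
    # Column 2 is the STATE column of the config table; skip the "state:" header line.
    zpool status "$pool" |
        awk '$1 != "state:" && $2 ~ /^(UNAVAIL|FAULTED|REMOVED)$/ { print $1 }'
}
```

Usage: failed_devices rpool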
2. Install the New Disk
Physically replace the failed disk with a new one. Verify that the new disk is recognized by the system:
ls -l /dev/disk/by-id/
Find the new disk’s ID (e.g., ata-WDC_WD30EFRX-68AX9N0_WD-WCC1T0809124).
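The by-id directory also contains one link per partition (names ending in -partN), which makes the list long. The sketch below hides those partition links and shows which /dev node each whole-disk ID points to. list_disk_ids is a hypothetical helper; the optional directory argument exists only so it can be tried outside /dev.

```bash
# Hypothetical helper: list whole-disk entries in /dev/disk/by-id (or another
# directory given as $1), skipping partition links, with their resolved targets.
list_disk_ids() {
    local dir="${1:-/dev/disk/by-id}"
    local entry name
    for entry in "$dir"/*; do
        [[ -e "$entry" || -L "$entry" ]] || continue   # skip an unmatched glob
        name="${entry##*/}"
        [[ "$name" =~ -part[0-9]+$ ]] && continue       # skip partition links
        printf '%s -> %s\n' "$name" "$(readlink -f "$entry")"
    done
}
```

Comparing the output before and after inserting the new disk is one simple way to spot which ID is new.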
3. Clear Any Existing Filesystem on the New Disk
If the new disk contains an existing filesystem or partition table, zpool replace may refuse to use it. Remove the old signatures with:
wipefs -a /dev/disk/by-id/<new_disk_id>
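Replace <new_disk_id> with the new disk's ID from the previous step. This erases the disk's existing signatures, so double-check that the ID belongs to the new disk and not to a pool member.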
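Because wipefs -a is destructive, some administrators wrap it in a few sanity checks. The following is a minimal sketch, not a standard tool. safe_wipe is a hypothetical name. It refuses anything outside /dev/disk/by-id, refuses partition paths, and refuses a disk whose path already appears in zpool status -P output (which prints full device paths). That last check only catches pools that reference the disk by this same by-id path.

```bash
# Hypothetical wrapper around "wipefs --all" with basic safety checks.
safe_wipe() {
    local dev="${1:?usage: safe_wipe /dev/disk/by-id/<new_disk_id>}"
    # Require a whole-disk path under /dev/disk/by-id/.
    if [[ "$dev" != /dev/disk/by-id/* || "$dev" =~ -part[0-9]+$ ]]; then
        echo "refusing: expected a whole disk under /dev/disk/by-id/: $dev" >&2
        return 1
    fi
    # Refuse if any imported pool references this path (or one of its partitions).
    if zpool status -P | grep -qF -- "$dev"; then
        echo "refusing: $dev appears in zpool status" >&2
        return 1
    fi
    wipefs "$dev"          # without options, wipefs only lists the signatures it finds
    wipefs --all "$dev"    # erase all detected signatures
}
```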
4. Replace the Failed Disk
Use the zpool replace command to replace the failed disk with the new disk:
zpool replace rpool <failed_disk_id> /dev/disk/by-id/<new_disk_id>
Example:
zpool replace rpool 5446107257933431427 /dev/disk/by-id/ata-WDC_WD30EFRX-68AX9N0_WD-WCC1T0809124
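A typo in the new device path is an easy mistake here. The sketch below checks that the new device node exists before calling zpool replace. replace_failed_disk is a hypothetical helper; it passes the arguments through to zpool replace unchanged.

```bash
# Hypothetical wrapper: verify the new device exists, then replace the failed
# device (given by name or numeric GUID) with it.
replace_failed_disk() {
    local pool="$1" old="$2" new="$3"
    if [[ -z "$pool" || -z "$old" || -z "$new" ]]; then
        echo "usage: replace_failed_disk <pool> <failed_disk_id> <new_disk_path>" >&2
        return 2
    fi
    if [[ ! -e "$new" ]]; then
        echo "new device not found: $new" >&2
        return 1
    fi
    zpool replace "$pool" "$old" "$new"
}
```

Usage: replace_failed_disk rpool 5446107257933431427 /dev/disk/by-id/ata-WDC_WD30EFRX-68AX9N0_WD-WCC1T0809124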
Monitoring the Resilvering Process
After replacing the disk, ZFS will automatically start rebuilding the data (resilvering). Monitor the progress using:
zpool status
Example output:
scan: resilver in progress since Sat Dec 21 21:32:03 2024
10.9G / 10.9G scanned, 56.6M / 5.37G issued at 14.2M/s
21.2M resilvered, 1.03% done, 00:06:24 to go
The duration depends mainly on how much data is stored in the pool (ZFS resilvers only allocated blocks, not the entire disk) and on disk and system performance.
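Rather than rerunning zpool status by hand, you can poll the progress figure. This sketch assumes the progress-line format in the example output above ("... resilvered, 1.03% done, ..."); other ZFS versions may word it differently. resilver_percent and wait_for_resilver are hypothetical helpers, and the 60-second default interval is arbitrary.

```bash
# Hypothetical helper: print the "% done" figure of a running resilver,
# or nothing if no resilver progress line is present.
resilver_percent() {
    local pool="${1:?usage: resilver_percent <pool>}"
    zpool status "$pool" |
        sed -n 's/.*resilvered, \([0-9.]*\)% done.*/\1/p'
}

# Hypothetical helper: print progress every N seconds (default 60) until the
# progress line disappears from zpool status.
wait_for_resilver() {
    local pool="${1:?usage: wait_for_resilver <pool> [interval]}" pct
    while pct="$(resilver_percent "$pool")"; [[ -n "$pct" ]]; do
        echo "$(date '+%H:%M:%S') resilver ${pct}% done"
        sleep "${2:-60}"
    done
}
```

On OpenZFS 2.0 and later, zpool wait -t resilver rpool blocks until the resilver finishes, which may make a custom loop unnecessary.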
Post-Resilvering Verification
Once the resilvering process is complete, verify that the pool has returned to a healthy state:
zpool status
All disks should be listed as ONLINE. Example output:
pool: rpool
state: ONLINE
scan: resilvered 5.37G in 0h7m with 0 errors on Sat Dec 21 21:39:01 2024
config:
NAME                                                STATE     READ WRITE CKSUM
rpool                                               ONLINE       0     0     0
  mirror-0                                          ONLINE       0     0     0
    ata-WDC_WD30EFRX-68AX9N0_WD-WCC1T1080266-part3  ONLINE       0     0     0
    ata-WDC_WD30EFRX-68AX9N0_WD-WCC1T1101796-part3  ONLINE       0     0     0
  mirror-1                                          ONLINE       0     0     0
    ata-WDC_WD30EFRX-68AX9N0_WD-WCC1T1401427-part3  ONLINE       0     0     0
    ata-WDC_WD30EFRX-68AX9N0_WD-WCC1T0809124        ONLINE       0     0     0
errors: No known data errors
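The same check can be scripted. This minimal sketch succeeds only if every row of the config table is ONLINE and ZFS reports "No known data errors". verify_pool is a hypothetical name, and the sketch assumes a pool without hot spares (spares are listed as AVAIL and would be flagged).

```bash
# Hypothetical post-resilver check for a single pool.
verify_pool() {
    local pool="${1:?usage: verify_pool <pool>}" status
    status="$(zpool status "$pool")" || return 1
    # Flag any row between the "NAME STATE ..." header and the "errors:" line
    # whose STATE column is not ONLINE (awk exits 0 if such a row exists).
    if awk '/^[ \t]*NAME[ \t]+STATE/ { in_cfg = 1; next }
            /^[ \t]*errors:/ { in_cfg = 0 }
            in_cfg && NF >= 2 && $2 != "ONLINE" { bad = 1 }
            END { exit !bad }' <<< "$status"; then
        echo "$pool: some devices are not ONLINE" >&2
        return 1
    fi
    grep -q 'errors: No known data errors' <<< "$status"
}
```

Usage: verify_pool rpool && echo "rpool is healthy"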
Troubleshooting
If you encounter issues:
- Ensure the new disk is properly connected and recognized by the system.
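- If zpool replace fails, check the new disk for existing partitions or data and clear them using wipefs.
- Review the kernel log for ZFS messages and hardware errors:
dmesg | grep -i zfs
Errors reported by the disk itself (for example, lines containing "I/O error") typically come from the kernel's block or ATA layer rather than from ZFS, so they may not match this filter; check the unfiltered dmesg output as well.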
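To catch both ZFS messages and disk-level errors in one pass, you can widen the filter. The patterns below are an assumption about what a failing disk typically logs and should be adapted to your hardware; storage_errors is a hypothetical helper.

```bash
# Hypothetical helper: show kernel log lines that mention ZFS, block-layer
# I/O errors, or ATA link errors/failures (case-insensitive).
storage_errors() {
    dmesg | grep -i -E 'zfs|i/o error|ata[0-9]+.*(error|failed)'
}
```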
As long as the pool still has enough redundancy (for example, at least one healthy disk in each mirror), resilvering lets you recover from a disk failure without data loss. By following these steps, you can replace a failed disk and restore your ZFS pool to a healthy state. Regular monitoring and proper maintenance of your storage system help ensure long-term reliability.