SEGFAULT or ACCESS_VIOLATION
Symptom
The CDP Server crashes and becomes unavailable.
Backup or restore tasks fail with strange errors like "page N not found", "file XYZ corrupt needs repair", or "file XYZ has bad magic number".
Cause
Bad Hardware or Bad Memory.
Resolution
First Check The Monitor Log File To Confirm.
The first thing to do when you get these strange errors is to check the log files. The CDP Server keeps several log files.
First check the monitor.log file. This log file is generated by the CDP Server "monitor." The CDP Server monitor is a watchdog daemon that constantly monitors the health of the CDP Server.
The default locations for the monitor.log file are as follows:
Windows CDP Server: C:\Program Files\Righteous Backup\Backup Server\log\monitor.log
Linux CDP Server: /usr/r1soft/buserver/log/monitor.log
The log is a text file and can be opened with any standard text editor.
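If you prefer to script the check, the default locations above can be captured in a small lookup. This is only a sketch: the helper name default_monitor_log is hypothetical and not part of the CDP Server, and it assumes the server was installed in its default location.

```python
import platform

# Default monitor.log locations, copied from the list above.
DEFAULT_MONITOR_LOGS = {
    "Windows": r"C:\Program Files\Righteous Backup\Backup Server\log\monitor.log",
    "Linux": "/usr/r1soft/buserver/log/monitor.log",
}


def default_monitor_log(system=None):
    """Return the default monitor.log path for an OS name.

    `system` defaults to platform.system(). Hypothetical helper for
    illustration; a non-default install will use a different path.
    """
    system = system or platform.system()
    try:
        return DEFAULT_MONITOR_LOGS[system]
    except KeyError:
        raise ValueError(f"no default monitor.log location known for {system!r}")
```

Calling default_monitor_log() with no argument picks the entry for the machine the script runs on.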
Look through the log for errors like the following:
Windows Hardware Error Example
Linux Hardware Error Example
Make sure you identify SIGSEGV in the error on Linux or ACCESS_VIOLATION on Windows. Typically, these errors are a clear indicator that you have bad memory or another serious hardware problem.
Bad memory, failing hard disks, or other hardware issues can cause the CDP Server to fail in unexpected ways. Typically, but not always, this results in a SEGFAULT error (Linux) or an ACCESS_VIOLATION error (Windows).
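On a large log, searching for the two markers by hand is tedious. The following is a minimal sketch of an automated search. It assumes the markers appear literally as SIGSEGV or ACCESS_VIOLATION somewhere in a log line. The function names are made up for this example and are not part of the CDP Server.

```python
import re

# The two markers named above: SIGSEGV (Linux) and ACCESS_VIOLATION (Windows).
CRASH_MARKER = re.compile(r"SIGSEGV|ACCESS_VIOLATION")


def find_crash_lines(lines):
    """Return (line_number, text) pairs for lines containing a crash marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if CRASH_MARKER.search(line):
            hits.append((number, line.rstrip("\r\n")))
    return hits


def scan_monitor_log(path):
    """Scan the monitor.log file at `path` and return the matching lines."""
    # errors="replace" keeps the scan going if the log has undecodable bytes.
    with open(path, encoding="utf-8", errors="replace") as log:
        return find_crash_lines(log)
```

For example, scan_monitor_log("/usr/r1soft/buserver/log/monitor.log") returns an empty list when neither marker is present. An empty result does not prove the hardware is healthy, because hardware faults do not always produce these errors.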
If you ever have a CDP Server crash, please report it to R1Soft Tech Support as soon as possible. We would like to know about it, even if you resolved the issue by replacing hardware.
When you contact Tech Support, please include your CDP Server logs. The CDP Server will automatically generate a zip file for you to download with most or all relevant log files.
To download this zip file, open the CDP Server Web User Interface in a Web browser. Click on the "Options" tab from the Main Menu. Then select "Server Logs" from the "Options" sub-menu. Click on the "Download Server Log Files" button.
Similar errors can be caused by a failing hard disk or RAID array. These can corrupt page files or swap partitions, resulting in what appear to be memory errors.
"I used this same Server Before for X Years and I swear it Ran Great before I installed CDP Server"
CDP Server is very memory intensive. This is good for performance, because it uses as much RAM as possible. It is also why CDP Server tends to reveal hardware problems: memory that is rarely touched by other applications gets exercised. In addition, Continuous Data Protection is system intensive and especially I/O intensive. When several Hosts are synchronized at once, the CDP Server hardware can be pushed to its limits.
These factors can be a recipe for revealing hardware problems you did not know you had.
How To Resolve
1) First shut down your CDP Server.
There is no point in running the CDP Server on a system with bad memory. Shut it down so you do not cause other problems.
Windows:
Open the NT Service manager. Select the CDP Server service and stop it.
Linux:
/etc/init.d/buserver stop
2) Replace The Failed Hardware
The only resolution for this issue is to replace the failed piece of hardware, which in most cases is bad memory. Troubleshooting memory problems can be very challenging. Many factors can affect memory errors, including environmental factors like heat and the load on the system. Tools such as memtest86 and the Microsoft memory test tool can sometimes identify bad memory.
3) After Replacing Failed Hardware or Installing on New Hardware, Start the CDP Server and Watch for Errors
Periodically check the log file to make sure you have resolved the hardware problem.
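One way to make the periodic check cheap is to read only what was appended to the log since the last check. The sketch below assumes the crash markers are the literal strings SIGSEGV and ACCESS_VIOLATION, and that you store the returned offset between runs. The function name new_crash_lines is hypothetical.

```python
import os

# Crash markers named earlier in this article.
MARKERS = ("SIGSEGV", "ACCESS_VIOLATION")


def new_crash_lines(path, offset=0):
    """Read the log from byte `offset` onward.

    Returns (matching_lines, new_offset). Pass new_offset back in on the
    next call so only newly appended lines are examined.
    """
    if os.path.getsize(path) < offset:
        # The log is shorter than last time (rotated or truncated),
        # so start again from the beginning.
        offset = 0
    hits = []
    with open(path, "rb") as log:
        log.seek(offset)
        for raw in log:
            text = raw.decode("utf-8", errors="replace").rstrip("\r\n")
            if any(marker in text for marker in MARKERS):
                hits.append(text)
        offset = log.tell()
    return hits, offset
```

Run it from a scheduled job and alert whenever the returned list is not empty. Any hit after the hardware change means the problem is not yet resolved.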
We have seen customers replace failed memory with a new set of bad RAM. Bad memory sometimes comes in batches, so replacement modules purchased from the same vendor may come from the same faulty batch. In other cases, the memory is not bad but incompatible. Mismatched RAM sticks from different manufacturers with different specifications can sometimes cause these problems.
4) Resolve Any Issues With Corrupted Disk Safes Caused By The Crash(es)
When the CDP Server crashes because of a SEGFAULT, it stops abruptly, with no chance to finish or roll back writes in progress. Any Disk Safes you are writing backups to (synchronizing) at the time of the crash are probably corrupt.
Watch your Task History closely for failed Backup/synchronization tasks.
If your Backup or restore tasks fail with strange errors like:
page N not found...
file XYZ corrupt needs repair...
file XYZ has bad magic number
then you have identified a corrupt Disk Safe.
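If you review many failed tasks, a simple pattern match can flag the messages that point to a corrupt Disk Safe. This sketch is modeled only on the three messages listed above. The exact wording in your Task History may differ, so treat the patterns as assumptions. The function name is hypothetical.

```python
import re

# Patterns modeled on the three example messages above. Real messages
# may be worded differently, so adjust these as needed.
CORRUPTION_PATTERNS = [
    re.compile(r"page \S+ not found", re.IGNORECASE),
    re.compile(r"corrupt needs repair", re.IGNORECASE),
    re.compile(r"bad magic number", re.IGNORECASE),
]


def looks_like_corrupt_disk_safe(message):
    """Return True if a task error message matches a known corruption pattern."""
    return any(pattern.search(message) for pattern in CORRUPTION_PATTERNS)
```

Feed it the error text of each failed task. A True result marks the Disk Safe for that Host as a candidate for the steps below.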
To resolve the corrupt Disk Safe:
- Create a new Disk Safe for the affected Host.
- Delete the corrupted Disk Safe.