
SEGFAULT or ACCESS_VIOLATION

Symptom


The CDP Server crashes and becomes unavailable.

Backup or restore tasks fail with unusual errors such as:

page N not found...
file XYZ corrupt needs repair...
file XYZ has bad magic number

Cause


Bad hardware, most often bad memory (RAM).

Resolution


First Check The Monitor Log File To Confirm.
When you see these errors, first check the log files. The CDP Server keeps several log files.

Start with the monitor.log file. It is written by the CDP Server "monitor," a watchdog daemon that continuously checks the health of the CDP Server.

The default locations for the monitor.log file are as follows:
Windows CDP Server: C:\Program Files\Righteous Backup\Backup Server\log\monitor.log

Linux CDP Server: /usr/r1soft/buserver/log/monitor.log

The log is a text file and can be opened with any standard text editor.

Search the log for errors like the following:

Windows Hardware Error Example

# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x000007ff7fc52b42, pid=3520, tid=3768
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (1.6.0-b105 mixed mode)
# Problematic frame:
# C [msvcrt.dll+0x52b42]

Linux Hardware Error Example

INFO | buserver | 2006/12/21 13:38:12 | #
INFO | buserver | 2006/12/21 13:38:12 | # An unexpected error has been detected by HotSpot Virtual Machine:
INFO | buserver | 2006/12/21 13:38:12 | #
INFO | buserver | 2006/12/21 13:38:12 | # SIGSEGV (0xb) at pc=0x401cbf3e, pid=653, tid=1520520112
INFO | buserver | 2006/12/21 13:38:12 | #
INFO | buserver | 2006/12/21 13:38:12 | # Java VM: Java HotSpot(TM) Server VM (1.5.0_08-b03 mixed mode)
INFO | buserver | 2006/12/21 13:38:12 | # Problematic frame:
INFO | buserver | 2006/12/21 13:38:12 | # V [libjvm.so+0x1cbf3e]
INFO | buserver | 2006/12/21 13:38:12 | #
INFO | buserver | 2006/12/21 13:38:12 | # An error report file with more information is saved as hs_err_pid653.log
INFO | buserver | 2006/12/21 13:38:12 | #
INFO | buserver | 2006/12/21 13:38:12 | # If you would like to submit a bug report, please visit:
INFO | buserver | 2006/12/21 13:38:12 | #
ERROR | monitor | 2006/12/21 13:38:12 | Service exited unexpectedly.

Look for SIGSEGV in the error on Linux, or EXCEPTION_ACCESS_VIOLATION on Windows. These errors are typically a clear indicator of bad memory or another serious hardware problem.
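If you would rather search the log with a script than by eye, the Python sketch below scans log lines for the two crash signatures shown in the examples and extracts the signal name, exception code, and process ID. It assumes the log is plain text with one entry per line and that crashes appear in the HotSpot format shown above; `find_crashes`, `scan_file`, and `CrashReport` are hypothetical names, not part of the CDP Server.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# Assumed crash signature, modeled on the HotSpot reports in this article:
#   SIGSEGV (0xb) at pc=..., pid=653, ...
#   EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=..., pid=3520, ...
CRASH_PATTERN = re.compile(
    r"(SIGSEGV|EXCEPTION_ACCESS_VIOLATION)\s*\((0x[0-9a-fA-F]+)\).*?pid=(\d+)"
)


@dataclass
class CrashReport:
    line_no: int  # 1-based line number in the log
    signal: str   # "SIGSEGV" (Linux) or "EXCEPTION_ACCESS_VIOLATION" (Windows)
    code: str     # exception code, e.g. "0xb" or "0xc0000005"
    pid: int      # process ID of the JVM that crashed


def find_crashes(lines: Iterable[str]) -> List[CrashReport]:
    """Return every crash signature found in the given log lines."""
    reports = []
    for line_no, line in enumerate(lines, start=1):
        for match in CRASH_PATTERN.finditer(line):
            signal, code, pid = match.groups()
            reports.append(CrashReport(line_no, signal, code, int(pid)))
    return reports


def scan_file(path: str) -> List[CrashReport]:
    """Scan a log file such as monitor.log; undecodable bytes are replaced."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return find_crashes(log)
```

On a default Linux install you might call `scan_file("/usr/r1soft/buserver/log/monitor.log")`. The pid in each report matches the number in the hs_err_pid<pid>.log file name, as in the Linux example (pid=653, hs_err_pid653.log).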

Bad memory, failing hard disks, or other hardware issues can cause the CDP Server to fail in unexpected ways. Typically, but not always, this results in a SEGFAULT error (Linux) or ACCESS_VIOLATION error (Windows).

If you ever have a CDP Server crash, please report it to R1Soft Tech Support as soon as possible. We would like to know about it, even if you resolved the issue by replacing hardware.

When you contact Tech Support, please include your CDP Server logs. The CDP Server automatically generates a downloadable zip file containing most or all of the relevant log files.

To download this zip file, open the CDP Server Web User Interface in a Web browser, click the "Options" tab in the Main Menu, select "Server Logs" from the "Options" sub-menu, and click the "Download Server Log Files" button.
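Before sending the archive, you may want to see which logs in it record a crash. The sketch below is a hypothetical helper, not an R1Soft tool: it opens the downloaded zip with Python's standard zipfile module and collects matching lines from every member whose name ends in .log. The archive's file names and layout are assumptions; adjust the filter if your download differs.

```python
import re
import zipfile
from typing import Dict, List

# Crash signatures from this article: SIGSEGV (Linux) or
# EXCEPTION_ACCESS_VIOLATION (Windows).
CRASH_PATTERN = re.compile(r"SIGSEGV|EXCEPTION_ACCESS_VIOLATION")


def crash_lines_in_zip(zip_path: str) -> Dict[str, List[str]]:
    """Map each .log member of the archive to its lines that mention a crash.

    Assumes the logs are text files whose names end in ".log"; members
    with no matching line are left out of the result.
    """
    results: Dict[str, List[str]] = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".log"):
                continue
            text = archive.read(name).decode("utf-8", errors="replace")
            hits = [line for line in text.splitlines() if CRASH_PATTERN.search(line)]
            if hits:
                results[name] = hits
    return results
```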

Similar errors can be caused by a failing hard disk or RAID array. These can corrupt page files or swap partitions, resulting in what appears to be memory errors.

"I used this same Server Before for X Years and I swear it Ran Great before I installed CDP Server"
CDP Server is very memory intensive: it uses as much RAM as possible to improve performance. That is good for speed, but it also tends to expose hardware problems that lighter workloads may never trigger. Continuous Data Protection is also system intensive, and especially I/O intensive. When several Hosts are synchronized at once, the CDP Server can push the hardware to its limits.

These factors can be a recipe for revealing hardware problems you did not know you had.

How To Resolve

1) First shut down your CDP Server.
There is no point in running the CDP Server on a system with bad memory. Shut it down so you do not cause other problems.

Windows:
Open the Services manager (services.msc), select the CDP Server service, and stop it.

Linux:
/etc/init.d/buserver stop

2) Replace The Failed Hardware
The only resolution for this issue is to replace the failed hardware, which in most cases is bad memory. Troubleshooting memory problems can be very challenging: many factors, including environmental ones such as heat and system load, affect whether memory errors appear. Tools such as memtest86 and the Microsoft memory test tool can sometimes identify bad memory.

3) After Replacing Failed Hardware or Installing on New Hardware, Start the CDP Server and Watch for Errors
Periodically check the monitor.log file to make sure the hardware problem is resolved.
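One way to make these periodic checks routine is to scan only what has been appended to the log since the previous check. The Python sketch below remembers a byte offset between calls; how often you run it (for example from cron or Task Scheduler) and where you store the offset are up to you. `scan_new_entries` is a hypothetical name, and the rotation check assumes that a rotated or truncated log is smaller than the saved offset.

```python
import os
import re
from typing import List, Tuple

# Crash signatures from the examples in this article.
CRASH_PATTERN = re.compile(r"SIGSEGV|EXCEPTION_ACCESS_VIOLATION")


def scan_new_entries(path: str, offset: int) -> Tuple[List[str], int]:
    """Return crash lines appended to `path` after byte `offset`, and the new offset.

    Save the returned offset and pass it to the next call.
    """
    if os.path.getsize(path) < offset:
        # The file shrank: assume it was rotated or truncated and start over.
        offset = 0
    with open(path, "rb") as log:
        log.seek(offset)
        data = log.read()
    # Stop at the last complete line so a half-written entry is re-read next time.
    end = data.rfind(b"\n") + 1
    text = data[:end].decode("utf-8", errors="replace")
    hits = [line for line in text.splitlines() if CRASH_PATTERN.search(line)]
    return hits, offset + end
```

Start with an offset of 0 on the first run; an empty hit list on every later run suggests the replacement hardware is holding up.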

We have seen customers replace failed memory with another set of bad RAM. Bad memory sometimes comes in batches, so replacement modules bought from the same vendor may come from the same bad batch. In other cases, the problem is not bad memory but incompatibility: mismatched RAM modules from different manufacturers, with different specifications, can sometimes cause these problems.

4) Resolve Any Issues With Corrupted Disk Safes Caused By The Crash(es)
When the CDP Server crashes because of a SEGFAULT, it stops abruptly, in the middle of whatever it was doing. Any Disk Safes you are writing backups to (synchronizing) at the time of the crash are probably corrupt.

Watch your Task History closely for failed Backup/synchronization tasks.

If your Backup or restore tasks fail with strange errors like:
page N not found...
file XYZ corrupt needs repair...
file XYZ has bad magic number

Then you have identified a corrupt Disk Safe.
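If you copy failed task messages out of the Task History, a small filter like the following can flag the ones that match the three corruption messages listed above. This is a sketch under stated assumptions: it assumes the messages contain those phrases verbatim, treats the article's N and XYZ as placeholders for any single page number or file name, and `indicates_corrupt_disk_safe` and `corrupt_task_messages` are hypothetical helper names.

```python
import re
from typing import Iterable, List

# The three messages listed above. "N" and "XYZ" are placeholders in the
# article, so any single token (page number or file name) is accepted.
CORRUPTION_PATTERNS = [
    re.compile(r"page \S+ not found", re.IGNORECASE),
    re.compile(r"file \S+ corrupt needs repair", re.IGNORECASE),
    re.compile(r"file \S+ has bad magic number", re.IGNORECASE),
]


def indicates_corrupt_disk_safe(message: str) -> bool:
    """True if a task error message matches one of the corruption messages."""
    return any(pattern.search(message) for pattern in CORRUPTION_PATTERNS)


def corrupt_task_messages(messages: Iterable[str]) -> List[str]:
    """Keep only the messages that point to a corrupt Disk Safe."""
    return [m for m in messages if indicates_corrupt_disk_safe(m)]
```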
To resolve the corrupt Disk Safe:

  1. Create a New Disk Safe for the affected Host.
  2. Delete the corrupted Disk Safe.

Related Articles


Page: Upgrading Hardware Configuration (Archived Knowledge Base 2.0) Labels: hardware, upgrade
Page: Moving Your Linux Control Server Onto New Hardware (Archived Knowledge Base 2.0) Labels: hardware, upgrade, migration, performance
Page: SEGFAULT or ACCESS_VIOLATION (Archived Knowledge Base 2.0) Labels: hardware, disk_safe
Page: Linux CDP Server — Migration (Archived Knowledge Base 2.0) Labels: migration, hardware
Page: Linux CDP Server — Installing on 64-bit Linux (Archived Knowledge Base 2.0) Labels: 64-bit, hardware
Page: CDP Data Center Base Deployment (Archived Knowledge Base 2.0) Labels: install, hardware
Page: Reinstalling CDP Server on New Hardware (Archived Knowledge Base 2.0) Labels: install, hardware
Page: Bare-Metal Restore to Another Hard Drive (Archived Knowledge Base 2.0) Labels: hardware, bare-metal_restore
Page: Failed To Detect Hardware Configuration (0xECEBE1E2) (Archived Knowledge Base 2.0) Labels: hardware
Page: Output Says That There Is Corrupt Data And The Agent Shuts Down And Stops The Back Up (Archived Knowledge Base 2.0) Labels: hardware, backup_error
Page: Using Verification to Detect Hardware Problems (Archived Knowledge Base 2.0) Labels: verify, hardware
Page: Recommended Hardware Configuration For R1Soft Solutions (Archived Knowledge Base 2.0) Labels: hardware
Page: Not Enough Data In Sum File To Read BlockSum (Archived Knowledge Base 2.0) Labels: disk_safe, volume, backup_error
Page: Verifying Disk Safe (Archived Knowledge Base 2.0) Labels: disk_safe, verify
Page: The Disk Safe Browser (Archived Knowledge Base 2.0) Labels: disk_safe, device, backup_image, file_restore
Page: Removing Files from the Backups (Archived Knowledge Base 2.0) Labels: disk_safe
Page: Moving Backups to Another Place (Archived Knowledge Base 2.0) Labels: disk_safe
Page: What Does the Verify Recovery Point Task Do? (Archived Knowledge Base 2.0) Labels: cdp_server, disk_safe, verify
Page: License for Disabled Agent (Archived Knowledge Base 2.0) Labels: license, disk_safe
Page: Disk Safe Browser - Error Reading File System (Archived Knowledge Base 2.0) Labels: disk_safe, encryption, file_system
Page: Why Recreate a Disk Safe (Archived Knowledge Base 2.0) Labels: disk_safe, corrupt, volume, encryption, partition
Labels: hardware, disk_safe