Bacula: Director's connection to SD for this Job was lost


SD 7.4.4 (Ubuntu 16), Director 7.4.4 (Ubuntu 16), FD 5.2.10 (Windows)

I'm having trouble backing up Windows clients with Bacula. A backup runs fine when it is only 1 or 2 MB in size, but when I run a backup of about 500 MB I get the same error every time:

"Director's connection to SD for this Job was lost."

Some things to mention. When I issue status client:

Terminated Jobs:
 JobId  Level    Files      Bytes   Status   Finished
======================================================================
    81  Full      5,796    514.8 M  OK       06-Nov-17 12:50 BackupComputerA

When I issue status dir:

06-Nov 17:58 acme-director JobId 81: Error: Director's connection to SD for this Job was lost.
06-Nov 17:58 acme-director JobId 81: Error: Bacula acme-director 7.4.4 (202Sep16):
  Build OS:               arm-unknown-linux-gnueabihf debian 9.0
  JobId:                  81
  Job:                    BackupComputerA.2017-11-06_17.41.01_03
  Backup Level:           Full (upgraded from Incremental)
  Client:                 "Computer-A-fd" 5.2.10 (28Jun12) Microsoft  (build 9200), 32-bit,Cross-compile,Win32
  FileSet:                "Full Set" 2017-11-03 22:12:58
  Pool:                   "RemoteFile" (From Job resource)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "File1" (From Job resource)
  Scheduled time:         06-Nov-2017 17:40:59
  Start time:             06-Nov-2017 17:41:04
  End time:               06-Nov-2017 17:58:00
  Elapsed time:           16 mins 56 secs
  Priority:               10
  FD Files Written:       5,796
  SD Files Written:       0
  FD Bytes Written:       514,883,164 (514.8 MB)
  SD Bytes Written:       0 (0 B)
  Rate:                   506.8 KB/s
  Software Compression:   100.0% 1.0:1
  Snapshot/VSS:           yes
  Encryption:             yes
  Accurate:               no
  Volume name(s):         
  Volume Session Id:      1
  Volume Session Time:    1509989906
  Last Volume Bytes:      8,045,880,119 (8.045 GB)
  Non-fatal FD errors:    1
  SD Errors:              0
  FD termination status:  OK
  SD termination status:  Error
  Termination:            *** Backup Error ***

About 5 minutes into the backup, I get a message:

Running Jobs:
Console connected at 06-Nov-17 18:08
 JobId  Type Level     Files     Bytes  Name              Status
======================================================================
    83  Back Full          0         0  BackupComputerE   has terminated
====

The job completes and terminates, but the connection is lost afterwards and I never get an "OK" for the status update.

I have added "Heartbeat Interval = 1 Minute" to all the daemons, still with no luck. I'm using MySQL as the catalog database on the Director.
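A quick way to confirm that the directive really is present in each daemon's configuration is sketched below. The `/etc/bacula` paths are an assumption (the usual Debian/Ubuntu package locations), and `has_heartbeat` is a hypothetical helper name; the Windows FD keeps its configuration elsewhere and would need to be checked by hand.

```shell
# Report whether a config file contains a Heartbeat Interval directive.
# Bacula directive names ignore case and spaces, hence the loose pattern.
# Lines starting with '#' (comments) do not match.
has_heartbeat() {
    grep -Eiq '^[[:space:]]*heartbeat[[:space:]]*interval[[:space:]]*=' "$1"
}

# Assumed config locations; adjust to your installation.
for conf in /etc/bacula/bacula-dir.conf /etc/bacula/bacula-sd.conf; do
    [ -r "$conf" ] || continue   # skip files that are absent or unreadable
    if has_heartbeat "$conf"; then
        echo "$conf: Heartbeat Interval set"
    else
        echo "$conf: Heartbeat Interval missing"
    fi
done
```

A daemon only picks up a changed configuration after it is restarted or reloaded, so a directive that is present in the file may still not be in effect.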

Thanks in advance for any help.

There is 1 answer

pm1391 answered:

For anyone having the same issue: I was able to fix the lost connection between the SD and the Director by adding the heartbeat interval to the clients and setting the TCP keepalive time to 60 seconds with

sysctl -w net.ipv4.tcp_keepalive_time=60

on both the Storage daemon and the Director hosts. Connecting remotely to the Director with bconsole also interrupted jobs, so I now log in to the Director machine over ssh and run bconsole there.
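Note that `sysctl -w` only changes the running kernel, so the value is lost on reboot. A minimal sketch of how the setting could be made persistent follows; the file name `99-bacula-keepalive.conf` and the helper `keepalive_conf_line` are illustrative assumptions, not something from my original setup.

```shell
# Print the sysctl.d line that would persist the keepalive setting.
keepalive_conf_line() {
    printf 'net.ipv4.tcp_keepalive_time = %s\n' "$1"
}

# Show the value currently in effect (Linux only).
if [ -r /proc/sys/net/ipv4/tcp_keepalive_time ]; then
    echo "current: $(cat /proc/sys/net/ipv4/tcp_keepalive_time)"
fi

# Preview the line for a 60-second keepalive time.
keepalive_conf_line 60

# To persist it, run as root on both the SD and the Director hosts:
#   keepalive_conf_line 60 > /etc/sysctl.d/99-bacula-keepalive.conf
#   sysctl --system
```

The write and reload steps are left as comments so the snippet can be run safely without root to preview what would change.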