Commit 1144591

tq: retry batch failures
If the batch operation fails due to an error instead of a bad HTTP status code, we abort the batch operation instead of retrying it. This appears to be a regression from 1412d6e ("Don't fail if we lack objects the server has", 2019-04-30), which caused us to handle errors differently. Since there are two error returns from enqueueAndCollectRetriesFor, let's wrap the batch error case as a retriable error and not abort if we find a retriable error later on. This lets us continue to abort if we get a missing object, which should be fatal, but retry in the more common network failure case.
1 parent 1805222 commit 1144591

1 file changed: +2 -2 lines changed

tq/transfer_queue.go

Lines changed: 2 additions & 2 deletions
@@ -442,7 +442,7 @@ func (q *TransferQueue) collectBatches() {
 		// don't process further batches. Abort the wait queue so that
 		// we don't deadlock waiting for objects to complete when they
 		// never will.
-		if err != nil {
+		if err != nil && !errors.IsRetriableError(err) {
 			q.wait.Abort()
 			break
 		}
@@ -538,7 +538,7 @@ func (q *TransferQueue) enqueueAndCollectRetriesFor(batch batch) (batch, error)
 				}
 			}

-			return next, err
+			return next, errors.NewRetriableError(err)
 		}
 	}
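
The pattern in this change (wrap the batch failure as retriable at the return site, then abort only on errors that are not retriable) can be sketched in isolation. This is a minimal, standalone illustration and not the project's actual errors package: `retriableError`, `newRetriableError`, `isRetriableError`, `processBatches`, and the two sentinel errors are hypothetical names made up for the example, and the retry limit is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// retriableError is a hypothetical wrapper that marks an error as safe to retry.
type retriableError struct{ err error }

func (e *retriableError) Error() string { return e.err.Error() }
func (e *retriableError) Unwrap() error { return e.err }

// newRetriableError wraps err as retriable. In this sketch a nil error stays nil.
func newRetriableError(err error) error {
	if err == nil {
		return nil
	}
	return &retriableError{err: err}
}

// isRetriableError reports whether err, or anything it wraps, is retriable.
func isRetriableError(err error) bool {
	var r *retriableError
	return errors.As(err, &r)
}

// Hypothetical sentinel errors standing in for the two failure classes.
var (
	errNetwork       = errors.New("batch request failed: connection reset")
	errMissingObject = errors.New("object missing locally")
)

// processBatches calls send until it succeeds, hits a fatal error, or uses up
// maxAttempts. It returns the number of attempts made and the final error.
func processBatches(maxAttempts int, send func(attempt int) error) (int, error) {
	var err error
	attempts := 0
	for attempts < maxAttempts {
		attempts++
		err = send(attempts)
		if err != nil && !isRetriableError(err) {
			// Fatal error: stop immediately, as the patched check does.
			return attempts, err
		}
		if err == nil {
			return attempts, nil
		}
		// Retriable error: fall through and try again.
	}
	return attempts, err
}

func main() {
	// A network failure on the first two attempts, then success.
	n, err := processBatches(5, func(attempt int) error {
		if attempt < 3 {
			return newRetriableError(errNetwork)
		}
		return nil
	})
	fmt.Println(n, err) // prints "3 <nil>"

	// A missing object is not wrapped, so it aborts on the first attempt.
	n, err = processBatches(5, func(int) error { return errMissingObject })
	fmt.Println(n, err) // prints "1 object missing locally"
}
```

The sketch keeps the classification at the point where the error is produced, so the loop only needs a single check to decide between aborting and retrying.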
