Description
Corpus pruning was silently failing in a private fork of ClusterFuzz. I was seeing the `Deleting URLs.` and `Done deleting URLs.` log messages, but no audit logs indicated that any URLs were actually being deleted, and corpus sizes remained large.
```python
def delete_signed_urls(urls):
  if not urls:
    return
  logs.info('Deleting URLs.')
  with concurrency.make_pool(_POOL_SIZE) as pool:
    pool.map(_error_tolerant_delete_signed_url, urls)
  logs.info('Done deleting URLs.')
```
I updated the code in my private fork, and URLs are now being deleted successfully:
```python
def delete_signed_urls(urls):
  if not urls:
    return
  logs.info('Deleting URLs.')
  with concurrency.make_pool(_POOL_SIZE) as pool:
    list(pool.map(_error_tolerant_delete_signed_url, urls))
  logs.info('Done deleting URLs.')
```
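If the root cause is that `pool.map` returns a lazy iterator, the general behavior can be illustrated with Python's built-in `map`, which is also lazy: side effects only run when the iterator is consumed. This is a minimal sketch of the hypothesis (`fake_delete` is a hypothetical stand-in for `_error_tolerant_delete_signed_url`), not a claim about what `concurrency.make_pool` actually returns:

```python
deleted = []

def fake_delete(url):
  # Hypothetical stand-in for _error_tolerant_delete_signed_url:
  # records the URL instead of deleting anything.
  deleted.append(url)

urls = ['gs://bucket/a', 'gs://bucket/b']

map(fake_delete, urls)        # iterator never consumed: nothing runs
assert deleted == []

list(map(fake_delete, urls))  # list() forces the iterator, so deletes run
assert deleted == urls
```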
However, log messages are never emitted when `_error_tolerant_delete_signed_url` is called. For example, I updated the code to log every deletion attempt, but the message never appeared in Logs Explorer. This means the `Failed to delete` warning will never be emitted:
```python
def _error_tolerant_delete_signed_url(url: str):
  logs.info('Deleting URL %s' % url)
  try:
    delete_signed_url(url)
  except Exception:
    logs.warning(f'Failed to delete: {url}')
```
I'm not exactly sure why `list(pool.map(_error_tolerant_delete_signed_url, urls))` fixes the issue. I've attempted to reproduce it in standalone Python programs without success. For example, code like the following always prints `0` through `4`, even though `pool.map(test_function, array)` is called instead of `list(pool.map(test_function, array))`:
```python
import contextlib
import multiprocessing
from concurrent import futures

POOL_SIZE = multiprocessing.cpu_count()


@contextlib.contextmanager
def make_pool(pool_size=POOL_SIZE, max_pool_size=None):
  """Returns a pool that can (usually) execute tasks concurrently."""
  if max_pool_size is not None:
    pool_size = min(pool_size, max_pool_size)
  yield futures.ProcessPoolExecutor(pool_size)


def test_function(x):
  print(x)


if __name__ == '__main__':
  array = list(range(5))
  with make_pool() as pool:
    pool.map(test_function, array)
```
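One detail that may explain why this standalone program always prints: `concurrent.futures.Executor.map` submits every task eagerly and is only lazy about yielding *results*, so the side effects run even if the returned iterator is never consumed, and the executor's shutdown on context exit waits for them. A quick check of that behavior (using `ThreadPoolExecutor` so the sketch runs anywhere; whether ClusterFuzz's `concurrency.make_pool` behaves the same way is unknown):

```python
from concurrent import futures

calls = []

def record(x):
  calls.append(x)

with futures.ThreadPoolExecutor(2) as pool:
  pool.map(record, range(5))  # result iterator never consumed

# Executor.map submitted all five tasks immediately, and exiting the
# `with` block waited for them, so every call ran anyway.
print(sorted(calls))  # → [0, 1, 2, 3, 4]
```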
Consider the following changes for fixing this issue:
- Update `_error_tolerant_delete_signed_url` to return errors when URLs are not deleted successfully
- Update `delete_signed_urls` to iterate through the results and emit a warning for each URL not deleted successfully
- Update `delete_signed_urls` to return a count of successfully deleted URLs
- Update `rsync_from_disk` to use the return value of `delete_signed_urls` to log the number of successfully deleted URLs
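A minimal sketch of what the first three suggestions could look like together. The ClusterFuzz internals (`logs`, `concurrency`, the real `delete_signed_url`) are stubbed here so the sketch is self-contained, and the `rsync_from_disk` change is omitted; this is an illustration of the shape of the fix, not the actual implementation:

```python
import logging
from concurrent import futures

logs = logging.getLogger(__name__)
_POOL_SIZE = 2


def delete_signed_url(url):
  # Stand-in for the real GCS delete; fails for one URL to exercise
  # the error path.
  if url.endswith('bad'):
    raise RuntimeError('delete failed')


def _error_tolerant_delete_signed_url(url):
  """Returns None on success, or the URL that failed to delete."""
  try:
    delete_signed_url(url)
    return None
  except Exception:
    return url


def delete_signed_urls(urls):
  """Deletes |urls| and returns the number deleted successfully."""
  if not urls:
    return 0
  logs.info('Deleting URLs.')
  with futures.ThreadPoolExecutor(_POOL_SIZE) as pool:
    # The list comprehension consumes the lazy map, so every task's
    # result is read and failures can be reported.
    failed = [u for u in pool.map(_error_tolerant_delete_signed_url, urls) if u]
  for url in failed:
    logs.warning('Failed to delete: %s', url)
  logs.info('Done deleting URLs.')
  return len(urls) - len(failed)


if __name__ == '__main__':
  print(delete_signed_urls(['gs://b/ok1', 'gs://b/bad', 'gs://b/ok2']))  # → 2
```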
I may submit a pull request myself with these suggested changes; however, that requires going through my company's open-source approval process and may take a while.