remote_storage

class Provider(value)[source]

Bases: str, enum.Enum

An enumeration.

GOOGLE_STORAGE = 'google_storage'
S3 = 's3'
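
Because Provider derives from both str and enum.Enum, its members behave like ordinary strings. The standalone sketch below re-declares an equivalent enum for illustration (it does not import accsr) and shows the two consequences that matter in practice: comparison with plain strings and lookup by value.

```python
from enum import Enum


class Provider(str, Enum):
    # Stand-in mirroring the two members documented above.
    GOOGLE_STORAGE = "google_storage"
    S3 = "s3"


# Members are real strings, so they compare equal to their values ...
is_plain_string = Provider.S3 == "s3"
# ... and a member can be looked up from a string, e.g. one read from a config file.
looked_up = Provider("google_storage")
```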
class RemoteObjectProtocol(*args, **kwargs)[source]

Bases: Protocol

Protocol of classes that describe remote objects. Describes information about the remote object and functionality to download the object.

name: str
size: int
hash: int
provider: str
download(download_path, overwrite_existing=False) -> Optional[accsr.remote_storage.RemoteObjectProtocol]
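
Since this is a typing Protocol, any class with matching attributes and a matching download method satisfies it without inheriting from it. The sketch below illustrates that idea with a locally declared protocol. RemoteObjectLike, InMemoryObject and total_size are hypothetical names, and the behaviour of download (returning None when an existing file is not overwritten) is an assumption, not something the documentation above specifies.

```python
import os
from typing import List, Optional, Protocol


class RemoteObjectLike(Protocol):
    # Hypothetical local copy of the documented attributes and method.
    name: str
    size: int
    hash: int
    provider: str

    def download(self, download_path, overwrite_existing=False) -> Optional["RemoteObjectLike"]:
        ...


class InMemoryObject:
    """Toy object that satisfies the protocol structurally (no inheritance needed)."""

    def __init__(self, name: str, content: bytes):
        self.name = name
        self.size = len(content)
        self.hash = hash(content)
        self.provider = "in_memory"  # made-up provider name
        self._content = content

    def download(self, download_path, overwrite_existing=False):
        # Assumed semantics: do nothing and return None if the file
        # already exists and must not be overwritten.
        if os.path.exists(download_path) and not overwrite_existing:
            return None
        with open(download_path, "wb") as f:
            f.write(self._content)
        return self


def total_size(objects: List[RemoteObjectLike]) -> int:
    # Works for any object that exposes the protocol's attributes.
    return sum(obj.size for obj in objects)
```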
class SyncObject(local_path: Optional[str] = None, remote_obj: Optional[accsr.remote_storage.RemoteObjectProtocol] = None, remote_path: Optional[str] = None)[source]

Bases: accsr.remote_storage._JsonReprMixin

Class representing the sync-status between a local path and a remote object. Is mainly used for creating summaries and syncing within RemoteStorage and for introspection before and after push/pull transactions.

It is not recommended to create or manipulate instances of this class outside RemoteStorage, in particular in user code. This class forms part of the public interface because instances of it are given to users for introspection.

property name
property exists_on_target: bool

True iff the file exists in both locations

set_local_path(path: Optional[str])

Changes the local path of the SyncObject.

Parameters

path – the new local path

Returns

None

property exists_on_remote
property equal_md5_hash_sum
to_dict(make_serializable=True)
class TransactionSummary(matched_source_files: typing.List[accsr.remote_storage.SyncObject] = <factory>, not_on_target: typing.List[accsr.remote_storage.SyncObject] = <factory>, on_target_eq_md5: typing.List[accsr.remote_storage.SyncObject] = <factory>, on_target_neq_md5: typing.List[accsr.remote_storage.SyncObject] = <factory>, unresolvable_collisions: typing.Dict[str, typing.Union[typing.List[accsr.remote_storage.RemoteObjectProtocol], str]] = <factory>, skipped_source_files: typing.List[accsr.remote_storage.SyncObject] = <factory>, synced_files: typing.List[accsr.remote_storage.SyncObject] = <factory>, sync_direction: typing.Optional[str] = None)[source]

Bases: accsr.remote_storage._JsonReprMixin

Class representing the summary of a push or pull operation. Is mainly used for introspection before and after push/pull transactions.

It is not recommended to create or manipulate instances of this class outside RemoteStorage, in particular in user code. This class forms part of the public interface because instances of it are given to users for introspection.

matched_source_files: List[accsr.remote_storage.SyncObject]
not_on_target: List[accsr.remote_storage.SyncObject]
on_target_eq_md5: List[accsr.remote_storage.SyncObject]
on_target_neq_md5: List[accsr.remote_storage.SyncObject]
unresolvable_collisions: Dict[str, Union[List[accsr.remote_storage.RemoteObjectProtocol], str]]
skipped_source_files: List[accsr.remote_storage.SyncObject]
synced_files: List[accsr.remote_storage.SyncObject]
sync_direction: Optional[str] = None
property files_to_sync: List[accsr.remote_storage.SyncObject]

Returns the list of files that need synchronization.

Returns

list of all files that are not on the target or have different md5sums on target and remote
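
Read together with the field names above, this description suggests that files_to_sync combines not_on_target and on_target_neq_md5, while files in on_target_eq_md5 are left alone. A minimal sketch under that assumption follows. SummarySketch is a hypothetical stand-in that stores plain file names instead of SyncObject instances, and the order of the concatenation is a guess.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SummarySketch:
    # Hypothetical, reduced version of TransactionSummary.
    not_on_target: List[str] = field(default_factory=list)
    on_target_eq_md5: List[str] = field(default_factory=list)
    on_target_neq_md5: List[str] = field(default_factory=list)

    @property
    def files_to_sync(self) -> List[str]:
        # Missing files plus files whose md5 differs; identical files are skipped.
        return self.not_on_target + self.on_target_neq_md5


summary = SummarySketch(
    not_on_target=["new.csv"],
    on_target_eq_md5=["unchanged.csv"],
    on_target_neq_md5=["modified.csv"],
)
```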

size_files_to_sync() -> int

Computes the total size of all objects that need synchronization. Raises a RuntimeError if the sync_direction property is not set to ‘push’ or ‘pull’.

Returns

the total size of all local objects that need synchronization if self.sync_direction='push', and the total size of all remote files that need synchronization if self.sync_direction='pull'
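
The direction-dependent behaviour described above can be sketched as follows. This is an illustration only, not the library's code: FileSizes is a hypothetical stand-in for a SyncObject that carries just a local and a remote size, and the function takes the direction as an argument instead of reading it from self.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileSizes:
    # Hypothetical stand-in for a SyncObject: only the two sizes.
    local_size: int
    remote_size: int


def size_files_to_sync(files: List[FileSizes], sync_direction: Optional[str]) -> int:
    # An unset or unknown direction is an error, as documented above.
    if sync_direction not in ("push", "pull"):
        raise RuntimeError(
            f"sync_direction must be 'push' or 'pull', got {sync_direction!r}"
        )
    if sync_direction == "push":
        # Pushing uploads local files, so local sizes count.
        return sum(f.local_size for f in files)
    # Pulling downloads remote files, so remote sizes count.
    return sum(f.remote_size for f in files)


files = [FileSizes(local_size=10, remote_size=7), FileSizes(local_size=20, remote_size=5)]
```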

property requires_force: bool

Getter of the requires_force property.

Returns

True iff a failure of the transaction can only be prevented by setting force=True.

property has_unresolvable_collisions: bool

Getter of the has_unresolvable_collisions property.

Returns

True iff there exists a collision that cannot be resolved.

property all_files_analyzed: List[accsr.remote_storage.SyncObject]

Getter of the all_files_analyzed property.

Returns

list of all analyzed source files

add_entry(synced_object: Union[accsr.remote_storage.SyncObject, str], collides_with: Optional[Union[List[accsr.remote_storage.RemoteObjectProtocol], str]] = None, skip: bool = False)

Adds a SyncObject to the summary.

Parameters
  • synced_object – either a SyncObject or a path to a local file.

  • collides_with – specification of unresolvable collisions for the given sync object

  • skip – if True, the object is marked to be skipped

Returns

None

get_short_summary_dict()

Returns a short summary of the transaction as a dictionary.

print_short_summary()

Prints a short summary of the transaction (shorter than the full repr, which contains information about local and remote objects).

class RemoteStorageConfig(provider: str, key: str, bucket: str, secret: str, region: Optional[str] = None, host: Optional[str] = None, port: Optional[int] = None, base_path: str = '', secure: bool = True)[source]

Bases: object

Contains all necessary information to establish a connection to a bucket within the remote storage, and the base path on the remote.

provider: str
key: str
bucket: str
secret: str
region: str = None
host: str = None
port: int = None
base_path: str = ''
secure: bool = True
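
The field list shows that only provider, key, bucket and secret are required, while the remaining fields have defaults. The sketch below mirrors those fields in a local dataclass to show how a settings dictionary (for example, one loaded from a JSON file) maps onto the configuration. StorageConfigSketch and all values in settings are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageConfigSketch:
    # Local mirror of the documented fields, for illustration only.
    provider: str
    key: str
    bucket: str
    secret: str
    region: Optional[str] = None
    host: Optional[str] = None
    port: Optional[int] = None
    base_path: str = ""
    secure: bool = True


# Placeholder values; real credentials should not be hard-coded.
settings = {
    "provider": "s3",
    "key": "my-access-key",
    "secret": "my-secret",
    "bucket": "my-bucket",
    "base_path": "project/data",
}
conf = StorageConfigSketch(**settings)
```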
class RemoteStorage(conf: accsr.remote_storage.RemoteStorageConfig)[source]

Bases: object

Wrapper around libcloud for accessing remote storage services.

Parameters

conf – the configuration used to connect to the remote storage

create_bucket(exist_ok: bool = True)
property conf: accsr.remote_storage.RemoteStorageConfig
property provider: str
property remote_base_path: str
set_remote_base_path(path: Optional[str])

Changes the base path in the remote storage (overriding the base path extracted from RemoteStorageConfig during instantiation). Pull and push operations will only affect files within the remote base path.

Parameters

path – a path with linux-like separators

property bucket: libcloud.storage.base.Container
property driver: libcloud.storage.base.StorageDriver
pull(remote_path: str, local_base_dir='', force=False, include_regex: Optional[Union[str, Pattern]] = None, exclude_regex: Optional[Union[str, Pattern]] = None, convert_to_linux_path=True, dryrun=False, path_regex: Optional[Union[str, Pattern]] = None) -> accsr.remote_storage.TransactionSummary

Pull either a file or a directory under the given path relative to local_base_dir.

Parameters
  • remote_path – remote path on storage bucket relative to the configured remote base path. e.g. ‘data/ground_truth/some_file.json’

  • local_base_dir – Local base directory for constructing local path e.g. passing ‘local_base_dir’ will download to the path ‘local_base_dir/data/ground_truth/some_file.json’ in the above example

  • force – If False, pull will raise an error if an already existing file deviates from the remote in its md5sum. If True, these files are overwritten.

  • include_regex – If not None, only files with paths matching the regex will be pulled. This is useful for filtering files within a remote directory before pulling them.

  • exclude_regex – If not None, files with paths matching the regex will be excluded from the pull. Takes precedence over include_regex, i.e. if a file matches both, it will be excluded.

  • convert_to_linux_path – if True, will convert a Windows path to a Linux path (as needed by remote storage), so a remote path like ‘data\my\path’ will be converted to ‘data/my/path’ before pulling. This should only be set to False if you want to pull a remote object with ‘\’ in its file name (which is discouraged).

  • dryrun – If True, simulates the pull operation and returns the remote objects that would have been pulled.

  • path_regex – DEPRECATED! Use include_regex instead.

Returns

An object describing the summary of the operation.
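
The interplay of include_regex and exclude_regex described in the parameter list can be sketched as a simple filter. This is a standalone illustration, not the library's code: filter_paths is a hypothetical helper, and matching with re.match (anchored at the start of the path) is an assumption, since the documentation does not say which matching function is used.

```python
import re
from typing import List, Optional, Pattern, Union

Regex = Optional[Union[str, Pattern]]


def filter_paths(paths: List[str], include_regex: Regex = None, exclude_regex: Regex = None) -> List[str]:
    # re.compile accepts both strings and already compiled patterns.
    include = re.compile(include_regex) if include_regex is not None else None
    exclude = re.compile(exclude_regex) if exclude_regex is not None else None
    selected = []
    for path in paths:
        # Exclusion is checked first, so it wins when both regexes match.
        if exclude is not None and exclude.match(path):
            continue
        if include is not None and not include.match(path):
            continue
        selected.append(path)
    return selected


paths = ["data/a.json", "data/a.tmp.json", "data/b.csv"]
only_json = filter_paths(paths, include_regex=r".*\.json")
json_without_tmp = filter_paths(paths, include_regex=r".*\.json", exclude_regex=r".*\.tmp\..*")
```

Here data/a.tmp.json matches both patterns and is therefore dropped from the second result.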

get_push_remote_path(local_path: str) -> str

Get the full path within a remote storage bucket for pushing.

Parameters

local_path – the local path to the file

Returns

the remote path that corresponds to the local path

push(path: str, local_path_prefix: Optional[str] = None, force: bool = False, include_regex: Optional[Union[str, Pattern]] = None, exclude_regex: Optional[Union[str, Pattern]] = None, dryrun: bool = False, path_regex: Optional[Union[str, Pattern]] = None) -> accsr.remote_storage.TransactionSummary

Upload files into the remote storage. Does not upload files for which the md5sum matches existing remote files. The remote path for uploading will be constructed from the remote_base_path and the provided path. The local_path_prefix serves for finding the directory on the local system or for stripping off parts of absolute paths if path is absolute, see examples below.

Examples

  1. path=foo/bar, local_path_prefix=None –>

    ./foo/bar uploaded to remote_base_path/foo/bar

  2. path=/home/foo/bar, local_path_prefix=None –>

    /home/foo/bar uploaded to remote_base_path/home/foo/bar

  3. path=bar, local_path_prefix=/home/foo –>

    /home/foo/bar uploaded to remote_base_path/bar

  4. path=/home/foo/bar, local_path_prefix=/home/foo –>

    /home/foo/bar uploaded to remote_base_path/bar (Same as 3)

  5. path=/home/baz/bar, local_path_prefix=/home/foo –>

    ValueError: Specified path=/home/baz/bar is not a child of local_path_prefix=/home/foo
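
The five examples above can be reproduced with a small path-resolution sketch. resolve_push_paths is a hypothetical helper written only to mirror the listed examples. It assumes POSIX-style paths, ignores globs, and is not the library's implementation.

```python
import posixpath
from typing import Optional, Tuple


def resolve_push_paths(
    path: str,
    local_path_prefix: Optional[str] = None,
    remote_base_path: str = "remote_base_path",
) -> Tuple[str, str]:
    """Return (local_path, remote_path) for a single push target."""
    if local_path_prefix is None:
        # Examples 1 and 2: the path is used as is; a leading "/" is dropped remotely.
        local_path = path
        relative = path.lstrip("/")
    elif posixpath.isabs(path):
        # Examples 4 and 5: an absolute path must lie below the prefix.
        prefix = local_path_prefix.rstrip("/")
        if not path.startswith(prefix + "/"):
            raise ValueError(
                f"Specified path={path} is not a child of local_path_prefix={local_path_prefix}"
            )
        local_path = path
        relative = path[len(prefix) + 1:]
    else:
        # Example 3: a relative path is looked up below the prefix.
        local_path = posixpath.join(local_path_prefix, path)
        relative = path
    return local_path, posixpath.join(remote_base_path, relative)
```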

Parameters
  • path – Path to the local object (file or directory) to be uploaded, may be absolute or relative. globs are supported as well, thus path may be a pattern like *.txt.

  • local_path_prefix – Prefix to be concatenated with path

  • force – If False, push will raise an error if an already existing remote file deviates from the local in its md5sum. If True, these files are overwritten.

  • include_regex – If not None, only files with paths matching the regex will be pushed. Note that paths matched against the regex will be relative to local_path_prefix.

  • exclude_regex – If not None, only files with paths not matching the regex will be pushed. Takes precedence over include_regex, i.e. if a file matches both regexes, it will be excluded. Note that paths matched against the regex will be relative to local_path_prefix.

  • dryrun – If True, simulates the push operation and returns the summary (with synced_files being an empty list).

  • path_regex – DEPRECATED! Same as include_regex.

Returns

An object describing the summary of the operation.

delete(remote_path: str, include_regex: Optional[Union[str, Pattern]] = None, exclude_regex: Optional[Union[str, Pattern]] = None, path_regex: Optional[Union[str, Pattern]] = None) -> List[accsr.remote_storage.RemoteObjectProtocol]

Deletes a file or a directory under the given path relative to the configured remote base path. Use with caution!

Parameters
  • remote_path – remote path on storage bucket relative to the configured remote base path.

  • include_regex – If not None, only files with paths matching the regex will be deleted.

  • exclude_regex – If not None, only files with paths not matching the regex will be deleted. Takes precedence over include_regex, i.e. if a file matches both regexes, it will be excluded.

  • path_regex – DEPRECATED! Same as include_regex.

Returns

list of remote objects referring to all deleted files

list_objects(remote_path: str) -> List[accsr.remote_storage.RemoteObjectProtocol]
Parameters

remote_path – remote path on storage bucket relative to the configured remote base path.

Returns

list of remote objects under the remote path (multiple entries if the remote path is a directory)