remote_storage
- class Provider(value)[source]
Bases:
str, enum.Enum
An enumeration.
- GOOGLE_STORAGE = 'google_storage'
- S3 = 's3'
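Because Provider derives from both str and enum.Enum, its members compare equal to their plain string values, and a member can be looked up from that string. The sketch below re-declares the enum locally (instead of importing it from accsr.remote_storage) so that it runs without the package installed:

```python
from enum import Enum


class Provider(str, Enum):
    """Local mirror of the enum documented above, for illustration only."""

    GOOGLE_STORAGE = "google_storage"
    S3 = "s3"


# Lookup by value returns the corresponding member.
provider = Provider("s3")

# The str mixin makes members compare equal to plain strings.
is_s3 = provider == "s3"
```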
- class RemoteObjectProtocol(*args, **kwargs)[source]
Bases:
Protocol
Protocol of classes that describe remote objects. Describes information about the remote object and functionality to download the object.
- name: str
- size: int
- hash: int
- provider: str
- download(download_path, overwrite_existing=False) Optional[accsr.remote_storage.RemoteObjectProtocol]
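Since this is a Protocol, any class with these members satisfies it structurally; no inheritance is needed. The following is a minimal sketch: RemoteObjectLike is a local mirror of the documented members and InMemoryObject is a hypothetical stand-in, neither of which is part of accsr.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class RemoteObjectLike(Protocol):
    """Local mirror of the documented members (hypothetical name)."""

    name: str
    size: int
    hash: int
    provider: str

    def download(
        self, download_path, overwrite_existing=False
    ) -> Optional["RemoteObjectLike"]: ...


@dataclass
class InMemoryObject:
    """Hypothetical stand-in; note that it does not inherit from the protocol."""

    name: str
    size: int
    hash: int
    provider: str

    def download(self, download_path, overwrite_existing=False):
        # A real implementation would write the object to download_path.
        return self


obj = InMemoryObject(name="data/a.json", size=12, hash=123, provider="s3")

# The runtime check only verifies that all protocol members are present.
matches = isinstance(obj, RemoteObjectLike)
```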
- class SyncObject(local_path: Optional[str] = None, remote_obj: Optional[accsr.remote_storage.RemoteObjectProtocol] = None, remote_path: Optional[str] = None)[source]
Bases:
accsr.remote_storage._JsonReprMixin
Class representing the sync status between a local path and a remote object. It is mainly used for creating summaries and syncing within RemoteStorage, and for introspection before and after push/pull transactions.
Creating or manipulating instances of this class outside RemoteStorage, in particular in user code, is not recommended. This class forms part of the public interface because instances of it are given to users for introspection.
- property name
- property exists_on_target: bool
True iff the file exists on both locations
- set_local_path(path: Optional[str])
Changes the local path of the SyncObject to path. Returns None.
- property exists_on_remote
- property equal_md5_hash_sum
- to_dict(make_serializable=True)
- class TransactionSummary(matched_source_files: typing.List[accsr.remote_storage.SyncObject] = <factory>, not_on_target: typing.List[accsr.remote_storage.SyncObject] = <factory>, on_target_eq_md5: typing.List[accsr.remote_storage.SyncObject] = <factory>, on_target_neq_md5: typing.List[accsr.remote_storage.SyncObject] = <factory>, unresolvable_collisions: typing.Dict[str, typing.Union[typing.List[accsr.remote_storage.RemoteObjectProtocol], str]] = <factory>, skipped_source_files: typing.List[accsr.remote_storage.SyncObject] = <factory>, synced_files: typing.List[accsr.remote_storage.SyncObject] = <factory>, sync_direction: typing.Optional[str] = None)[source]
Bases:
accsr.remote_storage._JsonReprMixin
Class representing the summary of a push or pull operation. It is mainly used for introspection before and after push/pull transactions.
Creating or manipulating instances of this class outside RemoteStorage, in particular in user code, is not recommended. This class forms part of the public interface because instances of it are given to users for introspection.
- matched_source_files: List[accsr.remote_storage.SyncObject]
- not_on_target: List[accsr.remote_storage.SyncObject]
- on_target_eq_md5: List[accsr.remote_storage.SyncObject]
- on_target_neq_md5: List[accsr.remote_storage.SyncObject]
- unresolvable_collisions: Dict[str, Union[List[accsr.remote_storage.RemoteObjectProtocol], str]]
- skipped_source_files: List[accsr.remote_storage.SyncObject]
- synced_files: List[accsr.remote_storage.SyncObject]
- sync_direction: Optional[str] = None
- property files_to_sync: List[accsr.remote_storage.SyncObject]
Returns the list of files that need synchronization.
- Returns
list of all files that are not on the target or have different md5sums on target and remote
- size_files_to_sync() int
Computes the total size of all objects that need synchronization. Raises a RuntimeError if the sync_direction property is not set to ‘push’ or ‘pull’.
- Returns
the total size of all local objects that need synchronization if self.sync_direction='push', and the total size of all remote files that need synchronization if self.sync_direction='pull'
- property requires_force: bool
True iff a failure of the transaction can only be prevented by setting force=True.
- property has_unresolvable_collisions: bool
True iff there exists a collision that cannot be resolved.
- property all_files_analyzed: List[accsr.remote_storage.SyncObject]
List of all analyzed source files.
- add_entry(synced_object: Union[accsr.remote_storage.SyncObject, str], collides_with: Optional[Union[List[accsr.remote_storage.RemoteObjectProtocol], str]] = None, skip: bool = False)
Adds a SyncObject to the summary.
- Parameters
synced_object – either a SyncObject or a path to a local file.
collides_with – specification of unresolvable collisions for the given sync object
skip – if True, the object is marked to be skipped
- Returns
None
- get_short_summary_dict()
Returns a short summary of the transaction as a dictionary.
- print_short_summary()
Prints a short summary of the transaction (shorter than the full repr, which contains information about local and remote objects).
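The relationship between the file categories and the derived members can be sketched as follows. This is a simplified stand-in, not the accsr implementation: SummarySketch is a hypothetical class, entries are (name, size) tuples instead of SyncObject instances with separate local and remote sizes, and the rule used for requires_force is an assumption based on the descriptions above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Entry = Tuple[str, int]  # (name, size in bytes); stand-in for SyncObject


@dataclass
class SummarySketch:
    not_on_target: List[Entry] = field(default_factory=list)
    on_target_eq_md5: List[Entry] = field(default_factory=list)
    on_target_neq_md5: List[Entry] = field(default_factory=list)
    unresolvable_collisions: Dict[str, str] = field(default_factory=dict)
    sync_direction: Optional[str] = None

    @property
    def files_to_sync(self) -> List[Entry]:
        # Missing on the target, or present with a different md5 sum.
        return self.not_on_target + self.on_target_neq_md5

    def size_files_to_sync(self) -> int:
        if self.sync_direction not in ("push", "pull"):
            raise RuntimeError("sync_direction must be 'push' or 'pull'")
        return sum(size for _, size in self.files_to_sync)

    @property
    def has_unresolvable_collisions(self) -> bool:
        return len(self.unresolvable_collisions) > 0

    @property
    def requires_force(self) -> bool:
        # Assumption: only files that differ on the target need force=True.
        return len(self.on_target_neq_md5) > 0


summary = SummarySketch(
    not_on_target=[("a.json", 10)],
    on_target_eq_md5=[("c.json", 7)],
    on_target_neq_md5=[("b.json", 5)],
    sync_direction="push",
)
total_size = summary.size_files_to_sync()
```

Files whose md5 sums already match (on_target_eq_md5) contribute neither to files_to_sync nor to the total size.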
- class RemoteStorageConfig(provider: str, key: str, bucket: str, secret: str, region: Optional[str] = None, host: Optional[str] = None, port: Optional[int] = None, base_path: str = '', secure: bool = True)[source]
Bases:
object
Contains all necessary information to establish a connection to a bucket within the remote storage, and the base path on the remote.
- provider: str
- key: str
- bucket: str
- secret: str
- region: str = None
- host: str = None
- port: int = None
- base_path: str = ''
- secure: bool = True
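A configuration of this shape can be built from a plain dictionary, for instance one loaded from a settings file. The sketch below uses a local dataclass, ConfigSketch, that mirrors the documented fields and defaults; the class name and all values are placeholders, not part of accsr.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConfigSketch:
    """Local mirror of the documented fields (hypothetical name)."""

    provider: str
    key: str
    bucket: str
    secret: str
    region: Optional[str] = None
    host: Optional[str] = None
    port: Optional[int] = None
    base_path: str = ""
    secure: bool = True


# Placeholder settings for an S3-compatible endpoint on localhost.
settings = {
    "provider": "s3",
    "key": "my-access-key",
    "secret": "my-secret",
    "bucket": "my-bucket",
    "host": "localhost",
    "port": 9000,
    "secure": False,
}

conf = ConfigSketch(**settings)
```

Only provider, key, bucket and secret are required; the remaining fields fall back to their defaults.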
- class RemoteStorage(conf: accsr.remote_storage.RemoteStorageConfig)[source]
Bases:
object
Wrapper around libcloud for accessing remote storage services.
- create_bucket(exist_ok: bool = True)
- property conf: accsr.remote_storage.RemoteStorageConfig
- property provider: str
- property remote_base_path: str
- set_remote_base_path(path: Optional[str])
Changes the base path in the remote storage (overriding the base path extracted from RemoteStorageConfig during instantiation). Pull and push operations will only affect files within the remote base path.
- Parameters
path – a path with linux-like separators
- property bucket: libcloud.storage.base.Container
- property driver: libcloud.storage.base.StorageDriver
- pull(remote_path: str, local_base_dir='', force=False, include_regex: Optional[Union[str, Pattern]] = None, exclude_regex: Optional[Union[str, Pattern]] = None, convert_to_linux_path=True, dryrun=False, path_regex: Optional[Union[str, Pattern]] = None) accsr.remote_storage.TransactionSummary
Pull either a file or a directory under the given path relative to local_base_dir.
- Parameters
remote_path – remote path on storage bucket relative to the configured remote base path. e.g. ‘data/ground_truth/some_file.json’
local_base_dir – Local base directory for constructing local path e.g. passing ‘local_base_dir’ will download to the path ‘local_base_dir/data/ground_truth/some_file.json’ in the above example
force – If False, pull will raise an error if an already existing file deviates from the remote in its md5sum. If True, these files are overwritten.
include_regex – If not None, only files with paths matching the regex will be pulled. This is useful for filtering files within a remote directory before pulling them.
exclude_regex – If not None, files with paths matching the regex will be excluded from the pull. Takes precedence over include_regex, i.e. if a file matches both, it will be excluded.
convert_to_linux_path – If True, converts a Windows path to a Linux path (as needed by the remote storage), so a remote path like ‘data\my\path’ is converted to ‘data/my/path’ before pulling. This should only be set to False if you want to pull a remote object with ‘\’ in its file name (which is discouraged).
dryrun – If True, simulates the pull operation and returns the remote objects that would have been pulled.
path_regex – DEPRECATED! Use include_regex instead.
- Returns
An object describing the summary of the operation.
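One way to combine dryrun with the summary properties is to simulate the pull first and only perform it when nothing would have to be overwritten. The helper below is a sketch: pull_if_safe is a hypothetical name, and storage is assumed to be any object that offers the pull method documented above.

```python
def pull_if_safe(storage, remote_path, local_base_dir=""):
    """Simulate the pull first; only download if force would not be needed.

    Returns a tuple (summary, pulled), where pulled says whether files
    were actually downloaded.
    """
    preview = storage.pull(remote_path, local_base_dir=local_base_dir, dryrun=True)
    if preview.has_unresolvable_collisions or preview.requires_force:
        # Leave the decision (for example force=True) to the caller.
        return preview, False
    summary = storage.pull(remote_path, local_base_dir=local_base_dir)
    return summary, True
```

The caller can inspect the returned summary, for example via print_short_summary(), before deciding whether to repeat the pull with force=True.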
- get_push_remote_path(local_path: str) str
Get the full path within a remote storage bucket for pushing.
- Parameters
local_path – the local path to the file
- Returns
the remote path that corresponds to the local path
- push(path: str, local_path_prefix: Optional[str] = None, force: bool = False, include_regex: Optional[Union[str, Pattern]] = None, exclude_regex: Optional[Union[str, Pattern]] = None, dryrun: bool = False, path_regex: Optional[Union[str, Pattern]] = None) accsr.remote_storage.TransactionSummary
Upload files into the remote storage. Does not upload files for which the md5sum matches existing remote files. The remote path for uploading will be constructed from the remote_base_path and the provided path. The local_path_prefix serves for finding the directory on the local system or for stripping off parts of absolute paths if path is absolute, see examples below.
Examples
- path=foo/bar, local_path_prefix=None --> ./foo/bar uploaded to remote_base_path/foo/bar
- path=/home/foo/bar, local_path_prefix=None --> /home/foo/bar uploaded to remote_base_path/home/foo/bar
- path=bar, local_path_prefix=/home/foo --> /home/foo/bar uploaded to remote_base_path/bar
- path=/home/foo/bar, local_path_prefix=/home/foo --> /home/foo/bar uploaded to remote_base_path/bar (same as the previous example)
- path=/home/baz/bar, local_path_prefix=/home/foo --> ValueError: Specified path=/home/baz/bar is not a child of local_path_prefix=/home/foo
- Parameters
path – Path to the local object (file or directory) to be uploaded; may be absolute or relative. Globs are supported as well, so path may be a pattern like *.txt.
local_path_prefix – Prefix to be concatenated with path
force – If False, push will raise an error if an already existing remote file deviates from the local in its md5sum. If True, these files are overwritten.
include_regex – If not None, only files with paths matching the regex will be pushed. Note that paths matched against the regex will be relative to local_path_prefix.
exclude_regex – If not None, only files with paths not matching the regex will be pushed. Takes precedence over include_regex, i.e. if a file matches both regexes, it will be excluded. Note that paths matched against the regex will be relative to local_path_prefix.
dryrun – If True, simulates the push operation and returns the summary (with synced_files being an empty list).
path_regex – DEPRECATED! Same as include_regex.
- Returns
An object describing the summary of the operation.
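The path rules from the push examples can be reproduced with a small helper. This is a sketch of the documented behaviour, not the accsr implementation: resolve_push_paths is a hypothetical function, it ignores globs, and it assumes POSIX-style paths.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple


def resolve_push_paths(
    path: str, local_path_prefix: Optional[str] = None
) -> Tuple[str, str]:
    """Return (local path, path below remote_base_path) for a push."""
    local = PurePosixPath(path)
    if local_path_prefix is None:
        # The path is used as given; a leading "/" is dropped on the remote side.
        return str(local), str(local).lstrip("/")

    prefix = PurePosixPath(local_path_prefix)
    if local.is_absolute():
        try:
            # Strip the prefix off an absolute path.
            relative = local.relative_to(prefix)
        except ValueError:
            raise ValueError(
                f"Specified path={path} is not a child of "
                f"local_path_prefix={local_path_prefix}"
            ) from None
    else:
        # A relative path is looked up inside the prefix directory.
        relative = local
    return str(prefix / relative), str(relative)


local_path, remote_path = resolve_push_paths("/home/foo/bar", "/home/foo")
```

In every case the second element is the part that gets appended to remote_base_path.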
- delete(remote_path: str, include_regex: Optional[Union[str, Pattern]] = None, exclude_regex: Optional[Union[str, Pattern]] = None, path_regex: Optional[Union[str, Pattern]] = None) List[accsr.remote_storage.RemoteObjectProtocol]
Deletes a file or a directory under the given path relative to the configured remote base path. Use with caution!
- Parameters
remote_path – remote path on storage bucket relative to the configured remote base path.
include_regex – If not None, only files with paths matching the regex will be deleted.
exclude_regex – If not None, only files with paths not matching the regex will be deleted. Takes precedence over include_regex, i.e. if a file matches both regexes, it will be excluded.
path_regex – DEPRECATED! Same as include_regex.
- Returns
list of remote objects referring to all deleted files
- list_objects(remote_path: str) List[accsr.remote_storage.RemoteObjectProtocol]
- Parameters
remote_path – remote path on storage bucket relative to the configured remote base path.
- Returns
list of remote objects under the remote path (multiple entries if the remote path is a directory)
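Since delete is destructive, it can help to preview which objects a pair of filters would select by listing them first. The helper below is a sketch under stated assumptions: preview_matches is a hypothetical function, storage is any object offering the list_objects method documented above, and the patterns are applied with re.match to each object's name, which may differ from how accsr matches paths internally.

```python
import re


def preview_matches(storage, remote_path, include_regex=None, exclude_regex=None):
    """Return the names of objects under remote_path that pass both filters."""
    selected = []
    for obj in storage.list_objects(remote_path):
        if include_regex is not None and not re.match(include_regex, obj.name):
            continue
        if exclude_regex is not None and re.match(exclude_regex, obj.name):
            # Exclusion takes precedence over inclusion.
            continue
        selected.append(obj.name)
    return selected
```

An object that matches both patterns is dropped, which mirrors the documented precedence of exclude_regex over include_regex.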