pox package documentation
pox: utilities for filesystem exploration and automated builds
About Pox
pox provides a collection of utilities for navigating and manipulating
filesystems. This module is designed to facilitate some of the low-level
operating system interactions that are useful when exploring a filesystem
on a remote host, where queries such as “what is the root of the filesystem?”,
“what is the user’s name?”, and “what login shell is preferred?” become
essential in allowing a remote user to function as if they were logged in
locally. While pox is in the same vein as both the os and shutil
builtin modules, the majority of its functionality is unique and complements
these two modules.
pox provides Python equivalents of several unix shell commands such as
which and find. These commands allow automated discovery of what has
been installed on an operating system, and where the essential tools are
located. This capability is useful not only for exploring remote hosts,
but also locally as a helper utility for automated build and installation.
Several high-level operations on files and filesystems are also provided. Examples include finding the location of an installed Python package, determining if and where the source code resides on the filesystem, and determining the version of the installed package.
pox also provides utilities to enable the abstraction of commands sent
to a remote filesystem. In conjunction with a registry of environment
variables and installed utilities, pox enables the user to interact with
a remote filesystem as if they were logged in locally.
pox is part of pathos, a Python framework for heterogeneous computing.
pox is in active development, so any user feedback, bug reports, comments,
or suggestions are highly appreciated. A list of issues is located at https://github.com/uqfoundation/pox/issues, with a legacy list maintained at https://uqfoundation.github.io/project/pathos/query.
Major Features
pox provides utilities for discovering the user’s environment:
return the user’s name, current shell, and path to user’s home directory
strip duplicate entries from the user’s $PATH
look up and expand environment variables from ${VAR} to value
pox also provides utilities for filesystem exploration and manipulation:
discover the path to a file, executable, directory, or symbolic link
discover the path to an installed package
parse operating system commands for remote shell invocation
convert text files to platform-specific formatting
Current Release
The latest released version of pox is available from:
pox is distributed under a 3-clause BSD license.
Development Version
You can get the latest development version with all the shiny new features at:
If you have a new contribution, please submit a pull request.
Basic Usage
pox includes some basic utilities to connect to and automate exploration
on local and remote filesystems. There are some basic functions to discover
important locations:
>>> import pox
>>> pox.homedir()
'/Users/mmckerns'
>>> pox.rootdir()
'/'
or, you can interact with local and global environment variables:
>>> local = {'DEV':'${HOME}/dev', 'FOO_VERSION':'0.1', 'BAR_VERSION':'1.0'}
>>> pox.getvars('${DEV}/lib/foo-${FOO_VERSION}', local)
{'DEV': '${HOME}/dev', 'FOO_VERSION': '0.1'}
>>> pox.expandvars('${DEV}/lib/foo-${FOO_VERSION}', local)
'${HOME}/dev/lib/foo-0.1'
>>> pox.expandvars('${HOME}/dev/lib/foo-0.1')
'/Users/mmckerns/dev/lib/foo-0.1'
>>> pox.env('HOME')
{'HOME': '/Users/mmckerns'}
and perform some basic search functions:
>>> pox.find('python3.9', recurse=5, root='/opt')
['/opt/local/bin/python3.9']
>>> pox.which('python3.9')
'/opt/local/bin/python3.9'
pox also has a specialized which command just for Python:
>>> pox.which_python()
'/opt/local/bin/python3.9'
>>> pox.which_python(lazy=True, version=True)
'`which python3.9`'
Any of the pox functions can be launched from the command line,
which facilitates executing commands across parallel and distributed pipes
(such as pathos.connection.Pipe and pathos.secure.connection.Pipe):
>>> import pathos
>>> p = pathos.connection.Pipe()
>>> p(command="python -m pox 'which_python()'")
>>> p.launch()
>>> print(p.response())
'/usr/bin/python'
>>> p.kill()
The functions in pox that help make interactions with filesystems and
environment variables programmatic and abstract become especially relevant
when trying to execute complex commands remotely.
More Information
Probably the best way to get started is to look at the documentation at
http://pox.rtfd.io. Also see pox.tests for a set of scripts that demonstrate
how pox can be used to interact with the operating system. You can run the
test suite with python -m pox.tests. All pox utilities
can be launched directly from an operating system terminal, using the pox
script (or with python -m pox). The source code is also generally well
documented, so further questions may be resolved by inspecting the code
itself. Please feel free to submit a ticket on github, or ask a
question on stackoverflow (@Mike McKerns).
If you would like to share how you use pox in your work, please send an
email (to mmckerns at uqfoundation dot org).
Citation
If you use pox to do research that leads to publication, we ask that you
acknowledge use of pox by citing the following in your publication:
M.M. McKerns, L. Strand, T. Sullivan, A. Fang, M.A.G. Aivazis,
"Building a framework for predictive science", Proceedings of
the 10th Python in Science Conference, 2011;
http://arxiv.org/pdf/1202.1056
Michael McKerns and Michael Aivazis,
"pathos: a framework for heterogeneous computing", 2010- ;
https://uqfoundation.github.io/project/pathos
Please see https://uqfoundation.github.io/project/pathos or http://arxiv.org/pdf/1202.1056 for further information.
- citation()
print the citation
- convert(files, platform=None, pathsep=None, verbose=True)
convert text files to given platform type
Ensure given files use the appropriate os.linesep and other formatting.
- disk_used(path)
get the disk usage for the given directory
- Parameters
path (str) – path string.
- Returns
int corresponding to disk usage in blocks.
- env(variable, all=True, minimal=False)
get dict of environment variables of the form {variable:value}
- Parameters
- Returns
dict of strings of environment variables.
Warning
selecting all=False can lead to unexpected matches of variable.
Examples
>>> env('*PATH')
{'PYTHONPATH': '.', 'PATH': '.:/usr/bin:/bin:/usr/sbin:/sbin'}
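The wildcard lookup shown for env can be approximated with the standard library alone. The following is a minimal sketch, not the pox implementation: match_env is a hypothetical helper, and the sample mapping is made up for illustration.

```python
import fnmatch
import os


def match_env(pattern, environ=None):
    """Return {name: value} for variables whose name matches a shell-style pattern.

    match_env is a hypothetical helper (an assumption, not part of pox);
    environ defaults to the environment of the current process.
    """
    if environ is None:
        environ = os.environ
    return {name: value for name, value in environ.items()
            if fnmatch.fnmatchcase(name, pattern)}


# A made-up environment, so the result does not depend on the host machine.
sample = {'PYTHONPATH': '.', 'PATH': '.:/usr/bin:/bin', 'HOME': '/Users/foo'}
print(match_env('*PATH', sample))
# prints {'PYTHONPATH': '.', 'PATH': '.:/usr/bin:/bin'}
```

Passing the mapping explicitly keeps the helper easy to test; calling match_env('*PATH') with no second argument would query the real environment instead.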
- expandvars(string, ref=None, secondref={})
expand shell variables in string
Expand shell variables of form $var and ${var}. Unknown variables are left unchanged. If a reference dictionary (ref) is provided, restrict lookups to ref. A second reference dictionary (secondref) can also be provided for failover searches. If ref is not provided, lookup variables are defined by the user’s environment variables.
- Parameters
- Returns
string with the selected shell variables substituted.
Examples
>>> expandvars('found:: $PYTHONPATH')
'found:: .:/Users/foo/lib/python3.4/site-packages'
>>>
>>> expandvars('found:: $PYTHONPATH', ref={})
'found:: $PYTHONPATH'
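The behavior described for expandvars (substitute $var and ${var}, leave unknown variables alone, restrict lookups to ref when one is given) resembles what string.Template.safe_substitute does in the standard library. The sketch below assumes that resemblance; expand is a hypothetical name, and the secondref failover is not modeled.

```python
import os
from string import Template


def expand(string, ref=None):
    """Substitute $var and ${var} in string, leaving unknown variables unchanged.

    Hypothetical stand-in for illustration only; lookups use ref when it is
    given, otherwise the environment of the current process.
    """
    lookup = os.environ if ref is None else ref
    return Template(string).safe_substitute(lookup)


local = {'DEV': '${HOME}/dev', 'FOO_VERSION': '0.1', 'BAR_VERSION': '1.0'}
print(expand('${DEV}/lib/foo-${FOO_VERSION}', local))
# prints ${HOME}/dev/lib/foo-0.1
print(expand('found:: $PYTHONPATH', ref={}))
# prints found:: $PYTHONPATH
```

Substitution is a single pass, so the ${HOME} introduced by expanding ${DEV} is left for a second call, as in the Basic Usage example.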
- find(patterns, root=None, recurse=True, type=None, verbose=False)
get the path to a file or directory
- Parameters
patterns (str) – name or partial name of items to search for.
root (str, default=None) – path of top-level directory to search.
recurse (bool, default=True) – if True, recurse downward from root.
type (str, default=None) – a search filter.
verbose (bool, default=False) – if True, be verbose about the search.
- Returns
a list of string paths.
Notes
on some OS, recursion can be specified by recursion depth (int), and patterns can be specified with basic pattern matching. Also, multiple patterns can be specified by splitting patterns with a ;. The type can be one of {file, dir, link, socket, block, char}.
Examples
>>> find('pox*', root='..')
['/Users/foo/pox/pox', '/Users/foo/pox/scripts/pox_launcher.py']
>>>
>>> find('*shutils*;*init*')
['/Users/foo/pox/pox/shutils.py', '/Users/foo/pox/pox/__init__.py']
- findpackage(package, root=None, all=False, verbose=True, recurse=True)
retrieve the path(s) for a package
- Parameters
package (str) – name of the package to search for.
root (str, default=None) – path string of top-level directory to search.
all (bool, default=False) – if True, return everywhere package is found.
verbose (bool, default=True) – if True, print messages about the search.
recurse (bool, default=True) – if True, recurse down the root directory.
- Returns
string path (or list of paths) where package is found.
Notes
On some OS, recursion can be specified by recursion depth (an integer). findpackage will do standard pattern matching for package names, attempting to match the head directory of the distribution.
- getvars(path, ref=None, sep=None)
get a dictionary of all variables defined in path
Extract shell variables of form $var and ${var}. Unknown variables will raise an exception. If a reference dictionary (ref) is provided, first try the lookup in ref. Failover from ref will look up variables defined in the user’s environment variables. Use sep to override the path separator (os.sep).
- Parameters
- Returns
dict of shell variables found in the given path string.
Examples
>>> getvars('$HOME/stuff')
{'HOME': '/Users/foo'}
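The lookup order described for getvars (ref first, then the user's environment, with an exception for unknown names) can be sketched with a regular expression. This is an illustration under assumptions: find_vars is a hypothetical helper, the regular expression is a guess at what counts as a variable name, KeyError is chosen arbitrarily as the exception, and the sep argument is not modeled.

```python
import os
import re

# Matches ${name} or $name; one of the two groups is filled per match.
_VARIABLE = re.compile(r'\$(?:\{(\w+)\}|(\w+))')


def find_vars(path, ref=None):
    """Return {name: value} for each shell variable that appears in path.

    Hypothetical helper, not the pox implementation. Looks in ref first,
    then in os.environ; an unknown name raises KeyError.
    """
    ref = {} if ref is None else ref
    found = {}
    for braced, bare in _VARIABLE.findall(path):
        name = braced or bare
        found[name] = ref[name] if name in ref else os.environ[name]
    return found


local = {'DEV': '${HOME}/dev', 'FOO_VERSION': '0.1', 'BAR_VERSION': '1.0'}
print(find_vars('${DEV}/lib/foo-${FOO_VERSION}', local))
# prints {'DEV': '${HOME}/dev', 'FOO_VERSION': '0.1'}
```

Only variables that actually occur in the path are returned, which is why BAR_VERSION is absent from the result.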
- homedir()
get the full path of the user’s home directory
- Parameters
None –
- Returns
string path of the directory, or None if home can not be determined.
- index_join(sequence, start, stop, step=1, sequential=True, inclusive=True)
slice a list of strings, then join the remaining strings
If start is not found in the sequence, slice from the beginning. If stop is not found in the sequence, slice to the end.
- Parameters
sequence (list) – an ordered sequence of elements.
start (int) – index for start of the slice.
stop (int) – index for stop position in the sequence.
step (int, default=1) – indices until next member of the slice.
sequential (bool, default=True) – if True, start must precede stop.
inclusive (bool, default=True) – if True, include stop in the slice.
- Returns
string produced by slicing the given sequence and joining the elements.
- index_slice(sequence, start, stop, step=1, sequential=False, inclusive=False)
get the slice for a given sequence
Slice indices are determined by the positions of start and stop. If start is not found in the sequence, slice from the beginning. If stop is not found in the sequence, slice to the end.
- Parameters
sequence (list) – an ordered sequence of elements.
start (int) – index for start of the slice.
stop (int) – index for stop position in the sequence.
step (int, default=1) – indices until next member of the slice.
sequential (bool, default=False) – if True, start must precede stop.
inclusive (bool, default=False) – if True, include stop in the slice.
- Returns
slice corresponding to given start, stop, and step.
- kbytes(text)
convert memory text to the corresponding value in kilobytes
- Parameters
text (str) – string corresponding to an abbreviation of size.
- Returns
int representation of text.
Examples
>>> kbytes('10K')
10
>>>
>>> kbytes('10G')
10485760
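The two results above are consistent with binary multiples (1G = 1024 * 1024 kilobytes). A minimal sketch of such a conversion follows; to_kbytes and its unit table are assumptions for illustration, and the set of abbreviations that pox itself accepts may differ.

```python
# Kilobytes per unit, assuming binary (1024-based) multiples.
_KB_PER_UNIT = {'K': 1, 'M': 1024, 'G': 1024 ** 2, 'T': 1024 ** 3}


def to_kbytes(text):
    """Convert a size such as '10G' to an int number of kilobytes.

    Hypothetical helper: expects a number followed by one unit letter.
    """
    text = text.strip().upper()
    number, unit = text[:-1], text[-1]
    return int(float(number) * _KB_PER_UNIT[unit])


print(to_kbytes('10K'))  # prints 10
print(to_kbytes('10G'))  # prints 10485760
```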
- license()
print the license
- minpath(path, pathsep=None)
remove duplicate paths from given set of paths
- Parameters
- Returns
string composed of one or more paths, with duplicates removed.
Examples
>>> minpath('.:/Users/foo/bin:.:/Users/foo/bar/bin:/Users/foo/bin')
'.:/Users/foo/bin:/Users/foo/bar/bin'
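Removing duplicates while keeping the first occurrence of each entry is the essential step here. One way to sketch it with the standard library is shown below; dedupe_path is a hypothetical name, and this is not necessarily how pox implements minpath.

```python
import os


def dedupe_path(path, pathsep=None):
    """Drop repeated entries from a search path, keeping first occurrences.

    Hypothetical helper for illustration; pathsep defaults to os.pathsep.
    """
    if pathsep is None:
        pathsep = os.pathsep
    # dict.fromkeys preserves insertion order, so the first copy of each
    # entry keeps its position and later copies are discarded.
    return pathsep.join(dict.fromkeys(path.split(pathsep)))


print(dedupe_path('.:/Users/foo/bin:.:/Users/foo/bar/bin:/Users/foo/bin', ':'))
# prints .:/Users/foo/bin:/Users/foo/bar/bin
```

Order matters for a search path, which is why a plain set is avoided.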
- mkdir(path, root=None, mode=None)
create a new directory in the root directory
Create a directory at path and any necessary parents (i.e. mkdir -p). Default mode is read/write/execute for ‘user’ and ‘group’, and then read/execute otherwise.
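For comparison, the standard library offers the same "mkdir -p" behavior through os.makedirs. The sketch below uses a hypothetical make_dirs helper and assumes that the mode described above corresponds to the octal value 0o775.

```python
import os
import tempfile


def make_dirs(path, root=None, mode=0o775):
    """Create path (under root, if given) together with any missing parents.

    Hypothetical helper, not the pox implementation. The process umask
    still applies, and os.makedirs uses mode for the leaf directory only.
    """
    if root is not None:
        path = os.path.join(root, path)
    os.makedirs(path, mode=mode, exist_ok=True)
    return path


# Work inside a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join('a', 'b', 'c')
    created = make_dirs(nested, root=tmp)
    was_created = os.path.isdir(created)
    make_dirs(nested, root=tmp)  # repeating the call is not an error
```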
- parse_remote(path, loopback=False, login_flag=False)
parse remote connection string of the form [[user@]host:]path
- Parameters
- Returns
a tuple of the form (user, host, path).
Notes
if loopback=True and host=None, then host will be set to localhost.
- pattern(list=[], separator=';')
generate a filter pattern from list of strings
- remote(path, host=None, user=None, loopback=False)
build string for a remote connection of the form [[user@]host:]path
- Parameters
- Returns
a remote connection string.
Notes
if loopback=True and host=None, then host will be set to localhost.
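The [[user@]host:]path form handled by parse_remote and remote can be split and rebuilt with plain string operations. The following is a sketch only: split_remote and join_remote are hypothetical names, missing parts are represented as None (pox may represent them differently), login_flag is not modeled, and paths that themselves contain a colon are not handled.

```python
def split_remote(target, loopback=False):
    """Split '[[user@]host:]path' into a (user, host, path) tuple."""
    user = host = None
    path = target
    if ':' in target:
        prefix, path = target.split(':', 1)
        if '@' in prefix:
            user, host = prefix.split('@', 1)
        else:
            host = prefix
    if loopback and host is None:
        host = 'localhost'
    return user, host, path


def join_remote(path, host=None, user=None, loopback=False):
    """Build '[[user@]host:]path' from its parts."""
    if loopback and host is None:
        host = 'localhost'
    if host is None:
        return path
    prefix = f'{user}@{host}' if user else host
    return f'{prefix}:{path}'


print(split_remote('alice@example.org:/tmp/data'))
# prints ('alice', 'example.org', '/tmp/data')
print(join_remote('/tmp/data', loopback=True))
# prints localhost:/tmp/data
```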
- replace(file, sub={}, outfile=None)
make text substitutions given by sub in the given file
- Parameters
- Returns
None
Notes
replace uses regular expressions, thus a pattern may be used as old text. replace can fail if order of substitution is important.
- rmtree(path, self=True, ignore_errors=False, onerror=None)
remove directories in the given path
- Parameters
- Returns
None
Notes
If self=False, the directory indicated by path is left in place, and its subdirectories are erased. If self=True, path is also removed.
If ignore_errors=True, errors are ignored. Otherwise, onerror is called to handle the error with arguments (func, path, exc_info), where func is os.listdir, os.remove, or os.rmdir; path is the argument to the function that caused it to fail; and exc_info is a tuple returned by sys.exc_info(). If ignore_errors=False and onerror=None, an exception is raised.
- rootdir()
get the path corresponding to the root of the current drive
- Parameters
None –
- Returns
string path of the directory.
- select(iterable, counter='', minimum=False, reverse=False, all=True)
find items in iterable with the max (or min) count of the given counter.
Find the items in an iterable that have the maximum number of counter (e.g. counter=’3’ counts occurrences of ‘3’). Use minimum=True to search for the minimum number of occurrences of the counter.
- Parameters
iterable (list) – an iterable of iterables (e.g. lists, strings, etc).
counter (str, default='') – the item to count.
minimum (bool, default=False) – if True, find min count (else, max).
reverse (bool, default=False) – if True, reverse order of the results.
all (bool, default=True) – if False, only return the first result.
- Returns
list of items in the iterable with the min (or max) count.
Examples
>>> z = ['zero','one','two','three','4','five','six','seven','8','9/81']
>>> select(z, counter='e')
['three', 'seven']
>>> select(z, counter='e', minimum=True)
['two', '4', 'six', '8', '9/81']
>>>
>>> y = [[1,2,3],[4,5,6],[1,3,5]]
>>> select(y, counter=3)
[[1, 2, 3], [1, 3, 5]]
>>> select(y, counter=3, minimum=True, all=False)
[4, 5, 6]
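The selection rule (count occurrences of counter in each item, then keep the items whose count equals the maximum or minimum) can be sketched in a few lines. pick is a hypothetical name and all_matches stands in for the all argument; this illustrates the idea rather than reproducing the pox source.

```python
def pick(iterable, counter='', minimum=False, reverse=False, all_matches=True):
    """Return the items with the max (or min) number of occurrences of counter.

    Hypothetical helper: each item must provide a .count() method,
    as strings and lists do.
    """
    items = list(iterable)
    if not items:
        return []
    counts = [item.count(counter) for item in items]
    target = min(counts) if minimum else max(counts)
    chosen = [item for item, n in zip(items, counts) if n == target]
    if reverse:
        chosen.reverse()
    return chosen if all_matches else chosen[0]


z = ['zero', 'one', 'two', 'three', '4', 'five', 'six', 'seven', '8', '9/81']
print(pick(z, counter='e'))
# prints ['three', 'seven']
y = [[1, 2, 3], [4, 5, 6], [1, 3, 5]]
print(pick(y, counter=3, minimum=True, all_matches=False))
# prints [4, 5, 6]
```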
- selectdict(dict, counter='', minimum=False, all=True)
return a dict of items with the max (or min) count of the given counter.
Get the items from a dict that have the maximum number of the counter (e.g. counter=’3’ counts occurrences of ‘3’) in the values. Use minimum=True to search for the minimum number of occurrences of counter.
- Parameters
- Returns
dict of items composed of the entries with the min (or max) count.
Examples
>>> z = ['zero','one','two','three','4','five','six','seven','8','9/81']
>>> z = dict(enumerate(z))
>>> selectdict(z, counter='e')
{3: 'three', 7: 'seven'}
>>> selectdict(z, counter='e', minimum=True)
{8: '8', 9: '9/81', 2: 'two', 4: '4', 6: 'six'}
>>>
>>> y = {1: [1,2,3], 2: [4,5,6], 3: [1,3,5]}
>>> selectdict(y, counter=3)
{1: [1, 2, 3], 3: [1, 3, 5]}
>>> selectdict(y, counter=3, minimum=True)
{2: [4, 5, 6]}
- sep(type='')
get the separator string for the given type of separator
- Parameters
type (str, default='') – one of {sep,line,path,ext,alt}.
- Returns
separator string.
- shellsub(command)
parse the given command to be formatted for remote shell invocation
secure shell (ssh) can be used to send commands to remote machines and execute them there (using ssh <hostname> <command>). Additional escape characters are needed to enable the command to be correctly formed and executed remotely. shellsub attempts to parse the given command string correctly so that it can be executed remotely with ssh.
- Parameters
command (str) – the command to be executed remotely.
- Returns
the parsed command string.
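The escaping problem described for shellsub arises because the remote shell parses the command string a second time. A related standard library technique is to quote each argument with shlex so that the remote shell recovers the original words. The sketch below shows that technique, not what shellsub does internally; ssh_argv is a hypothetical helper, example.org is a placeholder host, and a POSIX shell on the remote side is assumed.

```python
import shlex


def ssh_argv(host, remote_argv):
    """Build the local argument list for running remote_argv on host via ssh.

    Hypothetical helper. shlex.join quotes each remote argument so that the
    remote shell splits the command back into the same words.
    """
    remote_command = shlex.join(remote_argv)
    return ['ssh', host, remote_command]


argv = ssh_argv('example.org', ['python', '-m', 'pox', 'which_python()'])
print(argv[2])
# prints python -m pox 'which_python()'
```

The resulting list could be handed to subprocess.run; nothing is executed here, so the example does not need network access.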
- shelltype()
get the name (e.g. bash) of the current command shell
- Parameters
None –
- Returns
string name of the shell, or None if name can not be determined.
- username()
get the login name of the current user
- Parameters
None –
- Returns
string name of the user.
- wait_for(path, sleep=1, tries=150, ignore_errors=False)
block execution by waiting for a file to appear at the given path
- Parameters
- Returns
None
Notes
if the file is not found after the given number of tries, an error will be thrown unless ignore_errors=True.
using subproc = Popen(...) and subproc.wait() is usually a better approach. However, when a handle to the subprocess is unavailable, waiting for a file to appear at a given path is a decent last resort.
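The polling approach described in these notes amounts to a bounded loop around an existence check. A minimal sketch follows; wait_for_file is a hypothetical helper that, unlike the documented wait_for, returns a bool so the outcome is visible, and OSError is an arbitrary choice of exception.

```python
import os
import tempfile
import time


def wait_for_file(path, sleep=1, tries=150, ignore_errors=False):
    """Poll until path exists, checking at most `tries` times.

    Hypothetical helper for illustration. Returns True when the file is
    found; otherwise returns False if ignore_errors, else raises OSError.
    """
    for _ in range(tries):
        if os.path.exists(path):
            return True
        time.sleep(sleep)
    if ignore_errors:
        return False
    raise OSError(f'{path} not found after {tries} tries')


with tempfile.TemporaryDirectory() as tmp:
    flag = os.path.join(tmp, 'done.flag')
    before = wait_for_file(flag, sleep=0, tries=2, ignore_errors=True)
    open(flag, 'w').close()  # simulate another process writing the file
    after = wait_for_file(flag, sleep=0, tries=2)
```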
- walk(root, patterns='*', recurse=True, folders=False, files=True, links=True)
walk directory tree and return a list matching the requested pattern
- Parameters
root (str) – path of top-level directory to search.
patterns (str, default='*') – (partial) name of items to search for.
recurse (bool, default=True) – if True, recurse downward from root.
folders (bool, default=False) – if True, include folders in the results.
files (bool, default=True) – if True, include files in results.
links (bool, default=True) – if True, include links in results.
- Returns
a list of string paths.
Notes
patterns can be specified with basic pattern matching. Additionally, multiple patterns can be specified by splitting patterns with a ;.
Examples
>>> walk('..', patterns='pox*')
['/Users/foo/pox/pox', '/Users/foo/pox/scripts/pox_launcher.py']
>>>
>>> walk('.', patterns='*shutils*;*init*')
['/Users/foo/pox/pox/shutils.py', '/Users/foo/pox/pox/__init__.py']
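A directory walk with ;-separated patterns can be approximated with os.walk and fnmatch from the standard library. In the sketch below, walk_matching is a hypothetical helper, the links option and integer recursion depth are not modeled, and the throwaway tree exists only so the example runs anywhere.

```python
import fnmatch
import os
import tempfile


def walk_matching(root, patterns='*', recurse=True, folders=False, files=True):
    """Return sorted paths under root whose names match any ';'-separated pattern.

    Hypothetical stdlib helper for illustration; not the pox implementation.
    """
    wanted = patterns.split(';')
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        names = (dirnames if folders else []) + (filenames if files else [])
        for name in names:
            if any(fnmatch.fnmatch(name, pattern) for pattern in wanted):
                results.append(os.path.join(dirpath, name))
        if not recurse:
            break  # only inspect the top-level directory
    return sorted(results)


# Build a small throwaway tree: <tmp>/pkg/{shutils.py, __init__.py, notes.txt}
with tempfile.TemporaryDirectory() as tmp:
    pkg = os.path.join(tmp, 'pkg')
    os.mkdir(pkg)
    for filename in ('shutils.py', '__init__.py', 'notes.txt'):
        open(os.path.join(pkg, filename), 'w').close()
    found = walk_matching(tmp, patterns='*shutils*;*init*')
    relative = [os.path.relpath(p, tmp) for p in found]
    top_only = walk_matching(tmp, patterns='*.py', recurse=False)
    with_folders = walk_matching(tmp, patterns='pkg', folders=True, files=False)
    folder_names = [os.path.basename(p) for p in with_folders]
```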
- where(name, path, pathsep=None)
get the full path for the given name string on the given search path.
- Parameters
- Returns
the full path string.
Notes
if pathsep is not provided, the OS default will be used.
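Searching an explicit search path for a name comes down to trying each directory in order and returning the first hit. The sketch below illustrates that idea; locate is a hypothetical helper, and returning None for a miss is an assumption rather than documented behavior of where.

```python
import os
import tempfile


def locate(name, path, pathsep=None):
    """Return the full path of the first entry called name on the search path.

    Hypothetical helper for illustration; pathsep defaults to os.pathsep.
    Returns None when nothing is found.
    """
    if pathsep is None:
        pathsep = os.pathsep
    for directory in path.split(pathsep):
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return os.path.abspath(candidate)
    return None


# Two throwaway directories; only the second one contains 'tool'.
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, 'first')
    second = os.path.join(tmp, 'second')
    os.mkdir(first)
    os.mkdir(second)
    open(os.path.join(second, 'tool'), 'w').close()
    search_path = os.pathsep.join([first, second])
    hit = locate('tool', search_path)
    expected = os.path.abspath(os.path.join(second, 'tool'))
    miss = locate('absent', search_path)
```

When only executables are of interest, shutil.which(name, path=...) from the standard library performs a similar search with an additional permission check.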
- whereis(prog, all=False)
get path to the given program
search the standard binary install locations for the given executable.
- which(prog, allow_links=True, ignore_errors=True, all=False)
get the path of the given program
search the user’s paths for the given executable.
- Parameters
- Returns
if all=True, get a list of string paths, else return a string path.
- which_python(version=False, lazy=False, fullpath=True, ignore_errors=True)
get the command to launch the selected version of python
which_python composes a command string that can be used to launch the desired python executable. The user’s path is searched for the executable, unless lazy=True, in which case only a lazy-evaluating command (e.g. which python) is produced.
- Parameters
- Returns
string of the implicit or explicit location of the python executable.
Notes
if version is given as an int or float, include the version number in the command string.
if the executable is not found, an error will be thrown unless ignore_errors=True.