basiliskStart {basilisk}    R Documentation

Start and stop basilisk-related processes

Description

Creates a basilisk process in which Python operations (via reticulate) can be safely performed with the correct versions of Python packages.

Usage

basiliskStart(env, fork = getBasiliskFork(), shared = getBasiliskShared())

basiliskStop(proc)

basiliskRun(
  proc = NULL,
  fun,
  ...,
  env,
  fork = getBasiliskFork(),
  shared = getBasiliskShared()
)

Arguments

env

A BasiliskEnvironment object specifying the basilisk environment to use.

Alternatively, a string specifying the path to an environment, though this should only be used for testing purposes.

Alternatively, NULL to indicate that the base environment of the basilisk Anaconda instance should be used.

fork

Logical scalar indicating whether forking should be performed on non-Windows systems (see getBasiliskFork). If FALSE, a new worker process is created using communication over sockets.

shared

Logical scalar indicating whether basiliskStart is allowed to load a shared Python instance into the current R process (see getBasiliskShared).

proc

A process object generated by basiliskStart.

fun

A function to be executed in the basilisk process.

...

Further arguments to be passed to fun.

Details

These functions ensure that any Python operations in fun will use the environment specified by env. This avoids version conflicts in the presence of other Python instances or environments loaded by other packages or by the user. Thus, basilisk clients are not affected by (and, if shared=FALSE, do not affect) the activity of other R packages.

If necessary, objects created in fun can persist across calls to basiliskRun, e.g., for file handles. This requires the use of assign with envir set to findPersistentEnv to persist a variable, and a corresponding get to retrieve that object in later calls. See Examples for more details.

It is good practice to call basiliskStop once computation is finished. This will close the basilisk processes and restore certain environment variables to their original state (e.g., "PYTHONPATH") so that other non-basilisk operations can operate properly.
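One way to make sure basiliskStop is always called, even when fun throws an error, is to register it with on.exit. The wrapper below is a minimal sketch of that pattern; with_basilisk is a hypothetical helper name and is not part of basilisk.

```r
# Hypothetical helper (not part of basilisk): start a process, run 'fun',
# and guarantee that the process is stopped afterwards.
with_basilisk <- function(env, fun, ...) {
    proc <- basilisk::basiliskStart(env)
    # on.exit runs on normal return and on error, so the process is
    # closed and environment variables are restored in both cases.
    on.exit(basilisk::basiliskStop(proc), add = TRUE)
    basilisk::basiliskRun(proc, fun, ...)
}
```

A call such as with_basilisk(NULL, function() reticulate::import("pandas")$`__version__`) would then behave like the first Start/Run/Stop sequence in the Examples, assuming basilisk and its base environment are installed.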

Any Python-related operations between basiliskStart and basiliskStop should only occur via basiliskRun. Calling reticulate functions directly will have unpredictable consequences. Similarly, it would be unwise to interact with proc via any function other than the ones listed here.

If proc=NULL in basiliskRun, a process will be created and closed automatically. This may be convenient in functions where persistence is not required. Note that doing so requires specification of env.
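As a rough illustration of the one-shot form, the sketch below leaves proc at its default of NULL and supplies env by name, following the signature in Usage. pandas_version is a hypothetical wrapper name; it assumes that pandas is importable from the chosen environment, as in the Examples.

```r
# Hypothetical wrapper (not part of basilisk): no proc is supplied, so
# basiliskRun creates a process for this call and closes it afterwards.
pandas_version <- function(env = NULL) {
    basilisk::basiliskRun(env = env, fun = function() {
        # Runs inside the temporary basilisk process.
        reticulate::import("pandas")$`__version__`
    })
}
```

Because the process does not outlive the call, nothing stored via findPersistentEnv is available to later calls in this form.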

Value

basiliskStart returns a process object, the exact nature of which depends on fork and shared. This object should only be used in basiliskRun and basiliskStop.

basiliskRun returns the output of fun(...) when executed inside the separate process.

basiliskStop stops the process in proc.

Choice of backend

Developers can control the choice of backend directly by explicitly specifying shared and fork, while users can control it indirectly with setBasiliskFork and related functions.
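The two routes might look as follows. This is a sketch only: start_isolated and prefer_isolated are hypothetical names, and setBasiliskShared is assumed to be one of the "related functions" mentioned above, taking a logical value in the same way as setBasiliskFork.

```r
# Developer route: fix the backend for one process by passing the
# arguments explicitly. fork=FALSE requests a socket-based worker and
# shared=FALSE disallows loading Python into the current R process.
start_isolated <- function(env) {
    basilisk::basiliskStart(env, fork = FALSE, shared = FALSE)
}

# User route: change the session-wide defaults that are picked up by
# getBasiliskFork() and getBasiliskShared().
prefer_isolated <- function() {
    basilisk::setBasiliskFork(FALSE)
    basilisk::setBasiliskShared(FALSE)  # assumption: companion setter
    invisible(NULL)
}
```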

If the Anaconda installation provided with basilisk satisfies the requirements of the client package, it is strongly recommended to set env=NULL rather than constructing a separate environment. This is simpler and also more efficient, as it increases the chance of multiple basilisk clients being able to share a common Python instance within the same R session.

Constraints on user-defined functions

In basiliskRun, there is no guarantee that fun has access to the environment in which basiliskRun is called. This has a number of consequences for the type of code that can be written inside fun:
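One defensive pattern that follows from this constraint is to write fun so that everything it needs arrives as an argument, and to pass those values through the ... of basiliskRun. The sketch below assumes nothing beyond the signature in Usage; scale_values and run_scaled are hypothetical names used for illustration.

```r
# Self-contained worker function: it refers only to its own arguments,
# not to variables in the environment of the caller.
scale_values <- function(values, multiplier) {
    values * multiplier
}

# The function can be checked in plain R before it is handed to basilisk.
scale_values(c(1, 2, 3), multiplier = 2)

# Hypothetical wrapper: inputs are forwarded via '...' of basiliskRun
# rather than captured from the calling environment.
run_scaled <- function(values, multiplier, env = NULL) {
    basilisk::basiliskRun(env = env, fun = scale_values,
        values = values, multiplier = multiplier)
}
```

Argument names that are prefixes of proc or fun (such as p or f) are best avoided here, since R could partially match them to the formals of basiliskRun that precede ....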

Use of lazy installation

If the specified basilisk environment is not present and env is a BasiliskEnvironment object, the environment will be created upon first use of basiliskStart. If the Anaconda instance is not present, it will also be installed upon first use of basiliskStart. The motivation for this is to avoid portability problems with hard-coded paths when basilisk is provided as a binary.

Both Anaconda and the environments will be placed in an external user-writable directory, the location of which can be changed by setting the BASILISK_EXTERNAL_DIR variable. This may occasionally be necessary if the file path to the default location is too long for Windows, or if the default path has spaces that break the Anaconda installer.
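For example, the variable could be set from within R before basilisk is first used. The directory below is a hypothetical location chosen to be short and free of spaces; whether basilisk requires the directory to exist beforehand is not stated here, so creating it is only a precaution.

```r
# Hypothetical short, space-free location for Anaconda and the environments.
ext_dir <- file.path(tempdir(), "basilisk_ext")
dir.create(ext_dir, showWarnings = FALSE, recursive = TRUE)

# Set the variable for the current R session; this should happen before
# the first call to basiliskStart so that lazy installation sees it.
Sys.setenv(BASILISK_EXTERNAL_DIR = ext_dir)
Sys.getenv("BASILISK_EXTERNAL_DIR")
```

To make the setting persist across sessions, the same variable could instead be defined in the shell or in an .Renviron file.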

Advanced users may consider turning on BASILISK_USE_SYSTEM_DIR for installations from source, which will place both Anaconda and the environments in the R system directory. This simplifies permission management and avoids duplication in enterprise settings.

Author(s)

Aaron Lun

See Also

setupBasiliskEnv, to set up the conda environments.

getBasiliskFork and getBasiliskShared, to control various global options.

Examples

# Loading the base environment:
cl <- basiliskStart(NULL)
basiliskRun(proc=cl, function() { 
    X <- reticulate::import("pandas"); X$`__version__` 
})
basiliskStop(cl)

# Co-exists with our other environment:
tmploc <- file.path(tempdir(), "my_package_C")
setupBasiliskEnv(tmploc, c("pandas=0.24.1",
    "python-dateutil=2.7.1", "pytz=2018.7"))

cl <- basiliskStart(tmploc)
basiliskRun(proc=cl, function() { 
    X <- reticulate::import("pandas"); X$`__version__` 
})
basiliskStop(cl)

# Persistence of variables is possible within a Start/Stop pair.
cl <- basiliskStart(tmploc)
basiliskRun(proc=cl, function() {
    assign(x="snake.in.my.shoes", 1, envir=basilisk::findPersistentEnv())
})
basiliskRun(proc=cl, function() {
    get("snake.in.my.shoes", envir=basilisk::findPersistentEnv())
})
basiliskStop(cl)


[Package basilisk version 0.99.60 Index]