SciPy is an enormous Python library for scientific computing. This article will explain how to get started with SciPy, survey what the library has to offer, and give some examples of how to use it for common tasks.
First steps with SciPy
The SciPy download page has links to the SourceForge download sites for SciPy and NumPy. (SciPy depends on NumPy and so both packages must be installed in order to use SciPy.) The version of SciPy (and NumPy) must be compatible with your version of Python. At the time of this writing, SciPy is available for Python 2.6 and earlier. In particular, Python 3.0 is not yet supported.
IronPython cannot use SciPy directly because much of SciPy is implemented in C and at this time IronPython can only directly import modules implemented in Python. However, the Ironclad package enables IronPython to use SciPy. See Numerical computing in IronPython with Ironclad for details.
To start using SciPy, import the scipy package. By convention, the scipy package is often imported with the sp abbreviation for ease of use.
>>> import scipy as sp
There is some functionality at the root of the scipy hierarchy, but most functionality is located in sub-packages that must be imported separately. For example, the erf function is located in the special sub-package for special functions. To call the erf function, you first need to import the special sub-package.
>>> from scipy import special
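As a minimal sketch of calling a function once its sub-package has been imported, the following evaluates erf at an arbitrarily chosen point and compares it with the standard library's math.erf. The variable names and the test value 0.5 are illustrative assumptions, not part of the original article.

```python
import math
from scipy import special

x = 0.5  # arbitrary illustrative input

# erf lives in the special sub-package, so it is reached through that name.
erf_scipy = special.erf(x)

# The standard library has its own erf, which makes a handy cross-check.
erf_stdlib = math.erf(x)

print(erf_scipy, erf_stdlib)
```

Unlike math.erf, special.erf also accepts sequences and NumPy arrays and applies the function to each element.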
The SciPy documentation page has links to extensive documentation available in HTML, PDF, and CHM (HTML help) formats.
As with any Python package, you can also find help for SciPy objects using Python's help() function from the command line. However, sometimes help() is unhelpful when it comes to SciPy. The function scipy.info() is analogous to the standard help() function but specialized to give better documentation for SciPy objects. When scipy.info() is given a string argument, it searches for matching objects. When scipy.info() is given an object, it returns documentation specific to that object. For example, if scipy was imported as sp, passing the string "gamma" to sp.info() returns documentation on both the gamma probability distribution and the gamma function, but passing the gamma function object itself returns documentation only for the gamma function.
The following table lists the sub-packages of scipy along with a brief description of each. The next section will give examples using some of the more common sub-packages.
|constants|Mathematical and physical constants|
|io|Input and output|
|maxentropy|Maximum entropy models|
|ndimage|Multi-dimensional image processing|
|odr|Orthogonal distance regression|
|spatial|Spatial algorithms and data structures|
SciPy is huge. The draft SciPy Reference Guide is currently 632 pages. This article will only illustrate a tiny sampling of what you can do with SciPy, focusing on some of the more common applications.
The special sub-package contains mathematical functions beyond those included in the standard Python math module. The most commonly used special function is probably the gamma function, Γ(x). The following example shows how to access it from SciPy.
>>> from scipy.special import gamma
SciPy contains a dozen functions related to the gamma function. For example, there is a separate function gammaln just to return the logarithm of the gamma function. This may seem redundant, but it is very practical. Since the gamma function grows very quickly, it can easily overflow, and so the logarithm of the gamma function is often more useful than the gamma function itself. Some of the other gamma-related functions in SciPy include the incomplete gamma function gammainc, the beta function beta, and the logarithmic derivative of the gamma function psi. Gamma functions are far from the only special functions included in SciPy. All the most commonly used special functions are included: Bessel functions, Fresnel integrals, etc.
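The overflow point can be illustrated with a short sketch. The argument 200.0 is an arbitrary choice made here because Γ(200) exceeds the largest double-precision number. The variable names are illustrative only.

```python
import math
from scipy.special import gamma, gammaln

# Γ(1/2) equals the square root of π, a convenient sanity check.
root_pi = gamma(0.5)

# Γ(200) is far too large for a double, so gamma overflows to infinity.
big = gamma(200.0)

# The logarithm of the same quantity is a perfectly ordinary number.
log_big = gammaln(200.0)

print(root_pi, big, log_big)
```

Working with gammaln and exponentiating only at the end, after large terms have cancelled, is a common way to avoid the overflow.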
Some of the functions in special would not be classified as “special” in a mathematical sense. These are elementary functions that are included because they present numerical difficulties. For example, scipy.special contains a function log1p that evaluates log(1 + x). This function may seem useless at first: why not just use math.log(1 + x) instead of log1p? The problem is that in applications, you often need to evaluate log(1 + x) for small values of x. If x is sufficiently small (e.g., less than 10^-16), then 1 + x will equal 1 to machine precision, and math.log(1 + x) will simply return log(1), which equals 0. The function log1p evaluates log(1 + x) indirectly without first computing 1 + x.
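The loss of precision described above can be seen directly. This is a minimal sketch with an arbitrarily chosen tiny x. The variable names are illustrative.

```python
import math
from scipy import special

x = 1e-20  # much smaller than double-precision machine epsilon

# 1 + x rounds to exactly 1.0, so the naive expression returns log(1) = 0.
naive = math.log(1 + x)

# log1p never forms 1 + x, so it keeps the information carried by x.
# For tiny x, log(1 + x) is approximately x.
accurate = special.log1p(x)

print(naive, accurate)
```

The standard library also provides math.log1p for scalars. The SciPy version additionally works element-wise on arrays.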
The constants sub-package contains a wide variety of physical constants. The following code will display a few common constants:
>>> from scipy import constants
>>> constants.c # speed of light
>>> constants.h # Planck's constant
>>> constants.N_A # Avogadro's number
The dictionary physical_constants in the constants sub-package has descriptive strings as keys. The values are triples containing a constant's value, its units of measurement, and the uncertainty in the value. For example, the dictionary gives this information on the mass of an electron.
>>> constants.physical_constants["electron mass"]
(9.1093825999999998e-031, 'kg', 1.5999999999999999e-037)
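Since each entry is a triple, it can be unpacked into named variables. The sketch below assumes only the "electron mass" key shown above. The variable names are illustrative, and the exact numbers depend on which set of recommended values the installed SciPy version ships.

```python
from scipy import constants

# Each dictionary value is a (value, unit, uncertainty) triple.
value, unit, uncertainty = constants.physical_constants["electron mass"]

print("electron mass:", value, unit)
print("uncertainty:  ", uncertainty, unit)
```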
In addition to physical constants, constants contains information on units. For example, constants.nautical_mile equals 1852, the number of meters in a nautical mile. And, in case you ever wondered, constants.troy_ounce equals 0.0311034768, the number of kilograms in a troy ounce. There is also support for SI and binary unit prefixes. For example, the SI prefix constants.kilo equals 10^3 = 1000.0, and the binary prefix constants.kibi equals 2^10 = 1024.
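Because each unit constant is simply a conversion factor to SI units, conversion is a matter of multiplication. The quantities below (12 nautical miles, 2 troy ounces, 4 kibibytes) are made-up inputs chosen only for illustration.

```python
from scipy import constants

# Hypothetical quantities to convert.
distance_nmi = 12.0
mass_ozt = 2.0
size_kib = 4

# Multiply by the unit constant to convert to SI units.
distance_m = distance_nmi * constants.nautical_mile   # meters
mass_kg = mass_ozt * constants.troy_ounce             # kilograms

# Prefixes are plain multipliers as well.
size_bytes = size_kib * constants.kibi                # bytes

print(distance_m, mass_kg, size_bytes)
```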
The integrate sub-package contains several routines for numerical integration. The most commonly used routine is quad (named for “quadrature”, an old-fashioned name for integration). The first argument to quad is a function of one variable to integrate. For simple functions, it is convenient to use lambda to define the function inline, though of course integrands can be defined elsewhere using def. The quad function returns a pair: the value of the integral and an estimate of the error in the value. The following code integrates cos(e^x) between the limits of 2 and 3.
>>> import math
>>> from scipy import integrate
>>> integrate.quad(lambda x: math.cos(math.exp(x)), 2, 3)
To specify infinite limits of integration in quad, use the constant scipy.inf for ∞, as in the following example:
>>> from scipy import inf
>>> integrate.quad(lambda x: math.exp(-x*x), -inf, inf)
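The integral of exp(-x²) over the whole real line is known to equal the square root of π, which makes this example easy to check. The sketch below is a self-contained variant that unpacks the pair returned by quad and compares it with the exact value. It uses math.inf from the standard library as a stand-in for scipy.inf, an assumption made here so the code does not depend on which names a given SciPy version exports at the top level.

```python
import math
from scipy import integrate

# quad returns (value, estimated absolute error).
value, error_estimate = integrate.quad(
    lambda x: math.exp(-x * x), -math.inf, math.inf
)

exact = math.sqrt(math.pi)  # known closed-form answer

print("numerical:", value)
print("exact:    ", exact)
print("estimated error:", error_estimate)
```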
The integrate sub-package contains other integration routines, such as dblquad for double integrals and tplquad for triple integrals. It also contains odeint for numerically solving systems of ordinary differential equations.
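A brief sketch of two of these routines follows. Both problems were chosen here because they have simple exact answers: the double integral of x·y over 0 ≤ x ≤ 1, 0 ≤ y ≤ 2 equals 1, and the equation dy/dt = -y with y(0) = 1 has the solution e^(-t). The function and variable names are illustrative.

```python
import math
import numpy as np
from scipy import integrate

# dblquad expects the integrand as f(y, x), with the inner variable first.
# The inner limits are given as functions of the outer variable x.
area, area_err = integrate.dblquad(
    lambda y, x: x * y,   # integrand
    0, 1,                 # outer limits for x
    lambda x: 0,          # lower limit for y
    lambda x: 2,          # upper limit for y
)

def decay(y, t):
    """Right-hand side of dy/dt = -y."""
    return -y

# odeint returns one row per requested time and one column per state variable.
t = np.linspace(0.0, 1.0, 11)
solution = integrate.odeint(decay, 1.0, t)
y_end = solution[-1, 0]   # numerical y(1)

print(area, y_end, math.exp(-1.0))
```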
Probability and statistics
The stats sub-package contains a wealth of functions for probability and statistics. The library currently features 81 continuous distributions and 12 discrete distributions. The following example shows how to compute the probability that a normal (Gaussian) random variable with mean 0 and standard deviation 3 takes on a value less than 5. It also shows how to generate five random samples from the same distribution.
>>> from scipy.stats import norm
>>> norm.cdf(5, 0, 3)
>>> norm.rvs(0, 3, size=5)
array([ 4.85229537, 3.0104119 , 1.13189841, 5.19688369, -2.97970912])
See Probability distributions in SciPy for more examples of working with probability distributions.
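The normal probability computed earlier can be cross-checked against the closed form Φ(z) = (1 + erf(z/√2))/2 with z = 5/3. The sketch below does that, and also shows an equivalent "frozen" distribution object whose parameters are fixed once. The seed passed as random_state is an arbitrary choice made here for reproducibility, and the variable names are illustrative.

```python
import math
from scipy.stats import norm

# Positional arguments after the point are the mean (loc) and
# the standard deviation (scale).
p = norm.cdf(5, 0, 3)

# Closed-form check using the error function.
p_formula = 0.5 * (1 + math.erf(5 / (3 * math.sqrt(2))))

# A frozen distribution remembers its parameters.
dist = norm(loc=0, scale=3)
p_frozen = dist.cdf(5)
x_back = dist.ppf(p_frozen)   # inverse CDF recovers the original point

# Five samples, seeded so that repeated runs give the same values.
samples = norm.rvs(0, 3, size=5, random_state=0)

print(p, p_formula, x_back)
print(samples)
```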
The stats sub-package has simple functions such as std for computing the sample standard deviation of an array of numbers. It has more sophisticated functions such as glm for working with linear regression, analysis of variance, etc. It also contains functions for numerous statistical tests and common chores such as producing histograms.
Note that some statistical functionality is located outside the stats sub-package. For example, orthogonal distance regression support is contained in its own sub-package odr. Also, you might use the linalg sub-package in conjunction with stats.
The SciPy.org website has links to further documentation, including tutorials and cookbook examples.
The Mathesaurus is a sort of Rosetta stone for comparing mathematical environments such as SciPy, Matlab, R, etc. The resources on that site are useful even if you do not know one of the other packages and just want a cheat sheet for doing various tasks in Python (especially SciPy).
The IPython shell is a popular environment for working interactively with SciPy. Also, many SciPy users use matplotlib to create plots from either the standard Python command line or from IPython.