Author Topic: Why not to use arange - and the alternative!  (Read 38337 times)

Offline Anders Blom

Why not to use arange - and the alternative!
« on: February 24, 2009, 11:44 »
As useful as it may seem, numpy.arange() is not a reliable function when called with floating-point arguments. I recommend that everyone consider using numpy.linspace() instead, when possible.

From the documentation, it is pretty clear what arange() should deliver:

Code: [Select]
arange(start,stop,step)

should give an array with values [start, start+step, start+2*step, ...], stopping before start+N*step reaches stop. "stop" is not part of the interval, the manual says.

So, let's try:

Code: [Select]
arange(0.4, 1.1, 0.1)

Ok, this is what you expect: you get [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] and, as the documentation says, the endpoint, 1.1, is not included.

But, now try

Code: [Select]
arange(0.5, 1.1, 0.1)

This time you get [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1] - the end-point is suddenly included!!!  :o

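The problem lies deeply buried in the bit representation of floating-point numbers. The number of points in the array is determined as ceil((stop - start)/step), and since values like 0.1 and 1.1 cannot be represented exactly in binary, the quotient can land slightly above a whole number and get rounded up by one. As a result, the last element in the output array may be equal to or larger than "stop". Actually, the manual is honest enough to mention this, but it makes it a bit hard to trust the output from this function!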
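To see where the extra point comes from, here is a minimal sketch that applies the length rule from the NumPy documentation, ceil((stop - start)/step), by hand. The helper name arange_length is my own and is not part of NumPy; it only mirrors the documented formula and is not the actual internal implementation.

```python
import math

import numpy as np


def arange_length(start, stop, step):
    """Hypothetical helper: the documented length rule, ceil((stop - start) / step)."""
    return math.ceil((stop - start) / step)


# Quotients as floating-point arithmetic actually computes them.
quotient_04 = (1.1 - 0.4) / 0.1  # exactly 7 in real arithmetic
quotient_05 = (1.1 - 0.5) / 0.1  # exactly 6 in real arithmetic

values_04 = np.arange(0.4, 1.1, 0.1)
values_05 = np.arange(0.5, 1.1, 0.1)

for start, quotient, values in ((0.4, quotient_04, values_04),
                                (0.5, quotient_05, values_05)):
    print("start =", start)
    print("  quotient      :", repr(quotient))
    print("  predicted size:", arange_length(start, 1.1, 0.1))
    print("  actual size   :", len(values))
    print("  last element  :", repr(values[-1]))
```

In the second case the quotient comes out slightly above 6 instead of exactly 6, so the ceiling is 7 and the array gets a seventh element close to 1.1.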

It should be said that the implementation of arange() has been much debated on this point; see e.g. http://osdir.com/ml/python.numeric.general/2006-02/threads.html, and search for "arange" on that page. However, nothing has come of this, so I recommend that arange() not be used if it can be avoided.

Fortunately, there is an alternative, which actually is even better (ok, it depends a bit on what you are trying to achieve). The function numpy.linspace() is not that widely known but very useful, and above all more predictable!

The syntax is very simple (I just paste the result of "help linspace" here, for convenience):

Quote
linspace(start, stop, num=50, endpoint=True, retstep=False)
    Return evenly spaced numbers over a specified interval.

    Returns `num` evenly spaced samples, calculated over the
    interval [`start`, `stop` ].

    The endpoint of the interval can optionally be excluded.

    Parameters
    ----------
    start : {float, int}
        The starting value of the sequence.
    stop : {float, int}
        The end value of the sequence, unless `endpoint` is set to False.
        In that case, the sequence consists of all but the last of ``num + 1``
        evenly spaced samples, so that `stop` is excluded.  Note that the step
        size changes when `endpoint` is False.
    num : int, optional
        Number of samples to generate. Default is 50.
    endpoint : bool, optional
        If True, `stop` is the last sample. Otherwise, it is not included.
        Default is True.
    retstep : bool, optional
        If True, return (`samples`, `step`), where `step` is the spacing
        between samples.

    Returns
    -------
    samples : ndarray
        There are `num` equally spaced samples in the closed interval
        ``[start, stop]`` or the half-open interval ``[start, stop)``
        (depending on whether `endpoint` is True or False).
    step : float (only if `retstep` is True)
        Size of spacing between samples.

Use this function whenever you want to create a sequence of real numbers between a start and an end point, with a specified number of points. This is often handier than specifying the point spacing, as you must do for arange().
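As an illustration, the ranges from the arange() examples above can be written with linspace() roughly as follows. This is only a sketch; the variable names are mine, and the point counts (6 and 7) are chosen to reproduce a spacing of 0.1.

```python
import numpy as np

# Closed interval [0.5, 1.0]: six points, endpoint included (the default).
closed = np.linspace(0.5, 1.0, 6)

# Half-open interval [0.5, 1.1): six points, the endpoint 1.1 is excluded.
half_open = np.linspace(0.5, 1.1, 6, endpoint=False)

# retstep=True additionally returns the spacing that was actually used.
samples, step = np.linspace(0.4, 1.0, 7, retstep=True)

print(closed)
print(half_open)
print(samples, step)
```

Because the number of points is given explicitly, the length of the result never depends on floating-point rounding; only the individual values carry the usual tiny rounding errors.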