technology from back to front

Python quirks

I’ve been using Python for a while. Recently I have noted some nuances, wonders and counter-intuitive things I ran into. The list grew surprisingly fast.

Disclaimer: Most of the problems that I list here can be understood and explained. It’s just my opinion that something is odd, so forgive me if I raise something that, in your opinion, is not an issue at all.

Functions / Standard library

Sort returns nothing

All my friends who learn Python have a problem with sort. Apparently <list>.sort() returns None, which causes a lot of confusion. What they need is the builtin sorted.

>>> print [1,0].sort()
None
>>> sorted([1,0])
[0, 1]
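
A minimal sketch of the difference, written for Python 3 (the variable names are only for illustration):

```python
numbers = [3, 1, 2]

# list.sort() sorts in place and returns None.
result = numbers.sort()
print(result)    # None
print(numbers)   # [1, 2, 3]

# sorted() leaves its argument alone and returns a new list.
original = [3, 1, 2]
ordered = sorted(original)
print(original)  # [3, 1, 2]
print(ordered)   # [1, 2, 3]
```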

Tuples constructor

Constructing tuples is misleading for beginners. For example foo(1,2) is way different than foo((1,2)). On the other hand foo(1) is the same as foo((1)).

>>> def foo(*args):
...    print "%r" % (args,)
...
>>> foo(1,2)
(1, 2)

>>> foo((1,2))
((1, 2),)

But:

>>> foo(1)
(1,)

>>> foo((1))
(1,)

This often appears in conjunction with print:

>>> print "%r" % (1,2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: not all arguments converted during string formatting
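
The usual way around this is to wrap the value in a one-element tuple. A small sketch for Python 3 (the name point is made up for the example):

```python
point = (1, 2)

# "%r" % point raises TypeError: the tuple is unpacked into two
# arguments, but the format string has only one placeholder.
try:
    "%r" % point
    failed = False
except TypeError:
    failed = True

# Wrapping the value in a one-element tuple formats the tuple itself.
text = "%r" % (point,)
print(failed)  # True
print(text)    # (1, 2)
```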

Inconsistent get interface

I’m often in doubt about the behaviour of get-like methods, because they are inconsistent. Some of them raise an exception, others just return None. Why None?

>>> print {}.get(1)
None

>>> getattr(1, 'a')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'int' object has no attribute 'a'
>>> print getattr(1, 'a', None)
None
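
To put the variants side by side, here is a short sketch for Python 3. The settings dictionary is an assumption made for the example:

```python
settings = {"debug": True}

# dict.get() falls back to None (or a default you pass) for a missing key.
print(settings.get("verbose"))         # None
print(settings.get("verbose", False))  # False

# Indexing, by contrast, raises KeyError.
try:
    settings["verbose"]
except KeyError as exc:
    print("KeyError:", exc)

# getattr() raises AttributeError unless a default is supplied.
try:
    getattr(1, "a")
except AttributeError as exc:
    print("AttributeError:", exc)
print(getattr(1, "a", None))           # None
```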

Inconsistent select/poll interface

It’s less apparent, but still reasonable to point out that the select module is inconsistent. It specifies methods, some of which take seconds, others take milliseconds as a parameter.

>>> select.select(_, _, _, seconds)
>>> select.poll().poll(milliseconds)
>>> select.epoll().poll(seconds)
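
One way to hide the unit mismatch is a tiny wrapper that always takes seconds. This is only a sketch for Python 3: poll_seconds is a hypothetical helper, not part of the select module, and select.poll is not available on every platform (Windows lacks it).

```python
import os
import select


def poll_seconds(poller, timeout_seconds):
    """Call poller.poll() with a timeout given in seconds.

    Hypothetical helper: select.poll objects expect milliseconds,
    so the value is converted here. None means "block indefinitely".
    """
    if timeout_seconds is None:
        return poller.poll()
    return poller.poll(int(timeout_seconds * 1000))


# Usage sketch: a pipe with one byte waiting to be read.
read_fd, write_fd = os.pipe()
poller = select.poll()
poller.register(read_fd, select.POLLIN)

os.write(write_fd, b"x")
events = poll_seconds(poller, 0.5)   # wait up to half a second
ready_fds = [fd for fd, _mask in events]
print(read_fd in ready_fds)

os.close(read_fd)
os.close(write_fd)
```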

Internals

Circular imports

It’s not a surprise that Python doesn’t handle circular imports. But what does that actually mean?

Let’s create two files: a.py and b.py. Let’s import b from a and a from b:

$ cat a.py
var_1 = 1
import b
var_2 = 2
print "Hello world from module a"

print "Imported b,  b.var_1=%r b.var_2=%r" \
        % ( getattr(b, "var_1", None),     \
            getattr(b, "var_2", None))


$ cat b.py
var_1 = 1
import a
var_2 = 2
print "Hello world from module b"

print "Imported a,  a.var_1=%r a.var_2=%r" \
        % ( getattr(a, "var_1", None),     \
            getattr(a, "var_2", None))

Think for a while about what result you would expect.

$ python -c "import a"
Hello world from module b
Imported a,  a.var_1=1 a.var_2=None
Hello world from module a
Imported b,  b.var_1=1 b.var_2=2

What actually happened?
  • We requested module a. Module a runs.
  • a requests module b. Flow goes to b.
  • b requests a. Python understands that it’s actually in the middle of creating module a, and gives back a reference to the half-loaded namespace of module a.
  • Module b prints out a.var_1, which has the correct value, but a.var_2 is not set yet, so it defaults to None.
  • After that everything continues normally.
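
If you want to replay the experiment on Python 3, the sketch below writes two equivalent modules into a temporary directory and imports a in a fresh interpreter. The file contents are a shortened adaptation of the listings above, not a verbatim copy.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Both modules share this template; only the names are swapped.
MODULE_TEMPLATE = textwrap.dedent("""\
    var_1 = 1
    import {other}
    var_2 = 2
    print("{name} sees {other}.var_1=%r {other}.var_2=%r"
          % (getattr({other}, "var_1", None),
             getattr({other}, "var_2", None)))
""")

with tempfile.TemporaryDirectory() as workdir:
    for name, other in (("a", "b"), ("b", "a")):
        with open(os.path.join(workdir, name + ".py"), "w") as handle:
            handle.write(MODULE_TEMPLATE.format(name=name, other=other))

    # Run "import a" in a fresh interpreter so sys.modules starts clean.
    completed = subprocess.run(
        [sys.executable, "-c", "import a"],
        cwd=workdir,
        env=dict(os.environ, PYTHONPATH=workdir),
        capture_output=True, text=True, check=True,
    )

output_lines = completed.stdout.splitlines()
print("\n".join(output_lines))
```

Module b finishes first and sees a without var_2, exactly as in the walkthrough above.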

Module naming and side effects

It’s often forgotten that a direct import from inside a module, like import a, imports something quite different from the global import tmp.a. For example, if you created a file /tmp/a.py:

$ cd /tmp; PYTHONPATH=.. python
 >>> import a
 >>> import tmp.a

The two commands import different modules from Python’s point of view. If a.py has any side effects, they will be executed twice. A common bug is to use local paths from inside the module, while encouraging users to use global module paths from outside. This leads to double imports. So, if you’re importing local files from inside a Python module, consider the syntax with a dot:

# Imagine we're in a module *tmp*, in a file *b.py*:
import a          # bad, import is different than *tmp.a*
import tmp.a      # better, but we can't rename the module easily

from . import a   # perfect! imports file a.py from _this_ module.
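
The double import is easy to reproduce. The sketch below is for Python 3 and builds a throwaway package in a temporary directory; quirk_demo_pkg and quirk_demo_mod are made-up names, and putting a package directory itself on sys.path is done here only to provoke the problem.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical names used only for this demonstration.
PACKAGE = "quirk_demo_pkg"
MODULE = "quirk_demo_mod"

with tempfile.TemporaryDirectory() as root:
    package_dir = os.path.join(root, PACKAGE)
    os.mkdir(package_dir)
    open(os.path.join(package_dir, "__init__.py"), "w").close()
    with open(os.path.join(package_dir, MODULE + ".py"), "w") as handle:
        handle.write("print('side effect, running as', __name__)\n")

    # Make the file reachable both as a top-level module and via its package.
    sys.path[:0] = [root, package_dir]
    try:
        local = importlib.import_module(MODULE)
        qualified = importlib.import_module(PACKAGE + "." + MODULE)
    finally:
        del sys.path[:2]

# The same file produced two distinct module objects.
print(local is qualified)
print(local.__name__, qualified.__name__)
```

The side-effect line is printed twice, once per module name.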

What do imports import?

If I need a.b.c.d(), Python requires me to understand which of the parts describe a module, which describe a file inside a module, which describe a class and which a function. In this case I could assume that a.b.c are modules and d() is a global function:

>>> from a.b.c import d
>>> d()

But that can be wrong! a can be a module and b.c.d() can describe a class, sub class and sub sub class.

>>> from a import b
>>> b.c.d()

That’s not all. In normal cases lacking a proper import causes an exception. Sometimes it doesn’t… My favourite example is os.path. I still don’t know if I should import os or os.path. Both versions work:

>>> import os
>>> os.path.devnull
'/dev/null'
>>> import os.path
>>> os.path.devnull
'/dev/null'

Module reload

Apparently Python does allow you to dynamically reload modules.
That’s a pretty neat feature. But in practice it’s not very useful -
modules are usually imported from global namespaces and local imports
from inside the code are considered slow.

Loading order

Python uses several mechanisms for loading modules:
  • System modules are loaded from /usr/lib/python2.5.
  • Others are in /usr/lib/python2.5/site-packages.
  • There’s also /usr/share/python-support.
  • And /usr/lib/python-support.
  • I haven’t yet mentioned eggs.
  • And eggs have *.pth files.

Python Eggs are really dirty. Install a few of them and run:

>>> import sys
>>> sys.path
['', '/usr/lib/python2.5/site-packages/multiprocessing-2.6.2.1-py2.5-linux-x86_64.egg',
'/usr/lib/python2.5/site-packages/amqplib-0.6.1-py2.5.egg', ...]

Yes! Eggs are injected into the loading paths, polluting your system Python installation, and hurting Python startup time.

Btw. I tried to force Python to use eggs from my home directory:
it’s painful. Not to mention the problems there are with
platform-specific eggs.

999+1 is not 1000

The well-known “feature” of Python integers is that they don’t play nicely with the is operator. Internally, small integers are reused objects, so is, which checks object identity (the memory location), works fine. Larger integers are created as new objects every time, so is fails.

>>> 1 is 1
True
>>> 1000 is 1000
True

>>> 999+1 is 1000
False
>>> 2+1 is 3
True

To make things even worse the behaviour changes over python versions.
For example 100+1 is 101 returns True in Python 2.5 but False
in Python 2.4.
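
The safe habit is to compare numbers with == and keep is for identity checks such as x is None. A minimal sketch for Python 3:

```python
# Build the values at run time so the compiler cannot fold them into
# a single shared constant.
a = int("1000")
b = 999 + int("1")

print(a == b)  # True: equality compares values

# "a is b" compares object identity; its result for integers is an
# implementation detail and should not be relied upon.
```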

The order of unpacking

Have you ever wondered what the order of assignment is during tuple unpacking?

>>> _, _, _  = 1,2,3
>>> _
3

On the other hand this syntax is not allowed in function declarations. Strange.

>>> def a(_, _, _): print _
...
  File "<stdin>", line 1
SyntaxError: duplicate argument '_' in function definition

Speaking of function declarations, there’s a nice feature that allows you to define named parameters before unnamed ones, although this syntax works only for function definitions, not for usage:

>>> def foo(a=1, b=2, *args, **kwargs):
...      print "a=%r b=%r args=%r kwargs=%r" % (a,b,args,kwargs)
...
>>> foo(4,5,6)
a=4 b=5 args=(6,) kwargs={}

>>> foo(a=4, b=5, 6)        # I would expect this to work!
  File "<stdin>", line 1
SyntaxError: non-keyword arg after keyword arg
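
The rule is the same in Python 3: positional arguments must come first in a call. Here is a sketch that uses compile() to show the rejected call without stopping the script:

```python
def foo(a=1, b=2, *args, **kwargs):
    return "a=%r b=%r args=%r kwargs=%r" % (a, b, args, kwargs)


print(foo(4, 5, 6))        # a=4 b=5 args=(6,) kwargs={}
print(foo(4, b=5, c=6))    # a=4 b=5 args=() kwargs={'c': 6}

# A positional argument after a keyword argument is rejected by the
# parser, so the check below uses compile() on the source text.
try:
    compile("foo(a=4, b=5, 6)", "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)            # True
```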

Python has deterministic garbage collection

Unlike many other dynamic languages, Python uses reference counting as its main garbage collection mechanism. During normal execution objects are freed at the very moment they lose their last reference. This means that while the program runs Python shouldn’t have any unexpected hiccups! Unlike Java or Erlang, Python can run predictably smoothly. Am I saying that Python is a proper realtime language, and that you could use it in a medical ventilator?

Well, Python does have a more advanced garbage collector, but it’s only used to free cyclic references. With proper programming discipline you can avoid creating reference loops. Oh, and please do avoid defining __del__ destructors, as Python can’t free reference loops that contain objects which define them.
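
Both behaviours can be observed with weak references. The sketch below assumes CPython 3 (other implementations may free objects later), and Node is a made-up class for the demonstration.

```python
import gc
import weakref


class Node:
    """Hypothetical class used only for this demonstration."""

    def __init__(self):
        self.other = None


# No cycle: in CPython the object is reclaimed as soon as the last
# reference disappears.
plain = Node()
plain_ref = weakref.ref(plain)
del plain
print(plain_ref() is None)

# A reference cycle keeps both counts above zero, so only the cyclic
# collector can reclaim the pair.
gc.disable()                     # keep automatic collection out of the demo
left, right = Node(), Node()
left.other, right.other = right, left
cycle_ref = weakref.ref(left)
del left, right
alive_before_collect = cycle_ref() is not None
gc.collect()
freed_after_collect = cycle_ref() is None
gc.enable()
print(alive_before_collect, freed_after_collect)
```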

__del__ is often misleading

As mentioned above, please just don’t use __del__. But if you have to, you can learn more about it here.

Parser

Multiline comments are not comments

Python doesn’t have multiline comments. Instead, multiline strings are used. This leads to some quirks:

# This works fine:
if True:
   a = 1
#comment

# This fails, though I'd expect it to work:
if True:
   a = 1
'''
multiline comment
'''
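
One way to see the difference is to ask the compiler directly. In the Python 3 sketch below the block continues after the comment or string; the comment is ignored, while the string is a real statement at column 0 that ends the block. The source snippets are my own variation on the example above.

```python
with_comment = (
    "if True:\n"
    "    a = 1\n"
    "# a comment at column 0 is ignored by the parser\n"
    "    b = 2\n"
)
with_string = (
    "if True:\n"
    "    a = 1\n"
    "'''\n"
    "a string at column 0 is a statement and ends the block\n"
    "'''\n"
    "    b = 2\n"
)

compile(with_comment, "<comment>", "exec")   # compiles without error

try:
    compile(with_string, "<string>", "exec")
    string_ok = True
except IndentationError:
    string_ok = False
print(string_ok)   # False
```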

Multi line strings

Like in C, a string can be continued on the next line. Though it’s a bit misleading.

# Sometimes backslash is required:
a =  'blabla' \
     'blabla'

# In other cases it's not needed:
a = ('blabla'
     'blabla')

Finally finally is interesting

There’s nothing especially broken about the finally keyword. I just find this behaviour a bit quirky:

>>> def t():
...     try:
...         return True
...     finally:
...         return False
...
>>> t()
False

It’s worth mentioning that finally couldn’t be combined with except in the same try statement until Python 2.5.

List comprehensions, kind of.

Let’s start with someone else’s opinion:

Tony Garnock-Jones (leastfixedpoint): RT @aconbere:for stmts in python should accept the same filtering ops as list comprehensions [for x in
xs if x] => for x in xs if x:

List comprehensions contain the for keyword. For example this is perfectly normal:

>>> [k for k in range(3) if k != 2]

But it’s not possible to use that syntax in a for statement itself:

>>> for k in range(3) if k != 2: pass # bad!

I have to use an ugly workaround:

>>> for x in [k for k in range(3) if k != 2]: pass

Python readability could be easily improved in this case.
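
Until then, here are two slightly less ugly alternatives, sketched for Python 3 (the list names are only for illustration):

```python
# Option 1: a generator expression avoids building the intermediate list.
kept = []
for k in (k for k in range(3) if k != 2):
    kept.append(k)

# Option 2: filter inside the loop body.
kept_again = []
for k in range(3):
    if k == 2:
        continue
    kept_again.append(k)

print(kept, kept_again)   # [0, 1] [0, 1]
```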

Forgotten features

Ellipsis?

Ellipsis is a little-known constant in Python, just like True, False or None. Apparently Ellipsis is always “bigger” than anything, as opposed to None, which is always “smaller” than anything.

>>> None < -1
True
>>> Ellipsis > 1
True

>>> None < float("-infinity")
True
>>> Ellipsis > float("+infinity")
True

If you wonder why anyone invented Ellipsis consider this example.

>>> a=[1]
>>> a.insert(1,a)
>>> a  # three dots stand for Ellipsis
[1, [...]]

Though I have no clue why I would ever need a construction like that.

Forgotten types

There’s a pretty interesting buffer type. Though I’m not
exactly sure when I would want to use it, nor how to make it writable.

>>> import array
>>> a = array.array('B',[1,2,3])
>>> b = buffer(a)

>>> b[0]
'\x01'
>>> a[0]=2
>>> b[0]
'\x02'

>>> b[0]=3
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
TypeError: buffer is read-only
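
For a writable view, memoryview may be what you want. It is a different type from buffer, so treat the Python 3 sketch below as a related technique and not as a fix for buffer itself.

```python
import array

a = array.array("B", [1, 2, 3])
view = memoryview(a)      # shares memory with the array, no copy

print(view[0])            # 1
a[0] = 2
print(view[0])            # 2

view[0] = 3               # writable, because the array is writable
print(a[0])               # 3
view.release()
```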

Another strange type is slice:

>>> slice
<type 'slice'>

Private attributes on classes

You can have private class members in Python. Though I find this behaviour more of a problem than a solution.

If you ever wondered why implementation of classes is complicated in Python, this can give you an insight into the internal hacks.

>>> class A:
...   _a = 'a'
...   __b = 'b'
...
>>> A._a
'a'
>>> A.__b
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
AttributeError: class A has no attribute '__b'

>>> dir(A)
['_A__b', '__doc__', '__module__', '_a']
>>> A._A__b
'b'
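
The renaming only happens for names written inside the class body, which is why code within the class can still use the short name. A small sketch for Python 3 (read_b is a made-up method for the example):

```python
class A:
    _a = "a"
    __b = "b"            # stored on the class as _A__b

    def read_b(self):
        # Inside the class body, __b is rewritten to _A__b as well,
        # so this lookup succeeds.
        return self.__b


print(A().read_b())          # b
print(hasattr(A, "__b"))     # False
print(A._A__b)               # b
```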

What is the value of one?

Surprisingly, the value of the string "1" is greater than infinity.

>>> "1" > 0
True

>>> "1" > 1
True
>>> "1" > float("+infinity")
True

Don’t make that mistake. Never compare strings and numbers. Oh, one
more thing. Ellipsis doesn’t have the largest value as I previously
suggested:

>>> "1" > Ellipsis > float("+infinity")
True
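
Python 3 refuses to order unrelated types at all, so the comparisons above raise TypeError there. A sketch (try_compare is a hypothetical helper that just captures the error):

```python
def try_compare(left, right):
    """Return the result of left > right, or the TypeError message."""
    try:
        return left > right
    except TypeError as exc:
        return "TypeError: %s" % exc


results = [
    try_compare("1", 0),
    try_compare(Ellipsis, 1),
    try_compare(1, None),
]
for result in results:
    print(result)
```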

Can you hash undefined?

If you ever wondered what the hashes of various constants are, here is the answer:

>>> (1).__hash__()
1
>>> (2).__hash__()
2
>>> None.__hash__()
2030240
>>> Ellipsis.__hash__()
2035288

I noticed that the hashes are different on different machines; they look like memory locations.

Famous threading/Ctrl+C issue

If Ctrl+C doesn’t work on your threaded program, here’s a good explanation why.

Epilogue

There are even more hidden features in Python.

by marek on 29/10/09

  1. In order to understand the quirks in Python you need to understand the language’s history. It sucked more in the beginning, and improved continuously while maintaining backwards-compatibility. Python 3 (which isn’t backwards-compatible) fixes many of these problems.

    A few comments about stuff you mention:

    Sort returns nothing

    Makes sense, list.sort() is destructive, why would you want to return anything? Then again, why would you want to use destructive sort? The only reason is efficiency, but if you want efficiency then you’re unlikely to be using Python (at least not without some kind of homogeneous vector library). I almost never use destructive sort, and when I do I usually add a comment next to it explaining why I’m doing that.

    Tuples constructor

    The tuples constructor is the comma. The parentheses often found around a tuple are there just to group it and separate it from other tokens, but they’re optional.

    >>> x = 1, 2, 3
    >>> x[2]
    3

    Print is magical

    It would be perhaps more correct to say that print is evil. This is one thing that’s fixed in Python 3.

    Inconsistent get interface

    I never thought of it this way, and never heard anyone complaining. get and getattr are very different, and they operate in a very different way. get is for dict-like objects, getattr for all object specializations. In the name of explicit-is-better-than-implicit, I usually write {}.get(1, None), but last week I heard that this is considered unidiomatic, so maybe we just need to memorize the rule.

    999+1 is not 1000

    That’s not really a quirk, and pretty much all modern languages will exhibit something similar. The fact that 5 is 2 + 3 is an implementation detail (lower integers are interned for efficiency). The threshold for when interning stops will be different from one Python implementation to the other. You should never ever use is to compare numbers, it’s meaningless (if you use a linter like PyLint or PyFlakes you’ll get a warning when doing this).

    Multi line strings

    Don’t use the backslash. It’s there, but it’s absolutely unnecessary, and it makes working with the code harder because the parser treats a group of lines separated with backslash as a single line, which is confusing. Also most tools and IDEs are likely to not handle it well. In contemporary Python code you’ll almost never find the backslash used. Instead, if you need to break the line, use parentheses to format your expression.

    Can you hash undefined?

    None and Ellipsis are just simple singleton objects with no intrinsic value, so their hash is memory based.

  3. Finally finally is interesting

    This is the same in java.

    for x in [k for k in range(3) if k != 2]:

    You can use (k for k ….) instead, to get a generator expression rather than a list comprehension, for what it is worth.

  4. the misleading thing about the tuple example is the function call and usage of *args, this has nothing to do with tuples and it would surely be simpler, and easier to understand for a beginner, without the function there at all.

    simply put, the misleading thing about tuples is that, unlike lists, the parentheses do not a tuple make. as noted above, the comma makes the tuple. in fact you do not need the parentheses at all to make a tuple, just the comma.

  6. Great list! All these things have logical explanations yet do appear strange on the surface. I think you should mention the behavior of using a list as a default argument value in a function, this is the number one gotcha for new python programmers.

  7. There are some reasonable explanations in reddit thread:
    http://www.reddit.com/r/programming/comments/9z18w/pythonquirks9991isnot1000circular_imports/

  8. Stephen McDonald
    you should mention the behavior of using a list as a default argument value

    I forgot about this one! There’s also more to say about has_key() function.

  12. “If you ever wondered why implementation of classes is complicated in Python”
    thats not true.

    Slice

    from http://docs.python.org/reference/datamodel.html?highlight=slice :
    Slice objects are used to represent slices when extended slice syntax is used. This is a slice using two colons, or multiple slices or ellipses separated by commas, e.g., a[i:j:step], a[i:j, k:l], or a[..., i:j]. They are also created by the built-in slice() function.

    so, please read documentation before writing…

  13. @fire, I’m confident that Marek knows what slices are. As I understand what he wrote, he is commenting on the interesting and perhaps surprising fact that slices are first-class types. And did you have a specific point to make in support of your flat rejection of the idea that the implementation of classes in Python might be complicated? I look forward to your contribution.

  14. Since when is reference counting deterministic? When a reference is finished with, you can’t know if memory will be freed (expensive operation) or a reference count is reduced (very cheap operation).

  15. @Pete, true; but you do know that objects will be destroyed as soon as possible after they are no longer reachable. It’s a different kind of determinism: resource-use determinism, rather than execution-time determinism. Both have a place in real-time systems; it’d be interesting to explore the relationship between the two.

  16. While you do identify these quirks, many of your “workarounds” are not Pythonic constructs. In effect you’re ignoring the in-language solutions to your problem, apparently, because you don’t like that the problem is there, and therefore are not interested in learning how to work with it.

    I do admit that the criticisms about the relative pathing that occurs during imports is actually crummy and difficult to work around. No argument there, however:

    Example:
    Inconsistent get interface. This is just wrong. Get and Getattr are in no way equivalent. get is for retrieving elements from key-ed structures. getattr is for accessing attributes safely.

    Yes, objects have a __dict__ attribute, which is a key-ed structure, but it only describes instance attributes on an instantiated object. getattr provides a safe interface to retrieve all attributes, whether they are inherited from a parent class or part of the instance itself.

    Multi line strings. Again it’s not a one case to another thing. \ is a line-continuation operator, while () is a wrapping construct that allows for line breaks in an expression. Either can be used to accomplish multi-line statements that would otherwise be considered a syntax error.

    Hashing undefined. “Hashes” of objects in nearly any language are simply an in-memory representation of the object. If all built-in hashes of all objects were portable across all languages, there would be no need for libraries that implement MD5 or similar.. we could all just call the built-in hash functions of our languages.

    List comprehensions, kind of. There is an excellent reason for the division in syntax. For loops are for-loops. Actual control structures, whereas list comprehensions define whole lists as well as generators. For instance:

    [x for x in somelistreturningfunction()]
    is immediately evaluated as an entire list in memory, while
    (x for x in somelistreturningfunction())
    is a generator, which does not evaluate until the list is actually looped over.

    It would stand to reason that since these are distinct constructs that could be passed around as lists that their syntax should not follow the control structure used to loop over said list. In other words:

    for x in [...] != [for x in [...]]
    They’re different constructs for different purposes.

    No doubt some of your considerations are legitimate, but many are just wrong.

  17. Some more corrections.

    Ellipsis is not bigger than anything. It’s just bigger than the types that start with a letter which follows E. Consider this:

    Ellipsis > AttributeError
    False

    That’s because of another quirk of Python — if there is no comparison operator for the types, it compares the names of the types. In hindsight, that was a mistake. Please don’t rely on it.

  18. Also, the way to translate [f(x) for x in y if g(x)] into for statement is to use:

    for x in y:
        if not g(x):
            continue
        f(x)

  19. I noticed that the hashes are different on different machines, it looks like a memory location.

    Actually the default hash of any object is its id(), which is dependent on implementation, but is the memory address in case of CPython.

    Also, please note that reference counting is also an implementation detail of CPython and by no means is part of the language specification.

 
 



2000-14 LShift Ltd, 1st Floor, Hoxton Point, 6 Rufus Street, London, N1 6PE, UK+44 (0)20 7729 7060   Contact us