Python quirks
I’ve been using Python for a while. Recently I have noted some nuances, wonders and counter-intuitive things I ran into. The list grew surprisingly fast.
Disclaimer: Most of the problems that I list here can be understood and explained. It’s just my opinion that something is odd, so forgive me if I raise something that, in your opinion, is not an issue at all.
Functions / Standard library
Sort returns nothing
All my friends who learn Python have a problem with sort. <list>.sort() sorts the list in place and returns None, which causes a lot of confusion. What they usually need is the builtin sorted, which returns a new list.
>>> print [1,0].sort()
None
>>> sorted([1,0])
[0, 1]
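A minimal sketch of the difference, written for Python 3 (the list contents are made up for illustration):

```python
numbers = [3, 1, 2]

# sorted() leaves the original alone and returns a new list.
ordered = sorted(numbers)

# list.sort() reorders the list in place and returns None.
result = numbers.sort()

print(ordered)   # [1, 2, 3]
print(numbers)   # [1, 2, 3]
print(result)    # None
```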
Tuples constructor
Constructing tuples is misleading for beginners. For example foo(1,2) is very different from foo((1,2)). On the other hand foo(1) is the same as foo((1)), because parentheses without a comma do not create a tuple.
>>> def foo(*args):
...     print "%r" % (args,)
...
>>> foo(1,2)
(1, 2)
>>> foo((1,2))
((1, 2),)
But:
>>> foo(1)
(1,)
>>> foo((1))
(1,)
This often appears in conjunction with print:
>>> print "%r" % (1,2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: not all arguments converted during string formatting
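A small sketch of the usual fix, in Python 3 syntax: wrap the tuple in a one-element tuple so that % sees a single argument. The variable names are only for illustration.

```python
pair = (1, 2)

# Wrapping the value in a one-element tuple makes % treat it as one argument.
as_one_argument = "%r" % (pair,)

# Without the wrapper, % sees two arguments for one placeholder.
try:
    "%r" % pair
    failed = False
except TypeError:
    failed = True

# The comma, not the parentheses, is what builds a tuple.
not_a_tuple = (1)
one_tuple = 1,

print(as_one_argument)                                        # (1, 2)
print(failed)                                                 # True
print(type(not_a_tuple).__name__, type(one_tuple).__name__)   # int tuple
```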
Print is magical
I’d love to use pprint.pprint in the same way print is used.
>>> print [1,2]
[1, 2]
>>> pprint.pprint([1,2])
[1, 2]
Unfortunately pprint is not special and has to be called with parentheses. Fortunately this issue was addressed in Python 3, where print is an ordinary function and always requires parentheses too.
Inconsistent get interface
I’m often in doubt about the behaviour of get-like functions, because they are inconsistent. Some of them raise an exception when the item is missing, others just return None. Why None?
>>> print {}.get(1)
None
>>> getattr(1, 'a')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'int' object has no attribute 'a'
>>> print getattr(1, 'a', None)
None
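One way to live with the difference is to always pass an explicit default. The sketch below (Python 3, with a made-up settings dict and a hypothetical describe helper) uses a private sentinel object so that a stored None can be told apart from a missing key.

```python
# A private sentinel lets us tell "missing" apart from a stored None.
_MISSING = object()

settings = {"timeout": None}   # hypothetical data for illustration

def describe(mapping, key):
    value = mapping.get(key, _MISSING)
    if value is _MISSING:
        return "missing"
    return "present: %r" % (value,)

print(describe(settings, "timeout"))   # present: None
print(describe(settings, "retries"))   # missing

# getattr accepts a default in the same way as dict.get.
attribute_missing = getattr(1, "a", _MISSING) is _MISSING
print(attribute_missing)               # True
```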
Inconsistent select/poll interface
It’s less apparent, but still worth pointing out that the select module is inconsistent. Some of its calls take a timeout in seconds, others in milliseconds.
>>> select.select(_, _, _, seconds)
>>> select.poll().poll(milliseconds)
>>> select.epoll().poll(seconds)
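A sketch of how I might hide the unit mismatch behind one helper that always takes seconds. wait_readable is a hypothetical name, not part of the select module, and the code assumes Python 3.

```python
import select
import socket

def wait_readable(sock, timeout_seconds):
    """Hypothetical helper: the timeout is always given in seconds."""
    if hasattr(select, "poll"):
        poller = select.poll()
        poller.register(sock, select.POLLIN)
        # poll() expects milliseconds, so convert here.
        return bool(poller.poll(int(timeout_seconds * 1000)))
    # select() expects seconds already.
    readable, _, _ = select.select([sock], [], [], timeout_seconds)
    return bool(readable)

left, right = socket.socketpair()
nothing_yet = wait_readable(left, 0.01)   # no data sent so far
right.send(b"x")
ready = wait_readable(left, 1.0)          # one byte is waiting
left.close()
right.close()

print(nothing_yet, ready)
```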
Internals
Circular imports
It’s not a surprise that Python doesn’t handle circular imports. But what does that actually mean?
Let’s create two files: a.py and b.py. Let’s import b from a and a from b:
$ cat a.py
var_1 = 1
import b
var_2 = 2
print "Hello world from module a"
print "Imported b, b.var_1=%r b.var_2=%r" \
    % (getattr(b, "var_1", None),
       getattr(b, "var_2", None))
$ cat b.py
var_1 = 1
import a
var_2 = 2
print "Hello world from module b"
print "Imported a, a.var_1=%r a.var_2=%r" \
    % (getattr(a, "var_1", None),
       getattr(a, "var_2", None))
Think for a while about what result you would expect.
$ python -c "import a"
Hello world from module b
Imported a, a.var_1=1 a.var_2=None
Hello world from module a
Imported b, b.var_1=1 b.var_2=2
What actually happened?
- We requested module a. Module a runs.
- a requests module b. Flow goes to b.
- b requests a. Python understands that it’s actually in the middle of creating module a, and gives back a reference to the half-loaded namespace of module a.
- Module b prints out a.var_1, which has the correct value, but a.var_2 is not set yet, so getattr falls back to None.
- After that everything continues normally.
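The steps above can be reproduced with a self-contained script. This is a sketch for Python 3: it writes the two modules into a temporary directory and imports a in a child interpreter. The file contents mirror a.py and b.py, adapted to print() as a function.

```python
import os
import subprocess
import sys
import tempfile

MODULE_TEMPLATE = (
    "var_1 = 1\n"
    "import {other}\n"
    "var_2 = 2\n"
    "print('Hello world from module {name}')\n"
    "print('Imported {other}, var_1=%r var_2=%r'\n"
    "      % (getattr({other}, 'var_1', None), getattr({other}, 'var_2', None)))\n"
)

with tempfile.TemporaryDirectory() as workdir:
    # Create a.py importing b, and b.py importing a.
    for name, other in (("a", "b"), ("b", "a")):
        with open(os.path.join(workdir, name + ".py"), "w") as handle:
            handle.write(MODULE_TEMPLATE.format(name=name, other=other))
    # Run "import a" in a fresh interpreter that can see the directory.
    output = subprocess.run(
        [sys.executable, "-c", "import a"],
        cwd=workdir,
        env=dict(os.environ, PYTHONPATH=workdir),
        capture_output=True,
        text=True,
        check=True,
    ).stdout

print(output)
```

Module b finishes first and sees a without var_2, exactly as in the transcript above.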
Module naming and side effects
It’s often forgotten that a direct import from inside a package, like import a, imports something quite different from the fully qualified import tmp.a. For example, if you created a file /tmp/a.py (plus /tmp/__init__.py, so that tmp is a package):
$ cd /tmp; PYTHONPATH=.. python
>>> import a
>>> import tmp.a
The two commands import different modules from Python’s point of view. If a.py has any side effects, they will be executed twice. A common bug is to use local paths from inside the package, while encouraging users to use fully qualified paths from outside. This leads to double imports. So, if you’re importing local files from inside a Python package, consider the syntax with a dot:
# Imagine we're in a module *tmp*, in a file *b.py*:
import a          # bad, import is different than *tmp.a*
import tmp.a      # better, but we can't rename the module easily
from . import a   # perfect! imports file a.py from _this_ module.
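A sketch of the double import in Python 3. The names demo_pkg and demo_mod are made up; the script builds a throwaway package and puts both its parent directory and the package directory itself on sys.path, which is the situation that triggers the problem.

```python
import importlib
import os
import sys
import tempfile

workdir = tempfile.mkdtemp()
package_dir = os.path.join(workdir, "demo_pkg")     # hypothetical package
os.mkdir(package_dir)
open(os.path.join(package_dir, "__init__.py"), "w").close()
with open(os.path.join(package_dir, "demo_mod.py"), "w") as handle:
    handle.write("print('side effect while loading', __name__)\n")

# Both the parent directory and the package directory are importable roots.
sys.path[:0] = [workdir, package_dir]
importlib.invalidate_caches()

plain = importlib.import_module("demo_mod")              # like "import a"
qualified = importlib.import_module("demo_pkg.demo_mod")  # like "import tmp.a"

same_module = plain is qualified
print(same_module)   # False: the file was loaded twice under two names
```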
What do imports import?
If I need a.b.c.d(), Python requires me to understand which of the parts describe a module, which describe a file inside a module, which describe a class and which a function. In this case I could assume that a.b.c are modules and d() is a global function:
>>> from a.b.c import d
>>> d()
But that can be wrong! a can be a module and b.c.d() can describe a class, a class nested in it and a class nested one level deeper.
>>> from a import b
>>> b.c.d()
That’s not all. In normal cases lacking a proper import causes an exception. Sometimes it doesn’t… My favourite example is os.path. I still don’t know if I should import os or os.path. Both versions work:
>>> import os
>>> os.path.devnull
'/dev/null'
>>> import os.path
>>> os.path.devnull
'/dev/null'
Module reload
Apparently Python does allow you to dynamically reload modules. That’s a pretty neat feature. But in practice it’s not very useful - modules are usually imported from global namespaces and local imports from inside the code are considered slow.
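A rough sketch of a reload in Python 3, where reload lives in importlib (in Python 2 it is a builtin). The module name reload_demo is made up. It also shows one practical limitation: names copied out with from ... import are not refreshed.

```python
import importlib
import os
import sys
import tempfile

workdir = tempfile.mkdtemp()
module_path = os.path.join(workdir, "reload_demo.py")   # hypothetical module
# Avoid writing a .pyc that could look up to date after a quick rewrite.
sys.dont_write_bytecode = True

with open(module_path, "w") as handle:
    handle.write("VALUE = 1\n")

sys.path.insert(0, workdir)
importlib.invalidate_caches()
reload_demo = importlib.import_module("reload_demo")
copied_value = reload_demo.VALUE      # what "from reload_demo import VALUE" would bind

# Change the source on disk, then re-execute it in the same module object.
with open(module_path, "w") as handle:
    handle.write("VALUE = 22\n")
importlib.reload(reload_demo)

print(reload_demo.VALUE)   # 22
print(copied_value)        # 1
```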
Loading order
Python uses several mechanisms for loading modules:
- System modules are loaded from /usr/lib/python2.5.
- Others are in /usr/lib/python2.5/site-packages.
- There’s also /usr/share/python-support.
- And /usr/lib/python-support.
- I haven’t yet mentioned eggs.
- And eggs have *.pth files.
Python Eggs are really dirty. Install a few of them and run:
>>> import sys
>>> sys.path
['', '/usr/lib/python2.5/site-packages/multiprocessing-2.6.2.1-py2.5-linux-x86_64.egg',
'/usr/lib/python2.5/site-packages/amqplib-0.6.1-py2.5.egg', ...]
Yes! Eggs are injected into the loading paths, polluting your system Python installation, and hurting Python startup time.
Btw. I tried to force Python to use eggs from my home directory: it’s painful. Not to mention the problems there are with platform-specific eggs.
999+1 is not 1000
The well-known “feature” of Python integers is that they don’t play nicely with the is operator. Internally, small integers are cached and reused, so is, which compares object identity (in CPython, the memory location), works fine. Larger integers are created as new objects every time, so is fails.
>>> 1 is 1
True
>>> 1000 is 1000
True
>>> 999+1 is 1000
False
>>> 2+1 is 3
True
To make things even worse the behaviour changes between Python versions. For example 100+1 is 101 returns True in Python 2.5 but False in Python 2.4.
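The safe habit is to compare numbers with ==. A small Python 3 sketch; int() on a string is used so the values are built at run time rather than shared as compile-time constants.

```python
a = int("1000")
b = int("1000")

# == compares values and is always the right question for numbers.
equal = a == b
print(equal)   # True

# "a is b" compares identity. Its result depends on the interpreter's
# caching rules, so it is deliberately not relied upon here.

small_a = int("3")
small_b = int("3")
print(small_a == small_b)   # True
```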
The order of unpacking
Have you ever wondered what the order of assignment is during tuple unpacking?
>>> _, _, _ = 1,2,3
>>> _
3
On the other hand this syntax is not allowed in function declarations. Strange.
>>> def a(_, _, _): print _
...
  File "<stdin>", line 1
SyntaxError: duplicate argument '_' in function definition
Speaking of function declarations, there’s a nice feature that allows you to define named parameters before unnamed ones, although this syntax works only for function definitions, not for calls:
>>> def foo(a=1, b=2, *args, **kwargs):
...     print "a=%r b=%r args=%r kwargs=%r" % (a,b,args,kwargs)
...
>>> foo(4,5,6)
a=4 b=5 args=(6,) kwargs={}
>>> foo(a=4, b=5, 6) # I would expect this to work!
  File "<stdin>", line 1
SyntaxError: non-keyword arg after keyword arg
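Going back to the unpacking order: targets are assigned from left to right, after the whole right-hand side has been evaluated. A small Python 3 sketch that makes the order visible (the variable names are mine):

```python
values = [0, 0]

# index is assigned first, so values[index] already means values[1].
index, values[index] = 1, 5
print(index, values)   # 1 [0, 5]

# The right-hand side is evaluated before any assignment,
# which is why the classic swap works.
left, right = "L", "R"
left, right = right, left
print(left, right)     # R L
```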
Python has deterministic garbage collection
Unlike many other dynamic languages, Python (at least CPython) uses reference counting as its main garbage collection mechanism. During normal execution objects are freed at the very moment they lose their last reference. This means that while the program runs Python shouldn’t have any unexpected hiccups! Unlike Java or Erlang, Python can run predictably smoothly. Am I saying that Python is a proper realtime language, and that you could use it in a medical ventilator?
Well, Python does have advanced garbage collection but it’s only used to free cyclic references. With proper programming discipline you can avoid creating reference loops. Oh, and please do avoid setting __del__ destructors, as Python can’t free reference loops with objects that define them.
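Both behaviours can be observed with weak references. This is a sketch that assumes CPython (other implementations may free objects later); Node is a hypothetical class used only here.

```python
import gc
import weakref

class Node:
    """Hypothetical class used only for this demonstration."""
    def __init__(self):
        self.other = None

gc.disable()   # keep the cycle collector from running on its own

single = Node()
single_ref = weakref.ref(single)
del single     # last reference gone: freed immediately by reference counting
freed_at_once = single_ref() is None

first, second = Node(), Node()
first.other, second.other = second, first   # build a reference loop
cycle_ref = weakref.ref(first)
del first, second
alive_before_collect = cycle_ref() is not None

gc.collect()   # the cycle collector finds and frees the loop
freed_after_collect = cycle_ref() is None

gc.enable()
print(freed_at_once, alive_before_collect, freed_after_collect)
```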
__del__ is often misleading
As mentioned above, please just don’t use __del__. But if you have to, you can learn more about it here.
Parser
Multiline comments are not comments
Python doesn’t have multiline comments. Instead, multiline strings are used. This leads to some quirks:
# This works fine:
if True:
    a = 1
#comment
# This fails, though I'd expect it to work:
if True:
    a = 1
'''
multiline comment
'''
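One way the failure shows up, as far as I can tell, is when the block continues after the "comment". A real comment is ignored by the parser, but a string at column 0 is a statement that ends the indented block. The sketch below (Python 3; the source snippets are my own assumption about the problematic case) checks both variants with compile():

```python
WITH_COMMENT = (
    "if True:\n"
    "    a = 1\n"
    "# comment at column 0\n"
    "    b = 2\n"
)
WITH_STRING = (
    "if True:\n"
    "    a = 1\n"
    "'''\n"
    "multiline comment\n"
    "'''\n"
    "    b = 2\n"
)

def compiles(source):
    """Return True if the source parses, False on a syntax error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:   # IndentationError is a subclass of SyntaxError
        return False

print(compiles(WITH_COMMENT))   # True
print(compiles(WITH_STRING))    # False
```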
Multi line strings
Like in C, a string can be continued on the next line. Though it’s a bit misleading.
# Sometimes backslash is required:
a = 'blabla' \
    'blabla'
# In other cases it's not needed:
a = ('blabla'
     'blabla')
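The same implicit joining of adjacent string literals is behind a classic bug: a forgotten comma in a list. A short Python 3 sketch with made-up values:

```python
names = [
    "alpha",
    "beta"      # missing comma: this literal is joined with the next one
    "gamma",
]

print(names)        # ['alpha', 'betagamma']
print(len(names))   # 2
```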
Finally finally is interesting
There’s nothing especially broken about the finally keyword. I just find this behaviour a bit quirky:
>>> def t():
...     try:
...         return True
...     finally:
...         return False
...
>>> t()
False
It’s worth mentioning that finally could not be combined with except in a single try statement until Python 2.5.
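A related detail, sketched in Python 3 with names of my own: the value of return is computed before finally runs, so a finally block that does not return itself cannot change the result by rebinding a variable.

```python
events = []

def keeps_value():
    result = "from try"
    try:
        return result                    # the return value is evaluated here
    finally:
        events.append("finally ran")     # runs before the caller gets the value
        result = "changed in finally"    # rebinding the name no longer matters

returned = keeps_value()
print(returned)   # from try
print(events)     # ['finally ran']
```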
List comprehensions, kind of.
Let’s start with someone else’s opinion:
Tony Garnock-Jones (leastfixedpoint): RT @aconbere:for stmts in python should accept the same filtering ops as list comprehensions [for x in xs if x] => for x in xs if x:
List comprehensions contain the for keyword. For example this is perfectly normal:
>>> [k for k in range(3) if k != 2]
But it’s not possible to use that syntax in a for statement itself:
>>> for k in range(3) if k != 2: pass # bad!
I have to use an ugly workaround:
>>> for x in [k for k in range(3) if k != 2]: pass
Python readability could be easily improved in this case.
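Two slightly less ugly alternatives, sketched in Python 3: a generator expression, which avoids building the temporary list, and an early continue inside a plain loop.

```python
# A generator expression filters lazily, without a temporary list.
kept = []
for k in (k for k in range(3) if k != 2):
    kept.append(k)

# An early continue does the same job inside a plain loop.
also_kept = []
for k in range(3):
    if k == 2:
        continue
    also_kept.append(k)

print(kept, also_kept)   # [0, 1] [0, 1]
```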
Forgotten features
Ellipsis?
Ellipsis is a little known constant in Python, just like True, False or None. Apparently Ellipsis is always “bigger” than anything, as opposed to None, which is always “smaller” than anything.
>>> None < -1
True
>>> Ellipsis > 1
True
>>> None < float("-infinity")
True
>>> Ellipsis > float("+infinity")
True
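If you wonder where else the three dots show up, consider this example. Note that here they are only the way repr marks a list that contains itself; the Ellipsis object itself exists for extended slice syntax such as a[..., 0].
>>> a=[1]
>>> a.insert(1,a)
>>> a # three dots mark the self-reference
[1, [...]]
Though I have no clue why I would ever need a construction like that.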
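To see where the Ellipsis object really appears, index an object with three dots. A Python 3 sketch; Probe is a hypothetical class that just hands back whatever key it receives.

```python
class Probe:
    """Hypothetical class that reports the key it was indexed with."""
    def __getitem__(self, key):
        return key

probe = Probe()
key = probe[..., 1:3]

print(key)                  # (Ellipsis, slice(1, 3, None))
print(key[0] is Ellipsis)   # True
```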
Forgotten types
There’s a pretty interesting buffer type. Though I’m not exactly sure when I would want to use it, nor how to make it writable.
>>> import array
>>> a = array.array('B',[1,2,3])
>>> b = buffer(a)
>>> b[0]
'\x01'
>>> a[0]=2
>>> b[0]
'\x02'
>>> b[0]=3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: buffer is read-only
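For comparison, Python 3 replaces buffer with memoryview, which is writable whenever the underlying object is mutable. A minimal sketch:

```python
import array

data = array.array("B", [1, 2, 3])
view = memoryview(data)     # shares memory with the array, no copy

data[0] = 2
seen_through_view = view[0]
print(seen_through_view)    # 2

view[0] = 3                 # allowed because the array is mutable
print(data[0])              # 3

readonly_view = memoryview(b"abc")
print(readonly_view.readonly)   # True: bytes objects are immutable
```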
Another strange type is slice:
>>> slice
<type 'slice'>
Private attributes on classes
You can have private class members in Python. Though I find this behaviour more a problem than a solution.
If you ever wondered why implementation of classes is complicated in Python, this can give you an insight into the internal hacks.
>>> class A:
...     _a = 'a'
...     __b = 'b'
...
>>> A._a
'a'
>>> A.__b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: class A has no attribute '__b'
>>> dir(A)
['_A__b', '__doc__', '__module__', '_a']
>>> A._A__b
'b'
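The renaming happens at compile time, inside the class body only. A Python 3 sketch with made-up class names that shows what this is good for: a subclass cannot accidentally override the attribute.

```python
class Base:
    __token = "base"            # stored as _Base__token

    def token(self):
        return self.__token     # compiled as self._Base__token

class Child(Base):
    __token = "child"           # stored as _Child__token, no override

print(Child().token())            # base
print(Child._Child__token)        # child
print(hasattr(Base, "__token"))   # False: the unmangled name does not exist
```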
What is the value of one?
Surprisingly, the string "1" compares greater than infinity.
>>> "1" > 0
True
>>> "1" > 1
True
>>> "1" > float("+infinity")
True
Don’t make that mistake. Never compare strings and numbers. Oh, one more thing. Ellipsis doesn’t have the largest value as I previously suggested:
>>> "1" > Ellipsis > float("+infinity")
True
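Python 3 refuses such comparisons outright. A small sketch, assuming Python 3, that also shows the explicit conversions that say which comparison is meant:

```python
try:
    "1" > 0
    outcome = "compared"
except TypeError:
    outcome = "TypeError"

print(outcome)   # TypeError

# Convert explicitly to choose numeric or textual comparison.
numeric = int("1") > 0
textual = "1" > str(0)
print(numeric, textual)   # True True
```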
Can you hash undefined?
If you ever wondered what the hash of various constants is, here is the answer:
>>> (1).__hash__()
1
>>> (2).__hash__()
2
>>> None.__hash__()
2030240
>>> Ellipsis.__hash__()
2035288
I noticed that the hashes of None and Ellipsis are different on different machines; they look like memory locations.
Famous threading/Ctrl+C issue
If Ctrl+C doesn’t work on your threaded program, here’s a good explanation why.
Epilogue
There are even more hidden features in Python.

Tom Berger
on 29/10/09 at 4:58 pm
In order to understand the quirks in Python you need to understand the language’s history. It sucked more in the beginning, and improved continuously while maintaining backwards-compatibility. Python 3 (which isn’t backwards-compatible) fixes many of these problems.
A few comments about stuff you mention:
Makes sense, list.sort() is destructive, why would you want to return anything? Then again, why would you want to use destructive sort? The only reason is efficiency, but if you want efficiency then you’re unlikely to be using Python (at least not without some kind of homogeneous vector library). I almost never use destructive sort, and when I do I usually add a comment next to it explaining why I’m doing that.
The tuples constructor is the comma. The parentheses often found around a tuple are there just to group it and separate it from other tokens, but they’re optional.
It would be perhaps more correct to say that print is evil. This is one thing that’s fixed in Python 3.
I never thought of it this way, and never heard anyone complaining. get and getattr are very different, and they operate in a very different way. get is for dict-like objects, getattr for all object specializations. In the name of explicit-is-better-than-implicit, I usually write {}.get(1, None), but last week I heard that this is considered unidiomatic, so maybe we just need to memorize the rule.
That’s not really a quirk, and pretty much all modern languages will exhibit something similar. The fact that 5 is 2 + 3 is an implementation detail (lower integers are interned for efficiency). The threshold for when interning stops will be different from one Python implementation to the other. You should never ever use is to compare numbers, it’s meaningless (if you use a linter like PyLint or PyFlakes you’ll get a warning when doing this).
Don’t use the backslash. It’s there, but it’s absolutely unnecessary, and it makes working with the code harder because the parser treats a group of lines separated with backslash as a single line, which is confusing. Also most tools and IDEs are likely to not handle it well. In contemporary Python code you’ll almost never find the backslash used. Instead, if you need to break the line, use parentheses to format your expression.
None and Ellipsis are just simple singleton objects with no intrinsic value, so their hash is memory based.
tef
on 29/10/09 at 6:14 pm
This is the same in java.
You can use (k for k ….) instead, to return a generator comprehension, rather than a list comprehension for what it is worth.
ted
on 29/10/09 at 6:32 pm
the misleading thing about the tuple example is the function call and usage of *args, this has nothing to do with tuples and it would surely be simpler, and easier to understand for a beginner without the function there at all.
simply put, the misleading thing about tuples is that-unlike lists-the parentheses do not a tuple make. as noted above, the comma makes the tuple. in fact you do not need the parentheses at all to make a tuple, just the comma.
Stephen McDonald
on 29/10/09 at 7:23 pm
Great list! All these things have logical explanations yet do appear strange on the surface. I think you should mention the behavior of using a list as a default argument value in a function, this is the number one gotcha for new python programmers.
marek
on 29/10/09 at 9:13 pm
There are some reasonable explanations in the reddit thread:
http://www.reddit.com/r/programming/comments/9z18w/pythonquirks9991isnot1000circular_imports/
marek
on 29/10/09 at 9:38 pm
I forgot about this one! There’s also more to say about has_key() function.
fire
on 04/11/09 at 5:54 am
from http://docs.python.org/reference/datamodel.html?highlight=slice :
Slice objects are used to represent slices when extended slice syntax is used. This is a slice using two colons, or multiple slices or ellipses separated by commas, e.g., a[i:j:step], a[i:j, k:l], or a[..., i:j]. They are also created by the built-in slice() function.
so, please read documentation before writing…
tonyg
on 09/11/09 at 6:58 pm
@fire, I’m confident that Marek knows what slices are. As I understand what he wrote, he is commenting on the interesting and perhaps surprising fact that slices are first-class types. And did you have a specific point to make in support of your flat rejection of the idea that the implementation of classes in Python might be complicated? I look forward to your contribution.
Pete
on 17/11/09 at 11:23 am
Since when is reference counting deterministic? When a reference is finished with, you can’t know if memory will be freed (expensive operation) or a reference count is reduced (very cheap operation).
tonyg
on 17/11/09 at 1:05 pm
@Pete, true; but you do know that objects will be destroyed as soon as possible after they are no longer reachable. It’s a different kind of determinism: resource-use determinism, rather than execution-time determinism. Both have a place in real-time systems; it’d be interesting to explore the relationship between the two.