Wrapping some piece of logic in a function provides all the promised benefits: code reuse, scope isolation, a unit-testing boundary, etc. However, as always, there are no unalloyed goods, and I’m particularly partial to Cindy Sridharan’s Small Functions considered Harmful take on the harms of taking this too far. There is also a non-zero performance penalty when there is no optimizing compiler to inline function calls, and out of curiosity I wanted to see what this penalty is for Python by trying to measure the overhead of Python function calls.

I am quite sure there are some benchmarking boobytraps I’m missing, but the approach I took to measure function call overhead was to make a series of increasingly nested function calls and then time how long each one took. For example, if we define

def a():
    return 29

def b():
    return a()

def c():
    return b()


then by timing how much longer b() takes than a(), and c() than b(), and so on, we can attribute that increase to function call overhead.

We can then repeat the pattern of nested function calls for a few different types of functions:

1. Plain no-argument functions (as above).
2. 1 argument functions to see how expensive passing arguments is.
3. Instance methods with no arguments to check the overhead of adding classes to the mix.
4. Instance methods with 1 argument.

The code for the remaining cases looks something like:

def times2_a(x):
    return 2*x

def times2_b(x):
    return times2_a(x)

def times2_c(x):
    return times2_b(x)

class OneLiners:
    def a(self):
        return 29

    def b(self):
        return self.a()

    def c(self):
        return self.b()

    def times2_a(self, x):
        return 2*x

    def times2_b(self, x):
        return self.times2_a(x)

    def times2_c(self, x):
        return self.times2_b(x)


We can then time everything using IPython’s timeit magic.

timing_results = {}

timing_results['Function No Arg'] = []

ti = %timeit -o -q a()
timing_results['Function No Arg'].append(ti)

ti = %timeit -o -q b()
timing_results['Function No Arg'].append(ti)

⋮
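Outside of IPython, roughly the same measurement can be scripted with the standard-library `timeit` module. A minimal sketch (the functions are redefined here so the snippet is self-contained; the call counts are arbitrary):

```python
import timeit

def a():
    return 29

def b():
    return a()

def c():
    return b()

# total seconds for `number` calls, best of `repeat` runs to suppress noise
number = 200_000
per_call_ns = {}
for name, fn in (("a", a), ("b", b), ("c", c)):
    best = min(timeit.repeat(fn, number=number, repeat=5))
    per_call_ns[name] = best / number / 1e-9  # convert seconds/call to ns/call

# the difference between successive depths estimates the per-call overhead
overhead_ns = per_call_ns["b"] - per_call_ns["a"]
```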


We can then plot the timing results as function time versus function depth (up to 10) and run a simple linear regression to estimate the slope, which gives the overhead per function call, assuming it is constant with stack depth.

import platform

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(10, 6))

max_depth = 10

# explicit order because dictionary order is only preserved for Python 3.6+
for ct, label in enumerate(('Function No Arg', 'Function 1 Arg', 'Class Method No Arg', 'Class Method 1 Arg')):
    results = timing_results[label]
    xs = 1 + np.arange(max_depth)
    # the results API from timeit changed at some point
    #     vals = np.array([ti.average/1e-9 for ti in results])
    #     errs = np.array([ti.stdev/1e-9 for ti in results])
    vals = np.array([np.mean(ti.all_runs)/ti.loops/1e-9 for ti in results])
    errs = np.array([np.std(ti.all_runs)/ti.loops/1e-9 for ti in results])
    fit_coeffs, V = np.polyfit(xs, vals, deg=1, cov=True, w=1/errs)
    # the slope of the fit is the per-call overhead; the covariance gives its error
    overhead, overhead_err = fit_coeffs[0], np.sqrt(V[0, 0])
    ax.errorbar(xs, vals, yerr=errs, linestyle="none", elinewidth=3, marker=".",
                label=label + " | {:.0f} +- {:.0f} ns".format(overhead, overhead_err),
                color="C{}".format(ct))
    ax.plot(xs, np.poly1d(fit_coeffs)(xs))
ax.set_xlabel("Function Depth")
ax.set_ylabel("Time per Call (ns)")
ax.set_title("Function Call Overhead in Python {}".format(platform.python_version()))

plt.legend(loc="upper center")
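As a sanity check on the slope-as-overhead idea, here is a standalone sketch on synthetic timings (the 50 ns base and 70 ns per-call cost are made-up numbers, not measurements):

```python
import numpy as np

# synthetic data: a 50 ns base cost plus ~70 ns per extra call level, with noise
depths = np.arange(1, 11)
rng = np.random.default_rng(0)
times_ns = 50 + 70 * depths + rng.normal(0, 2, depths.size)

# slope of time vs. depth recovers the per-call overhead;
# the covariance matrix gives the uncertainty on that slope
coeffs, cov = np.polyfit(depths, times_ns, deg=1, cov=True)
overhead_ns, overhead_err = coeffs[0], np.sqrt(cov[0, 0])
```

With well-behaved data the fitted slope lands close to the 70 ns that was baked in, which is the same extraction the plotting loop above performs on the real timings.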

The results consistently show a function call overhead in the 50-100 ns range, with additional overhead for passing arguments or using class methods. If some code makes only a few function calls while performing logic on a human timescale of many tens of milliseconds, then an overhead of roughly 100 ns per call is probably immaterial. However, if you’re applying some small function to many millions of array elements, this overhead is very material. Of course, going fast at array processing in Python is all about avoiding this penalty by calling out to C via e.g. numpy.
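To illustrate that last point, a quick sketch comparing a per-element Python function against the equivalent vectorized NumPy expression (array size and exact timings are arbitrary and machine-dependent):

```python
import timeit

import numpy as np

def times2(x):
    return 2 * x

xs = np.arange(100_000)

# one Python-level function call per element: pays the call overhead 100,000 times
t_loop = timeit.timeit(lambda: np.array([times2(x) for x in xs]), number=1)

# a single call into NumPy's compiled loop: the overhead is paid once
t_vec = timeit.timeit(lambda: 2 * xs, number=1)
```

On typical hardware the vectorized version is orders of magnitude faster, even though both compute exactly the same result.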