Python Memory Footprint
This is a cross-listed blog post: I originally published it on my company's blog and wanted to repost it here, as it covers a topic that very few Python developers understand. Python has a high memory footprint, and understanding it is the key to writing space-efficient Python programs.
Note: there will be a follow-up to this post discussing the memory footprint of the data structures used by NumPy. NumPy has its own set of data structures that can be more efficient depending on the use case; for scientific computation (matrices, numerical methods, …) and hence Machine Learning (NLP, AI, …), they can be much more space efficient.
This article applies to Python 2.7, 64-bit (32-bit and py3k may be different).
Edit: I added simpler estimate formulas alongside the exact formulas, so the memory footprint can be quickly visualized.
Edit 2: 32-bit Python seems to use around half of the memory; this appears to be due to the use of 32-bit pointers instead of 64-bit ones. That being said, you are then limited to 2GB of memory.
Some developers are unaware of Python's memory footprint and tend to hit walls, especially when they try to load big data into memory instead of using efficient cache-oblivious algorithms and data structures.
This post demonstrates the memory footprint of basic Python objects and data structures. You can use this data to estimate how much memory your program will need, or to lay out your program differently when memory starts to run out. The data was collected using the Python profiler Guppy-PE.
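If you want to reproduce these measurements yourself, the sketch below shows the basic setup (it assumes Guppy-PE is installed). sys.getsizeof from the standard library is a handy cross-check, though for some types it reports a few bytes less than guppy, since it does not account for allocator alignment.

import sys
import guppy

hp = guppy.hpy()

def report(obj):
    # hp.iso() profiles exactly the object passed in, nothing else
    print hp.iso(obj)
    # rough cross-check; may be a few bytes lower than guppy for some types
    print 'sys.getsizeof:', sys.getsizeof(obj)

report(True)
report('a' * 20)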
Boolean and Numerical Types
Boolean (bool): 24 bytes
import guppy
hp = guppy.hpy()

In : hp.iso(True)
Out: Partition of a set of 1 object. Total size = 24 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100       24 100        24 100 bool
Integers (int): 24 bytes
In : hp.iso(1)
Out: Partition of a set of 1 object. Total size = 24 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100       24 100        24 100 int

In : hp.iso(2**62)
Out: Partition of a set of 1 object. Total size = 24 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100       24 100        24 100 int
Long Integers (long): 32 + int(math.log(NUMBER, 2) / 60) * 8 bytes
In : hp.iso(long(0))
Out: Partition of a set of 1 object. Total size = 24 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100       24 100        24 100 long

In : hp.iso(long(2**60))
Out: Partition of a set of 1 object. Total size = 40 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100       40 100        40 100 long

In : hp.iso(2**120)
Out: Partition of a set of 1 object. Total size = 48 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100       48 100        48 100 long

In : hp.iso(2**180)
Out: Partition of a set of 1 object. Total size = 56 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100       56 100        56 100 long

# 32 + int(math.log(abs(0) or 1, 2) / 60) * 8
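For quick estimates, the formula above can be wrapped in a small helper. This is just a sketch of the formula fitted to the measurements shown here, not a statement about CPython internals; note that it over-estimates long(0), which guppy reports at 24 bytes.

import math

def estimate_long_bytes(n):
    # sketch of the formula above: ~32 bytes of base, plus 8 more bytes
    # for every additional ~60 bits of magnitude (abs(n) or 1 avoids log(0))
    return 32 + int(math.log(abs(n) or 1, 2) / 60) * 8

print estimate_long_bytes(long(2 ** 60))   # 40
print estimate_long_bytes(2 ** 120)        # 48
print estimate_long_bytes(2 ** 180)        # 56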
Float: 24 bytes
In : hp.iso(1.0)
Out: Partition of a set of 1 object. Total size = 24 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100       24 100        24 100 float

In : hp.iso(128301289308129083901231.09102783098192083091823089120839012)
Out: Partition of a set of 1 object. Total size = 24 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100       24 100        24 100 float
Decimal (decimal.Decimal): 80 bytes
import decimal

In : hp.iso(decimal.Decimal('1.0'))
Out: Partition of a set of 1 object. Total size = 80 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100       80 100        80 100 decimal.Decimal

In : hp.iso(decimal.Decimal('128301289308129083901231.09102783098192083091823089120839012'))
Out: Partition of a set of 1 object. Total size = 80 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100       80 100        80 100 decimal.Decimal
Strings
Note: in Python 2, concatenating strings via __add__ (essentially the '+' operator) creates intermediate strings, which can grab much more memory than you actually need. The efficient way to join strings is the string join method or '%s' string formatting (for example). Just avoid '+' with strings until you move to Python 3.
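A small illustration of that advice (the names here are arbitrary, the pattern is what matters):

parts = ['chunk%d' % i for i in range(1000)]

# '+' builds a new intermediate string on every iteration
slow = ''
for p in parts:
    slow = slow + p

# str.join builds the final string in one go from the list of pieces
fast = ''.join(parts)

assert slow == fast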
Every 8 chars use 8 bytes, with an initial 40 bytes (for up to 3 chars)
String (str): 40 + ((len(s) - 4) / 8 + 1) * 8 bytes ~= 40 + len(s)
In : hp.iso('a'*3)
Out: Partition of a set of 1 object. Total size = 40 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100       40 100        40 100 str

In : hp.iso('a'*4)
Out: Partition of a set of 1 object. Total size = 48 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100       48 100        48 100 str

In : hp.iso('a'*12)
Out: Partition of a set of 1 object. Total size = 56 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100       56 100        56 100 str

In : hp.iso('a'*20)
Out: Partition of a set of 1 object. Total size = 64 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100       64 100        64 100 str
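The formula can again be turned into a small estimator. This is a sketch that simply reproduces the measurements above (with the Python 2 floor division written explicitly as //):

def estimate_str_bytes(s):
    # 40-byte base covers up to 3 chars, then 8 more bytes per extra 8 chars
    return 40 + ((len(s) - 4) // 8 + 1) * 8

for n in (3, 4, 12, 20):
    print n, estimate_str_bytes('a' * n)   # 40, 48, 56, 64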
Data Structures (Lists, Tuples, Dicts)
The following is just the data structure's own memory usage, not that of what is stored inside it:
Tuple: 56 + 8 * len(t) bytes
In : hp.iso(tuple())
Out: Partition of a set of 1 object. Total size = 56 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100       56 100        56 100 tuple

In : hp.iso(tuple(range(1)))
Out: Partition of a set of 1 object. Total size = 64 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100       64 100        64 100 tuple

In : hp.iso(tuple(range(2)))
Out: Partition of a set of 1 object. Total size = 72 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100       72 100        72 100 tuple

In : hp.iso(tuple(range(100)))
Out: Partition of a set of 1 object. Total size = 856 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100      856 100       856 100 tuple
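As a sketch, the tuple formula is simple enough to inline anywhere you need a quick estimate (the elements themselves are counted separately, as noted above):

def estimate_tuple_bytes(t):
    # 56-byte header plus one 8-byte pointer slot per element
    return 56 + 8 * len(t)

print estimate_tuple_bytes(tuple())             # 56
print estimate_tuple_bytes(tuple(range(100)))   # 856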
List: 72 + 64 * int(1 + (len(l) + 1) / 8) bytes ~= 72 + len(l) * 8
In : hp.iso(list())
Out: Partition of a set of 1 object. Total size = 72 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100       72 100        72 100 list

In : hp.iso(list(range(1)))
Out: Partition of a set of 1 object. Total size = 136 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100      136 100       136 100 list

In : hp.iso(list(range(8)))
Out: Partition of a set of 1 object. Total size = 200 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100      200 100       200 100 list

In : hp.iso(list(range(16)))
Out: Partition of a set of 1 object. Total size = 264 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100      264 100       264 100 list
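A sketch of the list formula follows; note that the measurements show an empty list as just the 72-byte header, so the chunked storage only appears once the list has elements.

def estimate_list_bytes(l):
    # empty list: header only; otherwise guppy reports the pointer
    # storage in 64-byte chunks, as in the formula above
    if not l:
        return 72
    return 72 + 64 * int(1 + (len(l) + 1) // 8)

for n in (0, 1, 8, 16):
    print n, estimate_list_bytes(list(range(n)))   # 72, 136, 200, 264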
Dictionary (dict): memory usage depends on the number of hash buckets; the measurements are below, and here is the pattern that seems to be exhibited. The first 5 elements fit in the initial 280 bytes. The next bucket holds up to 16 (2**4) more elements at 48 bytes per element, the one after that 64 (2**6) more at 36 bytes per element, then 256 (2**8) more at 36 bytes per element, then 1024 (2**10) more at 36 bytes per element, and so on. I have not tried to come up with a formula for this one; feel free to solve it in the comments.
In : hp.iso(dict())
In : hp.iso(dict([(x,None) for x in range(5)]))
Out: Partition of a set of 1 object. Total size = 280 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100      280 100       280 100 dict (no owner)

In : hp.iso(dict([(x,None) for x in range(6)]))
In : hp.iso(dict([(x,None) for x in range(5 + 16)]))
Out: Partition of a set of 1 object. Total size = 1048 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100     1048 100      1048 100 dict (no owner)

In : hp.iso(dict([(x,None) for x in range(6 + 16)]))
In : hp.iso(dict([(x,None) for x in range(5 + 16 + 64)]))
Out: Partition of a set of 1 object. Total size = 3352 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100     3352 100      3352 100 dict (no owner)

In : hp.iso(dict([(x,None) for x in range(6 + 16 + 64)]))
In : hp.iso(dict([(x,None) for x in range(5 + 16 + 64 + 128)]))
Out: Partition of a set of 1 object. Total size = 12568 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100    12568 100     12568 100 dict (no owner)

In : hp.iso(dict([(x,None) for x in range(6 + 16 + 64 + 128)]))
In : hp.iso(dict([(x,None) for x in range(5 + 16 + 64 + 128 + 1024)]))
Out: Partition of a set of 1 object. Total size = 49432 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0      1 100    49432 100     49432 100 dict (no owner)
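Rather than a closed-form formula, here is a sketch that watches the growth empirically with sys.getsizeof, whose dict numbers appear to match guppy's on this interpreter. The jumps above are consistent with roughly 280 bytes of base plus 24 bytes per hash-table slot, with the table growing each time it fills up; treat that as an observation from the data shown here, not a guaranteed formula.

import sys

d = {}
last = sys.getsizeof(d)
print 0, last
for i in range(1500):
    d[i] = None
    size = sys.getsizeof(d)
    if size != last:
        # the dict just resized its hash table
        print len(d), size
        last = size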