我正在编写一个简单的 Python 程序.
I am writing a simple Python program.
我的程序似乎无法线性访问字典,即使算法是二次的,它的运行时间也会呈指数增长.我使用字典来记忆值.这似乎是一个瓶颈.
My program seems to suffer from linear access to dictionaries, its run-time grows exponentially even though the algorithm is quadratic. I use a dictionary to memoize values. That seems to be a bottleneck.
我散列的值是点的元组.每个点为:(x,y), 0 <= x,y <= 50字典中的每个key是: 一个2-5点的元组:((x1,y1),(x2,y2),(x3,y3),(x4,y4))
The values I'm hashing are tuples of points. Each point is: (x,y), 0 <= x,y <= 50 Each key in the dictionary is: A tuple of 2-5 points: ((x1,y1),(x2,y2),(x3,y3),(x4,y4))
键的读取次数比写入次数多很多.
The keys are read many times more often than they are written.
我是否正确地认为 python dicts 会受到此类输入的线性访问时间的影响?
Am I correct that python dicts suffer from linear access times with such inputs?
据我所知,集合有保证的对数访问时间.如何在 Python 中使用集合(或类似的东西)模拟字典?
As far as I know, sets have guaranteed logarithmic access times. How can I simulate dicts using sets(or something similar) in Python?
edit 根据要求,这是记忆功能的(简化)版本:
edit As per request, here's a (simplified) version of the memoization function:
def memoize(fun): memoized = {} def memo(*args): key = args if not key in memoized: memoized[key] = fun(*args) return memoized[key] return memo 推荐答案请参阅时间复杂度.python dict是一个hashmap,如果hash函数不好并导致很多冲突,那么它最坏的情况是O(n).然而,这是一种非常罕见的情况,其中添加的每个项目都具有相同的哈希值,因此被添加到同一链中,这对于主要的 Python 实现来说极不可能.平均时间复杂度当然是O(1).
See Time Complexity. The python dict is a hashmap, its worst case is therefore O(n) if the hash function is bad and results in a lot of collisions. However that is a very rare case where every item added has the same hash and so is added to the same chain which for a major Python implementation would be extremely unlikely. The average time complexity is of course O(1).
最好的方法是检查并查看您正在使用的对象的哈希值.CPython Dict 使用 int PyObject_Hash (PyObject *o) 相当于 hash(o).
The best method would be to check and take a look at the hashs of the objects you are using. The CPython Dict uses int PyObject_Hash (PyObject *o) which is the equivalent of hash(o).
快速检查后,我还没有找到两个散列为相同值的元组,这表明查找是 O(1)
After a quick check, I have not yet managed to find two tuples that hash to the same value, which would indicate that the lookup is O(1)
l = [] for x in range(0, 50): for y in range(0, 50): if hash((x,y)) in l: print "Fail: ", (x,y) l.append(hash((x,y))) print "Test Finished"CodePad(24 小时可用)