# Inverted index optimization - skip lists

```python
a = [1, 2, 3, 6, 9, 11, 45, 67]
b = [4, 6, 13, 45, 69, 98]

# Merge-style intersection of two sorted posting lists:
# advance whichever pointer sits on the smaller value.
i = j = 0
result = []
while i < len(a) and j < len(b):
    if a[i] == b[j]:
        result.append(a[i])
        i += 1
        j += 1
    elif a[i] < b[j]:
        i += 1
    else:
        j += 1

print(result)

# Output
# [6, 45]
```

```
[1, 2, 3, 4, 5, ... 10001, 10005]
[1, 10001, 10008]
```

```python
# Version 1: intersection with fixed-size skips.
a = range(10008)
b = [1, 10001, 10008]

i = j = 0
result = []
step = 100
count = 0  # number of pointer moves (loop iterations)
while i < len(a) and j < len(b):
    if a[i] == b[j]:
        result.append(a[i])
        i += 1
        j += 1
    elif a[i] < b[j]:
        # Jump `step` entries ahead when that cannot overshoot b[j];
        # otherwise fall back to a single step.
        if i + step < len(a) and a[i + step] <= b[j]:
            i += step
        else:
            i += 1
    else:
        # Same idea for the other list.
        if j + step < len(b) and b[j + step] <= a[i]:
            j += step
        else:
            j += 1
    count += 1

print(result)
print(count)

# Version 2: plain merge without skips, for comparison.
a = range(10008)
b = [1, 10001, 10008]
count = 0

i = j = 0
result = []
while i < len(a) and j < len(b):
    if a[i] == b[j]:
        result.append(a[i])
        i += 1
        j += 1
    elif a[i] < b[j]:
        i += 1
    else:
        j += 1
    count += 1

print(result)
print(count)
```

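1. To keep the skipping idea simple, every posting list here is represented as an array. A real implementation would normally use a linked list, which matches how posting lists are stored on disk.

2. The original skip list structure and algorithm are more involved than this, and implementations differ depending on the use case. Because this example does not rely on the skip list's fast lookup, it has no multi-level pointer index. For a full skip list implementation, see: skip list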
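To make the first note concrete, here is a minimal sketch of the same intersection over linked lists with a single level of skip pointers. The names `Node`, `build_postings` and `intersect` are hypothetical, and placing a skip pointer roughly every sqrt(n) nodes is only a common heuristic assumed for illustration, not something the examples above prescribe.

```python
import math


class Node:
    """One posting: a document ID, a next pointer and an optional skip pointer."""

    def __init__(self, doc_id):
        self.doc_id = doc_id
        self.next = None
        self.skip = None  # points several nodes ahead, or stays None


def build_postings(doc_ids):
    """Build a sorted linked list; every `step`-th node gets a skip pointer.

    Assumption: step = floor(sqrt(n)), a common heuristic.
    """
    nodes = [Node(d) for d in sorted(set(doc_ids))]
    for cur, nxt in zip(nodes, nodes[1:]):
        cur.next = nxt
    step = max(1, math.isqrt(len(nodes)))
    for k in range(0, len(nodes) - step, step):
        nodes[k].skip = nodes[k + step]
    return nodes[0] if nodes else None


def intersect(p, q):
    """Intersect two posting lists, taking a skip pointer when it cannot overshoot."""
    result = []
    while p is not None and q is not None:
        if p.doc_id == q.doc_id:
            result.append(p.doc_id)
            p, q = p.next, q.next
        elif p.doc_id < q.doc_id:
            if p.skip is not None and p.skip.doc_id <= q.doc_id:
                p = p.skip
            else:
                p = p.next
        else:
            if q.skip is not None and q.skip.doc_id <= p.doc_id:
                q = q.skip
            else:
                q = q.next
    return result


a = build_postings([1, 2, 3, 6, 9, 11, 45, 67])
b = build_postings([4, 6, 13, 45, 69, 98])
print(intersect(a, b))  # → [6, 45]
```

Unlike the array version, a node can only jump where a skip pointer was stored when the list was built, so the spacing becomes a build-time trade-off: more pointers mean shorter jumps that succeed more often, fewer pointers mean longer jumps that succeed less often.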

## K: Skip list

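A skip list (SkipList) is a randomized data structure, currently used in both Redis and LevelDB. Its efficiency is on par with red-black trees and AVL trees, yet the principle behind it is quite simple: if you are comfortable manipulating linked lists, you can implement a SkipList without much trouble.

Consider a sorted list: searching it for the elements < 23, 43, 59 > takes < 2, 4, 6 > comparisons respectively, for a total of 2 + 4 + 6 = 12 comparisons. Is there a better algorithm? The linked list is sorted, but binary search cannot be used on it. Much as in a binary search tree, we pull some of the nodes out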