Commit 708b004

Use the high-level interface Kombu to connect to multiple queues
1 parent b55607d commit 708b004

File tree

10 files changed, +207 -30 lines changed

README.md

Lines changed: 5 additions & 6 deletions
@@ -3,13 +3,12 @@ pyspider [![Build Status]][Travis CI] [![Coverage Status]][Coverage] [![Try]][De

 A Powerful Spider(Web Crawler) System in Python. **[TRY IT NOW!][Demo]**

-- Write script in python with powerful API
-- Python 2&3
+- Write script in Python
 - Powerful WebUI with script editor, task monitor, project manager and result viewer
-- Javascript pages supported!
-- MySQL, MongoDB, SQLite, PostgreSQL as database backend
-- Task priority, retry, periodical, recrawl by age and more
-- Distributed architecture
+- [MySQL](https://www.mysql.com/), [MongoDB](https://www.mongodb.org/), [Redis](http://redis.io/), [SQLite](https://www.sqlite.org/), [PostgreSQL](http://www.postgresql.org/) with [SQLAlchemy](http://www.sqlalchemy.org/) as database backend
+- [RabbitMQ](http://www.rabbitmq.com/), [Beanstalk](http://kr.github.com/beanstalkd/), [Redis](http://redis.io/) and [Kombu](http://kombu.readthedocs.org/) as message queue
+- Task priority, retry, periodical, recrawl by age, etc...
+- Distributed architecture, Crawl Javascript pages, Python 2&3, etc...

 Documentation: [http://docs.pyspider.org/](http://docs.pyspider.org/)
 Tutorial: [http://docs.pyspider.org/en/latest/tutorial/](http://docs.pyspider.org/en/latest/tutorial/)

docs/Command-Line.md

Lines changed: 12 additions & 9 deletions
@@ -87,15 +87,18 @@ type:
 #### --message-queue

 ```
-rabbitmq:
-amqp://username:password@host:5672/%2F
-Refer: https://www.rabbitmq.com/uri-spec.html
-beanstalk:
-beanstalk://host:11300/
-redis:
-redis://host:6379/db
-builtin:
-None
+rabbitmq:
+amqp://username:password@host:5672/%2F
+see https://www.rabbitmq.com/uri-spec.html
+beanstalk:
+beanstalk://host:11300/
+redis:
+redis://host:6379/db
+kombu:
+kombu+transport://userid:password@hostname:port/virtual_host
+see http://kombu.readthedocs.org/en/latest/userguide/connections.html#urls
+builtin:
+None
 ```

 #### --phantomjs-proxy

docs/Deployment.md

Lines changed: 9 additions & 7 deletions
@@ -78,13 +78,15 @@ type:
 You can use connection URL to specify the message queue:

 ```
-rabbitmq:
-amqp://username:password@host:5672/%2F
-Refer: https://www.rabbitmq.com/uri-spec.html
-beanstalk:
-beanstalk://host:11300/
-redis:
-redis://host:6379/db
+rabbitmq:
+amqp://username:password@host:5672/%2F
+Refer: https://www.rabbitmq.com/uri-spec.html
+beanstalk:
+beanstalk://host:11300/
+redis:
+redis://host:6379/db
+builtin:
+None
 ```

 running

docs/index.md

Lines changed: 5 additions & 6 deletions
@@ -3,13 +3,12 @@ pyspider [![Build Status][Build Status]][Travis CI] [![Coverage Status][Coverage

 A Powerful Spider(Web Crawler) System in Python. **[TRY IT NOW!][Demo]**

-- Write script in python with powerful API
-- Python 2&3
+- Write script in Python
 - Powerful WebUI with script editor, task monitor, project manager and result viewer
-- Javascript pages supported!
-- MySQL, MongoDB, SQLite, PostgreSQL as database backend
-- Task priority, retry, periodical, recrawl by age and more
-- Distributed architecture
+- [MySQL](https://www.mysql.com/), [MongoDB](https://www.mongodb.org/), [Redis](http://redis.io/), [SQLite](https://www.sqlite.org/), [PostgreSQL](http://www.postgresql.org/) with [SQLAlchemy](http://www.sqlalchemy.org/) as database backend
+- [RabbitMQ](http://www.rabbitmq.com/), [Beanstalk](http://kr.github.com/beanstalkd/), [Redis](http://redis.io/) and [Kombu](http://kombu.readthedocs.org/) as message queue
+- Task priority, retry, periodical, recrawl by age, ...
+- Distributed architecture, Crawl Javascript pages, Python 2&3, ...


 Sample Code

pyspider/message_queue/__init__.py

Lines changed: 9 additions & 1 deletion
@@ -20,11 +20,14 @@ def connect_message_queue(name, url=None, maxsize=0):

     rabbitmq:
         amqp://username:password@host:5672/%2F
-        Refer: https://www.rabbitmq.com/uri-spec.html
+        see https://www.rabbitmq.com/uri-spec.html
     beanstalk:
         beanstalk://host:11300/
     redis:
         redis://host:6379/db
+    kombu:
+        kombu+transport://userid:password@hostname:port/virtual_host
+        see http://kombu.readthedocs.org/en/latest/userguide/connections.html#urls
     builtin:
         None
     """
@@ -49,5 +52,10 @@ def connect_message_queue(name, url=None, maxsize=0):
             db = 0

         return Queue(name, parsed.hostname, parsed.port, db=db, maxsize=maxsize)
+    else:
+        if url.startswith('kombu+'):
+            url = url[len('kombu+'):]
+            from .kombu_queue import Queue
+            return Queue(name, url, maxsize=maxsize)

     raise Exception('unknow connection url: %s', url)
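
The new `else` branch above routes any URL that starts with `kombu+` to the Kombu-backed queue after stripping that prefix. A minimal sketch of that routing follows, assuming the earlier branches (not all visible in this hunk) dispatch on the parsed scheme and that an empty URL selects the builtin queue. The helper name `resolve_queue_backend` and its return shape are hypothetical, and no queue class is imported:

```python
from urllib.parse import urlparse


def resolve_queue_backend(url):
    """Return (backend_name, url_for_backend) for a message-queue URL.

    Hypothetical helper for illustration only. It mirrors the branch order
    suggested by the hunk above but does not construct any queue object.
    """
    if not url:
        # Assumption: an empty URL selects the builtin in-process queue.
        return "builtin", None
    scheme = urlparse(url).scheme
    if scheme == "amqp":
        return "rabbitmq", url
    if scheme == "beanstalk":
        return "beanstalk", url
    if scheme == "redis":
        return "redis", url
    if url.startswith("kombu+"):
        # Kombu receives the URL without the "kombu+" marker, so
        # "kombu+redis://host" reaches it as "redis://host".
        return "kombu", url[len("kombu+"):]
    raise ValueError("unknown connection url: %s" % url)
```

Because the native `amqp`, `beanstalk` and `redis` schemes are checked first, a plain `redis://` URL keeps using the dedicated Redis queue, and only the explicit `kombu+redis://` form goes through Kombu.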

pyspider/message_queue/kombu_queue.py

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
+#!/usr/bin/env python
+# -*- encoding: utf-8 -*-
+# vim: set et sw=4 ts=4 sts=4 ff=unix fenc=utf8:
+# Author: Binux<roy@binux.me>
+# http://binux.me
+# Created on 2015-05-22 20:54:01
+
+import time
+import umsgpack
+from kombu import Connection, enable_insecure_serializers
+from kombu.serialization import register
+from kombu.exceptions import ChannelError
+from six.moves import queue as BaseQueue
+
+
+register('umsgpack', umsgpack.packb, umsgpack.unpackb, 'application/x-msgpack')
+enable_insecure_serializers(['umsgpack'])
+
+
+class KombuQueue(object):
+    """
+    kombu is a high-level interface for multiple message queue backends.
+
+    KombuQueue is built on top of kombu API.
+    """
+
+    Empty = BaseQueue.Empty
+    Full = BaseQueue.Full
+    max_timeout = 0.3
+
+    def __init__(self, name, url="amqp://", maxsize=0, lazy_limit=True):
+        """
+        Constructor for KombuQueue
+
+        url: http://kombu.readthedocs.org/en/latest/userguide/connections.html#urls
+        maxsize: an integer that sets the upperbound limit on the number of
+                 items that can be placed in the queue.
+        """
+        self.name = name
+        self.conn = Connection(url)
+        self.queue = self.conn.SimpleQueue(self.name, no_ack=True, serializer='umsgpack')
+
+        self.maxsize = maxsize
+        self.lazy_limit = lazy_limit
+        if self.lazy_limit and self.maxsize:
+            self.qsize_diff_limit = int(self.maxsize * 0.1)
+        else:
+            self.qsize_diff_limit = 0
+        self.qsize_diff = 0
+
+    def qsize(self):
+        try:
+            return self.queue.qsize()
+        except ChannelError:
+            return 0
+
+    def empty(self):
+        if self.qsize() == 0:
+            return True
+        else:
+            return False
+
+    def full(self):
+        if self.maxsize and self.qsize() >= self.maxsize:
+            return True
+        else:
+            return False
+
+    def put(self, obj, block=True, timeout=None):
+        if not block:
+            return self.put_nowait(obj)
+
+        start_time = time.time()
+        while True:
+            try:
+                return self.put_nowait(obj)
+            except BaseQueue.Full:
+                if timeout:
+                    lasted = time.time() - start_time
+                    if timeout > lasted:
+                        time.sleep(min(self.max_timeout, timeout - lasted))
+                    else:
+                        raise
+                else:
+                    time.sleep(self.max_timeout)
+
+    def put_nowait(self, obj):
+        if self.lazy_limit and self.qsize_diff < self.qsize_diff_limit:
+            pass
+        elif self.full():
+            raise BaseQueue.Full
+        else:
+            self.qsize_diff = 0
+        return self.queue.put(obj)
+
+    def get(self, block=True, timeout=None):
+        try:
+            ret = self.queue.get(block, timeout)
+            return ret.payload
+        except self.queue.Empty:
+            raise BaseQueue.Empty
+
+    def get_nowait(self):
+        try:
+            ret = self.queue.get_nowait()
+            return ret.payload
+        except self.queue.Empty:
+            raise BaseQueue.Empty
+
+    def delete(self):
+        self.queue.queue.delete()
+
+    def __del__(self):
+        self.queue.close()
+
+
+Queue = KombuQueue
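
The blocking `put` in this file is built as a polling loop around `put_nowait`. It retries every `max_timeout` seconds and re-raises `Full` once the caller's `timeout` has elapsed. The sketch below reproduces that loop against a toy in-memory stand-in so it runs without a broker. The class `BoundedList` is hypothetical, the polling interval is shortened from 0.3 s to 0.01 s, and the non-blocking branch forwards `obj` to `put_nowait`:

```python
import queue
import time


class BoundedList:
    """Toy stand-in for a remote queue (hypothetical, for illustration only)."""

    max_timeout = 0.01  # polling interval; the commit uses 0.3 seconds

    def __init__(self, maxsize=0):
        self.maxsize = maxsize
        self.items = []

    def full(self):
        # maxsize == 0 means "unbounded", as in the standard queue module.
        return bool(self.maxsize) and len(self.items) >= self.maxsize

    def put_nowait(self, obj):
        if self.full():
            raise queue.Full
        self.items.append(obj)

    def put(self, obj, block=True, timeout=None):
        if not block:
            return self.put_nowait(obj)

        start_time = time.time()
        while True:
            try:
                return self.put_nowait(obj)
            except queue.Full:
                if timeout:
                    lasted = time.time() - start_time
                    if timeout > lasted:
                        # Sleep no longer than the time that is left.
                        time.sleep(min(self.max_timeout, timeout - lasted))
                    else:
                        raise
                else:
                    # No timeout given: keep polling until space appears.
                    time.sleep(self.max_timeout)
```

With `maxsize=1`, a second `put(..., timeout=0.05)` polls a few times and then raises `queue.Full`, while `put(..., block=False)` raises immediately.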

pyspider/message_queue/redis_queue.py

Lines changed: 0 additions & 1 deletion
@@ -54,7 +54,6 @@ def full(self):

     def put_nowait(self, obj):
         if self.lazy_limit and self.last_qsize < self.maxsize:
-            print(self.name, self.last_qsize)
             pass
         elif self.full():
             raise self.Full

requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -18,3 +18,4 @@ SQLAlchemy>=0.9.7
 six
 amqp>=1.3.0
 redis
+kombu

setup.py

Lines changed: 1 addition & 0 deletions
@@ -78,6 +78,7 @@
             'unittest2>=0.5.1',
             'SQLAlchemy>=0.9.7',
             'redis',
+            'kombu',
         ],
     },

tests/test_message_queue.py

Lines changed: 48 additions & 0 deletions
@@ -169,3 +169,51 @@ def tearDownClass(self):
             self.q2.get()
         while not self.q3.empty():
             self.q3.get()
+
+class TestKombuQueue(TestMessageQueue, unittest.TestCase):
+    kombu_url = 'kombu+memory://'
+
+    @classmethod
+    def setUpClass(self):
+        from pyspider.message_queue import connect_message_queue
+        with utils.timeout(3):
+            self.q1 = connect_message_queue('test_queue', self.kombu_url, maxsize=5)
+            self.q2 = connect_message_queue('test_queue', self.kombu_url, maxsize=5)
+            self.q3 = connect_message_queue('test_queue_for_threading_test', self.kombu_url)
+            while not self.q1.empty():
+                self.q1.get()
+            while not self.q2.empty():
+                self.q2.get()
+            while not self.q3.empty():
+                self.q3.get()
+
+    @classmethod
+    def tearDownClass(self):
+        while not self.q1.empty():
+            self.q1.get()
+        self.q1.delete()
+        while not self.q2.empty():
+            self.q2.get()
+        self.q2.delete()
+        while not self.q3.empty():
+            self.q3.get()
+        self.q3.delete()
+
+@unittest.skip('test cannot pass, get is buffered')
+@unittest.skipIf(os.environ.get('IGNORE_RABBITMQ'), 'no rabbitmq server for test.')
+class TestKombuAmpqQueue(TestKombuQueue):
+    kombu_url = 'kombu+amqp://'
+
+@unittest.skip('test cannot pass, put is buffered')
+@unittest.skipIf(os.environ.get('IGNORE_REDIS'), 'no redis server for test.')
+class TestKombuRedisQueue(TestKombuQueue):
+    kombu_url = 'kombu+redis://'
+
+@unittest.skip('test cannot pass, get is buffered')
+@unittest.skipIf(os.environ.get('IGNORE_BEANSTALK'), 'no beanstalk server for test.')
+class TestKombuBeanstalkQueue(TestKombuQueue):
+    kombu_url = 'kombu+beanstalk://'
+
+@unittest.skipIf(os.environ.get('IGNORE_MONGODB'), 'no mongodb server for test.')
+class TestKombuMongoDBQueue(TestKombuQueue):
+    kombu_url = 'kombu+mongodb://'
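
Three of the backend-specific subclasses above stack an unconditional `@unittest.skip` on top of `@unittest.skipIf`. The sketch below uses hypothetical class names and a hypothetical `run` helper, and assumes nothing beyond the standard `unittest` module. It illustrates the effect: the unconditional skip applies whatever the environment variable says, and the parent class is unaffected:

```python
import unittest


class Base(unittest.TestCase):
    """Hypothetical parent test case with one passing test."""

    def test_ok(self):
        self.assertTrue(True)


@unittest.skip("always skipped")
@unittest.skipIf(False, "condition not met")
class Stacked(Base):
    """Inherits test_ok, but the outer decorator marks the class as skipped."""


def run(case):
    """Run every test of one TestCase class and return the TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Running `run(Stacked)` records the inherited test as skipped with the reason from `@unittest.skip`, while `run(Base)` still executes it. So in this commit, only the `memory` and `mongodb` Kombu transports would be exercised by the suite.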
