I downloaded a Python crawler program from the internet. How do I run it?

It is a Douban Books crawler program.

Addendum 1 · 2016-07-02 18:51:15 +08:00

I switched to a Linux environment and it runs now. After running $ python doubanSpider.py it just keeps downloading. What does that mean?
/usr/local/lib/python2.7/dist-packages/bs4/__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

BeautifulSoup([your markup])

to this:

BeautifulSoup([your markup], "html.parser")

markup_type=markup_type))
Downloading Information From Page 1
Downloading Information From Page 2
Downloading Information From Page 3
Downloading Information From Page 4
Downloading Information From Page 5
Downloading Information From Page 6
WARNING:root:Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
Downloading Information From Page 7
Downloading Information From Page 8
Downloading Information From Page 9
Downloading Information From Page 10
Downloading Information From Page 11
Downloading Information From Page 12
Downloading Information From Page 13
Downloading Information From Page 14
Downloading Information From Page 15
Downloading Information From Page 16
Downloading Information From Page 17
Downloading Information From Page 18
Downloading Information From Page 19
Downloading Information From Page 20
Downloading Information From Page 21
Downloading Information From Page 22
Downloading Information From Page 23
Downloading Information From Page 24

15 replies • 2016-07-03 20:15:38 +08:00

upczww

2016-07-01 14:09:51 +08:00 via Smartisan T1

How is anyone supposed to help you without the code?

grey5659

2016-07-01 14:15:11 +08:00

@upczww It's this one: http://blog.csdn.net/lanbing510/article/details/45887075 How do I access the .py file?

AnonymousID

2016-07-01 14:23:36 +08:00 via Android

Don't you just run that .py file directly?

grey5659

2016-07-01 14:24:41 +08:00

@AnonymousID Really? Is that all there is to it?

AnonymousID

2016-07-01 14:30:15 +08:00 via Android

@grey5659 It obviously is. Isn't that the only file that can be executed?

AnonymousID

2016-07-01 14:31:05 +08:00 via Android

My previous reply dropped a character: "执" should have read "执行" (execute).

niboy

2016-07-01 15:14:29 +08:00

First install Python: download it from python.org and install it. Then either double-click the Python file to run it, or run python ***.py from the command line.

grey5659

2016-07-01 15:20:43 +08:00

@niboy I installed it, but double-clicking only makes a window flash and disappear. I renamed the file to doubanSpider.pyw, opened it in IDLE, and chose Run Module, which reports:
Traceback (most recent call last):
  File "C:\Users\lenovo\Desktop\DouBanSpider-master\doubanSpider.pyw", line 7, in <module>
    import requests
ImportError: No module named requests

niboy

2016-07-01 15:27:38 +08:00

@grey5659
You are missing the requests dependency. http://blog.csdn.net/alpha5/article/details/24964009

You will have to sort out the rest yourself, for example these:
import numpy as np
from bs4 import BeautifulSoup
from openpyxl import Workbook
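
The imports quoted in this thread (requests, numpy, bs4, openpyxl) can be checked in one pass instead of fixing one ImportError at a time. The following is a minimal sketch, not part of doubanSpider.py: the helper name find_missing and the REQUIRED table are made up for illustration, and the list covers only the modules mentioned here, so the script may need others.

```python
import importlib

# Import name -> name to pass to pip. Only bs4 differs: the package is
# published on PyPI as "beautifulsoup4". This table is an assumption based
# on the imports quoted in the thread, not a complete dependency list.
REQUIRED = {
    "requests": "requests",
    "numpy": "numpy",
    "bs4": "beautifulsoup4",
    "openpyxl": "openpyxl",
}


def find_missing(required):
    """Return the pip names of the modules that cannot be imported."""
    missing = []
    for module_name, pip_name in sorted(required.items()):
        try:
            importlib.import_module(module_name)
        except ImportError:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    if missing:
        print("Missing packages; install them with:")
        print("    pip install " + " ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

Run it with the same interpreter that will run the crawler; a package installed for a different Python installation will still show up as missing.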

upczww

2016-07-01 15:39:17 +08:00

It's just one file; simply run it directly.

ksupertu

2016-07-01 23:27:40 +08:00

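Install Python 2.7, then run pip install requests in cmd. To save yourself trouble, set up an Ubuntu virtual machine and run it there; otherwise the assorted bugs on Windows will drive you crazy.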

luyuncheng

2016-07-01 23:28:04 +08:00

Shouldn't you learn the basics of Python first?

grey5659

2016-07-02 18:46:47 +08:00

@luyuncheng I'm only using it as a tool.

grey5659

2016-07-02 18:48:54 +08:00

@niboy @ksupertu
I switched to a Linux environment and it runs now. After running $ python doubanSpider.py it just keeps downloading. What does that mean?

ksupertu

2016-07-03 20:15:38 +08:00 via Android

Nothing serious. The BeautifulSoup library is just emitting a warning because no HTML parser was specified explicitly. The crawler is already working.
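
To silence the warning, the parser name can be passed as the second argument, as the warning text itself suggests. A minimal sketch follows; the markup string is a made-up stand-in for a downloaded page, and where doubanSpider.py actually constructs its BeautifulSoup object is not shown in this thread.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a page the crawler downloaded.
html = "<html><head><title>Douban Books</title></head><body><p>demo</p></body></html>"

# Naming the parser explicitly avoids the "No parser was explicitly
# specified" warning and keeps behaviour the same across machines.
# "html.parser" is the parser built into the Python standard library.
soup = BeautifulSoup(html, "html.parser")

print(soup.title.string)
```

Only the constructor call changes; the rest of the parsing code stays as it is.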