Scraping a web page: the data can't be decoded, and it doesn't look like a wrong encoding - V2EX

This topic was created 3056 days ago; the information mentioned may have changed since then.

I'm a beginner... The code looks like this:

from html.parser import HTMLParser
import urllib.request
import chardet

pars = HTMLParser()
home_url = "https://wallstreetcn.com/"
response = urllib.request.urlopen(home_url)
content = response.read()
encoding = chardet.detect(content)
pars.feed(content.decode(encoding["encoding"],errors="ignore"))

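In Chrome, the page's metadata says the charset is utf-8, but whether I pass 'utf-8' directly or use the detected encoding, the content never decodes correctly. It feels as if the response simply isn't returning the right data. Any advice?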

7 replies • 2017-12-27 15:59:04 +08:00

1

n329291362

Dec 17, 2017

1

It starts with 1f8b... that's gzip compression, the simplest case.
import gzip
gzip.decompress(content)
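
A minimal sketch of the check described in this reply, assuming the body is either plain bytes or a gzip stream. The helper name `maybe_gunzip` is hypothetical, and the sample body is generated locally instead of being fetched from the site:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of a gzip stream


def maybe_gunzip(body: bytes) -> bytes:
    """Decompress the body if it starts with the gzip magic bytes.

    Hypothetical helper: anything that does not start with 1f 8b is
    returned unchanged.
    """
    if body[:2] == GZIP_MAGIC:
        return gzip.decompress(body)
    return body


# Stand-in for response.read(); no network access is needed here.
raw = gzip.compress("<html>华尔街见闻</html>".encode("utf-8"))
html = maybe_gunzip(raw).decode("utf-8")
print(html)
```

Once the body has been decompressed, decoding it as utf-8 (the charset the page declares) should work as expected.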

2

swordspoet

Dec 18, 2017 via iPhone

Switch to a different HTML parser. html.parser isn't very fault-tolerant; give lxml a try.

3

swordspoet

Dec 18, 2017 via iPhone

from bs4 import BeautifulSoup

standard_html = BeautifulSoup(content, 'lxml')

Give this a try~

4

free9fw

Dec 18, 2017

Accept-Encoding:gzip, deflate, br

5

hukangha

OP

Dec 18, 2017

@n329291362
Sure enough... But what if it's some other kind of compression? Is lots of experience the only thing to go on?
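
One way to avoid guessing is to read the response's Content-Encoding header and pick the decompressor from it. The sketch below assumes only the standard library, so it handles gzip and deflate; `br` (Brotli) would need a third-party package and is reported as unsupported here. The function name `decode_body` is hypothetical:

```python
import gzip
import zlib
from typing import Optional


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo the compression named by a Content-Encoding header value.

    Hypothetical helper covering only what the standard library supports.
    """
    enc = (content_encoding or "").strip().lower()
    if enc in ("", "identity"):
        return body  # not compressed
    if enc in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if enc == "deflate":
        try:
            return zlib.decompress(body)  # zlib-wrapped deflate
        except zlib.error:
            # Some servers send raw deflate without the zlib header.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    raise ValueError(f"unsupported Content-Encoding: {enc!r}")


# With urllib this could be called as:
#   decode_body(response.read(), response.headers.get("Content-Encoding"))
print(decode_body(gzip.compress(b"<html></html>"), "gzip"))
```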

6

n329291362

Dec 19, 2017

@hukangha
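You can also set the Accept-Encoding header so that the server returns uncompressed data.
Web pages are gzip-compressed at most; however complicated it gets, it won't get much more complicated than that.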
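
A sketch of the header approach with `urllib.request`, asking for an uncompressed body via `Accept-Encoding: identity`. The helper names are hypothetical, and a server is not obliged to honor the header, so checking the response's Content-Encoding remains a sensible safeguard:

```python
import urllib.request


def build_uncompressed_request(url: str) -> urllib.request.Request:
    """Build a request that asks the server not to compress the body."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "identity"})


def fetch_text(url: str, charset: str = "utf-8") -> str:
    """Fetch a page and decode it (performs a real network request).

    Assumes the server honors the header and the page uses `charset`.
    """
    with urllib.request.urlopen(build_uncompressed_request(url)) as response:
        return response.read().decode(charset)


req = build_uncompressed_request("https://wallstreetcn.com/")
# urllib normalizes stored header names to "Accept-encoding".
print(req.get_header("Accept-encoding"))
```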

7

F1024

Dec 27, 2017

import os
import requests

html = requests.get('https://wallstreetcn.com').content.decode('utf-8')
print(html)

os.system("pause")

Is this what you're after?
