Take this post, for example:
https://www.v2ex.com/t/220720#r_2427082
The comma was matched as part of the URL.
According to RFC 2396
http://www.ietf.org/rfc/rfc2396.txt (see also
http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html ), section 2.1, "URI and non-ASCII characters":
In the simplest case, the original character sequence contains only
characters that are defined in US-ASCII, and the two levels of
mapping are simple and easily invertible: each 'original character'
is represented as the octet for the US-ASCII code for it, which is,
in turn, represented as either the US-ASCII character, or else the
"%" escape sequence for that octet.
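Of course, the text that follows also describes a more complicated case. But since browsers automatically %-escape non-ASCII characters when a URL is copied, that "complicated case" can be almost entirely ignored.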
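
To make the idea concrete, here is a rough sketch of a matcher that only accepts the US-ASCII characters RFC 2396 allows in a URI (unreserved, reserved, the "%" escape, plus "#" for the fragment). The names `URL_RE` and `find_urls` are made up for illustration, this is not how V2EX actually links URLs, and it assumes the comma in question is a full-width (non-ASCII) one:

```python
import re

# Hypothetical pattern, not V2EX's real implementation.
# Character class = RFC 2396 unreserved + reserved + "%" (escapes) + "#" (fragment).
URL_RE = re.compile(r"https?://[A-Za-z0-9\-_.!~*'();/?:@&=+$,%#]+")


def find_urls(text):
    """Return every URL-like run made up only of ASCII URI characters."""
    return URL_RE.findall(text)


if __name__ == "__main__":
    # The match stops at the full-width comma because it is not US-ASCII.
    print(find_urls("见 https://www.v2ex.com/t/220720#r_2427082，逗号"))
```

Note that an ASCII "," is a reserved character in RFC 2396, so a pattern like this would still swallow a trailing ASCII comma. Trimming trailing punctuation would need to be a separate step.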