Distant-Supervision
Definition
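(Machine learning / natural language processing) A form of weak supervision in which training data are not annotated by hand, example by example. Instead, existing external resources (such as knowledge bases, dictionaries, rules, or web content) are used to generate noisy labels automatically, and those labels are then used to train a model. It is commonly applied to relation extraction, text classification, named entity recognition, and similar tasks. (Also often written distant supervision, without the hyphen.)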
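As a rough illustration of labels coming from a dictionary rather than from annotators, the sketch below tags named entities by looking up token spans in a small gazetteer. The function name `dictionary_label`, the toy gazetteer, the BIO tag scheme, and the greedy longest-match rule are all illustrative assumptions, not a standard method.

```python
from typing import Dict, List


def dictionary_label(tokens: List[str], gazetteer: Dict[str, str]) -> List[str]:
    """Hypothetical helper: assign BIO tags by greedy longest-match lookup
    of token spans in a gazetteer (entity name -> entity type)."""
    # Split each gazetteer entry into a token tuple so multi-word names can match.
    entries = {tuple(name.split()): etype for name, etype in gazetteer.items()}
    max_len = max((len(k) for k in entries), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, so "Barack Obama" beats "Barack".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in entries:
                etype = entries[span]
                tags[i] = f"B-{etype}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{etype}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Toy gazetteer standing in for an external resource (assumed data).
gaz = {"Barack Obama": "PER", "Honolulu": "LOC", "Washington": "LOC"}
print(dictionary_label("Barack Obama was born in Honolulu".split(), gaz))
print(dictionary_label("George Washington was president".split(), gaz))
```

The second call shows why such labels are noisy: "Washington" is tagged `B-LOC` even though, in this sentence, it is part of a person's name, and "George" is left unlabeled because it is not in the dictionary.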
Pronunciation (IPA)
/ˈdɪstənt ˌsuːpɚˈvɪʒən/
Examples
Distant supervision can reduce the cost of labeling data.
In relation extraction, distant supervision aligns knowledge-base facts with sentences, but the automatically generated labels often contain noise that the model must learn to handle.
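The alignment step in the relation-extraction example can be sketched as follows, under simplifying assumptions: the knowledge base is a set of `(head, relation, tail)` triples, entity mentions are detected by plain substring matching, and every sentence containing both entities of a triple inherits that triple's relation. The names `distant_label` and `group_into_bags` are hypothetical; grouping sentences by entity pair into "bags" is one common way to prepare the data for noise-tolerant training, not the only one.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)
Example = Tuple[str, str, str, str]  # (sentence, head, tail, relation)


def distant_label(sentences: List[str], kb: Set[Triple]) -> List[Example]:
    """Hypothetical helper: label every sentence that mentions both entities
    of a KB triple with that triple's relation (naive substring matching)."""
    examples = []
    for sent in sentences:
        for head, rel, tail in sorted(kb):  # sorted for deterministic output
            if head in sent and tail in sent:
                examples.append((sent, head, tail, rel))
    return examples


def group_into_bags(examples: List[Example]) -> Dict[Tuple[str, str, str], List[str]]:
    """Collect all sentences aligned to the same (head, tail, relation)."""
    bags: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    for sent, head, tail, rel in examples:
        bags[(head, tail, rel)].append(sent)
    return dict(bags)


# Toy knowledge base and corpus (assumed data).
kb = {("Barack Obama", "born_in", "Honolulu")}
sentences = [
    "Barack Obama was born in Honolulu.",
    "Barack Obama visited Honolulu in 2010.",  # aligned, but not a birth fact
    "Honolulu is in Hawaii.",                  # only one entity, so not aligned
]
examples = distant_label(sentences, kb)
bags = group_into_bags(examples)
```

Both of the first two sentences receive the label `born_in`, although only the first actually expresses it. This is the kind of label noise the example sentence refers to. A model trained on the bags can, for instance, assume only that at least one sentence in each bag expresses the relation, rather than trusting every label.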
Etymology
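*distant* means "indirect, at a remove", and *supervision* means "oversight, guidance". Together they stress that the supervision signal does not come from close, item-by-item human checking; it is obtained indirectly, one step removed, from external resources, which makes it usually cheaper but noisier. The term became established in machine learning and information extraction, where Mintz et al. (2009) helped popularize it for relation extraction.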
Related Words
Literary/Notable Works
- Mintz, M., Bills, S., Snow, R., & Jurafsky, D. (2009). *Distant supervision for relation extraction without labeled data.*
- Riedel, S., Yao, L., & McCallum, A. (2010). *Modeling relations and their mentions without labeled text.* (Often regarded as a key work on distantly supervised relation extraction; it relaxes the assumption that every sentence mentioning an entity pair expresses the knowledge-base relation.)
- Hoffmann, R., Zhang, C., Ling, X., Zettlemoyer, L., & Weld, D. S. (2011). *Knowledge-based weak supervision for information extraction of overlapping relations.*
- Ratner, A., Bach, S., Ehrenberg, H., Fries, J., Wu, S., & Ré, C. (2016). *Data programming: Creating large training sets, quickly.* (A representative framework closely related to weak and distant supervision.)