Text Mining (English Edition): a book published by Posts & Telecom Press in 2009

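Text Mining (English Edition) is a book by Ronen Feldman, published by Posts & Telecom Press in August 2009. It covers core text mining operations, text mining preprocessing techniques, categorization, clustering, information extraction, probabilistic models for information extraction, preprocessing applications, visualization approaches, link analysis, and text mining applications, combining the theory and practice of text mining.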

Synopsis

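Text Mining (English Edition) is a classic work in the field of text mining, written by world-renowned authorities. It is well suited to researchers and practitioners in text mining and information retrieval, and can also serve as a textbook for graduate courses on data mining and knowledge discovery in computer science and related programs.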

About the Authors

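Ronen Feldman is a pioneer in machine learning, data mining, and unstructured data management. He is a senior lecturer in the Mathematics and Computer Science Department of Bar-Ilan University in Israel, where he heads the Data Mining Laboratory. He is also cofounder and chairman of ClearForest, a company that develops next-generation text mining applications mainly for corporate and government clients, and currently an associate professor at the Stern School of Business, New York University.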

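James Sanger is a venture capitalist and a recognized industry expert in business data solutions, Internet applications, and IT security products. He co-founded ABS Ventures in 1982. Before that, he was a managing director at DB Capital in New York. He received his undergraduate degree from the University of Pennsylvania and did his graduate studies at Oxford University and the University of Liverpool. He is a member of the IEEE and the American Association for Artificial Intelligence (AAAI).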

Reviews

"... I purchased this book. It is definitely a reference well worth owning."

-- L. Venkata Subramaniam, IBM India Research Lab

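"An introduction to text mining written by the field's foremost experts. The book is very well written. It perfectly combines the theory and practice of text mining and suits researchers and practitioners alike ... Highly recommended for anyone who has no background in computational linguistics and wants to delve into text mining."

-- Rada Mihalcea, University of North Texas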

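Text mining has become an exciting emerging research field. Written by world-renowned authorities, this book explains core text mining and link detection algorithms and techniques, introduces advanced preprocessing techniques, and considers knowledge representation and visualization approaches. It also explores how these techniques are applied in practice, balancing the theory and practice of text mining.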

I. Introduction to Text Mining

I.1 Defining Text Mining

I.2 General Architecture of Text Mining Systems

II. Core Text Mining Operations

II.1 Core Text Mining Operations

II.2 Using Background Knowledge for Text Mining

II.3 Text Mining Query Languages

III. Text Mining Preprocessing Techniques

III.1 Task-Oriented Approaches

III.2 Further Reading

IV. Categorization

IV.1 Applications of Text Categorization

IV.2 Definition of the Problem

IV.3 Document Representation

IV.4 Knowledge Engineering Approach to TC

IV.5 Machine Learning Approach to TC

IV.6 Using Unlabeled Data to Improve Classification

IV.7 Evaluation of Text Classifiers

IV.8 Citations and Notes

V. Clustering

V.1 Clustering Tasks in Text Analysis

V.2 The General Clustering Problem

V.3 Clustering Algorithms

V.4 Clustering of Textual Data

V.5 Citations and Notes

VI. Information Extraction

VI.1 Introduction to Information Extraction

VI.2 Historical Evolution of IE: The Message Understanding Conferences and Tipster

VI.3 IE Examples

VI.4 Architecture of IE Systems

VI.5 Anaphora Resolution

VI.6 Inductive Algorithms for IE

VI.7 Structural IE

VI.8 Further Reading

VII. Probabilistic Models for Information Extraction

VII.1 Hidden Markov Models

VII.2 Stochastic Context-Free Grammars

VII.3 Maximal Entropy Modeling

VII.4 Maximal Entropy Markov Models

VII.5 Conditional Random Fields

VII.6 Further Reading

VIII. Preprocessing Applications Using Probabilistic and Hybrid Approaches

VIII.1 Applications of HMM to Textual Analysis

VIII.2 Using MEMM for Information Extraction

VIII.3 Applications of CRFs to Textual Analysis

VIII.4 TEG: Using SCFG Rules for Hybrid Statistical–Knowledge-Based IE

VIII.5 Bootstrapping

VIII.6 Further Reading

IX. Presentation-Layer Considerations for Browsing and Query Refinement

IX.1 Browsing

IX.2 Accessing Constraints and Simple Specification Filters at the Presentation Layer

IX.3 Accessing the Underlying Query Language

IX.4 Citations and Notes

X. Visualization Approaches

X.1 Introduction

X.2 Architectural Considerations

X.3 Common Visualization Approaches for Text Mining

X.4 Visualization Techniques in Link Analysis

X.5 Real-World Example: The Document Explorer System

XI. Link Analysis

XI.1 Preliminaries

XI.2 Automatic Layout of Networks

XI.3 Paths and Cycles in Graphs

XI.4 Centrality

XI.5 Partitioning of Networks

XI.6 Pattern Matching in Networks

XI.7 Software Packages for Link Analysis

XI.8 Citations and Notes

XII. Text Mining Applications

XII.1 General Considerations

XII.2 Corporate Finance: Mining Industry Literature for Business Intelligence

XII.3 A “Horizontal” Text Mining Application: Patent Analysis Solution Leveraging a Commercial Text Analytics Platform

XII.4 Life Sciences Research: Mining Biological Pathway Information with GeneWays

Appendix A: DIAL: A Dedicated Information Extraction Language for Text Mining

A.1 What Is the DIAL Language?

A.2 Information Extraction in the DIAL Environment

A.3 Text Tokenization

A.4 Concept and Rule Structure

A.5 Pattern Matching

A.6 Pattern Elements

A.7 Rule Constraints

A.8 Concept Guards

A.9 Complete DIAL Examples

Bibliography

Index

Last revised: 2023-06-27 13:16
