Improved use of machine learning techniques in named entity recognition

Kitoogo, Fredrick Edward

dc.contributor.author	Kitoogo, Fredrick Edward
dc.date.accessioned	2012-03-30T08:51:10Z
dc.date.available	2012-03-30T08:51:10Z
dc.date.issued	2009-09
dc.identifier.uri	http://hdl.handle.net/10570/495
dc.description	A Dissertation Submitted to the School of Graduate Studies in partial fulfillment for the award of the Degree of Doctor of Philosophy in Computer Science of Makerere University.	en_US
dc.description.abstract	The current digital era, and particularly the evolution of the World Wide Web (WWW), has generated a multiplicity of knowledge resources stored in electronic formats. Some of these texts even carry some form of resource description framework describing embedded meta-knowledge such as Author, Title, Date, Subject, and so on. The existence of such unexploited knowledge has given rise to the need to utilize large volumes of information from these resources, a key area of natural language processing (NLP). One of the primary methods of NLP used in understanding natural language is Named Entity Recognition (NER), a technique for systematically identifying and classifying (component) words into predefined entity classes (such as Person, Organization or Location names). Although many approaches to NER have been developed, the complexity of the NER task makes it a great challenge to develop systems with better performance. The recent trend in tackling the NER problem is the use of machine learning techniques. In this work, we begin with an extensive review of the literature related to the research, then present approaches that embrace the machine learning dynamics widely used in natural language processing, namely classifier combination, feature engineering and meta-knowledge. We introduce the notion of recursive stacking for NER to improve the classifier combination technique. A multi-objective genetic algorithm (MOGA) and a feature exploration technique are applied for feature engineering. Correspondingly, we formalize the domain independence capability in NER by introducing the concept of domain independent features. Finally, the idea of meta-knowledge is used to provide a basis for the choice of specific classification algorithms as well as their corresponding combinations.
To show the feasibility of the approaches used, we induce the different models on different data sets, which mainly comprise manually annotated judicial data sets. Comprehensive experimental results demonstrate the benefits of our approaches. The methods applied in this work are empirically grounded, and the results provide a theoretical justification for integrating the three machine learning dynamics and a fundamental step towards a framework for NER.	en_US
dc.language.iso	en	en_US
dc.subject	Machine learning	en_US
dc.subject	Natural language processing	en_US
dc.subject	Named entity recognition	en_US
dc.title	Improved use of machine learning techniques in named entity recognition	en_US
dc.type	Thesis, phd	en_US

Files in this item

Name: kitoogo-fredrick-edward-cit-ph ...
Size: 2.739Mb
Format: PDF
Description: Thesis report


This item appears in the following Collection(s)

School of Computing and Informatics Technology (CIT) Collection
