
Third-party developer

In listing the various components, I first tried to identify “who does it” and then “what is done” (as components).

In this second step, it is important to know what we will actually be able to use. The component characteristics worth noting are

Correct, complete, and add references to the various official guides.

Situating UIMA

Developer guide

Component directories

UIMA Sandbox Suggested Analysis Components

Apache UIMA directory

http://incubator.apache.org/uima/external-resources.html

Jena University Language & Information Engineering (JULIE) Lab

http://www.julielab.de/component/option,com_frontpage/Itemid,1/

Language Technologies Institute of the School of Computer Science at Carnegie Mellon University

Component Library http://uima.lti.cs.cmu.edu:8080/UCR/Welcome.do

The Center for Computational Pharmacology at the University of Colorado

bio-nlp: has wrapped a number of popular bioinformatics annotators as UIMA components: http://bionlp-uima.sourceforge.net/

RASP is a domain-independent, robust parsing system for English

http://www.digitalpebble.com/rasp4uima/index.html: tokenisation, tagging, lemmatisation and parsing

IBM

http://www.alphaworks.ibm.com/tech/uima/download

IBM-UIMA-Adapter-2.2.zip

Projects at IBM

A variety of advanced IBM research projects focusing on developing and applying UIMA http://domino.research.ibm.com/comm/research_projects.nsf/pages/uima.researchProjects.html

Not always active, and apparently tied to the IBM version of UIMA.

Persistence

Persistence - XMI and MOF by the OMG

XMI and MOF, by the Object Management Group (OMG), an international, open-membership, not-for-profit computer industry consortium.

XML Metadata Interchange (XMI) is a model-driven XML integration framework for defining, interchanging, manipulating and integrating XML data and objects. XMI-based standards are in use for integrating tools, repositories, applications and data warehouses. XMI provides rules by which a schema can be generated for any valid XMI-transmissible MOF-based metamodel.

XMI provides a mapping from MOF to XML. As MOF and XML technology evolved, the XMI mapping has been updated to comply with the latest versions of these specifications. Updates to the XMI mapping have tracked these version changes in a manner consistent with the existing XMI Production of XML Schema specification (XMI Version 2).

Meta-Object Facility (MOF) is an extensible model-driven integration framework for defining, manipulating and integrating metadata and data in a platform-independent manner. MOF-based standards are in use for integrating tools, applications and data.

Persistence - Database

With Apache Derby

Interoperability

Interoperability - C++ API, etc.

See the mailing list and SVN.

Interoperability - Web service

The UIMA Simple Server makes results of UIMA processing available in a simple, XML-based format. The intended use of the Simple Server is to provide UIMA analysis as a REST service. The Simple Server is implemented as a Java Servlet and can be deployed into any Servlet container (such as Apache Tomcat or Jetty). The user documentation is linked from the Sandbox page: http://incubator.apache.org/uima/sandbox.html#simple-server

Interoperability - Bean Scripting Framework

The Bean Scripting Framework (BSF) Annotator is an Apache UIMA analysis engine that provides a link between the UIMA framework and the scripting languages that are supported by Apache BSF (http://jakarta.apache.org/bsf). The current implementation comes with examples in Beanshell (http://www.beanshell.org) and Rhino Javascript (http://www.mozilla.org/rhino). Simple tests have also been conducted successfully with Jython (http://jython.sourceforge.net/Project/index.html) and JRuby (http://jruby.codehaus.org). http://incubator.apache.org/uima/sandbox.html#bsf.annotator

Component packaging (PEAR)

NLP components

NLP component - Word Tokenization

Sandbox - Whitespace tokenizer annotator - http://incubator.apache.org/uima/sandbox.html#whitespace.tokenizer

bio-nlp http://bionlp-uima.sourceforge.net/ with Genia Tagger, LingPipe, Penn BioTokenizer

OpenNLPTokenizer tokenizes the text and creates token annotations that span the tokens - Apache UIMA Example Wrappers for the OpenNLP Tools - http://uima.lti.cs.cmu.edu:8080/UCR/pages/static/osnlp/OpenNLPReadme.html (English)
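All of the tokenizers listed above produce the same kind of output: stand-off annotations recording the begin and end character offsets of each token in the document text. As a minimal sketch of that idea, assuming plain Java with no UIMA dependency, the class below performs whitespace tokenization only. WhitespaceTokenizerSketch, Span and the "Token" label are hypothetical names; the Sandbox annotator itself is configured through UIMA descriptors and writes into the CAS, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of whitespace tokenization; it does not use the UIMA API. */
final class WhitespaceTokenizerSketch {
    /** Hypothetical stand-off annotation: a type label plus [begin, end) character offsets. */
    record Span(String type, int begin, int end) {
        String coveredText(String text) {
            return text.substring(begin, end);
        }
    }

    private WhitespaceTokenizerSketch() {}

    static List<Span> tokenize(String text) {
        List<Span> tokens = new ArrayList<>();
        int start = -1; // begin offset of the current token, or -1 between tokens
        for (int i = 0; i < text.length(); i++) {
            boolean whitespace = Character.isWhitespace(text.charAt(i));
            if (!whitespace && start < 0) {
                start = i; // a token starts here
            } else if (whitespace && start >= 0) {
                tokens.add(new Span("Token", start, i)); // the token ends before this whitespace
                start = -1;
            }
        }
        if (start >= 0) {
            tokens.add(new Span("Token", start, text.length())); // last token runs to the end
        }
        return tokens;
    }
}
```

For example, tokenizing "UIMA wraps OpenNLP" yields three Token spans with offsets (0, 4), (5, 10) and (11, 18). Keeping only offsets, never copies of the text, is what lets several components annotate the same document independently.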

NLP component - Word Stemming

Sandbox - Snowball Annotator - http://incubator.apache.org/uima/sandbox.html#snowball.annotator

NLP component - POS

OpenNLPPOSTagger assigns part-of-speech tags to tokens - Apache UIMA Example Wrappers for the OpenNLP Tools - http://uima.lti.cs.cmu.edu:8080/UCR/pages/static/osnlp/OpenNLPReadme.html (English)

NLP component - Sentence Splitter

bio-nlp http://bionlp-uima.sourceforge.net/ with KeX, LingPipe, OpenNLP

OpenNLPSentenceDetector detects sentence boundaries and creates Sentence annotations that span these boundaries - Apache UIMA Example Wrappers for the OpenNLP Tools - http://uima.lti.cs.cmu.edu:8080/UCR/pages/static/osnlp/OpenNLPReadme.html (English)
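To make concrete what a sentence splitter emits (Sentence annotations given as character spans), here is a deliberately naive rule-based sketch in plain Java. It is a swapped-in technique, not how the OpenNLP detector works: OpenNLP uses a trained statistical model, whereas this sketch simply ends a sentence at '.', '!' or '?' followed by whitespace. NaiveSentenceSplitter and Sentence are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Naive rule-based sentence splitter (a stand-in, not OpenNLP's trained model). */
final class NaiveSentenceSplitter {
    /** Hypothetical sentence span: [begin, end) character offsets into the text. */
    record Sentence(int begin, int end) {}

    private NaiveSentenceSplitter() {}

    /** Ends a sentence at '.', '!' or '?' when followed by whitespace or the end of the text. */
    static List<Sentence> split(String text) {
        List<Sentence> sentences = new ArrayList<>();
        int begin = skipWhitespace(text, 0);
        for (int i = begin; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean terminal = c == '.' || c == '!' || c == '?';
            boolean atBoundary = i + 1 == text.length() || Character.isWhitespace(text.charAt(i + 1));
            if (terminal && atBoundary) {
                sentences.add(new Sentence(begin, i + 1));
                begin = skipWhitespace(text, i + 1); // next sentence starts at the next non-space
            }
        }
        // Trailing text without final punctuation still forms a sentence (trailing spaces excluded)
        if (begin < text.length()) {
            int end = text.length();
            while (Character.isWhitespace(text.charAt(end - 1))) {
                end--;
            }
            sentences.add(new Sentence(begin, end));
        }
        return sentences;
    }

    private static int skipWhitespace(String text, int i) {
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        return i;
    }
}
```

A rule like this wrongly splits after abbreviations such as "e.g." followed by a space, which is exactly the kind of ambiguity a trained detector is meant to resolve.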

NLP component - Phrasal and Clause Parsing

English

NLP component - Named Entity and acronyms

English

NLP component - Semantic Parsing

Tool

Tool - Machine Learning

WEKA

UIMA Wrapper for the LingPipe Classifier http://uima.lti.cs.cmu.edu:8080/UCR/pages/static/osnlp/UIMALingpipeClassifier.htm

Tool - Analyser

Sandbox - Regular Expression Annotator http://incubator.apache.org/uima/sandbox.html#regex.annotator
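The principle behind a regular expression annotator is to turn every pattern match into a typed annotation over the matched span. A minimal plain-Java sketch using java.util.regex follows. The rule format (one type name mapped to one pattern), the class name and the example patterns are assumptions for illustration, not the Sandbox annotator's actual configuration syntax.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Plain-Java sketch of regex-based annotation; not the UIMA Sandbox Regular Expression Annotator. */
final class RegexAnnotatorSketch {
    /** Hypothetical output annotation: a type label plus [begin, end) character offsets. */
    record Annotation(String type, int begin, int end) {}

    // One compiled pattern per annotation type (adding the same type again replaces its pattern)
    private final Map<String, Pattern> rules = new LinkedHashMap<>();

    RegexAnnotatorSketch addRule(String type, String regex) {
        rules.put(type, Pattern.compile(regex));
        return this;
    }

    List<Annotation> annotate(String text) {
        List<Annotation> result = new ArrayList<>();
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Matcher m = rule.getValue().matcher(text);
            while (m.find()) {
                result.add(new Annotation(rule.getKey(), m.start(), m.end()));
            }
        }
        // Sort by position so downstream components see annotations in document order
        result.sort(Comparator.comparingInt(Annotation::begin).thenComparingInt(Annotation::end));
        return result;
    }
}
```

Matches from different rules may overlap; the sketch keeps them all and leaves any conflict resolution to later components.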

Tool - Annotation Editor

Sandbox - CAS Editor is an annotation tool which supports manual and automatic annotation of CAS files. http://incubator.apache.org/uima/sandbox.html#CAS%20Editor

Tool - Dictionary Annotator

Sandbox - Dictionary Annotator is an Apache UIMA analysis engine that creates annotations based on word lists that are compiled to simple dictionaries. http://incubator.apache.org/uima/sandbox.html#dict.annotator
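The idea described above (word lists compiled into a dictionary, then matched against the text to create annotations) can be sketched in plain Java. This is an illustration under stated assumptions, not the Sandbox component: the tokenization pattern, case-insensitive matching, the longest-match-first policy and all class names are hypothetical choices.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Plain-Java sketch of dictionary lookup over tokens; not the UIMA Sandbox Dictionary Annotator. */
final class DictionaryAnnotatorSketch {
    /** Hypothetical output annotation: a type label plus [begin, end) character offsets. */
    record Match(String type, int begin, int end) {}

    private record Token(String norm, int begin, int end) {}

    // Assumed tokenization: a letter followed by letters, digits or hyphens, or a run of digits
    private static final Pattern TOKEN = Pattern.compile("\\p{L}[\\p{L}\\p{N}-]*|\\p{N}+");

    private final String type;
    private final Set<List<String>> entries = new HashSet<>(); // each entry as a normalized token sequence
    private int maxLength = 0;                                  // longest entry, in tokens

    DictionaryAnnotatorSketch(String type, List<String> wordList) {
        this.type = type;
        // "Compile" the word list: tokenize and lower-case each entry exactly like the text
        for (String entry : wordList) {
            List<String> key = tokenize(entry).stream().map(Token::norm).toList();
            if (!key.isEmpty()) {
                entries.add(key);
                maxLength = Math.max(maxLength, key.size());
            }
        }
    }

    List<Match> annotate(String text) {
        List<Token> tokens = tokenize(text);
        List<Match> matches = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            // Try the longest candidate first, so a multi-word entry beats its own sub-words
            for (int n = Math.min(maxLength, tokens.size() - i); n > 0 && matched == 0; n--) {
                List<String> key = tokens.subList(i, i + n).stream().map(Token::norm).toList();
                if (entries.contains(key)) {
                    matched = n;
                }
            }
            if (matched > 0) {
                matches.add(new Match(type, tokens.get(i).begin(), tokens.get(i + matched - 1).end()));
                i += matched; // matches never overlap
            } else {
                i++;
            }
        }
        return matches;
    }

    private static List<Token> tokenize(String s) {
        List<Token> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            out.add(new Token(m.group().toLowerCase(Locale.ROOT), m.start(), m.end()));
        }
        return out;
    }
}
```

With the entries OpenNLP, Apache UIMA and UIMA, the text "Apache UIMA wraps opennlp tools" yields two matches: one covering "Apache UIMA" (the inner "UIMA" is not matched again) and one covering "opennlp".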

Tool - OpenNLP

Apache UIMA; wrappers; NLP processing; English models (can models be built for French?)

OpenNLP Tools is an open source package of natural language processing components written in pure Java. The tools are based on Adwait Ratnaparkhi's Ph.D. dissertation (UPenn, 1998), which shows how to apply Maximum Entropy models to various language ambiguity problems. The OpenNLP Tools rely on the OpenNLP MAXENT package, a mature Java package for training and using maximum entropy models.
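For reference, the maximum entropy models mentioned above take the standard conditional log-linear form below. Here h is a context (for example the surrounding words), o an outcome from the set of possible outcomes 𝒪 (for example a tag), each f_i is a binary feature function over (h, o), and the weights λ_i are estimated from training data. The same model is often written in an equivalent product form, with factors α_i raised to f_i(h, o) and α_i = e^{λ_i}.

```latex
p(o \mid h) = \frac{1}{Z(h)} \exp\!\left( \sum_{i=1}^{k} \lambda_i \, f_i(h, o) \right),
\qquad
Z(h) = \sum_{o' \in \mathcal{O}} \exp\!\left( \sum_{i=1}^{k} \lambda_i \, f_i(h, o') \right)
```

The normalizer Z(h) makes the probabilities over all outcomes for a given context sum to one.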

The OpenNLP Tools package (as of Version 1.3) includes a sentence detector, tokenizer, part-of-speech tagger, noun phrase chunker, shallow parser, named entity detector, and co-reference resolver. All together these tools provide a rich and powerful set of text analysis capabilities.

The Apache UIMA Example Wrappers for OpenNLP provide UIMA annotators for most of the OpenNLP Tools components, allowing you to run the OpenNLP Tools as UIMA annotators.

Tool - GATE

The UIMA-GATE interoperability layer is based on UIMA SDK version 1.2.3 (i.e., the IBM alpha version): http://gate.ac.uk/sale/tao/#chap:uima

GATE vs UIMA

UIMA (http://www.research.ibm.com/UIMA/) is a language processing framework developed by IBM. UIMA and GATE share some functionality but are complementary in most respects.  GATE now provides an interoperability layer to allow UIMA applications to include GATE components in their processing and vice-versa. 
It has many similarities to the GATE architecture – it represents documents as text plus annotations, and allows users to define pipelines of analysis engines that manipulate the document (or Common Analysis Structure in UIMA terminology) in much the same way as processing resources do in GATE. 
Clearly, it would be useful to be able to include UIMA components in GATE applications and vice-versa, letting GATE users take advantage of UIMA’s flexible deployment options and UIMA users access JAPE and the many useful plugins already available in GATE. 
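The shared model described above (one document, a growing set of stand-off annotations, and an ordered pipeline of engines that read and add annotations) can be illustrated with a small plain-Java sketch. Document, Engine and the two example engines are hypothetical stand-ins for a CAS and for analysis engines; in the actual UIMA Java SDK, annotators typically extend JCasAnnotator_ImplBase and are combined through aggregate analysis engine descriptors, none of which this sketch reproduces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Plain-Java sketch of a text-plus-annotations pipeline; not the UIMA or GATE API. */
final class PipelineSketch {
    record Annotation(String type, int begin, int end) {}

    /** Hypothetical stand-in for a CAS: the document text plus stand-off annotations. */
    static final class Document {
        final String text;
        final List<Annotation> annotations = new ArrayList<>();

        Document(String text) {
            this.text = text;
        }

        /** Returns a snapshot of the annotations of one type, in insertion order. */
        List<Annotation> select(String type) {
            return annotations.stream().filter(a -> a.type().equals(type)).toList();
        }
    }

    /** Hypothetical analysis-engine interface: each engine reads and adds annotations. */
    interface Engine {
        void process(Document doc);
    }

    /** Engine 1: marks maximal runs of non-whitespace characters as Token annotations. */
    static final Engine TOKENIZER = doc -> {
        Matcher m = Pattern.compile("\\S+").matcher(doc.text);
        while (m.find()) {
            doc.annotations.add(new Annotation("Token", m.start(), m.end()));
        }
    };

    /** Engine 2: reads the Token annotations and marks those starting with an upper-case letter. */
    static final Engine CAPITALIZED = doc -> {
        for (Annotation tok : doc.select("Token")) { // iterates a snapshot, so adding is safe
            if (Character.isUpperCase(doc.text.charAt(tok.begin()))) {
                doc.annotations.add(new Annotation("Capitalized", tok.begin(), tok.end()));
            }
        }
    };

    private PipelineSketch() {}

    /** Runs the engines in order over the same document, like an aggregate pipeline. */
    static Document run(String text, List<Engine> engines) {
        Document doc = new Document(text);
        for (Engine e : engines) {
            e.process(doc);
        }
        return doc;
    }
}
```

Because the second engine only reads Token annotations from the shared document, it can be replaced without touching the tokenizer, as long as it runs after it. That decoupling is what makes wrapping components across the two frameworks plausible in the first place.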

There are some components in GATE (particularly ANNIE and JAPE) which I would like to use in UIMA.
UIMA has many features in common with other software architectures for language engineering such as GATE and ATLAS. Each of these systems isolates the core algorithms that perform language processing from system services such as data storage, communication between components, and visualization of results. However, UIMA's emphasis on transferring UIM technologies to products has led to a richer architecture that allows integrating applications with a host of enterprise products (e.g., WebSphere Portal Server, Lotus Workplace) and a variety of middleware and platform options.
We (Temis) have built our new corporate product on top of UIMA. We made this decision one year ago now. The choice was mainly between using UIMA and doing it ourselves. We resisted the latter option! We did a quick survey of other frameworks (GATE ...) but UIMA was more appropriate for our need for a core framework platform. We liked its homogeneity, the quality of the code, the documentation, the quick evolution, the planned move to open source and to a commercially friendly license. Send me a private message if you want to talk about this (pascal.coupet <at> temis.com).

Dear Ekaterina,
I cannot directly answer your question as I am not a UIMA or GATE wizard. I can tell you why I chose UIMA rather than GATE and OpenNLP.
 1. I use a finite state machine toolbox of my own written in Java, but
    I did not want to close the door to other applications written in
    C or in Perl, and from what I read when I made my decision, only UIMA
    offered a clear and clean way to integrate C or Perl apps via
    the descriptor layer.
 2. I know UIMA has been used by IBM in heavy industrial applications
    such as Business Insight.
 3. I did not find any major differences in the documentation
    concerning the annotation scheme. In fact, for my own purposes, a
    list of labels with a start and an end position in the text was
    just fine.
 4. I did not need any linguistic tools other than my own.
 5. Last but not least, it was crucial for me to be able to easily
    integrate the unstructured information part in Eclipse, and only
    UIMA offered an easy way to do it.
 6. Besides, it is not too hard to integrate external applications in
    Eclipse, so if some day I need other tools, I know it will be
    easier to integrate them in Eclipse than in any other environment.

In short, UIMA and Eclipse are solid and complementary. In the long term I believe they have better odds than GATE and OpenNLP, even if, in terms of implemented algorithms and programs, UIMA is poorer than GATE and OpenNLP.
Ekaterina Buyko wrote:
> Hi,
>
> I am looking for a comparison of UIMA and GATE systems.
>
>  What does UIMA offer, more or less, compared with GATE? I am interested in
> the general contrast between UIMA and GATE and in particular in
> the comparison of type systems and GATE annotation schemata. Can we
> convert all UIMA types to GATE types without any restrictions, or does
> UIMA offer more features than GATE for implementing annotation
> schemata?
>
> Thanks,
>
> Katja
>
-- 
Regards
Christian Mauceri
http://hermeneute.com/Christian

* http://article.gmane.org/gmane.comp.apache.uima.general/348/match=perl

Application

Application - Search Engine and Semantic Search

UIMA principles for semantic search

http://www-306.ibm.com/software/data/enterprise-search/omnifind-enterprise/

UIMA Lucene CAS Indexer (LuCAS) http://www.julielab.de/content/view/117/186/

Application - bio

http://bionlp-uima.sourceforge.net/

Application - Semantic Web

http://mondeca.wordpress.com/2007/09/11/uima-peut-il-reconcilier-le-text-mining-et-les-outils-semantiques/

Bibliography

D. Ferrucci and A. Lally, “UIMA: an architectural approach to unstructured information processing in the corporate research environment,” Natural Language Engineering 10, No. 3-4, 327-348 (2004). www.research.ibm.com/UIMA/

D. Ferrucci and A. Lally, “Building an example application with the Unstructured Information Management Architecture,” IBM Systems Journal 43, No. 3, 455-475 (2004). http://www.research.ibm.com/journal/sj43-3.html

T. Goetz and O. Suhre, “Design and implementation of the UIMA Common Analysis System,” IBM Systems Journal 43, No. 3, 476-489 (2004). http://www.research.ibm.com/journal/sj43-3.html

R. Mack, S. Mukherjea, A. Soffer, N. Uramoto, E. Brown, A. Coden, J. Cooper, A. Inokuchi, B. Iyer, Y. Mass, H. Matsuzawa, and L. V. Subramaniam, “Text analytics for life science using the Unstructured Information Management Architecture,” IBM Systems Journal 43, No. 3, p. 490 (2004). http://www.research.ibm.com/journal/sj43-3.html

N. Uramoto, H. Matsuzawa, T. Nagano, A. Murakami, H. Takeuchi, and K. Takeda, “A text-mining system for knowledge discovery from biomedical documents,” IBM Systems Journal 43, No. 3, p. 516 (2004). http://www.research.ibm.com/journal/sj43-3.html

Towards Declarative Information Extraction: The Almaden Story, industrial keynote talk by Shivakumar Vaithyanathan at Web Intelligence, 2007. http://domino.research.ibm.com/comm/research_projects.nsf/pages/uima.projectUimaArchitectureFramework.html

Anthony Levas, Eric Brown, J. William Murdock, and David Ferrucci. “The Semantic Analysis Workbench (SAW): Towards a Framework for Knowledge Gathering and Synthesis.” Proceedings of the International Conference on Intelligence Analysis. McLean, VA, May 2-6, 2005.

The Linguistic Annotation Workshop (The LAW): A Merger of NLPXML 2007 and FLAC 2007. ACL 2007, Prague, Czech Republic, June 28-29, 2007. http://www.ling.uni-potsdam.de/acl-lab/LAW-07.html

Udo Hahn, Ekaterina Buyko, Katrin Tomanek, Scott Piao, John McNaught, Yoshimasa Tsuruoka, and Sophia Ananiadou. An annotation type system for a data-driven NLP pipeline. In The LAW at ACL 2007 – Proceedings of the Linguistic Annotation Workshop, pages 33–40. Prague, Czech Republic, June 28-29, 2007. Stroudsburg, PA: Association for Computational Linguistics, 2007. http://www.ling.uni-potsdam.de/acl-lab/LAW-07.html

Scott Piao, Ekaterina Buyko, Yoshimasa Tsuruoka, Katrin Tomanek, Jin Dong Kim, John McNaught, Udo Hahn, Jian Su, and Sophia Ananiadou. Bootstrep annotation scheme: Encoding information for text mining. In Corpus Linguistics 2007 – Proceedings of the 4th Corpus Linguistics Conference. Birmingham, England, U.K., July 27-30, 2007.

Other

To do

Data format

.xmi

Input/output directories of .txt files

Handles language and charset simply.
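The plain-text input/output handling noted above (directories of .txt files, with the language and charset stated explicitly) can be sketched with java.nio.file. The class and record names are hypothetical, the language is simply a caller-supplied tag rather than anything detected, and the sketch does not reproduce any particular UIMA collection reader or CAS consumer.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical plain-text reader/writer; not a UIMA collection reader. */
final class TextCollectionReaderSketch {
    /** One input document: source path, decoded text, and a caller-supplied language tag. */
    record InputDocument(Path path, String text, String language) {}

    private TextCollectionReaderSketch() {}

    /** Reads every regular *.txt file in inputDir (non-recursively), in file-name order. */
    static List<InputDocument> read(Path inputDir, Charset charset, String language) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(inputDir)) { // the directory stream must be closed
            files = listing
                    .filter(p -> Files.isRegularFile(p) && p.getFileName().toString().endsWith(".txt"))
                    .sorted()
                    .toList();
        }
        List<InputDocument> docs = new ArrayList<>();
        for (Path p : files) {
            // Decode with the declared charset instead of the platform default
            docs.add(new InputDocument(p, Files.readString(p, charset), language));
        }
        return docs;
    }

    /** Writes content to outputDir/fileName with an explicit charset, creating the directory if needed. */
    static Path write(Path outputDir, String fileName, String content, Charset charset) throws IOException {
        Files.createDirectories(outputDir);
        return Files.writeString(outputDir.resolve(fileName), content, charset);
    }
}
```

Passing the charset explicitly avoids depending on the platform default encoding, and Files.readString rejects input that is malformed for that charset instead of silently substituting characters.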

Help on


As far as NLP proper is concerned, Carnegie Mellon University's Language Technologies Institute is hosting a UIMA Component Repository web site (http://uima.lti.cs.cmu.edu), where developers can post information about their analytics components and anyone can find out more about free and commercially available UIMA-compliant analytics. Additionally, free analytic tools that can work with UIMA include those from the General Architecture for Text Engineering (GATE - http://gate.ac.uk/) and OpenNLP (http://opennlp.sourceforge.net/) communities, as well as Jena University’s Language & Information Engineering (JULIE) Lab (http://www.julielab.de). Commercial analytics are available from IBM, as well as from other software vendors such as Attensity, ClearForest, Temis and Nstein.

Besides IBM, several academic and industrial organizations use UIMA to develop analyzers and UIM solutions.

Active user and developer communities, as the dedicated mailing lists attest.

There are still few universities and companies using UIMA.

Integration work that shows clear interest in the platform: a two-way wrapper with GATE, wrappers for tools from the OpenNLP suite, Lucene, Weka…

Current activity centers on a substantial effort on the platform itself rather than on the bridges to other tools; these bridges may need adapting depending on the SDK version they were originally developed for. However, in step with the platform's development, work is still under way on an adapter for components developed for the IBM alpha version.

including statistical and rule-based Natural Language Processing (NLP), Information Retrieval (IR), machine learning, and ontologies, semantic search, information extraction and text mining

Nicolas Hernandez