To list the various components, I first tried to identify "who does it" and then "what is done" (as a component).
In this second step, it is important to know what we will actually be able to use. The component characteristics worth noting are:
Correct, complete, and add references to the various official guides.
UIMA Sandbox Suggested Analysis Components
http://www.julielab.de/component/option,com_frontpage/Itemid,1/
Component Library http://uima.lti.cs.cmu.edu:8080/UCR/Welcome.do
bio-nlp: has wrapped a number of popular bioinformatics annotators as UIMA components. http://bionlp-uima.sourceforge.net/
RASP4UIMA (http://www.digitalpebble.com/rasp4uima/index.html): tokenisation, tagging, lemmatisation and parsing
http://www.alphaworks.ibm.com/tech/uima/download
IBM-UIMA-Adapter-2.2.zip
A variety of advanced IBM research projects focusing on developing and applying UIMA http://domino.research.ibm.com/comm/research_projects.nsf/pages/uima.researchProjects.html
Not always up to date, and seems to be tied to the IBM version of UIMA.
XMI and MOF, by the Object Management Group (OMG), an international, open-membership, not-for-profit computer industry consortium
XML Metadata Interchange (XMI) is a model-driven XML integration framework for defining, interchanging, manipulating and integrating XML data and objects. XMI-based standards are in use for integrating tools, repositories, applications and data warehouses. XMI provides rules by which a schema can be generated for any valid XMI-transmissible MOF-based metamodel.
XMI provides a mapping from MOF to XML. As MOF and XML technology evolved, the XMI mapping is being updated to comply with the latest versions of these specifications. Updates to the XMI mapping have tracked these version changes in a manner consistent with the existing XMI Production of XML Schema specification (XMI Version 2).
Meta-Object Facility (MOF) is an extensible model-driven integration framework for defining, manipulating and integrating metadata and data in a platform-independent manner. MOF-based standards are in use for integrating tools, applications and data.
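To make the XMI idea concrete for UIMA-style data, here is a minimal Python sketch. It is not UIMA's serializer. It writes a text and its stand-off annotations as XMI-flavoured XML using `xml.etree.ElementTree`. Only the OMG namespace `http://www.omg.org/XMI` and the `xmi:id`/`xmi:version` attributes come from XMI itself. The `ex` prefix, the `http:///example/types.ecore` namespace, the `Document` element and its `text` attribute are all hypothetical. UIMA's own XMI serialization uses a different layout (for instance, the text is held by a Sofa element), and this sketch does not reproduce it.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

XMI_NS = "http://www.omg.org/XMI"            # OMG XMI namespace
TYPES_NS = "http:///example/types.ecore"     # hypothetical namespace for toy types

@dataclass
class Annotation:
    type_name: str   # e.g. "Token" (hypothetical type name)
    begin: int       # start offset, inclusive
    end: int         # end offset, exclusive

def to_xmi(text: str, annotations: list[Annotation]) -> str:
    """Serialize text + stand-off annotations as XMI-flavoured XML (illustrative layout)."""
    ET.register_namespace("xmi", XMI_NS)
    ET.register_namespace("ex", TYPES_NS)
    root = ET.Element(f"{{{XMI_NS}}}XMI", {f"{{{XMI_NS}}}version": "2.0"})
    # The document text travels in the same file as the annotations that point into it.
    ET.SubElement(root, f"{{{TYPES_NS}}}Document", {f"{{{XMI_NS}}}id": "1", "text": text})
    for i, ann in enumerate(annotations, start=2):
        # Each annotation gets its own xmi:id and refers to the text by offsets only.
        ET.SubElement(root, f"{{{TYPES_NS}}}{ann.type_name}", {
            f"{{{XMI_NS}}}id": str(i),
            "begin": str(ann.begin),
            "end": str(ann.end),
        })
    return ET.tostring(root, encoding="unicode")

print(to_xmi("Hello world", [Annotation("Token", 0, 5), Annotation("Token", 6, 11)]))
```

The annotations never copy the text. They only point into it by offsets, and that stand-off style is what UIMA's CAS also relies on.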
With Apache Derby
See the mailing-list announcement and SVN.
The UIMA Simple Server makes results of UIMA processing available in a simple, XML-based format. The intended use of the Simple Server is to provide UIMA analysis as a REST service. The Simple Server is implemented as a Java Servlet and can be deployed into any Servlet container (such as Apache Tomcat or Jetty). User documentation for the Simple Server is available from the sandbox page: http://incubator.apache.org/uima/sandbox.html#simple-server
The Bean Scripting Framework (BSF) Annotator is an Apache UIMA analysis engine that provides a link between the UIMA framework and the scripting languages that are supported by Apache BSF (http://jakarta.apache.org/bsf). The current implementation comes with examples in Beanshell (http://www.beanshell.org) and Rhino Javascript (http://www.mozilla.org/rhino). Simple tests have also been conducted successfully with Jython (http://jython.sourceforge.net/Project/index.html) and JRuby (http://jruby.codehaus.org). http://incubator.apache.org/uima/sandbox.html#bsf.annotator
Sandbox - Whitespace tokenizer annotator - http://incubator.apache.org/uima/sandbox.html#whitespace.tokenizer
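A whitespace tokenizer can be sketched in a few lines of Python. Each maximal run of non-whitespace characters becomes a token, recorded as a stand-off span (begin, end) into the original text. This is a simplification, not the sandbox annotator's code. In particular, it does not split punctuation off words, and the function name `whitespace_tokenize` is hypothetical.

```python
import re

def whitespace_tokenize(text: str) -> list[tuple[int, int, str]]:
    """Hypothetical helper: return (begin, end, covered_text) for each run of non-whitespace."""
    # Offsets index into the original string, so annotations never copy or alter the text.
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", text)]

print(whitespace_tokenize("UIMA  is fun"))
```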
bio-nlp (http://bionlp-uima.sourceforge.net/), with Genia Tagger, LingPipe and Penn BioTokenizer
OpenNLPTokenizer tokenizes the text and creates token annotations that span the tokens - Apache UIMA Example Wrappers for the OpenNLP Tools - http://uima.lti.cs.cmu.edu:8080/UCR/pages/static/osnlp/OpenNLPReadme.html - English
Sandbox - Snowball Annotator - http://incubator.apache.org/uima/sandbox.html#snowball.annotator
OpenNLPPOSTagger assigns part-of-speech tags to tokens - Apache UIMA Example Wrappers for the OpenNLP Tools - http://uima.lti.cs.cmu.edu:8080/UCR/pages/static/osnlp/OpenNLPReadme.html - English
bio-nlp (http://bionlp-uima.sourceforge.net/), with KeX, LingPipe and OpenNLP
OpenNLPSentenceDetector detects sentence boundaries and creates Sentence annotations that span these boundaries - Apache UIMA Example Wrappers for the OpenNLP Tools - http://uima.lti.cs.cmu.edu:8080/UCR/pages/static/osnlp/OpenNLPReadme.html - English
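OpenNLP's sentence detector uses a trained maximum entropy model to decide where sentences end. For contrast, here is a deliberately naive rule-based substitute in Python. It assumes that a boundary is `.`, `!` or `?` followed by whitespace and an uppercase letter, or by the end of the text. It produces the same kind of output (sentence spans as begin/end offsets). It will, however, wrongly split after abbreviations such as "Dr. Smith". `detect_sentences` is a hypothetical name.

```python
import re

# Naive rule (an assumption, not OpenNLP's model): '.', '!' or '?' followed by
# whitespace and an uppercase letter, or by the end of the text.
_BOUNDARY = re.compile(r"[.!?](?=\s+[A-Z]|\s*$)")

def _skip_space(text: str, i: int) -> int:
    """Advance i past any whitespace."""
    while i < len(text) and text[i].isspace():
        i += 1
    return i

def detect_sentences(text: str) -> list[tuple[int, int]]:
    """Hypothetical helper: return (begin, end) offsets of sentences."""
    spans = []
    start = 0
    for m in _BOUNDARY.finditer(text):
        begin = _skip_space(text, start)
        spans.append((begin, m.end()))   # the span includes the punctuation mark
        start = m.end()
    begin = _skip_space(text, start)
    if begin < len(text):                # trailing text without final punctuation
        spans.append((begin, len(text)))
    return spans

text = "UIMA runs pipelines. GATE does too! Really?"
print([text[b:e] for b, e in detect_sentences(text)])
```

A statistical detector replaces the hand-written `_BOUNDARY` rule with a classifier that scores each candidate punctuation mark.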
WEKA
UIMA Wrapper for Lingpipe Classifier http://uima.lti.cs.cmu.edu:8080/UCR/pages/static/osnlp/UIMALingpipeClassifier.htm
Sandbox - Regular Expression Annotator http://incubator.apache.org/uima/sandbox.html#regex.annotator
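A regular-expression annotator works on a simple principle: each annotation type has a pattern, and every match of that pattern becomes an annotation. The Python sketch below illustrates this. The rule set (`EmailAddress`, `Year`) and the function names are hypothetical. The sandbox Regular Expression Annotator is configured through its own configuration files, not a Python dictionary.

```python
import re
from typing import NamedTuple

class Annotation(NamedTuple):
    type_name: str
    begin: int
    end: int

# Hypothetical rule set: annotation type -> compiled regular expression.
RULES = {
    "EmailAddress": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "Year": re.compile(r"\b(?:19|20)\d{2}\b"),
}

def regex_annotate(text: str, rules=RULES) -> list[Annotation]:
    """Create one annotation per match of each rule, sorted by position in the text."""
    found = [Annotation(name, m.start(), m.end())
             for name, pattern in rules.items()
             for m in pattern.finditer(text)]
    return sorted(found, key=lambda a: (a.begin, a.end))

print(regex_annotate("Write to someone@example.org in 2007."))
```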
Sandbox - CAS Editor is an annotation tool which supports manual and automatic annotation of CAS files. http://incubator.apache.org/uima/sandbox.html#CAS%20Editor
Sandbox - Dictionary Annotator is an Apache UIMA analysis engine that creates annotations based on word lists that are compiled to simple dictionaries. http://incubator.apache.org/uima/sandbox.html#dict.annotator
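A dictionary annotator compiles word lists into a lookup structure and matches them against the tokens of the text. The Python sketch below assumes case-insensitive, token-level, longest-match lookup, so a multi-word entry like "Apache UIMA" wins over "UIMA". These matching choices and the function names are illustrative assumptions, not the sandbox component's documented behaviour.

```python
import re

def build_dictionary(entries):
    """Compile each entry (e.g. 'Apache UIMA') into a tuple of lower-cased words."""
    return {tuple(e.lower().split()) for e in entries}

def dictionary_annotate(text, dictionary):
    """Hypothetical helper: return (begin, end) spans of dictionary entries, longest match first."""
    tokens = [(m.start(), m.end(), m.group().lower()) for m in re.finditer(r"\w+", text)]
    max_len = max((len(e) for e in dictionary), default=0)
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so "Apache UIMA" wins over "UIMA".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            words = tuple(t[2] for t in tokens[i:i + n])
            if words in dictionary:
                matches.append((tokens[i][0], tokens[i + n - 1][1]))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return matches

d = build_dictionary(["UIMA", "Apache UIMA", "GATE"])
print(dictionary_annotate("Apache UIMA and GATE interoperate.", d))
```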
APACHE UIMA; wrappers; NLP Process; English Models / Buildable for French?
OpenNLP Tools is an open source package of natural language processing components written in pure Java. The tools are based on Adwait Ratnaparkhi's Ph.D. dissertation (UPenn, 1998), which shows how to apply Maximum Entropy models to various language ambiguity problems. The OpenNLP Tools rely on the OpenNLP MAXENT package, a mature Java package for training and using maximum entropy models.
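A conditional maximum entropy model scores each outcome y for a context x as p(y | x) = exp(sum_i lambda_i * f_i(x, y)) / Z(x). Here the f_i are (usually binary) feature functions, the lambda_i are their learned weights, and Z(x) normalises over all outcomes. The Python sketch below only evaluates that formula. Training the weights (for example with iterative scaling) is not shown. The features and weights in the example are invented for illustration.

```python
import math

def maxent_probs(active_features, labels, weights):
    """Evaluate p(y | x) = exp(sum of weights of active (feature, y) pairs) / Z(x).

    active_features: names of the binary features that fire in context x
    labels: the possible outcomes y
    weights: dict mapping (feature, label) -> lambda; missing pairs count as 0
    """
    scores = {y: sum(weights.get((f, y), 0.0) for f in active_features) for y in labels}
    top = max(scores.values())                      # shift scores for numerical stability
    exp_scores = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exp_scores.values())                    # the normaliser Z(x)
    return {y: v / z for y, v in exp_scores.items()}

# Invented example: is this '.' a sentence boundary?
WEIGHTS = {("next_is_upper", "boundary"): 2.0, ("prev_is_abbrev", "no_boundary"): 3.0}
p = maxent_probs({"next_is_upper"}, ["boundary", "no_boundary"], WEIGHTS)
print(p)  # p["boundary"] is about 0.88
```

Subtracting the largest score before exponentiating does not change the probabilities, because the common factor cancels in Z(x). It only avoids overflow.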
The OpenNLP Tools package (as of Version 1.3) includes a sentence detector, tokenizer, part-of-speech tagger, noun phrase chunker, shallow parser, named entity detector, and co-reference resolver. All together these tools provide a rich and powerful set of text analysis capabilities.
The Apache UIMA Example Wrappers for OpenNLP provide UIMA annotators for most of the OpenNLP Tools components, allowing you to run the OpenNLP Tools as UIMA annotators.
The UIMA-GATE interoperability layer is based on UIMA SDK version 1.2.3 (i.e., the IBM alpha version). http://gate.ac.uk/sale/tao/#chap:uima
GATE vs UIMA
UIMA (http://www.research.ibm.com/UIMA/) is a language processing framework developed by IBM. UIMA and GATE share some functionality but are complementary in most respects. GATE now provides an interoperability layer to allow UIMA applications to include GATE components in their processing and vice-versa.
It has many similarities to the GATE architecture – it represents documents as text plus annotations, and allows users to define pipelines of analysis engines that manipulate the document (or Common Analysis Structure in UIMA terminology) in much the same way as processing resources do in GATE. Clearly, it would be useful to be able to include UIMA components in GATE applications and vice-versa, letting GATE users take advantage of UIMA’s flexible deployment options and UIMA users access JAPE and the many useful plugins already available in GATE. There are some components in GATE (particularly Annie and JAPE) which I would like to use in UIMA.
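The shared design described here can be sketched in Python. A document is held as text plus annotations and is processed by an ordered pipeline of components, each of which reads and extends it. `ToyCas`, `tokenizer` and `capitalized_word_tagger` are hypothetical stand-ins. A real UIMA CAS adds a type system, indexes and views, and real analysis engines are declared through descriptors.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Annotation:
    type_name: str
    begin: int
    end: int

@dataclass
class ToyCas:
    """Hypothetical stand-in for a CAS: the document text plus stand-off annotations."""
    text: str
    annotations: list = field(default_factory=list)

    def select(self, type_name):
        """All annotations of one type, in insertion order."""
        return [a for a in self.annotations if a.type_name == type_name]

    def covered_text(self, ann):
        return self.text[ann.begin:ann.end]

def tokenizer(cas):
    """First engine: adds a Token annotation per word."""
    for m in re.finditer(r"\w+", cas.text):
        cas.annotations.append(Annotation("Token", m.start(), m.end()))

def capitalized_word_tagger(cas):
    """Second engine: builds on the tokenizer's output, as a downstream engine would."""
    for tok in cas.select("Token"):
        if cas.covered_text(tok)[0].isupper():
            cas.annotations.append(Annotation("Capitalized", tok.begin, tok.end))

def run_pipeline(text, engines):
    """Run each engine in order over one shared structure."""
    cas = ToyCas(text)
    for engine in engines:
        engine(cas)
    return cas

cas = run_pipeline("GATE and UIMA", [tokenizer, capitalized_word_tagger])
print([cas.covered_text(a) for a in cas.select("Capitalized")])
```

Because engines communicate only through the shared structure, wrapping a component from one framework for the other mostly comes down to converting annotations between the two representations.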
UIMA has many features in common with other software architectures for language engineering such as GATE [4,5] and ATLAS [6]. Each of these systems isolates the core algorithms that perform language processing from system services such as storing of data, communication between components, and visualization of results. However, UIMA's emphasis on transferring UIM technologies to products has led to a richer architecture that allows integrating applications with a host of enterprise products (e.g., WebSphere Portal Server, Lotus Workplace) and a variety of middleware and platform options.
We (Temis) have built our new corporate product on top of UIMA. We made this decision one year ago now. The choice was mainly between using UIMA or doing it ourselves. We resisted the latter option! We did a quick survey of other frameworks (GATE ...) but UIMA was more appropriate for our need for a core framework platform. We liked its homogeneity, the quality of the code, the documentation, the quick evolution, the planned move to open source and to a commercially friendly license. Send me a private message if you want to have a talk about this. (pascal.coupet <at> temis.com).
Dear Ekaterina, I cannot answer your question directly, as I am not a UIMA or GATE wizard, but I can tell you why I chose UIMA rather than GATE and OpenNLP.
1. I use a finite state machine toolbox of my own, written in Java, but I did not want to close the door to other applications written in C or Perl, and from what I had read when I made my decision, only UIMA offered a clear and clean way to integrate C or Perl applications through its descriptors.
2. I know UIMA has been used by IBM in heavy industrial applications such as Business Insight.
3. I did not find any major differences in the documentation concerning the annotation scheme. In fact, for my own purposes, a list of labels with a start and an end position in the text was just fine.
4. I did not need any linguistic tools other than my own.
5. Last but not least, it was crucial for me to be able to integrate the unstructured-information part easily into Eclipse, and only UIMA offered an easy way to do so.
6. Besides, it is not too hard to integrate external applications into Eclipse, so if someday I need other tools, I know it will be easier to integrate them into Eclipse than into any other environment.
In short, UIMA and Eclipse are solid and complementary; in the long term, I believe they have better odds than GATE and OpenNLP, even if, in terms of implemented algorithms and programs, UIMA is poorer than GATE and OpenNLP.
Ekaterina Buyko wrote:
> Hi,
>
> I am looking for a comparison of UIMA and GATE systems.
>
> What does UIMA offer more or less as GATE does it? I am interested in the general contrast between the UIMA and GATE and in particular in the comparison of type systems and GATE annotation schemata. Can we convert all UIMA types to GATE types without any restrictions or does UIMA offers more features in implementation of annotation schemata as GATE?
>
> Thanks,
>
> Katja
--
Cordialement/Regards
Christian Mauceri
http://hermeneute.com/Christian
http://article.gmane.org/gmane.comp.apache.uima.general/348/match=perl
UIMA principles for semantic search
http://www-306.ibm.com/software/data/enterprise-search/omnifind-enterprise/
UIMA Lucene CAS Indexer (LuCAS) http://www.julielab.de/content/view/117/186/
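As its name says, LuCAS indexes CAS contents with Lucene. The underlying idea is an inverted index that maps each term to the documents containing it. It can be sketched in Python without Lucene or UIMA. In this sketch, regex tokens stand in for Token annotations, and `index_documents` and `search` are hypothetical names, not LuCAS's API.

```python
import re
from collections import defaultdict

def index_documents(docs):
    """Map each lower-cased token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for m in re.finditer(r"\w+", text):      # stand-in for Token annotations
            index[m.group().lower()].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents containing every query term (boolean AND)."""
    terms = [t.lower() for t in re.findall(r"\w+", query)]
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))      # copy so the index is not mutated
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {"d1": "UIMA analysis engines", "d2": "GATE processing resources", "d3": "UIMA and GATE"}
idx = index_documents(docs)
print(sorted(search(idx, "UIMA gate")))
```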
http://bionlp-uima.sourceforge.net/
D. Ferrucci and A. Lally, “UIMA: an architectural approach to unstructured information processing in the corporate research environment,” Natural Language Engineering 10, No. 3-4, 327-348 (2004). www.research.ibm.com/UIMA/
D. Ferrucci and A. Lally, “Building an example application with the Unstructured Information Management Architecture,” IBM Systems Journal 43, No. 3, 455-475 (2004). http://www.research.ibm.com/journal/sj43-3.html
T. Goetz and O. Suhre, “Design and implementation of the UIMA Common Analysis System,” IBM Systems Journal 43, No. 3, 476-489 (2004). http://www.research.ibm.com/journal/sj43-3.html
R. Mack, S. Mukherjea, A. Soffer, N. Uramoto, E. Brown, A. Coden, J. Cooper, A. Inokuchi, B. Iyer, Y. Mass, H. Matsuzawa, and L. V. Subramaniam, “Text analytics for life science using the Unstructured Information Management Architecture,” IBM Systems Journal 43, No. 3, 490-515 (2004). http://www.research.ibm.com/journal/sj43-3.html
N. Uramoto, H. Matsuzawa, T. Nagano, A. Murakami, H. Takeuchi, and K. Takeda, “A text-mining system for knowledge discovery from biomedical documents,” IBM Systems Journal 43, No. 3, 516-533 (2004). http://www.research.ibm.com/journal/sj43-3.html
S. Vaithyanathan, “Towards Declarative Information Extraction: The Almaden Story,” industrial keynote talk, Web Intelligence 2007. http://domino.research.ibm.com/comm/research_projects.nsf/pages/uima.projectUimaArchitectureFramework.html
A. Levas, E. Brown, J. W. Murdock, and D. Ferrucci, “The Semantic Analysis Workbench (SAW): Towards a Framework for Knowledge Gathering and Synthesis,” Proceedings of the International Conference on Intelligence Analysis, McLean, VA, May 2-6, 2005.
The Linguistic Annotation Workshop (The LAW), a merger of NLPXML 2007 and FLAC 2007, at ACL 2007, Prague, Czech Republic, June 28-29, 2007. http://www.ling.uni-potsdam.de/acl-lab/LAW-07.html
U. Hahn, E. Buyko, K. Tomanek, S. Piao, J. McNaught, Y. Tsuruoka, and S. Ananiadou, “An annotation type system for a data-driven NLP pipeline,” The LAW at ACL 2007: Proceedings of the Linguistic Annotation Workshop, 33-40, Prague, Czech Republic, June 28-29, 2007. Stroudsburg, PA: Association for Computational Linguistics (2007). http://www.ling.uni-potsdam.de/acl-lab/LAW-07.html
S. Piao, E. Buyko, Y. Tsuruoka, K. Tomanek, J. D. Kim, J. McNaught, U. Hahn, J. Su, and S. Ananiadou, “Bootstrep annotation scheme: Encoding information for text mining,” Corpus Linguistics 2007: Proceedings of the 4th Corpus Linguistics Conference, Birmingham, England, U.K., July 27-30, 2007.
To do
.xmi
input/output directory, .txt files
handles language and charset simply
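The to-do items above could start from a collection-reading helper like the Python sketch below. It reads .txt files from an input directory, handles language and charset simply, and names .xmi files for an output directory. It is not a UIMA collection reader. The function names, the UTF-8 default encoding and the single language code applied to every document are all assumptions.

```python
from pathlib import Path

def read_text_collection(input_dir, encoding="utf-8", language="fr"):
    """Hypothetical reader: yield (path, text, language) for every .txt file, in sorted order.

    The charset is passed explicitly, and one language code is attached to every document.
    """
    for path in sorted(Path(input_dir).glob("*.txt")):
        yield path, path.read_text(encoding=encoding), language

def output_path_for(txt_path, output_dir):
    """Hypothetical helper: name the output after the input file, with the .xmi extension."""
    return Path(output_dir) / (Path(txt_path).stem + ".xmi")
```

Keeping the encoding and language as explicit parameters matches the "handle them simply" goal. Per-document detection could be added later without changing the callers.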
As far as NLP proper is concerned, Carnegie Mellon University's Language Technologies Institute is hosting a UIMA Component Repository web site (http://uima.lti.cs.cmu.edu), where developers can post information about their analytics components and anyone can find out more about free and commercially available UIMA-compliant analytics. Additionally, free analytic tools that can work with UIMA include those from the General Architecture for Text Engineering (GATE - http://gate.ac.uk/) and OpenNLP (http://opennlp.sourceforge.net/) communities, as well as Jena University's Language & Information Engineering (JULIE) Lab (http://www.julielab.de). Commercial analytics are available from IBM, as well as from other software vendors such as Attensity, ClearForest, Temis and Nstein.
Besides IBM, several academic and industrial organisations use UIMA to develop analysers and UIM solutions.
Active user and developer communities, as the dedicated mailing lists attest.
There are still few universities and companies using UIMA.
Integration work that shows real interest in the platform: a two-way wrapper with GATE, wrappers for tools from the OpenNLP suite, Lucene, Weka…
Current activity centres on a major effort on the platform itself rather than on the bridges; a component may need adapting depending on the SDK version it was originally developed for; but, in step with the platform's development, an adapter for components developed for the IBM alpha version is still being developed.
including statistical and rule-based Natural Language Processing (NLP), Information Retrieval (IR), machine learning, and ontologies, semantic search, information extraction and text mining