NLTK is a great open source NLP package written in Python. It comes with an online book. I decided to try to embed IronPython under C# and run NLTK from there. Here are a few thoughts about the experience.
Problems with embedding IronPython and NLTK
- Some libraries that NLTK uses, e.g. zlib and numpy, are not installed in IronPython; you can mostly patch this up
- You need a good understanding of how embedded IronPython works
- The connection between Python and C# is not seamless
- Sending data between Python and C# takes work
- NLTK is pretty slow at starting up
- Doing large scale machine learning in NLTK is slow
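As an illustration of patching up a missing library, here is a minimal sketch of making zlib importable from the embedded Python code. It assumes that Jeff Hardy's IronPython.Zlib.dll (see References) sits somewhere the runtime can find it. The helper names add_dotnet_reference and ensure_zlib are made up for this example, and the claim that import zlib succeeds after the reference is added is an expectation, not something verified here.

```python
def add_dotnet_reference(assembly_name):
    """Try to load a .NET assembly by name; return True on success."""
    try:
        import clr  # available under IronPython, not under plain CPython
    except ImportError:
        return False
    try:
        clr.AddReference(assembly_name)
    except Exception:
        # The assembly was not found or could not be loaded.
        return False
    return True


def ensure_zlib():
    """Return True if zlib can be imported, loading the .NET port if needed."""
    try:
        import zlib  # works out of the box on CPython
        return True
    except ImportError:
        pass
    # Assumption: the assembly is named IronPython.Zlib and is on the load path.
    if not add_dotnet_reference("IronPython.Zlib"):
        return False
    try:
        import zlib
        return True
    except ImportError:
        return False
```

Calling ensure_zlib() before importing NLTK gives one place to report a clear error instead of a failure deep inside an NLTK import.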
C# and IronPython
IronPython is a very good implementation of Python, but in C# 3.0 (.NET 3.5) there is still a mismatch between C# and Python; this becomes an issue when you are dealing with a library as big as NLTK. The integration between IronPython and C# is going to improve with C# 4.0. How much remains to be seen.
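One way to keep the mismatch manageable is to pass only plain strings, for example JSON, between the two sides, so that neither side needs to understand the other's objects. The sketch below is illustrative only and is not code from this project: analyze_text, run and the regex tokenizer are hypothetical stand-ins for a real NLTK call.

```python
import json
import re

# Hypothetical stand-in for an NLTK tokenizer, so the sketch runs without NLTK:
# a token is either a run of word characters or a single punctuation mark.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def analyze_text(text):
    """Tokenize `text` and return the result as a JSON string."""
    tokens = _TOKEN_RE.findall(text)
    counts = {}
    for token in tokens:
        key = token.lower()
        counts[key] = counts.get(key, 0) + 1
    result = {"token_count": len(tokens), "tokens": tokens, "counts": counts}
    # sort_keys keeps the output stable, which makes it easier to test.
    return json.dumps(result, sort_keys=True)


def run(in_stream, out_stream):
    """Read all text from in_stream and write one JSON document to out_stream."""
    out_stream.write(analyze_text(in_stream.read()) + "\n")
```

The C# side then only has to hand over a string and parse a string. Used as a standalone worker, the script would end with run(sys.stdin, sys.stdout) after importing sys.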
To embed or not to embed
When is embedding IronPython and NLTK inside C# a good idea?
Separate processes for NLTK under CPython and C#
If your C# tasks and your NLP tasks are not interacting too much, it might be simpler to have a C# program call an NLP CPython program as an external process. E.g. you want to analyze the content of a Word document. You would open the Word document in C#, create a Python process, pipe the text into it, read the result back as JSON or XML, and display it in ASP, WPF or WinForms.
Small NLP tasks
There is a learning curve for both NLTK and embedded IronPython that slows you down when you start work.
Medium sized NLP projects
The setup cost is not an issue, so embedding IronPython and NLTK could work very well here.
Big NLP projects
The setup cost is not an issue, but at some point the mismatch between Python and C# will start to outweigh the advantages you get.
Prototyping in NLTK
Start writing your application in NLTK either under CPython or IronPython. This should improve development time substantially. You might find that your prototype is good enough and you do not need to port it to C#; or you will have a working program that you can port to C#.
References
- Post about running NLTK from IronPython
- Chapter 15 of IronPython in Action is about embedding IronPython in C# or VB.NET
- Source code examples from IronPython in Action
- Here is a short intro to embedding IronPython by Michael Foord
- I tried loading Jeff Hardy's IronPython.Zlib.dll using Assembly.LoadFile. That did not work, but I could add it with clr.AddReference from the embedded Python code
-Sami Badawi
Have you checked this .Net NLP platform:
http://www.proxem.com/Default.aspx?tabid=119
It's still in beta, but you can give it a go anyway.
Sami
I need step-by-step help with installing Python and integrating the NLTK library and IronPython.
I am working on a project to auto-summarize using key phrases, so I need a POS tagger for it. The platform I am working on is C# 4.0 on Visual Studio 2010.
Reply ASAP.
Hi, Can you please explain how you solved this problem? e.g. Where to copy zlib, etc
Thanks
It is summer, 2013. Have you done anything new with this using .NET 4.5? Is there an "integration cookbook" out there?
Hi Isamu,
No, I have not looked at this for a long time.