Tuesday, September 29, 2015

Practical Scala, Haskell and Category Theory

Functional programming has moved from academia to industry in the last few years. It is theoretical with a steep learning curve. I have worked with strongly typed functional programming for 4 years. I took the normal progression, first Scala then Haskell and ended with category theory.

What practical results does functional programming give me?

I typically do data mining, NLP and back end programming. How does functional programming help me with NLP, AI and math?

    Scala

    Scala is a complex language that can take quite some time to learn. For a couple of years I was unsure if it really improved my productivity compared to Java or Python.

    After 2 years my productivity in Scala went up. I find that Scala is an excellent choice for creating data mining pipelines because it is:
    • Fast
    • Stable
    • Has lot of quality libraries
    • Has a very advanced type system
    • Good DSL for Hadoop (Scalding)

    Natural Language Processing in Scala

    Before Scala I did NLP in Python. I used NLTK the Natural Language Toolkit for 3 years.

    NLTK vs. ScalaNLP


    NLTK

    • Easy to learn and very flexible
    • Gives you a lot of functionality out of the box
    • Very adaptable, handles a lot of different structured file formats

    What I did not like about NLTK was:
    • It had a very inefficient representation of a text features as a Dictionary
    • The file format readers were not producing exactly matching structures and this did not get caught by the type system
    • You have to jump between Python, NumPy and C or Fortran for low level work

    ScalaNLP

    ScalaNLP merged different Scala numeric and NLP libraries. It is a very active parent project of Breeze and Eric.

    ScalaNLP Breeze

    Breeze is a full featured, fast numeric library that uses the type system to great effect.
    • Linear algebra
    • Probability Distribution
    • Regression algorithms
    • You can drop down to the bottom level without having to program in C or Fortran

    ScalaNLP Eric

    Eric is the natural language processing part of ScalaNLP. It has become a competitive NLP library with many algorithms for several human languages:
    • Reader for text corpora
    • Tokenizer
    • Sentence splitter
    • Part-of-speech tagger
    • Named entity recognition
    • Statistical parser


    Video lecture by David Hall the Eric lead

    Machine Learning in Scala

    The most active open source Scala machine learning library is MLib which is part of the Spark project.
    Spark now has data frames like R and Pandas.
    It is easy to set up machine learning pipelines, do cross validation and optimization of hyper parameters.

    I did text classification and set it up in Spark MLib in only 100 lines of code. The result had satisfactory accuracy.

    AI Search Problem in Scala vs. in Lisp

    I loved Lisp when I learned it at the university. You could do all these cool Artificial Intelligence tree search problems. For many years I suffered from Lisp envy.

    Tree search works a little differently in Scala, let me illustrate by 2 examples.

    Example 1: Simple Tree Search for Bird Flu

    You have an input HTML page and parsed into a DOM tree. Look for the word bird and flu in a paragraph that is not part of the advertisement section.
    I can visualize what a search tree for this would look like.

    Example2: Realistic Bird Flu Medication Search

    The problems I deal with at work are often more complex:
    Given a list of medical websites, search for HTML pages with bird flu and doctor recommendations for medications to take. Then do a secondary web search to see if the doctors are credible.

    Parts of Algorithm for Example 2

    This is a composite search problem:
    • Search HTML pages for the words bird and flu close to each other in DOM structure
    • Search individual match to ensure this is not in advertisement section
    • Search for Dr names
    • Find what Dr name candidates could be matched up with the section about bird flu
    • Web search for Dr to determine popularity and credentials
    Visualizing this as a tree search is hard for me.

    Lazy Streams to the Rescue

    Implementing solutions to the Example 2 bird flu medication problem takes:
    • Feature extractors
    • Machine learning on top of that
    • Correlation of a disease and a doctor
    This lends itself well to using Scala's lazy streams. Scala makes it easy to use the lazy streams and the type system gives a lot of support, especially when plugging together various streams.

    Outline of Lazy Streams Algorithm for Example 2

    1. Stream of all web pages
    2. Stream of tokenized trees
    3. Steam of potential text matches e.g. avian influenza, H5N1
    4. Filter Stream 3 if it is an advertisement part of the DOM tree, (no Dr Mom)
    5. Stream of potential Dr text matches from Stream 2
    6. Stream of good Dr names. Detected with machine learning
    7. Merge Stream 3 and Stream 6 to get bird flu and doctor name combination
    8. Web search stream for the doctor names from Stream 7 for ranking of result

    AI Search Problem in Lisp

    Tree search is Lisp's natural domain. Lisp could certainly handle Example 2 the more complex bird flu medication search. Even using a similar lazy stream algorithm.

    Additionally, Lisp has the ability to do very advanced meta programming:
    Rules that create other rules or work on multiple levels. Things I do not know how to do in Scala.

    Lisp gives you a lot of power to handle open ended problems and it is great for knowledge representation. When you try to do the same in Scala you end up either writing Lisp or Prolog style code or using RDF or graph databases.

    Some Scala Technical Details

    Here are a few observations on working with Scala.

    Scala's Low Rent Monads

    Monads are a general way to compose functionality. They are a very important organizing principle in Scala. Except is not really monads it is just syntactic sugar.

    You give us a map and a flatMap function and we don't ask any questions.

    Due to the organization of the standard library and subtyping you can even combine an Option and a List, which should strictly not be possible. Still this give you a lot of power.
    I do use Scala monads with no shame.

    Akka and Concurrency

    Scala's monads make it convenient to work with two concurrency constructs: Futures and Promises.

    Akka is a library implementing an Erlang style actor model in Scala.
    I have used Akka for years and it is a good framework to organize a lot of concurrent computation that requires communication.

    The type system does not help you with the creation of parent actors so you are not sure that they exist. This makes it hard to write unit tests for actors.

    Akka is good but the whole Erlang actor idea is rather low level.

    Scalaz and Cake Patterns

    Scalaz is a very impressive library that implements big parts of Haskell’s standard library in Scala.
    Scalaz’s monad typeclass is invariant, which fixes the violations allowed in the standard library.

    Cake Patterns allows for recursive modules, which make dependency injection easier. This is used in the Scala compiler.

    Both of these libraries got me into trouble as a beginner Scala programmer. I would not recommend them for beginners.

    How do you determine if you should use this heavy artillery?
    Once you feel that you are spending a lot of time repeating code due to insufficient abstraction you can consider it. Otherwise:

    Keep It Simple.

    Dependent Types and Category Theory in Scala

    There are many new theoretical developments in Scala:
    • Dotty - a new compiler built on DOT a new type-theoretic foundation of Scala
    • Cats library - a simplified version of Scalaz implementing concepts from category theory
    • Shapeless library for dependent types. I am using this in my production code since Shapeless is used in Slick and Parboiled2


    Haskell

    Haskell is a research language from 1990. In 2008 its popularity started to rise. You can now find real jobs working in Haskell. Most publicized is that Facebook wrote their spam filter in Haskell.



    Why is Haskell so Hard to Learn?

    It took me around 2 years to learn to program in Haskell, which is exceptionally long. I have spoken to other people at Haskell meetups who have told me the same.

    Mathematical Precision

    Python effectively uses the Pareto principle: 20% of the features will give give you 80% of the functionality; Python has very few structures in the core language and reuses them.

    Haskell uses many more constructs. E.g. exception handling can be done in many different ways each with small advantages. You can chose the optimal exception monad transformer that has least dependencies for your problem.

    Cabal Hell and Stack

    Haskell is a fast developing language with a very deep stack of interdependent libraries.
    When I started programming in it, it was hard to set up even a simple project since you could not get the libraries to compile with versions that were compatible with each other.
    The build system is called cabal, and this phenomenon is called Cabal Hell.
    If you have been reading mailing list there are a lot of references to Cabal Hell.

    The Haskell consulting company FPComplete first released Stackage a curated list of libraries that works together. In 2015 they went further and released Stack which is a system that installs different versions of Haskell to work with Stackage versions.

    This has really made Haskell development easier.

    Dependently Typed Constructs in Haskell

    Dependently typed languages are the next step after Haskell. In normal languages the type system and the objects of the language are different systems. In dependently typed languages the objects and the types inhabits the same space. This gives more safety and greater flexibility but also makes it harder to program in.
    The type checker has to be replaced with a theorem-prover.

    You have to prove that the program is correct, and the proofs are part of the program and first order constructs.

    Haskell has a lot of activities towards emulating dependently typed languages.
    The next version of the Haskell compiler GHC 8 is making a big push for more uniform handling of types and kinds.

    Practical Haskell

    Haskell is a pioneering language and still introducing new ideas. It has clearly shown that it is production ready by being able to handle Facebook's spam filter.

    Aesthetically I prefer terse programming and like to use Haskell for non work related programming.

    There is a great Haskell community in New York City. Haskell feels like a subculture where Scala has now become the establishment. That said I do not feel Haskell envy when I program in Scala on a daily basis.

    Learning Haskell is a little like running a marathon. You get in good mental shape.


    Category Theory

    Category theory is often called Abstract Nonsense both by practitioners and detractors.
    It is a very abstract field of mathematics and its utility is pretty controversial.

    It abstracts internal properties of objects away and instead looks at relations between objects.
    Categories require very little structure and so there are categories everywhere.  Many mathematical objects can be turned into categories in many different ways. This high level of abstraction makes it hard to learn.

    There is a category Hask of Haskell types and functions.


    Steve Awodey lecture series on category theory

    Vector Spaces Described With a Few String Diagrams

    To give a glimpse of the power of category theory: In this video lecture John Baez shows how you can express the axioms of finite dimensional vector spaces with a few string diagrams.

    Video lecture by John Baez

    With 2 more simple operations you can extend it to control theory.

    Quest For a Solid Foundation of Mathematics

    At the university I embarked on a long quest for a solid scientific foundation. Fist I studied chemistry and physics. Quantum physics drove me to studying mathematics for more clarity. For higher clarity and a solid foundation I studied mathematical logic.
    I did not find clarity in mathematical logic. Instead I found:

    The Dirty Secret About the Foundation of Mathematics

    My next stop was the normal foundation for modern mathematics: ZFC, Zermelo–Fraenkel set theory with the axiom of choice.

    This was even less intuitive than logic. There were more non intuitive axioms. This was like learning computer science from a reference of x86 assembly: A big random mess. There were also an uncertain connection between the axioms of logic and the axioms set theory.

    ZFC and first order logic makes 2 strong assumptions:
    1. Law of Excluded Middle
    2. Axiom of Choice
    Law of Excluded Middle is saying that every mathematical sentence is either true or false. This is a very strong assumption that was not motivated at all. And it certainly does not extend to other sentences.

    Constructive Mathematics / Intuitionistic Logic

    There was actually a debate about what should be a foundation for mathematics at the beginning of the 20th century.
    A competing foundation of mathematics was Brouwer's constructive mathematics. In order to prove something about a mathematical object you need to be able to construct it and via the Curry-Howard correspondence this is equivalent to writing a program constructing a particular type.

    This was barely mentioned at the university. I had one professor who once briefly said that there was this other thing called intuitionistic logic, but it was so much harder to prove things in it, why should we bother.

    Recently constructive mathematics have had a revival with Homotopy Type Theory. HoTT is based on category theory, type theory, homotopy theory and intuitionistic logic.
    This holds a lot of promise and is another reason why category theory is practical for me.


    Robert Harper's lectures on type theory
    end with an introduction to HoTT

    Future of Intelligent Software

    There are roughly 2 main approaches to artificial intelligence
    • Top down or symbolic techniques e.g. logic or Lisp
    • Bottom up or machine learning techniques e.g. neural networks
    The symbolic approach was favored for a long time but did not deliver on its promise. Now machine learning is everywhere and has created many advances in modern software.

    To me it seems obvious that more intelligent software needs both. But combining them has been an elusive goal since they are very different by nature.

    Databases created a revolution in data management. They reduce data retrieval to simplified first order logic, you just write a logic expression for what you want.

    Dependently typed language is the level of abstraction where programs and logic merge.
    I think that intelligent software of the future will be a combination of dependently typed languages and machine learning.
    A promising approach is: Discovery of Bayesian network models from data. This finds causality in a form that can be combined with logic reasoning.

    Conclusion

    I invested a lot of time in statically typed functional languages and was not sure how much this would help me in my daily work. It helped a lot, especially with reuse and stability.

    Scala has made it substantially easier to create production quality software.

    MLib and ScalaNLP are 2 popular open source projects. They show me that Scala is a good environment for NLP and machine learning.

    I am only starting to see an outline of category theory, dependently typed languages and HoTT. It looks like computer science and mathematics are not mainly done, but we still have some big changes ahead of us.