Languages and Logic: mathematics

Functional programming has moved from academia to industry in the last few years. It is theoretical with a steep learning curve. I have worked with strongly typed functional programming for 4 years. I took the normal progression, first Scala then Haskell and ended with category theory.

What practical results does functional programming give me?

I typically do data mining, NLP and back end programming. How does functional programming help me with NLP, AI and math?

Scala

Scala is a complex language that can take quite some time to learn. For a couple of years I was unsure if it really improved my productivity compared to Java or Python.

After 2 years my productivity in Scala went up. I find that Scala is an excellent choice for creating data mining pipelines because it is:

Fast
Stable
Has lot of quality libraries
Has a very advanced type system
Good DSL for Hadoop (Scalding)

Natural Language Processing in Scala

Before Scala I did NLP in Python. I used NLTK the Natural Language Toolkit for 3 years.

NLTK vs. ScalaNLP

NLTK

Easy to learn and very flexible
Gives you a lot of functionality out of the box
Very adaptable, handles a lot of different structured file formats

What I did not like about NLTK was:

It had a very inefficient representation of a text features as a Dictionary
The file format readers were not producing exactly matching structures and this did not get caught by the type system
You have to jump between Python, NumPy and C or Fortran for low level work

ScalaNLP

ScalaNLP merged different Scala numeric and NLP libraries. It is a very active parent project of Breeze and Eric.

ScalaNLP Breeze

Breeze is a full featured, fast numeric library that uses the type system to great effect.

Linear algebra
Probability Distribution
Regression algorithms
You can drop down to the bottom level without having to program in C or Fortran

ScalaNLP Eric

Eric is the natural language processing part of ScalaNLP. It has become a competitive NLP library with many algorithms for several human languages:

Reader for text corpora
Tokenizer
Sentence splitter
Part-of-speech tagger
Named entity recognition
Statistical parser

Video lecture by David Hall the Eric lead

Machine Learning in Scala

The most active open source Scala machine learning library is MLib which is part of the Spark project.
Spark now has data frames like R and Pandas.
It is easy to set up machine learning pipelines, do cross validation and optimization of hyper parameters.

I did text classification and set it up in Spark MLib in only 100 lines of code. The result had satisfactory accuracy.

AI Search Problem in Scala vs. in Lisp

I loved Lisp when I learned it at the university. You could do all these cool Artificial Intelligence tree search problems. For many years I suffered from Lisp envy.

Tree search works a little differently in Scala, let me illustrate by 2 examples.

Example 1: Simple Tree Search for Bird Flu

You have an input HTML page and parsed into a DOM tree. Look for the word bird and flu in a paragraph that is not part of the advertisement section.
I can visualize what a search tree for this would look like.

Example2: Realistic Bird Flu Medication Search

The problems I deal with at work are often more complex:
Given a list of medical websites, search for HTML pages with bird flu and doctor recommendations for medications to take. Then do a secondary web search to see if the doctors are credible.

Parts of Algorithm for Example 2

This is a composite search problem:

Search HTML pages for the words bird and flu close to each other in DOM structure
Search individual match to ensure this is not in advertisement section
Search for Dr names
Find what Dr name candidates could be matched up with the section about bird flu
Web search for Dr to determine popularity and credentials

Visualizing this as a tree search is hard for me.

Lazy Streams to the Rescue

Implementing solutions to the Example 2 bird flu medication problem takes:

Feature extractors
Machine learning on top of that
Correlation of a disease and a doctor

This lends itself well to using Scala's lazy streams. Scala makes it easy to use the lazy streams and the type system gives a lot of support, especially when plugging together various streams.

Outline of Lazy Streams Algorithm for Example 2

Stream of all web pages
Stream of tokenized trees
Steam of potential text matches e.g. avian influenza, H5N1
Filter Stream 3 if it is an advertisement part of the DOM tree, (no Dr Mom)
Stream of potential Dr text matches from Stream 2
Stream of good Dr names. Detected with machine learning
Merge Stream 3 and Stream 6 to get bird flu and doctor name combination
Web search stream for the doctor names from Stream 7 for ranking of result

AI Search Problem in Lisp

Tree search is Lisp's natural domain. Lisp could certainly handle Example 2 the more complex bird flu medication search. Even using a similar lazy stream algorithm.

Additionally, Lisp has the ability to do very advanced meta programming:
Rules that create other rules or work on multiple levels. Things I do not know how to do in Scala.

Lisp gives you a lot of power to handle open ended problems and it is great for knowledge representation. When you try to do the same in Scala you end up either writing Lisp or Prolog style code or using RDF or graph databases.

Some Scala Technical Details

Here are a few observations on working with Scala.

Scala's Low Rent Monads

Monads are a general way to compose functionality. They are a very important organizing principle in Scala. Except is not really monads it is just syntactic sugar.

You give us a map and a flatMap function and we don't ask any questions.

Due to the organization of the standard library and subtyping you can even combine an Option and a List, which should strictly not be possible. Still this give you a lot of power.
I do use Scala monads with no shame.

Akka and Concurrency

Scala's monads make it convenient to work with two concurrency constructs: Futures and Promises.

Akka is a library implementing an Erlang style actor model in Scala.

I have used Akka for years and it is a good framework to organize a lot of concurrent computation that requires communication.

The type system does not help you with the creation of parent actors so you are not sure that they exist. This makes it hard to write unit tests for actors.

Akka is good but the whole Erlang actor idea is rather low level.

Scalaz and Cake Patterns

Scalaz is a very impressive library that implements big parts of Haskell’s standard library in Scala.
Scalaz’s monad typeclass is invariant, which fixes the violations allowed in the standard library.

Cake Patterns allows for recursive modules, which make dependency injection easier. This is used in the Scala compiler.

Both of these libraries got me into trouble as a beginner Scala programmer. I would not recommend them for beginners.

How do you determine if you should use this heavy artillery?
Once you feel that you are spending a lot of time repeating code due to insufficient abstraction you can consider it. Otherwise:

Keep It Simple.

Dependent Types and Category Theory in Scala

There are many new theoretical developments in Scala:

Dotty - a new compiler built on DOT a new type-theoretic foundation of Scala
Cats library - a simplified version of Scalaz implementing concepts from category theory
Shapeless library for dependent types. I am using this in my production code since Shapeless is used in Slick and Parboiled2

Haskell

Haskell is a research language from 1990. In 2008 its popularity started to rise. You can now find real jobs working in Haskell. Most publicized is that Facebook wrote their spam filter in Haskell.

Why is Haskell so Hard to Learn?

It took me around 2 years to learn to program in Haskell, which is exceptionally long. I have spoken to other people at Haskell meetups who have told me the same.

Mathematical Precision

Python effectively uses the Pareto principle: 20% of the features will give give you 80% of the functionality; Python has very few structures in the core language and reuses them.

Haskell uses many more constructs. E.g. exception handling can be done in many different ways each with small advantages. You can chose the optimal exception monad transformer that has least dependencies for your problem.

Cabal Hell and Stack

Haskell is a fast developing language with a very deep stack of interdependent libraries.
When I started programming in it, it was hard to set up even a simple project since you could not get the libraries to compile with versions that were compatible with each other.
The build system is called cabal, and this phenomenon is called Cabal Hell.
If you have been reading mailing list there are a lot of references to Cabal Hell.

The Haskell consulting company FPComplete first released Stackage a curated list of libraries that works together. In 2015 they went further and released Stack which is a system that installs different versions of Haskell to work with Stackage versions.

This has really made Haskell development easier.

Dependently Typed Constructs in Haskell

Dependently typed languages are the next step after Haskell. In normal languages the type system and the objects of the language are different systems. In dependently typed languages the objects and the types inhabits the same space. This gives more safety and greater flexibility but also makes it harder to program in.

The type checker has to be replaced with a theorem-prover.

You have to prove that the program is correct, and the proofs are part of the program and first order constructs.

Haskell has a lot of activities towards emulating dependently typed languages.
The next version of the Haskell compiler GHC 8 is making a big push for more uniform handling of types and kinds.

Practical Haskell

Haskell is a pioneering language and still introducing new ideas. It has clearly shown that it is production ready by being able to handle Facebook's spam filter.

Aesthetically I prefer terse programming and like to use Haskell for non work related programming.

There is a great Haskell community in New York City. Haskell feels like a subculture where Scala has now become the establishment. That said I do not feel Haskell envy when I program in Scala on a daily basis.

Learning Haskell is a little like running a marathon. You get in good mental shape.

Category Theory

Category theory is often called Abstract Nonsense both by practitioners and detractors.
It is a very abstract field of mathematics and its utility is pretty controversial.

It abstracts internal properties of objects away and instead looks at relations between objects.
Categories require very little structure and so there are categories everywhere. Many mathematical objects can be turned into categories in many different ways. This high level of abstraction makes it hard to learn.

There is a category Hask of Haskell types and functions.

Steve Awodey lecture series on category theory

Vector Spaces Described With a Few String Diagrams

To give a glimpse of the power of category theory: In this video lecture John Baez shows how you can express the axioms of finite dimensional vector spaces with a few string diagrams.

Video lecture by John Baez

With 2 more simple operations you can extend it to control theory.

Quest For a Solid Foundation of Mathematics

At the university I embarked on a long quest for a solid scientific foundation. Fist I studied chemistry and physics. Quantum physics drove me to studying mathematics for more clarity. For higher clarity and a solid foundation I studied mathematical logic.
I did not find clarity in mathematical logic. Instead I found:

Some random badly motivated axioms and inference rules
And Godel's nebulous incompleteness theorem

The Dirty Secret About the Foundation of Mathematics

My next stop was the normal foundation for modern mathematics: ZFC, Zermelo–Fraenkel set theory with the axiom of choice.

This was even less intuitive than logic. There were more non intuitive axioms. This was like learning computer science from a reference of x86 assembly: A big random mess. There were also an uncertain connection between the axioms of logic and the axioms set theory.

ZFC and first order logic makes 2 strong assumptions:

Law of Excluded Middle
Axiom of Choice

Law of Excluded Middle is saying that every mathematical sentence is either true or false. This is a very strong assumption that was not motivated at all. And it certainly does not extend to other sentences.

Constructive Mathematics / Intuitionistic Logic

There was actually a debate about what should be a foundation for mathematics at the beginning of the 20th century.
A competing foundation of mathematics was Brouwer's constructive mathematics. In order to prove something about a mathematical object you need to be able to construct it and via the Curry-Howard correspondence this is equivalent to writing a program constructing a particular type.

This was barely mentioned at the university. I had one professor who once briefly said that there was this other thing called intuitionistic logic, but it was so much harder to prove things in it, why should we bother.

Recently constructive mathematics have had a revival with Homotopy Type Theory. HoTT is based on category theory, type theory, homotopy theory and intuitionistic logic.
This holds a lot of promise and is another reason why category theory is practical for me.

Robert Harper's lectures on type theory
end with an introduction to HoTT

Future of Intelligent Software

There are roughly 2 main approaches to artificial intelligence

Top down or symbolic techniques e.g. logic or Lisp
Bottom up or machine learning techniques e.g. neural networks

The symbolic approach was favored for a long time but did not deliver on its promise. Now machine learning is everywhere and has created many advances in modern software.

To me it seems obvious that more intelligent software needs both. But combining them has been an elusive goal since they are very different by nature.

Databases created a revolution in data management. They reduce data retrieval to simplified first order logic, you just write a logic expression for what you want.

Dependently typed language is the level of abstraction where programs and logic merge.
I think that intelligent software of the future will be a combination of dependently typed languages and machine learning.
A promising approach is: Discovery of Bayesian network models from data. This finds causality in a form that can be combined with logic reasoning.

Conclusion

I invested a lot of time in statically typed functional languages and was not sure how much this would help me in my daily work. It helped a lot, especially with reuse and stability.

Scala has made it substantially easier to create production quality software.

MLib and ScalaNLP are 2 popular open source projects. They show me that Scala is a good environment for NLP and machine learning.

I am only starting to see an outline of category theory, dependently typed languages and HoTT. It looks like computer science and mathematics are not mainly done, but we still have some big changes ahead of us.

ShapeLogic 0.9 contains new functional and declarative constructs

Lazy streams
Simplified and expanded lazy calculations

The functional and declarative construct of ShapeLogic can be used independently of the image processing code. It only requires a 200 KB jar file to use this in other application, and there is no dependency of third party jars.

ShapeLogic 0.9 letter recognition example is still using a different system for declarative programming. More on that later in this post.

To test lazy streams in pure form I tried them on the 10 first mathematical problems in Project Euler. I think that ShapeLogic streams provided for simple solutions to the mathematical problems.
However they are more verbose that say solutions in the Scala language:
http://scala-blogs.org/2007/12/project-euler-fun-in-scala.html

Solutions to a few of the Project Euler mathematical problems

Project Euler is a list of 178, mathematical problems, that can be solved by computers.

1 Add all the natural numbers below 1000 that are multiples of 3 or 5

NaturalNumberStream naturalNumberStream = new NaturalNumberStream(1,999);
ListFilterStream<Integer> filter = new BaseListFilterStream<Integer>(naturalNumberStream) {
public boolean evaluate(Integer object) {return object % 3 == 0 || object % 5 == 0;}
};
SumAccumulator accumulator = new SumAccumulator(filter);
System.out.println(accumulator.getValue());

2 Find the sum of all the even-valued terms in the Fibonacci sequence which do not exceed one million

BaseListStream1<Object,Integer>fibonacci = new BaseListStream1<Object,Integer>(){
{ _list.add(1); _list.add(2);}
public Integer invoke(Object input, int index) {return get(index-2) + get(index-1);}
};
ListFilterStream<Integer> filter = new BaseListFilterStream<Integer>(fibonacci) {
public boolean evaluate(Integer object) { return object % 2 == 0; }
};
SumAccumulator accumulator = new SumAccumulator(filter) {
{_inputElement = 0;}
public boolean hasNext(){ return _inputElement <= theNumber; }
};
System.out.println(accumulator.getValue());

Comparison between streams and old declarative constructs

In ShapeLogic 0.8 a cornerstone in declarative programming was a goal driven system of tasks with sub tasks. The letter recognition example reads in a rule database and translated it into goals/tasks with sub tasks. Somewhat like the programming language Prolog.
In case of uncertainty the different choices would live in an artificial intelligence choice tree that is tied to the tree of tasks and sub tasks.

Integrating old declarative constructs with lazy streams

Based on what is needed in the current letter match example. It seems like the stream based approach is simpler and able to handle all the problem that the goal driven approach could.
I expect the stream approach to supersede the goal approach.
The rule database will be read and translated to streams. So the rule for the letter A would be translated into a filter that could filter a stream of polygons into a stream of polygons that represent the letter A.

Also currently the rules now are using JEXL to translate a text rule into something executable.
Example:
Rule for letter A: polygon.holeCount == 1.
In ShapeLogic 1.0, you should be able to use Java 6 Scripting, JSR 223, to be able to define these rules, in one of the 25 supported scripting languages. In ShapeLogic 0.9: Groovy, JRuby, and JavaScript was tested with streams, but not with the rule databases.

Using streams for concurrent programming

Streams can also support parallel or concurrent programming, which is important with the CPU intensive operations in image processing and computer vision. Especially with the advent of cheap multi processor machines.

Example: Find polygons in a stack of images

You define a lazy data stream for this and set a stream property
randomAccess = true
This indicates that individual elements can be calculated independently. The factory creating the stream could create a parallel version of the stream and assign each operation its own thread. Note that the result would be a stream of polygons for each image.

-Sami Badawi

Tuesday, September 29, 2015

Practical Scala, Haskell and Category Theory