
Tuesday, September 29, 2015

Practical Scala, Haskell and Category Theory

Functional programming has moved from academia to industry in the last few years. It is theoretical, with a steep learning curve. I have worked with strongly typed functional programming for 4 years. I took the normal progression: first Scala, then Haskell, and ended with category theory.

What practical results does functional programming give me?

I typically do data mining, NLP and back end programming. How does functional programming help me with NLP, AI and math?

    Scala

    Scala is a complex language that can take quite some time to learn. For a couple of years I was unsure if it really improved my productivity compared to Java or Python.

    After 2 years my productivity in Scala went up. I find that Scala is an excellent choice for creating data mining pipelines because it:
    • Is fast
    • Is stable
    • Has a lot of quality libraries
    • Has a very advanced type system
    • Has a good DSL for Hadoop (Scalding)

    Natural Language Processing in Scala

    Before Scala I did NLP in Python. I used NLTK, the Natural Language Toolkit, for 3 years.

    NLTK vs. ScalaNLP


    NLTK

    What I liked about NLTK:
    • Easy to learn and very flexible
    • Gives you a lot of functionality out of the box
    • Very adaptable, handles a lot of different structured file formats

    What I did not like about NLTK:
    • It had a very inefficient representation of text features, as a dictionary
    • The file format readers did not produce exactly matching structures, and this did not get caught by the type system
    • You have to jump between Python, NumPy and C or Fortran for low-level work

    ScalaNLP

    ScalaNLP merged several Scala numeric and NLP libraries. It is a very active umbrella project for Breeze and Epic.

    ScalaNLP Breeze

    Breeze is a full-featured, fast numeric library that uses the type system to great effect:
    • Linear algebra
    • Probability distributions
    • Regression algorithms
    • You can drop down to the low-level routines without having to program in C or Fortran
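
    Here is a minimal sketch of what working with Breeze looks like, assuming the breeze.linalg API; the vectors and matrices are made-up examples:

        import breeze.linalg.{DenseMatrix, DenseVector, sum}

        val v = DenseVector(1.0, 2.0, 3.0)
        val m = DenseMatrix(
          (1.0, 0.0, 0.0),
          (0.0, 2.0, 0.0),
          (0.0, 0.0, 3.0))

        val w: DenseVector[Double] = m * v      // matrix-vector product, shape carried in the types
        val s: Double              = sum(w)     // 14.0
        val d: Double              = v dot v    // inner product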

    ScalaNLP Epic

    Epic is the natural language processing part of ScalaNLP. It has become a competitive NLP library with many algorithms for several human languages:
    • Readers for text corpora
    • Tokenizer
    • Sentence splitter
    • Part-of-speech tagger
    • Named entity recognition
    • Statistical parser


    Video lecture by David Hall, the Epic lead

    Machine Learning in Scala

    The most active open source Scala machine learning library is MLlib, which is part of the Spark project.
    Spark now has data frames like R and Pandas.
    It is easy to set up machine learning pipelines, do cross-validation and optimize hyperparameters.

    I did text classification and set it up in Spark MLlib in only 100 lines of code. The result had satisfactory accuracy.
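
    In outline, such a text classification pipeline in Spark MLlib looks something like the sketch below. It assumes DataFrames named trainingDf and testDf with text and label columns; the column names and parameter values are made up for illustration:

        import org.apache.spark.ml.Pipeline
        import org.apache.spark.ml.classification.LogisticRegression
        import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
        import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
        import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

        // Feature extraction and model as pipeline stages.
        val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
        val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
        val lr        = new LogisticRegression().setMaxIter(10)
        val pipeline  = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

        // Hyperparameter grid and cross-validation.
        val grid = new ParamGridBuilder()
          .addGrid(hashingTF.numFeatures, Array(1000, 10000))
          .addGrid(lr.regParam, Array(0.01, 0.1))
          .build()
        val cv = new CrossValidator()
          .setEstimator(pipeline)
          .setEvaluator(new BinaryClassificationEvaluator())
          .setEstimatorParamMaps(grid)
          .setNumFolds(3)

        val model       = cv.fit(trainingDf)        // trainingDf: assumed DataFrame with "text" and "label"
        val predictions = model.transform(testDf)   // adds a "prediction" column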

    AI Search Problem in Scala vs. in Lisp

    I loved Lisp when I learned it at the university. You could do all these cool Artificial Intelligence tree search problems. For many years I suffered from Lisp envy.

    Tree search works a little differently in Scala; let me illustrate with two examples.

    Example 1: Simple Tree Search for Bird Flu

    You have an input HTML page that has been parsed into a DOM tree. Look for the words bird and flu in a paragraph that is not part of the advertisement section.
    I can visualize what a search tree for this would look like.
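
    A minimal sketch of that search in plain Scala, over a made-up DOM representation, could look like this:

        // Toy DOM: element nodes with a tag, a CSS class and children, plus text nodes.
        sealed trait Node
        case class Element(tag: String, cssClass: String, children: List[Node]) extends Node
        case class Text(content: String) extends Node

        // Depth-first tree search: collect paragraphs mentioning both words,
        // pruning any subtree marked as an advertisement.
        def findBirdFlu(node: Node): List[String] = node match {
          case Element(_, "advertisement", _) => Nil
          case Element("p", _, children) =>
            val text = children.collect { case Text(t) => t }.mkString(" ")
            if (text.toLowerCase.contains("bird") && text.toLowerCase.contains("flu")) List(text)
            else Nil
          case Element(_, _, children) => children.flatMap(findBirdFlu)
          case Text(_) => Nil
        }

        val page = Element("html", "", List(
          Element("div", "advertisement",
            List(Element("p", "", List(Text("Bird flu cure, buy now!"))))),
          Element("div", "content",
            List(Element("p", "", List(Text("Bird flu was reported this week.")))))))

        findBirdFlu(page)   // List("Bird flu was reported this week.")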

    Example 2: Realistic Bird Flu Medication Search

    The problems I deal with at work are often more complex:
    Given a list of medical websites, search for HTML pages with bird flu and doctor recommendations for medications to take. Then do a secondary web search to see if the doctors are credible.

    Parts of Algorithm for Example 2

    This is a composite search problem:
    • Search HTML pages for the words bird and flu close to each other in the DOM structure
    • Check each individual match to ensure it is not in an advertisement section
    • Search for doctor names
    • Find which doctor name candidates can be matched up with the section about bird flu
    • Do a web search for each doctor to determine popularity and credentials
    Visualizing this as a tree search is hard for me.

    Lazy Streams to the Rescue

    Implementing solutions to the Example 2 bird flu medication problem takes:
    • Feature extractors
    • Machine learning on top of that
    • Correlation of a disease and a doctor
    This lends itself well to Scala's lazy streams. Scala makes lazy streams easy to use, and the type system gives a lot of support, especially when plugging various streams together. A code sketch follows the outline below.

    Outline of Lazy Streams Algorithm for Example 2

    1. Stream of all web pages
    2. Stream of tokenized trees
    3. Stream of potential text matches, e.g. avian influenza, H5N1
    4. Filter Stream 3, dropping matches in an advertisement part of the DOM tree (no Dr. Mom)
    5. Stream of potential doctor text matches from Stream 2
    6. Stream of good doctor names, detected with machine learning
    7. Merge Stream 4 and Stream 6 to get bird flu and doctor name combinations
    8. Web search stream for the doctor names from Stream 7, for ranking of results
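
    A compressed sketch of this outline with Scala's lazy Stream follows; the fetcher, the advertisement check and the doctor-name detector are stand-ins for the real feature extractors and machine learning models:

        case class Page(url: String, paragraphs: List[String])
        case class Candidate(url: String, paragraph: String, doctor: String)

        // Stand-ins for the real fetcher and classifiers.
        def fetch(url: String): Page =
          Page(url, List("Dr. Alice recommends rest and fluids for bird flu."))
        def isAdvertisement(paragraph: String): Boolean = paragraph.contains("buy now")
        def doctorNames(paragraph: String): List[String] =
          """Dr\. \w+""".r.findAllIn(paragraph).toList

        val urls: Stream[String] = Stream("http://example.com/avian-influenza")   // Stream 1
        val pages: Stream[Page]  = urls.map(fetch)                                // Stream 2

        // Streams 3 to 7, plugged together in one for-comprehension.
        val candidates: Stream[Candidate] = for {
          page      <- pages
          paragraph <- page.paragraphs.toStream
          if paragraph.toLowerCase.contains("bird flu")    // Stream 3: text matches
          if !isAdvertisement(paragraph)                   // Stream 4: drop advertisements
          doctor    <- doctorNames(paragraph).toStream     // Streams 5-7: doctor names
        } yield Candidate(page.url, paragraph, doctor)

        // Stream is lazy in its tail, so pages are only fetched and parsed as results are forced.
        candidates.take(10).toList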

    AI Search Problem in Lisp

    Tree search is Lisp's natural domain. Lisp could certainly handle Example 2, the more complex bird flu medication search, even using a similar lazy stream algorithm.

    Additionally, Lisp has the ability to do very advanced metaprogramming:
    rules that create other rules or work on multiple levels, things I do not know how to do in Scala.

    Lisp gives you a lot of power to handle open-ended problems and it is great for knowledge representation. When you try to do the same in Scala you end up either writing Lisp- or Prolog-style code or using RDF or graph databases.

    Some Scala Technical Details

    Here are a few observations on working with Scala.

    Scala's Low Rent Monads

    Monads are a general way to compose functionality. They are a very important organizing principle in Scala. Except they are not really monads; it is just syntactic sugar.

    You give us a map and a flatMap function and we don't ask any questions.

    Due to the organization of the standard library and subtyping you can even combine an Option and a List, which should strictly not be possible. Still, this gives you a lot of power.
    I use Scala monads without shame.
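
    For example, a for-comprehension is just sugar over map and flatMap, which is also why a List and an Option can be mixed in one comprehension:

        val words  = List("bird", "flu")
        val suffix = Option("virus")   // an Option, not a List

        // Desugars to roughly: words.flatMap(w => suffix.map(s => s"$w $s"))
        // The Option is implicitly widened to an Iterable, which is why mixing
        // List and Option compiles even though it is not lawful monad composition.
        val phrases = for {
          w <- words
          s <- suffix
        } yield s"$w $s"
        // phrases == List("bird virus", "flu virus")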

    Akka and Concurrency

    Scala's monads make it convenient to work with two concurrency constructs: Futures and Promises.
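
    A small sketch, with stubbed-out lookup functions, of how Futures compose with the same for-comprehension sugar, and how a Promise is the writable side of a Future:

        import scala.concurrent.{Future, Promise}
        import scala.concurrent.ExecutionContext.Implicits.global

        // Stand-ins for real asynchronous web calls.
        def fetchPage(url: String): Future[String] = Future(s"<html>$url</html>")
        def extractDoctors(html: String): Future[List[String]] = Future(List("Dr. Alice"))

        // Future has map and flatMap, so asynchronous steps compose sequentially.
        val doctors: Future[List[String]] = for {
          html <- fetchPage("http://example.com/bird-flu")
          docs <- extractDoctors(html)
        } yield docs

        // A Promise is completed once, and its Future can be handed to readers.
        val promise = Promise[Int]()
        promise.success(42)
        val answer: Future[Int] = promise.future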

    Akka is a library implementing an Erlang-style actor model in Scala.
    I have used Akka for years, and it is a good framework for organizing a lot of concurrent computation that requires communication.
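
    A minimal actor sketch, with made-up message and actor names, assuming the classic akka.actor API:

        import akka.actor.{Actor, ActorSystem, Props}

        case class CountWords(text: String)

        class WordCounter extends Actor {
          def receive = {
            case CountWords(text) =>
              sender() ! text.split("\\s+").length   // reply to the asking actor
          }
        }

        val system  = ActorSystem("pipeline")
        val counter = system.actorOf(Props[WordCounter], "counter")
        counter ! CountWords("bird flu reported in several countries")   // fire-and-forget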

    The type system does not help you with the creation of parent actors, so you cannot be sure that they exist. This makes it hard to write unit tests for actors.

    Akka is good but the whole Erlang actor idea is rather low level.

    Scalaz and Cake Patterns

    Scalaz is a very impressive library that implements big parts of Haskell’s standard library in Scala.
    Scalaz’s monad typeclass is invariant, which fixes the violations allowed in the standard library.

    The Cake Pattern allows for recursive modules, which makes dependency injection easier. It is used in the Scala compiler.
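
    The idea, in a minimal sketch with made-up component names, is that each component is a trait and a self-type declares what it depends on; the concrete application mixes the layers together:

        trait TokenizerComponent {
          def tokenize(text: String): List[String] = text.split("\\s+").toList
        }

        trait ClassifierComponent { this: TokenizerComponent =>   // dependency declared via self-type
          def isBirdFlu(text: String): Boolean = {
            val tokens = tokenize(text).map(_.toLowerCase)
            tokens.contains("bird") && tokens.contains("flu")
          }
        }

        // "Baking the cake": wiring happens at mix-in time, checked by the compiler.
        object Pipeline extends TokenizerComponent with ClassifierComponent

        Pipeline.isBirdFlu("Bird flu reported")   // true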

    Both Scalaz and the Cake Pattern got me into trouble as a beginner Scala programmer. I would not recommend them for beginners.

    How do you determine if you should use this heavy artillery?
    Once you feel that you are spending a lot of time repeating code due to insufficient abstraction you can consider it. Otherwise:

    Keep It Simple.

    Dependent Types and Category Theory in Scala

    There are many new theoretical developments in Scala:
    • Dotty - a new compiler built on DOT, a new type-theoretic foundation for Scala
    • Cats - a simplified version of Scalaz implementing concepts from category theory
    • Shapeless - a library for dependent types. I am using this in my production code, since Shapeless is used in Slick and Parboiled2; see the sketch after this list
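
    The entry point for me is Shapeless's HList, a list that keeps a distinct static type for every element; a small sketch, assuming the shapeless 2.x API:

        import shapeless._

        // Unlike List[Any], an HList remembers the type of each element.
        val record = "bird flu" :: 0.87 :: true :: HNil   // String :: Double :: Boolean :: HNil

        val label: String      = record.head        // type-safe access, checked at compile time
        val confidence: Double = record.tail.head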


    Haskell

    Haskell is a research language from 1990. In 2008 its popularity started to rise. You can now find real jobs working in Haskell. The most publicized example is that Facebook wrote their spam filter in Haskell.



    Why is Haskell so Hard to Learn?

    It took me around 2 years to learn to program in Haskell, which is exceptionally long. I have spoken to other people at Haskell meetups who have told me the same.

    Mathematical Precision

    Python effectively uses the Pareto principle: 20% of the features give you 80% of the functionality. Python has very few structures in the core language and reuses them.

    Haskell uses many more constructs. E.g. exception handling can be done in many different ways, each with small advantages. You can choose the optimal exception monad transformer with the fewest dependencies for your problem.

    Cabal Hell and Stack

    Haskell is a fast-developing language with a very deep stack of interdependent libraries.
    When I started programming in it, it was hard to set up even a simple project, since you could not get the libraries to compile with versions that were compatible with each other.
    The build system is called Cabal, and this phenomenon is called Cabal Hell.
    If you have been reading the mailing lists you will have seen a lot of references to Cabal Hell.

    The Haskell consulting company FPComplete first released Stackage, a curated list of libraries that work together. In 2015 they went further and released Stack, a system that installs matching versions of the Haskell compiler to work with Stackage snapshots.

    This has really made Haskell development easier.

    Dependently Typed Constructs in Haskell

    Dependently typed languages are the next step after Haskell. In normal languages the type system and the objects of the language are different systems. In dependently typed languages the objects and the types inhabit the same space. This gives more safety and greater flexibility, but also makes it harder to program in.
    The type checker has to be replaced with a theorem prover.

    You have to prove that the program is correct, and the proofs are part of the program and are first-class constructs.

    Haskell has a lot of activity aimed at emulating dependently typed languages.
    The next version of the Haskell compiler, GHC 8, is making a big push for more uniform handling of types and kinds.

    Practical Haskell

    Haskell is a pioneering language and still introducing new ideas. It has clearly shown that it is production ready by being able to handle Facebook's spam filter.

    Aesthetically I prefer terse programming and like to use Haskell for non-work-related programming.

    There is a great Haskell community in New York City. Haskell feels like a subculture, whereas Scala has now become the establishment. That said, I do not feel Haskell envy when I program in Scala on a daily basis.

    Learning Haskell is a little like running a marathon. You get in good mental shape.


    Category Theory

    Category theory is often called Abstract Nonsense both by practitioners and detractors.
    It is a very abstract field of mathematics and its utility is pretty controversial.

    It abstracts internal properties of objects away and instead looks at relations between objects.
    Categories require very little structure and so there are categories everywhere.  Many mathematical objects can be turned into categories in many different ways. This high level of abstraction makes it hard to learn.

    There is a category Hask of Haskell types and functions.


    Steve Awodey lecture series on category theory

    Vector Spaces Described With a Few String Diagrams

    To give a glimpse of the power of category theory: In this video lecture John Baez shows how you can express the axioms of finite dimensional vector spaces with a few string diagrams.

    Video lecture by John Baez

    With 2 more simple operations you can extend it to control theory.

    Quest For a Solid Foundation of Mathematics

    At the university I embarked on a long quest for a solid scientific foundation. First I studied chemistry and physics. Quantum physics drove me to study mathematics for more clarity. For greater clarity and a solid foundation I studied mathematical logic.
    I did not find clarity in mathematical logic. Instead I found:

    The Dirty Secret About the Foundation of Mathematics

    My next stop was the normal foundation for modern mathematics: ZFC, Zermelo–Fraenkel set theory with the axiom of choice.

    This was even less intuitive than logic. There were more non-intuitive axioms. It was like learning computer science from a reference manual for x86 assembly: a big random mess. There was also an uncertain connection between the axioms of logic and the axioms of set theory.

    ZFC and first-order logic make two strong assumptions:
    1. Law of Excluded Middle
    2. Axiom of Choice
    The Law of Excluded Middle says that every mathematical sentence is either true or false. This is a very strong assumption that was not motivated at all, and it certainly does not extend to sentences outside mathematics.

    Constructive Mathematics / Intuitionistic Logic

    There was actually a debate about what the foundation of mathematics should be at the beginning of the 20th century.
    A competing foundation of mathematics was Brouwer's constructive mathematics. In order to prove something about a mathematical object you need to be able to construct it, and via the Curry-Howard correspondence this is equivalent to writing a program that constructs a value of a particular type.
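
    A tiny sketch of the Curry-Howard correspondence in Scala: read a type as a proposition and a program of that type as its proof.

        // Proof of "A implies A".
        def aImpliesA[A](a: A): A = a

        // Proof of modus ponens: from A and (A implies B) we can conclude B.
        def modusPonens[A, B](a: A, aToB: A => B): B = aToB(a)

        // Proof that (A and B) implies (B and A); pairs play the role of "and".
        def andCommutes[A, B](p: (A, B)): (B, A) = (p._2, p._1)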

    This was barely mentioned at the university. I had one professor who once briefly said that there was this other thing called intuitionistic logic, but it was so much harder to prove things in it, why should we bother.

    Recently constructive mathematics has had a revival with Homotopy Type Theory. HoTT is based on category theory, type theory, homotopy theory and intuitionistic logic.
    This holds a lot of promise and is another reason why category theory is practical for me.


    Robert Harper's lectures on type theory end with an introduction to HoTT

    Future of Intelligent Software

    There are roughly 2 main approaches to artificial intelligence:
    • Top-down or symbolic techniques, e.g. logic or Lisp
    • Bottom-up or machine learning techniques, e.g. neural networks
    The symbolic approach was favored for a long time but did not deliver on its promise. Now machine learning is everywhere and has created many advances in modern software.

    To me it seems obvious that more intelligent software needs both. But combining them has been an elusive goal since they are very different by nature.

    Databases created a revolution in data management. They reduce data retrieval to simplified first-order logic: you just write a logical expression for what you want.

    Dependently typed languages are the level of abstraction where programs and logic merge.
    I think that intelligent software of the future will be a combination of dependently typed languages and machine learning.
    A promising approach is the discovery of Bayesian network models from data. This finds causality in a form that can be combined with logical reasoning.

    Conclusion

    I invested a lot of time in statically typed functional languages and was not sure how much this would help me in my daily work. It helped a lot, especially with reuse and stability.

    Scala has made it substantially easier to create production quality software.

    MLlib and ScalaNLP are 2 popular open source projects. They show me that Scala is a good environment for NLP and machine learning.

    I am only starting to see an outline of category theory, dependently typed languages and HoTT. It looks like computer science and mathematics are far from done; we still have some big changes ahead of us.

    Tuesday, February 15, 2011

    Is IBM Watson Beginning An AI Boom?

    Artificial intelligence fell out of favor in the 1970s, the start of the first artificial intelligence winter, and has mainly been out of favor since. In April 2010 I wrote a post about how you can now get a paying job doing AI, machine learning and natural language processing outside academia.

    Now barely one year later I have seen a few demonstrations that signal that artificial intelligence has taken another leap towards mainstream acceptance:
    • Yann LeCun demonstrated a computer vision system that could learn to recognize objects taken from his pocket after being shown a few examples, during a talk about learning feature hierarchies for computer vision
    • Andrew Hogue demonstrated Google Squared and Google Sentiment Analysis at a Google Tech Talk; those systems both show rudimentary understanding of web pages and use word association
    • The IBM Watson supercomputer is competing against the best human players on Jeopardy
      All these 3 systems contain some real intelligence. Rudimentary by human standards, but AI has gone from very specialized systems to handling more general tasks. It feels like AI is picking up steam. I am seeing startups based on machine learning pop up. This reminds me of the Internet boom in the 1990s. I moved to New York in 1996, at the beginning of the Internet boom. I saw firsthand the crazy gold rush where fortunes were made and lost in a short time, Internet startups were everywhere and everybody was talking about IPOs. This got me thinking: are we headed towards an artificial intelligence boom, and what would it look like?

      IBM Watson

      IBM Watson is a well-executed factoid extraction system, but it is also a brilliant marketing move, promoting IBM's new POWER7 system and their Smart Planet consulting services. It gives some people the impression that we already have human-like AI, and in that sense it could serve as a catalyst for investments in AI. This post is not about human-like artificial intelligence, but about the spread of shallow artificial intelligence.

      Applications For Shallow Artificial Intelligence

      Both people and corporations would gain value from having AI systems that they could ask free-form questions and get answers from on very diverse topics. In particular in these fields:
      • Medical science
      • Law
      • Surveillance
      • Military

      Many people, me included, are concerned about a big brother state and military use of AI, but I do not think that is going to stop adoption. These people play for keeps.

      There are signs that the financial service industry is starting to use sentiment analysis for their pricing and risk models. Shallow AI would be a good candidate for more advanced algorithmic trading.

      Bottom Up vs. Top Down Approaches

      Here is a very brief, simplified introduction to AI techniques and tools. AI is a loosely defined field, with a loose collection of techniques. You can roughly categorize them into top-down and bottom-up approaches.

      Top-down or symbolic techniques
      • Automated reasoning
      • Logic
      • Many forms of tree search
      • Semantic networks
      • Planning
      Bottom-up or machine learning techniques
      • Neural networks, computers with a structure similar to the brain
      • Machine learning

      The top-down systems are programmed by hand, while the bottom-up systems learn by themselves from examples without human intervention, a bit like the brain.

      What Is Causing This Sudden Leap?

      Many top-down techniques were developed by the 1960s. They were very good ideas, but they did not scale; they only worked for small toy problems.
      Neural networks are an important bottom-up technique. They started in the 1950s, but fell out of favor; they came roaring back in the 1980s. In the 1990s the machine learning / statistical approaches to natural language processing beat out Chomsky's generative grammar approach.

      The technology that is needed for what we are doing now has been around for a long time. Why are these systems popping up now?

      I think that we are seeing the beginning of a combination of machine learning with top-down techniques. The reason why this has taken so long is that it is hard to combine top-down and bottom-up techniques. Let me elaborate a little bit:

      Bottom-up AI / machine learning systems are black boxes: you give them some input and expected output, and they adjust a lot of numeric parameters to mimic the result. Usually the numbers do not make much sense; they just work.

      In top-down / symbolic AI you are creating detailed algorithms for working with concepts that make sense.

      Both top-down and bottom-up techniques are now well developed and better understood. This makes it easier to combine them.

      Other reasons for the leap are:
      • Cheap, powerful and highly parallel computers
      • Open source software, where programmers from around the world develop free software. This makes programming into more of an industrial assembly of parts.

      Who Will Benefit From An AI Boom?

      Here are some groups of companies that made a lot of money during the Internet boom:
      • Cisco and Oracle, the tool makers
      • Amazon and eBay, small companies that grew to become dominant in e-commerce
      • Google and Yahoo, advertisement-driven information companies

      Initially, big companies like IBM and Google that can create the technology should have an advantage, whether in the capacity of tool makers or dominant players.

      It is hard to predict how high the barrier to entry in AI will be. AI programs are just trained on regular text found on or off the Internet. And today's supercomputer is tomorrow's game console. The Internet has a few dominant players, but it is generally decentralized and anybody can have a web presence.

      New York is now filled with startups using machine learning as a central element. They are "funded", but it seems like they only have some seed capital. So maybe there is room for smaller companies to compete in the AI space.

      Job Skills That Will Be Required In An AI Boom

      During the Internet boom I met people with a bit of technical flair and no education beyond high school who picked up HTML in a week and next thing they were making $60/hour doing plain HTML. I think that the jobs in artificial intelligence are going to be a little more complex than those 1990s web developer jobs.

      In my own work I have noticed a move from writing programs to teaching software based on examples. This is a dramatic change, and it requires a different skill set.

      I think that there will still be plenty of need for programmers, but cognitive science, mathematics, statistics and linguistics will be skills in demand.

      My work would benefit from me having better English language skills. The topic that I am dealing with is, after all, the English language. So maybe that English literature degree could come in handy.

      Currently I feel optimistic about the field of artificial intelligence; there is progress after years of stagnation. We are wrestling a few secrets away from Mother Nature, and are making progress in understanding how the brain works. Still, the introduction of such a powerful technology as artificial intelligence is going to affect society for better and worse.

      Saturday, April 3, 2010

      Data Mining rediscovers Artificial Intelligence

      Artificial intelligence started in the 1950s with very high expectations. AI did not deliver on the expectations and fell into decades long discredit. I am seeing signs that Data Mining and Business Intelligence are bringing AI into mainstream computing. This blog posting is a personal account of my long struggle to work in artificial intelligence during different trends in computer science.

      In the 1980s I was studying mathematics and physics, which I really enjoyed. I was concerned about my job prospects; there are not many math or science jobs outside of academia. Artificial intelligence seemed equally interesting but more practical, and I thought that it could provide me with a living wage. Little did I know that artificial intelligence was about to become an unmentionable phrase that you should not put on your resume if you wanted a paying job.

      Highlights of the history of artificial intelligence

      • In 1956 AI was founded.
      • In 1957 Frank Rosenblatt invented the Perceptron, the first generation of neural networks. It was based on the way the human brain works, and provided simple solutions to some simple problems.
      • In 1958 John McCarthy invented LISP, the classic AI language. Mainstream programming languages have borrowed heavily from LISP and are only now catching up with it.
      • In the 1960s AI got lots of defense funding, especially for military translation software translating from Russian to English.
      AI theory made quick advances and a lot was developed early on. AI techniques worked well on small problems. It was expected that AI could learn, using machine learning, and that this would soon lead to human-like intelligence.

      This did not work out as planned. The machine translation did not work well enough to be usable. The defense funding dried up. The approaches that had worked well for small problems did not scale to bigger domains. Artificial intelligence fell out of favor in the 1970s.

      AI advances in the 1980s

      When I started studying AI, it was in the middle of a renaissance and I was optimistic about recent advances:
      • The discovery of new types of neural networks, after Perceptron networks had been discredited in an article by Marvin Minsky
      • Commercial expert systems were thriving
      • The Japanese Fifth Generation Computer Systems project, written in the new, elegant, declarative Prolog language, had many people in the West worried
      • Advances in probability theory: Bayesian networks / causal networks
      In order to combat the brittleness of these systems' intelligence, Doug Lenat started a large-scale AI project, CYC, in 1984. His idea was that there is no free lunch, and in order to build an intelligent system, you have to use many different types of fine-tuned logical inference, and you have to hand-encode it with a lot of common-sense knowledge. Cycorp spent hundreds of man-years building their huge ontology. Their hope was that CYC would be able to start learning on its own, after training it for some years.

      AI in the 1990s

      I did not lose my patience but other people did, and AI went from the technology of the future to yesterday's news. It had become a loser that you did not want to be associated with.

      During the Internet bubble, when venture capital funding was abundant, I was briefly involved with an AI Internet start-up company. The company did not take off; its main business was emailing discount coupons out to AOL customers. This left me disillusioned, thinking that I would just have to put on a happy face when I worked on the next web application or trading system.

      AI usage today

      Even though AI stopped being cool, regular people are using it in more and more places:
      • Spam filters
      • Search engines using natural language processing
      • Biometrics, face and fingerprint detection
      • OCR, check reading in ATMs
      • Image processing in coffee machines detecting misaligned cups
      • Fraud detection
      • Movie and book recommendations
      • Machine translation
      • Speech understanding and generation in phone menu systems

      Euphemistic words for AI techniques

      The rule seems to be that you can use AI techniques as long as you call them something else, e.g.:
      • Business Intelligence
      • Collective Intelligence
      • Data Mining
      • Information Retrieval
      • Machine Learning
      • Natural Language Processing
      • Predictive Analytics
      • Pattern Matching

      AI is entering mainstream computing now

      Recently I have seen signs that AI techniques are moving into mainstream computing:
      • I went to a presentation for SPSS statistical modeling software, and was shocked at how many people are now using data mining and machine learning techniques. I was sitting next to people working in a prison, an adoption agency, marketing, and a disease prevention NGO.
      • I started working on a data warehouse using SQL Server Analysis Services, and found that SSAS has a suite of machine learning tools.
      • Functional and declarative techniques are spreading to mainstream programming languages.

      Business Intelligence compared to AI

      Business Intelligence is about aggregating a company's data into an understandable format and analyzing it to support better business decisions. BI is currently the most popular field using artificial intelligence techniques. Here are a few words about how it differs from AI:
      • BI is driven by vendors instead of academia
      • BI is centered around expensive software packages with a lot of marketing
      • The scope is limited, e.g. finding good prospective customers for your products
      • Everything lives in databases or data warehouses
      • BI is data driven
      • Reporting is a very important component of BI

      Getting a job in AI

      I recently made a big effort to steer my career towards AI. I started an open source computer vision project, ShapeLogic, and put AI back on my resume. A head hunter contacted me and asked if I had any experience in Predictive Analytics. It took me 15 minutes to convince her that Predictive Analytics and AI were close enough that she could forward my resume. I got the job, my first real AI and NLP job.

      The work I am doing is not dramatically different from normal software development work. I spend less time on machine learning than on getting AJAX to work with C# ASP.NET for the web GUI, or upgrading the database ORM from ADO.NET strongly typed datasets to LINQ to SQL. However, it was very gratifying to see my program start to perform a task that had been very time-consuming for the company's medical staff.

      Is AI regaining respect?

      No, not now. There are lots of job postings for BI and data mining but barely any for artificial intelligence. AI is still not a popular word, except in video games, where AI means something different. When I worked as a game developer, what was called AI was just checking whether your character was close to an enemy, in which case the enemy would start shooting in your character's direction.

      After 25 long years of waiting I am very happy to see AI techniques finally become a commodity, and I enjoy working with them even if I have to disguise this work behind whatever the buzzword of the day is.

      -Sami Badawi