Languages and Logic: 2013

Monday, September 23, 2013

Big Data: What Worked?

"Big data" created an explosion of new technologies and hype: NoSQL, Hadoop, cloud computing, highly parallel systems and analytics.

I have worked with big data technologies for several years. It has been a steep learning curve, but lately I have had more success stories.

This post is about the big data technologies I like and continue to use. Big data is a big topic. These are some highlights from my experience.

I will relate big data technologies to modern web architecture with predictive analytics and raise the question:

What big data technologies should I use for my web startup?

Classic Three Tier Architecture

For a long time software development was dominated by the three tier architecture / client server architecture. It is well described and conceptually simple:

Client
Server for business logic
Database

It is straightforward to figure out what computations should go where.

Modern Web Architecture With Analytics

The modern web architecture is not nearly as well established. It is more like an 8 tiered architecture with the following components:

Web client
Caching
Stateless web server
Real-time services
Database
Hadoop for log file processing
Text search
Ad hoc analytics system

I was hoping that I could do something closer to the 3-tier architecture, but the components have very different features. Kicking off a Hadoop job from a web request could adversely affect your time to first byte.

A problem with the modern web architecture is that a given calculation can be done in many of those components.

Architecture for Predictive Analytics

It is not at all clear what component predictive analytics should be done in.

First you need to collect user metrics. In which components can you do this?

Web servers, storing the metrics in Redis / caching
Web servers, storing the metrics in the database
Real-time services that aggregate the user metrics
Hadoop jobs run on the log files

User metrics are needed for predictive analytics and machine learning. Here are some scenarios for this:

If you are doing simple popularity based predictive analytics this can be done in the web server or a real-time service.
If you use a Bayesian bandit algorithm you will need to use a real-time service for that.
If you recommend based on user similarity or item similarity you will need to use Hadoop.

Hadoop

Hadoop is a very complex piece of software for handling very large amounts of data that are too big to fit on one computer and therefore cannot be handled by conventional software.

I compared different Hadoop libraries in my post: Hive, Pig, Scalding, Scoobi, Scrunch and Spark.

Most developers who have used Hadoop complain about it. I am no exception. I still have problems with Hadoop jobs failing due to errors that are hard to diagnose. Generally I have been a lot happier about Hadoop lately. I am only using Hadoop for big custom extractions or calculations from log files stored in HDFS. I do my Hadoop work in Scalding or HIVE.

The Hadoop library Mahout can calculate user recommendations based on user similarity or item similarity.

Scalding

Hadoop code in Scalding looks a lot like normal Scala code. The scripts I am writing are often just 10 lines of code and look a lot like my other Scala code. The catch is that you need to be able to write idiomatic functional Scala code.

HIVE

HIVE makes it easy to extract and combine data from HDFS. You just write SQL after some setup of a directory with table structure in HDFS.

Real-time Services

Libraries like Akka, Finagle and Storm are good for having long running stateful computations.
It is hard to write correct highly parallel code that scales to multiple machines using normal multithreaded programming. For more details see my blog post: Akka vs. Finagle vs. Storm.

Akka and Spray

Akka is a simple actor model taken from the Erlang language. In Akka you have a lot of very lightweight actors that can share a thread pool. They do not block on shared state but communicate by sending immutable messages.
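The actor idea can be sketched without Akka. The following is a toy illustration in Python, not Akka's API: the `CounterActor` class, its message tuples and the `ask_count` helper are all made up for this example. It shows the core idea of private state, a mailbox, and one message handled at a time, so no locks are needed around the state.

```python
import queue
import threading


class CounterActor:
    """A tiny actor: private state, a mailbox, one message handled at a time."""

    _STOP = object()

    def __init__(self):
        self._mailbox = queue.Queue()
        self._counts = {}  # private state, touched only by the actor thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Process messages strictly in arrival order.
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            kind = message[0]
            if kind == "click":
                _, ad_id = message
                self._counts[ad_id] = self._counts.get(ad_id, 0) + 1
            elif kind == "get":
                _, ad_id, reply = message
                reply.put(self._counts.get(ad_id, 0))

    def tell(self, *message):
        # Fire-and-forget send of an immutable message (a tuple).
        self._mailbox.put(message)

    def ask_count(self, ad_id):
        # Send a query and wait for the reply through a one-slot queue.
        reply = queue.Queue(maxsize=1)
        self._mailbox.put(("get", ad_id, reply))
        return reply.get(timeout=5)

    def stop(self):
        self._mailbox.put(self._STOP)
        self._thread.join()
```

Unlike Akka, this sketch spends one thread per actor, so it would not scale to millions of actors. Sharing a thread pool between actors is exactly the part a real actor library takes care of.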

One reason that Akka is a good fit for real-time services is that you can do varying degrees of loose coupling and all services can talk with each other.

It is hard to change from traditional multithreaded programming to using the actor model. There are just a lot of new actor idioms and design patterns that you have to learn. At first the actor model seems like working with a sack of fleas. You have much less control over the flow due to the distributed computation.

Spray makes it easy to put a web or RESTful interface on your service. This makes it easy to connect your service with the rest of the world. Spray also has the best Scala serialization system I have found.

Akka is well suited for: E-commerce, high frequency trading in finance, online advertising and simulations.

Akka in Online Advertising

Millions of users are interacting with fast changing ad campaigns. You could have actors for:

Each available user
Each ad campaign
Each ad
Clustering algorithms
Each cluster of users
Each cluster of ads

Each actor evolves over time and can notify and query all other actors.

NoSQL

There are a lot of options, with no query standard:

Cassandra
CouchDB
HBase
Memcached
MongoDB
Redis
SOLR

I will describe my initial excitement about NoSQL, the comeback of SQL databases and my current view on where to use NoSQL and where to use SQL.

MongoDB

MongoDB was my first NoSQL technology. I used it to store structured medical documents.

Creating a normalized SQL database that represents a structured data format is a sizable task and you easily end up with 20 tables. It is hard to insert a structured document into the database in the right sequence, so foreign key constraints are satisfied. LINQ to SQL helped with this but it was slow.

I was amazed by MongoDB's simplicity:

It was trivial to install
It could insert 1 million documents very fast
I could use the same Python NLP tools for many different types of documents

I felt that SQL databases were so 20th century.

After some use I realized that interacting with MongoDB was not as easy from Scala. I tried two different libraries: Subset and Casbah.

I also realized that it is a lot harder to query data from MongoDB than a SQL database both in syntax and expressiveness.
Recently SQL databases have added JSON as a data type, taking away some of MongoDB's advantage.

Today I use SQL databases for curated data, but MongoDB for ad hoc structured document data.

Redis

Redis is an advanced key value store that is mainly living in memory but with backup to disk. Redis is a good fit for caching. It has some specialized operations:

Simple to age out data
Simulates pub sub
Atomic update increments
Atomic list append
Set operations

Redis also supports sharding well: in the driver you just give a list of Redis servers and it will send the data to the right server. Redistributing data after adding more sharded servers to Redis is cumbersome.
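A small sketch can show why redistribution hurts. It assumes a naive client-side scheme where the key's hash modulo the number of servers picks the server. That scheme, the server names and both helper functions are assumptions for illustration, not how any particular Redis driver works. Many drivers use consistent hashing precisely to reduce this effect.

```python
import hashlib


def shard_for(key, servers):
    """Pick a server by hashing the key modulo the number of servers."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


def moved_fraction(keys, old_servers, new_servers):
    """Fraction of keys whose server changes when the server list changes."""
    moved = sum(1 for k in keys
                if shard_for(k, old_servers) != shard_for(k, new_servers))
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
three = ["redis-a", "redis-b", "redis-c"]   # hypothetical server names
four = three + ["redis-d"]
print(moved_fraction(keys, three, four))
```

With modulo hashing, a key keeps its server only when its hash gives the same remainder for 3 and for 4, which in expectation is 3 out of every 12 hash values. Going from three to four servers therefore moves about 75% of the keys.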

I first thought that Redis had an odd array of features but it fits the niche of real-time caching.

SOLR

SOLR is the most used enterprise text search technology. It is built on top of Lucene.
It can store and search documents with many fields using an advanced query language.
It has an ecosystem of plugins doing a lot of the things that you would want. It is also very useful for natural language processing. You can even use SOLR as a presentation system for your NLP algorithms.

To Cloud or not to Cloud

A few years back I thought that I would soon be doing all my work using cloud computing services like Amazon's AWS. This did not happen, but virtualization did. When I request a new server the OPS team usually spins up a virtual machine.
A problem with cloud services is that storage is expensive. Especially Hadoop sized storage.

If I were in a startup I would probably consider the cloud.

Big and Simple

My first rule for software engineering is: Keep it simple.

This is particularly important in big data since size creates inherent complexity.
I made the mistake of being too ambitious too early and thinking out too many scenarios.

Startup Software Stack

Back to the question:

What big data technologies should I use for my web startup?

A common startup technology stack is:
Ruby on Rails for your web server and Python for your analytics, plus the hope that a lot of beefy Amazon EC2 servers will scale your application when your product takes off.
It is fast to get started and the cloud will save you. What could possibly go wrong?

The big data approach I am describing here is more stable and scalable, but before you learn all these technologies you might run out of money.

My answer is: It depends on how much data and how much money you have.

Big Data Not Just Hype

"Big data" is misused and hyped. Still there is a real problem, we are generating an astounding amount of data and sometimes you have to work with it. You need new technologies to wrangle this data.

Whenever I see a reference to Hadoop in a library I get very uneasy. These complex big data technologies are often used where much simpler technologies would have sufficed. Make sure you really need them before you start. This could be the difference between your project succeeding and failing.

It has been humbling to learn these technologies but after much despair I now enjoy working with them and find them essential for those truly big problems.

Friday, May 17, 2013

LISP Prolog and Evolution

I just saw David Nolen give a talk at a LispNYC Meetup called:

LISP is Too Powerful

It was a provocative and humorous talk. David showed all the powerful features of LISP and said that the reason why LISP is not more widely adopted is that it is too powerful. Everybody laughed but it made me think. LISP was decades ahead of other languages, so why did it not become a mainstream language?

David Nolen is a contributor to Clojure and ClojureScript.
He is the creator of Core Logic, a port of miniKanren. Core Logic is a Prolog-like system for doing logic programming.

When I went to university my two favorite languages were LISP and Prolog. There was a big debate whether LISP or Prolog would win dominance. LISP and Prolog were miles ahead of everything else back then. To my surprise they were both surpassed by imperative and object oriented languages like Visual Basic, C, C++ and Java.

What happened? What went wrong for LISP?

Prolog

Prolog is a declarative or logic language created in 1972.

It works a little like SQL: You give it some facts and ask a question, and, without specifying how, Prolog will find the results for you. It can express a lot of things that you cannot express in SQL.

A relational database that can run SQL is a complicated program, but Prolog is very simple and works using 2 simple principles:

Unification
Backtracking
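The two principles can be sketched in a few lines of Python. This is a toy, not a Prolog implementation: it handles only facts (no rules), skips the occurs check, and the `parent` facts and all function names are invented for illustration. Variables follow the Prolog convention of starting with an uppercase letter.

```python
def is_var(term):
    # Prolog convention: variables start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()


def walk(term, subst):
    # Follow variable bindings until reaching a value or an unbound variable.
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    """Return an extended substitution that makes a and b equal, or None."""
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None


# Hypothetical fact base.
FACTS = [
    ("parent", "tom", "bob"),
    ("parent", "bob", "ann"),
    ("parent", "bob", "pat"),
]


def solve(goals, subst):
    """Yield every substitution satisfying all goals, trying facts in order."""
    if not goals:
        yield subst
        return
    first, rest = goals[0], goals[1:]
    for fact in FACTS:
        extended = unify(first, fact, subst)
        if extended is not None:
            # When the remaining goals run out of answers, the loop moves on
            # to the next fact: that is the backtracking step.
            yield from solve(rest, extended)


def grandchildren_of(person):
    query = [("parent", person, "Mid"), ("parent", "Mid", "Who")]
    return [walk("Who", s) for s in solve(query, {})]
```

Here `grandchildren_of("tom")` returns `["ann", "pat"]`: the query never says how to search, it only states the two conditions that must hold.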

The Japanese Fifth Generation Program was built in Prolog. That was a big deal and scared many people in the West in the 1980s.

LISP

LISP was created by John McCarthy in 1958, only one year after Fortran, one of the first high-level programming languages. It introduced so many brilliant ideas:

Garbage collection
Functional programming
Homoiconicity: code is just a form of data
REPL
Minimal syntax, you program in abstract syntax trees

It took other languages decades to catch up, partly by borrowing ideas from LISP.

Causes for LISP Losing Ground

I discussed this with friends. Their views varied, but here are some of the explanations that came up:

Better marketing budget for other languages
Start of the AI winter
DARPA stopped funding LISP projects in the 1990s
LISP was too big and too complicated and Scheme was too small
Too many factions in the LISP world
LISP programmers are too elitist
LISP on early computers was too slow
An evolutionary accident
Lowest common denominator wins

LISP vs. Haskell

I felt it was a horrible loss that the great ideas of LISP and Prolog were lost. Recently I realized:

Haskell programs use many of the same functional programming techniques as LISP programs. If you ignore the parentheses they are similar.

On top of that, Haskell has a very powerful type system. It is based on unification of types and backtracking, so Haskell's type system is basically Prolog.

You can argue that Haskell is the illegitimate child of LISP and Prolog.

Similarity between Haskell and LISP

Haskell and LISP both have minimal syntax compared to C++, C# and Java.
LISP is more minimal: you work directly in the AST.
In Haskell you write small snippets of simple code that Haskell will combine.

A few Haskell and LISP differences

LISP is homoiconic, Haskell is not
LISP has a very advanced object system, CLOS
Haskell uses monadic computations

Evolution and the Selfish Gene

In the book The Selfish Gene, evolutionary biologist Richard Dawkins makes an argument that genes are much more fundamental than humans. Humans have a short lifespan while genes live for tens of thousands of years. Humans are vessels for powerful genes to propagate themselves, and combine with other powerful genes.

If you apply his ideas to computer science, languages, like humans, have a relatively short lifespan; ideas, on the other hand, live on and combine freely. LISP introduced more great ideas than any other language.

Open source software has sped up evolution in computer languages. Now languages can inherit from other languages at a much faster rate. A new language comes along and people start porting libraries.

John McCarthy's legacy is not LISP but: Garbage collection, functional programming, homoiconicity, REPL and programming in AST.

The Sudden Rise of Clojure

A few years back I had finally written LISP off as dead. Then out of nowhere Rich Hickey single-handedly wrote Clojure.

Features of Clojure

Runs on the JVM
Runs on JavaScript
Used in industry
Strong thriving community
Immutable data structures
Lock free concurrency

Clojure proves that it does not take a Google, Microsoft or Oracle to create a language. It just takes a good programmer with a good idea.

Typed LISP

I have done a lot of work in both strongly typed and dynamic languages.

Dynamic languages give you speed of development and are better suited for loosely structured data.

After working with Scala and Haskell I realized that you can have a less obtrusive type system. This gives stability for large applications.

There is no reason why you cannot combine strong types or optional types with LISP. In fact, there are already LISP dialects out there that do this. Let me briefly mention a few typed LISPs that I find interesting:

Typed Racket and Typed Clojure do not have type systems as powerful as Haskell's. None of these languages have the momentum of Haskell, but Clojure showed us how fast a language can grow.

LISP can learn a lesson from all the languages that borrowed ideas from LISP.

It is nature's way.

Wednesday, May 1, 2013

Collision with the Zeitgeist

It has been 5 years since I started my blog. Back then I was alone with my obscure computer interests: functional programming languages, machine learning and AI.

I felt lucky when I met a Python programmer that I could chat with once in a while. Occasionally I would boil over and just start a computer rant to people at social events, until they ran off.

When I met my first Haskell programmer at a machine learning Meetup, it was like seeing a unicorn. Now I use Scala for work and discuss Haskell, Idris, LISP, category theory or machine learning on a daily basis.

In April 2013 my blog had over 8000 page views, which is more than I had ever imagined.

I am still surprised that my once obscure interests have become part of the zeitgeist, and hope they will stay for a while. But with technology you never know what is coming next.

Monday, April 1, 2013

Akka vs. Finagle vs. Storm

Akka, Finagle and Storm are 3 new open source frameworks for distributed parallel and concurrent programming. They all run on the JVM and work well with Java and Scala.

They are very useful for many common problems:

Real-time analytics
Complex websites with different inputs and outputs
Finance
Multiplayer games
Big data

Akka, Finagle and Storm are all very elegant solutions optimized for different problems. It is confusing what framework you should use for what problem. I hope that I can clarify this.

The 30 second history of parallel programming

Parallel / concurrent programming is hard. It had rudimentary support in C and C++ in the 1990s.

In 1995 Java made it much simpler to do simple concurrent programming on one machine by adopting the monitor primitive. Still if you had more than a few threads you could easily get deadlocks, and it did not solve the bigger problem of spreading a computation over many machines.

The growth of big data created a strong demand for better frameworks.

MapReduce and Hadoop

Hadoop, the open source version of Google's MapReduce, is the most well known method for distributed parallel programming.
It does streaming batch computation by translating an algorithm to a sequence of map and reduce steps.
Map is fully parallel.
Reduce collects the results from the mapping step.
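The map and reduce steps can be illustrated with an in-memory word count. This sketch runs in a single process and only mimics the shape of a MapReduce job. The function names are made up, and the `shuffle` step stands in for the grouping that the framework does between map and reduce.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    # Emit (word, 1) for every word; each document can be mapped independently.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine all values that were emitted for one key.
    return key, sum(values)


def word_count(documents):
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

For example, `word_count(["the cat", "The dog the"])` gives `{"the": 3, "cat": 1, "dog": 1}`. In a real cluster the map calls run on different machines and the shuffle moves data over the network, which is where much of the cost and complexity comes from.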

Hadoop has a steep learning curve. You should only use it when you have no other choice. Hadoop has a heavy stack with a lot of dependencies.

My post Hive, Pig, Scalding, Scoobi, Scrunch and Spark describes simpler frameworks built on top of Hadoop, but they are still complex.

Hadoop has a long response time, and it is not suited for real-time responses.

The Akka, Finagle and Storm frameworks are all easier to use than Hadoop and are suited for real-time responses.

Storm

Storm was created at BackType, which was acquired by Twitter, and was open sourced in 2011. It is written in Clojure and Java, but it works well with Scala. It is well suited for doing statistics and analytics on massive streams of data.

Storm can describe streaming computation very simply: You make a graph of your computation with input data sources called spouts at the top and, below them, computation nodes called bolts. A bolt can depend on any spout or bolt that has been computed above it, but you cannot have cycles. The graph is called a topology.

Features

Storm will deal with communication between machines and bolts
Consistent hashing to spread computation to right instance of a bolt
Error recovery due to hardware or network failure
Storm does not handle computations that fail due to inherent errors well
Can do analytics on Twitter scale input, literally
You can create a bolt in a non JVM language as long as it talks Thrift
Built-in support for Ruby, Python, and Fancy

Word counter example in Storm

The hello world example for Hadoop is to count word frequencies in a big expanding text corpus.

This is a word counting topology in Storm:

Input spout is a stream of texts
Tokenization bolt. You tell Storm how many instances of this you want; they can run either in local mode on one machine or in distributed mode on several machines.
Counting bolt. This too can be split over different machines.

Spam filter topology in Storm

Let me give an outline of how to make a statistical spam filter using simple natural language processing. This can be used on emails, postings or comments.

There are 2 data paths:

Training path

Containing messages that have been classified as spam or ham.

1. Input spout: a tuple of the text and its status, spam or ham
2. Tokenizer bolt. Input: 1
3. Word frequency bolt. Input bolt: 2
4. Bigram finder bolt. Input bolts: 2 and 3
5. Most important word and bigram selector bolt. Input bolts: 2, 3 and 4
6. Feature extractor bolt. Input bolt: 5
7. Bayesian model bolt. Input bolts: 3, 4 and 6

Categorizer path

The unprocessed messages would be sent to the Bayesian model bolt so they can be categorized as spam or ham.

Input spout: a tuple of the text and an id for the message
Tokenizer
Feature extractor, extracting the optimal feature set of words and bigrams. Bolt 6 from above.
Bayesian model. Bolt 7 from above.
Result bolt: a tuple of the message id and the probability of spam

This spam filter is quite a complex system that fits well with Storm.
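The statistical core of such a filter is small. Here is a minimal single-process sketch of a naive Bayes classifier with add-one smoothing. It is not a Storm topology, it uses single words only (no bigrams and no feature selection), and the class and method names are invented for illustration. It assumes at least one spam and one ham message have been trained before classifying.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Word-based naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        return text.lower().split()

    def train(self, text, label):
        # Training path: update word frequencies for the given class.
        self.message_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        # Categorizer path: score the message under both classes.
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(self.message_counts[label] / total_messages)
            for word in self.tokenize(text):
                score += math.log((counts[word] + 1)
                                  / (total_words + len(vocabulary)))
            log_scores[label] = score
        # Normalize the two log scores into a probability of spam.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])
```

In the topology described above, the `train` logic would live in the word frequency and Bayesian model bolts, and `spam_probability` would be what the Bayesian model bolt computes for each unprocessed message.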

Akka

Akka v0.6 was released in 2010. Akka is now at v2.2 and is maintained by Typesafe, the company that maintains Scala, and it is part of their stack. It is written in Scala and Java.

Features

Akka is based on the Erlang language that was developed to handle telecom equipment and be highly parallel and fault tolerant.
Akka gives you very fine grained parallel programming. You split your tasks up into smaller sub tasks and have a lot of actors compute these tasks and communicate the result.
Works well for two way communication.
Actors communicate through mailboxes, and each actor is single threaded.
The motto of Akka's team is: Let it fail early.
If an actor fails Akka has different strategies for recovery.
Actors can have shared access to an object on one machine, e.g. a cache.
Communication between different actors is very simple: it is just a method call, and it will give you a future to a value.
Actors are very lightweight: you can have 2.7 million actors per GB of heap.
There is an execution context with a shared thread pool.
This works well when there are IO bound computations.
Turns computation into a Future, a monadic composable asynchronous computation.
Incorporates dataflow variables from a research language called Oz.

Use cases

Agents
Simulations
Social media
Multiplayer game
Finance
Online betting

Finagle

Finagle was created by Twitter and open sourced in 2011. It is written in Scala and Java.

Finagle lets internal and external heterogeneous services communicate. It implements asynchronous Remote Procedure Call (RPC) clients and servers.

Features

It is simple to create servers and clients for your application
Remote services calling each other
Speaks different protocols: HTTP, Thrift, Memcached
Failover between servers
Is good for having one service query other services
Uniform error policy
Uniform limits on retry
Two way communication
Load balancing
Turns computation into a Future, a monadic composable asynchronous computation.

Use cases

Complex website using different services and supporting many different protocols
Web crawler

Side by side comparisons

Akka vs. Finagle

Akka and Finagle can both do two-way communication.

Akka is great as long as everything lives on one actor system or multiple remote actor systems. It is very simple to have them communicate.

Finagle is much more flexible if you have heterogeneous services and you need different protocols and fallbacks between servers.

Akka vs. Storm

Akka is better for actors that talk back and forth, but you have to keep track of the actors, make strategies for setting up different actor systems on different servers, and make asynchronous requests to those actor systems. Akka is more flexible than Storm but there is also more to keep track of.

Storm is for computations that move from upstream sources to different downstream sinks. It is very simple to set this up in Storm so that it runs computations over many distributed servers.

Finagle vs. Storm

Finagle and Storm both handle failover between machines well.

Finagle does heterogeneous two-way communication.

Storm does homogeneous one-way communication, but does it in a simple way.

Serialization of objects

Serialization of objects is a problem for all of these, since you have to send objects between different machines, and the default Java serialization has problems:

It is very verbose
It does not work well with Scala's singleton objects

Here are a few Scala open source libraries:

My experience

It has been hard for me to decide what framework is the better fit in a given situation, despite a lot of reading and experimenting.

I started working on an analytics program and Storm was a great fit and much simpler than Hadoop.

I moved to Akka since I needed to:

Incorporate more independent sub systems all written in Scala
Make asynchronous calls to external services that could fail
Set up on-demand services

Now I have to integrate this analytics program with other internal and external services, some in Scala and some in other languages. I am considering whether Finagle might be a better fit for this, despite Akka being easier in a homogeneous environment.

Afterthought

When I went to school parallel programming was a buzz word. The reasoning was:

Parallel programming works like the brain and it will solve our computational problems

We did not really know how you would program it. Now it is finally coming of age.

Akka, Finagle, Storm and MapReduce are different elegant solutions to distributed parallel programming. They all use ideas from functional programming.

Tuesday, February 5, 2013

Scala vs. Haskell vs. Python

Functional programming is on the upswing, but should you bet your career on it, or is it a short-lived technology fad?

I have long wanted to use functional programming professionally and for the last year I have. Mainly Scala, written in Haskell style, plus some real Haskell programming.

Here is my impression of Scala and Haskell compared to my benchmark language, Python.

Scala

Scala is a functional object oriented hybrid language running on the JVM. It was created by Martin Odersky in 2003. Scala took Java / JVM and organized it nicely according to a few orthogonal principles.
Working in Scala has been a pleasure, there is a lot to like:

You have easy access to the giant world of Java libraries
Lot of libraries written for Scala
Very fast, only 2 to 3 times slower than C
Big ecosystem
Easy to define a DSL in Scala so you can do everything in Scala
Very advanced type system
Adopted in industry by: Twitter, LinkedIn, Foursquare, ...
Scala is the most adopted functional language
Web frameworks: Play, Scalatra, any kind of Java Servlets
Scalding: a very nice framework for Hadoop programming
Akka: an Erlang style actor system
Mixin composition
Good GUI with Swing and JavaFX
SBT the best build tool I have used
Scala is a full stack multi purpose language

Issues

It is very complex
It is a kitchen sink language
Confusing to keep Scala collections and parallel Java collections apart

Eclipse Plugin Scala IDE for Eclipse

The Scala Eclipse plugin is very solid, but not quite as good as the fantastic Java support.

Syntax highlighting
Code completion
Debugger
Shows compile errors with explanation
Rudimentary refactoring
Jump to definition

Monad and Applicative Functor

Two very important concepts in functional programming are monad and applicative functor.

The best reference I found was: Learn You a Haskell for Great Good!

A monad gives you simple ways of composing different operations. At first it seems like an odd principle. Understanding monads took me several months.

In UNIX and OS X you can create complex programs by piping simple commands together. A monad generalizes this a lot.

Once you understand the monad you will see monads pop up in so many places. The monad is an amazingly powerful construct.

The last place I found monads unexpectedly showed up was in asynchronous programming, e.g. used in AJAX.
You send an external request and you do not block but you have a callback for when the result comes back. This is efficient but messy to program, especially if you have a chain of requests to process and you have to have a lot of callbacks floating around. You can do this type of calculation using a future / promise, and luckily a future is a monad, so you can string a long list of operations after each other in a very simple way.
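The chaining idea can be sketched with Python's asyncio, where an awaitable plays the role of the future. The three `fetch_`/`count_` functions are hypothetical stand-ins for external requests, and the `then` helper is made up for this example. It behaves like repeated monadic bind: each step takes a plain value and returns a new asynchronous computation.

```python
import asyncio


async def fetch_user(user_id):
    # Stand-in for a non-blocking external request (hypothetical service).
    await asyncio.sleep(0.01)
    return {"id": user_id, "name": "ada"}


async def fetch_orders(user):
    # Second request, which depends on the result of the first.
    await asyncio.sleep(0.01)
    return [f"{user['name']}-order-{n}" for n in range(2)]


async def count_orders(orders):
    await asyncio.sleep(0.01)
    return len(orders)


async def then(awaitable, *steps):
    """Feed the result of each asynchronous step into the next one."""
    value = await awaitable
    for step in steps:
        value = await step(value)
    return value


def run_chain(user_id):
    # The whole chain reads top to bottom, with no nested callbacks.
    return asyncio.run(then(fetch_user(user_id), fetch_orders, count_orders))
```

In Scala the same chain would typically be written as a for-comprehension over futures. The sketch only aims to show that the sequencing logic lives in one place instead of being spread over callbacks.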

Scalaz

Scalaz is a Scala library that replicates a lot of Haskell constructs, at the cost of being similarly hard to understand.

You can work with monads in Scala without using Scalaz since the "for-statement" in Scala is syntactic sugar for monadic "for-comprehension".

I have programmed Java in a functional style both professionally and for my open source project. It is possible but it is rather verbose and clunky. Scala is much more powerful, simpler and cleaner than both Java approaches, and Scalaz is a big step up from Scala.

When I started programming in Scala I read a really funny blog post called Truth about Scala that describes how a team starts to use Scala and first they are excited, but it quickly descends into a death spiral of complexity. I was concerned with this and tried to keep my code as simple as possible and avoid Scalaz for a long time. I would advise others to become very comfortable with Scala before starting to work with Scalaz.

Haskell

Haskell is a strongly typed, lazy, pure functional programming language. It is an academic research language created by a committee in 1987.
One reason that I got into Haskell was to understand monads and applicative functors, which are important constructs in Haskell and category theory.

There is a steep learning curve for Haskell. Maybe it is more like a hump you have to get over. Just getting to basic proficiency is hard. It took me around one year of low intensity studying, but one day it just made sense.

Haskell now has a lot of libraries
Libraries and dependencies are handled by Cabal
It is fast, only 2 to 3 times slower than C
Great concurrency
Repa: native parallel numerical array processing
Very small language
Very pure
Very terse code
Very advanced type system
Hoogle: a Haskell search engine
Great web frameworks: Happstack, Snap and Yesod

Issues

Bad GUI support
Module system is crude

Hoogle, a Search Engine for Haskell

A colleague told me that when he needed a function he would write out its signature and put it into Hoogle, and often it would take him to the function that he needed. The first time I tried it, it actually took me to a function that solved a bigger part of the problem than what I was looking for.

When I searched Hoogle for this function signature:

(a -> Bool) -> [a] -> [Int]

I got these results in EclipseFP:

Eclipse Plugin EclipseFP

EclipseFP with Hoogle

The Haskell Eclipse plugin is quite good:

Syntax highlighting
Cabal integration
Hoogle integration
Code completion
Debugger
GHCi integration with automatic reload

Python

Python is a high-level language built on ideas from functional, imperative and object oriented programming. It was created by Guido van Rossum in 1989.

For many years Python was my favorite language. It is a language for kids and also for scientists and a lot of people in between.

Python is probably the easiest language to learn
It took me a day to learn well enough to use
Very minimal language
Very terse code
Excellent wrapper language
Many implementations: CPython, Jython (JVM), IronPython (CLR), PyPy
Good bindings to numerical packages: NumPy, SciPy
Used in computer vision since OpenCV chose Python as its scripting language
Used in natural language processing due to the NLTK
Great web frameworks: Django, TurboGears, CherryPy

Issues

Python is not quite a full stack language; there are a few missing pieces:

Bad GUI support
Low-level numerical programming has to be done in external packages
Concurrency
Speed around 50 times slower compared to C

Eclipse Plugin PyDev

I like PyDev. It has:

Syntax highlighting
Code completion
Debugger

Best Programming Language for Kids

If a kid can understand a technology it is well designed. My daughter is turning 5 and I am thinking about what language I should introduce her to first.

Python

My first inclination was to teach her Python since it is the simplest, but it needs to give immediate visual feedback. Python's lack of a good GUI is a problem.

Haskell

I have also been tempted to show her some Haskell to teach her good habits in a pure and minimal language. But if I tell her that:

"A monad is just a monoid in the category of endofunctors"

she will walk away or scream.

Scala

Kojo is a LOGO like graphical turtle programming environment written in Scala. Scala's type inference makes it simpler for kids who will not have a good concept of types.

My daughter plays with Kojo and she likes it. She comes and asks me if we can do the turtle.

Kojo: notice the green drawing turtle in the middle

So unexpectedly, Scala, the biggest language, was the most kid friendly language. Based on a very small sample size.

Category Theory

Haskell uses plenty of concepts from category theory, e.g. the monad. In my quest to understand it I started to study category theory.
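
As a rough illustration of what a monad buys you in practice, here is a sketch, in Python rather than Haskell, of the Maybe pattern: chaining steps that may fail without checking for failure after each one. The names `bind`, `safe_div`, `safe_sqrt` and `root_of_ratio` are hypothetical helpers invented for this example, and `None` stands in for Haskell's `Nothing`. It is an informal approximation, not a faithful encoding of the category theory.

```python
import math


def bind(value, func):
    """Apply func to value, unless value is None (the 'Nothing' case)."""
    return None if value is None else func(value)


def safe_div(x, y):
    """Divide x by y, returning None instead of raising on division by zero."""
    return None if y == 0 else x / y


def safe_sqrt(x):
    """Square root of x, returning None for negative input."""
    return None if x < 0 else math.sqrt(x)


def root_of_ratio(x, y):
    """Compute sqrt(x / y); any failing step short-circuits to None."""
    return bind(safe_div(x, y), safe_sqrt)


print(root_of_ratio(8, 2))   # 2.0
print(root_of_ratio(1, 0))   # None
print(root_of_ratio(-8, 2))  # None
```

In Haskell the same chaining is spelled `safeDiv x y >>= safeSqrt`, with the failure handling hidden inside `>>=` for the `Maybe` type.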

Category theory has been called "abstract nonsense", both by its practitioners and by its critics, and for very good reasons. It can suck you into a black hole of abstraction.

Category Theory Introductions

You do not need to understand category theory to program in Haskell or Scalaz, but if it helps you, here are a few introductory videos.

Dominic Verity presents a gentle introduction to Category Theory:

http://vimeo.com/17207564

Dominic Verity on Category Theory (Part 2)

Error792's category theory class; currently there are 5 parts

Math and Programming

I have often said that there is no connection between math and programming. The only math you need to program is counting, and occasionally, addition. I felt:

Programmers are the grease monkeys of today

We move some data around and throw it on webpages

After working in Scala and Haskell I have changed my tune:

When you program in Scala you feel like an engineer

When you program in Haskell you feel like a mathematician

Adopting Haskell and Scalaz for a Team

Using Haskell and Scalaz takes a special mindset and a lot of dedication. I have been very lucky to work at a place that has attracted physicists, mathematicians and theoretical CS people.

If a big part of your team does not have these qualities, you risk wasting time and chasing developers away.

On the other hand, if your team is using Haskell or Scalaz, you will attract this brand of developers.

Conclusion

I had high expectations when I started using functional programming full time, but I have been disappointed by new technology many times before. Functional programming met my high expectations. It has been challenging and very enjoyable.

I was a C++ programmer for 8 years, and considered C++ the one true way for high speed, high level programming.
Recently I looked at a code sample written in C++ and it hurt my eyes: it was filled with boilerplate and state.
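
To make the contrast concrete, here is a small sketch, in Python rather than C++, of the same computation written first with mutable state and then in a functional style. The function names and the sample input are made up for illustration.

```python
def sum_even_squares_stateful(numbers):
    """Imperative version: an accumulator is mutated inside a loop."""
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n * n
    return total


def sum_even_squares_functional(numbers):
    """Functional version: one expression, no variable is reassigned."""
    return sum(n * n for n in numbers if n % 2 == 0)


print(sum_even_squares_stateful(range(1, 7)))    # 56
print(sum_even_squares_functional(range(1, 7)))  # 56
```

Both give the same answer; the second has no intermediate state to keep track of when reading the code.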

Functional programming is addictive and will make you spoiled

Functional programming is here to stay. It has been an important part of C# since v3.0. It is finally being added to Java in Java 8, which is coming out soon. The classic functional languages LISP and ML are the basis of Clojure and F#, which have thriving communities and are used in industry. The time has come to invest some time in understanding functional programming.

Python

I enjoy Scala and Haskell more than Python, but Python seems to be the language that I always go back to. It is a power tool that adds very little weight to your programmer's toolbox. You get a high return on investment with Python, while with Scala and especially Haskell you have to invest a lot and for a long time before you break even.

Scala

Scala is now popular enough that you can get a job doing it. Moving from Java or C# to Scala is pretty easy, since you can start programming Scala like Java. Scala is a big and complex language with a big ecosystem, and it takes months to get a deeper understanding. Scala is substantially more powerful than Java 7, but Java 8 has supposedly taken a lot of ideas from Scala.

Haskell

Haskell is definitely the road less traveled, but it is a road, not a trail. It is an academic research language whose design began in 1987. Recently it has started to break into the mainstream. There are a few jobs in Haskell. Gaining basic proficiency in Haskell is quite hard, but afterwards other languages look a little clunky. Writing Haskell feels like doing math.

Scala vs. Haskell

Scala is a safer bet for most programmers, since it is better adapted to more tasks, and you can approximate Haskell pretty well with Scalaz. Scala has a very advanced type system to handle its object oriented features.

Haskell appeals to functional language purists, mathematicians and category theorists. Esthetically I prefer Haskell. It is terser and the type inference is better.

In most cases external factors would dictate whether Scala or Haskell would be a better fit for your project.

Haskell vs. Python

Haskell and Python have a lot in common:

Minimalistic languages
Whitespace-sensitive syntax
Very terse
List comprehension
Important tuple type
GUI bindings to wxWidgets, GTK
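
The list comprehension and tuple items above can be sketched in Python, with the roughly equivalent Haskell shown in comments. The variable names are made up for this example.

```python
# Haskell: [(x, x * x) | x <- [1..5], odd x]
pairs = [(x, x * x) for x in range(1, 6) if x % 2 == 1]
print(pairs)  # [(1, 1), (3, 9), (5, 25)]

# Tuple unpacking, similar to pattern matching on a Haskell tuple
for number, square in pairs:
    print(number, square)

# Haskell: zip [1, 2, 3] "abc"
zipped = list(zip([1, 2, 3], "abc"))
print(zipped)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

The syntax differs, but the shape of the code is close enough that moving between the two languages feels natural for this kind of data manipulation.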

Haskell is statically typed and optimized towards purity and speed.

Python is dynamically typed and optimized towards pragmatism and simplicity.

Newer follow-up article: Practical Scala, Haskell and Category Theory