Fighting spam with Mollom on Glassfish

One of my main projects for the past six months has been the conversion of the
Mollom spam fighting service from an isolated
Java project into a Java Enterprise project. Recently, as part of this
conversion, we’ve migrated the Mollom Backend to the
Glassfish 3.0.1 Application Server.
This is, however, only the beginning.

The Mollom project, co-founded by Dries Buytaert
and Benjamin Schrauwen,
provides a “software as a service” backend that tells clients whether a
specific comment is spam or ham. If Mollom is not sure how a specific post
should be classified, it returns “unsure,” and clients most often then
display a CAPTCHA challenge to verify the ‘humanity’ of the poster.
The whole process is self-learning, and a number of parameters
(text analysis, IP addresses, and included links) are taken into account.
I’m continually amazed by Mollom’s accuracy.
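
To make the spam/ham/unsure flow concrete, here is a minimal client-side sketch in Java. It is not Mollom's actual client code or API: the names CommentModeration, Verdict, Action, and decide are hypothetical, and the CAPTCHA check is reduced to a callback that reports whether the poster solved the challenge. It also assumes a client that rejects the comment when the CAPTCHA fails, which is one common choice rather than a rule.

```java
import java.util.function.BooleanSupplier;

public class CommentModeration {

    /** The three classifications described above (hypothetical enum). */
    enum Verdict { SPAM, HAM, UNSURE }

    /** What the client site does with the comment (hypothetical enum). */
    enum Action { PUBLISH, REJECT }

    /**
     * Maps a verdict to an action. The CAPTCHA callback is only invoked
     * when the verdict is UNSURE, so posters of clear ham are never
     * bothered with a challenge.
     */
    static Action decide(Verdict verdict, BooleanSupplier captchaSolved) {
        switch (verdict) {
            case HAM:
                return Action.PUBLISH;
            case SPAM:
                return Action.REJECT;
            case UNSURE:
                // Assumption: a failed CAPTCHA means the comment is rejected.
                return captchaSolved.getAsBoolean() ? Action.PUBLISH : Action.REJECT;
            default:
                throw new IllegalArgumentException("Unknown verdict: " + verdict);
        }
    }

    public static void main(String[] args) {
        // An "unsure" comment whose poster solves the CAPTCHA gets published.
        System.out.println(decide(Verdict.UNSURE, () -> true));
    }
}
```

The point of the callback is that the extra friction of a CAPTCHA is paid only in the uncertain case; clear spam and clear ham are handled without involving the poster at all.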

Infrastructure

When I began work on this project in July of last year, the backend used
its own implementations of thread pooling, connection pooling, and resource
management (and other similar things that we have all written at least once in
our Java careers).
This was quite reasonable for the early development of the system: early
Mollom work concentrated on the functionality of the classifiers and
reputation systems, and infrastructure was added and refined gradually.

As Mollom grew in popularity, more infrastructural work was needed.
In the end, before our most recent upgrades began, most resources were consumed
by infrastructure issues, instead of Mollom-specific concepts. Both Dries and
Benjamin realized that this was the moment to start using software designed as
a solution to some of these typical infrastructure problems, and we began
work on the port to Glassfish.

Why Glassfish?

The Java EE 6 specification addresses many of the needs of Mollom. Database
connection pooling, persistence management, data abstraction, thread pooling,
enterprise beans to provide core functionality: all of these features are
needed in large enterprise projects, and Mollom is no different in this
respect.

Glassfish was the first application server that fully implemented the Java
EE 6 specification, and it does so in a clear way. This is important to me:
when unexpected behavior occurs, I want to be able to trace that behavior.
The fact that Glassfish is open-source is also really helpful. The Glassfish
engineers are very approachable, and for people who don’t want to spend
hours in deep code dives, there are supported solutions to many common
problems.

Dries and I were recently interviewed about the Glassfish port of Mollom,
and you can read more about Mollom and Glassfish here. Also check out
this blog entry from Dries.

Going forward

The migration to a professional enterprise infrastructure like Glassfish is
not the endpoint of the project. It does, however, provide Mollom a solid
foundation that allows the creation of additional services and the use of
additional protocols.

Currently, the Mollom API uses XML-RPC, but we are testing a REST
implementation as well. Actually, my blog has been using the Mollom REST
implementation for some time now, and we’ve been experimenting with it in a
number of ways.

I strongly believe that a clean REST interface to the Mollom spam module
will make it easy for many existing projects to integrate with Mollom.

The additional services we’ve added — and the robust backend they are running
on — are very exciting. Mollom goes way beyond protecting websites from
comment spam, and we’ve a number of things up our sleeve. Stay tuned for
new chapters in the Mollom story.