Jonas Byström - 2 months ago
Java Question

PHP to Java (using PtoJ)

I would like to transition our codebase from poorly written PHP code to poorly written Java, since I believe Java code is easier to tidy up. What are the pros and cons, and for those who have done it yourselves, would you recommend PtoJ for a project of about 300k ugly lines of code? Tips and tricks are most welcome; thanks!

Answer

Poorly written PHP is likely to be very hard to convert because a lot of the bad stuff in PHP just doesn't exist in Java (the same is true vice versa though, so don't take that as me saying Java is better - I'm going to keep well clear of that flame-war).

If you're talking about a legacy PHP app, then it's highly likely that your code contains a lot of procedural code and inline HTML, neither of which is going to convert well to Java.

If you're really unlucky, you'll have things like eval() statements, dynamic variable names (using $$ syntax), looped include() statements, reliance on the 'register_globals' flag, and worse. That kind of stuff will completely thwart any conversion attempt.
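To make the dynamic variable name problem concrete, here is a minimal Java sketch of what the PHP snippet `$name = "total"; $$name = 42; echo $total;` would have to become if rewritten by hand. The class name and the `scope` map are made up for illustration, and this says nothing about what PtoJ itself generates. The point is only that Java fixes local variable names at compile time, so a run-time name lookup has to be spelled out as an explicit map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not output from any converter.
final class VariableVariables {
    // Stands in for PHP's run-time symbol table; the name "scope" is an assumption.
    private final Map<String, Object> scope = new HashMap<>();

    void set(String name, Object value) {
        scope.put(name, value);
    }

    Object get(String name) {
        if (!scope.containsKey(name)) {
            // PHP would carry on with null here; failing loudly is a choice made for this sketch.
            throw new IllegalArgumentException("Undefined variable: " + name);
        }
        return scope.get(name);
    }

    public static void main(String[] args) {
        VariableVariables vars = new VariableVariables();
        String name = "total";                 // the variable's name is itself data
        vars.set(name, 42);                    // PHP: $$name = 42;
        System.out.println(vars.get("total")); // PHP: echo $total;
    }
}
```

Every read and write of such a variable across the code base would need the same treatment, which is why this kind of code tends to come out of a conversion looking nothing like normal Java.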

Your other major problem is that debugging the result after the conversion is going to be hell, even if you have beautiful code to start with. If you want to avoid regressions, you will basically need to go through the entire code base on both sides with a fine-tooth comb.

The only time you're going to get a satisfactory result from an automated conversion of this type is if you start with a reasonably tidy code base, written at least mainly in up-to-date OOP code.

In my opinion, you'd be better off doing the refactoring exercise before the conversion. But of course, given your question, that would rather defeat the point. Therefore my recommendation is to stick with PHP. PHP code can be very good, and even bad PHP can be polished up with a bit of refactoring.

[EDIT]

In answer to @Jonas's question in the comments, 'what is the best way to refactor horrible PHP code?'

It really depends on the nature of the code. A large monolithic block of code (which describes a lot of the bad PHP I've seen) can be very hard (if not impossible) to write unit tests for. You may find that functional tests are the only kind of tests you can write on the old code base. These would use Selenium or similar tools to run the code through the browser as if it were a user. If you can get a set of reliable functional tests written, they will help you remain confident that you aren't introducing regressions.
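As a rough sketch of the regression-checking idea, the snippet below compares page output recorded from the old code against what the refactored code returns now. This is not Selenium: it is a plain-Java stand-in for the same principle, and the page paths, the HTML strings, and the `fetchPage` function are all invented for the example. In a real setup `fetchPage` would be backed by a browser driver or an HTTP client.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class RegressionCheck {
    // Returns the paths whose current output differs from the recorded output.
    static List<String> changedPages(Map<String, String> recorded,
                                     Function<String, String> fetchPage) {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, String> entry : recorded.entrySet()) {
            String current = fetchPage.apply(entry.getKey());
            if (!entry.getValue().equals(current)) {
                changed.add(entry.getKey());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // Output captured from the old code before refactoring (made-up pages).
        Map<String, String> recorded = new LinkedHashMap<>();
        recorded.put("/cart.php", "<h1>Cart</h1>");
        recorded.put("/login.php", "<h1>Login</h1>");

        // Stand-in for a browser-driven or HTTP fetch against the refactored site.
        Function<String, String> fetchPage =
            path -> path.equals("/cart.php") ? "<h1>Cart</h1>" : "<h1>Sign in</h1>";

        System.out.println(changedPages(recorded, fetchPage)); // prints [/login.php]
    }
}
```

An empty result means the refactored code still produces what the old code did for every recorded page, which is the confidence the functional tests are there to give you.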

The good news is that it can be very easy - and satisfying - to rip apart bad code and rebuild it.

The way I've approached it in the past is to take a two-stage approach.

Stage one rewrites the monolithic code into decent quality procedural code. This is relatively easy, and the new code can be dropped into place as you go. This is where the bulk of the work happens, but you'll still end up with procedural code. Just better procedural code.

Stage two: once you've got a critical mass of reasonable-quality procedural code, you can then refactor it again into an OOP model. This has to wait until later, because it is typically quite hard to convert old, poor-quality PHP straight into a set of objects. It also has to be done in fairly large chunks because you'll be moving large amounts of code into objects all at once. But if you did a good job in stage one, then stage two should be fairly straightforward.
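The shape of the two stages can be sketched in a few lines. The example is in Java rather than PHP and the invoice domain is invented, so treat it as an illustration of the structure only. Stage one leaves you with small functions that take explicit inputs instead of reading globals; stage two wraps those functions in an object that owns its data.

```java
import java.util.List;

final class TwoStageSketch {
    // Stage one: small procedural functions with explicit inputs and outputs,
    // replacing one long script that reads and writes globals.
    static long subtotalCents(List<Long> linePricesCents) {
        long sum = 0;
        for (long price : linePricesCents) {
            sum += price;
        }
        return sum;
    }

    static long withTaxCents(long subtotalCents, int taxPercent) {
        return subtotalCents + subtotalCents * taxPercent / 100;
    }

    // Stage two: the same logic grouped into an object that owns its data.
    static final class Invoice {
        private final List<Long> linePricesCents;
        private final int taxPercent;

        Invoice(List<Long> linePricesCents, int taxPercent) {
            this.linePricesCents = List.copyOf(linePricesCents);
            this.taxPercent = taxPercent;
        }

        long totalCents() {
            // Reuses the stage-one functions unchanged.
            return withTaxCents(subtotalCents(linePricesCents), taxPercent);
        }
    }

    public static void main(String[] args) {
        Invoice invoice = new Invoice(List.of(1000L, 250L), 20);
        System.out.println(invoice.totalCents()); // prints 1500
    }
}
```

Because the stage-one functions are already free of hidden state, moving them behind an object is mostly a matter of deciding which data belongs together.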

When you've got it into objects, then you can start seriously thinking about unit tests.