Friday, January 11, 2008

CICE/ARM observations

When looking at a proposed new language feature it's often useful to compare it against similar features in other languages, and against alternative proposals. Such comparisons can lead to a deeper understanding of the opportunities and pitfalls involved, and this certainly holds true in the case of adding closures to Java.

Neal Gafter has made available a prototype implementation of a large chunk of the BGGA specification, which was immensely helpful in my own attempts at understanding that proposal. However, Josh Bloch's recent comments on the closures debate caused me to think that it would be useful to have a similar prototype of the CICE and ARM proposals as a more concrete basis for comparison, so over the last couple of weeks I've attempted to put one together (it makes a change from the day job). The result is that I now have a version of javac which implements one interpretation of CICE/ARM.

A number of observations about both CICE/ARM and BGGA have come out of building and using this thing, and I'll blog about some of them over the coming days. For now, I'll start off with a couple of quick comments:

Firstly and most importantly, both the CICE and ARM documents really are 'a bit sketchy' (to quote Josh Bloch) in places - the ARM document in particular, which covers a good many of the possible options and issues within its chosen design space, but seems to stop short of making a call on some of the more important ones (this is why I say I've implemented 'one interpretation' of them). There seem to be quite a few valid possibilities when you combine the options for resource acquisition/initialisation/disposal and exception handling, and there's no obvious 80/20 rule apparent to me at least. In comparison, BGGA avoids such issues largely by being more open and flexible about these options, and therefore not having to specify them.
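To give a feel for the design space, here is a minimal sketch of the manual idiom that an ARM construct would replace, written in plain Java with nested try/finally blocks. It is not taken from either proposal: `ManualCleanupDemo` and `Res` are hypothetical names, and `Res.close()` deliberately declares no checked exception to keep the sketch short. With a real stream, `close()` can throw, and an exception thrown from a finally block replaces any exception already in flight from the body; what to do about that is one example of the kind of call a specification has to make.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

class ManualCleanupDemo {

    // Hypothetical resource that records when it is opened and closed.
    static class Res implements Closeable {
        private final String name;
        private final List<String> log;

        Res(String name, List<String> log) {
            this.name = name;
            this.log = log;
            log.add("open " + name);
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    // Acquires two resources and releases them in reverse order,
    // one nested try/finally per resource.
    static List<String> useTwo() {
        List<String> log = new ArrayList<String>();
        Res in = new Res("in", log);
        try {
            Res out = new Res("out", log);
            try {
                log.add("work");
            } finally {
                out.close();
            }
        } finally {
            in.close();
        }
        return log;
    }

    public static void main(String[] args) {
        // prints [open in, open out, work, close out, close in]
        System.out.println(useTwo());
    }
}
```

Each extra resource adds another level of nesting, and the order of disposal and the handling of failures during disposal are both fixed by hand here.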

Something I didn't expect was that 'this' within a CICE doesn't mean what I subconsciously expect it to when writing code - I find myself thinking it means the instance of the containing class, not of the CICE itself. Maybe it's because of the shorter syntax, or the fact that I've been using BGGA recently, although I do get this problem with normal anonymous classes in Java from time to time anyway.
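The same behaviour can be seen with an ordinary anonymous class in standard Java, which is a reasonable stand-in here on the assumption that a CICE is shorthand for one. In this small sketch (the class name `ThisDemo` and its methods are made up for illustration), an unqualified `this` refers to the anonymous instance, and the containing instance has to be named explicitly as `ThisDemo.this`:

```java
import java.util.function.Supplier;

class ThisDemo {

    @Override
    public String toString() {
        return "outer";
    }

    // Unqualified 'this' inside the anonymous class is the Supplier itself.
    Supplier<String> plainThis() {
        return new Supplier<String>() {
            @Override
            public String get() {
                return this.toString();
            }

            @Override
            public String toString() {
                return "inner";
            }
        };
    }

    // The containing instance needs the qualified form ThisDemo.this.
    Supplier<String> qualifiedThis() {
        return new Supplier<String>() {
            @Override
            public String get() {
                return ThisDemo.this.toString();
            }

            @Override
            public String toString() {
                return "inner";
            }
        };
    }

    public static void main(String[] args) {
        ThisDemo demo = new ThisDemo();
        System.out.println(demo.plainThis().get());     // prints "inner"
        System.out.println(demo.qualifiedThis().get()); // prints "outer"
    }
}
```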

Finally (and I'm not sure about this one yet) I have a suspicion that the current syntax for a CICE could be a bit tricky to parse if you want to stick to one-token lookahead, possibly involving a fair bit of messing with the grammar which javac is based on.

More later.

4 comments:

Josh Bloch said...

Mark,

Good Lord, if I had known that you were doing this, I'd have updated and tightened up the proposals a bit first! I haven't done anything to them since they were (co-)written, and I've had a number of thoughts about how to improve them in the meantime.

Thanks so much for putting together a prototype! Honestly, I'm speechless. I'd love to play with it.

Josh

P.S. As for parsing difficulties, I'm not surprised. The syntax for CICE wasn't designed all that carefully. It was a first cut, really. I'd welcome suggestions for improvement, from a parseability or readability standpoint. One thing that various folks have mentioned is that you could leave out the types of the parameters: SAM-types have only one method, so the parameter types are redundant. Whether that makes the construct more or less readable, I don't know. I suspect it would have no impact on the parsing difficulty.

Mark Mahieu said...

Hi Josh,

I'm very keen to hear your current thoughts on how the proposals should look :)

I should be able to make the prototype available this weekend, along with some form of release note explaining what it does and doesn't implement out of the ideas in the existing documents.

Also, please don't take my concerns about parseability too seriously at this point - I only spent a short time looking into it before deciding that it wasn't the key problem I was trying to solve right then, and opting for multi-token lookahead as a temporary approach. I'll revisit it once the initial prototype is available though, as it's nagging me.

Regarding the suggestion that types could be omitted from the parameters, I take it to mean that the following code fragment:

    Collections.sort(list, Comparator<Integer>(Integer i1, Integer i2) {
        return i2.compareTo(i1);
    });

could be written as:

    Collections.sort(list, Comparator<Integer>(i1, i2) {
        return i2.compareTo(i1);
    });

It's easy enough to implement, I think, but in my opinion this falls into the same trap that C# recently fell into with its type inference - for a reader unfamiliar with the APIs involved, the most useful information about i1 and i2 is no longer readily apparent. It also looks very similar to a constructor invocation, as the parameter declarations now look like expressions.

Personally I think I'd prefer it if the compiler could infer the type parameter(s) - making something like this possible:

    Collections.sort(list, Comparator(Integer i1, Integer i2) {
        return i2.compareTo(i1);
    });

As the CICE document mentions, one approach is to infer the types from the method signature and body. However, this wouldn't work so well if some but not all of the type parameters are used in the implemented method. Consider the following class, which crops up in some form every so often:

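    import java.util.HashMap;

    public abstract class FilteredMap<K, V> extends HashMap<K, V> {

        // Only store values that the subclass accepts.
        @Override
        public V put(K key, V value) {
            V oldValue = null;
            if (acceptsValue(value)) {
                oldValue = super.put(key, value);
            }
            return oldValue;
        }

        // Implemented by subclasses to decide which values are stored.
        abstract boolean acceptsValue(V value);
    }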

Creating one of these using a CICE, V can be inferred from the method but K cannot. In that case, it would be possible to allow the following, where only the types which cannot be inferred need be specified:

    Map<Foo, Bar> noNullValueMap = FilteredMap<Foo>(Bar value) {
        return value != null;
    };

But that's nasty for multiple reasons - the programmer writing this code has to understand far too much about how the type inference works in order to work out which of the type parameters are needed, and anyone reading the code is going to be just as confused... what is a FilteredMap<Foo>? All or nothing are the only options that work for me in this case.
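For comparison, this is roughly what the 'all' option looks like today as an ordinary anonymous class, with both type arguments written out in full. It's only a sketch: String and Integer stand in for Foo and Bar, FilteredMap is repeated so that the example compiles on its own, and `FilteredMapDemo` is a made-up name.

```java
import java.util.HashMap;
import java.util.Map;

abstract class FilteredMap<K, V> extends HashMap<K, V> {

    @Override
    public V put(K key, V value) {
        V oldValue = null;
        if (acceptsValue(value)) {
            oldValue = super.put(key, value);
        }
        return oldValue;
    }

    abstract boolean acceptsValue(V value);
}

class FilteredMapDemo {

    // Both K and V are stated explicitly; nothing is left to inference.
    static Map<String, Integer> noNullValueMap() {
        return new FilteredMap<String, Integer>() {
            @Override
            boolean acceptsValue(Integer value) {
                return value != null;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Integer> map = noNullValueMap();
        map.put("a", 1);
        map.put("b", null); // rejected by acceptsValue
        System.out.println(map); // prints {a=1}
    }
}
```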

The more generally useful constructor type inference which Neal and others have blogged about might allow the following, which seems far simpler:

    Collections.sort(list, Comparator<>(Integer i1, Integer i2) {
        return i2.compareTo(i1);
    });

    Map<Foo, Bar> filteredMap = FilteredMap<>(Bar value) {
        return value != null;
    };



Mark

Casper Bang said...

The following passage from the ARM proposal is not exactly correct:

"...for C#, it is the using statement. Java deserves no less. In fact, it deserves better: The only way to handle multiple resources with using statements is to nest them, which is ugly."

You can indeed new up multiple variables in a single using statement, but they are required to be of the same type (or a subtype); you simply separate the instantiations with a comma.

Marian said...

This won't actually have effect, I suppose like this.