Tuesday, January 23, 2018

Adding Terms to Javadoc Search with Java 9

There is a relatively old web page called "Proposed Javadoc Tags" that appears to have originally been written in conjunction with Javadoc 1.2 that lists "tags that Sun may implement in Javadoc someday." The tags in this list are @category, @example, @tutorial, @index, @exclude, @todo, @internal, @obsolete, and @threadsafety. One of these tags, @index, has moved from "Proposed Tags" to "Standard Tags" with its inclusion in Java 9. The Java 9 Javadoc tool documentation states that the @index tag is used to specify an indexed "search term or a phrase" that can be searched for in Java 9's new Javadoc Search feature.

The ability to add terms for searching in Javadoc generated documentation has been desired for some time as demonstrated by the existence of JDK-4034228 ("stddoclet: Add @index doc-comment tag for generating an index from common words"), JDK-4279638 ("Javadoc comments: Need ability to tag words for inclusion in the API index"), and JDK-4100717 ("Allow user-specified index entries"). JEP 225 ("Javadoc Search") was used to "add a search box to API documentation generated by the standard doclet that can be used to search for program elements and tagged words and phrases within the documentation."

Javadoc in Java 9 and later automatically includes several constructs in the "Search" that can be performed from the generated HTML output. These searchable-by-default strings are based on the names of methods, members, types, packages, and modules. The advantage offered by @index is that phrases or search terms not built into the names of these just-listed constructs can be explicitly added to the search index.

There are several examples of where the ability to add customized text for searching Javadoc generated documentation can be useful. The Javadoc tool documentation references the "domain-specific term ulps" ("units in the last place") and explains that although "ulps is used throughout the java.lang.Math class," it "doesn't appear in any class or method declaration names." Using @index would allow the API designers of the Math class to add "ulps" to the searchable index to help people find the Math class when searching for "ulps." In Effective Java's Third Edition, Josh Bloch references another example of where Javadoc {@index} might be useful. In Item 56, Bloch cites an example using {@index IEEE 754} ("IEEE Standard for Floating-Point Arithmetic").

I recently ran into a case in the JDK where I thought use of {@index} would be appropriate. I posted recently on the Dual-Pivot Quicksort, but realized that one does not find any matches for that term when searching the Javadoc-generated output. It seems like it would be useful to add terms such as "Dual Pivot Quicksort" and "Mergesort" to the Javadoc search index via {@index}.

Unfortunately, having spaces in the text embedded in the {@index } tag seems to result in only the terms before the first space showing up in the rendered HTML (and being the only portions that can be searched). To demonstrate this, the following ridiculously contrived Java code contains three {@index} Javadoc tags representative of the three examples just discussed.

Java Code Using {@index} in Its Documentation

package dustin.examples.javadoc;

/**
 * Used to demonstrate use of JDK 9's Javadoc tool
 * "@index" tag.
 */
public class JavadocIndexDemonstrator
{
   /**
    * This method complies with the {@index IEEE 754} standard.
    */
   public void doEffectiveJava3Example()
   {
   }

   /**
    * Accuracy of the floating-point Math methods is measured in
    * terms of {@index ulps}, "units in the last place."
    */
   public void doMathUlpsExample()
   {
   }

   /**
    * This method uses a version of the {@index Dual-Pivot Quicksort}.
    */
   public void doDualPivotQuicksort()
   {
   }
}

When the Javadoc tool is executed against the above code on my Windows 10 machine in Java 9.0.4, the generated HTML page looks like this:

The "754" is missing in the generated HTML after "IEEE" and the "Quicksort" is missing after "Dual-Pivot" in the methods' documentation. The next code listing shows the generated HTML source code for these pieces with missing text.

HTML Source

<div class="block">This method uses a version of the <a id="Dual-Pivot" class="searchTagResult">Dual-Pivot</a>.</div>
 . . .
<div class="block">This method complies with the <a id="IEEE" class="searchTagResult">IEEE</a> standard.</div>

From the HTML output just shown, it becomes apparent why only the text before the first space appears in the page and is searchable. The "id" attribute associated with the "searchTagResult" class for each searchable entry consists of the searchable string. Because HTML "id" attributes cannot have spaces, only the characters up to the first space can be used for the "id" value.

Because spaces are not allowed in the "id" attributes, one of the following work-arounds would need to be used when dealing with multiple words in a single phrase for which search is desired.

  1. Remove spaces
    • "{@index IEEE 754}" becomes "{@index IEEE754}"
    • "{@index Dual-Pivot Quicksort}" becomes "{@index Dual-PivotQuicksort}"
  2. Replace spaces with allowable character (for example, hyphen)
    • "{@index IEEE 754}" becomes "{@index IEEE-754}"
    • "{@index Dual-Pivot Quicksort}" becomes "{@index Dual-Pivot-Quicksort}"
  3. Use separate {@index} for each word in phrase
    • "{@index IEEE 754}" becomes "{@index IEEE} {@index 754}"
    • "{@index Dual-Pivot Quicksort}" becomes "{@index Dual-Pivot} {@index Quicksort}"
  4. Use {@index} only on most important terms in phrase
    • "{@index Dual-Pivot Quicksort}" becomes "{@index Dual-Pivot} Quicksort"
  5. Represent multiple word phrase with common single word representation
    • This is why "ulps" in the Javadoc tool documentation works well rather than "units in the last place."
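The first three work-arounds in the list above are simple string transformations, and can be sketched as follows. This is only an illustration: the class IndexPhraseWorkarounds and its method names are hypothetical helpers invented for this sketch and are not part of the Javadoc tool.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

/**
 * Hypothetical helper (not part of javadoc) that rewrites a multi-word
 * phrase into the single-token forms described in the list above.
 */
class IndexPhraseWorkarounds
{
   /** Work-around 1: remove the spaces from the phrase. */
   static String removeSpaces(final String phrase)
   {
      return "{@index " + phrase.replace(" ", "") + "}";
   }

   /** Work-around 2: replace each space with a hyphen. */
   static String hyphenate(final String phrase)
   {
      return "{@index " + phrase.replace(' ', '-') + "}";
   }

   /** Work-around 3: wrap each word in its own tag. */
   static String tagEachWord(final String phrase)
   {
      return Arrays.stream(phrase.split(" "))
         .map(word -> "{@index " + word + "}")
         .collect(Collectors.joining(" "));
   }

   public static void main(final String[] arguments)
   {
      final String phrase = "Dual-Pivot Quicksort";
      System.out.println(removeSpaces(phrase));
      System.out.println(hyphenate(phrase));
      System.out.println(tagEachWord(phrase));
   }
}
```

The Javadoc documentation comment specification also lists a quoted form for phrases, {@index "phrase" description}. Whether that form avoids the truncation on 9.0.4 would need to be checked, but it may be worth trying before any of the rewrites above.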

The "Motivation" section of JEP 225 ("Javadoc Search") nicely summarizes the benefits of this ability to search for terms in Javadoc:

The API documentation pages generated by the standard doclet can be hard to navigate if you're not already familiar with their layout. An external search engine can be used, but that may lead to an out-dated or irrelevant page. The browser's built-in search function can be used, but that is limited to searching within the current page rather than an entire body of documentation.

Although adding search capability to Javadoc-generated documentation is a minor addition in Java 9, it can be used to make documentation of one's Java code more useful to other developers and users of that code.

Monday, January 22, 2018

Faster Sorting of Arrays of Primitives Coming to Java?

It appears that sorting arrays of primitives in Java may experience a performance improvement in the not-so-far future. Vladimir Yaroslavskiy has posted a message to the core-libs-dev mailing list titled "The new optimized version of Dual-Pivot Quicksort" in which Yaroslavskiy writes of an "optimized and faster version of Dual-Pivot Quicksort" that he has "been working on ... for the last 5 years."

The "The new optimized version of Dual-Pivot Quicksort" message includes some historical background on the Dual-Pivot Quicksort; highlights relative performance of the new version for random data, "nearly structured arrays," and "period inputs"; provides a comprehensive summary of the changes involved; and provides a link for open code review of the changes.

The Dual-Pivot Quicksort algorithm was introduced to Java in 2009. In another core-libs-dev mailing list post written in September 2009 and called "Replacement of Quicksort in java.util.Arrays with new Dual-Pivot Quicksort", Yaroslavskiy wrote, "I'd like to share with you new Dual-Pivot Quicksort which is faster than the known implementations (theoretically and experimental). I'd like to propose to replace the JDK's Quicksort implementation by new one." That post described the "classical Quicksort algorithm" scheme and some modifications to that scheme before describing how "the new Dual-Pivot Quicksort uses *two* pivots elements" instead of the single pivot element used by all earlier schemes.

The original message "Replacement of Quicksort in java.util.Arrays with new Dual-Pivot Quicksort" features some other interesting historical details as well that are highlighted here.

  • An e-mail message pasted into this message from Jon Bentley states, "I think that Vladimir's contributions to Quicksort go way beyond anything that I've ever done, and rank up there with Hoare's original design and Sedgewick's analysis." That message also provides brief but interesting historical background on the development of quicksort. That message says much about Yaroslavskiy's contributions, but I think it also says much about Jon Bentley's character.
  • An e-mail message pasted into this message from Josh Bloch states, "I believe it's not unlikely that this code may end up getting ported to many languages and widely deployed in much the manner of Bentley and McIlroy's fine sort (which is nearing 20 successful years in the field)." This has turned out to be the case as other languages (or libraries for languages) have adopted this algorithm in some measure with examples including JavaScript, Python, and Ruby.

The likely performance improvements from the new and improved version of the Dual-Pivot Quicksort will be seen in use of the overloaded versions of Arrays.sort() methods on primitive array types. The search term "Dual-Pivot Quicksort" occurs 14 times in the Javadoc-generated HTML output associated with the JDK 9 version of the Arrays class:

Because the quicksort is only used for sorting primitives, these performance enhancements to the dual-pivot quicksort only affect methods on primitives and don't affect methods such as Arrays.sort(Object[]) that tend to use the merge sort instead.
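A minimal sketch of the distinction just described: the same call, Arrays.sort(...), resolves to different overloads depending on whether the array holds primitives or objects. The class and method names here are made up for illustration, and the comments about which algorithm backs each overload simply restate the paragraph above rather than anything the code itself can show.

```java
import java.util.Arrays;

class PrimitiveVersusObjectSort
{
   /** Sorts a copy using the int[] overload (the dual-pivot quicksort path). */
   static int[] sortPrimitives(final int[] values)
   {
      final int[] copy = values.clone();
      Arrays.sort(copy);
      return copy;
   }

   /** Sorts a copy using Arrays.sort(Object[]) (the merge sort path). */
   static Integer[] sortObjects(final Integer[] values)
   {
      final Integer[] copy = values.clone();
      Arrays.sort(copy);
      return copy;
   }

   public static void main(final String[] arguments)
   {
      System.out.println(Arrays.toString(sortPrimitives(new int[] {5, 3, 9, 1})));
      System.out.println(Arrays.toString(sortObjects(new Integer[] {5, 3, 9, 1})));
   }
}
```

Both calls produce the same ordering; only the first would be affected by the proposed changes.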

As far as I can tell, there is no specific release of Java for which these performance improvements are targeted, but they seem to have had extensive review and testing, so the improvement of performance related to sorting of arrays of primitives may be coming soon to a version of Java near you.

Saturday, January 20, 2018

Immutable Versus Unmodifiable in JDK 10

Nearly two months ago, Stuart Marks wrote, "Immutability is like wine." He then reminded readers of Schopenhauer's Law of Entropy: "If you put a spoonful of wine in a barrel full of sewage, you get sewage. If you put a spoonful of sewage in a barrel full of wine, you get sewage." With that provided background, Marks applied Schopenhauer's Law of Entropy to immutability with "immutability" replacing "wine" and "mutability" replacing "sewage" to make this insightful observation:

Similarly, if you add a little immutability to something mutable, you get mutability. And if you add a little mutability to something immutable, you get mutability.

The context of this quotation is an online discussion starting in October regarding JDK 10-targeted JDK-8177290 ("add copy factory methods for unmodifiable List, Set, Map") and JDK-8184690 ("add Collectors for collecting into unmodifiable List, Set, and Map"). JDK-8177290 is a subtask of JDK-8156070 ("Immutable Collections enhancements"), which is described as "a container for various enhancements and improvement subtasks for the immutable collections." The discussion is rather lengthy with multiple and often quite different perspectives involving terms such as "immutable" and "unmodifiable." Indeed, in the first post in this discussion, Marks writes, "The term 'immutable' is inextricably intertwined with 'persistent' when it comes to data structures, and I believe we'll be explaining this forever if Java's 'immutable' means something different from everybody else's."

Pointers to the final determination on terminology to use can be found in the current text associated with JDK-8191517 ("Add copy factory methods for unmodifiable List, Set, Map"). This text includes this statement, "Provide definitions for 'view' collections, 'unmodifiable' collections, and 'unmodifiable view' collections." JDK-8191517 also references webrev.4.zip and specdiff.4.zip for additional low-level details. The remainder of this post will look at some of the low-level details documented in those referenced ZIP files.

The Javadoc comments added to select interfaces' source code in the referenced zip files contain additional details regarding the terms "'view' collections," "'unmodifiable' collections," and "'unmodifiable view' collections." For example, the Javadoc for java.util.Collection has the following descriptions added to its interface-level Javadoc comment:

  • "View Collections" - "Most collections manage storage for elements they contain. By contrast, view collections themselves do not store elements, but instead they rely on a backing collection to store the actual elements. Operations that are not handled by the view collection itself are delegated to the backing collection."
  • "Unmodifiable Collections" - "An unmodifiable collection is a collection, all of whose mutator methods ... are specified to throw UnsupportedOperationException. Such a collection thus cannot be modified by calling any methods on it. For a collection to be properly unmodifiable, any view collections derived from it must also be unmodifiable."
    • Regarding Modifications: "An unmodifiable collection is not necessarily immutable. If the contained elements are mutable, the entire collection is clearly mutable, even though it might be unmodifiable. ... However, if an unmodifiable collection contains all immutable elements, it can be considered effectively immutable."
  • "Unmodifiable View Collections" - "An unmodifiable view collection is a collection that is unmodifiable and that is also a view onto a backing collection. Its mutator methods throw UnsupportedOperationException, as described above, while reading and querying methods are delegated to the backing collection. The effect is to provide read-only access to the backing collection."
    • Regarding Modifications: "Note that changes to the backing collection might still be possible, and if they occur, they are visible through the unmodifiable view. Thus, an unmodifiable view collection is not necessarily immutable. However, if the backing collection of an unmodifiable view is effectively immutable, or if the only reference to the backing collection is through an unmodifiable view, the view can be considered effectively immutable."
    • Examples: "[Collections] returned by Collections.unmodifiableCollection [and] Collections.unmodifiableList."
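The difference between "unmodifiable" and "immutable" described in the bullets above can be seen with a few lines of code. The following is a minimal sketch using Collections.unmodifiableList; the class name and the rejectsAdd helper are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class UnmodifiableViewDemo
{
   /** Returns true if the list refuses add() with UnsupportedOperationException. */
   static boolean rejectsAdd(final List<String> list)
   {
      try
      {
         list.add("x");
         return false;
      }
      catch (UnsupportedOperationException expected)
      {
         return true;
      }
   }

   public static void main(final String[] arguments)
   {
      final List<String> backing = new ArrayList<>();
      backing.add("wine");
      final List<String> view = Collections.unmodifiableList(backing);

      // The view's mutator methods throw UnsupportedOperationException ...
      System.out.println("View rejects add? " + rejectsAdd(view));

      // ... but a change made to the backing list is visible through the view.
      backing.add("sewage");
      System.out.println("View size after backing change: " + view.size());
   }
}
```

The view is unmodifiable but not immutable: anyone holding a reference to the backing list can still change what the view shows.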

The bullets above look in detail at the comments added to the Javadoc for the java.util.Collection interface, but Javadoc comments for other collections interfaces also have significant new commentary regarding immutability and unmodifiability related to those specific interfaces. For example, the java.util.List interface Javadoc comment shown in the previously referenced ZIP files discusses "Unmodifiable Lists", convenient mechanisms available to access such Lists, and characteristics of Lists retrieved through those mechanisms. The Javadoc comments for the java.util.Set and java.util.Map interfaces receive similar treatment.

So far, I've mostly focused on how the Javadoc documentation is being enhanced and how the terminology is being changed from "immutable" to "unmodifiable." It is worth pointing out here, however, that this change in terminology is associated with the addition of new "copy factory methods" and new Collectors that will make it easier to access unmodifiable collections. JDK-8191517 summarizes these new methods:

  • "Add a family of copyOf() methods to java.util.List, Set, and Map to copy the elements from an existing collection or Map."
  • "Add a family of collectors to java.util.stream.Collectors that will create an unmodifiable List, Set, or Map from a stream."

The Javadoc comment for the forthcoming Map.copyOf(Map) method states, "Returns an unmodifiable Map containing the entries of the given Map. The given Map must not be null, and it must not contain any null keys or values. If the given Map is subsequently modified, the returned Map will not reflect such modifications." An interesting (but not surprising) "Implementation Note" in the Javadoc comment states, "If the given Map is an unmodifiable Map, calling copyOf will generally not create a copy." The numerous overloaded Map.of() methods added to Map with Java 9 have their Javadoc comments modified to replace "immutable" with "unmodifiable" and to replace references to the section titled "Immutable Map Static Factory Methods" with references to the new name for that section ("Unmodifiable Maps"). The term "structurally immutable" has also been replaced by "unmodifiable."
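The behavior quoted above for Map.copyOf(Map) can be sketched as follows, assuming a JDK 10 or later runtime. The class MapCopyOfDemo and its helper methods are hypothetical names used only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

class MapCopyOfDemo
{
   /** Returns an unmodifiable copy of the given map (requires JDK 10+). */
   static Map<String, Integer> snapshot(final Map<String, Integer> source)
   {
      return Map.copyOf(source);
   }

   /** Returns true if the map refuses put() with UnsupportedOperationException. */
   static boolean rejectsPut(final Map<String, Integer> map)
   {
      try
      {
         map.put("x", 0);
         return false;
      }
      catch (UnsupportedOperationException expected)
      {
         return true;
      }
   }

   public static void main(final String[] arguments)
   {
      final Map<String, Integer> source = new HashMap<>();
      source.put("A", 1);
      final Map<String, Integer> copy = snapshot(source);

      // Later changes to the source are not reflected in the copy.
      source.put("B", 2);
      System.out.println("Copy size: " + copy.size());
      System.out.println("Copy rejects put? " + rejectsPut(copy));
   }
}
```

Unlike an unmodifiable view, the copy is detached from the source map, so the "spoonful of mutability" in the source cannot leak into it.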

The Set.copyOf(Collection) and List.copyOf(Collection) methods coming to Java 10 are similar to that described in the last paragraph for Map.copyOf(Map) and include the same changes in comment terminology mentioned for Map.
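A similar sketch for List.copyOf(Collection) and Set.copyOf(Collection), again assuming JDK 10 or later and using an invented class name. Note that Set.copyOf accepts a source with duplicate elements and keeps one of each.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;

class CollectionCopyOfDemo
{
   /** Returns an unmodifiable List copy that keeps the source's iteration order. */
   static List<String> copyToList(final Collection<String> source)
   {
      return List.copyOf(source);
   }

   /** Returns an unmodifiable Set copy; duplicates in the source collapse. */
   static Set<String> copyToSet(final Collection<String> source)
   {
      return Set.copyOf(source);
   }

   public static void main(final String[] arguments)
   {
      final List<String> source = new ArrayList<>();
      source.add("a");
      source.add("a");
      source.add("b");

      final List<String> listCopy = copyToList(source);
      final Set<String> setCopy = copyToSet(source);

      // Changing the source afterward does not change either copy.
      source.add("c");
      System.out.println("List copy: " + listCopy);
      System.out.println("Set copy size: " + setCopy.size());
   }
}
```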

The additions to the Collectors class in Java 10 described by JDK-8191517 are the four methods toUnmodifiableList(), toUnmodifiableSet(), and two overloaded versions of toUnmodifiableMap(-) (one version accepts a BinaryOperator parameter).
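The four new collectors can be sketched as follows, assuming JDK 10 or later. The class and method names are made up for this example; the second toUnmodifiableMap overload takes the BinaryOperator that decides how values with the same key are merged.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class UnmodifiableCollectorsDemo
{
   static List<String> asList(final Stream<String> words)
   {
      return words.collect(Collectors.toUnmodifiableList());
   }

   static Set<String> asSet(final Stream<String> words)
   {
      return words.collect(Collectors.toUnmodifiableSet());
   }

   /** Maps each word to its length; duplicate words would cause an exception. */
   static Map<String, Integer> lengthsByWord(final Stream<String> words)
   {
      return words.collect(
         Collectors.toUnmodifiableMap(Function.identity(), String::length));
   }

   /** Groups words by length, joining words of equal length with a comma. */
   static Map<Integer, String> wordsByLength(final Stream<String> words)
   {
      return words.collect(
         Collectors.toUnmodifiableMap(
            String::length, Function.identity(), (first, second) -> first + "," + second));
   }

   public static void main(final String[] arguments)
   {
      System.out.println(asList(Stream.of("a", "bb", "cc")));
      System.out.println(asSet(Stream.of("a", "a", "bb")).size());
      System.out.println(lengthsByWord(Stream.of("a", "bb")).get("bb"));
      System.out.println(wordsByLength(Stream.of("a", "bb", "cc")).get(2));
   }
}
```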

As the virtues of immutability are more generally realized and as Java developers strive to apply immutability more often in their applications, it is important to know precisely how a given structure, collection, or view can be modified. JDK 10 is slated to add more methods that make it easier for Java developers to achieve immutability (or at least unmodifiability) of their collections, and the new comments on the most important interfaces and on the Collections class should help developers understand more clearly what is mutable and what is not in the constructs they select for their applications.

Wednesday, January 17, 2018

What's New in Effective Java's Third Edition?

Ever since hearing about the pending publication of the Third Edition of Effective Java, I've wondered what would be new in it. I assumed that features introduced to Java since Java 6 would be covered and that is indeed the case. However, there are some other changes as well to this third edition of the Java developer classic. In this post, I provide a high-level overview of the topics that are added, changed significantly, or removed in this third edition.

Before listing what I've observed that appears to be new in Effective Java, Third Edition, I need to make the disclaimer statement that I'm likely to miss several changes throughout this book with 12 chapters encompassing 90 items covering well over 350 pages. This post is not intended to provide detailed coverage of the changes in the third edition, but rather is intended as a high-level sampling of the changes and readers are encouraged to borrow or purchase a copy of this Third Edition of Effective Java to access the low-level details.

As expected, there is significant new content in Effective Java, Third Edition related to new features of Java 7, Java 8, and even Java 9.

Java 7

An obvious new item motivated by Java 7 is Item 9 ("Prefer try-with-resources to try-finally") because try-with-resources was introduced with Java 7. Item 32 ("Combine generics and varargs judiciously") is new to the third edition and discusses the Java 7-introduced @SafeVarargs annotation (which received some enhancements with Java 9).

Item 8 ("Avoid finalizers and cleaners") has been updated to discuss how to use the Java 7-introduced AutoCloseable interface to replace finalizers and cleaners in some of their most common usages. Item 49 ("Check parameters for validity") has been updated to reference the Objects.requireNonNull methods introduced with Java 7.
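For readers who have not used these Java 7 features together, here is a small sketch (not taken from the book) that combines AutoCloseable, try-with-resources, and Objects.requireNonNull. The TrackedResource class is a hypothetical stand-in for any resource that needs cleanup.

```java
import java.util.Objects;

class TrackedResource implements AutoCloseable
{
   private final String name;
   private boolean closed;

   TrackedResource(final String name)
   {
      // Fails fast with a NullPointerException if no name is supplied.
      this.name = Objects.requireNonNull(name, "name must not be null");
   }

   boolean isClosed()
   {
      return closed;
   }

   @Override
   public void close()
   {
      closed = true;
   }

   public static void main(final String[] arguments)
   {
      final TrackedResource resource = new TrackedResource("demo");
      try (TrackedResource inUse = resource)
      {
         System.out.println("Using " + inUse.name);
      }
      // close() was called automatically when the try block ended.
      System.out.println("Closed? " + resource.isClosed());
   }
}
```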

Item 80 ("Prefer executors, tasks, and streams to threads") has "streams" added to its title since the second edition of Effective Java and includes discussion regarding the addition of Fork/Join to the Executor framework in Java 7. Item 59 ("Know and Use the Libraries") discusses the ThreadLocalRandom that was introduced in Java 7.

Item 56 ("Write doc comments for all exposed API elements") discusses the -Xdoclint switch added to javadoc's command-line with JDK 7.

Java 8

Item 21 ("Design interfaces for posterity") covers best practices related to the use of default methods in Java interfaces. The entire Chapter 7 ("Lambdas and Streams") is, as its title describes, related to lambdas and streams introduced with Java 8 and consists of seven items (Item 42 through Item 48) on these functional programming concepts. Item 55 ("Return optionals judiciously") discusses proper use of Java 8-introduced Optional.

Item 1 ("Consider static factory methods instead of constructors") is not a new item in the third edition, but it now discusses static methods in interfaces as supported in Java 8 and enhanced in Java 9. Item 19 ("Design and document for inheritance or else prohibit it") is also not new, but now mentions the Javadoc @implSpec tag that was "added in Java 8 and used heavily in Java 9." Not surprisingly, Item 56 ("Write doc comments for all exposed API elements") also discusses @implSpec use.

Item 50 ("Make defensive copies when needed") does not focus much on it (dates and times are not the focus of that item), but does reference employing Instant instead of Date as of Java 8.

Java 9

The third edition of Effective Java provides less guidance than I anticipated related to modularity (Java Platform Module System), which is arguably the first thing many of us associate with Java 9. Item 15 ("Minimize the accessibility of classes and members") discusses the "two additional, implicit access levels introduced as part of the module system."

Item 8 ("Avoid finalizers and cleaners") was titled simply "Avoid finalizers" in the second edition. The addition of "and cleaners" to this item's title reflects that Java 9 deprecated the finalizer (for reasons I'm all too familiar with) and replaced it with the Cleaner class.

Item 56 ("Write doc comments for all exposed API elements") discusses use of the Java 9-introduced Javadoc tag {@index}. Item 59 ("Know and Use the Libraries") discusses the method transferTo(OutputStream) that was added to InputStream with Java 9 in its discussion of why it's important to know what's available in standard libraries.
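As a quick illustration of the InputStream.transferTo(OutputStream) method just mentioned (requires JDK 9 or later), the following sketch copies an in-memory stream in a single call. The class and method names are invented here, and the in-memory streams are just a convenient stand-in for files or sockets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class TransferToDemo
{
   /** Copies the text through transferTo and returns "byteCount:text". */
   static String copyThrough(final String text)
   {
      try (InputStream in = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
           ByteArrayOutputStream out = new ByteArrayOutputStream())
      {
         // One library call replaces the usual read/write buffer loop.
         final long transferred = in.transferTo(out);
         return transferred + ":" + new String(out.toByteArray(), StandardCharsets.UTF_8);
      }
      catch (IOException ioException)
      {
         throw new UncheckedIOException(ioException);
      }
   }

   public static void main(final String[] arguments)
   {
      System.out.println(copyThrough("ulps"));
   }
}
```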

Item 6 ("Avoid creating unnecessary objects") mentions the deprecation in Java 9 of the Boolean constructor that accepts a single String parameter as an illustration of a point being made in that item. Incidentally, the only other Boolean constructor [Boolean(boolean)] was also deprecated in Java 9.

Effective Java, Third Edition addresses refinements made in Java 9 to static methods in interfaces (Item 1) and to Optional (Item 55). Item 19 also references Java 9's heavy use of @implSpec. Each of these three items was highlighted in the "Java 8" section earlier in this post.

Version-independent New General Java Items

There are some new items in Effective Java, Third Edition that lack an obvious connection to a newer version of Java than that covered in the second edition. These include Item 5 ("Prefer dependency injection to hardwiring resources"), Item 25 ("Limit source files to a single top-level class"), and Item 85 ("Prefer alternatives to Java serialization"). I have written a bit more about Item 85 ("Prefer alternatives to Java serialization") in my blog post "Using Google's Protocol Buffers with Java."

A Removed Item and the Appendix

One item from the second edition of Effective Java appears to have been entirely removed. The useful "Appendix" of the third edition is titled "Items Corresponding to the Second Edition" and it indicates that the single-page Item 73 ("Avoid thread groups") from the second edition has been "retired" in the third edition. This Appendix is also structured such that it's easy to identify that the second edition's Item 21 ("Use function objects to represent strategies") is replaced in the third edition by Item 42 ("Prefer lambdas to anonymous classes"). Incidentally, there seem to be very few typos in this book in any of its editions, but one typo that does stand out in the third edition is for the row in the Appendix that correlates Item 69 from the second edition with Item 81 of the third edition.

Minor Text Updates

Several items in the third edition of Effective Java have had minor text updates, some of which have significant meaning in the change. These are the most difficult to call out, but I provide one example here. In the second edition, Bloch wrote in parenthetical passing that StringBuffer is "largely obsolete" compared to StringBuilder, but in the third edition this is more strongly worded to state that StringBuffer is the "obsolete predecessor" of StringBuilder. I agree wholeheartedly with that change in text.

Introduction

Eleven of the chapters in Effective Java, Third Edition encompass the 90 items constituting "Best Practices for the Java Platform." However, Chapter 1 ("Introduction") is also valuable because it associates "key features" from Java 7, Java 8, and Java 9 with the item or items that discuss those key features and with the release of Java that introduced them. I wish I had paid attention to it earlier, but I did not see this handy table on page 1 until after I was mostly finished composing this post. That table would have saved me a lot of time in identifying the items that cover Java 7, Java 8, and Java 9 new features!

The "Introduction" is also worth reading because it lays out the "few fundamental principles" from which "most of the rules in this book derive." I like that Bloch explicitly states in the Introduction, "This book is not for beginners: it assumes that you are already comfortable with Java." There are countless forums and threads online in which people ask for a good book for those new to Java. While I have highly recommended the various editions of Effective Java for intermediate and advanced Java developers, I've always felt that beginning Java developers are better off with a book written for learning Java and then should come to Effective Java when they know core concepts and want to know how to apply those concepts as clearly and simply as possible.

Conclusion

This post has highlighted some of the most significant additions and changes to Effective Java in the Third Edition. However, I only mentioned some of the quick references to Java 7, Java 8, and Java 9 and undoubtedly missed some new and changed text in my summary. The references to a few of the minor changes to items to reflect newer versions of Java have been intended to illustrate how new Java features are woven into multiple items that at first glance don't seem necessarily related to a newer version of Java.

Effective Java is the only book of which I've ever purchased three copies: I have now purchased one copy of each edition over the years and have not been sorry for doing so. The third edition of Effective Java not only covers new features of Java 7, Java 8, and Java 9, but also adds items and updates pre-existing items to reflect Josh Bloch's latest thinking on best practices using the Java programming language.

Tuesday, January 16, 2018

Using Google's Protocol Buffers with Java

Effective Java, Third Edition was recently released and I have been interested in identifying the updates to this classic Java development book, whose last edition only covered through Java 6. There are obviously completely new items in this edition that are closely related to Java 7, Java 8, and Java 9 such as Items 42 through 48 in Chapter 7 ("Lambdas and Streams"), Item 9 ("Prefer try-with-resources to try-finally"), and Item 55 ("Return optionals judiciously"). I was (very slightly) surprised to realize that the third edition of Effective Java had a new item not specifically driven by the new versions of Java, but that was instead driven by developments in the software development world independent of the versions of Java. That item, Item 85 ("Prefer alternatives to Java Serialization"), is what motivated me to write this introductory post on using Google's Protocol Buffers with Java.

In Item 85 of Effective Java, Third Edition, Josh Bloch emphasizes in bold text the following two assertions related to Java serialization:

  1. "The best way to avoid serialization exploits is to never deserialize anything."
  2. "There is no reason to use Java serialization in any new system you write."

After outlining the dangers of Java deserialization and making these bold statements, Bloch recommends that Java developers employ what he calls (to avoid confusion associated with the term "serialization" when discussing Java) "cross-platform structured-data representations." Bloch states that the leading offerings in this category are JSON (JavaScript Object Notation) and Protocol Buffers (protobuf). I found this mention of Protocol Buffers to be interesting because I've been reading about and playing with Protocol Buffers a bit lately. The use of JSON (even with Java) is exhaustively covered online. I suspect that awareness of Protocol Buffers is lower among Java developers than awareness of JSON, so a post on using Protocol Buffers with Java seems warranted.

Google's Protocol Buffers is described on its project page as "a language-neutral, platform-neutral extensible mechanism for serializing structured data." That page adds, "think XML, but smaller, faster, and simpler." Although one of the advantages of Protocol Buffers is that they support representing data in a way that can be used by multiple programming languages, the focus of this post is exclusively on using Protocol Buffers with Java.

There are several useful online resources related to Protocol Buffers including the main project page, the GitHub protobuf project page, the proto3 Language Guide (proto2 Language Guide is also available), the Protocol Buffer Basics: Java tutorial, the Java Generated Code Guide, the Java API (Javadoc) Documentation, the Protocol Buffers release page, and the Maven Repository page. The examples in this post are based on Protocol Buffers 3.5.1.

The Protocol Buffer Basics: Java tutorial outlines the process for using Protocol Buffers with Java. It covers a lot more possibilities and things to consider when using Java than I will cover here. The first step is to define the language-independent Protocol Buffers format. This is done in a text file with the .proto extension. For my example, I've described my protocol format in the file album.proto, which is shown in the next code listing.

album.proto

syntax = "proto3";

option java_outer_classname = "AlbumProtos";
option java_package = "dustin.examples.protobuf";

message Album
{
  string title = 1;
  repeated string artist = 2;
  int32 release_year = 3;
  repeated string song_title = 4;
}

Although the above definition of a protocol format is simple, there's a lot covered. The first line explicitly states that I'm using proto3 instead of the assumed default proto2 that is currently used when this is not explicitly specified. The two lines beginning with option are only of interest when using this protocol format to generate Java code and they indicate the name of the outermost class and the package of that outermost class that will be generated for use by Java applications to work with this protocol format.

The "message" keyword indicates that this structure, named "Album" here, is what needs to be represented. There are four fields in this construct with three of them being string format and one being an integer (int32). Two of the four fields can exist more than once in a given message because they are annotated with the repeated reserved word. Note that I created this definition without considering Java except for the two options that specify details of generation of Java classes from this format specification.
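To make the field types concrete, the following hand-written sketch shows how the four fields of the Album message might look as a plain Java class. This is not the code that protoc generates; AlbumSketch is a hypothetical class for illustration only, assuming the usual mapping of string to String, int32 to int, and repeated fields to List.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical plain-Java mirror of the Album message (not generated code). */
class AlbumSketch
{
   private final String title;              // string title = 1
   private final List<String> artists;      // repeated string artist = 2
   private final int releaseYear;           // int32 release_year = 3
   private final List<String> songTitles;   // repeated string song_title = 4

   AlbumSketch(final String title, final List<String> artists,
               final int releaseYear, final List<String> songTitles)
   {
      this.title = title;
      // Defensive copies so later changes to the caller's lists are not seen here.
      this.artists = Collections.unmodifiableList(new ArrayList<>(artists));
      this.releaseYear = releaseYear;
      this.songTitles = Collections.unmodifiableList(new ArrayList<>(songTitles));
   }

   String getTitle() { return title; }
   List<String> getArtists() { return artists; }
   int getReleaseYear() { return releaseYear; }
   List<String> getSongTitles() { return songTitles; }

   public static void main(final String[] arguments)
   {
      final List<String> artists = new ArrayList<>();
      artists.add("Artist One");
      final List<String> songs = new ArrayList<>();
      songs.add("Song One");
      songs.add("Song Two");

      final AlbumSketch album = new AlbumSketch("Example Album", artists, 2018, songs);
      System.out.println(album.getTitle() + " (" + album.getReleaseYear() + "): "
         + album.getArtists() + " " + album.getSongTitles());
   }
}
```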

The album.proto file shown above now needs to be "compiled" into the Java source file (AlbumProtos.java in the dustin.examples.protobuf package) that will allow for writing and reading the Protocol Buffers binary format that corresponds to the defined protocol format. This generation of Java source code is accomplished using the protoc compiler that is included in the appropriate operating system-specific archive file. In my case, because I'm running this example on Windows 10, I downloaded and unzipped protoc-3.5.1-win32.zip to get access to the protoc tool. I ran protoc against album.proto with the command protoc --proto_path=src --java_out=dist\generated album.proto.

For running the above, I had my album.proto file in the src directory pointed to by --proto_path, and I had already created an (initially empty) directory called dist\generated to receive the generated Java source code, as specified by the --java_out flag.

The generated class's Java source code file AlbumProtos.java in the specified package has more than 1000 lines and I won't list that generated class source code here, but it's available on GitHub. Among the several interesting things to note about this generated code is the lack of import statements (fully qualified package names are used instead for all class references). More details regarding the Java source code generated by protoc are available in the Java Generated Code guide. It's important to note that this generated class AlbumProtos has still not been influenced by any of my own Java application code and is solely generated from the album.proto text file shown earlier in the post.

With the generated Java source code available for AlbumProtos, I now add the directory in which this class was generated to my IDE's source path because I'm treating it as a source code file now. I could have alternatively compiled it into a .class or .jar to use as a library. With this generated Java source code file now in my source path, I can build it alongside my own code.

Before going further in this example, we need a simple Java class to represent with Protocol Buffers. For this, I'll use the class Album that is defined in the next code listing (also available on GitHub).

Album.java

package dustin.examples.protobuf;

import java.util.ArrayList;
import java.util.List;

/**
 * Music album.
 */
public class Album
{
   private final String title;

   private final List<String> artists;

   private final int releaseYear;

   private final List<String> songsTitles;

   private Album(final String newTitle, final List<String> newArtists,
                 final int newYear, final List<String> newSongsTitles)
   {
      title = newTitle;
      artists = newArtists;
      releaseYear = newYear;
      songsTitles = newSongsTitles;
   }

   public String getTitle()
   {
      return title;
   }

   public List<String> getArtists()
   {
      return artists;
   }

   public int getReleaseYear()
   {
      return releaseYear;
   }

   public List<String> getSongsTitles()
   {
      return songsTitles;
   }

   @Override
   public String toString()
   {
      return "'" + title + "' (" + releaseYear + ") by " + artists + " features songs " + songsTitles;
   }

   /**
    * Builder class for instantiating an instance of
    * enclosing Album class.
    */
   public static class Builder
   {
      private String title;
      private ArrayList<String> artists = new ArrayList<>();
      private int releaseYear;
      private ArrayList<String> songsTitles = new ArrayList<>();

      public Builder(final String newTitle, final int newReleaseYear)
      {
         title = newTitle;
         releaseYear = newReleaseYear;
      }

      public Builder songTitle(final String newSongTitle)
      {
         songsTitles.add(newSongTitle);
         return this;
      }

      public Builder songsTitles(final List<String> newSongsTitles)
      {
         songsTitles.addAll(newSongsTitles);
         return this;
      }

      public Builder artist(final String newArtist)
      {
         artists.add(newArtist);
         return this;
      }

      public Builder artists(final List<String> newArtists)
      {
         artists.addAll(newArtists);
         return this;
      }

      public Album build()
      {
         return new Album(title, artists, releaseYear, songsTitles);
      }
   }
}

With a Java "data" class defined (Album) and with a Protocol Buffers-generated Java class available for representing this album (AlbumProtos.java), I'm ready to write Java application code to "serialize" the album information without using Java serialization. This application (demonstration) code resides in the AlbumDemo class, which is available on GitHub and from which I'll highlight relevant portions in this post.

We need to generate a sample instance of Album to use in this example and this is accomplished with the next hard-coded listing.

Generating Sample Instance of Album

/**
 * Generates instance of Album to be used in demonstration.
 *
 * @return Instance of Album to be used in demonstration.
 */
public Album generateAlbum()
{
   return new Album.Builder("Songs from the Big Chair", 1985)
      .artist("Tears For Fears")
      .songTitle("Shout")
      .songTitle("The Working Hour")
      .songTitle("Everybody Wants to Rule the World")
      .songTitle("Mothers Talk")
      .songTitle("I Believe")
      .songTitle("Broken")
      .songTitle("Head Over Heels")
      .songTitle("Listen")
      .build();
}

The Protocol Buffers generated class AlbumProtos includes a nested AlbumProtos.Album class that I'll be using to store the contents of my Album instance in binary form. The next code listing demonstrates how this is done.

Instantiating AlbumProtos.Album from Album

final Album album = instance.generateAlbum();
final AlbumProtos.Album albumMessage
   = AlbumProtos.Album.newBuilder()
      .setTitle(album.getTitle())
      .addAllArtist(album.getArtists())
      .setReleaseYear(album.getReleaseYear())
      .addAllSongTitle(album.getSongsTitles())
      .build();

As the previous code listing demonstrates, a "builder" is used to populate the immutable instance of the class generated by Protocol Buffers. With a reference to this instance, I can now easily write the contents of the instance out in Protocol Buffers's binary form using the method toByteArray() on that instance as shown in the next code listing.

Writing Binary Form of AlbumProtos.Album

final byte[] binaryAlbum = albumMessage.toByteArray();

Reading a byte[] array back into an instance of Album can be accomplished as shown in the next code listing.

Instantiating Album from Binary Form of AlbumProtos.Album

/**
 * Generates an instance of Album based on the provided
 * bytes array.
 *
 * @param binaryAlbum Bytes array that should represent an
 *    AlbumProtos.Album based on Google Protocol Buffers
 *    binary format.
 * @return Instance of Album based on the provided binary form
 *    of an Album; may be {@code null} if an error is encountered
 *    while trying to process the provided binary data.
 */
public Album instantiateAlbumFromBinary(final byte[] binaryAlbum)
{
   Album album = null;
   try
   {
      final AlbumProtos.Album copiedAlbumProtos = AlbumProtos.Album.parseFrom(binaryAlbum);
      final List<String> copiedArtists = copiedAlbumProtos.getArtistList();
      final List<String> copiedSongsTitles = copiedAlbumProtos.getSongTitleList();
      album = new Album.Builder(
         copiedAlbumProtos.getTitle(), copiedAlbumProtos.getReleaseYear())
         .artists(copiedArtists)
         .songsTitles(copiedSongsTitles)
         .build();
   }
   catch (InvalidProtocolBufferException ipbe)
   {
      out.println("ERROR: Unable to instantiate AlbumProtos.Album instance from provided binary data - "
         + ipbe);
   }
   return album;
}

As indicated in the last code listing, a checked exception InvalidProtocolBufferException can be thrown during the invocation of the static method parseFrom(byte[]) defined in the generated class. Obtaining a "deserialized" instance of the generated class is essentially a single line and the rest of the lines are getting data out of the instantiation of the generated class and setting that data in the original Album class's instance.

The demonstration class includes two lines that print out the contents of the original Album instance and the instance ultimately retrieved from the binary representation. These two lines include invocations of System.identityHashCode() on the two instances to prove that they are not the same instance even though their contents match. When this code is executed with the hard-coded Album instance details shown earlier, the output looks like this:

BEFORE Album (1323165413): 'Songs from the Big Chair' (1985) by [Tears For Fears] features songs [Shout, The Working Hour, Everybody Wants to Rule the World, Mothers Talk, I Believe, Broken, Head Over Heels, Listen]
 AFTER Album (1880587981): 'Songs from the Big Chair' (1985) by [Tears For Fears] features songs [Shout, The Working Hour, Everybody Wants to Rule the World, Mothers Talk, I Believe, Broken, Head Over Heels, Listen]

From this output, we see that the relevant fields are the same in both instances and that the two instances truly are distinct objects. This is a bit more work than using Java's "nearly automatic" serialization mechanism of implementing the Serializable interface, but there are important advantages associated with this approach that can justify the cost. In Effective Java, Third Edition, Josh Bloch discusses the security vulnerabilities associated with deserialization in Java's default mechanism and asserts that "There is no reason to use Java serialization in any new system you write."
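The identity check described above can be sketched without any Protocol Buffers dependency. The class and method names here (IdentityCheckSketch, describe, isSameInstance) are hypothetical and are not taken from AlbumDemo, and two String copies stand in for the two Album instances. Note that differing identity hash codes strongly suggest distinct objects, but a reference comparison with == is the definitive test.

```java
/** Sketch only: showing that two objects with equal contents are separate instances. */
class IdentityCheckSketch
{
   /** Formats a label, the identity hash code, and the instance's toString(). */
   static String describe(final String label, final Object instance)
   {
      return label + " (" + System.identityHashCode(instance) + "): " + instance;
   }

   /** Returns true only when both references point at the very same object. */
   static boolean isSameInstance(final Object first, final Object second)
   {
      return first == second;
   }

   public static void main(final String[] arguments)
   {
      final String before = new String("Songs from the Big Chair");
      final String after = new String(before); // equal contents, separate object
      System.out.println(describe("BEFORE", before));
      System.out.println(describe(" AFTER", after));
      System.out.println("Same instance? " + isSameInstance(before, after));
   }
}
```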

Monday, January 15, 2018

Easy Fine-Grained Sorting with JDK 8

Java 8's introduction of streams and useful static/default methods on the Comparator interface makes it easy to compare two objects based on individual fields' values without needing to implement Comparable's compareTo(T) on the class whose objects are being compared or to hand-write a Comparator's compare(T,T) method.

I'm going to use a simple Song class to help demonstrate this and its Song.java code listing is shown next.

Song.java

package dustin.examples.jdk8;

/**
 * Simple class encapsulating details related to a song
 * and intended to be used for demonstration of JDK 8.
 */
public class Song
{
   /** Song title. */
   private final String title;

   /** Album on which song was originally included. */
   private final String album;

   /** Song's artist. */
   private final String artist;

   /** Year song was released. */
   private final int year;

   /**
    * Constructor accepting this instance's title, album, artist, and release year.
    *
    * @param newTitle Title of song.
    * @param newAlbum Album on which song was originally included.
    * @param newArtist Artist behind this song.
    * @param newYear Year song was released.
    */
   public Song(final String newTitle, final String newAlbum,
               final String newArtist, final int newYear)
   {
      title = newTitle;
      album = newAlbum;
      artist = newArtist;
      year = newYear;
   }

   public String getTitle()
   {
      return title;
   }

   public String getAlbum()
   {
      return album;
   }

   public String getArtist()
   {
      return artist;
   }

   public int getYear()
   {
      return year;
   }

   @Override
   public String toString()
   {
      return "'" + title + "' (" + year + ") from '" + album + "' by " + artist;
   }
}

The Song class whose listing was just shown does not implement Comparable (it has no compareTo method), but we can still compare instances of this class in JDK 8 very easily. Based on the class definition of Song just shown, the following code can be used to sort a List of song instances based, in order, on year released, artist, and finally album.

Sorting List of Songs by Year, Artist, and Album (in that order)

/**
 * Returns a sorted version of the provided List of Songs that is
 * sorted first by year of song's release, then sorted by artist,
 * and then sorted by album.
 *
 * @param songsToSort Songs to be sorted.
 * @return Songs sorted, in this order, by year, artist, and album.
 */
private static List<Song> sortedSongsByYearArtistAlbum(
   final List<Song> songsToSort)
{
   return songsToSort.stream()
      .sorted(
         Comparator.comparingInt(Song::getYear)
                   .thenComparing(Song::getArtist)
                   .thenComparing(Song::getAlbum))
      .collect(Collectors.toList());
}

The above code listing would have been slightly less verbose had I statically imported the Comparator and the Collectors, but it's still fairly concise to include those interface and class names in the listing and probably more useful for an introductory blog post on this subject.
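For comparison, here is a rough sketch of what the statically imported variant mentioned above might look like. The Track class is a hypothetical pared-down stand-in for Song, and StaticImportSortSketch is likewise a name made up for this illustration.

```java
import static java.util.Comparator.comparingInt;
import static java.util.stream.Collectors.toList;

import java.util.Arrays;
import java.util.List;

/** Sketch only: the year/artist/album sort written with static imports. */
class StaticImportSortSketch
{
   /** Hypothetical pared-down stand-in for the Song class. */
   static class Track
   {
      private final String artist;
      private final String album;
      private final int year;

      Track(final String newArtist, final String newAlbum, final int newYear)
      {
         artist = newArtist;
         album = newAlbum;
         year = newYear;
      }

      String getArtist() { return artist; }
      String getAlbum() { return album; }
      int getYear() { return year; }
   }

   /** Sorts by year, then artist, then album, without naming Comparator or Collectors. */
   static List<Track> sortByYearArtistAlbum(final List<Track> tracks)
   {
      return tracks.stream()
         .sorted(comparingInt(Track::getYear)
            .thenComparing(Track::getArtist)
            .thenComparing(Track::getAlbum))
         .collect(toList());
   }

   public static void main(final String[] arguments)
   {
      final List<Track> tracks = Arrays.asList(
         new Track("Tears for Fears", "Songs from the Big Chair", 1985),
         new Track("Dire Straits", "Brothers in Arms", 1985),
         new Track("Def Leppard", "Pyromania", 1983));
      sortByYearArtistAlbum(tracks).forEach(
         track -> System.out.println(track.getYear() + " " + track.getArtist()));
   }
}
```

The trade-off is a matter of taste: the static imports shorten the call chain, but a reader has to check the imports to see where comparingInt and toList come from.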

In the above code listing, the static method Comparator.comparingInt and the default method Comparator.thenComparing are used to sort the stream of Song instances associated with the underlying List by year, then by artist, and finally by album. The code is highly readable and allows for comparison of objects (and resulting sorting of those instances) based on arbitrary individual accessor methods without an explicitly specified Comparator for each key (the natural ordering of each accessor's result is used). Note that if an explicit Comparator is desired, it can be provided via the overloaded versions of Comparator.comparing and Comparator.thenComparing that accept a key Comparator.
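The overloads that accept an explicit Comparator can be sketched as follows. This is only an illustration under assumptions: Track is a hypothetical pared-down stand-in for Song, and the chosen ordering (artist ignoring case, then newest year first) is made up for the example rather than taken from the demonstration class.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Sketch only: passing explicit Comparators to comparing/thenComparing. */
class ExplicitComparatorSketch
{
   /** Hypothetical pared-down stand-in for the Song class. */
   static class Track
   {
      private final String artist;
      private final int year;

      Track(final String newArtist, final int newYear)
      {
         artist = newArtist;
         year = newYear;
      }

      String getArtist() { return artist; }
      int getYear() { return year; }

      @Override
      public String toString() { return artist + " (" + year + ")"; }
   }

   /** Sorts by artist ignoring case, then by year with the newest first. */
   static List<Track> sortByArtistIgnoringCaseThenNewestFirst(final List<Track> tracks)
   {
      return tracks.stream()
         .sorted(
            // Each key extractor is paired with the Comparator to apply to that key.
            Comparator.comparing(Track::getArtist, String.CASE_INSENSITIVE_ORDER)
                      .thenComparing(Track::getYear, Comparator.reverseOrder()))
         .collect(Collectors.toList());
   }

   public static void main(final String[] arguments)
   {
      final List<Track> tracks = Arrays.asList(
         new Track("tears for fears", 1984),
         new Track("Def Leppard", 1983),
         new Track("Tears for Fears", 1985),
         new Track("Def Leppard", 1987));
      sortByArtistIgnoringCaseThenNewestFirst(tracks).forEach(System.out::println);
   }
}
```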

The next code listing is the entire demonstration class. It includes the method just shown and also shows the contrived example constructed of an unsorted List of songs.

FineGrainSortingDemo.java

package dustin.examples.jdk8;

import static java.lang.System.out;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Demonstration of easy fine-grained sorting in JDK 8 via
 * stream support for sorting and Comparator's static and
 * default method implementations.
 */
public class FineGrainSortingDemo
{
   /**
    * Construct List of {@code Song}s.
    * 
    * @return Instances of {@code Song}.
    */
   private static List<Song> generateSongs()
   {
      final ArrayList<Song> songs = new ArrayList<>();
      songs.add(
         new Song(
            "Photograph",
            "Pyromania",
            "Def Leppard",
            1983));
      songs.add(
         new Song(
            "Hysteria",
            "Hysteria",
            "Def Leppard",
            1987));
      songs.add(
         new Song(
            "Shout",
            "Songs from the Big Chair",
            "Tears for Fears",
            1984));
      songs.add(
         new Song(
            "Everybody Wants to Rule the World",
            "Songs from the Big Chair",
            "Tears for Fears",
            1985));
      songs.add(
         new Song(
            "Head Over Heels",
            "Songs from the Big Chair",
            "Tears for Fears",
            1985
         ));
      songs.add(
         new Song(
            "Enter Sandman",
            "Metallica",
            "Metallica",
            1991
         )
      );
      songs.add(
         new Song(
            "Money for Nothing",
            "Brothers in Arms",
            "Dire Straits",
            1985
         )
      );
      songs.add(
         new Song(
            "Don't You (Forget About Me)",
            "A Brass Band in African Chimes",
            "Simple Minds",
            1985
         )
      );
      return songs;
   }

   /**
    * Returns a sorted version of the provided List of Songs that is
    * sorted first by year of song's release, then sorted by artist,
    * and then sorted by album.
    *
    * @param songsToSort Songs to be sorted.
    * @return Songs sorted, in this order, by year, artist, and album.
    */
   private static List<Song> sortedSongsByYearArtistAlbum(
      final List<Song> songsToSort)
   {
      return songsToSort.stream()
         .sorted(
            Comparator.comparingInt(Song::getYear)
                      .thenComparing(Song::getArtist)
                      .thenComparing(Song::getAlbum))
         .collect(Collectors.toList());
   }

   /**
    * Demonstrate fine-grained sorting in JDK 8.
    *
    * @param arguments Command-line arguments; none expected.
    */
   public static void main(final String[] arguments)
   {
      final List<Song> songs = generateSongs();
      final List<Song> sortedSongs = sortedSongsByYearArtistAlbum(songs);
      out.println("Original Songs:");
      songs.stream().forEach(song -> out.println("\t" + song));
      out.println("Sorted Songs");
      sortedSongs.forEach(song -> out.println("\t" + song));
   }
}

The output from running the above code is shown next and lists the newly ordered Songs after using the sorting code. It's worth noting that this stream.sorted() operation does not change the original List (it acts upon the stream rather than upon the List).
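The point about the original List being left alone can be sketched in isolation. This minimal example uses a List of Integer years rather than Song instances, and the class and method names are hypothetical. It contrasts Stream.sorted(), which produces a new sorted result, with List.sort(Comparator), which reorders the list it is called on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Sketch only: Stream.sorted() versus in-place List.sort(Comparator). */
class SortedVersusSortSketch
{
   /** Returns a new sorted List; the provided List keeps its original order. */
   static List<Integer> sortedCopy(final List<Integer> years)
   {
      return years.stream().sorted().collect(Collectors.toList());
   }

   /** Reorders the provided (mutable) List itself. */
   static void sortInPlace(final List<Integer> years)
   {
      years.sort(Comparator.naturalOrder());
   }

   public static void main(final String[] arguments)
   {
      final List<Integer> years = new ArrayList<>(Arrays.asList(1987, 1983, 1985));
      System.out.println("Sorted copy: " + sortedCopy(years));
      System.out.println("Original after stream sort: " + years);
      sortInPlace(years);
      System.out.println("Original after List.sort: " + years);
   }
}
```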

Original Songs:
 'Photograph' (1983) from 'Pyromania' by Def Leppard
 'Hysteria' (1987) from 'Hysteria' by Def Leppard
 'Shout' (1984) from 'Songs from the Big Chair' by Tears for Fears
 'Everybody Wants to Rule the World' (1985) from 'Songs from the Big Chair' by Tears for Fears
 'Head Over Heels' (1985) from 'Songs from the Big Chair' by Tears for Fears
 'Enter Sandman' (1991) from 'Metallica' by Metallica
 'Money for Nothing' (1985) from 'Brothers in Arms' by Dire Straits
 'Don't You (Forget About Me)' (1985) from 'A Brass Band in African Chimes' by Simple Minds
Sorted Songs
 'Photograph' (1983) from 'Pyromania' by Def Leppard
 'Shout' (1984) from 'Songs from the Big Chair' by Tears for Fears
 'Money for Nothing' (1985) from 'Brothers in Arms' by Dire Straits
 'Don't You (Forget About Me)' (1985) from 'A Brass Band in African Chimes' by Simple Minds
 'Everybody Wants to Rule the World' (1985) from 'Songs from the Big Chair' by Tears for Fears
 'Head Over Heels' (1985) from 'Songs from the Big Chair' by Tears for Fears
 'Hysteria' (1987) from 'Hysteria' by Def Leppard
 'Enter Sandman' (1991) from 'Metallica' by Metallica

JDK 8's introduction of streams and of default and static methods in interfaces (particularly on Comparator in this case) makes it easy to compare two objects field-by-field in a desired order without writing any explicit Comparator beyond the pre-built static and default methods on the Comparator interface, provided the fields being compared have a suitable natural order.

Thursday, January 11, 2018

Converting Collections to Maps with JDK 8

I have run into the situation several times where it is desirable to store multiple objects in a Map instead of a Set or List because there are advantages to mapping uniquely identifying information to the objects. Java 8 has made this translation easier than ever with streams and the Collectors.toMap(...) methods.

One situation in which it has been useful to use a Map instead of a Set is when working with objects that lack or have sketchy equals(Object) or hashCode() implementations, but do have a field that uniquely identifies the objects. In those cases, if I cannot add or fix the objects' underlying implementations, I can gain better uniqueness guarantees by using a Map of the uniquely identifying field of the class (key) to the class's instantiated object (value). Perhaps a more frequent scenario in which I prefer Map to List or Set is when I need to look up items in the collection by a specific uniquely identifying field. A map lookup on a uniquely identifying key is speedy and often much faster than iterating and comparing each object via the equals(Object) method.
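The uniqueness problem can be sketched with a small, hypothetical Item class that, like the classes described above, does not override equals(Object) or hashCode(). A HashSet treats two Item objects carrying the same identifier as different elements, while a Map keyed on the identifier keeps a single entry.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch only: Set membership versus Map keyed on a unique identifier. */
class IdentityVersusKeySketch
{
   /** Hypothetical class that does not override equals(Object) or hashCode(). */
   static class Item
   {
      private final String id;

      Item(final String newId) { id = newId; }

      String getId() { return id; }
   }

   /** Number of elements a HashSet holds after adding both items. */
   static int sizeAsSet(final Item first, final Item second)
   {
      final Set<Item> items = new HashSet<>();
      items.add(first);
      items.add(second);
      return items.size();
   }

   /** Number of entries a HashMap keyed on the identifier holds after putting both items. */
   static int sizeAsMapById(final Item first, final Item second)
   {
      final Map<String, Item> itemsById = new HashMap<>();
      itemsById.put(first.getId(), first);
      itemsById.put(second.getId(), second);
      return itemsById.size();
   }

   public static void main(final String[] arguments)
   {
      final Item first = new Item("978-0-13-468599-1");
      final Item second = new Item("978-0-13-468599-1");
      System.out.println("Set size: " + sizeAsSet(first, second));
      System.out.println("Map size: " + sizeAsMapById(first, second));
   }
}
```

In the Map version, the second put(...) replaces the value stored for the shared key, so which instance "wins" is a decision the calling code still has to make.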

With JDK 8, it's easier than ever to construct a Map from an existing List or Set. To help demonstrate this, a simple Book class will be used. This Book class does not override equals(Object) or hashCode() from the Object class and so is not an appropriate class to use in a Set or as a Map key. However, its getIsbn() method returns an International Standard Book Number that is assumed unique for purposes of this demonstration.

Book.java

package dustin.examples.jdk8;

/**
 * Represents a book, but does not override {@code equals(Object)}
 * or {@code hashCode()}.
 */
public class Book
{
   /** International Standard Book Number (ISBN-13). */
   final String isbn;

   /** Title of book. */
   final String title;

   /** Edition of book. */
   final int edition;

   /**
    * Constructor.
    *
    * @param newIsbn International Standard Book Number (ISBN-13).
    * @param newTitle Title.
    * @param newEdition Edition.
    */
   public Book(final String newIsbn, final String newTitle, final int newEdition)
   {
      isbn = newIsbn;
      title = newTitle;
      edition = newEdition;
   }

   /**
    * Provide ISBN-13 identifier associated with this book.
    *
    * @return ISBN-13 identifier.
    */
   public String getIsbn()
   {
      return isbn;
   }

   /**
    * Provide title of this book.
    *
    * @return Book's title.
    */
   public String getTitle()
   {
      return title;
   }

   /**
    * Provide edition of this book.
    *
    * @return Book's edition.
    */
   public int getEdition()
   {
      return edition;
   }

   @Override
   public String toString()
   {
      return title + " (Edition " + edition + ") - ISBN-13: " + isbn;
   }
}

With this class in place, the demonstration class CollectionToMapDemo shows how easy it is with JDK 8 to convert various Java collection types (Set, List, and even arrays) to a Map.

CollectionToMapDemo.java

package dustin.examples.jdk8;

import static java.lang.System.out;

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Demonstrates conversion of Java collections to Java Maps.
 */
public class CollectionToMapDemo
{
   /**
    * Multiple instances of Book, a class that lacks a proper
    * equals(Object) method, but for which its getIsbn() method
    * is assumed to return a unique identifier for each instance.
    */
   private static final Book[] books;

   static
   {
      books = new Book[]
         {
            new Book("978-0-201-31005-4", "Effective Java", 1),
            new Book("978-0-321-35668-0", "Effective Java", 2),
            new Book("978-0-13-468599-1", "Effective Java", 3)
         };
   }

   /**
    * Convert provided array of Book instances to Map of each Book's ISBN to
    * that instance of the Book.
    * 
    * @param booksArray Array of Book instances.
    * @return Map of each book's ISBN (key) to the book's full instance (value).
    */
   private static Map<String, Book> convertArrayToMap(final Book[] booksArray)
   {
      return Arrays.stream(booksArray).collect(
         Collectors.toMap(Book::getIsbn, book -> book));
   }

   /**
    * Convert provided List of Book instances to Map of each Book's ISBN to
    * that instance of the Book.
    *
    * @param booksList List of Book instances.
    * @return Map of each book's ISBN (key) to the book's full instance (value).
    */
   private static Map<String, Book> convertListToMap(final List<Book> booksList)
   {
      return booksList.stream().collect(
         Collectors.toMap(Book::getIsbn, book -> book));
   }

   /**
    * Convert provided Set of Book instances to Map of each Book's ISBN to
    * that instance of the Book.
    *
    * @param booksSet Set of Book instances.
    * @return Map of each book's ISBN (key) to the book's full instance (value).
    */
   private static Map<String, Book> convertSetToMap(final Set<Book> booksSet)
   {
      return booksSet.stream().collect(
         Collectors.toMap(Book::getIsbn, book -> book));
   }

   public static void main(final String[] arguments)
   {
      out.println("ARRAY->MAP:\n" + convertArrayToMap(books));

      final List<Book> booksList = Arrays.asList(books);
      out.println("LIST->MAP:\n" + convertListToMap(booksList));

      final Set<Book> booksSet
         = new HashSet<>(Arrays.stream(books).collect(Collectors.toSet()));
      out.println("SET->MAP:\n" + convertSetToMap(booksSet));
   }
}

The most important methods in the class listing just shown are convertArrayToMap(Book[]), convertListToMap(List<Book>), and convertSetToMap(Set<Book>). All three implementations are the same once a stream based on the underlying Set, List, or array has been accessed. In all three cases, it's merely a matter of using one of the stream's collect() methods (a reduction operation that is typically preferable to sequential iteration) and passing it an implementation of the Collector interface obtained from one of the predefined toMap() methods of the Collectors class.
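A few variations on the toMap() call are worth sketching. The two-argument form used above throws an IllegalStateException if two elements produce the same key, Function.identity() can stand in for the book -> book lambda, and a four-argument overload accepts a merge function plus a Map supplier (here LinkedHashMap, to keep encounter order). The example below is only an illustration: it keys plain strings by their first letter, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Sketch only: variations on Collectors.toMap(...). */
class ToMapVariationsSketch
{
   /** Keys each word by its first letter; duplicate keys cause IllegalStateException. */
   static Map<Character, String> byFirstLetterStrict(final List<String> words)
   {
      return words.stream().collect(
         Collectors.toMap(word -> word.charAt(0), Function.identity()));
   }

   /** Keys each word by its first letter, keeping the first word seen for each key. */
   static Map<Character, String> byFirstLetterKeepingFirst(final List<String> words)
   {
      return words.stream().collect(
         Collectors.toMap(
            word -> word.charAt(0),
            Function.identity(),
            (existing, replacement) -> existing, // merge function for duplicate keys
            LinkedHashMap::new));                // preserves encounter order
   }

   public static void main(final String[] arguments)
   {
      final List<String> words = Arrays.asList("beta", "alpha", "avocado");
      System.out.println(byFirstLetterKeepingFirst(words));
      try
      {
         byFirstLetterStrict(words);
      }
      catch (IllegalStateException duplicateKey)
      {
         System.out.println("Duplicate key rejected: " + duplicateKey.getMessage());
      }
   }
}
```

For the Book example, the strict two-argument form is arguably the right default, because a duplicate ISBN would signal bad data rather than something to merge silently.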

The output from running this demonstration class against the instances of Book is shown next:

ARRAY->MAP:
{978-0-201-31005-4=Effective Java (Edition 1) - ISBN-13: 978-0-201-31005-4, 978-0-321-35668-0=Effective Java (Edition 2) - ISBN-13: 978-0-321-35668-0, 978-0-13-468599-1=Effective Java (Edition 3) - ISBN-13: 978-0-13-468599-1}
LIST->MAP:
{978-0-201-31005-4=Effective Java (Edition 1) - ISBN-13: 978-0-201-31005-4, 978-0-321-35668-0=Effective Java (Edition 2) - ISBN-13: 978-0-321-35668-0, 978-0-13-468599-1=Effective Java (Edition 3) - ISBN-13: 978-0-13-468599-1}
SET->MAP:
{978-0-201-31005-4=Effective Java (Edition 1) - ISBN-13: 978-0-201-31005-4, 978-0-321-35668-0=Effective Java (Edition 2) - ISBN-13: 978-0-321-35668-0, 978-0-13-468599-1=Effective Java (Edition 3) - ISBN-13: 978-0-13-468599-1}

I have run into several situations in which it has been advantageous to have multiple objects in a Map of unique identifier to full instance of those objects, but have been given the objects in a Set, List, or array. Although it's never been particularly difficult to convert these Sets, Lists, and arrays to Maps in Java, it's easier than ever in Java 8 to make this conversion.