Implementing another CV paper

This project has a less interesting story. At SIGGRAPH 2019, I could have sworn I saw a poster on automatically laying out websites, or something along those lines. I originally wanted to contact the authors, but for the life of me, I couldn't find what I remembered in the conference proceedings. Instead, I found another poster from SIGGRAPH Asia.

This paper, “Balance-Based Photo Posting”, described an algorithm for aesthetically arranging images into grids in social network posts (fittingly, the work was coauthored by a researcher at Weibo). On Twitter, for example, if I post more than one image, Twitter will arrange those images in a grid. By using the process the authors outlined, those automatically-created grids could look a lot nicer.

The code they described really had two components: scoring the entire composed image with a color-balance formula, and scoring adjacent images based on content similarity. The former was done with a formula called DCM, or Deviation from the Center of Mass, and was pretty easy to implement with NumPy.
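The balance term can be sketched in a few lines of NumPy. This is my reading of the idea rather than the paper's exact formula: take the intensity-weighted center of mass of the image and measure how far it sits from the geometric center, normalized by the image diagonal.

```python
import numpy as np

def dcm(image):
    """Deviation from the Center of Mass: distance between the
    intensity-weighted center of mass and the geometric center,
    normalized to the image diagonal. A sketch of the balance term;
    the paper's exact weighting may differ."""
    gray = image.mean(axis=2) if image.ndim == 3 else image
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    total = gray.sum()
    cy = (ys * gray).sum() / total  # row of the center of mass
    cx = (xs * gray).sum() / total  # column of the center of mass
    center = np.array([(h - 1) / 2, (w - 1) / 2])
    return np.linalg.norm(np.array([cy, cx]) - center) / np.hypot(h, w)
```

A perfectly uniform image scores 0; the more the bright mass is pushed toward an edge, the higher the score.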

The latter relied on a prebuilt, pre-trained neural network to reduce the content of an image into a vector; then the distance between two adjacent images' vectors was compared. The authors used VGG-16 trained on ImageNet, but after looking it up, the job of the NN was fairly general-purpose, and a lot of other work had been done in the area. I picked another NN developed by Google Research and trained on ImageNet, MobileNet, because it was optimized for devices with low storage and memory, and I figured this might be useful in getting the network to run on-device instead of off on a server. There was a slight trade-off in prediction accuracy, but because we're dealing with aesthetics, and not human lives or otherwise, we could make do with a little slop in the results. A few lines of code later, and I had the model from TF Hub.
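The similarity side reduces to comparing feature vectors. Assuming each image has already been run through MobileNet (in the notebook this comes from TensorFlow Hub, roughly `hub.KerasLayer("https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/4")`), a distance between two vectors might look like this; cosine distance is my choice here, and the paper may use a different metric:

```python
import numpy as np

def content_distance(a, b):
    """Cosine distance between two image feature vectors.
    Adjacent images with similar content score close to 0;
    dissimilar content scores close to 1 (or above, for
    vectors pointing in opposite directions)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
```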

For a trivial example, I picked four photos from my last project, cropped them randomly into squares, and used OpenCV to make a grid. The last part of the paper described a genetic algorithm to find the arrangement with the lowest score (determined by another formula), but I got a little lazy and just ran 20 or so iterations with point mutations alone (swapping two images at random). Crossover might be useful with more complex grids.
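The lazy version of that search is just greedy hill climbing with random swaps. A minimal sketch, where `score` is a stand-in for whatever combined balance/similarity formula you score a grid ordering with (lower is better):

```python
import random

def arrange(images, score, iters=20, seed=0):
    """Greedy random-swap search: each iteration, swap two images at
    random and keep the swap only if the score improves. A sketch of
    the 'point mutations only' shortcut, not the paper's full GA."""
    rng = random.Random(seed)
    order = list(range(len(images)))
    best = score(order)
    for _ in range(iters):
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        s = score(order)
        if s < best:
            best = s                                # keep the improvement
        else:
            order[i], order[j] = order[j], order[i]  # revert the swap
    return [images[k] for k in order]
```

With only four images there are just 24 orderings, so even this crude search tends to land on a good one quickly; a real GA with crossover only starts to pay off on larger grids.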

I liked the output a lot, so this algorithm might actually work well in real life. Combine it with my previous work to find the best crop of images, and it could be an easy way to create aesthetic, content-aware image collages. A quick search in the App Store shows that there are more than a few popular apps that allow users to make photo collages with different layouts and share them online, but how many of them use machine learning?

The Jupyter notebook I made can be found on GitHub: quin2/autoArrange, an implementation of Balance-Based Photo Posting (Song et al.).