I’ve always been fascinated by photo mosaics. I suppose it goes back to the first time I saw one, on a family vacation to Disneyland. Throughout the park they had photo mosaics of the Disney characters, built from the pictures that cast members took of families on vacation.
Fast forward a few years (okay…maybe a decade and a half or thereabouts). I’d dabbled a bit with photography and had built up a library of images about 40 or 50k deep. The memory of the photo mosaics jumped to my mind. I had experience manipulating images programmatically with PHP and GD, so I decided to load up each image, crop and resize it, compute a single value for each color channel (R, G, B), and store the result in a database. I read a couple of articles about photo mosaics and it seemed like everyone suggested computing the root-mean-square (RMS) value of each color channel rather than the average, so that’s what I did.
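To make the RMS idea concrete, here is a minimal sketch in Python (not my original PHP/GD code). It assumes the pixels have already been read into a list of (r, g, b) tuples with values from 0 to 255; the function name `rms_channels` is made up for this example. Because each value is squared before averaging, bright pixels pull the result up more than they would in a plain mean.

```python
import math

def rms_channels(pixels):
    """Return the (R, G, B) root-mean-square values for an iterable of (r, g, b) tuples."""
    sums = [0, 0, 0]  # running sum of squared values per channel
    count = 0
    for r, g, b in pixels:
        sums[0] += r * r
        sums[1] += g * g
        sums[2] += b * b
        count += 1
    if count == 0:
        raise ValueError("no pixels to summarize")
    # RMS = sqrt(mean of squares), computed independently for each channel
    return tuple(math.sqrt(s / count) for s in sums)
```

For one black and one white pixel, the plain average of each channel is 127.5, while the RMS comes out at 255/√2, roughly 180.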
Then I loaded up a source image, divided it into squares, computed the RMS values for each color channel of every square, and tried to match each square to an image in my DB. Unfortunately, the results weren’t what I had hoped. My pool was simply too small to supply enough unique images for a proper photo mosaic. I archived the code and forgot all about it.
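The tiling and matching step can be sketched roughly like this. It is an illustration rather than my actual code: the function names are hypothetical, the candidates are held in a plain dict instead of a database, and "closest" is assumed to mean the smallest squared distance between the two sets of R, G, B values.

```python
def tile_boxes(width, height, tile):
    """Yield (left, top, right, bottom) boxes for every whole tile that fits in the image."""
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            yield (left, top, left + tile, top + tile)

def nearest_image(target, candidates):
    """Return the id in candidates (id -> (r, g, b)) whose color is closest to target."""
    def squared_distance(image_id):
        return sum((a - b) ** 2 for a, b in zip(candidates[image_id], target))
    return min(candidates, key=squared_distance)
```

With a pool this simple, every square with a similar color gets the same image, which is exactly why a small library produces a disappointing mosaic.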
I was up late the other night with some heartburn, stumbling around the internet looking for something, anything, that was interesting. I’m an Android user and we recently got Instagram, but I haven’t really played with it much. I’ve taken a few pictures, but not many. I somehow landed on Instagram’s developer site. I started reading through their docs and inspiration struck: I no longer lacked a sufficiently sized pool of images from which to construct a proper photo mosaic.
I set about toying with their API. I noticed that if I hit the /media/popular API endpoint, I could get upwards of 400 profile picture URLs per API request. The profile pictures were small (75 pixels square), so I wouldn’t need to burn CPU cycles or bandwidth downloading and then resizing larger images. I set up a script to scrape the API, being careful to respect any rate limits by building in exponential backoffs and setting manual delays long enough to stay well under the limit of 5k API requests per hour. Whenever I discovered a profile pic, I simply wrote the user id and the picture’s URL to a flat file. I didn’t keep track of unique images or anything like that (I wanted to, but I was working on a box with 2GB of memory and I have other services running), so what I didn’t pay for in memory I ended up paying for in disk I/O later. I thought about doing something clever with a bitmask using Redis, but laziness ultimately won out.
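Here is a rough sketch of the backoff-plus-delay idea, not the scraper itself. Everything in it is an assumption for illustration: `fetch` stands in for whatever function performs one API request and raises `IOError` on failure, and the default numbers are placeholders. A one-second pause after every successful request caps the script at 3,600 requests per hour, comfortably under 5k.

```python
import time

def backoff_delays(base=1.0, factor=2.0, cap=300.0, retries=5):
    """Yield exponentially growing delays (in seconds), never exceeding cap."""
    delay = base
    for _ in range(retries):
        yield min(delay, cap)
        delay *= factor

def call_with_backoff(fetch, min_interval=1.0, sleep=time.sleep, **backoff_options):
    """Call fetch() until it succeeds, sleeping longer after each failure."""
    for delay in backoff_delays(**backoff_options):
        try:
            result = fetch()
        except IOError:
            sleep(delay)  # failed: back off before the next attempt
            continue
        sleep(min_interval)  # succeeded: fixed pause to stay under the hourly limit
        return result
    raise RuntimeError("gave up after repeated failures")
```

Passing `sleep` in as a parameter is only there so the behavior can be checked without actually waiting.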
Next, I wrote a Perl script to parse the flat file and download any image that I didn’t already have on disk. I set it up so that no single directory would ever hold more than 1000 entries.
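One way to get that guarantee, sketched in Python rather than the Perl I actually used, is to zero-pad a numeric user id and split it into three-digit groups, so each directory level can only ever contain names from 000 to 999. The function name, the 12-digit width, and the .jpg extension are all assumptions for this example.

```python
import os

def sharded_path(root, user_id, width=12):
    """Map a numeric user id to a nested path with at most 1000 entries per directory."""
    if width % 3 != 0:
        raise ValueError("width must be a multiple of 3")
    number = int(user_id)
    if number < 0:
        raise ValueError("user id must not be negative")
    digits = str(number).zfill(width)
    if len(digits) > width:
        raise ValueError("user id too long for the configured width")
    # Split into three-digit groups: the last names the file, the rest name directories
    groups = [digits[i:i + 3] for i in range(0, width, 3)]
    return os.path.join(root, *groups[:-1], groups[-1] + ".jpg")
```

Checking whether an image is already on disk then becomes a single `os.path.exists` call on the computed path.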
A third script processed the images on disk, computing the RMS values for each color channel and sticking the results in a MySQL DB.
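To give a feel for how the stored values can be queried, here is a small sketch using Python’s built-in sqlite3 module as a stand-in for MySQL. The table and column names are invented for the example, and ordering by squared color distance is my assumption about what a reasonable match query looks like, not a copy of the real schema.

```python
import sqlite3

def open_tile_db(path=":memory:"):
    """Create (if needed) a table holding one row of RMS values per profile image."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS tiles ("
        "user_id INTEGER PRIMARY KEY, r REAL, g REAL, b REAL)"
    )
    return db

def store_tile(db, user_id, rms):
    """Insert or overwrite the RMS values for one image."""
    r, g, b = rms
    db.execute("INSERT OR REPLACE INTO tiles VALUES (?, ?, ?, ?)", (user_id, r, g, b))

def closest_tile(db, rms):
    """Return the user_id whose stored values are nearest to rms, or None if the table is empty."""
    r, g, b = rms
    row = db.execute(
        "SELECT user_id FROM tiles "
        "ORDER BY (r - ?) * (r - ?) + (g - ?) * (g - ?) + (b - ?) * (b - ?) "
        "LIMIT 1",
        (r, r, g, g, b, b),
    ).fetchone()
    return row[0] if row else None
```

A query like this scans every row, so with millions of images it would be slow without some kind of pre-filtering or indexing.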
Now, the fun part. I let all of that run for a week (obviously in violation of the Instagram API terms of service, since I kept the images on disk for longer than 24 hours - I could have worked around this by storing each image’s URL in the DB and purging the file from local disk after calculating the RMS values, but I figured it wasn’t a big deal). I collected and processed over three million unique profile images. I loaded up a source image and ran it through my matching algorithm. It spit out a photo mosaic about 22MB in size, made up of 5676 profile images. What do you think?
Close up of photo mosaic:
I had a 20”x30” copy printed that now hangs on my wall. A unique self portrait - a combination of photography and programming culminating in art. I think the work is transformative because in the printed version (my original intent), each image ends up being 1⁄4” x 1⁄4” in size, and it is only together as a whole that the images make something. It is therefore my belief that it does not infringe copyright. However, I’m not a lawyer, and I will comply with any takedown notices that I receive.
UPDATE: Adding a color image example