Quick-and-dirty image segmentation

PUBLISHED ON JAN 15, 2018

A member of our lab does handwritten digit recognition in a highly structured setting. Without going into details, we want to read images of three-digit handwritten codes and recognize which number the image depicts. To avoid having to predict one of a thousand classes, we want to to split the images into three single digits.

I’m sure there are some very fancy ways indeed of doing this, but the following structure makes me think we can start with simpler solutions: (i) There are always three digits. Knowing how many digits we’re looking for is useful information. (ii) One image is one number. There is not much else in the image, so it’s enough to just find two cut points along the x-axis.

My best guess is that the best cut points will be the ones where there is the least amount of ink if you were to somehow collapse the image to fit entirely within the x axis. We do leave gaps between digits when writing, though some digits may be connected by a line due to quirks in handwriting and so on. You can see this in some pretty great data that I made for the purposes of this treatment. I cut out the individual numbers in that photo with the crop tool in the OS X Preview app and put them in a folder.

Segmentation with mixture models in R

R is my preferred statistical computing environment, but I’m sure you can find this type of functionality anywhere. I choose to focus on the number I consider the worst: 586. I can never get those 8s right.

library(magick)   # for image maipulation
## Warning: package 'magick' was built under R version 3.3.2
## Linking to ImageMagick 6.9.6.6
## Enabled features: cairo, fontconfig, freetype, pango, rsvg, webp
## Disabled features: fftw, ghostscript, lcms, x11
library(plyr)     # vectorized operations

img <- image_read("data/digits/586.jpeg")
plot(img)

#convert to greyscale
img <- image_quantize(img, max = 256, colorspace = "gray", dither = NULL, treedepth = NULL)

# get the intensities & look at them
mat <- image_data(img)[1,,]
nrows <- dim(mat)[1]
gsm <- matrix(as.numeric(mat), nrow=nrows)
hist(gsm, main="Histogram of intensities")

As we can see, the intensities roughly form two clusters, lights and darks (higher and lower numbers). Let’s say that intensity 130 roughly separates the lights from the darks.

# thresholding
gsm_tf <- gsm < 130
image(t(apply(gsm_tf, 1, rev)))

OK! Looks decent. I will go over this row by row (ie column by column because the matrix is sideways) and output the column numbers (ie row numbers, corresponds to x-coordinates in the original image) where there is ink (ie a True value after thresholding). This will yields a distribution over ink horizontally in the image.

hst <- alply(gsm_tf, 2, which)
occ <- NULL
for (i in 1:length(hst)) {
  occ <- c(occ, hst[[i]])
}

hist(occ, nclass=20, main="Where is the ink at")

This looks roughly like three clusters roughly corresponding to the locations of the three digits. I will model this as a mixture of three normal distributions. That’s exactly what it sounds like: fit three normal distributions to these data. Hopefully one for each bump, which hopefully correspons to one for each digit.

Mixture models are fit with the EM algorithm, which we can mostly gloss over. But: One important detail is that the method accepts as an argument your best guess as to the size, scale, and location of the three components. I reason as follows: (i) if you split the image into thirds, it’s reasonable to believe that the three numbers each lie in the middle of one of these thirds. These are my guesses for for the means, mu. (ii) Probably there goes as much ink into one digit as the next, so the size, lambda, of each component is probably one third. (iii) I set the initial scale or standard deviation, sigma, to .5 arbitrarily. These are just starting points, you understand; the algorithm iterates toward better values to maximize the likelihood of the data.

library(mixtools) # fits mixture distributions
## Warning: package 'mixtools' was built under R version 3.3.2
## mixtools package, version 1.1.0, Released 2017-03-10
## This package is based upon work supported by the National Science Foundation under Grant No. SES-0518772.
third <- nrows/3   # nrows because the image is tipped on its side
mix <- normalmixEM(occ, lambda = 1/3, mu = c(third/2, third + third/2, 2*third + third/2), sigma = .5)
## number of iterations= 50
plot(mix, whichplots=2)

The plot above shows the mixture distribution, its three components in different colors. This is everything we need: now I can simply choose my first cut as the point where red becomes less likely than green, and the second cut as the point where green becomes less likely than blue. This is the intersection between the respective density curves. You can easily find the analytical solution for this, but I just brute force it with a numerical solver below.

# I want to be sure that the clusters are in order so that I don't suddenly
# compare the wrong components
mu <- mix$mu[order(mix$mu)]
sigma <- mix$sigma[order(mix$mu)]


sepline <- function(mu, sigma) {
  f <- function(x) dnorm(x, m=mu[1], sd=sigma[1]) - dnorm(x, m=mu[2], sd=sigma[2])

  uniroot(f, interval=c(mu[1], mu[2]))$root
}


# for knitr reasons I have to reload the image
img <- image_read("data/digits/586.jpeg")
plot(img)
abline(v=sepline(mu[1:2], sigma[1:2]), lwd=2, col="red")
abline(v=sepline(mu[2:3], sigma[2:3]), lwd=2, col="red")

There you have it! I’m super happy with how well that turned out. It might not be sexy methodology; mixture models date back at least to the late 1800s. But I think it’s good progress for a short afternoon’s work.

Results for all numbers

I’ll show the results for all of my numbers now. I’ve packaged the above in a function, shown at the end of this post, because that’s just good sense.

# don't believe for a second that I didn't copy-paste this from the 
# terminal and format it with the vim blockwise visual mode
filenames <- c(
  "097.jpeg",
  "210.jpeg",
  "350.jpeg",
  "351.jpeg",
  "460.jpeg",
  "463.jpeg",
  "586.jpeg",
  "727.jpeg",
  "768.jpeg",
  "846.jpeg"
)

for (fname in filenames) {
  img <- image_read(paste0("data/digits/", fname))
  plot(img)
  abline(v=cutpoints(img), col="red", lwd=2)
}

## number of iterations= 10

## number of iterations= 8

## number of iterations= 10

## number of iterations= 9

## number of iterations= 9

## number of iterations= 18

## number of iterations= 50

## number of iterations= 16

## number of iterations= 7

## number of iterations= 9

The cutpoint function

cutpoints <- function(img) {
  img <- image_quantize(img, max = 256, colorspace = "gray", dither = NULL, treedepth = NULL)
  
  mat <- image_data(img)[1,,]
  nrows <- dim(mat)[1]
  gsm <- matrix(as.numeric(mat), nrow=nrows)
  gsm_tf <- gsm < 130
  
  hst <- alply(gsm_tf, 2, which)
  occ <- NULL
  for (i in 1:length(hst)) {
    occ <- c(occ, hst[[i]])
  }
  
  third <- nrows/3   # nrows because the image is tipped on its side
  mix <- normalmixEM(occ, lambda = 1/3, mu = c(third/2, third + third/2, 2*third + third/2), sigma = .5)
  
  mu <- mix$mu[order(mix$mu)]
  sigma <- mix$sigma[order(mix$mu)]
  
  
  sepline <- function(mu, sigma) {
    f <- function(x) dnorm(x, m=mu[1], sd=sigma[1]) - dnorm(x, m=mu[2], sd=sigma[2])
  
    uniroot(f, interval=c(mu[1], mu[2]))$root
  }
  
  
  c(sepline(mu[1:2], sigma[1:2]), sepline(mu[2:3], sigma[2:3]))
}