
How to Improve OCR Accuracy using Image Preprocessing

OCR stands for Optical Character Recognition. It is the process of converting photos of documents or scenes into machine-encoded text. Several tools are available for adding OCR to your system; two popular options are Tesseract OCR and Google Cloud Vision.

OCR tools rely on AI and machine learning, typically with a trained model. Recognition quality depends on several factors, above all the quality of the input image. This is why OCR engines typically publish guidelines on the quality and size of the input image; following them helps the engine produce accurate results.

This is where image preprocessing comes into play: it improves the quality of the input image so that the OCR engine gives you accurate output. The following image shows the processing operations that enhance the quality of your input image.

As the flow above shows, several other steps also help to improve accuracy, but this article focuses on preprocessing.

The main objective of the Pre-processing phase is to make it as straightforward as possible for the OCR system to distinguish a character/word from the background.

Before discussing these techniques, let’s understand how an OCR system sees an image. To an OCR system, an image is a multidimensional array: a 2D array if the image is grayscale or binary, and a 3D array if it is colored. Each cell in the array is called a pixel and typically stores an 8-bit integer per channel, which means pixel values range from 0 to 255.
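To make the array view concrete, here is a minimal sketch that collapses a colored image (a 3D array) into a grayscale one (a 2D array). The class name `PixelArrays` and the `[row, column, channel]` layout with R, G, B channel order are assumptions made for illustration; the luminance weights are the same ones used in the code samples later in this article.

```csharp
using System;

public static class PixelArrays
{
    // Collapse a color image (height x width x 3, channels R, G, B)
    // into a 2D grayscale array using luminance weights.
    public static byte[,] ToGrayscale(byte[,,] rgb)
    {
        int height = rgb.GetLength(0);
        int width = rgb.GetLength(1);
        var gray = new byte[height, width];

        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                double lum = 0.299 * rgb[y, x, 0]
                           + 0.587 * rgb[y, x, 1]
                           + 0.114 * rgb[y, x, 2];
                // Round and clamp so the result stays in the 0-255 range.
                gray[y, x] = (byte)Math.Min(255.0, Math.Round(lum));
            }
        }
        return gray;
    }
}
```

Every value in both arrays is a single byte, which is why each channel is limited to the range 0–255.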

Some of the most basic and essential preprocessing techniques are:

  1. Image Scaling
  2. Binarization
  3. De-skew (Skew Correction)

Let’s go through each of these preprocessing techniques one by one.

1. Image Scaling

Resizing the image is an essential first step. OCR engines generally produce their most accurate output on images with a resolution of about 300 DPI. DPI (dots per inch) describes the resolution of an image: the number of printed dots, or pixels in a scan, per inch.
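The relationship between DPI and pixel dimensions is simple arithmetic: pixels = inches × DPI, so bringing an image to 300 DPI at the same physical size means multiplying its pixel dimensions by 300 / source DPI. The helper below is a hypothetical sketch of that calculation (the names `DpiMath` and `ScaledSize` are not from any library).

```csharp
using System;

public static class DpiMath
{
    // Pixel dimensions needed to keep the same physical size at targetDpi.
    public static (int Width, int Height) ScaledSize(
        int width, int height, double sourceDpi, double targetDpi = 300.0)
    {
        if (sourceDpi <= 0)
            throw new ArgumentOutOfRangeException(nameof(sourceDpi));

        double scale = targetDpi / sourceDpi;
        return ((int)Math.Round(width * scale), (int)Math.Round(height * scale));
    }
}
```

For example, a letter-size page captured at 96 DPI (816 × 1056 pixels) corresponds to 2550 × 3300 pixels at 300 DPI.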

The pixel format also matters for image analysis. Different images may use different pixel formats, which can affect OCR results. You can learn more about the available formats by studying the “System.Drawing.Imaging.PixelFormat” options. Based on personal experience, I would suggest converting the image to the 32 bpp ARGB format.

Image scaling can be achieved with the following code:

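using System;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;

using (Bitmap bitmap = (Bitmap)Image.FromFile("file.jpg"))
{
    // Scale the pixel dimensions so the physical size is preserved at 300 DPI.
    // This trusts the DPI stored in the file, which may not match the real scan.
    int newWidth = (int)Math.Round(bitmap.Width * 300f / bitmap.HorizontalResolution);
    int newHeight = (int)Math.Round(bitmap.Height * 300f / bitmap.VerticalResolution);
    using (Bitmap newBitmap = new Bitmap(newWidth, newHeight, PixelFormat.Format32bppArgb))
    {
        newBitmap.SetResolution(300, 300);
        using (Graphics g = Graphics.FromImage(newBitmap))
        {
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            // An explicit destination rectangle avoids DPI-dependent scaling and cropping.
            g.DrawImage(bitmap, new Rectangle(0, 0, newWidth, newHeight));
        }
        newBitmap.Save("file300.jpg", ImageFormat.Jpeg);
    }
}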

2. Binarization

Binarization means converting a colored image into an image that consists of only black and white pixels (black pixel value = 0 and white pixel value = 255). This can be done by fixing a threshold (commonly 127, roughly half of the pixel range 0–255). If the pixel value is greater than the threshold, it is considered a white pixel; otherwise, it is deemed a black pixel.

The right threshold depends on the contrast and brightness of the image. A simple approach is to find the minimum and maximum pixel values in the image and use the midpoint between them as the threshold.
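The thresholding rule described above can be sketched on a plain array of grayscale values. This is only an illustration under simple assumptions: the image is already grayscale and stored as one byte per pixel, and the class and method names are hypothetical.

```csharp
using System;

public static class SimpleThreshold
{
    // Midpoint between the darkest and brightest pixel values.
    public static int MidpointThreshold(byte[] gray)
    {
        if (gray.Length == 0)
            throw new ArgumentException("Image has no pixels.", nameof(gray));

        byte min = 255, max = 0;
        foreach (byte v in gray)
        {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        return (min + max) / 2;
    }

    // Pixels brighter than the threshold become white (255), the rest black (0).
    public static byte[] Binarize(byte[] gray, int threshold)
    {
        var result = new byte[gray.Length];
        for (int i = 0; i < gray.Length; i++)
        {
            result[i] = gray[i] > threshold ? (byte)255 : (byte)0;
        }
        return result;
    }
}
```

A low-contrast image whose values only span 10 to 250 would get a threshold of 130 here instead of the fixed 127.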

Binarization can be done using the following sample class:

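using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

public class Binarization 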
	{ 
        IntPtr Iptr = IntPtr.Zero; 
        Rectangle cropArea; 
        System.Drawing.Bitmap OriginalImage; 
        Bitmap source; 
        BitmapData bitmapData = null; 
        public byte[] Pixels { get; set; } 
        public int Depth { get; private set; } 
        public int Width { get; private set; } 
        public int Height { get; private set; } 
 
        public Binarization(string imagePath, Rectangle cropRect) 
    	{ 
            OriginalImage = new Bitmap(imagePath); 
            source = OriginalImage.Clone(cropRect, OriginalImage.PixelFormat); 
            cropArea = cropRect; 
    	} 
 
        /// <summary> 
        /// Lock bitmap data 
        /// </summary> 
        private void LockBits()
        {
            // Get width and height of bitmap
            Width = source.Width;
            Height = source.Height;

            // Create rectangle to lock
            Rectangle rect = new Rectangle(0, 0, Width, Height);

            // Get source bitmap pixel format size
            Depth = System.Drawing.Bitmap.GetPixelFormatSize(source.PixelFormat);

            // Check if bpp (Bits Per Pixel) is 8, 24, or 32
            if (Depth != 8 && Depth != 24 && Depth != 32)
            {
                throw new ArgumentException("Only 8, 24 and 32 bpp images are supported.");
            }

            // Lock bitmap and return bitmap data
            bitmapData = source.LockBits(rect, ImageLockMode.ReadWrite,
                                         source.PixelFormat);
            Iptr = bitmapData.Scan0;

            // Create a tightly packed byte array (no row padding) for the pixel values
            int rowBytes = Width * (Depth / 8);
            Pixels = new byte[rowBytes * Height];

            // Copy row by row: each scan line in memory is Stride bytes long,
            // which can exceed rowBytes because rows are padded to 4-byte boundaries
            for (int y = 0; y < Height; y++)
            {
                Marshal.Copy(IntPtr.Add(Iptr, y * bitmapData.Stride), Pixels, y * rowBytes, rowBytes);
            }
        }

        /// <summary>
        /// Unlock bitmap data
        /// </summary>
        private void UnlockBits()
        {
            // Copy data from the byte array back to the bitmap, one scan line at a time
            int rowBytes = Width * (Depth / 8);
            for (int y = 0; y < Height; y++)
            {
                Marshal.Copy(Pixels, y * rowBytes, IntPtr.Add(Iptr, y * bitmapData.Stride), rowBytes);
            }

            // Unlock bitmap data
            source.UnlockBits(bitmapData);
        }
        /// <summary> 
        /// Get the color of the specified pixel 
        /// </summary> 
        /// <param name="x"></param> 
        /// <param name="y"></param> 
        /// <returns></returns> 
        private Color GetPixel(int x, int y) 
    	{ 
            Color clr = Color.Empty; 
 
            // Get color components count 
            int cCount = Depth / 8; 
 
            // Get start index of the specified pixel 
            int i = ((y * Width) + x) * cCount; 
 
            if (i > Pixels.Length - cCount) 
                throw new IndexOutOfRangeException(); 
 
            if (Depth == 32) // For 32 bpp get Red, Green, Blue and Alpha 
        	{ 
                byte b = Pixels[i]; 
                byte g = Pixels[i + 1]; 
                byte r = Pixels[i + 2]; 
                byte a = Pixels[i + 3]; // a 
                clr = Color.FromArgb(a, r, g, b); 
        	} 
            if (Depth == 24) // For 24 bpp get Red, Green and Blue 
        	{ 
                byte b = Pixels[i]; 
                byte g = Pixels[i + 1]; 
                byte r = Pixels[i + 2]; 
                clr = Color.FromArgb(r, g, b); 
        	} 
            if (Depth == 8) 
            // For 8 bpp get color value (Red, Green and Blue values are the same) 
        	{ 
                byte c = Pixels[i]; 
                clr = Color.FromArgb(c, c, c); 
        	} 
            return clr; 
    	} 
        /// <summary> 
        /// Set the color of the specified pixel 
        /// </summary> 
        /// <param name="x"></param> 
        /// <param name="y"></param> 
        /// <param name="color"></param> 
        private void SetPixel(int x, int y, Color color) 
    	{ 
            // Get color components count 
            int cCount = Depth / 8; 
 
            // Get start index of the specified pixel 
            int i = ((y * Width) + x) * cCount; 
 
            if (Depth == 32) // For 32 bpp set Red, Green, Blue and Alpha 
        	{ 
            	Pixels[i] = color.B; 
                Pixels[i + 1] = color.G; 
                Pixels[i + 2] = color.R; 
                Pixels[i + 3] = color.A; 
        	} 
            if (Depth == 24) // For 24 bpp set Red, Green and Blue 
        	{ 
            	Pixels[i] = color.B; 
                Pixels[i + 1] = color.G; 
                Pixels[i + 2] = color.R; 
        	} 
            if (Depth == 8) 
            // For 8 bpp set color value (Red, Green and Blue values are the same) 
        	{ 
            	Pixels[i] = color.B; 
        	} 
    	} 
        /// <summary> 
        /// Convert color image to grayscale 
        /// this must be called before binarization 
        /// </summary> 
        private void GrayScale() 
    	{ 
            Color p; 
            for (int y = 0; y < source.Height; y++) 
        	{ 
                for (int x = 0; x < source.Width; x++) 
            	{ 
                	p = GetPixel(x, y); 
 
                    //extract pixel component ARGB 
                    int a = p.A; 
                    int r = p.R; 
                    int g = p.G; 
                    int b = p.B; 
 
                    //find average 
                    int avg = (r + g + b) / 3; 
 
                    //set new pixel value 
                    SetPixel(x, y, Color.FromArgb(a, avg, avg, avg)); 
            	} 
        	} 
    	} 
 
        /// <summary>
        /// Binarize the image: pixels darker than the threshold become black, the rest white
        /// </summary>
        public Bitmap Binarize(int threshold)
        {
            // Lock outside the try block so UnlockBits only runs after a successful lock
            LockBits();
            try
            {
                GrayScale();

                for (int y = 0; y < Height; ++y)
                {
                    for (int x = 0; x < Width; ++x)
                    {
                        // After GrayScale(), R, G and B are equal, so R is the gray level
                        Color p = GetPixel(x, y);
                        if (p.R < threshold)
                            SetPixel(x, y, Color.FromArgb(p.A, 0, 0, 0));
                        else
                            SetPixel(x, y, Color.FromArgb(p.A, 255, 255, 255));
                    }
                }
            }
            finally
            {
                UnlockBits();
            }
            return source;
        }
	}

3. Skew Correction

Skewed images directly impact the line segmentation of the OCR engine, which reduces its accuracy.

A scanned document is sometimes slightly skewed, that is, the text is aligned at an angle to the horizontal. Detecting and correcting this skew is crucial before extracting information from the scanned image.

Skew Correction can be done using the following class:

using System;
using System.Diagnostics;
using System.Drawing;
using System.Drawing.Imaging;

public class DeskewHelper 
	{ 
        public class HougLine 
    	{ 
            // Count of points in the line. 
            public int Count; 
            // Index in Matrix. 
            public int Index; 
            // The line is represented as all x,y that solve y*cos(alpha)-x*sin(alpha)=d 
            public double Alpha; 
            public double d; 
        } 
 
        // The Bitmap 
        Bitmap cBmp; 
        // The range of angles to search for lines 
        double cAlphaStart = -20; 
        double cAlphaStep = 0.2; 
        int cSteps = 40 * 5; 
        // Precalculation of sin and cos. 
        double[] cSinA; 
        double[] cCosA; 
        // Range of d 
        double cDMin; 
        double cDStep = 1; 
        int cDCount; 
        // Count of points that fit in a line. 
        int[] cHMatrix; 
 
        // calculate the skew angle of the image cBmp 
 
        public double GetSkewAngle() 
    	{ 
            HougLine[] hl; 
            int i; 
            double sum = 0; 
            int count = 0; 
 
            // Hough Transformation 
            Calc(); 
            // Top 20 of the detected lines in the image. 
            hl = GetTop(20); 
            // Average angle of all returned lines 
            for (i = 0; i < hl.Length; i++) 
            { 
                sum += hl[i].Alpha; 
                count += 1; 
            } 
            return sum / count; 
    	} 
 
        // Find the Count lines in the image with the most points. 
        private HougLine[] GetTop(int Count) 
    	{ 
            HougLine[] hl; 
            int j; 
            HougLine tmp; 
            int AlphaIndex, dIndex; 
            hl = new HougLine[Count]; 
            for (int i = 0; i < Count; i++) 
        	{ 
                hl[i] = new HougLine(); 
        	} 
            for (int i = 0; i < cHMatrix.Length; i++) 
        	{ 
                if (cHMatrix[i] > hl[Count - 1].Count) 
            	{ 
                    hl[Count - 1].Count = cHMatrix[i]; 
                    hl[Count - 1].Index = i; 
                	j = Count - 1; 
                    while (j > 0 && hl[j].Count > hl[j - 1].Count) 
                	{ 
                        tmp = hl[j]; 
                        hl[j] = hl[j - 1]; 
                        hl[j - 1] = tmp; 
                    	j -= 1; 
                	} 
            	} 
        	} 
            for (int i = 0; i < Count; i++) 
        	{ 
                dIndex = hl[i].Index / cSteps; 
                AlphaIndex = hl[i].Index - dIndex * cSteps; 
                hl[i].Alpha = GetAlpha(AlphaIndex); 
                hl[i].d = dIndex + cDMin; 
        	} 
            return hl; 
    	} 
        public DeskewHelper(Bitmap bmp) 
    	{ 
            cBmp = bmp; 
    	} 
 
        // Hough Transformation: 
        private void Calc() 
    	{ 
            int x; 
            int y; 
            int hMin = cBmp.Height / 4; 
            int hMax = cBmp.Height * 3 / 4; 
            Init(); 
            for (y = hMin; y < hMax; y++) 
            { 
                // The original implementation visited every horizontal pixel: 
                //   for (x = 1; x < cBmp.Width - 2; x++) 
                // To improve performance, the increment is set to 4. 
                for (x = 1; x < cBmp.Width - 2; x=x+4) 
            	{ 
                    // Only lower edges are considered. 
                    if (IsBlack(x, y) == true) 
                	{ 
                        if (IsBlack(x, y + 1) == false) 
                    	{ 
                            Calc(x, y); 
                    	} 
                	} 
            	} 
        	} 
    	} 
 
        // Calculate all lines through the point (x,y). 
        private void Calc(int x, int y) 
        { 
            double d; 
            int dIndex; 
            int Index; 
            for (int alpha = 0; alpha < cSinA.Length; alpha++) 
            { 
            	d = y * cCosA[alpha] - x * cSinA[alpha]; 
                dIndex = (int)CalcDIndex(d); 
            	Index = dIndex * cSteps + alpha; 
 
                try 
            	{ 
                    cHMatrix[Index] += 1; 
            	} 
                catch (Exception ex) 
            	{ 
                    Debug.WriteLine(ex.ToString()); 
            	} 
        	} 
    	} 
        private double CalcDIndex(double d) 
    	{ 
            return Convert.ToInt32(d - cDMin); 
    	} 
        private bool IsBlack(int x, int y) 
        { 
            Color c; 
            double luminance; 
        	c = cBmp.GetPixel(x, y); 
            luminance = (c.R * 0.299) + (c.G * 0.587) + (c.B * 0.114); 
            return luminance < 140; 
    	} 
        private void Init() 
    	{ 
            double angle; 
            // Precalculation of sin and cos, one entry per angle step. 
            cSinA = new double[cSteps]; 
            cCosA = new double[cSteps]; 
            for (int i = 0; i < cSteps; i++) 
            { 
            	angle = GetAlpha(i) * Math.PI / 180.0; 
            	cSinA[i] = Math.Sin(angle); 
            	cCosA[i] = Math.Cos(angle); 
            } 
            // Range of d 
            cDMin = -cBmp.Width; 
            cDCount = (int)(2 * (cBmp.Width + cBmp.Height) / cDStep); 
            cHMatrix = new int[cDCount * cSteps]; 
    	} 
        public double GetAlpha(int Index) 
    	{ 
            return cAlphaStart + Index * cAlphaStep; 
    	} 
        public Bitmap RotateImage(double angle) 
    	{ 
            Graphics g; 
            Bitmap tmp = new Bitmap(cBmp.Width, cBmp.Height, PixelFormat.Format32bppRgb); 
            tmp.SetResolution(cBmp.HorizontalResolution, cBmp.VerticalResolution); 
        	g = Graphics.FromImage(tmp); 
            try 
        	{ 
                g.FillRectangle(Brushes.White, 0, 0, cBmp.Width, cBmp.Height); 
                g.RotateTransform((float)angle); 
                g.DrawImage(cBmp, 0, 0); 
        	} 
            finally 
        	{ 
                g.Dispose(); 
        	} 
            return tmp; 
    	} 
	}
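The line representation used in the class above, y*cos(alpha) - x*sin(alpha) = d, can be illustrated with a much smaller voting sketch. It is not a replacement for the class: it works on a list of points instead of a bitmap, and the name `HoughSketch` is hypothetical. The default angle range (-20 degrees, in 200 steps of 0.2 degrees) mirrors the fields of the class.

```csharp
using System;
using System.Collections.Generic;

public static class HoughSketch
{
    // Returns the candidate angle (in degrees) whose best (angle, d) bin
    // collects the most points, i.e. the angle of the strongest line.
    public static double DominantAngle(
        IReadOnlyList<(double X, double Y)> points,
        double alphaStart = -20.0, double alphaStep = 0.2, int steps = 200)
    {
        double bestAngle = alphaStart;
        int bestVotes = -1;

        for (int i = 0; i < steps; i++)
        {
            double alphaDeg = alphaStart + i * alphaStep;
            double alpha = alphaDeg * Math.PI / 180.0;
            double sin = Math.Sin(alpha), cos = Math.Cos(alpha);

            // Votes per rounded distance d for this angle.
            var votes = new Dictionary<int, int>();
            foreach (var p in points)
            {
                int d = (int)Math.Round(p.Y * cos - p.X * sin);
                votes.TryGetValue(d, out int n);
                votes[d] = n + 1;
                if (n + 1 > bestVotes)
                {
                    bestVotes = n + 1;
                    bestAngle = alphaDeg;
                }
            }
        }
        return bestAngle;
    }
}
```

Points that lie on one straight line produce the same d only at the line's own angle, so that angle receives the most votes in a single bin. The class above applies the same idea to the lower edges of dark pixels and averages the angles of the top 20 lines.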

The class above estimates the skew angle with a Hough transform over the lower edges of dark pixels. Another common approach is the projection profile method, which starts from the binary image:

  • Project the image horizontally (sum the pixels along each row of the image matrix) to get a histogram over the height of the image, i.e., the count of foreground pixels in every row.
  • Rotate the image through a range of angles at a small interval (called Delta) and compute the difference between the histogram peaks at each angle (variance can also be used as the metric). The angle at which the difference between peaks (or the variance) is largest is the skew angle of the image.
  • After finding the skew angle, correct the skew by rotating the image through an angle equal to the skew angle in the opposite direction.
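The projection profile steps can be sketched as follows. This is a minimal illustration under stated assumptions, not a tuned implementation: the binary image is given as a `bool[,]` where `true` marks a foreground pixel, the search range (±10 degrees) and Delta (0.5 degrees) are arbitrary defaults, and instead of physically rotating the image the code projects pixels along lines tilted by each candidate angle, which yields the same histogram. The name `ProjectionProfile` is hypothetical.

```csharp
using System;

public static class ProjectionProfile
{
    // image[y, x] == true marks a foreground (black) pixel of a binarized image.
    // Returns the candidate angle in degrees that gives the sharpest row histogram.
    public static double EstimateSkew(bool[,] image, double maxAngle = 10.0, double delta = 0.5)
    {
        int height = image.GetLength(0);
        int width = image.GetLength(1);
        int steps = (int)Math.Round(2 * maxAngle / delta);
        double bestAngle = 0.0;
        long bestScore = -1;

        for (int i = 0; i <= steps; i++)
        {
            double angle = -maxAngle + i * delta;
            double rad = angle * Math.PI / 180.0;
            double sin = Math.Sin(rad), cos = Math.Cos(rad);

            // Histogram of foreground pixels along lines tilted by `angle`.
            // d = y*cos - x*sin stays within (-width, width + height),
            // hence the array size and the +width offset.
            var profile = new int[height + 2 * width];
            for (int y = 0; y < height; y++)
                for (int x = 0; x < width; x++)
                    if (image[y, x])
                        profile[(int)Math.Round(y * cos - x * sin) + width]++;

            // The pixel total is the same for every angle, so the sum of squares
            // ranks the angles exactly as the variance of the histogram would.
            long score = 0;
            foreach (int n in profile) score += (long)n * n;

            if (score > bestScore)
            {
                bestScore = score;
                bestAngle = angle;
            }
        }
        return bestAngle;
    }
}
```

The returned value plays the role of the skew angle in the last step, so the correction is a rotation by the same amount in the opposite direction. A smaller Delta gives a finer estimate at the cost of more passes over the image.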

There are many other pre-processing techniques you should consider to improve OCR result accuracy. Some known methods are:

  • Noise Removal or Denoise
  • Thinning and Skeletonization
  • Lines or borders Removal

These techniques are a solid starting point for improving OCR accuracy with image preprocessing. If there is anything else you wish to know, get in touch with an expert at DEV IT here.

The post How to Improve OCR Accuracy using Image Preprocessing appeared first on DEV IT Journal.


