How to Solve a Captcha Using Tesseract OCR in C#

By FoxLearn 11/27/2024 2:34:26 PM
To read captcha text with Tesseract OCR in a C# Windows Forms application, follow the steps below.

How to detect text using OCR in C#

Open Visual Studio, then create a new Windows Forms project.

Next, drag a PictureBox, a TextBox and two Buttons from the Visual Studio toolbox onto your form to build a simple UI. The Browse button (btnBrowse) lets you select an image from disk and shows it in the PictureBox, and the Detect button (btnDetect) runs the image recognition and writes the result to the TextBox.

You need to add Tesseract OCR and AForge to your C# application. Both are free libraries: open Manage NuGet Packages in Visual Studio, then search for and install the Tesseract and AForge packages. The image filters used in this example come from the AForge.Imaging.Filters namespace.

Create the OCR method, which performs the text recognition on a bitmap, as shown below.

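// Runs Tesseract on the bitmap and returns the recognized text.
private string OCR(Bitmap bmp)
{
    // Load the English language data from the "tessdata" folder.
    using (TesseractEngine engine = new TesseractEngine(@"tessdata", "eng", EngineMode.Default))
    {
        // Restrict recognition to digits and upper-case letters.
        engine.SetVariable("tessedit_char_whitelist", "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ");
        engine.SetVariable("tessedit_unrej_any_wd", true);
        // Treat the image as a single line of text.
        using (var page = engine.Process(bmp, PageSegMode.SingleLine))
        {
            return page.GetText();
        }
    }
}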
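
The text that Tesseract returns typically ends with a line break and can contain stray spaces, so it usually needs a little cleanup before you compare it with the expected code. The following is a minimal sketch of one way to do that. The CaptchaText class and its Normalize method are hypothetical names introduced here for illustration; they are not part of Tesseract or of the original example. The allowed character set simply mirrors the whitelist passed to the engine above.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of Tesseract): cleans up raw OCR output.
public static class CaptchaText
{
    // Same characters as the tessedit_char_whitelist used in the OCR method.
    private const string Allowed = "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    public static string Normalize(string raw)
    {
        if (string.IsNullOrEmpty(raw))
            return string.Empty;

        // Upper-case the text, then drop every character outside the
        // whitelist (spaces, line breaks, punctuation).
        return new string(raw.ToUpperInvariant()
                             .Where(c => Allowed.IndexOf(c) >= 0)
                             .ToArray());
    }
}
```

With a helper like this in place, the OCR method could return CaptchaText.Normalize(page.GetText()) instead of the raw text.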

The OCR method loads its language data from a folder named tessdata. Download the Tesseract language data, unzip it, and copy the tessdata folder into your project. Then set the Copy to Output Directory property of the files in that folder to "Copy always" so that they sit next to the executable at run time.
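
If the language data is missing, creating the engine fails at run time, so it can help to check for the file first. The sketch below assumes the standard Tesseract naming, where the data for language "eng" is stored in a file called eng.traineddata inside the tessdata folder, and that the application runs from its output directory. The TessdataCheck class and HasLanguageData method are hypothetical names, not part of the Tesseract wrapper.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the Tesseract wrapper).
public static class TessdataCheck
{
    // Returns true when <baseDirectory>/tessdata/<language>.traineddata exists.
    public static bool HasLanguageData(string baseDirectory, string language)
    {
        string path = Path.Combine(baseDirectory, "tessdata", language + ".traineddata");
        return File.Exists(path);
    }
}
```

For example, you could call TessdataCheck.HasLanguageData(AppDomain.CurrentDomain.BaseDirectory, "eng") when the form loads and show a message if it returns false.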

Next, create the DeCaptcha method. It cleans up the captcha image with a sequence of AForge filters, then passes the result to the OCR method.

// Cleans up the captcha image with AForge filters, then runs OCR on it.
private string DeCaptcha(Image img)
{
    // Convert to 24bpp RGB, a pixel format the AForge filters used here accept.
    using (Bitmap source = new Bitmap(img))
    using (Bitmap bmp = source.Clone(new Rectangle(0, 0, source.Width, source.Height), System.Drawing.Imaging.PixelFormat.Format24bppRgb))
    {
        Invert inverter = new Invert();
        // Keep only near-white pixels (200-255 on every channel).
        ColorFiltering cor = new ColorFiltering();
        cor.Blue = new AForge.IntRange(200, 255);
        cor.Red = new AForge.IntRange(200, 255);
        cor.Green = new AForge.IntRange(200, 255);
        Opening open = new Opening();
        // Remove blobs shorter than 10 pixels, which are usually noise.
        BlobsFiltering bc = new BlobsFiltering() { MinHeight = 10 };
        GaussianSharpen gs = new GaussianSharpen();
        ContrastCorrection cc = new ContrastCorrection();
        FiltersSequence seq = new FiltersSequence(gs, inverter, open, inverter, bc, inverter, open, cc, cor, bc, inverter);
        // Show the filtered image in the PictureBox, then recognize it.
        Bitmap processed = seq.Apply(bmp);
        pictureBox1.Image = processed;
        return OCR(processed);
    }
}

Don't forget to add the following namespaces at the top of your form's code file.

using AForge.Imaging.Filters;
using System;
using System.Drawing;
using System.Windows.Forms;
using Tesseract;

To open an image, handle the Click event of btnBrowse with the following code.

private void btnBrowse_Click(object sender, EventArgs e)
{
    using (OpenFileDialog ofd = new OpenFileDialog() { Filter = "JPG|*.jpg|PNG|*.png" })
    {
        if (ofd.ShowDialog() == DialogResult.OK)
            pictureBox1.Image = Image.FromFile(ofd.FileName);
    }
}

Finally, handle the Click event of the Detect button to run the OCR engine on the selected captcha image.

private void btnDetect_Click(object sender, EventArgs e)
{
    // Nothing to recognize until an image has been loaded.
    if (pictureBox1.Image == null)
        return;
    txtOuput.Text = DeCaptcha(pictureBox1.Image);
}

This is a very basic example of how to use the Tesseract OCR engine to read captcha text in C#. Tesseract is a free, open-source OCR engine, used here through its .NET wrapper.
