How to convert PDF to Word in C#

By FoxLearn 7/18/2024 7:51:36 AM
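This article shows how to convert a PDF file to a Word document in C#. The text is extracted from the PDF with PDFBox and written to a .docx file with the DocX library. Only the plain text is carried over; images, tables, and most formatting are not preserved.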

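Create a new Windows Forms application project and design a simple form for selecting a PDF file and converting it to Word. The code in this article uses a TextBox named txtFileName for the file path, a Browse button (btnBrowser), a Convert button (btnConvert), and a RichTextBox named rtfData that displays the extracted text.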

Next, install the DocX package from NuGet (for example, run Install-Package DocX in the Package Manager Console).

DocX lets you create and edit .docx files from C# without Microsoft Word being installed on the machine.

You also need to install PDFBox from NuGet. PDFBox reads PDF files and extracts their text, which is the first half of the conversion.

Many other PDF libraries are available, such as iTextSharp, but this article covers only PDFBox.

Add a click event handler to the Browse button (btnBrowser) so the user can pick a PDF file.

private void btnBrowser_Click(object sender, EventArgs e)
{
    using (OpenFileDialog ofd = new OpenFileDialog() { Filter = "PDF files|*.pdf" })
    {
        if (ofd.ShowDialog() == DialogResult.OK)
            txtFileName.Text = ofd.FileName;
    }
}

Add a click event handler to the Convert button (btnConvert) that extracts the text from the PDF, shows it in rtfData, and writes it to a new Word document.

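// Convert the selected PDF to a .docx file and open it
private void btnConvert_Click(object sender, EventArgs e)
{
    if (string.IsNullOrEmpty(txtFileName.Text))
    {
        MessageBox.Show("Please select your pdf file.", "Message", MessageBoxButtons.OK, MessageBoxIcon.Warning);
        return;
    }
    // Extract the plain text with PDFBox; close the document even if extraction fails
    PDDocument doc = PDDocument.load(txtFileName.Text);
    try
    {
        PDFTextStripper stripper = new PDFTextStripper();
        rtfData.Text = stripper.getText(doc);
    }
    finally
    {
        doc.close();
    }
    // Save the .docx next to the PDF (C:\Docs\a.pdf -> C:\Docs\a.docx), not in the working directory
    var docName = Path.ChangeExtension(txtFileName.Text, ".docx");
    using (var worddoc = DocX.Create(docName))
    {
        worddoc.InsertParagraph(rtfData.Text);
        worddoc.Save();
    }
    // UseShellExecute = true opens the file in its associated app (required on .NET Core/.NET 5+)
    Process.Start(new ProcessStartInfo(docName) { UseShellExecute = true });
}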
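The handler only checks that the text box is not empty, so a missing or non-PDF file surfaces as an exception from PDDocument.load. A slightly stricter pre-check is sketched below in a hypothetical PdfInputCheck class (the class, method, and messages are illustrative, not part of PDFBox or DocX). It relies on the fact that PDF files begin with the ASCII header "%PDF-". Some PDF readers tolerate a few stray bytes before that header; this sketch does not.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, not part of PDFBox or DocX: a stricter check than
// string.IsNullOrEmpty before the path is passed to PDDocument.load.
public static class PdfInputCheck
{
    // PDF files start with "%PDF-" followed by a version, e.g. "%PDF-1.7".
    private const string PdfHeader = "%PDF-";

    // Returns null if the path points to an existing file that starts with the
    // PDF header; otherwise returns a message suitable for MessageBox.Show.
    public static string GetPdfPathError(string path)
    {
        if (string.IsNullOrWhiteSpace(path))
            return "Please select your pdf file.";
        if (!File.Exists(path))
            return "The selected file does not exist.";

        var buffer = new byte[PdfHeader.Length];
        int total = 0;
        using (FileStream stream = File.OpenRead(path))
        {
            // Read may return fewer bytes than requested, so loop until the
            // buffer is full or the end of the file is reached.
            while (total < buffer.Length)
            {
                int read = stream.Read(buffer, total, buffer.Length - total);
                if (read == 0)
                    break;
                total += read;
            }
        }

        // Reject files shorter than the header or whose first bytes differ from it.
        if (total < buffer.Length || Encoding.ASCII.GetString(buffer) != PdfHeader)
            return "The selected file is not a PDF document.";
        return null;
    }
}
```

In btnConvert_Click, the returned message could replace the fixed text passed to MessageBox.Show, so the user learns why the file was rejected before PDFBox is called.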

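Once the conversion is complete, the handler calls Process.Start to open the new .docx file in the application associated with that extension, usually Word. On .NET Core and .NET 5 or later, ProcessStartInfo.UseShellExecute defaults to false, so pass a ProcessStartInfo with UseShellExecute = true; otherwise Process.Start cannot open a document file directly.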
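The sketch below isolates that detail in a hypothetical DocumentLauncher class (the class and method names are illustrative, not from the article). CreateOpenInfo only builds the ProcessStartInfo, so it can be checked without launching anything; Open actually starts the associated application.

```csharp
using System.Diagnostics;

// Hypothetical helper: opens a document in the application registered
// for its file extension (for example, Word for .docx).
public static class DocumentLauncher
{
    public static ProcessStartInfo CreateOpenInfo(string documentPath)
    {
        // UseShellExecute = true lets the OS shell resolve the associated app.
        // It defaults to true on .NET Framework but false on .NET Core/.NET 5+,
        // where Process.Start("file.docx") would throw instead of opening the file.
        return new ProcessStartInfo(documentPath) { UseShellExecute = true };
    }

    public static void Open(string documentPath)
    {
        // Process.Start can return null when the shell reuses an existing process;
        // otherwise dispose the Process handle, the launched application keeps running.
        Process.Start(CreateOpenInfo(documentPath))?.Dispose();
    }
}
```

With this helper, the last line of btnConvert_Click could become DocumentLauncher.Open(docName).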

Finally, add the following using directives to your form's code file.

using System.Diagnostics;          // Process, ProcessStartInfo
using System.IO;                   // Path
using org.apache.pdfbox.pdmodel;   // PDDocument
using org.apache.pdfbox.util;      // PDFTextStripper
using Xceed.Words.NET;             // DocX

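PDFBox is an open-source Java library for working with PDF files, developed by the Apache Software Foundation. The .NET package keeps the Java naming, which is why its namespaces start with org.apache.pdfbox and its methods are lowercase (load, getText).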

Xceed.Words.NET is the namespace of the DocX library; it contains the DocX class used to create and save the Word document.