How to Read a PDF file in C#
By FoxLearn 11/16/2024 3:28:21 AM
If you are working with PDFs in C# and need to extract data from, modify, or create PDF documents, iTextSharp is one of the most popular libraries for the job.
In this article, we'll explore how to use the iTextSharp PdfReader class in C# with a practical example to help you get started quickly.
What is iTextSharp?
iTextSharp is an open-source PDF library that allows developers to create, read, and manipulate PDF files in C# (the 5.x releases are licensed under the AGPL, with commercial licensing available). It is a port of the Java-based iText library and supports a wide range of PDF operations, including extracting text, creating PDFs, adding images, and more.
iText lets you create, adapt, inspect, and maintain documents in the Portable Document Format, so you can add PDF functionality to your software projects with little effort.
One of the most common tasks when working with PDFs is extracting data, such as text or metadata. iTextSharp provides the PdfReader class to read and extract content from a PDF document.
How to read a PDF file in C#
Open Visual Studio, click New Project, select Visual C# on the left, then Windows, and then select Windows Forms Application. Name your project "PdfReader" and click OK.
Before writing any code, install the iTextSharp library. You can do this with the NuGet package manager in Visual Studio: right-click your project, select Manage NuGet Packages, search for itextsharp, and click Install.
Next, drag a RichTextBox and a Button from the Visual Studio Toolbox onto the form designer and arrange them on the form. Name the controls richTextBox and btnOpen so that they match the code in this article.
Using the iTextSharp PdfReader in C#
The PdfReader class is used to read the content of an existing PDF file. You can access individual pages of the PDF, extract text, and perform other operations such as manipulating annotations or extracting metadata.
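As a rough illustration of what PdfReader exposes beyond text, the sketch below prints the page count, each page's size, and the document's metadata. It assumes iTextSharp 5.x, where the Info property is a dictionary of strings; the class name PdfInfoSketch and the method PrintSummary are hypothetical names chosen for this example. The reader type is written with its full namespace because a project named "PdfReader" gets a namespace of the same name, which would otherwise hide the class.

```csharp
using System;
using System.Collections.Generic;

public static class PdfInfoSketch
{
    // Prints the page count, each page's size in points, and the document info entries.
    public static void PrintSummary(string path)
    {
        iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(path);
        try
        {
            Console.WriteLine("Pages: " + reader.NumberOfPages);

            // Page numbers in iTextSharp are 1-based.
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                iTextSharp.text.Rectangle size = reader.GetPageSize(i);
                Console.WriteLine("Page " + i + ": " + size.Width + " x " + size.Height + " pt");
            }

            // Info holds entries such as Title, Author, or Producer when the PDF defines them.
            foreach (KeyValuePair<string, string> entry in reader.Info)
            {
                Console.WriteLine(entry.Key + " = " + entry.Value);
            }
        }
        finally
        {
            // Release the file handle even if reading fails.
            reader.Close();
        }
    }
}
```

Calling PdfInfoSketch.PrintSummary with the path of a PDF file writes the summary to the console; in a Windows Forms application you would more likely collect the lines into a string and show them in a control.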
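Include the necessary namespaces at the top of your C# file. PdfTextExtractor lives in iTextSharp.text.pdf.parser, and StringBuilder in System.Text:
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;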
Add a click event handler to the Open button that reads the selected PDF file and displays its text in the RichTextBox control.
// How to read a PDF file in C#
private void btnOpen_Click(object sender, EventArgs e)
{
    using (OpenFileDialog ofd = new OpenFileDialog() { Filter = "PDF files|*.pdf", ValidateNames = true, Multiselect = false })
    {
        if (ofd.ShowDialog() == DialogResult.OK)
        {
            iTextSharp.text.pdf.PdfReader reader = null;
            try
            {
                // Create a PdfReader object to read the PDF. The name is fully qualified
                // because the project's own namespace is also called PdfReader.
                reader = new iTextSharp.text.pdf.PdfReader(ofd.FileName);
                // StringBuilder to hold the extracted text
                StringBuilder sb = new StringBuilder();
                // Iterate over all pages in the PDF (page numbers start at 1)
                for (int i = 1; i <= reader.NumberOfPages; i++)
                {
                    // Extract the text of each page (PdfTextExtractor is in
                    // iTextSharp.text.pdf.parser) and append it on its own line
                    sb.AppendLine(PdfTextExtractor.GetTextFromPage(reader, i));
                }
                richTextBox.Text = sb.ToString();
            }
            catch (Exception ex)
            {
                MessageBox.Show(ex.Message, "Message", MessageBoxButtons.OK, MessageBoxIcon.Error);
            }
            finally
            {
                // Close the reader even if extraction fails
                if (reader != null)
                    reader.Close();
            }
        }
    }
}
Use the PdfReader class to open the PDF file, then iterate through the pages and extract the text content. Remember to close the reader once you have finished reading the content.
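The same open, iterate, extract, and close sequence can also be moved out of the form into a small helper, which makes it easier to reuse. The following is a minimal sketch assuming iTextSharp 5.x; PdfTextSketch and ExtractAllText are hypothetical names, and SimpleTextExtractionStrategy is just one of the extraction strategies the library ships with.

```csharp
using System.Text;
using iTextSharp.text.pdf.parser;

public static class PdfTextSketch
{
    // Returns the text of every page, one page after another, separated by line breaks.
    public static string ExtractAllText(string path)
    {
        // Fully qualified so the code also compiles inside a project whose namespace is PdfReader.
        iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(path);
        try
        {
            StringBuilder sb = new StringBuilder();
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // A fresh strategy per page keeps text from earlier pages out of the result.
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                sb.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page, strategy));
            }
            return sb.ToString();
        }
        finally
        {
            // Always release the file, even when extraction throws.
            reader.Close();
        }
    }
}
```

With a helper like this, the body of the click handler could shrink to a single assignment such as richTextBox.Text = PdfTextSketch.ExtractAllText(ofd.FileName); wrapped in the existing try/catch.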
If the PDF contains scanned images or graphics instead of text, you might need an OCR tool to extract text. iTextSharp cannot handle this natively, so you would need to integrate an external OCR library, such as Tesseract.
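One simple way to decide which pages might need OCR is to check whether text extraction returned anything for them. The sketch below is only a heuristic under that assumption: it takes the per-page strings you collected with PdfTextExtractor.GetTextFromPage and reports the pages that came back empty. OcrCandidateSketch and FindPagesWithoutText are hypothetical names, and a scanned page that carries a small text stamp (for example a page number) would not be detected this way.

```csharp
using System.Collections.Generic;

public static class OcrCandidateSketch
{
    // Returns the 1-based numbers of pages whose extracted text is null, empty, or whitespace.
    public static List<int> FindPagesWithoutText(IList<string> pageTexts)
    {
        List<int> pages = new List<int>();
        for (int i = 0; i < pageTexts.Count; i++)
        {
            if (string.IsNullOrWhiteSpace(pageTexts[i]))
            {
                // Index 0 holds page 1, so shift by one to match PDF page numbering.
                pages.Add(i + 1);
            }
        }
        return pages;
    }
}
```

The returned page numbers could then be rendered to images and passed to an OCR library such as Tesseract; that step is outside what iTextSharp does and is not shown here.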
This iTextSharp example demonstrated how to extract text from a PDF document using the PdfReader and PdfTextExtractor classes. This is just the beginning of what you can do with iTextSharp. Whether you need to manipulate PDF files, extract data, or create PDFs from scratch, iTextSharp offers powerful tools for .NET developers working with PDFs.
By following this example, you should now understand how to get started with iTextSharp in C# and how to use the PdfReader class to read and process PDF files.