Create a Program to Extract Data from Excel Files Using Openpyxl in Python

The openpyxl library is powerful, providing a range of functionality to interact with Excel files, such as reading data, writing data, and even formatting cells. Below, I’ll present more examples that extend the basic idea of data extraction and explain everything step-by-step.

Prerequisite: Installing openpyxl

Before using the openpyxl library, you need to ensure it is installed. You can install it via pip:

pip install openpyxl

Now, let’s look at various examples.

Example 1: Reading a Specific Range of Cells

If you’re working with a large Excel sheet and you want to extract data from a specific range, you can do it easily using openpyxl. Here’s how:

import openpyxl

# Load the workbook and sheet
file_path = 'your_excel_file.xlsx'
workbook = openpyxl.load_workbook(file_path)
sheet = workbook.active

# Define the range of cells (e.g., A1 to C5)
for row in sheet['A1:C5']:
    for cell in row:
        print(cell.value, end=" ")  # Print each cell value
    print()  # Print a new line after each row

Explanation:

Range A1:C5: This defines a specific section of the sheet from column A to C and rows 1 to 5.
cell.value: Extracts the value from each cell in the specified range.
end=” “: Ensures that values in the same row are printed on the same line.

This method is useful when you’re only interested in a portion of your Excel sheet, as it helps reduce the processing time for large files.

Example 2: Reading Data by Columns

Sometimes, you might want to extract data column-wise rather than row-wise. Here’s how you can do it:

import openpyxl

# Load the workbook and sheet
file_path = 'your_excel_file.xlsx'
workbook = openpyxl.load_workbook(file_path)
sheet = workbook.active

# Loop through each column (e.g., A, B, C)
for col in sheet.iter_cols(min_col=1, max_col=3, values_only=True):
    print(col)  # Print the entire column

Explanation:

iter_cols(): This method iterates over columns rather than rows. It’s similar to iter_rows(), but it works column-wise.
min_col and max_col: These arguments define the range of columns to extract. In this example, it extracts columns A (1) to C (3).

Example 3: Reading Data from a Specific Sheet

If your Excel workbook has multiple sheets, you might want to read data from a particular sheet. Here’s how you can specify the sheet:

import openpyxl

# Load the workbook
file_path = 'your_excel_file.xlsx'
workbook = openpyxl.load_workbook(file_path)

# Select a specific sheet by name
sheet = workbook['Sheet2']

# Print all data from Sheet2
for row in sheet.iter_rows(values_only=True):
    print(row)

Explanation:

workbook[‘Sheet2’]: This selects a sheet by its name. Make sure the sheet name exists in your workbook.
values_only=True: This argument ensures that only cell values are returned, without additional metadata like cell formatting.

Example 4: Extracting Data from Merged Cells

In Excel files, some cells might be merged. Here’s how you can handle merged cells while reading data:

import openpyxl

# Load the workbook
file_path = 'your_excel_file.xlsx'
workbook = openpyxl.load_workbook(file_path)
sheet = workbook.active

# Extract values from merged cells
for merged_range in sheet.merged_cells.ranges:
    print(f'Merged Range: {merged_range}')
    first_cell = sheet.cell(row=merged_range.min_row, column=merged_range.min_col)
    print(f'Value of the merged cells: {first_cell.value}')

Explanation:

sheet.merged_cells.ranges: This returns the ranges of merged cells.
first_cell.value: Extracts the value from the first cell of the merged range. Usually, merged cells only contain data in the first cell.

Example 5: Converting Excel Data to a Python Dictionary

A common use case is converting Excel data into a more Pythonic structure like a dictionary for further analysis. Here’s how to do that:

import openpyxl

# Load the workbook and sheet
file_path = 'your_excel_file.xlsx'
workbook = openpyxl.load_workbook(file_path)
sheet = workbook.active

# Convert Excel data to a dictionary
data = {}
for row in sheet.iter_rows(min_row=2, values_only=True):  # Skip header
    key = row[0]  # Assuming the first column is the key
    data[key] = row[1:]  # Rest of the columns as values

print(data)

Explanation:

Dictionary Structure: Here, the first column is used as the key and the rest of the columns as the values.
Skipping the Header: By setting min_row=2, the code skips the header row and starts reading from the second row.

Example 6: Writing Data to a New Excel File

Finally, you can also modify and write data back into an Excel file or even create a new one:

import openpyxl

# Create a new workbook and sheet
new_workbook = openpyxl.Workbook()
new_sheet = new_workbook.active

# Write some data to the new sheet
data = [("Name", "Age", "City"), ("Alice", 30, "New York"), ("Bob", 25, "Los Angeles")]

for row in data:
    new_sheet.append(row)

# Save the workbook
new_workbook.save("new_excel_file.xlsx")

Explanation:

Workbook(): Creates a new workbook.
sheet.append(): Appends rows of data to the sheet.
save(): Saves the new workbook to a file.

Create a Program to Extract Data from Excel Files Using Openpyxl in Python

Read More

The Top Health Trends of 2024: What You Need to Know!

Write a Script to Automate Data Backup from One Folder to Another in Python

Leave a Reply Cancel reply

Recent Posts

Recent Comments

Popular Posts

Emerging Trends in Artificial Intelligence and Machine Learning

15 Most Valuable Aerospace Companies In The World

The Increasing Role of Artificial Intelligence in Healthcare

Explore Topics

Press ESC to close

Create a Program to Extract Data from Excel Files Using Openpyxl in Python

Read More

The Top Health Trends of 2024: What You Need to Know!

Write a Script to Automate Data Backup from One Folder to Another in Python

Leave a Reply Cancel reply

Recent Posts

Recent Comments

Popular Posts

Emerging Trends in Artificial Intelligence and Machine Learning

15 Most Valuable Aerospace Companies In The World

The Increasing Role of Artificial Intelligence in Healthcare

Explore Topics