Photo by rawpixel.com
The openpyxl library is powerful, providing a range of functionality to interact with Excel files, such as reading data, writing data, and even formatting cells. Below, I’ll present more examples that extend the basic idea of data extraction and explain everything step-by-step.
Prerequisite: Installing openpyxl
Before using the openpyxl library, you need to ensure it is installed. You can install it via pip:
pip install openpyxl
Now, let’s look at various examples.
Example 1: Reading a Specific Range of Cells
If you’re working with a large Excel sheet and you want to extract data from a specific range, you can do it easily using openpyxl. Here’s how:
import openpyxl
# Load the workbook and sheet
file_path = 'your_excel_file.xlsx'
workbook = openpyxl.load_workbook(file_path)
sheet = workbook.active
# Define the range of cells (e.g., A1 to C5)
for row in sheet['A1:C5']:
for cell in row:
print(cell.value, end=" ") # Print each cell value
print() # Print a new line after each row
Explanation:
- Range A1:C5: This defines a specific section of the sheet from column A to C and rows 1 to 5.
- cell.value: Extracts the value from each cell in the specified range.
- end=” “: Ensures that values in the same row are printed on the same line.
This method is useful when you’re only interested in a portion of your Excel sheet, as it helps reduce the processing time for large files.
Example 2: Reading Data by Columns
Sometimes, you might want to extract data column-wise rather than row-wise. Here’s how you can do it:
import openpyxl
# Load the workbook and sheet
file_path = 'your_excel_file.xlsx'
workbook = openpyxl.load_workbook(file_path)
sheet = workbook.active
# Loop through each column (e.g., A, B, C)
for col in sheet.iter_cols(min_col=1, max_col=3, values_only=True):
print(col) # Print the entire column
Explanation:
- iter_cols(): This method iterates over columns rather than rows. It’s similar to iter_rows(), but it works column-wise.
- min_col and max_col: These arguments define the range of columns to extract. In this example, it extracts columns A (1) to C (3).
Example 3: Reading Data from a Specific Sheet
If your Excel workbook has multiple sheets, you might want to read data from a particular sheet. Here’s how you can specify the sheet:
import openpyxl
# Load the workbook
file_path = 'your_excel_file.xlsx'
workbook = openpyxl.load_workbook(file_path)
# Select a specific sheet by name
sheet = workbook['Sheet2']
# Print all data from Sheet2
for row in sheet.iter_rows(values_only=True):
print(row)
Explanation:
- workbook[‘Sheet2’]: This selects a sheet by its name. Make sure the sheet name exists in your workbook.
- values_only=True: This argument ensures that only cell values are returned, without additional metadata like cell formatting.
Example 4: Extracting Data from Merged Cells
In Excel files, some cells might be merged. Here’s how you can handle merged cells while reading data:
import openpyxl
# Load the workbook
file_path = 'your_excel_file.xlsx'
workbook = openpyxl.load_workbook(file_path)
sheet = workbook.active
# Extract values from merged cells
for merged_range in sheet.merged_cells.ranges:
print(f'Merged Range: {merged_range}')
first_cell = sheet.cell(row=merged_range.min_row, column=merged_range.min_col)
print(f'Value of the merged cells: {first_cell.value}')
Explanation:
- sheet.merged_cells.ranges: This returns the ranges of merged cells.
- first_cell.value: Extracts the value from the first cell of the merged range. Usually, merged cells only contain data in the first cell.
You may also Like : Cybersecurity in the AI and Cloud Era!
Example 5: Converting Excel Data to a Python Dictionary
A common use case is converting Excel data into a more Pythonic structure like a dictionary for further analysis. Here’s how to do that:
import openpyxl
# Load the workbook and sheet
file_path = 'your_excel_file.xlsx'
workbook = openpyxl.load_workbook(file_path)
sheet = workbook.active
# Convert Excel data to a dictionary
data = {}
for row in sheet.iter_rows(min_row=2, values_only=True): # Skip header
key = row[0] # Assuming the first column is the key
data[key] = row[1:] # Rest of the columns as values
print(data)
Explanation:
- Dictionary Structure: Here, the first column is used as the key and the rest of the columns as the values.
- Skipping the Header: By setting min_row=2, the code skips the header row and starts reading from the second row.
Example 6: Writing Data to a New Excel File
Finally, you can also modify and write data back into an Excel file or even create a new one:
import openpyxl
# Create a new workbook and sheet
new_workbook = openpyxl.Workbook()
new_sheet = new_workbook.active
# Write some data to the new sheet
data = [("Name", "Age", "City"), ("Alice", 30, "New York"), ("Bob", 25, "Los Angeles")]
for row in data:
new_sheet.append(row)
# Save the workbook
new_workbook.save("new_excel_file.xlsx")
Explanation:
- Workbook(): Creates a new workbook.
- sheet.append(): Appends rows of data to the sheet.
- save(): Saves the new workbook to a file.
Leave a Reply