
Write a Program to Generate Reports from CSV Files in Python


Generating a report from a CSV file comes down to three things: reading the file into memory, processing the data, and producing summary statistics or other targeted output. In Python, the pandas library is well suited to this because of its data manipulation capabilities.

Below is an example program to read a CSV file, perform some basic analysis, and generate a report.

Steps:

  1. Reading CSV Files: Use the pandas library to load CSV data.
  2. Analyzing the Data: Perform operations like counting, summing, grouping, etc.
  3. Generating Reports: Create and display or save the results to another file.

Example Program

import pandas as pd

def generate_report_from_csv(filename):
    """
    Reads a CSV file, analyzes the data, and generates a report.
    """
    try:
        # Load the CSV file into a pandas DataFrame
        data = pd.read_csv(filename)

        # Display basic information about the dataset
        # (info() prints its own output and returns None, so it is not wrapped in print())
        print("Basic Info:")
        data.info()
        print("\n")

        # Example Analysis 1: Summary statistics
        print("Summary Statistics:")
        summary_stats = data.describe()
        print(summary_stats)
        print("\n")

        # Example Analysis 2: Count of non-null values in each column
        print("Non-null counts per column:")
        non_null_counts = data.count()
        print(non_null_counts)
        print("\n")

        # Example Analysis 3: Grouping and aggregation (if the data has a categorical column)
        if 'Category' in data.columns:
            print("Category-wise Summary:")
            # numeric_only=True sums numeric columns and skips text columns
            category_summary = data.groupby('Category').sum(numeric_only=True)
            print(category_summary)
            print("\n")

        # Example Analysis 4: Correlations between numeric columns only
        # (text columns make corr() raise an error on pandas 2.0 and later)
        print("Correlation Matrix:")
        correlation_matrix = data.select_dtypes(include='number').corr()
        print(correlation_matrix)
        print("\n")

        # Example: Saving the report to a file (optional)
        report_filename = 'report_summary.csv'
        summary_stats.to_csv(report_filename)
        print(f"Summary statistics saved to {report_filename}")

    except FileNotFoundError:
        print(f"Error: The file {filename} does not exist.")
    except pd.errors.EmptyDataError:
        print(f"Error: The file {filename} is empty.")
    except pd.errors.ParserError:
        print(f"Error: The file {filename} could not be parsed.")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")

# Example usage
if __name__ == "__main__":
    # Replace 'example_file.csv' with the path to your actual CSV file
    generate_report_from_csv('example_file.csv')
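
To try the program without an existing data file, you can first write a small CSV to disk. The sketch below is one way to do that; the column names (Category, Amount, Units) and the values are made up for illustration and are not part of the original program.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; replace with your own columns and values.
sample = pd.DataFrame({
    "Category": ["A", "B", "A", "B"],
    "Amount": [10.0, 20.0, 30.0, 40.0],
    "Units": [1, 2, 3, 4],
})

# Write the sample to a temporary directory so nothing in the working
# directory is overwritten.
csv_path = os.path.join(tempfile.mkdtemp(), "example_file.csv")
sample.to_csv(csv_path, index=False)  # index=False omits the row-number column

# Read it back the same way the program does.
loaded = pd.read_csv(csv_path)
print(loaded.head())
```

Passing csv_path to generate_report_from_csv() should then exercise each section of the report, including the category summary.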

Explanation:

  • pandas: The core of this program is built around pandas, which is a powerful library for working with tabular data. pandas.read_csv() is used to load the CSV data into a DataFrame.
  • Basic Info: The .info() method prints an overview of the data, including column names, data types, and non-null counts.
  • Summary Statistics: The .describe() method gives descriptive statistics for numeric columns (count, mean, standard deviation, minimum, quartiles, and maximum).
  • Non-null Counts: The .count() method returns the number of non-null entries for each column.
  • Grouping and Aggregation: If the CSV contains a categorical column like ‘Category’, the .groupby() method can be used to generate summary reports by category.
  • Correlation Matrix: The .corr() method calculates the correlation between numeric columns, providing insights into potential relationships between variables.
  • Error Handling: The program handles several common issues, such as missing files, empty files, or parsing errors.
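
To make the grouping and correlation points concrete, here is a minimal sketch on a small in-memory table. The column names (Category, Amount, Units, Note) and the values are hypothetical. Restricting both steps to numeric columns is an assumption about what a typical report needs; it also keeps text columns from causing errors on recent pandas versions.

```python
import pandas as pd

# Hypothetical sample data; the column names and values are illustrative only.
data = pd.DataFrame({
    "Category": ["A", "B", "A", "B"],
    "Amount": [10.0, 20.0, 30.0, 40.0],
    "Units": [1, 2, 3, 4],
    "Note": ["x", "y", "z", "w"],
})

# Keep only numeric columns so text columns such as "Note" are not aggregated.
numeric_columns = data.select_dtypes(include="number").columns

# Sum each numeric column within each category.
category_summary = data.groupby("Category")[list(numeric_columns)].sum()

# Pairwise Pearson correlation between the numeric columns.
correlation_matrix = data[numeric_columns].corr()

print(category_summary)
print(correlation_matrix)
```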


Additional Enhancements:

  1. Customizable Reports: You can modify the logic to create specific reports, such as calculating totals, averages, or other statistics for specific columns.
  2. Exporting Reports: The generated report can be saved to a new CSV, Excel, or text file using pandas methods such as .to_csv() or .to_excel() (writing Excel files requires an additional engine such as openpyxl).
  3. Filtering: You can add filtering options to generate reports based on specific conditions.
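
The three enhancements above can be combined in a few lines. The following is a sketch under stated assumptions, not a definitive implementation: the function name build_category_report, the min_amount parameter, and the Category and Amount columns are all hypothetical, and the sample data is read from a string so the example runs on its own.

```python
import io

import pandas as pd


def build_category_report(data, min_amount=0.0):
    """Filter rows by a minimum Amount, then summarize Amount per Category.

    The "Category" and "Amount" column names are assumptions for this sketch.
    """
    # Filtering: keep only rows that meet the condition.
    filtered = data[data["Amount"] >= min_amount]

    # Customizable report: named aggregations give total, average, and row count.
    report = filtered.groupby("Category").agg(
        total_amount=("Amount", "sum"),
        average_amount=("Amount", "mean"),
        row_count=("Amount", "count"),
    )
    return report


# Hypothetical sample data standing in for a real CSV file.
csv_text = "Category,Amount\nA,10\nA,30\nB,5\nB,45\n"
data = pd.read_csv(io.StringIO(csv_text))

report = build_category_report(data, min_amount=10)

# Exporting: to_csv() returns the CSV text when no path is given;
# pass a filename such as "category_report.csv" to write a file instead.
report_csv = report.to_csv()
print(report_csv)
```

Returning a DataFrame from the function, rather than printing inside it, leaves the choice of output (console, CSV, Excel) to the caller.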

Installation:

If you don’t have pandas installed, you can install it via pip:

pip install pandas

This basic framework can be adapted to various CSV report generation needs.
