Press ESC to close

Create a Program to Sort a Large Dataset Using Merge Sort in Python.

Merge Sort is a highly efficient, comparison-based sorting algorithm that follows the divide-and-conquer approach. It works by recursively dividing the dataset into smaller subsets, sorting those subsets, and then merging them back together in sorted order.

Here’s how you can implement Merge Sort in Python:

Step-by-Step Implementation:

def merge_sort(data):
    if len(data) > 1:
        # Divide the dataset into two halves
        mid = len(data) // 2
        left_half = data[:mid]
        right_half = data[mid:]

        # Recursively sort both halves
        merge_sort(left_half)
        merge_sort(right_half)

        # Merging the sorted halves
        i = j = k = 0

        # Copy data to temp arrays left_half[] and right_half[]
        while i < len(left_half) and j < len(right_half):
            if left_half[i] < right_half[j]:
                data[k] = left_half[i]
                i += 1
            else:
                data[k] = right_half[j]
                j += 1
            k += 1

        # Checking if any element was left in left_half
        while i < len(left_half):
            data[k] = left_half[i]
            i += 1
            k += 1

        # Checking if any element was left in right_half
        while j < len(right_half):
            data[k] = right_half[j]
            j += 1
            k += 1

def main():
    # Example of sorting a large dataset
    import random
    data_size = 100000  # Size of the dataset
    dataset = [random.randint(0, 1000000) for _ in range(data_size)]

    print("Dataset before sorting:")
    print(dataset[:100])  # Display first 100 elements for brevity

    # Call merge_sort function
    merge_sort(dataset)

    print("\nDataset after sorting:")
    print(dataset[:100])  # Display first 100 elements for brevity

if __name__ == "__main__":
    main()

How It Works:

  1. Divide: The merge_sort function first divides the dataset into two halves. It continues to do so recursively until each subset contains only one element.
  2. Conquer: Once the dataset is divided, the algorithm starts merging the subsets. It compares the elements of each subset and arranges them in the correct order during the merge process.
  3. Merge: The merge function ensures that the two halves are combined into a single sorted array.

Explanation:

  • Base Case: If the length of the dataset is 1 or less, the function simply returns since a single element (or an empty list) is already sorted.
  • Recursive Case: The dataset is split into two halves, which are then sorted independently before being merged together.
  • Merge Step: This is where the actual sorting happens. It combines the two halves into a sorted list by comparing the elements from both halves and arranging them in order.

Example Output:

Given the large dataset, the output will be too large to display entirely, so the example above only prints the first 100 elements before and after sorting.

Dataset before sorting:
[23745, 850204, 927422, 936525, 164013, 218073, 647028, ...]

Dataset after sorting:
[45, 102, 136, 275, 325, 502, 508, ...]

Handling Large Datasets:

  • Merge Sort is well-suited for sorting large datasets due to its O(n log n) time complexity.
  • Python’s recursion depth limit may need to be increased for very large datasets, which can be done using sys.setrecursionlimit() if necessary.

This implementation should work efficiently even for large datasets, but you can tweak the dataset size and range to suit your specific needs.

Leave a Reply

Your email address will not be published. Required fields are marked *