Difflib creates unreasonably large diffs #118150

@Sxderp

Bug report

Bug description:

#!/usr/bin/python3

import difflib

def get_lines(filename):
    # Read the file as a list of lines, keeping the line endings.
    with open(filename, 'r', encoding='utf8') as fd:
        return fd.readlines()

# Diff old -> new, then new -> old, and compare the output sizes.
old_new = list(difflib.unified_diff(
    get_lines('external.old.json'),
    get_lines('external.new.json'),
))
new_old = list(difflib.unified_diff(
    get_lines('external.new.json'),
    get_lines('external.old.json'),
))
print('diff -u external.old.json external.new.json | wc -l')
print(len(old_new))
print('diff -u external.new.json external.old.json | wc -l')
print(len(new_old))

$ ./difftest.py
diff -u external.old.json external.new.json | wc -l
30854
diff -u external.new.json external.old.json | wc -l
26
$ diff -u external.old.json external.new.json | wc -l
26
$ diff -u external.new.json external.old.json | wc -l
26

Running on RHEL9 with Python 3.9.18.

I am diffing a JSON file that contains a list of DNS records. When lines are removed from the file, difflib tends to produce very large diffs. When lines are added, the diffs it produces are small.
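One way to narrow this down might be to check whether the "popular element" heuristic in difflib.SequenceMatcher (the autojunk parameter) is involved. That is only an assumption; nothing in this report confirms the cause. The sketch below counts the lines a diff would mark as changed, with the heuristic on or off. changed_line_count is a hypothetical helper name, not part of difflib.

```python
import difflib

def changed_line_count(a, b, autojunk=True):
    """Count the lines a diff of a -> b would mark as removed or added.

    Hypothetical helper for illustration only. autojunk is passed straight
    to difflib.SequenceMatcher; whether it explains the behaviour in this
    report is an assumption to be tested, not an established fact.
    """
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=autojunk)
    count = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != 'equal':
            # Lines dropped from a plus lines introduced from b.
            count += (i2 - i1) + (j2 - j1)
    return count

# Tiny in-memory check: one line removed.
old = ['a\n', 'b\n', 'c\n']
new = ['a\n', 'c\n']
print(changed_line_count(old, new))                  # -> 1
print(changed_line_count(old, new, autojunk=False))  # -> 1
```

Running the same helper on the lines of external.old.json and external.new.json, in both directions and with both autojunk settings, would show whether the asymmetry disappears when the heuristic is disabled. Note that difflib.unified_diff does not take an autojunk argument, so the comparison has to go through SequenceMatcher directly.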

external.old.json
external.new.json

Here are smaller files that reproduce the effect. While trying to shrink them, I noticed that if the files get too small, the diffs come out fine. If you remove the "abcluster.eas.gatech.edu" entries from both files, the diffs work. I could not get the files any smaller.

small.external.old.json
small.external.new.json

CPython versions tested on:

3.9

Operating systems tested on:

Linux

Labels: type-bug (an unexpected behavior, bug, or error)
