New in version 2.1.
Timing: The basic Ratcliff-Obershelp algorithm is cubic time in the worst case and quadratic time in the expected case. SequenceMatcher is quadratic time in the worst case, and its expected-case behavior depends in a complicated way on how many elements the sequences have in common; best-case time is linear.
Each line of a Differ delta begins with a two-letter code:
Code | Meaning |
---|---|
`'- '` | line unique to sequence 1 |
`'+ '` | line unique to sequence 2 |
`'  '` | line common to both sequences |
`'? '` | line not present in either input sequence |
Lines beginning with `'? '` attempt to guide the eye to intraline differences, and were not present in either input sequence. These lines can be confusing if the sequences contain tab characters.
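A minimal sketch (Python 3) of reading these codes back out of a delta: the hypothetical helper `count_codes` tallies the lines produced by `Differ().compare()` by their two-letter prefix. Each line of sequence 1 appears exactly once as either a `'- '` or a `'  '` line, and each line of sequence 2 as either a `'+ '` or a `'  '` line, so those counts add up to the input lengths; `'? '` guide lines count toward neither input.

```python
import difflib
from collections import Counter


def count_codes(delta):
    """Tally Differ output lines by their two-letter prefix code (hypothetical helper)."""
    return Counter(line[:2] for line in delta)


a = ['one\n', 'two\n', 'three\n']
b = ['one\n', 'tree\n', 'emu\n']

delta = list(difflib.Differ().compare(a, b))
counts = count_codes(delta)

# Sequence 1 lines are reported as '- ' (unique) or '  ' (common) ...
print(counts['- '] + counts['  '] == len(a))  # True
# ... and sequence 2 lines as '+ ' (unique) or '  ' (common).
print(counts['+ '] + counts['  '] == len(b))  # True

for code in ('- ', '+ ', '  ', '? '):
    print(repr(code), counts[code])
```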
Optional argument n (default 3) is the maximum number of close matches to return; n must be greater than 0.
Optional argument cutoff (default 0.6) is a float in the range [0, 1]. Possibilities whose similarity score against word is below cutoff are ignored.
The best (no more than n) matches among the possibilities are returned in a list, sorted by similarity score, most similar first.
>>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
['apple', 'ape']
>>> import keyword
>>> get_close_matches('wheel', keyword.kwlist)
['while']
>>> get_close_matches('apple', keyword.kwlist)
[]
>>> get_close_matches('accept', keyword.kwlist)
['except']
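The ranking described above can be sketched in Python 3 with `SequenceMatcher.ratio()` and `heapq.nlargest`. `close_matches_sketch` is a hypothetical name, and this is a simplified illustration rather than the library's own code: it scores every candidate with the full `ratio()`, whereas a real implementation may first apply the cheaper upper bounds `real_quick_ratio()` and `quick_ratio()` to skip hopeless candidates. The example shows how n and cutoff trim the result.

```python
import difflib
import heapq
import keyword


def close_matches_sketch(word, possibilities, n=3, cutoff=0.6):
    """Return up to n possibilities scoring at least cutoff against word, best first."""
    if not n > 0:
        raise ValueError("n must be > 0: %r" % (n,))
    if not 0.0 <= cutoff <= 1.0:
        raise ValueError("cutoff must be in [0.0, 1.0]: %r" % (cutoff,))
    scored = []
    for candidate in possibilities:
        # Full similarity score in [0, 1]; no cheap pre-filtering here.
        score = difflib.SequenceMatcher(None, candidate, word).ratio()
        if score >= cutoff:
            scored.append((score, candidate))
    # Keep the n highest (score, candidate) pairs, most similar first.
    best = heapq.nlargest(n, scored)
    return [candidate for score, candidate in best]


fruits = ['ape', 'apple', 'peach', 'puppy']
print(close_matches_sketch('appel', fruits))               # ['apple', 'ape']
print(close_matches_sketch('appel', fruits, n=1))          # ['apple']
print(close_matches_sketch('appel', fruits, cutoff=0.78))  # ['apple']
print(close_matches_sketch('wheel', keyword.kwlist))
```

Here 'apple' scores 0.8 and 'ape' scores 0.75 against 'appel', which is why a cutoff of 0.78 keeps only 'apple'. Because `heapq.nlargest` compares `(score, candidate)` tuples, candidates with equal scores come out in reverse alphabetical order in this sketch.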
Optional keyword parameters linejunk and charjunk are filter functions (or None):
linejunk: A function that should accept a single string argument, and return true if the string is junk (or false if it is not). The default is module-level function IS_LINE_JUNK(), which filters out lines without visible characters, except for at most one pound character ("#").
charjunk: A function that should accept a string of length 1. The default is module-level function IS_CHARACTER_JUNK(), which filters out whitespace characters (a blank or tab; it is a bad idea to include newline in this set!).
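The default filters can be called directly, and any one-argument predicate can be passed in their place. The sketch below is written for Python 3, where `ndiff()` accepts `linejunk` and `charjunk` as keyword arguments; `blank_line_junk` is a hypothetical filter for illustration. Junk filtering only changes which lines the matcher may use to line the two sequences up; every input line still appears in the delta, so `restore()` recovers both inputs.

```python
import difflib

# The module-level default filters described above.
print(difflib.IS_LINE_JUNK('\n'))        # True: no visible characters
print(difflib.IS_LINE_JUNK('   #  \n'))  # True: a single '#' is allowed
print(difflib.IS_LINE_JUNK('## x\n'))    # False
print(difflib.IS_CHARACTER_JUNK(' '))    # True
print(difflib.IS_CHARACTER_JUNK('\t'))   # True
print(difflib.IS_CHARACTER_JUNK('\n'))   # False


def blank_line_junk(line):
    """Hypothetical linejunk filter: treat whitespace-only lines as junk."""
    return line.strip() == ''


a = ['one\n', '\n', 'two\n']
b = ['one\n', 'two\n', '\n', 'three\n']

# Pass the custom predicate instead of the default line filter.
delta = list(difflib.ndiff(a, b, linejunk=blank_line_junk))
print(''.join(delta), end='')
```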
Tools/scripts/ndiff.py is a command-line front-end to this function.
>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
...              'ore\ntree\nemu\n'.splitlines(1))
>>> print ''.join(diff),
- one
?  ^
+ ore
?  ^
- two
- three
?  -
+ tree
+ emu
Given a sequence produced by Differ.compare() or ndiff(), extract lines originating from file 1 or 2 (parameter which), stripping off line prefixes.
Example:
>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
...              'ore\ntree\nemu\n'.splitlines(1))
>>> diff = list(diff) # materialize the generated delta into a list
>>> print ''.join(restore(diff, 1)),
one
two
three
>>> print ''.join(restore(diff, 2)),
ore
tree
emu
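The extraction described above amounts to a simple filter over the delta: keep the lines unique to the chosen sequence plus the common lines, drop the `'? '` guide lines, and strip the two-character prefix. A minimal Python 3 sketch follows; `restore_sketch` is a hypothetical name, and it builds a list rather than producing lines lazily.

```python
import difflib


def restore_sketch(delta, which):
    """Rebuild sequence 1 or 2 from an ndiff() / Differ.compare() delta."""
    try:
        tag = {1: '- ', 2: '+ '}[which]
    except KeyError:
        raise ValueError('which must be 1 or 2, not %r' % (which,))
    # Lines unique to the chosen sequence, plus lines common to both;
    # '? ' guide lines and the other sequence's lines are skipped.
    return [line[2:] for line in delta if line[:2] in (tag, '  ')]


a = 'one\ntwo\nthree\n'.splitlines(True)
b = 'ore\ntree\nemu\n'.splitlines(True)
delta = list(difflib.ndiff(a, b))

print(restore_sketch(delta, 1))  # ['one\n', 'two\n', 'three\n']
print(restore_sketch(delta, 2))  # ['ore\n', 'tree\n', 'emu\n']
```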
See Also: