Software for finding similarities in text files (7)

1 Name: #!/usr/bin/anonymous : 2007-07-21 19:44 ID:y1qJwSM/

I suppose this could be more of a general /tech/ question, but since most of the uses for something like this (mine included) would probably be for searching code, I'll ask here. I'm looking for a Windows program that can compare two text files and pick out the similarities rather than the differences. I've got two source files, each about 5 MB in size, from entirely different programs, but I believe certain chunks of code are shared between the two and I'd like to find out what they are. All the comparison software I've found so far is way too difference-oriented, so finding similarities means scrolling through the file and looking for everything that isn't highlighted, which with a 5 MB file is more than a bit time-consuming.

2 Name: mindkiller!m67YyQEVyE : 2007-07-21 22:27 ID:7cIB1W37

Most operating systems have comparison tools built in. On Windows the comparison command is simply "comp", IIRC. So check comp's options; it may be able to do a positive comparison.

actually it might have been "compare" not "comp"... try both!

3 Name: #!/usr/bin/anonymous : 2007-07-21 23:35 ID:y1qJwSM/

>>2
yeah, there's both "comp" and "fc" in Windows, but they only seem to report differences.

4 Name: dmpk2k!hinhT6kz2E : 2007-07-22 05:43 ID:Heaven

Give this a spin: http://search.cpan.org/~kim/Text-Same-0.06/bin/psame

If that's insufficient, consider starting your search with things like "Smith-Waterman", "Needleman-Wunsch", "Levenshtein distance", "sequence alignment".

I don't think you'll have much luck with larger files though; it's a hard problem.
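To give a feel for what those search terms lead to, here is a minimal sketch of Smith-Waterman local alignment in Python. The function name and the scoring values (match=2, mismatch=-1, gap=-1) are assumptions picked for illustration, not anything taken from psame. It fills the full score matrix, so memory and time grow with len(a) * len(b); that is fine for short snippets but not something to run on two whole 5 MB files at once.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (score, piece_of_a, piece_of_b) for the best local alignment.

    Scoring values are illustrative assumptions, not tuned for source code.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; a cell never drops below 0, which is what makes
    # the alignment local rather than global.
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score reaches 0.
    i, j = best_pos
    end_a, end_b = i, j
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, a[i:end_a], b[j:end_b]
```

Calling smith_waterman("xxhelloyy", "qqhellozz") picks out the shared "hello" from both strings. The same idea works on lists of lines instead of characters if you change how the matched pieces are returned.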

5 Name: #!/usr/bin/anonymous : 2007-07-23 06:02 ID:DZBALQEi

Did you try Araxis Merge?
Usually it does the job very well.

A 30 day trial should be available here
http://www.araxis.com/merge/index.html

6 Name: #!/usr/bin/anonymous : 2007-07-23 15:23 ID:Heaven

a perl script to invert the output of diff, lol
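Joke or not, the idea is workable. A rough sketch in Python rather than Perl, using the standard difflib module: instead of parsing diff output, ask the matcher for its "equal" runs directly. The function name shared_blocks and the min_lines cutoff are made up for this example. Two caveats: like diff, it only finds shared runs that appear in the same order in both files, so code that was moved around can be missed, and SequenceMatcher can get slow on very large inputs.

```python
import difflib


def shared_blocks(a_lines, b_lines, min_lines=2):
    """Return (index_in_a, index_in_b, lines) for each run of lines
    that a diff would leave unmarked, i.e. the parts both inputs share.

    min_lines is an arbitrary cutoff to skip trivial one-line matches.
    """
    # autojunk=False turns off the heuristic that ignores very common
    # lines, which would otherwise hide matches in big source files.
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    blocks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal" and i2 - i1 >= min_lines:
            blocks.append((i1, j1, a_lines[i1:i2]))
    return blocks
```

Feed it the two files as lists of lines (for example from open(path).read().splitlines()) and print each block with its starting line numbers.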

7 Name: #!/usr/bin/anonymous : 2007-07-24 09:46 ID:Heaven

something akin to sort | uniq -d might suit you, depending on whether by "similarities" you mean "identical lines" or "lines that are merely similar". your options for general purpose text processing are kinda limited on windows though (unless you're into random vb freeware/shareware or don't mind writing your own code from scratch)
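If you go the write-it-yourself route for the "identical lines" case, it is only a few lines of Python. This is a sketch, and the function name common_lines is made up. Note one difference from sort | uniq -d on the two files concatenated: that pipeline also reports a line repeated inside a single file, while this version only reports lines present in both. Whitespace is stripped so that indentation differences don't hide a match, and blank lines are skipped.

```python
def common_lines(a_lines, b_lines):
    """Return the distinct non-blank lines of a_lines that also occur
    in b_lines, in the order they first appear in a_lines."""
    b_set = {line.strip() for line in b_lines}
    seen = set()
    result = []
    for line in a_lines:
        key = line.strip()
        # Skip blanks, lines missing from the second file, and repeats.
        if key and key in b_set and key not in seen:
            seen.add(key)
            result.append(key)
    return result
```

Order and context are thrown away here, so this tells you which lines are shared but not whether they form a contiguous chunk of code.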
