String Metrics: Jaro-Winkler, Levenshtein, Damerau, and 4 more

Muhammad Zeeshan



String Similarity and Distance

This tool calculates the distance and similarity between two strings using several algorithms.

Supported Algorithms

Jaro-Winkler

Levenshtein

Normalized-Levenshtein

Damerau

Optimal-String-Alignment

Longest-Common-Subsequence

Metric-Longest-Common-Subsequence

Registration: No Registration

Logging: No Logging

Algorithms: 7

Jaro-Winkler

The Jaro-Winkler metric measures the similarity between two strings. It builds on the Jaro similarity, introduced by Matthew A. Jaro, which is computed from the number of matching characters and the number of transpositions among them. William E. Winkler's modification adds a bonus for strings that share a common prefix. The result ranges from 0 (no similarity) to 1 (identical strings), and the corresponding distance is 1 minus the similarity. It is especially effective for comparing short strings, such as names, and is widely used in record linkage.
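The computation described above can be sketched as follows. This is a minimal illustration and not the tool's own implementation: the function names are hypothetical, and the prefix scale of 0.1 and maximum prefix length of 4 are the commonly used defaults, assumed here.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and not too far apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2  # each transposition involves two characters
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity plus a bonus for a shared prefix (assumed defaults)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

The name pair MARTHA and MARHTA has six matching characters and one transposition, giving a Jaro similarity of about 0.9444, which the three-character shared prefix raises to about 0.9611.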

Levenshtein

The Levenshtein distance is a string metric for measuring the difference between two sequences. It is defined as the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into the other. It is used in fields like computer science, bioinformatics, and linguistics for tasks such as spell checking and DNA sequence analysis.
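A minimal sketch of this definition, using the standard dynamic programming approach with two rows of the edit table. The function name is illustrative, and unit cost for every edit is assumed.

```python
def levenshtein(s1: str, s2: str) -> int:
    """Minimum number of insertions, deletions, and substitutions."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(s1) < len(s2):
        s1, s2 = s2, s1
    previous = list(range(len(s2) + 1))  # distances from "" to prefixes of s2
    for i, c1 in enumerate(s1, start=1):
        current = [i]  # distance from s1[:i] to ""
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            current.append(min(
                previous[j] + 1,         # delete c1
                current[j - 1] + 1,      # insert c2
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Turning "kitten" into "sitting" takes three edits: substitute k with s, substitute e with i, and insert g.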

Normalized-Levenshtein

Normalized Levenshtein distance is a variation of the Levenshtein distance where the result is scaled to a value between 0 and 1. The scaling divides the Levenshtein distance by the length of the longer of the two strings, so 0 means the strings are identical and 1 means they have nothing in common. The corresponding similarity is 1 minus this distance. Because the value is relative to string length, results for string pairs of different lengths can be compared directly.
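The normalization can be sketched as below. The function names are illustrative, and returning 0.0 for two empty strings is an assumption made here to avoid dividing by zero.

```python
def levenshtein(s1: str, s2: str) -> int:
    """Plain Levenshtein distance with unit edit costs."""
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def normalized_levenshtein(s1: str, s2: str) -> float:
    """Levenshtein distance divided by the longer string's length."""
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 0.0  # assumption: two empty strings are identical
    return levenshtein(s1, s2) / longest


distance = normalized_levenshtein("kitten", "sitting")
print(round(distance, 4))      # 0.4286
print(round(1 - distance, 4))  # 0.5714 (similarity)
```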

Damerau

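The Damerau-Levenshtein distance is a string metric that extends the Levenshtein distance with one more elementary operation: in addition to insertions, deletions, and substitutions, it counts the transposition of two adjacent characters as a single edit. In its unrestricted form, the same part of a string may be edited more than once, for example transposed and then have a character inserted between the swapped pair. Because swapped adjacent letters are a frequent typing mistake, it is particularly useful for spell-checking, and it is also applied in DNA analysis.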
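A sketch of the unrestricted Damerau-Levenshtein distance, following the well-known dynamic programming formulation that remembers where each character was last seen. The function and variable names are illustrative, and unit cost for every operation is assumed.

```python
def damerau_levenshtein(s1: str, s2: str) -> int:
    """Unrestricted Damerau-Levenshtein distance with unit costs."""
    len1, len2 = len(s1), len(s2)
    max_dist = len1 + len2
    last_row = {}  # for each character, the last row of s1 where it occurred
    # The table has one extra leading row and column holding max_dist,
    # which acts as a sentinel when no earlier transposition partner exists.
    d = [[max_dist] * (len2 + 2) for _ in range(len1 + 2)]
    for i in range(len1 + 1):
        d[i + 1][1] = i
    for j in range(len2 + 1):
        d[1][j + 1] = j
    for i in range(1, len1 + 1):
        last_match_col = 0  # last column in this row where characters matched
        for j in range(1, len2 + 1):
            i1 = last_row.get(s2[j - 1], 0)
            j1 = last_match_col
            cost = 1
            if s1[i - 1] == s2[j - 1]:
                cost = 0
                last_match_col = j
            d[i + 1][j + 1] = min(
                d[i][j] + cost,   # substitute, or keep if equal
                d[i + 1][j] + 1,  # insert
                d[i][j + 1] + 1,  # delete
                # transpose, plus edits between the two swapped characters
                d[i1][j1] + (i - i1 - 1) + 1 + (j - j1 - 1),
            )
        last_row[s1[i - 1]] = i
    return d[len1 + 1][len2 + 1]


print(damerau_levenshtein("ab", "ba"))   # 1
print(damerau_levenshtein("CA", "ABC"))  # 2
```

The second example shows the unrestricted behavior: "CA" becomes "AC" by one transposition and then "ABC" by inserting B between the swapped characters, for a total of two edits.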

Optimal-String-Alignment

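The Optimal String Alignment distance is a restricted variant of the Damerau-Levenshtein distance. It also counts insertions, deletions, substitutions, and transpositions of adjacent characters as single operations, but it adds the condition that no substring may be edited more than once. As a result it is simpler to compute, but it can return a larger value than the unrestricted Damerau-Levenshtein distance and does not always satisfy the triangle inequality. For example, the distance between "CA" and "ABC" is 3 under Optimal String Alignment and 2 under Damerau-Levenshtein. This metric is particularly useful for applications where transpositions are common, such as OCR (optical character recognition) and DNA sequence analysis.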
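A minimal sketch of the Optimal String Alignment computation: it is the Levenshtein table with one extra case for two adjacent characters that appear swapped. The function name is illustrative, and unit costs are assumed.

```python
def osa_distance(s1: str, s2: str) -> int:
    """Optimal String Alignment distance with unit costs."""
    len1, len2 = len(s1), len(s2)
    d = [[0] * (len2 + 1) for _ in range(len1 + 1)]
    for i in range(len1 + 1):
        d[i][0] = i
    for j in range(len2 + 1):
        d[0][j] = j
    for i in range(1, len1 + 1):
        for j in range(1, len2 + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete
                d[i][j - 1] + 1,         # insert
                d[i - 1][j - 1] + cost,  # substitute, or keep if equal
            )
            # Adjacent transposition: the last two characters are swapped.
            if (i > 1 and j > 1
                    and s1[i - 1] == s2[j - 2]
                    and s1[i - 2] == s2[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len1][len2]


print(osa_distance("ab", "ba"))   # 1
print(osa_distance("CA", "ABC"))  # 3
```

The second result is 3 because, once "CA" has been transposed to "AC", this variant cannot edit that same pair again to insert B between the two characters.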

Longest-Common-Subsequence

The Longest Common Subsequence (LCS) algorithm is a dynamic programming approach used to find the longest subsequence common to two sequences. A subsequence keeps elements in the same relative order but does not require them to be contiguous, so gaps between matched elements are allowed. A distance can be derived from the LCS length by counting the characters of both strings that are not part of the common subsequence.
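The dynamic programming idea can be sketched as follows. The function names are illustrative, and the distance shown, the sum of both lengths minus twice the LCS length, is one common way to turn the LCS length into a distance; it is an assumption here, not necessarily what the tool reports.

```python
def lcs_length(s1: str, s2: str) -> int:
    """Length of the longest common subsequence of s1 and s2."""
    previous = [0] * (len(s2) + 1)
    for c1 in s1:
        current = [0]
        for j, c2 in enumerate(s2, start=1):
            if c1 == c2:
                # Extend the best subsequence that ends before both characters.
                current.append(previous[j - 1] + 1)
            else:
                # Skip a character from one string or the other.
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def lcs_distance(s1: str, s2: str) -> int:
    """Number of characters in both strings outside the common subsequence."""
    return len(s1) + len(s2) - 2 * lcs_length(s1, s2)


print(lcs_length("ABCBDAB", "BDCABA"))    # 4
print(lcs_distance("ABCBDAB", "BDCABA"))  # 5
```

This distance equals the number of insertions and deletions needed to turn one string into the other when substitutions are not allowed.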

Metric-Longest-Common-Subsequence

The Metric Longest Common Subsequence distance turns the length of the longest common subsequence into a normalized distance. In the formulation commonly attributed to Daniel Bakkelund's note "An LCS-based string metric", the distance is 1 - |LCS(s1, s2)| / max(|s1|, |s2|), where |LCS(s1, s2)| is the length of the longest common subsequence. The result lies between 0 (identical strings) and 1 (no characters in common), and unlike the raw LCS length it can be compared across string pairs of different lengths.
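A common definition of a metric LCS distance, assumed in this sketch and not taken from the tool's implementation, normalizes the LCS length by the length of the longer string: 1 - |LCS(s1, s2)| / max(|s1|, |s2|). The function names are illustrative, and returning 0.0 for two empty strings is an assumption.

```python
def lcs_length(s1: str, s2: str) -> int:
    """Length of the longest common subsequence of s1 and s2."""
    previous = [0] * (len(s2) + 1)
    for c1 in s1:
        current = [0]
        for j, c2 in enumerate(s2, start=1):
            if c1 == c2:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def metric_lcs_distance(s1: str, s2: str) -> float:
    """1 - |LCS| / max(|s1|, |s2|), a value between 0 and 1."""
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 0.0  # assumption: two empty strings are identical
    return 1.0 - lcs_length(s1, s2) / longest


print(round(metric_lcs_distance("ABCDEFG", "ABCDEFHJKL"), 4))  # 0.4
```

Here the longest common subsequence is "ABCDEF" (length 6) and the longer string has 10 characters, so the distance is 1 - 6/10 = 0.4.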

Note: String metric is based on this project.

Want to get in touch? Feel free to contact me at mzeeshanid@yahoo.com
