The CANTUS Database

Responsory Series: Advent and Lent

3. Comparative Methods

3.1 Edit Distance

The default "compare method" is Edit Distance.
"Edit Distance" is the minimum number of insertions, deletions, or substitutions needed to turn one string into another; it is therefore a measure of dissimilarity. In this database, records returned with an Edit Distance of zero are exact matches, and those with higher figures are presumably less closely affiliated.

3.1.i Examples of Edit Distance Comparison
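The definition above can be illustrated with a minimal Python sketch. It assumes the standard Levenshtein formulation with a cost of 1 for each insertion, deletion, or substitution, and it uses hypothetical strings in which each letter stands for one chant. The function name edit_distance is an illustrative choice and is not taken from the database software.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions, or substitutions
    needed to turn sequence a into sequence b (unit costs assumed)."""
    # prev[j] = distance between the items of a seen so far (minus one)
    # and the first j items of b
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning i items of a into an empty sequence: i deletions
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute, or keep a match
        prev = curr
    return prev[-1]


# Hypothetical series; each letter stands for one chant.
print(edit_distance("abcde", "abcde"))  # 0 (exact match)
print(edit_distance("abcde", "abXde"))  # 1 (one substitution)
print(edit_distance("abcde", "abde"))   # 1 (one deletion)
print(edit_distance("abcde", "bacde"))  # 2 (two chants exchanged)
```

Because the function works on any sequence, a list of chant identifiers could be passed in place of a string.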

3.2 LCS

"LCS" = Longest Common Subsequence; this is a measure of similarity. The items of the common subsequence must occur in the same order in both series, but they do not need to be contiguous. For the purposes of this database, this method may prove useful when working with fragmented or partly illegible sources.

3.2.i Examples of LCS Comparison
For the two strings "abcdefg" and "a23cd4e567" the LCS is "acde" (length of 4).
"abcde" compared to "abe" has an LCS of "abe" (length of 3).
"abcde" versus "ab12ce" has an LCS of "abce" (length of 4).
A note of warning: Because the number returned for LCS is the length of the longest common subsequence between the two series in question, this number will vary depending on the lengths of the original series. For example, the self-similarity of a series with nine responsories would be eleven (counting the beginning and ending indications), while the self-similarity of a series of twelve responsories would be fourteen; the longer series would therefore appear more similar to itself than the shorter series is to itself. Owing to these values, LCS will only provide appropriate results in the similarity matrix and dendrogram functions when chant series of the same original length are compared.
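The examples and the warning above can be checked with a short Python sketch. It assumes the usual dynamic-programming approach to the longest common subsequence; the function name lcs and the chant labels R1, R2, and so on are hypothetical, and the sketch is not taken from the database software.

```python
def lcs(a, b):
    """Return one longest common subsequence of a and b as a list."""
    n, m = len(a), len(b)
    # table[i][j] = length of the LCS of a[i:] and b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the top-left corner to recover one subsequence.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


print("".join(lcs("abcdefg", "a23cd4e567")))  # acde
print("".join(lcs("abcde", "ab12ce")))        # abce

# Self-similarity grows with the length of the series.
nine = ["Start"] + [f"R{k}" for k in range(1, 10)] + ["fine"]
twelve = ["Start"] + [f"R{k}" for k in range(1, 13)] + ["fine"]
print(len(lcs(nine, nine)))      # 11
print(len(lcs(twelve, twelve)))  # 14
```

The last two lines reproduce the figures of eleven and fourteen given in the warning: the raw LCS length depends on how long the series are, not only on how well they agree.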

3.3 Matches

The number used in calculations based on "Matches" is the number of chants that are common to both series, regardless of their order.

3.3.i Example of Matches Comparison
For the two strings "abcdefg" and "a2b3d4e567c" the "Matches" result would be 5 (the shared items are "abdec", listed here in the order of the second string).
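A minimal Python sketch of this count follows. It assumes that no chant is repeated within a single series, so that a set intersection gives the number of shared chants; the function names are hypothetical and are not taken from the database software.

```python
def matches(a, b):
    """Number of distinct items common to both series, ignoring order.
    Assumes no item is repeated within a series."""
    return len(set(a) & set(b))


def shared_items(a, b):
    """The shared items, listed in the order in which they occur in b."""
    in_a = set(a)
    return [x for x in b if x in in_a]


print(matches("abcdefg", "a2b3d4e567c"))                # 5
print("".join(shared_items("abcdefg", "a2b3d4e567c")))  # abdec
```

Unlike Edit Distance and LCS, the result is the same whichever order the chants appear in.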

3.4 Matches/Pairs

This is the method employed by Hesbert in CAO vol. 5. The calculations are based on: 1) chants which are common to both series (i.e., the number of "matches"), and 2) pairings of adjacent chants within the ordering of each chant series.

3.4.i Example of Matches/Pairs Comparison
"Start-A-B-C-D-fine" in one source compared to "Start-B-A-C-D-fine" in another source would result in 4 matches (A, B, C, and D) and 2 pairs (C-D and D-fine, the only adjacent pairings that occur in the same order in both sources). Note that the indications of "Start" and "fine" allow for comparison of the ordering of the chants at the beginnings and ends of series.
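The two counts in this example can be reproduced with the Python sketch below. It rests on assumptions that are not stated on this page: a "pair" is taken to be two adjacent entries occurring in the same order in both series, the "Start" and "fine" indications take part in pairs but are not counted as matches, and no chant is repeated within a series. The function name is hypothetical, and the sketch does not show how the two counts are combined into a single figure.

```python
def matches_and_pairs(a, b, markers=("Start", "fine")):
    """Return (matches, pairs) for two series given as lists.

    Assumptions: markers are excluded from the match count, and a pair
    is two adjacent entries found in the same order in both series.
    """
    chants_a = {x for x in a if x not in markers}
    chants_b = {x for x in b if x not in markers}
    n_matches = len(chants_a & chants_b)

    # Adjacent ordered pairs, including those involving the markers.
    pairs_a = set(zip(a, a[1:]))
    pairs_b = set(zip(b, b[1:]))
    n_pairs = len(pairs_a & pairs_b)
    return n_matches, n_pairs


first = "Start-A-B-C-D-fine".split("-")
second = "Start-B-A-C-D-fine".split("-")
print(matches_and_pairs(first, second))  # (4, 2)
```

Under these assumptions the pairs shared by the two sources are C-D and D-fine; the pairing A-B in the first source does not count, because the second source has B-A.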

Last update of this page: 29 July 2009.
Contains software or other intellectual property copyright © 2007-2009, Debra Lacoste and Gerard Stafleu.