mcsp-algorithms-0.1.0: Algorithms for Minimum Common String Partition (MCSP) in Haskell.
Safe Haskell: Safe-Inferred
Language: GHC2021

MCSP.Data.String.Extra

Description

Custom operations for String.


Partition operations

type Partition a = [String a]

A collection of substrings of the same string.

chars :: String a -> Partition a

O(n) Split the string into substrings of one character each.

>>> chars "abcd"
[a,b,c,d]
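A minimal sketch of the idea, assuming plain Haskell lists stand in for the library's own `String a` type from `MCSP.Data.String` (the names below are illustrative, not the library's implementation). Each element becomes a singleton substring, so concatenating the partition gives back the original string.

```haskell
-- Assumption: a plain list [a] models the library's String a.
type Partition a = [[a]]

-- | Split a string into substrings of exactly one element each. O(n).
chars :: [a] -> Partition a
chars = map (: [])
```

Under this model, `chars "abcd"` evaluates to `["a","b","c","d"]`, and `concat (chars s) == s` holds for any `s`.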

Character set analysis

alphabet :: Ord a => String a -> Set a

O(n lg n) The set of all characters in a string.

>>> alphabet "aabacabd"
fromList "abcd"
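One plausible way to obtain the stated O(n lg n) bound is to insert every character into a balanced search tree, which is what `Data.Set.fromList` does. This is a sketch over plain lists (an assumption standing in for the library's `String a`), not necessarily how the library implements `alphabet`.

```haskell
import Data.Set (Set)
import qualified Data.Set as Set

-- Assumption: a plain list [a] models the library's String a.
-- | Distinct characters of a string: n insertions into a balanced tree.
alphabet :: Ord a => [a] -> Set a
alphabet = Set.fromList
```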

occurrences :: Ord a => String a -> Map a Int

O(n lg n) The frequency count of each character in a string.

>>> occurrences "aabacabd"
fromList [('a',4),('b',2),('c',1),('d',1)]
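A frequency count can be sketched by pairing each character with `1` and merging duplicate keys with `(+)`. The sketch assumes plain lists in place of the library's `String a` and uses `Data.Map.Strict.fromListWith`; the library may count differently.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Assumption: a plain list [a] models the library's String a.
-- | Count each character: every (c, 1) pair is merged into the map,
-- summing the counts of equal keys. O(n lg n).
occurrences :: Ord a => [a] -> Map a Int
occurrences = Map.fromListWith (+) . map (\c -> (c, 1))
```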

singletons :: Ord a => String a -> Set a

O(n lg n) The set of characters that occur exactly once in a string.

>>> singletons "aabacabd"
fromList "cd"
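Singletons follow directly from a frequency count: keep the keys whose count is 1. This sketch assumes plain lists model the library's `String a` and defines its own `occurrences` helper; it illustrates the relationship rather than the library's actual code.

```haskell
import Data.Set (Set)
import qualified Data.Map.Strict as Map

-- Assumption: a plain list [a] models the library's String a.
-- | Frequency count of each character. O(n lg n).
occurrences :: Ord a => [a] -> Map.Map a Int
occurrences = Map.fromListWith (+) . map (\c -> (c, 1))

-- | Characters that occur exactly once: filter the counts, keep the keys.
singletons :: Ord a => [a] -> Set a
singletons = Map.keysSet . Map.filter (== 1) . occurrences
```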

repeated :: Ord a => String a -> Set a

O(n lg n) The set of characters that occur more than once in a string.

>>> repeated "aabacabd"
fromList "ab"
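An alternative route to the same O(n lg n) bound sorts the string and groups equal neighbours; any group with at least two elements marks a repeated character. This sort-and-group technique is swapped in for illustration and is not claimed to be the library's method; plain lists again stand in for `String a`.

```haskell
import Data.List (group, sort)
import Data.Set (Set)
import qualified Data.Set as Set

-- Assumption: a plain list [a] models the library's String a.
-- | Characters occurring more than once. Sorting costs O(n lg n); the
-- pattern (c : _ : _) only matches groups of length >= 2. Group heads of a
-- sorted list are strictly ascending, so fromDistinctAscList is safe here.
repeated :: Ord a => [a] -> Set a
repeated str = Set.fromDistinctAscList [c | (c : _ : _) <- group (sort str)]
```

Under this model, `repeated s` and the set of characters occurring exactly once are disjoint, and together they form the full set of distinct characters of `s`.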

hasOneOf :: Ord a => String a -> Set a -> Bool

O(n lg m) Check whether at least one character of the string is present in the given set, where n is the length of the string and m is the size of the set.

>>> import Data.Set (fromList)
>>> hasOneOf "abca" (fromList "bdf")
True
>>> import Data.Set (fromList)
>>> hasOneOf "xxx" (fromList "bdf")
False
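The O(n lg m) bound matches one membership test (O(lg m)) per character of the string. A sketch over plain lists, assumed here in place of the library's `String a`:

```haskell
import Data.Set (Set)
import qualified Data.Set as Set

-- Assumption: a plain list [a] models the library's String a.
-- | True if any character of the string is a member of the set.
-- `any` is lazy, so it stops at the first character found in the set.
hasOneOf :: Ord a => [a] -> Set a -> Bool
hasOneOf str set = any (`Set.member` set) str
```

Because `any` short-circuits, the bound is a worst case: a match near the front of the string returns early.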

Substring analysis

longestCommonSubstring :: Ord a => String a -> String a -> Maybe (String a)

O(?) Extract the longest string that is a substring of both strings.

Returns Just the lexicographically largest of the maximal substrings, or Nothing if the strings have no characters in common.

>>> longestCommonSubstring "ABABC" "ABCBA"
Just ABC
>>> longestCommonSubstring "13" "1400"
Just 1
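The complexity of the library's own algorithm is not stated above, so the following is only a sketch using the textbook dynamic programme, assuming plain lists stand in for `String a`. It fills an n by m table row by row, where each cell holds the length of the common run ending at that pair of positions, and keeps the best candidate ordered by (length, string) so that ties go to the lexicographically largest substring.

```haskell
import Data.List (foldl')

-- Assumption: a plain list [a] models the library's String a.
-- | Longest common substring by the classic O(n * m) dynamic programme.
-- Ties in length are broken by taking the lexicographically largest one.
longestCommonSubstring :: Ord a => [a] -> [a] -> Maybe [a]
longestCommonSubstring xs ys
  | bestLen == 0 = Nothing
  | otherwise = Just best
  where
    (_, (bestLen, best)) = foldl' step (map (const 0) ys, (0, [])) (zip [0 ..] xs)

    -- prevRow holds, for each position j of ys, the length of the common
    -- run ending at the previous element of xs and at ys[j].
    step (prevRow, acc) (i, x) =
      let -- Diagonal predecessor of cell j is prevRow[j - 1] (0 for j = 0).
          row = zipWith (\y diag -> if x == y then diag + 1 else 0) ys (0 : prevRow)
          -- A run of length k ending at index i of xs is xs[i - k + 1 .. i].
          cands = [(k, take k (drop (i - k + 1) xs)) | k <- row, k > 0]
       in (row, maximum (acc : cands))
```

Comparing `(length, substring)` pairs with `maximum` keeps the longest run and, among equally long runs, the lexicographically largest, matching the tie-breaking rule described above.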