scaffold_kit.utils.string_utils
A set of utilities for string manipulation.
This module provides functions for transliterating unicode characters and creating URL-friendly “slugs” from text.
Demo
To run the module’s demonstration code, use the following command:
$ uv run python -m scaffold_kit.utils.string_utils
DIACRITICS_MAP = {'À': 'A', 'Á': 'A', 'Ã': 'A', 'Ä': 'Ae', 'Å': 'A', 'Ā': 'A', 'Ă': 'A', 'Ą': 'A', 'à': 'a', 'á': 'a', 'ã': 'a', 'ä': 'ae', 'å': 'a', 'ā': 'a', 'ă': 'a', 'ą': 'a', 'Ç': 'C', 'Ć': 'C', 'Ĉ': 'C', 'Č': 'C', 'ç': 'c', 'ć': 'c', 'ĉ': 'c', 'č': 'c', 'Ď': 'D', 'Đ': 'D', 'ď': 'd', 'đ': 'd', 'È': 'E', 'É': 'E', 'Ẽ': 'E', 'Ë': 'E', 'Ĕ': 'E', 'Ē': 'E', 'Ě': 'E', 'Ę': 'E', 'è': 'e', 'é': 'e', 'ẽ': 'e', 'ë': 'e', 'ĕ': 'e', 'ė': 'e', 'ě': 'e', 'ę': 'e', 'Ġ': 'G', 'Ģ': 'G', 'Ĝ': 'G', 'Ğ': 'G', 'ġ': 'g', 'ģ': 'g', 'ĝ': 'g', 'ğ': 'g', 'Ĥ': 'H', 'Ħ': 'H', 'ĥ': 'h', 'ħ': 'h', 'Ì': 'I', 'Í': 'I', 'Î': 'I', 'Ï': 'I', 'Į': 'I', 'Ī': 'I', 'İ': 'I', 'ì': 'i', 'í': 'i', 'î': 'i', 'ï': 'i', 'ī': 'i', 'ĩ': 'i', 'Ĵ': 'J', 'ĵ': 'j', 'Ķ': 'K', 'ķ': 'k', 'Ĺ': 'L', 'Ļ': 'L', 'Ľ': 'L', 'Ŀ': 'L', 'ĺ': 'l', 'ļ': 'l', 'ľ': 'l', 'Ñ': 'N', 'Ņ': 'N', 'Ň': 'N', 'ņ': 'n', 'ň': 'n', 'Ò': 'O', 'Ó': 'O', 'Ô': 'O', 'Õ': 'O', 'Ö': 'Oe', 'Ō': 'O', 'Ŏ': 'O', 'Ő': 'O', 'ò': 'o', 'ó': 'o', 'ô': 'o', 'õ': 'o', 'ö': 'oe', 'ō': 'o', 'ŏ': 'o', 'ő': 'o', 'Ù': 'U', 'Ú': 'U', 'Û': 'U', 'Ü': 'Ue', 'Ū': 'U', 'Ů': 'U', 'Ű': 'U', 'Ų': 'U', 'ù': 'u', 'ú': 'u', 'û': 'u', 'ü': 'ue', 'ū': 'u', 'ů': 'u', 'ű': 'u', 'Ŵ': 'W', 'ŵ': 'w', 'Ý': 'Y', 'Ÿ': 'Y', 'ý': 'y', 'ÿ': 'y', 'Ŷ': 'Y', 'ŷ': 'y', 'Ž': 'Z', 'Ż': 'Z', 'ź': 'z', 'ż': 'z', 'ž': 'z'}
module-attribute
Maps characters with diacritics to their ASCII replacements.
LIGATURES_MAP = {'æ': 'ae', 'Æ': 'Ae', 'œ': 'oe', 'Œ': 'Oe', 'ß': 'ss', 'ﬀ': 'ff', 'ﬁ': 'fi', 'ﬂ': 'fl', 'ﬃ': 'ffi', 'ﬄ': 'ffl', 'ﬅ': 'ft', 'ﬆ': 'st', 'ĳ': 'ij', 'Ĳ': 'Ij', 'ʒ': 'ezh', 'Ʒ': 'Ez'}
module-attribute
Maps ligatures to their ASCII replacements.
TRANSLITERATE_MAP = {**DIACRITICS_MAP, **LIGATURES_MAP}
module-attribute
Combined transliteration map: DIACRITICS_MAP and LIGATURES_MAP merged into a single dictionary.
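As a rough illustration of how two such maps can be merged and applied, the sketch below uses small hypothetical excerpts (`DIACRITICS_EXCERPT`, `LIGATURES_EXCERPT`) and a hypothetical helper `apply_map`; none of these names are part of the module, and the real lookup strategy may differ.

```python
# Hypothetical excerpts of the two maps, kept tiny so the sketch is self-contained.
DIACRITICS_EXCERPT = {"Ä": "Ae", "ñ": "n"}
LIGATURES_EXCERPT = {"æ": "ae", "ß": "ss"}

# Dict unpacking merges both maps; on a key collision the right-hand map wins.
MERGED_EXCERPT = {**DIACRITICS_EXCERPT, **LIGATURES_EXCERPT}


def apply_map(text: str, mapping: dict[str, str]) -> str:
    """Replace each character found in `mapping`; leave all others untouched."""
    return "".join(mapping.get(char, char) for char in text)


print(apply_map("Äñæß!", MERGED_EXCERPT))  # → Aenaess!
```

A per-character lookup like this only handles single-character keys, which is why the excerpts above contain nothing longer.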
slugify(text)
Converts a string into a URL-safe, ASCII-only slug.
This function removes or transliterates diacritics, ligatures, and other non-ASCII characters, and normalises whitespace and punctuation into hyphens. The result contains only lowercase letters ([a-z]), digits ([0-9]), and hyphens, making it suitable for use in URLs, file names, or keys.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | The original, possibly Unicode, string that needs to be slugified. | required |
Returns:
| Type | Description |
|---|---|
| str | A hyphen-separated ASCII slug derived from text. If the transformation leads to an empty string, the returned slug will also be empty (""). |
Examples:
Basic usage:
Complex input with punctuation and mixed spaces:
Strings that are already clean ASCII remain the same, except for case:
Empty or symbol-only input results in an empty string:
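The four cases above can be tried against a minimal sketch of the described behaviour. `slugify_sketch` is a hypothetical stand-in, not the module's implementation: it assumes NFKD normalisation instead of the module's `transliterate()` and maps, so a character such as 'ä' becomes 'a' here rather than 'ae'. Stripping hyphens from both ends is also an assumption, chosen so that symbol-only input yields an empty string.

```python
import re
import unicodedata


def slugify_sketch(text: str) -> str:
    """Hypothetical approximation of the slug rules described above."""
    # Step 1: reduce to ASCII (assumed stand-in for the module's transliterate()).
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Step 2: lowercase, then turn every run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower())
    # Step 3: drop hyphens left over at either end.
    return slug.strip("-")


print(slugify_sketch("Hello World"))                          # → hello-world
print(slugify_sketch("  Crème  brûlée, s'il vous plaît!  "))  # → creme-brulee-s-il-vous-plait
print(slugify_sketch("Already-Clean-123"))                    # → already-clean-123
print(repr(slugify_sketch("!!! ???")))                        # → ''
```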
Source code in src/scaffold_kit/utils/string_utils.py
transliterate(text)
Transliterates Unicode characters to their closest ASCII replacements.
This function replaces diacritics, ligatures, and stylistic variants with base ASCII letters, e.g., ‘ñ’ → ‘n’, ‘æ’ → ‘ae’, ‘ß’ → ‘ss’. Any non-ASCII characters that remain are then removed by a second pass that decomposes the string and encodes it to ASCII.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Any string containing Unicode characters. | required |
Returns:
| Type | Description |
|---|---|
| str | A plain ASCII string where every non-ASCII glyph has been converted or dropped, resulting in lossy but URL-safe output. |
Examples:
Handling diacritics:
Mixed scripts and special characters:
Ligatures and stylistic variants:
Emojis and math get stripped:
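The two passes described for this function can be sketched as follows. `transliterate_sketch` and `TRANSLITERATE_EXCERPT` are hypothetical names used only for illustration; the excerpt holds a handful of entries from the maps above, and the use of NFKD for the decomposition pass is an assumption, since the normalisation form is not stated here.

```python
import unicodedata

# Hypothetical excerpt: a few entries taken from DIACRITICS_MAP and LIGATURES_MAP.
TRANSLITERATE_EXCERPT = {
    "ñ": "n", "é": "e", "ü": "ue",   # diacritics
    "æ": "ae", "œ": "oe", "ß": "ss",  # ligatures
}


def transliterate_sketch(text: str) -> str:
    """Hypothetical two-pass transliteration: map lookup, then decompose and encode."""
    # Pass 1: explicit replacements from the map.
    for source, target in TRANSLITERATE_EXCERPT.items():
        text = text.replace(source, target)
    # Pass 2: decompose what is left (NFKD assumed) and drop anything non-ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(transliterate_sketch("mañana"))        # → manana
print(transliterate_sketch("crème brûlée"))  # → creme brulee
print(transliterate_sketch("Straße"))        # → Strasse
print(transliterate_sketch("œuvre"))         # → oeuvre
print(repr(transliterate_sketch("π ≈ 3.14 🚀").strip()))  # → '3.14'
```

In this sketch 'è' and 'û' are not in the excerpt, so they are handled by the second pass: decomposition splits off the combining accent, and the ASCII encoding step discards it. Characters with no ASCII base, such as 'π' or emoji, are dropped entirely.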