{"id":434,"date":"2024-08-20T15:00:00","date_gmt":"2024-08-20T15:00:00","guid":{"rendered":"https:\/\/fdswebdesign.com\/?p=434"},"modified":"2024-10-15T23:32:28","modified_gmt":"2024-10-15T23:32:28","slug":"regexes-got-good-the-history-and-future-of-regular-expressions-in-javascript","status":"publish","type":"post","link":"https:\/\/fdswebdesign.com\/index.php\/2024\/08\/20\/regexes-got-good-the-history-and-future-of-regular-expressions-in-javascript\/","title":{"rendered":"Regexes Got Good: The History And Future Of Regular Expressions In JavaScript"},"content":{"rendered":"

<article>\n<header>\n<h1>Regexes Got Good: The History And Future Of Regular Expressions In JavaScript<\/h1>\n<address>Steven Levithan<\/address>\n<p> 2024-08-20T15:00:00+00:00<br \/>\n 2024-10-15T23:05:45+00:00<br \/>\n <\/header>\n<p>Modern JavaScript regular expressions have come a long way compared to what you might be familiar with. Regexes can be <strong>an amazing tool for searching and replacing text<\/strong>, but they have a longstanding reputation (perhaps outdated, as I\u2019ll show) for being difficult to write and understand.<\/p>\n<p>This is especially true in JavaScript-land, where regexes languished for many years, underpowered compared to their more modern counterparts in PCRE, Perl, .NET, Java, Ruby, C++, and Python. Those days are over.<\/p>\n<p>In this article, I\u2019ll recount the history of improvements to JavaScript regexes (spoiler: ES2018 and ES2024 changed the game), show examples of modern regex features in action, introduce you to a lightweight <a href=\"https:\/\/github.com\/slevithan\/regex\">JavaScript library<\/a> that makes JavaScript stand alongside or surpass other modern regex flavors, and end with a preview of active proposals that will continue to improve regexes in future versions of JavaScript (with some of them already working in your browser today).<\/p>\n<h2 id=\"the-history-of-regular-expressions-in-javascript\">The History of Regular Expressions in JavaScript<\/h2>\n<p>ECMAScript 3, standardized in 1999, introduced Perl-inspired regular expressions to the JavaScript language. Although it got enough things right to make regexes pretty useful (and mostly compatible with other Perl-inspired flavors), there were some big omissions, even then. 
And while JavaScript waited 10 years for its next standardized version with ES5, other programming languages and regex implementations added useful new features that made their regexes more powerful and readable.<\/p>\n<p>But that was then.<\/p>\n<blockquote><p>Did you know that nearly every new version of JavaScript has made at least minor improvements to regular expressions?<\/p><\/blockquote>\n<p>Let\u2019s take a look at them.<\/p>\n<p>Don\u2019t worry if it\u2019s hard to understand what some of the following features mean; we\u2019ll look more closely at several of the key features afterward.<\/p>\n<ul>\n<li>ES5 (2009) fixed unintuitive behavior by creating a new object every time regex literals are evaluated and allowed regex literals to use unescaped forward slashes within character classes (<code>\/[\/]\/<\/code>).<\/li>\n<li>ES6\/ES2015 added two new regex flags: <code>y<\/code> (<code>sticky<\/code>), which made it easier to use regexes in parsers, and <code>u<\/code> (<code>unicode<\/code>), which added several significant Unicode-related improvements along with strict errors. It also added the <code>RegExp.prototype.flags<\/code> getter, support for subclassing <code>RegExp<\/code>, and the ability to copy a regex while changing its flags.<\/li>\n<li>ES2018 was the edition that finally made JavaScript regexes pretty good. It added the <code>s<\/code> (<code>dotAll<\/code>) flag, lookbehind, named capture, and Unicode properties (via <code>\\p{...}<\/code> and <code>\\P{...}<\/code>, which require ES6\u2019s flag <code>u<\/code>). 
All of these are extremely useful features, as we\u2019ll see.<\/li>\n<li>ES2020 added the string method <code>matchAll<\/code>, which we\u2019ll also see more of shortly.<\/li>\n<li>ES2022 added flag <code>d<\/code> (<code>hasIndices<\/code>), which provides start and end indices for matched substrings.<\/li>\n<li>And finally, ES2024 added flag <code>v<\/code> (<code>unicodeSets<\/code>) as an upgrade to ES6\u2019s flag <code>u<\/code>. The <code>v<\/code> flag adds a set of multicharacter \u201cproperties of strings\u201d to <code>\\p{...}<\/code>, multicharacter elements within character classes via <code>\\p{...}<\/code> and <code>\\q{...}<\/code>, nested character classes, set subtraction <code>[A--B]<\/code> and intersection <code>[A&&B]<\/code>, and different escaping rules within character classes. It also fixed case-insensitive matching for Unicode properties within negated sets <code>[^...]<\/code>.<\/li>\n<\/ul>\n<p class=\"c-pre-sidenote--left\">As for whether you can safely use these features in your code today, the answer is yes! The latest of these features, flag <code>v<\/code>, is supported in Node.js 20 and <a href=\"https:\/\/caniuse.com\/mdn-javascript_builtins_regexp_unicodesets\">2023-era<\/a> browsers. The rest are supported in 2021-era browsers or earlier.<\/p>\n<p class=\"c-sidenote c-sidenote--right\">Each edition from ES2019 to ES2023 also added Unicode properties that can be used via <code>\\p{...}<\/code> and <code>\\P{...}<\/code>. And to be a completionist, ES2021 added string method <code>replaceAll<\/code>, although when given a regex, the only difference from ES3\u2019s <code>replace<\/code> is that it throws if not using flag <code>g<\/code>.<\/p>\n<h3 id=\"aside-what-makes-a-regex-flavor-good\">Aside: What Makes a Regex Flavor Good?<\/h3>\n<p>With all of these changes, how do JavaScript regular expressions now stack up against other flavors? 
There are multiple ways to think about this, but here are a few key aspects:<\/p>\n<ul>\n<li><strong>Performance.<\/strong><br \/>\nThis is an important aspect but probably not the main one since mature regex implementations are generally pretty fast. JavaScript is strong on regex performance (at least considering V8\u2019s Irregexp engine, used by Node.js, Chromium-based browsers, and <a href=\"https:\/\/hacks.mozilla.org\/2020\/06\/a-new-regexp-engine-in-spidermonkey\/\">even Firefox<\/a>; and JavaScriptCore, used by Safari), but it uses a backtracking engine that is missing any syntax for backtracking control — a major limitation that makes ReDoS vulnerability more common.<\/li>\n<li><strong>Support for advanced features<\/strong> that handle common or important use cases.<br \/>\nHere, JavaScript stepped up its game with ES2018 and ES2024. JavaScript is now best in class for some features like lookbehind (with its infinite-length support) and Unicode properties (with multicharacter \u201cproperties of strings,\u201d set subtraction and intersection, and script extensions). These features are either not supported or not as robust in many other flavors.<\/li>\n<li><strong>Ability to write readable and maintainable patterns.<\/strong><br \/>\nHere, native JavaScript has long been the worst of the major flavors since it lacks the <code>x<\/code> (\u201cextended\u201d) flag that allows insignificant whitespace and comments. 
Additionally, it lacks regex subroutines and subroutine definition groups (from PCRE and Perl), a powerful set of features that enable writing grammatical regexes that build up complex patterns via composition.<\/li>\n<\/ul>\n<p>So, it\u2019s a bit of a mixed bag.<\/p>\n<blockquote class=\"pull-quote\">\n<p>JavaScript regexes have become exceptionally powerful, but they\u2019re still missing key features that could make regexes safer, more readable, and more maintainable (all of which hold some people back from using this power).<\/p>\n<\/blockquote>\n<p>The good news is that all of these holes can be filled by a JavaScript library, which we\u2019ll see later in this article.<\/p>\n
<h2 id=\"using-javascript-s-modern-regex-features\">Using JavaScript\u2019s Modern Regex Features<\/h2>\n<p>Let\u2019s look at a few of the more useful modern regex features that you might be less familiar with. You should know in advance that this is <strong>a moderately advanced guide<\/strong>. 
If you\u2019re relatively new to regex, here are some excellent tutorials you might want to start with:<\/p>\n<ul>\n<li><a href=\"https:\/\/regexlearn.com\/\">RegexLearn<\/a> and <a href=\"https:\/\/regexone.com\/\">RegexOne<\/a> are interactive tutorials that include practice problems.<\/li>\n<li>JavaScript.info\u2019s <a href=\"https:\/\/javascript.info\/regular-expressions\">regular expressions<\/a> chapter is a detailed and JavaScript-specific guide.<\/li>\n<li><a href=\"https:\/\/www.youtube.com\/watch?v=M7vDtxaD7ZU\">Demystifying Regular Expressions<\/a> (video) is an excellent presentation for beginners by Lea Verou at HolyJS 2017.<\/li>\n<li><a href=\"https:\/\/www.youtube.com\/watch?v=rhzKDrUiJVk\">Learn Regular Expressions In 20 Minutes<\/a> (video) is a live syntax walkthrough in a regex tester.<\/li>\n<\/ul>\n<h3 id=\"named-capture\">Named Capture<\/h3>\n<p>Often, you want to do more than just check whether a regex matches; you want to extract substrings from the match and do something with them in your code. Named capturing groups allow you to do this in a way that makes your regexes and code <strong>more readable<\/strong> and <strong>self-documenting<\/strong>.<\/p>\n<p>The following example matches a record with two date fields and captures the values:<\/p>\n<div class=\"break-out\">\n<pre><code class=\"language-javascript\">const record = 'Admitted: 2024-01-01\\nReleased: 2024-01-03';\nconst re = \/^Admitted: (?<admitted>\\d{4}-\\d{2}-\\d{2})\\nReleased: (?<released>\\d{4}-\\d{2}-\\d{2})$\/;\nconst match = record.match(re);\nconsole.log(match.groups);\n\/* \u2192 {\n  admitted: '2024-01-01',\n  released: '2024-01-03'\n} *\/\n<\/code><\/pre>\n<\/div>\n<p>Don\u2019t worry: although this regex might be challenging to understand, we\u2019ll later look at a way to make it much more readable. 
The key things here are that named capturing groups use the syntax <code>(?<name>...)<\/code>, and their results are stored on the <code>groups<\/code> object of matches.<\/p>\n<p>You can also use named backreferences to rematch whatever a named capturing group matched via <code>\\k<name><\/code>, and you can use the values within search and replace as follows:<\/p>\n<pre><code class=\"language-javascript\">\/\/ Change 'FirstName LastName' to 'LastName, FirstName'\nconst name = 'Shaquille Oatmeal';\nname.replace(\/(?<first>\\w+) (?<last>\\w+)\/, '$<last>, $<first>');\n\/\/ \u2192 'Oatmeal, Shaquille'\n<\/code><\/pre>\n<p>For advanced regexers who want to use named backreferences within a replacement callback function, the <code>groups<\/code> object is provided as the last argument. Here\u2019s a fancy example:<\/p>\n<pre><code class=\"language-javascript\">function fahrenheitToCelsius(str) {\n  const re = \/(?<degrees>-?\\d+(\\.\\d+)?)F\\b\/g;\n  return str.replace(re, (...args) => {\n    const groups = args.at(-1);\n    return Math.round((groups.degrees - 32) * 5\/9) + 'C';\n  });\n}\nfahrenheitToCelsius('98.6F');\n\/\/ \u2192 '37C'\nfahrenheitToCelsius('May 9 high is 40F and low is 21F');\n\/\/ \u2192 'May 9 high is 4C and low is -6C'\n<\/code><\/pre>\n<h3 id=\"lookbehind\">Lookbehind<\/h3>\n<p>Lookbehind (introduced in ES2018) is the complement to <em>lookahead<\/em>, which has always been supported by JavaScript regexes. Lookahead and lookbehind are <em>assertions<\/em> (similar to <code>^<\/code> for the start of a string or <code>\\b<\/code> for word boundaries) that don\u2019t consume any characters as part of the match. 
Lookbehinds succeed or fail based on whether their subpattern can be found immediately before the current match position.<\/p>\n<p>For example, the following regex uses a lookbehind <code>(?<=...)<\/code> to match the word \u201ccat\u201d (<em>only<\/em> the word \u201ccat\u201d) if it\u2019s preceded by \u201cfat \u201d:<\/p>\n<pre><code class=\"language-javascript\">const re = \/(?<=fat )cat\/g;\n'cat, fat cat, brat cat'.replace(re, 'pigeon');\n\/\/ \u2192 'cat, fat pigeon, brat cat'\n<\/code><\/pre>\n<p>You can also use <em>negative<\/em> lookbehind — written as <code>(?<!...)<\/code> — to invert the assertion. That would make the regex match any instance of \u201ccat\u201d that\u2019s <em>not<\/em> preceded by \u201cfat \u201d.<\/p>\n<pre><code class=\"language-javascript\">const re = \/(?<!fat )cat\/g;\n'cat, fat cat, brat cat'.replace(re, 'pigeon');\n\/\/ \u2192 'pigeon, fat cat, brat pigeon'\n<\/code><\/pre>\n<p>JavaScript\u2019s implementation of lookbehind is one of the very best (matched only by .NET). Whereas other regex flavors have inconsistent and complex rules for when and whether they allow variable-length patterns inside lookbehind, JavaScript allows you to look behind for any subpattern.<\/p>\n<h3 id=\"the-matchall-method\">The <code>matchAll<\/code> Method<\/h3>\n<p>JavaScript\u2019s <code>String.prototype.matchAll<\/code> was added in ES2020 and makes it easier to operate on regex matches in a loop when you need extended match details. 
Although other solutions were possible before, <code>matchAll<\/code> is often easier, and it avoids gotchas, such as the need to guard against infinite loops when looping over the results of regexes that might return zero-length matches.<\/p>\n<p>Since <code>matchAll<\/code> returns an iterator (rather than an array), it\u2019s easy to use it in a <code>for...of<\/code> loop.<\/p>\n<div class=\"break-out\">\n<pre><code class=\"language-javascript\">\/\/ Sample input (assumed here for illustration)\nconst str = 'abcd';\nconst re = \/(?<char1>\\w)(?<char2>\\w)\/g;\nfor (const match of str.matchAll(re)) {\n  const {char1, char2} = match.groups;\n  \/\/ Print each complete match and matched subpatterns\n  console.log(`Matched \"${match[0]}\" with \"${char1}\" and \"${char2}\"`);\n}\n<\/code><\/pre>\n<\/div>\n<p><strong>Note<\/strong>: <em><code>matchAll<\/code> requires its regexes to use flag <code>g<\/code> (<code>global<\/code>). Also, as with other iterators, you can get all of its results as an array using <code>Array.from<\/code> or array spreading.<\/em><\/p>\n<pre><code class=\"language-javascript\">const matches = [...str.matchAll(\/.\/g)];\n<\/code><\/pre>\n<h3 id=\"unicode-properties\">Unicode Properties<\/h3>\n<p>Unicode properties (added in ES2018) give you powerful control over multilingual text, using the syntax <code>\\p{...}<\/code> and its negated version <code>\\P{...}<\/code>. 
There are hundreds of different properties you can match, which cover a wide variety of Unicode categories, scripts, script extensions, and binary properties.<\/p>\n<p><strong>Note<\/strong>: <em>For more details, check out the <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/JavaScript\/Reference\/Regular_expressions\/Unicode_character_class_escape\">documentation on MDN<\/a>.<\/em><\/p>\n<p>Unicode properties require using the flag <code>u<\/code> (<code>unicode<\/code>) or <code>v<\/code> (<code>unicodeSets<\/code>).<\/p>\n<h3 id=\"flag-v\">Flag <code>v<\/code><\/h3>\n<p>Flag <code>v<\/code> (<code>unicodeSets<\/code>) was added in ES2024 and is an upgrade to flag <code>u<\/code> — you can\u2019t use both at the same time. It\u2019s a best practice to always use one of these flags to avoid silently introducing bugs via the default Unicode-unaware mode. The decision on which to use is fairly straightforward. If you\u2019re okay with only supporting environments with flag <code>v<\/code> (Node.js 20 and 2023-era browsers), then use flag <code>v<\/code>; otherwise, use flag <code>u<\/code>.<\/p>\n<p>Flag <code>v<\/code> adds support for several new regex features, with the coolest probably being set subtraction and intersection. This allows using <code>A--B<\/code> (within character classes) to match strings in <em>A<\/em> but not in <em>B<\/em> or using <code>A&&B<\/code> to match strings in both <em>A<\/em> and <em>B<\/em>. 
For example:<\/p>\n<pre><code class=\"language-javascript\">\/\/ Matches all Greek symbols except the letter '\u03c0'\n\/[\\p{Script_Extensions=Greek}--\u03c0]\/v\n\n\/\/ Matches only Greek letters\n\/[\\p{Script_Extensions=Greek}&&\\p{Letter}]\/v\n<\/code><\/pre>\n<p>For more details about flag <code>v<\/code>, including its other new features, check out this <a href=\"https:\/\/v8.dev\/features\/regexp-v-flag\">explainer<\/a> from the Google Chrome team.<\/p>\n<h4 id=\"a-word-on-matching-emoji\">A Word on Matching Emoji<\/h4>\n<p>Emoji are \ud83e\udd29\ud83d\udd25\ud83d\ude0e\ud83d\udc4c, but how emoji get encoded in text is complicated. If you\u2019re trying to match them with a regex, it\u2019s important to be aware that <strong>a single emoji can be composed of one or many individual Unicode code points<\/strong>. Many people (and libraries!) who roll their own emoji regexes miss this point (or implement it poorly) and end up with bugs.<\/p>\n<p>The following details for the emoji \u201c\ud83d\udc69\ud83c\udffb\u200d\ud83c\udfeb\u201d (<em>Woman Teacher: Light Skin Tone<\/em>) show just how complicated emoji can be:<\/p>\n<div class=\"break-out\">\n<pre><code class=\"language-javascript\">\/\/ Code unit length\n'\ud83d\udc69\ud83c\udffb\u200d\ud83c\udfeb'.length;\n\/\/ \u2192 7\n\/\/ Each astral code point (above \\uFFFF) is divided into high and low surrogates\n\n\/\/ Code point length\n[...'\ud83d\udc69\ud83c\udffb\u200d\ud83c\udfeb'].length;\n\/\/ \u2192 4\n\/\/ These four code points are: \\u{1F469} \\u{1F3FB} \\u{200D} \\u{1F3EB}\n\/\/ \\u{1F469} combined with \\u{1F3FB} is '\ud83d\udc69\ud83c\udffb'\n\/\/ \\u{200D} is a Zero-Width Joiner\n\/\/ \\u{1F3EB} is '\ud83c\udfeb'\n\n\/\/ Grapheme cluster length (user-perceived characters)\n[...new Intl.Segmenter().segment('\ud83d\udc69\ud83c\udffb\u200d\ud83c\udfeb')].length;\n\/\/ \u2192 1\n<\/code><\/pre>\n<\/div>\n<p>Fortunately, JavaScript added an easy way to match any individual, complete emoji via <code>\\p{RGI_Emoji}<\/code>. 
Since this is a fancy \u201cproperty of strings\u201d that can match more than one code point at a time, it requires ES2024\u2019s flag <code>v<\/code>.<\/p>\n<p>If you want to match emojis in environments without <code>v<\/code> support, check out the excellent libraries <a href=\"https:\/\/github.com\/mathiasbynens\/emoji-regex\">emoji-regex<\/a> and <a href=\"https:\/\/github.com\/slevithan\/emoji-regex-xs\">emoji-regex-xs<\/a>.<\/p>\n<div class=\"partners__lead-place\"><\/div>\n<h2 id=\"making-your-regexes-more-readable-maintainable-and-resilient\">Making Your Regexes More Readable, Maintainable, and Resilient<\/h2>\n<p>Despite the improvements to regex features over the years, native JavaScript regexes of sufficient complexity can still be outrageously hard to read and maintain.<\/p>\n<\/p>\n<blockquote class=\"twitter-tweet\">\n<p lang=\"en\" dir=\"ltr\">Regular Expressions are SO EASY!!!! <a href=\"https:\/\/t.co\/q4GSpbJRbZ\">pic.twitter.com\/q4GSpbJRbZ<\/a><\/p>\n<p>— Garabato Kid (@garabatokid) <a href=\"https:\/\/twitter.com\/garabatokid\/status\/1147063121678389253?ref_src=twsrc%5Etfw\">July 5, 2019<\/a><\/p><\/blockquote>\n<p>\nES2018\u2019s named capture was a great addition that made regexes more self-documenting, and ES6\u2019s <code>String.raw<\/code> tag allows you to avoid escaping all your backslashes when using the <code>RegExp<\/code> constructor. But for the most part, that\u2019s it in terms of readability.<\/p>\n<p class=\"c-pre-sidenote--left\">However, there\u2019s a lightweight and high-performance <a href=\"https:\/\/github.com\/slevithan\/regex\">JavaScript library<\/a> named <code>regex<\/code> (by yours truly) that makes regexes dramatically more readable. It does this by adding key missing features from Perl-Compatible Regular Expressions (PCRE) and outputting native JavaScript regexes. 
You can also use it as a Babel plugin, which means that <code>regex<\/code> calls are transpiled at build time, so you get a better developer experience without users paying any runtime cost.<\/p>\n<p class=\"c-sidenote c-sidenote--right\"><a href=\"https:\/\/github.com\/PCRE2Project\/pcre2\">PCRE<\/a> is a popular C library used by PHP for its regex support, and it\u2019s available in countless other programming languages and tools.<\/p>\n<p>Let\u2019s briefly look at some of the ways the <code>regex<\/code> library, which provides a template tag named <code>regex<\/code>, can help you write complex regexes that are actually understandable and maintainable by mortals. Note that all of the new syntax described below works identically in PCRE.<\/p>\n<h3 id=\"insignificant-whitespace-and-comments\">Insignificant Whitespace and Comments<\/h3>\n<p>By default, <code>regex<\/code> allows you to freely add whitespace and line comments (starting with <code>#<\/code>) to your regexes for readability.<\/p>\n<pre><code class=\"language-javascript\">import {regex} from 'regex';\nconst date = regex`\n  # Match a date in YYYY-MM-DD format\n  (?<year>  \\d{4}) - # Year part\n  (?<month> \\d{2}) - # Month part\n  (?<day>   \\d{2})   # Day part\n`;\n<\/code><\/pre>\n<p>This is equivalent to using PCRE\u2019s <code>xx<\/code> flag.<\/p>\n<h3 id=\"subroutines-and-subroutine-definition-groups\">Subroutines and Subroutine Definition Groups<\/h3>\n<p>Subroutines are written as <code>\\g<name><\/code> (where <em>name<\/em> refers to a named group), and they treat the referenced group as an independent subpattern that they try to match at the current position. 
This enables subpattern composition and reuse, which improves readability and maintainability.<\/p>\n<p>For example, the following regex matches an IPv4 address such as \u201c192.168.12.123\u201d:<\/p>\n<pre><code class=\"language-javascript\">import {regex} from 'regex';\nconst ipv4 = regex`\\b\n  (?<byte> 25[0-5] | 2[0-4]\\d | 1\\d\\d | [1-9]?\\d)\n  # Match the remaining 3 dot-separated bytes\n  (\\. \\g<byte>){3}\n\\b`;\n<\/code><\/pre>\n<p>You can take this even further by defining subpatterns for use by reference only via subroutine definition groups. Here\u2019s an example that improves the regex for admittance records that we saw earlier in this article:<\/p>\n<pre><code class=\"language-javascript\">const record = 'Admitted: 2024-01-01\\nReleased: 2024-01-03';\nconst re = regex`\n  ^ Admitted:\\ (?<admitted> \\g<date>) \\n\n    Released:\\ (?<released> \\g<date>) $\n\n  (?(DEFINE)\n    (?<date>  \\g<year>-\\g<month>-\\g<day>)\n    (?<year>  \\d{4})\n    (?<month> \\d{2})\n    (?<day>   \\d{2})\n  )\n`;\nconst match = record.match(re);\nconsole.log(match.groups);\n\/* \u2192 {\n  admitted: '2024-01-01',\n  released: '2024-01-03'\n} *\/\n<\/code><\/pre>\n<h3 id=\"a-modern-regex-baseline\">A Modern Regex Baseline<\/h3>\n<p><code>regex<\/code> includes the <code>v<\/code> flag by default, so you never forget to turn it on. And in environments without native <code>v<\/code>, it automatically switches to flag <code>u<\/code> while applying <code>v<\/code>\u2019s escaping rules, so your regexes are forward and backward-compatible.<\/p>\n<p>It also implicitly enables the emulated flags <code>x<\/code> (insignificant whitespace and comments) and <code>n<\/code> (\u201cnamed capture only\u201d mode) by default, so you don\u2019t have to continually opt into their superior modes. 
And since it\u2019s a raw string template tag, you don\u2019t have to escape your backslashes <code>\\<\/code> like with the <code>RegExp<\/code> constructor.<\/p>\n<h3 id=\"atomic-groups-and-possessive-quantifiers-can-prevent-catastrophic-backtracking\">Atomic Groups and Possessive Quantifiers Can Prevent Catastrophic Backtracking<\/h3>\n<p>Atomic groups and possessive quantifiers are another powerful set of features added by the <code>regex<\/code> library. Although they\u2019re primarily about performance and resilience against catastrophic backtracking (also known as ReDoS or \u201cregular expression denial of service,\u201d a serious issue where certain regexes can take forever when searching particular, not-quite-matching strings), they can also help with readability by allowing you to write simpler patterns.<\/p>\n<p><strong>Note<\/strong>: <em>You can learn more in the <code>regex<\/code> <a href=\"https:\/\/github.com\/slevithan\/regex#atomic-groups\">documentation<\/a>.<\/em><\/p>\n<div class=\"partners__lead-place\"><\/div>\n<h2 id=\"what-s-next-upcoming-javascript-regex-improvements\">What\u2019s Next? Upcoming JavaScript Regex Improvements<\/h2>\n<p>There are a variety of active proposals for improving regexes in JavaScript. Below, we\u2019ll look at the three that are well on their way to being included in future editions of the language.<\/p>\n<h3 id=\"duplicate-named-capturing-groups\">Duplicate Named Capturing Groups<\/h3>\n<p>This is a Stage 3 (nearly finalized) <a href=\"https:\/\/github.com\/tc39\/proposal-duplicate-named-capturing-groups\">proposal<\/a>. Even better is that, as of recently, it works in all major browsers.<\/p>\n<p>When named capturing was first introduced, it required that all <code>(?<name>...)<\/code> captures use unique names. 
However, there are cases when you have multiple alternate paths through a regex, and it would simplify your code to reuse the same group names in each alternative.<\/p>\n<p>For example:<\/p>\n<pre><code class=\"language-javascript\">\/(?<year>\\d{4})-\\d\\d|\\d\\d-(?<year>\\d{4})\/\n<\/code><\/pre>\n<p>This proposal enables exactly this, preventing a \u201cduplicate capture group name\u201d error with this example. Note that names must still be unique <em>within<\/em> each alternative path.<\/p>\n<h3 id=\"pattern-modifiers-aka-flag-groups\">Pattern Modifiers (aka Flag Groups)<\/h3>\n<p>This is another Stage 3 <a href=\"https:\/\/github.com\/tc39\/proposal-regexp-modifiers\">proposal<\/a>. It\u2019s already supported in Chrome\/Edge 125 and Opera 111, and it\u2019s coming <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=1899813\">soon<\/a> for Firefox. No word <a href=\"https:\/\/bugs.webkit.org\/show_bug.cgi?id=275672\">yet<\/a> on Safari.<\/p>\n<p>Pattern modifiers use <code>(?ims:...)<\/code>, <code>(?-ims:...)<\/code>, or <code>(?im-s:...)<\/code> to turn the flags <code>i<\/code>, <code>m<\/code>, and <code>s<\/code> on or off for only certain parts of a regex.<\/p>\n<p>For example:<\/p>\n<pre><code class=\"language-javascript\">\/hello-(?i:world)\/\n\/\/ Matches 'hello-WORLD' but not 'HELLO-WORLD'\n<\/code><\/pre>\n<h3 id=\"escape-regex-special-characters-with-regexp-escape\">Escape Regex Special Characters with <code>RegExp.escape<\/code><\/h3>\n<p>This <a href=\"https:\/\/github.com\/tc39\/proposal-regex-escaping\">proposal<\/a> recently reached Stage 3 and has been a long time coming. It isn\u2019t yet supported in any major browsers. 
The proposal does what it says on the tin, providing the function <code>RegExp.escape(str)<\/code>, which returns the string with all regex special characters escaped so you can match them literally.<\/p>\n<p>If you need this functionality today, the most widely-used package (with more than 500 million monthly npm downloads) is <a href=\"https:\/\/github.com\/sindresorhus\/escape-string-regexp\">escape-string-regexp<\/a>, an ultra-lightweight, single-purpose utility that does minimal escaping. That\u2019s great for most cases, but if you need assurance that your escaped string can safely be used at any arbitrary position within a regex, <code>escape-string-regexp<\/code> recommends the <code>regex<\/code> library that we\u2019ve already looked at in this article. The <code>regex<\/code> library uses interpolation to escape embedded strings in a <a href=\"https:\/\/github.com\/slevithan\/regex#interpolating-escaped-strings\">context-aware way<\/a>.<\/p>\n<h2 id=\"conclusion\">Conclusion<\/h2>\n<p>So there you have it: the past, present, and future of JavaScript regular expressions.<\/p>\n<p>If you want to journey even deeper into the lands of regex, check out <a href=\"https:\/\/github.com\/slevithan\/awesome-regex\">Awesome Regex<\/a> for a list of the best regex testers, tutorials, libraries, and other resources. 
And for a fun regex crossword puzzle, try your hand at <a href=\"https:\/\/regexle.com\/\">regexle<\/a>.<\/p>\n<p>May your parsing be prosperous and your regexes be readable.<\/p>\n<div class=\"signature\">\n <img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==\" alt=\"Smashing Editorial\" width=\"35\" height=\"46\" loading=\"lazy\" class=\"lazyload\" data-src=\"https:\/\/www.smashingmagazine.com\/images\/logo\/logo--red.png\"><br \/>\n <span>(gg, yk)<\/span>\n<\/div>\n<\/article>\n","protected":false},"excerpt":{"rendered":"<p>Regexes Got Good: The History And Future Of Regular Expressions In JavaScript Regexes Got Good: The History And Future Of Regular Expressions In JavaScript Steven Levithan 2024-08-20T15:00:00+00:00 2024-10-15T23:05:45+00:00 Modern JavaScript regular expressions have come a long way compared to what you might be familiar with. Regexes can be an amazing tool for searching and replacing…<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[10],"tags":[],"class_list":["post-434","post","type-post","status-publish","format-standard","hentry","category-javascript"],"_links":{"self":[{"href":"https:\/\/fdswebdesign.com\/index.php\/wp-json\/wp\/v2\/posts\/434","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fdswebdesign.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fdswebdesign.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/fdswebdesign.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/fdswebdesign.com\/index.php\/wp-json\/wp\/v2\/comments?post=434"}],"version-history":[{"count":2,"href":"https:\/\/fdswebdesign.com\/index.php\/wp-json\/wp\/v2\/posts\/434\/revisions"}],"predecessor-version":[{"id":436,"href":"https:\/\/fdswebdesign.co
m\/index.php\/wp-json\/wp\/v2\/posts\/434\/revisions\/436"}],"wp:attachment":[{"href":"https:\/\/fdswebdesign.com\/index.php\/wp-json\/wp\/v2\/media?parent=434"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fdswebdesign.com\/index.php\/wp-json\/wp\/v2\/categories?post=434"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fdswebdesign.com\/index.php\/wp-json\/wp\/v2\/tags?post=434"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}