screp: What If grep Understood Structure?
Regular expressions are one of the oldest tools in a programmer's toolkit. They are also one of the most universally despised. Every developer has a version of the same story: you need to match something in a file, you write a regex, it sort of works, you add a special case, it breaks, you escape a backslash, it works again but now matches things it should not, and thirty minutes later you have a line of punctuation that nobody, including you, will ever be able to read again.
There is a famous joke: "A programmer had a problem. They decided to solve it with regex. Now they have two problems."
We built screp because we think the joke points at something real. Regex is not just hard to read. It is the wrong abstraction for structured matching. And there is a better one that already exists in Haskell.
The Problem with Regex
Regex operates on flat strings. It has no concept of nesting, no concept of sequence beyond "this then that," and no composability. You cannot name a sub-pattern and reuse it. You cannot build a complex matcher from simple, tested pieces. Every regex is a monolith.
Consider matching an email address. In regex:
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
This is not readable. It is not composable. If you want to match "an email followed by a comma followed by another email," you copy-paste the whole thing. If the spec changes (say, allowing a + in the domain), you have to carefully edit the middle of a string that looks like a cat walked across the keyboard.
Now consider the same match expressed as parser combinators:
some alphaNum <+> char '@' <+> some alphaNum <+> char '.' <+> some letter
Each piece is a named, composable unit. some alphaNum is a parser that matches one or more alphanumeric characters. char '@' matches a literal @. The <+> operator sequences two parsers and concatenates their results. You can read it left to right and understand what it does.
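To make the sequencing concrete, here is a minimal, hand-rolled sketch of how an operator like <+> can work. This is illustrative only: the parser type, satisfy, and some' are invented for this example and are not screp's actual internals.

```haskell
import Data.Char (isAlphaNum)

-- A toy parser: consume input, maybe yielding (matched text, remaining input).
newtype Parser = Parser { runParser :: String -> Maybe (String, String) }

-- Match one character satisfying a predicate, yielding it as a string.
satisfy :: (Char -> Bool) -> Parser
satisfy p = Parser $ \s -> case s of
  (c:rest) | p c -> Just ([c], rest)
  _              -> Nothing

char :: Char -> Parser
char c = satisfy (== c)

-- One or more repetitions, greedily, concatenating the matches.
some' :: Parser -> Parser
some' p = Parser go
  where
    go s = case runParser p s of
      Nothing        -> Nothing
      Just (x, rest) -> case go rest of
        Just (xs, rest') -> Just (x ++ xs, rest')
        Nothing          -> Just (x, rest)

-- Sequence two parsers and concatenate their results.
(<+>) :: Parser -> Parser -> Parser
p <+> q = Parser $ \s -> do
  (x, rest)  <- runParser p s
  (y, rest') <- runParser q rest
  Just (x ++ y, rest')

main :: IO ()
main = print (runParser (some' (satisfy isAlphaNum) <+> char '@') "abc@x")
-- prints: Just ("abc@","x")
```

The point of the sketch is that sequencing is an ordinary function on parsers: the combined parser works whenever its two parts work, with no escaping or group renumbering involved.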
This is not a new idea. Haskell's Parsec library has provided parser combinators for decades. What is new is putting them in a CLI tool that works like grep.
What screp Does
screp is a command-line search tool. You give it a pattern and files, and it prints matches with file paths, line numbers, and column positions, just like grep. The difference is that patterns are written in a small DSL based on Parsec combinators instead of regex.
# Find all digit sequences
screp 'some digit' file.txt
# Find TODO comments recursively in Haskell files
screp 'string "TODO"' -r -e .hs ./src/
# Non-greedy: match everything between START and END
screp 'string "START" <+> manyTill anyChar (string "END")' file.txt
# Count matches
screp -c 'some digit' data.txt
The output format is grep-compatible:
file.txt:1:28:123
file.txt:2:5:test@example.com
You can pipe it into the same tools you pipe grep into. It supports recursive search, file extension filtering, JSON output, match counting, and result limits.
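Because the output is colon-delimited, standard text tools can post-process it. Below, printf stands in for a real screp run (so the snippet is self-contained); field 4 of the file:line:col:match format is the matched text, and the pipeline tallies distinct matches.

```shell
# Hypothetical pipeline: count how often each distinct match occurs.
# printf simulates screp's file:line:col:match output for this example.
printf 'data.txt:1:5:42\ndata.txt:3:2:7\ndata.txt:9:1:42\n' \
  | cut -d: -f4 | sort | uniq -c | sort -rn
```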
The DSL
The pattern language is small and learnable. The primitives are what you would expect from any parsing library:
- char 'x' matches a single character
- string "abc" matches a literal string
- digit matches 0-9
- letter matches a-z and A-Z
- alphaNum matches letters or digits
- anyChar matches anything
- space matches a single whitespace character; spaces matches a run of whitespace
- oneOf "abc" and noneOf "xyz" match character sets
The combinators compose these into larger patterns:
- p1 <+> p2 sequences two parsers and concatenates their results
- p1 <|> p2 tries the first parser, falls back to the second
- many p matches zero or more
- some p matches one or more
- manyTill p end matches p repeatedly until end succeeds (non-greedy)
- between '(' ')' p matches p between delimiters
- count 3 p matches exactly three times
- try p enables backtracking
If you know Haskell, this is just Parsec with the types stripped out for the CLI. If you do not know Haskell, it is still more readable than (?<=\b)[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b.
Custom Parsers
This is where it gets interesting. You can write parser definitions in an actual Haskell file and import them into screp:
-- Parsers.hs
module Parsers (parsers) where

import Text.Parsec
import Text.Parsec.String
import Data.Map (Map)
import qualified Data.Map as Map

parsers :: Map String (Parser String)
parsers = Map.fromList
  [ ("email", email)
  , ("phone", phone)
  ]

email :: Parser String
email = do
  user <- many1 alphaNum
  char '@'
  domain <- many1 alphaNum
  char '.'
  tld <- many1 letter
  pure $ user ++ "@" ++ domain ++ "." ++ tld

phone :: Parser String
phone = do
  a <- count 3 digit
  char '-'
  b <- count 3 digit
  char '-'
  c <- count 4 digit
  pure $ a ++ "-" ++ b ++ "-" ++ c

-- Usage:
-- $ screp --import Parsers.hs 'ref "email"' contacts.txt
-- $ screp --import Parsers.hs 'ref "phone"' contacts.txt
This means your complex matchers live in version-controlled, testable Haskell code. You can unit test them with QuickCheck. You can refactor them. You can share them across your team. The CLI pattern just references them by name.
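For instance, the email parser can be exercised outside screp entirely, with plain Parsec. The parser is inlined from Parsers.hs above so this file stands alone; it assumes the parsec package is installed.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

email :: Parser String
email = do
  user   <- many1 alphaNum
  _      <- char '@'
  domain <- many1 alphaNum
  _      <- char '.'
  tld    <- many1 letter
  pure (user ++ "@" ++ domain ++ "." ++ tld)

main :: IO ()
main = do
  -- parse returns Either ParseError String; a Right value is a match
  print (parse email "" "dev@example.com")
  print (parse email "" "not-an-email")
```

From here it is a short step to property tests: QuickCheck can generate valid addresses and assert that email round-trips them.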
Compare this to maintaining a regex library. In regex world, your "library" is a text file of patterns that you copy-paste. Changing one means re-validating every context it appears in. There is no type checker, no test harness, no refactoring tool. Parser combinators give you all of those for free because they are just Haskell functions.
Why Parser Combinators Beat Regex
The advantage is not just readability, though that matters. The real advantage is composability.
In regex, if you have a pattern for matching a date and a pattern for matching a time, combining them into a datetime pattern means string concatenation. If either sub-pattern uses a capture group, the group indices shift and everything breaks. Regex composition is not composition at all. It is concatenation with side effects.
Parser combinators compose the way functions compose. A parser that matches a date and a parser that matches a time can be combined with <+> to match a datetime. The combination is guaranteed to work if the parts work. There are no hidden interactions, no index shifting, no escape character conflicts.
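The claim can be made concrete with plain Parsec. Here <+> is defined as liftA2 (++); screp's operator may differ in detail, but the shape of the composition is the point: nothing inside date or time has to change when they are combined.

```haskell
import Control.Applicative (liftA2)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Sequence two parsers and concatenate their results.
(<+>) :: Parser String -> Parser String -> Parser String
(<+>) = liftA2 (++)

date, time, datetime :: Parser String
date = count 4 digit <+> string "-" <+> count 2 digit <+> string "-" <+> count 2 digit
time = count 2 digit <+> string ":" <+> count 2 digit

-- The combination is just another parser; no group indices shift,
-- no escape characters interact.
datetime = date <+> string " " <+> time

main :: IO ()
main = print (parse datetime "" "2024-01-15 09:30")
```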
This is the same principle that makes Haskell code in general more reliable than code in languages with pervasive side effects. When your building blocks are pure and composable, the things you build from them are predictable.
Who This Is For
screp is for anyone who uses grep but has been burned by regex. It is particularly useful for:
- Searching codebases where patterns have structure (function signatures, import statements, TODO markers)
- Log analysis where entries follow known formats that are painful to express in regex
- Data extraction where you need to pull structured values (emails, phone numbers, URLs) from unstructured text
- Teams where shared, testable parser definitions are more maintainable than a wiki page of regex patterns
It is also a gentle introduction to parser combinators for people who have never used them. If you can write some digit <+> char '.' <+> some digit, you have already understood the core idea. The rest is just more combinators.
Try It
cabal install screp
The source is on GitHub. The package is on Hackage.
screp is not a replacement for every use of grep. For simple literal string searches, grep is fine. But the moment you reach for regex and feel that familiar dread, consider whether a parser combinator might be clearer, more composable, and less likely to match things you did not intend.
Regex has two problems. screp has parsers.