screp: What If grep Understood Structure?
Regular expressions are one of the oldest tools in a programmer's toolkit. They are also one of the most universally despised. Every developer has a version of the same story: you need to match something in a file, you write a regex, it sort of works, you add a special case, it breaks, you escape a backslash, it works again but now matches things it should not, and thirty minutes later you have a line of punctuation that nobody, including you, will ever be able to read again.
There is a famous joke: "A programmer had a problem. They decided to solve it with regex. Now they have two problems."
We built screp because we think the joke points at something real. Regex is not just hard to read. It is the wrong abstraction for structured matching. And there is a better one that already exists in Haskell.
The Problem with Regex
Regex operates on flat strings. It has no concept of nesting, no concept of sequence beyond "this then that," and no composability. You cannot name a sub-pattern and reuse it. You cannot build a complex matcher from simple, tested pieces. Every regex is a monolith.
Consider matching an email address. In regex:
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
This is not readable. It is not composable. If you want to match "an email followed by a comma followed by another email," you copy-paste the whole thing. If the spec changes (say, allowing a + in the domain), you have to carefully edit the middle of a string that looks like a cat walked across the keyboard.
Now consider the same match expressed as parser combinators:
some alphaNum <+> char '@' <+> some alphaNum <+> char '.' <+> some letter
Each piece is a named, composable unit. some alphaNum is a parser that matches one or more alphanumeric characters. char '@' matches a literal @. The <+> operator sequences two parsers and concatenates their results. You can read it left to right and understand what it does.
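To make the sequencing concrete, here is a minimal, hand-rolled sketch of how an operator like <+> can work. This is illustrative only: the parser type, satisfy, and some' are invented for this example and are not screp's actual internals.

```haskell
import Data.Char (isAlphaNum)

-- A toy parser: consume input, maybe yielding (matched text, remaining input).
newtype Parser = Parser { runParser :: String -> Maybe (String, String) }

-- Match one character satisfying a predicate, yielding it as a string.
satisfy :: (Char -> Bool) -> Parser
satisfy p = Parser $ \s -> case s of
  (c:rest) | p c -> Just ([c], rest)
  _              -> Nothing

char :: Char -> Parser
char c = satisfy (== c)

-- One or more repetitions, greedily, concatenating the matches.
some' :: Parser -> Parser
some' p = Parser go
  where
    go s = case runParser p s of
      Nothing        -> Nothing
      Just (x, rest) -> case go rest of
        Just (xs, rest') -> Just (x ++ xs, rest')
        Nothing          -> Just (x, rest)

-- Sequence two parsers and concatenate their results.
(<+>) :: Parser -> Parser -> Parser
p <+> q = Parser $ \s -> do
  (x, rest)  <- runParser p s
  (y, rest') <- runParser q rest
  Just (x ++ y, rest')

main :: IO ()
main = print (runParser (some' (satisfy isAlphaNum) <+> char '@') "abc@x")
-- prints: Just ("abc@","x")
```

The point of the sketch is that sequencing is an ordinary function on parsers: the combined parser works whenever its two parts work, with no escaping or group renumbering involved.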
This is not a new idea. Haskell's Parsec library has provided parser combinators for decades. What is new is putting them in a CLI tool that works like grep.
What screp Does
screp is a command-line search tool. You give it a pattern and files, and it prints matches with file paths, line numbers, and column positions, just like grep. The difference is that patterns are written in a small DSL based on Parsec combinators instead of regex.
# Find all digit sequences
screp 'some digit' file.txt
# Find TODO comments recursively in Haskell files
screp 'string "TODO"' -r -e .hs ./src/
# Non-greedy: match everything between START and END
screp 'string "START" <+> manyTill anyChar (string "END")' file.txt
# Count matches
screp -c 'some digit' data.txt
The output format is grep-compatible:
file.txt:1:28:123
file.txt:2:5:test@example.com
You can pipe it into the same tools you pipe grep into. It supports recursive search, file extension filtering, JSON output, match counting, and result limits.
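Because the output is colon-delimited, standard text tools can post-process it. Below, printf stands in for a real screp run (so the snippet is self-contained); field 4 of the file:line:col:match format is the matched text, and the pipeline tallies distinct matches.

```shell
# Hypothetical pipeline: count how often each distinct match occurs.
# printf simulates screp's file:line:col:match output for this example.
printf 'data.txt:1:5:42\ndata.txt:3:2:7\ndata.txt:9:1:42\n' \
  | cut -d: -f4 | sort | uniq -c | sort -rn
```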
The DSL
The pattern language is small and learnable. The primitives are what you would expect from any parsing library:
- char 'x' matches a single character
- string "abc" matches a literal string
- digit matches 0-9
- letter matches a-z and A-Z
- alphaNum matches letters or digits
- anyChar matches anything
- space matches a single whitespace character; spaces matches a run of whitespace
- oneOf "abc" and noneOf "xyz" match character sets
The combinators compose these into larger patterns:
- p1 <+> p2 sequences two parsers and concatenates their results
- p1 <|> p2 tries the first parser, falls back to the second
- many p matches zero or more
- some p matches one or more
- manyTill p end matches p repeatedly until end succeeds (non-greedy)
- between '(' ')' p matches p between delimiters
- count 3 p matches exactly three times
- try p enables backtracking
If you know Haskell, this is just Parsec with the types stripped out for the CLI. If you do not know Haskell, it is still more readable than (?<=\b)[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b.
Custom Parsers
This is where it gets interesting. You can write parser definitions in an actual Haskell file and import them into screp:
-- Parsers.hs
module Parsers (parsers) where

import Text.Parsec
import Text.Parsec.String
import Data.Map (Map)
import qualified Data.Map as Map

parsers :: Map String (Parser String)
parsers = Map.fromList
  [ ("email", email)
  , ("phone", phone)
  ]

email :: Parser String
email = do
  user <- many1 alphaNum
  char '@'
  domain <- many1 alphaNum
  char '.'
  tld <- many1 letter
  pure $ user ++ "@" ++ domain ++ "." ++ tld

phone :: Parser String
phone = do
  a <- count 3 digit
  char '-'
  b <- count 3 digit
  char '-'
  c <- count 4 digit
  pure $ a ++ "-" ++ b ++ "-" ++ c

-- Usage:
-- $ screp --import Parsers.hs 'ref "email"' contacts.txt
-- $ screp --import Parsers.hs 'ref "phone"' contacts.txt
This means your complex matchers live in version-controlled, testable Haskell code. You can unit test them with QuickCheck. You can refactor them. You can share them across your team. The CLI pattern just references them by name.
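For instance, the email parser can be exercised outside screp entirely, with plain Parsec. The parser is inlined from Parsers.hs above so this file stands alone; it assumes the parsec package is installed.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

email :: Parser String
email = do
  user   <- many1 alphaNum
  _      <- char '@'
  domain <- many1 alphaNum
  _      <- char '.'
  tld    <- many1 letter
  pure (user ++ "@" ++ domain ++ "." ++ tld)

main :: IO ()
main = do
  -- parse returns Either ParseError String; a Right value is a match
  print (parse email "" "dev@example.com")
  print (parse email "" "not-an-email")
```

From here it is a short step to property tests: QuickCheck can generate valid addresses and assert that email round-trips them.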
Compare this to maintaining a regex library. In regex world, your "library" is a text file of patterns that you copy-paste. Changing one means re-validating every context it appears in. There is no type checker, no test harness, no refactoring tool. Parser combinators give you all of those for free because they are just Haskell functions.
Why Parser Combinators Beat Regex
The advantage is not just readability, though that matters. The real advantage is composability.
In regex, if you have a pattern for matching a date and a pattern for matching a time, combining them into a datetime pattern means string concatenation. If either sub-pattern uses a capture group, the group indices shift and everything breaks. Regex composition is not composition at all. It is concatenation with side effects.
Parser combinators compose the way functions compose. A parser that matches a date and a parser that matches a time can be combined with <+> to match a datetime. The combination is guaranteed to work if the parts work. There are no hidden interactions, no index shifting, no escape character conflicts.
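The claim can be made concrete with plain Parsec. Here <+> is defined as liftA2 (++); screp's operator may differ in detail, but the shape of the composition is the point: nothing inside date or time has to change when they are combined.

```haskell
import Control.Applicative (liftA2)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Sequence two parsers and concatenate their results.
(<+>) :: Parser String -> Parser String -> Parser String
(<+>) = liftA2 (++)

date, time, datetime :: Parser String
date = count 4 digit <+> string "-" <+> count 2 digit <+> string "-" <+> count 2 digit
time = count 2 digit <+> string ":" <+> count 2 digit

-- The combination is just another parser; no group indices shift,
-- no escape characters interact.
datetime = date <+> string " " <+> time

main :: IO ()
main = print (parse datetime "" "2024-01-15 09:30")
```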
This is the same principle that makes Haskell code in general more reliable than code in languages with pervasive side effects. When your building blocks are pure and composable, the things you build from them are predictable.
Who This Is For
screp is for anyone who uses grep but has been burned by regex. It is particularly useful for:
- Searching codebases where patterns have structure (function signatures, import statements, TODO markers)
- Log analysis where entries follow known formats that are painful to express in regex
- Data extraction where you need to pull structured values (emails, phone numbers, URLs) from unstructured text
- Teams where shared, testable parser definitions are more maintainable than a wiki page of regex patterns
It is also a gentle introduction to parser combinators for people who have never used them. If you can write some digit <+> char '.' <+> some digit, you have already understood the core idea. The rest is just more combinators.
Try It
cabal install screp
The source is on GitHub. The package is on Hackage.
screp is not a replacement for every use of grep. For simple literal string searches, grep is fine. But the moment you reach for regex and feel that familiar dread, consider whether a parser combinator might be clearer, more composable, and less likely to match things you did not intend.
Regex has two problems. screp has parsers.