Parse, don't validate through the years with C++

Alexis King’s Parse, don’t validate had a huge impact on how I write code, particularly my stance toward Python type annotations in production. However, as someone who has written practically zero Haskell, the idea didn’t click for me until I started seeing other examples, like this one in Rust. This post explores how we can take this paradigm and apply it to a simple date-parsing problem in C++98, C++11, C++17, and finally C++23.

If you haven’t read the original essay, it is a great explainer for a genuinely tricky topic. Here’s my attempt at summarizing the key idea:

Use your language’s type system to parse unstructured inputs. A successful instantiation of that type should mean the data is valid, and you eliminate the need for validation control flow down the line.

I’ve revised this summary a few times and I still don’t love it. That’s likely a mix of skill issue and the fact that the idea is very hard to stuff into the small box of two sentences without leaning on jargon, which I’m still doing here.

Our Worked Example: Parsing a date

Timestamp and date parsing are notoriously riddled with edge cases. Much like validating email addresses, that makes it a perfect example of a problem where a programmer has to make a contextually optimal tradeoff between correctness and readability.

In this case, let’s imagine we’re building something that consumes unstructured data from a file and needs to extract personally identifiable information. For the purposes of this post, I’ll be ingesting a raw string (called file_text in most of the snippets below) that needs to be used by the rest of the code.

Starting point

Let’s start with a dead-simple implementation that is mostly apathetic to the idea of type-driven design.

See it in Compiler Explorer

#include <cstdio>
struct Birthdate { int year, month, day; };

// perhaps if it were more C-like we'd return a status code
// to make everyone upset i settled on this
Birthdate make_birthdate(const char* user_input) {
    Birthdate b = {0, 0, 0};
    std::sscanf(user_input, "%d-%d-%d", &b.year, &b.month, &b.day); 
    return b;
}

int main() {
    // imagine this is the text we're reading from a file
    const char* file_text = "2026-04-17"; 
    Birthdate b = make_birthdate(file_text);
    std::printf("Year: %d Month: %d Day: %d\n", b.year, b.month, b.day);
    return 0;
}

You can change quite a few characters in file_text and the program won’t crash: Clang will happily compile it, and the OS will happily report that the program exited successfully. Imagine we’re handling input coming from an OCR pipeline and it was instead something like 2O26-04-I7. sscanf reads the 2, trips over the letter O, and gives up, leaving month and day at their initial zeros. We’d get the following output from printf:

Year: 2 Month: 0 Day: 0
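The obvious first patch is to check sscanf’s return value, which is the number of fields it successfully converted. The minimal sketch below uses make_birthdate_checked, a hypothetical name. It shows that this check is necessary but not sufficient. It catches the OCR garbage, but a month of 13, or even -4, still sails through.

```cpp
#include <cstdio>

struct Birthdate { int year, month, day; };

// Hypothetical variant of make_birthdate that at least checks how many
// fields sscanf managed to convert before handing the struct back.
bool make_birthdate_checked(const char* user_input, Birthdate& out) {
    Birthdate b = {0, 0, 0};
    // sscanf returns the number of successful conversions (or EOF on early
    // input failure), so anything other than 3 means a field was missing.
    if (std::sscanf(user_input, "%d-%d-%d", &b.year, &b.month, &b.day) != 3) {
        return false;
    }
    out = b;
    return true;
}

// make_birthdate_checked("2O26-04-I7", b)  -> false: only the year (2) converts
// make_birthdate_checked("2026-13-45", b)  -> true:  month 13, day 45
// make_birthdate_checked("2026--4-17", b)  -> true:  %d happily reads "-4"
```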

This code is problematic because we are kicking the verification can down the road. The program accepts that faulty input just fine, returns a struct full of data no one wants, and creates headaches for everyone who calls make_birthdate, despite “working”. Say we later need to implement a User class with a Birthdate member field and a User::getAge() method. Does the User constructor validate the Birthdate fields? Does getAge()? Both of these are likely bad ideas. We could probably ask an LLM of our choosing to dump a bucket of if-statements into make_birthdate() and make it much more robust, but likely at the expense of readability.
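Here is a rough sketch of what that looks like when the checks leak downstream. User, getAge, and the reference-date parameters are hypothetical and follow the paragraph above. The point is that every method touching the raw struct has to repeat some version of these checks. This one still isn’t leap-year aware.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

struct Birthdate { int year, month, day; };  // the unchecked struct from above

// Hypothetical User type. Because Birthdate can hold anything, the
// validation that make_birthdate() skipped ends up scattered in here.
class User {
public:
    User(std::string name, Birthdate b) : name_(std::move(name)), birthdate_(b) {}

    // Age in whole years on the given reference date.
    int getAge(int ref_year, int ref_month, int ref_day) const {
        // Leaked validation: every other method touching birthdate_ needs it too.
        if (birthdate_.month < 1 || birthdate_.month > 12)
            throw std::invalid_argument("bad month");
        if (birthdate_.day < 1 || birthdate_.day > 31)
            throw std::invalid_argument("bad day");

        int age = ref_year - birthdate_.year;
        // Subtract one if the birthday hasn't happened yet on the reference date.
        if (ref_month < birthdate_.month ||
            (ref_month == birthdate_.month && ref_day < birthdate_.day)) {
            --age;
        }
        return age;
    }

private:
    std::string name_;
    Birthdate birthdate_;
};
```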

First pass: C++98

Instead of doing any of the above, let’s make a Birthdate type that enforces some sane constraints. A valid birthdate should:

Have an integer month value between 1 and 12 inclusive
Have a four-digit year value between 1900 and 9999
Have a day value between 1 and 31 inclusive, where the upper bound is based on month and year to account for leap years
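The day bound is the only non-obvious arithmetic in that list. As a worked example, here is a standalone C++17 sketch that is separate from the versions below. It encodes the Gregorian leap-year rule and the resulting days-in-month table, and checks a few cases entirely at compile time.

```cpp
// Gregorian leap-year rule: divisible by 4, except centuries,
// except centuries that are divisible by 400.
constexpr bool is_leap(int y) {
    return (y % 400 == 0) || (y % 4 == 0 && y % 100 != 0);
}

// Upper bound for the day field, given a month already known to be 1..12.
constexpr int days_in_month(int y, int m) {
    constexpr int dim[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    return (m == 2 && is_leap(y)) ? 29 : dim[m - 1];
}

static_assert(days_in_month(1900, 2) == 28, "1900 is a century, not a leap year");
static_assert(days_in_month(2000, 2) == 29, "2000 is divisible by 400");
static_assert(days_in_month(2024, 2) == 29, "ordinary leap year");
static_assert(days_in_month(2026, 4) == 30, "April has 30 days");
```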

If you’re bothering with C++98 in this day and age, you’re very likely running in an embedded environment where heap allocations are few and far between and exceptions can’t be used. With that in mind, this version makes a few changes:

sscanf replaced with our own parsing
a private constructor only used for setting fields
key logic moved out into static member functions, signaling that it touches no instance state

See it in Compiler Explorer

#include <cstdio>
#include <cstring>

enum ParseStatus {
    PARSE_OK = 0,
    PARSE_NULL_INPUT,
    PARSE_BAD_FORMAT,
    PARSE_YEAR_RANGE,
    PARSE_MONTH_RANGE,
    PARSE_DAY_RANGE
};

namespace {
    const unsigned char kDaysInMonth[12] = {
        31,28,31,30,31,30,31,31,30,31,30,31
    };
}

class Birthdate {
public:
    
    // There are a few ways to let API callers bring their own
    // memory, as they would in a no-malloc environment; this
    // stack-friendly factory function is a stand-in for that.
    static Birthdate epoch() { return Birthdate(1900, 1, 1); }

    unsigned short year()  const { return y_; }
    unsigned char  month() const { return m_; }
    unsigned char  day()   const { return d_; }

    static ParseStatus parse_iso_yyyy_mm_dd(const char* s, std::size_t s_len, Birthdate& out) {
        if (!s) return PARSE_NULL_INPUT;
        if (s_len != 10) return PARSE_BAD_FORMAT;   
        if (s[4] != '-' || s[7] != '-') return PARSE_BAD_FORMAT;

        unsigned int y = 0, m = 0, d = 0;
        if (!parse4(s, y) || !parse2(s + 5, m) || !parse2(s + 8, d)) {
            return PARSE_BAD_FORMAT;
        }

        if (y < 1900U || y > 9999U) return PARSE_YEAR_RANGE;
        if (m < 1U || m > 12U)      return PARSE_MONTH_RANGE;

        unsigned int max_day = kDaysInMonth[m - 1U];
        if (m == 2U && is_leap((unsigned short)y)) max_day = 29U;
        if (d < 1U || d > max_day)  return PARSE_DAY_RANGE;

        out = Birthdate((unsigned short)y, (unsigned char)m, (unsigned char)d);
        return PARSE_OK;
    }

private:
    Birthdate(unsigned short y, unsigned char m, unsigned char d)
        : y_(y), m_(m), d_(d) {}

    static bool is_digit(char c) { return c >= '0' && c <= '9'; }

    static bool parse2(const char* p, unsigned int& out) {
        if (!is_digit(p[0]) || !is_digit(p[1])) return false;
        out = (unsigned int)(p[0] - '0') * 10U + (unsigned int)(p[1] - '0');
        return true;
    }

    static bool parse4(const char* p, unsigned int& out) {
        if (!is_digit(p[0]) || !is_digit(p[1]) || !is_digit(p[2]) || !is_digit(p[3])) return false;
        out = (unsigned int)(p[0] - '0') * 1000U
            + (unsigned int)(p[1] - '0') * 100U
            + (unsigned int)(p[2] - '0') * 10U
            + (unsigned int)(p[3] - '0');
        return true;
    }

    static bool is_leap(unsigned short y) {
        return (y % 400U == 0U) || ((y % 4U == 0U) && (y % 100U != 0U));
    }

    unsigned short y_;
    unsigned char  m_;
    unsigned char  d_;
};


int main() {
    const char* file_text = "2026-04-17";
    Birthdate b = Birthdate::epoch();
    ParseStatus status = Birthdate::parse_iso_yyyy_mm_dd(file_text, 
        std::strlen(file_text), // we do a little stdlib cheating 
        b);
    if (status == PARSE_OK) {
        std::printf("Parsed: %u-%u-%u\n", 
            (unsigned)b.year(), 
            (unsigned)b.month(), 
            (unsigned)b.day());
    } else {
        std::printf("Parse failed: %d\n", (int)status);
    }
    return 0;
}

Giving the API caller more control over memory while keeping the class’s internal state “locked down” was a welcome exercise in API design. This version still has no exceptions, no heap allocations, and no post-hoc validation branches elsewhere. Birthdate values entering the rest of the program are already parsed and known-good. For further reading on exceptions, I would recommend:

Round 2: C++11

Hopefully, by now, this is the C++ feature set that is mostly taught in schools, disgusting std::vector<bool> warts included. For this code snippet, I can offload the date parsing to std::get_time! We rolled it ourselves once (without covering every edge case), and delegating makes our examples much shorter. One caveat: std::get_time enforces the YYYY-MM-DD shape, but it isn’t guaranteed to reject an impossible date like 2026-02-31, so the constructor still has to own the calendar rules. Again, the real point is to have the class only contain valid states, and to use the construction code as a hard boundary for parsing logic. If code in our codebase consuming this type breaks, it shouldn’t be breaking due to the contents of the class instance at all. Style notes:

I think the istringstream is a better stand-in for raw input, but I never liked working with streams. fmt was a clear improvement for most string manipulation.
It felt more idiomatic to have a “boring” public constructor.
Outside of the hand-rolled code that got replaced with standard library calls, the key sequence of events doesn’t feel that different from C++98.

See it in Compiler Explorer

#include <ctime>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <stdexcept>

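class Birthdate {
public:
    // Construction is the parsing boundary: if this constructor returns,
    // the date satisfies the same invariants as the C++98 version.
    Birthdate(int y, int m, int d) : y_(y), m_(m), d_(d) {
        if (y_ < 1900 || y_ > 9999) throw std::invalid_argument("year must be 1900..9999");
        if (m_ < 1 || m_ > 12) throw std::invalid_argument("month must be 1..12");
        if (d_ < 1 || d_ > days_in_month(y_, m_)) throw std::invalid_argument("day out of range for month/year");
    }

    int year() const { return y_; }
    int month() const { return m_; }
    int day() const { return d_; }

private:
    static bool is_leap(int y) {
        return (y % 400 == 0) || ((y % 4 == 0) && (y % 100 != 0));
    }

    // std::get_time only range-checks %d as 1..31, so the per-month bound lives here.
    static int days_in_month(int y, int m) {
        static const int dim[12] = {31,28,31,30,31,30,31,31,30,31,30,31};
        return (m == 2 && is_leap(y)) ? 29 : dim[m - 1];
    }

    int y_, m_, d_;
};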

Birthdate parse_birthdate(std::istream& in) {
    std::tm t = {};
    in >> std::get_time(&t, "%Y-%m-%d");
    if (in.fail()) throw std::invalid_argument("expected YYYY-MM-DD");
    in >> std::ws;
    if (!in.eof()) throw std::invalid_argument("trailing characters");
    return Birthdate(t.tm_year + 1900, t.tm_mon + 1, t.tm_mday);
}

int main() {
    try {
        std::istringstream ss("2026-04-17");
        Birthdate b = parse_birthdate(ss);
        std::cout << b.year() << "-" << b.month() << "-" << b.day() << "\n";
    } catch (const std::exception& e) {
        std::cerr << "Parse failed: " << e.what() << "\n";
    }
}
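To see the payoff, here is a hypothetical downstream consumer. The Birthdate below is a trimmed copy of the C++11 class above, with the same constructor checks and the parsing function omitted. User and getAge are made-up names. The only way to get a Birthdate is through that throwing constructor, so getAge contains no validation at all.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Trimmed copy of the C++11 Birthdate: if the constructor returns, the date is valid.
class Birthdate {
public:
    Birthdate(int y, int m, int d) : y_(y), m_(m), d_(d) {
        if (y_ < 1900 || y_ > 9999) throw std::invalid_argument("year must be 1900..9999");
        if (m_ < 1 || m_ > 12) throw std::invalid_argument("month must be 1..12");
        if (d_ < 1 || d_ > days_in_month(y_, m_)) throw std::invalid_argument("day out of range");
    }
    int year() const { return y_; }
    int month() const { return m_; }
    int day() const { return d_; }

private:
    static int days_in_month(int y, int m) {
        static const int dim[12] = {31,28,31,30,31,30,31,31,30,31,30,31};
        const bool leap = (y % 400 == 0) || ((y % 4 == 0) && (y % 100 != 0));
        return (m == 2 && leap) ? 29 : dim[m - 1];
    }
    int y_, m_, d_;
};

// Hypothetical consumer. There is no validation here: a User can only be
// built from a Birthdate, and a Birthdate can only exist if it is valid.
class User {
public:
    User(std::string name, Birthdate b) : name_(std::move(name)), birthdate_(b) {}

    // Age in whole years on the given reference date.
    int getAge(int ref_year, int ref_month, int ref_day) const {
        int age = ref_year - birthdate_.year();
        if (ref_month < birthdate_.month() ||
            (ref_month == birthdate_.month() && ref_day < birthdate_.day())) {
            --age;  // birthday hasn't happened yet this year
        }
        return age;
    }

private:
    std::string name_;
    Birthdate birthdate_;
};
```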

Round 3: C++17

I imagine many engineers working with C++ in production environments have access to C++17. There are likely some notable exceptions (no pun intended), but this version stands out because std::optional gives us a standard-library way to make failure explicit without throwing. Personally, I find this much nicer, since I don’t love reasoning about control flow that sneaks past library boundaries the way exceptions often do.

I’m also choosing to sneak manual parsing back into our example here since it lets us swap out string streams for std::string_view. The constructor is private again, which is a little less “default,” but you’d probably want a tight grip on construction anyway if you’re going down this path.

See it in Compiler Explorer

#include <array>
#include <iostream>
#include <optional>
#include <string>
#include <string_view>

class Birthdate {
public:
    static std::optional<Birthdate> parse(const std::string_view s) {
        if (s.size() != 10 || s[4] != '-' || s[7] != '-') return std::nullopt;

        const auto y = parse_n_digits(s, 0, 4);
        const auto m = parse_n_digits(s, 5, 2);
        const auto d = parse_n_digits(s, 8, 2);
        if (!y || !m || !d) return std::nullopt;

        return from_ymd(*y, *m, *d);
    }

    static std::optional<Birthdate> from_ymd(int y, int m, int d) {
        if (y < 1900 || y > 9999) return std::nullopt;
        if (m < 1 || m > 12) return std::nullopt;
        if (d < 1 || d > days_in_month(y, m)) return std::nullopt;
        return Birthdate(y, m, d);
    }

    int year()  const noexcept { return y_; }
    int month() const noexcept { return m_; }
    int day()   const noexcept { return d_; }

private:
    Birthdate(int y, int m, int d) : y_(y), m_(m), d_(d) {}

    static std::optional<int> parse_n_digits(std::string_view s, std::size_t pos, std::size_t n) {
        int v = 0;
        for (std::size_t i = 0; i < n; ++i) {
            const char c = s[pos + i];
            if (c < '0' || c > '9') return std::nullopt;
            v = v * 10 + (c - '0');
        }
        return v;
    }

    static bool is_leap(int y) noexcept {
        return (y % 400 == 0) || ((y % 4 == 0) && (y % 100 != 0));
    }

    static int days_in_month(int y, int m) noexcept {
        static const std::array<int, 12> dim = {31,28,31,30,31,30,31,31,30,31,30,31};
        return (m == 2 && is_leap(y)) ? 29 : dim[m - 1];
    }

    int y_;
    int m_;
    int d_;
};

int main() {
    std::string file_input = "2026-04-17";
    if (std::optional<Birthdate> b = Birthdate::parse(file_input)) {
        std::cout << b->year() << "-" << b->month() << "-" << b->day() << "\n";
    } else {
        std::cout << "Parse failed\n";
    }
}
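The hand-rolled digit loop above is perfectly readable. C++17 also added std::from_chars in <charconv>, a non-throwing, non-allocating, locale-independent integer parser. Below is a minimal sketch of a free-function replacement for parse_n_digits. It assumes we still want exactly n digits with no sign, which is an assumption chosen to match the YYYY-MM-DD format.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical free-function take on parse_n_digits built on std::from_chars.
// Parses exactly s[pos, pos + n) as a non-negative decimal number.
std::optional<int> parse_n_digits(std::string_view s, std::size_t pos, std::size_t n) {
    if (pos > s.size() || n > s.size() - pos) return std::nullopt;  // field out of bounds
    const std::string_view field = s.substr(pos, n);
    const char* first = field.data();
    const char* last = field.data() + field.size();

    // For unsigned types, from_chars accepts no sign and skips no whitespace,
    // so fields like "-4", "+4", and " 4" are all rejected.
    unsigned value = 0;
    const auto [ptr, ec] = std::from_chars(first, last, value);

    // Reject conversion errors and partial parses like "2O26" (stops at 'O').
    if (ec != std::errc{} || ptr != last) return std::nullopt;
    return static_cast<int>(value);
}
```

For example, parse_n_digits("2026-04-17", 0, 4) yields 2026. The "-4" field in "2026--4-17" comes back empty.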

Round 4: C++23

I haven’t written much C++23 (or anything later), so please forgive any non-idiomatic code. For the sake of length and context, covering both C++20 and C++23 didn’t make much sense for this article.

The most welcome change for our parser is std::expected, which lets us forward rich error information to our users. We could just roll a jump table of some sort that does roughly the same thing, but it’s neat to have it baked right into the type.

See it in Compiler Explorer

#include <expected>
#include <iostream>
#include <string_view>

enum class ParseError {
    BadFormat,
    YearDigits,
    MonthDigits,
    DayDigits,
    YearRange,
    MonthRange,
    DayRange
};

// could probably simplify this with an enum reflection library like magic_enum
const char* to_string(ParseError e) {
    switch (e) {
        case ParseError::BadFormat:  return "expected YYYY-MM-DD";
        case ParseError::YearDigits: return "invalid year digits";
        case ParseError::MonthDigits:return "invalid month digits";
        case ParseError::DayDigits:  return "invalid day digits";
        case ParseError::YearRange:  return "year must be 1900..9999";
        case ParseError::MonthRange: return "month must be 1..12";
        case ParseError::DayRange:   return "day out of range for month/year";
    }
    return "unknown parse error";
}

class Birthdate {
public:
    static std::expected<Birthdate, ParseError> parse(std::string_view s) {
        if (s.size() != 10 || s[4] != '-' || s[7] != '-') {
            return std::unexpected(ParseError::BadFormat);
        }

        std::expected<int, ParseError> y = parse_n_digits(s, 0, 4, ParseError::YearDigits);
        if (!y) return std::unexpected(y.error());

        std::expected<int, ParseError> m = parse_n_digits(s, 5, 2, ParseError::MonthDigits);
        if (!m) return std::unexpected(m.error());

        std::expected<int, ParseError> d = parse_n_digits(s, 8, 2, ParseError::DayDigits);
        if (!d) return std::unexpected(d.error());

        if (*y < 1900 || *y > 9999) return std::unexpected(ParseError::YearRange);
        if (*m < 1 || *m > 12)      return std::unexpected(ParseError::MonthRange);

        const int maxd = days_in_month(*y, *m);
        if (*d < 1 || *d > maxd)    return std::unexpected(ParseError::DayRange);

        return Birthdate(*y, *m, *d);
    }

    int year()  const noexcept { return y_; }
    int month() const noexcept { return m_; }
    int day()   const noexcept { return d_; }

private:
    Birthdate(int y, int m, int d) : y_(y), m_(m), d_(d) {}

    static std::expected<int, ParseError>
    parse_n_digits(std::string_view s, std::size_t pos, std::size_t n, ParseError on_error) {
        int value = 0;
        for (std::size_t i = 0; i < n; ++i) {
            const char c = s[pos + i];
            if (c < '0' || c > '9') return std::unexpected(on_error);
            value = value * 10 + (c - '0');
        }
        return value;
    }

    static bool is_leap(int y) noexcept {
        return (y % 400 == 0) || ((y % 4 == 0) && (y % 100 != 0));
    }

    static int days_in_month(int y, int m) noexcept {
        static constexpr int dim[12] = {31,28,31,30,31,30,31,31,30,31,30,31};
        return (m == 2 && is_leap(y)) ? 29 : dim[m - 1];
    }

    int y_, m_, d_;
};

int main() {
    
    if (auto result = Birthdate::parse("2026-04-17"); result) {
        std::cout << result->year() << "-" << result->month() << "-" << result->day() << "\n";
    } else {
        std::cout << "Parse failed: " << to_string(result.error()) << "\n";
    }
}
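If you’re stuck before C++23, std::variant (C++17) can play a similar role. It just lacks std::expected’s conveniences like operator-> and error(). Below is a rough sketch that uses a hypothetical YearMonth type and a cut-down ParseError to keep the example short. It is not a drop-in replacement for the Birthdate above.

```cpp
#include <cstddef>
#include <string_view>
#include <variant>

// Cut-down error enum for this sketch.
enum class ParseError { BadFormat, MonthRange };

// Hypothetical value type, just enough to show the pattern.
struct YearMonth { int year; int month; };

// Pre-C++23 stand-in for std::expected<YearMonth, ParseError>:
// the variant holds either the parsed value or the reason parsing failed.
using YearMonthResult = std::variant<YearMonth, ParseError>;

// Parses "YYYY-MM"; any non-digit in a digit position is a BadFormat.
YearMonthResult parse_year_month(std::string_view s) {
    if (s.size() != 7 || s[4] != '-') return ParseError::BadFormat;
    int year = 0, month = 0;
    for (std::size_t i = 0; i < 7; ++i) {
        if (i == 4) continue;  // skip the separator
        const char c = s[i];
        if (c < '0' || c > '9') return ParseError::BadFormat;
        if (i < 4) year = year * 10 + (c - '0');
        else       month = month * 10 + (c - '0');
    }
    if (month < 1 || month > 12) return ParseError::MonthRange;
    return YearMonth{year, month};
}

// Callers inspect the alternative instead of calling has_value()/error():
//   auto r = parse_year_month("2026-13");
//   if (const ParseError* e = std::get_if<ParseError>(&r)) { /* report *e */ }
```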

I was curious, so I did some quick benchmarking of the compile times for each of these snippets:

Command	Mean [ms]	Min [ms]	Max [ms]	Relative to initial approach (lower is better)
Initial approach (C++98)	20.6 ± 0.6	19.9	21.9	1.00
C++98	23.6 ± 0.7	22.8	25.4	1.15 ± 0.05
C++11	189.1 ± 7.6	183.1	215.8	9.18 ± 0.44
C++17	217.6 ± 4.5	211.7	231.6	10.57 ± 0.36
C++23	974.7 ± 28.8	943.4	1047.4	47.32 ± 1.89

Compilation results were collected with hyperfine using the following config:

Clang 22.1.3
AMD 3700X
16GB DDR4
Command tested: $(CXX) --std=$(CXXVERSION) file.cpp, with two warmup runs and 20 timed runs per command

Conclusion

This was a fun exercise inspired by a real problem I encountered at work. When a dev team spends a lot of time in “moving fast” mode, it’s easy to accumulate interfaces that are unnecessarily ambiguous about the data passing through them: class members that could potentially be null, or maps/dictionaries that may or may not have certain keys. Lo and behold, something breaks downstream and we have to patch it. Some languages, like Rust and Haskell, make it very easy to lean on an advanced type system to enforce invariants like this. Other languages, like Python, are exceptionally loose with type constraints by default, but you can still specify detailed interfaces with libraries like pydantic. As these toy C++ snippets show, and as you’ll see in real production environments, many languages live in the gray area in between.

I used multiple LLMs to generate drafts of all of these code snippets, and not a single one-shot prompt for a simple date parser in a given dialect made it into this draft without heavy revisions. While drafting code snippets, the models would:

Hand me code that just didn’t compile (admittedly rare).
Agree to both delegate to stdlib functions and roll our own string parsing within the same thread.
Happily accept bad advice and refactor concise, perfectly serviceable code into overly verbose nonsense. The only models I have used that take strong stances on their architectural decisions are GPT 5.2 Codex and, to a lesser extent, GPT 5.4.
Fail to give me a Birthdate type with any level of self-verification unless I explicitly asked for it. I frequently have to bring best practices to the LLM and enforce them myself, and I am using these models less and less as time goes on, not more.

If this post gets traction, I might write a follow-up where I elaborate on these examples and tough edge cases in a more production-like scenario, specifically placing emphasis on downstream usage and places where we’re cleaning up existing ugly code.