Runtime validation provides data-level safety.

  • It checks actual incoming data against manually defined rules while the program runs
  • It enforces data shape, constraints, and business rules
  • It catches bad data entering the system from untrusted sources such as user input, API payloads, database records, and files (CSV, JSON)
  • It is typically applied at system boundaries, where that data first enters
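
The three kinds of checks listed above can be sketched by hand with the standard library alone. This is only an illustration: the Signup record, the validate_signup function, the age range, and the "adults only" rule are all hypothetical and are not taken from any particular system.

```python
from dataclasses import dataclass


@dataclass
class Signup:
    email: str
    age: int


def validate_signup(raw: dict) -> Signup:
    """Validate untrusted input and return a typed record (hypothetical rules)."""
    # Shape: every required key must be present.
    missing = {"email", "age"} - set(raw)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")

    email, age = raw["email"], raw["age"]

    # Types: bool is a subclass of int, so it is excluded explicitly.
    if not isinstance(email, str) or "@" not in email:
        raise ValueError(f"email is not valid: {email!r}")
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError(f"age must be an integer, got {age!r}")

    # Constraint: the value must fall in a plausible range.
    if not 0 <= age <= 150:
        raise ValueError(f"age out of range: {age}")

    # Business rule (assumed for this example): only adults may sign up.
    if age < 18:
        raise ValueError("user must be at least 18")

    return Signup(email=email, age=age)
```

Calling validate_signup({"email": "a@b.co", "age": 30}) returns a Signup instance, while any input that breaks one of the rules raises ValueError before the rest of the program sees it.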

Consider the following Python function, which performs several operations before returning its result.

def process_user(user):
    # ...
    # we do a lot of operations here
    # ...
    return user["age"] + 10

Suppose the input data comes from an external system (such as a database or an API):

data = {"age": "twenty"}

If we pass this data straight to the function, Python raises a TypeError, because the string "twenty" cannot be added to the integer 10.

user = data
 
process_user(user)  # TypeError

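The program crashes only at the last line of the function, after all the earlier operations have already run. This is a consequence of dynamic type checking: Python checks types only when an operation is actually executed.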
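
To make the late failure visible, here is a small sketch that records how much work ran before the crash and where the traceback ends. The steps_done list and the two placeholder steps are assumptions standing in for the elided operations inside process_user.

```python
import traceback

steps_done = []  # records the work performed before the failure


def process_user(user):
    steps_done.append("load profile")    # hypothetical stand-in for earlier work
    steps_done.append("compute report")  # hypothetical stand-in for earlier work
    return user["age"] + 10              # the bad value is only touched here


try:
    process_user({"age": "twenty"})
except TypeError as exc:
    # The deepest traceback frame is inside process_user, not where the data came in.
    failing_frame = traceback.extract_tb(exc.__traceback__)[-1].name
    print(f"failed in {failing_frame} after {len(steps_done)} steps")
```

Both placeholder steps complete before the exception is raised, and the traceback names process_user rather than the place where the bad data entered.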

This can be avoided by validating at the input boundary. Suppose we define a Pydantic User model, validate the input against it, and only then call the function:

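from pydantic import BaseModel

class User(BaseModel):
    age: int

# Raises ValidationError here because "twenty" is not a valid integer,
# so the call below is never reached with bad data.
user = User(**data)  # ValidationError

# process_user indexes its argument like a dict, so convert the model back.
process_user(user.model_dump())  # Pydantic v2; use user.dict() on v1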

Validation stops the program at the data boundary, before process_user is ever called, so we neither waste compute nor fail deep inside the system.

Without validation, the error surfaces deep in the system, the stack trace points at the line that happened to touch the bad value, and the root cause (the input) is far away. With validation, the error is localized: you know exactly which input, which field, and what failed.
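
As a rough illustration of that localization, the sketch below catches Pydantic's ValidationError and reads its errors() list. It assumes Pydantic is installed. The exact message text differs between Pydantic versions, so only the "loc" and "msg" keys are used here.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    age: int


try:
    User(**{"age": "twenty"})
except ValidationError as exc:
    first_error = exc.errors()[0]
    for err in exc.errors():
        # "loc" is the path to the offending field; "msg" describes what failed.
        print(err["loc"], err["msg"])
```

Each entry names the field that failed, so the report points at the input itself rather than at whatever code touched it later.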