
Regex Constraints for the ASCII Pattern and the ISIL Pattern Produce False Positives

🐛 Bug Report

Summary

Multiple regular expressions used in constraints produce false positives. This affects:

^[\x21-\x7E]+$ (intended to allow only visible ASCII characters without whitespace)

(?:https?://[^/]+/organis[sz]ations/)?[Dd]Ee?-\d{6} (used for validating ISILs in ISO format, optionally with a URI prefix)

Steps to Reproduce

ASCII-Pattern

Define the constraints "For each Resource Wrapper all Resource IDs do match REGEX" and "For each Record Wrapper all Record Identifier do match REGEX" with the regex: ^[\x21-\x7E]+$

Provide a Resource ID that includes a space or a non-ASCII character. Repeat the same for a Record Identifier.

Run Quality Analysis and observe that no error is raised.
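The ASCII reproduction can be mirrored outside the tool with a short Python sketch. It assumes that a correct evaluation requires the whole value to match (Python's `re.fullmatch`); the helper name `is_valid_id` and the sample IDs are hypothetical and not taken from the affected data.

```python
import re

# The pattern used in both constraints, copied unchanged from this issue.
ASCII_PATTERN = re.compile(r"^[\x21-\x7E]+$")


def is_valid_id(value: str) -> bool:
    """Return True only if the whole value consists of visible ASCII (0x21-0x7E)."""
    # fullmatch requires the entire string to match, not just a prefix.
    return ASCII_PATTERN.fullmatch(value) is not None


# Hypothetical Resource IDs / Record Identifiers for the reproduction steps.
# Expected: only 'res-001' passes.
for value in ["RES 001", "Schäfer-01", "res-001"]:
    print(f"{value!r}: {'pass' if is_valid_id(value) else 'error expected'}")
```

Under this reading, both values with a space or an umlaut should raise an error in the Quality Analysis.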

ISIL

Define a constraint with the regex: (?:https?://[^/]+/organis[sz]ations/)?[Dd]Ee?-\d{6}

Provide a value such as:

- DE123456 (missing hyphen)
- DE-12345 (only 5 digits)
- de-abc-123456 (invalid org code)
- http://example.org/organizations/DE123456 (malformed URI + invalid ISIL)

Run Quality Analysis and again observe that no error is raised.

What is the current bug behavior?

Both the regular expression ^[\x21-\x7E]+$, which is meant to accept only visible ASCII characters without whitespace, and the ISIL expression (?:https?://[^/]+/organis[sz]ations/)?[Dd]Ee?-\d{6} produce false positives.

ASCII-Pattern:

"For each Resource Wrapper all Resource IDs do match REGEX"

"For each Record Wrapper all Record Identifier do match REGEX"

IDs that clearly do not match the pattern still pass the quality check, leading to false positives.

The system accepts values that do not match the regular expression ^[\x21-\x7E]+$. The pattern should reject any character outside the visible ASCII range 0x21 to 0x7E; because that range starts above the space character (0x20), whitespace is excluded as well.
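One way a value can slip past ^[\x21-\x7E]+$ depends on how the engine applies the anchors. In Python's `re` module, for example, `$` also matches just before a trailing newline, so a prefix-anchored match accepts an ID that ends in a line break, while a whole-string match does not. Whether the Quality Analysis engine behaves like this is not known from this issue; the sketch only illustrates the pitfall, and the sample value is hypothetical.

```python
import re

ASCII_PATTERN = r"^[\x21-\x7E]+$"
value = "RES-001\n"  # hypothetical ID with a trailing newline

# re.match anchors at the start; "$" may match right before the final "\n".
loose = re.match(ASCII_PATTERN, value) is not None

# re.fullmatch (or "\Z" instead of "$") requires the match to reach the true end.
strict = re.fullmatch(ASCII_PATTERN, value) is not None
strict_z = re.match(r"^[\x21-\x7E]+\Z", value) is not None

print(loose, strict, strict_z)  # True False False
```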

ISIL Pattern: "For each Wrapper for an object record all Legal Body Identifier do match (?:https?://[^/]+/organis[sz]ations/)?[Dd]Ee?-\d{6}" shows the same problem: non-matching values pass.
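Unlike the ASCII pattern, the ISIL pattern has no ^ and $ anchors. If an engine only checks whether the pattern occurs somewhere in the value (search semantics) instead of whether the whole value matches, extra characters around a valid-looking ISIL go unnoticed. The Python sketch below contrasts both modes; which mode the Quality Analysis uses is an open question that this issue does not settle, and the sample values are hypothetical. It also shows that, as written, [Dd]Ee? requires an uppercase E, so a lowercase "de-123456" is rejected in either mode.

```python
import re

ISIL_PATTERN = r"(?:https?://[^/]+/organis[sz]ations/)?[Dd]Ee?-\d{6}"

# Hypothetical values: the first two contain a valid-looking ISIL inside extra text.
values = ["XDE-1234567", "see DE-123456 here", "DE-123456", "de-123456"]

for value in values:
    found = re.search(ISIL_PATTERN, value) is not None     # substring semantics
    whole = re.fullmatch(ISIL_PATTERN, value) is not None  # whole-value semantics
    print(f"{value!r}: search={found}, fullmatch={whole}")
```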

What is the expected correct behavior?

Only strings that consist exclusively of visible ASCII characters (no whitespace, no umlauts, no control characters) should pass the quality check.
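The expected ASCII behavior can be cross-checked without a regex engine: a value passes if it is non-empty and every character's code point lies between 0x21 and 0x7E. The Python sketch below compares that reference check with a whole-value match of the pattern. The function name `visible_ascii_only` and the test values are hypothetical, and treating an empty value as invalid is an assumption based on the + quantifier.

```python
import re

ASCII_PATTERN = re.compile(r"[\x21-\x7E]+")


def visible_ascii_only(value: str) -> bool:
    """Reference check without regex: non-empty, every code point in 0x21-0x7E."""
    return bool(value) and all(0x21 <= ord(ch) <= 0x7E for ch in value)


# Hypothetical test values covering the cases named in the expected behavior.
cases = [
    "ok-ID_01",      # visible ASCII only: should pass
    "with space",    # space (0x20)
    "tab\tchar",     # control character (0x09)
    "nbsp\u00a0id",  # no-break space (U+00A0), non-ASCII whitespace
    "Schäfer",       # umlaut
    "",              # empty value
]

for value in cases:
    by_regex = ASCII_PATTERN.fullmatch(value) is not None
    # Both checks should always agree; a mismatch would point at the engine.
    assert by_regex == visible_ascii_only(value), value
    print(f"{value!r}: {'pass' if by_regex else 'reject'}")
```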

Only strings that match the ISIL regex should pass the quality check.
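Strict evaluation of the ISIL regex also depends on what \d means in the engine. In Python's `re` module, for example, \d in a str pattern matches any Unicode decimal digit unless the `re.ASCII` flag is set, so a value with non-ASCII digits satisfies the pattern. Whether the tool's engine does the same is an assumption to verify; the sample value is hypothetical.

```python
import re

ISIL_PATTERN = r"(?:https?://[^/]+/organis[sz]ations/)?[Dd]Ee?-\d{6}"

# Hypothetical value using Arabic-Indic digits U+0661..U+0666 instead of 1..6.
value = "DE-\u0661\u0662\u0663\u0664\u0665\u0666"

unicode_digits = re.fullmatch(ISIL_PATTERN, value) is not None
ascii_digits = re.fullmatch(ISIL_PATTERN, value, flags=re.ASCII) is not None
explicit_range = re.fullmatch(ISIL_PATTERN.replace(r"\d", "[0-9]"), value) is not None

print(unicode_digits, ascii_digits, explicit_range)  # True False False
```

Writing [0-9] instead of \d keeps the digit check ASCII-only regardless of engine defaults.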

Relevant Logs, Screenshots, or Gifs

[Four screenshots attached]

Possible Fix or Suggested Solution

Verify that the regex engine interprets the character range \x21-\x7E and the ISIL regex correctly and strictly, and ensure that whitespace and non-ASCII characters are actually rejected.
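One way to verify how the range \x21-\x7E is interpreted is to test it exhaustively against single characters and compare it with the same range spelled with literal characters, [!-~] ("!" is 0x21, "~" is 0x7E). The Python sketch below does this for every code point in the Basic Multilingual Plane. It exercises Python's `re` only; in the tool, the equivalent check would have to run through its own regex engine.

```python
import re

ESCAPED = re.compile(r"[\x21-\x7E]")
LITERAL = re.compile(r"[!-~]")  # same range written with the characters themselves


def accepted(pattern: re.Pattern, cp: int) -> bool:
    """True if the single character with code point cp matches the pattern."""
    return pattern.fullmatch(chr(cp)) is not None


# Every code point U+0000..U+FFFF: both forms must accept exactly 0x21..0x7E.
mismatches = [
    cp for cp in range(0x10000)
    if accepted(ESCAPED, cp) != (0x21 <= cp <= 0x7E)
    or accepted(LITERAL, cp) != (0x21 <= cp <= 0x7E)
]
print("mismatches:", [hex(cp) for cp in mismatches])  # mismatches: []
```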

Additional Context

The problem may also affect other regex-based constraints if the engine does not interpret character ranges properly.

Edited by Domenic Schäfer