Module: Regular Expressions (reex)

Regular expressions manipulation

Regular expression classes and manipulation

Classes

RegularExpression

class RegularExpression[source]

Abstract base class for all regular expression objects

RegExp

class RegExp(sigma=None)[source]

Base class for regular expressions.

Variables

Sigma – alphabet set of strings

Inheritance diagram of RegExp
abstract static alphabeticLength()[source]

Number of occurrences of alphabet symbols in the regular expression.

Return type

integer

Attention

Doesn’t include the empty word.

compare(r, cmp_method='compareMinimalDFA', nfa_method='nfaPD')[source]

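Compare with another regular expression for equivalence.

Parameters
  • r – the regular expression to compare with

  • cmp_method – name of the comparison method

  • nfa_method – name of the NFA construction method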

compareMinimalDFA(r, nfa_method='nfaPosition')[source]

Compare with another regular expression for equivalence through minimal DFAs.

Parameters
  • r – the regular expression to compare with

  • nfa_method – name of the NFA construction method

dfaAuPoint()[source]

DFA “au-point” according to Nipkow

Returns

“au-point” DFA

Return type

fa.DFA

See also

Andrea Asperti, Claudio Sacerdoti Coen and Enrico Tassi, Regular Expressions, au point. arXiv 2010

See also

Tobias Nipkow and Dmitriy Traytel, Unified Decision Procedures for Regular Expression Equivalence

dfaBrzozowski(memo=None)[source]

Word derivatives automaton of the regular expression

Returns

word derivatives automaton

Return type

DFA

See also

J. A. Brzozowski, Derivatives of Regular Expressions. J. ACM 11(4): 481-494 (1964)

dfaYMG()[source]

DFA Yamada-McNaughton-Glushkov according to Nipkow

Returns

Y-M-G DFA

Return type

DFA

See also

Tobias Nipkow and Dmitriy Traytel, Unified Decision Procedures for Regular Expression Equivalence

static emptysetP()[source]

Whether the regular expression is the empty set.

Return type

Boolean

abstract static epsilonLength()[source]

Number of occurrences of the empty word in the regular expression.

Return type

integer

static epsilonP()[source]

Whether the regular expression is the empty word.

Return type

Boolean

equivP(other, strict=True)[source]

Test RE equivalence with extended Hopcroft-Karp method

Parameters
  • other (RegExp) – RE

  • strict (bool) – if True checks for same alphabets

Return type

bool
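As a rough illustration of the Hopcroft-Karp idea behind this test, the sketch below checks two complete DFAs for equivalence by merging states with a union-find structure. It is not FAdo's extended method, which works on regular expressions; the DFA encoding `(initial, finals, delta)` and the name `hk_equivalent` are assumptions made for this example only.

```python
def hk_equivalent(dfa1, dfa2, sigma):
    """Classic Hopcroft-Karp equivalence test for two complete DFAs.

    Each DFA is a hypothetical triple (initial, finals, delta) where
    delta[(state, symbol)] gives the next state for every symbol in sigma.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    (i1, f1, d1), (i2, f2, d2) = dfa1, dfa2
    # States are tagged with 1 or 2 so the two state sets stay disjoint.
    parent[find((1, i1))] = find((2, i2))
    todo = [(i1, i2)]
    while todo:
        p, q = todo.pop()
        if (p in f1) != (q in f2):
            return False          # merged states disagree on acceptance
        for a in sigma:
            p2, q2 = d1[(p, a)], d2[(q, a)]
            rp, rq = find((1, p2)), find((2, q2))
            if rp != rq:
                parent[rp] = rq   # assume equivalent, verify later
                todo.append((p2, q2))
    return True


# Words over {a} with an even number of symbols, as a 2-state and a 4-state DFA.
even = (0, {0}, {(0, "a"): 1, (1, "a"): 0})
even4 = (0, {0, 2}, {(0, "a"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 0})
odd = (0, {1}, {(0, "a"): 1, (1, "a"): 0})

same = hk_equivalent(even, even4, ["a"])
different = hk_equivalent(even, odd, ["a"])
```

Each pair of states is merged at most once, so the loop runs a number of times bounded by the total number of states.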

equivalentP(other)[source]

Tests equivalence

Parameters

other

Return type

bool

evalWordP(word)[source]

Verifies if a word is a member of the language represented by the regular expression.

Parameters

word (str) – the word

Return type

bool

static ewp()[source]

Whether the empty word property holds for this regular expression’s language.

Return type

Boolean

abstract first()[source]
Return type

set

abstract last()[source]
Return type

set

abstract linearForm()[source]
Return type

dict

abstract mark()[source]

Make all atoms marked (tag False)

Return type

RegExp

marked()[source]

Regular expression in which every alphabetic symbol is marked with its Position.

The kind of regular expression returned is known, depending on the literary source, as marked, linear or restricted regular expression.

Returns

linear regular expression

Return type

RegExp

See also

R. McNaughton and H. Yamada, Regular Expressions and State Graphs for Automata, IEEE Transactions on Electronic Computers, V.9 pp:39-47, 1960

Attention

mark and unmark do not preserve the alphabet, nor do they set the new alphabet
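The idea of marking can be sketched on a small tuple-based syntax tree: each alphabetic symbol is paired with its position, counted from left to right. The tuple encoding and the function names below are assumptions for illustration and do not reflect how FAdo's Position atoms are implemented.

```python
# Hypothetical tuple-based syntax tree (not FAdo's classes):
# ("sym", a) | ("eps",) | ("cat", r, s) | ("or", r, s) | ("star", r)

def marked(r, counter=None):
    """Return a copy of r whose symbols are paired with their position."""
    if counter is None:
        counter = [0]                 # mutable counter shared by the recursion
    tag = r[0]
    if tag == "sym":
        counter[0] += 1
        return ("sym", (r[1], counter[0]))
    if tag == "eps":
        return r
    return (tag,) + tuple(marked(child, counter) for child in r[1:])


def unmarked(r):
    """Drop the positions again (expects a tree produced by marked)."""
    tag = r[0]
    if tag == "sym":
        return ("sym", r[1][0])
    if tag == "eps":
        return r
    return (tag,) + tuple(unmarked(child) for child in r[1:])


a, b = ("sym", "a"), ("sym", "b")
expr = ("cat", ("star", ("or", a, b)), a)      # (a+b)*a
linear = marked(expr)                          # (a1+b2)*a3
```

In the marked expression every symbol occurs exactly once, which is why such expressions are also called linear.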

nfaFollow()[source]

Follow NFA that accepts the regular expression’s language.

Return type

NFA

See also

Ilie & Yu (Follow Automata, 03)

nfaFollowEpsilon(trim=True)[source]

Epsilon-NFA constructed with Ilie and Yu’s method that accepts the regular expression’s language.

Parameters

trim

Returns

NFA possibly with epsilon transitions

Return type

NFAe

Note

The regular expression must be reduced

See also

Ilie & Yu, Follow automata, Inf. Comput., v. 186 (1), 140-162, 2003

nfaGlushkov()[source]

Position or Glushkov automaton of the regular expression. Recursive method.

Returns

NFA

nfaNaiveFollow()[source]

NFA that accepts the regular expression’s language, and is equal in structure to the follow automaton.

Return type

NFA

Note

Included for testing purposes.

See also

Ilie & Yu (Follow Automata, 2003)

nfaPD(pdmethod='nfaPDDAG')[source]

Computes the partial derivative automaton

Parameters

pdmethod (str) – an implementation of the PD automaton. Default value: nfaPDDAG

Returns

a PD NFA

Return type

NFA

nfaPDDAG()[source]

Returns

a PD NFA built using a DAG

Return type

NFA

See also

S. Konstantinidis, A. Machiavelo, N. Moreira, and R. Reis, Partial derivative automaton by compressing regular expressions. DCFS 2021, volume 13037 of LNCS, pages 100–112. Springer, 2022

nfaPDNaive()[source]
NFA that accepts the regular expression’s language, and which is constructed from the expression’s partial derivatives.

Returns

NFA: partial derivatives [or equation] automaton

See also

V. M. Antimirov, Partial Derivatives of Regular Expressions and Finite Automaton Constructions. Theor. Comput. Sci. 155(2): 291-319 (1996)
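To make the construction concrete, here is a minimal sketch of Antimirov's partial derivatives and the automaton built from them, written for a small tuple-based syntax tree. The tree encoding and all function names are assumptions for this example; this is not FAdo's nfaPDNaive code.

```python
# Hypothetical tuple-based syntax tree (not FAdo's classes):
# ("sym", a) | ("eps",) | ("cat", r, s) | ("or", r, s) | ("star", r)
EPS = ("eps",)

def nullable(r):
    """True when the language of r contains the empty word."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag == "sym":
        return False
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    return nullable(r[1]) or nullable(r[2])      # "or"

def cat(r, s):
    """Concatenation that drops a leading empty word."""
    return s if r == EPS else ("cat", r, s)

def pd(r, a):
    """Set of partial derivatives of r with respect to symbol a."""
    tag = r[0]
    if tag == "eps":
        return set()
    if tag == "sym":
        return {EPS} if r[1] == a else set()
    if tag == "or":
        return pd(r[1], a) | pd(r[2], a)
    if tag == "star":
        return {cat(d, r) for d in pd(r[1], a)}
    out = {cat(d, r[2]) for d in pd(r[1], a)}    # concatenation
    if nullable(r[1]):
        out |= pd(r[2], a)
    return out

def pd_automaton(r, sigma):
    """States are the expressions reachable from r by partial derivation."""
    states, todo, delta = {r}, [r], {}
    while todo:
        q = todo.pop()
        for a in sigma:
            for p in pd(q, a):
                delta.setdefault((q, a), set()).add(p)
                if p not in states:
                    states.add(p)
                    todo.append(p)
    finals = {q for q in states if nullable(q)}
    return states, delta, finals

def accepts(start, delta, finals, word):
    current = {start}
    for a in word:
        current = set().union(*(delta.get((q, a), set()) for q in current))
    return bool(current & finals)

a, b = ("sym", "a"), ("sym", "b")
expr = ("cat", ("star", ("or", a, b)), a)        # (a+b)*a
states, delta, finals = pd_automaton(expr, "ab")
```

For `(a+b)*a` the sketch yields two states, the expression itself and the empty word, with the empty word as the only final state.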

nfaPDO()[source]
NFA that accepts the regular expression’s language, and which is constructed from the expression’s partial derivatives.

Note

optimized version

Returns

partial derivatives [or equation] automaton

Return type

NFA

nfaPSNF()[source]

Position or Glushkov automaton of the regular expression constructed from the expression’s star normal form.

Returns

Position automaton

Return type

NFA

nfaPosition(lstar=True)[source]

Position automaton of the regular expression.

Parameters

lstar (boolean) – if not None followlists are computed as disjunct

Returns

Position NFA

Return type

NFA
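A minimal sketch of the position (Glushkov) construction follows, assuming a small tuple-based syntax tree. It computes the first, last and follow sets in one traversal and then assembles the transitions. The encoding and the names `glushkov` and `accepts` are hypothetical and unrelated to FAdo's internals, and the `lstar` option is not modelled.

```python
# Hypothetical tuple-based syntax tree (not FAdo's classes):
# ("sym", a) | ("eps",) | ("cat", r, s) | ("or", r, s) | ("star", r)

def glushkov(r):
    """Return (number_of_states, delta, finals); state 0 is initial."""
    sym = {}       # position -> symbol at that position
    follow = {}    # position -> set of positions that may come next

    def walk(r):
        """Return (nullable, first, last) and fill sym and follow."""
        tag = r[0]
        if tag == "eps":
            return True, set(), set()
        if tag == "sym":
            p = len(sym) + 1               # positions are numbered from 1
            sym[p] = r[1]
            follow[p] = set()
            return False, {p}, {p}
        if tag == "star":
            _, first, last = walk(r[1])
            for p in last:
                follow[p] |= first         # loop back into the body
            return True, first, last
        n1, f1, l1 = walk(r[1])
        n2, f2, l2 = walk(r[2])
        if tag == "or":
            return n1 or n2, f1 | f2, l1 | l2
        for p in l1:                       # concatenation
            follow[p] |= f2
        return (n1 and n2,
                (f1 | f2) if n1 else f1,
                (l1 | l2) if n2 else l2)

    nullable, first, last = walk(r)
    delta = {}
    for p in first:
        delta.setdefault((0, sym[p]), set()).add(p)
    for p, nexts in follow.items():
        for q in nexts:
            delta.setdefault((p, sym[q]), set()).add(q)
    finals = set(last) | ({0} if nullable else set())
    return len(sym) + 1, delta, finals

def accepts(delta, finals, word):
    current = {0}
    for a in word:
        current = set().union(*(delta.get((q, a), set()) for q in current))
    return bool(current & finals)

a, b = ("sym", "a"), ("sym", "b")
expr = ("cat", ("star", ("or", a, b)), a)      # (a+b)*a, positions a1 b2 a3
n_states, delta, finals = glushkov(expr)
```

The automaton has one state per symbol occurrence plus the initial state, so `(a+b)*a` gives four states.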

nfaPre()[source]

Prefix NFA of a regular expression. States are of the form (RegExp, sym).

Returns

prefix automaton

Return type

NFA

See also

Maia et al, Prefix and Right-partial derivative automata, 11th CIE 2015, 258-267 LNCS 9136, 2015

nfaPreSlow()[source]

Prefix NFA of a regular expression

Returns

prefix automaton

Return type

NFA

See also

Maia et al, Prefix and Right-partial derivative automata, 11th CIE 2015, 258-267 LNCS 9136, 2015

Note

Not working with the current tailForm

notEmptyW()[source]

Witness of non-emptiness

Returns

word or None

abstract rpn()[source]

RPN representation

Returns

printable RPN representation

Return type

str

abstract static setOfSymbols()[source]
Return type

set

setSigma(symbolset=None, strict=False)[source]

Set the alphabet for a regular expression and all its nodes

Parameters
  • symbolset (list or set of str) – accepted symbols. If None, alphabet is unset.

  • strict (bool) – if True checks if setOfSymbols is included in symbolSet

Attention

Normally this attribute is not defined in a RegExp()

abstract static starHeight()[source]

Maximum level of nested regular expressions with a star operation applied.

For instance, starHeight(((a*b)*+b*)*) is 3.

Return type

integer
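The star height example can be checked with a short recursive sketch over a tuple-based syntax tree. The same traversal pattern also gives the alphabetic length and the tree length. The tuple encoding and function names are assumptions for illustration, not FAdo's implementation.

```python
# Hypothetical tuple-based syntax tree (not FAdo's classes):
# ("sym", a) | ("eps",) | ("cat", r, s) | ("or", r, s) | ("star", r)

def star_height(r):
    """Maximum nesting depth of star operations."""
    tag = r[0]
    if tag in ("sym", "eps"):
        return 0
    if tag == "star":
        return 1 + star_height(r[1])
    return max(star_height(r[1]), star_height(r[2]))

def alphabetic_length(r):
    """Number of symbol occurrences; the empty word is not counted."""
    tag = r[0]
    if tag == "sym":
        return 1
    if tag == "eps":
        return 0
    return sum(alphabetic_length(child) for child in r[1:])

def tree_length(r):
    """Number of nodes of the syntax tree."""
    return 1 + sum(tree_length(c) for c in r[1:] if isinstance(c, tuple))

a, b = ("sym", "a"), ("sym", "b")
# ((a*b)*+b*)*
expr = ("star", ("or", ("star", ("cat", ("star", a), b)), ("star", b)))

height = star_height(expr)
alpha = alphabetic_length(expr)
nodes = tree_length(expr)
```

For this expression the sketch gives a star height of 3, an alphabetic length of 3 and a tree length of 9.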

abstract tailForm()[source]
Return type

dict

toDFA()[source]

DFA that accepts the regular expression’s language

toNFA(nfa_method='nfaPDNaive')[source]

NFA that accepts the regular expression’s language. :param nfa_method:

abstract static treeLength()[source]

Number of nodes of the regular expression’s syntactical tree.

Return type

integer

unionSigma(other)[source]

Returns the union of two alphabets

Return type

set

wordDerivative(word)[source]
Derivative of the regular expression in relation to the given word, which is represented by a list of symbols.

Parameters

word – list of arbitrary symbols.

Return type

regular expression

See also

J. A. Brzozowski, Derivatives of Regular Expressions. J. ACM 11(4): 481-494 (1964)
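As a hedged illustration of Brzozowski's word derivative, the sketch below derives a tuple-based expression symbol by symbol and tests membership by checking whether the result accepts the empty word. The encoding, the light simplifications in `cat` and `alt`, and every function name are assumptions of this example rather than FAdo's code.

```python
# Hypothetical tuple-based syntax tree (not FAdo's classes):
# ("empty",) | ("eps",) | ("sym", a) | ("cat", r, s) | ("or", r, s) | ("star", r)
EMPTY, EPS = ("empty",), ("eps",)

def nullable(r):
    """True when the language of r contains the empty word."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    return nullable(r[1]) or nullable(r[2])      # "or"

def cat(r, s):
    """Concatenation with the obvious simplifications."""
    if r == EMPTY or s == EMPTY:
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("cat", r, s)

def alt(r, s):
    """Disjunction with the obvious simplifications."""
    if r == EMPTY:
        return s
    if s == EMPTY or r == s:
        return r
    return ("or", r, s)

def derivative(r, a):
    """Derivative of r with respect to a single symbol a."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "sym":
        return EPS if r[1] == a else EMPTY
    if tag == "or":
        return alt(derivative(r[1], a), derivative(r[2], a))
    if tag == "star":
        return cat(derivative(r[1], a), r)
    left = cat(derivative(r[1], a), r[2])        # concatenation
    return alt(left, derivative(r[2], a)) if nullable(r[1]) else left

def word_derivative(r, word):
    """Derive r by each symbol of word, from left to right."""
    for a in word:
        r = derivative(r, a)
    return r

def accepts(r, word):
    return nullable(word_derivative(r, word))

a, b = ("sym", "a"), ("sym", "b")
expr = ("cat", ("star", ("or", a, b)), a)        # (a+b)*a
```

A word belongs to the language exactly when the derivative by that word accepts the empty word, which is what `accepts` checks.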

SpecialConstant

class SpecialConstant(sigma=None)[source]

Base class for Epsilon and EmptySet

Inheritance diagram of SpecialConstant
Parameters

sigma – alphabet

static alphabeticLength()[source]
Returns

derivative(sigma)[source]
Parameters

sigma

Returns

distDerivative(sigma)[source]
Parameters

sigma – an arbitrary symbol.

Return type

regular expression

static epsilonLength()[source]

Number of occurrences of the empty word in the regular expression.

Return type

integer

static first(parent_first=None)[source]
Parameters

parent_first

Returns

followLists(lists=None)[source]
Parameters

lists

Returns

followListsD(lists=None)[source]
Parameters

lists

Returns

static followListsStar(lists=None)[source]
Parameters

lists

Returns

last(parent_last=None)[source]
Parameters

parent_last

Returns

linearForm()[source]
Returns

mark()[source]

Make all atoms marked (tag False)

Return type

RegExp

partialDerivativesC(sigma)[source]
Parameters

sigma

Returns

reversal()[source]

Reversal of RegExp

Return type

reex.RegExp

abstract rpn()[source]

RPN representation

Returns

printable RPN representation

Return type

str

static setOfSymbols()[source]
Returns

static starHeight()[source]

Maximum level of nested regular expressions with a star operation applied.

For instance, starHeight(((a*b)*+b*)*) is 3.

Return type

integer

support(side=True)[source]
Returns

supportlast(side=True)[source]
Returns

tailForm()[source]
Returns

static treeLength()[source]

Number of nodes of the regular expression’s syntactical tree.

Return type

integer

unmark()[source]

Conversion back to unmarked atoms

Return type

SpecialConstant

unmarked()[source]

The unmarked form of the regular expression. Each leaf in its syntactical tree becomes a RegExp(), the CEpsilon() or the CEmptySet().

Return type

(general) regular expression

wordDerivative(word)[source]
Parameters

word

Returns

CEpsilon

class CEpsilon(sigma=None)[source]

Class that represents the empty word.

Inheritance diagram of CEpsilon
Parameters

sigma – alphabet

static epsilonLength()[source]

Number of occurrences of the empty word in the regular expression.

Return type

integer

static epsilonP()[source]
Return type

bool

static ewp()[source]
Return type

bool

static measure(from_parent=None)[source]
Parameters

from_parent

Returns

measures

nfaThompson()[source]
Return type

NFA

partialDerivatives(_)[source]
Returns

partialDerivativesC(_)[source]
Returns

rpn()[source]
Returns

str

snf(_hollowdot=False)[source]
Parameters

_hollowdot

Returns

CEmptySet

class CEmptySet(sigma=None)[source]

Class that represents the empty set.

Parameters

sigma – alphabet

static emptysetP()[source]
Returns

static epsilonLength()[source]
Returns

static epsilonP()[source]
Returns

static ewp()[source]
Returns

static measure(from_parent=None)[source]
Parameters

from_parent

Returns

nfaPD(pdmethod='nfaPDNaive')[source]

Computes the partial derivative automaton

partialDerivativesC(_)[source]
Returns

rpn()[source]
Returns

SigmaP

SigmaP

alias of @sigmaP

SigmaS

SigmaS

alias of @sigmaS

Connective

class Connective(arg1, arg2, sigma=None)[source]

Base class for (binary) operations: concatenation, disjunction, etc

Inheritance diagram of Connective
alphabeticLength()[source]

Number of occurrences of alphabet symbols in the regular expression.

Return type

integer

Attention

Doesn’t include the empty word.

epsilonLength()[source]

Number of occurrences of the empty word in the regular expression.

Return type

integer

first(parent_first=None)[source]
Return type

set

last(parent_last=None)[source]
Return type

set

abstract linearForm()[source]
Return type

dict

abstract mark()[source]

Make all atoms marked (tag False)

Return type

RegExp

abstract rpn()[source]

RPN representation

Returns

printable RPN representation

Return type

str

setOfSymbols()[source]
Return type

set

starHeight()[source]

Maximum level of nested regular expressions with a star operation applied.

For instance, starHeight(((a*b)*+b*)*) is 3.

Return type

integer

treeLength()[source]

Number of nodes of the regular expression’s syntactical tree.

Return type

integer

Star

Concat

Disj

Power

class Power(arg, n=1, sigma=None)[source]

Class for Power operation on regular expressions.

Inheritance diagram of Power
alphabeticLength()[source]

Number of occurrences of alphabet symbols in the regular expression.

Return type

integer

Attention

Doesn’t include the empty word.

epsilonLength()[source]

Number of occurrences of the empty word in the regular expression.

Return type

integer

first()[source]
Return type

set

last()[source]
Return type

set

linearForm()[source]
Return type

dict

mark()[source]

Make all atoms marked (tag False)

Return type

RegExp

reversal()[source]

Reversal of RegExp

Return type

reex.RegExp

rpn()[source]

RPN representation

Returns

printable RPN representation

Return type

str

setOfSymbols()[source]
Return type

set

starHeight()[source]

Maximum level of nested regular expressions with a star operation applied.

For instance, starHeight(((a*b)*+b*)*) is 3.

Return type

integer

tailForm()[source]
Return type

dict

treeLength()[source]

Number of nodes of the regular expression’s syntactical tree.

Return type

integer

Option

Option

alias of -

Conj

Conj

alias of &

Shuffle

Shuffle

alias of :

Atom

Position

class Position(val, sigma=None)[source]

Class for marked regular expression symbols.

Inheritance diagram of Position

Constructor of a regular expression symbol.

Parameters

val – the actual symbol

setOfSymbols()[source]

Set of symbols that occur in a regular expression.

Returns

set of symbols

Return type

set of symbols

unmarked()[source]

The unmarked form of the regular expression. Each leaf in its syntactical tree becomes a RegExp(), the CEpsilon() or the CEmptySet().

Return type

(general) regular expression

SConnective

class SConnective(arg, sigma=None)[source]
Special regular expressions modulo associativity, commutativity, idempotence of disjunction and intersection;
associativity of concatenation; identities sigma^* and sigma^+. Connectives are:

  • SDisj: disjunction

  • SConj: intersection

  • SConcat: concatenation

For parsing use str2sre

Inheritance diagram of SConnective
alphabeticLength()[source]
Returns

epsilonLength()[source]
Returns

first()[source]
Return type

set

last()[source]
Return type

set

linearForm()[source]
Return type

dict

mark()[source]

Make all atoms marked (tag False)

Return type

RegExp

nfaPD(pdmethod='nfaPDNaive')[source]

Computes the partial derivative automaton

rpn()[source]

RPN representation

Returns

printable RPN representation

Return type

str

setOfSymbols()[source]
Returns

starHeight()[source]

Maximum level of nested regular expressions with a star operation applied.

For instance, starHeight(((a*b)*+b*)*) is 3.

Return type

integer

syntacticLength()[source]
Returns

abstract tailForm()[source]
Return type

dict

treeLength()[source]
Returns

SConcat

class SConcat(arg, sigma=None)[source]

Class that represents the concatenation operation.

Inheritance diagram of CConcat
derivative(sigma)[source]
Parameters

sigma

Returns

ewp()[source]
Returns

head()[source]
Returns

head_rev()[source]
Returns

linearForm()[source]
Returns

linearFormC()[source]
Returns

partialDerivatives(sigma)[source]
Parameters

sigma

Returns

partialDerivativesC(sigma)[source]
Parameters

sigma

Returns

support(side=True)[source]
Returns

tail()[source]
Returns

tailForm()[source]
Return type

dict

tail_rev()[source]
Returns

SStar

class SStar(arg, sigma=None)[source]
Special regular expressions modulo associativity, commutativity, idempotence of disjunction and intersection;

associativity of concatenation; identities sigma^* and sigma^+.

SStar: Class that represents Kleene star

Inheritance diagram of SStar
derivative(sigma)[source]
Parameters

sigma

Returns

linearForm()[source]
Returns

nfaPD(pdmethod='nfaPDNaive')[source]

Computes the partial derivative automaton

partialDerivatives(sigma)[source]
Parameters

sigma

Returns

partialDerivativesC(sigma)[source]
Parameters

sigma

Returns

support(side=True)[source]
Returns

SDisj

class SDisj(arg, sigma=None)[source]

Class that represents the disjunction operation for special regular expressions.

Inheritance diagram of SDisj
static cross(ri, s, lists)[source]
Return type

list

derivative(sigma)[source]
Parameters

sigma

Returns

ewp()[source]
Returns

first()[source]
Returns

followLists(lists=None)[source]
Parameters

lists

Returns

followListsStar(lists=None)[source]
Parameters

lists

Returns

last()[source]
Returns

linearForm()[source]
Returns

linearFormC()[source]
Returns

partialDerivatives(sigma)[source]
Parameters

sigma

Returns

partialDerivativesC(sigma)[source]
Parameters

sigma

Returns

support(side=True)[source]
Returns

tailForm()[source]
Return type

dict

SConj

class SConj(arg, sigma=None)[source]

Class that represents the conjunction operation.

Inheritance diagram of CConcat
derivative(sigma)[source]
Parameters

sigma

Returns

ewp()[source]
Returns

linearForm()[source]
Returns

partialDerivatives(sigma)[source]
Parameters

sigma

Returns

partialDerivativesC(sigma)[source]
Parameters

sigma

Returns

support(side=True)[source]
Returns

tailForm()[source]
Return type

dict

SNot

class SNot(arg, sigma=None)[source]
Special regular expressions modulo associativity, commutativity, idempotence of disjunction and intersection;

associativity of concatenation; identities sigma^* and sigma^+. SNot: negation

Inheritance diagram of SNot
alphabeticLength()[source]
Returns

derivative(sigma)[source]

Parameters

sigma

Returns

epsilonLength()[source]
Returns

ewp()[source]
Returns

first()[source]
Return type

set

last()[source]
Return type

set

linearForm()[source]
Returns

linearFormC()[source]
Returns

mark()[source]

Make all atoms marked (tag False)

Return type

RegExp

nfaPD(pdmethod='nfaPDNaive')[source]

Computes the partial derivative automaton

partialDerivatives(sigma)[source]
Parameters

sigma

Returns

partialDerivativesC(sigma)[source]
Parameters

sigma

Returns

rpn()[source]

RPN representation

Returns

printable RPN representation

Return type

str

setOfSymbols()[source]
Returns

starHeight()[source]

Maximum level of nested regular expressions with a star operation applied.

For instance, starHeight(((a*b)*+b*)*) is 3.

Return type

integer

support(side=True)[source]
Returns

syntacticLength()[source]
Returns

tailForm()[source]
Return type

dict

treeLength()[source]
Returns

DAG

class DAG(reg)[source]

Class to support dags representing regexps

See also

P. Flajolet, P. Sipala, J.-M. Steyaert, Analytic variations on the common subexpression problem, in: Automata, Languages and Programming, LNCS, vol. 443, Springer, New York, 1990, pp. 220–234.

Variables

reg (reex) – regular expression

catLF(idl, idr, delay=False)[source]

both arguments are assumed to be already present in the DAG

static plusLF(diff1, diff2)[source]

Union of partial derivatives

Parameters
  • diff1 (dict) – partial diff of the first argument

  • diff2 (dict) – partial diff of the second argument

Return type

dict

DNode

class DNode(op, arg1=None, arg2=None)[source]

MAtom

class MAtom(val, mark, sigma=None)[source]

Base class for pointed (marked) regular expressions

Used directly to represent atoms (characters). This class is used to obtain Yamada or Asperti automata. There is no evident use for it outside this module.

Parameters
  • val – symbol

  • sigma – alphabet

unmark()[source]

Conversion back to RegExp

Return type

reex.RegExp

BuildRegexp

class BuildRegexp(context=None)[source]

Semantics of the FAdo grammars’ regexps.

Priorities of operators: disj > conj > shuffle > concat > not > star >= option

BuildRPNRegexp

class BuildRPNRegexp(context=None)[source]

BuildRPNSRE

class BuildRPNSRE(context=None)[source]

BuildSRE

class BuildSRE(context=None)[source]

Parser for sre

Functions

str2regexp

str2regexp(s, parser=Lark(open('/Users/rvr/Work/FAdo/FAdo/regexp_grammar.lark'), parser='lalr', lexer='contextual', ...), sigma=None, strict=False)[source]

Reads a RegExp from string.

Parameters
  • s (string) – the string representation of the regular expression

  • parser – a parser generator for regexps

  • sigma (list or set of symbols) – alphabet of the regular expression

  • strict (boolean) – if True tests if the symbols of the regular expression are included in sigma

Return type

reex.RegExp

str2sre

str2sre(s, parser=Lark(open('/Users/rvr/Work/FAdo/FAdo/regexp_grammar.lark'), parser='lalr', lexer='contextual', ...), sigma=None, strict=False)[source]

Reads an sre from a string. Arguments as in str2regexp.

Return type

reex.sre

rpn2regexp

rpn2regexp(s, sigma=None, strict=False)[source]

Reads a (simple) RegExp from a RPN representation

R ::=  .RR | +RR | *R | L | @
L ::=  [a-z] | [A-Z]
Parameters
  • s (str) – RPN representation

  • strict (bool) – if True tests if the symbols of the regular expression are included in sigma

  • sigma (set) – alphabet

Return type

reex.RegExp

Note

This method uses the Python call stack, so recursion depth limits apply
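The grammar given for rpn2regexp puts each operator before its operands, so it can be read with a small recursive-descent parser. The sketch below is an assumption-laden illustration, not FAdo's parser: it returns nested tuples instead of RegExp objects, treats `@` as the empty word (the documentation does not say what `@` denotes), and the name `parse_prefix` is hypothetical.

```python
def parse_prefix(s):
    """Parse R ::= .RR | +RR | *R | L | @ into nested tuples.

    Assumption of this sketch: "@" stands for the empty word.
    """
    def parse(i):
        # Returns (tree, index of the first unread character).
        if i >= len(s):
            raise ValueError("unexpected end of input")
        c = s[i]
        if c in ".+":
            left, i = parse(i + 1)
            right, i = parse(i)
            return ("cat" if c == "." else "or", left, right), i
        if c == "*":
            arg, i = parse(i + 1)
            return ("star", arg), i
        if c == "@":
            return ("eps",), i + 1
        if c.isascii() and c.isalpha():      # L ::= [a-z] | [A-Z]
            return ("sym", c), i + 1
        raise ValueError("unexpected character %r at index %d" % (c, i))

    tree, end = parse(0)
    if end != len(s):
        raise ValueError("trailing input after index %d" % end)
    return tree


star_of_union = parse_prefix("*+ab")     # (a+b)*
concat = parse_prefix(".a*b")            # a(b*)
```

Like the method it illustrates, the sketch recurses once per nested operator, so very deep expressions can exceed Python's recursion limit.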

to_s

to_s(r)[source]

Returns an sre from a FAdo regexp.

Parameters

r (RegExp) – the FAdo representation regexp for a regular expression.

Return type

RegExp