Module: Regular Expressions (reex)
Regular expressions manipulation
Regular expression classes and manipulation
Classes
RegularExpression
-
class RegularExpression[source]
Abstract base class for all regular expression objects
RegExp
-
class RegExp(sigma=None)[source]
Base class for regular expressions.
- Variables
Sigma – alphabet set of strings
-
abstract static alphabeticLength()[source]
Number of occurrences of alphabet symbols in the regular expression.
- Return type
integer
Attention
Doesn’t include the empty word.
-
compare(r, cmp_method='compareMinimalDFA', nfa_method='nfaPD')[source]
Compare with another regular expression for equivalence.
- Parameters
r –
cmp_method –
nfa_method –
-
compareMinimalDFA(r, nfa_method='nfaPosition')[source]
Compare with another regular expression for equivalence through minimal DFAs.
- Parameters
r –
nfa_method –
-
dfaAuPoint()[source]
“Au-point” DFA, according to Nipkow
- Returns
“au-point” DFA
- Return type
fa.DFA
See also
Andrea Asperti, Claudio Sacerdoti Coen and Enrico Tassi, Regular Expressions, au point. arXiv 2010
See also
Tobias Nipkow and Dmitriy Traytel, Unified Decision Procedures for
Regular Expression Equivalence
-
dfaBrzozowski(memo=None)[source]
Word derivatives automaton of the regular expression
- Returns
word derivatives automaton
- Return type
DFA
See also
Brzozowski, Derivatives of Regular Expressions. J. ACM 11(4): 481-494 (1964)
-
dfaYMG()[source]
Yamada-McNaughton-Glushkov DFA, according to Nipkow
- Returns
Y-M-G DFA
- Return type
DFA
See also
Tobias Nipkow and Dmitriy Traytel, Unified Decision Procedures for
Regular Expression Equivalence
-
static emptysetP()[source]
Whether the regular expression is the empty set.
- Return type
Boolean
-
abstract static epsilonLength()[source]
Number of occurrences of the empty word in the regular expression.
- Return type
integer
-
static epsilonP()[source]
Whether the regular expression is the empty word.
- Return type
Boolean
-
equivP(other, strict=True)[source]
Test RE equivalence with extended Hopcroft-Karp method
- Parameters
-
- Return type
bool
-
equivalentP(other)[source]
Tests equivalence
- Parameters
other –
- Return type
bool
-
evalWordP(word)[source]
Verifies if a word is a member of the language represented by the regular expression.
- Parameters
word (str) – the word
- Return type
bool
-
static ewp()[source]
Whether the empty word property holds for this regular expression’s language.
- Return type
Boolean
-
abstract first()[source]
- Return type
set
-
abstract last()[source]
- Return type
set
-
abstract linearForm()[source]
- Return type
dict
-
abstract mark()[source]
Make all atoms marked (tag False)
- Return type
RegExp
-
marked()[source]
Regular expression in which every alphabetic symbol is marked with its Position.
The kind of regular expression returned is known, depending on the literary source, as marked,
linear or restricted regular expression.
- Returns
linear regular expression
- Return type
RegExp
See also
R. McNaughton and H. Yamada, Regular Expressions and State Graphs for Automata,
IEEE Transactions on Electronic Computers, V.9 pp:39-47, 1960
Attention
mark and unmark neither preserve the alphabet nor set the new alphabet
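The marking idea can be illustrated outside the library. The sketch below is not FAdo code: it assumes a hypothetical tuple encoding of expressions (`("sym", x)`, `("eps",)`, `("empty",)`, `("cat", r, s)`, `("or", r, s)`, `("star", r)`) and hypothetical helpers `mark` and `unmark`, and only shows how each symbol occurrence can be paired with a unique position.

```python
from itertools import count


def mark(regex, counter=None):
    """Return a copy of regex in which every symbol leaf carries a unique position."""
    if counter is None:
        counter = count(1)
    kind = regex[0]
    if kind == "sym":
        return ("sym", (regex[1], next(counter)))
    if kind in ("eps", "empty"):
        return regex
    # Children are marked left to right, so positions follow reading order.
    return (kind,) + tuple(mark(child, counter) for child in regex[1:])


def unmark(regex):
    """Drop the positions again, recovering the original symbols."""
    kind = regex[0]
    if kind == "sym":
        return ("sym", regex[1][0])
    if kind in ("eps", "empty"):
        return regex
    return (kind,) + tuple(unmark(child) for child in regex[1:])


# (a + b)* a
expr = ("cat", ("star", ("or", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
marked = mark(expr)
```

In the marked expression every symbol is distinct, which is why such expressions are also called linear.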
-
nfaFollow()[source]
NFA that accepts the regular expression’s language: the follow automaton of Ilie and Yu.
- Return type
NFA
See also
Ilie & Yu (Follow Automata, 2003)
-
nfaFollowEpsilon(trim=True)[source]
Epsilon-NFA constructed with Ilie and Yu’s method that accepts the regular expression’s language.
- Parameters
trim –
- Returns
NFA possibly with CEpsilon transitions
- Return type
NFAe
Note
The regular expression must be reduced
See also
Ilie & Yu, Follow automata, Inf. Comp., v. 186 (1), 140-162, 2003
-
nfaGlushkov()[source]
Position or Glushkov automaton of the regular expression. Recursive method.
- Returns
NFA
-
nfaNaiveFollow()[source]
NFA that accepts the regular expression’s language, and is equal in structure to the follow automaton.
- Return type
NFA
Note
Included for testing purposes.
See also
Ilie & Yu (Follow Automata, 2003)
-
nfaPD(pdmethod='nfaPDDAG')[source]
Computes the partial derivative automaton
- Parameters
pdmethod (str) – an implementation of the PD automaton. Default value: nfaPDDAG
- Returns
a PD NFA
- Return type
NFA
-
nfaPDDAG()[source]
- Returns
a PD NFA built using a DAG
- Return type
NFA
See also
S. Konstantinidis, A. Machiavelo, N. Moreira, and R. Reis.
Partial derivative automaton by compressing regular expressions.
DCFS 2021, volume 13037 of LNCS, pages 100–112. Springer, 2022
-
nfaPDNaive()[source]
NFA that accepts the regular expression’s language,
and which is constructed from the expression’s partial derivatives.
- Returns
NFA: partial derivatives [or equation] automaton
See also
V. M. Antimirov, Partial Derivatives of Regular Expressions and Finite Automaton Constructions.
Theor. Comput. Sci. 155(2): 291-319 (1996)
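As an illustration of the construction, here is a minimal sketch of Antimirov-style partial derivatives. It is not FAdo's implementation: the tuple encoding (`("sym", x)`, `("eps",)`, `("empty",)`, `("cat", r, s)`, `("or", r, s)`, `("star", r)`) and the function names `nullable`, `cat`, `partial_derivatives`, `pd_automaton` and `accepts` are assumptions made for this example.

```python
EPS = ("eps",)


def nullable(r):
    """True if the language of r contains the empty word."""
    kind = r[0]
    if kind in ("eps", "star"):
        return True
    if kind in ("empty", "sym"):
        return False
    if kind == "cat":
        return nullable(r[1]) and nullable(r[2])
    return nullable(r[1]) or nullable(r[2])  # kind == "or"


def cat(r, s):
    """Concatenation that drops a redundant empty word."""
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("cat", r, s)


def partial_derivatives(r, a):
    """Set of partial derivatives of r with respect to the symbol a."""
    kind = r[0]
    if kind in ("eps", "empty"):
        return set()
    if kind == "sym":
        return {EPS} if r[1] == a else set()
    if kind == "or":
        return partial_derivatives(r[1], a) | partial_derivatives(r[2], a)
    if kind == "star":
        return {cat(p, r) for p in partial_derivatives(r[1], a)}
    # kind == "cat"
    result = {cat(p, r[2]) for p in partial_derivatives(r[1], a)}
    if nullable(r[1]):
        result |= partial_derivatives(r[2], a)
    return result


def pd_automaton(r, alphabet):
    """Explore all reachable partial derivatives: (states, transitions, finals)."""
    states, todo, delta = {r}, [r], set()
    while todo:
        q = todo.pop()
        for a in alphabet:
            for p in partial_derivatives(q, a):
                delta.add((q, a, p))
                if p not in states:
                    states.add(p)
                    todo.append(p)
    finals = {q for q in states if nullable(q)}
    return states, delta, finals


def accepts(r, word):
    """Simulate the partial derivative NFA on the fly."""
    current = {r}
    for a in word:
        current = {p for q in current for p in partial_derivatives(q, a)}
    return any(nullable(q) for q in current)


# (a + b)* a
expr = ("cat", ("star", ("or", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
states, delta, finals = pd_automaton(expr, "ab")
```

The states of the resulting NFA are the expression itself plus its reachable partial derivatives, and a state is final when its expression accepts the empty word.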
-
nfaPDO()[source]
NFA that accepts the regular expression’s language, and which is constructed from the expression’s partial
derivatives.
- Returns
partial derivatives [or equation] automaton
- Return type
NFA
-
nfaPSNF()[source]
Position or Glushkov automaton of the regular expression constructed from the expression’s star normal form.
- Returns
Position automaton
- Return type
NFA
-
nfaPosition(lstar=True)[source]
Position automaton of the regular expression.
- Parameters
lstar (boolean) – if not None followlists are computed as disjunct
- Returns
Position NFA
- Return type
NFA
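For readers unfamiliar with the position (Glushkov) construction, the following sketch computes it from first, last and follow sets. It is an independent illustration under assumptions, not the library's code: expressions use a hypothetical tuple encoding (`("sym", x)`, `("eps",)`, `("empty",)`, `("cat", r, s)`, `("or", r, s)`, `("star", r)`), and `glushkov` and `accepts` are names invented here.

```python
def glushkov(regex):
    """Return (finals, transitions) of a position automaton.

    State 0 is the initial state; states 1..n are the symbol positions,
    numbered left to right.
    """
    symbols = {}  # position -> symbol at that position
    follow = {}   # position -> positions that may come right after it

    def visit(r):
        """Return (nullable, first, last) for r and fill in follow."""
        kind = r[0]
        if kind == "eps":
            return True, set(), set()
        if kind == "empty":
            return False, set(), set()
        if kind == "sym":
            pos = len(symbols) + 1
            symbols[pos] = r[1]
            follow[pos] = set()
            return False, {pos}, {pos}
        if kind == "star":
            _, first, last = visit(r[1])
            for p in last:  # the star loops from last positions back to first ones
                follow[p] |= first
            return True, first, last
        n1, f1, l1 = visit(r[1])
        n2, f2, l2 = visit(r[2])
        if kind == "or":
            return n1 or n2, f1 | f2, l1 | l2
        # kind == "cat": last positions of the left part are followed by
        # first positions of the right part
        for p in l1:
            follow[p] |= f2
        first = f1 | f2 if n1 else f1
        last = l1 | l2 if n2 else l2
        return n1 and n2, first, last

    nullable, first, last = visit(regex)
    transitions = {(0, symbols[p], p) for p in first}
    transitions |= {(p, symbols[q], q) for p in follow for q in follow[p]}
    finals = set(last) | ({0} if nullable else set())
    return finals, transitions


def accepts(automaton, word):
    """Run the NFA on word by tracking the set of current states."""
    finals, transitions = automaton
    current = {0}
    for a in word:
        current = {q for (p, s, q) in transitions if p in current and s == a}
    return bool(current & finals)


# (a + b)* a has three positions: a1, b2, a3
expr = ("cat", ("star", ("or", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
automaton = glushkov(expr)
```

The automaton has one state per symbol occurrence plus the initial state, and every transition into a position is labelled with that position's symbol.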
-
nfaPre()[source]
Prefix NFA of a regular expression
States are of the form (RegExp,sym)
- Returns
prefix automaton
- Return type
NFA
See also
Maia et al, Prefix and Right-partial derivative automata, 11th CIE 2015, 258-267 LNCS 9136, 2015
-
nfaPreSlow()[source]
Prefix NFA of a regular expression
- Returns
prefix automaton
- Return type
NFA
See also
Maia et al, Prefix and Right-partial derivative automata, 11th CIE 2015, 258-267 LNCS 9136, 2015
Note
not working with current tailForm
-
notEmptyW()[source]
Witness of non-emptiness
- Returns
word or None
-
abstract rpn()[source]
RPN representation
- Returns
printable RPN representation
- Return type
str
-
abstract static setOfSymbols()[source]
- Return type
set
-
setSigma(symbolset=None, strict=False)[source]
Set the alphabet for a regular expression and all its nodes
- Parameters
symbolset (list or set of str) – accepted symbols. If None, alphabet is unset.
strict (bool) – if True checks if setOfSymbols is included in symbolset
Attention
Normally this attribute is not defined in a RegExp()
-
abstract static starHeight()[source]
Maximum level of nested regular expressions with a star operation applied.
For instance, starHeight(((a*b)*+b*)*) is 3.
- Return type
integer
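The example value can be checked with a short recursive sketch. It assumes a hypothetical tuple encoding of expressions (`("sym", x)`, `("eps",)`, `("empty",)`, `("cat", r, s)`, `("or", r, s)`, `("star", r)`) rather than FAdo objects, and the function name `star_height` is chosen for this illustration only.

```python
def star_height(r):
    """Maximum nesting depth of star operators in the tuple-encoded expression r."""
    kind = r[0]
    if kind in ("sym", "eps", "empty"):
        return 0
    if kind == "star":
        return 1 + star_height(r[1])
    # Binary operators: the deeper operand decides.
    return max(star_height(r[1]), star_height(r[2]))


a, b = ("sym", "a"), ("sym", "b")
# ((a*b)* + b*)*
expr = ("star", ("or", ("star", ("cat", ("star", a), b)), ("star", b)))
height = star_height(expr)
```

Here the innermost `a*` contributes 1, the enclosing `(a*b)*` raises it to 2, and the outer star gives 3.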
-
abstract tailForm()[source]
- Return type
dict
-
toDFA()[source]
DFA that accepts the regular expression’s language
-
toNFA(nfa_method='nfaPDNaive')[source]
NFA that accepts the regular expression’s language.
- Parameters
nfa_method –
-
abstract static treeLength()[source]
Number of nodes of the regular expression’s syntactical tree.
- Return type
integer
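The three size measures documented in this class (alphabetic length, epsilon length and tree length) can be computed in one traversal. The sketch below is an assumption-laden illustration, not FAdo code: it uses a hypothetical tuple encoding (`("sym", x)`, `("eps",)`, `("empty",)`, `("cat", r, s)`, `("or", r, s)`, `("star", r)`) and a hypothetical function `measures`.

```python
def measures(r):
    """Return (alphabetic_length, epsilon_length, tree_length) of r."""
    kind = r[0]
    if kind == "sym":
        return (1, 0, 1)
    if kind == "eps":
        return (0, 1, 1)
    if kind == "empty":
        return (0, 0, 1)
    # Operator node: sum the children's counts and add one node for the operator.
    parts = [measures(child) for child in r[1:]]
    return (
        sum(p[0] for p in parts),
        sum(p[1] for p in parts),
        1 + sum(p[2] for p in parts),
    )


# (a + eps) b*
expr = ("cat", ("or", ("sym", "a"), ("eps",)), ("star", ("sym", "b")))
alphabetic, epsilons, nodes = measures(expr)
```

For this expression the tree has six nodes: two operators over three leaves, plus the star.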
-
unionSigma(other)[source]
Returns the union of two alphabets
- Return type
set
-
wordDerivative(word)[source]
Derivative of the regular expression in relation to the given word,
which is represented by a list of symbols.
- Parameters
word – list of arbitrary symbols.
- Return type
regular expression
See also
Brzozowski, Derivatives of Regular Expressions. J. ACM 11(4): 481-494 (1964)
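A minimal sketch of Brzozowski's derivative, extended from symbols to words, may help here. It is not the library's implementation: the tuple encoding (`("sym", x)`, `("eps",)`, `("empty",)`, `("cat", r, s)`, `("or", r, s)`, `("star", r)`) and the names `nullable`, `alt`, `cat`, `derivative`, `word_derivative` and `matches` are assumptions for this example, and only a few simplification rules are applied.

```python
EMPTY = ("empty",)
EPS = ("eps",)


def nullable(r):
    """True if the language of r contains the empty word."""
    kind = r[0]
    if kind in ("eps", "star"):
        return True
    if kind in ("empty", "sym"):
        return False
    if kind == "cat":
        return nullable(r[1]) and nullable(r[2])
    return nullable(r[1]) or nullable(r[2])  # kind == "or"


def alt(r, s):
    """Disjunction with light simplification."""
    if r == EMPTY:
        return s
    if s == EMPTY or r == s:
        return r
    return ("or", r, s)


def cat(r, s):
    """Concatenation with light simplification."""
    if EMPTY in (r, s):
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("cat", r, s)


def derivative(r, a):
    """Derivative of r with respect to the single symbol a."""
    kind = r[0]
    if kind in ("eps", "empty"):
        return EMPTY
    if kind == "sym":
        return EPS if r[1] == a else EMPTY
    if kind == "or":
        return alt(derivative(r[1], a), derivative(r[2], a))
    if kind == "star":
        return cat(derivative(r[1], a), r)
    # kind == "cat"
    left = cat(derivative(r[1], a), r[2])
    return alt(left, derivative(r[2], a)) if nullable(r[1]) else left


def word_derivative(r, word):
    """Derive symbol by symbol, left to right."""
    for a in word:
        r = derivative(r, a)
    return r


def matches(r, word):
    """A word is in the language iff its word derivative is nullable."""
    return nullable(word_derivative(r, word))


# (a + b)* a
expr = ("cat", ("star", ("or", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
after_b = word_derivative(expr, ["b"])
```

With these simplifications, deriving `(a + b)* a` by `b` gives back the same expression, which reflects that reading a `b` does not change what remains to be matched.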
SpecialConstant
-
class SpecialConstant(sigma=None)[source]
Base class for Epsilon and EmptySet
- Parameters
sigma – alphabet
-
static alphabeticLength()[source]
- Returns
-
-
derivative(sigma)[source]
- Parameters
sigma –
- Returns
-
-
distDerivative(sigma)[source]
- Parameters
sigma – an arbitrary symbol.
- Return type
regular expression
-
static epsilonLength()[source]
Number of occurrences of the empty word in the regular expression.
- Return type
integer
-
static first(parent_first=None)[source]
- Parameters
parent_first –
- Returns
-
-
followLists(lists=None)[source]
- Parameters
lists –
- Returns
-
-
followListsD(lists=None)[source]
- Parameters
lists –
- Returns
-
-
static followListsStar(lists=None)[source]
- Parameters
lists –
- Returns
-
-
last(parent_last=None)[source]
- Parameters
parent_last –
- Returns
-
-
linearForm()[source]
- Returns
-
-
mark()[source]
Make all atoms marked (tag False)
- Return type
RegExp
-
partialDerivativesC(sigma)[source]
- Parameters
sigma –
- Returns
-
-
reversal()[source]
Reversal of RegExp
- Return type
reex.RegExp
-
abstract rpn()[source]
RPN representation
- Returns
printable RPN representation
- Return type
str
-
static setOfSymbols()[source]
- Returns
-
-
static starHeight()[source]
Maximum level of nested regular expressions with a star operation applied.
For instance, starHeight(((a*b)*+b*)*) is 3.
- Return type
integer
-
support(side=True)[source]
- Returns
-
-
supportlast(side=True)[source]
- Returns
-
-
tailForm()[source]
- Returns
-
-
static treeLength()[source]
Number of nodes of the regular expression’s syntactical tree.
- Return type
integer
-
unmark()[source]
Conversion back to unmarked atoms
- Return type
SpecialConstant
-
unmarked()[source]
The unmarked form of the regular expression. Each leaf in its syntactical tree becomes a RegExp(),
the CEpsilon() or the CEmptySet().
- Return type
(general) regular expression
-
wordDerivative(word)[source]
- Parameters
word –
- Returns
-
CEpsilon
-
class CEpsilon(sigma=None)[source]
Class that represents the empty word.
- Parameters
sigma – alphabet
-
static epsilonLength()[source]
Number of occurrences of the empty word in the regular expression.
- Return type
integer
-
static epsilonP()[source]
- Return type
bool
-
static ewp()[source]
- Return type
bool
-
static measure(from_parent=None)[source]
- Parameters
from_parent –
- Returns
measures
-
nfaThompson()[source]
- Return type
NFA
-
partialDerivatives(_)[source]
- Returns
-
-
partialDerivativesC(_)[source]
- Returns
-
-
rpn()[source]
- Returns
str
-
snf(_hollowdot=False)[source]
- Parameters
_hollowdot –
- Returns
-
CEmptySet
-
class CEmptySet(sigma=None)[source]
Class that represents the empty set.
- Parameters
sigma – alphabet
-
static emptysetP()[source]
- Returns
-
-
static epsilonLength()[source]
- Returns
-
-
static epsilonP()[source]
- Returns
-
-
static ewp()[source]
- Returns
-
-
static measure(from_parent=None)[source]
- Parameters
from_parent –
- Returns
-
-
nfaPD(pdmethod='nfaPDNaive')[source]
Computes the partial derivative automaton
-
partialDerivativesC(_)[source]
- Returns
-
-
rpn()[source]
- Returns
-
SigmaP
-
SigmaP
alias of @sigmaP
SigmaS
-
SigmaS
alias of @sigmaS
Connective
-
class Connective(arg1, arg2, sigma=None)[source]
Base class for (binary) operations: concatenation, disjunction, etc
-
alphabeticLength()[source]
Number of occurrences of alphabet symbols in the regular expression.
- Return type
integer
Attention
Doesn’t include the empty word.
-
epsilonLength()[source]
Number of occurrences of the empty word in the regular expression.
- Return type
integer
-
first(parent_first=None)[source]
- Return type
set
-
last(parent_last=None)[source]
- Return type
set
-
abstract linearForm()[source]
- Return type
dict
-
abstract mark()[source]
Make all atoms marked (tag False)
- Return type
RegExp
-
abstract rpn()[source]
RPN representation
- Returns
printable RPN representation
- Return type
str
-
setOfSymbols()[source]
- Return type
set
-
starHeight()[source]
Maximum level of nested regular expressions with a star operation applied.
For instance, starHeight(((a*b)*+b*)*) is 3.
- Return type
integer
-
treeLength()[source]
Number of nodes of the regular expression’s syntactical tree.
- Return type
integer
Power
-
class Power(arg, n=1, sigma=None)[source]
Class for Power operation on regular expressions.
-
alphabeticLength()[source]
Number of occurrences of alphabet symbols in the regular expression.
- Return type
integer
Attention
Doesn’t include the empty word.
-
epsilonLength()[source]
Number of occurrences of the empty word in the regular expression.
- Return type
integer
-
first()[source]
- Return type
set
-
last()[source]
- Return type
set
-
linearForm()[source]
- Return type
dict
-
mark()[source]
Make all atoms marked (tag False)
- Return type
RegExp
-
reversal()[source]
Reversal of RegExp
- Return type
reex.RegExp
-
rpn()[source]
RPN representation
- Returns
printable RPN representation
- Return type
str
-
setOfSymbols()[source]
- Return type
set
-
starHeight()[source]
Maximum level of nested regular expressions with a star operation applied.
For instance, starHeight(((a*b)*+b*)*) is 3.
- Return type
integer
-
tailForm()[source]
- Return type
dict
-
treeLength()[source]
Number of nodes of the regular expression’s syntactical tree.
- Return type
integer
Shuffle
-
Shuffle
alias of :
Position
-
class Position(val, sigma=None)[source]
Class for marked regular expression symbols.
Constructor of a regular expression symbol.
- Parameters
val – the actual symbol
-
setOfSymbols()[source]
Set of symbols that occur in a regular expression.
- Returns
set of symbols
- Return type
set of symbols
-
unmarked()[source]
The unmarked form of the regular expression. Each leaf in its syntactical tree becomes a RegExp(),
the CEpsilon() or the CEmptySet().
- Return type
(general) regular expression
SConnective
-
class SConnective(arg, sigma=None)[source]
Special regular expressions modulo associativity, commutativity, idempotence of disjunction and intersection;
associativity of concatenation; identities sigma^* and sigma^+. Connectives are:
SDisj: disjunction
SConj: intersection
SConcat: concatenation
For parsing use str2sre
-
alphabeticLength()[source]
- Returns
-
-
epsilonLength()[source]
- Returns
-
-
first()[source]
- Return type
set
-
last()[source]
- Return type
set
-
linearForm()[source]
- Return type
dict
-
mark()[source]
Make all atoms marked (tag False)
- Return type
RegExp
-
nfaPD(pdmethod='nfaPDNaive')[source]
Computes the partial derivative automaton
-
rpn()[source]
RPN representation
- Returns
printable RPN representation
- Return type
str
-
setOfSymbols()[source]
- Returns
-
-
starHeight()[source]
Maximum level of nested regular expressions with a star operation applied.
For instance, starHeight(((a*b)*+b*)*) is 3.
- Return type
integer
-
syntacticLength()[source]
- Returns
-
-
abstract tailForm()[source]
- Return type
dict
-
treeLength()[source]
- Returns
-
SStar
-
class SStar(arg, sigma=None)[source]
Special regular expressions modulo associativity, commutativity, idempotence of disjunction and intersection;
associativity of concatenation; identities sigma^* and sigma^+.
SStar: Class that represents Kleene star
-
derivative(sigma)[source]
- Parameters
sigma –
- Returns
-
-
linearForm()[source]
- Returns
-
-
nfaPD(pdmethod='nfaPDNaive')[source]
Computes the partial derivative automaton
-
partialDerivatives(sigma)[source]
- Parameters
sigma –
- Returns
-
-
partialDerivativesC(sigma)[source]
- Parameters
sigma –
- Returns
-
-
support(side=True)[source]
- Returns
-
SDisj
-
class SDisj(arg, sigma=None)[source]
Class that represents the disjunction operation for special regular expressions.
-
static cross(ri, s, lists)[source]
- Return type
list
-
derivative(sigma)[source]
- Parameters
sigma –
- Returns
-
-
ewp()[source]
- Returns
-
-
first()[source]
- Returns
-
-
followLists(lists=None)[source]
- Parameters
lists –
- Returns
-
-
followListsStar(lists=None)[source]
- Parameters
lists –
- Returns
-
-
last()[source]
- Returns
-
-
linearForm()[source]
- Returns
-
-
linearFormC()[source]
- Returns
-
-
partialDerivatives(sigma)[source]
- Parameters
sigma –
- Returns
-
-
partialDerivativesC(sigma)[source]
- Parameters
sigma –
- Returns
-
-
support(side=True)[source]
- Returns
-
-
tailForm()[source]
- Return type
dict
SConj
-
class SConj(arg, sigma=None)[source]
Class that represents the conjunction operation.
-
derivative(sigma)[source]
- Parameters
sigma –
- Returns
-
-
ewp()[source]
- Returns
-
-
linearForm()[source]
- Returns
-
-
partialDerivatives(sigma)[source]
- Parameters
sigma –
- Returns
-
-
partialDerivativesC(sigma)[source]
- Parameters
sigma –
- Returns
-
-
support(side=True)[source]
- Returns
-
-
tailForm()[source]
- Return type
dict
SNot
-
class SNot(arg, sigma=None)[source]
Special regular expressions modulo associativity, commutativity, idempotence of disjunction and intersection;
associativity of concatenation; identities sigma^* and sigma^+.
SNot: negation
-
alphabeticLength()[source]
- Returns
-
-
derivative(sigma)[source]
- Parameters
sigma –
- Returns
-
-
epsilonLength()[source]
- Returns
-
-
ewp()[source]
- Returns
-
-
first()[source]
- Return type
set
-
last()[source]
- Return type
set
-
linearForm()[source]
- Returns
-
-
linearFormC()[source]
- Returns
-
-
mark()[source]
Make all atoms marked (tag False)
- Return type
RegExp
-
nfaPD(pdmethod='nfaPDNaive')[source]
Computes the partial derivative automaton
-
partialDerivatives(sigma)[source]
- Parameters
sigma –
- Returns
-
-
partialDerivativesC(sigma)[source]
- Parameters
sigma –
- Returns
-
-
rpn()[source]
RPN representation
- Returns
printable RPN representation
- Return type
str
-
setOfSymbols()[source]
- Returns
-
-
starHeight()[source]
Maximum level of nested regular expressions with a star operation applied.
For instance, starHeight(((a*b)*+b*)*) is 3.
- Return type
integer
-
support(side=True)[source]
- Returns
-
-
syntacticLength()[source]
- Returns
-
-
tailForm()[source]
- Return type
dict
-
treeLength()[source]
- Returns
-
DAG
-
class DAG(reg)[source]
Class to support dags representing regexps
See also
P. Flajolet, P. Sipala, J.-M. Steyaert, Analytic variations on the common subexpression problem,
in: Automata, Languages and Programming, LNCS, vol. 443, Springer, New York, 1990, pp. 220–234.
- Variables
reg (reex) – regular expression
-
catLF(idl, idr, delay=False)[source]
both arguments are assumed to be already present in the DAG
-
static plusLF(diff1, diff2)[source]
Union of partial derivatives
- Parameters
-
- Return type
dict
DNode
-
class DNode(op, arg1=None, arg2=None)[source]
MAtom
-
class MAtom(val, mark, sigma=None)[source]
Base class for pointed (marked) regular expressions
Used directly to represent atoms (characters). This class is used to obtain Yamada or Asperti automata.
There is no evident use for it, outside this module.
- Parameters
val – symbol
sigma – alphabet
-
unmark()[source]
Conversion back to RegExp
- Return type
reex.RegExp
BuildRegexp
-
class BuildRegexp(context=None)[source]
Semantics of the FAdo grammars’ regexps
Priorities of operators: disj > conj > shuffle > concat > not > star >= option
BuildRPNRegexp
-
class BuildRPNRegexp(context=None)[source]
BuildRPNSRE
-
class BuildRPNSRE(context=None)[source]
BuildSRE
-
class BuildSRE(context=None)[source]
Parser for sre
Functions
str2regexp
-
str2regexp(s, parser=Lark(open('/Users/rvr/Work/FAdo/FAdo/regexp_grammar.lark'), parser='lalr', lexer='contextual', ...), sigma=None, strict=False)[source]
Reads a RegExp from string.
- Parameters
s (string) – the string representation of the regular expression
parser – a parser generator for regexps
sigma (list or set of symbols) – alphabet of the regular expression
strict (boolean) – if True tests if the symbols of the regular expression are included in sigma
- Return type
reex.RegExp
str2sre
-
str2sre(s, parser=Lark(open('/Users/rvr/Work/FAdo/FAdo/regexp_grammar.lark'), parser='lalr', lexer='contextual', ...), sigma=None, strict=False)[source]
Reads a sre from string. Arguments as str2regexp.
- Return type
reex.sre
rpn2regexp
-
rpn2regexp(s, sigma=None, strict=False)[source]
Reads a (simple) RegExp from a RPN representation
R ::= .RR | +RR | *R | L | @
L ::= [a-z] | [A-Z]
- Parameters
-
- Return type
reex.RegExp
Note
This method uses the Python call stack, so recursion depth limits apply.
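To make the grammar concrete, here is a small recursive-descent sketch for it. It is not `rpn2regexp` itself: it returns hypothetical tuples instead of FAdo objects, the function name `parse_prefix` is invented for this example, and it assumes that `@` stands for the empty word, which the grammar above does not state.

```python
def parse_prefix(s):
    """Parse a string of the grammar R ::= .RR | +RR | *R | L | @ into a tuple tree.

    Raises ValueError on malformed input. Being recursive, it is subject
    to Python's recursion limit for deeply nested expressions.
    """

    def parse(i):
        """Parse one expression starting at index i; return (tree, next_index)."""
        if i >= len(s):
            raise ValueError("unexpected end of input")
        c = s[i]
        if c in ".+":
            left, i = parse(i + 1)
            right, i = parse(i)
            return ("cat" if c == "." else "or", left, right), i
        if c == "*":
            arg, i = parse(i + 1)
            return ("star", arg), i
        if c == "@":
            # Assumption: '@' denotes the empty word.
            return ("eps",), i + 1
        if c.isascii() and c.isalpha():
            return ("sym", c), i + 1
        raise ValueError(f"unexpected character {c!r} at index {i}")

    tree, end = parse(0)
    if end != len(s):
        raise ValueError("trailing input after a complete expression")
    return tree


# ".*+aba" encodes (a + b)* a: the operator comes before its operands
tree = parse_prefix(".*+aba")
```

Each operator consumes the expressions that follow it, so no parentheses are needed in this notation.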
to_s
-
to_s(r)[source]
Returns an sre from a FAdo regexp.
- Parameters
r (RegExp) – the FAdo representation regexp for a regular expression.
- Return type
RegExp