For each pattern, this code prints the starting and ending index of the match in the text if a match is found, and 0 otherwise.
The following features required for this assignment are supported by this code:-
- Matching of individual characters and numbers
- Ranges: a-z, A-Z, 0-9, or any range in between
- Character classes
- Macros like +, *, ?
- \w, \d
- Greedy and non-greedy matching for *
Some additional features are also supported, since they were not very difficult to implement:-
- '.', which matches everything except \n
- Greedy and non-greedy matching for +
- Starts-with (^) and ends-with ($) anchors
- Negated character classes: [^...]
- \W, \D, \s, \S
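As an illustration of how ranges and negated classes from the lists above might be tested against a single character, here is a minimal sketch. The function name class_matches and the convention of passing only the text between '[' and ']' are assumptions made for this example, not taken from the original code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if c is accepted by the class body cls,
 * i.e. the text between '[' and ']', such as "a-z0-9_" or "^0-9". */
bool class_matches(const char *cls, char c)
{
    assert(cls != NULL);

    bool negated = (cls[0] == '^');
    if (negated)
        cls++; /* the rest of the body lists the excluded characters */

    bool found = false;
    for (size_t i = 0; cls[i] != '\0' && !found; i++) {
        if (cls[i + 1] == '-' && cls[i + 2] != '\0') {
            /* Range such as a-z: compare against both endpoints. */
            found = (cls[i] <= c && c <= cls[i + 2]);
            i += 2; /* skip the '-' and the upper bound */
        } else {
            /* Single literal character (a trailing '-' lands here too). */
            found = (cls[i] == c);
        }
    }
    return negated ? !found : found;
}
```

With this sketch, class_matches("b-f", 'd') is true and class_matches("^0-9", '5') is false.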
Data Structures used:- The input pattern is parsed and stored in the form of a structure whose definition is given below:-
typedef struct regex_t {
    int type;
    union {
        char ch;
        char *char_class;
    };
} regex;
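To show how a structure of this shape could drive single-character matching, here is a small sketch. The enum tags (T_CHAR, T_DOT and so on) and the function token_matches are hypothetical names chosen for this example; only the structure layout comes from the text above. The anonymous union requires C11 or a compiler extension.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type tags; the original names are not given in the text. */
enum token_type {
    T_CHAR, T_DOT, T_DIGIT, T_NOT_DIGIT,
    T_WORD, T_NOT_WORD, T_SPACE, T_NOT_SPACE
};

/* Same layout as the structure described above. */
typedef struct regex_t {
    int type;
    union {
        char ch;          /* used by T_CHAR */
        char *char_class; /* would be used by a class token, not shown here */
    };
} regex;

/* Returns true if the token accepts the character c. */
bool token_matches(const regex *tok, char c)
{
    assert(tok != NULL);
    unsigned char uc = (unsigned char)c; /* ctype functions need this cast */

    if (c == '\0')
        return false; /* end of text never matches a token */

    switch (tok->type) {
    case T_CHAR:      return tok->ch == c;
    case T_DOT:       return c != '\n';
    case T_DIGIT:     return isdigit(uc) != 0;
    case T_NOT_DIGIT: return isdigit(uc) == 0;
    case T_WORD:      return isalnum(uc) != 0 || c == '_';
    case T_NOT_WORD:  return !(isalnum(uc) != 0 || c == '_');
    case T_SPACE:     return isspace(uc) != 0;
    case T_NOT_SPACE: return isspace(uc) == 0;
    default:          return false;
    }
}
```

Storing a tag plus a union keeps each parsed token small, and the switch makes it easy to see which metacharacter is being handled while debugging.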
Logic used:-
- The text and the number of patterns are read from stdin.
- Each input pattern is parsed, converted, and stored in the form of the structure defined above.
- The match_here function presented by the instructor in class has been converted to an iterative version called match_util, which iterates through the pattern and the text and returns the starting index if a match is found, and -1 otherwise.
- The match length is a global variable that holds the number of characters matched. It is used to calculate the ending index as follows: End index = Start index + match length - 1
- The parsed pattern and the character class buffers are also global variables for convenience of scope (although this is a bad programming practice).
- The macros and metacharacters are represented by their types, which are enums, for ease of use, easier debugging, and readability.
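The scan over start positions, the global match length, and the greedy versus non-greedy choice for * can be sketched as follows. This is a simplified stand-in, not the original match_util: it works directly on the pattern string instead of the parsed structure, supports only literals, '.', * and *?, and keeps the recursive match_here shape for brevity instead of the iterative form described above. All function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

int match_length; /* number of characters consumed by the last successful match */

/* True if pattern character p accepts text character c. */
static bool single(char p, char c)
{
    if (c == '\0')
        return false;
    return p == '.' ? c != '\n' : p == c;
}

static int match_at(const char *pat, const char *text);

/* Handles p* (greedy) or p*? (lazy); returns matched length or -1. */
static int match_star(char p, bool lazy, const char *pat, const char *text)
{
    int longest = 0;
    while (single(p, text[longest]))
        longest++;

    /* Greedy tries the longest repetition first, lazy the shortest. */
    int n = lazy ? 0 : longest;
    int step = lazy ? 1 : -1;
    for (; n >= 0 && n <= longest; n += step) {
        int rest = match_at(pat, text + n);
        if (rest >= 0)
            return n + rest;
    }
    return -1;
}

/* Returns the number of characters matched at text, or -1 on failure. */
static int match_at(const char *pat, const char *text)
{
    if (pat[0] == '\0')
        return 0;
    if (pat[1] == '*') {
        bool lazy = (pat[2] == '?');
        return match_star(pat[0], lazy, pat + (lazy ? 3 : 2), text);
    }
    if (single(pat[0], text[0])) {
        int rest = match_at(pat + 1, text + 1);
        return rest < 0 ? -1 : rest + 1;
    }
    return -1;
}

/* Returns the start index of the first match, or -1; sets match_length. */
int find_match(const char *pat, const char *text)
{
    assert(pat != NULL && text != NULL);
    size_t start = 0;
    do {
        int len = match_at(pat, text + start);
        if (len >= 0) {
            match_length = len;
            return (int)start;
        }
    } while (text[start++] != '\0');
    return -1;
}
```

For the pattern ab*c and the text xxabbbc, this sketch returns start index 2 with match_length 5, so the end index is 2 + 5 - 1 = 6.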
Input:- From stdin, as follows
text
no_of_patterns
pattern0
pattern1
.
.
.
Constraints:-
Max length of text = 4000
Max length of patterns = 1000
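A minimal sketch of reading this input format with buffers sized from the constraints above. The macro and function names are hypothetical, and the choice of fgets with newline stripping is an assumption about how the input could be read, not a description of the original code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TEXT 4000    /* maximum length of the text */
#define MAX_PATTERN 1000 /* maximum length of one pattern */

/* Removes a trailing "\n" or "\r\n" in place and returns the new length. */
size_t strip_newline(char *s)
{
    assert(s != NULL);
    size_t len = strcspn(s, "\r\n");
    s[len] = '\0';
    return len;
}

/* Parses the pattern count from a line; returns 1 on success, 0 otherwise. */
int parse_count(const char *line, int *count)
{
    assert(line != NULL && count != NULL);
    char *end = NULL;
    long value = strtol(line, &end, 10);
    if (end == line || value < 0 || value > 1000000)
        return 0; /* no digits, negative, or implausibly large */
    *count = (int)value;
    return 1;
}

/* Reads one line into buf (capacity cap); returns 1 on success, 0 on EOF. */
int read_line(FILE *in, char *buf, size_t cap)
{
    assert(in != NULL && buf != NULL && cap > 0);
    if (fgets(buf, (int)cap, in) == NULL)
        return 0;
    strip_newline(buf);
    return 1;
}
```

A caller would declare the text buffer as char text[MAX_TEXT + 2] and each pattern buffer as char pattern[MAX_PATTERN + 2], leaving room for the newline and the terminating NUL, then call read_line(stdin, ...) once for the text, once for the count, and once per pattern.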
Output:- As required by the assignment, this code outputs:-
0 - if there is no match
1 start_index end_index - if match is found
for each input pattern
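The output line for one pattern can be sketched as below, combining the -1 convention for "no match" with End index = Start index + match length - 1. The function name format_result and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes the result line for one pattern into out.
 * start is the start index of the match, or -1 if there was no match;
 * length is the number of characters matched. */
int format_result(char *out, size_t cap, int start, int length)
{
    assert(out != NULL && cap > 0);
    if (start < 0)
        return snprintf(out, cap, "0");
    /* End index = Start index + match length - 1 */
    return snprintf(out, cap, "1 %d %d", start, start + length - 1);
}
```

For a match starting at index 2 with length 5, the line is "1 2 6". For an empty match the formula gives an end index one less than the start; how the original code reports that case is not stated.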