
feat: Implement tokenizer and parser for HTML-like markup language · 695a0a26

Nicholas Tindle authored Nov 08, 2023

- Create Token class to represent individual tokens in the markup language
- Create tokenize function to split the markup into a list of tokens
- Implement Parser class to parse the tokens and generate a dictionary or list
- Add support for nested tags and multiple root tags
- Handle whitespace and escaped characters correctly
- Write unit tests to verify the functionality of the tokenizer and parser

This commit implements the tokenizer and parser for an HTML-like markup language. It includes the following changes:
- Added token.py and parser.py to the gravitasml package
- Implemented the Token class with attributes for type, value, line_num, and column
- Implemented the tokenize function to split the markup into a list of tokens based on regular expressions
- Implemented the Parser class with methods for parsing the tokens and generating a dictionary or list
- Added support for nested tags and multiple root tags in the parsed output
- Handled whitespace and escaped characters correctly in the tokenization process
- Created unit tests for both the tokenizer and parser to verify their functionality
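As an illustration of the tokenizer described above, here is a minimal sketch of a Token with the four listed attributes and a regex-based tokenize function. The token type names ("TAG_OPEN", "TAG_CLOSE", "TEXT"), the pattern TOKEN_RE, and the backslash escaping rule are assumptions made for this example; the actual token.py may differ.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    type: str      # assumed names: "TAG_OPEN", "TAG_CLOSE", "TEXT"
    value: str
    line_num: int  # 1-based line where the token starts
    column: int    # 1-based column where the token starts


# Hypothetical patterns, tried in order: whitespace, closing tag,
# opening tag, then text (where a backslash escapes the next character).
TOKEN_RE = re.compile(
    r"(?P<ws>\s+)"
    r"|</\s*(?P<close>[A-Za-z_][\w ]*?)\s*>"
    r"|<\s*(?P<open>[A-Za-z_][\w ]*?)\s*>"
    r"|(?P<text>(?:\\.|[^<\\])+)",
    re.DOTALL,
)


def tokenize(markup: str) -> list[Token]:
    tokens = []
    pos, line_num, line_start = 0, 1, 0
    while pos < len(markup):
        match = TOKEN_RE.match(markup, pos)
        column = pos - line_start + 1
        if match is None:
            raise ValueError(
                f"Unexpected {markup[pos]!r} at line {line_num}, column {column}"
            )
        kind = match.lastgroup
        if kind == "text":
            # Drop the escaping backslashes, then trailing whitespace.
            value = re.sub(r"\\(.)", r"\1", match.group("text"), flags=re.DOTALL)
            value = value.rstrip()
            if value:
                tokens.append(Token("TEXT", value, line_num, column))
        elif kind in ("open", "close"):
            token_type = "TAG_OPEN" if kind == "open" else "TAG_CLOSE"
            tokens.append(Token(token_type, match.group(kind), line_num, column))
        # Whitespace between tokens is skipped; only update the position.
        consumed = match.group(0)
        newlines = consumed.count("\n")
        if newlines:
            line_num += newlines
            line_start = pos + consumed.rfind("\n") + 1
        pos = match.end()
    return tokens
```

In this sketch, `tokenize("<a><b>hi \\< there</b></a>")` yields two opening tags, one text token with the value `hi < there`, and two closing tags.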

This implementation parses HTML-like markup into a structured dictionary or list, providing a foundation for further processing and manipulation of the markup data.
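To make the "dictionary or list" output concrete, the following is a rough sketch of a recursive-descent parser over simplified (type, value) token pairs. The output conventions are assumptions for illustration only: a single root returns a dictionary, multiple roots return a list of dictionaries, and repeated sibling tag names overwrite one another. The actual Parser in parser.py may behave differently.

```python
# Simplified, hypothetical token shape for this sketch: (type, value),
# where type is "TAG_OPEN", "TAG_CLOSE", or "TEXT".
SimpleToken = tuple[str, str]


class Parser:
    def __init__(self, tokens: list[SimpleToken]):
        self.tokens = list(tokens)
        self.pos = 0

    def parse(self):
        """Return a dict for one root tag, or a list of dicts otherwise."""
        roots = []
        while self.pos < len(self.tokens):
            roots.append(self._parse_element())
        return roots[0] if len(roots) == 1 else roots

    def _parse_element(self) -> dict:
        kind, name = self.tokens[self.pos]
        if kind != "TAG_OPEN":
            raise ValueError(f"Expected an opening tag, got {kind} {name!r}")
        self.pos += 1

        children = []
        text = ""
        while True:
            if self.pos >= len(self.tokens):
                raise ValueError(f"Unclosed tag <{name}>")
            kind, value = self.tokens[self.pos]
            if kind == "TAG_CLOSE":
                if value != name:
                    raise ValueError(f"Expected </{name}>, got </{value}>")
                self.pos += 1
                break
            if kind == "TEXT":
                text = value
                self.pos += 1
            else:
                # A nested opening tag: recurse to parse the child element.
                children.append(self._parse_element())

        if not children:
            return {name: text}
        # Merge child dicts; a repeated child name overwrites the earlier one.
        content = {}
        for child in children:
            content.update(child)
        return {name: content}


nested = Parser([
    ("TAG_OPEN", "a"), ("TAG_OPEN", "b"), ("TEXT", "hi"),
    ("TAG_CLOSE", "b"), ("TAG_CLOSE", "a"),
]).parse()
# nested == {"a": {"b": "hi"}}
```

Returning a list only when there is more than one root keeps the common single-root case easy to index, at the cost of a return type that depends on the input.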
