Mini StarCoder2 - Tokenizer (WIP)
Now that there is a pretraining dataset containing Python source code as plain text, the next task is to create a tokenizer specific to code.
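As a rough illustration of what training a code-specific tokenizer can involve, here is a minimal sketch that assumes a byte-level BPE (byte-pair encoding) approach. The choice of algorithm is an assumption, since the text above does not name one. The helper names (`train_bpe`, `apply_merge`, `encode`, `decode`) and the two-snippet toy corpus are hypothetical and only stand in for the real dataset.

```python
from collections import Counter


def apply_merge(seq, pair):
    """Replace every adjacent occurrence of `pair` in `seq` with one merged token."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merges from a list of source-code strings."""
    # Start from single UTF-8 bytes so any source text can be represented.
    sequences = [[bytes([b]) for b in text.encode("utf-8")] for text in corpus]
    merges = []
    for _ in range(num_merges):
        # Count adjacent token pairs across the whole corpus.
        pair_counts = Counter()
        for seq in sequences:
            pair_counts.update(zip(seq, seq[1:]))
        if not pair_counts:
            break
        best, count = pair_counts.most_common(1)[0]
        if count < 2:
            break  # nothing repeats any more, so further merges add no value
        merges.append(best)
        sequences = [apply_merge(seq, best) for seq in sequences]
    return merges


def encode(text, merges):
    """Tokenize `text` by replaying the learned merges in training order."""
    seq = [bytes([b]) for b in text.encode("utf-8")]
    for pair in merges:
        seq = apply_merge(seq, pair)
    return seq


def decode(tokens):
    """Join byte tokens back into the original string."""
    return b"".join(tokens).decode("utf-8")


# Hypothetical toy corpus standing in for the Python dataset.
corpus = [
    "def add(a, b):\n    return a + b\n",
    "def sub(a, b):\n    return a - b\n",
]
merges = train_bpe(corpus, num_merges=20)
tokens = encode(corpus[0], merges)
print(len(corpus[0].encode("utf-8")), "bytes ->", len(tokens), "tokens")
```

Because the merges are learned from Python source, frequent patterns such as indentation runs and keywords like `def` and `return` tend to collapse into single tokens, which is the main motivation for training on code rather than reusing a natural-language vocabulary. A real pipeline would more likely use an existing tokenizer library and a much larger vocabulary than this sketch.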