Mini StarCoder2 - Tokenizer (WIP)

Now that there is a pretraining dataset containing Python source code in the form of text, the next task is to create a tokenizer specific to code.

1 min · 27 words · dudeperf3ct