Safouane Chergui
Categories: All (3), NLP (1), Package management (1), PDF (1), Python (3)
PDF parsing is hard
Python
PDF
This blog post explains what a PDF is internally and why parsing PDF files is harder than it looks.
Aug 29, 2025
Safouane Chergui
Why I ditched pip and conda for Pixi
Python
Package management
Jul 5, 2025
Safouane Chergui
Byte Pair Encoding Tokenization
Python
NLP
This post explains how Byte Pair Encoding (BPE) tokenization works. We build a basic BPE tokenizer and apply it to a dummy example.
Jul 7, 2024
Safouane Chergui