The idea is to pass a book in PDF format to a chatbot that we can then ask questions and discuss the book with.
The first step is choosing a pre-trained LLM to customize with our own data, which in this example are PDF files. GPT-4, Llama 3.1, T5, and others come to mind when considering text-to-text LLMs, but in this example I’m going to use a Mistral AI model for several reasons: it is more convenient to use, since it does not require access approval like Meta’s Llama, and it is open-source, unlike GPT-4.
We receive the PDF file and break it into smaller chunks. Then we use Hugging Face embeddings to vectorize the chunks and ChromaDB to store the resulting vectors, which gives us the retrieval step of Retrieval-Augmented Generation (RAG). After that, we load the LLM and create a Q/A chain for the end user to interact with.
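To make the chunking and retrieval steps concrete, here is a minimal sketch that uses only the Python standard library. It is not the code from the repository: the function names, the chunk sizes, and the toy word-count "embedding" are all assumptions made for illustration. In the real pipeline, a Hugging Face embedding model would stand in for `embed` and ChromaDB would stand in for the in-memory list of chunks.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (hypothetical helper)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    passages = [
        "The whale was white and enormous.",
        "Ishmael boards the ship in Nantucket.",
        "The captain lost his leg to the whale.",
    ]
    print(retrieve("Where does Ishmael board the ship?", passages, k=1))
```

The overlap between neighbouring chunks is there so that a sentence cut at a chunk boundary still appears whole in at least one chunk. A word-count vector only matches exact words, which is why a learned embedding model is the better choice in practice.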
We can also combine fine-tuning with RAG and use a model with a richer vocabulary to enhance the Q/A procedure.
https://github.com/SevenSkyConsulting/pdfchatbot
Digital Bookworm
In a world where books meet artificial intelligence, we embarked on an exciting journey to create a magical chatbot that could read and discuss any book with curious minds. Picture a digital librarian who not only reads books but engages in meaningful conversations about them!
Our adventure begins with choosing the perfect brain for our digital bookworm. While there were legendary options like GPT-4 and Llama 3.1, we discovered a hidden gem called Mistral AI. This friendly open-source companion welcomed us with open arms, requiring no special permissions or golden tickets to access its powers.
Like a master chef preparing a delicate dish, we first slice our PDF book into bite-sized pieces. These smaller chunks make it easier for our digital friend to digest the information. Then, using the magical tools of Hugging Face Embeddings and ChromaDB, we transform these text pieces into a special language that our AI can understand perfectly – think of it as creating a detailed map of the book’s knowledge.
The real magic happens when we introduce Retrieval-Augmented Generation (RAG), which acts like a sophisticated memory system for our digital bookworm. Whenever a question arrives, it looks up the passages of the book that matter most and hands them to the AI, just like a human reader flipping back to the right chapter before answering.
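For the curious, the "memory" trick usually comes down to something quite plain: the passages found for a question are pasted into the prompt ahead of the question itself. The sketch below shows one way that could look. The function name and the prompt wording are assumptions for illustration, not the exact template used by the project or by any particular library.

```python
def build_prompt(question, passages):
    """Combine retrieved passages and the question into one prompt string."""
    # Number each passage so the model (and the reader) can tell them apart.
    context = "\n\n".join(
        f"[{i}] {p.strip()}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    print(build_prompt(
        "Who is the narrator?",
        ["Call me Ishmael.", "The ship sails at dawn."],
    ))
```

Telling the model to admit when the context does not contain the answer is a common way to discourage it from making things up about the book.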
To make our creation even more brilliant, we can enhance its abilities through fine-tuning and using models with richer vocabulary. It’s like teaching our digital bookworm to become an even better conversation partner, ready to engage in deeper discussions about any book you present to it.
And there you have it – a smart, friendly AI companion ready to explore the vast world of literature with anyone who wants to chat about books!