LLM Frameworks Are Like ORMs in the ’80s
Imagine being a programmer in the early ’80s, just discovering a new programming language slightly different from the others. You don’t write algorithms with if-else switches and for-loops. Instead, you describe your expected output dataset, and the underlying system (called a database) rewrites your query as an algorithm for you, executes it, and hands you the desired resultset. To use it effectively, though, you need a basic understanding of the relational model, which seems counter-intuitive at first, but after a few days you grow more comfortable.
Then, one day, a colleague from the future walks in and starts telling you how you have to use ORMs! They make life much easier, even generating the SQL code for you. There’s only one small caveat: you have to learn their concepts, think in objects, and let them do the relational thinking for you. You use them for a few weeks. They seem helpful at first, but they stop helping every time you stray from their examples. Before long, you arrive at all the arguments people will make against ORMs after 40+ years of collective database knowledge, except you are living through them in a nascent era. You grow increasingly frustrated and decide to ditch them and write the queries yourself.
That is how I feel about LLM frameworks today. They are overly confident complexity brewers and confidence underminers in a still-immature field. Not only do they cause death by abstraction, but, most importantly, they keep you from actually learning a field that is still so fresh.
Consider RAG, for example. Retrieval-Augmented Generation! It sounds so scientific and intelligent, right? You hear about it, and it sounds like the LLM-era equivalent of SVMs: a mighty, generic algorithm you couldn’t possibly implement yourself, so you should use a framework instead.
If you call the bluff and study its logic, it’s straightforward and reasonable: you use a chat model just like your grandma does, but instead of dry-asking questions, you provide some context, too. “Context” is not that scientific either: you grab a bunch of relevant data from your internal database to guide the LLM towards a response tailored to you, combine it with the user-supplied query using typical string operations, and send the result to the model.
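Here is a minimal sketch of that assembly step in plain Python; the `build_prompt` helper, the prompt wording, and the sample documents are made up for illustration:

```python
def build_prompt(question: str, docs: list[str]) -> str:
    """Glue retrieved context onto the user question with plain string ops."""
    context = "\n\n".join(docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Stand-in for rows you pulled from your internal database.
docs = [
    "Refunds are issued within 14 days of purchase.",
    "Digital goods are refundable only if unused.",
]
print(build_prompt("What is our refund policy?", docs))
# The resulting string is sent to the chat model like any other user message.
```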
RAG is all about providing specific context to a generic LLM. Frameworks are trying to create generic templates for the particular context needed to help a generic model. See the logical fallacy there? Yes, there are more elaborate implementations of RAG, but I doubt most companies have reaped the low-hanging fruit yet. The standard aphorism still holds: premature optimization is the root of all evil.
The same reasoning applies to terms like vector embeddings and similarity scores: their mathematical underpinnings are pretty straightforward from an API user’s perspective, standard college math. What has changed dramatically is their semantics in the LLM world. If you grasp those semantics, you can use them more effectively.
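For instance, a similarity score is typically just cosine similarity, which you can write yourself in a few lines. The toy vectors below are made up; real embeddings have hundreds of dimensions, but the arithmetic is identical:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Standard college math: dot product divided by the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([0.1, 0.9, 0.2], [0.2, 0.8, 0.1]))  # ~0.99: similar
print(cosine_similarity([0.1, 0.9, 0.2], [0.9, 0.0, 0.1]))  # ~0.13: dissimilar
```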
What’s the catch, though? Aren’t these just simple hacks? After all, there’s no free lunch.
Indeed, there is no free lunch. But because LLMs are almost commodities today, we tend to forget that the appetizer and the main course have already been paid for during the training phase. All we have to do is ask nicely and pay for the appropriate wine and dessert to pair with our dishes. That part is not easy, but it’s not rocket science either.
What does look like rocket science is the training phase of LLMs, which typical users need not worry about. What they should worry about is helping LLMs as much as possible with their specific use cases. That is the secret sauce.
Mixing that secret sauce doesn’t differ much from any other data-heavy project a company commissions every few years: it needs planning, experimentation, following standard best practices, and knowing what you are doing.
Don’t worry too much about reusability and maintenance when switching LLMs. They all have essentially the same API, and you won’t change them that often, either. You don’t need class inheritance for that. When a model works better for your language, you’ll know! You don’t need a framework to tell you. And you certainly don’t want a framework version conflict blocking you at any step.
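To make that concrete, here is a minimal sketch assuming an OpenAI-compatible chat-completions endpoint (many providers expose one; the model name below is just an example). Switching models is a string parameter, not a class hierarchy:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(model: str, prompt: str) -> str:
    """Send one user message to the given model and return its reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Trying another model is a one-line change, not a refactor.
print(ask("gpt-4o-mini", "Summarize our refund policy."))
```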
Though frameworks preach and pitch usability and reusable components, they cause a Java-fication of Python. Do you want a banana? You must first create the universe and the jungle, use dependency injection to provide every tree one at a time, and then generate the monkey that grabs and eats the banana.
Reusability and decoupled components are great, but you can achieve that yourself by following standard programming best practices. Experiment with different templates, store them in a Python dict or even a database table, and rotate over them to see what works better, as in the sketch below. The more layers of abstraction between me and the model, the less confident I am in making changes, and the easier it becomes to give up and call it a day.
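A minimal sketch of that rotation; the template names and wording are made up:

```python
# Hypothetical prompt templates; these could just as well live in a database table.
TEMPLATES = {
    "terse": "Context:\n{context}\n\nQ: {question}\nA:",
    "instructive": (
        "Use only the context below to answer the question.\n\n"
        "Context:\n{context}\n\nQuestion: {question}"
    ),
}

def render(name: str, context: str, question: str) -> str:
    return TEMPLATES[name].format(context=context, question=question)

for name in TEMPLATES:
    prompt = render(name, context="(retrieved rows here)", question="What changed in v2?")
    # Send `prompt` to the model and log which template produced which answer.
    print(f"--- {name} ---\n{prompt}\n")
```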
Visibility, though, is for me the most critical aspect. You should see the embedding arrays of floats and the similarity scores changing with your own eyes to get a feel for how things work, much like you need to run EXPLAIN SELECT to figure out what the database optimizer does.
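Reusing the cosine_similarity helper from the earlier sketch, eyeballing retrieval can be as mundane as this; the vectors and document names are toy placeholders, whereas in practice they come back from your embedding endpoint:

```python
# Toy placeholder vectors standing in for real embeddings.
candidates = {
    "refund policy":  [0.12, 0.88, 0.21],
    "shipping times": [0.85, 0.10, 0.30],
    "privacy notice": [0.40, 0.35, 0.55],
}
query_vec = [0.1, 0.9, 0.2]  # embedding of the user's question

print("query vector:", query_vec)  # the raw floats, no magic
ranked = sorted(
    candidates.items(),
    key=lambda kv: cosine_similarity(query_vec, kv[1]),
    reverse=True,
)
for name, vec in ranked:
    print(f"{cosine_similarity(query_vec, vec):.3f}  {name}")
```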
The field of working with LLMs is both immature and powerful, and having limited visibility and too many abstractions makes it challenging to nurture a generation of power users.