It’s 1991. Yale computer scientist David Gelernter’s Mirror Worlds proposes the onset of a coming revolution. Imagine a simulated reality — a virtual representation of our city with precisely replicated traffic patterns, or of a far-flung corporation and all of its moving parts.
He writes: “They are software models of some chunk of reality, some piece of the real world going on outside your window.”
His ideas give rise to the digital twin.
Since the early aughts there have been many industrial use cases: urban planning, CAD-driven automotive design and testing, healthcare equipment prognostics.
What he didn’t quite surmise: personal digital twins.
That’s where we are now.
Eric Yuan, CEO of Zoom, in a now-viral June 3rd, 2024 interview with The Verge, talks about the future of his company alongside some salient personal observations. A sample:
That he hates his calendar and spending so much time in meetings[1].
Working five days a week is boring — why not four or three so we can spend more time with our families?
And that I should have my own LLM — Eric’s LLM, Nilay’s LLM. All of us, we will have our own LLM. Essentially, that’s the foundation for the digital twin. Then I can count on my digital twin to go to meetings.
At first blush it’s hard not to be incredulous[2].
A “deepfake version” of me in meetings? God, what sick new twisted paradigm is this?
But let’s think through this a bit more together. We deserve to.
What is it in ourselves that we should prize? Not just transpiration (even plants do that). Or respiration (even beasts and wild animals breathe). Or being struck by passing thoughts. Or jerked like a puppet by your own impulses. Or moving in herds. Or eating, and relieving yourself afterwards. Then what is to be prized? —VI.16
A thought experiment
I work in Tech. I was hired to perform a function. And I work in a function — engineering, product, design, marketing, sales, customer success, finance, etc.
But like Whitman: “I am large, I contain multitudes.” I’m not a DSL; I’m general purpose!
How do I better generalize? And what if I could?
Dear reader: today you and I are under-documented APIs operating under extreme separation of concerns.
Cheeky metaphor aside, let’s think about it slightly differently.
I am born. I go to school for 12 or more years. I accumulate knowledge and experience. I encode and store it. Through experience I work, learn and update the skills I have and what I can do.
My STDIN is my ears and eyes. My STDOUT is my mouth and hands[3]. These are my primary interfaces.
I represent my closed-source internal implementation by how I respond to queries (or more precisely, interview questions). I agree to work for some purpose on the basis of my skills. I earn money by applying these skills to solve problems. I solve these problems and further update my implementation.
It’s largely a black box. Who I am, what I can do. I need to figure out how to market myself and make others aware. And I have to be choosy because it’s hard — if not miraculous — to be in more than a single place at one time.
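To stretch the metaphor once more, here is a playful sketch; every name in it is hypothetical, purely illustrative, and not any real library or API.

```python
# A purely illustrative sketch of the metaphor above; all names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Person:
    """An under-documented API: public interfaces, closed-source internals."""
    skills: dict[str, float] = field(default_factory=dict)  # skill -> proficiency

    def stdin(self, observation: str) -> None:
        # Ears and eyes: experience is encoded and stored (implementation private).
        self.skills[observation] = self.skills.get(observation, 0.0) + 1.0

    def stdout(self, query: str) -> str:
        # Mouth and hands: the only way callers learn what the black box can do.
        return f"My best answer to {query!r}, given what I've seen so far."
```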
It’s a truism today that experience is hard to compress.
But what if some of it could be?
What if, like writing, my experience, my skills could be represented as personal intellectual leverage? A new form of capital.
What if this lossy representation[4] of me could be in more than one place and act, in some sense, more directly on my behalf and in my interest?
The principal-agent problem unbound.
Artificial Personal Intelligence
What Eric gets wrong is that a primary use case of my Digital Avatar/Persona/LLM would be to attend meetings[5]. Why, if it could be headless?
Meetings — a new definition — antiquated mediums for exchanging information[6].
I query Greg’s-LLM directly, and so could others. I could do this programmatically. It might hallucinate, but don’t I already on occasion? Purposively, it will be used situationally[7] — it will have a definition and limitations.
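To make that concrete, a hypothetical sketch of a headless query against Greg’s-LLM. The endpoint, payload shape, and confidence threshold are assumptions of mine for illustration, not a real service.

```python
# Hypothetical: querying a personal LLM headlessly over HTTP, no meeting required.
import requests

resp = requests.post(
    "https://example.com/v1/greg-llm/query",    # hypothetical endpoint
    json={
        "question": "Can we ship the pricing change this sprint?",
        "scope": "work/product-decisions",       # the situational definition and limits
        "require_confidence": 0.8,               # below this, defer to the real Greg
    },
    timeout=10,
)
answer = resp.json()
print(answer.get("reply"), answer.get("confidence"))
```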
Autonomous agents[8] are still new. And recently they have been generating accolades, hype, and speculation[9] in domains like software development.
Make no mistake — they are coming.
The opportunity is in personalization of their foundations.
The blueprint for agents follows a similar pattern (a minimal sketch in code follows the list):
A goal to solve is initialized
Tasks are created and executed, leveraging a foundation LLM (or a chain of them)
Memory is updated and stored in a vector database or similar
Feedback on progress against the goal, internally or externally prompted, is incorporated
New tasks are generated, prioritized and selected
Repeat
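A minimal sketch of that loop, assuming a stand-in call_llm for the foundation model and a plain list in place of the vector database; this is not any particular agent framework’s API.

```python
# Minimal agent-loop sketch following the blueprint above (stand-ins throughout).
def call_llm(prompt: str) -> str:
    """Stand-in for an inference call to a foundation LLM (or a chain of them)."""
    return f"[model output for: {prompt[:48]}...]"


def run_agent(goal: str, max_steps: int = 5) -> list[str]:
    memory: list[str] = []                        # memory store (vector DB stand-in)
    tasks = [f"Plan how to achieve: {goal}"]      # goal initialized, seed task created
    for _ in range(max_steps):                    # "Repeat"
        if not tasks:
            break
        task = tasks.pop(0)                       # select the next task
        result = call_llm(f"Goal: {goal}\nTask: {task}\nRecent memory: {memory[-3:]}")
        memory.append(result)                     # memory updated and stored
        feedback = call_llm(f"Goal: {goal}\nAssess progress given: {memory[-1]}")
        tasks.append(call_llm(f"Feedback: {feedback}\nGenerate the next task"))
    return memory


print(run_agent("Summarize this week's roadmap decisions in one page"))
```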
The mileage, outcome, and validity of performance are determined by:
The quality of the goal specification
The resources available to the agent (both the LLM(s) and the other knowledge bases from which it may draw)
The interfaces available to consume their outputs
Final Thoughts
Personal representations[10] of us: our accumulated knowledge, skills, principles, beliefs… those inputs which shape our unique decision making are, in my humble opinion, the next frontier on the long path to AGI.
As a building block. Not wholesale replacement.
The ability to insert “my personal LLM” into the milieu of general LLMs, incorporated as part of step 2 in the above blueprint.
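A hypothetical sketch of what that insertion could look like at step 2. The model names and the routing heuristic are mine, for illustration only.

```python
# Hypothetical routing at step 2: personal tasks go to a personal LLM,
# everything else to a general foundation model.
PERSONAL_TOPICS = ("my calendar", "my writing", "my design principles")


def pick_model(task: str) -> str:
    if any(topic in task.lower() for topic in PERSONAL_TOPICS):
        return "my-personal-llm"          # personalized, fine-tuned on my data
    return "general-foundation-llm"       # commodity general-purpose model


def execute_task(task: str) -> str:
    return f"[{pick_model(task)} executes: {task}]"   # stand-in for a real inference call


print(execute_task("Reply to the scheduling thread given my calendar"))
```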
There will be dastardly hard problems to solve: input data collection[11], training, retraining (or just fine-tuning), evaluation, inference, governance, security, privacy, rights management, licensing, etc.
But it’s the highest-order bit.
1. Yes, even Eric, hawker of the videoconferencing software du jour, deserves this qualifying ribbon.
2. And many are. See this Ars Technica discourse.
3. A quick overview of standard streams.
4. Some clear and obvious problems: an inability to say no, an inability to exert empathy, a lack of full grounding and context, etc.
5. It’s no surprise that, as the public-facing CEO of a videoconferencing SaaS company, he wouldn’t directly dismiss the primary unique selling proposition of his business.
6. Jeff Bezos on communication and more.
7. How, I ponder, is this concept much different from the role an empowered Chief of Staff might already play on behalf of an Executive?
8. Written just over a year ago, in April 2023, Matt’s overview is one of the most comprehensive on autonomous agents.
9. With considerably more sophistication than current WhatsApp toy responders (e.g. https://github.com/Ads-cmu/WhatsApp-Llama) or a version of Mistral fine-tuned on random writings with axolotl.
10. Personal representative input data to train on will be the most time-costly problem to solve — in depth, breadth, and latency. Perhaps solved via an approach akin to https://arxiv.org/abs/2305.07759, with what we do have or could synthesize.
11. Like it, perhaps humanity itself is just a collection of advanced multi-modal LLMs in its perfect/imperfect form.