No matter where you are on the internet, whatever the fandom, someone is always going to ask what you’d do if you were in charge of it. For example, a lot of Bible fans are very convinced that their fanfiction is actually factually true. Whether it’s fantasy Wrestlemanias or ideal outfit compositions in Pretty Little Liars, there’s always an urge to take a thing you already know and make your own version of it.
Also, people who like Pokemon routinely talk about what stupid idiots the designers are and how they could do a better job of running the game. I don’t think I could, because I know there are competing factors and I think that everyone who opens their mouth to talk like that sounds like a tool.
Still, if I think those people are silly, that’s easy to say when I don’t put myself out there, right?
Here! A bunch of opinions about what I think should be done in Pokemon as a game franchise. Nothing like ‘open world matters’ – I think the game should always be a competitive 2v2 Bo3 format, and the rest of the game can follow from that. I also don’t think that this would make the game better. It’s very important that I put it out there, wear it on my sleeve, that none of these changes are based on deep insight into the game or the way that it should be. No. This is a centering of myself, as a designer, and as a player of games. This is how I want it done. Also note that none of these changes are simple or complete on their own; this isn’t all that I think should happen, and there would need to be specific changes and fine tuning for all of these pushes.
There, preamble done, here’s how and where I’m right.
Kill the Rock or Steel Type, Don’t Care Which
To me the Rock type feels like it exists because when they were setting up the type chart back in Red-Blue, they figured they’d need something like it, like making sure to sketch out space for a window box while designing a window. But Rock in RBY got to be Ground’s ugly cousin, with its greatest perk (a resistance to Normal moves) not proving adequate to the task of dealing with that generation’s overwhelming Normal-type attacks, and with the type consistently bolted to something that gave it a quad weakness. In Generation 1, there was one Rock type that wasn’t quad-weak to Grass, and only three that weren’t quad-weak to Water as well. In its first appearance, Rock was literally never used on its own, which to me suggests that the type wasn’t actually doing a job. It was a sorta-type, a thing to keep them overwhelming Normal types under control, I suppose.
In Gen 2, as if to fix Gen 1, they introduced Steel types, which were, uh, like, Rock types but good? They still had the Ground weakness, they shared that, but they no longer mixed catastrophically with Ground, and Steel had the kind of resistances that made it fit for ‘tough’ Pokemon. But this brought with it a new problem: Rock’s old job – a physically tough elemental type that represented being made out of something inert that wasn’t necessarily stuck to, or of, the ground – was now displaced by something that was just better.
Rock exists in an ugly space between Ground and Steel, and comes off worse for it. Steel exists to do Rock’s job, but better. Steel is one of the best types in the game and even brings with it an immunity to a whole wing of status conditions, which is exactly what you want on a tough Pokemon that’s meant to endure fights. One of these types sucks at doing its job, the other is too good at it, and any time you get one of them you’d probably be better off if it was just the other.
My druthers? Rock would be Steel: the Steel type wouldn’t exist, the Rock type would just take on the Steel associations, and if that doesn’t make sense for a Pokemon I’d just make it lose the Rock type. Graveler and Golem didn’t get anything out of being Rock types, after all. Oh, STAB on Rock Slide, yeah, woo, that means something.
Get Rid of the Fairy Type
You can admit to your mistakes, so just admit it: the Fairy type was an attempt to address the problem of having made a bunch of broken Dragon types. It has no coherent flavour, and it’s super strong in a way that flavour does not justify.
The Fairy type sucks and it’s so popular and strong it’ll never be properly addressed.
‘Oh, but if you got rid of the Fairy type, what type would you give the fairies? There’s nothing else that fits.’ Yeah, see what I mean about not having a coherent theme?
Buff the Ice Type
Ice got a ‘sort of’ buff in Scarlet-Violet, in that one of its moves got made worse. Oh, more usable, but it didn’t actually help Ice. See, Hail got replaced with a new ‘Snow’ condition, and Ice types in Snow get a defense buff, making them better defensively. This means that now you can run a single Ice type and it gets tougher as long as Snow is going, without needing to build your whole team around it the way you had to with Hail. You can include teammates of any type, since Snow doesn’t deal Hail’s chip damage, and still benefit from the defensive bonus it grants, which works like the Special Defense bonus Rock types get from Sandstorm.
Basically, you know that good weather, Sandstorm? Hail was the Bad Sandstorm, and rather than fix that, they replaced it with Snow, which works even more like Sandstorm – so now Snow is the Bad Sandstorm.
Anyway, the Ice type should resist Water (water that hits it freezing into ice, and ice floating on water, are two good themes there), and Steel should be weak to Ice. No strong reason, just, fucking hell, give Ice something to do.
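Mechanically, these are tiny edits. Here’s a toy sketch (TypeScript, entirely my own illustration, not anything from actual game data) of what the two chart changes amount to:

```typescript
// A toy type-effectiveness lookup. The multiplier is what an attacking type's
// moves deal to a defending type; anything not listed defaults to 1 (neutral).
type PokeType = "Water" | "Ice" | "Steel";

const chart = new Map<string, number>([
  ["Ice>Ice", 0.5],   // current rule: Ice resists itself
  ["Ice>Steel", 0.5], // current rule: Steel resists Ice
  ["Water>Ice", 1],   // current rule: Water hits Ice neutrally
]);

// The proposed buffs:
chart.set("Water>Ice", 0.5); // Ice now resists Water
chart.set("Ice>Steel", 2);   // Steel is now weak to Ice

function multiplier(attacker: PokeType, defender: PokeType): number {
  return chart.get(`${attacker}>${defender}`) ?? 1;
}

console.log(multiplier("Water", "Ice")); // 0.5 after the change
console.log(multiplier("Ice", "Steel")); // 2 after the change
```

Two entries in a table, and suddenly Ice has a defensive niche and a marquee target.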
Stop Making Bugs A Dumping Ground
Bug is such a weird type because it’s clearly something the designers are fond of, something they like, but it’s also a type with almost no meaningful support in the game’s entire history. The list of good, tournament-meaningful Bugs starts at Scizor, adds Volcarona, and kinda stops there, and that’s a list that’s been about that long since forever. The Bug type is used to populate the early routes of each game, invoking things like looking for cicadas as a kid.
But the result is that Bugs aren’t treated as a whole type of their own. Bug-type moves are typically weak hits, and even though U-turn is an incredibly important move, it’s never important for being Bug; it’s important because it’s a pivot – a move that lets you attack and then swap in another Pokemon afterwards. Its offensive capacity is irrelevant, and you can tell, because as good as U-turn is, it shows up on Incineroar – quite possibly the best VGC Pokemon of all time, and a Fire/Dark type that gets no STAB from it at all.
There’s no legendary Bug.
There’s never been a Box bug.
There’s exactly one Mythical Bug ever, and it was Genesect.
Fully-evolved Bugs have the lowest average stats of any type, and as a group they have the lowest Hit Points and Special Attack as well.
Bug as an offensive type is resisted by seven other types, and Bug resists three uncommon offensive types – Fighting, Ground, and Grass. It even has a weird thing where Fighting resists Bug and Bug resists Fighting, a mutual resistance that doesn’t show up between other damage types.
The Bug type is a bad type and Bug Pokemon are bad Pokemon because Bug Pokemon aren’t made to be good. The best Bug move is good because it shows up on the Best Pokemon of all time in a competitive environment.
My solution to this is not to do anything with the type per se but just, like, fix it? Stop making Bug Pokemon that are bad. Make better moves for Bug Pokemon to use. Take every fully evolved Bug Pokemon and give it better stats. By all means, keep the way that Bug Pokemon are bad at HP and Special Attack, sure – but give them something for it! This is a type with a lot of Pokemon that get to be Someone’s Special Guy, and they have made it so anyone who gets attached to Bugs early on is guaranteed to have to give up on their faves when they start playing in competitive scenes. That sucks!
There! Just some opinions about how Pokemon should be designed. These are the kinds of opinions I think are interesting to consider.
“See the choice of dreams”, and then worry about it
AI software development – the “what is this for?” part – has never had
much of a unifying vision. AI research, sure, they have a vision: they
want intelligent machines first, figure out what to do with them second.
They dream of a robot future.
Some parts of the research monomania end
up having clear software benefits. Being able to point a computer at an
image and have it get at least a rough idea of what the picture is of is
neat and that came from Machine Learning research. It didn’t come with a
single specific “what is this for?” vision except, you know, “how is our
robot going to see?”, but it made up for it by being obviously useful in
a general sort of way. It’s a capability that’s now built into pretty
much all of our devices. As a feature that’s now integrated into our
lives, it’s a microcosm of the issues we have with innovations coming
out of AI research:
- It has helped the blind and partially-sighted access places and media they could not before. A genuine technological miracle.
- It lets our photo apps automatically find all the pictures of Grandpa using facial recognition.
- It has become one of the basic building blocks of an authoritarian police state, given multinational corporations the surveillance power that previously only existed in dystopian nightmares, and extended pervasive digital surveillance into our physical lives, making all of our lives less free and less safe.
One of these benefits is not like the other.
Universal facial recognition is terrifying when it works perfectly and a
nightmare when it’s flawed. It exaggerates power imbalances and
disproportionately enables bad actors and authoritarians. It’s equal
parts pleasant domestic miracles and blighted social and political
horror. Generative AI is likely to follow the same path.
In the absence of a unifying vision, the tech industry simply does what
makes the most money for people working in the tech industry – greed
fills the void where there should have been vision. Companies such as
Amazon didn’t hesitate to sell facial recognition services to law
enforcement – until the backlash forced them to stop.[2]
This might just be capitalism, but the ‘just’ in that phrase feels quite
different when the industry in question is peddling synthetic miracles.
Greed might be inevitable. It might always seep into the cracks, break
apart the concrete foundations our ivory towers are built on. But,
having a coherent unifying vision that’s backed by clear values does a
remarkable job of holding off the decay.
Even today, the web is like a living fossil, a preserved relic from a
different era. Anybody can put up a website. Anybody can run a business
over it. I can build an app or service, send the URL to anybody I like,
and most people in the world will be able to run it without asking
anybody’s permission. There are rules you have to follow, obviously, but
those are remarkably straightforward if you aren’t actively spying on
people or messing around with their data – especially when you’re working on a comparatively small scale.
You can trace the lineage of the vision behind the web from Tim
Berners-Lee[3] in 1989, through Ted Nelson in 1974 and Douglas
Engelbart in 1968, all the way to Vannevar Bush’s article As We May
Think in the Atlantic Monthly back in 1945.[4]
All of their books,
software prototypes, theories, and ideas run along the single continuing
thread of the hypertext concept – links – as a new kind of punctuation
mark[5] that connects the information of the world together in a
coherent network. It’s a vision that encompasses concept, functionality,
interface, and values – and it persists to this day, despite decades of
greed, abuse, surveillance, and shitty ads. It’s a unifying vision of a
world that’s simultaneously technological and literate. This vision is
part of what has kept it alive. Despite the frustrations, pain, and the
flaws, working on making parts of the web is a privilege.
While AI researchers are busy trying to build their robot dream,
generative AI software has no such unifying vision. Some vendors are
bent on replacing humans at their jobs – effectively promoting their
software as “AI” illustrators, voice actors,[6] even “robot”
lawyers[7] – and then look surprised at the sheer enormity of the anger
that they get in response. Other vendors are resurrecting Microsoft’s
1990s dream of intelligent agents, assistants, or copilots who operate
in the context of the software you use – extending Clippy’s lineage into
the modern world.[8] The rest seem to have lazily reached for the first
interface metaphor they could think of – the chatbot – with no thought
or even the vaguest idea of how it should actually integrate with the
rest of the work we’re doing.
As much as it makes your average MBA salivate, “let’s replace people
with something shittier but cheaper” isn’t much of a vision for software
development and user interface design, which leaves those of us who are
genuinely curious about the applications of the technology with the
other two paths.
Those seem to be converging towards a single idea: “Human-in-the-loop”.
It’s the idea that in the interactive loop with the AI software, the
decision-making, choices, and actions are made by the human.[9] Instead
of full automation there’s feedback from the AI as the tasks progress
and the human responds to that by interacting with the various user
interface affordances provided by the system.
In other words, the human sits in front of software, uses it as they
would any other software, and then it does stuff for them as any other
software does, except in an AI way.
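Reduced to a sketch, the shape of the idea is simple. Here’s a minimal illustration (my own, in TypeScript, where `suggest` and `askHuman` are hypothetical stand-ins for an AI backend and a UI affordance), showing the one property that matters: the AI only ever proposes, and nothing happens until the human decides:

```typescript
// A minimal human-in-the-loop sketch. The AI drafts; the human accepts,
// edits, or rejects. No result is committed without the human's decision.
type Decision = { action: "accept" | "edit" | "reject"; revision?: string };

async function humanInTheLoop(
  task: string,
  suggest: (task: string, feedback: string[]) => Promise<string>, // hypothetical AI call
  askHuman: (proposal: string) => Promise<Decision>,              // hypothetical UI prompt
): Promise<string | null> {
  const feedback: string[] = [];
  for (let round = 0; round < 5; round++) {
    const proposal = await suggest(task, feedback); // the AI drafts something
    const decision = await askHuman(proposal);      // the human reviews it
    if (decision.action === "accept") return proposal;
    if (decision.action === "edit") return decision.revision ?? proposal;
    feedback.push(`Rejected draft ${round + 1}: ${proposal}`); // steer the next attempt
  }
  return null; // nothing was ever approved, so nothing is done automatically
}
```

Every `askHuman` step is also where the biases discussed below – automation, anchoring, anthropomorphism – get their chance to do damage.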
The grand unifying vision of AI-assisted software is that you should
use it to make software.
That’s an idea that’s only remarkable because of how many AI enthusiasts
think they can do away with the people part of getting things done.
Acquiesce, or mitigating the inevitable
In an earlier chapter I wrote about the failure of an AI model designed
to predict the onset of sepsis, how external reviewers discovered the
flaws, which then led to the vendor updating and improving it.
At one hospital, UC Health in Colorado, they found that the system still
wasn’t that useful: “the ratio of false alarms to true positives was
about 30 to 1”.[10]
To salvage their investment UC Health changed their approach with the
system. Instead of using the AI as an autonomous prediction system that
sent out alerts to overworked doctors and nurses, they put together a
special monitoring team of clinicians that used live video feeds to
help filter out the false alarms. That team built relationships with
bedside nurses throughout the hospital. Where the AI system alone was
utterly useless, the human team, assisted by their relationships with
the nurses and the AI system, was estimated to save about 211 lives
annually.
AI on its own was worse than nothing while AI as an assistant saved
lives, a clear demonstration of the value of human-in-the-loop. It’s a
heart-warming parable that lends credence to the ambitions of those who
are trying to make “AI assistants” happen.
This story, and others like it, are going to be the foundation myths of
a thousand new AI services – the seeds of a new computing revolution.
Or, more specifically, it’s a certain kind of fertiliser. The kind that
smells.
Case studies are amazing tools. You can pick one instance where everything worked out – one absent of disaster, with a nice “all is lost” moment that gets turned around – and throw together a just-so story that proves exactly the point you want. There isn’t anything
anybody can do to disprove it without particle-colliding themselves into
an alternate reality. There is no way to ‘science’ a case study unless
you have access to a parallel universe as a control. They’re all just
stories that short-circuit our thinking.
For every UC Health sepsis story there are a hundred systems that didn’t
work. Even the UC Health story itself is dubious once you dig into it.
Was it the AI that saved 211 lives? Or was it having a specialised team
of clinicians watching all the at-risk patients around the clock, using
live feeds? Or, was it the relationships the clinicians developed with
the bedside nurses? If you’d put together that same team with the same
infrastructure, but using a simpler, cheaper algorithm based on vital
sign monitors, would that have done the same job? Why didn’t UC Health
try that first – simpler, cheaper, faster to set up – if what they
wanted was to save lives?
The answer is simple: they’d already bought a broken AI system. They
did what they had to in terms of making sure their investment did
eventually save lives, but it leaves us with this unanswered question:
if they had spent that same amount of money on building teams and a
system for detecting sepsis, but without AI, would it have worked better
or worse?
We can’t know, and that’s why case studies are a favoured tool by MBAs,
startups, and consultants all over the world. You can just pick a story
that proves what you want and ignore the other hundred that don’t.
Once Generative AI becomes a broad movement in software, facts and
science won’t matter, and the stories will take over. Ted Nelson was
writing about computers and programming in a more general way and in a
different era, but he’s right here too: the stories about AI software
aren’t scientific and trying to make them look scientific is an outrage.
That’s what the AI software vendors are doing with their marketing
performances that look like scientific papers, the ‘studies’ that are
little more than sales exercises, and the entirety of their rhetoric
about being on the verge of AGI – how we need to make sure those future
robot gods are our slaves and not overlords.
It’s storytelling.
Fighting that with another ‘science’ performance is futile. In a war of
theatrics, the act with the biggest budget wins the crowd. We can chip
away at the foundations with peer-reviewed papers and research that show
flaws and failures, but ultimately what will decide this in the decades
to come is the software – how well it’s designed, how effective, how
productive, and the long-term failures and successes in real workplaces.
That’s where the three core flaws of the assistant model are going to be
a problem.
I mentioned two of them earlier, automation and anchoring biases.
We, as human beings, have a strong tendency to trust machines over our
own judgement.[11] This kills people, as it’s been a major problem in
aviation.[12] Anchoring bias comes from our tendency to let the initial
perceptions, thoughts, and ideas set the context for everything that
follows. AI adds a third issue: anthropomorphism.[13] Even the
smartest people you know will fall for this effect as large language
models are incredibly convincing.[14] These biases combined lead
people to feel even more confident in the AI’s work and believe that
it’s done a better job than it has.[15]
We’re using the AI tools for cognitive assistance. This means that we
are specifically using them to think less.[16] In every other
industry this dynamic inevitably triggers our automation bias and
compromises our judgement of the work done by the tools.[17] We use the
assistant to think less, so we do.
These models are incredibly fluent and – as we saw at the start of
this book – are consistently presented by their vendors as near-AGI.
This triggers our instinct towards anthropomorphism,[18] making us
feel like we have a fully human-level intelligence assisting us,
creating an intelligence illusion that again hinders our ability to
properly assess the work it’s doing for us.[19]
Anthropomorphism, when applied to AI chatbots, has been called the
“Eliza effect”. It was
first observed by Joseph Weizenbaum when he saw how people responded to
and interacted with the comparatively primitive ‘AI’ chatbot, Eliza,
that he created back in 1966.
Fluent AI models create an anthropomorphism effect that sways even those
who know the AI is nothing more than a simplistic program – a program
simplistic even by 1966 standards, in Eliza’s case.[21]
The intelligence illusion, the conviction that these are artificial
minds capable of powerful reasoning, when combined with anthropomorphism
supercharges our automation bias. Our first response to even the most
inane pablum from a language model chatbot is awe and wonder. It sounds
like a real person at your beck and call! The drive to treat it as, not
just a person, but an expert is irresistible. For most people, the
incoherence, mediocrity, hallucinations, plagiarism, and biases won’t
register over their sense of wonder.
This anthropomorphism-induced delusion is the fatal flaw of all AI
assistant and copilot systems. It all but guarantees that – even though
the outcome you get from using them is likely to be worse than if you’d
done it yourself, because of the flaws inherent in these models – you
will feel more confident in it, not less.
Every human-in-the-loop and assistant-style AI system I’ve seen has
these defects. Some of them even do their best to exacerbate them by
making the assistant adopt a confident tone or an affable demeanour.
Those who use these AI systems are likely to get worse results and still
be more confident in the resulting output than they would have in their
own. Their work will suffer, but they will feel like it has improved.
This is a recipe for fanatical evangelism and incredible revenue growth.
It all but guarantees that we’ll see a financial bubble of some kind
around AI. The only question is its size and duration. The more
effective the Generative AI systems are, the bigger the bubble. The less
effective they are, the faster it’ll pop.
We’ll probably get some good software out of it – especially when it
comes to converting or modifying text and media – but it’s the nature of
bubbles to create crap. A software bubble is the flowering of a thousand
first-movers – countless startups and tech companies, most of them
utterly clueless about what they’re working with, building the first bad
iterations of what they hope is a good idea. We don’t know yet what the
ideal AI-assisted productivity software will look like, but
we do know that we’re unlikely to see many examples of it in the first
generation.
Meanwhile, the tech industry will dream of exponential growth.
The roads home
I have lived abroad for most of the past twenty years. The web let me
work wherever I wanted without losing touch with my friends and family.
The tools the web offers gave me freedom that I couldn’t have imagined
when I was a child.
This worked well for a while. I’ve had the joy of living in a number of
wonderful cities and amazing neighbourhoods and communities.
I grew older and, with age, so did those around me. Some of them got
sicker. A video chat doesn’t fill the void you feel when
somebody you care about is lying in a hospital bed. But the freedom the
web provides works in the other direction as well. I could live near
those I care about and the web meant I could keep doing my job no matter
where I was.
I decided to move back home to Iceland. As I was preparing my move, the
COVID-19 pandemic struck, and the rest of the world discovered what I
had known for decades: the web abstracts distance. You can work where
you want. I made my way home, despite the collapse of international
airline travel. From Montréal to Toronto. Toronto to Amsterdam. Finally,
I flew from Amsterdam to Iceland.
Back in Iceland, I settled in Hveragerði, a small town of about 2700
people in the south of Iceland. In keeping with the theme of my realisation
that the web and related technologies meant that location mattered less,
I could pick a place that suited my personal needs. It’s a nice town.
The weather here can be interesting – this is Iceland after all –
which often leads to road closures in the winter. But there are three
separate roads that connect this region with the capital, so even though
a couple of the roads get closed due to snow or ice there’s always the
third. Because we know what to expect from the weather, most regions in
Iceland invest in their infrastructure. We try to make sure we can keep
everything going even when a bad storm hits us. There are redundancies
and, for the most part, they work.
We can’t say the same about the software that we have today – that we
use for our work. Even though many organisations have returned to the
office, partially or fully, we are still using the same software that
companies adopted for remote work. We use Google Docs, Zoom, Dropbox, or
an equivalent competitor. Our files, documents, and processes are now
tied to whatever app we’ve adopted.
If Google Docs goes down or has sporadic outages, then our work
disappears with it. If our internet goes, the software blinks out. When
the biggest data centre belonging to Amazon Web Services goes down, that
breaks most of their services across all of their data centres, because
they’re all interconnected, and almost all of our software breaks with
it.[22] Given our increasing reliance on centrally hosted software
services, the impact of temporarily losing a data centre is severe,
getting worse, and is now even happening because of the weather, caused
by an increase in frequency and magnitude of heatwaves globally.[23]
It doesn’t matter whether we ourselves are working remotely or
in-office, all our software today is remote, and the connection to it
can break in a thousand different ways.
There is only one road into town.
A global network means our software shouldn’t have to be centrally
located in only a few specific buildings across the world. It should be
spread throughout the network, on every device that’s connected to it.
Our hardware devices shouldn’t have to be so reliant on the internet
that core features cease to function just because they can’t phone home
for a short while. Our information – public, private, professional –
shouldn’t have to be controlled, collected, and stored by only a handful
of corporations.
The software we have today is undermining the strongest advantage given
to us by the internet: robust and distributed reliability. Our work
depends on increasingly unreliable software whose need to be always
online means you feel every hiccup in your connection. Centralisation
means that when something does go wrong, it’s potentially catastrophic
as it affects everybody, everywhere, who is using that centralised app.
This matters because things are going wrong, fast. There’s political
unrest. Social instability. Cold wars. Hot wars. Trade wars. A climate
crisis. Data that was just normal personal data one moment, becomes
incriminating evidence a moment later when people’s rights are stripped
away.
The software we have isn’t the software we need.
The opposite of good software
Modern software is remarkably fragile. We’ve gone from a software
ecosystem that, a few years ago, was almost completely local, to one
where everything is just cached – temporarily stored – at best. A decade
ago what you worked with was on the computer itself. Your data was your
own, and relatively safe if you took decent care of your backup drive.
The apps were yours, usually bought and paid for once – no subscription.
Collaboration was always a bit tricky if you weren’t a software
developer – we’ve always had somewhat decent collaborative tools in
version control systems – but other people made do by using shared local
servers or simply sending files over email.
This was a remarkably robust software ecosystem that tolerated all sorts
of disasters, disconnections, and changes. We’ve dismantled it in less
than a decade. Most of the apps we use for our work require an internet
connection. Almost all of them are entirely cloud-based, with
significant parts of the software running on a server somewhere. Little of
our work data is stored locally any more.
Generative AI serves to accelerate that trend.[24] You needed 800 GB
just to store GPT-3, without even running it.[25] Later versions and
ChatGPT are even bigger, running in parallel on multiple servers. The
technology can be made to work locally, but that’s not where the hype
is. The hype is for the already countless “AI for X” services, which are
all in the cloud and all using services from OpenAI.[26] Unless one
of the big tech companies breaks ranks and builds into their operating
system a solid and tested large language model that has been ethically
trained on documented data sets, what we’re going to get are agents and
chatbots everywhere, each living in the cloud, fine-tuned for a task,
and hooked up to whatever APIs the startups think are nifty. Why solve
the hard problem of making a language model safer, better integrated
into your software, and more sustainably developed, when you can hook up
a finicky but flashy website with OpenAI and call it a startup?
The dream the tech industry chose is not science or progress. The dream
they chose is that of easy money, because that’s the only dream the tech
industry today is capable of seeing. Their vision is a mirage of
craving.
Their want can only be met with another financial bubble, one that has
to be more grand and world-changing than any other that preceded it.
They crave the exponential to fulfil their dreams, but the only true
exponential today’s twenty-something startup founder will experience is
that of the escalating Climate Crisis. That won’t stop them from trying.
Their hunger is likely to push them to ignore the social unrest and
power shifts that AI systems cause.
The tech industry doesn’t just behave with your normal corporate greed.
They want financial bubbles. They had a taste of the euphoria with the
dot-com bubble and the hunger for it never went away.
The tech industry is also, as I argued at the start of this book, full
of true believers in AI. Somebody who truly believes – sincerely
believes that this will all be for the best – will push past the mass
unemployment, organised disinformation, and wholesale deception. They
will think that it will all be worth it. Once we get through the initial
“disruption”, things will be better for everybody.
None of this is conducive to software design and development. It isn’t a
mindset that leads you to do user research, observational studies, or
usability experiments. It’s a drive that’s taking them away from what
most people and their communities need. Where we need robust technology,
they are giving us finicky AIs that misbehave at a badly worded
sentence. Where we need privacy from both corporations and potentially
hostile authorities, they push further and further into recording our
lives. When we need software that works on the devices we have, for as
long as they last, they give us software that only works on the latest
and greatest. Sometimes, as with GPT-4, the software they make even
requires systems so powerful that they only exist in a couple of
locations on the planet.
But don’t worry, they’ll sell us access – timeshare, really, but let’s
call it “the cloud”. It only breaks some of the time.
Nothing they do is for us, even though it’s our money, our data, and our
art, writing, and music they’re demanding. We aren’t customers to them –
we’re just the people that pay. To tech companies, we are nothing more
than a resource to be tapped. A number to be boosted to pump investor
interest. They are not doing us any favours. What they want from us is
simple: everything. All culture on their servers, made by their AI.
All our work happening through them, assisted by their AI. The totality
of our information, mediated by their AI. A vig collected on all
existence.
One of the papers I’ve referred to a few times in this book is On the
Dangers of Stochastic Parrots: Can Language Models Be Too Big?[27] It
was the first paper to provide a cohesive and detailed overview of how
large language models work, how they affect people, and the risks that
they pose. The paper ultimately led to Google firing one of its key
authors, Timnit Gebru, and forcing other co-authors employed
by Google to take their names off it.[28] This continues to this day,
where Google employees seem to be routinely discouraged from working on
AI fairness or ethics.[29]
When Microsoft launched Bing Chat, the first mainstream attempt to use a
large language model as a front end for search – something that another
co-author of On Stochastic Parrots, Emily M. Bender, had warned
against in a separate paper titled Situating Search[30] – this led
to the exact outcomes they had predicted. Strange behaviour[31],
threatening language[32], falsehoods[33] and lies[34] ensued. Bing
Chat played out exactly the way they expected.
Of course, Microsoft did the only rational thing it could when the risks
of its products were revealed: it disbanded its AI ethics and safety
team[35] and rolled Bing Chat out to even more people.[36] It now
plans to push towards adding AI chatbots to everything, everywhere, no
matter the cost.[37]
Most of the tech organisations that had responsible AI or AI safety
teams are disbanding them.[38]
They seem to think it would be a mistake to worry about risks and
problems – why worry about something you can probably fix?[39] Who
cares about the harm it does in the meantime?
Safe, for the tech industry, is too slow when you hunger for a bubble
and want to ship more software, to more people, as fast as you can.
Designers of software user interfaces often imagine deliberately bad
designs as an exercise – a way of demonstrating the principles of their
craft by exploring their opposites. It’s a good way of demonstrating
why a design principle matters, and it can provide tactile examples of
who benefits from it and how.
If you asked me to imagine the software that would be the opposite of
what we need as a society…
Stacey Mason and Mark Bernstein, “On Links: Exercises in Style,”
in Proceedings of the 30th ACM Conference on Hypertext and Social
Media, HT ’19 (New York, NY, USA: Association for Computing
Machinery, 2019), 103–10, https://doi.org/10.1145/3342220.3343665. ↩︎
Raja Parasuraman and Victor Riley, “Humans and Automation: Use,
Misuse, Disuse, Abuse,” Human Factors: The Journal of the Human
Factors and Ergonomics Society 39, no. 2 (June 1997): 230–53,
https://doi.org/10.1518/001872097778543886. ↩︎
Kathleen L. Mosier et al., “Automation Bias: Decision Making and
Performance in High-Tech Cockpits,” The International Journal of
Aviation Psychology 8, no. 1 (January 1998): 47–63,
https://doi.org/10.1207/s15327108ijap0801_3. ↩︎
Kathleen L. Mosier et al., “Automation Bias, Accountability, and
Verification Behaviors,” Proceedings of the Human Factors and
Ergonomics Society Annual Meeting 40, no. 4 (October 1996): 204–8,
https://doi.org/10.1177/154193129604000413. ↩︎
See Nicholas Epley, Adam Waytz, and John T. Cacioppo, “On Seeing
Human: A Three-Factor Theory of Anthropomorphism,” Psychological
Review 114, no. 4 (October 2007): 864–86,
https://doi.org/10.1037/0033-295X.114.4.864, which outlines three
psychological triggers for anthropomorphism: 1. If you don’t know
how a non-human agent works, we default to thinking it works like us
because that’s what we have the most familiarity with. 2. “The
motivation to interact effectively with nonhuman agents” causes us
to attribute human characteristics and motivation. 3. Seeing agents
as human-like enables “a perceived humanlike connection with
nonhuman agents.” ↩︎
Arleen Salles, Kathinka Evers, and Michele Farisco,
“Anthropomorphism in AI,” AJOB Neuroscience 11, no. 2 (April
2020): 88–95, https://doi.org/10.1080/21507740.2020.1740350,
esp. “In the general public it inadvertently promotes misleading
interpretations of and beliefs about what AI is and what its
capacities are.” Anthropomorphism also limits the researchers, which
is important to note in light of the common belief in the field that
the spark of AGI has been struck: “Furthermore, anthropomorphic
(implicit or explicit) interpretations of AI might also have
epistemological impact on the AI research community itself, insofar
as the search for biological and psychological realism (i.e.,
similarity with biological intelligence) might lead to
underestimating the possibility of new theoretical and operational
paradigms and frameworks thus ultimately limiting the development of
AI.” ↩︎
Joseph Weizenbaum, Computer Power and Human Reason: From Judgment to
Calculation (San Francisco: Freeman, 1976). ↩︎
Sarah Myers West, “Competition Authorities Need to Move Fast and
Break up AI,” Financial Times, April 2023. “Without the robust
enforcement of competition laws, generative AI could irreversibly
cement Big Tech’s advantage, giving a handful of companies power
over technology that mediates much of our lives.” ↩︎
Emily M. Bender et al., “On the Dangers of Stochastic Parrots:
Can Language Models Be Too Big?” in Proceedings of the 2021 ACM
Conference on Fairness, Accountability, and Transparency, FAccT ’21
(New York, NY, USA: Association for Computing Machinery, 2021),
610–23, https://doi.org/10.1145/3442188.3445922. ↩︎
Karen Hao, “We Read the Paper That Forced Timnit Gebru Out of
Google. Here’s What It Says.” MIT Technology Review, 2020. ↩︎
Davey Alba and Julia Love, “Google’s Rush to Win in AI Led to
Ethical Lapses, Employees Say,” bloomberg.com, April 2023,
https://www.bloomberg.com/news/features/2023-04-19/google-bard-ai-chatbot-raises-ethical-concerns-from-employees,
“Even after the public pronouncements, some found it difficult to
work on ethical AI at Google. One former employee said they asked to
work on fairness in machine learning and they were routinely
discouraged — to the point that it affected their performance
review. Managers protested that it was getting in the way of their
‘real work,’ the person said.” ↩︎
Chirag Shah and Emily M. Bender, “Situating Search,” in ACM
SIGIR Conference on Human Information Interaction and Retrieval,
CHIIR ’22 (New York, NY, USA: Association for Computing Machinery,
2022), 221–32, https://doi.org/10.1145/3498366.3505816. ↩︎
It is incredibly frustrating that the only thing more stupid than Patreon is all the alleged Patreon substitutes that clearly don't even understand what Patreon does.
Pro tip: Patreon has no meaningful competitors, and also it sucks, so there's a huge opportunity for somebody to kick sand in its face and take its lunch money. But to do that you would have to understand what Patreon actually does that makes it worth it to creators to let Patreon take 5% of their proceeds (and then pass on to them another 5% in payment processing fees).
Because I want there to be Patreon competitors, I will explain what Patreon actually does, so if somebody would like to actually compete with Patreon they will know what they have to actually accomplish.
Generative AI technologies powered by Large Language Models (LLMs), such as OpenAI’s ChatGPT, have shown themselves to be both a big boon to productivity and a concerningly confident purveyor of incorrect information. At Mozilla, we’re excited about the potential role generative AI can play in creating new value for people while demonstrating leadership in responsible and ethical implementation approaches.
One domain where we see high value is in training LLMs on reference documentation, enabling developers to more quickly find solutions or get answers about the purpose or behavior of code snippets. MDN’s mission is to provide a blueprint for a better internet and empower a new generation of developers and content creators to build it. We see a path forward to “overlay” useful AI-driven helpers on top of MDN’s canonical web dev documentation to aid developers in new ways. Of course, human-authored canonical documentation will always be available and clearly called out as such. To this end, last week, MDN launched AI integrations with its body of web developer reference documentation, manifesting in two features that have been in development over the last few months: AI Help and AI Explain, both powered by GPT-3.5.
AI Help enables MDN readers to ask questions with a conversational interface, and the tool offers concise answers with MDN articles related to their questions for contextual help. AI Help is limited to offering information only based on MDN content, and is now in beta and available to logged-in MDN readers.
AI Explain enables readers to explore and understand code blocks and parts of code examples embedded in MDN documentation pages, describing the underlying purpose and behavior of the code. AI Explain is still an experimental tool.
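Mozilla hasn’t published AI Help’s implementation details in this post, but a feature that is “limited to offering information only based on MDN content” is typically built as retrieval-augmented generation: retrieve the most relevant MDN articles first, then have the model answer only from those. A hand-wavy sketch of that pattern (TypeScript; `searchMdn` and `callModel` are hypothetical stand-ins, not MDN’s actual code):

```typescript
// Hypothetical retrieval-augmented answering, sketched from the feature's
// public description rather than MDN's real implementation.
interface MdnDoc { title: string; url: string; excerpt: string }

async function aiHelp(
  question: string,
  searchMdn: (q: string) => Promise<MdnDoc[]>,    // stand-in: MDN document search
  callModel: (prompt: string) => Promise<string>, // stand-in: GPT-3.5-style completion
): Promise<{ answer: string; sources: MdnDoc[] }> {
  // 1. Retrieve the MDN articles most relevant to the question.
  const sources = (await searchMdn(question)).slice(0, 3);

  // 2. Constrain the model to the retrieved content.
  const context = sources.map((d) => `# ${d.title}\n${d.excerpt}`).join("\n\n");
  const prompt =
    "Answer using ONLY the MDN documentation below. " +
    `If the answer is not there, say so.\n\n${context}\n\nQuestion: ${question}`;

  // 3. Return the answer alongside the source articles so readers can verify it.
  return { answer: await callModel(prompt), sources };
}
```

Returning the source articles alongside the answer is what makes the “related MDN articles for contextual help” part of AI Help possible, and it gives readers a way to check the model’s claims against the canonical documentation.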
Our internal work on these tools has us hugely optimistic about their potential to save developers tons of time as they seek answers and learning resources. Early-stage developers appear to benefit most, as they are least likely to know exactly where to go or which keywords to search for to find the answers they seek. As we’re still in the early stages, we’re also seeing instances where these AI tools provide incorrect information in response to a query. The MDN team is working to identify these cases and develop fixes so that we can continuously raise the quality and utility of the answers provided by AI Help and AI Explain. We are also planning to make it easier for people to flag bad answers, creating issues sent to the team to investigate. In the end, our goal is to make MDN more accessible and useful to a wider variety of developers, without diminishing MDN’s role as the canonical reference source for high-quality information about developing for the web.
Feedback Matters
With the launches of AI Help and AI Explain last week, we received a wide range of feedback from readers, from delight to constructive criticisms to concerns about the technical accuracy of the responses. We’re only a handful of days into the journey, but the data so far seems to indicate a sense of skepticism towards AI and LLMs in general, while those who have tried the features to find answers tend to be happy with the results.
For AI Help in particular, the feedback indicates that the majority of people who used this feature and voted consider the answers to be helpful. Please do try AI Help and give us your feedback so that we can enhance this service based on how it works for you!
In the case of AI Explain, the pattern of feedback we received was similar, but readers also pointed out a handful of concrete cases where an incorrect answer was rendered. This feedback is enormously helpful, and the MDN team is now investigating these bug reports. We’ve elected to be cautious in our approach and have temporarily removed the AI Explain tool from MDN until we’ve completed our investigation and have high-quality remediations in place for the issues that have been observed.
Here is an example of the kind of incorrect answer that was reported: the AI indicated that a code sample defines a grid with two rows and two columns, when the code actually only defines two rows.
AI Assistants: A Helpful Complement
One of the user experience challenges LLMs present is how to set customer expectations. As useful and efficient as it is to use LLMs to interact with reference documentation, even extraordinarily well-trained LLMs — like humans — will sometimes be wrong. We see this as an emerging challenge in the field of human-computer interaction that goes well beyond MDN’s limited use case: What should people do when chat-based systems render answers in good faith that are merely likely to be correct? One approach to responsible systems design could be to provide people with better ways to check answers and build confidence in their veracity. This is a field we’re excited to participate in.
Speaking of the human element, LLMs also have the useful attributes of having zero ego and infinite patience. LLMs don’t mind answering the same question over and over again, and they feel no compulsion to gate-keep for online communities. It’s sadly not uncommon today for learners to face the discouraging experience of asking technical questions in online forums only to be dismissed or shamed for their “dumb question.” When done well, there is a clear opportunity to employ AI-based tools to improve the pace of learning as well as inclusivity for learners.
The important question, and the core of this venture, is whether these AI assistants can improve our work and user experience. Do they help our users find information faster? Do they simplify complex concepts for them? Do they support Mozilla’s values and mission? If the answer is ‘yes’ to these questions, then they are fulfilling their purpose. We will work to convey this perspective clearly to everyone who interacts with our AI features.
The MDN Community
MDN has a long history as a human-authored and curated source of knowledge, so we know that AI integration in MDN will be a sensitive topic for some. While many are excited about the possibilities of generative AI, others might prefer that MDN stay how it is. That’s fine. We are a community, and differences of opinion are normal and healthy. MDN’s strength lies in our community. We do see a path forward that preserves the human-authored goodness while also providing tools that offer additional value over that amazing body of content. We need your input, criticism, kudos, and experience to ensure we’re employing AI in the most useful and responsible ways. Your feedback is critical to this process, and we will continue to take the feedback and adjust our plans in response to it. LLM technology remains relatively immature, so there will certainly be speed bumps along the way. This journey has just begun, and together, we’ll shape MDN’s future.
In the continued spirit of community and transparency, the MDN team will publish a postmortem blog in the coming days, which will include a breakdown of the feedback we’ve received and dive into some of the details of developing, launching, and pausing AI Explain.
I'm not sure I agree with all of this, but this statement certainly rings true, with a whole variety of implications: "The essential truth of every social network is that the product is content moderation, and everyone hates the people who decide how content moderation works."