
Does AI benefit the world?

Reading Time: 20 minutes
Traffic on the Kennedy Expressway on a rainy day in Chicago
Photo Credit: Tyler Pasciak LaRiviere for the Chicago Sun-Times

“Let’s start with cars.”

My students break out into groups of four, each quartet gathered around a collection of pink and green post-it notes.

For the first few minutes, the room remains quiet while each student jots down a personal story about one way that cars have impacted their lives. As the students finish and begin to read their stories to each other, the air fills with the gentle hum of conversation.

The first quartet to finish their discussion approaches the whiteboard. Two of their post-its share a theme: stories about a car getting a family member to the hospital quickly. Those post-its get placed on the whiteboard together. The other two stories—one about working at a car wash as a teenager, the other about getting hit by a car turning right on red while cycling through an intersection—go to different respective whiteboard locations.

More groups approach the board and attach their post-its. Stories under the same theme go together, creating several little collections. Each collection tends to have one color of post-it (green for perceived positive impact, pink for perceived negative impact). A few cases have mixed colors: for example, a few additional post-its join the one about working at a car wash. In the next step, when I ask students to label each collection, this grouping will receive the label “car-related jobs.” Students either loved or hated the jobs.

For homework, I ask the students to do two things. First, they add digital versions of their post-its to an enormous Google slide, grouped together and labeled just like our whiteboard. Second, they collect more car-related stories from three sources:

  • Their own experiences: 2 more post-its
  • Two additional friends or family members: 2 more post-its
  • A Twitter hashtag—a different one selected by each student from a list that I predicted would broaden the range of perspectives beyond those available inside the classroom. The hashtags bring in the anecdotes of mechanics, cyclists, pedestrians, traffic engineers, pulmonologists, and a variety of other groups. Again, 2 more post-its.

By the next class period, the Google slide’s septupled post-it population includes several more groups, which we finish labeling together. Our original list includes car-related jobs, emergency medical transport, cycling accidents, parking costs, noise. Now we add several more: pollution, storefront visibility, status symbols, child safety, travel time, carbon footprint, and others.

Next, I offer each student an unlimited number of three types of emojis:

  • A small fire for “consequences”
  • A human silhouette for “prevalence”
  • A snake for “sneakiness”

I ask each student to place the fire emoji next to each label for which the outcomes sound dire to them: most students put these, for example, on the labels with obvious and immediate life-or-death outcomes. Similarly, they apply the silhouette emoji next to labels that they think affect a lot of people, and the snake next to labels that they think people might not notice or think about relative to the size of the outcome.
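
For anyone who wants to adapt the exercise outside a classroom, here is a minimal sketch of how the emojified board might be represented in code. The labels, counts, and structure are illustrative assumptions of mine, not the actual class data.

```python
# Illustrative sketch of the emojified impact board (not real class data).
from dataclasses import dataclass


@dataclass
class ImpactLabel:
    name: str
    positive_stories: int   # green post-its
    negative_stories: int   # pink post-its
    consequences: int = 0   # fire emojis: how dire the outcomes sound
    prevalence: int = 0     # silhouette emojis: how many people seem affected
    sneakiness: int = 0     # snake emojis: how easily the impact goes unnoticed


board = [
    ImpactLabel("emergency medical transport", positive_stories=5, negative_stories=0,
                consequences=12, prevalence=4, sneakiness=0),
    ImpactLabel("pollution", positive_stories=0, negative_stories=3,
                consequences=9, prevalence=11, sneakiness=10),
    ImpactLabel("car-related jobs", positive_stories=2, negative_stories=2,
                consequences=1, prevalence=6, sneakiness=3),
]

# The research homework then compares these gut-feel tallies against published data.
for label in sorted(board, key=lambda l: l.sneakiness, reverse=True):
    print(f"{label.name}: fire={label.consequences}, "
          f"people={label.prevalence}, snake={label.sneakiness}")
```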

We’re doing an impact assessment not unlike the risk assessments I ask of students in software engineering classes. This class, though, focuses not on code, but on what code does to Us—and on how we, its authors, can evaluate and influence that.

The next homework assignment lets each student select a different category label from the slide. They’ll each do some research and come back with a short report comparing the class’s emojified impact assessments to actual literature and findings. Where are we accurate? Where do our estimates not match the information students find?

Over the course of the nine week quarter, we do this three-session activity three times.

Each time we start from our own anecdotes, then broaden our search to additional perspectives, and finally compare our own assessments to population data and available studies. First, we discuss the impact of cars. Then, we discuss the impact of the consumer internet. Finally, we do generative text and image models—in other words, the thing colloquially termed “AI.”

Three times sounds like overkill, but I have found that the prior practice on older, less hotly contested developments like cars and the internet helps defuse conflict on the third topic. After we do the internet, I ask students to reflect on the patterns of impact they’ve seen across the car and internet discussions. Their comparisons draw out some common themes:

  • Both cars and the internet boast some excellent anecdotal outcomes in isolated cases.
  • Both cars and the internet boast some positive outcomes for millions.
  • Both cars and the internet have produced negative outcomes that affect a lot of the same people who experience the positive outcomes, plus some collection more.
  • Many of the negative outcomes emerge from failure to predict and implement against emergent properties of the technologies becoming popular and widely accessible.
  • The direst consequences often affect different people than the most glittery benefits, and those divisions tend to break along class lines.

I also ask students to reflect on the patterns they witness in the difference between their impact assessments before and after their research projects. Again, common themes arise:

  • Students tend to overestimate the proportion of the population that experiences the positive outcomes relative to the negative outcomes (worth noting here that the opportunity to participate in a graduate program tends to select for class).
  • Students tend to drastically overindex on their own experience when estimating the impact (positive or negative) of a technology on any given life globally.
  • When faced with an explicit menu of positive and negative impacts, students tend to rapidly identify new-to-them opportunities to retain positive outcomes while reducing or eliminating negative outcomes.

When I then tell students we’re about to do this exercise a third time for “AI,” they approach with curiosity. They wonder whether the patterns they’ve observed will appear again. They hope to beat their prior collective accuracy at assessing impact in each category label.

A student displayed this map—found in this article from this data source—while comparing their findings to class estimates about the internet’s impact on job networking opportunities. They concluded that the pattern of WHO could access these networking opportunities might look less evenly distributed than they initially assumed, and they also noted the correlation between darkness of hue on this map and which countries our own classroom’s population comes from.

A few weeks ago, my day job hosted a training about ChatGPT.

Members of the company expressed both wanton excitement and deep concern about the normalization of its use in the workplace.

Afterwards, a colleague of mine asked me about the discussion. He’d been using the products since GPT-2 in 2019, and he really enjoys them. He wanted to know: “outside of speculating net benefit, why so many concerns with Gen AI? And what are they?”

Then he shared an anecdote: the chatbot had helped his wife uncover the root of some health issues she was having that were otherwise terminal (sic). He also described really enjoying using ChatGPT, personally. From his perspective, sure, there are environmental concerns and social problems with oppression, but these don’t overwhelm the positives.

We talked about the topic for a while.

Full disclosure: I don’t think “net impact” makes sense to estimate at scale.

I understand why businesses like to calculate this. They want to compare cost to revenue to figure out whether they’ll have more money or less money as a result of building their product. But besides a particular, well-circumscribed set of individuals deciding whether a project lives or dies, I don’t have much use for the calculation beyond philosophical navel-gazing.

How could we possibly approach an accurate estimate? We’re not just talking about money here: we’re talking about general impact. That amount doesn’t come in a convenient, single numerical unit. It’s instead delivered in a million different currencies, from health to ecstasy to money to misery. You can’t convert these into like metrics.

And we don’t get to scope our analysis to a relatively homogeneous customer base like companies get to do. Global impact, by definition, includes everyone, and it taunts us with the question “impact to whom?” It ain’t as small a world as the specious aphorism claims. Most of us live within ten miles of neighborhoods in which no one thinks like us, and we’ve never had a single person from there over for board game night, and we never will. By traveling twelve thousand miles east or west, most of us can find ourselves in a land whose language we can’t even begin to comprehend, whose cultural customs would tie us in knots, and whose celebrities we’ve never heard of. The mini-worlds into which we isolate by building networks of people like ourselves—those worlds are small. THE world? Enormous, and capable of absorbing a kaleidoscope of impact patterns whose news will never, ever reach us.

I don’t ask my students to decide whether the arrival of cars, or the internet, or generative models were good or bad for society. In fact, I want them to reach the conclusion that they’re wholly unqualified to do so. I watch them discover the patterns of inaccuracy in their assumptions about impact. Then I watch them look at their post-it maps and brainstorm ways to improve the net impact regardless of where the navel-gazing lands on its current value. For me, this is enough.

But for folks who have spent years in industry, hounded by executives to consider the bottom line first and foremost, I understand the motivation to peg a theoretical impact number relative to zero. My conversation with my colleague starts there.

We begin, like the students, 135 years before ChatGPT: with cars.

A car permits someone, my colleague notes, to be driven to the hospital when found in critical condition and unreachable by an ambulance. It’s true that cars allow individuals to drive other individuals to hospitals. I’d place this among the “excellent anecdotal outcomes” students notice in their assessments. Along with that, the annual death toll of car crashes in the US alone—that “same people who experience the positive outcomes, plus some collection more” group that my students identify—is in the tens of thousands. This does not count pedestrian deaths or cyclist deaths: just people in cars. Then there are deaths caused by comorbidities with asthma, whose rates have risen due to pollution from vehicle exhaust. A dire outcome, and one that affects many who live in polluted areas and without the financial resources to access a car themselves. “Okay,” my colleague concedes, “but what about the distribution of food and access to information?” A fair point. As it happens, most commercial freight travels by train or ship or semi-truck, not cars. But then, of course, what about the drive to the grocery store? That last leg requires a car, doesn’t it? For some, yes. For others, no. For still others living in food deserts, the answer might be yes if they had the resources to access a car, which again, many don’t.

Reluctant to get too deep in the weeds and lose the original thread, we move on to discussing the impact of the internet. I like this example because often folks discuss the value of generative models, not in terms of what they can do right now, but in terms of what they theoretically should be able to do 30 years from now. The consumer internet is a little over 30 years old and aptly demonstrates the ambiguity of attempted impact calculations.

I ask my colleague if the internet has improved the world. He replies “You and I are talking right now using it, so yes.” I point out “you and me. The top 2% of privilege. How many have the opportunity to interact like this?” The official number, as of this writing: 5.35 billion. That’s about two-thirds of the population, officially; I couldn’t find definitive details on what amount of activity constitutes using the internet, nor what the network strength distribution looks like over those people.

It’s worth noting that we (myself included), people reading this, generally consider the existence of the internet to be a well-received development. It’s also worth noting that our impression comes from personal reference and survivorship bias. For us to hear someone’s opinion on this, they have to either have access to the internet, or they have to have a relationship to us (which means, given that we have access to the internet and given that we network extremely homogeneously, they almost certainly have access to the internet).

I’m not saying kill the internet and live on subsistence farms, which is what my colleague accused me of next. It’s just that a software engineer estimating a global 30-year positive impact for a nascent technology, because that person has had a positive individual experience with it, doesn’t mean a lot. This is a key takeaway for my students from the estimation exercise. During both the hashtag part and the research project part, they tend to learn about impacts of cars, the internet, and generative models that they had no idea existed. They tend to conclude—this is my favorite part—that they might not be in the best position, based on their current knowledge and perspective, to estimate the technology’s global impact. They also tend to conclude that they are more likely to see and experience many of the benefits while avoiding, and even remaining oblivious to, some of the costs.

Mercy Mutemi and fellow counsel during a virtual pre-trial consultation in April 2023. Mutemi represents a collection of content moderators bringing a class action lawsuit against Meta and outsource company Sama for mistreatment of workers hired to filter gruesome content off of Facebook. Facebook users rarely have to confront the existence of these roles because they never see the gruesome content. Image from this article.

When we reached the topic of generative models, my colleague put it this way:

“If I distribute the environmental impact to every human being, and then measure the net positive of Gen AI for that individual, given that so much of the world doesn’t even have access to that technology let alone cars, I can see where it’s definitely a net negative.”

Again, setting aside attempts to estimate global net impact, what impacts do we know about from generative models? We know that they bring joy to many who can afford a license and chat to them regularly. We know that thousands of content creators applaud them as a tool for brainstorming (and many, it must be said, outright print and distribute designs generated entirely by the models). We know that the models offer assistance to folks who need to write documents in English for work, who speak English as a second language, or who struggle with writing.

We also know that the model creators’ unfettered use of copyright material without creators’ consent constitutes outright thievery. The environmental impact of these models, climate scientists agree, drives us toward catastrophe. OpenAI’s exploitation of global south labor at a wage of two cents per hour, only to turn around and take ten billion from Microsoft, clearly consolidates global wealth to, basically, 1-3 dudes. 

Our ethical struggle with generative models derives in part from the fact that we…sort of can’t have them ethically, right now, to be honest. We have known how to build models like this for a long time, but we did not have the necessary volume of parseable data available until recently—and even then, to get it, companies have to plunder the internet. Sitting around and waiting for consent from all the parties that wrote on the internet over the past thirty years probably didn’t even cross Sam Altman’s mind. But also, I can tell you, based on numbers I have seen as an engineer at a company that asks for user consent before changing almost anything, it wouldn’t work. Too many people would say no. The project would not obtain sufficient consenters for companies to have enough data to build generative models.

On the environmental front, fans of generative model technology insist that eventually we’ll possess sufficiently efficient compute power to train and run these models without the massive carbon footprint. That is not the case at the moment, and we don’t have a concrete timeline for it. Again, waiting around for a thing we don’t have yet doesn’t appeal to investors or executives.

The exploitation part is unfortunately a time-honored tradition in global enterprise. The tech industry specifically gets away with new and more atrocious versions all the time by dint of creating things that regulatory bodies have never seen, let alone regulated. Tech companies move quickly; regulatory bodies move slowly. It took the Sherman Anti-Trust Act almost fifteen years to catch up to Google, and the sentence from that ruling is expected to take another 3-5 years. I understand, and even agree with, calls to regulate what folks have named the AI industry. But gurl, that will not be a quick process.

The invention of the automobile offers instructive precedent here. Though it’s hard to argue against the utility of cars to those who can access them, emergent developments after their introduction dragged the “net impact” meter down. The United States built cities around the assumption that everyone had a car. This forced residents to depend on them—a situation we compare unfavorably to life in cities with reliable transit, bicycle infrastructure, and pedestrian options. As cars got bigger, survival in car crashes favored the larger car, resulting in a car size arms race. Could we have done all this in a better way, one that better leveraged cars for positive impact while avoiding some of the negative emergent properties? I’d like to believe we could have.

A busy intersection in Amsterdam received a redesign in 2023 (you can drag the white vertical separator left and right to compare the before and after). The new version dedicates space for bicycles and pedestrians, while converting the car section to a roundabout structure with crossings for the transit trolleys. The redesign achieves higher throughput of people at rush hour and dispenses with the need for a lot of traffic signals. Images from this article: https://bicycledutch.wordpress.com/2024/01/31/cycling-in-amsterdam-watch-the-traffic-flow-at-a-transformed-busy-intersection/

Could we theoretically improve the net benefit of any given technical development today if we make efforts to maximize its positive outcomes and mitigate its negative ones? I believe so, and I believe that’s basically the option available to us.

Fine, Chelsea; you want us all to abandon modernity, live in huts, and gather berries?

Jesus, guys, we’ve been over this. Crying luddism every time somebody critiques your pet thing is a super weird look.

I do want practitioners to do three things.

1. I want people to recognize and acknowledge the massive difference between “what this thing has done for me” and “what this thing is doing for everyone.”

I think the internet has made life better for me. I am immensely grateful for that. I work at Mozilla because I believe in the internet and I want to maximize its positive impact. I think it’s okay to acknowledge that just because something was good for me doesn’t mean it saved the world. GenAI saved my colleague’s wife’s life (or, I’d argue, his wife used a tool in a positive way to save her own life). It’s reasonable for him to believe in it. I am grateful that they had it available to do that. That’s different from calling it a global net positive, and that’s okay. 

2. I want people to think about “net impact” in a more nuanced, complex way than product net revenue.

The emergent outcomes of a technical shift on the entirety of global society cannot be reduced to the same mathematics we use to decide if a widget made a profit. I don’t think we have to start huge; I have my students start with personal anecdotes and then broaden from there. The approach I use for them is abbreviated, a little hamfisted, and certainly imperfect. But it captures more nuance than we often do when we make a gut estimate about whether a development was good for the world. I’d like to see engineering practitioners more regularly engage in a practice of organized, collective investigation and reflection.

3. I want people to arrive, not at an estimate of net impact, but at a set of ideas for improving that net impact.

This is what I think is within our purview to execute against anyway. The last time I did this exercise I watched students arrive, unprompted, at a long and varied list of ideas for how they might shape a future with generative models in it. “Can we build tools for content creators to proactively thwart their materials’ digestion by online scrapers?” “Can technical innovation permit us to minimize energy use in retraining somehow?” “What is universal basic data income, and is there any path to viability for that?” “How…how might someone like me chart a course to becoming a subject matter expert for regulation development?”

I hope these questions inform students’ choices as practitioners in the field. And I think arriving at similar questions can inform the choices of current practitioners. It’s tempting to sit around and speculate about whether a parallel universe without <insert development here> might be better off. But we don’t live in that universe; we live in this one. This universe is the one where we get to answer that question: not with navel-gazing, but with empirical evidence derived from attempts to drive change.


Unless I am wrong (which I could be), I expect this post to conclude my writing on this topic for some time. I have other subjects I’d like to discuss here, including some interesting books I’ve read lately. Stay tuned for more, and if you can’t get enough, join me on Patreon: I put out audio recordings of blog posts every Monday and maintain a few regular written columns.


Molly White on generative AI

a thoughtfully nuanced take that weighs their limited benefits against the outsized human cost

Well What if I Was In Charge Of Pokemon If I’m So Bloody Smart?


No matter where you are on the internet, whatever the fandom is, someone is always going to ask about what you’d do if you were in charge of it. For example, a lot of Bible fans are very convinced that their fanfiction is actually factually true. Whether it’s fantasy Wrestlemanias or ideal outfit compositions in Pretty Little Liars, there’s always an urge to take a thing you already know and make your version of it.

Also, people who like Pokemon routinely talk about what stupid idiots the designers are and how they could do a better job of running the game. I don’t think I could, because I know there are competing factors and I think that everyone who opens their mouth to talk like that sounds like a tool.

Still, if I think those people are silly, it’s easy to say that if I don’t put myself out there, right?

Here! A bunch of opinions about what I think should be done in Pokemon as a game franchise. Nothing like ‘open world matters’: I think the game should always be a competitive 2v2 Bo3 format and the rest of the game can follow from that. I also don’t think that this would make the game better. It’s very important that I put it out there, on my sleeve, that none of these changes are based on deep insight into the game or the way that it should be. No. This is a centering of myself, as a designer, and as a player of games. This is how I want it done. Also note that none of these changes are simple or oblative, like, this isn’t all that I think should happen, there would need to be specific changes and fine tuning for all these pushes.

There, preamble done, here’s how and where I’m right.

Kill the Rock or Steel Type, Don’t Care Which

To me the Rock type feels like it exists because when they were setting up the type chart back in Red-Blue, they figured they’d need something like that, like making sure to sketch out space for a window box while designing a window. But Rock in RBY got to be Ground’s ugly cousin, with its greatest perk (a resistance to normal moves) not proving adequate to the task of dealing with that generation’s overwhelming normal type attacks, and being bolted consistently to something that gave it a quad weakness. In Generation 1, there was one rock type that wasn’t quad-weak to grass, and only three that weren’t quad weak to water as well. In its first appearance, Rock was literally never used on its own, which to me suggests that the type wasn’t actually doing a job. It was a sorta-type, a thing to keep them overwhelming grounders under control, I suppose.

In Gen 2, as if to fix Gen 1, they introduced Steel types, which were uh, like, Rock types but good? They had the ground weakness still, they shared that, but they no longer catastrophically mixed with ground, and Steel had the kind of resistances that made it fit for ‘tough’ Pokemon types. But this brought with it the new problem that now Rock’s old job – a physically tough type of elemental Pokemon type that represented being made out of something inert that wasn’t necessarily stuck to or of the ground – was displaced by something that was just better.

Rock exists in an ugly space between ground and steel, made worse. Steel exists to do Rock’s job, but better. Steel is one of the best types in the game and even brings with it an immunity to a whole wing of status conditions that you want on a tough Pokemon that wants to endure fights. One of these types sucks at doing its job and the other is too good at it and any time you get one of them you probably would be better off if it was just the other.

My druthers, Rock would be Steel, the Steel Type wouldn’t exist and the Rock type would just have the Steel associations, and if that doesn’t make sense for a Pokemon I’d just make it lose the Rock type. Graveller and Golem didn’t get anything being Rock types after all. Oh, STAB on Rock Slide, yeah, woo, that means something.

Get Rid of the Fairy Type

You can admit to your mistakes, just admit it, the Fairy type was an attempt to address the problems of making a bunch of broken dragon types. It has no coherent flavour, and it’s super strong in a way it does not justify.

The Fairy type sucks and it’s so popular and strong it’ll never be properly addressed.

‘Oh but if you got rid of the fairy type, what type would you give the fairies? there’s nothing else that fits’ yeah see what I mean about not having a coherent theme?

Buff the Ice Type

Ice got a ‘sort of’ buff in Scarlet-Violet, in that one of their moves got made worse. Oh, more usable, but it didn’t actually help Ice. See, Ice types get a defense buff under the new ‘Snow’ condition that replaced Hail, making them better defensively. This means that now you can run a single Ice type and it gets tougher as long as this snow condition is going on, without needing to build your whole team around it the way you used to. You could include teammates that were immune to Hail without building around a single type, and benefit from the defensive bonus it now grants, which is like the defensive bonus you get from Sandstorm.

Basically, you know that good weather, Sandstorm? Well, Hail was the Bad Sandstorm, too much like Sandstorm but worse, and it got replaced with Snow, so now Snow is the Bad Sandstorm.
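
For concreteness, here is a small sketch of the comparison being made, using the commonly documented mainline values (a 1.5x Defense boost for Ice types in Snow, mirroring Sandstorm’s 1.5x Special Defense boost for Rock types); the exact multipliers are my assumption, not something stated in the post.

```python
# Sketch of the Snow-vs-Sandstorm comparison described above.
# Multipliers are the commonly documented mainline values; treat them as assumptions.

def defense_multiplier(weather: str, types: set[str]) -> float:
    """Bonus to the physical Defense stat from weather."""
    if weather == "snow" and "Ice" in types:
        return 1.5
    return 1.0

def special_defense_multiplier(weather: str, types: set[str]) -> float:
    """Bonus to the Special Defense stat from weather."""
    if weather == "sandstorm" and "Rock" in types:
        return 1.5
    return 1.0

# An Ice type in Snow gets tougher physically, the same way a Rock type
# in Sandstorm gets tougher specially -- hence "the Bad Sandstorm".
print(defense_multiplier("snow", {"Ice"}))                # 1.5
print(special_defense_multiplier("sandstorm", {"Rock"}))  # 1.5
```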

Anyway, the Ice type should resist water (turning water that hits it into ice and floating in ice are two good themes there), and Steel should be weak to Ice. No strong reason, just fucking hell, give Ice something to do.

Stop Making Bugs A Dumping Ground

Bug is such a weird type because it’s clearly something that the designers are fond of, something they like, but it’s also a type with almost no meaningful support in the game’s entire history. The list of good, tournament-meaningful bugs starts at Scizor, adds Volcarona and kinda stops there, and that’s a list that’s been about that long since forever. The Bug type is used in the early game to populate early routes invoking things like looking for cicadas as a kid.

But the result is that bugs aren’t treated as a sort of whole type of their own. Bug type moves are typically weak hits, and even though U-Turn is an incredibly important move, it’s never important for being Bug, it’s important because it’s a pivot – a move that lets you transfer in a Pokemon after other attacks. Its potential offensive capacity is irrelevant, and you can tell because as good as U-turn is, it’s showing up on Incineroar – quite possibly the best VGC Pokemon of all time.

There’s no legendary Bug.

There’s never been a Box bug.

There’s exactly one Mythical Bug ever, and it was Genesect.

Fully-evolved bugs have the lowest average stats, and as a group they have the lowest hit points and special attack of any type.

Bug as an offensive type is resisted by seven other types, and Bug resists three uncommon offensive types – Fighting, Ground, and Grass. It even has a weird thing where Fighting resists Bug and Bug resists Fighting, which isn’t something that shows up in other damage types.
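
To make those counts easier to follow, here is a partial sketch of the relevant type-chart entries, using the standard mainline chart as I understand it; treat the exact lists as my assumption rather than something stated in the post.

```python
# Partial type-chart sketch: Bug on offense, and attacks hitting Bug on defense.
# Values reflect the standard mainline chart as I understand it (assumption).

BUG_ATTACKING = {          # Bug as the attacking type
    "Fire": 0.5, "Fighting": 0.5, "Flying": 0.5, "Poison": 0.5,
    "Ghost": 0.5, "Steel": 0.5, "Fairy": 0.5,    # the seven types that resist Bug
    "Grass": 2.0, "Psychic": 2.0, "Dark": 2.0,   # where Bug hits hard
}

BUG_DEFENDING = {          # attacks hitting a Bug-type Pokemon
    "Fighting": 0.5, "Ground": 0.5, "Grass": 0.5,  # the three types Bug resists
    "Fire": 2.0, "Flying": 2.0, "Rock": 2.0,       # Bug's weaknesses
}

# The odd mutual resist: Fighting resists Bug AND Bug resists Fighting.
assert BUG_ATTACKING["Fighting"] == 0.5 and BUG_DEFENDING["Fighting"] == 0.5
```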

The Bug type is a bad type and Bug Pokemon are bad Pokemon because Bug Pokemon aren’t made to be good. The best Bug move is good because it shows up on the Best Pokemon of all time in a competitive environment.

My solution to this is not to do anything with the type per se but just, like, fix it? Stop making Bug Pokemon that are bad. Make better moves for Bug Pokemon to use. Take every fully evolved Bug Pokemon and give it better stats. By all means, keep the way that Bug Pokemon are bad at HP and Special Attack, sure – but give them something for it! This is a type with a lot of Pokemon that get to be Someone’s Special Guy, and they have made it so anyone who gets attached to Bugs early on is guaranteed to have to give up on their faves when they start playing in competitive scenes. That sucks!

There! Just some opinions about how Pokemon should be designed. These are the kinds of opinions I think are interesting to consider.

The post Well What if I Was In Charge Of Pokemon If I’m So Bloody Smart? appeared first on press.exe.


The Elegiac Hindsight of Intelligent Machines


This essay was edited out of a chapter of my book, The Intelligence Illusion: a practical guide to the business risks of Generative AI, with minor alterations.

She looks back at us with regret, hair swept in the wind. There is judgement in her eyes. Or, it could just be an abstract square.

“See the choice of dreams”, and then worry about it

Very well. This book – this side, Dream Machines – is meant to let you see the choice of dreams. Noting that every company and university seems to insist that its system is the wave of the future, I think it is more important than ever to have the alternatives spread out clearly.

But, the experts are not going to be much help, they are part of the problem. On both sides, the academic and the industrial, they are being painfully pontifical and bombastic in the jarring new jargons (see “Babes in Toyland,” p. 4). Little clarity is spread by this. Few things are funnier than the pretensions of those who profess to dignity, sobriety and professionalism of their expert predictions – especially when they too are pouring out their personal views under the guise of technicality. Most people don’t dream of what’s going to hit the fan. And the computer and electronics people are like generals preparing for the last war.

Frankly, I think it’s an outrage making it look as if there’s any kind of scientific basis to these things: there is an underlevel of technicality but like the foundation of a cathedral, it serves only to support what rises from it. THE TECHNICALITIES MATTER A LOT. BUT THE UNIFYING VISION MATTERS MORE.

Ted Nelson, Computer Lib/Dream Machines[1]

AI software development – the “what is this for?” part – has never had much of a unifying vision. AI research, sure, they have a vision: they want intelligent machines first, figure out what to do with them second.

They dream of a robot future.

Some parts of the research monomania end up having clear software benefits. Being able to point a computer at an image and have it get at least a rough idea of what the picture is of is neat and that came from Machine Learning research. It didn’t come with a single specific “what is this for?” vision except, you know, “how is our robot going to see?”, but it made up for it by being obviously useful in a general sort of way. It’s a capability that’s now built into pretty much all of our devices. As a feature that’s now integrated into our lives, it’s a microcosm of the issues we have with innovations coming out of AI research:

  1. It has helped the blind and partially-sighted access places and media they could not before. A genuine technological miracle.
  2. It lets our photo apps automatically find all the pictures of Grandpa using facial recognition.
  3. It has become one of the basic building blocks of an authoritarian police state, given multinational corporations the surveillance power that previously only existed in dystopian nightmares, and extended pervasive digital surveillance into our physical lives, making all of our lives less free and less safe.

One of these benefits is not like the other.

Universal facial recognition is terrifying when it works perfectly and a nightmare when it’s flawed. It exaggerates power imbalances and disproportionally enables bad actors and authoritarians. It’s equal parts pleasant domestic miracles and blighted social and political horror. Generative AI is likely to follow the same path.

In the absence of a unifying vision, the tech industry simply does what makes the most money for people working in the tech industry – greed fills the void where there should have been vision. Companies such as Amazon didn’t hesitate to sell facial recognition services to law enforcement – until the backlash forced them to stop.[2]

This might just be capitalism, but the ‘just’ in that phrase feels quite different when the industry in question is peddling synthetic miracles.

Greed might be inevitable. It might always seep into the cracks, break apart the concrete foundations our ivory towers are built on. But, having a coherent unifying vision that’s backed by clear values does a remarkable job of holding off the decay.

Even today, the web is like a living fossil, a preserved relic from a different era. Anybody can put up a website. Anybody can run a business over it. I can build an app or service, send the URL to anybody I like, and most people in the world will be able to run it without asking anybody’s permission. There are rules you have to follow, obviously, but those are remarkably straightforward if you aren’t actively spying on people or messing around with their data – especially when you’re working on a comparatively small scale.

You can trace the lineage of the vision behind the web from Tim Berners-Lee[3] in 1989, through Ted Nelson in 1974 and Douglas Engelbart in 1968, all the way to Vannevar Bush’s article As We May Think in the Atlantic Monthly back in 1945.[4]

All of their books, software prototypes, theories, and ideas run along the single continuing thread of the hypertext concept – links – as a new kind of punctuation mark[5] that connects the information of the world together in a coherent network. It’s a vision that encompasses concept, functionality, interface, and values – and it persists to this day, despite decades of greed, abuse, surveillance, and shitty ads. It’s a unifying vision of a world that’s simultaneously technological and literate. This vision is part of what has kept it alive. Despite the frustrations, pain, and the flaws, working on making parts of the web is a privilege.

While AI researchers are busy trying to build their robot dream, generative AI software has no such unifying vision. Some vendors are bent on replacing humans at their jobs – effectively promoting their software as “AI” illustrators, voice actors,[6] even “robot” lawyers[7] – and then look surprised at the sheer enormity of the anger that they get in response. Other vendors are resurrecting Microsoft’s 1990s dream of intelligent agents, assistants, or copilots who operate in the context of the software you use – extending Clippy’s lineage into the modern world.[8] The rest seem to have lazily reached for the first interface metaphor they could think of – the chatbot – with no thought or even the vaguest idea of how it should actually integrate with the rest of the work we’re doing.

As much as it makes your average MBA salivate, “let’s replace people with something shittier but cheaper” isn’t much of a vision for software development and user interface design, which leaves those of us who are genuinely curious about the applications of the technology with the other two paths.

Those seem to be converging towards a single idea: “Human-in-the-loop”. It’s the idea that in the interactive loop with the AI software, the decision-making, choices, and actions are made by the human.[9] Instead of full automation there’s feedback from the AI as the tasks progress and the human responds to that by interacting with the various user interface affordances provided by the system.

In other words, the human sits in front of software, uses it as they would any other software, and then it does stuff for them as any other software does, except in an AI way.
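
As a concrete illustration of that shape, here is a minimal sketch of the human-in-the-loop interaction pattern; the function names are hypothetical placeholders of mine, not any vendor’s API.

```python
# Minimal sketch of a human-in-the-loop interaction pattern.
# All function names here are hypothetical placeholders, not a real API.

def model_suggest(task: str) -> str:
    """Stand-in for whatever the AI system proposes for a task."""
    return f"[draft output for: {task}]"

def human_in_the_loop(task: str) -> str:
    draft = model_suggest(task)
    while True:
        print(f"Suggested: {draft}")
        decision = input("accept / edit / retry? ").strip().lower()
        if decision == "accept":
            return draft                      # the human makes the final call
        if decision == "edit":
            draft = input("your revision: ")  # the human overrides the suggestion
        else:
            draft = model_suggest(task)       # ask the system for another attempt

# Example use: human_in_the_loop("summarise this incident report")
# The point: the software proposes, but decisions and actions stay with the person.
```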

The grand unifying vision of AI-assisted software is that you should use it to make software.

That’s an idea that’s only remarkable because of how many AI enthusiasts think they can do away with the people part of getting things done.

Acquiesce, or mitigating the inevitable

In an earlier chapter I wrote about the failure of an AI model designed to predict the onset of sepsis, how external reviewers discovered the flaws, which then led to the vendor updating and improving it.

At one hospital, UC Health in Colorado, they found that the system still wasn’t that useful: “the ratio of false alarms to true positives was about 30 to 1”.[10]
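
A quick back-of-the-envelope reading of that ratio (my arithmetic, not the article’s): thirty false alarms for every true positive means the alerts on their own were right roughly 3% of the time.

```python
# Back-of-the-envelope: what a 30:1 false-alarm-to-true-positive ratio means.
false_alarms_per_hit = 30
precision = 1 / (1 + false_alarms_per_hit)
print(f"{precision:.1%}")  # ~3.2% -- nearly all alerts are noise without human filtering
```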

To salvage their investment UC Health changed their approach with the system. Instead of using the AI as an autonomous prediction system that sent out alerts to overworked doctors and nurses, they put together a special monitoring team of clinicians that used live video feeds to help filter out the false alarms. That team built relationships with bedside nurses throughout the hospital. Where the AI system alone was utterly useless, the human team, assisted by their relationships with the nurses and the AI system, was estimated to save about 211 lives annually.

AI on its own was worse than nothing while AI as an assistant saved lives, a clear demonstration of the value of human-in-the-loop. It’s a heart-warming parable that lends credence to the ambitions of those who are trying to make “AI assistants” happen.

This, and other stories like it, are going to be the foundation myths of a thousand new AI services – the seeds of a new computing revolution.

Or, more specifically, it’s a certain kind of fertiliser. The kind that smells.

Case studies are amazing tools. You can pick one instance where everything worked out – an exercise absent of disaster that has a nice “all is lost” moment that gets turned around – and throw together a just-so story that proves exactly the point you want. There isn’t anything anybody can do to disprove it without particle-colliding themselves into an alternate reality. There is no way to ‘science’ a case study unless you have access to a parallel universe as a control. They’re all just stories that short-circuit our thinking.

For every UC Health sepsis story there are a hundred systems that didn’t work. Even the UC Health story itself is dubious once you dig into it. Was it the AI that saved 211 lives? Or was it having a specialised team of clinicians watching all the at-risk patients around the clock, using live feeds? Or, was it the relationships the clinicians developed with the bedside nurses? If you’d put together that same team with the same infrastructure, but using a simpler, cheaper algorithm based on vital sign monitors, would that have done the same job? Why didn’t UC Health try that first – simpler, cheaper, faster to set up – if what they wanted was to save lives?

The answer is simple: they’d already bought a broken AI system. They did what they had to in terms of making sure their investment did eventually save lives, but it leaves us with this unanswered question: if they had spent that same amount of money on building teams and a system for detecting sepsis, but without AI, would it have worked better or worse?

We can’t know, and that’s why case studies are a favoured tool by MBAs, startups, and consultants all over the world. You can just pick a story that proves what you want and ignore the other hundred that don’t.

Once Generative AI becomes a broad movement in software, facts and science won’t matter, and the stories will take over. Ted Nelson was writing about computers and programming in a more general way and in a different era, but he’s right here too: the stories about AI software aren’t scientific and trying to make them look scientific is an outrage.

That’s what the AI software vendors are doing with their marketing performances that look like scientific papers, the ‘studies’ that are little more than sales exercises, and the entirety of their rhetoric about being on the verge of AGI – how we need to make sure those future robot gods are our slaves and not overlords.

It’s storytelling.

Fighting that with another ‘science’ performance is futile. In a war of theatrics, the act with the biggest budget wins the crowd. We can chip away at the foundations with peer-reviewed papers and research that show flaws and failures, but ultimately what will decide this in the decades to come is the software – how well it’s designed, how effective, how productive, and the long-term failures and successes in real workplaces.

That’s where the three core flaws of the assistant model are going to be a problem.

I mentioned two of them earlier, automation and anchoring biases. We, as human beings, have a strong tendency to trust machines over our own judgement.[11] This kills people, as it’s been a major problem in aviation.[12] Anchoring bias comes from our tendency to let the initial perceptions, thoughts, and ideas set the context for everything that follows. AI adds a third issue: anthropomorphism.[13] Even the smartest people you know will fall for this effect as large language models are incredibly convincing.[14] These biases combined lead people to feel even more confident in the AI’s work and believe that it’s done a better job than it has.[15]

We’re using the AI tools for cognitive assistance. This means that we are specifically using them to think less.[16] In every other industry this dynamic inevitably triggers our automation bias and compromises our judgement of the work done by the tools.[17] We use the assistant to think less, so we do.

These models are incredibly fluent and – as we saw at the start of this book – are consistently presented by their vendors as near-AGI. This triggers our instinct towards anthropomorphism,[18] making us feel like we have a fully human-level intelligence assisting us, creating an intelligence illusion that again hinders our ability to properly assess the work it’s doing for us.[19]

Anthropomorphism, when applied to AI chatbots, has been called the “Eliza effect”. It was first observed by Joseph Weizenbaum when he saw how people responded to and interacted with the comparatively primitive ‘AI’ chatbot, Eliza, that he created back in 1966.

What I had not realized is that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people.

Joseph Weizenbaum[20], p. 7.

Fluent AI models create an anthropomorphism effect that sways even those who knew that the AI was nothing more than a simplistic program, even by 1966 standards.[21]

The intelligence illusion, the conviction that these are artificial minds capable of powerful reasoning, when combined with anthropomorphism supercharges our automation bias. Our first response to even the most inane pablum from a language model chatbot is awe and wonder. It sounds like a real person at your beck and call! The drive to treat it as, not just a person, but an expert is irresistible. For most people, the incoherence, mediocrity, hallucinations, plagiarism, and biases won’t register over their sense of wonder.

This anthropomorphism-induced delusion is the fatal flaw of all AI assistant and copilot systems. It all but guarantees that – even though the outcome you get from using them is likely to be worse than if you’d done it yourself, because of the flaws inherent in these models – you will feel more confident in it, not less.

Every human-in-the-loop and assistant-style AI system I’ve seen has these defects. Some of them even do their best to exacerbate them by making the assistant adopt a confident tone or an affable demeanour.

Those who use these AI systems are likely to get worse results and still be more confident in the resulting output than they would have in their own. Their work will suffer, but they will feel like it has improved. This is a recipe for fanatical evangelism and incredible revenue growth. It all but guarantees that we’ll see a financial bubble of some kind around AI. The only question is its size and duration. The more effective the Generative AI systems are, the bigger the bubble. The less effective they are, the faster it’ll pop.

We’ll probably get some good software out of it – especially when it comes to converting or modifying text and media – but it’s the nature of bubbles to create crap. A software bubble is the flowering of a thousand first-movers – countless startups and tech companies, most of them utterly clueless about what they’re working with, building the first bad iterations of what they hope is a good idea. We don’t know yet what the ideal, productive AI-assisted productivity software will look like, but we do know that we’re unlikely to see many examples of it in the first generation.

Meanwhile, the tech industry will dream of exponential growth.

The roads home

I have lived abroad for most of the past twenty years. The web let me work wherever I wanted without losing touch with my friends and family. The tools the web offers gave me freedom that I couldn’t have imagined when I was a child.

This worked well for a while. I’ve had the joy of living in a number of wonderful cities and amazing neighbourhoods and communities.

I grew older and, with age, those around you also grow older. Some of them get sicker. A video chat doesn’t fill the void you feel when somebody you care about is lying in a hospital bed. But the freedom the web provides works in the other direction as well. I could live near those I care about and the web meant I could keep doing my job no matter where I was.

I decided to move back home to Iceland. As I was preparing my move, the COVID-19 pandemic struck, and the rest of the world discovered what I had known for decades: the web abstracts distance. You can work where you want. I made my way home, despite the collapse of international airline travel. From Montréal to Toronto. Toronto to Amsterdam. Finally, I flew from Amsterdam to Iceland.

Back in Iceland, I settled in Hveragerði, a small town of about 2700 people in the south of Iceland. In keeping with the theme of my realisation that the web and related technologies meant that location mattered less, I could pick a place that suited my personal needs. It’s a nice town. The weather here can be interesting – this is Iceland after all – which often leads to road closures in the winter. But there are three separate roads that connect this region with the capital, so even though a couple of the roads get closed due to snow or ice there’s always the third. Because we know what to expect from the weather, most regions in Iceland invest in their infrastructure. We try to make sure we can keep everything going even when a bad storm hits us. There are redundancies and, for the most part, they work.

We can’t say the same about the software that we have today – that we use for our work. Even though many organisations have returned to the office, partially or fully, we are still using the same software that companies adopted for remote work. We use Google Docs, Zoom, Dropbox, or an equivalent competitor. Our files, documents, and processes are now tied to whatever app we’ve adopted.

If Google Docs goes down or has sporadic outages, then our work disappears with it. If our internet goes, the software blinks out. When the biggest data centre belonging to Amazon Web Services goes down, that breaks most of their services across all of their data centres, because they’re all interconnected, and almost all of our software breaks with it.[22] Given our increasing reliance on centrally hosted software services, the impact of temporarily losing a data centre is severe, getting worse, and is now even happening because of the weather, caused by an increase in frequency and magnitude of heatwaves globally.[23]

It doesn’t matter whether we ourselves are working remotely or in-office, all our software today is remote, and the connection to it can break in a thousand different ways.

There is only one road into town.

A global network means our software shouldn’t have to be centrally located in only a few specific buildings across the world. It should be spread throughout the network, on every device that’s connected to it. Our hardware devices shouldn’t have to be so reliant on the internet that core features cease to function just because they can’t phone home for a short while. Our information – public, private, professional – shouldn’t have to be controlled, collected, and stored by only a handful of corporations.

The software we have today is undermining the strongest advantage given to us by the internet: robust and distributed reliability. Our work depends on increasingly unreliable software. Their need to be always online means you feel every hiccup in your connection. Centralisation means that when something does go wrong, it’s potentially catastrophic as it affects everybody, everywhere, who is using that centralised app. This matters because things are going wrong, fast. There’s political unrest. Social instability. Cold wars. Hot wars. Trade wars. A climate crisis. Data that was just normal personal data one moment, becomes incriminating evidence a moment later when people’s rights are stripped away.

The software we have isn’t the software we need.

The opposite of good software

Modern software is remarkably fragile. We’ve gone from a software ecosystem that, a few years ago, was almost completely local, to one where everything is just cached – temporarily stored – at best. A decade ago what you worked with was on the computer itself. Your data was your own, and relatively safe if you kept decent care of your backup drive. The apps were yours, usually bought and paid for once – no subscription. Collaboration was always a bit tricky if you weren’t a software developer – we’ve always had somewhat decent collaborative tools in version control systems – but other people made do by using shared local servers or simply sending files over email.

This was a remarkably robust software ecosystem that tolerated all sorts of disasters, disconnections, and changes. We’ve dismantled it in less than a decade. Most of the apps we use for our work require an internet connection. Almost all of them are entirely cloud-based, where significant parts of the software runs on a server somewhere. Little of our work data is stored locally any more.

Generative AI serves to accelerate that trend.[24] You needed 800 GB just to store GPT-3, without even running it.[25] Later versions and ChatGPT are even bigger, running in parallel on multiple servers. The technology can be made to work locally, but that’s not where the hype is. The hype is for the already countless “AI for X” services who are all in the cloud and are all using services from OpenAI.[26] Unless one of the big tech companies breaks ranks and builds into their Operating system a solid and tested large language model that has been ethically trained on documented data sets, what we’re going to get are agents and chatbots everywhere, each living in the cloud, fine-tuned for a task, and hooked up to whatever APIs the startups think are nifty. Why solve the hard problem of making a language model safer, better integrated into your software, and more sustainably developed, when you can hook up a finicky but flashy website with OpenAI and call it a startup?
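
That storage figure passes a rough sanity check, assuming the commonly cited 175 billion parameters stored at 32-bit precision; both numbers are my assumptions, and the book’s 800 GB figure may include more than raw weights.

```python
# Rough sanity check on the storage figure, under assumed conditions.
params = 175e9          # commonly cited GPT-3 parameter count (assumption)
bytes_per_param = 4     # 32-bit floats (assumption)
print(f"{params * bytes_per_param / 1e9:.0f} GB")  # ~700 GB, the same order as the cited 800 GB
```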

The dream the tech industry chose is not science or progress. The dream they chose is that of easy money, because that’s the only dream the tech industry today is capable of seeing. Their vision is a mirage of craving.

Their want can only be met with another financial bubble, one that has to be more grand and world-changing than any other that preceded it. They crave the exponential to fulfil their dreams, but the only true exponential today’s twenty-something startup founder will experience is that of the escalating Climate Crisis. That won’t stop them from trying. Their hunger is likely to push them to ignore the social unrest and power shifts that AI systems cause.

The tech industry doesn’t just behave with your normal corporate greed. They want financial bubbles. They had a taste of the euphoria with the dot-com bubble and the hunger for it never went away.

The tech industry is also, as I argued at the start of this book, full of true believers in AI. Somebody who truly believes – sincerely believes that this will all be for the best – will push past the mass unemployment, organised disinformation, and wholesale deception. They will think that it will all be worth it. Once we get through the initial “disruption”, things will be better for everybody.

None of this is conducive to software design and development. It isn’t a mindset that leads you to do user research, observational studies, or usability experiments. It’s a drive that’s taking them away from what most people and their communities need. Where we need robust technology, they are giving us finicky AIs that misbehave at a badly worded sentence. Where we need privacy from both corporations and potentially hostile authorities, they push further and further into recording our lives. When we need software that works on the devices we have, for as long as they last, they give us software that only works on the latest and greatest. Sometimes, as with GPT-4, the software they make even requires systems so powerful that they only exist in a couple of locations on the planet.

But, don’t worry, they’ll sell us access – timeshare, really – but let’s call it “the cloud”. It only breaks some of the time.

Nothing they do is for us, even though it’s our money, our data, and our art, writing, and music they’re demanding. We aren’t customers to them – we’re just the people that pay. To tech companies, we are nothing more than a resource to be tapped. A number to be boosted to pump investor interest. They are not doing us any favours. What they want from us is simple: everything. All culture on their servers, made by their AI. All our work happening through them, assisted by their AI. The totality of our information, mediated by their AI. A vig collected on all existence.

One of the papers I’ve referred to a few times in this book is On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?[27] It was the first paper to provide a cohesive and detailed overview of how large language models work, how they affect people, and the risks that they pose. This paper ultimately led to Google firing one of the key authors of the paper, Timnit Gebru, and forced other co-authors employed by Google to take their name off it.[28] This continues to this day, where Google employees seem to be routinely discouraged from working on AI fairness or ethics.[29]

When Microsoft launched Bing Chat – the first mainstream attempt to use a large language model as a front end for search, something that another co-author of On Stochastic Parrots, Emily M. Bender, had warned against in a separate paper titled Situating Search[30] – it led to the exact outcomes they had predicted. Strange behaviour[31], threatening language[32], falsehoods[33], and lies[34] ensued. Bing Chat played out exactly the way they expected.

Of course, Microsoft did the only rational thing it could when the risks of its products were revealed: it disbanded its AI ethics and safety team[35] and rolled Bing Chat out to even more people.[36] It now plans to push towards adding AI chatbots to everything, everywhere, no matter the cost.[37]

Most of the tech organisations that had responsible AI or AI safety teams are disbanding them.[38]

They seem to think it would be a mistake to worry about risks and problems – why worry about something you can probably fix?[39] Who cares about the harm it does in the meantime?

Safe, for the tech industry, is too slow when you hunger for a bubble and want to ship more software, to more people, as fast as you can.

Designers of software user interfaces often imagine deliberately bad designs as an exercise – a way of demonstrating the principles of their craft by exploring their opposites. It’s a good way of demonstrating why a design principle matters, and it can provide tactile examples of who benefits from it and how.

If you asked me to imagine the software that would be the opposite of what we need as a society…

That app would look remarkably like ChatGPT.


The best way to support this newsletter or my blog is to buy one of my books, The Intelligence Illusion: a practical guide to the business risks of Generative AI or Out of the Software Crisis.


  1. Ted Nelson, Computer Lib/Dream Machines (Place of publication not identified, 1974). ↩︎

  2. Jeffrey Dastin, “Amazon Extends Moratorium on Police Use of Facial Recognition Software,” Reuters, May 2021, https://www.reuters.com/technology/exclusive-amazon-extends-moratorium-police-use-facial-recognition-software-2021-05-18/. ↩︎

  3. “A Little History of the World Wide Web,” accessed April 6, 2023, https://www.w3.org/History.html. ↩︎

  4. Vannevar Bush, “As We May Think,” The Atlantic, July 1945, https://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/. ↩︎

  5. Stacey Mason and Mark Bernstein, “On Links: Exercises in Style,” in Proceedings of the 30th ACM Conference on Hypertext and Social Media, HT ’19 (New York, NY, USA: Association for Computing Machinery, 2019), 103–10, https://doi.org/10.1145/3342220.3343665. ↩︎

  6. Joseph Cox, “‘Disrespectful to the Craft:’ Actors Say They’re Being Asked to Sign Away Their Voice to AI,” Vice, February 2023, https://www.vice.com/en/article/5d37za/voice-actors-sign-away-rights-to-artificial-intelligence. ↩︎

  7. “DoNotPay - The World’s First Robot Lawyer,” accessed April 6, 2023, https://donotpay.com/. ↩︎

  8. Benjamin Cassidy, “The Twisted Life of Clippy,” Seattle Met, August 2022, https://www.seattlemet.com/news-and-city-life/2022/08/origin-story-of-clippy-the-microsoft-office-assistant. ↩︎

  9. Ge Wang and Juliana Bidadanure, “Humans in the Loop: The Design of Interactive AI Systems,” Stanford HAI, October 2019, https://hai.stanford.edu/news/humans-loop-design-interactive-ai-systems. ↩︎

  10. Casey Ross, “Epic’s Overhaul of a Flawed Algorithm Shows Why AI Oversight Is a Life-or-Death Issue,” STAT, October 2022, https://www.statnews.com/2022/10/24/epic-overhaul-of-a-flawed-algorithm/. ↩︎

  11. Raja Parasuraman and Victor Riley, “Humans and Automation: Use, Misuse, Disuse, Abuse,” Human Factors: The Journal of the Human Factors and Ergonomics Society 39, no. 2 (June 1997): 230–53, https://doi.org/10.1518/001872097778543886. ↩︎

  12. Kathleen L. Mosier et al., “Automation Bias: Decision Making and Performance in High-Tech Cockpits,” The International Journal of Aviation Psychology 8, no. 1 (January 1998): 47–63, https://doi.org/10.1207/s15327108ijap0801_3. ↩︎

  13. Arvind Narayanan and Sayash Kapoor, “People Keep Anthropomorphizing AI. Here’s Why,” Substack newsletter, AI Snake Oil, February 2023, https://aisnakeoil.substack.com/p/people-keep-anthropomorphizing-ai. ↩︎

  14. Murray Shanahan, “Talking About Large Language Models” (arXiv, February 2023), https://doi.org/10.48550/arXiv.2212.03551. ↩︎

  15. Neil Perry et al., “Do Users Write More Insecure Code with AI Assistants?” (arXiv, December 2022), https://doi.org/10.48550/arXiv.2211.03622. ↩︎

  16. K. Mosier and L. Skitka, “Human Decision Makers and Automated Decision Aids: Made for Each Other?” 1996, https://www.semanticscholar.org/paper/Human-Decision-Makers-and-Automated-Decision-Aids%3A-Mosier-Skitka/ffb65e76ac46fd42d595ed9272296f0cbe8ca7aa. ↩︎

  17. Kathleen L. Mosier et al., “Automation Bias, Accountability, and Verification Behaviors,” Proceedings of the Human Factors and Ergonomics Society Annual Meeting 40, no. 4 (October 1996): 204–8, https://doi.org/10.1177/154193129604000413. ↩︎

  18. See Nicholas Epley, Adam Waytz, and John T. Cacioppo, “On Seeing Human: A Three-Factor Theory of Anthropomorphism,” Psychological Review 114, no. 4 (October 2007): 864–86, https://doi.org/10.1037/0033-295X.114.4.864, which outlines three psychological triggers for anthropomorphism: 1. If you don’t know how a non-human agent works, we default to thinking it works like us because that’s what we have the most familiarity with. 2. “The motivation to interact effectively with nonhuman agents” causes us to attribute human characteristics and motivation. 3. Seeing agents as human-like enables “a perceived humanlike connection with nonhuman agents.” ↩︎

  19. Arleen Salles, Kathinka Evers, and Michele Farisco, “Anthropomorphism in AI,” AJOB Neuroscience 11, no. 2 (April 2020): 88–95, https://doi.org/10.1080/21507740.2020.1740350, esp. “In the general public it inadvertently promotes misleading interpretations of and beliefs about what AI is and what its capacities are.” Anthropomorphism also limits the researchers, which is important to note in light of the common belief in the field that the spark of AGI has been struck: “Furthermore, anthropomorphic (implicit or explicit) interpretations of AI might also have epistemological impact on the AI research community itself, insofar as the search for biological and psychological realism (i.e., similarity with biological intelligence) might lead to underestimating the possibility of new theoretical and operational paradigms and frameworks thus ultimately limiting the development of AI.” ↩︎

  20. Joseph Weizenbaum, Computer Power and Human Reason: From Judgment to Calculation (San Francisco: Freeman, 1976). ↩︎

  21. Weizenbaum, 6. ↩︎

  22. Mike Moore and Joel Khalili last updated, “AWS Went down Hard, yet Again - Here’s What Happened,” TechRadar, December 2021, https://www.techradar.com/news/live/aws-is-down-again-heres-all-we-know. ↩︎

  23. Nicholas Fearn, “Heat Waves Are Shutting Down Data Centers and Breaking the Internet,” Gizmodo, December 2022, https://gizmodo.com/heat-waves-climate-change-data-center-server-shut-down-1849916741. ↩︎

  24. Sarah Myers West, “Competition Authorities Need to Move Fast and Break up AI,” Financial Times, April 2023. “Without the robust enforcement of competition laws, generative AI could irreversibly cement Big Tech’s advantage, giving a handful of companies power over technology that mediates much of our lives.” ↩︎

  25. “GPT-3,” Wikipedia, April 2023, https://en.wikipedia.org/w/index.php?title=GPT-3&oldid=1147823352. ↩︎

  26. James Governor, “The Great Flowering: Why OpenAI Is the New AWS and the New Kingmakers Still Matter.” James Governor’s Monkchips, April 2023, https://redmonk.com/jgovernor/2023/04/13/the-great-flowering-why-openai-is-the-new-aws-and-the-new-kingmakers-still-matter/. ↩︎

  27. Emily M. Bender et al., “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21 (New York, NY, USA: Association for Computing Machinery, 2021), 610–23, https://doi.org/10.1145/3442188.3445922. ↩︎

  28. Karen Hao, “We Read the Paper That Forced Timnit Gebru Out of Google. Here’s What It Says.” MIT Technology Review, 2020. ↩︎

  29. Davey Alba and Julia Love, “Google’s Rush to Win in AI Led to Ethical Lapses, Employees Say,” bloomberg.com, April 2023, https://www.bloomberg.com/news/features/2023-04-19/google-bard-ai-chatbot-raises-ethical-concerns-from-employees, “Even after the public pronouncements, some found it difficult to work on ethical AI at Google. One former employee said they asked to work on fairness in machine learning and they were routinely discouraged — to the point that it affected their performance review. Managers protested that it was getting in the way of their ‘real work,’ the person said.” ↩︎

  30. Chirag Shah and Emily M. Bender, “Situating Search,” in ACM SIGIR Conference on Human Information Interaction and Retrieval, CHIIR ’22 (New York, NY, USA: Association for Computing Machinery, 2022), 221–32, https://doi.org/10.1145/3498366.3505816. ↩︎

  31. Simon Willison, “Thoughts and Impressions of AI-Assisted Search from Bing,” February 2023, http://simonwillison.net/2023/Feb/24/impressions-of-bing/. ↩︎

  32. “Microsoft’s New ChatGPT AI Starts Sending ‘Unhinged’ Messages to People,” The Independent, February 2023, https://www.independent.co.uk/tech/chatgpt-ai-messages-microsoft-bing-b2282491.html; Simon Willison, “Bing: ‘I Will Not Harm You Unless You Harm Me First’,” 2023, http://simonwillison.net/2023/Feb/15/bing/. ↩︎

  33. Dmitri Brereton, “Bing AI Can’t Be Trusted,” February 2023, https://dkb.blog/p/bing-ai-cant-be-trusted. ↩︎

  34. Nick Diakopoulos, “Can We Trust Search Engines with Generative AI? A Closer Look at Bing’s Accuracy for News Queries,” Medium, February 2023, https://medium.com/@ndiakopoulos/can-we-trust-search-engines-with-generative-ai-a-closer-look-at-bings-accuracy-for-news-queries-179467806bcc. ↩︎

  35. Zoë Schiffer, “Microsoft Just Laid Off One of Its Responsible AI Teams,” March 2023, https://www.platformer.news/p/microsoft-just-laid-off-one-of-its. ↩︎

  36. Tom Warren, “You Can Play with Microsoft’s Bing GPT-4 Chatbot Right Now, No Waitlist Necessary,” The Verge, March 2023, https://www.theverge.com/2023/3/15/23641683/microsoft-bing-ai-gpt-4-chatbot-available-no-waitlist. ↩︎

  37. Aaron Holmes and Kevin McLaughlin, “Ghost Writer: Microsoft Looks to Add OpenAI’s Chatbot Technology to Word, Email,” The Information, January 2023, https://www.theinformation.com/articles/ghost-writer-microsoft-looks-to-add-openais-chatbot-technology-to-word-email; Benj Edwards, “Microsoft Aims to Reduce ‘Tedious’ Business Tasks with New AI Tools,” Ars Technica, March 2023, https://arstechnica.com/information-technology/2023/03/microsoft-brings-chatgpt-style-ai-to-developer-and-analysis-tools/. ↩︎

  38. Gerrit De Vynck and Will Oremus, “As AI Booms, Tech Firms Are Laying Off Their Ethicists,” Washington Post, March 2023, https://www.washingtonpost.com/technology/2023/03/30/tech-companies-cut-ai-ethics/; Will Knight, “Elon Musk Has Fired Twitter’s ‘Ethical AI’ Team,” Wired, accessed April 27, 2023, https://www.wired.com/story/twitter-ethical-ai-team/. ↩︎

  39. Nico Grant and Karen Weise, “In A.I. Race, Microsoft and Google Choose Speed Over Caution,” The New York Times, April 2023, https://www.nytimes.com/2023/04/07/technology/ai-chatbots-google-microsoft.html. ↩︎


How to Compete with Patreon

jwz
Sibylla Bostoniensis:

It is incredibly frustrating that the only thing more stupid than Patreon is all the alleged Patreon substitutes that clearly don't even understand what Patreon does.

Pro tip: Patreon has no meaningful competitors, and also it sucks, so there's a huge opportunity for somebody to kick sand in its face and take its lunch money. But to do that you would have to understand what Patreon actually does that makes it worth it to creators to allow Patreon to take 5% of their proceeds (and then pass a second 5% on to them in payment processing fees).

Because I want there to be Patreon competitors, I will explain what Patreon actually does, so if somebody would like to actually compete with Patreon they will know what they have to actually accomplish.

Previously, previously.


Responsibly empowering developers with AI on MDN


Generative AI technologies powered by Large Language Models (LLMs), such as OpenAI’s ChatGPT, have shown themselves to be both a big boon to productivity and a concerningly confident purveyor of incorrect information. At Mozilla, we’re excited about the potential role generative AI can play in creating new value for people while demonstrating leadership in responsible and ethical implementation approaches.

One domain where we see high value is in training LLMs on reference documentation, enabling developers to more quickly find solutions or get answers about the purpose or behavior of code snippets. MDN’s mission is to provide a blueprint for a better internet and empower a new generation of developers and content creators to build it. We see a path forward to “overlay” useful AI-driven helpers on top of MDN’s canonical web dev documentation to aid developers in new ways. Of course, human-authored canonical documentation will always be available and clearly called out as such. To this end, last week, MDN launched AI integrations with its body of web developer reference documentation, manifesting in two features that have been in development over the last few months: AI Help and AI Explain, both powered by GPT-3.5. 

AI Help enables MDN readers to ask questions with a conversational interface, and the tool offers concise answers with MDN articles related to their questions for contextual help. AI Help is limited to offering information only based on MDN content, and is now in beta and available to logged-in MDN readers.
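
The usual way to keep answers “based only on MDN content” is a retrieval step in front of the model: embed the documentation, find the articles closest to the question, and hand them to GPT-3.5 as context. The sketch below shows that general pattern with the 2023-era openai Python client; the vector index and helper names are illustrative assumptions, not the actual AI Help implementation.

```python
# Illustrative retrieval-augmented answering over a documentation corpus.
# The vector index (`doc_index`) and its `nearest` method are hypothetical;
# this shows the general pattern, not MDN's actual implementation.
import openai


def embed(text: str) -> list[float]:
    """Turn text into an embedding vector via OpenAI's embeddings endpoint."""
    result = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return result["data"][0]["embedding"]


def answer_from_docs(question: str, doc_index) -> str:
    # 1. Retrieve the documentation articles most similar to the question.
    related = doc_index.nearest(embed(question), k=3)  # hypothetical index API
    context = "\n\n".join(article.text for article in related)

    # 2. Ask the model to answer using only the retrieved excerpts.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": "Answer using only the documentation excerpts provided. "
                           "If they don't cover the question, say so.",
            },
            {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
        ],
    )
    # The retrieved articles can be listed alongside the answer as related reading.
    return response["choices"][0]["message"]["content"]
```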

AI Explain enables readers to explore and understand code blocks and parts of code examples embedded in MDN documentation pages, describing the underlying purpose and behavior of the code. AI Explain is still an experimental tool.
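
At its core, a tool like this wraps the code block from the documentation page in a prompt and asks the model to describe it. A minimal sketch of that idea follows, again assuming the 2023-era openai Python client; the prompt wording and the CSS snippet are illustrative, not the actual AI Explain implementation. Note that nothing in this pipeline checks the explanation against what the code really does, which is how the grid mix-up described below can happen.

```python
# Illustrative "explain this code block" helper. The prompt wording and the
# example snippet are invented; nothing here verifies the explanation
# against the actual behavior of the code.
import openai


def explain_code(code_block: str, page_title: str) -> str:
    """Ask the model to describe the purpose and behavior of a code example."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": "Explain the purpose and behavior of the following "
                           "code example from a web documentation page. Be concise.",
            },
            {"role": "user", "content": f"Page: {page_title}\n\n{code_block}"},
        ],
    )
    return response["choices"][0]["message"]["content"]


# A grid that defines two rows and no explicit columns: the kind of snippet
# a model can confidently misdescribe as "two rows and two columns".
snippet = ".wrapper { display: grid; grid-template-rows: 1fr 1fr; }"
print(explain_code(snippet, "CSS grid layout"))
```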

Our internal work on these tools has us hugely optimistic about their potential to save developers tons of time as they seek answers and learning resources. Early-stage developers appear to benefit most, as they are least likely to know exactly where to go or which keywords to search for to find the answers they seek. As we’re still in the early stages, we’re also seeing instances where these AI tools provide incorrect information in response to a query. The MDN team is working to identify these cases and develop fixes so that we can continuously raise the quality and utility of the answers provided by AI Help and AI Explain. We are also planning to make it easier for people to flag bad answers, creating issues for the team to investigate. In the end, our goal is to make MDN more accessible and useful to a wider variety of developers, without diminishing MDN’s role as the canonical reference source for high-quality information about developing for the web.

Feedback Matters

With the launches of AI Help and AI Explain last week, we received a wide range of feedback from readers, from delight to constructive criticisms to concerns about the technical accuracy of the responses. We’re only a handful of days into the journey, but the data so far seems to indicate a sense of skepticism towards AI and LLMs in general, while those who have tried the features to find answers tend to be happy with the results.

For AI Help in particular, the feedback indicates that the majority of people who used this feature and voted consider the answers to be helpful. Please do try AI Help and give us your feedback so that we can enhance this service based on how it works for you!

In the case of AI Explain, the pattern of feedback we received was similar, but readers also pointed out a handful of concrete cases where an incorrect answer was rendered. This feedback is enormously helpful, and the MDN team is now investigating these bug reports. We’ve elected to be cautious in our approach and have temporarily removed the AI Explain tool from MDN until we’ve completed our investigation and have high-quality remediations in place for the issues that have been observed.

Here is an example of an incorrect answer: the AI indicates that the code defines a grid with two rows and two columns, when it actually only defines two rows.

AI Assistants: A Helpful Complement

One of the user experience challenges LLMs present is how to set customer expectations. As useful and efficient as it is to use LLMs to interact with reference documentation, even extraordinarily well-trained LLMs — like humans — will sometimes be wrong. We see this as an emerging challenge in the field of human-computer interaction that goes well beyond MDN’s limited use case: What should people do when chat-based systems render answers in good faith that are merely likely to be correct? One approach to responsible systems design could be to provide people with better ways to check answers and build confidence in their veracity. This is a field we’re excited to participate in.

Speaking of the human element, LLMs also have the useful attributes of having zero ego and infinite patience. LLMs don’t mind answering the same question over and over again, and they feel no compulsion to gate-keep for online communities. It’s sadly not uncommon today for learners to face the discouraging experience of asking technical questions in online forums only to be dismissed or shamed for their “dumb question.” When done well, there is a clear opportunity to employ AI-based tools to improve the pace of learning as well as inclusivity for learners.

The important question, and the core of this venture, is whether these AI assistants can improve our work and user experience. Do they help our users find information faster? Do they simplify complex concepts for them? Do they support Mozilla’s values and mission? If the answer is ‘yes’ to these questions, then they are fulfilling their purpose. We will work to convey this perspective clearly to everyone who interacts with our AI features.

The MDN Community

MDN has a long history as a human-authored and curated source of knowledge, so we know that AI integration in MDN will be a sensitive topic for some. While many are excited about the possibilities of generative AI, others might prefer that MDN stay how it is. That’s fine. We are a community, and differences of opinion are normal and healthy. MDN’s strength lies in our community. We do see a path forward that preserves the human-authored goodness while also providing tools that offer additional value over that amazing body of content. We need your input, criticism, kudos, and experience to ensure we’re employing AI in the most useful and responsible ways. Your feedback is critical to this process, and we will continue to take the feedback and adjust our plans in response to it. LLM technology remains relatively immature, so there will certainly be speed bumps along the way. This journey has just begun, and together, we’ll shape MDN’s future.

In the continued spirit of community and transparency, the MDN team will publish a postmortem blog post in the coming days that will include a breakdown of the feedback we’ve received and dive into some of the details of developing, launching, and pausing AI Explain.

