Claude 3.5 suggests AI’s imminent ubiquity could be a good thing

The frontier of AI has just moved a little further. On Friday, Anthropic, the artificial intelligence lab created by a team of disgruntled former OpenAI employees, released the latest version of its Claude LLM. From Bloomberg:

The company said Thursday that the new model – the technology underpinning its popular Claude chatbot – is twice as fast as its previous, more powerful version. Anthropic said in its evaluations that the model outperforms leading competitors like OpenAI in several key intelligence capabilities, such as coding and text-based reasoning.

Anthropic only released the previous version of Claude, 3.0, in March. The latest model is dubbed 3.5 and currently exists only in the company’s mid-sized “Sonnet” tier. Its faster, cheaper, dumber “Haiku” version is coming soon, as is its slower, more expensive but more capable “Opus”.

But even before Opus arrives, Anthropic says it has the best AI on the market. In a series of head-to-head comparisons posted on its blog, 3.5 Sonnet outperformed OpenAI’s latest model, GPT-4o, on tasks including math tests, text comprehension and college-level knowledge. It wasn’t a clean sweep, as GPT-4o kept the lead on some benchmarks, but it was enough to justify the company’s claim to be at the frontier of what’s possible.

In more qualitative terms, the new AI also seems like a step forward. Anthropic says:

It shows marked improvement in grasping nuance, humor, and complex instructions, and is exceptional at writing high-quality content with a natural, relatable tone.

Anthropic is marking its own homework there, but the description matches the changes I’ve noticed. Whatever the technical benchmarks say, a conversation with the latest version of Claude is more enjoyable than with any other AI system I’ve used so far.

The company isn’t selling the upgrade on raw power alone, though. Instead, in a move favored by underdog competitors everywhere, Anthropic is competing on both cost and capability. Claude 3.5 is not only smarter than the old state of the art, the company claims; it’s also cheaper.

For consumers, the chatbot market is settling into a “freemium” model: for free, you get limited access to a (sometimes second-tier) chatbot, while a monthly subscription buys the best models and higher or unlimited usage. For businesses, however, there is a stricter pricing structure, charged per token of both input and output – the questions asked and the answers generated – and Anthropic has undercut OpenAI on input costs while matching it on output. It is also five times cheaper than Anthropic’s own previous best model.
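
As a rough illustration of what that per-token pricing means in practice, here is a minimal sketch. The rates below are assumptions based on Anthropic’s published launch pricing for Claude 3.5 Sonnet (around $3 per million input tokens and $15 per million output tokens) and may have changed since, so treat the numbers as indicative rather than authoritative.

```python
# A minimal sketch of how per-token API pricing adds up.
# The rates are assumptions based on Claude 3.5 Sonnet's launch pricing
# (~$3 per million input tokens, ~$15 per million output tokens);
# check the current pricing page before relying on them.

ASSUMED_PRICE_PER_MILLION = {
    "input": 3.00,   # dollars per million tokens sent to the model
    "output": 15.00, # dollars per million tokens the model generates
}

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a single API call."""
    cost = (
        input_tokens / 1_000_000 * ASSUMED_PRICE_PER_MILLION["input"]
        + output_tokens / 1_000_000 * ASSUMED_PRICE_PER_MILLION["output"]
    )
    return round(cost, 6)

# Example: a 2,000-token prompt producing a 500-token reply
# costs roughly $0.0135 under these assumed rates.
print(estimate_cost(2_000, 500))
```

At that scale, the cost of an individual query is trivial; it only becomes a business decision when multiplied across millions of calls, which is why a price cut matters more to companies building on the API than to individual chatbot users.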

If you don’t like seeing AI chatbots pop up in ever more places, this is probably bad news for you: it is getting cheaper to build a product on top of a provider like Anthropic, and more companies will do so as prices fall. The good news is that each update also makes those products more capable.

Taking a step back, the past year of AI progress has been a strange one. After the jump in capability brought by GPT-4 last spring, the frontier has moved forward in fits and starts: Claude 3 and 3.5 and GPT-4o have all represented definite improvements, but none has been the great leap forward that the AI community keeps implying is still to come.

At the same time, the fact that there is any improvement at all should be encouraging. That meaningful gains can be made without simply spending vast sums on entirely new training runs suggests that some of the mystery about how these systems actually work is being dispelled, and that AI development is shifting from an art to a science. That, in turn, should mean the products of the truly enormous training runs – which are surely under way – can be turned into useful and safe tools sooner rather than later.

Safety, made in Britain

Rishi Sunak speaks on the second day of the UK Artificial Intelligence (AI) Safety Summit at Bletchley Park in November. Photograph: Toby Melville/AP

There is a coda to the Claude 3.5 release: its safety has been vetted by the UK government. Anthropic says:

As part of our commitment to safety and transparency, we have engaged with external experts to test and refine the safety mechanisms within this latest model. We recently provided Claude 3.5 Sonnet to the UK’s Artificial Intelligence Safety Institute (UK AISI) for pre-deployment safety evaluation. The UK AISI completed tests of 3.5 Sonnet and shared their results with the US AI Safety Institute (US AISI) as part of a Memorandum of Understanding, made possible by the partnership between the US and UK AISIs announced earlier this year.

As with the Bletchley and Seoul AI summits, the UK government has managed to turn what could have been a technophile quirk of Rishi Sunak’s into something that looks lasting and successful. That the public-sector AI Safety Institute is world-leading enough for the US government to outsource its own work to it is genuinely something to be proud of.

The next question, of course, is what leverage it actually has. It’s easy to hand over an AI model for testing when the company involved believes it will pass with flying colors; the harder question is whether the AISI can change what AI labs do, rather than simply prodding them and seeing what happens.

The EU can’t fire us: we quit

Margrethe Vestager gives a press conference on the EU’s antitrust case against Apple’s App Store in Brussels, Belgium, on March 4, 2024. Photograph: Olivier Hoslet/EPA

Apple’s war with the EU is heating up. On Friday, the company confirmed it would not ship a number of new features to EU users, citing “regulatory uncertainties caused by the Digital Markets Act (DMA)”. From its statement:


We do not believe we will be able to roll out three of these features (iPhone Mirroring, SharePlay Screen Sharing improvements, and Apple Intelligence) to our EU users this year.

Specifically, we are concerned that the DMA’s interoperability requirements may force us to compromise the integrity of our products in ways that put user privacy and data security at risk. We are committed to collaborating with the European Commission in an attempt to find a solution that allows us to offer these features to our EU customers without compromising their security.

It’s a Rorschach test of a statement. If you believe that EU regulation is authoritarian, protectionist and incoherent, then Apple is taking the only sensible step, limiting its product launches to the least controversial features to avoid the possibility of a multimillion-dollar fine.

If, on the other hand, you think Apple’s response to the EU has been one of malicious compliance and outrage at the idea of an authority more legitimate than its own, then this is just another attempt to dissuade other governments from following in the bloc’s footsteps.

The EU, it seems, will not be deterred. On Monday, it announced plans to charge Apple with breaching the rules:

In preliminary findings, to which Apple can respond, the European Commission said it believed the company’s steering rules did not comply with the Digital Markets Act (DMA) “as they prevent app developers from freely steering consumers to alternative channels for offers and content”.

In addition, the commission has opened a new non-compliance procedure against Apple over fears that its new contractual terms for third-party app developers will also not comply with the requirements of the DMA.

For the EU, the principle is clear: if a European customer wants to do business with a European company, it should not be within the power of a third country, company or person to prevent that market from operating. In a way, it’s about as close as you can get to the bloc’s founding ideal.

But that isn’t precisely what the DMA says either, hence the conflict. Apple wants to follow the letter of the law while retaining as much control as possible over its platforms; the EU wants to interpret the same law to allow the greatest possible freedom for frictionless trade. I don’t know which interpretation will win this time, but I’m confident in predicting that the appeals have only just begun.

Broader technology landscape

Mariah Carey performs in November 2023. Photograph: Kevin Mazur/WireImage for MC

AI song generators Suno and Udio have been sued for copyright infringement by record labels, after they spat out music disturbingly similar to real, existing songs by artists such as Chuck Berry and Mariah Carey (pictured above). 404 Media has some of the offending output to listen to, and it doesn’t sound good for the defendants.

In June 2014, Flickr released a dataset of 100 million photographs for researchers to work with. A decade later, its charitable foundation asks: how did it shape what was, and wasn’t, possible in the last decade of AI?

If you want more Alex Hern in your life, I filled in for John Naughton this weekend, writing about the archeology of defunct social media.

If you’re of a certain age, the question-and-answer lines in text messages will evoke instant nostalgia for a particular era of the late 2000s. This retrospective brings it all back.

“Emotional” AI is not there yet.
