Research shows AI sucks at freelance work, news and real-life tasks: AI Eye
Mass unemployment from AI temporarily suspended

AI agents can't complete 97% of tasks on Upwork to even a basic standard.

Researchers at Scale AI and the Center for AI Safety got six different AI models to attempt 240 Upwork projects across categories including writing, design and data analysis, and then compared the results to the work of real freelancers.

The overwhelming majority of the time, the AI models were unable to complete the tasks successfully. The best model, Manus, completed just 2.5% of tasks and earned $1,810 of the $143,991 on offer, while Claude Sonnet and Grok 4 managed to finish 2.1% of the tasks.

While AI agents are good at simple, well-defined tasks like "generate a logo," the research found they are bad at multi-step workflows, taking initiative or using judgment. So they won't be causing mass unemployment for a while yet.

This backs up research from MIT in August, which found that 95% of organizations had seen zero return on the collective $30 billion they'd invested in AI.

Why humans still have the edge over AI

AIs are good at pattern matching and predicting words, but they're currently pretty bad at building internal models of the world, according to WorldTest from MIT and Basis Research.

For example, humans carry an internal model of their own kitchen in their minds, which lets them know where the knives are, judge how long the pot will take to boil, and plan a sequence of actions that results in a meal. The testing showed that three frontier reasoning AI models suck at this.

The researchers created 129 tasks across 43 interactive worlds (spot the difference, physics puzzles, etc.). The tasks required the AIs to predict hidden aspects of the world, plan sequences of actions to achieve a goal, and determine when the rules of the environment changed. They then tested 517 humans on the same problems.

The researchers concluded: "Our analysis reveals that humans achieve near-optimal scores while existing models frequently fail."

Humans perform better on these sorts of tasks because we intuitively understand environments, revise our beliefs in the face of new evidence, run experiments, start over from scratch when needed and explore strategically. And adding more compute doesn't always help, improving results in only 25 of the 43 environments.

AI gets the news wrong 45% of the time

Research from the BBC and the European Broadcasting Union found that ChatGPT, Copilot, Gemini and Perplexity also suck at reporting the news, failing against key criteria including accuracy, sourcing, distinguishing opinion from fact, and providing context.

45% of AI answers had at least one significant issue.
31% were sourced incorrectly.
20% were just plain wrong, with hallucinated details and outdated info.

Gemini was by far the worst, with significant issues in 76% of its responses.

AI cover letters get the wrong people hired

The primary purpose of a cover letter is to sort diligent applicants from low-effort ones.
Anyone who spends a day writing a good cover letter that responds to the job ad and shows knowledge of the company is likely to be diligent and motivated.

Unfortunately, new research on Freelancer.com suggests that AI-generated cover letters have completely compromised this signal, resulting in employers hiring fewer people, and often the wrong ones. Compared to the days before AI, skilled workers in the top quintile for ability are being hired 19% less often, and dumb bums in the lowest quintile are being hired 14% more often.

New robot looks like a lady

Chinese EV manufacturer XPeng has unveiled the XPeng Iron female robot, which bears a striking resemblance to a human. It has human-like spine movement, and its skin, stretched over soft 3D lattice structures, mimics the human body.

It's due to go into production early next year, but the company says it requires too much compute for use in the home, so it'll likely be used for commercial applications first, like introducing cars to customers at XPeng stores.

80% of ransomware attacks are made up

A new paper from MIT Sloan researchers and Safe Security makes the terrifying claim that 80% of ransomware attacks are AI-driven. "Rethinking the Cybersecurity Arms Race" examined 2,800 ransomware attacks and concluded that adversarial AI is now automating entire attack sequences, creating malware, phishing campaigns and deepfake phone calls for social engineering.

But other ransomware experts say the statistic sounds like 100% bullshit. Researcher Kevin Beaumont, who tracks ransomware attacks online, says generative AI isn't a major part of any of them.

"The paper is almost complete nonsense. It's jaw-droppingly bad. It's so bad it's difficult to know where to start."

The researchers list long-defunct ransomware such as Emotet and Conti as AI-powered and incorrectly classify IBM's DeepLocker, a proof of concept, as real-world malware.

"The paper was so absurd I burst out laughing," wrote researcher Marcus Hutchins.

David Sacks is worried about Orwellian AI

Crypto and AI czar David Sacks told the a16z Podcast he worries that the censorship we've seen on social media and search engines in recent years will become thoroughly dystopian with AI models.

"I almost feel like the term 'woke AI' is insufficient to explain what's going on because it somehow trivializes it," he says. "What we're really talking about is Orwellian AI. We're talking about AI that lies to you, that distorts an answer, that rewrites history in real time to serve a current political agenda of the people who are in power."

"To me, this is the biggest risk of AI… It's not The Terminator, it's 1984."

Third time lucky for Coke Christmas ad

Coke got roasted for its AI-generated Christmas commercial last year (itself a remake of a 1995 ad), so it remade the remake to highlight how much AI video generation has improved in the past year.

"Last year, people criticized the craftsmanship. But this year the craftsmanship is ten times better," Pratik Thakar, global vice president and head of generative AI at Coca-Cola, told The Hollywood Reporter.

Well, maybe 10% better.

The 60-second commercial was spliced together from two- to three-second AI-generated clips.
Five employees generated 70,000 clips in total, which means roughly 2,000 clips were generated for every clip that made it into the ad. But it only took a month to put together, compared to the year-long production of Coke's live-action commercials.

A survey by Attest found that around 46% of consumers in the US, UK and Australia don't like AI-generated imagery in ads.

Google's Project Suncatcher and other Google AI news

There's not enough electricity available for the planned expansion of AI here on Earth, so Google has come up with the innovative idea of launching data centers into space.

The search engine giant has unveiled Project Suncatcher, a vision for fleets of satellites equipped with solar arrays that benefit from near-constant sunlight. Described as a moonshot to one day scale machine learning in space, the project will launch two prototype satellites in early 2027, fitted with the custom AI chips Google uses in its ground-based data centers.

Google CEO Sundar Pichai says the Gemini app now has more than 650 million monthly active users. That's up from around 350 million users in March and 90 million last October.

The company was forced to pull its Gemma AI model from AI Studio after the bot claimed Senator Marsha Blackburn had pressured a state trooper for prescription drugs and engaged in non-consensual behavior. In a letter to Pichai, Blackburn said it was not a harmless hallucination but defamation.