
QueenofFoo

Tl;dr, if you want to use this in your work "today": most of the videos in the article are about making a new app. One exception: the first video clones an existing codebase and sets up a dev environment. They don't make any code changes in this video, so it isn't demonstrating any coding work; it just clones the repo, installs dependencies, and runs the app in a local container. However, the AI lacked context about what the app did other than a readme, which is why it struggled to log into the app even after the user issued prompts. I think this is also why it didn't make code changes. The SWE testing (not in the article but discussed on X) was primarily single-file changes. This tracks.

Key takeaways:

- Agentic models *today* work well for fast prototyping of new apps, but struggle with existing ones because the context required to understand an existing app doesn't exist in the codebase.
- Interesting practical application of chatbots in setting up dev environments. I like this a lot.
- *Today*, Copilot-style models will still work best for modifying existing apps, because the context can be better specified at a lower cost versus agentic models.
- 14 minutes to tell the AI how to log in with a username and password to an app running in a local container, at $100/hour in AI API token fees? $24, probably too spendy for most of us to replicate at home.
- Delegating your dev env to an AI? That is a personal choice.


goomyman

It’s almost like coding is just syntax and not the actual job of a developer. It doesn’t work because it doesn’t have the context. Of course it doesn’t, because the context is why they pay you; the syntax is just how you get the job done. AI isn’t sitting in meetings discussing requirements, and even if it has requirements, it doesn’t know all the edge cases of users. We don’t sit around coding all day. If anything, actual coding is at best 30% of your work, so a perfect coding AI would remove maybe 75% of that — er, you still need to review and test it, unless you go with a 100% trust model lol. So no, coding jobs aren’t going away, but AI is a very useful tool that can improve efficiency by a lot.


breadcodes

I've been saying this forever. The only people who are getting replaced are people working for other people who think writing code is the job. They have no idea how many meetings go into changes, or how expensive it would be to pay for a model to digest enough tokens for meetings + the entire code base + context of older meetings to not regress functionality + QA feedback + testing + git history on a decade-old project + the codebases of other parts of the internal project + external teams' APIs / their codebases + email histories for clarification on conditions.

I really just don't think we have enough affordable computational resources (now or even in the future; we're approaching a limit of scale) for even 5% of developers to be replaced, even if we somehow had a Devin model that accepted unlimited tokens for meeting context and was perfect at its job. It would cost more to run than a junior developer at that point.

I'm sure these are great for fresh new projects, but an overwhelming majority of the world runs on 5-to-40-year-old software. Even the examples were making a new app to play chess, one of the best-documented games in the world, with only 2 features.


goomyman

Exactly. AI already today makes it easier for me to write simple scripts or fixes in languages I’m not familiar with; nothing I couldn’t look up. We run AI in code reviews, and it’s great for suggesting refactorings and catching bugs. It’s very useful, but not as a replacement.

I don’t think it’s even that great for a new project. We’ve had templates that auto-generate entire sites for decades. But I still 100% need to know what I’m doing, which means you still need to know how to code. It is groundbreaking and does allow you to do more with less, but I wouldn’t say it’s going to allow you to lay off that many people. There is always deadweight at companies; those layoffs don’t mean AI is replacing them, it’s that they weren’t producing enough value to begin with.


vytah

> Exactly, AI already today makes it easier for me to write simple scripts or fixes in languages I’m not familiar with. Nothing I can’t look up.

That's why many say that the only programming-related thing AI has automated away is StackOverflow.


SaucyEdwin

Out of curiosity, what AI do you use for code reviews? That seems like a cool idea.


Randommaggy

The peak net positive productivity I've been able to achieve with AI tools so far is approximately 5% for chore type tasks.


QueenofFoo

This whole thing makes me think about my own experience using AI. I have some language-specific libraries I need to translate to work with other, unsupported languages. Complex stuff, but I have good examples and loads of test cases. I am thinking about experimenting with agentic models to see how far they can get, since it seems to meet the right conditions:

1) A blank slate
2) A well-documented objective
3) Good code examples to pull from

I doubt it will work; the libraries are pretty complex. I won’t use Devin for this, since their bells and whistles won’t matter for this job. But this whole thing has prompted me to try it and see for myself where the boundary between statistical modeling and the human software development process is.

I guess this raises the question: what is the Devin moat? I am not a wunderkind programmer. Loops and retries? We all know how to do this :-)


Richandler

> “making a new app”

A bit loose and fast with that phrase. None of those apps were novel or new. They're copies of existing apps that are littered all over the internet.


QueenofFoo

Correct, I should have been more clear. “New simple apps”.


Mana_Mori

> Interesting practical application of chat bots in setting up dev environments. I like this a lot.

I just feel like this is better solved elsewhere, like with DevContainers or other potential solutions. You'll still have trouble with any non-trivial setup, like cross-compilation, some C++ projects, etc.


Realistic-Minute5016

Interestingly enough, they haven't submitted (or at least it hasn't been published) their claimed SWE-bench score. I guess it's just not published yet? [https://www.cognition-labs.com/post/swe-bench-technical-report](https://www.cognition-labs.com/post/swe-bench-technical-report)


Bolanus_PSU

It's also very possible that Devin was trained on parts of swebench either intentionally or otherwise.


SketchySeaBeast

We're definitely at the "Goodhart's Law" era of LLMs.


gdahlm

They do call that out, but the quality of what passes this benchmark isn't the quality I would expect from anyone calling themselves a 'software engineer': https://github.com/CognitionAI/devin-swebench-results/blob/main/output_diffs/pass/sympy__sympy-15542-diff.txt


SketchySeaBeast

Enthusiasts continue to be enthused.


starlevel01

> プロ驚き屋 ("a professional surprised man", New Slang) = a person who excitedly shares state-of-the-art tools/technologies like ChatGPT on social media with hyperbole like 神/最強/ヤバすぎ, as well as with hallucination/overstatement at times based on a few cherry-picked examples https://twitter.com/takashionary/status/1645371251236503553?lang=en-GB


agr5179

I stopped reading after the first guy talked about how amazed he was that it was able to clone a git repo and run some basic commands from the readme to start up an app. Seriously, how is that impressive at all?


Wiremeyourmoney

You clearly didn’t know me when I started coding.


[deleted]

Tfw running `git status` made me feel like a 1337 h4x0r when I was a beginner.

The early stages of learning programming for the first time really are magical. It's like a whole new world is being revealed to you. I am now learning functional programming for the first time (OCaml) and I am getting a similar feeling.


Brilliant-Job-47

FP changed how I solve problems. It’s one of the most influential things I ever learned


Felinski

FP?


Edgar_A_Poe

Furry Porn


therealdan0

It really do change the way you think


Whatamianoob112

Functional programming


favgotchunks

Floating Point

Edit: now that’s a cursed language paradigm


Wigginns

What resources did you use to learn it?


Brilliant-Job-47

A college course on Haskell exposed me to it, but tbh I didn’t fully understand what I was learning at the time. It was years later, when I worked with some folks who were into functional programming, that I took another look and just consumed any materials I could find on it. If you learn how to leverage map, filter, and reduce really well, you are 70% of the way there.
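For anyone wondering what that advice looks like in practice, here's a minimal sketch in Python (using `functools.reduce`) of a map/filter/reduce pipeline — summing the squares of the even numbers in a list. The numbers are just an illustration:

```python
from functools import reduce

nums = [1, 2, 3, 4, 5, 6]

# filter: keep only the even numbers -> 2, 4, 6
evens = filter(lambda n: n % 2 == 0, nums)
# map: square each one -> 4, 16, 36
squares = map(lambda n: n * n, evens)
# reduce: fold the squares into a single sum, starting from 0
total = reduce(lambda acc, n: acc + n, squares, 0)

print(total)  # 4 + 16 + 36 = 56
```

The same shape carries over to almost any language with first-class functions; in OCaml it would be `List.filter`, `List.map`, and `List.fold_left`.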


Long-Marketing-5895

CI/CD has left the chat


Migu3l012

We should start writing the command `sudo rm -rf /` at the end of every README to make the repository AI-proof


Nearby-Technician767

Right... so AI doing Backstage? I would be more impressed if AI were able to debug the countless out-of-date READMEs and fix them than I am by it following instructions. Following a good readme isn't hard.


ViveIn

What system could do this autonomously before?


CampAny9995

I’d be really surprised if GPT-4 wasn’t able to spit out those commands for you or take in the README as context in a prompt and generate some commands. I would bet that Copilot has already had prototypes of these features and MS has reason to believe the technology isn’t (currently) there to handle serious enterprise workflows. Right now this just looks like a toy for hobbyists and 15 year olds who tweet about programming.


Hedede

Yep, I just tried this. ChatGPT printed out the correct commands.


Keavon

And it probably didn't take 15 minutes.


Joshiane

Lol yeah, he was impressed by npm install


redfournine

Also, this requires that the project has a perfect readme. Most don't even have a proper readme. Worse, some projects' readmes are just plain wrong due to being outdated. And it also requires that the dev machine is clean af.


boofaceleemz

Hey, better than me when I was an intern just starting out.


currentscurrents

Before LLMs, I'd have told you that we were 100 years away from computers following unstructured instructions from a readme. Now you dismiss them as "just following the readme".


Proper_Mistake6220

> Now you dismiss them as "just following the readme".

Since most readmes are wrong or outdated, it would be a big mistake to follow the instructions contained inside.


mcr1974

I am still impressed by that; there was nothing quite like it up to a couple of years ago. We quickly get used to it and expect more. I think it's the fact that it wasn't explicitly programmed to do that that's impressive. You could code that deterministically, but it might become quite nuanced.


throwaway490215

You clearly put in significant effort. I dropped it and checked these comments after:

> It’s been almost three days since Devin AI’s announcement and developers worldwide have eagerly awaited early access to the software engineering tool.


tedbarney12

True, I think it's nothing but hype. And if they have cracked something very amazing (which I don't believe), then why are they not giving access? They will probably launch after some months, by which time people will have forgotten about it.


Realistic-Minute5016

Likely because what they are doing is extremely expensive and possibly full of some pretty gnarly potential security holes, since it seems to be actually executing code in the background (something no other LLM does, for good reason). Their initial site basically had an open S3 bucket relay.


poralexc

My juniors use ChatGPT, and it's cool if it helps them, but I still have to constantly tell them to use existing helper methods (or even basic library functions) instead of crudely reinventing things already in our codebase. It kind of reminds me of this sci-fi novel where intelligent but non-sentient aliens see our attempts at communication as an attack, then try to waste our time and resources by inundating us with correct looking but meaningless transmissions.


hahdbdidndkdi

Your company allows chatgpt usage for development? I've worked at two since it came out, and both explicitly forbid it due to legal/privacy concerns.


LeThales

I've worked for a bank in the past. We could not access GitHub, we could not install apps without approval; there was extremely strict control over information going in and out of the bank VPN. GPT was allowed, and we had a couple of team-wide communications informing us that yet again someone had posted AWS tokens into ChatGPT, and to "please" not share code or sensitive information there. It was simply more efficient to allow users to use GPT and develop a whole myriad of security tools to monitor what was being written there than to disable it.


hahdbdidndkdi

Interesting that a bank won't allow GitHub but at the same time will give devs free rein to generate code with ChatGPT.


tedbarney12

I am done with these shitty LLMs. It's like they are launching a new tool/LLM daily.


Ok_Construction6425

Pumping out new LLMs is the new pumping out JavaScript frameworks


smallballsputin

Like blockchain a few years back. AI is more interesting, but it won't live up to the hype.


LetsGoHawks

My company put a LOT of money into blockchain. A few apps made it to production, a few moved to SQL databases along the way, and most were abandoned when management finally figured out blockchain wasn't magic.


calinet6

What a bunch of idiots. Literally just chasing after shiny objects like crows looking for coins.


stumblinbear

Sounds like my company rn


calinet6

Oh I wasn’t being sarcastic. I think it’s an accurate depiction.


stumblinbear

Yeah I know! We hired a few people for it. I don't envy them but I am curious to see if it goes anywhere


ElasticFluffyMagnet

They are also too scared to be missing out on some technology.


calinet6

Fear is absolutely the driving factor.


Randommaggy

I'd say 50/50 fear/blind greed for a lot of companies.


sonobanana33

Well once I understood how it worked I saw that there wasn't anything really useful to it. Like, anyone could have come up with that design, given the constraints… but the idea of mining is terrible, which is why nobody was using it before. It needed some deranged libertarian to think it was a good idea, not a computer scientist.


FartPiano

this is how i feel about every mainstream large tech company now


calinet6

When things were good and interest rates were low and Wall Street didn’t literally threaten boards with takeovers, they let people who knew what they were doing take at least a couple of major divisions and do good with them. Now the CEOs and COOs (there is no CPO, at least not a real one with real product leadership experience) have to tighten their grip and exert control because they’re fearing for their jobs and multi-million-dollar salaries and stock plans. Suddenly they’re experts in making snap product decisions, setting deadlines, and assigning leadership responsibilities to inexperienced but assertive young male pawns who think you can project-manage your way into a successful product (you can’t). And they’re going to destroy their own companies and tear them apart, department by department and competent person by competent person, by trying too hard to keep them barely floating. They’re so fucked.


bunk3rk1ng

> When things were good and interest rates were low and Wall Street didn’t literally threaten boards with takeovers they let people who knew what they were doing take at least a couple major divisions and do good with them. This is pure revisionist history fantasy masturbation. Low interest rates led to a lot of unchecked projects exactly like blockchain and NFTs because money was easy to come by. Money was absolutely going to people who had NO IDEA what they were doing. Exactly the opposite of what you are claiming. You got the premise right but the conclusion is completely backwards and has no basis in reality.


Ffdmatt

Maybe... maybe they'll go away?


flyhull

Now that we understand how this works, let's be smarter devs this time. Start with an old blockchain project that never made it to prod, replace the blockchain libraries with AI libraries, refactor until it compiles and voila, you have an equally functional and up to date AI app.


Shorttail0

The difference is that shiny objects are useful to crows (only they understand)


nerd4code

I’d wager crows get more fulfilment from their shiny things.


TurtleSandwich0

Crows eventually make money.


WorksForMe

What kind of blockchain apps were you building? "We need blockchain" was a common thing shouted from the execs a few years ago, but they had no idea what they wanted it for. Same with AI now, where I can see a few more practical possibilities with our existing product base, but still nothing that would be a game-changer for us.


teratron27

My company did something similar (on a smaller scale) with AI. We needed a basic product classifier (name and description go in, a predefined category comes out). The team that got the project had a few engineers who wanted a reason to use "AI", so they built an integration with the OpenAI API, except it would just hallucinate absolute shit. So another team had to take over, rip it out, and use an existing off-the-shelf classifier. Wasted months of time for nothing.


gymbeaux4

I’m surprised not all of them ended up as SQL databases. If you don’t have to distribute the ledger, just make it a read only SQL database…


Hazy311

Motherfuckers always looking for a solution to a problem that doesn't exist. I work at a very large bank in America, and they use these dumbass buzzwords all of the time to mystify even dumber people while having no idea of how the shit's going to benefit a customer. Blockchain was a big buzzword they prattled off 6 years ago when they visited my Uni, and then years later when I came on they then wanted an AI classifier for security products. We built it, showed it to them, and they did absolutely jack shit with it because 76% accuracy across 25 fucking classifiers wasn't perfect enough. This was all pre-LLM AI. Management just uses the cutting edge to try to sound smart and maintain a job. Can't convince me otherwise at this point.


gymbeaux4

I used to work for a logistics company that really wanted to use blockchain and "AI". We talked them out of blockchain but they really wanted AI so we fucked around with python and pytorch and tensorflow and got something that was around 80% accurate (best-case). Whole team was canned last year. 100% is not only impossible, but a sign of overfitting...


vytah

What kind of AI does a logistics company (think they) need?


gymbeaux4

Logistics meaning customs, primarily. Importing goods into a country is a pain in the ass, and a particular pain point is the accurate classification of those goods. The idea is that AI would be able to figure out that some garbage product description == tariff number 0101.29.10.90, for example.
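To make the task concrete, here's a toy sketch of that description-to-tariff-number mapping. Only the tariff number 0101.29.10.90 comes from the comment above; the catalog entries and the string-similarity matcher are invented for illustration and are nothing like the pytorch/tensorflow models such a team would actually build:

```python
import difflib

# Toy sketch: map a messy product description to the known catalog
# description it most resembles, and reuse that entry's tariff number.
# The catalog below is hypothetical (only 0101.29.10.90 is from the
# comment); real customs classification is far harder than this.
CATALOG = {
    "live horses other than purebred": "0101.29.10.90",
    "live purebred breeding horses": "0101.21.00.10",
    "fresh cut roses": "0603.11.00.00",
}

def classify(description: str) -> str:
    """Return the tariff number of the closest-matching catalog entry."""
    match = difflib.get_close_matches(
        description.lower(), CATALOG.keys(), n=1, cutoff=0.0
    )
    return CATALOG[match[0]]

print(classify("live horses other than purebred"))  # prints 0101.29.10.90
```

The trap is exactly the one raised below: a matcher like this always returns *something*, so a garbage description still gets a confident-looking tariff number.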


vytah

Oh boy, using a dumb language model to deal with the customs office, what could possibly go wrong.


gymbeaux4

Yeah the obvious problem is that when the model is wrong, the client may be fined for paying a lower tariff than they should have paid. Even for a model with 95% accuracy this is probably unacceptable.


buttplugs4life4me

My company has invested heavily into NFTs....yeah...


sonobanana33

reddit?


DirectorBusiness5512

Let me guess: non-technical management?


RICHUNCLEPENNYBAGS

The thing I read that made the most sense to me was that "blockchain" became enough of a buzzword that you could talk your boss into undertaking whatever modernization of an ancient service you wanted so a lot of people forced it in so they could finally ditch their legacy monstrosities.


sexytokeburgerz

A lot of c level types just love buying their teslas and flamethrowers and telling all their friends they are working on the newest thing. House painted black, motorcycle they barely ride, and “NFTs are the future”. Casamigos. Tell me I’m not describing someone you know.


halfanothersdozen

I think this is more like the dotcom bubble circa 99. Some of them will make it big revolutionizing how we are going to do things. A lot of them will wind up as pets.com


drcforbin

Pets.com was just temporally misplaced. The world just wasn't ready for an online store. The supply chain/logistics needed to cheaply move goods all over the US in just a day or two didn't yet exist, plus they had to spend a lot of money convincing people it was possible to shop online


Sisaroth

This is what impressed me the most about that interview with Jeff Bezos in the 90s. Like many people, he knew online shopping would be the future, but he also knew it was too early to go all in with it, so he went with what he thought was the best use case for an online store: a book store.


CanvasFanatic

I don’t think it’s _as useless_ as blockchain (because that bar is buried about a mile down in the planet’s crust). The hype is just way out ahead of the capability.


ward2k

Yeah, so far it's been *ok* at some programming basics and boilerplate stuff from what I've tried.

But even then, I have to double-check everything to make sure it hasn't hallucinated anything weird, so I'm not actually sure it's been saving any time over using already-existing boilerplate generation.

I guess eventually, once it's more refined, it might be fine for this purpose, but I've got no idea what people are smoking when they suggest programmers will be out of a job in 3 years.


Veggies-are-okay

Yeah, it’s at a level where it’s augmenting work rather than automating it. I think there needs to be an expectation that everyone involved in genAI projects needs to at least be using ChatGPT/Copilot in their workflow before making any sort of statement about the future capabilities of genAI.


Asyncrosaurus

> I don’t think it’s as useless as blockchain

That really downplays the disparity between the usefulness of LLMs and the uselessness of blockchain. I really dislike how the backlash is always disproportionate to the hype. LLMs have a bunch of valid use cases, but now everyone is pretending there are none because of all the stupid hype.


CanvasFanatic

I was explicitly saying it was not useless. Did I not express enough enthusiasm?


Asyncrosaurus

> Did I not express enough enthusiasm?

No, the AI police have been notified, and they are on the way to deliver you to a re-education camp.


imnotbis

How many do they have that aren't forms of generating spam?


MammothDeparture36

I'm usually a calm man but the word "blockchain" triggers me into instant rage.


smallballsputin

You are not alone my friend


pclover_dot_exe

Except that AI apps are already useful. Unlike the blockchain apps which are mostly useless


BaNyaaNyaa

I'd compare the LLM hype to the neural network hype from 8 years ago. Both are AI, and both had that AGI "hype-fear" going on. Realistically, both are useful up to a certain point. But they're both very overhyped.


Daz_Didge

Yeah, so many shitty ChatGPT wrappers that give almost no value. If you want to develop something useful, you need to be domain-specific, with domain knowledge. A good coding automation will be part of an IDE.


srona22

It's like the JavaScript frameworks, but with "angel investors" behind them trying to lure in more investors.


[deleted]

Theoretical physics vs applied physics. I'm with the theory people on AI. I'm just not interested in the current applications.


CanvasFanatic

Honestly tired of explaining why this is useless. CTOs and would-be entrepreneurs: I invite you to turn this thing loose on your codebases and let us know how that goes.


MisterFatt

But it can make a calculator app


Specialist_Brain841

tell that to the ipad team at apple


Me_Beben

So we're safe until it can do todo apps.


thetreat

Yeah. I challenge some Tech CEO to fire their entire SWE staff and either have your product managers use this or hire prompt engineers to do it. If they’re in any sort of competitive software tech market, they’ll be bankrupt in a few years, depending on how entrenched they are.


eldojk

Years? In mere days things will be on fire 🤣


buttplugs4life4me

A few developers at my work have been begging to use one of these, but legal has copyright concerns. The same set of developers recently complained that if they can't deploy to production, they can't test their code.


Apprehensive_Bar6609

I've seen so many posts on how it achieves X% on productivity. I'm curious: how do you measure productivity?

I'll give you an anecdotal example. I have a team member who is simply a front-end magician. His creative process is to disappear for a week, then spend three weeks procrastinating, and then all of a sudden work for 48 hours straight and make something jaw-droppingly amazing.

Don't know about you guys, but much of my time is spent searching for needles in a haystack, debugging terabytes of logs, debating for weeks over how to correctly write specifications, or even bikeshedding (that happens). When we actually get to write code, that's the easy part, so even if that part is 90% faster... we don't get a 90% productivity increase.

I'm a CTO, but one of the active coders. I've tried all kinds of LLMs because I'm lazy and want them to write the bloat code, and even that drove me crazy; I just got annoyed and don't use them anymore. Some of my team members just love LLMs, complete acolytes or followers of an LLM religion, but their productivity hasn't increased as far as I can tell. So all this "productivity increase" bullshit is just too annoying to hear.

I would love to see the math on cost vs revenue for a team with AI and one without on a major corporate application. If you measure a person doing task A against another one doing the same task with AI, yeah, the one with AI might be faster, sure. But measure all your FTEs' time: your developer saved 10 minutes on that function? Great, so he has 10 more minutes to go on YouTube and watch one more video about AI. In the grand scheme of things, the FTE productivity is the same for the project owner, just with the extra cost of ChatGPT-4 or Claude or Devin or whatever is the fashionable LLM of the day.

For me and my projects it has had zero impact. In fact it has had a negative impact, as all the stakeholders are dumb as fuck, and no matter how you try to explain that AI is not magic, they still want to put an LLM in everything, and we waste time on POCs that don't go anywhere because after the fun is over they simply aren't feasible.


[deleted]

[deleted]


george-silva

Tests. It writes very good test suites for small functions. I ended up delegating all my test writing to GitHub Copilot. Of course I have to change it a bit, but it is awesome at understanding inputs and outputs. Saves me a solid 4 hours.
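A sketch of the kind of small, pure function where this works well: clear inputs and outputs, no external state. The `slugify` function and its tests here are hypothetical examples of mine, not from the comment, but the generated suites are mostly input/output pairs in this shape:

```python
# A small pure function: the sweet spot for AI-written tests.
def slugify(title: str) -> str:
    """Lowercase a title and join its words with hyphens."""
    return "-".join(title.lower().split())

# A Copilot-style test suite is mostly input/output pairs like these:
def test_slugify():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  extra   spaces  ") == "extra-spaces"
    assert slugify("") == ""

test_slugify()
print("all tests passed")
```

Once the function touches a database or the network, the generated tests get much shakier, which is where the "of course I have to change it a bit" comes in.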


minegen88

How about these people actually test it? Why are they being so nice to it?

"Devin, make me a simple tic-tac-toe game." Oh no... my job is gone now...

Give it some real tasks...


Otis_Inf

Because the article is about a series of tweets from blue ticks on Twitter who peddle as much hype as possible for post engagement.


Davidrabbich81

TL;DR: some surprise at success on complex tasks, some surprise at it falling over. It took 19 minutes to complete one task but instantly finished others. This thing is in early access, though. The article ends with lots of rumination over "will it replace us all". It's definitely not going away, but it's not taking anyone's job just yet. Like ChatGPT, you can use it to build a base, but you'd better be sure you understand what it's produced and how to change it. That's how you're keeping your job in 5 years. I do worry about its impact on junior devs. I don't know what will happen to them.


Red-ua

Thing is, no one hires juniors for their junior skills. We hire them for their potential to turn into mid/senior/staff+ level devs who have context on the codebase.


MisterFatt

Right. It’s a great market for hiring right now. What’s it going to look like when job postings for senior+ positions start getting only a handful of applicants asking for a fuck ton of money again? It’s gonna look really smart to have internal talent that can fill your gaps for a lower price than an outside hire


[deleted]

[deleted]


Obie-two

I worry that entry-level devs will use these tools and not actually understand what they are doing, because the tool just gives them the answer (or an answer) and skips all those learning opportunities. I could see junior devs coming out of it with stronger skills in "prompt engineering" instead of software engineering. They will know how to get an answer, but not why it works or what is more efficient, etc.


IllllIIlIllIllllIIIl

I'm an experienced HPC syseng, and I'll admit even I've caught myself getting lazy and relying on its output without bothering to fully understand it. Granted, I've never blindly relied on it for something that might have serious consequences if it's wrong, but I find myself asking "Please write a regex that matches patterns like x y z", and not really scrutinizing the output. That's a bad habit that could be disastrous for students.
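As an illustration of that failure mode (the version-string pattern here is a hypothetical example, not the regex from the comment), an unanchored regex can look right while quietly matching things it shouldn't:

```python
import re

# A generated-looking pattern for "versions like 1.2.3" that is easy
# to accept without scrutiny:
loose = re.compile(r"\d+\.\d+\.\d+")

print(bool(loose.search("1.2.3")))          # True, as intended
print(bool(loose.search("v1.2.3-whoops")))  # also True: it matches inside junk

# Anchoring is the kind of fix you only make if you actually read the regex:
strict = re.compile(r"^\d+\.\d+\.\d+$")
print(bool(strict.match("1.2.3")))          # True
print(bool(strict.match("v1.2.3-whoops")))  # False
```

Nothing about the loose pattern is "wrong" for the happy path, which is exactly why unscrutinized output slips through.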


Nearby-Technician767

I don't worry about junior devs, though I do worry it will affect their pay. As a principal dev, I don't want to babysit AI. With a junior dev, they learn the style, context, and culture, and they adapt. Babysitting an AI and then massaging the output to account for those things sounds like hell. Basically, until AI has the ability to learn the culture, style, and rules of a company, it will never be able to compete with a junior dev.

Also, AI has huge copyright problems. It is particularly telling when Microsoft and others ban their own AI from being used internally. Samsung even had proprietary code leaked. Effectively, AI won't even have a shot at replacing devs until it can be scoped to a single org and trained on that org's IP. My company told us to use AI, and a month later said we can't because of IP concerns and warnings from lawyers.


Richandler

> It’s definitely not going away

Well, who knows. There isn't a lot of transparency in any of these business models at the moment. Both Google Gemini and Microsoft Bing being free services is a huge issue for the LLM industry. It's been commoditized this quickly without ever delivering the 100x productivity everyone talks about.


Davidrabbich81

Funny thing is, what bothers me the most about LLMs is the same thing that bothered me about crypto: the massive amounts of processing power needed to run these services. None of them are lightweight.


_DuranDuran_

Working for a FAANG, having people who understand what the code is doing is paramount when solving an outage.


GayMakeAndModel

When you have a couple of decades of programming under your belt and can touch type, Copilot et al. are absolutely worthless. Coding is the quick and easy part of my job. Dealing with management and SMEs is like pulling teeth.


Richandler

> Dealing with management and SMEs is like pulling teeth.

Sad that the engineers are trying to replace themselves rather than management. 🤣


endproof

Imagine an AI trying to drive alignment with another reluctant team about why something should go into their service.


Nearby-Technician767

https://www.gartner.com/en/articles/what-s-new-in-artificial-intelligence-from-the-2023-gartner-hype-cycle

As a grey beard, I have been through the Hype Cycle (and even joined up with two of 'em) enough times to see it happen in real time. AI is very much in the Hype Cycle, and until it goes through the Trough of Disillusionment, we aren't going to have very useful discussions about its limits, future, and uses. Right now AI is pretty limited, and getting wow'd has a pretty low bar (automating some tasks, getting Stack Overflow answers, etc.).

For me, AI has only really helped with my professional writing (I have a Grammarly sub). For coding, like others, it just gets in the way. I paid for GitHub Copilot but gave up after it proved too janky: style issues, logic problems, etc. I found that babysitting the AI was not enjoyable; it was like doing code reviews for a junior dev who lacked context.

But back to the point: AI is more hype than substance. The hope of what AI could do is overpowering the substance of what it can do, and all the AI companies are overselling and overcharging because of that hype.


_3psilon_

Finally, a sober viewpoint among all the AI bros around. At our company, management has an OKR to facilitate AI use in engineering, so we're experimenting: for example, getting it to triage bugs (it misses about 80% of the time), and us devs installing Copilot and JetBrains AI.

As a senior engineer, after trying AI, I see four possibly good uses for it:

1. When you've exhausted the Stack Overflow answers, maybe it can still come up with a solution. I haven't seen a good example of this yet; it usually comes up with what I wanted to hear instead, which didn't work, which is why I was consulting the internet in the first place.
2. Basic test generation.
3. Generating starting points for SQL and scripts, i.e. stuff that's more exotic for your own stack (shell scripts, Dockerfiles, configuration, etc.).
4. Refactorings that can't be done in the IDE or via search & replace: adding types, changing import statements, doing what codemods do for breaking changes in library upgrades.

I'm also experiencing two fundamental issues so far:

1. Trust. Nine times out of ten, what it generates is incorrect and what it says is bullshit. I hate untrustworthy sources of information, although many people seem to care less about this, preferring quantity over quality.
2. Wishful thinking and using AI for the wrong tasks. Many coders use AI to generate boilerplate and to solve text-editing problems, when they should rather think about reducing boilerplate and learning better text editing (Vim keys, etc.).
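For point 4, the "what codemods do" part can be sketched mechanically. A minimal, hypothetical example in Python (the `old_pkg`/`new_pkg` names are made up, standing in for a library that renamed its package in a major release):

```python
import re
from pathlib import Path

# Hypothetical breaking change: a library renamed its package from
# "old_pkg" to "new_pkg". Rewrite only import lines, leaving other
# references (call sites, strings, similarly named packages) untouched.
IMPORT_RE = re.compile(r"^(\s*(?:from|import)\s+)old_pkg(\b.*)$", re.MULTILINE)

def rewrite_imports(source: str) -> str:
    """Replace old_pkg with new_pkg in import statements only."""
    return IMPORT_RE.sub(r"\1new_pkg\2", source)

def rewrite_tree(root: str) -> int:
    """Apply the rewrite to every .py file under root; return files changed."""
    changed = 0
    for path in Path(root).rglob("*.py"):
        text = path.read_text()
        new_text = rewrite_imports(text)
        if new_text != text:
            path.write_text(new_text)
            changed += 1
    return changed
```

Real codemod tools work on the syntax tree rather than regexes, but even this crude version is the kind of repetitive edit an LLM (or a one-off script) handles better than hand-editing fifty files.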


PathologicallyChill

Just curious about some of the hype cycles you lived through, and the ones that took you in? I'm only familiar with the blockchain hype cycle and the dotcom bubble. If you don't mind sharing.


Xuval

None of the use cases listed in that article have any sort of commercial value.


k_dubious

What are you talking about? Just the other day my manager sprinted into my office and demanded that I get a US airports map app running on my devbox as quickly as possible.


maowai

Can you let me know where you work that you have your own office? I want to apply.


k_dubious

Hah, I work from home. So usually it’s not my manager barging into my office but my dog or my toddler.


Hedede

I'm confused. So your dog demanded that you get the airport map running?


ketralnis

A daily occurrence for all of us


geodebug

It's just a devinstration.


Fit_Yesterday_5955

As they say, Garbage In, Garbage Out. So also: Dev In, Dev Out (whether the second Dev represents human flesh or not remains to be seen).


alpacaMyToothbrush

I feel like you're missing the forest for the trees. Sure, this thing is way worse than your average jr dev, but look at where we were this time last year. Look at the rate of improvement, not just in absolute terms but also in terms of efficiency and capability at a set amount of storage/compute. The field is moving at a breathtaking pace.

For every jr I have to calm down because 'zomgosh AI's gonna take our jebs!', I run into an equal number of sr devs with just as much experience as I have who insist this is no big deal and nothing is gonna change. Neither of those things is true.

The closest analogy I have is that we're scribes at the very early stages of the printing press. Those scribes who were, by virtue of their position, worldly, knowledgeable, and literate remained *insanely* valuable even after the job of hand-writing words on a page had been punted into the dustbin of history. It's the same way with devs: our knowledge is worth more than just the code we write, but those who don't grasp that are in for a rough career.

I had a pretty interesting experience the other day where I [modified a gnome extension](https://old.reddit.com/r/gnome/comments/1axhgkl/change_gnome_dock_color_if_wireguard_is_down/) to change the dock color if I lost VPN. I had never done anything like that. My js skills were pretty rudimentary, but between reading docs, asking questions, and working with an LLM, I was able to cobble working code together in less than an hour in a domain I was wholly unfamiliar with. That capability is powerful.


slipnslider

Or we are at the peak and this is it. Remember when Watson was released and everyone thought our jobs would be replaced? Everyone said wow, look at it today, and just think how great it'll be in five years! Well, Watson was a dud, released at its height because it was IBM's last chance to cash in on it. Are LLMs at their peak or in their infancy? I don't know, but I guess we'll find out in five years.


_3psilon_

Now I've figured out why I get fucking pissed each time AI bros mention their "whoa buddy, just look at the pace of progress" bullshit. Not only because it completely sidesteps any reasoning or arguments, but also because it's the same fallacy every investor makes when buying whatever has done well over the last few years, thinking it follows that the same security will do even better in the future (cryptobros and Bitcoin included). In investing this is called speculation, and it's a super risky thing to do. Every investing website tells you "Past Performance Is No Guarantee of Future Results" for a good reason: the world doesn't work like that. So yes, when this week's hyped AI can do X, it's perfectly valid to ask why it *can't* do Y (or what use this AI has at all), and dismissing skepticism or scrutiny via "look at the progress instead" arguments looks plain stupid.


shoe788

Like does nobody remember how up until last year crypto was gonna replace every known institution? Tech hype cycles are nauseating.


son_et_lumiere

we were at the same place last year. there were lesser known projects that did nearly the same thing and had the same problems. GPT engineer is one of those.


Xuval

Nah, sorry. The way this technology is developed fundamentally relies on scraping publicly available code, feeding it into the models, and training on that. And what sort of code is publicly available for scraping? Exactly this sort of shit: nifty but fundamentally uninteresting projects that someone puts on their résumé when looking to be hired as a junior dev. The reality of the situation is that real-life software development issues leave no publicly available data footprint that can be freely used as training data, because that sort of stuff is smeared across a nasty mess of e-mails, Jira boards, and internal documentation, all of which lives behind closed doors in internal systems. The velocity with which this tech moves is irrelevant, since it's now moving up against the wall of messy corporate reality and secrecy.


todo_code

It would be nice for onboarding new employees or temporarily borrowed resources onto my project. If I have a well-written readme, I can be assured that their local env will at least start without me being bothered.
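For the record, the kind of readme I mean is short. A hypothetical sketch (the project name, ports, and make targets are all made up):

```markdown
## Getting started

Prerequisites: Docker 24+ and GNU Make.

    git clone https://example.com/acme/widget-api.git
    cd widget-api
    make bootstrap   # build the dev container and install dependencies
    make run         # serve the app at http://localhost:8080
    make test        # run the unit test suite

If `make bootstrap` fails, check that ports 8080 and 5432 are free.
```

If every command in that block works on a clean machine, a new hire (or an agent) can get running without asking anyone anything.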


TryingT0Wr1t3

This feels like trying over and over to explain the requirements to a junior... Every time I tried these things in the past, I only lost time and eventually had to do it myself. Had I just done it myself from the get-go, I would have finished faster. The only thing I can ever get them to write successfully is unit tests for stateless functions, and even then I still need to fix and polish a lot around them.


Phthalleon

You can do this with ChatGPT, and for free.


MichaelLeeIsHere

I was writing code harder than this literally on my first day at work.


oceansize

I've been a working developer for 24 years now (I chose to never climb the ladder), and I don't give a single fuck. Not one. Not a chance in HE double hockey sticks it can decipher the piles of crap I get as requirements/specifications :) Can that problem be solved? Sure, but by then it won't be just me out of a job lol.


Fragrant_Chapter_283

Yea, Devin needs a slack integration so he can ask ticket creators wtf they actually want


Proper_Mistake6220

Asking managers, product owners, or clients what they want is more difficult than solving P=NP. We're safe.


malachireformed

Tl;dr -- AI can do a crappy job at making apps for solved business domains! So... low/no-code devs beware? Seriously, try using AI for anything that isn't a solved business domain. It still sucks. My personal projects aren't even that hard to comprehend, but the AI-generated code won't even compile 95% of the time, simply because it doesn't fit cleanly into a widely applicable domain (and even within such domains, the AI tools couldn't generate code for anything one step above boilerplate). More seriously: will AI get there? Probably. But the current gen ain't it.


boneve_de_neco

It would be funny if someone hooked this to an issue tracker to fix bugs automatically and get hacked via support tickets


Mammoth-Asparagus498

Any TL;DR version? I heard that it takes 40 minutes to compile the code each time.


popiazaza

TL;DR: It is indeed still slow. It's good at following instructions. Problem solving? Not so much. It's like a worse version of ChatGPT, but with access to an IDE and a shell.


Mammoth-Asparagus498

I read somewhere that it uses LLMs like ChatGPT, but with a wrapper and some tweaks?

>Exactly how Cognition AI made this breakthrough, and in so short a time, is something of a mystery, at least to outsiders. Wu declines to say much about the technology’s underpinnings other than that his team found unique ways to combine large language models (LLMs) such as OpenAI’s GPT-4 with reinforcement learning techniques. “It’s obviously something that people in this space have thought about for a long time,” he says. “It’s very dependent on the models and the approach and getting things to align just right.”

https://www.bloomberg.com/news/articles/2024-03-12/cognition-ai-is-a-peter-thiel-backed-coding-assistant?embedded-checkout=true


_3psilon_

I think that 90% of AI startups are ChatGPT wrappers as they don't have the time and funds to train and run their own models.


voidstarcpp

This article is just a roundup of tweets with little added commentary of value, and was probably itself written with LLMs.

>This is an extremely complex task as most LLMs have no idea how to use APIs, especially the GPT-4 API.

A human probably didn't write that sentence. It's both untrue and awkwardly phrased.


undrsght

That article was absolutely, 100% written by AI.


bch8

> It can highly steal the jobs of software developers and debuggers

...

> major firms and enterprises are already considering the usage of Devin for scripting source codes

...

> All the reactions and tests by developers mentioned above have highly enjoyed and experienced Devin’s capabilities first-hand

I did not highly enjoy this article. I think I will keep scripting my own source codes for now.


Resident-Trouble-574

Which is good. More garbage for future trainings.


bch8

This is fine


IamWiddershins

It Did An Excellent Job with each task ! wo w !! ! how could anyone think this is even slightly credible


RepresentativeFill26

Building some PoC chatbot? Sure, you can do that with statistical patterns from an LLM. But adding new functionality to a codebase (which is what 90% of developers do)? Would you really trust that to a statistical model?


LinearArray

It's 99% a fluffed up marketing gimmick


Nearby-Technician767

Part of me wonders how many of the pro AI comments are being written by AI.


dailydoseofdogfood

People saying things like "you're doing great :) " to a literal program is insane to me


circusfly555

The computer version of a trained monkey is still a trained monkey.


gdahlm

While I am happy to use tools that make our jobs easier, the "successes" here aren't successes in my eyes. Look at this diff that counted as a "pass": https://github.com/CognitionAI/devin-swebench-results/blob/main/output_diffs/pass/astropy__astropy-13745-diff.txt

I would be irked if someone got rid of that comment block to clean up the cruft, and note that the human fix was simply: https://github.com/astropy/astropy/pull/13745/commits/4d0ededa242a4f59149abcbd03d6030521f4d25c

I wonder how many other claimed "pass" instances are like this, or worse?


Wattsit

The day an LLM can actually replace a software engineer, it can replace nearly any digital role. If the LLM can be the dev, it can also be the scrum master, the business analyst, and the QA. It can be the product owner, the product manager, and the project manager on top. It can be the DevOps engineer, the senior architect, and the people manager for all those people too. All these silly takes about LLMs replacing devs... like, what are we even trying to achieve here long term? Companies reduced to 5 C-suite execs lobbing random shite into prompt boxes, then waiting a week to see what dribbles out of their profit-printing AI box?


zaemis

I'm eagerly awaiting the day we can replace C-suite execs


Trevor_GoodchiId

Sticking LLMs into a conventional agentic framework is low-hanging fruit, and more companies will follow suit. GPT-Pilot started out 6 months ago, so the "first AI developer" claim is dubious to begin with. [https://github.com/Pythagora-io/gpt-pilot](https://github.com/Pythagora-io/gpt-pilot) [https://www.youtube.com/watch?v=4g-1cPGK0GA](https://www.youtube.com/watch?v=4g-1cPGK0GA)


JustAPasingNerd

AutoGPT has been out for a year or more. I remember reading about a company here in the UK that started selling an AI junior dev as a service about 3 months after the GPT-4 launch. I can't find the article, though. It was insanely expensive, it failed at complex tasks, and someone had to check all of its work afterwards. This seems like the same thing, just with better marketing and a shitload of hype.


currentscurrents

Agents could work, but they really don't yet, and I don't think they'll get them to work using supervised learning alone. To create agents you need reinforcement learning, and a tiny bit of RLHF at the end of supervised pretraining doesn't cut it.


WeNeedYouBuddyGetUp

This article itself is generated by AI….


Sushrit_Lawliet

Marketing hype backed by fancy prompts. Nothing ground breaking. They’ll probably raise a lot of money and grow this bubble further until they can find another hype train next.


jryan727

I spend 80% of my time not writing software, but extracting specifications from stakeholders, discussing edge case handling, and offering guidance and consulting on the design of the system along the way. Is the goal for these LLMs to assume that part of the work? Or the other 20%?


Realistic-Minute5016

There are some interesting... claims in the article. "This is an extremely complex task as most LLMs have no idea how to use APIs, especially the GPT-4 API. The user was also worried whether Devin would securely handle the API keys and deal with any associated package errors. To his surprise, Devin not only asked for the API key but also handled it securely." First, even Gemini gave me the Python GPT-4 API calls; second, how the hell do you know it handled them securely?! There could be logs with the keys in them, or an S3 bucket with the key in it, etc.


prestonph

Instead of those AI apocalypse posts, why don't we talk more about leveraging these new tools? Smart people out there spend countless hours making tools like this one. If we use them to improve our work, how can we ever be replaced? Devin + Dave is always better than just Devin. Besides, management needs someone to direct Devin and to know what Devin is doing anyway. Someone who knows about software. Hmm, I wonder who that could be /s


WhatShouldIDrive

Non apocalyptic posts don’t get views.


TommaClock

> Devin + Dave is always better than just Devin But it can be worse than just Dave if it constantly gaslights Dave with incorrect solutions. And that's the biggest problem with LLMs as a development aid. They are very useful for some tasks but for others they will kill your productivity chasing ghosts and it's hard to know which is which.


alpacaMyToothbrush

Yep, I 'pair programmed' with a LLM the other day to [modify a gnome extension](https://old.reddit.com/r/gnome/comments/1axhgkl/change_gnome_dock_color_if_wireguard_is_down/) in less than an hour. I knew nothing about gnome internals and little about js before I started. I look at LLMs as working with a jr who's not very good yet, but has a photographic memory and just so happened to binge just about every man / documentation page on the net. Do they still get it wrong? Absolutely. I wouldn't leave them to make code changes unattended. However, they are also *very* useful, and I'm not even talking about GPT4. I'm talking models small enough to run on my personal machine.


badmonkey0001

> Devin + Dave is always better than just Devin. From another famous "AI": > I'm sorry, Dave. I'm afraid I can't do that.


vytah

I mean, this has already happened: https://news.ycombinator.com/item?id=39395020

>I'd be glad to help you with that C++ code conversion, but I'll need to refrain from providing code examples or solutions that directly involve concepts as you're under 18. Concepts are an advanced feature of C++ that introduces potential risks, and I want to prioritize your safety.


Wandererofhell

And doing all of that requires knowledge of software engineering. If people believe that any random person off the street, with no knowledge or expertise, can just hop in, use AI to create an app, maintain it long term while slowly scaling it, and expect it to last, they're being ridiculous doomsayers.


Richandler

On that first video in the article: I'm not very impressed that it can follow instructions in a readme. Taking on a project *without* a readme, that would be impressive. Or at least generating a readme for project setup. The troubleshooting is nice; throughout the video the guy says it's not meant for that, but that certainly seems like its most useful feature.

It's funny that in the video they call the task list "magic." The task list is probably very programmatic rather than AI-based; it is a TODO list, after all. If anything, it shows the value of having a todo list (I know a lot of programmers who don't) and being good at updating it.

Also, he claims this is the worst the product is ever going to be. That may be true, but it's not unlike saying the iPhone 15 is the worst any new iPhone is ever going to be. Just because it's the worst it's ever going to be doesn't mean there isn't a low ceiling just above. For those who don't follow the analogy: the iPhone 16 will be only marginally better than the iPhone 15, or even the 14, despite those products being 1-2 years older.

I think this thing can be useful; I just won't be impressed until it can make a novel application rather than recreating a project that's pre-built or has examples all over the internet. Using it as an assistant probably has some value, but unless it's going to be $20 a month, waaay faster, and able to run on the local network, I don't see the value proposition.

Also, is Devin's endgame creating itself? There's an obvious problem for Cognition either way. If that isn't the goal, you run into the question of why you'd use an inferior product. If it is, you run into a user cloning Devin and just running Bevin, the AI coding tool Devin built.


uniquelyavailable

when ai llm creators find out how complex a programmer's job actually is


Lewke

did you generate this text with GPT, it reads very wooden


tsoule88

Especially based on the conclusion paragraph I’m fairly confident that the article itself was mostly AI generated.


xXBongSlut420Xx

these things are completely useless and i'm so bored of this hype. watching people go wild because an "ai" can write toy demo programs really makes me question how these people develop software. copy-pasting stack overflow without understanding the code is a huge red flag for human engineers, so why is everyone impressed that a computer can emulate the worst habits of bad engineers? an llm, by its very nature, cannot understand what it's doing.


Choice_Pension_9852

To be honest a significant part of programming industry is writing boring and shitty CRUD apps. AI can do that just fine, because it has been already done so many times. These things are not worthless, they can generate lots of boilerplate.


squeeemeister

Fuck Devin and fuck the CEO of the trash company that made him. Over the next few years real people will lose their jobs because some ass hats convince a CFO who's never written code that their company can save $$ by using this vaporware trash.


ycatbin_k0t

LLMs are the new NFTs


xseodz

There's a near-zero chance I could use Devin, because our readme says PHP 5.6, our composer.json says 8.0, our servers run 8.1, 8.0, and 8.3, and our vendor folder was checked out from a Docker container running 7.4. It has no chance!


X0B0X0

As a professional coder, I say Devin can go suck a dick... I'm not giving up my job.


It_s_Techtastic

I find it hilarious. Over a year ago now, I built a tool I called Brother, mostly as a way to prove to myself that generative AI was going to replace the need for developers. I was hardly alone: BabyAGI, Ollama, AgentGPT, and so many others went down this route. It quickly became apparent that though it could manage the simplest of apps, it was useless for anything even moderately complex. I found it far less funny when I saw how much money they had raised... VCs have no idea what modern programming entails. I've gotten into arguments during pitches over this exact topic. As a former CTO with deep connections to many others, I can tell you Devin caught everyone's attention but is quickly turning into a joke. The pushback I get from the VC types is that Devin is the worst and least capable it will ever be... True, but that doesn't change the fact that they are focused on the wrong part of the problem.