I do agree with your point that auto-generated captions are better than no captions. But isn’t it bad to insert them automatically on creation? If we use these models to caption images, shouldn’t that be done by the screen reader instead? That way people can benefit from future advancements in the tech and customize the captioning system for themselves. With the current system, there is no way to tell whether you got a crappy AI caption that you might want to replace with a better auto-generated caption or a human-written one.
I’m not sure. The blog post is not entirely clear on that.
Agreed. Context is usually very important for images. But with an auto-generated caption embedded in the document itself, you already lose some context: if the automatic caption is incorrectly stored as “The ___ video game installer”, you can no longer tell whether it was written by the author with the context in mind or just generated. I would argue that is worse than no caption, as it lowers your trust in all captions.
Absolutely, I think that would be by far the best solution. It could massively encourage users to write their own captions if, in most cases, they only need to accept the suggestion. But so far, that seems unlikely to be the way forward. Why do that when you can just throw even more “AI” at the problem?