r/homeassistant 20d ago

Has anyone played with the ai_task integration to analyze an image and generate data?

Just saw some previews of 2025.8 and what's coming and this is pretty dang cool - https://rc.home-assistant.io/integrations/ai_task - you can now feed it images and ask it to create a structured response such as a number, or I'm guessing maybe even booleans or something else
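From a skim of that docs page, the new `ai_task.generate_data` action takes an image attachment plus a `structure` schema and hands the result back through a response variable. A rough sketch of what a call might look like (the entity names and exact schema shape are my guesses from the docs, so double-check there):

```yaml
# Rough sketch: ask the configured AI task entity to count objects
# in the latest snapshot from a camera (camera.coop is hypothetical)
action: ai_task.generate_data
data:
  task_name: count_chickens
  instructions: Count how many chickens are visible in the attached image.
  structure:
    chickens:
      description: Number of chickens visible in the image
      selector:
        number:
  attachments:
    - media_content_id: media-source://camera/camera.coop
      media_content_type: image/jpeg
response_variable: coop_result
# I believe the structured fields come back under coop_result.data,
# e.g. {{ coop_result.data.chickens }}
```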

Apparently it was partly built around this use case - "count how many chickens are in the coop, from the latest camera image" - and it does it: https://houndhillhomestead.com/google-gemini-powered-goose-coop-door/


I'm definitely envisioning having this pull in a still image from our front yard camera at night, and then asking it to count the number of garbage cans it sees at the end of the driveway. If it's Monday night and cans=0, then I can have it push an alert to me and/or the wife, depending on who's home.
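If the action works the way the docs suggest, that whole check could be one automation along these lines (a sketch only; the camera entity, notify service, and response shape are all my assumptions):

```yaml
automation:
  - alias: "Bins not out reminder"
    triggers:
      - trigger: time
        at: "22:00:00"
    conditions:
      - condition: time
        weekday:
          - mon
    actions:
      - action: ai_task.generate_data
        data:
          task_name: bin_check
          instructions: Count the garbage cans at the end of the driveway in this image.
          structure:
            cans:
              selector:
                number:
          attachments:
            - media_content_id: media-source://camera/camera.front_yard
              media_content_type: image/jpeg
        response_variable: bins
      - if: "{{ bins.data.cans == 0 }}"
        then:
          - action: notify.mobile_app_phone   # hypothetical notify target
            data:
              message: "Trash night and no bins at the curb!"
```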

7 Upvotes

21 comments

5

u/mbailey5 20d ago

I use this blueprint to check bins are out, bbq cover is on, pond water level is ok: https://llmvision.gitbook.io/getting-started/setup/blueprint

3

u/Marathon2021 20d ago

Yeah, the possibilities seem pretty limitless. Throw a camera in the garage, have it check every night as you go to bed whether it's open or closed. Bins out on the street (as you say), etc. Wow - can't wait to start playing with this!

I had actually started playing around with the Seven Segment integration that can read numbers off of LCD displays and convert them into structured values - I have a friend who maybe wants to leverage some of this for a light industrial use case. This will be way better (SSOCR that Seven Segment uses is finicky).

2

u/FollowMeImDelicious 20d ago

I use frigate with frigate+ models (waste_bin) to accomplish this. If it's the night before pickup day, it will send notifications if the bins aren't in the pickup zone.

2

u/passwd123456 20d ago

This kind of stuff is really cool, glad to see it’s getting easier.

I was starting to look into image analysis for exactly this when I realized it was already built into the Frigate NVR software I was using.

As of earlier today, I have this set up in Frigate for my driveway camera. I also have it let me know once both garbage trucks (trash and recycling) have picked up: when a trash can moves and a garbage truck is out front, increment a counter; when the counter hits 2, notify; when the bins are back in their normal spot, reset the counter.
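For anyone curious, that counter logic could be wired up roughly like this (a sketch; the Frigate sensor names are hypothetical and depend on your camera/zone setup):

```yaml
counter:
  garbage_pickups:
    initial: 0
    step: 1

automation:
  - alias: "Count garbage truck pickups"
    triggers:
      - trigger: state
        entity_id: binary_sensor.driveway_garbage_truck_occupancy
        to: "on"
    actions:
      - action: counter.increment
        target:
          entity_id: counter.garbage_pickups
      - if: "{{ states('counter.garbage_pickups') | int(0) >= 2 }}"
        then:
          - action: notify.mobile_app_phone   # hypothetical
            data:
              message: "Both trash and recycling have been picked up."
# A second automation would call counter.reset when the bins
# are detected back in their normal spot.
```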

I also have my floorplan dashboard in HA show the cars based on whether they're parked out front or in the garage. But it doesn't actually know which car is which, it just assumes. That would require another image analysis routine like what you're talking about!

0

u/Z1L0G 20d ago

the latest Frigate beta has built in LPR (and face recognition!) which might make stuff like this easier

2

u/deicist 20d ago

"how long does the washer say it has left?" Might be cool.

Probably outside the scope of this but "look at all my cameras and see if you can tell me where the cat is" would be nice.

3

u/Z1L0G 20d ago

absolutely not outside the scope, and probably not even that difficult! I plan to implement something similar, as we have one cat who just won't use the flap and so sometimes gets shut out! A notification that she's waiting to get in would be good. Our flap is also smart and will record if a cat goes in/out, but it obviously gets out of sync if a cat goes through a window or door, so it would be interesting to try to track "cat presence" outdoors via cameras (we don't have any indoor cameras).

1

u/passwd123456 19d ago

Not out of scope! FWIW, Frigate’s integration has camera and zone sensors for objects such as cats, but can’t tell you which cat if you have more than one.

1

u/deicist 19d ago

I use Scrypted rather than frigate, I had an issue where frigate would use 100% CPU if I used the web UI and kill my server 

1

u/passwd123456 19d ago

Ooof. That sucks. I actually use scrypted, too, but only to get the cameras into HomeKit for the rest of the family. Works great for this, never has to be touched. How do you like the NVR?

2

u/deicist 19d ago

It's really good, the timeline / events can be a bit fiddly when you're trying to find a specific clip but in general no complaints. I have a 1TB disk that I just use for video so it records the last few days easily.

1

u/Z1L0G 20d ago edited 20d ago

Yes, I created a template sensor at the weekend (following an incident 😂) to detect if any washing has been blown off our washing line! Of course, this was all possible before (I was doing similar with a Node Red flow), it's just made a lot more straightforward with ai_task.

At the moment I'm only using OpenAI, and it costs I think just under $0.01 to run such a query - not loads, but it will start adding up if you have multiple such automations running frequently! Although even if it ends up costing $1/day or more, that's a drop in the ocean compared to my overall HA spend (it is a hobby after all 😂). But I'll probably try to implement Gemini as well for some queries, as I think you get a decent amount free every day.

Specifically with respect to the case study in the OP - the geese - this was possible previously; I did something similar with our chickens using Frigate to count them. It worked fairly well, but the original model Frigate used didn't handle chickens well (it often detects them as cats or dogs). However, there's a specific "hen" model coming soon with Frigate+ I believe (plus you can use it to train on your own data). That will be much quicker/cheaper than using GenAI for the same task (plus it all runs locally). You've always had the ability to use Frigate (or something like OpenCV) with a different model that you've trained yourself, but that's not a rabbit hole I've ever ventured down!!

1

u/Marathon2021 19d ago

to detect if any washing has been blown off our washing line

Curious to know what you used as a prompt for that? Sounds like you're outputting a boolean, but how would it know the difference between "wash on the line" vs. "wash blown off the line" vs. "no wash out on the line"?

I think you get a decent amount free every day

The original blog post about the chicken coop IIRC said you could run something like 15 queries an hour at the free tier, which was enough for that particular use case.

1

u/Z1L0G 19d ago

the prompt took a bit of tweaking! ChatGPT is great at refining its own prompt actually, which was very helpful. This is what I'm currently using (below):

Yes, it returns a boolean, which is basically "on" if there's a problem and "off" if not (either no washing on the ground, or no washing at all). If there's no problem, I don't need to know about it! There are 2 attributes for the sensor to give a bit more detail.

    Evaluate the washing line. Return a boolean field `washing` which is true if any washing looks detached from the rotary washing line, false if all washing is secured or no washing is visible.
    Only return true if you see an item of washing such as clothing, a towel or a bed sheet detached from the main mass of washing on the line with a visible gap between it and the line, or completely off the line; do not count low-hanging fabric that remains continuous with the line.
    Also return a brief description in `detail` of the item or items which have become detached, in the format "A xxx is on the ground". If the overall result is false then return "No issues detected" in this field.
    Lastly return a confidence score in `confidence` which is either 'low', 'medium' or 'high'.
    Only output structured data.
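For context, a prompt like that could feed a trigger-based template sensor, with the structured fields mapped to the state and attributes. This is only a sketch of how I'd imagine wiring it up (the entity names, schedule, and response shape are assumptions):

```yaml
template:
  - triggers:
      - trigger: time_pattern
        minutes: "/30"          # re-check every half hour
    actions:
      - action: ai_task.generate_data
        data:
          task_name: washing_line_check
          instructions: >
            Evaluate the washing line... (the full prompt above)
          structure:
            washing:
              selector:
                boolean:
            detail:
              selector:
                text:
            confidence:
              selector:
                text:
          attachments:
            - media_content_id: media-source://camera/camera.garden
              media_content_type: image/jpeg
        response_variable: result
    binary_sensor:
      - name: "Washing blown off line"
        state: "{{ result.data.washing }}"
        attributes:
          detail: "{{ result.data.detail }}"
          confidence: "{{ result.data.confidence }}"
```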

1

u/Marathon2021 19d ago edited 19d ago

Hah! Wow, yeah - that's definitely cool that it can make that assessment, but I knew you probably had to spend a bunch of time wordsmithing that to get the right result. Still pretty cool though - think about what you could do with this on, say, a factory floor or something.

But this is the kind of detail that I wanted to understand, because I'm probably going to set this up with a Reolink camera to look at the end of our driveway at 10pm before trash pickup day, and have it count the number of bins that are at the end of *our* driveway. That's important, because I don't want it counting all the bins it can potentially see - there's more of our street and other neighbors' homes in view. But if I painstakingly describe the field of view - what our lawn is, what our driveway is, what the street is, etc. - I bet I can convince it to always return 0, 1, or 2.

1

u/Z1L0G 19d ago

Yeah. Prompt engineering is the new skill for 2025 lol. It makes me laugh because sometimes people say that GenAI is useless, but the truth is often just that they can't write the correct prompt to get the result they want 😂

For your specific use-case, there would be a few different ways of doing it. Frigate+ (paid service) actually has bins as part of its model, so it should be able to detect/count them easily (I haven't actually tried it myself yet though).

If you do go down the GenAI route, remember there's the Camera Proxy integration that would allow you to crop the image from your camera and isolate just your driveway, which would save you having to write that part of the prompt! https://www.home-assistant.io/integrations/proxy/
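The proxy integration is YAML-configured; something like the below would at least shrink the snapshot before it goes to the LLM. (I'm not certain of the exact crop options, so check that docs page for the full list - the source camera name here is hypothetical.)

```yaml
camera:
  - platform: proxy
    entity_id: camera.driveway     # source camera (hypothetical)
    name: driveway_trimmed
    max_image_width: 720           # scale the snapshot down
    image_quality: 75
```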

1

u/Marathon2021 18d ago

Oooh, that’s good to know that could definitely make the job easier! I was playing with the seven segments integration a while back (reading digits off of standard LCD readouts through SSOCR) and cropping down to a very tight space made a big difference. So I’ll give this proxy integration a shot…

1

u/Marathon2021 18d ago

Hey, so how are you passing the result from the AI into a helper? Can you share some of your YAML on that? I've got a lot of the individual pieces working, but I can't seem to get my values over into a helper so that then that can be used for all sorts of other things in HA...

1

u/Z1L0G 18d ago

I think you've got this sorted now with your other post? But yeah, the response returns a data structure not just the answer, so you have to handle that as per the example.
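For anyone else landing here: the trick is that `response_variable` holds a data structure, so you pull the field out with a template and write it to the helper in a follow-up action. A sketch (the helper and entity names are hypothetical, and I'm assuming the fields sit under `.data`):

```yaml
actions:
  - action: ai_task.generate_data
    data:
      task_name: garage_check
      instructions: Count the cars and say whether the garage door is open.
      structure:
        cars:
          selector:
            number:
        door_open:
          selector:
            boolean:
      attachments:
        - media_content_id: media-source://camera/camera.garage
          media_content_type: image/jpeg
    response_variable: garage
  - action: input_number.set_value
    target:
      entity_id: input_number.cars_in_garage   # helper created in the UI
    data:
      value: "{{ garage.data.cars }}"
  - action: input_boolean.turn_{{ 'on' if garage.data.door_open else 'off' }}
    target:
      entity_id: input_boolean.garage_door_open
```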

1

u/Marathon2021 18d ago edited 18d ago

Yep, got it sorted late last night. HA is now counting the number of cars in our garage and whether the garage door is open or closed, through nothing more than a $50 Blink camera, and a free LLM API account.

Wow. My mind is truly blown.

My spouse is going to hate this and love this :D

The possibilities are endless. In fact, I’m meeting with a friend later this evening who runs a small winery and we’re going to try to see what kind of use cases we can figure out for “light industrial” scenarios — I already set up another Blink camera pointed at a basic LCD bedside clock radio and told it to read the time and store it in a variable, and it did it - I’m guessing he has lots of analog as well as digital “dumb” gauges that it might be useful to read and plot over time.

1

u/Z1L0G 19d ago

I've just created a Gemini API account and moved my HA sensor to that! So that won't be costing me anything now 😃 Just need to figure out how to add some error-trapping now to alert me if I run out of credits 🤔
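One way I can imagine doing that (untested sketch): let the action fail without killing the script via `continue_on_error`, then treat a missing response variable as the failure signal. Entity and notify names are hypothetical:

```yaml
actions:
  - action: ai_task.generate_data
    continue_on_error: true   # don't abort the automation if the API errors out
    data:
      task_name: washing_line_check
      instructions: Evaluate the washing line as usual.
    response_variable: result
  - if: "{{ result is not defined }}"
    then:
      - action: notify.mobile_app_phone   # hypothetical
        data:
          message: "AI task failed - out of Gemini quota?"
```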