I originally posted on askstatistics, but was told that my question might be too complex, so I thought I'd ask here instead.
I am collecting behavioral data over a period of time, where an instance is recorded every time a behavior occurs. An instance can occur at any time, with some instances happening quickly after one another, and some with gaps in between.
What I want to do is to find clusters of instances that are close enough to one another to be considered separate from the others. Clusters can be of any size, with some clusters containing 20 instances, and some containing only 3.
I have read about cluster analysis, but am unsure how to make it fit my situation. The examples I find involve 2 variables, where my situation only involves counting a single behavior on a timeline. The examples I find also require me to specify my cluster size, but I want my analysis to help determine this for me and involve clusters of different sizes.
The reason why is because, in behavioral analysis, it's important to look at the antecedents and consequences of a behavior to determine its function, and for high frequency behaviors, it is better to look at the antecedent and consequences for an entire cluster of the behavior.
edit:
I was asked to provide more information about my specific problem. Let's say I've been asked to help a patient who engages in trichotillomania (hair pulling disorder, a type of repetitive self-harm behavior). The patient does not know why they do it. It started a few years ago, and they have been unable to stop it. An "instance" is defined as moving their hand to their head and applying enough force to remove at least 1 strand of hair. They do know that there are periods where the behavior occurs less than others (with maybe 1-3 minute gaps between instances), and periods where they do it almost constantly (with 1 second gaps between instances). So we know that these "episodes" are different somehow, but I am unsure how to define what constitutes an "episode".
To help them with this, I decide to do a home/community observation of them for a period of 5 hours, in order to determine the antecedents (triggers) to the episode and consequences (what occurs after the episode ends that explains why it has stopped) to an episode of hair pulling. This is essential to developing an intervention to help reduce or eliminate the behavior for the patient. We need to know when an episode "starts" and when it "ends".
My problem is, what constitutes an "episode"? How close together do a group of instances of the behavior have to be to be included in an episode? How much latency between instances does there need to be before I can confidently say that it is part of a new episode? This cannot be done using pure visual analysis. It's not as simple as 50 instances happen within the first hour, then an hour gap, then another 50 instances happen, where the demarkation between them would be trivial to determine. Instead, the behavior occurs to some degree at all times, making it difficult to determine when old episodes end and new episodes begin. It would be very unhelpful to view the entire 5 hour block as a single "episode". Clearly there are changes, but I don't know where to quantifiably determine it.
It's very important to be accurate here because if I determine the start point wrong, then I will identify the wrong trigger, and my intervention will target the wrong thing, and could potentially make the situation worse, which is very bad when the behavior is self-harm. The stakes are high enough to warrant a quantifiable approach here.