r/dataisbeautiful OC: 24 4d ago

OC How accurate are the initial BLS jobs estimates? [OC]

179 Upvotes

27 comments

42

u/goldpony13 4d ago

This is super interesting and very crisp. Could you please explain the shaded bars on the last graph? Not sure I understood that correctly.

6

u/exgeo 3d ago

Job changes over 3 months of -500k, +600k, -700k would have an average magnitude of 600k: (|-500| + |600| + |-700|) / 3 = 600.

Here "job change" means the month-over-month difference in the number of jobs.
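A minimal sketch of that "average magnitude" calculation, using the same three hypothetical monthly changes (in thousands). The function name is made up for illustration; the signed average is printed alongside to show how much the absolute value changes the picture.

```python
def average_magnitude(changes):
    """Mean of the absolute month-over-month job changes."""
    if not changes:
        raise ValueError("need at least one monthly change")
    return sum(abs(c) for c in changes) / len(changes)


# Hypothetical monthly net job changes, in thousands
monthly_changes = [-500, 600, -700]
print(average_magnitude(monthly_changes))           # 600.0
print(sum(monthly_changes) / len(monthly_changes))  # -200.0 (signed average)
```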

2

u/DavidWaldron OC: 24 3d ago

Correct. It’s the average size of the net jobs change, to put the size of the bias into perspective.

4

u/LSBusfault 3d ago

Seems to be job hopping rather than job creation?

2

u/NotTheBizness 2d ago

Maybe that or layoff and rehire…which I guess is technically job hopping with different intentionality

1

u/DeplorableCaterpill 1d ago

Job hopping would be net 0 job change.

0

u/LSBusfault 1d ago

Which is what the graph shows: a small net change in overall jobs but a large amount of job changes.

1

u/DeplorableCaterpill 1d ago

No, it’s showing a small net error in initial job estimates but a large positive change in overall jobs.

-4

u/spaceneenja 3d ago

It’s labeled.

48

u/Gilchester 4d ago

Take out the tail results (which strongly influence fit and slope) and you're left with a blob in the middle that is vaguely sauntering up and to the right.

Unsurprisingly, it's easy to predict well when the economy is booming or in the shitter, but hard when it's doing moderately well.

13

u/DavidWaldron OC: 24 3d ago

Yes, it’s true of pretty much any correlation that if you remove the variance from the series they will eventually become uncorrelated

-1

u/travelcallcharlie 3d ago

It's actually the exact opposite, the residuals at the tails are much larger than in the middle.
We can't really say much more though without seeing the statistical outputs of the model and the residual plots.

4

u/Gilchester 3d ago

I think you misunderstand me. I am not saying it's a better fit at the tails. I am saying linear regressions are more strongly influenced by points at the ends of the continuum than the middle.
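The "ends of the continuum" point can be made concrete with leverage. In a simple least-squares fit, the leverage of point i is h_i = 1/n + (x_i - x̄)² / Σ(x_j - x̄)², so points far from the mean of x have more pull on the fitted slope. A minimal sketch with made-up x values (not the OC's data):

```python
def leverages(xs):
    """Leverage (hat value) of each point in a simple regression y ~ a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    return [1 / n + (x - mean_x) ** 2 / sxx for x in xs]


# Hypothetical initial estimates (thousands): a middle cluster plus two tails
xs = [-700, 100, 150, 200, 250, 300, 900]
for x, h in zip(xs, leverages(xs)):
    print(f"x={x:>5}  leverage={h:.3f}")
```

The two tail points end up with far higher leverage than the middle cluster, which is why they can swing the fitted line.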

0

u/travelcallcharlie 3d ago

I’m understanding you perfectly. The point about the residuals being larger at the tails is that they’re actually causing the model to have lower predictive power and a worse fit. Even if your point about them having a bigger impact were correct, the tail data actually skew the trend line away from the optimum (this is obvious when you see that lots of the data points are below the trend line in the first quartile, and above it in the last).

If you removed the extreme tails (which you shouldn’t, statistically speaking), then the center would be less blobby than you think, and the trend more in line with the “perfect agreement” line.

What the model is actually demonstrating is that it is easy to predict employment data in periods of calm and stability, and much harder in periods of sudden drastic change.

6

u/skilliard7 3d ago

They tend to overestimate it during a declining market and underestimate it during a growing market.

5

u/DavidWaldron OC: 24 4d ago

This is a series of charts analyzing the accuracy of the initial/preliminary total non-farm payroll estimates in the BLS monthly jobs report. The comparison is to actual counts from the QCEW, which is based on mandatory unemployment insurance filings.

Blog post has more details on the results.

Tools used were R for data analysis and d3.js for charts. All available here.

2

u/Illiander 3d ago

That job change bar being absolute instead of signed is doing a lot of talking.

2

u/Unusual_Selection979 3d ago

This is excellent. Thanks for sharing this.

4

u/theworldisending69 4d ago

Should show results as a percent of the total estimate.

1

u/throwRA_157079633 3d ago

Then that CD (coefficient of determination) would be very low, like 0.2, IMHO.

1

u/johnniewelker 3d ago

Two things about this data

1) Are we comparing the BLS initial report against BLS-generated later reports? If so, it’s basically testing BLS against what BLS says is correct. Not that we have a better way, but we have to understand that the underlying data and methodologies could themselves be wrong.

2) BLS headline numbers are a delta: jobs created minus jobs lost. The US typically creates (and loses) about 1.5M jobs a month. So a 50k error might seem big, but it’s really a 3-4% error rate. Meaningful, but very hard to get right regardless of data and methodology.
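The back-of-envelope arithmetic in point 2, as a tiny sketch. The 1.5M monthly gross flow is the commenter's figure, not a checked statistic, and the function name is made up.

```python
def error_share(error, gross_flow):
    """Express an error in the net change as a percent of gross monthly job flows."""
    return 100 * error / gross_flow


print(f"{error_share(50_000, 1_500_000):.1f}%")  # 3.3%
```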

3

u/DavidWaldron OC: 24 3d ago

The blog post contains more info on this. The initial estimate is survey-based, released within ~2-3 weeks of the reference period. This is scored against the QCEW counts, which are based on mandatory UI tax filings by states and are not fully available until almost a year later. The BLS’s role in this is largely just to compile and publish. These are independent programs and methodologies.

Regarding the idea of judging the error against churn rather than the net change: the way the survey works is it takes the total payrolls of companies in month t and compares them to payrolls in month t-1. It does this by industry/state/size and uses the ratios to come up with the overall estimates. So it’s not measuring hires and separations and taking the difference; it’s directly trying to measure the net change. But even so, you’re right. It’s a very hard thing to do, especially so quickly.
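A simplified sketch of the ratio approach described above, for a single hypothetical industry/state/size cell. It keeps only employers that reported in both months, takes the ratio of their total payrolls in month t to month t-1, and applies that ratio to the previous month's level. This is an illustration under stated assumptions, not the actual CES estimator: sample weights, business births and deaths, and benchmarking are not modeled, and all names and numbers are made up.

```python
from typing import Dict


def link_relative_estimate(prev_level: float,
                           sample_prev: Dict[str, int],
                           sample_curr: Dict[str, int]) -> float:
    """Estimate this month's employment for one cell from matched sample units.

    prev_level:  last month's estimated employment for the cell
    sample_prev: employer id -> reported employment in month t-1
    sample_curr: employer id -> reported employment in month t
    """
    # Only employers that reported in both months contribute to the ratio
    matched = sample_prev.keys() & sample_curr.keys()
    total_prev = sum(sample_prev[k] for k in matched)
    total_curr = sum(sample_curr[k] for k in matched)
    if total_prev == 0:
        raise ValueError("no matched employment in the previous month")
    return prev_level * (total_curr / total_prev)


# Hypothetical cell: 10,000 jobs last month, a few sampled employers
prev = {"A": 120, "B": 80, "C": 50}
curr = {"A": 126, "B": 84, "D": 30}  # C stopped reporting, D is new
level = link_relative_estimate(10_000, prev, curr)
print(round(level), round(level - 10_000))  # 10500 500
```

The net change falls out of the ratio directly; hires and separations are never counted.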

There is a separate BLS program called JOLTS that tries to estimate hires and separations via survey, but it’s much smaller and the results are less detailed and have larger margins of error.

1

u/beach-science 2d ago

Great analysis! Reminds me of this study from a year ago: https://educationadvisors.com/us-job-market-projections-accuracy/

2

u/teardrop2acadia 2d ago

Can you compute an ICC instead of the regression line for the first plot? It would give you a more direct estimate of agreement. That can account for differences in slope and bias.
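A minimal sketch of the ICC suggested above, in the two-way, absolute-agreement, single-measure form (often written ICC(A,1)). Each month is treated as a subject and the initial estimate and the QCEW count as two "raters". The series are hypothetical, not the OC's data, and the function name is made up.

```python
def icc_absolute_agreement(a, b):
    """Two-way, absolute-agreement, single-measure ICC for two paired series."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two paired series of length >= 2")
    n, k = len(a), 2
    rows = list(zip(a, b))
    grand = (sum(a) + sum(b)) / (n * k)
    row_means = [(x + y) / k for x, y in rows]
    col_means = [sum(a) / n, sum(b) / n]

    # Sums of squares for subjects (rows), raters (columns), and residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


# Hypothetical monthly changes (thousands)
initial = [100, 200, 150, 300, -50]
offset = [x + 40 for x in initial]  # same pattern, constant 40k bias
print(icc_absolute_agreement(initial, initial))
print(icc_absolute_agreement(initial, offset))
```

With a constant 40k offset the Pearson correlation is still exactly 1, but the ICC drops to about 0.95, which is the bias sensitivity a plain regression line doesn't show.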

1

u/david1610 OC: 1 2d ago

I'd like people to have a go at forecasting economic data; it's incredibly difficult. Data is the limiting factor, and there aren't many good forward-looking predictive variables. Plus you are at the mercy of policy decisions and financial markets that turn on a dime.

The fact they can explain 60% of the variance in labor markets is impressive.

That being said, as another poster pointed out, if you remove the tails of the distribution (also called the outliers), the fit will go down. Even 40% of the variance would be really good.
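A minimal sketch of why trimming the tails can lower the fit: the same-sized scatter around a 1:1 line explains less of the variance once the range of x is restricted. The data are made up to isolate that one effect; it says nothing about whether the real residuals are larger at the tails.

```python
def r_squared(xs, ys):
    """R^2 of a simple least-squares line, i.e. the squared Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Hypothetical data: the same +/-60k scatter around a 1:1 line at every level
xs = [-600, -300, 50, 100, 150, 200, 250, 300, 600, 900]
noise = [60, -60, 60, -60, 60, -60, 60, -60, 60, -60]
ys = [x + e for x, e in zip(xs, noise)]

# Keep only the middle of the distribution
middle = [(x, y) for x, y in zip(xs, ys) if 0 <= x <= 400]
mid_xs, mid_ys = zip(*middle)

print(f"all points:  R^2 = {r_squared(xs, ys):.2f}")
print(f"middle only: R^2 = {r_squared(mid_xs, mid_ys):.2f}")
```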