r/dataisbeautiful • u/DavidWaldron OC: 24 • 4d ago
OC How accurate are the initial BLS jobs estimates? [OC]
48
u/Gilchester 4d ago
Take out the tail results (which strongly influence fit and slope) and you're left with a blob in the middle that is vaguely sauntering up and to the right.
Unsurprisingly, it's easy to make predictions well when the economy is booming or in the shitter, but hard when it's doing moderately well.
13
u/DavidWaldron OC: 24 3d ago
Yes, it’s true of pretty much any correlation that if you remove the variance from the series, they will eventually become uncorrelated.
-1
u/travelcallcharlie 3d ago
It's actually the exact opposite, the residuals at the tails are much larger than in the middle.
We can't really say much more though without seeing the statistical outputs of the model and the residual plots.
4
u/Gilchester 3d ago
I think you misunderstand me. I am not saying it's a better fit at the tails. I am saying linear regressions are more strongly influenced by points at the ends of the continuum than the middle.
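To put a number on "more strongly influenced": in a simple linear regression, each point's leverage depends only on how far its x value sits from the mean of x. Here's a minimal sketch with made-up x values (not the BLS data) showing that the extreme points carry most of the leverage. The `leverage` helper and the numbers are purely illustrative.

```python
def leverage(xs):
    """Hat-matrix diagonal for a simple linear regression with an intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # h_i = 1/n + (x_i - mean)^2 / Sxx
    return [1 / n + (x - mean_x) ** 2 / sxx for x in xs]

# Hypothetical x values: a cluster in the middle plus two extreme months.
xs = [-800, -50, -20, 0, 10, 30, 60, 900]
lev = leverage(xs)
for x, h in zip(xs, lev):
    print(f"x = {x:>5}  leverage = {h:.3f}")
```

The leverages always sum to 2 here (one per fitted parameter), so whatever the two extreme points take is leverage the middle cluster doesn't have.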
0
u/travelcallcharlie 3d ago
I’m understanding you perfectly. The point about the residuals being larger at the tails is that they’re actually causing the model to have lower predictive power and a worse fit. Even if your point about them having a bigger impact were correct, the tail data actually skew the trend line away from the optimum (this is obvious when you see that lots of the data points are below the trend line in the first quartile, and above it in the last).
If you removed the extreme tails (which you shouldn’t, statistically speaking), then the center would be less blobby than you think, and the trend more in line with the “perfect agreement” line.
What the model is actually demonstrating is that it is easy to predict employment data in periods of calm and stability, and much harder in periods of sudden drastic change.
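For anyone who wants to check the "residuals are larger at the tails" claim themselves, one rough way is to fit the line and compare the mean absolute residual within each quartile of x. The sketch below uses synthetic data where I've deliberately made the noise grow with |x|, so it only shows the mechanics of the check and says nothing about the real series. The function names are my own.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def mean_abs_residual_by_quartile(xs, ys):
    """Mean |residual| within each quartile of x, lowest x first."""
    a, b = fit_line(xs, ys)
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    out = []
    for q in range(4):
        chunk = pairs[q * n // 4:(q + 1) * n // 4]
        out.append(sum(abs(y - (a + b * x)) for x, y in chunk) / len(chunk))
    return out

rng = random.Random(1)  # fixed seed so the run is repeatable
# Synthetic data: the noise standard deviation is ASSUMED to grow with |x|.
xs = [rng.gauss(0, 200) for _ in range(400)]
ys = [x + rng.gauss(0, 20 + 0.3 * abs(x)) for x in xs]

by_quartile = mean_abs_residual_by_quartile(xs, ys)
print([round(v, 1) for v in by_quartile])
```

If the first and last entries come out clearly larger than the middle two on the real data, that would support the point about the tails.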
6
u/skilliard7 3d ago
They tend to overestimate it during a declining market and underestimate it during a growing market.
5
u/DavidWaldron OC: 24 4d ago
This is a series of charts analyzing the accuracy of the initial/preliminary total non-farm payroll estimates in the BLS monthly jobs report. The comparison is to actual counts from the QCEW, which is based on mandatory unemployment insurance filings.
Blog post has more details on the results.
Tools used were R for data analysis and d3.js for charts. All available here.
2
2
4
1
u/johnniewelker 3d ago
Two things about this data
1) Are we comparing the BLS initial report vs. later BLS-generated reports? If so, it’s basically testing against what BLS itself says is correct. Not that we have a better way, but we have to understand that the underlying data and methodologies could themselves be wrong.
2) BLS headline numbers are a delta: jobs created minus jobs lost. The US typically creates, and loses, 1.5M jobs a month. So a 50k error might seem big, but it’s really a 3-4% error rate. Meaningful, but very hard to get right regardless of data and methodology.
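The arithmetic behind that 3-4% figure, as a quick sketch. Both inputs are just the round numbers quoted above, not measured values.

```python
# Figures quoted in the comment above, treated as rough round numbers.
gross_flow = 1_500_000   # jobs created (and, separately, lost) in a typical month
headline_error = 50_000  # a hypothetical miss in the net headline number

error_vs_gross = headline_error / gross_flow
print(f"{error_vs_gross:.1%}")  # 3.3%
```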
3
u/DavidWaldron OC: 24 3d ago
The blog post contains more info on this. The initial estimate is survey-based, released within ~2-3 weeks of the reference period. This is scored against the QCEW counts, which are based on mandatory UI tax filings by states and are not fully available until almost a year later. BLS's role in this is largely just to compile and publish. These are independent programs and methodologies.
Regarding the idea of judging the error against churn, rather than the net change: the way the survey works is it takes the total payrolls of companies in month t and compares them to payrolls in month t-1. It does this by industry/state/size and uses the ratios to come up with the overall estimates. So it’s not measuring hires and separations and taking the difference. It’s directly trying to measure the net change. But even so, you’re right. It’s a very hard thing to do, especially so quickly.
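A rough sketch of that ratio idea, with made-up cells and numbers. This is only an illustration of "compare the same firms' payrolls in t and t-1, then apply the ratio"; the real estimator surely has more pieces (weights, adjustments) than this, and the cell names and `estimate_net_change` helper are hypothetical.

```python
# Hypothetical cells (industry/state/size) with made-up numbers.
# prev_level: last month's employment estimate for the whole cell
# sample_prev / sample_curr: total payrolls of the SAME sampled firms in t-1 and t
cells = {
    "manufacturing/OH/large": {"prev_level": 200_000, "sample_prev": 40_000, "sample_curr": 40_400},
    "retail/TX/small": {"prev_level": 500_000, "sample_prev": 25_000, "sample_curr": 24_750},
}

def estimate_net_change(cells):
    """Scale each cell's previous level by its sample ratio, then sum the changes."""
    total_change = 0.0
    for cell in cells.values():
        ratio = cell["sample_curr"] / cell["sample_prev"]
        total_change += cell["prev_level"] * ratio - cell["prev_level"]
    return total_change

print(round(estimate_net_change(cells)))  # -3000
```

Note that hires and separations never appear anywhere: the net change falls straight out of the month-over-month ratios.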
There is a separate BLS program called JOLTS that tries to estimate hires and separations via survey, but it’s much smaller and the results are less detailed and have larger margins of error.
1
u/beach-science 2d ago
Great analysis! Reminds me of this study from a year ago: https://educationadvisors.com/us-job-market-projections-accuracy/
2
u/teardrop2acadia 2d ago
Can you compute an ICC instead of the regression line for the first plot? It would give you a more direct estimate of agreement. That can account for differences in slope and bias.
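In case it helps, here's a minimal sketch of one common version, ICC(2,1) (two-way random effects, absolute agreement, single measurement), treating the initial estimate and the final count as two "raters" of the same months. Picking this particular ICC form is my assumption, and the toy numbers are made up.

```python
def icc_agreement(a, b):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    a and b are two measurements of the same n subjects
    (e.g. initial estimate and final count for the same months).
    """
    n, k = len(a), 2
    rows = list(zip(a, b))
    grand = (sum(a) + sum(b)) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(a) / n, sum(b) / n]

    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Toy example: the second series is the first plus a constant bias of 1.
first = [1, 2, 3, 4, 5]
second = [2, 3, 4, 5, 6]
icc = icc_agreement(first, second)
print(round(icc, 3))  # 0.833
```

The two toy series are perfectly correlated, so a regression or Pearson r would look flawless, but the ICC drops below 1 because of the constant bias.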
1
u/david1610 OC: 1 2d ago
I'd like people to have a go at forecasting economic data; it's incredibly difficult. Data is the limiting factor: there aren't many good forward-looking predictive variables. Plus you are at the mercy of policy decisions and financial markets that turn on a dime.
The fact they can explain 60% of the variance in labor markets is impressive.
That being said, as another poster pointed out, if you remove the tails of the distribution (the outliers), the fit will go down. Even 40% of the variance would be really good.
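The "remove the tails and the fit goes down" effect is easy to reproduce on synthetic data. This is a sketch under assumptions: the data are simulated (x stands in for the actual change, y for the initial estimate), the noise level is chosen so the full-sample R² lands near 60%, and nothing here uses the real series.

```python
import random

def r_squared(xs, ys):
    """Squared Pearson correlation, which equals R^2 for a one-predictor OLS fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable
# Synthetic data only: same noise level everywhere, true slope of 1.
xs = [rng.gauss(0, 1) for _ in range(5000)]
ys = [x + rng.gauss(0, 0.8) for x in xs]

# Keep only the middle of the x distribution (drop points beyond +/- 1 sd).
middle = [(x, y) for x, y in zip(xs, ys) if abs(x) < 1]
mid_x, mid_y = zip(*middle)

r2_all = r_squared(xs, ys)
r2_mid = r_squared(mid_x, mid_y)
print(f"all points: {r2_all:.2f}, middle only: {r2_mid:.2f}")
```

With these settings the full-sample R² should come out around 0.6 and the middle-only R² roughly half that, even though the relationship and the noise are identical across the whole range. The drop comes purely from shrinking the spread of x.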
42
u/goldpony13 4d ago
This is super interesting and very crisp. Could you please explain the shaded bars on the last graph? Not sure I understood that correctly.