Numbers don’t lie, but they can mislead if there aren’t enough of them

When it comes to making decisions on the farm, the more data the better — especially if that data comes from other farms

Did you know that there’s a direct correlation between the divorce rate in Maine and the per capita consumption of margarine in the States?

It’s true. Chart the data going back a decade or two, and one line rises and falls with the other. That’s the problem with data like this: there’s no plausible connection between the two variables, but people may draw conclusions from the pattern all the same.

And that’s a real risk in agriculture, says Matt Meisner, head of data analytics for the Farmers Business Network.

“When working with farm data, we have to be really careful of this,” said Meisner, who presented at the recent AgSmart event at Olds College.

“We have to be careful that every pattern we find actually makes sense for the farm so that we’re not falling into this trap. It’s very easy to do if you’re not careful.”

Part of the problem is the sheer volume of data being created. Where once farming was a low-tech, high-touch business, now nearly everything on the farm — from the combine in your field to the cellphone in your pocket — is churning out data.

“There’s more and more data being generated than there ever has been before. Everything that we do now generates data,” said Meisner. “Data on its own is good, but it doesn’t necessarily tell you what you should do. Data is just a means to an end.”

Ultimately, that end should be increased profitability. Data can do that, but only if there’s enough of it to make an informed decision.

“There are a lot of decisions that farmers have to make, and a lot of those decisions have been made largely without a lot of data for a long time,” said Meisner. “The data they did have was largely based on personal experience and anecdotally sharing data.

“But we tend to be bad at making objective decisions when we don’t have a lot of data. We tend to make decisions that are really based on emotion, not facts.”

Aggregating data

To get a complete picture of what’s really going on, it’s critical to capture all of the data that’s being generated on a given operation — and Meisner does mean everything.

“It’s very easy in agriculture to just focus on one variable at a time, but that can be problematic,” he said. “Agriculture is very complicated, and there are lots of interactions between different variables.

“You can’t really say the best seeding rate if you don’t know the variety that you’re planting. You can’t say the best amount of fertilizer to apply if you don’t know the soil type. All these things are connected.”

But an individual farm’s ability to generate enough data to be useful is surprisingly limited, said Meisner.

“Data from one farm is really not enough to parse out all of the patterns you want to learn from the data,” he said. “That’s the trade-off if you’re looking at data from the five acres near you — it’s very relevant, but it’s very small. Even though the data may be from conditions exactly like yours, there may be so little of it that it’s not enough to trust.”

The trick, then, is to aggregate an individual farm’s data with datasets from other farms.

“It’s not just about getting datasets — it’s about getting a lot of different datasets from different farms, and then sharing that data together,” said Meisner.

That’s part of what his company (a North American operation with offices in California and High River) does, he added.

“Data can give us a lot more confidence than our own limited experience, and more data is more confidence,” he said. “We focus on getting these really complete layers of data from the farm, and then doing the same across hundreds of thousands of farms in order to build a more complete picture.”

A good example of that is variety selection. Any given farmer may have only seeded a handful of the top 50 varieties, so while he or she might feel confident choosing the best of what they’ve tried, those varieties are only a drop in the bucket compared to what’s available — and farmers can’t test everything.

“If you’re a 2,000-acre farmer trying to experiment with different combinations of varieties, soil type, and seeding rate, it would take decades to get 100 acres of data for each of those possible combinations. One hundred acres of data is really not very much in the grand scheme of things,” he said.

“So if they were to pick seed just based on their own data, they might be missing out on products that are better even than their best product.”
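The arithmetic behind that "decades" estimate can be sketched roughly as follows. The 50 varieties, 100 acres per combination, and 2,000-acre farm come from the figures above; the counts of soil types and seeding rates are hypothetical, chosen only to illustrate the scale, and the sketch generously assumes every acre produces usable trial data each season.

```python
def years_to_cover(varieties, soil_types, seeding_rates,
                   acres_per_combo, farm_acres):
    """Years needed to collect acres_per_combo acres of data for every
    combination, assuming the whole farm yields usable data each season."""
    combos = varieties * soil_types * seeding_rates
    return combos * acres_per_combo / farm_acres

# 50 varieties is from the article; 4 soil types and 3 seeding rates
# are hypothetical values used only for illustration.
years = years_to_cover(50, 4, 3, acres_per_combo=100, farm_acres=2000)
print(f"{years:.0f} years")  # prints "30 years"
```

Under these assumed numbers there are 600 combinations, or 60,000 acres of data to collect, which works out to 30 seasons on a 2,000-acre farm.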

Averages drawn from smaller datasets also swing far more than those drawn from larger ones, he added. For example, the average yield across the first 500 acres of data for a specific variety might show a lot of highs and lows, but as the dataset grows, the average levels out, giving farmers a clearer sense of how the variety performs overall.

“It would be a very bad outcome for a farmer if they were to plant a variety based on the first 100 acres of data thinking it yields 220 bushels an acre, when a week later, you get more data and realize that’s actually not the case,” said Meisner.

“This pattern of noisy averages when you have a little bit of data and stable averages when you have a lot of data is very apparent, and it really highlights the problems of making decisions based on small datasets.

“You could really just be looking at noise.”
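The "noisy averages" pattern described here can be illustrated with a small simulation. This is a minimal sketch, not the company's method: it assumes field yields scatter randomly around a true average of 45 bushels an acre (a figure used later in the article) with a hypothetical field-to-field standard deviation of 15, and it measures how much the average moves between repeated samples of a given size.

```python
import random
import statistics

def spread_of_average(n_fields, trials=200, true_mean=45.0,
                      field_sd=15.0, seed=1):
    """Standard deviation of the average yield across repeated
    samples of n_fields simulated fields."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    averages = [
        statistics.fmean(rng.gauss(true_mean, field_sd)
                         for _ in range(n_fields))
        for _ in range(trials)
    ]
    return statistics.stdev(averages)

small = spread_of_average(5)    # averages built from 5 fields each
large = spread_of_average(500)  # averages built from 500 fields each
print(f"5 fields:   averages swing by about +/- {small:.1f} bu/ac")
print(f"500 fields: averages swing by about +/- {large:.1f} bu/ac")
```

In theory the swing shrinks with the square root of the number of fields, so 100 times more data should make the average roughly 10 times steadier.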

No silver bullet

Take, for instance, two seed companies marketing soybeans with the exact same genetics under different names (an illegal practice in Canada, but a common one in the U.S.).

If both companies are selling the same genetics, it shouldn’t matter which one you buy, but looking at the yield data for each product — which, again, is genetically identical — shows a surprising amount of variability.

In fact, one product’s yield was almost double the other’s in the first 250 acres of data. Once there were 4,000 acres of data, though, the “tables had turned completely”: the seed that had trailed was now outperforming the early leader.

But at 25,000 acres, the real picture started to emerge: Yields for these two ‘different’ varieties were the same — exactly what you’d expect from genetically identical varieties.

“We needed a lot of data to figure that out,” said Meisner. “The reason for that is the real world is messy. There’s a lot of variation. So even if the average yield is 45, some farmers are going to get 60. And if you’re looking at the first five acres you have, it’s hard to know if that 60 is reality or if it’s good weather, good soil, good management — you don’t know, and you might make the wrong decision if you don’t consider that.”

Aggregating the data with other farms can help with that, said Meisner.

“When you’re looking at aggregated datasets from the real world, you have to have a lot of data, or you can really be led astray by all the noise and error that these small datasets can have,” he said.

“As you get more data, the picture can change a lot. You need enough data to see a true pattern.”
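The identical-soybean comparison described above can be mimicked with simulated data. In this sketch, two labels, A and B, draw yields from exactly the same distribution, and the gap between their running averages is reported at the three acreage checkpoints mentioned in the article. The 50-acre field size, the standard deviation of 15, and the function name are assumptions made for illustration; none of this is real yield data.

```python
import random

ACRES_PER_FIELD = 50  # hypothetical field size

def running_gap(checkpoints_acres, true_mean=45.0, field_sd=15.0, seed=7):
    """Gap between the running average yields of two labels that share
    identical genetics, reported at each acreage checkpoint."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_a = total_b = 0.0
    fields = 0
    gaps = {}
    for acres in sorted(checkpoints_acres):
        target_fields = acres // ACRES_PER_FIELD
        # Keep adding one field per label until the checkpoint is reached.
        while fields < target_fields:
            total_a += rng.gauss(true_mean, field_sd)
            total_b += rng.gauss(true_mean, field_sd)
            fields += 1
        gaps[acres] = (total_a - total_b) / fields
    return gaps

gaps = running_gap([250, 4000, 25000])
for acres, gap in gaps.items():
    print(f"{acres:>6} acres per label: A minus B = {gap:+.2f} bu/ac")
```

Any single run may show the early gap favouring either label, and it can flip sign along the way, but with enough acres it settles close to zero because the two products are, by construction, the same.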

But ultimately, big data isn’t a silver bullet, he said. Farmers still need to ‘ground truth’ it to see how it works on their own operations.

“A dataset from your own farm is still very useful,” said Meisner. “A variety or chemical or fertilizer that looks good on average might not work on a given farm. The aggregated data can help you see in general what’s working, but actually trying it on your own farm is still going to be important.

“I don’t think aggregated data is ever a replacement for trying something on your own farm.”