Love Data Week | Where's the Data?

Happy Love Data Week! Every year around Valentine's Day, this global campaign puts the spotlight on how we collect, manage, share, and reuse data. This year's campaign runs February 9–14, with a theme that asks a fundamental question: Where's the Data?

The timing of data

It’s a fairly simple question. But for researchers, it touches on a lot, asking them to reflect on the entire lifecycle of data from collection to management to reuse. In the social sciences especially, working with data means capturing human lives in numbers and variables. As you can imagine, that doesn’t come cheap in terms of time or money.

Large-scale datasets are typically collected through government agencies, which means there's a long journey before the data actually gets to researchers. Securing additional funding and going through approval processes can take years, and only then does real-world data become a dataset available for research. On top of that, once data is collected, maintaining and updating it is a whole other challenge. It's not uncommon for datasets collected 10 or even 20 years ago to still be in active use. Take SizeUSA, for example: a dataset widely used in the U.S. apparel industry, built in 2003 by 3D-scanning roughly 10,000 people across more than ten cities nationwide to capture their body measurements. Over the past two decades, our diets and lifestyles have changed dramatically, and so have our bodies. Yet this dataset is still used as a benchmark for body size standards and research. With large-scale public data, the costs of collection and maintenance are high, and by the time data ends up in a researcher's hands, it's often already outdated.

This doesn't mean these datasets have no value; they absolutely do. But in some fields, researchers need to capture and analyze rapidly evolving social phenomena, and sometimes large-scale public data just doesn't offer the fresh perspective they need. In certain areas, research is much more like a race against time.

Another way to access data: web scraping

In this sense, I don't think the answer to "Where's the data?" always has to be a large, well-established dataset. Sometimes it's about working with smaller, readily available data, allowing researchers to run quick experiments, iterate on analysis, and steer their research in new directions. In some fields, it's more about collecting and analyzing data on what's happening in the world right now.

And that's where web scraping remains a valuable tool for social science researchers. With the emergence of AI-native online communities like Moltbook, some are probably already envisioning a future of AI agent–driven data collection. But at the core of it all is still the ability to extract and process web data, including text, images, and more. And this kind of data, which captures a rapidly changing reality, can become the starting point for entirely new research questions.
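To make "extract and process web data" a bit more concrete, here is a minimal sketch in Python that pulls the visible text and image URLs out of a page using only the standard library's `html.parser`. The class name `TextAndImageExtractor`, the helper `extract`, and the sample HTML are all made up for illustration; a real project would fetch live pages and would likely reach for a dedicated parsing library, so treat this as one possible starting point rather than a recommended pipeline.

```python
from html.parser import HTMLParser


class TextAndImageExtractor(HTMLParser):
    """Hypothetical helper: collects visible text and image URLs from HTML."""

    def __init__(self):
        super().__init__()
        self.text_chunks = []
        self.image_urls = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a script or style block.
        stripped = data.strip()
        if stripped and self._skip_depth == 0:
            self.text_chunks.append(stripped)


def extract(html):
    """Return (list of text chunks, list of image URLs) for an HTML string."""
    parser = TextAndImageExtractor()
    parser.feed(html)
    parser.close()
    return parser.text_chunks, parser.image_urls


# Invented sample page standing in for a downloaded document.
SAMPLE_HTML = """
<html><body>
  <h1>Community thread</h1>
  <p>First post about a new trend.</p>
  <img src="https://example.com/photo.jpg" alt="photo">
  <script>console.log("ignored");</script>
</body></html>
"""

text, images = extract(SAMPLE_HTML)
print(text)
print(images)
```

From here, the text chunks could go into whatever analysis you have in mind, and the image URLs could be queued for download. The point is simply that a small, quickly collected sample like this can be enough to start iterating.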

Where's your data?

As we celebrate Love Data Week, I'd love to hear from you! In your field, where's the data? And how do you collect and manage it?
