
# Love Data Week | Where's the Data?

Happy Love Data Week! Every year around Valentine's Day, this global campaign puts the spotlight on how we collect, manage, share, and reuse data. This year's campaign runs from February 9–14 with a theme that asks a fundamental question: Where's the Data?

<figure><img src="/files/qBvS6hhCD77u8xkbP1bK" alt=""><figcaption></figcaption></figure>

#### The timing of data <a href="#ember62" id="ember62"></a>

It's a fairly simple question, but for researchers it touches on a lot: it asks them to reflect on the entire lifecycle of data, from collection to management to reuse. In the social sciences especially, working with data means capturing human lives in numbers and variables. As you can imagine, that doesn't come cheap in terms of time or money.

Large-scale datasets are typically collected through government agencies, which means there's a long journey before the data actually gets to researchers. Securing funding and going through approval processes can take years before real-world data becomes a dataset available for research. On top of that, once data is collected, maintaining and updating it is a whole other challenge. It's not uncommon for datasets collected 10 or even 20 years ago to still be in active use. Take SizeUSA, for example: a dataset widely used in the U.S. apparel industry, built in 2003 by 3D-scanning roughly 10,000 people across more than ten cities nationwide to capture their body measurements. Over the past two decades, our diets and lifestyles have changed dramatically, and so have our bodies. Yet this dataset is still used as a benchmark for body size standards and research. With large-scale public data, the costs of collection and maintenance are high, and by the time data ends up in a researcher's hands, it's often already outdated.

This doesn't mean these datasets have no value; they absolutely do. But in some fields, researchers need to capture and analyze rapidly evolving social phenomena, and sometimes large-scale public data just doesn't offer the fresh perspective they need. In certain areas, research is much more like a race against time.

#### Another way to access data: web scraping <a href="#ember66" id="ember66"></a>

In this sense, I don't think the answer to "Where's the data?" always has to be a large, well-established dataset. Sometimes it's about working with smaller, readily available data, allowing researchers to run quick experiments, iterate on analysis, and steer their research in new directions. In some fields, it's more about collecting and analyzing data on what's happening in the world right now.

And that's where web scraping remains a valuable tool for social science researchers. With the emergence of AI-native online communities like Moltbook, some are probably already envisioning a future of AI agent–driven data collection. But at the core of it all is still the ability to extract and process web data, including text, images, and more. And this kind of data, which captures a rapidly changing reality, can become the starting point for entirely new research questions.
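
To make "extract and process web data" a bit more concrete, here is a minimal sketch in Python using only the standard library. It pulls visible text and image URLs out of an HTML string. The sample page, the `TextAndImageExtractor` class, and the `extract` helper are all hypothetical names made up for illustration; a real project would also need to fetch pages, respect each site's terms of use, and handle much messier markup.

```python
from html.parser import HTMLParser


class TextAndImageExtractor(HTMLParser):
    """Collects visible text snippets and image URLs from an HTML string."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self.image_urls = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text that is not script or style content.
        if text and self._skip_depth == 0:
            self.texts.append(text)


def extract(html):
    """Return (texts, image_urls) found in the given HTML string."""
    parser = TextAndImageExtractor()
    parser.feed(html)
    parser.close()
    return parser.texts, parser.image_urls


# Hypothetical sample page, standing in for a downloaded community thread.
SAMPLE_HTML = """
<html><body>
  <h1>Community thread</h1>
  <p>First post about a new trend.</p>
  <img src="/images/chart.png" alt="chart">
  <script>var x = 1;</script>
</body></html>
"""

texts, image_urls = extract(SAMPLE_HTML)
print(texts)
print(image_urls)
```

The text snippets and image URLs collected this way could then be saved to a spreadsheet or database for analysis, which is the "process" half of the workflow.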

#### Where's your data? <a href="#ember69" id="ember69"></a>

As we celebrate Love Data Week, I'd love to hear from you! In your field, where's the data? And how do you collect and manage it?

