Introduction

Canada’s ability to make evidence‑based investment decisions depends on understanding how research organizations contribute to the training of Science, Technology, and Innovation talent. Online career platforms—especially LinkedIn—have enabled organizations to track alumni progression and benchmark outcomes to assess whether training programs prepare individuals for roles that advance national interests. Insights derived from such tracking inform program, policy, and investment decisions.

Across Canada, several research organizations and governments are developing long‑term career outcome metrics. The University of Toronto’s 10,000 PhDs study, based on searches of publicly available alumni information, was an early landmark and has helped catalyze broader adoption of these practices. Building on this momentum, many research organizations, including Major Research Facilities, are exploring similar methods, and I have engaged with several of them on emerging practices and pioneering alumni studies. Drawing on these conversations, I have published a summary of common challenges encountered in alumni analysis and strategies to overcome them.

In late 2025, LinkedIn began requiring users to log in to view full profiles, substantially reducing the amount of career information available to the public. Where complete profiles were once visible without logging in, only partial “teaser” content is now accessible. This limits the identifiability of individuals and complicates the ethical extraction of reliable career insights.

This article outlines the role LinkedIn data has played in alumni analysis, the alternative strategies available to organizations that previously relied on profile scraping, and the relevant legal and ethical considerations. These approaches demand more deliberate planning and manual effort but can still yield meaningful measures of impact for research and talent development programs.

Role of LinkedIn profile data

LinkedIn has emerged as the leading platform for career data, offering unparalleled access to information about professionals’ employment histories, skills, and educational backgrounds. Its widespread adoption among Canadian graduates, postdocs, and researchers provides organizations with a robust resource for tracking alumni outcomes across industries and geographies. In my experience, 40% to 55% of Canadian highly qualified personnel (HQP) alumni from research organizations have LinkedIn profiles, and LinkedIn is the single largest source of such information. Leveraging this data enables institutions to gain valuable insights into career pathways, supporting evidence-based decision-making in talent development and program evaluation.

Legal status of public data from LinkedIn and similar platforms

Before LinkedIn implemented its change in late 2025, the platform allowed unrestricted public access to full profile information. Anyone, including search engines, could view comprehensive details from LinkedIn profiles without logging in. To limit potential misuse of personal information and maintain some control over profile data, LinkedIn attempted to cap the number of profiles one could view before requiring users to log in. Once users logged in, they became subject to the LinkedIn User Agreement, which imposes restrictions on data-scraping activities.

Ethical data-scraping companies limit their activities to collecting only publicly available information that can be accessed without logging in or accepting a platform’s Terms of Service or User Agreement. Major platforms have attempted to prevent these companies from using such public data. This approach reflects a desire by platforms to benefit from publishing information—making it discoverable by search engines—while simultaneously exercising private control over its use. That’s like trying to have your cake and eat it too, as the expression goes.

Nonetheless, courts have affirmed that if information is published and made publicly accessible, it is considered part of the public domain. For instance, Bright Data won a legal battle with Meta over this issue. In Bright Data’s words on its victory:

“[The] fundamental principle [is] that public data must remain free and accessible. This legal victory only strengthens our fundamental belief in the necessity and legitimacy of web scraping in an era where data is pivotal.”

It is important to recognize that LinkedIn profiles, though publicly accessible, still constitute personal information and are therefore subject to fundamental legal requirements regarding their use. These requirements include ensuring individuals are accurately represented, avoiding the disclosure or republication of identifiable information, and maintaining robust safeguards to protect the data.

Implications of the change for data-scraping

From a legal perspective, information displayed on the LinkedIn teaser page remains public and can still be ethically scraped. But with less data available, research organizations face greater challenges in deriving meaningful insights. Furthermore, the task of identifying alumni using automated approaches becomes more difficult, as key details—such as educational and employment histories required for confirmation—may no longer appear on the public portion of profiles.

Alternative strategies for using LinkedIn profiles

Research organizations can still log in and view LinkedIn profiles. However, the LinkedIn User Agreement imposes restrictions on copying substantial amounts of profile information, even for statistical analysis. Systematic creation of structured profile data—even if done manually—would be considered a violation of the User Agreement. Therefore, even if you individually look up alumni, methodically extracting and recording their information could still breach LinkedIn’s terms.

What you can do is create your own dataset: convert LinkedIn observations into coded analytic variables rather than copying profile content verbatim. This approach greatly enhances your defensibility. For example, instead of directly recording details like “Postdoctoral Fellow at X University (2015–2018),” you would extract only the variables needed for your statistical analysis and represent them in a coded format, such as:

Sector: Academia
Career stage: Postdoc
Years since first use of your organization’s facilities: 3

You can also build out your dataset further by drawing on other public sources and cross-referencing LinkedIn for verification.

The key here is that LinkedIn profiles are only viewed, not stored.
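The coding approach described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a prescribed schema: the `Sector` and `CareerStage` categories, the `CodedObservation` class, the `code_observation` helper, and the identifier `A-0001` are all hypothetical, and the years 2015 and 2018 are borrowed from the postdoctoral example purely for illustration. The record holds only an internal identifier and coded variables; no profile text, profile URL, or employer name is stored.

```python
from dataclasses import dataclass
from enum import Enum


class Sector(Enum):
    # Hypothetical codebook; define your own categories before coding begins.
    ACADEMIA = "Academia"
    INDUSTRY = "Industry"
    GOVERNMENT = "Government"
    OTHER = "Other"


class CareerStage(Enum):
    # Hypothetical codebook, for illustration only.
    POSTDOC = "Postdoc"
    FACULTY = "Faculty"
    NON_ACADEMIC = "Non-academic professional"
    OTHER = "Other"


@dataclass(frozen=True)
class CodedObservation:
    """One alumnus, represented only by coded analytic variables."""
    alumni_id: str              # internal ID from your own records, not a profile URL
    sector: Sector
    career_stage: CareerStage
    years_since_first_use: int  # years since first use of your facilities

    def to_row(self) -> dict:
        """Flatten to plain values suitable for a CSV or statistics package."""
        return {
            "alumni_id": self.alumni_id,
            "sector": self.sector.value,
            "career_stage": self.career_stage.value,
            "years_since_first_use": self.years_since_first_use,
        }


def code_observation(alumni_id: str, sector: str, career_stage: str,
                     first_use_year: int, observed_year: int) -> CodedObservation:
    """Turn what a reviewer observed into codes; reject labels outside the codebook."""
    if observed_year < first_use_year:
        raise ValueError("observed_year precedes first_use_year")
    return CodedObservation(
        alumni_id=alumni_id,
        sector=Sector(sector),                    # raises ValueError on unknown labels
        career_stage=CareerStage(career_stage),   # raises ValueError on unknown labels
        years_since_first_use=observed_year - first_use_year,
    )


# Example: the reviewer viewed a profile and recorded only the codes.
record = code_observation("A-0001", "Academia", "Postdoc", 2015, 2018)
print(record.to_row())
```

Because `Sector(...)` and `CareerStage(...)` accept only labels from the codebook, free text copied from a profile cannot slip into the dataset by accident.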

The University of Toronto published similar methods in its 10,000 PhDs study: Reithmeier R, O’Leary L, Zhu X, Dales C, Abdulkarim A, Aquil A, et al. (2019) The 10,000 PhDs project at the University of Toronto: Using employment outcome data to inform graduate education. PLoS ONE 14(1): e0209898. https://doi.org/10.1371/journal.pone.0209898

In this study, project staff searched for and reviewed publicly available online information about alumni, drawing from a variety of websites. Based on these observations, they completed a standardized set of survey questions for each alumnus covering topics such as employment sector and further education. The survey sorts each observation into predefined groups, such as 12 possible pathways for further education after a PhD, which include options like postdoctoral fellowships, medical school, law school, and business school.
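Once every alumnus has been coded against the same predefined categories, reporting reduces to counting. The sketch below assumes a hypothetical, shortened list of pathway labels (the study’s actual 12 categories are not reproduced here, and “None observed” is an invented placeholder) and made-up coded responses; `tally_pathways` and `shares` are illustrative helper names.

```python
from collections import Counter

# Hypothetical subset of predefined categories, for illustration only.
PATHWAYS = (
    "Postdoctoral fellowship",
    "Medical school",
    "Law school",
    "Business school",
    "None observed",
)


def tally_pathways(coded_responses):
    """Count coded responses per category, rejecting labels outside the list."""
    unknown = set(coded_responses) - set(PATHWAYS)
    if unknown:
        raise ValueError(f"labels not in the predefined categories: {sorted(unknown)}")
    counts = Counter(coded_responses)
    # Report every category, including those with zero observations.
    return {pathway: counts.get(pathway, 0) for pathway in PATHWAYS}


def shares(tally):
    """Convert counts to proportions of all coded alumni."""
    total = sum(tally.values())
    if total == 0:
        return {pathway: 0.0 for pathway in tally}
    return {pathway: count / total for pathway, count in tally.items()}


# Made-up coded responses, one per alumnus.
responses = [
    "Postdoctoral fellowship",
    "Postdoctoral fellowship",
    "Law school",
    "None observed",
]
tally = tally_pathways(responses)
print(tally)
print(shares(tally))
```

Only aggregate counts and proportions leave this step, which fits the requirement to avoid republishing identifiable information.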

Conclusion

In summary, while the process of leveraging LinkedIn profile information now requires more deliberate planning and manual effort, research organizations can still derive significant value from the wealth of information that LinkedIn offers. With thoughtful preparation and a focus on ethical data practices, LinkedIn remains a valuable resource for research and alumni tracking, especially when integrated with other public sources. The extra effort invested in handling the LinkedIn data will continue to pay dividends in public trust.