March 6, 2024

You miss 99% of GitHub talent profiles

Ivan Kleshnin

Co-Founder of DevScanr
Tatiana Kleshnina

Co-Founder of DevScanr

GitHub was never designed for recruitment. But it’s one of the largest databases of tech talent, so it’s no surprise that many recruiters use it for their purposes, sometimes with great results.

What if we told you that many GitHub users are not discoverable via search? You’d probably reply: “Sure, a couple of empty profiles here and there. What’s the big deal?” The deal is that even with elaborate queries you see, on average, only a tiny minority of the relevant profiles... Not half of them. Not 10%. But somewhat below 1% of the total!

Time to experiment

Let’s search together. Today we are trying to find a Senior React Developer for a company in Poland. They want someone familiar with TypeScript. We will start with broader queries, to estimate the size of the talent pool, and narrow them down as we go.

Step #1

TypeScript is a wider category than React, so we start with it and the location. We enter the following search query:

Getting about 900 profiles... Not many, and we’ve just started 🤔 All personal data is masked, by the way. The totals are real and trivial to recheck.
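The exact query is not reproduced in the text here, so the following is only a sketch of what a Step #1 query could look like. It assumes GitHub's documented `language:` and `location:` user-search qualifiers; the helper name `build_user_query` is hypothetical.

```python
def build_user_query(**qualifiers: str) -> str:
    """Join qualifier/value pairs into one search string.

    Values that contain spaces are wrapped in double quotes.
    """
    parts = []
    for name, value in qualifiers.items():
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{name}:{value}")
    return " ".join(parts)


# Assumed Step #1 query: TypeScript users whose location field says "Poland".
print(build_user_query(language="TypeScript", location="Poland"))
# language:TypeScript location:Poland
```

The resulting string can be pasted into the GitHub search box with the "Users" result type selected.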

Step #2

TypeScript is a general-purpose language that is also used on the backend and elsewhere, and we’re interested exclusively in frontend developers. So we try the next query:

The query brought us only 25 profiles. It’s not looking good...

Step #3

Is the “Frontend” category too narrow? No, it’s actually the widest category of developers, and JavaScript and TypeScript together are the most used languages. Let’s try “React” instead, but exclude “React Native”, which is a different niche (Mobile dev vs Web dev).

20 profiles. Replacing a topic with a technology has made no difference 😢

Step #4

We have not completed our search yet. Remember, we are looking for “Senior” developers. What is the best way to search for experienced programmers on GitHub? Unfortunately, there’s no such UI filter or direct query counterpart.

One trick is to filter by account creation date. The assumption is that the older the account – the more experience that particular person has. In general, it’s often inaccurate, but it’s the best tool we have. The query is for talent with 5+ years of experience:

9 prospects. Certainly not a number we can work with.
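The account-age trick boils down to date arithmetic. The sketch below assumes GitHub's documented `created:` qualifier with a `<=` date comparison; the helper name `created_before` is hypothetical, and the date used is the publication date of this article.

```python
from datetime import date


def created_before(years: int, today: date) -> str:
    """Build a qualifier matching accounts that are at least `years` old."""
    try:
        cutoff = today.replace(year=today.year - years)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year: fall back to Feb 28.
        cutoff = today.replace(year=today.year - years, day=28)
    return f"created:<={cutoff.isoformat()}"


print(created_before(5, date(2024, 3, 6)))
# created:<=2019-03-06
```

Keep in mind that this measures the age of the account, not the experience of the person behind it.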

Step #5

In a desperate attempt to do something, we come up with the idea of approximating seniority by the number of followers. The assumption is that junior developers have fewer followers, on average, and we can filter by this value.

Got a single person... From the expected talent pool we reduced ourselves to a literal talent tub 🛁

Time to think

GitHub search does not work the way you may think it works. A lookup for engineers in location:Poland does not take engineers in location:Warsaw into account. Results of a React search are limited to people who mention this technology in their profile descriptions; repositories and everything else are ignored.
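The location problem can be illustrated with a toy model. This is not GitHub's or DevScanr's implementation, just a sketch of the difference between matching the raw location string and matching after a city-to-country lookup. The tiny `CITY_TO_COUNTRY` table and both function names are hypothetical.

```python
# Hypothetical hand-made lookup; a real system would need a full gazetteer.
CITY_TO_COUNTRY = {"warsaw": "poland", "krakow": "poland", "belgrade": "serbia"}


def raw_match(profile_location: str, wanted: str) -> bool:
    """Literal, case-insensitive substring match on the location field."""
    return wanted.lower() in profile_location.lower()


def normalized_match(profile_location: str, wanted_country: str) -> bool:
    """Also match when the field names a known city of the wanted country."""
    text = profile_location.lower()
    country = wanted_country.lower()
    if country in text:
        return True
    return any(
        city in text and city_country == country
        for city, city_country in CITY_TO_COUNTRY.items()
    )


locations = ["Poland", "Warsaw", "Krakow, PL", "Belgrade"]
print([loc for loc in locations if raw_match(loc, "Poland")])
# ['Poland']
print([loc for loc in locations if normalized_match(loc, "Poland")])
# ['Poland', 'Warsaw', 'Krakow, PL']
```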

It’s not that something is broken with GitHub. It’s just that (any) Search is an advanced piece of technology, hard to implement and costly to support. To be fair, most platforms don’t provide anything comparable to GitHub search. Try to find engineers on Medium or StackOverflow...

Some parts, like search by programming language, work pretty well: they take user repositories into account. But as soon as you start adding a location and/or titles, it fails quickly. So, in its current state, GitHub can be a good tool for recruiters who work for remote companies 🌐.

We don’t have space to discuss all the alternatives here. Google X-Ray, which some people might consider, would give us comparable results. It would be better at understanding human text but worse at e.g. repository introspection. It would still be unable to generalize cities to countries, unable to interpret profiles with empty bios/readmes, and so on.

Welcome DevScanr

DevScanr is an AI-powered search and analytics platform that works on top of GitHub and other talent databases. It was created to address the aforementioned limitations.

Talent search is our primary focus, so we made sure it works better. Let’s try to search for the same Senior React Developer + TypeScript for a company in Poland.

Step #1

We start with the same logic as previously. Looking for TypeScript developers in Poland:

We get 12K profiles vs. the roughly 1K we had previously on GitHub, for an equivalent search on the same database.

Now we see people whose locations are “Warsaw” or “Krakow” (just the city, with no country mentioned). Could you trick the system by listing all large Polish cities on GitHub? Kinda, if you don’t mind refreshing your school geography knowledge each time you search.

And it’s not getting easier. GitHub queries are trimmed to 256 characters. Your query is matched against the raw string, so cities that share a name will match regardless of country or state. You’re able to search for Vue, whose templates are considered a “programming language”, but not for React, which is treated differently. The main profile README.md, the large text you see on some profiles, is entirely ignored. The list goes on.
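To see how quickly the city-listing trick runs into the length limit, here is a minimal sketch. It assumes that repeated `location:` qualifiers are treated as alternatives, which is how this workaround is usually applied but is not something the text above confirms. The helper name `add_cities` and the city list are illustrative only.

```python
MAX_QUERY_LENGTH = 256  # the limit quoted in the text above


def add_cities(base_query: str, cities: list[str]) -> tuple[str, list[str]]:
    """Append one location qualifier per city while the query still fits.

    Returns the final query and the cities that had to be left out.
    """
    query = base_query
    for index, city in enumerate(cities):
        value = f'"{city}"' if " " in city else city
        candidate = f"{query} location:{value}"
        if len(candidate) > MAX_QUERY_LENGTH:
            return query, cities[index:]
        query = candidate
    return query, []


polish_cities = ["Warsaw", "Krakow", "Wroclaw", "Poznan", "Gdansk", "Lodz"]
query, left_out = add_cities("language:TypeScript location:Poland", polish_cities)
print(len(query), query)
print("left out:", left_out)
```

With a longer list, anything returned in `left_out` would need a separate query, and the results would then have to be merged and deduplicated by hand.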

Step #2

Let’s continue our search. Adding the specialization we’re interested in:

4K profiles, that’s something. DevScanr AI can recognize specializations of GitHub users. Those who haven’t mentioned the “Frontend” topic directly but have worked with e.g. React or Svelte long enough will appear in the search for frontend developers.

Step #3

Searching by skills can be an alternative to mentioned/inferred specializations on DevScanr as well:

It brought us about 7K profiles. More than the previous time and for a valid reason: not all people who know React are Frontend Developers. Some of them are FullStack or Web Developers. And some are Backend-ers who just happen to know the popular framework. We were interested in React Developers so here we’d need some ad-hoc filtering.

Step #4

Finally, the seniority:

The hardest and the most subjective metric... How to account for freelance experience? How to treat student internships in real companies? Should we count similar experience, at least partially? There’s no single correct answer and preferences vary from employer to employer.

DevScanr filters currently have a single metric called “Dev. Experience” which represents a cumulative development experience of a person. We have other tools to help you narrow down the experience with particular technology. But that’s another topic.

Looking for the same 5+ years of seniority as previously (the exact number is not the point), we get 4K profiles. On the same underlying database, we dare to remind you one more time.

Conclusion

Here’s the summary of the experiment above:

| Query | GitHub User Search | DevScanr Talent Search |
|---|---|---|
| TypeScript Developers in Poland | 900 | 12,000+ |
| Frontend TypeScript Developers in Poland | 25 | 4,000+ |
| React & TypeScript Developers in Poland | 20 | 7,000+ |
| Senior React & TypeScript Developers in Poland | 13 | 4,000+ |
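The Poland totals above can be turned into a "visible share" with simple arithmetic. In this sketch the numbers are copied from the summary. The DevScanr totals carry a "+", so they are treated as lower bounds and the resulting shares as upper bounds. The function name `visible_share` is hypothetical.

```python
# (GitHub User Search total, DevScanr Talent Search lower bound) per query
rows = {
    "TypeScript Developers in Poland": (900, 12_000),
    "Frontend TypeScript Developers in Poland": (25, 4_000),
    "React & TypeScript Developers in Poland": (20, 7_000),
    "Senior React & TypeScript Developers in Poland": (13, 4_000),
}


def visible_share(github: int, devscanr: int) -> float:
    """Percentage of the DevScanr result set that GitHub search surfaces."""
    return 100 * github / devscanr


for label, (github, devscanr) in rows.items():
    print(f"{label}: at most {visible_share(github, devscanr):.2f}% visible")
```

Only the broadest query stays above 1%. As soon as a specialization or seniority filter is added, the share drops below that mark.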

The difference is an order of magnitude for queries with programming languages and/or the most popular keywords, the ones people tend to mention in titles/bios. It’s getting even more dramatic in other cases. Our observation is definitely not a random artifact or something specific to Poland. For example, here’s the table for Java Developers in Serbia:

| Query | GitHub User Search | DevScanr Talent Search |
|---|---|---|
| Java Developers in Serbia | 1,200+ | 3,000+ |
| Mobile Developers in Serbia | 82 | 902 |
| Java Mobile Developers in Serbia | 11 | 470 |
| Senior Java Mobile Developers in Serbia | 0 | 114 |

And the ratio persists for other queries we tested... Now consider the following. That (relatively) narrow group of discoverable engineers on GitHub gets all the attention. Not because they are the best, but because they mentioned the right keywords, making themselves searchable. The lesson for engineers is to add those damn keywords, but we’re talking about recruitment today.

So one group of developers is becoming desensitized to DMs while another group would be happy to respond. It’s basic psychology and we simply cannot discount such effects.

GitHub and DevScanr search functionalities will improve with time. But the emphasis on different aspects of it will likely remain. Finding the optimal tool for the job might sound boring, but it’s rarely a bad idea. With DevScanr you can have both: the glorious database of GitHub, with all activity and community insights, but also the convenience of a specialized recruitment tool.

Boost Your Sourcing Process

DevScanr opens a new way to source, evaluate, and engage tech stars. Powerful talent search, data insights & analytics at your disposal. The platform has a free tier with invite-only access.