You miss 99% of GitHub talent profiles
GitHub was never designed for recruitment. But it’s one of the largest databases of tech talent, so it’s not a surprise that many recruiters use it for their purposes. And sometimes with great results.
What if we told you that many GitHub users are not discoverable via search? You’d probably reply: “Sure, a couple of empty profiles here and there. What’s the big deal?” The deal is that even with elaborate queries, on average, you see only a tiny minority of relevant profiles... Not half of them. Not 10%. But somewhat below 1% of the total!
Time to experiment
Let’s search together. Today we are trying to find a Senior React Developer for a company in Poland. They want someone familiar with TypeScript. We will start with broader queries to estimate the size of the talent pool, and narrow them down as we go.
Step #1
TypeScript is a wider category than React, so we start with it and the location. We input the following search query:
Getting about 900 profiles... Not many, and we’ve just started 🤔 All personal data is masked, by the way. The totals are real and trivial to recheck.
Step #2
TypeScript is a general-purpose language, also used on the Backend and elsewhere. We’re interested exclusively in Frontend-ers, so we try the next query:
The query brought us only 25 profiles. It’s not looking good...
Step #3
Is the “Frontend” category too narrow? No, it’s actually the widest category of developers, and JavaScript + TypeScript together make for the most used language(s). Let’s try “React” instead, but exclude “React Native”, which is a different niche (Mobile dev vs Web dev).
20 profiles. Replacing a topic with a technology has made no difference 😢
Step #4
We have not completed our search yet. Remember, we are looking for “Senior” developers. What is the best way to search for experienced programmers on GitHub? Unfortunately, there’s no such UI filter or direct query counterpart.
One trick is to filter by account creation date. The assumption is that the older the account, the more experience that particular person has. This is often inaccurate, but it’s the best tool we have. The query is for talent with 5+ years of experience:
9 prospects. Certainly not a number we can work with.
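For illustration, a date-based filter like the one described in this step could be assembled as follows. This is a minimal sketch, not the exact query used above: the helper name `senior_user_query` and the sample date are hypothetical, while `type:`, `language:`, `location:` and `created:` are standard GitHub user-search qualifiers.

```python
from datetime import date


def senior_user_query(language: str, location: str, min_years: int, today: date) -> str:
    """Build a GitHub user-search string that keeps accounts older than min_years."""
    # Shift the year back; Feb 29 falls back to Feb 28 in non-leap years.
    try:
        cutoff = today.replace(year=today.year - min_years)
    except ValueError:
        cutoff = today.replace(year=today.year - min_years, day=28)
    # Account age is only a proxy for seniority, as discussed in the text.
    return f"type:user language:{language} location:{location} created:<{cutoff.isoformat()}"


print(senior_user_query("TypeScript", "Poland", 5, date(2024, 6, 1)))
# prints: type:user language:TypeScript location:Poland created:<2019-06-01
```

The resulting string can be pasted into the GitHub search box. It inherits every limitation described in this article.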
Step #5
In a desperate attempt to do something, we come up with the idea of approximating seniority by the number of followers. The assumption is that Junior developers have fewer followers, on average, and we can filter by this value.
Got a single person... From the expected talent pool we reduced ourselves to a literal talent tub 🛁
Time to think
GitHub search does not work the way you may think it works. A lookup for engineers in location:Poland does not take engineers in location:Warsaw into account. Results of a React search are limited to people who mention this technology in their profile descriptions, ignoring repositories and everything else.
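The location behavior described above can be pictured with a toy model. This is only a sketch of raw-string matching, not GitHub's actual implementation: the profiles are invented, and `raw_location_match` is a hypothetical helper.

```python
# Invented sample profiles; only the free-text location field matters here.
profiles = [
    {"login": "dev_a", "location": "Warsaw"},
    {"login": "dev_b", "location": "Krakow, Poland"},
    {"login": "dev_c", "location": "Poland"},
]


def raw_location_match(term, candidates):
    """Keep profiles whose location text literally contains the term."""
    needle = term.casefold()
    return [p["login"] for p in candidates if needle in p["location"].casefold()]


matches = raw_location_match("Poland", profiles)
print(matches)  # ['dev_b', 'dev_c']
```

The Warsaw-only profile is dropped because nothing in this kind of matching knows that Warsaw is in Poland.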
It’s not that something is broken with GitHub. It’s just that (any) Search is an advanced piece of technology, hard to implement and costly to support. To be fair, most platforms don’t provide anything comparable to GitHub search. Try to find engineers on Medium or StackOverflow...
Some parts, like search by programming language, work pretty well because they take user repositories into account. But as soon as you start to add location and/or titles, it fails quickly. So, in its current state, GitHub can be a good tool for recruiters who work for remote companies 🌐.
We don’t have space to discuss all the alternatives here. Google X-Ray, which some people might consider, would give us comparable results. It would be better at understanding human text but worse at e.g. repository introspection. It would still be unable to generalize cities to countries, unable to interpret profiles with empty bios/readmes, and so on.
Welcome DevScanr
DevScanr is an AI-powered search and analytics platform that works on top of GitHub and other talent databases. It was created to address the aforementioned limitations.
Talent search is our primary focus, so we made sure it works better. Let’s try to search for the same Senior React Developer + TypeScript for a company in Poland.
Step #1
We start with the same logic as previously. Looking for TypeScript developers in Poland:
Getting 12K profiles vs. the roughly 900 we had previously on GitHub, for an equivalent search on the same database.
Now we see people whose locations are “Warsaw” or “Krakow” (just the city, with no country mentioned). Could you trick the system by listing all large Polish cities on GitHub? Kinda, if you don’t mind refreshing your school geography knowledge each time you search.
And it’s not getting easier. GitHub queries are trimmed to 256 characters. Your query is matched against the raw string, so cities that share a name will all match, regardless of country or state. You’re able to search for Vue, whose templates are considered a “programming language”, but not React, which is treated differently. The main README.md, the large text you see on some profiles, is entirely ignored. The list goes on.
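The "list the cities yourself" workaround can be sketched as one query per city, each checked against the 256-character limit mentioned above. This is a minimal sketch under stated assumptions: `per_city_queries` is a hypothetical helper, the city list is an arbitrary sample, and running the queries and merging their results is left out.

```python
MAX_QUERY_LENGTH = 256  # limit quoted in the text above


def per_city_queries(base: str, cities: list[str]) -> list[str]:
    """Return one user-search query per city, each within the length limit."""
    queries = []
    for city in cities:
        # Multi-word values need double quotes in GitHub search syntax.
        value = f'"{city}"' if " " in city else city
        query = f"{base} location:{value}"
        if len(query) > MAX_QUERY_LENGTH:
            raise ValueError(f"query for {city!r} exceeds {MAX_QUERY_LENGTH} characters")
        queries.append(query)
    return queries


cities = ["Warsaw", "Krakow", "Wroclaw", "Gdansk", "Zielona Gora"]
queries = per_city_queries("type:user language:TypeScript", cities)
for query in queries:
    print(query)
```

Results from the separate queries would still need to be deduplicated by login. Cities that share a name with places in other countries would still produce false matches.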
Step #2
Let’s continue our search. Adding the specialization we’re interested in:
4K profiles, it’s something. DevScanr AI can recognize specializations of GitHub users. Those who haven’t mentioned the “Frontend” topic directly but worked with e.g. React or Svelte long enough will appear in the search for the frontend developers.
Step #3
Searching by skills can be an alternative to mentioned/inferred specializations on DevScanr as well:
It brought us about 7K profiles. More than the previous time and for a valid reason: not all people who know React are Frontend Developers. Some of them are FullStack or Web Developers. And some are Backend-ers who just happen to know the popular framework. We were interested in React Developers so here we’d need some ad-hoc filtering.
Step #4
Finally, the seniority:
The hardest and the most subjective metric... How to account for freelance experience? How to consider student practices in real companies? Should we, at least partially, count a similar experience? There’s no single correct answer and preferences vary from employer to employer.
DevScanr filters currently have a single metric called “Dev. Experience” which represents a cumulative development experience of a person. We have other tools to help you narrow down the experience with particular technology. But that’s another topic.
Looking for the same 5+ years of seniority as previously (the exact number is not the point), we get 4K profiles. On the same underlying database, we dare to remind you one more time.
Conclusion
Here’s the summary of our above experiment:
Query | GitHub User Search | DevScanr Talent Search |
---|---|---|
TypeScript Developers in Poland | 900 | 12,000+ |
Frontend TypeScript Developers in Poland | 25 | 4,000+ |
React & TypeScript Developers in Poland | 20 | 7,000+ |
Senior React & TypeScript Developers in Poland | 13 | 4,000+ |
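The share of profiles that GitHub search surfaces can be worked out directly from the table above. In this short sketch the figures are copied from the table, the DevScanr totals are treated as lower bounds (they are listed with a "+"), and `visible_share` is a hypothetical helper name.

```python
# (GitHub User Search total, DevScanr Talent Search total) per query, from the table.
results = {
    "TypeScript Developers in Poland": (900, 12_000),
    "Frontend TypeScript Developers in Poland": (25, 4_000),
    "React & TypeScript Developers in Poland": (20, 7_000),
    "Senior React & TypeScript Developers in Poland": (13, 4_000),
}


def visible_share(github_total: int, devscanr_total: int) -> float:
    """Percentage of the DevScanr result count that GitHub search also surfaces."""
    return 100 * github_total / devscanr_total


for label, (github_total, devscanr_total) in results.items():
    print(f"{label}: {visible_share(github_total, devscanr_total):.1f}% visible on GitHub")
```

Because the DevScanr totals are lower bounds, each percentage is an upper bound on what GitHub search shows. Every query narrower than the first one comes in below 1%.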
The difference is an order of magnitude for queries with programming languages and/or the most popular keywords, the kind people mention in titles and bios. It gets even more dramatic in other cases. Our observation is definitely not a random artifact or something specific to Poland. For example, here’s the table for Java Developers in Serbia:
Query | GitHub User Search | DevScanr Talent Search |
---|---|---|
Java Developers in Serbia | 1,200+ | 3,000+ |
Mobile Developers in Serbia | 82 | 902 |
Java Mobile Developers in Serbia | 11 | 470 |
Senior Java Mobile Developers in Serbia | 0 | 114 |
And the ratio persists for other queries we tested... Now consider the following. That (relatively) narrow group of discoverable engineers on GitHub gets all the attention. Not because they are the best, but because they mentioned the right keywords, making themselves searchable. The lesson for engineers is to add those damn keywords, but we’re talking about recruitment today.
So one group of developers is becoming desensitized to DMs while another group is happy to respond. It’s basic psychology and we simply cannot discount such effects.
GitHub and DevScanr search functionalities will improve with time. But the emphasis on different aspects of it will likely remain. Finding the optimal tool for the job might sound boring, but it’s rarely a bad idea. With DevScanr you can have both: the glorious database of GitHub, with all activity and community insights, but also the convenience of a specialized recruitment tool.