23 Jun 2026 8 min read Threat Intelligence & Security News

What Is Scanning My Server? An Internet Scanner Reference

You found a line in your logs you do not recognise: CensysInspect, Shodan-Pull/1.0, visionheight.com/scan, or a banner that says "Hello from Palo Alto Networks". This reference tells you what each one is and whether it is worth doing anything about.

The data comes from three controlled research deployments observed across May and June 2026. Most of the scanners below are general internet-wide classifiers. They probe anything exposed to the public internet, regardless of what is running on it, which is exactly why the same operators turned up consistently across all three deployments. Setting aside Umai, the attributable operators accounted for 682 distinct source IPs generating about 4,438 requests, and eight of them appeared on every deployment within hours of each going live. One operator, Umai, sits apart: it specifically targets exposed AI infrastructure and was by far the heaviest single scanner in the dataset, which is why it leads the table below.

Two things are worth saying before the table. First, almost none of this traffic is an attack. The named scanners here catalogue exposed services; they do not exploit them. The exploitation we observed was opportunistic and came from a different set of operators than the scanners listed here, going after exposed secrets and known vulnerabilities, and that is covered in the credential pipeline analysis and the full research write-up. Second, a self-identifying banner is the operator telling you who they are. That is a feature, not a threat signal, and it is what makes the blocking question below worth thinking about rather than reacting to.


The internet scanner reference table

Umai leads as the heaviest operator by request volume. The general classifiers below it are ordered by distinct source IPs. Each row gives the scanner, the user-agent or banner it presents alongside what it does, and the count of distinct IPs we saw. How attribution was made is noted in the row where it rests on something other than a banner, and summarised in the source note under the table. Whether any of them is worth blocking is covered below the table.


Scanner	How it identifies, and what it does	Distinct IPs
Umai (Entelijan)	User-agent `Umai-Scanner` with `+https://umai.entelijan.com/methodology`, also a bare `node` on some requests. AI exposure intelligence: the one operator here that specifically targets exposed AI infrastructure, and it publishes a methodology page covering what it catalogues. Heaviest and deepest scanner in the dataset (14,395 requests, 503 paths), cycling through versions 1.0, 2.0 and an `umai-mcp-v2` variant.	11
Cortex Xpanse (Palo Alto Networks)	Banner "Hello from Palo Alto Networks" with a link to their scanning docs. Attack-surface management, repeat-probing a small path set from many IPs.	160
Infrawatch	User-agent `Infrawatch/1.0 (+https://infrawat.ch/)`. Internet-wide infrastructure scanning, light repeated probes.	153
Censys	User-agent `CensysInspect/1.1 (+https://about.censys.io/)`. Major internet classifier, catalogues exposed services across a wide path set and exposes that data publicly. Highest request volume among the general classifiers (1,217 requests).	121
Hurricane Electric ranges	No scanner banner, rotating real browser user-agents from HE network ranges. HE is a transit provider, so the ranges carry mixed traffic; this probing does not announce itself and is identified by ASN and behaviour, not by user-agent.	105
visionheight	User-agent contains `visionheight.com/scan`. Commercial scanner, tightly scoped probe across distributed AWS-hosted IPs.	54
InternetMeasurement (Driftnet)	User-agent `InternetMeasurement/1.0 (+https://internet-measurement.com/)`. Internet-wide service discovery, operated by Driftnet, which SecurityScorecard acquired in May 2026. Never attempts login.	22
Nokia GenomeCrawler	User-agent `GenomeCrawlerd/1.0 (+https://www.nokia.com/genomecrawler)`. Internet measurement, higher per-IP request volume than most classifiers here.	17
Modat	User-agent `ModatScanner/1.2 (+https://modat.io/)`. Attack-surface management scanner, low-volume scoped probes.	12
Shodan	No banner; presents spoofed browser user-agents or none at all. Verified against Shodan's owned netblock (`207.90.244.x`) and published census ranges (the `71.6.x` and `66.240.205.x` space). This is the genuine Shodan crawler, and it does not announce itself. Note the contrast with the unverifiable `Shodan-Pull/1.0` traffic discussed below.	9
Keydrop / onlyscans	User-agent `Keydrop.io/1.0 (onlyscans.com/about)`. Commercial scanning, low volume, scoped probes.	5
Silver Inc	User-agent `silver.inc/2.0`, with a `silver.inc/mcp` variant on some requests. Few source IPs but high request volume (867 requests), the third-heaviest operator in the dataset behind Umai and Censys.	2
Applebot	Standard Applebot user-agent (`+http://www.apple.com/go/applebot`). Apple's crawler, low-volume probe, single deployment. Also feeds Siri and Spotlight indexing, not only search.	2
Netcraft	User-agent `NetcraftSurveyAgent/1.0 (+info@netcraft.com)`. Long-running internet survey, single low-volume probe.	2
LeakIX (l9scan)	User-agent contains `l9scan` with `+https://leakix.net`. Misconfiguration indexing, one IP but a deep sweep when it arrives, and it indexes what it finds.	1

Source: CyberDesserts primary research, three research deployments, May to June 2026. Distinct IPs is the count of unique source addresses attributed to each operator across the three deployments. Attribution is by a vendor's own user-agent or banner, its owned netblock, or its published census ranges; where a banner alone was not enough, the source network was verified. Self-identifying search crawlers also appeared in smaller numbers: Googlebot (14 IPs), Majestic-12 MJ12bot (2 IPs) and a single Spark Scanner probe.
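
For a first pass over your own logs, the banners in the table can be turned into a simple lookup. The following is a minimal sketch, not a definitive implementation: `OPERATOR_MARKERS` and `claimed_operator` are hypothetical names, the markers are substrings lifted from the table, and the function assumes you pass it whatever string carried the banner (normally the User-Agent header). The result is only what the request claims to be.

```python
from typing import Optional

# Substring markers copied from the reference table, matched case-insensitively.
# Operators that present no banner (Shodan, Hurricane Electric ranges) are left
# out on purpose, and so is Umai's bare `node` string, which is too generic.
OPERATOR_MARKERS = [
    ("umai-scanner", "Umai (Entelijan)"),
    ("umai-mcp", "Umai (Entelijan)"),
    ("hello from palo alto networks", "Cortex Xpanse (Palo Alto Networks)"),
    ("infrawatch/", "Infrawatch"),
    ("censysinspect/", "Censys"),
    ("visionheight.com/scan", "visionheight"),
    ("internetmeasurement/", "InternetMeasurement (Driftnet)"),
    ("genomecrawlerd/", "Nokia GenomeCrawler"),
    ("modatscanner/", "Modat"),
    ("keydrop.io/", "Keydrop / onlyscans"),
    ("silver.inc/", "Silver Inc"),
    ("l9scan", "LeakIX (l9scan)"),
    ("applebot", "Applebot"),
    ("netcraftsurveyagent/", "Netcraft"),
]


def claimed_operator(banner: str) -> Optional[str]:
    """Return the operator a banner claims to be, or None if it names none."""
    text = banner.lower()
    for marker, operator in OPERATOR_MARKERS:
        if marker in text:
            return operator
    return None
```

A `None` result does not mean the traffic is hostile, only that it did not name one of the operators above.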

Scanning tools and libraries

Some log entries name a tool rather than an operator. zgrab, masscan, Go-http-client and python-requests identify the software making the request, not who is running it, and the same library is used by commercial scanners, researchers and attackers alike. They sit in a separate table because none can be attributed to a named operator from the user-agent, and zgrab is the clearest case: the same string showed up on Shodan's own ranges and on unrelated cloud hosts in our data.


Tool or library	Signature in the user-agent	Distinct IPs	Requests
curl	`curl/x.y.z`	62	2,975
Go-http-client	`Go-http-client/1.1`	101	727
python-requests	`python-requests/x.y.z`	96	486
zgrab	`Mozilla/5.0 zgrab/0.x`	187	418
masscan / ivre	`masscan`, `ivre-masscan`	12	84
aiohttp	`Python/3.x aiohttp/3.x`	9	70
fasthttp	`fasthttp`	19	45
FreePBX-Scanner	`FreePBX-Scanner/1.0`	1	16
Generic self-ID scanners	`Scanner`, `origin-scanner`, `ip-port-http-scanner`	4	15
ExchangeScanner	`ExchangeScanner/2.1`	1	12
bgp-scan-agent	`bgp-scan-agent/1.0`	4	8
VMware-Detector	`VMware-Detector/3.6`	1	3

Source: CyberDesserts primary research, three research deployments, May to June 2026. These signatures identify the scanning tool or library, not the operator running it. The same tool appears across unrelated actors, so no vendor attribution is made.
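
The two numeric columns above are straightforward to reproduce from your own logs. The sketch below assumes you have already parsed each request into a `(source_ip, user_agent)` pair; `TOOL_PATTERNS`, `tool_of` and `summarise` are hypothetical names, and the patterns cover only a subset of the signatures in the table.

```python
import re
from collections import defaultdict

# Patterns follow the "Signature in the user-agent" column. zgrab is listed
# first because its string starts with a browser-style "Mozilla/5.0" prefix.
TOOL_PATTERNS = [
    ("zgrab", re.compile(r"zgrab/", re.I)),
    ("masscan / ivre", re.compile(r"masscan", re.I)),
    ("curl", re.compile(r"^curl/", re.I)),
    ("Go-http-client", re.compile(r"^Go-http-client/", re.I)),
    ("python-requests", re.compile(r"^python-requests/", re.I)),
    ("aiohttp", re.compile(r"aiohttp/", re.I)),
    ("fasthttp", re.compile(r"fasthttp", re.I)),
]


def tool_of(user_agent):
    """Name the tool a user-agent identifies, or None if it matches nothing."""
    for name, pattern in TOOL_PATTERNS:
        if pattern.search(user_agent):
            return name
    return None


def summarise(records):
    """Map each tool to (distinct source IPs, total requests)."""
    ips = defaultdict(set)
    requests = defaultdict(int)
    for source_ip, user_agent in records:
        tool = tool_of(user_agent)
        if tool is None:
            continue
        ips[tool].add(source_ip)
        requests[tool] += 1
    return {tool: (len(ips[tool]), requests[tool]) for tool in requests}
```

Comparing the two figures per tool is what separates a few hosts making many requests from many hosts making one or two each.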

One signature deserves a closer look, because it shows why a user-agent alone should never settle attribution. Traffic in our data presented the banner Shodan-Pull/1.0. It is not in the table above, and the reason is the point.

None of the source IPs behind that banner belong to Shodan. Forward-confirmed reverse DNS and registration lookups on every one of them returned no Shodan-owned address. They resolved instead to rented cloud hosts: a block on PFCloud in Bulgaria, a spread of DigitalOcean addresses, and a handful of other commodity providers. The banner announces Shodan; the network underneath it has nothing to do with Shodan.

The genuine Shodan traffic did the opposite. The nine verified Shodan IPs in the table sit on Shodan's owned netblock and published census ranges, and they carry no scanner banner at all, just spoofed browser strings. So the traffic that announces itself as Shodan is not on Shodan's network, and the traffic that is on Shodan's network does not announce itself.

What the spoofed banner is for is open. It could be a third party scanning off Shodan's published data, or tooling that reuses the string to blend in. The data does not say, so Shodan-Pull/1.0 is left out of the reference rather than guessed at.

The lesson generalises. Treat a banner as a claim, not proof. Where attribution matters, verify the source network against the operator's registered or published ranges before you trust the label.
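
The check itself is small. The sketch below compares what a banner claims with where the traffic actually comes from, using Python's standard `ipaddress` module. `PUBLISHED_RANGES`, `on_operator_network` and `assess` are hypothetical names, and the operator and CIDR block shown are documentation placeholders, not any real vendor's ranges; in practice you would load the list each operator publishes.

```python
import ipaddress

# Placeholder data: "ExampleScan" and 192.0.2.0/24 (a documentation range)
# stand in for a real operator and its published or registered netblocks.
PUBLISHED_RANGES = {
    "ExampleScan": ["192.0.2.0/24"],
}


def on_operator_network(ip, operator, ranges=PUBLISHED_RANGES):
    """True if the address falls inside any range listed for the operator."""
    addr = ipaddress.ip_address(ip)
    return any(
        addr in ipaddress.ip_network(cidr) for cidr in ranges.get(operator, [])
    )


def assess(ip, claimed, ranges=PUBLISHED_RANGES):
    """Classify a request by its banner claim and its source network."""
    if claimed is not None:
        if claimed not in ranges:
            return "claim, no ranges to check"
        if on_operator_network(ip, claimed, ranges):
            return "verified"
        return "unverified claim"
    # No banner: the source may still sit on a known operator's network.
    for operator in ranges:
        if on_operator_network(ip, operator, ranges):
            return f"unannounced, on {operator} network"
    return "unknown"
```

The two Shodan cases above map onto two of these outcomes: the `Shodan-Pull/1.0` traffic would come back as an unverified claim, and the genuine crawler as unannounced traffic on the operator's network.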

What you're exposing, and how to reduce it

A scanner in your logs means a public database now lists something about your service. The useful question is not how to hide, but what you are giving away and whether you need to.

Three moves, in priority order.

First, do not expose what does not need exposing. Once a service is reachable it has been seen, and you cannot know who recorded it or whether they ever drop it. Blocking later shuts the door, but the record is already out. Keep it off the public internet, for example by putting it behind IP allow-listing.

Second, authenticate everything that must stay reachable. Being catalogued is reconnaissance; being catalogued with no authentication is exposure.

Third, think about whether to block at all. Blocking a scanner's range is a weak control: these operators rotate and add IPs, so the block needs constant upkeep, and it does nothing about everyone else who can still reach the service. Your exposure is unchanged; you have just stopped seeing one of the better-behaved visitors. For the self-identifying commercial scanners, leaving them is often the better call, since they give you a free external view of what is reachable, a signal worth keeping (World of Balgan).

Blocking by source only goes so far. If a .env or .git/config is reachable, it leaks to whoever requests it, named scanner or not. Filtering those paths at the WAF is the catch-all, standard in the OWASP Core Rule Set that Cloudflare and others run: it does not undo an exposed config, but it covers the ones you have not found yet.
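
Where no WAF sits in front of an application, the same idea can be approximated in the application itself. The following is a rough sketch written as WSGI middleware, assuming a Python web app. `SENSITIVE_PATH` and `block_sensitive_paths` are hypothetical names, and the pattern covers only `.env` and `.git` paths; it is not the OWASP Core Rule Set logic.

```python
import re

# Matches a path segment that is exactly ".env" or ".git", or starts with
# ".env." or ".git." (for example "/.env.local"). "/.github" does not match.
SENSITIVE_PATH = re.compile(r"(^|/)\.(env|git)(/|\.|$)", re.I)


def block_sensitive_paths(app):
    """Wrap a WSGI app and answer 404 for sensitive paths before it runs."""

    def middleware(environ, start_response):
        # PATH_INFO is already URL-decoded by the WSGI server.
        path = environ.get("PATH_INFO", "")
        if SENSITIVE_PATH.search(path):
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not Found\n"]
        return app(environ, start_response)

    return middleware
```

Returning 404 rather than 403 is a design choice here: it avoids confirming that the file exists. As with the WAF rule, this only hides the file from requests; removing it from the web root is still the real fix.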

What this reference does and does not tell you

This reference covers two populations: scanners that identify themselves and can be attributed to a named operator, and the tools and libraries that name the software but not who is running it. Both are included deliberately. A self-identifying scanner is a known quantity: you can verify it, check it against the vendor's ranges, and make a clean block-or-allow decision on it. A tool signature tells you how you were probed, which is useful context even when the operator stays anonymous.

It does not cover the unannounced traffic, the secret harvesters and the exploit tooling that arrives without any recognisable signature. That population is larger and more interesting from a defensive standpoint, and the full write-up covers the ecosystem end to end.

The finding that matters most: exposed infrastructure is catalogued at scale within hours, but the exploitation that follows is opportunistic, going after exposed secrets and probing known vulnerabilities old and new, rather than attacking the applications directly. Leaked credentials are the most prominent of those, treated in depth in the credential pipeline analysis.