July 1, 2026 · 7 min read

If an AI can't read you, it won't cite you

Open your site in a browser and it looks fine. But an AI does not see that page: it reads the served HTML, the robots.txt, the structured data. Pharos shows you what it finds, and what slips past it.

Open your site in a browser: it loads, the images are there, the menu works, it all looks fine. The problem is that an AI does not look at that page. It does not open a browser, it does not wait for the JavaScript to finish rendering the content, it does not see the colors. It reads the code the server sends it and, from that alone, decides what you have to say.

It is a natural blind spot: we judge our site by how it looks to us, not by how it looks to a machine. And today machines are an audience that counts, because some of the people who used to look you up no longer come to your site to read you themselves.

Why what an AI reads matters

For years there was only one path: search, click, read on the site. Today part of that path ends earlier. The answer engine (Google with its AI summaries, but also ChatGPT, Perplexity, Gemini) composes the reply, and often that is enough for the person asking. SparkToro's zero-click study (Similarweb clickstream data, January to April 2026, United States) found that 68% of Google searches end without a single click to any site.

When that happens, the answer about you is given by the AI, based on something it has read. If that something is your site, and the AI can read and understand it, you have a chance of being cited as a source. If it can't reach the site, or reaches it and finds a wall of unstructured code, it skips you. It is not malice: it cites what it can read.

The rule is worth stating in full, because it is easy to misread: being readable by an AI does not guarantee you a citation, but not being readable all but rules you out. Readability raises the probability; it does not hand you certainty.

What an AI looks at, that you don't see

The things that make a site readable to a machine are almost all invisible to someone opening it in person. They live in the code and the service files, not in the graphics. The main ones:

  • The robots.txt: the file that tells crawlers what they may read. Many sites, often without realizing it, block the very crawlers the AIs use (OpenAI's GPTBot, Anthropic's ClaudeBot, PerplexityBot and others).
  • Real reachability: allowing access in robots.txt is one thing; whether the server actually responds when the visitor is an AI crawler rather than a browser is another.
  • Content in the HTML: if the text appears only after the JavaScript has mounted the page, an AI that does not run the code sees an almost empty page.
  • Structured data: title, meta tags, JSON-LD, Open Graph: the labels that tell the machine who you are instead of making it guess.
  • Agent files: sitemap, llms.txt and the emerging signals (.well-known, skill.md), the equivalent of leaving the directions right at the entrance.
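The first bullet is the quickest to check yourself. Below is a minimal sketch, using Python's standard `urllib.robotparser`, that reports which of the crawlers named above a given robots.txt lets through. The crawler list is an assumption (the full set Pharos checks is not listed here), and the standard-library parser applies simpler matching than some crawlers do (it does not handle path wildcards, for example), so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens named in the article; an illustrative, incomplete list.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def crawler_access(robots_txt: str, page_url: str) -> dict[str, bool]:
    """Return, for each AI crawler, whether robots.txt allows it to fetch page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return {bot: parser.can_fetch(bot, page_url) for bot in AI_CRAWLERS}


if __name__ == "__main__":
    # An "inherited" robots.txt: one AI crawler is shut out, everyone else is allowed.
    sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
    print(crawler_access(sample, "https://example.com/about"))
    # → {'GPTBot': False, 'ClaudeBot': True, 'PerplexityBot': True}
```

In this example the fix really is a single line: deleting `Disallow: /` under `User-agent: GPTBot` (or the whole group) lets GPTBot fall back to the catch-all rule.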

None of these things is visible to the eye, yet each one affects whether an AI includes you or skips you. Some are fixed in a moment (one line in the robots.txt); others, like having the content already in the HTML, depend on how the site was built from the start. Checking them by hand means opening the code, reading the robots.txt line by line and simulating the visit of a dozen different crawlers. It is not work you want to redo every time.

Pharos: the report in a minute

That is why we built Pharos. The model is Google's PageSpeed: you paste a site's address and get a report on how readable and reachable that site is for AIs, answer engines and agents. It is public and free, and the report is full and open: no email wall, no sign-up. (The name comes from the lighthouse of Alexandria, the thing that makes you visible in the fog. End of metaphor.)

The report opens with a score from 0 to 100 and a grade, followed by two sections in plain language: the key insights (the two or three things that matter most in the analysis) and the quick wins (concrete actions ordered by impact and effort). Below them comes the detail, split into five areas:

  • AI access: robots.txt and server-side blocks.
  • Agent files: llms.txt, sitemap and the emerging signals.
  • Structured data and SEO: title, meta, JSON-LD, Open Graph.
  • Machine readability: content in the HTML, semantics, headings, alt text.
  • Off-site visibility: presence in Common Crawl, a source many AIs draw on.
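For the machine-readability and structured-data areas, here is a rough sketch of how a few signals can be read from the raw HTML, exactly as a crawler that does not run JavaScript receives it. It uses Python's standard `html.parser`. Which signals to collect, and what would count as "enough" text, are assumptions for illustration, not Pharos's actual checks.

```python
from html.parser import HTMLParser


class ReadabilityScan(HTMLParser):
    """Collects a few machine-readability signals from raw, unrendered HTML."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()
        self.text_chars = 0          # text outside <script>/<style>
        self.headings = 0
        self.images_missing_alt = 0  # <img> with no alt attribute at all
        self.json_ld_blocks = 0
        self.has_title = False
        self.meta: dict[str, str] = {}  # meta name/property -> content
        self._skip_depth = 0         # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip_depth += 1
            if tag == "script" and (attrs.get("type") or "").lower() == "application/ld+json":
                self.json_ld_blocks += 1
        elif tag in self.HEADINGS:
            self.headings += 1
        elif tag == "img" and "alt" not in attrs:
            # alt="" is valid for decorative images, so only a missing attribute counts.
            self.images_missing_alt += 1
        elif tag == "title":
            self.has_title = True
        elif tag == "meta":
            key = attrs.get("name") or attrs.get("property")
            if key:
                self.meta[key.lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_chars += len(data.strip())


def scan(html: str) -> ReadabilityScan:
    parser = ReadabilityScan()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser
```

On a client-rendered page whose HTML is just an empty root `<div>` and a script tag, `text_chars` stays close to zero: that is the "almost empty page" an AI sees when the content only appears after JavaScript runs.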

Each check has a status (good, needs improvement or critical) and an explanation: not a bare number, but what it means and what to change. AI access in particular is not inferred from the robots.txt alone: we actually knock on the door with the user-agents of a dozen crawlers and watch how the server responds.
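Here is a minimal sketch of that kind of probe, using only Python's standard library: request the same URL with different User-Agent headers and flag the crawlers that get an error while a browser-like baseline does not. The agent strings are simplified tokens, not the crawlers' full official user-agents, and the `fetch` parameter is an illustrative hook so the comparison logic can be tested without the network. A server that blocks by IP range rather than by header would not show up this way.

```python
from typing import Callable
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Simplified user-agent tokens (an assumption; real crawlers send longer headers).
PROBE_AGENTS = {
    "browser": "Mozilla/5.0",
    "GPTBot": "GPTBot",
    "ClaudeBot": "ClaudeBot",
    "PerplexityBot": "PerplexityBot",
}

Fetch = Callable[[str, str], int]  # (url, user_agent) -> HTTP status code


def http_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Fetch url with the given User-Agent header and return the final HTTP status."""
    req = Request(url, headers={"User-Agent": user_agent})
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:  # a 4xx/5xx is still an answer from the server
        return err.code


def probe(url: str, fetch: Fetch = http_status) -> dict[str, int]:
    """Status code the server returns to each user-agent."""
    return {name: fetch(url, ua) for name, ua in PROBE_AGENTS.items()}


def blocked_crawlers(statuses: dict[str, int]) -> list[str]:
    """Crawlers refused while the browser baseline gets a 2xx answer."""
    if not 200 <= statuses["browser"] < 300:
        return []  # the site is failing for everyone, not blocking AI specifically
    return [name for name, code in statuses.items()
            if name != "browser" and not 200 <= code < 300]
```

Comparing against a browser baseline is the design choice that matters: a 403 for GPTBot only means something if an ordinary visitor gets a 200.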

Objective numbers, explained in plain words

Under the hood, the checks and scores are deterministic: rules, not opinions. Either the robots.txt lets a crawler in or it does not; either the content is in the HTML or it is not. There is no room for interpretation, and the same site analyzed twice, with nothing changed in between, gets the same score.
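To make "rules, not opinions" concrete, here is a minimal sketch of deterministic scoring: each status maps to fixed points, checks carry weights, and guidance-only signals (such as presence in Common Crawl) are left out of the total. The weights, status points and grade bands are hypothetical; Pharos's actual formula is not described in this article.

```python
from dataclasses import dataclass

# Points per status: hypothetical values, not Pharos's real ones.
STATUS_POINTS = {"good": 1.0, "needs_improvement": 0.5, "critical": 0.0}


@dataclass(frozen=True)
class Check:
    name: str
    status: str           # "good", "needs_improvement" or "critical"
    weight: float = 1.0
    scored: bool = True   # False for guidance-only signals


def score(checks: list[Check]) -> int:
    """Weighted average of the scored checks, scaled to 0-100."""
    scored = [c for c in checks if c.scored]
    total = sum(c.weight for c in scored)
    if total == 0:
        return 0
    earned = sum(c.weight * STATUS_POINTS[c.status] for c in scored)
    return round(100 * earned / total)


def grade(value: int) -> str:
    # Hypothetical letter bands.
    for threshold, letter in ((90, "A"), (75, "B"), (60, "C"), (40, "D")):
        if value >= threshold:
            return letter
    return "F"
```

With a critical robots.txt check (weight 3), good content in the HTML (weight 2) and JSON-LD that needs improvement (weight 1), the score is 100 × 2.5 / 6 ≈ 42, whatever the Common Crawl signal says. No randomness and no model are involved, so the same inputs always produce the same number.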

The key insights and quick wins are written in prose by Claude, Anthropic's model, which turns the raw data into two readable paragraphs; if the model is unavailable, text generated from the same rules takes its place. The AI does one thing here, and does it quietly: it explains. The numbers remain the ones that were measured.
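One way such a fallback could work is sketched below, under assumptions. Quick wins carry an impact and an effort rating on a hypothetical 1-to-3 scale and are sorted by impact first, then effort; a plain template turns the top few into text when the model call fails. `write_with_model` is a hypothetical stand-in for the call to Claude, not Anthropic's API. Either way the model only receives the already-ordered data, so it cannot change the measured values.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class QuickWin:
    action: str
    impact: int  # 1 (low) .. 3 (high), hypothetical scale
    effort: int  # 1 (low) .. 3 (high), hypothetical scale


def order_quick_wins(wins: list[QuickWin]) -> list[QuickWin]:
    """Highest impact first; among equal impact, lowest effort first (stable sort)."""
    return sorted(wins, key=lambda w: (-w.impact, w.effort))


def fallback_summary(wins: list[QuickWin], limit: int = 3) -> str:
    """Template text built from the same rules, used when no model is available."""
    ordered = order_quick_wins(wins)[:limit]
    if not ordered:
        return "No quick wins found: the measured checks look good."
    lines = [f"{i}. {w.action} (impact {w.impact}/3, effort {w.effort}/3)"
             for i, w in enumerate(ordered, start=1)]
    return "Start here:\n" + "\n".join(lines)


def summarize(
    wins: list[QuickWin],
    write_with_model: Optional[Callable[[list[QuickWin]], str]] = None,
) -> str:
    """Try the (hypothetical) model writer first; on any failure use the template."""
    if write_with_model is not None:
        try:
            return write_with_model(order_quick_wins(wins))
        except Exception:  # model unavailable, timeout, etc.
            pass
    return fallback_summary(wins)
```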

What Pharos doesn't tell you

An honest tool also states its limits. Pharos measures what it can actually check and estimates the rest, without passing the estimate off as a grade. Presence in Common Crawl, for example, is guidance, not a score: Pharos shows it to you as a signal but does not count it as if it were up to you.

And the report does not promise results. Making a site readable by an AI raises the probability of being cited but does not guarantee it: no one really controls what a model chooses to cite. Be wary of anyone selling you certainty. Pharos tells you where you are readable and where you are not, which is the part you can work on.

How to use it, in practice

The flow is designed so you don't have to wait. You enter the address, the analysis starts as a background job and the link to the report is ready right away: you can save it, close the page and come back when it is done, as with PageSpeed. Whenever you like, share the link with whoever will work on the fixes. A reasonable way to use it:
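The "link ready right away" pattern can be sketched with Python's `concurrent.futures`: the job gets an ID and a shareable URL immediately, while the analysis runs on a worker thread. The URL pattern and the `analyze` callable are hypothetical, and this version keeps jobs in memory; a real service would need to persist them so a saved link still works after a restart.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional

# Hypothetical report URL pattern, for illustration only.
REPORT_URL = "https://pharos.example/report/{job_id}"


class JobQueue:
    """Starts an analysis in the background and hands back a shareable link at once."""

    def __init__(self, analyze: Callable[[str], dict], workers: int = 4) -> None:
        self._analyze = analyze  # site URL -> report data (hypothetical)
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs: dict[str, Future] = {}

    def submit(self, site: str) -> str:
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(self._analyze, site)
        return REPORT_URL.format(job_id=job_id)  # returned before the analysis ends

    def status(self, job_id: str) -> str:
        future = self._jobs.get(job_id)
        if future is None:
            return "unknown"
        if not future.done():
            return "running"
        return "failed" if future.exception() is not None else "done"

    def report(self, job_id: str, timeout: Optional[float] = None) -> dict:
        """Block until the report is ready (or timeout) and return it."""
        return self._jobs[job_id].result(timeout=timeout)
```

The page showing the link only needs to poll `status` until it reads "done"; closing the tab does not cancel the job, because the work lives on the server side.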

  • Run your site through it and read the quick wins first: they are already ordered by impact, so start there.
  • Look at AI access first: if you are blocking crawlers in the robots.txt, that is the point that weighs most, and often the fix is removing a single line.
  • Try a competitor or a site you admire too: the comparison tells you whether a problem is yours or your sector's.

Many of these fixes are within reach of whoever built the site, and they are worth trying. If you would rather have help sorting them out, tell us what you found: we start from the report, not from a blind quote.

Pharos sells you nothing and does not ask for your email to show you the report. It does one thing: it lets you see your site through the eyes of a machine, the audience that today decides whether to cite you or skip you. Try Pharos on your address, and if you want the wider picture of how to get found inside AI answers, we covered it in The right answer, before they leave your site.


Frequently asked questions

How do I know if an AI can read my site?

Opening it in a browser is not enough: an AI reads the HTML the server sends, not the rendered page. The practical way is to measure it with a tool. Pharos, given the address, checks whether the robots.txt and the server let AI crawlers through, whether the content is already in the HTML, whether the structured data is there, and returns a score with the actions to take. The report is free and complete, with no sign-up.

Is blocking AI crawlers in robots.txt a problem?

It depends on what you want. If you want to be cited inside answer-engine replies, blocking crawlers like GPTBot, ClaudeBot or PerplexityBot cuts you out: what they can't read, they don't cite. Many sites block them without knowing it, through an inherited setting. Pharos shows you which crawlers are allowed and which are not, so the decision is yours, not a forgotten file's.

Is Pharos free? Do I have to leave my email?

It is free and the report is full and open: you read it right away, with no email wall and no sign-up. You can share the report link if you like, but that is a choice, not a condition for seeing it. There are reasonable usage limits, so it stays a lean tool rather than an unlimited service.

Does being readable by an AI guarantee a citation?

No, and be wary of anyone who promises it. No one really controls what a model chooses to cite. Making a site readable and reachable raises the probability of appearing as a source; not being readable all but rules you out. Pharos measures the part you can work on (access, readability, structured data); it does not promise the final result.

Let's talk

Got a process to put in order?

We start from the problem, not the code. Let's talk about it directly.
